openshift-docs/modules/oadp-restic-issues.adoc

// Module included in the following assemblies:
//
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc

:_content-type: CONCEPT
[id="oadp-restic-issues_{context}"]
= Restic issues

You might encounter these issues when you back up applications with Restic.

[id="restic-permission-error-nfs-root-squash-enabled_{context}"]
== Restic permission error for NFS data volumes with root_squash enabled

The `Restic` pod log displays the following error message: `controller=pod-volume-backup error="fork/exec /usr/bin/restic: permission denied"`.

.Cause

If your NFS data volumes have `root_squash` enabled, the NFS server maps the root user that `Restic` runs as to `nfsnobody`, which does not have permission to create backups.

.Solution

You can resolve this issue by creating a supplemental group for `Restic` and adding the group ID to the `DataProtectionApplication` manifest:

. Create a supplemental group for `Restic` on the NFS data volume.
. Set the `setgid` bit on the NFS directories so that group ownership is inherited.
. Add the `spec.configuration.restic.supplementalGroups` parameter and the group ID to the `DataProtectionApplication` manifest, as in the following example:
+
[source,yaml]
----
spec:
  configuration:
    restic:
      enable: true
      supplementalGroups:
      - <group_id> <1>
----
<1> Specify the supplemental group ID.

. Wait for the `Restic` pods to restart so that the changes are applied.
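
The first two steps run on the NFS server, not in the cluster. The following is a minimal sketch of those steps, assuming a Linux NFS server and an account that is allowed to change the group of the exported directory. The function name `prepare_restic_group_dir` and its arguments are hypothetical. The group ID that you pass must match the value that you set in `supplementalGroups`.

[source,bash]
----
# Hypothetical helper: assign an exported directory tree to a group and set
# the setgid bit on its directories so that new files inherit the group.
# Usage: prepare_restic_group_dir <export_dir> <group_id>
#
# If the group does not exist yet, create it first as root, for example:
#   groupadd -g <group_id> <group_name>
prepare_restic_group_dir() {
  local export_dir
  local group_id
  export_dir="$1"
  group_id="$2"

  # Assign the whole directory tree to the supplemental group.
  chgrp -R "${group_id}" "${export_dir}" || return 1

  # Set the setgid bit on every directory so that group ownership is inherited.
  find "${export_dir}" -type d -exec chmod g+s {} + || return 1
}
----

For example, `prepare_restic_group_dir /exports/data 1234` would prepare a hypothetical export at `/exports/data` for group ID `1234`. Existing group permissions on the files are left unchanged.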

[id="restic-restore-deploymentconfig-issue_{context}"]
== Restore CR of Restic backup is "PartiallyFailed", "Failed", or remains "InProgress"

The `Restore` CR of a Restic backup completes with a `PartiallyFailed` or `Failed` status, or it remains `InProgress` and does not complete.

If the status is `PartiallyFailed` or `Failed`, the `Velero` pod log displays the error message, `level=error msg="unable to successfully complete restic restores of pod's volumes"`.

If the status is `InProgress`, the `Restore` CR logs are unavailable and no errors appear in the `Restic` pod logs.

.Cause

The `DeploymentConfig` object redeploys the `Restore` pod, causing the `Restore` CR to fail.

.Solution

. Create a `Restore` CR that excludes the `ReplicationController`, `DeploymentConfig`, and `TemplateInstances` resources:
+
[source,terminal]
----
$ velero restore create --from-backup=<backup> -n openshift-adp \ <1>
  --include-namespaces <namespace> \ <2>
  --exclude-resources replicationcontroller,deploymentconfig,templateinstances.template.openshift.io \
  --restore-volumes=true
----
<1> Specify the name of the `Backup` CR.
<2> Specify the namespaces that are included in the `Backup` CR.

. Verify that the status of the `Restore` CR is `Completed`:
+
[source,terminal]
----
$ oc get restore -n openshift-adp <restore> -o jsonpath='{.status.phase}'
----

. Create a `Restore` CR that includes the `ReplicationController` and `DeploymentConfig` resources:
+
[source,terminal]
----
$ velero restore create --from-backup=<backup> -n openshift-adp \
  --include-namespaces <namespace> \
  --include-resources replicationcontroller,deploymentconfig \
  --restore-volumes=true
----

. Verify that the status of the `Restore` CR is `Completed`:
+
[source,terminal]
----
$ oc get restore -n openshift-adp <restore> -o jsonpath='{.status.phase}'
----

. Verify that the backed-up resources have been restored:
+
[source,terminal]
----
$ oc get all -n <namespace>
----
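
Both verification steps read the same `status.phase` field by hand. As a minimal sketch, that check could be wrapped in a polling loop. The function name `wait_for_restore`, the default of 60 attempts, and the five-second interval are hypothetical. The sketch handles only the phases named in this section and keeps polling on any other value, such as `InProgress`.

[source,bash]
----
# Hypothetical helper: poll a Restore CR until it reaches a final phase.
# Usage: wait_for_restore <restore> [attempts] [interval_seconds]
# Returns 0 on Completed, 1 on PartiallyFailed or Failed, 2 on timeout.
wait_for_restore() {
  local restore
  local attempts
  local interval
  local phase
  local i
  restore="$1"
  attempts="${2:-60}"
  interval="${3:-5}"
  phase=""
  i=0

  while [ "${i}" -lt "${attempts}" ]; do
    # Same query as in the verification steps above.
    phase="$(oc get restore -n openshift-adp "${restore}" -o jsonpath='{.status.phase}')"
    case "${phase}" in
      Completed)
        echo "${phase}"
        return 0
        ;;
      PartiallyFailed|Failed)
        echo "${phase}"
        return 1
        ;;
    esac
    sleep "${interval}"
    i=$((i + 1))
  done

  echo "timeout (last phase: ${phase:-unknown})"
  return 2
}
----

Because a restore that is affected by this issue can remain `InProgress` indefinitely, the attempt limit prevents the loop from waiting forever.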

[id="restic-backup-cannot-be-recreated-after-s3-bucket-emptied_{context}"]
== Restic Backup CR cannot be recreated after bucket is emptied

If you create a Restic `Backup` CR for a namespace, empty the S3 bucket, and then recreate the `Backup` CR for the same namespace, the recreated `Backup` CR fails.

The `Velero` pod log displays the error message, `msg="Error checking repository for stale locks"`.

.Cause

Velero does not create the Restic repository from the `ResticRepository` manifest if the Restic directories are deleted on object storage. For details, see link:https://github.com/vmware-tanzu/velero/issues/4421[Velero issue 4421].
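
To confirm that a failed `Backup` CR matches this issue, you can search the log for the error message. The following is a minimal sketch. The function name `has_stale_lock_error` is hypothetical, and the usage comment assumes that the Velero deployment is named `velero` in the `openshift-adp` namespace.

[source,bash]
----
# Hypothetical helper: succeed if the log text on stdin contains the
# stale-lock error message described in this section.
has_stale_lock_error() {
  grep -F -q 'Error checking repository for stale locks'
}

# Example usage (the deployment name "velero" is an assumption):
#   oc logs deployment/velero -n openshift-adp | has_stale_lock_error \
#     && echo "stale-lock error found"
----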

// For https://issues.redhat.com/browse/OADP-177
// I am linking to GitHub in this isolated instance because it is a link to an issue, not to the repo. The user needs to have this link in order to know whether it has been resolved.