
OADP-4561: Update parent ToC for Troubleshooting modules

Removed unnecessary syntax from newly added assemblies

Fixed xref issue

Added xrefs of all modules to the troubleshooting assembly

Added missing xref anchor ID

Peer review

Merge review suggestions
Apurva Bhide
2025-02-25 00:27:02 +05:30
committed by Steven Smith
parent 8e6e7a80cc
commit 11f0d49a81
17 changed files with 323 additions and 215 deletions

View File

@@ -3670,12 +3670,32 @@ Topics:
File: oadp-backup-restore-csi-snapshots
- Name: Overriding Kopia algorithms
File: overriding-kopia-algorithms
- Name: Troubleshooting
File: troubleshooting
- Name: OADP API
File: oadp-api
- Name: Advanced OADP features and functionalities
File: oadp-advanced-topics
- Name: Troubleshooting OADP
File: troubleshooting
- Name: Velero CLI tool
File: velero-cli-tool
- Name: Pods crash or restart due to lack of memory or CPU
File: pods-crash-or-restart-due-to-lack-of-memory-or-cpu
- Name: Issues with Velero and admission webhooks
File: issues-with-velero-and-admission-webhooks
- Name: OADP installation issues
File: oadp-installation-issues
- Name: OADP Operator issues
File: oadp-operator-issues
- Name: OADP timeouts
File: oadp-timeouts
- Name: Backup and Restore CR issues
File: backup-and-restore-cr-issues
- Name: Restic issues
File: restic-issues
- Name: Using the must-gather tool
File: using-the-must-gather-tool
- Name: OADP monitoring
File: oadp-monitoring
- Name: Control plane backup and restore
Dir: control_plane_backup_and_restore
Topics:

View File

@@ -61,7 +61,7 @@ You can schedule backups by creating a `Schedule` CR instead of a `Backup` CR. S
This issue has been resolved in the OADP 1.1.6 and OADP 1.2.2 releases; therefore, it is recommended that users upgrade to these releases.
ifndef::openshift-rosa,openshift-rosa-hcp[]
For more information, see xref:../../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-restic-restore-failing-psa-policy_oadp-troubleshooting[Restic restore partially failing on OCP 4.15 due to changed PSA policy].
For more information, see xref:../../../backup_and_restore/application_backup_and_restore/restic-issues.adoc#oadp-restic-restore-failing-psa-policy_restic-issues[Restic restore partially failing on OCP 4.15 due to changed PSA policy].
endif::openshift-rosa,openshift-rosa-hcp[]
// TODO: Add xrefs to ROSA HCP when Operators book is added.
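The hunk's context also mentions scheduling backups by creating a `Schedule` CR instead of a `Backup` CR. As a minimal sketch, assuming a daily backup of a hypothetical `mysql-persistent` namespace, a `Schedule` CR looks like the following:

[source,yaml]
----
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup # hypothetical name
  namespace: openshift-adp
spec:
  schedule: "0 7 * * *" # cron expression; runs daily at 07:00
  template:
    includedNamespaces:
    - mysql-persistent # hypothetical application namespace
----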

View File

@@ -1,17 +1,20 @@
// Module included in the following assemblies:
//
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
:_mod-docs-content-type: CONCEPT
[id="oadp-backup-restore-cr-issues_{context}"]
:_mod-docs-content-type: ASSEMBLY
[id="backup-and-restore-cr-issues"]
= Backup and Restore CR issues
include::_attributes/common-attributes.adoc[]
include::_attributes/attributes-openshift-dedicated.adoc[]
:context: backup-and-restore-cr-issues
:namespace: openshift-adp
:local-product: OADP
toc::[]
You might encounter these common issues with `Backup` and `Restore` custom resources (CRs).
[id="backup-cannot-retrieve-volume_{context}"]
== Backup CR cannot retrieve volume
The `Backup` CR displays the error message, `InvalidVolume.NotFound: The volume vol-xxxx does not exist`.
The `Backup` CR displays the following error message: `InvalidVolume.NotFound: The volume vol-xxxx does not exist`.
.Cause
@@ -33,7 +36,7 @@ If a backup is interrupted, it cannot be resumed.
.Solution
. Retrieve the details of the `Backup` CR:
. Retrieve the details of the `Backup` CR by running the following command:
+
[source,terminal]
----
@@ -41,18 +44,18 @@ $ oc -n {namespace} exec deployment/velero -c velero -- ./velero \
backup describe <backup>
----
. Delete the `Backup` CR:
. Delete the `Backup` CR by running the following command:
+
[source,terminal]
----
$ oc delete backups.velero.io <backup> -n openshift-adp
----
+
You do not need to clean up the backup location because a `Backup` CR in progress has not uploaded files to object storage.
You do not need to clean up the backup location because an in-progress `Backup` CR has not uploaded files to object storage.
. Create a new `Backup` CR.
. View the Velero backup details
. View the Velero backup details by running the following command:
+
[source,terminal, subs="+quotes"]
----
@@ -62,11 +65,11 @@ $ velero backup describe _<backup-name>_ --details
[id="backup-cr-remains-partiallyfailed_{context}"]
== Backup CR status remains in PartiallyFailed
The status of a `Backup` CR without Restic in use remains in the `PartiallyFailed` phase and does not complete. A snapshot of the affiliated PVC is not created.
The status of a `Backup` CR without Restic in use remains in the `PartiallyFailed` phase and is not completed. A snapshot of the affiliated PVC is not created.
.Cause
If the backup is created based on the CSI snapshot class, but the label is missing, CSI snapshot plugin fails to create a snapshot. As a result, the `Velero` pod logs an error similar to the following:
If the backup created based on the CSI snapshot class is missing a label, the CSI snapshot plugin fails to create a snapshot. As a result, the `Velero` pod logs an error similar to the following message:
[source,text]
----
@@ -75,7 +78,7 @@ time="2023-02-17T16:33:13Z" level=error msg="Error backing up item" backup=opens
.Solution
. Delete the `Backup` CR:
. Delete the `Backup` CR by running the following command:
+
[source,terminal]
----
@@ -84,11 +87,11 @@ $ oc delete backups.velero.io <backup> -n openshift-adp
. If required, clean up the stored data on the `BackupStorageLocation` to free up space.
. Apply label `velero.io/csi-volumesnapshot-class=true` to the `VolumeSnapshotClass` object:
. Apply the label `velero.io/csi-volumesnapshot-class=true` to the `VolumeSnapshotClass` object by running the following command:
+
[source,terminal]
----
$ oc label volumesnapshotclass/<snapclass_name> velero.io/csi-volumesnapshot-class=true
----
. Create a new `Backup` CR.

View File

@@ -0,0 +1,33 @@
:_mod-docs-content-type: ASSEMBLY
[id="issues-with-velero-and-admission-webhooks"]
= Issues with Velero and admission webhooks
include::_attributes/common-attributes.adoc[]
include::_attributes/attributes-openshift-dedicated.adoc[]
:context: issues-with-velero-and-admission-webhooks
:namespace: openshift-adp
:local-product: OADP
toc::[]
Velero has limited abilities to resolve admission webhook issues during a restore. If you have workloads with admission webhooks, you might need to use an additional Velero plugin or make changes to how you restore the workload.
Typically, workloads with admission webhooks require you to create a resource of a specific kind first. This is especially true if your workload has child resources because admission webhooks typically block child resources.
For example, creating or restoring a top-level object such as `service.serving.knative.dev` typically creates child resources automatically. If you do this first, you will not need to use Velero to create and restore these resources. This avoids the problem of child resources being blocked by an admission webhook that Velero might use.
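For instance, with a Knative workload you can restore the top-level service on its own so that the controller, rather than Velero, recreates the webhook-guarded child resources. A sketch, using a generic backup name placeholder:

[source,terminal]
----
$ velero restore create --from-backup <backup_name> \
    --include-resources service.serving.knative.dev
----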
[id="velero-restore-workarounds-for-workloads-with-admission-webhooks_{context}"]
== Restoring workarounds for Velero backups that use admission webhooks
You need additional steps to restore resources for several types of Velero backups that use admission webhooks.
include::modules/migration-debugging-velero-admission-webhooks-knative.adoc[leveloffset=+2]
include::modules/migration-debugging-velero-admission-webhooks-ibm-appconnect.adoc[leveloffset=+2]
include::modules/oadp-features-plugins-known-issues.adoc[leveloffset=+1]
include::modules/oadp-plugins-receiving-eof-message.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../architecture/admission-plug-ins.adoc#admission-plug-ins[Admission plugins]
* xref:../../architecture/admission-plug-ins.adoc#admission-webhooks-about_admission-plug-ins[Webhook admission plugins]
* xref:../../architecture/admission-plug-ins.adoc#admission-webhook-types_admission-plug-ins[Types of webhook admission plugins]

View File

@@ -1,17 +1,20 @@
// Module included in the following assemblies:
//
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
:_mod-docs-content-type: ASSEMBLY
[id="oadp-installation-issues"]
= OADP installation issues
include::_attributes/common-attributes.adoc[]
include::_attributes/attributes-openshift-dedicated.adoc[]
:context: installation-issues
:namespace: openshift-adp
:local-product: OADP
:_mod-docs-content-type: CONCEPT
[id="oadp-installation-issues_{context}"]
= Installation issues
toc::[]
You might encounter issues caused by using invalid directories or incorrect credentials when you install the Data Protection Application.
[id="oadp-backup-location-contains-invalid-directories_{context}"]
== Backup storage contains invalid directories
The `Velero` pod log displays the error message, `Backup storage contains invalid top-level directories`.
The `Velero` pod log displays the following error message: `Backup storage contains invalid top-level directories`.
.Cause
@@ -24,9 +27,9 @@ If the object storage is not dedicated to Velero, you must specify a prefix for
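As the cause notes, a shared bucket requires a prefix in the `DataProtectionApplication` (DPA) CR. A minimal sketch of the relevant `backupLocations` stanza, with a hypothetical bucket name:

[source,yaml]
----
spec:
  backupLocations:
  - velero:
      provider: aws
      objectStorage:
        bucket: my-bucket # hypothetical shared bucket
        prefix: velero # dedicates the velero/ path to OADP backups
----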
[id="oadp-incorrect-aws-credentials_{context}"]
== Incorrect AWS credentials
The `oadp-aws-registry` pod log displays the error message, `InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.`
The `oadp-aws-registry` pod log displays the following error message: `InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.`
The `Velero` pod log displays the error message, `NoCredentialProviders: no valid providers in chain`.
The `Velero` pod log displays the following error message: `NoCredentialProviders: no valid providers in chain`.
.Cause

View File

@@ -0,0 +1,32 @@
:_mod-docs-content-type: ASSEMBLY
[id="oadp-monitoring"]
= OADP monitoring
include::_attributes/common-attributes.adoc[]
include::_attributes/attributes-openshift-dedicated.adoc[]
:context: oadp-monitoring
:namespace: openshift-adp
:local-product: OADP
toc::[]
By using the {product-title} monitoring stack, users and administrators can effectively perform the following tasks:
* Monitor and manage clusters
* Analyze the workload performance of user applications
* Monitor services running on the clusters
* Receive alerts if an event occurs
[role="_additional-resources"]
.Additional resources
* xref:../../observability/monitoring/about-ocp-monitoring/about-ocp-monitoring.adoc#about-ocp-monitoring[About {product-title} monitoring]
include::modules/oadp-monitoring-setup.adoc[leveloffset=+1]
include::modules/oadp-creating-service-monitor.adoc[leveloffset=+1]
include::modules/oadp-creating-alerting-rule.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../observability/monitoring/managing-alerts/managing-alerts-as-an-administrator.adoc#managing-alerts-as-an-administrator[Managing alerts as an Administrator]
include::modules/oadp-list-of-metrics.adoc[leveloffset=+1]
include::modules/oadp-viewing-metrics-ui.adoc[leveloffset=+1]

View File

@@ -1,17 +1,20 @@
// Module included in the following assemblies:
//
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
:_mod-docs-content-type: PROCEDURE
[id="oadp-operator-issues_{context}"]
:_mod-docs-content-type: ASSEMBLY
[id="oadp-operator-issues"]
= OADP Operator issues
include::_attributes/common-attributes.adoc[]
include::_attributes/attributes-openshift-dedicated.adoc[]
:context: oadp-operator-issues
:namespace: openshift-adp
:local-product: OADP
toc::[]
The {oadp-first} Operator might encounter issues caused by problems it is not able to resolve.
[id="oadp-operator-fails-silently_{context}"]
== OADP Operator fails silently
The S3 buckets of an OADP Operator might be empty, but when you run the command `oc get po -n <OADP_Operator_namespace>`, you see that the Operator has a status of `Running`. In such a case, the Operator is said to have _failed silently_ because it incorrectly reports that it is running.
The S3 buckets of an OADP Operator might be empty, but when you run the command `oc get po -n <oadp_operator_namespace>`, you see that the Operator has a status of `Running`. In such a case, the Operator is said to have _failed silently_ because it incorrectly reports that it is running.
.Cause
@@ -23,31 +26,28 @@ Retrieve a list of backup storage locations (BSLs) and check the manifest of eac
.Procedure
. Run one of the following commands to retrieve a list of BSLs:
.. Using the OpenShift CLI:
. Retrieve a list of BSLs by using either the OpenShift or Velero command-line interface (CLI):
.. Retrieve a list of BSLs by using the OpenShift CLI (`oc`):
+
[source,terminal]
----
$ oc get backupstoragelocations.velero.io -A
----
.. Using the Velero CLI:
.. Retrieve a list of BSLs by using the `velero` CLI:
+
[source,terminal]
----
$ velero backup-location get -n <OADP_Operator_namespace>
$ velero backup-location get -n <oadp_operator_namespace>
----
. Using the list of BSLs, run the following command to display the manifest of each BSL, and examine each manifest for an error.
. Use the list of BSLs from the previous step and run the following command to examine the manifest of each BSL for an error:
+
[source,terminal]
----
$ oc get backupstoragelocations.velero.io -n <namespace> -o yaml
----
+
.Example result
[source, yaml]
----
apiVersion: v1
@@ -90,4 +90,4 @@ items:
kind: List
metadata:
  resourceVersion: ""
----

View File

@@ -0,0 +1,36 @@
:_mod-docs-content-type: ASSEMBLY
[id="oadp-timeouts"]
= OADP timeouts
include::_attributes/common-attributes.adoc[]
include::_attributes/attributes-openshift-dedicated.adoc[]
:context: oadp-timeouts
:namespace: openshift-adp
:local-product: OADP
toc::[]
Extending a timeout allows complex or resource-intensive processes to complete successfully without premature termination. This configuration can reduce errors, retries, or failures.
Ensure that you balance timeout extensions in a logical manner so that you do not configure excessively long timeouts that might hide underlying issues in the process. Consider and monitor an appropriate timeout value that meets the needs of the process and the overall system performance.
The following OADP timeouts include instructions for how and when to implement these parameters; a configuration sketch follows the list:
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#restic-timeout_oadp-timeouts[Restic timeout]
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#velero-timeout_oadp-timeouts[Velero resource timeout]
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#datamover-timeout_oadp-timeouts[Data Mover timeout]
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#csisnapshot-timeout_oadp-timeouts[CSI snapshot timeout]
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#item-operation-timeout-backup_oadp-timeouts[Item operation timeout - backup]
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#item-operation-timeout-restore_oadp-timeouts[Item operation timeout - restore]
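For example, a minimal sketch of extending the Restic timeout in the `DataProtectionApplication` (DPA) CR, assuming the default value of `1h` is too short for a large file system backup:

[source,yaml]
----
spec:
  configuration:
    restic:
      timeout: 4h # illustrative value; the default is 1h
----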
include::modules/oadp-restic-timeouts.adoc[leveloffset=+1]
include::modules/oadp-velero-timeouts.adoc[leveloffset=+1]
include::modules/oadp-velero-default-timeouts.adoc[leveloffset=+2]
include::modules/oadp-datamover-timeouts.adoc[leveloffset=+1]
include::modules/oadp-csi-snapshot-timeouts.adoc[leveloffset=+1]
include::modules/oadp-item-restore-timeouts.adoc[leveloffset=+1]
include::modules/oadp-item-backup-timeouts.adoc[leveloffset=+1]

View File

@@ -0,0 +1,31 @@
:_mod-docs-content-type: ASSEMBLY
[id="pods-crash-or-restart-due-to-lack-of-memory-or-cpu"]
= Pods crash or restart due to lack of memory or CPU
include::_attributes/common-attributes.adoc[]
include::_attributes/attributes-openshift-dedicated.adoc[]
:context: pods-crash-or-restart-due-to-lack-of-memory-or-cpu
:namespace: openshift-adp
:local-product: OADP
:must-gather-v1-3: registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.3
:must-gather-v1-4: registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.4
toc::[]
If a Velero or Restic pod crashes due to a lack of memory or CPU, you can set specific resource requests for either of those resources.
The values for the resource request fields must follow the same format as Kubernetes resource requirements.
If you do not specify `configuration.velero.podConfig.resourceAllocations` or `configuration.restic.podConfig.resourceAllocations`, see the following default `resources` specification configuration for a Velero or Restic pod:
[source,yaml]
----
requests:
  cpu: 500m
  memory: 128Mi
----
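To override these defaults, you can set `resourceAllocations` in the `DataProtectionApplication` (DPA) CR, as the following sketch shows with illustrative values:

[source,yaml]
----
spec:
  configuration:
    velero:
      podConfig:
        resourceAllocations:
          requests:
            cpu: "1" # illustrative value
            memory: 1Gi # illustrative value
----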
[role="_additional-resources"]
.Additional resources
* xref:../../backup_and_restore/application_backup_and_restore/installing/about-installing-oadp.adoc#oadp-velero-cpu-memory-requirements_about-installing-oadp[Velero CPU and memory requirements based on collected data]
include::modules/oadp-pod-crash-set-resource-request-velero.adoc[leveloffset=+1]
include::modules/oadp-pod-crash-set-resource-request-restic.adoc[leveloffset=+1]

View File

@@ -1,17 +1,20 @@
// Module included in the following assemblies:
//
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
:_mod-docs-content-type: CONCEPT
[id="oadp-restic-issues_{context}"]
:_mod-docs-content-type: ASSEMBLY
[id="restic-issues"]
= Restic issues
include::_attributes/common-attributes.adoc[]
include::_attributes/attributes-openshift-dedicated.adoc[]
:context: restic-issues
:namespace: openshift-adp
:local-product: OADP
toc::[]
You might encounter these issues when you back up applications with Restic.
[id="restic-permission-error-nfs-root-squash-enabled_{context}"]
== Restic permission error for NFS data volumes with root_squash enabled
The `Restic` pod log displays the error message: `controller=pod-volume-backup error="fork/exec/usr/bin/restic: permission denied"`.
The `Restic` pod log displays the following error message: `controller=pod-volume-backup error="fork/exec/usr/bin/restic: permission denied"`.
.Cause
@@ -83,3 +86,5 @@ In the following error log, `mysql-persistent` is the problematic Restic reposit
pkg/restic.(*backupper).BackupPodVolumes"
logSource="pkg/backup/backup.go:435" name=mysql-7d99fc949-qbkds
----
include::modules/oadp-restic-restore-failing-psa-policy.adoc[leveloffset=+1]

View File

@@ -12,149 +12,21 @@ include::_attributes/attributes-openshift-dedicated.adoc[]
toc::[]
You can debug Velero custom resources (CRs) by using the xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-debugging-oc-cli_oadp-troubleshooting[OpenShift CLI tool] or the xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#migration-debugging-velero-resources_oadp-troubleshooting[Velero CLI tool]. The Velero CLI tool provides more detailed logs and information.
You can troubleshoot OADP issues by using the following methods:
You can check xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-installation-issues_oadp-troubleshooting[installation issues], xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-backup-restore-cr-issues_oadp-troubleshooting[backup and restore CR issues], and xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-restic-issues_oadp-troubleshooting[Restic issues].
* Debug Velero custom resources (CRs) by using the xref:../../backup_and_restore/application_backup_and_restore/velero-cli-tool.adoc#oadp-debugging-oc-cli_velero-cli-tool[OpenShift CLI tool] or the xref:../../backup_and_restore/application_backup_and_restore/velero-cli-tool.adoc#migration-debugging-velero-resources_velero-cli-tool[Velero CLI tool]. The Velero CLI tool provides more detailed logs and information.
You can collect logs and CR information by using the xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#migration-using-must-gather_oadp-troubleshooting[`must-gather` tool].
* Debug Velero or Restic pod crashes, which are caused due to a lack of memory or CPU by using xref:../../backup_and_restore/application_backup_and_restore/pods-crash-or-restart-due-to-lack-of-memory-or-cpu.adoc#pods-crash-or-restart-due-to-lack-of-memory-or-cpu[Pods crash or restart due to lack of memory or CPU].
You can obtain the Velero CLI tool by:
* Debug issues with Velero and admission webhooks by using xref:../../backup_and_restore/application_backup_and_restore/issues-with-velero-and-admission-webhooks.adoc#issues-with-velero-and-admission-webhooks[Issues with Velero and admission webhooks].
* Downloading the Velero CLI tool
* Accessing the Velero binary in the Velero deployment in the cluster
* Check xref:../../backup_and_restore/application_backup_and_restore/oadp-installation-issues.adoc#oadp-installation-issues[OADP installation issues], xref:../../backup_and_restore/application_backup_and_restore/oadp-operator-issues.adoc#oadp-operator-issues[OADP Operator issues], xref:../../backup_and_restore/application_backup_and_restore/backup-and-restore-cr-issues.adoc#backup-and-restore-cr-issues[backup and restore CR issues], and xref:../../backup_and_restore/application_backup_and_restore/restic-issues.adoc#restic-issues[Restic issues].
include::modules/velero-obtaining-by-downloading.adoc[leveloffset=+1]
include::modules/velero-oadp-version-relationship.adoc[leveloffset=+2]
include::modules/velero-obtaining-by-accessing-binary.adoc[leveloffset=+1]
* Use the available xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#oadp-timeouts[OADP timeouts] to reduce errors, retries, or failures.
include::modules/oadp-debugging-oc-cli.adoc[leveloffset=+1]
include::modules/migration-debugging-velero-resources.adoc[leveloffset=+1]
* Collect logs and CR information by using the xref:../../backup_and_restore/application_backup_and_restore/using-the-must-gather-tool.adoc#using-the-must-gather-tool[`must-gather` tool].
* Monitor and analyze the workload performance with the help of xref:../../backup_and_restore/application_backup_and_restore/oadp-monitoring.adoc#oadp-monitoring[OADP monitoring].
[id="oadp-pod-crash-resource-request"]
== Pods crash or restart due to lack of memory or CPU
If a Velero or Restic pod crashes due to a lack of memory or CPU, you can set specific resource requests for either of those resources.
[role="_additional-resources"]
.Additional resources
* xref:../../backup_and_restore/application_backup_and_restore/installing/about-installing-oadp.adoc#oadp-velero-cpu-memory-requirements_about-installing-oadp[CPU and memory requirements]
include::modules/oadp-pod-crash-set-resource-request-velero.adoc[leveloffset=+2]
include::modules/oadp-pod-crash-set-resource-request-restic.adoc[leveloffset=+2]
[IMPORTANT]
====
The values for the resource request fields must follow the same format as Kubernetes resource requirements.
Also, if you do not specify `configuration.velero.podConfig.resourceAllocations` or `configuration.restic.podConfig.resourceAllocations`, the default `resources` specification for a Velero pod or a Restic pod is as follows:
[source,yaml]
----
requests:
  cpu: 500m
  memory: 128Mi
----
====
[id="podvolumerestore-fails_{context}"]
== PodVolumeRestore fails to complete when StorageClass is NFS
The restore operation fails when there is more than one volume during an NFS restore by using `Restic` or `Kopia`. `PodVolumeRestore` either fails with the following error or keeps trying to restore before finally failing.
.Error message
[source,terminal]
----
Velero: pod volume restore failed: data path restore failed: \
Failed to run kopia restore: Failed to copy snapshot data to the target: \
restore error: copy file: error creating file: \
open /host_pods/b4d...6/volumes/kubernetes.io~nfs/pvc-53...4e5/userdata/base/13493/2681: \
no such file or directory
----
.Cause
The NFS mount path is not unique for the two volumes to restore. As a result, the `velero` lock files use the same file on the NFS server during the restore, causing the `PodVolumeRestore` to fail.
.Solution
You can resolve this issue by setting up a unique `pathPattern` for each volume, while defining the `StorageClass` for `nfs-subdir-external-provisioner` in the `deploy/class.yaml` file. Use the following `nfs-subdir-external-provisioner` `StorageClass` example:
[source,yaml]
----
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
  pathPattern: "${.PVC.namespace}/${.PVC.annotations.nfs.io/storage-path}" # <1>
  onDelete: delete
----
<1> Specifies a template for creating a directory path by using `PVC` metadata such as labels, annotations, name, or namespace. To specify metadata, use `${.PVC.<metadata>}`. For example, to name a folder: `<pvc-namespace>-<pvc-name>`, use `${.PVC.namespace}-${.PVC.name}` as `pathPattern`.
[id="issues-with-velero-and-admission-webhooks"]
== Issues with Velero and admission webhooks
Velero has limited abilities to resolve admission webhook issues during a restore. If you have workloads with admission webhooks, you might need to use an additional Velero plugin or make changes to how you restore the workload.
Typically, workloads with admission webhooks require you to create a resource of a specific kind first. This is especially true if your workload has child resources because admission webhooks typically block child resources.
For example, creating or restoring a top-level object such as `service.serving.knative.dev` typically creates child resources automatically. If you do this first, you will not need to use Velero to create and restore these resources. This avoids the problem of child resources being blocked by an admission webhook that Velero might use.
[id="velero-restore-workarounds-for-workloads-with-admission-webhooks"]
=== Restoring workarounds for Velero backups that use admission webhooks
This section describes the additional steps required to restore resources for several types of Velero backups that use admission webhooks.
include::modules/migration-debugging-velero-admission-webhooks-knative.adoc[leveloffset=+3]
include::modules/migration-debugging-velero-admission-webhooks-ibm-appconnect.adoc[leveloffset=+3]
include::modules/oadp-features-plugins-known-issues.adoc[leveloffset=+2]
include::modules/oadp-plugins-receiving-eof-message.adoc[leveloffset=+2]
[role="_additional-resources"]
.Additional resources
* xref:../../architecture/admission-plug-ins.adoc[Admission plugins]
* xref:../../architecture/admission-plug-ins.adoc#admission-webhooks-about_admission-plug-ins[Webhook admission plugins]
* xref:../../architecture/admission-plug-ins.adoc#admission-webhook-types_admission-plug-ins[Types of webhook admission plugins]
include::modules/oadp-installation-issues.adoc[leveloffset=+1]
include::modules/oadp-operator-issues.adoc[leveloffset=+1]
include::modules/oadp-timeouts.adoc[leveloffset=+1]
include::modules/oadp-restic-timeouts.adoc[leveloffset=+2]
include::modules/oadp-velero-timeouts.adoc[leveloffset=+2]
include::modules/oadp-datamover-timeouts.adoc[leveloffset=+2]
include::modules/oadp-csi-snapshot-timeouts.adoc[leveloffset=+2]
include::modules/oadp-velero-default-timeouts.adoc[leveloffset=+2]
include::modules/oadp-item-restore-timeouts.adoc[leveloffset=+2]
include::modules/oadp-item-backup-timeouts.adoc[leveloffset=+2]
include::modules/oadp-backup-restore-cr-issues.adoc[leveloffset=+1]
include::modules/oadp-restic-issues.adoc[leveloffset=+1]
include::modules/oadp-restic-restore-failing-psa-policy.adoc[leveloffset=+2]
include::modules/migration-using-must-gather.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../support/gathering-cluster-data.adoc#gathering-cluster-data[Gathering cluster data]
include::modules/support-insecure-tls-connections.adoc[leveloffset=+2]
include::modules/migration-combining-must-gather.adoc[leveloffset=+2]
include::modules/oadp-monitoring.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../observability/monitoring/about-ocp-monitoring/about-ocp-monitoring.adoc#about-ocp-monitoring[About {product-title} monitoring]
include::modules/oadp-monitoring-setup.adoc[leveloffset=+2]
include::modules/oadp-creating-service-monitor.adoc[leveloffset=+2]
include::modules/oadp-creating-alerting-rule.adoc[leveloffset=+2]
[role="_additional-resources"]
.Additional resources
* xref:../../observability/monitoring/managing-alerts/managing-alerts-as-an-administrator.adoc#managing-alerts-as-an-administrator[Managing alerts as an Administrator]
include::modules/oadp-list-of-metrics.adoc[leveloffset=+2]
include::modules/oadp-viewing-metrics-ui.adoc[leveloffset=+2]
:oadp-troubleshooting!:

View File

@@ -0,0 +1,76 @@
:_mod-docs-content-type: ASSEMBLY
[id="using-the-must-gather-tool"]
= Using the must-gather tool
include::_attributes/common-attributes.adoc[]
include::_attributes/attributes-openshift-dedicated.adoc[]
:context: using-the-must-gather-tool
:namespace: openshift-adp
:local-product: OADP
:must-gather-v1-3: registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.3
:must-gather-v1-4: registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.4
toc::[]
You can collect logs, metrics, and information about {local-product} custom resources by using the `must-gather` tool. The `must-gather` data must be attached to all customer cases.
You can run the `must-gather` tool with the following data collection options:
* Full `must-gather` data collection collects Prometheus metrics, pod logs, and Velero CR information for all namespaces where the OADP Operator is installed.
* Essential `must-gather` data collection collects pod logs and Velero CR information for a specific duration of time, for example, one hour or 24 hours. Prometheus metrics and duplicate logs are not included.
* `must-gather` data collection with timeout. Data collection can take a long time if there are many failed `Backup` CRs. You can improve performance by setting a timeout value.
* Prometheus metrics data dump downloads an archive file containing the metrics data collected by Prometheus.
.Prerequisites
* You have logged in to the {product-title} cluster as a user with the `cluster-admin` role.
* You have installed the OpenShift CLI (`oc`).
* You must use {op-system-base-full} {op-system-version-9} with {oadp-short} 1.4.
.Procedure
. Navigate to the directory where you want to store the `must-gather` data.
. Run the `oc adm must-gather` command for one of the following data collection options:
* For full `must-gather` data collection, including Prometheus metrics, run the following command:
+
[source,terminal,subs="attributes+"]
----
$ oc adm must-gather --image={must-gather-v1-4}
----
+
The data is saved as `must-gather/must-gather.tar.gz`. You can upload this file to a support case on the link:https://access.redhat.com/[Red{nbsp}Hat Customer Portal].
* For essential `must-gather` data collection, without Prometheus metrics, for a specific time duration, run the following command:
+
[source,terminal,subs="attributes+"]
----
$ oc adm must-gather --image={must-gather-v1-4} \
-- /usr/bin/gather_<time>_essential <1>
----
<1> Specify the time in hours. Allowed values are `1h`, `6h`, `24h`, `72h`, or `all`, for example, `gather_1h_essential` or `gather_all_essential`.
* For `must-gather` data collection with timeout, run the following command:
+
[source,terminal,subs="attributes+"]
----
$ oc adm must-gather --image={must-gather-v1-4} \
-- /usr/bin/gather_with_timeout <timeout> <1>
----
<1> Specify a timeout value in seconds.
* For a Prometheus metrics data dump, run the following command:
+
[source,terminal,subs="attributes+"]
----
$ oc adm must-gather --image={must-gather-v1-4} -- /usr/bin/gather_metrics_dump
----
This operation can take a long time. The data is saved as `must-gather/metrics/prom_data.tar.gz`.
[role="_additional-resources"]
.Additional resources
* xref:../../support/gathering-cluster-data.adoc#gathering-cluster-data[Gathering cluster data]
include::modules/support-insecure-tls-connections.adoc[leveloffset=+1]
include::modules/migration-combining-must-gather.adoc[leveloffset=+1]

View File

@@ -0,0 +1,22 @@
:_mod-docs-content-type: ASSEMBLY
[id="velero-cli-tool"]
= Velero CLI tool
include::_attributes/common-attributes.adoc[]
include::_attributes/attributes-openshift-dedicated.adoc[]
:context: velero-cli-tool
:namespace: openshift-adp
:local-product: OADP
toc::[]
You can obtain the `velero` CLI tool by using the following options:
* Downloading the `velero` CLI tool
* Accessing the `velero` binary in the Velero deployment in the cluster
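For example, after OADP is installed, you can run the in-cluster binary directly, a pattern used throughout these troubleshooting procedures:

[source,terminal]
----
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero version
----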
include::modules/velero-obtaining-by-downloading.adoc[leveloffset=+1]
include::modules/velero-oadp-version-relationship.adoc[leveloffset=+2]
include::modules/velero-obtaining-by-accessing-binary.adoc[leveloffset=+1]
include::modules/oadp-debugging-oc-cli.adoc[leveloffset=+1]
include::modules/migration-debugging-velero-resources.adoc[leveloffset=+1]

View File

@@ -38,7 +38,7 @@ You can always recover from a disaster situation by xref:../backup_and_restore/c
As a cluster administrator, you can back up and restore applications running on {product-title} by using the OpenShift API for Data Protection (OADP).
OADP backs up and restores Kubernetes resources and internal images, at the granularity of a namespace, by using the version of Velero that is appropriate for the version of OADP you install, according to the table in xref:../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#velero-obtaining-by-downloading_oadp-troubleshooting[Downloading the Velero CLI tool]. OADP backs up and restores persistent volumes (PVs) by using snapshots or Restic. For details, see xref:../backup_and_restore/application_backup_and_restore/oadp-features-plugins.adoc#oadp-features_oadp-features-plugins[OADP features].
OADP backs up and restores Kubernetes resources and internal images, at the granularity of a namespace, by using the version of Velero that is appropriate for the version of OADP you install, according to the table in xref:../backup_and_restore/application_backup_and_restore/velero-cli-tool.adoc#velero-obtaining-by-downloading_velero-cli-tool[Downloading the Velero CLI tool]. OADP backs up and restores persistent volumes (PVs) by using snapshots or Restic. For details, see xref:../backup_and_restore/application_backup_and_restore/oadp-features-plugins.adoc#oadp-features_oadp-features-plugins[OADP features].
[id="oadp-requirements"]
=== OADP requirements

View File

@@ -1,11 +0,0 @@
// Module included in the following assemblies:
//
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
:_mod-docs-content-type: CONCEPT
[id="oadp-monitoring_{context}"]
= OADP Monitoring
The {product-title} provides a monitoring stack that allows users and administrators to effectively monitor and manage their clusters, as well as monitor and analyze the workload performance of user applications and services running on the clusters, including receiving alerts if an event occurs.

View File

@@ -1,6 +1,6 @@
// Module included in the following assemblies:
//
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
// * backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc
:_mod-docs-content-type: PROCEDURE
[id="restic-timeout_{context}"]

View File

@@ -1,14 +0,0 @@
// Module included in the following assemblies:
//
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
:_mod-docs-content-type: REFERENCE
[id="oadp-timeouts_{context}"]
= OADP timeouts
Extending a timeout allows complex or resource-intensive processes to complete successfully without premature termination. This configuration can reduce the likelihood of errors, retries, or failures.
Ensure that you balance timeout extensions in a logical manner so that you do not configure excessively long timeouts that might hide underlying issues in the process. Carefully consider and monitor an appropriate timeout value that meets the needs of the process and the overall system performance.
The following are various OADP timeouts, with instructions of how and when to implement these parameters: