mirror of
https://github.com/openshift/openshift-docs.git
synced 2026-02-05 12:46:18 +01:00
OADP-4561: Update parent ToC for Troubleshooting modules
Removed unnecessary syntax from newly added assemblies
Fixed xref issue
Added xrefs of all modules to the troubleshooting assembly
Added missing xref anchor ID
Peer review
Merge review suggestions
This commit is contained in:
committed by
Steven Smith
parent
8e6e7a80cc
commit
11f0d49a81
@@ -3670,12 +3670,32 @@ Topics:
|
||||
File: oadp-backup-restore-csi-snapshots
|
||||
- Name: Overriding Kopia algorithms
|
||||
File: overriding-kopia-algorithms
|
||||
- Name: Troubleshooting
|
||||
File: troubleshooting
|
||||
- Name: OADP API
|
||||
File: oadp-api
|
||||
- Name: Advanced OADP features and functionalities
|
||||
File: oadp-advanced-topics
|
||||
- Name: Troubleshooting OADP
|
||||
File: troubleshooting
|
||||
- Name: Velero CLI tool
|
||||
File: velero-cli-tool
|
||||
- Name: Pods crash or restart due to lack of memory or CPU
|
||||
File: pods-crash-or-restart-due-to-lack-of-memory-or-cpu
|
||||
- Name: Issues with Velero and admission webhooks
|
||||
File: issues-with-velero-and-admission-webhooks
|
||||
- Name: OADP installation issues
|
||||
File: oadp-installation-issues
|
||||
- Name: OADP Operator issues
|
||||
File: oadp-operator-issues
|
||||
- Name: OADP timeouts
|
||||
File: oadp-timeouts
|
||||
- Name: Backup and Restore CR issues
|
||||
File: backup-and-restore-cr-issues
|
||||
- Name: Restic issues
|
||||
File: restic-issues
|
||||
- Name: Using the must-gather tool
|
||||
File: using-the-must-gather-tool
|
||||
- Name: OADP monitoring
|
||||
File: oadp-monitoring
|
||||
- Name: Control plane backup and restore
|
||||
Dir: control_plane_backup_and_restore
|
||||
Topics:
|
||||
|
||||
@@ -61,7 +61,7 @@ You can schedule backups by creating a `Schedule` CR instead of a `Backup` CR. S
|
||||
This issue has been resolved in the OADP 1.1.6 and OADP 1.2.2 releases, therefore it is recommended that users upgrade to these releases.
|
||||
|
||||
ifndef::openshift-rosa,openshift-rosa-hcp[]
|
||||
For more information, see xref:../../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-restic-restore-failing-psa-policy_oadp-troubleshooting[Restic restore partially failing on OCP 4.15 due to changed PSA policy].
|
||||
For more information, see xref:../../../backup_and_restore/application_backup_and_restore/restic-issues.adoc#oadp-restic-restore-failing-psa-policy_restic-issues[Restic restore partially failing on OCP 4.15 due to changed PSA policy].
|
||||
endif::openshift-rosa,openshift-rosa-hcp[]
|
||||
|
||||
// TODO: Add xrefs to ROSA HCP when Operators book is added.
|
||||
|
||||
@@ -1,17 +1,20 @@
|
||||
// Module included in the following assemblies:
|
||||
//
|
||||
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
|
||||
|
||||
:_mod-docs-content-type: CONCEPT
|
||||
[id="oadp-backup-restore-cr-issues_{context}"]
|
||||
:_mod-docs-content-type: ASSEMBLY
|
||||
[id="backup-and-restore-cr-issues"]
|
||||
= Backup and Restore CR issues
|
||||
include::_attributes/common-attributes.adoc[]
|
||||
include::_attributes/attributes-openshift-dedicated.adoc[]
|
||||
:context: backup-and-restore-cr-issues
|
||||
:namespace: openshift-adp
|
||||
:local-product: OADP
|
||||
|
||||
toc::[]
|
||||
|
||||
You might encounter these common issues with `Backup` and `Restore` custom resources (CRs).
|
||||
|
||||
[id="backup-cannot-retrieve-volume_{context}"]
|
||||
== Backup CR cannot retrieve volume
|
||||
|
||||
The `Backup` CR displays the error message, `InvalidVolume.NotFound: The volume ‘vol-xxxx’ does not exist`.
|
||||
The `Backup` CR displays the following error message: `InvalidVolume.NotFound: The volume ‘vol-xxxx’ does not exist`.
|
||||
|
||||
.Cause
|
||||
|
||||
@@ -33,7 +36,7 @@ If a backup is interrupted, it cannot be resumed.
|
||||
|
||||
.Solution
|
||||
|
||||
. Retrieve the details of the `Backup` CR:
|
||||
. Retrieve the details of the `Backup` CR by running the following command:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
@@ -41,18 +44,18 @@ $ oc -n {namespace} exec deployment/velero -c velero -- ./velero \
|
||||
backup describe <backup>
|
||||
----
|
||||
|
||||
. Delete the `Backup` CR:
|
||||
. Delete the `Backup` CR by running the following command:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc delete backups.velero.io <backup> -n openshift-adp
|
||||
----
|
||||
+
|
||||
You do not need to clean up the backup location because a `Backup` CR in progress has not uploaded files to object storage.
|
||||
You do not need to clean up the backup location because an in-progress `Backup` CR has not uploaded files to object storage.
|
||||
|
||||
. Create a new `Backup` CR.
|
||||
|
||||
. View the Velero backup details
|
||||
. View the Velero backup details by running the following command:
|
||||
+
|
||||
[source,terminal, subs="+quotes"]
|
||||
----
|
||||
@@ -62,11 +65,11 @@ $ velero backup describe _<backup-name>_ --details
|
||||
[id="backup-cr-remains-partiallyfailed_{context}"]
|
||||
== Backup CR status remains in PartiallyFailed
|
||||
|
||||
The status of a `Backup` CR without Restic in use remains in the `PartiallyFailed` phase and does not complete. A snapshot of the affiliated PVC is not created.
|
||||
The status of a `Backup` CR without Restic in use remains in the `PartiallyFailed` phase and is not completed. A snapshot of the affiliated PVC is not created.
|
||||
|
||||
.Cause
|
||||
|
||||
If the backup is created based on the CSI snapshot class, but the label is missing, CSI snapshot plugin fails to create a snapshot. As a result, the `Velero` pod logs an error similar to the following:
|
||||
If the backup created based on the CSI snapshot class is missing a label, the CSI snapshot plugin fails to create a snapshot. As a result, the `Velero` pod logs an error similar to the following message:
|
||||
|
||||
[source,text]
|
||||
----
|
||||
@@ -75,7 +78,7 @@ time="2023-02-17T16:33:13Z" level=error msg="Error backing up item" backup=opens
|
||||
|
||||
.Solution
|
||||
|
||||
. Delete the `Backup` CR:
|
||||
. Delete the `Backup` CR by running the following command:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
@@ -84,11 +87,11 @@ $ oc delete backups.velero.io <backup> -n openshift-adp
|
||||
|
||||
. If required, clean up the stored data on the `BackupStorageLocation` to free up space.
|
||||
|
||||
. Apply label `velero.io/csi-volumesnapshot-class=true` to the `VolumeSnapshotClass` object:
|
||||
. Apply the label `velero.io/csi-volumesnapshot-class=true` to the `VolumeSnapshotClass` object by running the following command:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc label volumesnapshotclass/<snapclass_name> velero.io/csi-volumesnapshot-class=true
|
||||
----
|
||||
|
||||
. Create a new `Backup` CR.
|
||||
. Create a new `Backup` CR.
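Before you re-create the `Backup` CR, it can help to confirm that the label was actually applied. The following Python sketch is illustrative only: it assumes that you saved the output of `oc get volumesnapshotclass -o json` to a string, and the helper name `classes_missing_label` and the sample object names are hypothetical.

```python
import json

# Label that the procedure above applies to the VolumeSnapshotClass object.
LABEL_KEY = "velero.io/csi-volumesnapshot-class"


def classes_missing_label(list_json: str) -> list:
    """Return the names of VolumeSnapshotClass objects that lack the Velero label."""
    missing = []
    for item in json.loads(list_json).get("items", []):
        metadata = item.get("metadata", {})
        labels = metadata.get("labels") or {}
        if labels.get(LABEL_KEY) != "true":
            missing.append(metadata.get("name", "<unnamed>"))
    return missing


# Hypothetical sample that mimics the JSON list format; replace it with real output.
SAMPLE = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "csi-labeled", "labels": {LABEL_KEY: "true"}}},
        {"metadata": {"name": "csi-unlabeled"}},
    ],
})

print(classes_missing_label(SAMPLE))
```

Any name that the helper reports still needs the `oc label` command from the previous step.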
|
||||
@@ -0,0 +1,33 @@
|
||||
:_mod-docs-content-type: ASSEMBLY
|
||||
[id="issues-with-velero-and-admission-webhooks"]
|
||||
= Issues with Velero and admission webhooks
|
||||
include::_attributes/common-attributes.adoc[]
|
||||
include::_attributes/attributes-openshift-dedicated.adoc[]
|
||||
:context: issues-with-velero-and-admission-webhooks
|
||||
:namespace: openshift-adp
|
||||
:local-product: OADP
|
||||
|
||||
toc::[]
|
||||
|
||||
Velero has limited abilities to resolve admission webhook issues during a restore. If you have workloads with admission webhooks, you might need to use an additional Velero plugin or make changes to how you restore the workload.
|
||||
|
||||
Typically, workloads with admission webhooks require you to create a resource of a specific kind first. This is especially true if your workload has child resources because admission webhooks typically block child resources.
|
||||
|
||||
For example, creating or restoring a top-level object such as `service.serving.knative.dev` typically creates child resources automatically. If you do this first, you will not need to use Velero to create and restore these resources. This avoids the problem of child resources being blocked by an admission webhook that Velero might use.
|
||||
|
||||
[id="velero-restore-workarounds-for-workloads-with-admission-webhooks_{context}"]
|
||||
== Restoring workarounds for Velero backups that use admission webhooks
|
||||
|
||||
You need additional steps to restore resources for several types of Velero backups that use admission webhooks.
|
||||
|
||||
include::modules/migration-debugging-velero-admission-webhooks-knative.adoc[leveloffset=+2]
|
||||
include::modules/migration-debugging-velero-admission-webhooks-ibm-appconnect.adoc[leveloffset=+2]
|
||||
include::modules/oadp-features-plugins-known-issues.adoc[leveloffset=+1]
|
||||
include::modules/oadp-plugins-receiving-eof-message.adoc[leveloffset=+1]
|
||||
|
||||
[role="_additional-resources"]
|
||||
.Additional resources
|
||||
|
||||
* xref:../../architecture/admission-plug-ins.adoc#admission-plug-ins[Admission plugins]
|
||||
* xref:../../architecture/admission-plug-ins.adoc#admission-webhooks-about_admission-plug-ins[Webhook admission plugins]
|
||||
* xref:../../architecture/admission-plug-ins.adoc#admission-webhook-types_admission-plug-ins[Types of webhook admission plugins]
|
||||
@@ -1,17 +1,20 @@
|
||||
// Module included in the following assemblies:
|
||||
//
|
||||
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
|
||||
:_mod-docs-content-type: ASSEMBLY
|
||||
[id="oadp-installation-issues"]
|
||||
= OADP installation issues
|
||||
include::_attributes/common-attributes.adoc[]
|
||||
include::_attributes/attributes-openshift-dedicated.adoc[]
|
||||
:context: installation-issues
|
||||
:namespace: openshift-adp
|
||||
:local-product: OADP
|
||||
|
||||
:_mod-docs-content-type: CONCEPT
|
||||
[id="oadp-installation-issues_{context}"]
|
||||
= Installation issues
|
||||
toc::[]
|
||||
|
||||
You might encounter issues caused by using invalid directories or incorrect credentials when you install the Data Protection Application.
|
||||
|
||||
[id="oadp-backup-location-contains-invalid-directories_{context}"]
|
||||
== Backup storage contains invalid directories
|
||||
|
||||
The `Velero` pod log displays the error message, `Backup storage contains invalid top-level directories`.
|
||||
The `Velero` pod log displays the following error message: `Backup storage contains invalid top-level directories`.
|
||||
|
||||
.Cause
|
||||
|
||||
@@ -24,9 +27,9 @@ If the object storage is not dedicated to Velero, you must specify a prefix for
|
||||
[id="oadp-incorrect-aws-credentials_{context}"]
|
||||
== Incorrect AWS credentials
|
||||
|
||||
The `oadp-aws-registry` pod log displays the error message, `InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.`
|
||||
The `oadp-aws-registry` pod log displays the following error message: `InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.`
|
||||
|
||||
The `Velero` pod log displays the error message, `NoCredentialProviders: no valid providers in chain`.
|
||||
The `Velero` pod log displays the following error message: `NoCredentialProviders: no valid providers in chain`.
|
||||
|
||||
.Cause
|
||||
|
||||
@@ -0,0 +1,32 @@
|
||||
:_mod-docs-content-type: ASSEMBLY
|
||||
[id="oadp-monitoring"]
|
||||
= OADP monitoring
|
||||
include::_attributes/common-attributes.adoc[]
|
||||
include::_attributes/attributes-openshift-dedicated.adoc[]
|
||||
:context: oadp-monitoring
|
||||
:namespace: openshift-adp
|
||||
:local-product: OADP
|
||||
|
||||
toc::[]
|
||||
|
||||
By using the {product-title} monitoring stack, users and administrators can effectively perform the following tasks:
|
||||
|
||||
* Monitor and manage clusters
|
||||
* Analyze the workload performance of user applications
|
||||
* Monitor services running on the clusters
|
||||
* Receive alerts if an event occurs
|
||||
|
||||
[role="_additional-resources"]
|
||||
.Additional resources
|
||||
* xref:../../observability/monitoring/about-ocp-monitoring/about-ocp-monitoring.adoc#about-ocp-monitoring[About {product-title} monitoring]
|
||||
|
||||
include::modules/oadp-monitoring-setup.adoc[leveloffset=+1]
|
||||
include::modules/oadp-creating-service-monitor.adoc[leveloffset=+1]
|
||||
include::modules/oadp-creating-alerting-rule.adoc[leveloffset=+1]
|
||||
|
||||
[role="_additional-resources"]
|
||||
.Additional resources
|
||||
* xref:../../observability/monitoring/managing-alerts/managing-alerts-as-an-administrator.adoc#managing-alerts-as-an-administrator[Managing alerts as an Administrator]
|
||||
|
||||
include::modules/oadp-list-of-metrics.adoc[leveloffset=+1]
|
||||
include::modules/oadp-viewing-metrics-ui.adoc[leveloffset=+1]
|
||||
@@ -1,17 +1,20 @@
|
||||
// Module included in the following assemblies:
|
||||
//
|
||||
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
|
||||
|
||||
:_mod-docs-content-type: PROCEDURE
|
||||
[id="oadp-operator-issues_{context}"]
|
||||
:_mod-docs-content-type: ASSEMBLY
|
||||
[id="oadp-operator-issues"]
|
||||
= OADP Operator issues
|
||||
include::_attributes/common-attributes.adoc[]
|
||||
include::_attributes/attributes-openshift-dedicated.adoc[]
|
||||
:context: oadp-operator-issues
|
||||
:namespace: openshift-adp
|
||||
:local-product: OADP
|
||||
|
||||
toc::[]
|
||||
|
||||
The {oadp-first} Operator might encounter issues caused by problems it is not able to resolve.
|
||||
|
||||
[id="oadp-operator-fails-silently_{context}"]
|
||||
== OADP Operator fails silently
|
||||
|
||||
The S3 buckets of an OADP Operator might be empty, but when you run the command `oc get po -n <OADP_Operator_namespace>`, you see that the Operator has a status of `Running`. In such a case, the Operator is said to have _failed silently_ because it incorrectly reports that it is running.
|
||||
The S3 buckets of an OADP Operator might be empty, but when you run the command `oc get po -n <oadp_operator_namespace>`, you see that the Operator has a status of `Running`. In such a case, the Operator is said to have _failed silently_ because it incorrectly reports that it is running.
|
||||
|
||||
.Cause
|
||||
|
||||
@@ -23,31 +26,28 @@ Retrieve a list of backup storage locations (BSLs) and check the manifest of eac
|
||||
|
||||
.Procedure
|
||||
|
||||
. Run one of the following commands to retrieve a list of BSLs:
|
||||
|
||||
.. Using the OpenShift CLI:
|
||||
. Retrieve a list of BSLs by using either the OpenShift or Velero command-line interface (CLI):
|
||||
.. Retrieve a list of BSLs by using the OpenShift CLI (`oc`):
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc get backupstoragelocations.velero.io -A
|
||||
----
|
||||
|
||||
.. Using the Velero CLI:
|
||||
.. Retrieve a list of BSLs by using the `velero` CLI:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ velero backup-location get -n <OADP_Operator_namespace>
|
||||
$ velero backup-location get -n <oadp_operator_namespace>
|
||||
----
|
||||
|
||||
. Using the list of BSLs, run the following command to display the manifest of each BSL, and examine each manifest for an error.
|
||||
. Use the list of BSLs from the previous step and run the following command to examine the manifest of each BSL for an error:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc get backupstoragelocations.velero.io -n <namespace> -o yaml
|
||||
----
|
||||
|
||||
+
|
||||
.Example result
|
||||
|
||||
[source, yaml]
|
||||
----
|
||||
apiVersion: v1
|
||||
@@ -90,4 +90,4 @@ items:
|
||||
kind: List
|
||||
metadata:
|
||||
resourceVersion: ""
|
||||
----
|
||||
----
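When a cluster has many BSLs, reading each manifest by hand is slow. The following Python sketch shows one possible way to filter the list. It assumes that you captured the output of `oc get backupstoragelocations.velero.io -A -o json`, and that each BSL reports its state in `status.phase` with an optional `status.message`. The helper name `unavailable_bsls` and the sample data are hypothetical.

```python
import json


def unavailable_bsls(list_json: str) -> list:
    """Return (namespace, name, phase, message) for every BSL that is not Available."""
    problems = []
    for item in json.loads(list_json).get("items", []):
        metadata = item.get("metadata", {})
        status = item.get("status") or {}
        phase = status.get("phase", "<no status>")
        if phase != "Available":
            problems.append((
                metadata.get("namespace"),
                metadata.get("name"),
                phase,
                status.get("message", ""),
            ))
    return problems


# Hypothetical sample data; replace it with the real JSON output.
SAMPLE = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"namespace": "openshift-adp", "name": "bsl-ok"},
         "status": {"phase": "Available"}},
        {"metadata": {"namespace": "openshift-adp", "name": "bsl-bad"},
         "status": {"phase": "Unavailable", "message": "example error text"}},
    ],
})

for namespace, name, phase, message in unavailable_bsls(SAMPLE):
    print(f"{namespace}/{name}: {phase} {message}")
```

A BSL that the helper reports is a candidate for closer inspection of its full manifest.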
|
||||
@@ -0,0 +1,36 @@
|
||||
:_mod-docs-content-type: ASSEMBLY
|
||||
[id="oadp-timeouts"]
|
||||
= OADP timeouts
|
||||
include::_attributes/common-attributes.adoc[]
|
||||
include::_attributes/attributes-openshift-dedicated.adoc[]
|
||||
:context: oadp-timeouts
|
||||
:namespace: openshift-adp
|
||||
:local-product: OADP
|
||||
|
||||
toc::[]
|
||||
|
||||
Extending a timeout allows complex or resource-intensive processes to complete successfully without premature termination. This configuration can reduce errors, retries, or failures.
|
||||
|
||||
Ensure that you balance timeout extensions in a logical manner so that you do not configure excessively long timeouts that might hide underlying issues in the process. Consider and monitor an appropriate timeout value that meets the needs of the process and the overall system performance.
|
||||
|
||||
The following sections describe each OADP timeout and explain how and when to configure it:
|
||||
|
||||
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#restic-timeout_oadp-timeouts[Restic timeout]
|
||||
|
||||
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#velero-timeout_oadp-timeouts[Velero resource timeout]
|
||||
|
||||
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#datamover-timeout_oadp-timeouts[Data Mover timeout]
|
||||
|
||||
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#csisnapshot-timeout_oadp-timeouts[CSI snapshot timeout]
|
||||
|
||||
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#item-operation-timeout-backup_oadp-timeouts[Item operation timeout - backup]
|
||||
|
||||
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#item-operation-timeout-restore_oadp-timeouts[Item operation timeout - restore]
|
||||
|
||||
include::modules/oadp-restic-timeouts.adoc[leveloffset=+1]
|
||||
include::modules/oadp-velero-timeouts.adoc[leveloffset=+1]
|
||||
include::modules/oadp-velero-default-timeouts.adoc[leveloffset=+2]
|
||||
include::modules/oadp-datamover-timeouts.adoc[leveloffset=+1]
|
||||
include::modules/oadp-csi-snapshot-timeouts.adoc[leveloffset=+1]
|
||||
include::modules/oadp-item-restore-timeouts.adoc[leveloffset=+1]
|
||||
include::modules/oadp-item-backup-timeouts.adoc[leveloffset=+1]
|
||||
@@ -0,0 +1,31 @@
|
||||
:_mod-docs-content-type: ASSEMBLY
|
||||
[id="pods-crash-or-restart-due-to-lack-of-memory-or-cpu"]
|
||||
= Pods crash or restart due to lack of memory or CPU
|
||||
include::_attributes/common-attributes.adoc[]
|
||||
include::_attributes/attributes-openshift-dedicated.adoc[]
|
||||
:context: pods-crash-or-restart-due-to-lack-of-memory-or-cpu
|
||||
:namespace: openshift-adp
|
||||
:local-product: OADP
|
||||
:must-gather-v1-3: registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.3
|
||||
:must-gather-v1-4: registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.4
|
||||
|
||||
toc::[]
|
||||
|
||||
If a Velero or Restic pod crashes due to a lack of memory or CPU, you can set specific resource requests for either of those resources.
|
||||
|
||||
The values for the resource request fields must follow the same format as Kubernetes resource requirements.
|
||||
If you do not specify `configuration.velero.podConfig.resourceAllocations` or `configuration.restic.podConfig.resourceAllocations`, a Velero or Restic pod uses the following default `resources` specification:
|
||||
|
||||
[source,yaml]
|
||||
----
|
||||
requests:
  cpu: 500m
  memory: 128Mi
|
||||
----
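To make the quantity format concrete, the following Python sketch converts values such as the defaults shown above (`500m` CPU and `128Mi` memory) into plain numbers. It is a simplified illustration, not the Kubernetes parser: it assumes only the common decimal suffixes (`m`, `k`, `M`, `G`, `T`) and binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`), and the function name `parse_quantity` is hypothetical.

```python
import re
from decimal import Decimal

# Multipliers for the subset of suffixes that this sketch supports.
_SUFFIXES = {
    "": Decimal(1),
    "m": Decimal("0.001"),   # milli, for example 500m CPU is half a core
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,  # 128Mi is 128 * 1048576 bytes
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
}
_QUANTITY = re.compile(r"^(\d+(?:\.\d+)?)(m|k|M|G|T|Ki|Mi|Gi|Ti)?$")


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity string to cores (CPU) or bytes (memory)."""
    match = _QUANTITY.match(text.strip())
    if match is None:
        raise ValueError(f"not a supported quantity: {text!r}")
    number, suffix = match.groups()
    return Decimal(number) * _SUFFIXES[suffix or ""]


print(parse_quantity("500m"), "cores")
print(parse_quantity("128Mi"), "bytes")
```

A value such as `128MB` is rejected by this sketch because `MB` is not one of the suffixes it recognizes, which is a typical formatting mistake to check for in a resource request.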
|
||||
|
||||
[role="_additional-resources"]
|
||||
.Additional resources
|
||||
* xref:../../backup_and_restore/application_backup_and_restore/installing/about-installing-oadp.adoc#oadp-velero-cpu-memory-requirements_about-installing-oadp[Velero CPU and memory requirements based on collected data]
|
||||
|
||||
include::modules/oadp-pod-crash-set-resource-request-velero.adoc[leveloffset=+1]
|
||||
include::modules/oadp-pod-crash-set-resource-request-restic.adoc[leveloffset=+1]
|
||||
@@ -1,17 +1,20 @@
|
||||
// Module included in the following assemblies:
|
||||
//
|
||||
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
|
||||
|
||||
:_mod-docs-content-type: CONCEPT
|
||||
[id="oadp-restic-issues_{context}"]
|
||||
:_mod-docs-content-type: ASSEMBLY
|
||||
[id="restic-issues"]
|
||||
= Restic issues
|
||||
include::_attributes/common-attributes.adoc[]
|
||||
include::_attributes/attributes-openshift-dedicated.adoc[]
|
||||
:context: restic-issues
|
||||
:namespace: openshift-adp
|
||||
:local-product: OADP
|
||||
|
||||
toc::[]
|
||||
|
||||
You might encounter these issues when you back up applications with Restic.
|
||||
|
||||
[id="restic-permission-error-nfs-root-squash-enabled_{context}"]
|
||||
== Restic permission error for NFS data volumes with root_squash enabled
|
||||
|
||||
The `Restic` pod log displays the error message: `controller=pod-volume-backup error="fork/exec/usr/bin/restic: permission denied"`.
|
||||
The `Restic` pod log displays the following error message: `controller=pod-volume-backup error="fork/exec/usr/bin/restic: permission denied"`.
|
||||
|
||||
.Cause
|
||||
|
||||
@@ -83,3 +86,5 @@ In the following error log, `mysql-persistent` is the problematic Restic reposit
|
||||
pkg/restic.(*backupper).BackupPodVolumes"
|
||||
logSource="pkg/backup/backup.go:435" name=mysql-7d99fc949-qbkds
|
||||
----
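Velero and Restic log lines such as the ones quoted in this assembly are written as `key=value` pairs, so they can be filtered with a small script. The following Python sketch is one possible approach that uses `shlex` to split the pairs. The sample line is assembled from fragments shown in this document and is illustrative only, the function name `parse_log_line` is hypothetical, and lines with unbalanced quotes are not handled.

```python
import shlex


def parse_log_line(line: str) -> dict:
    """Split a key=value log line into a dictionary, honoring quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, separator, value = token.partition("=")
        if separator:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


# Illustrative line; real log lines carry more fields.
SAMPLE = (
    'time="2023-02-17T16:33:13Z" level=error '
    'msg="Error backing up item" name=mysql-7d99fc949-qbkds'
)

entry = parse_log_line(SAMPLE)
if entry.get("level") == "error":
    print(entry["time"], entry["msg"], entry.get("name", ""))
```

Filtering on `level=error` in this way narrows a long pod log down to the entries that name the failing resource.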
|
||||
|
||||
include::modules/oadp-restic-restore-failing-psa-policy.adoc[leveloffset=+1]
|
||||
@@ -12,149 +12,21 @@ include::_attributes/attributes-openshift-dedicated.adoc[]
|
||||
|
||||
toc::[]
|
||||
|
||||
You can debug Velero custom resources (CRs) by using the xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-debugging-oc-cli_oadp-troubleshooting[OpenShift CLI tool] or the xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#migration-debugging-velero-resources_oadp-troubleshooting[Velero CLI tool]. The Velero CLI tool provides more detailed logs and information.
|
||||
You can troubleshoot OADP issues by using the following methods:
|
||||
|
||||
You can check xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-installation-issues_oadp-troubleshooting[installation issues], xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-backup-restore-cr-issues_oadp-troubleshooting[backup and restore CR issues], and xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-restic-issues_oadp-troubleshooting[Restic issues].
|
||||
* Debug Velero custom resources (CRs) by using the xref:../../backup_and_restore/application_backup_and_restore/velero-cli-tool.adoc#oadp-debugging-oc-cli_velero-cli-tool[OpenShift CLI tool] or the xref:../../backup_and_restore/application_backup_and_restore/velero-cli-tool.adoc#migration-debugging-velero-resources_velero-cli-tool[Velero CLI tool]. The Velero CLI tool provides more detailed logs and information.
|
||||
|
||||
You can collect logs and CR information by using the xref:../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#migration-using-must-gather_oadp-troubleshooting[`must-gather` tool].
|
||||
* Debug Velero or Restic pod crashes that are caused by a lack of memory or CPU by using xref:../../backup_and_restore/application_backup_and_restore/pods-crash-or-restart-due-to-lack-of-memory-or-cpu.adoc#pods-crash-or-restart-due-to-lack-of-memory-or-cpu[Pods crash or restart due to lack of memory or CPU].
|
||||
|
||||
You can obtain the Velero CLI tool by:
|
||||
* Debug issues with Velero and admission webhooks by using xref:../../backup_and_restore/application_backup_and_restore/issues-with-velero-and-admission-webhooks.adoc#issues-with-velero-and-admission-webhooks[Issues with Velero and admission webhooks].
|
||||
|
||||
* Downloading the Velero CLI tool
|
||||
* Accessing the Velero binary in the Velero deployment in the cluster
|
||||
* Check xref:../../backup_and_restore/application_backup_and_restore/oadp-installation-issues.adoc#oadp-installation-issues[OADP installation issues], xref:../../backup_and_restore/application_backup_and_restore/oadp-operator-issues.adoc#oadp-operator-issues[OADP Operator issues], xref:../../backup_and_restore/application_backup_and_restore/backup-and-restore-cr-issues.adoc#backup-and-restore-cr-issues[backup and restore CR issues], and xref:../../backup_and_restore/application_backup_and_restore/restic-issues.adoc#restic-issues[Restic issues].
|
||||
|
||||
include::modules/velero-obtaining-by-downloading.adoc[leveloffset=+1]
|
||||
include::modules/velero-oadp-version-relationship.adoc[leveloffset=+2]
|
||||
include::modules/velero-obtaining-by-accessing-binary.adoc[leveloffset=+1]
|
||||
* Use the available xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#oadp-timeouts[OADP timeouts] to reduce errors, retries, or failures.
|
||||
|
||||
include::modules/oadp-debugging-oc-cli.adoc[leveloffset=+1]
|
||||
include::modules/migration-debugging-velero-resources.adoc[leveloffset=+1]
|
||||
* Collect logs and CR information by using the xref:../../backup_and_restore/application_backup_and_restore/using-the-must-gather-tool.adoc#using-the-must-gather-tool[`must-gather` tool].
|
||||
|
||||
* Monitor and analyze the workload performance with the help of xref:../../backup_and_restore/application_backup_and_restore/oadp-monitoring.adoc#oadp-monitoring[OADP monitoring].
|
||||
|
||||
|
||||
[id="oadp-pod-crash-resource-request"]
|
||||
== Pods crash or restart due to lack of memory or CPU
|
||||
|
||||
If a Velero or Restic pod crashes due to a lack of memory or CPU, you can set specific resource requests for either of those resources.
|
||||
[role="_additional-resources"]
|
||||
.Additional resources
|
||||
* xref:../../backup_and_restore/application_backup_and_restore/installing/about-installing-oadp.adoc#oadp-velero-cpu-memory-requirements_about-installing-oadp[CPU and memory requirements]
|
||||
|
||||
include::modules/oadp-pod-crash-set-resource-request-velero.adoc[leveloffset=+2]
|
||||
include::modules/oadp-pod-crash-set-resource-request-restic.adoc[leveloffset=+2]
|
||||
|
||||
[IMPORTANT]
|
||||
====
|
||||
The values for the resource request fields must follow the same format as Kubernetes resource requirements.
|
||||
Also, if you do not specify `configuration.velero.podConfig.resourceAllocations` or `configuration.restic.podConfig.resourceAllocations`, the default `resources` specification for a Velero pod or a Restic pod is as follows:
|
||||
|
||||
[source,yaml]
|
||||
----
|
||||
requests:
  cpu: 500m
  memory: 128Mi
|
||||
----
|
||||
====
|
||||
|
||||
[id="podvolumerestore-fails_{context}"]
|
||||
== PodVolumeRestore fails to complete when StorageClass is NFS
|
||||
|
||||
The restore operation fails when there is more than one volume during an NFS restore by using `Restic` or `Kopia`. `PodVolumeRestore` either fails with the following error or keeps trying to restore before finally failing.
|
||||
|
||||
.Error message
|
||||
|
||||
[source,terminal]
|
||||
----
|
||||
Velero: pod volume restore failed: data path restore failed: \
|
||||
Failed to run kopia restore: Failed to copy snapshot data to the target: \
|
||||
restore error: copy file: error creating file: \
|
||||
open /host_pods/b4d...6/volumes/kubernetes.io~nfs/pvc-53...4e5/userdata/base/13493/2681: \
|
||||
no such file or directory
|
||||
----
|
||||
|
||||
.Cause
|
||||
|
||||
The NFS mount path is not unique for the two volumes to restore. As a result, the `velero` lock files use the same file on the NFS server during the restore, causing the `PodVolumeRestore` to fail.
|
||||
|
||||
.Solution
|
||||
|
||||
You can resolve this issue by setting up a unique `pathPattern` for each volume, while defining the `StorageClass` for `nfs-subdir-external-provisioner` in the `deploy/class.yaml` file. Use the following `nfs-subdir-external-provisioner` `StorageClass` example:
|
||||
|
||||
|
||||
[source,yaml]
|
||||
----
|
||||
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
  pathPattern: "${.PVC.namespace}/${.PVC.annotations.nfs.io/storage-path}" # <1>
  onDelete: delete
|
||||
----
|
||||
|
||||
<1> Specifies a template for creating a directory path by using `PVC` metadata such as labels, annotations, name, or namespace. To specify metadata, use `${.PVC.<metadata>}`. For example, to name a folder: `<pvc-namespace>-<pvc-name>`, use `${.PVC.namespace}-${.PVC.name}` as `pathPattern`.
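The effect of `pathPattern` on path uniqueness can be illustrated with a short Python sketch. It only approximates how a `${.PVC.<metadata>}` template expands and is not the provisioner's implementation. The function name `expand_path_pattern` and the sample PVC values are hypothetical.

```python
import re

# Matches ${.PVC.namespace}, ${.PVC.name}, ${.PVC.labels.<key>}, ${.PVC.annotations.<key>}.
_PLACEHOLDER = re.compile(
    r"\$\{\.PVC\.(namespace|name|labels|annotations)(?:\.([^}]+))?\}"
)


def expand_path_pattern(pattern: str, pvc: dict) -> str:
    """Expand a pathPattern template by using the metadata of one PVC."""
    def substitute(match):
        field, key = match.group(1), match.group(2)
        if field in ("namespace", "name"):
            return pvc.get(field, "")
        return pvc.get(field, {}).get(key, "")
    return _PLACEHOLDER.sub(substitute, pattern)


# Two hypothetical PVCs in one namespace that share the same annotation value.
pvc_a = {"namespace": "db", "name": "data-0",
         "annotations": {"nfs.io/storage-path": "pg"}}
pvc_b = {"namespace": "db", "name": "data-1",
         "annotations": {"nfs.io/storage-path": "pg"}}

by_annotation = "${.PVC.namespace}/${.PVC.annotations.nfs.io/storage-path}"
by_name = "${.PVC.namespace}-${.PVC.name}"

# The annotation-based pattern is unique only if each PVC has a distinct annotation value.
print(expand_path_pattern(by_annotation, pvc_a), expand_path_pattern(by_annotation, pvc_b))
# The name-based pattern yields a distinct directory for each PVC.
print(expand_path_pattern(by_name, pvc_a), expand_path_pattern(by_name, pvc_b))
```

In this sketch, both PVCs expand to the same directory under the annotation-based pattern because their annotation values match, which mirrors the non-unique mount path described in the cause. A pattern that includes the PVC name avoids the collision.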
|
||||
|
||||
[id="issues-with-velero-and-admission-workbooks"]
|
||||
== Issues with Velero and admission webhooks
|
||||
|
||||
Velero has limited abilities to resolve admission webhook issues during a restore. If you have workloads with admission webhooks, you might need to use an additional Velero plugin or make changes to how you restore the workload.
|
||||
|
||||
Typically, workloads with admission webhooks require you to create a resource of a specific kind first. This is especially true if your workload has child resources because admission webhooks typically block child resources.
|
||||
|
||||
For example, creating or restoring a top-level object such as `service.serving.knative.dev` typically creates child resources automatically. If you do this first, you will not need to use Velero to create and restore these resources. This avoids the problem of child resources being blocked by an admission webhook that Velero might use.
|
||||
|
||||
[id="velero-restore-workarounds-for-workloads-with-admission-webhooks"]
|
||||
=== Restoring workarounds for Velero backups that use admission webhooks
|
||||
|
||||
This section describes the additional steps required to restore resources for several types of Velero backups that use admission webhooks.
|
||||
|
||||
include::modules/migration-debugging-velero-admission-webhooks-knative.adoc[leveloffset=+3]
|
||||
include::modules/migration-debugging-velero-admission-webhooks-ibm-appconnect.adoc[leveloffset=+3]
include::modules/oadp-features-plugins-known-issues.adoc[leveloffset=+2]
include::modules/oadp-plugins-receiving-eof-message.adoc[leveloffset=+2]

[role="_additional-resources"]
.Additional resources

* xref:../../architecture/admission-plug-ins.adoc[Admission plugins]
* xref:../../architecture/admission-plug-ins.adoc#admission-webhooks-about_admission-plug-ins[Webhook admission plugins]
* xref:../../architecture/admission-plug-ins.adoc#admission-webhook-types_admission-plug-ins[Types of webhook admission plugins]

include::modules/oadp-installation-issues.adoc[leveloffset=+1]
include::modules/oadp-operator-issues.adoc[leveloffset=+1]
include::modules/oadp-timeouts.adoc[leveloffset=+1]
include::modules/oadp-restic-timeouts.adoc[leveloffset=+2]
include::modules/oadp-velero-timeouts.adoc[leveloffset=+2]
include::modules/oadp-datamover-timeouts.adoc[leveloffset=+2]
include::modules/oadp-csi-snapshot-timeouts.adoc[leveloffset=+2]
include::modules/oadp-velero-default-timeouts.adoc[leveloffset=+2]
include::modules/oadp-item-restore-timeouts.adoc[leveloffset=+2]
include::modules/oadp-item-backup-timeouts.adoc[leveloffset=+2]
include::modules/oadp-backup-restore-cr-issues.adoc[leveloffset=+1]
include::modules/oadp-restic-issues.adoc[leveloffset=+1]
include::modules/oadp-restic-restore-failing-psa-policy.adoc[leveloffset=+2]

include::modules/migration-using-must-gather.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../support/gathering-cluster-data.adoc#gathering-cluster-data[Gathering cluster data]

include::modules/support-insecure-tls-connections.adoc[leveloffset=+2]
include::modules/migration-combining-must-gather.adoc[leveloffset=+2]
include::modules/oadp-monitoring.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../observability/monitoring/about-ocp-monitoring/about-ocp-monitoring.adoc#about-ocp-monitoring[About {product-title} monitoring]

include::modules/oadp-monitoring-setup.adoc[leveloffset=+2]
include::modules/oadp-creating-service-monitor.adoc[leveloffset=+2]
include::modules/oadp-creating-alerting-rule.adoc[leveloffset=+2]
[role="_additional-resources"]
.Additional resources
* xref:../../observability/monitoring/managing-alerts/managing-alerts-as-an-administrator.adoc#managing-alerts-as-an-administrator[Managing alerts as an Administrator]

include::modules/oadp-list-of-metrics.adoc[leveloffset=+2]
include::modules/oadp-viewing-metrics-ui.adoc[leveloffset=+2]

:oadp-troubleshooting!:
@@ -0,0 +1,76 @@
:_mod-docs-content-type: ASSEMBLY
[id="using-the-must-gather-tool"]
= Using the must-gather tool
include::_attributes/common-attributes.adoc[]
include::_attributes/attributes-openshift-dedicated.adoc[]
:context: using-the-must-gather-tool
:namespace: openshift-adp
:local-product: OADP
:must-gather-v1-3: registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.3
:must-gather-v1-4: registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.4

toc::[]

You can collect logs, metrics, and information about {local-product} custom resources by using the `must-gather` tool. The `must-gather` data must be attached to all customer cases.

You can run the `must-gather` tool with the following data collection options:

* Full `must-gather` data collection collects Prometheus metrics, pod logs, and Velero CR information for all namespaces where the OADP Operator is installed.
* Essential `must-gather` data collection collects pod logs and Velero CR information for a specific duration of time, for example, one hour or 24 hours. Prometheus metrics and duplicate logs are not included.
* `must-gather` data collection with timeout. Data collection can take a long time if there are many failed `Backup` CRs. You can improve performance by setting a timeout value.
* Prometheus metrics data dump downloads an archive file containing the metrics data collected by Prometheus.

.Prerequisites

* You have logged in to the {product-title} cluster as a user with the `cluster-admin` role.
* You have installed the OpenShift CLI (`oc`).
* You must use {op-system-base-full} {op-system-version-9} with {oadp-short} 1.4.

.Procedure

. Navigate to the directory where you want to store the `must-gather` data.
. Run the `oc adm must-gather` command for one of the following data collection options:

* For full `must-gather` data collection, including Prometheus metrics, run the following command:
+
[source,terminal,subs="attributes+"]
----
$ oc adm must-gather --image={must-gather-v1-4}
----
+
The data is saved as `must-gather/must-gather.tar.gz`. You can upload this file to a support case on the link:https://access.redhat.com/[Red{nbsp}Hat Customer Portal].

* For essential `must-gather` data collection, without Prometheus metrics, for a specific time duration, run the following command:
+
[source,terminal,subs="attributes+"]
----
$ oc adm must-gather --image={must-gather-v1-4} \
  -- /usr/bin/gather_<time>_essential <1>
----
<1> Specify the time in hours. Allowed values are `1h`, `6h`, `24h`, `72h`, or `all`, for example, `gather_1h_essential` or `gather_all_essential`.

* For `must-gather` data collection with timeout, run the following command:
+
[source,terminal,subs="attributes+"]
----
$ oc adm must-gather --image={must-gather-v1-4} \
  -- /usr/bin/gather_with_timeout <timeout> <1>
----
<1> Specify a timeout value in seconds.

* For a Prometheus metrics data dump, run the following command:
+
[source,terminal,subs="attributes+"]
----
$ oc adm must-gather --image={must-gather-v1-4} -- /usr/bin/gather_metrics_dump
----
+
This operation can take a long time. The data is saved as `must-gather/metrics/prom_data.tar.gz`.
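
The data collection options in this procedure map to a small set of gather scripts, which can be sketched as a shell helper. This is a minimal sketch and not part of OADP: the function name `build_must_gather_cmd` and the option keywords `full`, `essential`, `timeout`, and `metrics` are hypothetical, and the image tag is copied from the `must-gather-v1-4` attribute in this assembly. The helper only prints the command so that you can review it before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the `oc adm must-gather` command for one
# of the data collection options described in the procedure.
# The image tag is an assumption taken from the :must-gather-v1-4: attribute.
MUST_GATHER_IMAGE="registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.4"

# Usage: build_must_gather_cmd <full|essential|timeout|metrics> [argument]
build_must_gather_cmd() {
  local option="$1"
  local arg="${2:-}"
  local base="oc adm must-gather --image=${MUST_GATHER_IMAGE}"

  case "$option" in
    full)
      # Full collection, including Prometheus metrics
      echo "$base"
      ;;
    essential)
      # Only the time windows listed in the procedure are accepted
      case "$arg" in
        1h|6h|24h|72h|all)
          echo "$base -- /usr/bin/gather_${arg}_essential"
          ;;
        *)
          echo "error: time must be one of 1h, 6h, 24h, 72h, all" >&2
          return 1
          ;;
      esac
      ;;
    timeout)
      # The timeout is a whole number of seconds
      case "$arg" in
        ''|*[!0-9]*)
          echo "error: timeout must be a whole number of seconds" >&2
          return 1
          ;;
        *)
          echo "$base -- /usr/bin/gather_with_timeout ${arg}"
          ;;
      esac
      ;;
    metrics)
      # Prometheus metrics data dump
      echo "$base -- /usr/bin/gather_metrics_dump"
      ;;
    *)
      echo "error: unknown option '${option}'" >&2
      return 1
      ;;
  esac
}
```

For example, `build_must_gather_cmd essential 24h` prints the command that runs `/usr/bin/gather_24h_essential`, and `build_must_gather_cmd essential 2h` fails because `2h` is not one of the allowed values.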
|
||||
|
||||
|
||||
[role="_additional-resources"]
.Additional resources
* xref:../../support/gathering-cluster-data.adoc#gathering-cluster-data[Gathering cluster data]

include::modules/support-insecure-tls-connections.adoc[leveloffset=+1]
include::modules/migration-combining-must-gather.adoc[leveloffset=+1]
@@ -0,0 +1,22 @@
:_mod-docs-content-type: ASSEMBLY
[id="velero-cli-tool"]
= Velero CLI tool
include::_attributes/common-attributes.adoc[]
include::_attributes/attributes-openshift-dedicated.adoc[]
:context: velero-cli-tool
:namespace: openshift-adp
:local-product: OADP

toc::[]

You can obtain the `velero` CLI tool by using the following options:

* Downloading the `velero` CLI tool
* Accessing the `velero` binary in the Velero deployment in the cluster

include::modules/velero-obtaining-by-downloading.adoc[leveloffset=+1]
include::modules/velero-oadp-version-relationship.adoc[leveloffset=+2]
include::modules/velero-obtaining-by-accessing-binary.adoc[leveloffset=+1]

include::modules/oadp-debugging-oc-cli.adoc[leveloffset=+1]
include::modules/migration-debugging-velero-resources.adoc[leveloffset=+1]
@@ -38,7 +38,7 @@ You can always recover from a disaster situation by xref:../backup_and_restore/c
As a cluster administrator, you can back up and restore applications running on {product-title} by using the OpenShift API for Data Protection (OADP).

OADP backs up and restores Kubernetes resources and internal images, at the granularity of a namespace, by using the version of Velero that is appropriate for the version of OADP you install, according to the table in xref:../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#velero-obtaining-by-downloading_oadp-troubleshooting[Downloading the Velero CLI tool]. OADP backs up and restores persistent volumes (PVs) by using snapshots or Restic. For details, see xref:../backup_and_restore/application_backup_and_restore/oadp-features-plugins.adoc#oadp-features_oadp-features-plugins[OADP features].
OADP backs up and restores Kubernetes resources and internal images, at the granularity of a namespace, by using the version of Velero that is appropriate for the version of OADP you install, according to the table in xref:../backup_and_restore/application_backup_and_restore/velero-cli-tool.adoc#velero-obtaining-by-downloading_velero-cli-tool[Downloading the Velero CLI tool]. OADP backs up and restores persistent volumes (PVs) by using snapshots or Restic. For details, see xref:../backup_and_restore/application_backup_and_restore/oadp-features-plugins.adoc#oadp-features_oadp-features-plugins[OADP features].

[id="oadp-requirements"]
=== OADP requirements
@@ -1,11 +0,0 @@
// Module included in the following assemblies:
//
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc

:_mod-docs-content-type: CONCEPT
[id="oadp-monitoring_{context}"]
= OADP Monitoring

The {product-title} provides a monitoring stack that allows users and administrators to effectively monitor and manage their clusters, as well as monitor and analyze the workload performance of user applications and services running on the clusters, including receiving alerts if an event occurs.
@@ -1,6 +1,6 @@
// Module included in the following assemblies:
//
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
// * backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc

:_mod-docs-content-type: PROCEDURE
[id="restic-timeout_{context}"]
@@ -1,14 +0,0 @@
// Module included in the following assemblies:
//
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc

:_mod-docs-content-type: REFERENCE
[id="oadp-timeouts_{context}"]
= OADP timeouts

Extending a timeout allows complex or resource-intensive processes to complete successfully without premature termination. This configuration can reduce the likelihood of errors, retries, or failures.

Ensure that you balance timeout extensions in a logical manner so that you do not configure excessively long timeouts that might hide underlying issues in the process. Carefully consider and monitor an appropriate timeout value that meets the needs of the process and the overall system performance.

The following are various OADP timeouts, with instructions on how and when to implement these parameters: