// Module included in the following assemblies:
//
// * migrating_from_ocp_3_to_4/troubleshooting-3-4.adoc
// * migration_toolkit_for_containers/troubleshooting-mtc.adoc
:_mod-docs-content-type: PROCEDURE
[id="migration-error-messages_{context}"]
= Error messages and resolutions
This section describes common error messages you might encounter with the {mtc-first} and how to resolve their underlying causes.
[id="ca-certificate-error-displayed-when-accessing-console-for-first-time_{context}"]
== CA certificate error displayed when accessing the {mtc-short} console for the first time
If a `CA certificate error` message is displayed the first time you try to access the {mtc-short} console, the likely cause is the use of self-signed CA certificates in one of the clusters.
To resolve this issue temporarily, navigate to the `oauth-authorization-server` URL displayed in the error message and accept the certificate. To resolve it permanently, add the certificate to the trust store of your web browser.
If an `Unauthorized` message is displayed after you have accepted the certificate, navigate to the {mtc-short} console and refresh the web page.
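
If you need to look up the OAuth server URL for a cluster manually, you can query the well-known OAuth metadata endpoint. The following is a minimal sketch that assumes you are logged in to the cluster with `oc`:

[source,terminal]
----
$ oc get --raw /.well-known/oauth-authorization-server
----

The `issuer` field in the returned JSON is the OAuth server URL.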
[id="oauth-timeout-error-in-console_{context}"]
== OAuth timeout error in the {mtc-short} console
If a `connection has timed out` message is displayed in the {mtc-short} console after you have accepted a self-signed certificate, the causes are likely to be the following:
* Interrupted network access to the OAuth server
* Interrupted network access to the {product-title} console
* Proxy configuration that blocks access to the `oauth-authorization-server` URL. See link:https://access.redhat.com/solutions/5514491[MTC console inaccessible because of OAuth timeout error] for details.
To determine the cause of the timeout:
* Inspect the {mtc-short} console web page with a browser web inspector.
* Check the `Migration UI` pod log for errors.
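
For example, to check the `Migration UI` pod log, you can run commands similar to the following. This is a sketch that assumes a default {mtc-short} installation in the `openshift-migration` namespace, where the console pod name starts with `migration-ui`:

[source,terminal]
----
$ oc get pods -n openshift-migration | grep migration-ui
$ oc logs <migration_ui_pod> -n openshift-migration
----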
[id="certificate-signed-by-unknown-authority-error_{context}"]
== Certificate signed by unknown authority error
If you use a self-signed certificate to secure a cluster or a replication repository for the {mtc-short}, certificate verification might fail with the following error message: `Certificate signed by unknown authority`.
You can create a custom CA certificate bundle file and upload it in the {mtc-short} web console when you add a cluster or a replication repository.
.Procedure
Download a CA certificate from a remote endpoint and save it as a CA bundle file:
[source,terminal]
----
$ echo -n | openssl s_client -connect <host_FQDN>:<port> \ <1>
  | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > <ca_bundle.cert> <2>
----
<1> Specify the host FQDN and port of the endpoint, for example, `api.my-cluster.example.com:6443`.
<2> Specify the name of the CA bundle file.
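
For example, using the values from the callouts, the following command downloads the certificate presented by a cluster API endpoint and saves it as `ca_bundle.cert`:

[source,terminal]
----
$ echo -n | openssl s_client -connect api.my-cluster.example.com:6443 \
  | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > ca_bundle.cert
----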
[id="backup-storage-location-errors-in-velero-pod-log_{context}"]
== Backup storage location errors in the Velero pod log
If a `Velero` `Backup` custom resource contains a reference to a backup storage location (BSL) that does not exist, the `Velero` pod log might display the following error messages:
[source,terminal]
----
$ oc logs <Velero_Pod> -n openshift-migration
----
.Example output
[source,terminal]
----
level=error msg="Error checking repository for stale locks" error="error getting backup storage location: BackupStorageLocation.velero.io \"ts-dpa-1\" not found" error.file="/remote-source/src/github.com/vmware-tanzu/velero/pkg/restic/repository_manager.go:259"
----
You can ignore these error messages. A missing BSL cannot cause a migration to fail.
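
To confirm which backup storage locations exist, you can list the `BackupStorageLocation` resources. This is a sketch that assumes the Velero custom resources are installed in the `openshift-migration` namespace:

[source,terminal]
----
$ oc get backupstoragelocations.velero.io -n openshift-migration
----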
[id="pod-volume-backup-timeout-error-in-velero-pod-log_{context}"]
== Pod volume backup timeout error in the Velero pod log
If a migration fails because Restic times out, the following error is displayed in the `Velero` pod log.
[source,terminal]
----
level=error msg="Error backing up item" backup=velero/monitoring error="timed out waiting for all PodVolumeBackups to complete" error.file="/go/src/github.com/heptio/velero/pkg/restic/backupper.go:165" error.function="github.com/heptio/velero/pkg/restic.(*backupper).BackupPodVolumes" group=v1
----
The default value of `restic_timeout` is one hour. You can increase this parameter for large migrations, keeping in mind that a higher value may delay the return of error messages.
.Procedure
. In the {product-title} web console, navigate to *Ecosystem* -> *Installed Operators*.
. Click *{mtc-full} Operator*.
. In the *MigrationController* tab, click *migration-controller*.
. In the *YAML* tab, update the following parameter value:
+
[source,yaml]
----
spec:
  restic_timeout: 1h <1>
----
<1> Valid units are `h` (hours), `m` (minutes), and `s` (seconds), for example, `3h30m15s`.
. Click *Save*.
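
Alternatively, you can update the parameter from the command line. The following is a sketch that assumes the default `MigrationController` CR name and namespace:

[source,terminal]
----
$ oc patch migrationcontroller migration-controller -n openshift-migration \
  --type merge -p '{"spec":{"restic_timeout":"3h"}}'
----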
[id="restic-verification-errors-in-migmigration-custom-resource_{context}"]
== Restic verification errors in the MigMigration custom resource
If data verification fails when migrating a persistent volume with the file system data copy method, the following error is displayed in the `MigMigration` CR.
.Example output
[source,yaml]
----
status:
  conditions:
  - category: Warn
    durable: true
    lastTransitionTime: 2020-04-16T20:35:16Z
    message: There were verify errors found in 1 Restic volume restores. See restore `<registry-example-migration-rvwcm>`
      for details <1>
    status: "True"
    type: ResticVerifyErrors <2>
----
<1> The error message identifies the `Restore` CR name.
<2> `ResticVerifyErrors` is a general error warning type that includes verification errors.
[NOTE]
====
A data verification error does not cause the migration process to fail.
====
You can check the `Restore` CR to identify the source of the data verification error.
.Procedure
. Log in to the target cluster.
. View the `Restore` CR:
+
[source,terminal]
----
$ oc describe restore <registry-example-migration-rvwcm> -n openshift-migration
----
+
The output identifies the persistent volume with `PodVolumeRestore` errors.
+
.Example output
[source,yaml]
----
status:
  phase: Completed
  podVolumeRestoreErrors:
  - kind: PodVolumeRestore
    name: <registry-example-migration-rvwcm-98t49>
    namespace: openshift-migration
  podVolumeRestoreResticErrors:
  - kind: PodVolumeRestore
    name: <registry-example-migration-rvwcm-98t49>
    namespace: openshift-migration
----
. View the `PodVolumeRestore` CR:
+
[source,terminal]
----
$ oc describe podvolumerestore <registry-example-migration-rvwcm-98t49> -n openshift-migration
----
+
The output identifies the `Restic` pod that logged the errors.
+
.Example output
[source,yaml]
----
completionTimestamp: 2020-05-01T20:49:12Z
errors: 1
resticErrors: 1
...
resticPod: <restic-nr2v5>
----
. View the `Restic` pod log to locate the errors:
+
[source,terminal]
----
$ oc logs -f <restic-nr2v5> -n openshift-migration
----
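
To narrow down the relevant lines, you can filter the pod log for errors, for example:

[source,terminal]
----
$ oc logs <restic-nr2v5> -n openshift-migration | grep -i error
----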
[id="restic-permission-error-when-migrating-from-nfs-storage-with-root-squash-enabled_{context}"]
== Restic permission error when migrating from NFS storage with root_squash enabled
If you are migrating data from NFS storage and `root_squash` is enabled, `Restic` is mapped to the `nfsnobody` user and does not have permission to perform the migration. The following error is displayed in the `Restic` pod log.
.Example output
[source,terminal]
----
backup=openshift-migration/<backup_id> controller=pod-volume-backup error="fork/exec /usr/bin/restic: permission denied" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/pod_volume_backup_controller.go:280" error.function="github.com/vmware-tanzu/velero/pkg/controller.(*podVolumeBackupController).processBackup" logSource="pkg/controller/pod_volume_backup_controller.go:280" name=<backup_id> namespace=openshift-migration
----
You can resolve this issue by creating a supplemental group for Restic and adding the group ID to the `MigrationController` CR manifest.
.Procedure
. Create a supplemental group for Restic on the NFS storage, as shown in the sketch after this procedure.
. Set the `setgid` bit on the NFS directories so that group ownership is inherited.
. Add the `restic_supplemental_groups` parameter to the `MigrationController` CR manifest on the source and target clusters:
+
[source,yaml]
----
spec:
  restic_supplemental_groups: <group_id> <1>
----
<1> Specify the supplemental group ID.
. Wait for the `Restic` pods to restart so that the changes are applied.
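
The first two steps are performed on the NFS server. The following is a minimal sketch that uses a hypothetical group name, group ID, and export path:

[source,terminal]
----
$ sudo groupadd -g 5555 restic-migration <1>
$ sudo chgrp -R restic-migration /exports/migration-pv <2>
$ sudo find /exports/migration-pv -type d -exec chmod g+s {} \;
----
<1> Hypothetical supplemental group name and group ID. Use the same group ID that you specify in `restic_supplemental_groups`.
<2> Hypothetical NFS export path that backs the migrated persistent volumes.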