Merge pull request #71060 from openshift-cherrypick-robot/cherry-pick-70568-to-enterprise-4.15
[enterprise-4.15] OCPBUGS-20194: Stop other control plane components on non-recovery hosts during etcd restore procedure

[id="dr-scenario-2-restoring-cluster-state_{context}"]
= Restoring to a previous cluster state

You can use a saved `etcd` backup to restore a previous cluster state or restore a cluster that has lost the majority of control plane hosts.

[NOTE]
====
If your cluster uses a control plane machine set, see "Troubleshooting the control plane machine set" for a simpler `etcd` recovery procedure.
====

[IMPORTANT]
====
When you restore your cluster, you must use an `etcd` backup that was taken from the same z-stream release. For example, an {product-title} 4.7.2 cluster must use an `etcd` backup that was taken from 4.7.2.
====

.Prerequisites

* Access to the cluster as a user with the `cluster-admin` role through a certificate-based `kubeconfig` file, like the one that was used during installation.
* A healthy control plane host to use as the recovery host.
* SSH access to control plane hosts.
* A backup directory containing both the `etcd` snapshot and the resources for the static pods, which were from the same backup. The file names in the directory must be in the following formats: `snapshot_<datetimestamp>.db` and `static_kuberesources_<datetimestamp>.tar.gz`.
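+
For illustration only, a backup directory that satisfies this prerequisite might list files like the following; the timestamps are hypothetical:
+
[source,terminal]
----
$ ls -1 /home/core/backup/
snapshot_2024-01-01_120000.db
static_kuberesources_2024-01-01_120000.tar.gz
----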

[IMPORTANT]
====
For non-recovery control plane nodes, it is not required to establish SSH connectivity or to stop the static pods. You can delete and recreate other non-recovery, control plane machines, one by one.
====

.Procedure

. Establish SSH connectivity to each of the control plane nodes, including the recovery host.
+
`kube-apiserver` becomes inaccessible after the restore process starts, so you cannot access the control plane nodes. For this reason, it is recommended to establish SSH connectivity to each control plane host in a separate terminal.
+
[IMPORTANT]
====
If you do not complete this step, you will not be able to access the control plane hosts to complete the restore procedure, and you will be unable to recover your cluster from this state.
====
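+
As a sketch, assuming key-based SSH as the `core` user (the placeholder values are illustrative), each terminal session might look like the following:
+
[source,terminal]
----
$ ssh -i <ssh_key_path> core@<control_plane_host>
----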

. Copy the `etcd` backup directory to the recovery control plane host.
+
This procedure assumes that you copied the `backup` directory containing the `etcd` snapshot and the resources for the static pods to the `/home/core/` directory of your recovery control plane host.
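+
A minimal sketch of that copy, assuming SSH access as the `core` user and a local `backup` directory; the host name is a placeholder:
+
[source,terminal]
----
$ scp -r ./backup core@<recovery_host>:/home/core/
----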

. Stop the static pods on any other control plane nodes.
+
You do not need to stop the static pods on the recovery host. A combined sketch of the manifest moves in the following substeps appears after this list.

.. Access a control plane host that is not the recovery host.

.. Move the existing `etcd` pod file out of the kubelet manifest directory by running:
+
[source,terminal]
----
$ sudo mv /etc/kubernetes/manifests/etcd-pod.yaml /tmp
----

.. Verify that the `etcd` pods are stopped by running:
+
[source,terminal]
----
$ sudo crictl ps | grep etcd | egrep -v "operator|etcd-guard"
----
+
If the output of this command is not empty, wait a few minutes and check again.

.. Move the existing `kube-apiserver` pod file out of the kubelet manifest directory by running:
+
[source,terminal]
----
$ sudo mv /etc/kubernetes/manifests/kube-apiserver-pod.yaml /tmp
----

.. Verify that the `kube-apiserver` containers are stopped by running:
+
[source,terminal]
----
$ sudo crictl ps | grep kube-apiserver | egrep -v "operator|guard"
----
+
If the output of this command is not empty, wait a few minutes and check again.

.. Move the existing `kube-controller-manager` pod file out of the kubelet manifest directory by running:
+
[source,terminal]
----
$ sudo mv /etc/kubernetes/manifests/kube-controller-manager-pod.yaml /tmp
----

.. Verify that the `kube-controller-manager` containers are stopped by running:
+
[source,terminal]
----
$ sudo crictl ps | grep kube-controller-manager | egrep -v "operator|guard"
----
+
If the output of this command is not empty, wait a few minutes and check again.

.. Move the existing `kube-scheduler` pod file out of the kubelet manifest directory by running:
+
[source,terminal]
----
$ sudo mv /etc/kubernetes/manifests/kube-scheduler-pod.yaml /tmp
----

.. Verify that the `kube-scheduler` containers are stopped by running:
+
[source,terminal]
----
$ sudo crictl ps | grep kube-scheduler | egrep -v "operator|guard"
----
+
If the output of this command is not empty, wait a few minutes and check again.

.. Move the `etcd` data directory to a different location, as in the following example:
+
[source,terminal]
----
$ sudo mv /var/lib/etcd/ /tmp
----
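+
Taken together, the manifest moves in the preceding substeps amount to the following illustrative loop for each non-recovery control plane host; it is a convenience sketch only, using the manifest paths documented above:
+
[source,terminal]
----
$ for f in etcd-pod kube-apiserver-pod kube-controller-manager-pod kube-scheduler-pod; do
    sudo mv "/etc/kubernetes/manifests/${f}.yaml" /tmp
  done
----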

. If the cluster-wide proxy is enabled, be sure that you have exported the `NO_PROXY`, `HTTP_PROXY`, and `HTTPS_PROXY` environment variables.
+
[TIP]
====
You can check whether the proxy is enabled by reviewing the output of `oc get proxy cluster -o yaml`. The proxy is enabled if the `httpProxy`, `httpsProxy`, and `noProxy` fields have values set.
====

. Run the restore script on the recovery control plane host and pass in the path to the `etcd` backup directory:
+
[source,terminal]
----
$ sudo -E /usr/local/bin/cluster-restore.sh /home/core/backup
----
+
.Example output
[source,terminal]
----
...
starting kube-scheduler-pod.yaml
static-pod-resources/kube-scheduler-pod-8/kube-scheduler-pod.yaml
----
+
The `cluster-restore.sh` script must show that `etcd`, `kube-apiserver`, `kube-controller-manager`, and `kube-scheduler` pods are stopped and then started at the end of the restore process.
+
[NOTE]
====
The restore process can cause nodes to enter the `NotReady` state if the node certificates were updated after the last `etcd` backup.
====

. Check the nodes to ensure that they are in the `Ready` state.

.. Run the following command:
+
[source,terminal]
----
$ oc get nodes -w
----

. Restart the kubelet service on all control plane hosts.

.. From the recovery host, run:
+
[source,terminal]
----
$ sudo systemctl restart kubelet.service
----

.. Repeat this step on all other control plane hosts.
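+
As an illustrative convenience for the repeat, assuming the SSH sessions from the first step are still open and the host names are placeholders:
+
[source,terminal]
----
$ for host in <control_plane_host_2> <control_plane_host_3>; do
    ssh core@"${host}" sudo systemctl restart kubelet.service
  done
----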

. Approve the pending certificate signing requests (CSRs):
+
[NOTE]
====
Clusters with no worker nodes, such as single-node clusters or clusters consisting of three schedulable control plane nodes, do not have any pending CSRs to approve. You can skip all the commands listed in this step.
====

.. Get the list of current CSRs by running:
+
[source,terminal]
----
$ oc get csr
----
+
.Example output
[source,terminal]
----
NAME        AGE    SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-2s94x   8m3s   kubernetes.io/kubelet-serving                 system:node:<node_name>                                                     Pending <1>
...
csr-4hl85   13m    kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending <2>
csr-zhhhp   3m8s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending <2>
...
----
<1> A pending kubelet service CSR (for user-provisioned installations).
<2> A pending `node-bootstrapper` CSR.

.. Review the details of a CSR to verify that it is valid by running:
+
[source,terminal]
----
$ oc describe csr <csr_name> <1>
----
<1> `<csr_name>` is the name of a CSR from the list of current CSRs.

.. Approve each valid `node-bootstrapper` CSR by running:
+
[source,terminal]
----
$ oc adm certificate approve <csr_name>
----

.. For user-provisioned installations, approve each valid kubelet service CSR by running:
+
[source,terminal]
----
$ oc adm certificate approve <csr_name>
----
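+
As an optional shortcut after you have reviewed the pending requests, you could approve them in bulk; the `go-template` filter, which selects CSRs that have no `status` set yet, is illustrative rather than part of the documented procedure:
+
[source,terminal]
----
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
----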

. Verify that the single member control plane has started successfully.

.. From the recovery host, verify that the `etcd` container is running by running:
+
[source,terminal]
----
$ sudo crictl ps | grep etcd | egrep -v "operator|etcd-guard"
----
+
.Example output
[source,terminal]
----
3ad41b7908e32 36f86e2eeaaffe662df0d21041eb22b8198e0e58abeeae8c743c3e6e977e8009 About a minute ago Running etcd 0 7c05f8af362f0
----

.. From the recovery host, verify that the `etcd` pod is running by running:
+
[source,terminal]
----
$ oc -n openshift-etcd get pods -l k8s-app=etcd
----
+
.Example output
[source,terminal]
----
NAME                                READY   STATUS    RESTARTS   AGE
etcd-ip-10-0-143-125.ec2.internal   1/1     Running   1          2m47s
----
+
If the status is `Pending`, or the output lists more than one running `etcd` pod, wait a few minutes and check again.

. If you are using the `OVNKubernetes` network plugin, you must restart the `ovnkube-control-plane` pods.

.. Delete all of the `ovnkube-control-plane` pods by running:
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-control-plane
----

.. Verify that all of the `ovnkube-control-plane` pods were redeployed by running:
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-control-plane
----
+
[NOTE]
====
Validating and mutating admission webhooks can reject pods. If you add any additional webhooks with the `failurePolicy` set to `Fail`, they can reject pods and the restore process can fail.

Alternatively, you can temporarily set the `failurePolicy` to `Ignore` while restoring the cluster state. After the cluster state is restored successfully, you can set the `failurePolicy` to `Fail`.
====
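+
If you choose the temporary `Ignore` route, a hedged sketch of the toggle might look like the following; the webhook configuration name and the webhook index are placeholders:
+
[source,terminal]
----
$ oc patch validatingwebhookconfiguration <webhook_name> --type=json \
  -p '[{"op": "replace", "path": "/webhooks/0/failurePolicy", "value": "Ignore"}]'
----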

.. Remove the northbound database (nbdb) and southbound database (sbdb). Access the recovery host and the remaining control plane nodes by using Secure Shell (SSH) and run:
+
[source,terminal]
----
$ sudo rm -f /var/lib/ovn-ic/etc/*.db
----

.. Delete the `ovnkube-node` pod for the node by running:
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-node --field-selector=spec.nodeName==<node_name>
----

.. Verify that the `ovnkube-node` pod is running again by running:
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node --field-selector=spec.nodeName==<node_name>
----
+
[NOTE]
====
It might take several minutes for the pods to restart.
====

. Delete and re-create other non-recovery, control plane machines, one by one. After the machines are re-created, a new revision is forced and `etcd` automatically scales up.
+
** If you use a user-provisioned bare metal installation, you can re-create a control plane machine by using the same method that you used to originally create it. For more information, see "Installing a user-provisioned cluster on bare metal".

.. Obtain the machine for one of the lost control plane hosts by running:
+
[source,terminal]
----
$ oc get machines -n openshift-machine-api -o wide
----
+
In the example output for this procedure, `clustername-8qw5l-master-0` is the control plane machine for the lost control plane host, `ip-10-0-131-183.ec2.internal`.

.. Save the machine configuration to a file on your file system by running:
+
[source,terminal]
----
$ oc get machine clustername-8qw5l-master-0 \ <1>
    -n openshift-machine-api \
    -o yaml \
    > new-master-machine.yaml
----
<1> Specify the name of the control plane machine for the lost control plane host.

.. Edit the `new-master-machine.yaml` file that was created in the previous step to assign a new name and remove unnecessary fields.

... Remove the entire `status` section:
+
[source,terminal]
----
status:
...
  kind: AWSMachineProviderStatus
----

... Change the `metadata.name` field to a new name.
+
It is recommended to keep the same base name as the old machine and change the ending number to the next available number. In this example, `clustername-8qw5l-master-0` is changed to `clustername-8qw5l-master-3`:
+
[source,terminal]
----
metadata:
  name: clustername-8qw5l-master-3
...
----

... Remove the `spec.providerID` field:
+
[source,terminal]
----
providerID: aws:///us-east-1a/i-0fdb85790d76d0c3f
----

... Remove the `metadata.annotations` and `metadata.generation` fields:
+
[source,terminal]
----
annotations:
...
generation: 2
----

... Remove the `metadata.resourceVersion` and `metadata.uid` fields:
+
[source,terminal]
----
resourceVersion: "13291"
uid: a282eb70-40a2-4e89-8009-d05dd420d31a
----
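+
If you have `yq` v4 available, the preceding edits could be collapsed into a single hedged one-liner; the tool is not part of the documented procedure and the new name is the example value:
+
[source,terminal]
----
$ yq eval -i 'del(.status) | del(.spec.providerID) | del(.metadata.annotations) | del(.metadata.generation) | del(.metadata.resourceVersion) | del(.metadata.uid) | .metadata.name = "clustername-8qw5l-master-3"' new-master-machine.yaml
----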

.. Delete the machine of the lost control plane host by running:
+
[source,terminal]
----
$ oc delete machine -n openshift-machine-api clustername-8qw5l-master-0 <1>
----
<1> Specify the name of the control plane machine for the lost control plane host.

.. Verify that the machine was deleted by running:
+
[source,terminal]
----
$ oc get machines -n openshift-machine-api -o wide
----
+
.Example output
[source,terminal]
----
NAME                                        PHASE     TYPE       REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
...
clustername-8qw5l-worker-us-east-1c-pkg26   Running   m4.large   us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
----

.. Create a machine by using the `new-master-machine.yaml` file:
+
[source,terminal]
----
$ oc apply -f new-master-machine.yaml
----

.. Verify that the new machine has been created by running:
+
[source,terminal]
----
$ oc get machines -n openshift-machine-api -o wide
----
+
.Example output
[source,terminal]
----
...
clustername-8qw5l-master-3   Provisioning   m4.xlarge   us-east-1   us-east-1a   ... <1>
...
----
<1> The new machine, `clustername-8qw5l-master-3`, is being created and is ready after the phase changes from `Provisioning` to `Running`.
+
It might take a few minutes for the new machine to be created. The `etcd` cluster Operator will automatically sync when the machine or node returns to a healthy state.

.. Repeat these steps for each lost control plane host that is not the recovery host.

. Turn off the quorum guard by entering:
+
[source,terminal]
----
$ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides": {"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}'
----
+
This command ensures that you can successfully re-create secrets and roll out the static pods.
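+
To confirm that the override took effect, one illustrative check, not part of the documented procedure, is:
+
[source,terminal]
----
$ oc get etcd/cluster -o jsonpath='{.spec.unsupportedConfigOverrides}'
----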

. In a separate terminal window within the recovery host, export the recovery `kubeconfig` file by running:
+
[source,terminal]
----
$ export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost-recovery.kubeconfig
----

. Force `etcd` redeployment.
+
In the same terminal window where you exported the recovery `kubeconfig` file, run:
+
[source,terminal]
----
$ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge <1>
----
<1> The `forceRedeploymentReason` value must be unique, which is why a timestamp is appended.
+
When the `etcd` cluster Operator performs a redeployment, the existing nodes are started with new pods similar to the initial bootstrap scale up.

. Turn the quorum guard back on by entering:
+
[source,terminal]
----
$ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides": null}}'
----

. You can verify that the `unsupportedConfigOverrides` section is removed from the object by running:
+
[source,terminal]
----
$ oc get etcd/cluster -oyaml
----

. Verify that all nodes are updated to the latest revision.
+
In a terminal that has access to the cluster as a `cluster-admin` user, run:
+
[source,terminal]
----
$ oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
----
+
Review the `NodeInstallerProgressing` status condition for `etcd` to verify that all nodes are at the latest revision. The output shows `AllNodesAtLatestRevision` upon successful update:
+
[source,terminal]
----
AllNodesAtLatestRevision
3 nodes are at revision 7 <1>
----
<1> In this example, the latest revision number is `7`.
+
If the output includes multiple revision numbers, such as `2 nodes are at revision 6; 1 nodes are at revision 7`, this means that the update is still in progress. Wait a few minutes and try again.

. After `etcd` is redeployed, force new rollouts for the control plane. `kube-apiserver` will reinstall itself on the other nodes because the kubelet is connected to API servers using an internal load balancer.
+
In a terminal that has access to the cluster as a `cluster-admin` user, run the following commands.

.. Force a new rollout for `kube-apiserver`:
+
[source,terminal]
----
$ oc patch kubeapiserver cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
----
+
Verify all nodes are updated to the latest revision by running:
+
[source,terminal]
----
$ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
----
+
If the output includes multiple revision numbers, such as `2 nodes are at revision 6; 1 nodes are at revision 7`, this means that the update is still in progress. Wait a few minutes and try again.

.. Force a new rollout for the Kubernetes controller manager by running:
+
[source,terminal]
----
$ oc patch kubecontrollermanager cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
----
+
Verify all nodes are updated to the latest revision by running:
+
[source,terminal]
----
$ oc get kubecontrollermanager -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
----
+
If the output includes multiple revision numbers, such as `2 nodes are at revision 6; 1 nodes are at revision 7`, this means that the update is still in progress. Wait a few minutes and try again.

.. Force a new rollout for the `kube-scheduler` by running:
+
[source,terminal]
----
$ oc patch kubescheduler cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
----
+
Verify all nodes are updated to the latest revision by running:
+
[source,terminal]
----
$ oc get kubescheduler -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
----
+
If the output includes multiple revision numbers, such as `2 nodes are at revision 6; 1 nodes are at revision 7`, this means that the update is still in progress. Wait a few minutes and try again.

. Verify that all control plane hosts have started and joined the cluster.
+
In a terminal that has access to the cluster as a `cluster-admin` user, run:
+
[source,terminal]
----
$ oc -n openshift-etcd get pods -l k8s-app=etcd
----
+
.Example output
[source,terminal]
----
...
etcd-ip-10-0-154-194.ec2.internal   2/2   Running   0   9h
etcd-ip-10-0-173-171.ec2.internal   2/2   Running   0   9h
----

To ensure that all workloads return to normal operation following a recovery procedure, restart each pod that stores `kube-apiserver` information. This includes {product-title} components such as routers, Operators, and third-party components.
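
As one hedged illustration of such a restart, deleting the default router pods causes the Ingress Operator to re-create them; adapt the namespace and scope to each affected component:

[source,terminal]
----
$ oc -n openshift-ingress delete pods --all
----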

[NOTE]
====
On completion of the previous procedural steps, you might need to wait a few minutes for all services to return to their restored state. For example, authentication by using `oc login` might not work immediately until the OAuth server pods are restarted.
====