diff --git a/backup_and_restore/replacing-unhealthy-etcd-member.adoc b/backup_and_restore/replacing-unhealthy-etcd-member.adoc
index 0b5dcc8a33..9546604450 100644
--- a/backup_and_restore/replacing-unhealthy-etcd-member.adoc
+++ b/backup_and_restore/replacing-unhealthy-etcd-member.adoc
@@ -7,7 +7,7 @@ toc::[]
 
 This document describes the process to replace a single unhealthy etcd member.
 
-This process depends on whether the etcd member is unhealthy because the machine is not running or the node is not ready, or whether it is unhealthy because the etcd Pod is crashlooping.
+This process depends on whether the etcd member is unhealthy because the machine is not running or the node is not ready, or whether it is unhealthy because the etcd pod is crashlooping.
 
 [NOTE]
 ====
@@ -33,10 +33,10 @@ include::modules/restore-determine-state-etcd-member.adoc[leveloffset=+1]
 Depending on the state of your unhealthy etcd member, use one of the following procedures:
 
 * xref:../backup_and_restore/replacing-unhealthy-etcd-member.adoc#restore-replace-stopped-etcd-member_replacing-unhealthy-etcd-member[Replacing an unhealthy etcd member whose machine is not running or whose node is not ready]
-* xref:../backup_and_restore/replacing-unhealthy-etcd-member.adoc#restore-replace-crashlooping-etcd-member_replacing-unhealthy-etcd-member[Replacing an unhealthy etcd member whose etcd Pod is crashlooping]
+* xref:../backup_and_restore/replacing-unhealthy-etcd-member.adoc#restore-replace-crashlooping-etcd-member_replacing-unhealthy-etcd-member[Replacing an unhealthy etcd member whose etcd pod is crashlooping]
 
 // Replacing an unhealthy etcd member whose machine is not running or whose node is not ready
 include::modules/restore-replace-stopped-etcd-member.adoc[leveloffset=+2]
 
-// Replacing an unhealthy etcd member whose etcd Pod is crashlooping
+// Replacing an unhealthy etcd member whose etcd pod is crashlooping
 include::modules/restore-replace-crashlooping-etcd-member.adoc[leveloffset=+2]
diff --git a/modules/dr-restoring-cluster-state.adoc b/modules/dr-restoring-cluster-state.adoc
index 142424ab73..097758cd2e 100644
--- a/modules/dr-restoring-cluster-state.adoc
+++ b/modules/dr-restoring-cluster-state.adoc
@@ -41,7 +41,7 @@ It is not required to manually stop the pods on the recovery host. The recovery
 
 .. Access a control plane host that is not the recovery host.
 
-.. Move the existing etcd Pod file out of the kubelet manifest directory:
+.. Move the existing etcd pod file out of the kubelet manifest directory:
 +
 [source,terminal]
 ----
@@ -57,7 +57,7 @@ It is not required to manually stop the pods on the recovery host. The recovery
 +
 The output of this command should be empty. If it is not empty, wait a few minutes and check again.
 
-.. Move the existing Kubernetes API server Pod file out of the kubelet manifest directory:
+.. Move the existing Kubernetes API server pod file out of the kubelet manifest directory:
 +
 [source,terminal]
 ----
@@ -154,7 +154,7 @@ static-pod-resources/kube-scheduler-pod-8/kube-scheduler-pod.yaml
 3ad41b7908e32 36f86e2eeaaffe662df0d21041eb22b8198e0e58abeeae8c743c3e6e977e8009 About a minute ago Running etcd 0 7c05f8af362f0
 ----
 
-.. From the recovery host, verify that the etcd Pod is running.
+.. From the recovery host, verify that the etcd pod is running.
 +
 [source,terminal]
 ----
diff --git a/modules/graceful-restart.adoc b/modules/graceful-restart.adoc
index b8c6415bba..339c505f93 100644
--- a/modules/graceful-restart.adoc
+++ b/modules/graceful-restart.adoc
@@ -39,7 +39,7 @@ ip-10-0-170-223.ec2.internal Ready master 75m v1.19.0
 ip-10-0-211-16.ec2.internal Ready master 75m v1.19.0
 ----
-. If the master nodes are _not_ ready, then check whether there are any pending certificate signing requests that must be approved.
+. If the master nodes are _not_ ready, then check whether there are any pending certificate signing requests (CSRs) that must be approved.
 .. Get the list of current CSRs:
 +
 [source,terminal]
 ----
@@ -80,7 +80,7 @@ ip-10-0-182-134.ec2.internal Ready worker 64m v1.19.0
 ip-10-0-250-100.ec2.internal Ready worker 64m v1.19.0
 ----
-. If the worker nodes are _not_ ready, then check whether there are any pending certificate signing requests that must be approved.
+. If the worker nodes are _not_ ready, then check whether there are any pending certificate signing requests (CSRs) that must be approved.
 .. Get the list of current CSRs:
 +
 [source,terminal]
 ----
@@ -129,7 +129,7 @@ etcd 4.6.0 True False F
 ...
 ----
 
-.. Check that all nodes are in the ready state:
+.. Check that all nodes are in the `Ready` state:
 +
 [source,terminal]
 ----
diff --git a/modules/restore-determine-state-etcd-member.adoc b/modules/restore-determine-state-etcd-member.adoc
index 7aff463f0b..8773402843 100644
--- a/modules/restore-determine-state-etcd-member.adoc
+++ b/modules/restore-determine-state-etcd-member.adoc
@@ -8,7 +8,7 @@
 The steps to replace an unhealthy etcd member depend on which of the following states your etcd member is in:
 
 * The machine is not running or the node is not ready
-* The etcd Pod is crashlooping
+* The etcd pod is crashlooping
 
 This procedure determines which state your etcd member is in. This enables you to know which procedure to follow to replace the unhealthy etcd member.
 
@@ -79,9 +79,9 @@ ip-10-0-131-183.ec2.internal NotReady master 122m v1.19.0 <1>
 <1> If the *node is not ready*, then follow the _Replacing an unhealthy etcd member whose machine is not running or whose node is not ready_ procedure.
 
-. Determine if the *etcd Pod is crashlooping*.
+. Determine if the *etcd pod is crashlooping*.
 +
-If the machine is running and the node is ready, then check whether the etcd Pod is crashlooping.
+If the machine is running and the node is ready, then check whether the etcd pod is crashlooping.
 .. Verify that all master nodes are listed as `Ready`:
 +
 [source,terminal]
 ----
@@ -99,7 +99,7 @@ ip-10-0-164-97.ec2.internal Ready master 6h13m v1.19.0
 ip-10-0-154-204.ec2.internal Ready master 6h13m v1.19.0
 ----
 
-.. Check whether the status of an etcd Pod is either `Error` or `CrashloopBackoff`:
+.. Check whether the status of an etcd pod is either `Error` or `CrashLoopBackOff`:
 +
 [source,terminal]
 ----
@@ -113,8 +113,8 @@ etcd-ip-10-0-131-183.ec2.internal 2/3 Error 7
 etcd-ip-10-0-164-97.ec2.internal 3/3 Running 0 6h6m
 etcd-ip-10-0-154-204.ec2.internal 3/3 Running 0 6h6m
 ----
-<1> Since this status of this Pod is `Error`, then the *etcd Pod is crashlooping*.
+<1> Since the status of this pod is `Error`, the *etcd pod is crashlooping*.
 +
 // TODO: xref
-If the *etcd Pod is crashlooping*, then follow the _Replacing an unhealthy etcd member whose etcd Pod is crashlooping_ procedure.
+If the *etcd pod is crashlooping*, then follow the _Replacing an unhealthy etcd member whose etcd pod is crashlooping_ procedure.
 
diff --git a/modules/restore-replace-crashlooping-etcd-member.adoc b/modules/restore-replace-crashlooping-etcd-member.adoc
index 4760885116..000ee322ee 100644
--- a/modules/restore-replace-crashlooping-etcd-member.adoc
+++ b/modules/restore-replace-crashlooping-etcd-member.adoc
@@ -3,14 +3,14 @@
 // * backup_and_restore/replacing-unhealthy-etcd-member.adoc
 
 [id="restore-replace-crashlooping-etcd-member_{context}"]
-= Replacing an unhealthy etcd member whose etcd Pod is crashlooping
+= Replacing an unhealthy etcd member whose etcd pod is crashlooping
 
-This procedure details the steps to replace an etcd member that is unhealthy because the etcd Pod is crashlooping.
+This procedure details the steps to replace an etcd member that is unhealthy because the etcd pod is crashlooping.
 
 .Prerequisites
 
 * You have identified the unhealthy etcd member.
-* You have verified that the etcd Pod is crashlooping.
+* You have verified that the etcd pod is crashlooping.
 * You have access to the cluster as a user with the `cluster-admin` role.
 * You have taken an etcd backup.
 +
@@ -40,7 +40,7 @@ $ oc debug node/ip-10-0-131-183.ec2.internal <1>
 sh-4.2# chroot /host
 ----
 
-.. Move the existing etcd Pod file out of the kubelet manifest directory:
+.. Move the existing etcd pod file out of the kubelet manifest directory:
 +
 [source,terminal]
 ----
@@ -63,7 +63,7 @@ You can now exit the node shell.
 
 . Remove the unhealthy member.
 
-.. Choose a Pod that is _not_ on the affected node.
+.. Choose a pod that is _not_ on the affected node.
 +
 In a terminal that has access to the cluster as a `cluster-admin` user, run the following command:
 +
@@ -80,7 +80,7 @@ etcd-ip-10-0-164-97.ec2.internal 3/3 Running 0
 etcd-ip-10-0-154-204.ec2.internal 3/3 Running 0 6h6m
 ----
 
-.. Connect to the running etcd container, passing in the name of a Pod that is not on the affected node.
+.. Connect to the running etcd container, passing in the name of a pod that is not on the affected node.
 +
 In a terminal that has access to the cluster as a `cluster-admin` user, run the following command:
 +
diff --git a/modules/restore-replace-stopped-etcd-member.adoc b/modules/restore-replace-stopped-etcd-member.adoc
index 3bf9df2ed2..5f3ffc1455 100644
--- a/modules/restore-replace-stopped-etcd-member.adoc
+++ b/modules/restore-replace-stopped-etcd-member.adoc
@@ -23,7 +23,7 @@ It is important to take an etcd backup before performing this procedure so that
 
 . Remove the unhealthy member.
 
-.. Choose a Pod that is _not_ on the affected node:
+.. Choose a pod that is _not_ on the affected node:
 +
 In a terminal that has access to the cluster as a `cluster-admin` user, run the following command:
 +
@@ -40,7 +40,7 @@ etcd-ip-10-0-164-97.ec2.internal 3/3 Running 0
 etcd-ip-10-0-154-204.ec2.internal 3/3 Running 0 124m
 ----
 
-.. Connect to the running etcd container, passing in the name of a Pod that is not on the affected node:
+.. Connect to the running etcd container, passing in the name of a pod that is not on the affected node:
 +
 In a terminal that has access to the cluster as a `cluster-admin` user, run the following command:
 +
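
For context on the member-removal flow that the two replacement modules above describe, the following is a minimal sketch of connecting to a healthy etcd pod and removing the unhealthy member. It assumes the etcd pods run in the `openshift-etcd` namespace and carry the `k8s-app=etcd` label; the pod name and member ID are illustrative examples, not values taken from this change.

[source,terminal]
----
# List the etcd pods and choose one that is not on the affected node (label is an assumption).
$ oc -n openshift-etcd get pods -l k8s-app=etcd

# Open a shell in a healthy etcd pod (example pod name).
$ oc rsh -n openshift-etcd etcd-ip-10-0-164-97.ec2.internal

# List the members and note the ID of the unhealthy one.
sh-4.2# etcdctl member list -w table

# Remove the unhealthy member by its ID (example ID).
sh-4.2# etcdctl member remove 62bcf33650a7170a
----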