mirror of https://github.com/openshift/openshift-docs.git synced 2026-02-05 12:46:18 +01:00
openshift-docs/modules/nodes-remove-unhealthy-etcd-member.adoc
2025-12-01 22:03:12 +00:00


// Module included in the following assemblies:
//
// * nodes/nodes/nodes-nodes-replace-control-plane.adoc
:_mod-docs-content-type: PROCEDURE
[id="removing-etcd-member_{context}"]
= Removing the unhealthy etcd member

Begin removing the failed control plane node by first removing the unhealthy etcd member.

.Procedure
. List etcd pods by running the following command and make note of a pod that is not on the affected node:
+
[source,terminal]
----
$ oc -n openshift-etcd get pods -l k8s-app=etcd -o wide
----
+
.Example output
[source,terminal]
----
etcd-openshift-control-plane-0   5/5   Running   11   3h56m   192.168.10.9    openshift-control-plane-0   <none>   <none>
etcd-openshift-control-plane-1   5/5   Running   0    3h54m   192.168.10.10   openshift-control-plane-1   <none>   <none>
etcd-openshift-control-plane-2   5/5   Running   0    3h58m   192.168.10.11   openshift-control-plane-2   <none>   <none>
----
. Connect to a running etcd container by running the following command:
+
[source,terminal]
----
$ oc rsh -n openshift-etcd <etcd_pod>
----
+
Replace `<etcd_pod>` with the name of an etcd pod associated with one of the healthy nodes.
+
.Example command
[source,terminal]
----
$ oc rsh -n openshift-etcd etcd-openshift-control-plane-0
----
. View the etcd member list by running the following command. Make note of the ID and the name of the unhealthy etcd member because these values are required later.
+
[source,terminal]
----
sh-4.2# etcdctl member list -w table
----
+
.Example output
[source,terminal]
----
+------------------+---------+------------------------------+---------------------------+---------------------------+
|        ID        | STATUS  |             NAME             |        PEER ADDRS         |       CLIENT ADDRS        |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| 6fc1e7c9db35841d | started | openshift-control-plane-2    | https://10.0.131.183:2380 | https://10.0.131.183:2379 |
| 757b6793e2408b6c | started | openshift-control-plane-1    | https://10.0.164.97:2380  | https://10.0.164.97:2379  |
| ca8c2990a0aa29d1 | started | openshift-control-plane-0    | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+
----
+
[IMPORTANT]
====
The `etcdctl endpoint health` command continues to list the removed member until the replacement procedure is complete and a new member is added.
====
. Remove the unhealthy etcd member by running the following command:
+
[source,terminal]
----
sh-4.2# etcdctl member remove <unhealthy_member_id>
----
+
Replace `<unhealthy_member_id>` with the ID of the etcd member on the unhealthy node.
+
.Example command
[source,terminal]
----
sh-4.2# etcdctl member remove 6fc1e7c9db35841d
----
+
.Example output
[source,terminal]
----
Member 6fc1e7c9db35841d removed from cluster b23536c33f2cdd1b
----
. View the member list again by running the following command and verify that the member was removed:
+
[source,terminal]
----
sh-4.2# etcdctl member list -w table
----
+
.Example output
[source,terminal]
----
+------------------+---------+------------------------------+---------------------------+---------------------------+
|        ID        | STATUS  |             NAME             |        PEER ADDRS         |       CLIENT ADDRS        |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| 757b6793e2408b6c | started | openshift-control-plane-1    | https://10.0.164.97:2380  | https://10.0.164.97:2379  |
| ca8c2990a0aa29d1 | started | openshift-control-plane-0    | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+
----
+
[IMPORTANT]
====
After you remove the member, the cluster might be unreachable for a short time while the remaining etcd instances restart.
====
. Exit the `rsh` session in the etcd pod by running the following command:
+
[source,terminal]
----
sh-4.2# exit
----
. Turn off the etcd quorum guard by running the following command:
+
[source,terminal]
----
$ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides": {"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}'
----
+
This command ensures that you can successfully re-create secrets and roll out the static pods.
. List the secrets for the removed, unhealthy etcd member by running the following command:
+
[source,terminal]
----
$ oc get secrets -n openshift-etcd | grep <node_name>
----
+
Replace `<node_name>` with the name of the failed node whose etcd member you removed.
+
.Example command
[source,terminal]
----
$ oc get secrets -n openshift-etcd | grep openshift-control-plane-2
----
+
.Example output
[source,terminal]
----
etcd-peer-openshift-control-plane-2              kubernetes.io/tls   2   134m
etcd-serving-metrics-openshift-control-plane-2   kubernetes.io/tls   2   134m
etcd-serving-openshift-control-plane-2           kubernetes.io/tls   2   134m
----
. Delete the secrets associated with the affected node that was removed:
.. Delete the peer secret by running the following command:
+
[source,terminal]
----
$ oc delete secret -n openshift-etcd etcd-peer-<node_name>
----
+
Replace `<node_name>` with the name of the affected node.
.. Delete the serving secret by running the following command:
+
[source,terminal]
----
$ oc delete secret -n openshift-etcd etcd-serving-<node_name>
----
+
Replace `<node_name>` with the name of the affected node.
.. Delete the metrics secret by running the following command:
+
[source,terminal]
----
$ oc delete secret -n openshift-etcd etcd-serving-metrics-<node_name>
----
+
Replace `<node_name>` with the name of the affected node.
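
Two steps in this procedure involve copying values by hand: reading the member ID from the `etcdctl member list -w table` output, and building the three secret names for the affected node. The following is a minimal sketch of two shell helpers for those steps. The function names `member_id_for_node` and `etcd_secret_names` are hypothetical and are not part of `oc` or `etcdctl`. Neither function contacts the cluster; they only transform text.

```shell
# Hypothetical helpers (assumed names, not part of oc or etcdctl).
# They only process text and never talk to the cluster.

# Read `etcdctl member list -w table` output on stdin and print the ID
# of the member whose NAME column equals $1. Prints nothing if no row matches.
member_id_for_node() {
  awk -F'|' -v node="$1" '
    NF >= 4 {                # skip the +----+ border lines, which contain no "|"
      id = $2; name = $4     # field 1 is the empty text before the first "|"
      gsub(/ /, "", id)      # strip column padding
      gsub(/ /, "", name)
      if (name == node) print id
    }'
}

# Print the three per-node secret names that the procedure deletes,
# in the same order as the steps: peer, serving, serving-metrics.
etcd_secret_names() {
  for prefix in etcd-peer etcd-serving etcd-serving-metrics; do
    printf '%s-%s\n' "$prefix" "$1"
  done
}
```

As one possible use, you could save the table output to a file and run `member_id_for_node <node_name> < members.txt` to confirm the ID before you remove the member. Similarly, `etcd_secret_names <node_name>` prints the names to pass to `oc delete secret -n openshift-etcd`. Review the printed values before you run any destructive command.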