mirror of
https://github.com/openshift/openshift-docs.git
synced 2026-02-05 12:46:18 +01:00
178 lines
5.8 KiB
Plaintext
178 lines
5.8 KiB
Plaintext
// Module included in the following assemblies:
|
|
//
|
|
// * nodes/nodes/nodes-nodes-replace-control-plane.adoc
|
|
|
|
:_mod-docs-content-type: PROCEDURE
|
|
[id="removing-etcd-member_{context}"]
|
|
= Removing the unhealthy etcd member
|
|
|
|
Begin removing the failed control plane node by first removing the unhealthy etcd member.
|
|
|
|
.Procedure
|
|
|
|
. List etcd pods by running the following command and make note of a pod that is not on the affected node:
|
|
+
|
|
[source,terminal]
|
|
----
|
|
$ oc -n openshift-etcd get pods -l k8s-app=etcd -o wide
|
|
----
|
|
+
|
|
.Example output
|
|
[source,terminal]
|
|
----
|
|
etcd-openshift-control-plane-0 5/5 Running 11 3h56m 192.168.10.9 openshift-control-plane-0 <none> <none>
|
|
etcd-openshift-control-plane-1 5/5 Running 0 3h54m 192.168.10.10 openshift-control-plane-1 <none> <none>
|
|
etcd-openshift-control-plane-2 5/5 Running 0 3h58m 192.168.10.11 openshift-control-plane-2 <none> <none>
|
|
----
|
|
|
|
. Connect to a running etcd container by running the following command:
|
|
+
|
|
[source,terminal]
|
|
----
|
|
$ oc rsh -n openshift-etcd <etcd_pod>
|
|
----
|
|
+
|
|
Replace `<etcd_pod>` with the name of an etcd pod associated with one of the healthy nodes.
|
|
+
|
|
.Example command
|
|
[source,terminal]
|
|
----
|
|
$ oc rsh -n openshift-etcd etcd-openshift-control-plane-0
|
|
----
|
|
|
|
. View the etcd member list by running the following command. Make note of the ID and the name of the unhealthy etcd member because these values are required later.
|
|
+
|
|
[source,terminal]
|
|
----
|
|
sh-4.2# etcdctl member list -w table
|
|
----
|
|
+
|
|
.Example output
|
|
[source,terminal]
|
|
----
|
|
+------------------+---------+------------------------------+---------------------------+---------------------------+
|
|
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
|
|
+------------------+---------+------------------------------+---------------------------+---------------------------+
|
|
| 6fc1e7c9db35841d | started | openshift-control-plane-2 | https://10.0.131.183:2380 | https://10.0.131.183:2379 |
|
|
| 757b6793e2408b6c | started | openshift-control-plane-1 | https://10.0.164.97:2380 | https://10.0.164.97:2379 |
|
|
| ca8c2990a0aa29d1 | started | openshift-control-plane-0 | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
|
|
+------------------+---------+------------------------------+---------------------------+---------------------------+
|
|
----
|
|
+
|
|
[IMPORTANT]
|
|
====
|
|
The `etcdctl endpoint health` command will list the removed member until the replacement is complete and the new member is added.
|
|
====
|
|
|
|
. Remove the unhealthy etcd member by running the following command:
|
|
+
|
|
[source,terminal]
|
|
----
|
|
sh-4.2# etcdctl member remove <unhealthy_member_id>
|
|
----
|
|
+
|
|
Replace `<unhealthy_member_id>` with the ID of the etcd member on the unhealthy node.
|
|
+
|
|
.Example command
|
|
[source,terminal]
|
|
----
|
|
sh-4.2# etcdctl member remove 6fc1e7c9db35841d
|
|
----
|
|
+
|
|
.Example output
|
|
[source,terminal]
|
|
----
|
|
Member 6fc1e7c9db35841d removed from cluster b23536c33f2cdd1b
|
|
----
|
|
|
|
. View the member list again by running the following command and verify that the member was removed:
|
|
+
|
|
[source,terminal]
|
|
----
|
|
sh-4.2# etcdctl member list -w table
|
|
----
|
|
+
|
|
.Example output
|
|
[source,terminal]
|
|
----
|
|
+------------------+---------+------------------------------+---------------------------+---------------------------+
|
|
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
|
|
+------------------+---------+------------------------------+---------------------------+---------------------------+
|
|
| 757b6793e2408b6c | started | openshift-control-plane-1 | https://10.0.164.97:2380 | https://10.0.164.97:2379 |
|
|
| ca8c2990a0aa29d1 | started | openshift-control-plane-0 | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
|
|
+------------------+---------+------------------------------+---------------------------+---------------------------+
|
|
----
|
|
+
|
|
[IMPORTANT]
|
|
====
|
|
After you remove the member, the cluster might be unreachable for a short time while the remaining etcd instances reboot.
|
|
====
|
|
|
|
. Exit the rsh session into the etcd pod by running the following command:
|
|
+
|
|
[source,terminal]
|
|
----
|
|
sh-4.2# exit
|
|
----
|
|
|
|
. Turn off the etcd quorum guard by running the following command:
|
|
+
|
|
[source,terminal]
|
|
----
|
|
$ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides": {"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}'
|
|
----
|
|
+
|
|
This command ensures that you can successfully re-create secrets and roll out the static pods.
|
|
|
|
. List the secrets for the removed, unhealthy etcd member by running the following command:
|
|
+
|
|
[source,terminal]
|
|
----
|
|
$ oc get secrets -n openshift-etcd | grep <node_name>
|
|
----
|
|
+
|
|
Replace `<node_name>` with the name of the failed node whose etcd member you removed.
|
|
+
|
|
.Example command
|
|
[source,terminal]
|
|
----
|
|
$ oc get secrets -n openshift-etcd | grep openshift-control-plane-2
|
|
----
|
|
+
|
|
.Example output
|
|
[source,terminal]
|
|
----
|
|
etcd-peer-openshift-control-plane-2 kubernetes.io/tls 2 134m
|
|
etcd-serving-metrics-openshift-control-plane-2 kubernetes.io/tls 2 134m
|
|
etcd-serving-openshift-control-plane-2 kubernetes.io/tls 2 134m
|
|
----
|
|
|
|
. Delete the secrets associated with the affected node that was removed:
|
|
|
|
.. Delete the peer secret by running the following command:
|
|
+
|
|
[source,terminal]
|
|
----
|
|
$ oc delete secret -n openshift-etcd etcd-peer-<node_name>
|
|
----
|
|
+
|
|
Replace `<node_name>` with the name of the affected node.
|
|
|
|
.. Delete the serving secret by running the following command:
|
|
+
|
|
[source,terminal]
|
|
----
|
|
$ oc delete secret -n openshift-etcd etcd-serving-<node_name>
|
|
----
|
|
+
|
|
Replace `<node_name>` with the name of the affected node.
|
|
|
|
.. Delete the metrics secret by running the following command:
|
|
+
|
|
[source,terminal]
|
|
----
|
|
$ oc delete secret -n openshift-etcd etcd-serving-metrics-<node_name> <1>
|
|
----
|
|
+
|
|
Replace `<node_name>` with the name of the affected node.
|