
BZ:1994596 - Adding Clearing CRI-O storage section

Kelly Brown
2021-09-17 10:34:31 -04:00
committed by openshift-cherrypick-robot
parent 7525a1f5df
commit 3f6c31ea95
2 changed files with 126 additions and 0 deletions

@@ -0,0 +1,123 @@
[id="cleaning-crio-storage"]
= Cleaning CRI-O storage
You can manually clear the CRI-O ephemeral storage if you experience the following issues:
* A node cannot run any pods and this error appears:
+
[source,terminal]
----
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to mount container XXX: error recreating the missing symlinks: error reading name of symlink for XXX: open /var/lib/containers/storage/overlay/XXX/link: no such file or directory
----

* You cannot create a new container on a working node and the “can't stat lower layer” error appears:
+
[source,terminal]
----
can't stat lower layer ... because it does not exist. Going through storage to recreate the missing symlinks.
----

* Your node is in the `NotReady` state after a cluster upgrade or if you attempt to reboot it.
* The container runtime implementation (`crio`) is not working properly.
* You are unable to start a debug shell on the node using `oc debug node/<nodename>` because the container runtime instance (`crio`) is not working.
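
If the kubelet on the node is still responding, one way to confirm these symptoms is to review the CRI-O journald unit logs for the node, for example:

[source,terminal]
----
$ oc adm node-logs <nodename> -u crio
----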

Follow this process to completely wipe the CRI-O storage and resolve the errors.

.Prerequisites
* You have access to the cluster as a user with the `cluster-admin` role.
* You have installed the OpenShift CLI (`oc`).

.Procedure
. Cordon the node to prevent workloads from being scheduled on it if it returns to the `Ready` status. Scheduling is disabled when `SchedulingDisabled` appears in the node's status:
+
[source,terminal]
----
$ oc adm cordon <nodename>
----
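+
For example, you can confirm that the cordon took effect by checking the node status:
+
[source,terminal]
----
$ oc get node <nodename>
----
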
. Drain the node as the cluster-admin user:
+
[source,terminal]
----
$ oc adm drain <nodename> --ignore-daemonsets --delete-local-data
----
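+
As a sanity check, you can list the pods still running on the node; after a successful drain, only DaemonSet-managed and static pods should remain:
+
[source,terminal]
----
$ oc get pods --all-namespaces --field-selector spec.nodeName=<nodename>
----
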
. When the node returns, connect to it again over SSH or through the console, and then switch to the root user:
+
[source,terminal]
----
$ ssh core@node1.example.com
$ sudo -i
----

. Manually stop the kubelet:
+
[source,terminal]
----
# systemctl stop kubelet
----

. Stop and remove all pods and their containers:
+
[source,terminal]
----
# crictl rmp -fa
----
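+
You can verify that nothing is left behind; both of the following commands should print only their header lines:
+
[source,terminal]
----
# crictl pods
# crictl ps -a
----
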
. Manually stop the `crio` service:
+
[source,terminal]
----
# systemctl stop crio
----
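+
At this point, both services should report as stopped; `systemctl is-active` prints one state per unit:
+
[source,terminal]
----
# systemctl is-active kubelet crio
----
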
. After you run those commands, you can completely wipe the ephemeral storage:
+
[source,terminal]
----
# crio wipe -f
----
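+
If you want to inspect the result, and assuming the default CRI-O storage root of `/var/lib/containers/storage`, you can list that directory to confirm that the container layers were removed:
+
[source,terminal]
----
# ls /var/lib/containers/storage
----
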
. Start the `crio` and `kubelet` services:
+
[source,terminal]
----
# systemctl start crio
# systemctl start kubelet
----
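+
As the kubelet syncs with the cluster, you should see CRI-O begin recreating pod sandboxes on the node:
+
[source,terminal]
----
# crictl pods
----
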
. You will know that the cleanup worked if the `crio` and `kubelet` services are started and the node is in the `Ready` status:
+
[source,terminal]
----
$ oc get nodes
----
+
.Example output
[source,terminal]
----
NAME                                 STATUS                     ROLES    AGE    VERSION
ci-ln-tkbxyft-f76d1-nvwhr-master-1   Ready,SchedulingDisabled   master   133m   v1.22.0-rc.0+75ee307
----

. Mark the node schedulable. Scheduling is enabled when `SchedulingDisabled` is no longer in the node's status:
+
[source,terminal]
----
$ oc adm uncordon <nodename>
----
+
.Example output
[source,terminal]
----
NAME                                 STATUS   ROLES    AGE    VERSION
ci-ln-tkbxyft-f76d1-nvwhr-master-1   Ready    master   133m   v1.22.0-rc.0+75ee307
----

@@ -13,3 +13,6 @@ include::modules/verifying-crio-status.adoc[leveloffset=+1]
// Gathering CRI-O journald unit logs
include::modules/gathering-crio-logs.adoc[leveloffset=+1]
// Cleaning CRI-O storage
include::modules/cleaning-crio-storage.adoc[leveloffset=+1]