From 62d34ad170ec9184ecadd5737ffe85418149bb82 Mon Sep 17 00:00:00 2001
From: aireilly
Date: Fri, 25 Aug 2023 10:56:22 +0100
Subject: [PATCH] updates for OCPBUGS-18111

---
 .../sno-clusters-reboot-without-drain.adoc    | 23 +++++++++++++++++++
 modules/ztp-sno-node-reboot-scenarios.adoc    | 23 -------------------
 nodes/nodes/nodes-nodes-working.adoc          | 11 ++++++---
 ...ference-cluster-configuration-for-vdu.adoc |  5 ----
 4 files changed, 31 insertions(+), 31 deletions(-)
 create mode 100644 modules/sno-clusters-reboot-without-drain.adoc
 delete mode 100644 modules/ztp-sno-node-reboot-scenarios.adoc

diff --git a/modules/sno-clusters-reboot-without-drain.adoc b/modules/sno-clusters-reboot-without-drain.adoc
new file mode 100644
index 0000000000..d8993e215b
--- /dev/null
+++ b/modules/sno-clusters-reboot-without-drain.adoc
@@ -0,0 +1,23 @@
+// Module included in the following assemblies:
+//
+// * nodes/nodes/nodes-nodes-working.adoc
+
+:_content-type: CONCEPT
+[id="sno-clusters-reboot-without-drain_{context}"]
+= Handling errors in {sno} clusters when the node reboots without draining application pods
+
+In {sno} clusters, and in {product-title} clusters in general, a node reboot can occur without the node first being drained. When this happens, application pods that request devices can fail with the `UnexpectedAdmissionError` error. `Deployment`, `ReplicaSet`, or `DaemonSet` errors are reported because the application pods that require those devices start before the pod that serves those devices. You cannot control the order of pod restarts.
+
+While this behavior is expected, it can cause a pod to remain on the cluster even though it has failed to deploy successfully. The pod continues to report `UnexpectedAdmissionError`. This issue is mitigated by the fact that application pods are typically part of a `Deployment`, `ReplicaSet`, or `DaemonSet`. If a pod is in this error state, it is of little concern because another instance should be running.
Belonging to a `Deployment`, `ReplicaSet`, or `DaemonSet` ensures that subsequent pods are created and run, so the application deploys successfully.
+
+There is ongoing work upstream to ensure that such pods are gracefully terminated. Until that work is resolved, run the following command for a {sno} cluster to remove the failed pods:
+
+[source,terminal,subs="+quotes"]
+----
+$ oc delete pods --field-selector status.phase=Failed -n __<namespace>__
+----
+
+[NOTE]
+====
+The option to drain the node is unavailable for {sno} clusters.
+====
diff --git a/modules/ztp-sno-node-reboot-scenarios.adoc b/modules/ztp-sno-node-reboot-scenarios.adoc
deleted file mode 100644
index d570224209..0000000000
--- a/modules/ztp-sno-node-reboot-scenarios.adoc
+++ /dev/null
@@ -1,23 +0,0 @@
-// Module included in the following assemblies:
-//
-// * scalability_and_performance/ztp_far_edge/ztp-reference-cluster-configuration-for-vdu.adoc
-
-:_content-type: CONCEPT
-[id="ztp-sno-node-reboot-scenarios_{context}"]
-= SNO node reboot scenario
-
-On a {sno-caps} cluster and in {product-title} clusters generally a situation might arise in case a node reboot occurs without node drain where an application pod requesting devices fails with the `UnexpectedAdmissionError` error. deployment, replicaset, or daemonset error is reported due to the fact that application pods requiring devices can start before the pod serving those devices, as there is no way to control the order of pod restarts.
-
-While this behavior is expected, it can lead to a pod remaining on the cluster even when it has failed to deploy successfully and will continue to report `UnexpectedAdmissionError`. The presence of this issue is mitigated since application pods are typically included in a deployment, replicaset, or daemonset. Having a pod in this state is of little concern as another instance should be running.
Being part of a deployment, replicaset, or daemonset guarantees the successful creation and execution of subsequent pods and ensures the successful deployment of the application.
-
-There is ongoing work upstream to ensure that such pods are gracefully terminated. Until that is resolved run the following command in a {sno-caps} deployment to remove the failed pods:
-
-[source,terminal]
-----
-$ kubectl delete pods --field-selector status.phase=Failed -n
-----
-
-[NOTE]
-====
-The option to drain the node is unavailable in a {sno-caps} deployment.
-====
\ No newline at end of file
diff --git a/nodes/nodes/nodes-nodes-working.adoc b/nodes/nodes/nodes-nodes-working.adoc
index aa822e0d68..04636d24ed 100644
--- a/nodes/nodes/nodes-nodes-working.adoc
+++ b/nodes/nodes/nodes-nodes-working.adoc
@@ -6,20 +6,26 @@
 include::_attributes/common-attributes.adoc[]
 
 toc::[]
 
-As an administrator, you can perform a number of tasks to make your clusters more efficient.
+As an administrator, you can perform several tasks to make your clusters more efficient.
 
 // The following include statements pull in the module files that comprise
 // the assembly. Include any combination of concept, procedure, or reference
 // modules required to cover the user story. You can also include other
 // assemblies.
-
 include::modules/nodes-nodes-working-evacuating.adoc[leveloffset=+1]
 
 include::modules/nodes-nodes-working-updating.adoc[leveloffset=+1]
 
 include::modules/nodes-nodes-working-marking.adoc[leveloffset=+1]
 
+include::modules/sno-clusters-reboot-without-drain.adoc[leveloffset=+1]
+
+[role="_additional-resources"]
+.Additional resources
+
+* xref:../../nodes/nodes/nodes-nodes-working.adoc#nodes-nodes-working-evacuating_nodes-nodes-working[Understanding how to evacuate pods on nodes]
+
 == Deleting nodes
 
 include::modules/nodes-nodes-working-deleting.adoc[leveloffset=+2]
@@ -31,4 +37,3 @@
 see xref:../../machine_management/manually-scaling-machineset.adoc#machineset-manually-scaling-manually-scaling-machineset[Manually scaling a MachineSet].
 
 include::modules/nodes-nodes-working-deleting-bare-metal.adoc[leveloffset=+2]
-
diff --git a/scalability_and_performance/ztp_far_edge/ztp-reference-cluster-configuration-for-vdu.adoc b/scalability_and_performance/ztp_far_edge/ztp-reference-cluster-configuration-for-vdu.adoc
index 515adba9cf..1b2f6199f5 100644
--- a/scalability_and_performance/ztp_far_edge/ztp-reference-cluster-configuration-for-vdu.adoc
+++ b/scalability_and_performance/ztp_far_edge/ztp-reference-cluster-configuration-for-vdu.adoc
@@ -86,12 +86,7 @@
 include::modules/ztp-sno-du-configuring-lvms.adoc[leveloffset=+2]
 
 include::modules/ztp-sno-du-disabling-network-diagnostics.adoc[leveloffset=+2]
 
-include::modules/ztp-sno-node-reboot-scenarios.adoc[leveloffset=+2]
-
 [role="_additional-resources"]
 .Additional resources
-* xref:../../nodes/nodes/nodes-nodes-working.adoc#nodes-nodes-working-evacuating_nodes-nodes-working[Understanding how to evacuate pods on nodes
-]
-
 * xref:../../scalability_and_performance/ztp_far_edge/ztp-deploying-far-edge-sites.adoc#ztp-deploying-far-edge-sites[Deploying far edge sites using ZTP]
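Reviewer note on the cleanup command in the new module: `--field-selector status.phase=Failed` matches pods by their reported phase, server side. The shell sketch below illustrates only that selection logic against a made-up `oc get pods` listing; the pod names and the listing format are invented for the example, and the documented `oc delete pods` command itself is unchanged.

```shell
# Illustration only: which pods 'oc delete pods --field-selector status.phase=Failed'
# would select. The listing below is a fabricated sample, not real 'oc get pods' output.
listing='NAME         READY   STATUS    RESTARTS
app-6b9f7    1/1     Running   0
app-device   0/1     Failed    4
app-worker   0/1     Failed    2'

# Keep only pods whose STATUS column is Failed, mirroring the field selector.
failed_pods=$(printf '%s\n' "$listing" | awk 'NR > 1 && $3 == "Failed" { print $1 }')

# These are the pods the delete command would remove.
printf '%s\n' "$failed_pods"
```

In the real command, the API server performs this filtering via the field selector, so no client-side parsing is needed.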