mirror of https://github.com/openshift/openshift-docs.git synced 2026-02-05 12:46:18 +01:00

updates for OCPBUGS-18111

This commit is contained in:
aireilly
2023-08-25 10:56:22 +01:00
committed by openshift-cherrypick-robot
parent 960b450e67
commit 62d34ad170
4 changed files with 31 additions and 31 deletions

@@ -0,0 +1,23 @@
// Module included in the following assemblies:
//
// * nodes/nodes/nodes-nodes-working.adoc
:_content-type: CONCEPT
[id="sno-clusters-reboot-without-drain_{context}"]
= Handling errors in {sno} clusters when the node reboots without draining application pods
In {sno} clusters, and in {product-title} clusters in general, a node reboot can occur without the node first being drained. When this happens, application pods that request devices can fail with the `UnexpectedAdmissionError` error. `Deployment`, `ReplicaSet`, or `DaemonSet` errors are reported because the application pods that require those devices start before the pod that serves those devices. You cannot control the order of pod restarts.
While this behavior is expected, it can cause a pod to remain on the cluster even though it has failed to deploy successfully. The pod continues to report `UnexpectedAdmissionError`. This issue is mitigated by the fact that application pods are typically included in a `Deployment`, `ReplicaSet`, or `DaemonSet`. If a pod is in this error state, it is of little concern because another instance should be running. Belonging to a `Deployment`, `ReplicaSet`, or `DaemonSet` guarantees that subsequent pods are created and run successfully, and ensures the successful deployment of the application.
There is ongoing work upstream to ensure that such pods are gracefully terminated. Until that work is resolved, run the following command for a {sno} cluster to remove the failed pods:
[source,terminal,subs="+quotes"]
----
$ oc delete pods --field-selector status.phase=Failed -n _<POD_NAMESPACE>_
----
[NOTE]
====
The option to drain the node is unavailable for {sno} clusters.
====
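For example, before removing anything, you can list the failed pods to confirm which pods report the error. This sketch assumes the same `status.phase=Failed` field selector that the delete command uses; `_<POD_NAMESPACE>_` is a placeholder for your application namespace:

[source,terminal,subs="+quotes"]
----
$ oc get pods --field-selector status.phase=Failed -n _<POD_NAMESPACE>_
----

The output lists only pods whose status phase is `Failed`, so the subsequent `oc delete pods` command with the same selector removes exactly the pods shown.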

@@ -1,23 +0,0 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/ztp_far_edge/ztp-reference-cluster-configuration-for-vdu.adoc
:_content-type: CONCEPT
[id="ztp-sno-node-reboot-scenarios_{context}"]
= SNO node reboot scenario
On a {sno-caps} cluster and in {product-title} clusters generally a situation might arise in case a node reboot occurs without node drain where an application pod requesting devices fails with the `UnexpectedAdmissionError` error. deployment, replicaset, or daemonset error is reported due to the fact that application pods requiring devices can start before the pod serving those devices, as there is no way to control the order of pod restarts.
While this behavior is expected, it can lead to a pod remaining on the cluster even when it has failed to deploy successfully and will continue to report `UnexpectedAdmissionError`. The presence of this issue is mitigated since application pods are typically included in a deployment, replicaset, or daemonset. Having a pod in this state is of little concern as another instance should be running. Being part of a deployment, replicaset, or daemonset guarantees the successful creation and execution of subsequent pods and ensures the successful deployment of the application.
There is ongoing work upstream to ensure that such pods are gracefully terminated. Until that is resolved run the following command in a {sno-caps} deployment to remove the failed pods:
[source,terminal]
----
$ kubectl delete pods --field-selector status.phase=Failed -n <POD_NAMESPACE>
----
[NOTE]
====
The option to drain the node is unavailable in a {sno-caps} deployment.
====

@@ -6,20 +6,26 @@ include::_attributes/common-attributes.adoc[]
toc::[]
As an administrator, you can perform a number of tasks to make your clusters more efficient.
As an administrator, you can perform several tasks to make your clusters more efficient.
// The following include statements pull in the module files that comprise
// the assembly. Include any combination of concept, procedure, or reference
// modules required to cover the user story. You can also include other
// assemblies.
include::modules/nodes-nodes-working-evacuating.adoc[leveloffset=+1]
include::modules/nodes-nodes-working-updating.adoc[leveloffset=+1]
include::modules/nodes-nodes-working-marking.adoc[leveloffset=+1]
include::modules/sno-clusters-reboot-without-drain.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../nodes/nodes/nodes-nodes-working.adoc#nodes-nodes-working-evacuating_nodes-nodes-working[Understanding how to evacuate pods on nodes]
== Deleting nodes
include::modules/nodes-nodes-working-deleting.adoc[leveloffset=+2]
@@ -31,4 +37,3 @@ include::modules/nodes-nodes-working-deleting.adoc[leveloffset=+2]
see xref:../../machine_management/manually-scaling-machineset.adoc#machineset-manually-scaling-manually-scaling-machineset[Manually scaling a MachineSet].
include::modules/nodes-nodes-working-deleting-bare-metal.adoc[leveloffset=+2]

@@ -86,12 +86,7 @@ include::modules/ztp-sno-du-configuring-lvms.adoc[leveloffset=+2]
include::modules/ztp-sno-du-disabling-network-diagnostics.adoc[leveloffset=+2]
include::modules/ztp-sno-node-reboot-scenarios.adoc[leveloffset=+2]
[role="_additional-resources"]
.Additional resources
* xref:../../nodes/nodes/nodes-nodes-working.adoc#nodes-nodes-working-evacuating_nodes-nodes-working[Understanding how to evacuate pods on nodes]
* xref:../../scalability_and_performance/ztp_far_edge/ztp-deploying-far-edge-sites.adoc#ztp-deploying-far-edge-sites[Deploying far edge sites using ZTP]