OCPBUGS-24188: add warning against setting control plane MCP maxUnavailable to 3

2026-02-05 21:46:22 +01:00 · 2024-02-27 14:03:04 -06:00
parent 6af57f7c2c
commit 8bb489f9e9
9 changed files with 47 additions and 1 deletions
--- a/modules/deployments-rolling-strategy.adoc
+++ b/modules/deployments-rolling-strategy.adoc
@@ -60,3 +60,9 @@ These parameters allow the deployment to be tuned for availability and speed. Fo

 Generally, if you want fast rollouts, use `maxSurge`. If you have to take into account resource quota and can accept partial unavailability, use
 `maxUnavailable`.
+
+[WARNING]
+====
+The default value of `maxUnavailable` is `1` for all machine config pools in {product-title}. Keep this default and update one control plane node at a time. Do not set this value to `3` for the control plane pool.
+====
+
--- a/modules/nodes-pods-pod-disruption-about.adoc
+++ b/modules/nodes-pods-pod-disruption-about.adoc
@@ -33,6 +33,11 @@ A `maxUnavailable` of `0%` or `0` or a `minAvailable` of `100%` or equal to the
 is permitted but can block nodes from being drained.
 ====

+[WARNING]
+====
+The default value of `maxUnavailable` is `1` for all machine config pools in {product-title}. Keep this default and update one control plane node at a time. Do not set this value to `3` for the control plane pool.
+====
+
 You can check for pod disruption budgets across all projects with the following:

 [source,terminal]
--- a/modules/update-best-practices.adoc
+++ b/modules/update-best-practices.adoc
@@ -48,6 +48,11 @@ Additionally, if compute nodes do not have enough spare capacity, workloads migh

 Make sure that you have enough available nodes in each worker pool, as well as enough spare capacity on your compute nodes, to increase the chance of successful node updates.

+[WARNING]
+====
+The default value of `maxUnavailable` is `1` for all machine config pools in {product-title}. Keep this default and update one control plane node at a time. Do not set this value to `3` for the control plane pool.
+====
+
 [id="pod-disruption-budget_{context}"]
 == Ensure that the cluster's PodDisruptionBudget is properly configured

@@ -60,4 +65,4 @@ When planning a cluster update, check the configuration of the `PodDisruptionBud

 * For highly available workloads, make sure there are replicas that can be temporarily taken offline without being prohibited by the `PodDisruptionBudget`.

-* For workloads that aren't highly available, make sure they are either not protected by a `PodDisruptionBudget` or have some alternative mechanism for draining these workloads eventually, such as periodic restart or guaranteed eventual termination.
+* For workloads that aren't highly available, make sure they are either not protected by a `PodDisruptionBudget` or have some alternative mechanism for draining these workloads eventually, such as periodic restart or guaranteed eventual termination.
--- a/modules/update-duration-estimate-cluster-update-time.adoc
+++ b/modules/update-duration-estimate-cluster-update-time.adoc
@@ -14,6 +14,11 @@ Cluster update time = CVO target update payload deployment time + (# node update

 A node update iteration consists of one or more nodes updated in parallel. The control plane nodes are always updated in parallel with the compute nodes. In addition, one or more compute nodes can be updated in parallel based on the `maxUnavailable` value.

+[WARNING]
+====
+The default value of `maxUnavailable` is `1` for all machine config pools in {product-title}. Keep this default and update one control plane node at a time. Do not set this value to `3` for the control plane pool.
+====
+
 For example, to estimate the update time, consider an {product-title} cluster with three control plane nodes and six compute nodes and each host takes about 5 minutes to reboot.

 [NOTE]
--- a/modules/update-duration-factors.adoc
+++ b/modules/update-duration-factors.adoc
@@ -10,6 +10,11 @@ The following factors can affect your cluster update duration:

 * The reboot of compute nodes to the new machine configuration by Machine Config Operator (MCO)
 ** The value of `MaxUnavailable` in the machine config pool
++
+[WARNING]
+====
+The default value of `maxUnavailable` is `1` for all machine config pools in {product-title}. Keep this default and update one control plane node at a time. Do not set this value to `3` for the control plane pool.
+====
 ** The minimum number or percentages of replicas set in pod disruption budget (PDB)
 * The number of nodes in the cluster
 * The health of the cluster nodes
--- a/modules/update-mco-process.adoc
+++ b/modules/update-mco-process.adoc
@@ -7,6 +7,11 @@
 = Understanding how the Machine Config Operator updates nodes
 The Machine Config Operator (MCO) applies a new machine configuration to each control plane node and compute node. During the machine configuration update, control plane nodes and compute nodes are organized into their own machine config pools, where the pools of machines are updated in parallel. The `.spec.maxUnavailable` parameter, which has a default value of `1`, determines how many nodes in a machine config pool can simultaneously undergo the update process.

+[WARNING]
+====
+The default value of `maxUnavailable` is `1` for all machine config pools in {product-title}. Keep this default and update one control plane node at a time. Do not set this value to `3` for the control plane pool.
+====
+
 When the machine configuration update process begins, the MCO checks the amount of currently unavailable nodes in a pool. If there are fewer unavailable nodes than the value of `.spec.maxUnavailable`, the MCO initiates the following sequence of actions on available nodes in the pool:

 . Cordon and drain the node
--- a/modules/update-service-overview.adoc
+++ b/modules/update-service-overview.adoc
@@ -35,6 +35,11 @@ Only updating to a newer version is supported. Reverting or rolling back your cl

 During the update process, the Machine Config Operator (MCO) applies the new configuration to your cluster machines. The MCO cordons the number of nodes specified by the `maxUnavailable` field on the machine configuration pool and marks them unavailable. By default, this value is set to `1`. The MCO updates the affected nodes alphabetically by zone, based on the `topology.kubernetes.io/zone` label. If a zone has more than one node, the oldest nodes are updated first. For nodes that do not use zones, such as in bare metal deployments, the nodes are updated by age, with the oldest nodes updated first. The MCO updates the number of nodes as specified by the `maxUnavailable` field on the machine configuration pool at a time. The MCO then applies the new configuration and reboots the machine.

+[WARNING]
+====
+The default value of `maxUnavailable` is `1` for all machine config pools in {product-title}. Keep this default and update one control plane node at a time. Do not set this value to `3` for the control plane pool.
+====
+
 If you use {op-system-base-full} machines as workers, the MCO does not update the kubelet because you must update the OpenShift API on the machines first.

 With the specification for the new version applied to the old kubelet, the {op-system-base} machine cannot return to the `Ready` state. You cannot complete the update until the machines are available. However, the maximum number of unavailable nodes is set to ensure that normal cluster operations can continue with that number of machines out of service.
--- a/modules/update-using-custom-machine-config-pools-about.adoc
+++ b/modules/update-using-custom-machine-config-pools-about.adoc
@@ -14,6 +14,11 @@ The following steps outline the high-level workflow of the canary rollout update
 ====
 You can change the `maxUnavailable` setting in an MCP to specify the percentage or the number of machines that can be updating at any given time. The default is `1`.
 ====
++
+[WARNING]
+====
+The default value of `maxUnavailable` is `1` for all machine config pools in {product-title}. Keep this default and update one control plane node at a time. Do not set this value to `3` for the control plane pool.
+====

 . Add a node selector to the custom MCPs. For each node that you do not want to update simultaneously with the rest of the cluster, add a matching label to the nodes. This label associates the node to the MCP.
 +
--- a/updating/updating_a_cluster/update-using-custom-machine-config-pools.adoc
+++ b/updating/updating_a_cluster/update-using-custom-machine-config-pools.adoc
@@ -102,6 +102,11 @@ Because the MCO does not update nodes within paused MCPs, you can pause the MCPs
 Using one or more custom MCPs can give you more control over the sequence in which you update your worker nodes.
 For example, after you update the nodes in the first MCP, you can verify the application compatibility and then update the rest of the nodes gradually to the new version.

+[WARNING]
+====
+The default value of `maxUnavailable` is `1` for all machine config pools in {product-title}. Keep this default and update one control plane node at a time. Do not set this value to `3` for the control plane pool.
+====
+
 [NOTE]
 ====
 To ensure the stability of the control plane, creating a custom MCP from the control plane nodes is not supported. The Machine Config Operator (MCO) ignores any custom MCP created for the control plane nodes.