diff --git a/modules/deployments-rolling-strategy.adoc b/modules/deployments-rolling-strategy.adoc
index 3105608436..1befe7674a 100644
--- a/modules/deployments-rolling-strategy.adoc
+++ b/modules/deployments-rolling-strategy.adoc
@@ -60,3 +60,9 @@ These parameters allow the deployment to be tuned for availability and speed. Fo
 Generally, if you want fast rollouts, use `maxSurge`. If you have to take into
 account resource quota and can accept partial unavailability, use
 `maxUnavailable`.
+
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
+
diff --git a/modules/nodes-pods-pod-disruption-about.adoc b/modules/nodes-pods-pod-disruption-about.adoc
index d6ad9ff184..cbe01f5c13 100644
--- a/modules/nodes-pods-pod-disruption-about.adoc
+++ b/modules/nodes-pods-pod-disruption-about.adoc
@@ -33,6 +33,11 @@ A `maxUnavailable` of `0%` or `0` or a `minAvailable` of `100%` or equal to the
 is permitted but can block nodes from being drained.
 ====
 
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
+
 You can check for pod disruption budgets across all projects with the following:
 
 [source,terminal]
diff --git a/modules/update-best-practices.adoc b/modules/update-best-practices.adoc
index 489a3634f4..1cdb20f874 100644
--- a/modules/update-best-practices.adoc
+++ b/modules/update-best-practices.adoc
@@ -48,6 +48,11 @@ Additionally, if compute nodes do not have enough spare capacity, workloads migh
 
 Make sure that you have enough available nodes in each worker pool, as well as enough spare capacity on your compute nodes, to increase the chance of successful node updates.
 
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
+
 [id="pod-disruption-budget_{context}"]
 == Ensure that the cluster's PodDisruptionBudget is properly configured
 
@@ -60,4 +65,4 @@ When planning a cluster update, check the configuration of the `PodDisruptionBud
 
 * For highly available workloads, make sure there are replicas that can be temporarily taken offline without being prohibited by the `PodDisruptionBudget`.
 
-* For workloads that aren't highly available, make sure they are either not protected by a `PodDisruptionBudget` or have some alternative mechanism for draining these workloads eventually, such as periodic restart or guaranteed eventual termination.
\ No newline at end of file
+* For workloads that aren't highly available, make sure they are either not protected by a `PodDisruptionBudget` or have some alternative mechanism for draining these workloads eventually, such as periodic restart or guaranteed eventual termination.
diff --git a/modules/update-duration-estimate-cluster-update-time.adoc b/modules/update-duration-estimate-cluster-update-time.adoc
index 905ef532ab..4f3d3dff55 100644
--- a/modules/update-duration-estimate-cluster-update-time.adoc
+++ b/modules/update-duration-estimate-cluster-update-time.adoc
@@ -14,6 +14,11 @@ Cluster update time = CVO target update payload deployment time + (# node update
 
 A node update iteration consists of one or more nodes updated in parallel. The control plane nodes are always updated in parallel with the compute nodes. In addition, one or more compute nodes can be updated in parallel based on the `maxUnavailable` value.
 
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
+
 For example, to estimate the update time, consider an {product-title} cluster with three control plane nodes and six compute nodes and each host takes about 5 minutes to reboot.
 
 [NOTE]
diff --git a/modules/update-duration-factors.adoc b/modules/update-duration-factors.adoc
index 1a80066506..a427eeb9cf 100644
--- a/modules/update-duration-factors.adoc
+++ b/modules/update-duration-factors.adoc
@@ -10,6 +10,11 @@ The following factors can affect your cluster update duration:
 
 * The reboot of compute nodes to the new machine configuration by Machine Config Operator (MCO)
 ** The value of `MaxUnavailable` in the machine config pool
++
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
 ** The minimum number or percentages of replicas set in pod disruption budget (PDB)
 * The number of nodes in the cluster
 * The health of the cluster nodes
diff --git a/modules/update-mco-process.adoc b/modules/update-mco-process.adoc
index 0cafe64f6b..2b8f2c1559 100644
--- a/modules/update-mco-process.adoc
+++ b/modules/update-mco-process.adoc
@@ -7,6 +7,11 @@
 = Understanding how the Machine Config Operator updates nodes
 The Machine Config Operator (MCO) applies a new machine configuration to each control plane node and compute node. During the machine configuration update, control plane nodes and compute nodes are organized into their own machine config pools, where the pools of machines are updated in parallel. The `.spec.maxUnavailable` parameter, which has a default value of `1`, determines how many nodes in a machine config pool can simultaneously undergo the update process.
 
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
+
 When the machine configuration update process begins, the MCO checks the amount of currently unavailable nodes in a pool. If there are fewer unavailable nodes than the value of `.spec.maxUnavailable`, the MCO initiates the following sequence of actions on available nodes in the pool:
 
 . Cordon and drain the node
diff --git a/modules/update-service-overview.adoc b/modules/update-service-overview.adoc
index dc65906cd3..184c57f7dd 100644
--- a/modules/update-service-overview.adoc
+++ b/modules/update-service-overview.adoc
@@ -35,6 +35,11 @@ Only updating to a newer version is supported. Reverting or rolling back your cl
 
 During the update process, the Machine Config Operator (MCO) applies the new configuration to your cluster machines. The MCO cordons the number of nodes specified by the `maxUnavailable` field on the machine configuration pool and marks them unavailable. By default, this value is set to `1`. The MCO updates the affected nodes alphabetically by zone, based on the `topology.kubernetes.io/zone` label. If a zone has more than one node, the oldest nodes are updated first. For nodes that do not use zones, such as in bare metal deployments, the nodes are updated by age, with the oldest nodes updated first. The MCO updates the number of nodes as specified by the `maxUnavailable` field on the machine configuration pool at a time. The MCO then applies the new configuration and reboots the machine.
 
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
+
 If you use {op-system-base-full} machines as workers, the MCO does not update the kubelet because you must update the OpenShift API on the machines first. With the specification for the new version applied to the old kubelet, the {op-system-base} machine cannot return to the `Ready` state. You cannot complete the update until the machines are available. However, the maximum number of unavailable nodes is set to ensure that normal cluster operations can continue with that number of machines out of service.
diff --git a/modules/update-using-custom-machine-config-pools-about.adoc b/modules/update-using-custom-machine-config-pools-about.adoc
index 82d2aa1e8e..e4e8605fa3 100644
--- a/modules/update-using-custom-machine-config-pools-about.adoc
+++ b/modules/update-using-custom-machine-config-pools-about.adoc
@@ -14,6 +14,11 @@ The following steps outline the high-level workflow of the canary rollout update
 ====
 You can change the `maxUnavailable` setting in an MCP to specify the percentage or the number of machines that can be updating at any given time. The default is `1`.
 ====
++
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
 
 . Add a node selector to the custom MCPs. For each node that you do not want to update simultaneously with the rest of the cluster, add a matching label to the nodes. This label associates the node to the MCP.
 +
diff --git a/updating/updating_a_cluster/update-using-custom-machine-config-pools.adoc b/updating/updating_a_cluster/update-using-custom-machine-config-pools.adoc
index b7b995f252..6098f7688b 100644
--- a/updating/updating_a_cluster/update-using-custom-machine-config-pools.adoc
+++ b/updating/updating_a_cluster/update-using-custom-machine-config-pools.adoc
@@ -102,6 +102,11 @@ Because the MCO does not update nodes within paused MCPs, you can pause the MCPs
 
 Using one or more custom MCPs can give you more control over the sequence in which you update your worker nodes. For example, after you update the nodes in the first MCP, you can verify the application compatibility and then update the rest of the nodes gradually to the new version.
 
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
+
 [NOTE]
 ====
 To ensure the stability of the control plane, creating a custom MCP from the control plane nodes is not supported. The Machine Config Operator (MCO) ignores any custom MCP created for the control plane nodes.