openshift-docs/modules/rosa-how-upgrades-work.adoc


// Module included in the following assemblies:
//
// upgrading/rosa-upgrading-sts.adoc


:_mod-docs-content-type: CONCEPT
[id="rosa-how-upgrades-work_{context}"]
= How ROSA (classic architecture) cluster upgrades work

Upgrades are manually initiated (one-time) or automatically scheduled (recurring). Red{nbsp}Hat Site Reliability Engineers (SREs) monitor upgrade progress and either proactively notify you to take corrective actions or remedy issues encountered.

The Cluster Version Operator (CVO) is the primary component that orchestrates and facilitates the OpenShift Container Platform update process.

The Managed Upgrade Operator (MUO) handles the scheduling, monitoring, and notifications of ROSA (classic architecture) cluster upgrades. The MUO orchestrates the automated in-place cluster upgrades by ensuring the operating conditions are met before and after the upgrade of a managed cluster.

[id="rosa-upgrade-scheduled-time_{context}"]
== Cluster upgrade scheduled time

You can schedule cluster upgrades by setting the scheduled time. This is when the preparation for the cluster upgrade begins with pre-upgrade health checks and additional compute capacity creation. The actual cluster upgrade starts within one hour from the scheduled time. You receive an email notification when the cluster upgrade starts.

The Pre-Health Check (PHC) provides extra protection to ensure the scheduled update proceeds as expected and runs in the following two scenarios:

* If an upgrade is scheduled for more than 2 hours from the current time, the PHC runs and the user is notified if there are any failures. This PHC is in the _New_ phase of the upgrade.
* When the upgrade is immediate or within 2 hours, the PHC runs just before the upgrade begins. This PHC is in the _Upgrading_ phase of the upgrade.
This means that PHC is always run at least one time during the upgrading phase but can also be run additionally in advance if the upgrade is scheduled for more than 2 hours from the current time.

You can observe the status of the cluster upgrade by running the `rosa describe upgrade --cluster=<cluster name|cluster_id>` command in the ROSA CLI (`rosa`).

[id="rosa-cluster-upgrade-overview_{context}"]
== ROSA (classic architecture) upgrade overview

The following are the high-level steps that occur during the ROSA (classic architecture) cluster update:

. Scheduling the upgrade in advance triggers the `PreHealthCheck` and notifies users of any failures that they can then address before the upgrade begins.
. Before the cluster upgrade begins, a cluster health check is performed by the MUO. If the MUO identifies an issue that requires corrective action, you will be notified. Some examples of the cluster health checks that MUO performs include:
** Identifying any Pod Disruption Budgets (PDBs) that can potentially block or delay the update of the nodes.
** Ensuring cluster Operators are available and healthy.
** Ensuring cluster critical alerts are not firing.
. A temporary compute node is created in each availability zone (AZ) in the cluster to allow for the scheduling of drained pods during the update.
+
[NOTE]
====
The temporary compute node creation does not always happen. For example, if there is no `worker` machine pool, the temporary compute node will not be created. This might happen when a cluster admin deletes the existing `worker` machine pool and creates another `worker` machine pool with a different name or instance type.
====
. The cluster version is set to the target version.
+
[NOTE]
====
In certain situations, an upgrade path can become unavailable since the time the cluster update was requested but before it was completed. In such cases, the upgrade is automatically canceled and a notification is sent. You must pick another target version to request the upgrade.
====
+
. During the upgrade, the control plane components are updated to the new version.
. Next, individual cluster Operators perform update tasks on their domain of the cluster.
. Finally, the MCO updates the system configuration and operating system of every node. During this step, each node is rebooted after successfully draining the workloads running on the node.
.. During the update of each node, workloads are drained, honoring the PDBs. Workloads with PDBs that do not allow disruptions essentially block the draining of the node, increasing the elapsed time for the cluster update.
.. During the update of every node in the cluster, the cluster update waits until the time specified by the _node drain grace period_ to allow for safely draining the workloads. Upon reaching the node drain grace period, the node is forcibly drained to allow for cluster upgrade to progress. You can only configure the node drain grace period before initiating the upgrade and you cannot change it after the cluster upgrade begins.
.. When cluster nodes are updated, the MCO selects one node at a time per machine config pool according to their age, starting with the oldest.