
// Module included in the following assemblies:
//
// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc

:_mod-docs-content-type: REFERENCE
[id="telco-core-topology-aware-lifecycle-manager_{context}"]
= Topology Aware Lifecycle Manager

New in this release::
* No reference design updates in this release.

Description::
{cgu-operator} is an Operator that runs only on the hub cluster.
{cgu-operator} manages how changes, including cluster upgrades, Operator upgrades, and configuration changes, are rolled out to managed clusters in the network.
{cgu-operator} has the following core features:
* Provides sequenced updates of cluster configurations and upgrades ({product-title} and Operators) as defined by cluster policies.
* Supports deferred application of cluster updates.
* Supports progressive rollout of policy updates to sets of clusters in user-configurable batches.
* Allows for per-cluster actions by adding `ztp-done` or similar user-defined labels to clusters.
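+
As an illustration of these features, the following is a minimal sketch of a `ClusterGroupUpgrade` CR, not a reference configuration. The CR name, namespace, cluster names, policy names, and numeric values are hypothetical placeholders. The sketch assumes that policies are remediated in the order listed under `managedPolicies`, that `enable: false` defers the rollout, and that `maxConcurrency` sets the batch size.
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-example              # hypothetical name
  namespace: default             # hypothetical namespace on the hub cluster
spec:
  clusters:                      # hypothetical managed cluster names
  - core-cluster-1
  - core-cluster-2
  - core-cluster-3
  managedPolicies:               # hypothetical policy names, assumed to be applied in list order
  - example-config-policy
  - example-operator-policy
  enable: false                  # deferred: remediation starts only after this is set to true
  remediationStrategy:
    maxConcurrency: 2            # number of clusters remediated in each batch (example value)
    timeout: 240                 # overall timeout in minutes (example value)
  actions:
    afterCompletion:
      addClusterLabels:
        ztp-done: ""             # label added to each cluster when remediation completes
----
+
In this sketch, setting `enable` to `true` at a later time starts the rollout, which proceeds two clusters at a time.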

Limits and requirements::
* Supports concurrent cluster deployments in batches of 400.

Engineering considerations::
* Only policies with the `ran.openshift.io/ztp-deploy-wave` annotation are applied by {cgu-operator} during initial cluster installation.
* Any policy can be remediated by {cgu-operator} under the control of a user-created `ClusterGroupUpgrade` CR.
* Set the `paused` field of the `MachineConfigPool` (`mcp`) CR to `true` during a cluster upgrade maintenance window, and set the `maxUnavailable` field to the maximum tolerable value.
This prevents multiple cluster node reboots during the upgrade, which shortens the overall upgrade time.
When you unpause the `mcp` CR, all the configuration changes are applied with a single reboot.
+
[NOTE]
====
During installation, you can pause custom `mcp` CRs and set `maxUnavailable` to 100% to improve installation times.
====

* You can orchestrate an upgrade, including {product-title}, day-2 OLM Operators, and custom configuration, by using a `ClusterGroupUpgrade` (CGU) CR that contains policies describing these updates.
** You can orchestrate an EUS-to-EUS upgrade by using chained CGU CRs.
** You can manage pausing of the `mcp` CR through policy in the CGU CRs for a full control plane and worker node rollout of upgrades.
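+
The considerations above can be sketched as the following fragments, which are illustrative only and not a validated configuration. All CR names, the namespace, cluster names, policy names, and the `maxUnavailable` value are hypothetical. The `MachineConfigPool` fragment shows only the fields discussed here and represents the state that a policy would enforce on the managed cluster, whereas the CGU CRs are created on the hub cluster. Chaining is assumed to be expressed with the `blockingCRs` field, so that the second CGU CR does not start until the first one completes.
+
[source,yaml]
----
# Managed cluster: fragment of the pool state enforced during the maintenance window.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
spec:
  paused: true                   # hold back node updates until the pool is unpaused
  maxUnavailable: 2              # hypothetical value; use the maximum tolerable for the cluster
---
# Hub cluster: first step of the chained upgrade.
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-upgrade-step-1       # hypothetical name
  namespace: default             # hypothetical namespace
spec:
  clusters:
  - core-cluster-1               # hypothetical managed cluster name
  managedPolicies:
  - example-upgrade-step-1-policy   # hypothetical policy name
  enable: true
  remediationStrategy:
    maxConcurrency: 1
---
# Hub cluster: second step, blocked until the first CGU CR completes.
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-upgrade-step-2       # hypothetical name
  namespace: default
spec:
  clusters:
  - core-cluster-1
  managedPolicies:
  - example-upgrade-step-2-policy   # hypothetical policy name
  blockingCRs:
  - name: cgu-upgrade-step-1
    namespace: default
  enable: true
  remediationStrategy:
    maxConcurrency: 1
----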