mirror of https://github.com/openshift/openshift-docs.git synced 2026-02-05 12:46:18 +01:00

TELCODOCS-2171-lifecycle-mgmt-lcavalle: Generalize Day2Ops Lifecycle management

TELCODOCS-2171-lifecycle-mgmt-lcavalle: fixing some vale errors
This commit is contained in:
Cavalle
2026-01-13 02:03:37 +01:00
committed by openshift-cherrypick-robot
parent c62170fc98
commit 64e845d54d
44 changed files with 455 additions and 417 deletions

View File

@@ -3529,31 +3529,31 @@ Topics:
File: ibi-edge-image-based-install
- Name: Deploying single-node OpenShift using the installation program
File: ibi-edge-image-based-install-standalone
- Name: Day 2 operations for telco core CNF clusters
- Name: Day 2 operations for OpenShift Container Platform clusters
Dir: day_2_core_cnf_clusters
Distros: openshift-origin,openshift-enterprise
Topics:
- Name: Day 2 operations for telco core CNF clusters
- Name: Day 2 operations for OpenShift Container Platform clusters
File: telco-day-2-welcome
- Name: Upgrading telco core CNF clusters
- Name: Upgrading OpenShift Container Platform clusters
Dir: updating
Topics:
- Name: Upgrading telco core CNF clusters
File: telco-update-welcome
- Name: Upgrading OpenShift Container Platform clusters
File: update-welcome
- Name: OpenShift Container Platform API compatibility
File: telco-update-api
File: update-api
- Name: Preparing for the cluster update
File: telco-update-ocp-update-prep
- Name: Managing live CNF pods during the cluster update
File: telco-update-cnf-update-prep
File: update-ocp-update-prep
- Name: Managing application pods during the cluster update
File: update-cnf-update-prep
- Name: Before you update the cluster
File: telco-update-before-the-update
File: update-before-the-update
- Name: Completing the Control Plane Only update
File: telco-update-completing-the-control-plane-only-update
File: update-completing-the-control-plane-only-update
- Name: Completing the y-stream update
File: telco-update-completing-the-y-stream-update
File: update-completing-the-y-stream-update
- Name: Completing the z-stream update
File: telco-update-completing-the-z-stream-update
File: update-completing-the-z-stream-update
- Name: Troubleshooting and maintaining OpenShift Container Platform clusters
Dir: troubleshooting
Topics:

View File

@@ -6,7 +6,8 @@ include::_attributes/common-attributes.adoc[]
toc::[]
Security is a critical component of {product-title} deployments, particularly when running cloud-native applications.
[role="_abstract"]
Security is a critical component of {product-title} deployments, particularly when running cloud-native applications.
You can enhance security for high-bandwidth network deployments by following key security considerations. By implementing these standards and best practices, you can strengthen security in most use cases.
@@ -49,4 +50,4 @@ include::modules/security-lifecycle-mgmnt.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-welcome.adoc#[Upgrading telco core CNF clusters]
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/update-welcome.adoc#update-welcome[Upgrading an OpenShift cluster]

View File

@@ -1,21 +1,22 @@
:_mod-docs-content-type: ASSEMBLY
[id="telco-day-2-welcome"]
= Day 2 operations for telco core CNF clusters
= Day 2 operations for {product-title} clusters
include::_attributes/common-attributes.adoc[]
:context: telco-day-2-welcome
toc::[]
You can use the following Day 2 operations to manage telco core CNF clusters.
[role="_abstract"]
You can use the following Day 2 operations to manage {product-title} clusters.
Updating a telco core CNF cluster:: Updating your cluster is a critical task that ensures that bugs and potential security vulnerabilities are patched.
For more information, see xref:../day_2_core_cnf_clusters/updating/telco-update-welcome.adoc#telco-update-welcome[Updating a telco core CNF cluster].
Updating an {product-title} cluster:: Updating your cluster is a critical task that ensures that bugs and potential security vulnerabilities are patched.
For more information, see xref:../day_2_core_cnf_clusters/updating/update-welcome.adoc#update-welcome[Updating an {product-title} cluster].
Troubleshooting and maintaining telco core CNF clusters:: To maintain and troubleshoot a bare-metal environment where high-bandwidth network throughput is required, see xref:../day_2_core_cnf_clusters/troubleshooting/troubleshooting-intro.adoc#troubleshooting-intro[Troubleshooting and maintaining {product-title} clusters].
Troubleshooting and maintaining {product-title} clusters:: To maintain and troubleshoot a bare-metal environment where high-bandwidth network throughput is required, see xref:../day_2_core_cnf_clusters/troubleshooting/troubleshooting-intro.adoc#troubleshooting-intro[Troubleshooting and maintaining {product-title} clusters].
Observability in telco core CNF clusters:: {product-title} generates a large amount of data, such as performance metrics and logs from the platform and the workloads running on it.
Observability in {product-title} clusters:: {product-title} generates a large amount of data, such as performance metrics and logs from the platform and the workloads running on it.
As an administrator, you can use tools to collect and analyze the available data.
For more information, see xref:../day_2_core_cnf_clusters/observability/observability.adoc#observability[Observability in {product-title}].
Security:: You can enhance security for high-bandwidth network deployments in telco environments by following key security considerations.
Security:: You can enhance security for high-bandwidth network deployments by following key security considerations.
For more information, see xref:../day_2_core_cnf_clusters/security/security-basics.adoc#security-basics[Security basics].

View File

@@ -1,73 +0,0 @@
:_mod-docs-content-type: ASSEMBLY
[id="telco-update-completing-the-update"]
= Completing the Control Plane Only cluster update
include::_attributes/common-attributes.adoc[]
:context: completing-the-update
toc::[]
Follow these steps to perform the Control Plane Only cluster update and monitor the update through to completion.
[IMPORTANT]
====
Control Plane Only updates were previously known as EUS-to-EUS updates.
Control Plane Only updates are only viable between even-numbered minor versions of {product-title}.
====
include::modules/telco-update-acknowledging-the-update.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../updating/preparing_for_updates/updating-cluster-prepare.adoc#kube-api-removals_updating-cluster-prepare[Kubernetes API removals]
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-api.adoc#telco-update-api[Verifying cluster API versions between update versions]
include::modules/telco-update-starting-the-cluster-update.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-api.adoc#telco-update-selecting-the-target-release_telco-update-api[Selecting the target release]
include::modules/telco-update-monitoring-the-cluster-update.adoc[leveloffset=+1]
include::modules/telco-update-updating-the-olm-operators.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-control-plane-only-update.adoc#telco-update-updating-the-worker-nodes_completing-the-update[Updating the worker nodes]
include::modules/telco-update-performing-the-second-y-stream-update.adoc[leveloffset=+2]
include::modules/telco-update-acknowledging-the-y-stream-release-update.adoc[leveloffset=+2]
[role="_additional-resources"]
.Additional resources
* xref:../../../updating/preparing_for_updates/updating-cluster-prepare.adoc#updating-cluster-prepare[Preparing to update to {product-title} {product-version}]
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-api.adoc#telco-update-api[Verifying cluster API versions between update versions]
include::modules/telco-update-starting-the-y-stream-control-plane-update.adoc[leveloffset=+1]
include::modules/telco-update-monitoring-second-part-y-update.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-control-plane-only-update.adoc#telco-update-monitoring-the-cluster-update_completing-the-update[Monitoring the cluster update]
include::modules/telco-update-updating-all-the-olm-operators.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-control-plane-only-update.adoc#telco-update-monitoring-the-cluster-update_completing-the-update[Monitoring the cluster update]
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-control-plane-only-update.adoc#telco-update-updating-the-olm-operators_completing-the-update[Updating the OLM Operators]
include::modules/telco-update-updating-the-worker-nodes.adoc[leveloffset=+1]
include::modules/telco-update-verifying-the-health-of-the-newly-updated-cluster.adoc[leveloffset=+1]

View File

@@ -1,39 +0,0 @@
:_mod-docs-content-type: ASSEMBLY
[id="telco-update-completing-the-y-stream-update"]
= Completing the y-stream cluster update
include::_attributes/common-attributes.adoc[]
:context: completing-the-y-stream-update
toc::[]
Follow these steps to perform the y-stream cluster update and monitor the update through to completion.
Completing a y-stream update is more straightforward than a Control Plane Only update.
include::modules/telco-update-acknowledging-the-update.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../updating/preparing_for_updates/updating-cluster-prepare.adoc#kube-api-removals_updating-cluster-prepare[Kubernetes API removals]
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-api.adoc#telco-update-api[Verifying cluster API versions between update versions]
include::modules/telco-update-starting-the-cluster-update.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-api.adoc#telco-update-selecting-the-target-release_telco-update-api[Selecting the target release]
include::modules/telco-update-monitoring-the-cluster-update.adoc[leveloffset=+1]
include::modules/telco-update-updating-the-olm-operators.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-control-plane-only-update.adoc#telco-update-updating-the-worker-nodes_completing-the-update[Updating the worker nodes]
include::modules/telco-update-updating-the-worker-nodes.adoc[leveloffset=+1]
include::modules/telco-update-verifying-the-health-of-the-newly-updated-cluster.adoc[leveloffset=+1]

View File

@@ -1,21 +0,0 @@
:_mod-docs-content-type: ASSEMBLY
[id="telco-update-completing-the-z-stream-update"]
= Completing the z-stream cluster update
include::_attributes/common-attributes.adoc[]
:context: completing-the-z-stream-update
toc::[]
Follow these steps to perform the z-stream cluster update and monitor the update through to completion.
Completing a z-stream update is more straightforward than a Control Plane Only or y-stream update.
include::modules/telco-update-starting-the-cluster-update.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-api.adoc#telco-update-selecting-the-target-release_telco-update-api[Selecting the target release]
include::modules/telco-update-updating-the-worker-nodes.adoc[leveloffset=+1]
include::modules/telco-update-verifying-the-health-of-the-newly-updated-cluster.adoc[leveloffset=+1]

View File

@@ -1,58 +0,0 @@
:_mod-docs-content-type: ASSEMBLY
[id="telco-update-ocp-update-prep"]
= Preparing the telco core cluster platform for update
include::_attributes/common-attributes.adoc[]
:context: ocp-update-prep
toc::[]
Typically, telco clusters run on bare-metal hardware.
Often you must update the firmware to apply important security fixes, gain new functionality, or maintain compatibility with the new release of {product-title}.
include::modules/telco-update-ensuring-the-host-firmware-is-compatible.adoc[leveloffset=+1]
include::modules/telco-update-ensuring-that-layered-products-are-compatible.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-control-plane-only-update.adoc#telco-update-updating-the-worker-nodes_completing-the-update[Updating the worker nodes]
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-control-plane-only-update.adoc#telco-update-updating-all-the-olm-operators_completing-the-update[Updating all the OLM Operators]
include::modules/telco-update-applying-mcp-labels-to-nodes-before-the-update.adoc[leveloffset=+1]
include::modules/telco-update-reviewing-configured-cluster-mcp-roles.adoc[leveloffset=+2]
include::modules/telco-update-creating-mcp-groups-for-the-cluster.adoc[leveloffset=+2]
[role="_additional-resources"]
.Additional resources
* xref:../../../updating/updating_a_cluster/control-plane-only-update.adoc#control-plane-only-update[Performing a Control Plane Only update]
* xref:../../../updating/understanding_updates/understanding-openshift-update-duration.adoc#factors-affecting-update-duration_openshift-update-duration[Factors affecting update duration]
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-cnf-update-prep.adoc#telco-update-pdb_telco-update-cnf-update-prep[Ensuring that CNF workloads run uninterrupted with pod disruption budgets]
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/telco-update-cnf-update-prep.adoc#telco-update-pod-anti-affinity_telco-update-cnf-update-prep[Ensuring that pods do not run on the same cluster node]
[id="telco-update-environment-considerations_{context}"]
== Telco deployment environment considerations
In telco environments, most clusters are in disconnected networks.
To update clusters in these environments, you must update your offline image repository.
[role="_additional-resources"]
.Additional resources
* xref:../../../rest_api/overview/understanding-compatibility-guidelines.adoc#api-compatibility-guidelines_compatibility-guidelines[API compatibility guidelines]
* xref:../../../disconnected/about-installing-oc-mirror-v2.adoc#about-installing-oc-mirror-v2[Mirroring images for a disconnected installation by using the oc-mirror plugin v2]
include::modules/telco-update-preparing-the-platform-for-update.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../support/troubleshooting/investigating-pod-issues.adoc#investigating-pod-issues[Investigating pod issues]

View File

@@ -1,15 +1,16 @@
:_mod-docs-content-type: ASSEMBLY
[id="telco-update-api"]
[id="update-api"]
= Verifying cluster API versions between update versions
include::_attributes/common-attributes.adoc[]
:context: telco-update-api
:context: update-api
toc::[]
[role="_abstract"]
APIs change over time as components are updated.
It is important to verify that cloud-native network function (CNF) APIs are compatible with the updated cluster version.
It is important to verify that your application APIs are compatible with the updated cluster version.
include::modules/telco-update-api-compatibility.adoc[leveloffset=+1]
include::modules/update-api-compatibility.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
@@ -18,11 +19,11 @@ include::modules/telco-update-api-compatibility.adoc[leveloffset=+1]
* link:https://kubernetes.io/releases/version-skew-policy/[Kubernetes version skew policy]
include::modules/telco-update-determining-the-cluster-version-update-path.adoc[leveloffset=+1]
include::modules/update-determining-the-cluster-version-update-path.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../updating/understanding_updates/understanding-update-channels-release.adoc#understanding-update-channels-releases[Understanding update channels and releases]
include::modules/telco-update-selecting-the-target-release.adoc[leveloffset=+1]
include::modules/update-selecting-the-target-release.adoc[leveloffset=+1]

View File

@@ -1,16 +1,17 @@
:_mod-docs-content-type: ASSEMBLY
[id="telco-update-before-the-update"]
[id="update-before-the-update"]
= Before you update the telco core CNF cluster
include::_attributes/common-attributes.adoc[]
:context: telco-update-before-the-update
:context: update-before-the-update
toc::[]
[role="_abstract"]
Before you start the cluster update, you must pause worker nodes, back up the etcd database, and do a final cluster health check before proceeding.
include::modules/telco-update-pause-worker-nodes-before-the-update.adoc[leveloffset=+1]
include::modules/update-pause-worker-nodes-before-the-update.adoc[leveloffset=+1]
[id="telco-update-backup-etcd-database-before-update_{context}"]
[id="update-backup-etcd-database-before-update_{context}"]
== Back up the etcd database before you proceed with the update
You must back up the etcd database before you proceed with the update.
@@ -24,4 +25,4 @@ include::modules/creating-single-etcd-backup.adoc[leveloffset=+2]
* xref:../../../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backup-etcd[Backing up etcd]
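Where the feature is available in the cluster, a single on-demand backup can be requested with an `EtcdBackup` custom resource. The following is a minimal sketch only; the resource name and the PVC name `etcd-backup-pvc` are hypothetical and must match a persistent volume claim that exists in your environment:

[source,yaml]
----
# Sketch of a one-off etcd backup request (assumes the EtcdBackup
# API is enabled in the cluster; names are illustrative only)
apiVersion: operator.openshift.io/v1alpha1
kind: EtcdBackup
metadata:
  name: example-etcd-backup
  namespace: openshift-etcd
spec:
  pvcName: etcd-backup-pvc
----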
include::modules/telco-update-checking-the-cluster-health.adoc[leveloffset=+1]
include::modules/update-checking-the-cluster-health.adoc[leveloffset=+1]

View File

@@ -1,12 +1,13 @@
:_mod-docs-content-type: ASSEMBLY
[id="telco-update-cnf-update-prep"]
= Configuring CNF pods before updating the telco core CNF cluster
[id="update-cnf-update-prep"]
= Configuring application pods before updating your {product-title} cluster
include::_attributes/common-attributes.adoc[]
:context: telco-update-cnf-update-prep
:context: update-cnf-update-prep
toc::[]
Follow the guidance in link:https://redhat-best-practices-for-k8s.github.io/guide/[Red Hat best practices for Kubernetes] when developing cloud-native network functions (CNFs) to ensure that the cluster can schedule pods during an update.
[role="_abstract"]
Configure application pods to ensure workload availability during {product-title} updates. For example, use deployment strategies, pod disruption budgets, anti-affinity rules, and health probes to maintain high availability and prevent service disruption. In the telecommunications industry, most cloud-native network function (CNF) vendors follow the guidance in Red Hat best practices for Kubernetes to ensure that the cluster can schedule pods properly during an update.
[IMPORTANT]
====
@@ -20,23 +21,25 @@ When a pod that is managed by a `Deployment` resource is deleted, a new pod take
* link:https://redhat-best-practices-for-k8s.github.io/guide/[Red Hat best practices for Kubernetes]
include::modules/telco-update-pdb.adoc[leveloffset=+1]
include::modules/update-pdb.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../nodes/pods/nodes-pods-configuring.adoc#nodes-pods-pod-disruption-configuring_nodes-pods-configuring[Specifying the number of pods that must be up with pod disruption budgets]
* xref:../../../post_installation_configuration/cluster-tasks.adoc#nodes-pods-pod-disruption-configuring_post-install-pod-disruption-budgets[Specifying the number of pods that must be up with pod disruption budgets]
* xref:../../../nodes/pods/nodes-pods-configuring.adoc#nodes-pods-pod-disruption-configuring_nodes-pods-configuring[Configuring an {product-title} cluster for pods]
* xref:../../../nodes/pods/nodes-pods-priority.adoc#priority-preemption-other_nodes-pods-priority[Pod preemption and other scheduler settings]
include::modules/telco-update-pod-anti-affinity.adoc[leveloffset=+1]
include::modules/update-pod-anti-affinity.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../nodes/scheduling/nodes-scheduler-pod-affinity.adoc#nodes-scheduler-pod-affinity-configuring_nodes-scheduler-pod-affinity[Configuring a pod affinity rule]
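The anti-affinity guidance above can be sketched in a `Deployment` as follows. This is a minimal illustration; the workload name `example-app` and the image reference are hypothetical:

[source,yaml]
----
# Sketch: spread replicas across nodes so a single node drain
# during the update cannot take down all pods of the workload
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: example-app
            topologyKey: kubernetes.io/hostname
      containers:
      - name: app
        image: registry.example.com/example-app:latest
----

With `topologyKey: kubernetes.io/hostname`, the scheduler refuses to place two replicas on the same node, so one replica stays available while the other node reboots.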
include::modules/telco-update-monitoring-application-health.adoc[leveloffset=+1]
include::modules/update-monitoring-application-health.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources

View File

@@ -0,0 +1,70 @@
:_mod-docs-content-type: ASSEMBLY
[id="update-completing-the-update"]
= Completing the Control Plane Only cluster update
include::_attributes/common-attributes.adoc[]
:context: completing-the-update
toc::[]
[role="_abstract"]
Complete the following steps to perform the control plane only cluster update.
[IMPORTANT]
====
Control plane only updates were previously known as EUS-to-EUS updates.
Control plane only updates are only viable between even-numbered minor versions of {product-title}.
====
include::modules/update-acknowledging-the-update.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../updating/preparing_for_updates/updating-cluster-prepare.adoc#kube-api-removals_updating-cluster-prepare[Kubernetes API removals]
include::modules/update-starting-the-cluster-update.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/update-api.adoc#update-selecting-the-target-release_update-api[Selecting the target release]
include::modules/update-monitoring-the-cluster-update.adoc[leveloffset=+1]
include::modules/update-updating-the-olm-operators.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-control-plane-only-update.adoc#update-updating-the-worker-nodes_completing-the-update[Updating the worker nodes]
include::modules/update-performing-the-second-y-stream-update.adoc[leveloffset=+2]
include::modules/update-acknowledging-the-y-stream-release-update.adoc[leveloffset=+2]
[role="_additional-resources"]
.Additional resources
* xref:../../../updating/preparing_for_updates/updating-cluster-prepare.adoc#updating-cluster-prepare[Preparing to update to {product-title} {product-version}]
include::modules/update-starting-the-y-stream-control-plane-update.adoc[leveloffset=+1]
include::modules/update-monitoring-second-part-y-update.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-control-plane-only-update.adoc#update-monitoring-the-cluster-update_completing-the-update[Monitoring the cluster update]
include::modules/update-updating-all-the-olm-operators.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-control-plane-only-update.adoc#update-monitoring-the-cluster-update_completing-the-update[Monitoring the cluster update]
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-control-plane-only-update.adoc#update-updating-the-olm-operators_completing-the-update[Updating the OLM Operators]
include::modules/update-updating-the-worker-nodes.adoc[leveloffset=+1]
include::modules/update-verifying-the-health-of-the-newly-updated-cluster.adoc[leveloffset=+1]

View File

@@ -0,0 +1,37 @@
:_mod-docs-content-type: ASSEMBLY
[id="update-completing-the-y-stream-update"]
= Completing the y-stream cluster update
include::_attributes/common-attributes.adoc[]
:context: completing-the-y-stream-update
toc::[]
[role="_abstract"]
Complete the following steps to perform a y-stream cluster update.
include::modules/update-acknowledging-the-update.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../updating/preparing_for_updates/updating-cluster-prepare.adoc#kube-api-removals_updating-cluster-prepare[Kubernetes API removals]
include::modules/update-starting-the-cluster-update.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/update-api.adoc#update-selecting-the-target-release_update-api[Selecting the target release]
include::modules/update-monitoring-the-cluster-update.adoc[leveloffset=+1]
include::modules/update-updating-the-olm-operators.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-control-plane-only-update.adoc#update-updating-the-worker-nodes_completing-the-update[Updating the worker nodes]
include::modules/update-updating-the-worker-nodes.adoc[leveloffset=+1]
include::modules/update-verifying-the-health-of-the-newly-updated-cluster.adoc[leveloffset=+1]

View File

@@ -0,0 +1,22 @@
:_mod-docs-content-type: ASSEMBLY
[id="update-completing-the-z-stream-update"]
= Completing the z-stream cluster update
include::_attributes/common-attributes.adoc[]
:context: completing-the-z-stream-update
toc::[]
[role="_abstract"]
Complete the following steps to perform a z-stream cluster update.
include::modules/update-starting-the-cluster-update.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/update-api.adoc#update-selecting-the-target-release_update-api[Selecting the target release]
include::modules/update-updating-the-worker-nodes.adoc[leveloffset=+1]
include::modules/update-verifying-the-health-of-the-newly-updated-cluster.adoc[leveloffset=+1]

View File

@@ -0,0 +1,62 @@
:_mod-docs-content-type: ASSEMBLY
[id="update-ocp-update-prep"]
= Preparing a bare-metal cluster for a platform update
include::_attributes/common-attributes.adoc[]
:context: ocp-update-prep
toc::[]
[role="_abstract"]
On bare-metal hardware, you often must update the firmware to apply important security fixes, gain new functionality, or maintain compatibility with the new release of {product-title}.
include::modules/update-ensuring-the-host-firmware-is-compatible.adoc[leveloffset=+1]
include::modules/update-ensuring-that-layered-products-are-compatible.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-control-plane-only-update.adoc#update-updating-the-worker-nodes_completing-the-update[Updating the worker nodes]
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-control-plane-only-update.adoc#update-updating-all-the-olm-operators_completing-the-update[Updating all the OLM Operators]
include::modules/update-applying-mcp-labels-to-nodes-before-the-update.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../machine_configuration/index.adoc#architecture-machine-config-pools_machine-config-overview[Node configuration management with machine config pools]
include::modules/update-reviewing-configured-cluster-mcp-roles.adoc[leveloffset=+2]
include::modules/update-creating-mcp-groups-for-the-cluster.adoc[leveloffset=+2]
[role="_additional-resources"]
.Additional resources
* xref:../../../updating/updating_a_cluster/control-plane-only-update.adoc#control-plane-only-update[Performing a Control Plane Only update]
* xref:../../../updating/understanding_updates/understanding-openshift-update-duration.adoc#factors-affecting-update-duration_openshift-update-duration[Factors affecting update duration]
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/update-cnf-update-prep.adoc#update-pdb_update-cnf-update-prep[Ensuring that CNF workloads run uninterrupted with pod disruption budgets]
* xref:../../../edge_computing/day_2_core_cnf_clusters/updating/update-cnf-update-prep.adoc#update-pod-anti-affinity_update-cnf-update-prep[Ensuring that pods do not run on the same cluster node]
[id="update-environment-considerations_{context}"]
== Disconnected environment considerations
To update clusters in disconnected environments, you must update your offline image repository.
[role="_additional-resources"]
.Additional resources
* xref:../../../rest_api/overview/understanding-compatibility-guidelines.adoc#api-compatibility-guidelines_compatibility-guidelines[API compatibility guidelines]
* xref:../../../disconnected/about-installing-oc-mirror-v2.adoc#about-installing-oc-mirror-v2[Mirroring images for a disconnected installation by using the oc-mirror plugin v2]
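A minimal `ImageSetConfiguration` for mirroring a platform release with the oc-mirror plugin v2 might look like the following sketch; the channel name and version range are examples only and must match the release you are updating to:

[source,yaml]
----
# Sketch of an oc-mirror v2 image set configuration
# (channel and version bounds are illustrative)
apiVersion: mirror.openshift.io/v2alpha1
kind: ImageSetConfiguration
mirror:
  platform:
    channels:
    - name: stable-4.17
      minVersion: 4.16.0
      maxVersion: 4.17.0
----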
include::modules/update-preparing-the-platform-for-update.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../support/troubleshooting/investigating-pod-issues.adoc#investigating-pod-issues[Investigating pod issues]

View File

@@ -1,13 +1,14 @@
:_mod-docs-content-type: ASSEMBLY
[id="telco-update-welcome"]
= Updating a telco core CNF cluster
[id="update-welcome"]
= Upgrading an {product-title} cluster
include::_attributes/common-attributes.adoc[]
:context: telco-update-welcome
:context: update-welcome
toc::[]
[role="_abstract"]
{product-title} has long-term support or extended update support (EUS) on all even releases and update paths between EUS releases.
You can update from one EUS version to the next EUS version.
It is also possible to update between y-stream and z-stream versions.
include::modules/telco-update-introduction.adoc[leveloffset=+1]
include::modules/update-introduction.adoc[leveloffset=+1]

View File

@@ -1,12 +0,0 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-update.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-acknowledging-update-for-eus-eus-y-stream_{context}"]
= Acknowledging the Control Plane Only or y-stream update
When you update to any version from 4.11 onward, you must manually acknowledge that the update can continue.
// Reused in "telco-update-acknowledging-the-update.adoc" and "telco-update-acknowledging-the-y-stream-release-update.adoc" files
include::snippets/acknowledge-the-update.adoc[]
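The acknowledgement is recorded in the `admin-acks` config map in the `openshift-config` namespace. The following sketch shows the shape of that config map; the `ack-*` key shown here is an example only and varies with the source and target release:

[source,yaml]
----
# Sketch of the admin-acks config map after acknowledging an update
# (the data key is release-specific; this one is illustrative)
apiVersion: v1
kind: ConfigMap
metadata:
  name: admin-acks
  namespace: openshift-config
data:
  ack-4.13-kube-1.27-api-removals-in-4.14: "true"
----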

View File

@@ -1,63 +0,0 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-ocp-update-prep.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-applying-mcp-labels-to-nodes-before-the-update_{context}"]
= Applying MachineConfigPool labels to nodes before the update
Prepare `MachineConfigPool` (`mcp`) node labels to divide nodes into groups of roughly 8 to 10 nodes.
With `mcp` groups, you can reboot groups of nodes independently from the rest of the cluster.
You use the `mcp` node labels to pause and unpause the set of nodes during the update process so that you can do the update and reboot at a time of your choosing.
[id="telco-update-staggering-the-cluster-update_{context}"]
== Staggering the cluster update
Sometimes there are problems during the update.
Often the problem is related to hardware failure or nodes needing to be reset.
Using `mcp` node labels, you can update nodes in stages by pausing the update at critical moments, tracking paused and unpaused nodes as you proceed.
When a problem occurs, you use the nodes that are in an unpaused state to ensure that there are enough nodes running to keep all application pods running.
[id="telco-update-dividing-worker-nodes-into-mcp-groups_{context}"]
== Dividing worker nodes into MachineConfigPool groups
How you divide worker nodes into `mcp` groups can vary depending on how many nodes are in the cluster or how many nodes you assign to a node role.
By default, the two roles in a cluster are control plane and worker.
In clusters that run telco workloads, you can further split the worker nodes between CNF control plane and CNF data plane roles.
Add `mcp` role labels that split the worker nodes into each of these two groups.
[NOTE]
====
Larger clusters can have as many as 100 worker nodes in the CNF control plane role.
No matter how many nodes there are in the cluster, keep each `MachineConfigPool` group to around 10 nodes.
This allows you to control how many nodes are taken down at a time.
With multiple `MachineConfigPool` groups, you can unpause several groups at a time to accelerate the update, or separate the update over 2 or more maintenance windows.
====
Example cluster with 15 worker nodes::
Consider a cluster with 15 worker nodes:
* 10 worker nodes are CNF control plane nodes.
* 5 worker nodes are CNF data plane nodes.
+
Split the CNF control plane and data plane worker node roles into at least 2 `mcp` groups each.
Having 2 `mcp` groups per role means that you can have one set of nodes that are not affected by the update.
Example cluster with 6 worker nodes::
Consider a cluster with 6 worker nodes:
* Split the worker nodes into 3 `mcp` groups of 2 nodes each.
+
Upgrade one of the `mcp` groups.
Allow the updated nodes to sit through a day to allow for verification of CNF compatibility before completing the update on the other 4 nodes.
[IMPORTANT]
====
The process and pace at which you unpause the `mcp` groups is determined by your CNF applications and configuration.
If your CNF pod can handle being scheduled across nodes in a cluster, you can unpause several `mcp` groups at a time and set the `MaxUnavailable` in the `mcp` custom resource (CR) to as high as 50%. This allows up to half of the nodes in an `mcp` group to restart and get updated.
====

View File

@@ -1,22 +0,0 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-cnf-update-prep.adoc
:_mod-docs-content-type: CONCEPT
[id="telco-update-pdb_{context}"]
= Ensuring that CNF workloads run uninterrupted with pod disruption budgets
You can configure the minimum number of pods in a deployment to allow the CNF workload to run uninterrupted by setting a pod disruption budget in a `PodDisruptionBudget` custom resource (CR) that you apply.
Be careful when setting this value; setting it improperly can cause an update to fail.
For example, if you have 4 pods in a deployment and you set the pod disruption budget to 4, the cluster scheduler keeps 4 pods running at all times - no pods can be scaled down.
Instead, set the pod disruption budget to 2, letting 2 of the 4 pods be scheduled as down.
Then, the worker nodes where those pods are located can be rebooted.
[NOTE]
====
Setting the pod disruption budget to 2 does not mean that your deployment runs on only 2 pods for a period of time, for example, during an update.
The cluster scheduler creates 2 new pods to replace the 2 older pods.
However, there is short period of time between the new pods coming online and the old pods being deleted.
====

View File

@@ -0,0 +1,13 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-update.adoc
:_mod-docs-content-type: PROCEDURE
[id="update-acknowledging-update-for-eus-eus-y-stream_{context}"]
= Acknowledging the control plane only or y-stream update
[role="_abstract"]
When you update to any version from 4.11 and later, you must manually acknowledge that the update can continue.
// Reused in "Acknowledging the Control Plane Only or y-stream update" and "Acknowledging the y-stream release update" sections
include::snippets/acknowledge-the-update.adoc[]

View File

@@ -1,13 +1,14 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-update.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-update.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-acknowledging-the-y-stream-release-update_{context}"]
[id="update-acknowledging-the-y-stream-release-update_{context}"]
= Acknowledging the y-stream release update
[role="_abstract"]
When moving between y-stream releases, you must run a patch command to explicitly acknowledge the update.
In the output of the `oc adm upgrade` command, a URL is provided that shows the specific command to run.
// Reused in "telco-update-acknowledging-the-update.adoc" and "telco-update-acknowledging-the-y-stream-release-update.adoc" files
// Reused in "Acknowledging the Control Plane Only or y-stream update" and "Acknowledging the y-stream release update" sections
include::snippets/acknowledge-the-update.adoc[]

View File

@@ -1,17 +1,18 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-api.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-api.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-openshift-container-platform-api-compatibility_{context}"]
[id="update-openshift-container-platform-api-compatibility_{context}"]
= {product-title} API compatibility
[role="_abstract"]
When considering what z-stream release to update to as part of a new y-stream update, you must verify that all the patches in the z-stream version that you are moving from are present in the new z-stream version.
If the version you update to does not have all the required patches, the built-in compatibility of Kubernetes is broken.
For example, if the cluster version is 4.15.32, you must update to a 4.16 z-stream release that has all of the patches that are applied to 4.15.32.
[id="telco-update-about-kubernetes-version-skew_{context}"]
[id="update-about-kubernetes-version-skew_{context}"]
== About Kubernetes version skew
Each cluster Operator supports specific API versions.

View File

@@ -0,0 +1,63 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-ocp-update-prep.adoc
:_mod-docs-content-type: PROCEDURE
[id="update-applying-mcp-labels-to-nodes-before-the-update_{context}"]
= Applying MachineConfigPool labels to nodes before the update
[role="_abstract"]
Prepare `MachineConfigPool` (MCP) node labels to group nodes together in groups of roughly 8 to 10 nodes.
With MCP groups, you can reboot groups of nodes independently from the rest of the cluster.
You use the MCP node labels to pause and unpause the set of nodes during the update process so that you can perform the update and reboot nodes at a time of your choosing.
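For example, you can group nodes by applying a custom role label to each node in the group. The `mcp-1` role name here is an example; choose names that match your planned update groups:

[source,terminal]
----
$ oc label node <node_name> node-role.kubernetes.io/mcp-1=
----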
[id="update-staggering-the-cluster-update_{context}"]
== Staggering the cluster update
Problems sometimes occur during the update, often related to hardware failure or nodes that need to be reset.
Using MCP node labels, you can update nodes in stages by pausing the update at critical moments, tracking paused and unpaused nodes as you proceed.
When a problem occurs, you use the nodes that are in an unpaused state to ensure that there are enough nodes running to keep all application pods running.
[id="update-dividing-worker-nodes-into-mcp-groups_{context}"]
== Dividing worker nodes into MachineConfigPool groups
How you divide worker nodes into MCPs can vary depending on how many nodes are in the cluster or how many nodes you assign to a node role.
By default, the two roles in a cluster are control plane and worker roles.
You can also move nodes between MCP groups if both groups have the same machine config, which is important if you have too many nodes in one large machine config pool. For more information about MCP groups, see _Additional resources_.
[NOTE]
====
Larger clusters can have as many as 100 worker nodes.
No matter how many nodes there are in the cluster, keep each `MachineConfigPool` group to around 10 nodes.
This allows you to control how many nodes are taken down at a time.
With multiple `MachineConfigPool` groups, you can unpause several groups at a time to accelerate the update, or separate the update over two or more maintenance windows.
====
Example cluster with 15 worker nodes::
Consider a cluster with 15 worker nodes:
* 10 worker nodes are control plane nodes.
* 5 worker nodes are data plane nodes.
+
Split the control plane and data plane worker node roles into at least 2 MCP groups each.
Having 2 MCP groups per role means that you can have one set of nodes that are not affected by the update.
Example cluster with 6 worker nodes::
Consider a cluster with 6 worker nodes:
* Split the worker nodes into 3 MCP groups of 2 nodes each.
+
Upgrade one of the MCP groups.
Allow the updated nodes to run for a day to verify application compatibility before completing the update on the other 4 nodes.
[IMPORTANT]
====
The process and pace at which you unpause the MCP groups is determined by your applications and configuration.
If your pods can handle being rescheduled across nodes in the cluster, you can unpause several MCP groups at a time and set the `maxUnavailable` field in the MCP custom resource (CR) to as high as 50%, which allows up to half of the nodes in an MCP group to restart and update at a time.
====
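For example, a `MachineConfigPool` CR for one of the groups might look like the following sketch. The `mcp-1` role name and the 50% value are examples; adjust them to match your groups and applications:

[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: mcp-1 # example group name
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, mcp-1]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/mcp-1: ""
  maxUnavailable: "50%" # up to half of the nodes in this group can update at a time
----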

View File

@@ -1,11 +1,12 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-before-the-update.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-before-the-update.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-checking-cluster-health_{context}"]
[id="update-checking-cluster-health_{context}"]
= Checking the cluster health
[role="_abstract"]
Check the cluster health often during the update.
Check the node status, the status of the cluster Operators, and for any failed pods.
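For example, the following commands give a quick view of the node status, the cluster Operator status, and pods that are not healthy. These are generic status checks, not an exhaustive health review:

[source,terminal]
----
$ oc get nodes
$ oc get co
$ oc get po -A | grep -Ev 'Running|Completed'
----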

View File

@@ -1,7 +1,11 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/update-ocp-update-prep.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-creating-mcp-groups-for-the-cluster_{context}"]
[id="update-creating-mcp-groups-for-the-cluster_{context}"]
= Creating MachineConfigPool groups for the cluster
[role="_abstract"]
Creating `mcp` groups is a 2-step process:
. Add an `mcp` label to the nodes in the cluster
@@ -88,6 +92,7 @@ machineconfigpool.machineconfiguration.openshift.io/mcp-2 created
----
.Verification
Monitor the `MachineConfigPool` resources as they are applied in the cluster.
After you apply the `mcp` resources, the nodes are added into the new machine config pools.
This takes a few minutes.
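For example, you can watch the nodes move into the new machine config pools by running the following command:

[source,terminal]
----
$ oc get mcp -w
----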

View File

@@ -1,19 +1,19 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-api.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-api.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-determining-the-cluster-version-update-path_{context}"]
[id="update-determining-the-cluster-version-update-path_{context}"]
= Determining the cluster version update path
[role="_abstract"]
Use the link:https://access.redhat.com/labs/ocpupgradegraph/update_path/[Red Hat {product-title} Update Graph] tool to determine if the path is valid for the z-stream release you want to update to.
Verify the update with your Red Hat Technical Account Manager to ensure that the update path is valid for your implementation.
[IMPORTANT]
====
The <4.y+1.z> or <4.y+2.z> version that you update to must have the same patch level as the <4.y.z> release you are updating from.
The OpenShift update process mandates that if a fix is present in a specific <4.y.z> release, then the that fix must be present in the <4.y+1.z> release that you update to.
The {product-title} update process mandates that if a fix is present in a specific <4.y.z> release, then that fix must be present in the <4.y+1.z> release that you update to.
====
.Bug fix backporting and the update graph
@@ -21,7 +21,7 @@ image::openshift-bug-fix-backporting-update-graph.png[Bug fix backporting and th
[IMPORTANT]
====
OpenShift development has a strict backport policy that prevents regressions.
{product-title} development has a strict backport policy that prevents regressions.
For example, a bug must be fixed in 4.16.z before it is fixed in 4.15.z.
This means that the update graph does not allow for updates to chronologically older releases even if the minor version is greater, for example, updating from 4.15.24 to 4.16.2.
====

View File

@@ -1,15 +1,17 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-ocp-update-prep.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-ocp-update-prep.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-ensuring-layered-products-are-compatible_{context}"]
[id="update-ensuring-layered-products-are-compatible_{context}"]
= Ensuring that layered products are compatible with the update
[role="_abstract"]
Verify that all layered products run on the version of {product-title} that you are updating to before you begin the update.
This generally includes all Operators.
.Procedure
+
. Verify the currently installed Operators in the cluster.
For example, run the following command:
+
@@ -45,7 +47,7 @@ You can also use the link:https://access.redhat.com/labs/ocpouic/?upgrade_path=4
For all OLM-installed Operators that are not directly supported by Red Hat, contact the Operator vendor to ensure release compatibility.
* Some Operators are compatible with several releases of {product-title}.
You might not need to update the Operators until after you complete the cluster update.
+
See "Updating the worker nodes" for more information.
* See "Updating all the OLM Operators" for information about updating an Operator after performing the first y-stream control plane update.

View File

@@ -1,22 +1,22 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-ocp-update-prep.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-ocp-update-prep.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-ensuring-host-firmware-compatible_{context}"]
[id="update-ensuring-host-firmware-compatible_{context}"]
= Ensuring the host firmware is compatible with the update
[role="_abstract"]
You are responsible for the firmware versions that you run in your clusters.
Updating host firmware is not a part of the {product-title} update process.
It is not recommended to update firmware in conjunction
with the {product-title} version.
It is not recommended to update firmware in conjunction with the {product-title} version.
[IMPORTANT]
====
Hardware vendors advise that it is best to apply the latest certified firmware version for the specific hardware that you are running.
For telco use cases, always verify firmware updates in test environments before applying them in production.
The high throughput nature of telco CNF workloads can be adversely affected by sub-optimal host firmware.
For each use case, always verify firmware updates in test environments before applying them in production.
For example, workloads with high throughput requirements can be negatively affected by outdated host firmware.
You should thoroughly test new firmware updates to ensure that they work as expected with the current version of {product-title}.
Ideally, you test the latest firmware version with the target {product-title} update version.
For best results, test the latest firmware version with the target {product-title} update version.
====

View File

@@ -1,13 +1,14 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-welcome.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-welcome.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-introduction_{context}"]
= Cluster updates for telco core CNF clusters
[id="update-introduction_{context}"]
= Cluster updates for {product-title} clusters
[role="_abstract"]
Updating your cluster is a critical task that ensures that bugs and potential security vulnerabilities are patched.
Often, updates to cloud-native network functions (CNF) require additional functionality from the platform that comes when you update the cluster version.
Often, updates to cloud-native applications require additional platform functionality that becomes available when you update the cluster version.
You also must update the cluster periodically to ensure that the cluster platform version is supported.
You can minimize the effort required to stay current with updates by keeping up-to-date with EUS releases and upgrading to select important z-stream releases only.
@@ -15,8 +16,6 @@ You can minimize the effort required to stay current with updates by keeping up-
[NOTE]
====
The update path for the cluster can vary depending on the size and topology of the cluster.
The update procedures described here are valid for most clusters from 3-node clusters up to the largest size clusters certified by the telco scale team.
This includes some scenarios for mixed-workload clusters.
====
The following update scenarios are described:

View File

@@ -1,11 +1,12 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-cnf-update-prep.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-cnf-update-prep.adoc
:_mod-docs-content-type: CONCEPT
[id="telco-update-monitoring-application-health_{context}"]
[id="update-monitoring-application-health_{context}"]
= Application liveness, readiness, and startup probes
[role="_abstract"]
You can use liveness, readiness, and startup probes to check the health of your live application containers before you schedule an update.
These probes are especially useful for pods that depend on maintaining state in their application containers.
@@ -20,4 +21,4 @@ If the readiness probe fails for a container, the kubelet removes the container
Startup probe::
A startup probe indicates whether the application within a container is started.
All other probes are disabled until the startup succeeds.
If the startup probe does not succeed, the kubelet kills the container, and the container is subject to the pod `restartPolicy` setting.
If the startup probe does not succeed, the kubelet stops the container, and the container is subject to the pod `restartPolicy` setting.
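The following snippet sketches how the three probes might be configured in a pod spec. The endpoint paths, ports, and timings are illustrative only; tune them for your application:

[source,yaml]
----
containers:
- name: app # example container name
  image: registry.example.com/app:latest # example image
  startupProbe: # all other probes are disabled until this succeeds
    httpGet:
      path: /healthz
      port: 8080
    failureThreshold: 30
    periodSeconds: 10
  livenessProbe: # failure restarts the container per the pod restartPolicy
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
  readinessProbe: # failure removes the pod from service endpoints
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5
----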

View File

@@ -1,11 +1,12 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-update.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-update.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-monitoring-second-part-y-update_{context}"]
[id="update-monitoring-second-part-y-update_{context}"]
= Monitoring the second part of a <y+1> cluster update
[role="_abstract"]
Monitor the second part of the cluster update to the <y+1> version.
.Procedure

View File

@@ -1,15 +1,17 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-update.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-update.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-monitoring-the-cluster-update_{context}"]
[id="update-monitoring-the-cluster-update_{context}"]
= Monitoring the cluster update
[role="_abstract"]
Check the cluster health often during the update.
Check the node status, the status of the cluster Operators, and for any failed pods.
.Procedure
+
* Monitor the cluster update.
For example, to monitor the cluster update from version 4.14 to 4.15, run the following command:
+
@@ -50,6 +52,7 @@ openshift-marketplace redhat-marketplace-rf86t 0/1 ContainerCreating 0
----
.Verification
+
During the update, the `watch` command cycles through one or several cluster Operators at a time, providing a status of the Operator update in the `MESSAGE` column.
When the cluster Operator update process is complete, each control plane node is rebooted, one at a time.

View File

@@ -1,16 +1,18 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-before-the-update.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-before-the-update.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-pause-worker-nodes-before-update_{context}"]
[id="update-pause-worker-nodes-before-update_{context}"]
= Pausing worker nodes before the update
[role="_abstract"]
You must pause the worker nodes before you proceed with the update.
In the following example, there are 2 `mcp` groups, `mcp-1` and `mcp-2`.
You patch the `spec.paused` field to `true` for each of these `MachineConfigPool` groups.
You patch the `spec.paused` field to `true` for each of the `MachineConfigPool` groups.
.Procedure
+
. Patch the `mcp` CRs to pause the nodes and drain and remove the pods from those nodes by running the following command:
+
[source,terminal]

10
modules/update-pdb.adoc Normal file
View File

@@ -0,0 +1,10 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/update-cnf-update-prep.adoc
:_mod-docs-content-type: CONCEPT
[id="update-pdb_{context}"]
= Ensuring that workloads run uninterrupted with pod disruption budgets
[role="_abstract"]
To prevent workload interruptions while worker nodes are upgraded, configure pod disruption budgets appropriately. For more information, see _Additional resources_.
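For example, the following `PodDisruptionBudget` CR keeps at least 2 pods of a deployment running while nodes are drained during the update. The name, namespace, selector, and `minAvailable` value are examples only:

[source,yaml]
----
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb # example name
  namespace: app-ns # example namespace
spec:
  minAvailable: 2 # keep this below the replica count so that nodes can drain
  selector:
    matchLabels:
      app: my-app # example pod label
----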

View File

@@ -1,11 +1,12 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-update.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-update.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-performing-the-second-y-stream-update_{context}"]
[id="update-performing-the-second-y-stream-update_{context}"]
= Performing the second y-stream update
[role="_abstract"]
After completing the first y-stream update, you must update the y-stream control plane version to the new EUS version.
.Procedure

View File

@@ -1,15 +1,16 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-cnf-update-prep.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-cnf-update-prep.adoc
:_mod-docs-content-type: CONCEPT
[id="telco-update-pod-anti-affinity_{context}"]
[id="update-pod-anti-affinity_{context}"]
= Ensuring that pods do not run on the same cluster node
[role="_abstract"]
High availability in Kubernetes requires duplicate processes to be running on separate nodes in the cluster.
This ensures that the application continues to run even if one node becomes unavailable.
In {product-title}, processes can be automatically duplicated in separate pods in a deployment.
You configure anti-affinity in the `Pod` spec to ensure that the pods in a deployment do not run on the same cluster node.
You configure anti-affinity in the `Pod` resource to ensure that the pods in a deployment do not run on the same cluster node.
Setting pod anti-affinity ensures that pods are distributed evenly across the nodes in the cluster, which makes node reboots easier during an update.
For example, if there are 4 pods from a single deployment on a node, and the pod disruption budget is set to only allow 1 pod to be deleted at a time, then draining and rebooting that node takes 4 times as long.
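For example, the following deployment snippet uses required pod anti-affinity so that no two replicas that carry the `app: my-app` label are scheduled on the same node. The label and the topology key are illustrative:

[source,yaml]
----
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: my-app # example pod label
            topologyKey: kubernetes.io/hostname # at most one replica per node
----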

View File

@@ -1,14 +1,16 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-ocp-update-prep.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-ocp-update-prep.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-preparing-the-cluster-platform_{context}"]
[id="update-preparing-the-cluster-platform_{context}"]
= Preparing the cluster platform for update
Before you update the cluster, perform some basic checks and verifications to make sure that the cluster is ready for the update.
[role="_abstract"]
Before you update the cluster, perform basic checks and verifications to ensure that the cluster is ready for the update.
.Procedure
+
. Verify that there are no failed or in progress pods in the cluster by running the following command:
+
[source,terminal]
@@ -54,11 +56,12 @@ ctrl-plane-0 unmanaged cnf-58879-master-0 true 33d
ctrl-plane-1 unmanaged cnf-58879-master-1 true 33d
ctrl-plane-2 unmanaged cnf-58879-master-2 true 33d
worker-0 unmanaged cnf-58879-worker-0-45879 true 33d
worker-1 progressing cnf-58879-worker-0-dszsh false 1d <1>
worker-1 progressing cnf-58879-worker-0-dszsh false 1d
----
<1> An error occurred while provisioning the `worker-1` node.
An error occurred while provisioning the `worker-1` node.
.Verification
+
* Verify that all cluster Operators are ready:
+
[source,terminal]

View File

@@ -1,10 +1,15 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/update-ocp-update-prep.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-reviewing-configured-cluster-mcp-roles_{context}"]
[id="update-reviewing-configured-cluster-mcp-roles_{context}"]
= Reviewing configured cluster MachineConfigPool roles
[role="_abstract"]
Review the currently configured `MachineConfigPool` roles in the cluster.
.Procedure
+
. Get the currently configured `mcp` groups in the cluster:
+
[source,terminal]

View File

@@ -1,15 +1,16 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-api.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-api.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-selecting-the-target-release_{context}"]
[id="update-selecting-the-target-release_{context}"]
= Selecting the target release
[role="_abstract"]
Use the link:https://access.redhat.com/labs/ocpupgradegraph/update_path[Red Hat {product-title} Update Graph] or the
link:https://github.com/openshift/cincinnati-graph-data/tree/master/channels[cincinnati-graph-data repository] to determine what release to update to.
[id="telco-update-determining-available-z-streams_{context}"]
[id="update-determining-available-z-streams_{context}"]
== Determining what z-stream updates are available
Before you can update to a new z-stream release, you need to know what versions are available.
@@ -20,6 +21,7 @@ You do not need to change the channel when performing a z-stream update.
====
.Procedure
+
. Determine which z-stream releases are available.
Run the following command:
+
@@ -44,7 +46,7 @@ Recommended updates:
4.14.35 quay.io/openshift-release-dev/ocp-release@sha256:883088e3e6efa7443b0ac28cd7682c2fdbda889b576edad626769bf956ac0858
----
[id="telco-update-changing-channel-eus-to-eus_{context}"]
[id="update-changing-channel-eus-to-eus_{context}"]
== Changing the channel for a Control Plane Only update
You must change the channel to the required version for a Control Plane Only update.
@@ -55,7 +57,8 @@ You do not need to change the channel when performing a z-stream update.
====
.Procedure
. Determine the currently configured update channel:
+
. Determine the currently configured update channel by running the following command:
+
[source,terminal]
----
@@ -71,14 +74,14 @@ $ oc get clusterversion -o=jsonpath='{.items[*].spec}' | jq
}
----
. Change the channel to point to the new channel you want to update to:
. Change the channel to point to the new channel you want to update to by running the following command:
+
[source,terminal]
----
$ oc adm upgrade channel eus-4.16
----
. Confirm the updated channel:
. Confirm the updated channel by running the following command:
+
[source,terminal]
----
@@ -94,7 +97,7 @@ $ oc get clusterversion -o=jsonpath='{.items[*].spec}' | jq
}
----
[id="telco-update-changing-channel-early-eus-to-eus_{context}"]
[id="update-changing-channel-early-eus-to-eus_{context}"]
=== Changing the channel for an early EUS to EUS update
The update path to a brand new release of {product-title} is not available in either the EUS channel or the stable channel until 45 to 90 days after the initial GA of a minor release.
@@ -102,6 +105,7 @@ The update path to a brand new release of {product-title} is not available in ei
To begin testing an update to a new release, you can use the fast channel.
.Procedure
+
. Change the channel to `fast-<y+1>`.
For example, run the following command:
+
@@ -125,7 +129,7 @@ Cluster version is 4.15.33
Upgradeable=False
Reason: AdminAckRequired
Message: Kubernetes 1.28 and therefore OpenShift 4.16 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6958394 for details and instructions.
Message: Kubernetes 1.28 and therefore {product-title} 4.16 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6958394 for details and instructions.
Upstream is unset, so the cluster will use an appropriate default.
Channel: fast-4.16 (available channels: candidate-4.15, candidate-4.16, eus-4.15, eus-4.16, fast-4.15, fast-4.16, stable-4.15, stable-4.16)
@@ -154,7 +158,7 @@ You can keep your worker nodes paused between EUS releases even if you are using
. Follow the EUS update procedure to get to the required <y+2> release.
[id="telco-update-updating-y-stream_{context}"]
[id="update-updating-y-stream_{context}"]
== Changing the channel for a y-stream update
In a y-stream update, you change the channel to the next release channel.
@@ -165,7 +169,8 @@ Use the stable or EUS release channels for production clusters.
====
.Procedure
. Change the update channel:
+
. Change the update channel by running the following command:
+
[source,terminal]
----
@@ -188,7 +193,7 @@ Cluster version is 4.14.34
Upgradeable=False
Reason: AdminAckRequired
Message: Kubernetes 1.27 and therefore OpenShift 4.15 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6958394 for details and instructions.
Message: Kubernetes 1.27 and therefore {product-title} 4.15 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6958394 for details and instructions.
Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.15 (available channels: candidate-4.14, candidate-4.15, eus-4.14, eus-4.15, fast-4.14, fast-4.15, stable-4.14, stable-4.15)

View File

@@ -1,11 +1,12 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-update.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-update.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-starting-the-cluster-update_{context}"]
[id="update-starting-the-cluster-update_{context}"]
= Starting the cluster update
[role="_abstract"]
When updating from one y-stream release to the next, you must ensure that the intermediate z-stream releases are also compatible.
[NOTE]
@@ -15,6 +16,7 @@ The `oc adm upgrade` command lists the compatible update releases.
====
.Procedure
+
. Start the update:
+
--
@@ -25,8 +27,8 @@ $ oc adm upgrade --to=4.15.33
[IMPORTANT]
====
* **Control Plane Only update**: Make sure you point to the interim <y+1> release path
* **Y-stream update** - Make sure you use the correct <y.z> release that follows the Kubernetes link:https://kubernetes.io/releases/version-skew-policy/[version skew policy].
* **Control plane only update**: Ensure you point to the interim <y+1> release path
* **Y-stream update**: Ensure you use the correct <y.z> release that follows the Kubernetes link:https://kubernetes.io/releases/version-skew-policy/[version skew policy].
* **Z-stream update**: Verify that there are no problems moving to that specific release.
====

View File

@@ -1,11 +1,12 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-update.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-update.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-starting-the-y-stream-control-plane-update_{context}"]
[id="update-starting-the-y-stream-control-plane-update_{context}"]
= Starting the y-stream control plane update
[role="_abstract"]
After you have determined the full new release that you are moving to, you can run the `oc adm upgrade --to=x.y.z` command.
.Procedure

View File

@@ -1,11 +1,12 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-update.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-update.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-updating-all-the-olm-operators_{context}"]
[id="update-updating-all-the-olm-operators_{context}"]
= Updating all the OLM Operators
[role="_abstract"]
In the second phase of a multi-version upgrade, you must approve all of the Operators and additionally add installation plans for any other Operators that you want to upgrade.
Follow the same procedure as outlined in "Updating the OLM Operators".
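The manual approval flow that this module points to can be sketched as a loop over pending install plans. The namespaces and plan names below are fabricated stand-ins, and the live `oc patch` call is left as a comment because it assumes a connected cluster session:

```shell
# Illustrative sketch: approve pending OLM install plans one at a time.
# pending_plans stands in for a live query such as `oc get installplan -A`;
# the namespace/plan pairs here are fabricated examples.
pending_plans() {
  cat <<'EOF'
metallb-system install-abcde
openshift-logging install-fghij
EOF
}

pending_plans | while read -r ns name; do
  # On a live cluster you would run (requires a connected `oc` session):
  #   oc patch installplan "$name" -n "$ns" --type merge --patch '{"spec":{"approved":true}}'
  echo "approved $name in $ns"
done
```

Approving plans cluster by cluster in this way matches the manual-update subscription mode described above, where each new Operator version is gated until an administrator approves it.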

View File

@@ -1,17 +1,19 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-update.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-update.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-updating-the-olm-operators_{context}"]
[id="update-updating-the-olm-operators_{context}"]
= Updating the OLM Operators
In telco environments, software needs to vetted before it is loaded onto a production cluster.
Production clusters are also configured in a disconnected network, which means that they are not always directly connected to the internet.
Because the clusters are in a disconnected network, the OpenShift Operators are configured for manual update during installation so that new versions can be managed on a cluster-by-cluster basis.
Perform the following procedure to move the Operators to the newer versions.
[role="_abstract"]
Software needs to be vetted before it is loaded onto a production cluster.
Production clusters are also quite often configured in a disconnected network, which means that they are not always directly connected to the internet.
Because the clusters are in a disconnected network, the {product-title} Operators are configured for manual update during installation so that new versions can be managed on a cluster-by-cluster basis.
Complete the following procedure to move the Operators to the newer versions.
.Procedure
+
. Check to see which Operators need to be updated:
+
[source,terminal]
@@ -86,6 +88,7 @@ replicaset.apps/metallb-operator-webhook-server-d76f9c6c8 0 0
----
.Verification
+
* Verify that the Operators do not need to be updated for a second time:
+
[source,terminal]

View File

@@ -1,16 +1,17 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/upgrading/telco-upgrade-completing-the-upgrade.adoc
// * edge_computing/day_2_core_cnf_clusters/upgrading/upgrade-completing-the-upgrade.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-updating-the-worker-nodes_{context}"]
[id="update-updating-the-worker-nodes_{context}"]
= Updating the worker nodes
[role="_abstract"]
After you have updated the control plane, you upgrade the worker nodes by unpausing the relevant `mcp` groups that you created.
Unpausing the `mcp` group starts the upgrade process for the worker nodes in that group.
Each of the worker nodes in the cluster reboots to upgrade to the new EUS, y-stream, or z-stream version as required.
In the case of Control Plane Only upgrades note that when a worker node is updated it will only require one reboot and will jump <y+2>-release versions. This is a feature that was added to decrease the amount of time that it takes to upgrade large bare-metal clusters.
In the case of control plane only upgrades, note that when a worker node is updated, it requires only one reboot and jumps <y+2> release versions. This feature was added to decrease the amount of time that it takes to upgrade large bare-metal clusters.
[IMPORTANT]
====
@@ -39,7 +40,7 @@ worker rendered-worker-f1ab7b9a768e1b0ac9290a18817f60f0 True False
[NOTE]
====
You decide how many `mcp` groups to upgrade at a time.
This depends on how many CNF pods can be taken down at a time and how your pod disruption budget and anti-affinity settings are configured.
This depends on how many pods can be taken down at a time and how your pod disruption budget and anti-affinity settings are configured.
====
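The unpausing step described above can be sketched as follows. The pool names are illustrative assumptions, and the live `oc` commands are shown as comments because they require a connected cluster session:

```shell
# Illustrative sketch: unpause custom worker pools one at a time so that only a
# bounded set of worker nodes reboots together. Pool names are assumptions.
pools="mcp-1 mcp-2"

for pool in $pools; do
  # On a live cluster you would run (requires a connected `oc` session):
  #   oc patch mcp/"$pool" --type merge --patch '{"spec":{"paused":false}}'
  #   oc wait mcp/"$pool" --for=condition=Updated --timeout=60m
  echo "unpaused $pool"
done
```

Unpausing pools sequentially, and waiting for each pool to report `Updated` before continuing, keeps the number of simultaneously rebooting workers within what your pod disruption budget and anti-affinity settings allow.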
. Get the list of nodes in the cluster:

View File

@@ -1,14 +1,16 @@
// Module included in the following assemblies:
//
// * edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-update.adoc
// * edge_computing/day_2_core_cnf_clusters/updating/update-completing-the-update.adoc
:_mod-docs-content-type: PROCEDURE
[id="telco-update-verifying-the-health-of-the-newly-updated-cluster_{context}"]
[id="update-verifying-the-health-of-the-newly-updated-cluster_{context}"]
= Verifying the health of the newly updated cluster
Run the following commands after updating the cluster to verify that the cluster is back up and running.
[role="_abstract"]
After updating the cluster, verify that the cluster is back up and running.
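A minimal post-update health sweep can be sketched as follows. The live commands are shown as comments, and the parsed listing is a fabricated example whose column layout is only illustrative:

```shell
# Illustrative sketch: flag any degraded cluster Operator after the update.
# On a live cluster the listing would come from commands such as:
#   oc get clusterversion     # VERSION shows the new release
#   oc get clusteroperators   # check AVAILABLE/PROGRESSING/DEGRADED
#   oc get nodes              # all nodes should be Ready
listing='dns True False False
mco True False True'

# Column 4 stands in for the DEGRADED column in this illustrative layout.
degraded=$(printf '%s\n' "$listing" | awk '$4=="True"{print $1}')

if [ -n "$degraded" ]; then
  echo "degraded operators: $degraded"
else
  echo "cluster healthy"
fi
```

The idea is simply to fail loudly if any Operator reports a degraded condition after the update, rather than declaring success on the version number alone.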
.Procedure
+
. Check the cluster version by running the following command:
+
[source,terminal]