From 1ff4b94a986965ff15c2bb2605ce370555401ade Mon Sep 17 00:00:00 2001
From: srir
Date: Mon, 7 Apr 2025 15:26:21 +0530
Subject: [PATCH] TELCODOCS#2230: Coordinating reboots for configuration changes

---
 ...ring-managed-clusters-policygenerator.adoc |  7 ++
 modules/defer-applicaton-tuning-example.adoc  |  7 +-
 ...ordinating-reboots-for-config-changes.adoc | 93 +++++++++++++++++++
 .../using-node-tuning-operator.adoc           |  4 +
 4 files changed, 110 insertions(+), 1 deletion(-)
 create mode 100644 modules/ztp-coordinating-reboots-for-config-changes.adoc

diff --git a/edge_computing/policygenerator_for_ztp/ztp-configuring-managed-clusters-policygenerator.adoc b/edge_computing/policygenerator_for_ztp/ztp-configuring-managed-clusters-policygenerator.adoc
index 2ecca7fa64..8e16890c34 100644
--- a/edge_computing/policygenerator_for_ztp/ztp-configuring-managed-clusters-policygenerator.adoc
+++ b/edge_computing/policygenerator_for_ztp/ztp-configuring-managed-clusters-policygenerator.adoc
@@ -55,6 +55,13 @@ include::modules/ztp-customizing-a-managed-site-using-pgt.adoc[leveloffset=+1]
 
 include::modules/ztp-monitoring-policy-deployment-progress.adoc[leveloffset=+1]
 
+include::modules/ztp-coordinating-reboots-for-config-changes.adoc[leveloffset=+1]
+
+[role="_additional-resources"]
+.Additional resources
+
+* xref:../../edge_computing/policygenerator_for_ztp/ztp-configuring-managed-clusters-policygenerator.adoc#ztp-customizing-a-managed-site-using-pgt_ztp-configuring-managed-clusters-policygenerator[Customizing a managed cluster with PolicyGenerator CRs]
+
 include::modules/ztp-validating-the-generation-of-configuration-policy-crs.adoc[leveloffset=+1]
 
 include::modules/ztp-restarting-policies-reconciliation.adoc[leveloffset=+1]
diff --git a/modules/defer-applicaton-tuning-example.adoc b/modules/defer-applicaton-tuning-example.adoc
index 2f7f14ede4..b23926ce79 100644
--- a/modules/defer-applicaton-tuning-example.adoc
+++ b/modules/defer-applicaton-tuning-example.adoc
@@ -55,4 +55,9 @@ spec:
 <1> The `include` directive is used to inherit the `openshift-node-performance-performance` profile. This is a best practice to ensure that the profile is not missing any required settings.
 <2> The `kernel.shmmni` sysctl parameter is being changed to `8192`.
-<3> The `machineConfigLabels` field is used to target the `worker-cnf` role. Configure a `MachineConfigPool` resource to ensure the profile is applied only to the correct nodes.
\ No newline at end of file
+<3> The `machineConfigLabels` field is used to target the `worker-cnf` role. Configure a `MachineConfigPool` resource to ensure the profile is applied only to the correct nodes.
+
+[NOTE]
+====
+You can use {cgu-operator-full} to perform a controlled reboot across a fleet of spoke clusters to apply a deferred tuning change. For more information about coordinated reboots, see "Coordinating reboots for configuration changes".
+====
\ No newline at end of file
diff --git a/modules/ztp-coordinating-reboots-for-config-changes.adoc b/modules/ztp-coordinating-reboots-for-config-changes.adoc
new file mode 100644
index 0000000000..64f8f8d063
--- /dev/null
+++ b/modules/ztp-coordinating-reboots-for-config-changes.adoc
@@ -0,0 +1,93 @@
+// Module included in the following assemblies:
+//
+// * edge_computing/policygenerator_for_ztp/ztp-configuring-managed-clusters-policygenerator.adoc
+
+:_mod-docs-content-type: PROCEDURE
+[id="ztp-coordinating-reboots-for-config-changes_{context}"]
+= Coordinating reboots for configuration changes
+
+You can use {cgu-operator-full} (TALM) to coordinate reboots across a fleet of spoke clusters when configuration changes require a reboot, such as deferred tuning changes. {cgu-operator} reboots all nodes in the targeted `MachineConfigPool` on the selected clusters when the reboot policy is applied.
+
+Instead of rebooting nodes after each individual change, you can apply all configuration updates through policies and then trigger a single, coordinated reboot.
+
+.Prerequisites
+
+* You have installed the {oc-first}.
+* You have logged in to the hub cluster as a user with `cluster-admin` privileges.
+* You have deployed and configured {cgu-operator}.
+
+.Procedure
+
+. Generate the configuration policies by creating a `PolicyGenerator` custom resource (CR). You can use one of the following sample manifests:
+
+* `out/argocd/example/acmpolicygenerator/acm-example-sno-reboot`
+* `out/argocd/example/acmpolicygenerator/acm-example-multinode-reboot`
+
+. Update the `policyDefaults.placement.labelSelector` field in the `PolicyGenerator` CR to target the clusters that you want to reboot. Modify other fields as necessary for your use case.
++
+If you are coordinating a reboot to apply a deferred tuning change, ensure that the `MachineConfigPool` in the reboot policy matches the value specified in the `spec.recommend` field in the `Tuned` object.
+
+. Apply the `PolicyGenerator` CR to generate and apply the configuration policies. For detailed steps, see "Customizing a managed cluster with PolicyGenerator CRs".
+
+. After Argo CD completes syncing the policies, create and apply the `ClusterGroupUpgrade` (CGU) CR.
++
+.Example CGU custom resource configuration
+[source,yaml]
+----
+apiVersion: ran.openshift.io/v1alpha1
+kind: ClusterGroupUpgrade
+metadata:
+  name: reboot
+  namespace: default
+spec:
+  clusterLabelSelectors:
+  - matchLabels: <1>
+# ...
+  enable: true
+  managedPolicies: <2>
+  - example-reboot
+  remediationStrategy:
+    timeout: 300 <3>
+    maxConcurrency: 10
+# ...
+----
+<1> Configure the labels that match the clusters you want to reboot.
+<2> Add all required configuration policies before the reboot policy. {cgu-operator} applies the configuration changes as specified in the policies, in the order they are listed.
+<3> Specify the timeout in minutes for the entire upgrade across all selected clusters. Set this value to allow for the worst-case scenario.
+
+. After you apply the CGU custom resource, {cgu-operator} rolls out the configuration policies in order. After all policies are compliant, {cgu-operator} applies the reboot policy and triggers a reboot of all nodes in the specified `MachineConfigPool`.
+
+.Verification
+
+. Monitor the CGU rollout status.
++
+You can monitor the rollout of the CGU custom resource on the hub by checking the status. Verify the successful rollout of the reboot by running the following command:
++
+[source,terminal]
+----
+oc get cgu -A
+----
++
+.Example output
+[source,terminal]
+----
+NAMESPACE   NAME     AGE   STATE       DETAILS
+default     reboot   1d    Completed   All clusters are compliant with all the managed policies
+----
+
+. Verify that the reboot succeeded on a specific node.
++
+To confirm that the reboot was successful on a specific node, check the status of the `MachineConfigPool` (MCP) for the node by running the following command:
++
+[source,terminal]
+----
+oc get mcp master
+
+----
++
+.Example output
+[source,terminal]
+----
+NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
+master   rendered-master-be5785c3b98eb7a1ec902fef2b81e865   True      False      False      3              3                   3                     0                      72d
+----
\ No newline at end of file
diff --git a/scalability_and_performance/using-node-tuning-operator.adoc b/scalability_and_performance/using-node-tuning-operator.adoc
index 4bbfc51033..9fa2d534b8 100644
--- a/scalability_and_performance/using-node-tuning-operator.adoc
+++ b/scalability_and_performance/using-node-tuning-operator.adoc
@@ -23,6 +23,10 @@ include::modules/custom-tuning-example.adoc[leveloffset=+1]
 
 include::modules/defer-applicaton-tuning-example.adoc[leveloffset=+1]
 
+[role="_additional-resources"]
+.Additional resources
+* xref:../edge_computing/policygenerator_for_ztp/ztp-configuring-managed-clusters-policygenerator.adoc#ztp-coordinating-reboots-for-config-changes_ztp-configuring-managed-clusters-policygenerator[Coordinating reboots for configuration changes]
+
 include::modules/defer-application-tuning-proc.adoc[leveloffset=+2]
 
 include::modules/node-tuning-operator-supported-tuned-daemon-plug-ins.adoc[leveloffset=+1]
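
The cluster targeting that step 2 of the new module describes can be sketched as follows. This is a minimal, hypothetical `PolicyGenerator` fragment, not the content of the `acm-example-sno-reboot` or `acm-example-multinode-reboot` samples. The namespace, the `reboot-group` label, and the manifest path are placeholder assumptions; only the policy name `example-reboot` is taken from the `managedPolicies` list in the example CGU.

[source,yaml]
----
# Hypothetical sketch: namespace, label key, and manifest path are placeholders.
apiVersion: policy.open-cluster-management.io/v1
kind: PolicyGenerator
metadata:
  name: example-reboot
policyDefaults:
  namespace: ztp-policies            # assumed policy namespace
  placement:
    labelSelector:                   # selects the clusters to reboot
      matchExpressions:
        - key: reboot-group          # assumed cluster label
          operator: In
          values:
            - "true"
  remediationAction: inform          # left as inform so that the CGU drives enforcement
policies:
  - name: example-reboot             # referenced by managedPolicies in the CGU
    manifests:
      - path: <path_to_reboot_source_cr>   # placeholder, not a real path
----

If a label such as the assumed `reboot-group` is used here, the same clusters would need to match `clusterLabelSelectors` in the CGU so that both resources target one set of clusters.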