From 5124259072cc5471a242df9eb8ff7f7772eb0871 Mon Sep 17 00:00:00 2001
From: xJustin
Date: Wed, 19 Mar 2025 09:24:21 -0400
Subject: [PATCH] OSDOCS-12261 autorepair

---
 modules/rosa-autorepair-cli.adoc              | 89 +++++++++++++++++++
 modules/rosa-autorepair-ocm.adoc              | 31 +++++++
 modules/rosa-configuring-autorepair.adoc      | 19 ++++
 modules/rosa-create-objects.adoc              |  3 +
 .../rosa-managing-worker-nodes.adoc           |  3 +
 5 files changed, 145 insertions(+)
 create mode 100644 modules/rosa-autorepair-cli.adoc
 create mode 100644 modules/rosa-autorepair-ocm.adoc
 create mode 100644 modules/rosa-configuring-autorepair.adoc

diff --git a/modules/rosa-autorepair-cli.adoc b/modules/rosa-autorepair-cli.adoc
new file mode 100644
index 0000000000..8e4df1b876
--- /dev/null
+++ b/modules/rosa-autorepair-cli.adoc
@@ -0,0 +1,89 @@
+// Module included in the following assemblies:
+//
+// * rosa_cluster_admin/rosa_nodes/rosa-managing-worker-nodes.adoc
+// * nodes/rosa-managing-worker-nodes.adoc
+
+
+:_mod-docs-content-type: PROCEDURE
+[id="rosa-autorepair-cli_{context}"]
+= Configuring machine pool AutoRepair using the ROSA CLI
+
+You can configure machine pool AutoRepair for your {product-title} cluster by using the ROSA CLI.
+
+
+.Prerequisites
+
+
+* You installed and configured the latest AWS (`aws`) and ROSA (`rosa`) CLIs on your workstation.
+* You logged in to your Red{nbsp}Hat account by using the `rosa` CLI.
+* You created a {hcp-title} cluster.
+* You have an existing machine pool.
+
+.Procedure
+
+. List the machine pools in the cluster by running the following command:
++
+[source,terminal]
+----
+$ rosa list machinepools --cluster=
+----
++
+.Example output
+[source,terminal]
+----
+ID           AUTOSCALING  REPLICAS  INSTANCE TYPE  LABELS    TAINTS    AVAILABILITY ZONE  SUBNET                    VERSION  AUTOREPAIR
+workers      No           2/2       m5.xlarge                          us-east-2a         subnet-0df2ec3377847164f  4.16.6   Yes
+db-nodes-mp  No           2/2       m5.xlarge                          us-east-2a         subnet-0df2ec3377847164f  4.16.6   Yes
+----
+
+. Enable or disable AutoRepair on a machine pool:
+
+* To enable or disable AutoRepair for a machine pool, run the following command with `--autorepair` set to `true` or `false`. This example disables AutoRepair:
++
+[source,terminal]
+----
+$ rosa edit machinepool --cluster=mycluster --machinepool= --autorepair false
+----
++
+.Example output
+[source,terminal]
+----
+I: Updated machine pool 'machinepool_name' on cluster 'mycluster'
+----
+
+
+.Verification
+
+. Describe the details of the machine pool:
++
+[source,terminal]
+----
+$ rosa describe machinepool --cluster= --machinepool=
+----
++
+.Example output
+[source,terminal]
+----
+ID: machinepool_name
+Cluster ID:
+Autoscaling: No
+Desired replicas: 2
+Current replicas: 2
+Instance type: m5.xlarge
+Labels:
+Tags:
+Taints:
+Availability zone: us-east-2a
+...
+Autorepair: Yes
+Tuning configs:
+Kubelet configs:
+Additional security group IDs:
+Node drain grace period:
+Management upgrade:
+ - Type: Replace
+ - Max surge: 1
+ - Max unavailable: 0
+----
+
+. Verify that the AutoRepair setting is correct for your machine pool in the output.
diff --git a/modules/rosa-autorepair-ocm.adoc b/modules/rosa-autorepair-ocm.adoc
new file mode 100644
index 0000000000..7315863cdd
--- /dev/null
+++ b/modules/rosa-autorepair-ocm.adoc
@@ -0,0 +1,31 @@
+// Module included in the following assemblies:
+//
+// * rosa_cluster_admin/rosa_nodes/rosa-managing-worker-nodes.adoc
+// * nodes/rosa-managing-worker-nodes.adoc
+
+
+:_mod-docs-content-type: PROCEDURE
+[id="rosa-autorepair-ocm_{context}"]
+= Configuring AutoRepair on a machine pool using {cluster-manager}
+
+You can configure machine pool AutoRepair for your {product-title} cluster by using {cluster-manager-first}.
+
+.Prerequisites
+
+* You created a {hcp-title} cluster.
+* You have an existing machine pool.
+
+.Procedure
+
+
+. Navigate to {cluster-manager-url} and select your cluster.
+. Under the *Machine pools* tab, click the Options menu {kebab} for the machine pool that you want to configure AutoRepair for.
+. From the menu, select *Edit*.
+. In the *Edit Machine Pool* dialog box that opens, find the *AutoRepair* option.
+. Select or clear the checkbox next to *AutoRepair* to enable or disable it.
+. Click *Save* to apply the change to the machine pool.
+
+.Verification
+
+. Under the *Machine pools* tab, select *>* next to your machine pool to expand the view.
+. Verify that your machine pool has the correct *AutoRepair* setting in the expanded view.
\ No newline at end of file
diff --git a/modules/rosa-configuring-autorepair.adoc b/modules/rosa-configuring-autorepair.adoc
new file mode 100644
index 0000000000..94d9f44d86
--- /dev/null
+++ b/modules/rosa-configuring-autorepair.adoc
@@ -0,0 +1,19 @@
+// Module included in the following assemblies:
+//
+// * rosa_cluster_admin/rosa_nodes/rosa-managing-worker-nodes.adoc
+// * nodes/rosa-managing-worker-nodes.adoc
+//
+
+:_mod-docs-content-type: PROCEDURE
+[id="rosa-configuring-autorepair_{context}"]
+= Configuring machine pool AutoRepair
+
+{hcp-title} supports an automatic repair process for machine pools, called AutoRepair. AutoRepair is useful when you want the ROSA service to detect certain unhealthy nodes, drain the unhealthy nodes, and re-create the nodes. You can disable AutoRepair if the unhealthy nodes should not be replaced, such as in cases where the nodes should be preserved. AutoRepair is enabled by default on machine pools.
+
+The AutoRepair process deems a node unhealthy when the node is in either the `NotReady` state or an unknown state for a predefined amount of time (typically 8 minutes). Whenever two or more nodes become unhealthy simultaneously, the AutoRepair process stops repairing the nodes.
+Similarly, when a newly created node is still unhealthy after a predefined amount of time (typically 20 minutes), the service repairs the node.
+
+[NOTE]
+====
+Machine pool AutoRepair is available only for {hcp-title} clusters.
+====
diff --git a/modules/rosa-create-objects.adoc b/modules/rosa-create-objects.adoc
index 07c82fbc43..4c888f4fa9 100644
--- a/modules/rosa-create-objects.adoc
+++ b/modules/rosa-create-objects.adoc
@@ -796,6 +796,9 @@ The default value is `0`, meaning that no outdated nodes are removed before new
 
 |--taints
 |Taints for the machine pool. This string value should be formatted as a comma-separated list of `key=value:ScheduleType`. This list will overwrite any modifications made to Node taints on an ongoing basis.
+
+|--autorepair
+|AutoRepair setting for the machine pool represented as the boolean `true` or `false`.
 |===
 
 .Optional arguments inherited from parent commands
diff --git a/rosa_cluster_admin/rosa_nodes/rosa-managing-worker-nodes.adoc b/rosa_cluster_admin/rosa_nodes/rosa-managing-worker-nodes.adoc
index 1dfff876c0..3710566e54 100644
--- a/rosa_cluster_admin/rosa_nodes/rosa-managing-worker-nodes.adoc
+++ b/rosa_cluster_admin/rosa_nodes/rosa-managing-worker-nodes.adoc
@@ -50,6 +50,9 @@ include::modules/rosa-adding-tags-cli.adoc[leveloffset=+2]
 include::modules/rosa-adding-taints.adoc[leveloffset=+1]
 include::modules/rosa-adding-taints-ocm.adoc[leveloffset=+2]
 include::modules/rosa-adding-taints-cli.adoc[leveloffset=+2]
+include::modules/rosa-configuring-autorepair.adoc[leveloffset=+1]
+include::modules/rosa-autorepair-ocm.adoc[leveloffset=+2]
+include::modules/rosa-autorepair-cli.adoc[leveloffset=+2]
 
 ifdef::openshift-rosa-hcp[]
 include::modules/rosa-adding-tuning.adoc[leveloffset=+1]