OSDOCS-12261 autorepair

2026-02-05 12:46:18 +01:00 · 2025-03-19 09:24:21 -04:00
parent 3450bef19c
commit 5124259072
5 changed files with 145 additions and 0 deletions
--- a/modules/rosa-autorepair-cli.adoc
+++ b/modules/rosa-autorepair-cli.adoc
@@ -0,0 +1,89 @@
+// Module included in the following assemblies:
+//
+// * rosa_cluster_admin/rosa_nodes/rosa-managing-worker-nodes.adoc
+// * nodes/rosa-managing-worker-nodes.adoc
+
+
+:_mod-docs-content-type: PROCEDURE
+[id="rosa-autorepair-cli_{context}"]
+= Configuring machine pool AutoRepair using the ROSA CLI
+
+You can configure machine pool AutoRepair for your {hcp-title} cluster by using the ROSA CLI.
+
+
+.Prerequisites
+
+
+* You installed and configured the latest AWS (`aws`) and ROSA (`rosa`) CLIs on your workstation.
+* You logged in to your Red{nbsp}Hat account by using the `rosa` CLI.
+* You created a {hcp-title} cluster.
+* You have an existing machine pool.
+
+.Procedure
+
+. List the machine pools in the cluster by running the following command:
++
+[source,terminal]
+----
+$ rosa list machinepools --cluster=<cluster_name>
+----
++
+.Example output
+[source,terminal]
+----
+ID           AUTOSCALING  REPLICAS  INSTANCE TYPE  LABELS    TAINTS    AVAILABILITY ZONE  SUBNET                    VERSION  AUTOREPAIR  
+workers      No           2/2       m5.xlarge                          us-east-2a         subnet-0df2ec3377847164f  4.16.6   Yes         
+db-nodes-mp  No           2/2       m5.xlarge                          us-east-2a         subnet-0df2ec3377847164f  4.16.6   Yes  
+----
+
+. Enable or disable AutoRepair on a machine pool by running the following command:
++
+[source,terminal]
+----
+$ rosa edit machinepool --cluster=<cluster_name> --machinepool=<machinepool_name> --autorepair=false
+----
++
+This example disables AutoRepair. To enable AutoRepair, set `--autorepair=true`.
++
+.Example output
+[source,terminal]
+----
+I: Updated machine pool '<machinepool_name>' on cluster '<cluster_name>'
+----
+
+
+.Verification
+
+. Describe the details of the machine pool by running the following command:
++
+[source,terminal]
+----
+$ rosa describe machinepool --cluster=<cluster_name> --machinepool=<machinepool_name>
+----
++
+.Example output
+[source,terminal]
+----
+ID:                            <machinepool_name>
+Cluster ID:                    <ID_of_cluster>
+Autoscaling:                   No
+Desired replicas:              2
+Current replicas:              2
+Instance type:                 m5.xlarge
+Labels:                        
+Tags:                              
+Taints:                               
+Availability zone:             us-east-2a
+...
+Autorepair:                    Yes
+Tuning configs:
+Kubelet configs:
+Additional security group IDs:
+Node drain grace period:
+Management upgrade:
+ - Type:                               Replace
+ - Max surge:                          1
+ - Max unavailable:                    0
+----
+
+. In the output, verify that the `Autorepair` field shows the expected value for your machine pool.
--- a/modules/rosa-autorepair-ocm.adoc
+++ b/modules/rosa-autorepair-ocm.adoc
@@ -0,0 +1,31 @@
+// Module included in the following assemblies:
+//
+// * rosa_cluster_admin/rosa_nodes/rosa-managing-worker-nodes.adoc
+// * nodes/rosa-managing-worker-nodes.adoc
+
+
+:_mod-docs-content-type: PROCEDURE
+[id="rosa-autorepair-ocm_{context}"]
+= Configuring machine pool AutoRepair using {cluster-manager}
+
+You can configure machine pool AutoRepair for your {hcp-title} cluster by using {cluster-manager-first}.
+
+.Prerequisites
+
+* You created a {hcp-title} cluster.
+* You have an existing machine pool.
+
+.Procedure
+
+
+. Navigate to {cluster-manager-url} and select your cluster.
+. Under the *Machine pools* tab, click the Options menu {kebab} for the machine pool that you want to configure AutoRepair for.
+. From the menu, select *Edit*.
+. In the *Edit Machine Pool* dialog box, find the *AutoRepair* option.
+. Select the *AutoRepair* checkbox to enable AutoRepair, or clear the checkbox to disable AutoRepair.
+. Click *Save* to apply the change to the machine pool.
+
+.Verification
+
+. Under the *Machine pools* tab, select *>* next to your machine pool to expand the view.
+. Verify that your machine pool has the correct *AutoRepair* setting in the expanded view.
--- a/modules/rosa-configuring-autorepair.adoc
+++ b/modules/rosa-configuring-autorepair.adoc
@@ -0,0 +1,19 @@
+// Module included in the following assemblies:
+//
+// * rosa_cluster_admin/rosa_nodes/rosa-managing-worker-nodes.adoc
+// * nodes/rosa-managing-worker-nodes.adoc
+//
+
+:_mod-docs-content-type: CONCEPT
+[id="rosa-configuring-autorepair_{context}"]
+= Configuring machine pool AutoRepair
+
+{hcp-title} supports an automatic repair process for machine pools, called AutoRepair. AutoRepair is useful when you want the ROSA service to detect certain unhealthy nodes, drain them, and re-create them. AutoRepair is enabled by default on machine pools. You can disable AutoRepair if you do not want unhealthy nodes to be replaced, for example, when you need to preserve those nodes.
+
+The AutoRepair process considers a node unhealthy when the node is in the `NotReady` state or in an unknown state for a predefined amount of time, typically 8 minutes. AutoRepair also repairs a newly created node that does not become healthy within a predefined amount of time, typically 20 minutes.
+If two or more nodes become unhealthy at the same time, the AutoRepair process stops repairing nodes.
+
+[NOTE]
+====
+Machine pool AutoRepair is available only for {hcp-title} clusters.
+====
--- a/modules/rosa-create-objects.adoc
+++ b/modules/rosa-create-objects.adoc
@@ -796,6 +796,9 @@ The default value is `0`, meaning that no outdated nodes are removed before new

 |--taints
 |Taints for the machine pool. This string value should be formatted as a comma-separated list of `key=value:ScheduleType`. This list will overwrite any modifications made to Node taints on an ongoing basis.
+
+|--autorepair
+|Enables (`true`) or disables (`false`) AutoRepair for the machine pool. AutoRepair is enabled by default and is available only for {hcp-title} clusters.
 |===

 .Optional arguments inherited from parent commands
--- a/rosa_cluster_admin/rosa_nodes/rosa-managing-worker-nodes.adoc
+++ b/rosa_cluster_admin/rosa_nodes/rosa-managing-worker-nodes.adoc
@@ -50,6 +50,9 @@ include::modules/rosa-adding-tags-cli.adoc[leveloffset=+2]
 include::modules/rosa-adding-taints.adoc[leveloffset=+1]
 include::modules/rosa-adding-taints-ocm.adoc[leveloffset=+2]
 include::modules/rosa-adding-taints-cli.adoc[leveloffset=+2]
+include::modules/rosa-configuring-autorepair.adoc[leveloffset=+1]
+include::modules/rosa-autorepair-ocm.adoc[leveloffset=+2]
+include::modules/rosa-autorepair-cli.adoc[leveloffset=+2]

 ifdef::openshift-rosa-hcp[]
 include::modules/rosa-adding-tuning.adoc[leveloffset=+1]