mirror of https://github.com/openshift/openshift-docs.git synced 2026-02-05 12:46:18 +01:00

OSDOCS-12261 autorepair

This commit is contained in:
xJustin
2025-03-19 09:24:21 -04:00
committed by openshift-cherrypick-robot
parent 3450bef19c
commit 5124259072
5 changed files with 145 additions and 0 deletions

@@ -0,0 +1,89 @@
// Module included in the following assemblies:
//
// * rosa_cluster_admin/rosa_nodes/rosa-managing-worker-nodes.adoc
// * nodes/rosa-managing-worker-nodes.adoc
:_mod-docs-content-type: PROCEDURE
[id="rosa-autorepair-cli_{context}"]
= Configuring machine pool AutoRepair using the ROSA CLI
You can configure machine pool AutoRepair for your {product-title} cluster by using the ROSA CLI.
.Prerequisites
* You installed and configured the latest AWS (`aws`) and ROSA (`rosa`) CLIs on your workstation.
* You logged in to your Red{nbsp}Hat account by using the `rosa` CLI.
* You created a {hcp-title} cluster.
* You have an existing machine pool.
.Procedure
. List the machine pools in the cluster by running the following command:
+
[source,terminal]
----
$ rosa list machinepools --cluster=<cluster_name>
----
+
.Example output
[source,terminal]
----
ID           AUTOSCALING  REPLICAS  INSTANCE TYPE  LABELS  TAINTS  AVAILABILITY ZONE  SUBNET                    VERSION  AUTOREPAIR
workers      No           2/2       m5.xlarge                      us-east-2a         subnet-0df2ec3377847164f  4.16.6   Yes
db-nodes-mp  No           2/2       m5.xlarge                      us-east-2a         subnet-0df2ec3377847164f  4.16.6   Yes
----
. Enable or disable AutoRepair on a machine pool by running the following command. Specify `true` to enable AutoRepair or `false` to disable it. The following example disables AutoRepair:
+
[source,terminal]
----
$ rosa edit machinepool --cluster=<cluster_name> --machinepool=<machinepool_name> --autorepair=false
----
+
.Example output
[source,terminal]
----
I: Updated machine pool '<machinepool_name>' on cluster '<cluster_name>'
----
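
The machine pool listing from the first step can also be filtered to find pools where AutoRepair is currently disabled. The following is a minimal sketch, not part of the ROSA CLI: the function name `pools_without_autorepair` is hypothetical, and it assumes that `AUTOREPAIR` is the last column of the `rosa list machinepools` output, as in the example output above.

```shell
# Hypothetical helper: reads `rosa list machinepools` output on stdin and
# prints the ID of every machine pool whose last column (AUTOREPAIR) is "No".
# Assumes the first line is the header row and AUTOREPAIR is the last column.
pools_without_autorepair() {
  awk 'NR > 1 && $NF == "No" { print $1 }'
}

# Example usage (requires a logged-in rosa CLI):
#   rosa list machinepools --cluster=<cluster_name> | pools_without_autorepair
```

If the column layout differs in your version of the CLI, adjust the field that the `awk` expression inspects.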
.Verification
. Describe the details of the machine pool:
+
[source,terminal]
----
$ rosa describe machinepool --cluster=<cluster_name> --machinepool=<machinepool_name>
----
+
.Example output
[source,terminal]
----
ID:                                    machinepool_name
Cluster ID:                            <ID_of_cluster>
Autoscaling:                           No
Desired replicas:                      2
Current replicas:                      2
Instance type:                         m5.xlarge
Labels:
Tags:
Taints:
Availability zone:                     us-east-2a
...
Autorepair:                            Yes
Tuning configs:
Kubelet configs:
Additional security group IDs:
Node drain grace period:
Management upgrade:
 - Type:                               Replace
 - Max surge:                          1
 - Max unavailable:                    0
----
. Verify that the AutoRepair setting is correct for your machine pool in the output.
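
For scripted checks, the `Autorepair` field can be extracted from the describe output instead of being read by eye. The following is a minimal sketch under stated assumptions: the function name `autorepair_status` is hypothetical, and it assumes the field is printed as `Autorepair: <value>` at the start of a line, as in the example output above.

```shell
# Hypothetical helper: reads `rosa describe machinepool` output on stdin and
# prints the value of the "Autorepair" field (for example "Yes" or "No").
# The field separator is a colon followed by any number of spaces.
autorepair_status() {
  awk -F': *' '$1 == "Autorepair" { print $2; exit }'
}

# Example usage (requires a logged-in rosa CLI):
#   rosa describe machinepool --cluster=<cluster_name> \
#     --machinepool=<machinepool_name> | autorepair_status
```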

@@ -0,0 +1,31 @@
// Module included in the following assemblies:
//
// * rosa_cluster_admin/rosa_nodes/rosa-managing-worker-nodes.adoc
// * nodes/rosa-managing-worker-nodes.adoc
:_mod-docs-content-type: PROCEDURE
[id="rosa-autorepair-ocm_{context}"]
= Configuring AutoRepair on a machine pool using {cluster-manager}
You can configure machine pool AutoRepair for your {product-title} cluster by using {cluster-manager-first}.
.Prerequisites
* You created a {hcp-title} cluster.
* You have an existing machine pool.
.Procedure
. Navigate to {cluster-manager-url} and select your cluster.
. Under the *Machine pools* tab, click the Options menu {kebab} for the machine pool for which you want to configure AutoRepair.
. From the menu, select *Edit*.
. In the *Edit Machine Pool* dialog box, find the *AutoRepair* option.
. Select or clear the checkbox next to *AutoRepair* to enable or disable AutoRepair.
. Click *Save* to apply the change to the machine pool.
.Verification
. Under the *Machine pools* tab, select *>* next to your machine pool to expand the view.
. Verify that your machine pool has the correct *AutoRepair* setting in the expanded view.

@@ -0,0 +1,19 @@
// Module included in the following assemblies:
//
// * rosa_cluster_admin/rosa_nodes/rosa-managing-worker-nodes.adoc
// * nodes/rosa-managing-worker-nodes.adoc
//
:_mod-docs-content-type: CONCEPT
[id="rosa-configuring-autorepair_{context}"]
= Configuring machine pool AutoRepair
{hcp-title} supports an automatic repair process for machine pools, called AutoRepair. AutoRepair is useful when you want the ROSA service to detect certain unhealthy nodes, drain the unhealthy nodes, and re-create the nodes. You can disable AutoRepair if the unhealthy nodes should not be replaced, such as in cases where the nodes should be preserved. AutoRepair is enabled by default on machine pools.
The AutoRepair process deems a node unhealthy when the node is in the `NotReady` state or in an unknown state for a predefined amount of time (typically 8 minutes). Whenever two or more nodes become unhealthy simultaneously, the AutoRepair process stops repairing the nodes.
Similarly, when a newly created node remains unhealthy after a predefined amount of time (typically 20 minutes), the service repairs the node.
[NOTE]
====
Machine pool AutoRepair is only available for {hcp-title} clusters.
====
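
The rules described above can be summarized as a small decision function. The following is only an illustrative sketch and not the ROSA service implementation: the function name `autorepair_decision`, its arguments, and its output strings are hypothetical, and the 8-minute and 20-minute thresholds are the typical values stated above rather than guaranteed ones.

```shell
# Hypothetical model of the documented AutoRepair rules (illustration only).
# Arguments:
#   $1  number of nodes that are unhealthy at the same time
#   $2  node state, for example Ready, NotReady, or Unknown
#   $3  minutes the node has spent in that state
#   $4  "true" if the node was newly created, otherwise "false"
autorepair_decision() {
  # Two or more simultaneously unhealthy nodes: repairs stop.
  if [ "$1" -ge 2 ]; then
    echo "paused"
    return 0
  fi
  # Assumed typical thresholds: 20 minutes for new nodes, 8 minutes otherwise.
  if [ "$4" = "true" ]; then
    threshold=20
  else
    threshold=8
  fi
  case "$2" in
    NotReady|Unknown)
      if [ "$3" -ge "$threshold" ]; then
        echo "repair"
      else
        echo "wait"
      fi
      ;;
    *)
      echo "healthy"
      ;;
  esac
}
```

For example, under these assumptions `autorepair_decision 1 NotReady 9 false` prints `repair`, while `autorepair_decision 2 NotReady 30 false` prints `paused`.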

@@ -796,6 +796,9 @@ The default value is `0`, meaning that no outdated nodes are removed before new
|--taints
|Taints for the machine pool. This string value should be formatted as a comma-separated list of `key=value:ScheduleType`. This list will overwrite any modifications made to Node taints on an ongoing basis.
|--autorepair
|Enables or disables AutoRepair for the machine pool. Specify the boolean value `true` or `false`.
|===
.Optional arguments inherited from parent commands

@@ -50,6 +50,9 @@ include::modules/rosa-adding-tags-cli.adoc[leveloffset=+2]
include::modules/rosa-adding-taints.adoc[leveloffset=+1]
include::modules/rosa-adding-taints-ocm.adoc[leveloffset=+2]
include::modules/rosa-adding-taints-cli.adoc[leveloffset=+2]
include::modules/rosa-configuring-autorepair.adoc[leveloffset=+1]
include::modules/rosa-autorepair-ocm.adoc[leveloffset=+2]
include::modules/rosa-autorepair-cli.adoc[leveloffset=+2]
ifdef::openshift-rosa-hcp[]
include::modules/rosa-adding-tuning.adoc[leveloffset=+1]