
OSDOCS#8867: Adding failure domains to Nutanix docs

Mike Pytlak
2023-12-19 13:58:01 -05:00
committed by openshift-cherrypick-robot
parent 33878ed069
commit 3940c53533
11 changed files with 473 additions and 2 deletions


@@ -307,6 +307,8 @@ Topics:
Topics:
- Name: Preparing to install on Nutanix
  File: preparing-to-install-on-nutanix
- Name: Fault tolerant deployments
  File: nutanix-failure-domains
- Name: Installing a cluster on Nutanix
  File: installing-nutanix-installer-provisioned
- Name: Installing a cluster on Nutanix in a restricted network
@@ -606,6 +608,9 @@ Topics:
- Name: AWS Local Zone tasks
  File: aws-compute-edge-tasks
  Distros: openshift-enterprise
- Name: Adding failure domains to an existing Nutanix cluster
  File: adding-nutanix-failure-domains
  Distros: openshift-origin,openshift-enterprise
---
Name: Updating clusters
Dir: updating


@@ -47,6 +47,7 @@ include::modules/installation-initializing.adoc[leveloffset=+1]
* xref:../../installing/installing_nutanix/installation-config-parameters-nutanix.adoc#installation-config-parameters-nutanix[Installation configuration parameters for Nutanix]
include::modules/installation-nutanix-config-yaml.adoc[leveloffset=+2]
include::modules/installation-configuring-nutanix-failure-domains.adoc[leveloffset=+2]
include::modules/installation-configure-proxy.adoc[leveloffset=+2]
include::modules/cli-installing-cli.adoc[leveloffset=+1]


@@ -46,6 +46,7 @@ include::modules/installation-initializing.adoc[leveloffset=+1]
* xref:../../installing/installing_nutanix/installation-config-parameters-nutanix.adoc#installation-config-parameters-nutanix[Installation configuration parameters for Nutanix]
include::modules/installation-nutanix-config-yaml.adoc[leveloffset=+2]
include::modules/installation-configuring-nutanix-failure-domains.adoc[leveloffset=+2]
include::modules/installation-configure-proxy.adoc[leveloffset=+2]
include::modules/cli-installing-cli.adoc[leveloffset=+1]


@@ -0,0 +1,26 @@
:_mod-docs-content-type: ASSEMBLY
[id="nutanix-failure-domains"]
= Fault tolerant deployments using multiple Prism Elements
include::_attributes/common-attributes.adoc[]
:context: nutanix-failure-domains
toc::[]

By default, the installation program installs control plane and compute machines into a single Nutanix Prism Element (cluster). To improve the fault tolerance of your {product-title} cluster, you can specify that these machines be distributed across multiple Nutanix clusters by configuring failure domains.

A failure domain represents an additional Prism Element instance that is available to {product-title} machine pools during and after installation.
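
For example, a failure domain is defined in the `install-config.yaml` file by a stanza similar to the following sketch. The placeholder values are illustrative; the full syntax is described in "Configuring failure domains".

[source,yaml]
----
platform:
  nutanix:
    failureDomains:
    - name: <failure_domain_name>
      prismElement:
        name: <prism_element_name>
        uuid: <prism_element_uuid>
      subnetUUIDs:
      - <network_uuid>
----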
include::modules/installation-nutanix-failure-domains-req.adoc[leveloffset=+1]

== Installation method and failure domain configuration

The {product-title} installation method determines how and when you configure failure domains:

* If you deploy using installer-provisioned infrastructure, you can configure failure domains in the installation configuration file before deploying the cluster. For more information, see xref:../../installing/installing_nutanix/installing-nutanix-installer-provisioned.adoc#installation-configuring-nutanix-failure-domains_installing-nutanix-installer-provisioned[Configuring failure domains].
+
You can also configure failure domains after the cluster is deployed.

* If you deploy using the {ai-full}, you configure failure domains after the cluster is deployed.
+
For more information about configuring failure domains post-installation, see xref:../../post_installation_configuration/adding-nutanix-failure-domains.adoc#adding-failure-domains-to-an-existing-nutanix-cluster[Adding failure domains to an existing Nutanix cluster].

* If you deploy using infrastructure that you manage (user-provisioned infrastructure), no additional configuration is required. After the cluster is deployed, you can manually distribute control plane and compute machines across failure domains.


@@ -3089,6 +3089,17 @@ Additional Nutanix configuration parameters are described in the following table
|The value of a prism category key-value pair to apply to compute VMs. This parameter must be accompanied by the `key` parameter, and both `key` and `value` parameters must exist in Prism Central.
|String
|compute:
  platform:
    nutanix:
      failureDomains:
d|The failure domains that apply only to compute machines.

Failure domains are specified in `platform.nutanix.failureDomains`.
d|List.

The name of one or more failure domains.
|compute:
  platform:
    nutanix:
@@ -3128,6 +3139,17 @@ Additional Nutanix configuration parameters are described in the following table
|The value of a prism category key-value pair to apply to control plane VMs. This parameter must be accompanied by the `key` parameter, and both `key` and `value` parameters must exist in Prism Central.
|String
|controlPlane:
  platform:
    nutanix:
      failureDomains:
d|The failure domains that apply only to control plane machines.

Failure domains are specified in `platform.nutanix.failureDomains`.
d|List.

The name of one or more failure domains.
|controlPlane:
  platform:
    nutanix:
@@ -3160,6 +3182,17 @@ Additional Nutanix configuration parameters are described in the following table
|The value of a prism category key-value pair to apply to all VMs. This parameter must be accompanied by the `key` parameter, and both `key` and `value` parameters must exist in Prism Central.
|String
|platform:
  nutanix:
    defaultMachinePlatform:
      failureDomains:
d|The failure domains that apply to both control plane and compute machines.

Failure domains are specified in `platform.nutanix.failureDomains`.
d|List.

The name of one or more failure domains.
|platform:
  nutanix:
    defaultMachinePlatform:
@@ -3189,6 +3222,23 @@ Additional Nutanix configuration parameters are described in the following table
|The virtual IP (VIP) address that you configured for control plane API access.
|IP address
|platform:
  nutanix:
    failureDomains:
    - name:
      prismElement:
        name:
        uuid:
      subnetUUIDs:
      -
a|By default, the installation program installs cluster machines into a single Prism Element instance. You can specify additional Prism Element instances for fault tolerance, and then apply them to:

* The cluster's default machine configuration
* Only control plane or compute machine pools
d|A list of configured failure domains.

For more information on usage, see "Configuring failure domains" in "Installing a cluster on Nutanix".
|platform:
  nutanix:
    ingressVIP:
@@ -3262,8 +3312,8 @@ Additional Nutanix configuration parameters are described in the following table
|====
[.small]
--
-1. The `prismElements` section holds a list of Prism Elements (clusters). A Prism Element encompasses all of the Nutanix resources, for example virtual machines and subnets, that are used to host the {product-title} cluster. Only a single Prism Element is supported.
-2. Only one subnet per {product-title} cluster is supported.
+1. The `prismElements` section holds a list of Prism Elements (clusters). A Prism Element encompasses all of the Nutanix resources, for example virtual machines and subnets, that are used to host the {product-title} cluster.
+2. Only one subnet per Prism Element in an {product-title} cluster is supported.
--
endif::nutanix[]


@@ -0,0 +1,99 @@
// Module included in the following assemblies:
//
// * installing/installing_nutanix/installing-nutanix-installer-provisioned.adoc
// * installing/installing_nutanix/installing-restricted-networks-nutanix-installer-provisioned.adoc
:_mod-docs-content-type: PROCEDURE
[id="installation-configuring-nutanix-failure-domains_{context}"]
= Configuring failure domains

Failure domains improve the fault tolerance of an {product-title} cluster by distributing control plane and compute machines across multiple Nutanix Prism Elements (clusters).

[TIP]
====
It is recommended that you configure three failure domains to ensure high availability.
====

.Prerequisites

* You have an installation configuration file (`install-config.yaml`).

.Procedure

. Edit the `install-config.yaml` file and add the following stanza to configure the first failure domain:
+
[source,yaml]
----
apiVersion: v1
baseDomain: example.com
compute:
# ...
platform:
  nutanix:
    failureDomains:
    - name: <failure_domain_name>
      prismElement:
        name: <prism_element_name>
        uuid: <prism_element_uuid>
      subnetUUIDs:
      - <network_uuid>
# ...
----
+
where:
`<failure_domain_name>`:: Specifies a unique name for the failure domain. The name is limited to 64 or fewer characters, which can include lower-case letters, digits, and a dash (`-`). The dash cannot be in the leading or ending position of the name.
`<prism_element_name>`:: Optional. Specifies the name of the Prism Element.
`<prism_element_uuid>`:: Specifies the UUID of the Prism Element.
`<network_uuid>`:: Specifies the UUID of the Prism Element subnet object. The subnet's IP address prefix (CIDR) should contain the virtual IP addresses that the {product-title} cluster uses. Only one subnet per failure domain (Prism Element) in an {product-title} cluster is supported.
. As required, configure additional failure domains.
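+
For example, an `install-config.yaml` file with two failure domains configured might look similar to the following sketch (the failure domain names and placeholder values are illustrative):
+
[source,yaml]
----
platform:
  nutanix:
    failureDomains:
    - name: failure-domain-1
      prismElement:
        name: <prism_element_name_1>
        uuid: <prism_element_uuid_1>
      subnetUUIDs:
      - <network_uuid_1>
    - name: failure-domain-2
      prismElement:
        name: <prism_element_name_2>
        uuid: <prism_element_uuid_2>
      subnetUUIDs:
      - <network_uuid_2>
----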
. To distribute control plane and compute machines across the failure domains, do one of the following:
** If compute and control plane machines can share the same set of failure domains, add the failure domain names under the cluster's default machine configuration.
+
.Example of control plane and compute machines sharing a set of failure domains
+
[source,yaml]
----
apiVersion: v1
baseDomain: example.com
compute:
# ...
platform:
  nutanix:
    defaultMachinePlatform:
      failureDomains:
      - failure-domain-1
      - failure-domain-2
      - failure-domain-3
# ...
----
** If compute and control plane machines must use different failure domains, add the failure domain names under the respective machine pools.
+
.Example of control plane and compute machines using different failure domains
+
[source,yaml]
----
apiVersion: v1
baseDomain: example.com
controlPlane:
  platform:
    nutanix:
      failureDomains:
      - failure-domain-1
      - failure-domain-2
      - failure-domain-3
# ...
compute:
  platform:
    nutanix:
      failureDomains:
      - failure-domain-1
      - failure-domain-2
# ...
----
. Save the file.


@@ -0,0 +1,14 @@
// Module included in the following assemblies:
//
// * installing/installing_nutanix/nutanix-failure-domains.adoc
// * post_installation_configuration/adding-nutanix-failure-domains.adoc
:_mod-docs-content-type: CONCEPT
[id="installation-nutanix-failure-domains-req_{context}"]
= Failure domain requirements

When planning to use failure domains, consider the following requirements:

* All Nutanix Prism Element instances must be managed by the same instance of Prism Central. A deployment that consists of multiple Prism Central instances is not supported.
* The machines that make up the Prism Element clusters must reside on the same Ethernet network for failure domains to be able to communicate with each other.
* A subnet is required in each Prism Element that will be used as a failure domain in the {product-title} cluster. These subnets must share the same IP address prefix (CIDR) and should contain the virtual IP addresses that the {product-title} cluster uses.


@@ -0,0 +1,120 @@
// Module included in the following assemblies:
//
// * post_installation_configuration/adding-nutanix-failure-domains.adoc
:_mod-docs-content-type: PROCEDURE
[id="post-installation-adding-nutanix-failure-domains-compute-machines_{context}"]
= Distributing compute machines across failure domains

You can distribute compute machines across Nutanix failure domains by performing either of the following tasks:

* Modifying existing compute machine sets.
* Creating new compute machine sets.

The following procedure details how to distribute compute machines across failure domains by modifying existing compute machine sets. For more information on creating a compute machine set, see "Additional resources".

.Prerequisites

* You have configured the failure domains in the cluster's Infrastructure custom resource (CR).

.Procedure

. View the cluster's Infrastructure CR by running the following command:
+
[source,terminal]
----
$ oc describe infrastructures.config.openshift.io cluster
----
. For each failure domain (`platformSpec.nutanix.failureDomains`), note the cluster's UUID, name, and subnet object UUID. These values are required to add a failure domain to a compute machine set.
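+
The same information is available in YAML form by running `oc get infrastructures.config.openshift.io cluster -o yaml`. Each failure domain appears in a stanza similar to the following sketch (placeholder values):
+
[source,yaml]
----
spec:
  platformSpec:
    nutanix:
      failureDomains:
      - cluster:
          type: UUID
          uuid: <prism_element_uuid_1>
        name: <failure_domain_name_1>
        subnets:
        - type: UUID
          uuid: <network_uuid_1>
----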
. List the compute machine sets in your cluster by running the following command:
+
[source,terminal]
----
$ oc get machinesets -n openshift-machine-api
----
. Edit the first compute machine set by running the following command:
+
[source,terminal]
----
$ oc edit machineset <machine_set_name> -n openshift-machine-api
----
. Configure the compute machine set to use the first failure domain by adding the following to the `spec.template.spec.providerSpec.value` stanza:
+
[NOTE]
====
Be sure that the values you specify for the `cluster` and `subnets` fields match the values that were configured in the `failureDomains` stanza in the cluster's Infrastructure CR.
====
+
.Example compute machine set with Nutanix failure domains
[source,yaml]
----
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  creationTimestamp: null
  labels:
    machine.openshift.io/cluster-api-cluster: <cluster_name>
  name: <machine_set_name>
  namespace: openshift-machine-api
spec:
  replicas: 2
# ...
  template:
    spec:
# ...
      providerSpec:
        value:
          apiVersion: machine.openshift.io/v1
          failureDomain:
            name: <failure_domain_name_1>
          cluster:
            type: uuid
            uuid: <prism_element_uuid_1>
          subnets:
          - type: uuid
            uuid: <prism_element_network_uuid_1>
# ...
----
. Note the value of `spec.replicas`, as you need it when scaling the machine set to apply the changes.
. Save your changes.
. List the machines that are managed by the updated compute machine set by running the following command:
+
[source,terminal]
----
$ oc get -n openshift-machine-api machines -l machine.openshift.io/cluster-api-machineset=<machine_set_name>
----
. For each machine that is managed by the updated compute machine set, set the `delete` annotation by running the following command:
+
[source,terminal]
----
$ oc annotate machine/<machine_name_original_1> \
  -n openshift-machine-api \
  machine.openshift.io/delete-machine="true"
----
. Scale the compute machine set to twice the number of replicas by running the following command:
+
[source,terminal]
----
$ oc scale --replicas=<twice_the_number_of_replicas> \ <1>
  machineset <machine_set_name> \
  -n openshift-machine-api
----
<1> For example, if the original number of replicas in the compute machine set is `2`, scale the replicas to `4`.
. List the machines that are managed by the updated compute machine set by running the following command:
+
[source,terminal]
----
$ oc get -n openshift-machine-api machines -l machine.openshift.io/cluster-api-machineset=<machine_set_name>
----
+
When the new machines are in the `Running` phase, you can scale the compute machine set to the original number of replicas.
. Scale the compute machine set to the original number of replicas by running the following command:
+
[source,terminal]
----
$ oc scale --replicas=<original_number_of_replicas> \ <1>
  machineset <machine_set_name> \
  -n openshift-machine-api
----
<1> For example, if the original number of replicas in the compute machine set is `2`, scale the replicas to `2`.
. As required, continue to modify machine sets to reference the additional failure domains that are available to the deployment.


@@ -0,0 +1,54 @@
// Module included in the following assemblies:
//
// * post_installation_configuration/adding-nutanix-failure-domains.adoc
:_mod-docs-content-type: PROCEDURE
[id="post-installation-adding-nutanix-failure-domains-control-planes_{context}"]
= Distributing control planes across failure domains

You distribute control planes across Nutanix failure domains by modifying the control plane machine set custom resource (CR).

.Prerequisites

* You have configured the failure domains in the cluster's Infrastructure custom resource (CR).
* The control plane machine set custom resource (CR) is in an active state. For more information on checking the control plane machine set custom resource state, see "Additional resources".

.Procedure
. Edit the control plane machine set CR by running the following command:
+
[source,terminal]
----
$ oc edit controlplanemachineset.machine.openshift.io cluster -n openshift-machine-api
----
. Configure the control plane machine set to use failure domains by adding a `spec.template.machines_v1beta1_machine_openshift_io.failureDomains` stanza.
+
.Example control plane machine set with Nutanix failure domains
[source,yaml]
----
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  creationTimestamp: null
  labels:
    machine.openshift.io/cluster-api-cluster: <cluster_name>
  name: cluster
  namespace: openshift-machine-api
spec:
# ...
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains:
        platform: Nutanix
        nutanix:
        - name: <failure_domain_name_1>
        - name: <failure_domain_name_2>
        - name: <failure_domain_name_3>
# ...
----
. Save your changes.

By default, the control plane machine set propagates changes to your control plane configuration automatically. If the cluster is configured to use the `OnDelete` update strategy, you must replace your control planes manually. For more information, see "Additional resources".
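
For reference, the update strategy is set in the `spec.strategy` stanza of the control plane machine set CR. A minimal sketch with the `OnDelete` strategy set:

[source,yaml]
----
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
# ...
spec:
  strategy:
    type: OnDelete
# ...
----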


@@ -0,0 +1,67 @@
// Module included in the following assemblies:
//
// * post_installation_configuration/adding-nutanix-failure-domains.adoc
:_mod-docs-content-type: PROCEDURE
[id="post-installation-configuring-nutanix-failure-domains_{context}"]
= Adding failure domains to the Infrastructure CR

You add failure domains to an existing Nutanix cluster by modifying its Infrastructure custom resource (CR) (`infrastructures.config.openshift.io`).

[TIP]
====
It is recommended that you configure three failure domains to ensure high availability.
====

.Procedure
. Edit the Infrastructure CR by running the following command:
+
[source,terminal]
----
$ oc edit infrastructures.config.openshift.io cluster
----
. Configure the failure domains:
+
.Example Infrastructure CR with Nutanix failure domains
[source,yaml]
----
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
# ...
  platformSpec:
    nutanix:
      failureDomains:
      - cluster:
          type: UUID
          uuid: <uuid>
        name: <failure_domain_name>
        subnets:
        - type: UUID
          uuid: <network_uuid>
      - cluster:
          type: UUID
          uuid: <uuid>
        name: <failure_domain_name>
        subnets:
        - type: UUID
          uuid: <network_uuid>
      - cluster:
          type: UUID
          uuid: <uuid>
        name: <failure_domain_name>
        subnets:
        - type: UUID
          uuid: <network_uuid>
# ...
----
where:
`<uuid>`:: Specifies the universally unique identifier (UUID) of the Prism Element.
`<failure_domain_name>`:: Specifies a unique name for the failure domain. The name is limited to 64 or fewer characters, which can include lower-case letters, digits, and a dash (`-`). The dash cannot be in the leading or ending position of the name.
`<network_uuid>`:: Specifies the UUID of the Prism Element subnet object. The subnet's IP address prefix (CIDR) should contain the virtual IP addresses that the {product-title} cluster uses. Only one subnet per failure domain (Prism Element) in an {product-title} cluster is supported.
. Save the CR to apply the changes.


@@ -0,0 +1,34 @@
:_mod-docs-content-type: ASSEMBLY
[id="adding-failure-domains-to-an-existing-nutanix-cluster"]
= Adding failure domains to an existing Nutanix cluster
include::_attributes/common-attributes.adoc[]
:context: adding-failure-domains-to-an-existing-nutanix-cluster
toc::[]

By default, the installation program installs control plane and compute machines into a single Nutanix Prism Element (cluster). After an {product-title} cluster is deployed, you can improve its fault tolerance by adding additional Prism Element instances to the deployment using failure domains.

A failure domain represents a single Prism Element instance to which:

* New control plane and compute machines can be deployed.
* Existing control plane and compute machines can be distributed.
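
For example, each failure domain that you add maps to a stanza in the cluster's Infrastructure custom resource (CR) similar to the following sketch (placeholder values; the full procedure is described in "Adding failure domains to the Infrastructure CR"):

[source,yaml]
----
spec:
  platformSpec:
    nutanix:
      failureDomains:
      - cluster:
          type: UUID
          uuid: <uuid>
        name: <failure_domain_name>
        subnets:
        - type: UUID
          uuid: <network_uuid>
----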

[IMPORTANT]
====
If you deployed the {product-title} cluster using the {ai-full}, be sure that the postinstallation steps have been completed. Completing the {product-title} integration with Nutanix is required to add failure domains. For more information, see the {ai-full} documentation for link:https://access.redhat.com/documentation/en-us/assisted_installer_for_openshift_container_platform/2024/html/installing_openshift_container_platform_with_the_assisted_installer/assembly_installing-on-nutanix#nutanix-post-installation-configuration_assembly_installing-on-nutanix[Nutanix postinstallation configuration].
====
include::modules/installation-nutanix-failure-domains-req.adoc[leveloffset=+1]
include::modules/post-installation-configuring-nutanix-failure-domains.adoc[leveloffset=+1]
include::modules/post-installation-adding-nutanix-failure-domains-control-planes.adoc[leveloffset=+1]
include::modules/post-installation-adding-nutanix-failure-domains-compute-machines.adoc[leveloffset=+1]
[role="_additional-resources"]
[id="additional-resources_adding-nutanix-failure-domains"]
== Additional resources

* xref:../machine_management/control_plane_machine_management/cpmso-getting-started.adoc#cpmso-checking-status_cpmso-getting-started[Checking the control plane machine set custom resource state]
* xref:../machine_management/control_plane_machine_management/cpmso-using.adoc#cpmso-feat-replace_cpmso-using[Replacing a control plane machine]
* xref:../machine_management/creating_machinesets/creating-machineset-nutanix.adoc#creating-machineset-nutanix[Creating a compute machine set on Nutanix]