
OSDOCS#8867: Adding failure domains to Nutanix docs

Mike Pytlak
2023-12-19 13:58:01 -05:00
committed by openshift-cherrypick-robot
parent 33878ed069
commit 3940c53533
11 changed files with 473 additions and 2 deletions


@@ -307,6 +307,8 @@ Topics:
Topics:
- Name: Preparing to install on Nutanix
  File: preparing-to-install-on-nutanix
- Name: Fault tolerant deployments
  File: nutanix-failure-domains
- Name: Installing a cluster on Nutanix
  File: installing-nutanix-installer-provisioned
- Name: Installing a cluster on Nutanix in a restricted network
@@ -606,6 +608,9 @@ Topics:
- Name: AWS Local Zone tasks
  File: aws-compute-edge-tasks
  Distros: openshift-enterprise
- Name: Adding failure domains to an existing Nutanix cluster
  File: adding-nutanix-failure-domains
  Distros: openshift-origin,openshift-enterprise
---
Name: Updating clusters
Dir: updating


@@ -47,6 +47,7 @@ include::modules/installation-initializing.adoc[leveloffset=+1]
* xref:../../installing/installing_nutanix/installation-config-parameters-nutanix.adoc#installation-config-parameters-nutanix[Installation configuration parameters for Nutanix]
include::modules/installation-nutanix-config-yaml.adoc[leveloffset=+2]
include::modules/installation-configuring-nutanix-failure-domains.adoc[leveloffset=+2]
include::modules/installation-configure-proxy.adoc[leveloffset=+2]
include::modules/cli-installing-cli.adoc[leveloffset=+1]


@@ -46,6 +46,7 @@ include::modules/installation-initializing.adoc[leveloffset=+1]
* xref:../../installing/installing_nutanix/installation-config-parameters-nutanix.adoc#installation-config-parameters-nutanix[Installation configuration parameters for Nutanix]
include::modules/installation-nutanix-config-yaml.adoc[leveloffset=+2]
include::modules/installation-configuring-nutanix-failure-domains.adoc[leveloffset=+2]
include::modules/installation-configure-proxy.adoc[leveloffset=+2]
include::modules/cli-installing-cli.adoc[leveloffset=+1]


@@ -0,0 +1,26 @@
:_mod-docs-content-type: ASSEMBLY
[id="nutanix-failure-domains"]
= Fault tolerant deployments using multiple Prism Elements
include::_attributes/common-attributes.adoc[]
:context: nutanix-failure-domains
toc::[]

By default, the installation program installs control plane and compute machines into a single Nutanix Prism Element (cluster). To improve the fault tolerance of your {product-title} cluster, you can specify that these machines be distributed across multiple Nutanix clusters by configuring failure domains.

A failure domain represents an additional Prism Element instance that is available to {product-title} machine pools during and after installation.
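
For example, a failure domain is defined in the `install-config.yaml` file by a stanza similar to the following sketch. The placeholder values are illustrative; the full syntax is described in "Configuring failure domains".

[source,yaml]
----
platform:
  nutanix:
    failureDomains:
    - name: <failure_domain_name>
      prismElement:
        name: <prism_element_name>
        uuid: <prism_element_uuid>
      subnetUUIDs:
      - <network_uuid>
----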
include::modules/installation-nutanix-failure-domains-req.adoc[leveloffset=+1]

== Installation method and failure domain configuration

The {product-title} installation method determines how and when you configure failure domains:

* If you deploy using installer-provisioned infrastructure, you can configure failure domains in the installation configuration file before deploying the cluster. For more information, see xref:../../installing/installing_nutanix/installing-nutanix-installer-provisioned.adoc#installation-configuring-nutanix-failure-domains_installing-nutanix-installer-provisioned[Configuring failure domains].
+
You can also configure failure domains after the cluster is deployed.

* If you deploy using the {ai-full}, you configure failure domains after the cluster is deployed.
+
For more information about configuring failure domains post-installation, see xref:../../post_installation_configuration/adding-nutanix-failure-domains.adoc#adding-failure-domains-to-an-existing-nutanix-cluster[Adding failure domains to an existing Nutanix cluster].

* If you deploy using infrastructure that you manage (user-provisioned infrastructure), no additional configuration is required. After the cluster is deployed, you can manually distribute control plane and compute machines across failure domains.


@@ -3089,6 +3089,17 @@ Additional Nutanix configuration parameters are described in the following table
|The value of a prism category key-value pair to apply to compute VMs. This parameter must be accompanied by the `key` parameter, and both `key` and `value` parameters must exist in Prism Central.
|String
|compute:
  platform:
    nutanix:
      failureDomains:
d|The failure domains that apply only to compute machines.

Failure domains are specified in `platform.nutanix.failureDomains`.
d|List.

The name of one or more failure domains.
|compute:
  platform:
    nutanix:
@@ -3128,6 +3139,17 @@ Additional Nutanix configuration parameters are described in the following table
|The value of a prism category key-value pair to apply to control plane VMs. This parameter must be accompanied by the `key` parameter, and both `key` and `value` parameters must exist in Prism Central.
|String
|controlPlane:
  platform:
    nutanix:
      failureDomains:
d|The failure domains that apply only to control plane machines.

Failure domains are specified in `platform.nutanix.failureDomains`.
d|List.

The name of one or more failure domains.
|controlPlane:
  platform:
    nutanix:
@@ -3160,6 +3182,17 @@ Additional Nutanix configuration parameters are described in the following table
|The value of a prism category key-value pair to apply to all VMs. This parameter must be accompanied by the `key` parameter, and both `key` and `value` parameters must exist in Prism Central.
|String
|platform:
  nutanix:
    defaultMachinePlatform:
      failureDomains:
d|The failure domains that apply to both control plane and compute machines.

Failure domains are specified in `platform.nutanix.failureDomains`.
d|List.

The name of one or more failure domains.
|platform:
  nutanix:
    defaultMachinePlatform:
@@ -3189,6 +3222,23 @@ Additional Nutanix configuration parameters are described in the following table
|The virtual IP (VIP) address that you configured for control plane API access.
|IP address
|platform:
  nutanix:
    failureDomains:
    - name:
      prismElement:
        name:
        uuid:
      subnetUUIDs:
      -
a|By default, the installation program installs cluster machines into a single Prism Element instance. You can specify additional Prism Element instances for fault tolerance, and then apply them to:

* The cluster's default machine configuration
* Only control plane or compute machine pools
d|A list of configured failure domains.

For more information on usage, see "Configuring failure domains" in "Installing a cluster on Nutanix".
|platform:
  nutanix:
    ingressVIP:
@@ -3262,8 +3312,8 @@ Additional Nutanix configuration parameters are described in the following table
|====
[.small]
--
-1. The `prismElements` section holds a list of Prism Elements (clusters). A Prism Element encompasses all of the Nutanix resources, for example virtual machines and subnets, that are used to host the {product-title} cluster. Only a single Prism Element is supported.
-2. Only one subnet per {product-title} cluster is supported.
+1. The `prismElements` section holds a list of Prism Elements (clusters). A Prism Element encompasses all of the Nutanix resources, for example virtual machines and subnets, that are used to host the {product-title} cluster.
+2. Only one subnet per Prism Element in an {product-title} cluster is supported.
--
endif::nutanix[]


@@ -0,0 +1,99 @@
// Module included in the following assemblies:
//
// * installing/installing_nutanix/installing-nutanix-installer-provisioned.adoc
// * installing/installing_nutanix/installing-restricted-networks-nutanix-installer-provisioned.adoc
:_mod-docs-content-type: PROCEDURE
[id="installation-configuring-nutanix-failure-domains_{context}"]
= Configuring failure domains

Failure domains improve the fault tolerance of an {product-title} cluster by distributing control plane and compute machines across multiple Nutanix Prism Elements (clusters).

[TIP]
====
It is recommended that you configure three failure domains to ensure high availability.
====

.Prerequisites

* You have an installation configuration file (`install-config.yaml`).

.Procedure

. Edit the `install-config.yaml` file and add the following stanza to configure the first failure domain:
+
[source,yaml]
----
apiVersion: v1
baseDomain: example.com
compute:
# ...
platform:
  nutanix:
    failureDomains:
    - name: <failure_domain_name>
      prismElement:
        name: <prism_element_name>
        uuid: <prism_element_uuid>
      subnetUUIDs:
      - <network_uuid>
# ...
----
+
where:
`<failure_domain_name>`:: Specifies a unique name for the failure domain. The name is limited to 64 or fewer characters, which can include lower-case letters, digits, and a dash (`-`). The dash cannot be in the leading or ending position of the name.
`<prism_element_name>`:: Optional. Specifies the name of the Prism Element.
`<prism_element_uuid>`:: Specifies the UUID of the Prism Element.
`<network_uuid>`:: Specifies the UUID of the Prism Element subnet object. The subnet's IP address prefix (CIDR) should contain the virtual IP addresses that the {product-title} cluster uses. Only one subnet per failure domain (Prism Element) in an {product-title} cluster is supported.
. As required, configure additional failure domains.
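+
For example, an `install-config.yaml` file with two failure domains configured might look similar to the following sketch (the failure domain names and placeholder values are illustrative):
+
[source,yaml]
----
platform:
  nutanix:
    failureDomains:
    - name: failure-domain-1
      prismElement:
        name: <prism_element_name_1>
        uuid: <prism_element_uuid_1>
      subnetUUIDs:
      - <network_uuid_1>
    - name: failure-domain-2
      prismElement:
        name: <prism_element_name_2>
        uuid: <prism_element_uuid_2>
      subnetUUIDs:
      - <network_uuid_2>
----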
. To distribute control plane and compute machines across the failure domains, do one of the following:
** If compute and control plane machines can share the same set of failure domains, add the failure domain names under the cluster's default machine configuration.
+
.Example of control plane and compute machines sharing a set of failure domains
+
[source,yaml]
----
apiVersion: v1
baseDomain: example.com
compute:
# ...
platform:
  nutanix:
    defaultMachinePlatform:
      failureDomains:
      - failure-domain-1
      - failure-domain-2
      - failure-domain-3
# ...
----
** If compute and control plane machines must use different failure domains, add the failure domain names under the respective machine pools.
+
.Example of control plane and compute machines using different failure domains
+
[source,yaml]
----
apiVersion: v1
baseDomain: example.com
controlPlane:
  platform:
    nutanix:
      failureDomains:
      - failure-domain-1
      - failure-domain-2
      - failure-domain-3
# ...
compute:
  platform:
    nutanix:
      failureDomains:
      - failure-domain-1
      - failure-domain-2
# ...
----
. Save the file.


@@ -0,0 +1,14 @@
// Module included in the following assemblies:
//
// * installing/installing_nutanix/nutanix-failure-domains.adoc
// * post_installation_configuration/adding-nutanix-failure-domains.adoc
:_mod-docs-content-type: CONCEPT
[id="installation-nutanix-failure-domains-req_{context}"]
= Failure domain requirements

When planning to use failure domains, consider the following requirements:

* All Nutanix Prism Element instances must be managed by the same instance of Prism Central. A deployment that consists of multiple Prism Central instances is not supported.
* The machines that make up the Prism Element clusters must reside on the same Ethernet network for failure domains to be able to communicate with each other.
* A subnet is required in each Prism Element that will be used as a failure domain in the {product-title} cluster. These subnets must share the same IP address prefix (CIDR) and should contain the virtual IP addresses that the {product-title} cluster uses.


@@ -0,0 +1,120 @@
// Module included in the following assemblies:
//
// * post_installation_configuration/adding-nutanix-failure-domains.adoc
:_mod-docs-content-type: PROCEDURE
[id="post-installation-adding-nutanix-failure-domains-compute-machines_{context}"]
= Distributing compute machines across failure domains

You can distribute compute machines across Nutanix failure domains by performing either of the following tasks:

* Modifying existing compute machine sets.
* Creating new compute machine sets.

The following procedure details how to distribute compute machines across failure domains by modifying existing compute machine sets. For more information on creating a compute machine set, see "Additional resources".

.Prerequisites

* You have configured the failure domains in the cluster's Infrastructure custom resource (CR).

.Procedure

. View the cluster's Infrastructure CR by running the following command:
+
[source,terminal]
----
$ oc describe infrastructures.config.openshift.io cluster
----
. For each failure domain (`platformSpec.nutanix.failureDomains`), note the cluster's UUID, name, and subnet object UUID. These values are required to add a failure domain to a compute machine set.
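+
The same information is available in YAML form by running `oc get infrastructures.config.openshift.io cluster -o yaml`. Each failure domain appears in a stanza similar to the following sketch (placeholder values):
+
[source,yaml]
----
spec:
  platformSpec:
    nutanix:
      failureDomains:
      - cluster:
          type: UUID
          uuid: <prism_element_uuid_1>
        name: <failure_domain_name_1>
        subnets:
        - type: UUID
          uuid: <network_uuid_1>
----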
. List the compute machine sets in your cluster by running the following command:
+
[source,terminal]
----
$ oc get machinesets -n openshift-machine-api
----
. Edit the first compute machine set by running the following command:
+
[source,terminal]
----
$ oc edit machineset <machine_set_name> -n openshift-machine-api
----
. Configure the compute machine set to use the first failure domain by adding the following to the `spec.template.spec.providerSpec.value` stanza:
+
[NOTE]
====
Be sure that the values you specify for the `cluster` and `subnets` fields match the values that were configured in the `failureDomains` stanza in the cluster's Infrastructure CR.
====
+
.Example compute machine set with Nutanix failure domains
[source,yaml]
----
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  creationTimestamp: null
  labels:
    machine.openshift.io/cluster-api-cluster: <cluster_name>
  name: <machine_set_name>
  namespace: openshift-machine-api
spec:
  replicas: 2
# ...
  template:
    spec:
# ...
      providerSpec:
        value:
          apiVersion: machine.openshift.io/v1
          failureDomain:
            name: <failure_domain_name_1>
          cluster:
            type: uuid
            uuid: <prism_element_uuid_1>
          subnets:
          - type: uuid
            uuid: <prism_element_network_uuid_1>
# ...
----
. Note the value of `spec.replicas`, as you need it when scaling the machine set to apply the changes.
. Save your changes.
. List the machines that are managed by the updated compute machine set by running the following command:
+
[source,terminal]
----
$ oc get -n openshift-machine-api machines -l machine.openshift.io/cluster-api-machineset=<machine_set_name>
----
. For each machine that is managed by the updated compute machine set, set the `delete` annotation by running the following command:
+
[source,terminal]
----
$ oc annotate machine/<machine_name_original_1> \
  -n openshift-machine-api \
  machine.openshift.io/delete-machine="true"
----
. Scale the compute machine set to twice the number of replicas by running the following command:
+
[source,terminal]
----
$ oc scale --replicas=<twice_the_number_of_replicas> \ <1>
  machineset <machine_set_name> \
  -n openshift-machine-api
----
<1> For example, if the original number of replicas in the compute machine set is `2`, scale the replicas to `4`.
. List the machines that are managed by the updated compute machine set by running the following command:
+
[source,terminal]
----
$ oc get -n openshift-machine-api machines -l machine.openshift.io/cluster-api-machineset=<machine_set_name>
----
+
When the new machines are in the `Running` phase, you can scale the compute machine set to the original number of replicas.
. Scale the compute machine set to the original number of replicas by running the following command:
+
[source,terminal]
----
$ oc scale --replicas=<original_number_of_replicas> \ <1>
  machineset <machine_set_name> \
  -n openshift-machine-api
----
<1> For example, if the original number of replicas in the compute machine set is `2`, scale the replicas to `2`.
. As required, continue to modify machine sets to reference the additional failure domains that are available to the deployment.


@@ -0,0 +1,54 @@
// Module included in the following assemblies:
//
// * post_installation_configuration/adding-nutanix-failure-domains.adoc
:_mod-docs-content-type: PROCEDURE
[id="post-installation-adding-nutanix-failure-domains-control-planes_{context}"]
= Distributing control planes across failure domains

You distribute control planes across Nutanix failure domains by modifying the control plane machine set custom resource (CR).

.Prerequisites

* You have configured the failure domains in the cluster's Infrastructure custom resource (CR).
* The control plane machine set custom resource (CR) is in an active state. For more information on checking the control plane machine set custom resource state, see "Additional resources".

.Procedure
. Edit the control plane machine set CR by running the following command:
+
[source,terminal]
----
$ oc edit controlplanemachineset.machine.openshift.io cluster -n openshift-machine-api
----
. Configure the control plane machine set to use failure domains by adding a `spec.template.machines_v1beta1_machine_openshift_io.failureDomains` stanza.
+
.Example control plane machine set with Nutanix failure domains
[source,yaml]
----
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  creationTimestamp: null
  labels:
    machine.openshift.io/cluster-api-cluster: <cluster_name>
  name: cluster
  namespace: openshift-machine-api
spec:
# ...
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains:
        platform: Nutanix
        nutanix:
        - name: <failure_domain_name_1>
        - name: <failure_domain_name_2>
        - name: <failure_domain_name_3>
# ...
----
. Save your changes.

By default, the control plane machine set propagates changes to your control plane configuration automatically. If the cluster is configured to use the `OnDelete` update strategy, you must replace your control planes manually. For more information, see "Additional resources".
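
For reference, the update strategy is set in the `spec.strategy` stanza of the control plane machine set CR. A minimal sketch with the `OnDelete` strategy set:

[source,yaml]
----
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
# ...
spec:
  strategy:
    type: OnDelete
# ...
----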


@@ -0,0 +1,67 @@
// Module included in the following assemblies:
//
// * post_installation_configuration/adding-nutanix-failure-domains.adoc
:_mod-docs-content-type: PROCEDURE
[id="post-installation-configuring-nutanix-failure-domains_{context}"]
= Adding failure domains to the Infrastructure CR

You add failure domains to an existing Nutanix cluster by modifying its Infrastructure custom resource (CR) (`infrastructures.config.openshift.io`).

[TIP]
====
It is recommended that you configure three failure domains to ensure high availability.
====

.Procedure
. Edit the Infrastructure CR by running the following command:
+
[source,terminal]
----
$ oc edit infrastructures.config.openshift.io cluster
----
. Configure the failure domains:
+
.Example Infrastructure CR with Nutanix failure domains
[source,yaml]
----
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
# ...
  platformSpec:
    nutanix:
      failureDomains:
      - cluster:
          type: UUID
          uuid: <uuid>
        name: <failure_domain_name>
        subnets:
        - type: UUID
          uuid: <network_uuid>
      - cluster:
          type: UUID
          uuid: <uuid>
        name: <failure_domain_name>
        subnets:
        - type: UUID
          uuid: <network_uuid>
      - cluster:
          type: UUID
          uuid: <uuid>
        name: <failure_domain_name>
        subnets:
        - type: UUID
          uuid: <network_uuid>
# ...
----
where:
`<uuid>`:: Specifies the universally unique identifier (UUID) of the Prism Element.
`<failure_domain_name>`:: Specifies a unique name for the failure domain. The name is limited to 64 or fewer characters, which can include lower-case letters, digits, and a dash (`-`). The dash cannot be in the leading or ending position of the name.
`<network_uuid>`:: Specifies the UUID of the Prism Element subnet object. The subnet's IP address prefix (CIDR) should contain the virtual IP addresses that the {product-title} cluster uses. Only one subnet per failure domain (Prism Element) in an {product-title} cluster is supported.
. Save the CR to apply the changes.


@@ -0,0 +1,34 @@
:_mod-docs-content-type: ASSEMBLY
[id="adding-failure-domains-to-an-existing-nutanix-cluster"]
= Adding failure domains to an existing Nutanix cluster
include::_attributes/common-attributes.adoc[]
:context: adding-failure-domains-to-an-existing-nutanix-cluster
toc::[]

By default, the installation program installs control plane and compute machines into a single Nutanix Prism Element (cluster). After an {product-title} cluster is deployed, you can improve its fault tolerance by adding additional Prism Element instances to the deployment using failure domains.

A failure domain represents a single Prism Element instance to which:

* New control plane and compute machines can be deployed.
* Existing control plane and compute machines can be distributed.
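
For example, each failure domain that you add maps to a stanza in the cluster's Infrastructure custom resource (CR) similar to the following sketch (placeholder values; the full procedure is described in "Adding failure domains to the Infrastructure CR"):

[source,yaml]
----
spec:
  platformSpec:
    nutanix:
      failureDomains:
      - cluster:
          type: UUID
          uuid: <uuid>
        name: <failure_domain_name>
        subnets:
        - type: UUID
          uuid: <network_uuid>
----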

[IMPORTANT]
====
If you deployed the {product-title} cluster using the {ai-full}, be sure that the postinstallation steps have been completed. Completing the {product-title} integration with Nutanix is required to add failure domains. For more information, see the {ai-full} documentation for link:https://access.redhat.com/documentation/en-us/assisted_installer_for_openshift_container_platform/2024/html/installing_openshift_container_platform_with_the_assisted_installer/assembly_installing-on-nutanix#nutanix-post-installation-configuration_assembly_installing-on-nutanix[Nutanix postinstallation configuration].
====
include::modules/installation-nutanix-failure-domains-req.adoc[leveloffset=+1]
include::modules/post-installation-configuring-nutanix-failure-domains.adoc[leveloffset=+1]
include::modules/post-installation-adding-nutanix-failure-domains-control-planes.adoc[leveloffset=+1]
include::modules/post-installation-adding-nutanix-failure-domains-compute-machines.adoc[leveloffset=+1]
[role="_additional-resources"]
[id="additional-resources_adding-nutanix-failure-domains"]
== Additional resources

* xref:../machine_management/control_plane_machine_management/cpmso-getting-started.adoc#cpmso-checking-status_cpmso-getting-started[Checking the control plane machine set custom resource state]
* xref:../machine_management/control_plane_machine_management/cpmso-using.adoc#cpmso-feat-replace_cpmso-using[Replacing a control plane machine]
* xref:../machine_management/creating_machinesets/creating-machineset-nutanix.adoc#creating-machineset-nutanix[Creating a compute machine set on Nutanix]