mirror of https://github.com/openshift/openshift-docs.git synced 2026-02-05 12:46:18 +01:00

OSDOCS-14523: Porting AWS MAPI features to CAPI docs

This commit is contained in:
Jeana Routh
2025-06-02 16:04:03 -04:00
parent cce7f17792
commit 70efb3ec4b
12 changed files with 437 additions and 9 deletions


@@ -32,7 +32,7 @@ include::modules/cluster-autoscaler-cr.adoc[leveloffset=+3]
include::modules/cluster-autoscaler-config-priority-expander.adoc[leveloffset=+3]
//Labeling GPU machine sets for the cluster autoscaler
include::modules/machine-feature-agnostic-options-label-gpu-autoscaler.adoc[leveloffset=+3]
include::modules/machineset-label-gpu-autoscaler.adoc[leveloffset=+3]
:FeatureName: cluster autoscaler
:FeatureResourceName: ClusterAutoscaler


@@ -22,9 +22,45 @@ include::modules/capi-yaml-machine-template-aws.adoc[leveloffset=+2]
//Sample YAML for a CAPI AWS compute machine set resource
include::modules/capi-yaml-machine-set-aws.adoc[leveloffset=+2]
// [id="cluster-api-supported-features-aws_{context}"]
// == Enabling {aws-full} features with the Cluster API
[id="cluster-api-supported-features-aws_{context}"]
== Enabling {aws-full} features with the Cluster API
// You can enable the following features by updating values in the Cluster API custom resource manifests.
You can enable the following features by updating values in the Cluster API custom resource manifests.
//Not sure what, if anything, we can add here at this time.
////
//Not yet supported, relies on Cluster API CAS support
// Cluster autoscaler GPU labels
include::modules/machine-feature-agnostic-options-label-gpu-autoscaler.adoc[leveloffset=+2]
[role="_additional-resources"]
.Additional resources
* xref:../../../machine_management/applying-autoscaling.adoc#cluster-autoscaler-cr_applying-autoscaling[Cluster autoscaler resource definition]
////
// Elastic Fabric Adapter instances and placement group options
include::modules/machine-feature-aws-existing-placement-group.adoc[leveloffset=+2]
// Amazon EC2 Instance Metadata Service configuration options
include::modules/machine-feature-aws-imds-options.adoc[leveloffset=+2]
////
//This link is for a note that does not apply to TP clusters, reassess for Cluster API GA
[role="_additional-resources"]
.Additional resources
* xref:../../../machine_configuration/mco-update-boot-images.adoc#mco-update-boot-images[Updated boot images]
////
// Dedicated Instances configuration options
include::modules/machine-feature-aws-dedicated-instances.adoc[leveloffset=+2]
// Non-guaranteed Spot Instances and hourly cost limits
include::modules/machine-feature-agnostic-nonguaranteed-instances.adoc[leveloffset=+2]
// Capacity Reservation configuration options
include::modules/machine-feature-agnostic-capacity-reservation.adoc[leveloffset=+2]
//Adding a GPU node to a machine set (stesmith)
include::modules/machine-feature-aws-add-nvidia-gpu-node.adoc[leveloffset=+2]
// //Deploying the Node Feature Discovery Operator (stesmith)
// include::modules/nvidia-gpu-aws-deploying-the-node-feature-discovery-operator.adoc[leveloffset=+1]


@@ -22,6 +22,8 @@ include::modules/capi-yaml-machine-template-bare-metal.adoc[leveloffset=+2]
//Sample YAML for a CAPI bare metal compute machine set resource
include::modules/capi-yaml-machine-set-bare-metal.adoc[leveloffset=+2]
////
//Section depends on migration support
[id="cluster-api-supported-features-bare-metal_{context}"]
== Enabling bare metal features with the Cluster API
@@ -33,3 +35,4 @@ include::modules/machine-feature-agnostic-options-label-gpu-autoscaler.adoc[leve
[role="_additional-resources"]
.Additional resources
* xref:../../../machine_management/applying-autoscaling.adoc#cluster-autoscaler-cr_applying-autoscaling[Cluster autoscaler resource definition]
////


@@ -19,12 +19,11 @@ metadata:
spec:
template:
spec: # <3>
uncompressedUserData: true
iamInstanceProfile: # ...
instanceType: m5.large
ignition:
storageType: UnencryptedUserData
version: "3.2"
version: "3.4"
ami:
id: # ...
subnet:


@@ -0,0 +1,70 @@
// Module included in the following assemblies:
//
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-aws.adoc
// There are parallel features in Azure so this module is set up for reuse.
ifeval::["{context}" == "cluster-api-config-options-aws"]
:aws:
endif::[]
:_mod-docs-content-type: CONCEPT
[id="machine-feature-agnostic-capacity-reservation_{context}"]
= Capacity Reservation configuration options
{product-title} version {product-version} and later supports
ifdef::azure[on-demand Capacity Reservation with Capacity Reservation groups on {azure-full} clusters.]
ifdef::aws[Capacity Reservations on {aws-full} clusters, including On-Demand Capacity Reservations and Capacity Blocks for ML.]
You can deploy machines on any available resources that match the parameters of a capacity request that you define.
These parameters specify the
ifdef::azure[VM size,]
ifdef::aws[instance type,]
region, and number of instances that you want to reserve.
If your
ifdef::azure[{azure-short} subscription quota]
ifdef::aws[Capacity Reservation]
can accommodate the capacity request, the deployment succeeds.
include::snippets/apply-machine-configuration-method.adoc[tag=method-machine-template]
ifdef::azure[]
[NOTE]
====
You cannot change an existing Capacity Reservation configuration for a machine set.
To use a different Capacity Reservation group, you must replace the machine set and the machines that the previous machine set deployed.
====
endif::azure[]
.Sample Capacity Reservation configuration
[source,yaml]
----
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
# ...
spec:
template:
spec:
capacityReservationId: <capacity_reservation> # <1>
marketType: <market_type> # <2>
# ...
----
<1> Specify the ID of the
ifdef::azure[Capacity Reservation group]
ifdef::aws[Capacity Block for ML or On-Demand Capacity Reservation]
that you want to deploy machines on.
ifdef::aws[]
<2> Specify the market type to use.
The following values are valid:
`CapacityBlock`:: Use this market type with Capacity Blocks for ML.
`OnDemand`:: Use this market type with On-Demand Capacity Reservations.
`Spot`:: Use this market type with Spot Instances.
This option is not compatible with Capacity Reservations.
endif::aws[]
For more information, including limitations and suggested use cases for this offering, see
ifdef::azure[link:https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-overview[On-demand Capacity Reservation] in the {azure-full} documentation.]
ifdef::aws[link:https://docs.aws.amazon.com/en_us/AWSEC2/latest/UserGuide/capacity-reservation-overview.html[On-Demand Capacity Reservations and Capacity Blocks for ML] in the {aws-short} documentation.]
ifeval::["{context}" == "cluster-api-config-options-aws"]
:!aws:
endif::[]


@@ -0,0 +1,67 @@
// Module included in the following assemblies:
//
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-aws.adoc
// There are parallel features in Azure and GCP so this module is set up for reuse.
ifeval::["{context}" == "cluster-api-config-options-aws"]
:aws:
endif::[]
:_mod-docs-content-type: CONCEPT
[id="machine-feature-agnostic-nonguaranteed-instances_{context}"]
ifdef::aws[= Non-guaranteed Spot Instances and hourly cost limits]
ifdef::aws[]
You can deploy machines as non-guaranteed Spot Instances on {aws-first}.
Spot Instances use spare AWS EC2 capacity and are less expensive than On-Demand Instances.
You can use Spot Instances for workloads that can tolerate interruptions, such as batch or stateless, horizontally scalable workloads.
endif::aws[]
include::snippets/apply-machine-configuration-method.adoc[tag=method-machine-template]
ifdef::aws[]
[IMPORTANT]
====
AWS EC2 can reclaim the capacity for a Spot Instance at any time.
====
.Sample Spot Instance configuration
[source,yaml]
----
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
# ...
spec:
template:
spec:
spotMarketOptions: <1>
maxPrice: <price_per_hour> <2>
# ...
----
<1> Specifies the use of Spot Instances.
<2> Optional: Specifies an hourly cost limit in US dollars for the Spot Instance.
For example, setting the `<price_per_hour>` value to `2.50` limits the cost of the Spot Instance to USD 2.50 per hour.
If you do not set this value, the Spot Instance can cost up to the On-Demand Instance price per hour.
+
[WARNING]
====
Setting a specific `maxPrice: <price_per_hour>` value might increase the frequency of interruptions compared to using the default On-Demand Instance price.
It is strongly recommended to use the default On-Demand Instance price and to not set the maximum price for Spot Instances.
====
Interruptions can occur when using Spot Instances for the following reasons:
* The instance price exceeds your maximum price
* The demand for Spot Instances increases
* The supply of Spot Instances decreases
AWS gives a two-minute warning to the user when an interruption occurs.
{product-title} begins to remove the workloads from the affected instances when AWS issues the termination warning.
When AWS terminates an instance, a termination handler running on the Spot Instance node deletes the machine resource.
To satisfy the compute machine set `replicas` quantity, the compute machine set creates a machine that requests a Spot Instance.
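For orientation, a compute machine set consumes a Spot-configured template through its `infrastructureRef` stanza. The following is a minimal sketch; the template name and the user data secret name are placeholders, not values from this commit:

.Sample machine set reference to a machine template
[source,yaml]
----
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineSet
# ...
spec:
  template:
    spec:
      bootstrap:
        dataSecretName: worker-user-data # illustrative user data secret name
      infrastructureRef: # points the machine set at the Spot-configured template
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: <template_name>
# ...
----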
endif::aws[]
ifeval::["{context}" == "cluster-api-config-options-aws"]
:!aws:
endif::[]


@@ -1,7 +1,5 @@
// Module included in the following assemblies:
//
// * machine_management/applying-autoscaling.adoc
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-aws.adoc
:_mod-docs-content-type: CONCEPT
[id="machine-feature-agnostic-options-label-gpu-autoscaler_{context}"]


@@ -0,0 +1,65 @@
// Module included in the following assemblies:
//
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-aws.adoc
:_mod-docs-content-type: CONCEPT
[id="machine-feature-aws-add-nvidia-gpu-node_{context}"]
= GPU-enabled machine options
You can deploy GPU-enabled compute machines on {aws-first}.
The following sample configuration uses an link:https://aws.amazon.com/ec2/instance-types/#Accelerated_Computing[{aws-short} G4dn instance type], which includes an NVIDIA Tesla T4 Tensor Core GPU, as an example.
For more information about supported instance types, see the following pages in the NVIDIA documentation:
* link:https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/platform-support.html[NVIDIA GPU Operator Community support matrix]
* link:https://docs.nvidia.com/ai-enterprise/latest/product-support-matrix/index.html[NVIDIA AI Enterprise support matrix]
include::snippets/apply-machine-configuration-method.adoc[tag=method-machine-template-and-machine-set]
// Cluster API machine template spec
.Sample GPU-enabled machine template configuration
[source,yaml]
----
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
# ...
spec:
template:
spec:
instanceType: g4dn.xlarge <1>
# ...
----
<1> Specifies a G4dn instance type.
// Cluster API machine set spec
.Sample GPU-enabled machine set configuration
[source,yaml]
----
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineSet
metadata:
name: <cluster_name>-gpu-<region> <1>
namespace: openshift-cluster-api
labels:
cluster.x-k8s.io/cluster-name: <cluster_name>
spec:
clusterName: <cluster_name>
replicas: 1
selector:
matchLabels:
test: example
cluster.x-k8s.io/cluster-name: <cluster_name>
cluster.x-k8s.io/set-name: <cluster_name>-gpu-<region> <2>
template:
metadata:
labels:
test: example
cluster.x-k8s.io/cluster-name: <cluster_name>
cluster.x-k8s.io/set-name: <cluster_name>-gpu-<region> <3>
node-role.kubernetes.io/<role>: ""
# ...
----
<1> Specifies a name that includes the `gpu` role. The name includes the cluster name as a prefix and the region as a suffix.
<2> Specifies a selector label that matches the machine set name.
<3> Specifies a template label that matches the machine set name.


@@ -0,0 +1,33 @@
// Module included in the following assemblies:
//
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-aws.adoc
:_mod-docs-content-type: CONCEPT
[id="machine-feature-aws-dedicated-instances_{context}"]
= Dedicated Instances configuration options
You can deploy machines that are backed by Dedicated Instances on {aws-first} clusters.
Dedicated Instances run in a virtual private cloud (VPC) on hardware that is dedicated to a single customer.
These Amazon EC2 instances are physically isolated at the host hardware level.
The isolation of Dedicated Instances occurs even if the instances belong to different AWS accounts that are linked to a single payer account.
However, other instances that are not dedicated can share hardware with Dedicated Instances if they belong to the same AWS account.
{product-title} supports instances with public or dedicated tenancy.
include::snippets/apply-machine-configuration-method.adoc[tag=method-machine-template]
.Sample Dedicated Instances configuration
[source,yaml]
----
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
# ...
spec:
template:
spec:
tenancy: dedicated <1>
# ...
----
<1> Specifies using instances with dedicated tenancy that run on single-tenant hardware.
If you do not specify this value, instances with public tenancy that run on shared hardware are used by default.


@@ -0,0 +1,58 @@
// Module included in the following assemblies:
//
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-aws.adoc
:_mod-docs-content-type: CONCEPT
[id="machine-feature-aws-existing-placement-group_{context}"]
= Elastic Fabric Adapter instances and placement group options
You can deploy compute machines on link:https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html[Elastic Fabric Adapter] (EFA) instances within an existing AWS placement group.
EFA instances do not require placement groups, and you can use placement groups for purposes other than configuring an EFA.
The following example uses an EFA and placement group together to demonstrate a configuration that can improve network performance for machines within the specified placement group.
include::snippets/apply-machine-configuration-method.adoc[tag=method-machine-template]
.Sample EFA instance and placement group configuration
[source,yaml]
----
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
# ...
spec:
template:
spec:
instanceType: <supported_instance_type> # <1>
networkInterfaceType: efa # <2>
placementGroupName: <placement_group> # <3>
placementGroupPartition: <placement_group_partition_number> # <4>
# ...
----
<1> Specifies an instance type that link:https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-instance-types[supports EFAs].
<2> Specifies the `efa` network interface type.
<3> Specifies the name of the existing AWS placement group to deploy machines in.
<4> Optional: Specifies the partition number of the existing AWS placement group where you want your machines deployed.
[NOTE]
====
Ensure that the link:https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#limitations-placement-groups[rules and limitations] for the type of placement group that you create are compatible with your intended use case.
====
////
The MAPI version of this has additional parameters in the providerSpec:
----
placement:
availabilityZone: <zone> # <3>
region: <region> # <4>
----
<3> Specifies the zone, for example, `us-east-1a`.
<4> Specifies the region, for example, `us-east-1`.
Do we need to say anything specific about this, or is this just redundant with the failure domain?
Note:
CAPI has networkInterfaceType: efa
MAPI has networkInterfaceType: EFA
Capitalization matters!
////


@@ -0,0 +1,60 @@
// Module included in the following assemblies:
//
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-aws.adoc
:_mod-docs-content-type: CONCEPT
[id="machine-feature-aws-imds-options_{context}"]
= Amazon EC2 Instance Metadata Service configuration options
You can restrict the version of the Amazon EC2 Instance Metadata Service (IMDS) that machines on {aws-first} clusters use.
Machines can require the use of link:https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html[IMDSv2] (AWS documentation), or allow the use of IMDSv1 in addition to IMDSv2.
////
This is true but does not apply to TP clusters, reassess for Cluster API GA
[NOTE]
====
To use IMDSv2 on AWS clusters that were created with {product-title} version 4.6 or earlier, you must update your boot image.
For more information, see "Updated boot images".
====
////
include::snippets/apply-machine-configuration-method.adoc[tag=method-machine-template]
[IMPORTANT]
====
Before creating machines that require IMDSv2, ensure that any workloads that interact with the IMDS support IMDSv2.
====
.Sample IMDS configuration
[source,yaml]
----
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
# ...
spec:
template:
spec:
instanceMetadataOptions:
httpEndpoint: enabled
httpPutResponseHopLimit: 1 <1>
httpTokens: optional <2>
instanceMetadataTags: disabled
# ...
----
<1> Specifies the number of network hops allowed for IMDSv2 calls.
If no value is specified, this parameter is set to `1` by default.
<2> Specifies whether to require the use of IMDSv2.
If no value is specified, this parameter is set to `optional` by default.
The following values are valid:
`optional`:: Allow the use of both IMDSv1 and IMDSv2.
`required`:: Require IMDSv2.
[NOTE]
====
The Machine API does not support the `httpEndpoint`, `httpPutResponseHopLimit`, and `instanceMetadataTags` fields.
If you migrate a Cluster API machine template that uses this feature to a Machine API compute machine set, any Machine API machines that it creates will not have these fields and the underlying instances will not use these settings.
Any existing machines that the migrated machine set manages will retain these fields and the underlying instances will continue to use these settings.
====
Requiring the use of IMDSv2 might cause timeouts.
For more information, including mitigation strategies, see link:https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html#imds-considerations[Instance metadata access considerations] (AWS documentation).


@@ -0,0 +1,39 @@
// Text snippet included in the following modules:
//
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/modules/machine-feature-aws-existing-placement-group.adoc
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/modules/machine-feature-aws-imds-options.adoc
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/modules/machine-feature-agnostic-nonguaranteed-instances.adoc
// *
// *
// *
// *
// *
// *
// *
// *
:_mod-docs-content-type: SNIPPET
//Cluster API machine template
tag::method-machine-template[]
To deploy compute machines with your configuration, configure the appropriate values in a machine template YAML file.
Then, configure a machine set YAML file to reference the machine template when it deploys machines.
end::method-machine-template[]
//Cluster API or Machine API machine set
tag::method-compute-machine-set[]
To deploy compute machines with your configuration, configure the appropriate values in a machine set YAML file, which the machine set uses when it deploys machines.
end::method-compute-machine-set[]
//Cluster API machine template and machine set
tag::method-machine-template-and-machine-set[]
To deploy compute machines with your configuration, configure the appropriate values in a machine template YAML file and a machine set YAML file that references the machine template when it deploys machines.
end::method-machine-template-and-machine-set[]
//Control plane machine set
tag::method-control-plane-machine-set[]
To deploy control plane machines with your configuration, configure the appropriate values in your control plane machine set YAML file.
* For clusters that use the default `RollingUpdate` update strategy, the control plane machine set propagates changes to your control plane configuration automatically.
* For clusters that use the `OnDelete` update strategy, you must replace your control plane machines manually.
end::method-control-plane-machine-set[]