// Module included in the following assemblies:
//
// * machine_management/creating-machinesets/creating-machineset-aws.adoc

:_mod-docs-content-type: PROCEDURE
[id="nvidia-gpu-aws-adding-a-gpu-node_{context}"]
= Adding a GPU node to an existing {product-title} cluster

You can copy and modify a default compute machine set configuration to create a GPU-enabled machine set and machines for the AWS EC2 cloud provider.

For more information about the supported instance types, see the following NVIDIA documentation:

* link:https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/platform-support.html[NVIDIA GPU Operator Community support matrix]

* link:https://docs.nvidia.com/ai-enterprise/latest/product-support-matrix/index.html[NVIDIA AI Enterprise support matrix]

.Procedure

. View the existing nodes by running the following command. Note that each node is an instance of a machine definition with a specific AWS region and {product-title} role.
+
[source,terminal]
----
$ oc get nodes
----
+
.Example output
+
[source,terminal]
----
NAME                                        STATUS   ROLES                  AGE     VERSION
ip-10-0-52-50.us-east-2.compute.internal    Ready    worker                 3d17h   v1.31.3
ip-10-0-58-24.us-east-2.compute.internal    Ready    control-plane,master   3d17h   v1.31.3
ip-10-0-68-148.us-east-2.compute.internal   Ready    worker                 3d17h   v1.31.3
ip-10-0-68-68.us-east-2.compute.internal    Ready    control-plane,master   3d17h   v1.31.3
ip-10-0-72-170.us-east-2.compute.internal   Ready    control-plane,master   3d17h   v1.31.3
ip-10-0-74-50.us-east-2.compute.internal    Ready    worker                 3d17h   v1.31.3
----

. View the compute machine sets that exist in the `openshift-machine-api` namespace by running the following command. Each compute machine set is associated with a different availability zone within the AWS region. The installer automatically load balances compute machines across availability zones.
+
[source,terminal]
----
$ oc get machinesets -n openshift-machine-api
----
+
.Example output
+
[source,terminal]
----
NAME                                        DESIRED   CURRENT   READY   AVAILABLE   AGE
preserve-dsoc12r4-ktjfc-worker-us-east-2a   1         1         1       1           3d11h
preserve-dsoc12r4-ktjfc-worker-us-east-2b   2         2         2       2           3d11h
----

. View the machines that exist in the `openshift-machine-api` namespace by running the following command. In this example, the `us-east-2a` compute machine set has one compute machine and the `us-east-2b` compute machine set has two. You can scale a compute machine set to add a node in a particular region and zone.
+
[source,terminal]
----
$ oc get machines -n openshift-machine-api | grep worker
----
+
.Example output
+
[source,terminal]
----
preserve-dsoc12r4-ktjfc-worker-us-east-2a-dts8r   Running   m5.xlarge   us-east-2   us-east-2a   3d11h
preserve-dsoc12r4-ktjfc-worker-us-east-2b-dkv7w   Running   m5.xlarge   us-east-2   us-east-2b   3d11h
preserve-dsoc12r4-ktjfc-worker-us-east-2b-k58cw   Running   m5.xlarge   us-east-2   us-east-2b   3d11h
----

. Make a copy of one of the existing compute `MachineSet` definitions and output the result to a JSON file by running the following command. This file is the basis for the GPU-enabled compute machine set definition.
+
[source,terminal]
----
$ oc get machineset preserve-dsoc12r4-ktjfc-worker-us-east-2a -n openshift-machine-api -o json >
----

. Edit the JSON file and make the following changes to the new `MachineSet` definition:
+
* Add `gpu` to the machine set name, for example by changing `worker` to `worker-gpu`. This is the name of the new machine set.
* Change the instance type of the new `MachineSet` definition to a `g4dn` instance type, such as `g4dn.xlarge`, which includes an NVIDIA Tesla T4 GPU. To learn more about AWS `g4dn` instance types, see link:https://aws.amazon.com/ec2/instance-types/#Accelerated_Computing[Accelerated Computing].
+
You can confirm the instance type in the edited file by running the following command:
+
[source,terminal]
----
$ jq .spec.template.spec.providerSpec.value.instanceType preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a.json
"g4dn.xlarge"
----
+
The file is saved as `preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a.json`.

. Update the following fields in `preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a.json`:
+
* `.metadata.name` to a name containing `gpu`.
* `.spec.selector.matchLabels["machine.openshift.io/cluster-api-machineset"]` to match the new `.metadata.name`.
* `.spec.template.metadata.labels["machine.openshift.io/cluster-api-machineset"]` to match the new `.metadata.name`.
* `.spec.template.spec.providerSpec.value.instanceType` to `g4dn.xlarge`.

. To verify your changes, perform a `diff` of the original compute definition and the new GPU-enabled node definition by running the following command:
+
[source,terminal]
----
$ oc -n openshift-machine-api get machineset preserve-dsoc12r4-ktjfc-worker-us-east-2a -o json | diff preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a.json -
----
+
.Example output
+
[source,terminal]
----
10c10
< "name": "preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a",
---
> "name": "preserve-dsoc12r4-ktjfc-worker-us-east-2a",
21c21
< "machine.openshift.io/cluster-api-machineset": "preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a"
---
> "machine.openshift.io/cluster-api-machineset": "preserve-dsoc12r4-ktjfc-worker-us-east-2a"
31c31
< "machine.openshift.io/cluster-api-machineset": "preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a"
---
> "machine.openshift.io/cluster-api-machineset": "preserve-dsoc12r4-ktjfc-worker-us-east-2a"
60c60
< "instanceType": "g4dn.xlarge",
---
> "instanceType": "m5.xlarge",
----

. Create the GPU-enabled compute machine set from the definition by running the following command:
+
[source,terminal]
----
$ oc create -f preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a.json
----
+
.Example output
+
[source,terminal]
----
machineset.machine.openshift.io/preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a created
----

.Verification

. View the machine set you created by running the following command:
+
[source,terminal]
----
$ oc -n openshift-machine-api get machinesets | grep gpu
----
+
The `MachineSet` replica count is set to `1`, so a new `Machine` object is created automatically.
+
.Example output
+
[source,terminal]
----
preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a   1   1   1   1   4m21s
----

. View the `Machine` object that the machine set created by running the following command:
+
[source,terminal]
----
$ oc -n openshift-machine-api get machines | grep gpu
----
+
.Example output
+
[source,terminal]
----
preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a   Running   g4dn.xlarge   us-east-2   us-east-2a   4m36s
----

Note that there is no need to specify a namespace for the node. The node definition is cluster scoped.
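
The four field updates that the procedure makes to the copied `MachineSet` JSON (`.metadata.name`, the two `machine.openshift.io/cluster-api-machineset` labels, and `.spec.template.spec.providerSpec.value.instanceType`) could also be scripted instead of edited by hand. The following Python sketch is one possible way to do that. The function name `make_gpu_machineset` and the trimmed sample object are hypothetical and are not part of the documented procedure. The sketch assumes that the input has the same structure as the JSON exported by `oc get machineset ... -o json`.

```python
import copy
import json

# Label key that ties a MachineSet to its selector and machine template.
LABEL = "machine.openshift.io/cluster-api-machineset"


def make_gpu_machineset(machineset, new_name, instance_type="g4dn.xlarge"):
    """Return a copy of a MachineSet dict with the GPU-related fields updated.

    Hypothetical helper: it only changes the four fields named in the
    procedure and leaves every other field as exported.
    """
    gpu = copy.deepcopy(machineset)  # leave the caller's object untouched
    gpu["metadata"]["name"] = new_name
    gpu["spec"]["selector"]["matchLabels"][LABEL] = new_name
    gpu["spec"]["template"]["metadata"]["labels"][LABEL] = new_name
    gpu["spec"]["template"]["spec"]["providerSpec"]["value"]["instanceType"] = instance_type
    return gpu


if __name__ == "__main__":
    # Trimmed, illustrative stand-in for an exported MachineSet definition.
    original = {
        "metadata": {"name": "preserve-dsoc12r4-ktjfc-worker-us-east-2a"},
        "spec": {
            "selector": {
                "matchLabels": {LABEL: "preserve-dsoc12r4-ktjfc-worker-us-east-2a"}
            },
            "template": {
                "metadata": {
                    "labels": {LABEL: "preserve-dsoc12r4-ktjfc-worker-us-east-2a"}
                },
                "spec": {"providerSpec": {"value": {"instanceType": "m5.xlarge"}}},
            },
        },
    }
    gpu = make_gpu_machineset(
        original, "preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a"
    )
    print(json.dumps(gpu, indent=2))
```

Because the helper works on a deep copy, the original definition stays available for the `diff` comparison described in the procedure.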