// Module included in the following assemblies:
//
// * machine_management/creating-machinesets/creating-machineset-aws.adoc

:_content-type: PROCEDURE
[id="nvidia-gpu-aws-adding-a-gpu-node_{context}"]
= Adding a GPU node to an existing {product-title} cluster

You can copy and modify a default compute machine set configuration to create a GPU-enabled machine set and machines for the AWS EC2 cloud provider.

The following table lists the validated instance types:

[cols="1,1,1,1"]
|===
|Instance type |NVIDIA GPU accelerator |Maximum number of GPUs |Architecture

|`p4d.24xlarge`
|A100
|8
|x86

|`g4dn.xlarge`
|T4
|1
|x86
|===

.Procedure

. View the existing nodes by running the following command. Each node is an instance of a machine definition with a specific AWS region and {product-title} role.
+
[source,terminal]
----
$ oc get nodes
----
+
.Example output
+
[source,terminal]
----
NAME                                        STATUS   ROLES                  AGE     VERSION
ip-10-0-52-50.us-east-2.compute.internal    Ready    worker                 3d17h   v1.27.3
ip-10-0-58-24.us-east-2.compute.internal    Ready    control-plane,master   3d17h   v1.27.3
ip-10-0-68-148.us-east-2.compute.internal   Ready    worker                 3d17h   v1.27.3
ip-10-0-68-68.us-east-2.compute.internal    Ready    control-plane,master   3d17h   v1.27.3
ip-10-0-72-170.us-east-2.compute.internal   Ready    control-plane,master   3d17h   v1.27.3
ip-10-0-74-50.us-east-2.compute.internal    Ready    worker                 3d17h   v1.27.3
----

. View the compute machine sets that exist in the `openshift-machine-api` namespace by running the following command. Each compute machine set is associated with a different availability zone within the AWS region. The installer automatically load balances compute machines across availability zones.
+
[source,terminal]
----
$ oc get machinesets -n openshift-machine-api
----
+
.Example output
+
[source,terminal]
----
NAME                                        DESIRED   CURRENT   READY   AVAILABLE   AGE
preserve-dsoc12r4-ktjfc-worker-us-east-2a   1         1         1       1           3d11h
preserve-dsoc12r4-ktjfc-worker-us-east-2b   2         2         2       2           3d11h
----

. View the compute machines that exist in the `openshift-machine-api` namespace by running the following command. In this example, the `us-east-2a` machine set has one compute machine and the `us-east-2b` machine set has two. You can scale a compute machine set to add a node in a particular region and zone.
+
[source,terminal]
----
$ oc get machines -n openshift-machine-api | grep worker
----
+
.Example output
+
[source,terminal]
----
preserve-dsoc12r4-ktjfc-worker-us-east-2a-dts8r   Running   m5.xlarge   us-east-2   us-east-2a   3d11h
preserve-dsoc12r4-ktjfc-worker-us-east-2b-dkv7w   Running   m5.xlarge   us-east-2   us-east-2b   3d11h
preserve-dsoc12r4-ktjfc-worker-us-east-2b-k58cw   Running   m5.xlarge   us-east-2   us-east-2b   3d11h
----

. Make a copy of one of the existing compute `MachineSet` definitions and output the result to a JSON file by running the following command. This will be the basis for the GPU-enabled compute machine set definition.
+
[source,terminal]
----
$ oc get machineset preserve-dsoc12r4-ktjfc-worker-us-east-2a -n openshift-machine-api -o json > <output_file.json>
----

. Edit the JSON file and make the following changes to the new `MachineSet` definition:
+
* Add `gpu` to the machine set name, for example by changing `worker` to `worker-gpu`. This value is the name of the new machine set.
* Change the instance type of the new `MachineSet` definition to `g4dn.xlarge`, a `g4dn` instance type that includes an NVIDIA Tesla T4 GPU.
To learn more about AWS `g4dn` instance types, see link:https://aws.amazon.com/ec2/instance-types/#Accelerated_Computing[Accelerated Computing].
+
[source,terminal]
----
$ jq .spec.template.spec.providerSpec.value.instanceType preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a.json

"g4dn.xlarge"
----
+
In this example, the `<output_file.json>` file is saved as `preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a.json`.

. Update the following fields in `preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a.json`:
+
* `.metadata.name` to a name containing `gpu`.

* `.spec.selector.matchLabels["machine.openshift.io/cluster-api-machineset"]` to
match the new `.metadata.name`.

* `.spec.template.metadata.labels["machine.openshift.io/cluster-api-machineset"]`
to match the new `.metadata.name`.

* `.spec.template.spec.providerSpec.value.instanceType` to `g4dn.xlarge`.
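
+
The four field updates listed above can also be applied programmatically. The following Python sketch is one possible approach and is not part of the documented procedure. The `make_gpu_machineset` helper and the trimmed-down sample definition are hypothetical, and a real exported file contains many more fields. In practice, you might load the exported file with `json.load` and write the result with `json.dump`.
+
```python
import copy

LABEL = "machine.openshift.io/cluster-api-machineset"


def make_gpu_machineset(original: dict, new_name: str, instance_type: str) -> dict:
    """Return a copy of a MachineSet definition with the four fields updated."""
    gpu = copy.deepcopy(original)  # leave the exported definition untouched
    gpu["metadata"]["name"] = new_name
    gpu["spec"]["selector"]["matchLabels"][LABEL] = new_name
    gpu["spec"]["template"]["metadata"]["labels"][LABEL] = new_name
    gpu["spec"]["template"]["spec"]["providerSpec"]["value"]["instanceType"] = instance_type
    return gpu


# Hypothetical, trimmed-down stand-in for the exported JSON file.
OLD_NAME = "preserve-dsoc12r4-ktjfc-worker-us-east-2a"
original = {
    "metadata": {"name": OLD_NAME},
    "spec": {
        "selector": {"matchLabels": {LABEL: OLD_NAME}},
        "template": {
            "metadata": {"labels": {LABEL: OLD_NAME}},
            "spec": {"providerSpec": {"value": {"instanceType": "m5.xlarge"}}},
        },
    },
}

gpu = make_gpu_machineset(
    original, "preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a", "g4dn.xlarge"
)
print(gpu["metadata"]["name"])
```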

. To verify your changes, perform a `diff` of the original compute machine set definition and the new GPU-enabled machine set definition by running the following command:
+
[source,terminal]
----
$ oc -n openshift-machine-api get machineset preserve-dsoc12r4-ktjfc-worker-us-east-2a -o json | diff preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a.json -
----
+
.Example output
+
[source,terminal]
----
10c10

< "name": "preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a",
---
> "name": "preserve-dsoc12r4-ktjfc-worker-us-east-2a",

21c21

< "machine.openshift.io/cluster-api-machineset": "preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a"
---
> "machine.openshift.io/cluster-api-machineset": "preserve-dsoc12r4-ktjfc-worker-us-east-2a"

31c31

< "machine.openshift.io/cluster-api-machineset": "preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a"
---
> "machine.openshift.io/cluster-api-machineset": "preserve-dsoc12r4-ktjfc-worker-us-east-2a"

60c60

< "instanceType": "g4dn.xlarge",
---
> "instanceType": "m5.xlarge",
----
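+
Because a line-based `diff` depends on how each file is formatted, a structural comparison can serve as an additional check. The following Python sketch is an illustration and is not part of the documented procedure. It lists the key paths whose values differ between two definitions and reports any path outside the four expected ones. The `changed_paths` and `definition` helpers and the trimmed-down sample data are hypothetical.
+
```python
LABEL = "machine.openshift.io/cluster-api-machineset"

# The four paths that the procedure changes on purpose.
EXPECTED = {
    ("metadata", "name"),
    ("spec", "selector", "matchLabels", LABEL),
    ("spec", "template", "metadata", "labels", LABEL),
    ("spec", "template", "spec", "providerSpec", "value", "instanceType"),
}


def changed_paths(old, new, path=()):
    """Yield key paths (as tuples) whose values differ between two JSON-like objects."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            # A key that is missing on one side is compared as None.
            yield from changed_paths(old.get(key), new.get(key), path + (key,))
    elif old != new:
        yield path


def definition(name: str, instance_type: str) -> dict:
    """Build a hypothetical, trimmed-down MachineSet definition for the demo."""
    return {
        "metadata": {"name": name, "namespace": "openshift-machine-api"},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {LABEL: name}},
            "template": {
                "metadata": {"labels": {LABEL: name}},
                "spec": {"providerSpec": {"value": {"instanceType": instance_type}}},
            },
        },
    }


original = definition("preserve-dsoc12r4-ktjfc-worker-us-east-2a", "m5.xlarge")
gpu = definition("preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a", "g4dn.xlarge")

changed = set(changed_paths(original, gpu))
unexpected = changed - EXPECTED
print("changed paths:", len(changed), "unexpected paths:", len(unexpected))
```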

. Create the GPU-enabled compute machine set from the definition by running the following command:
+
[source,terminal]
----
$ oc create -f preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a.json
----
+
.Example output
+
[source,terminal]
----
machineset.machine.openshift.io/preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a created
----

.Verification

. View the machine set you created by running the following command:
+
[source,terminal]
----
$ oc -n openshift-machine-api get machinesets | grep gpu
----
+
The `MachineSet` replica count is set to `1`, so a new `Machine` object is created automatically.
+
.Example output
+
[source,terminal]
----
preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a   1         1         1       1           4m21s
----
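+
The replica counts in the example output can also be checked from the JSON form of the machine set. The following Python sketch is an illustration and is not part of the documented procedure. It assumes that `spec.replicas`, `status.readyReplicas`, and `status.availableReplicas` correspond to the `DESIRED`, `READY`, and `AVAILABLE` columns. The `machineset_is_ready` helper and the sample objects are hypothetical.
+
```python
def machineset_is_ready(machineset: dict) -> bool:
    """Return True when every desired replica is reported ready and available."""
    desired = machineset.get("spec", {}).get("replicas", 0)
    status = machineset.get("status", {})
    # Assumption: an absent status counter means that no replica has reached that state.
    ready = status.get("readyReplicas", 0)
    available = status.get("availableReplicas", 0)
    # A machine set scaled to zero is treated as not ready in this sketch.
    return desired > 0 and ready == desired and available == desired


# Hypothetical snapshots: one taken while the machine is provisioning, one taken later.
provisioning_ms = {"spec": {"replicas": 1}, "status": {"replicas": 1}}
ready_ms = {
    "spec": {"replicas": 1},
    "status": {"replicas": 1, "readyReplicas": 1, "availableReplicas": 1},
}

print(machineset_is_ready(provisioning_ms), machineset_is_ready(ready_ms))
```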

. View the `Machine` object that the machine set created by running the following command:
+
[source,terminal]
----
$ oc -n openshift-machine-api get machines | grep gpu
----
+
.Example output
+
[source,terminal]
----
preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a   Running   g4dn.xlarge   us-east-2   us-east-2a   4m36s
----

Note that you do not need to specify a namespace for the node. The node definition is cluster scoped.