// Module included in the following assemblies:
//
// * machine_management/creating-machinesets/creating-machineset-azure.adoc

:_mod-docs-content-type: PROCEDURE
[id="nvidia-gpu-aws-adding-a-gpu-node_{context}"]
= Adding a GPU node to an existing {product-title} cluster

You can copy and modify a default compute machine set configuration to create a GPU-enabled machine set and machines for the Azure cloud provider.

The following table lists the validated instance types:

[cols="1,1,1,1"]
|===
|vmSize |NVIDIA GPU accelerator |Maximum number of GPUs |Architecture

|`Standard_NC24s_v3`
|V100
|4
|x86

|`Standard_NC4as_T4_v3`
|T4
|1
|x86

|`ND A100 v4`
|A100
|8
|x86
|===

[NOTE]
====
By default, Azure subscriptions do not have a quota for the Azure instance types with GPU. You must request a quota increase for the Azure instance families listed in the preceding table.
====

.Procedure

. View the machine sets that exist in the `openshift-machine-api` namespace by running the following command. Each compute machine set is associated with a different availability zone within the Azure region. The installer automatically load balances compute machines across availability zones.
+
[source,terminal]
----
$ oc get machineset -n openshift-machine-api
----
+
.Example output
+
[source,terminal]
----
NAME                              DESIRED   CURRENT   READY   AVAILABLE   AGE
myclustername-worker-centralus1   1         1         1       1           6h9m
myclustername-worker-centralus2   1         1         1       1           6h9m
myclustername-worker-centralus3   1         1         1       1           6h9m
----

. Make a copy of one of the existing compute `MachineSet` definitions and output the result to a YAML file by running the following command. This file is the basis for the GPU-enabled compute machine set definition.
+
[source,terminal]
----
$ oc get machineset -n openshift-machine-api myclustername-worker-centralus1 -o yaml > machineset-azure.yaml
----

. View the content of the machine set definition by running the following command:
+
[source,terminal]
----
$ cat machineset-azure.yaml
----
+
.Example `machineset-azure.yaml` file
+
[source,yaml]
----
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    machine.openshift.io/GPU: "0"
    machine.openshift.io/memoryMb: "16384"
    machine.openshift.io/vCPU: "4"
  creationTimestamp: "2023-02-06T14:08:19Z"
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: myclustername
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
  name: myclustername-worker-centralus1
  namespace: openshift-machine-api
  resourceVersion: "23601"
  uid: acd56e0c-7612-473a-ae37-8704f34b80de
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: myclustername
      machine.openshift.io/cluster-api-machineset: myclustername-worker-centralus1
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: myclustername
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: myclustername-worker-centralus1
    spec:
      lifecycleHooks: {}
      metadata: {}
      providerSpec:
        value:
          acceleratedNetworking: true
          apiVersion: machine.openshift.io/v1beta1
          credentialsSecret:
            name: azure-cloud-credentials
            namespace: openshift-machine-api
          diagnostics: {}
          image:
            offer: ""
            publisher: ""
            resourceID: /resourceGroups/myclustername-rg/providers/Microsoft.Compute/galleries/gallery_myclustername_n6n4r/images/myclustername-gen2/versions/latest
            sku: ""
            version: ""
          kind: AzureMachineProviderSpec
          location: centralus
          managedIdentity: myclustername-identity
          metadata:
            creationTimestamp: null
          networkResourceGroup: myclustername-rg
          osDisk:
            diskSettings: {}
            diskSizeGB: 128
            managedDisk:
              storageAccountType: Premium_LRS
            osType: Linux
          publicIP: false
          publicLoadBalancer: myclustername
          resourceGroup: myclustername-rg
          spotVMOptions: {}
          subnet: myclustername-worker-subnet
          userDataSecret:
            name: worker-user-data
          vmSize: Standard_D4s_v3
          vnet: myclustername-vnet
          zone: "1"
status:
  availableReplicas: 1
  fullyLabeledReplicas: 1
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
----

. Make a copy of the `machineset-azure.yaml` file by running the following command:
+
[source,terminal]
----
$ cp machineset-azure.yaml machineset-azure-gpu.yaml
----

. Update the following fields in `machineset-azure-gpu.yaml`:
+
* Change `.metadata.name` to a name containing `gpu`.
* Change `.spec.selector.matchLabels["machine.openshift.io/cluster-api-machineset"]` to match the new `.metadata.name`.
* Change `.spec.template.metadata.labels["machine.openshift.io/cluster-api-machineset"]` to match the new `.metadata.name`.
* Change `.spec.template.spec.providerSpec.value.vmSize` to `Standard_NC4as_T4_v3`.
+
.Example `machineset-azure-gpu.yaml` file
+
[source,yaml]
----
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    machine.openshift.io/GPU: "1"
    machine.openshift.io/memoryMb: "28672"
    machine.openshift.io/vCPU: "4"
  creationTimestamp: "2023-02-06T20:27:12Z"
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: myclustername
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
  name: myclustername-nc4ast4-gpu-worker-centralus1
  namespace: openshift-machine-api
  resourceVersion: "166285"
  uid: 4eedce7f-6a57-4abe-b529-031140f02ffa
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: myclustername
      machine.openshift.io/cluster-api-machineset: myclustername-nc4ast4-gpu-worker-centralus1
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: myclustername
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: myclustername-nc4ast4-gpu-worker-centralus1
    spec:
      lifecycleHooks: {}
      metadata: {}
      providerSpec:
        value:
          acceleratedNetworking: true
          apiVersion: machine.openshift.io/v1beta1
          credentialsSecret:
            name: azure-cloud-credentials
            namespace: openshift-machine-api
          diagnostics: {}
          image:
            offer: ""
            publisher: ""
            resourceID: /resourceGroups/myclustername-rg/providers/Microsoft.Compute/galleries/gallery_myclustername_n6n4r/images/myclustername-gen2/versions/latest
            sku: ""
            version: ""
          kind: AzureMachineProviderSpec
          location: centralus
          managedIdentity: myclustername-identity
          metadata:
            creationTimestamp: null
          networkResourceGroup: myclustername-rg
          osDisk:
            diskSettings: {}
            diskSizeGB: 128
            managedDisk:
              storageAccountType: Premium_LRS
            osType: Linux
          publicIP: false
          publicLoadBalancer: myclustername
          resourceGroup: myclustername-rg
          spotVMOptions: {}
          subnet: myclustername-worker-subnet
          userDataSecret:
            name: worker-user-data
          vmSize: Standard_NC4as_T4_v3
          vnet: myclustername-vnet
          zone: "1"
status:
  availableReplicas: 1
  fullyLabeledReplicas: 1
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
----

. To verify your changes, perform a `diff` of the original compute definition and the new GPU-enabled node definition by running the following command:
+
[source,terminal]
----
$ diff machineset-azure.yaml machineset-azure-gpu.yaml
----
+
.Example output
[source,terminal]
----
14c14
<   name: myclustername-worker-centralus1
---
>   name: myclustername-nc4ast4-gpu-worker-centralus1
23c23
<       machine.openshift.io/cluster-api-machineset: myclustername-worker-centralus1
---
>       machine.openshift.io/cluster-api-machineset: myclustername-nc4ast4-gpu-worker-centralus1
30c30
<         machine.openshift.io/cluster-api-machineset: myclustername-worker-centralus1
---
>         machine.openshift.io/cluster-api-machineset: myclustername-nc4ast4-gpu-worker-centralus1
67c67
<           vmSize: Standard_D4s_v3
---
>           vmSize: Standard_NC4as_T4_v3
----

. Create the GPU-enabled compute machine set from the definition file by running the following command:
+
[source,terminal]
----
$ oc create -f machineset-azure-gpu.yaml
----
+
.Example output
+
[source,terminal]
----
machineset.machine.openshift.io/myclustername-nc4ast4-gpu-worker-centralus1 created
----

. View the machine sets that exist in the `openshift-machine-api` namespace by running the following command. Each compute machine set is associated with a different availability zone within the Azure region. The installer automatically load balances compute machines across availability zones.
+
[source,terminal]
----
$ oc get machineset -n openshift-machine-api
----
+
.Example output
+
[source,terminal]
----
NAME                                          DESIRED   CURRENT   READY   AVAILABLE   AGE
myclustername-nc4ast4-gpu-worker-centralus1   1         1         1       1           122m
myclustername-worker-centralus1               1         1         1       1           8h
myclustername-worker-centralus2               1         1         1       1           8h
myclustername-worker-centralus3               1         1         1       1           8h
----

. View the machines that exist in the `openshift-machine-api` namespace by running the following command. Each compute machine set defines a single machine configuration, but you can scale a compute machine set to add nodes in a particular region and zone.
+
[source,terminal]
----
$ oc get machines -n openshift-machine-api
----
+
.Example output
+
[source,terminal]
----
NAME                                                PHASE     TYPE                   REGION      ZONE   AGE
myclustername-master-0                              Running   Standard_D8s_v3        centralus   2      6h40m
myclustername-master-1                              Running   Standard_D8s_v3        centralus   1      6h40m
myclustername-master-2                              Running   Standard_D8s_v3        centralus   3      6h40m
myclustername-nc4ast4-gpu-worker-centralus1-w9bqn   Running   Standard_NC4as_T4_v3   centralus   1      21m
myclustername-worker-centralus1-rbh6b               Running   Standard_D4s_v3        centralus   1      6h38m
myclustername-worker-centralus2-dbz7w               Running   Standard_D4s_v3        centralus   2      6h38m
myclustername-worker-centralus3-p9b8c               Running   Standard_D4s_v3        centralus   3      6h38m
----

. View the existing nodes by running the following command. Each node is an instance of a machine definition with a specific Azure region and {product-title} role.
+
[source,terminal]
----
$ oc get nodes
----
+
.Example output
+
[source,terminal]
----
NAME                                                STATUS   ROLES                  AGE     VERSION
myclustername-master-0                              Ready    control-plane,master   6h39m   v1.34.2
myclustername-master-1                              Ready    control-plane,master   6h41m   v1.34.2
myclustername-master-2                              Ready    control-plane,master   6h39m   v1.34.2
myclustername-nc4ast4-gpu-worker-centralus1-w9bqn   Ready    worker                 14m     v1.34.2
myclustername-worker-centralus1-rbh6b               Ready    worker                 6h29m   v1.34.2
myclustername-worker-centralus2-dbz7w               Ready    worker                 6h29m   v1.34.2
myclustername-worker-centralus3-p9b8c               Ready    worker                 6h31m   v1.34.2
----

.Verification

. View the machine set you created by running the following command:
+
[source,terminal]
----
$ oc get machineset -n openshift-machine-api | grep gpu
----
+
The `MachineSet` replica count is set to `1`, so a new `Machine` object is created automatically.
+
.Example output
+
[source,terminal]
----
myclustername-nc4ast4-gpu-worker-centralus1   1         1         1       1           121m
----

. View the `Machine` object that the machine set created by running the following command:
+
[source,terminal]
----
$ oc -n openshift-machine-api get machines | grep gpu
----
+
.Example output
+
[source,terminal]
----
myclustername-nc4ast4-gpu-worker-centralus1-w9bqn   Running   Standard_NC4as_T4_v3   centralus   1      21m
----

[NOTE]
====
There is no need to specify a namespace for the node. The node definition is cluster scoped.
====
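
The four field updates described in the procedure can also be scripted instead of edited by hand. The following is a minimal sketch, not part of the validated procedure: the `make_gpu_machineset` function name and the `sed`-based approach are illustrative assumptions, and the sketch relies on the exported YAML using the two-space indentation shown in the example files.

```shell
# Hypothetical helper: derive a GPU machine set definition from an exported one.
# Arguments: source file, destination file, old machine set name,
#            new machine set name, target vmSize.
make_gpu_machineset() {
  local src="$1" dst="$2" old_name="$3" new_name="$4" vm_size="$5"

  # 1. Rename .metadata.name (the only "name:" key indented by two spaces
  #    whose value equals the old machine set name).
  # 2. Update both cluster-api-machineset labels (selector and template).
  # 3. Replace the vmSize value in the provider spec.
  sed -E \
    -e "s|^(  name: )${old_name}\$|\1${new_name}|" \
    -e "s|^( +machine\.openshift\.io/cluster-api-machineset: )${old_name}\$|\1${new_name}|" \
    -e "s|^( +vmSize: ).*\$|\1${vm_size}|" \
    "$src" > "$dst"
}
```

With the file names from the procedure, the function could be called as `make_gpu_machineset machineset-azure.yaml machineset-azure-gpu.yaml myclustername-worker-centralus1 myclustername-nc4ast4-gpu-worker-centralus1 Standard_NC4as_T4_v3`. Running the `diff` command from the procedure afterward remains the way to confirm that only the four expected lines changed.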