mirror of
https://github.com/openshift/openshift-docs.git
synced 2026-02-05 21:46:22 +01:00
// Module included in the following assemblies:
//
// * machine_management/creating-machinesets/creating-machineset-azure.adoc

:_content-type: PROCEDURE
[id="nvidia-gpu-aws-adding-a-gpu-node_{context}"]
= Adding a GPU node to an existing {product-title} cluster

You can copy and modify a default compute machine set configuration to create a GPU-enabled machine set and machines for the Azure cloud provider.

The following table lists the validated instance types:

[cols="1,1,1,1"]
|===
|vmSize |NVIDIA GPU accelerator |Maximum number of GPUs |Architecture

|`Standard_NC24s_v3`
|V100
|4
|x86

|`Standard_NC4as_T4_v3`
|T4
|1
|x86

|`ND A100 v4`
|A100
|8
|x86
|===

[NOTE]
====
By default, Azure subscriptions have no quota for GPU instance types. You must request a quota increase for the Azure instance families listed in the preceding table.
====

.Procedure
. View the machine sets that exist in the `openshift-machine-api` namespace by running the following command. Each compute machine set is associated with a different availability zone within the Azure region. The installer automatically load balances compute machines across availability zones.
+
[source,terminal]
----
$ oc get machineset -n openshift-machine-api
----
+
.Example output
[source,terminal]
----
NAME                              DESIRED   CURRENT   READY   AVAILABLE   AGE
myclustername-worker-centralus1   1         1         1       1           6h9m
myclustername-worker-centralus2   1         1         1       1           6h9m
myclustername-worker-centralus3   1         1         1       1           6h9m
----

. Make a copy of one of the existing compute `MachineSet` definitions and output the result to a YAML file by running the following command. This file is the basis for the GPU-enabled compute machine set definition.
+
[source,terminal]
----
$ oc get machineset -n openshift-machine-api myclustername-worker-centralus1 -o yaml > machineset-azure.yaml
----

. View the content of the machine set definition:
+
[source,terminal]
----
$ cat machineset-azure.yaml
----
+
.Example `machineset-azure.yaml` file
[source,yaml]
----
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    machine.openshift.io/GPU: "0"
    machine.openshift.io/memoryMb: "16384"
    machine.openshift.io/vCPU: "4"
  creationTimestamp: "2023-02-06T14:08:19Z"
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: myclustername
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
  name: myclustername-worker-centralus1
  namespace: openshift-machine-api
  resourceVersion: "23601"
  uid: acd56e0c-7612-473a-ae37-8704f34b80de
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: myclustername
      machine.openshift.io/cluster-api-machineset: myclustername-worker-centralus1
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: myclustername
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: myclustername-worker-centralus1
    spec:
      lifecycleHooks: {}
      metadata: {}
      providerSpec:
        value:
          acceleratedNetworking: true
          apiVersion: machine.openshift.io/v1beta1
          credentialsSecret:
            name: azure-cloud-credentials
            namespace: openshift-machine-api
          diagnostics: {}
          image:
            offer: ""
            publisher: ""
            resourceID: /resourceGroups/myclustername-rg/providers/Microsoft.Compute/galleries/gallery_myclustername_n6n4r/images/myclustername-gen2/versions/latest
            sku: ""
            version: ""
          kind: AzureMachineProviderSpec
          location: centralus
          managedIdentity: myclustername-identity
          metadata:
            creationTimestamp: null
          networkResourceGroup: myclustername-rg
          osDisk:
            diskSettings: {}
            diskSizeGB: 128
            managedDisk:
              storageAccountType: Premium_LRS
            osType: Linux
          publicIP: false
          publicLoadBalancer: myclustername
          resourceGroup: myclustername-rg
          spotVMOptions: {}
          subnet: myclustername-worker-subnet
          userDataSecret:
            name: worker-user-data
          vmSize: Standard_D4s_v3
          vnet: myclustername-vnet
          zone: "1"
status:
  availableReplicas: 1
  fullyLabeledReplicas: 1
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
----

. Make a copy of the `machineset-azure.yaml` file by running the following command:
+
[source,terminal]
----
$ cp machineset-azure.yaml machineset-azure-gpu.yaml
----

. Update the following fields in `machineset-azure-gpu.yaml`:
+
* Change `.metadata.name` to a name containing `gpu`.

* Change `.spec.selector.matchLabels["machine.openshift.io/cluster-api-machineset"]` to match the new `.metadata.name`.

* Change `.spec.template.metadata.labels["machine.openshift.io/cluster-api-machineset"]` to match the new `.metadata.name`.

* Change `.spec.template.spec.providerSpec.value.vmSize` to `Standard_NC4as_T4_v3`.
+
.Example `machineset-azure-gpu.yaml` file
[source,yaml]
----
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    machine.openshift.io/GPU: "1"
    machine.openshift.io/memoryMb: "28672"
    machine.openshift.io/vCPU: "4"
  creationTimestamp: "2023-02-06T20:27:12Z"
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: myclustername
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
  name: myclustername-nc4ast4-gpu-worker-centralus1
  namespace: openshift-machine-api
  resourceVersion: "166285"
  uid: 4eedce7f-6a57-4abe-b529-031140f02ffa
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: myclustername
      machine.openshift.io/cluster-api-machineset: myclustername-nc4ast4-gpu-worker-centralus1
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: myclustername
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: myclustername-nc4ast4-gpu-worker-centralus1
    spec:
      lifecycleHooks: {}
      metadata: {}
      providerSpec:
        value:
          acceleratedNetworking: true
          apiVersion: machine.openshift.io/v1beta1
          credentialsSecret:
            name: azure-cloud-credentials
            namespace: openshift-machine-api
          diagnostics: {}
          image:
            offer: ""
            publisher: ""
            resourceID: /resourceGroups/myclustername-rg/providers/Microsoft.Compute/galleries/gallery_myclustername_n6n4r/images/myclustername-gen2/versions/latest
            sku: ""
            version: ""
          kind: AzureMachineProviderSpec
          location: centralus
          managedIdentity: myclustername-identity
          metadata:
            creationTimestamp: null
          networkResourceGroup: myclustername-rg
          osDisk:
            diskSettings: {}
            diskSizeGB: 128
            managedDisk:
              storageAccountType: Premium_LRS
            osType: Linux
          publicIP: false
          publicLoadBalancer: myclustername
          resourceGroup: myclustername-rg
          spotVMOptions: {}
          subnet: myclustername-worker-subnet
          userDataSecret:
            name: worker-user-data
          vmSize: Standard_NC4as_T4_v3
          vnet: myclustername-vnet
          zone: "1"
status:
  availableReplicas: 1
  fullyLabeledReplicas: 1
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
----
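+
The field changes in this step can also be scripted. The following is a minimal sketch, not part of the validated procedure: `make_gpu_machineset` is a hypothetical helper that uses `sed` to replace the old machine set name wherever it ends a line (in the example file, the `.metadata.name` field and both `cluster-api-machineset` labels) and to set `vmSize`. It assumes the file layout shown in the example and an old name that contains no regular expression metacharacters. Compare the two files afterward to confirm that only the intended lines changed.
+
```shell
# Hypothetical helper: derive a GPU machine set definition from a copied one.
# Arguments: <old-name> <new-name> <vm-size> <input-file> <output-file>
make_gpu_machineset() {
  old_name=$1
  new_name=$2
  vm_size=$3
  in_file=$4
  out_file=$5
  # 1st expression: rename values that end a line with ": <old-name>".
  # 2nd expression: replace the vmSize value, keeping its indentation.
  sed -e "s|: ${old_name}\$|: ${new_name}|" \
      -e "s|^\( *vmSize:\) .*|\1 ${vm_size}|" \
      "$in_file" > "$out_file"
}

# Example call, using the names from this procedure:
#   make_gpu_machineset myclustername-worker-centralus1 \
#     myclustername-nc4ast4-gpu-worker-centralus1 Standard_NC4as_T4_v3 \
#     machineset-azure.yaml machineset-azure-gpu.yaml
```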

. To verify your changes, perform a `diff` of the original compute definition and the new GPU-enabled node definition by running the following command:
+
[source,terminal]
----
$ diff machineset-azure.yaml machineset-azure-gpu.yaml
----
+
.Example output
[source,terminal]
----
14c14
<   name: myclustername-worker-centralus1
---
>   name: myclustername-nc4ast4-gpu-worker-centralus1
23c23
<       machine.openshift.io/cluster-api-machineset: myclustername-worker-centralus1
---
>       machine.openshift.io/cluster-api-machineset: myclustername-nc4ast4-gpu-worker-centralus1
30c30
<         machine.openshift.io/cluster-api-machineset: myclustername-worker-centralus1
---
>         machine.openshift.io/cluster-api-machineset: myclustername-nc4ast4-gpu-worker-centralus1
67c67
<           vmSize: Standard_D4s_v3
---
>           vmSize: Standard_NC4as_T4_v3
----
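+
The copied definition still carries fields that the API server populated on the source object, such as `uid`, `resourceVersion`, and the `status` stanza. Depending on your cluster, a create request that includes these fields might be rejected, so you might prefer to remove them first. The following sketch is an optional cleanup under stated assumptions, not part of the validated procedure: `strip_server_fields` is a hypothetical helper, and it relies on the two-space indentation that `oc get -o yaml` emits, with `status` as the last top-level key.
+
```shell
# Hypothetical helper: print a machine set definition without the
# server-populated metadata fields and without the status stanza.
# Assumes two-space YAML indentation and that "status:" is the final
# top-level key in the file.
strip_server_fields() {
  sed -E \
    -e '/^  (creationTimestamp|generation|resourceVersion|uid):/d' \
    -e '/^status:/,$d' \
    "$1"
}

# Example call that rewrites the file in place through a temporary copy:
#   strip_server_fields machineset-azure-gpu.yaml > machineset-azure-gpu.tmp &&
#     mv machineset-azure-gpu.tmp machineset-azure-gpu.yaml
```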

. Create the GPU-enabled compute machine set from the definition file by running the following command:
+
[source,terminal]
----
$ oc create -f machineset-azure-gpu.yaml
----
+
.Example output
[source,terminal]
----
machineset.machine.openshift.io/myclustername-nc4ast4-gpu-worker-centralus1 created
----

. View the machine sets that exist in the `openshift-machine-api` namespace by running the following command. Each compute machine set is associated with a different availability zone within the Azure region, and the installer automatically load balances compute machines across availability zones.
+
[source,terminal]
----
$ oc get machineset -n openshift-machine-api
----
+
.Example output
[source,terminal]
----
NAME                                          DESIRED   CURRENT   READY   AVAILABLE   AGE
myclustername-nc4ast4-gpu-worker-centralus1   1         1         1       1           122m
myclustername-worker-centralus1               1         1         1       1           8h
myclustername-worker-centralus2               1         1         1       1           8h
myclustername-worker-centralus3               1         1         1       1           8h
----

. View the machines that exist in the `openshift-machine-api` namespace by running the following command. Each compute machine set in this example manages one compute machine. To add a node in a particular region and zone, scale the corresponding compute machine set.
+
[source,terminal]
----
$ oc get machines -n openshift-machine-api
----
+
.Example output
[source,terminal]
----
NAME                                                PHASE     TYPE                   REGION      ZONE   AGE
myclustername-master-0                              Running   Standard_D8s_v3        centralus   2      6h40m
myclustername-master-1                              Running   Standard_D8s_v3        centralus   1      6h40m
myclustername-master-2                              Running   Standard_D8s_v3        centralus   3      6h40m
myclustername-nc4ast4-gpu-worker-centralus1-w9bqn   Running   Standard_NC4as_T4_v3   centralus   1      21m
myclustername-worker-centralus1-rbh6b               Running   Standard_D4s_v3        centralus   1      6h38m
myclustername-worker-centralus2-dbz7w               Running   Standard_D4s_v3        centralus   2      6h38m
myclustername-worker-centralus3-p9b8c               Running   Standard_D4s_v3        centralus   3      6h38m
----

. View the existing nodes by running the following command. Note that each node is an instance of a machine definition with a specific Azure region and {product-title} role.
+
[source,terminal]
----
$ oc get nodes
----
+
.Example output
[source,terminal]
----
NAME                                                STATUS   ROLES                  AGE     VERSION
myclustername-master-0                              Ready    control-plane,master   6h39m   v1.27.3
myclustername-master-1                              Ready    control-plane,master   6h41m   v1.27.3
myclustername-master-2                              Ready    control-plane,master   6h39m   v1.27.3
myclustername-nc4ast4-gpu-worker-centralus1-w9bqn   Ready    worker                 14m     v1.27.3
myclustername-worker-centralus1-rbh6b               Ready    worker                 6h29m   v1.27.3
myclustername-worker-centralus2-dbz7w               Ready    worker                 6h29m   v1.27.3
myclustername-worker-centralus3-p9b8c               Ready    worker                 6h31m   v1.27.3
----

. View the list of compute machine sets:
+
[source,terminal]
----
$ oc get machineset -n openshift-machine-api
----
+
.Example output
[source,terminal]
----
NAME                              DESIRED   CURRENT   READY   AVAILABLE   AGE
myclustername-worker-centralus1   1         1         1       1           8h
myclustername-worker-centralus2   1         1         1       1           8h
myclustername-worker-centralus3   1         1         1       1           8h
----

. Create the GPU-enabled compute machine set from the definition file by running the following command:
+
[source,terminal]
----
$ oc create -f machineset-azure-gpu.yaml
----

. View the list of compute machine sets:
+
[source,terminal]
----
$ oc get machineset -n openshift-machine-api
----
+
.Example output
[source,terminal]
----
NAME                                          DESIRED   CURRENT   READY   AVAILABLE   AGE
myclustername-nc4ast4-gpu-worker-centralus1   1         1         1       1           121m
myclustername-worker-centralus1               1         1         1       1           8h
myclustername-worker-centralus2               1         1         1       1           8h
myclustername-worker-centralus3               1         1         1       1           8h
----

.Verification

. View the machine set you created by running the following command:
+
[source,terminal]
----
$ oc get machineset -n openshift-machine-api | grep gpu
----
+
The `MachineSet` replica count is set to `1`, so a new `Machine` object is created automatically.
+
.Example output
[source,terminal]
----
myclustername-nc4ast4-gpu-worker-centralus1   1         1         1       1           121m
----
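+
If you want a scripted check instead of reading the columns by eye, the following sketch is one option. `gpu_machinesets_not_ready` is a hypothetical helper, not an `oc` feature: it reads the table that `oc get machineset` prints and reports every machine set whose name contains `gpu` and whose `READY` column (the fourth field) differs from `DESIRED` (the second field). It assumes the column order shown in the example output. Empty output means that every GPU machine set has the desired number of ready machines.
+
```shell
# Hypothetical helper: read `oc get machineset` output on stdin and print
# the name of each machine set that contains "gpu" and whose READY count
# (field 4) does not match DESIRED (field 2). The header row is skipped.
gpu_machinesets_not_ready() {
  awk '$1 != "NAME" && $1 ~ /gpu/ && $2 != $4 { print $1 }'
}

# Example call:
#   oc get machineset -n openshift-machine-api | gpu_machinesets_not_ready
```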

. View the `Machine` object that the machine set created by running the following command:
+
[source,terminal]
----
$ oc -n openshift-machine-api get machines | grep gpu
----
+
.Example output
[source,terminal]
----
myclustername-nc4ast4-gpu-worker-centralus1-w9bqn   Running   Standard_NC4as_T4_v3   centralus   1      21m
----

[NOTE]
====
You do not need to specify a namespace for the node. The node definition is cluster scoped.
====