// Module included in the following assemblies:
//
// * operators/user/das-dynamic-accelerator-slicer-operator.adoc

:_mod-docs-content-type: PROCEDURE
[id="das-operator-deploying-workloads_{context}"]
= Deploying GPU workloads with the Dynamic Accelerator Slicer Operator

You can deploy workloads that request GPU slices managed by the Dynamic Accelerator Slicer (DAS) Operator. The Operator dynamically partitions GPU accelerators and schedules workloads to available GPU slices.

.Prerequisites

* You have MIG-supported GPU hardware available in your cluster.
* The NVIDIA GPU Operator is installed and the `ClusterPolicy` resource reports a `Ready` state.
* You have installed the DAS Operator.

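You can confirm these prerequisites before you start. The following commands are a sketch: `das-operator` is the namespace used later in this procedure for the `AllocationClaim` resources, and the exact namespace can differ depending on how the Operator was installed in your cluster:

[source,terminal]
----
$ oc get clusterpolicy
$ oc get pods -n das-operator
----

The `ClusterPolicy` resource should report a `ready` state, and the DAS Operator pods should be in the `Running` status.
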
.Procedure

. Create a namespace by running the following command:
+
[source,terminal]
----
$ oc new-project cuda-workloads
----

. Create a file named `cuda-vectoradd-deployment.yaml` that defines a deployment requesting GPU resources through the NVIDIA MIG resource:
+
[source,yaml]
----
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cuda-vectoradd
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cuda-vectoradd
  template:
    metadata:
      labels:
        app: cuda-vectoradd
    spec:
      restartPolicy: Always
      containers:
      - name: cuda-vectoradd
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubi8
        resources:
          limits:
            nvidia.com/mig-1g.5gb: "1"
        command:
        - sh
        - -c
        - |
          env && /cuda-samples/vectorAdd && sleep 3600
----

. Apply the deployment configuration by running the following command:
+
[source,terminal]
----
$ oc apply -f cuda-vectoradd-deployment.yaml
----

|
. Verify that the deployment is created and pods are scheduled by running the following command:
|
|
+
|
|
[source,terminal]
|
|
----
|
|
$ oc get deployment cuda-vectoradd
|
|
----
|
|
+
|
|
.Example output
|
|
[source,terminal]
|
|
----
|
|
NAME READY UP-TO-DATE AVAILABLE AGE
|
|
cuda-vectoradd 2/2 2 2 2m
|
|
----
|
|
|
|
. Check the status of the pods by running the following command:
+
[source,terminal]
----
$ oc get pods -l app=cuda-vectoradd
----
+
.Example output
[source,terminal]
----
NAME                              READY   STATUS    RESTARTS   AGE
cuda-vectoradd-6b8c7d4f9b-abc12   1/1     Running   0          2m
cuda-vectoradd-6b8c7d4f9b-def34   1/1     Running   0          2m
----

.Verification

. Check that `AllocationClaim` resources were created for your deployment pods by running the following command:
+
[source,terminal]
----
$ oc get allocationclaims -n das-operator
----
+
.Example output
[source,terminal]
----
NAME                                                                                           AGE
13950288-57df-4ab5-82bc-6138f646633e-harpatil000034jma-qh5fm-worker-f-57md9-cuda-vectoradd-0   2m
ce997b60-a0b8-4ea4-9107-cf59b425d049-harpatil000034jma-qh5fm-worker-f-fl4wg-cuda-vectoradd-0   2m
----

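To inspect the details of the claims, such as the requested slice profile and the node each claim is bound to, you can print the full resources. This is a sketch; the fields available in the output depend on the `AllocationClaim` API version in your cluster:

[source,terminal]
----
$ oc get allocationclaims -n das-operator -o yaml
----
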
. Verify that the GPU slices are properly allocated by describing the pods. Run the following command:
+
[source,terminal]
----
$ oc describe pod -l app=cuda-vectoradd
----

. Check the logs to verify that the CUDA sample application runs successfully by running the following command:
+
[source,terminal]
----
$ oc logs -l app=cuda-vectoradd
----
+
.Example output
[source,terminal]
----
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
----

. Check the environment variables to verify that the GPU devices are properly exposed to the container by running the following command:
+
[source,terminal]
----
$ oc exec deployment/cuda-vectoradd -- env | grep -E "(NVIDIA_VISIBLE_DEVICES|CUDA_VISIBLE_DEVICES)"
----
+
.Example output
[source,terminal]
----
NVIDIA_VISIBLE_DEVICES=MIG-d8ac9850-d92d-5474-b238-0afeabac1652
CUDA_VISIBLE_DEVICES=MIG-d8ac9850-d92d-5474-b238-0afeabac1652
----
+
These environment variables indicate that the GPU MIG slice has been properly allocated and is visible to the CUDA runtime within the container.
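The value is the `MIG-` prefix followed by the UUID of the allocated MIG device instance. As a quick local sanity check of a captured value, the following sketch verifies that it matches the expected `MIG-<uuid>` shape. The sample value is illustrative, copied from the example output above:

```shell
# Check that an NVIDIA_VISIBLE_DEVICES value follows the MIG-<uuid> pattern.
# Sample value for illustration; substitute the value captured from your pod.
val="MIG-d8ac9850-d92d-5474-b238-0afeabac1652"
if echo "$val" | grep -Eq '^MIG-[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'; then
  echo "valid MIG device identifier"
else
  echo "unexpected format"
fi
```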