// Module included in the following assemblies:
//
// * operators/user/das-dynamic-accelerator-slicer-operator.adoc
//
:_mod-docs-content-type: PROCEDURE
[id="das-operator-deploying-workloads_{context}"]
= Deploying GPU workloads with the Dynamic Accelerator Slicer Operator

You can deploy workloads that request GPU slices managed by the Dynamic Accelerator Slicer (DAS) Operator. The Operator dynamically partitions GPU accelerators and schedules workloads to available GPU slices.

.Prerequisites
* You have MIG-supported GPU hardware available in your cluster.
* The NVIDIA GPU Operator is installed and the `ClusterPolicy` custom resource reports a `Ready` state.
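+
You can check the `ClusterPolicy` state with a command similar to the following. This is a minimal sketch that assumes the default cluster-scoped `ClusterPolicy` resource created by the NVIDIA GPU Operator; the reported state is typically `ready`:
+
[source,terminal]
----
$ oc get clusterpolicy -o jsonpath='{.items[0].status.state}{"\n"}'
----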
* You have installed the DAS Operator.

.Procedure
. Create a namespace by running the following command:
+
[source,terminal]
----
$ oc new-project cuda-workloads
----
. Create a deployment that requests GPU resources by using the NVIDIA MIG resource. Save the following YAML to a file named `cuda-vectoradd-deployment.yaml`:
+
[source,yaml]
----
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cuda-vectoradd
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cuda-vectoradd
  template:
    metadata:
      labels:
        app: cuda-vectoradd
    spec:
      restartPolicy: Always
      containers:
      - name: cuda-vectoradd
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubi8
        resources:
          limits:
            nvidia.com/mig-1g.5gb: "1"
        command:
        - sh
        - -c
        - |
          env && /cuda-samples/vectorAdd && sleep 3600
----
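+
The `nvidia.com/mig-1g.5gb` resource name requests one MIG slice of the `1g.5gb` profile. Other profiles, such as `2g.10gb`, might be available depending on your GPU model; the following `limits` snippet is an illustrative variation only, not part of this procedure:
+
[source,yaml]
----
resources:
  limits:
    nvidia.com/mig-2g.10gb: "1"
----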
. Apply the deployment configuration by running the following command:
+
[source,terminal]
----
$ oc apply -f cuda-vectoradd-deployment.yaml
----
. Verify that the deployment is created and pods are scheduled by running the following command:
+
[source,terminal]
----
$ oc get deployment cuda-vectoradd
----
+
.Example output
[source,terminal]
----
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
cuda-vectoradd   2/2     2            2           2m
----
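+
Optionally, you can block until the deployment becomes available. The timeout in this sketch is arbitrary; adjust it for your environment:
+
[source,terminal]
----
$ oc wait deployment/cuda-vectoradd --for=condition=Available --timeout=300s
----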
. Check the status of the pods by running the following command:
+
[source,terminal]
----
$ oc get pods -l app=cuda-vectoradd
----
+
.Example output
[source,terminal]
----
NAME                              READY   STATUS    RESTARTS   AGE
cuda-vectoradd-6b8c7d4f9b-abc12   1/1     Running   0          2m
cuda-vectoradd-6b8c7d4f9b-def34   1/1     Running   0          2m
----
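+
If the pods remain in a `Pending` state, reviewing recent events can help identify scheduling issues. This is a general troubleshooting sketch, not specific to the DAS Operator:
+
[source,terminal]
----
$ oc get events -n cuda-workloads --sort-by=.lastTimestamp
----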

.Verification
. Check that `AllocationClaim` resources were created for your deployment pods by running the following command:
+
[source,terminal]
----
$ oc get allocationclaims -n das-operator
----
+
.Example output
[source,terminal]
----
NAME                                                                                           AGE
13950288-57df-4ab5-82bc-6138f646633e-harpatil000034jma-qh5fm-worker-f-57md9-cuda-vectoradd-0   2m
ce997b60-a0b8-4ea4-9107-cf59b425d049-harpatil000034jma-qh5fm-worker-f-fl4wg-cuda-vectoradd-0   2m
----
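+
To inspect the details of an individual claim, such as the device it reserves, you can print the full resources. The exact field layout depends on the DAS Operator version; this command is a generic sketch:
+
[source,terminal]
----
$ oc get allocationclaims -n das-operator -o yaml
----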
. Verify that the GPU slices are properly allocated by running the following command to check the resource allocation of the pods:
+
[source,terminal]
----
$ oc describe pod -l app=cuda-vectoradd
----
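+
To focus on the GPU resource assignments in the `describe` output, you can filter for the limits section. The `grep` pattern in this sketch matches the deployment shown in this procedure:
+
[source,terminal]
----
$ oc describe pod -l app=cuda-vectoradd | grep -A 2 "Limits"
----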
. Check the logs to verify that the CUDA sample application runs successfully by running the following command:
+
[source,terminal]
----
$ oc logs -l app=cuda-vectoradd
----
+
.Example output
[source,terminal]
----
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
----
. Check the environment variables to verify that the GPU devices are properly exposed to the container by running the following command:
+
[source,terminal]
----
$ oc exec deployment/cuda-vectoradd -- env | grep -E "(NVIDIA_VISIBLE_DEVICES|CUDA_VISIBLE_DEVICES)"
----
+
.Example output
[source,terminal]
----
NVIDIA_VISIBLE_DEVICES=MIG-d8ac9850-d92d-5474-b238-0afeabac1652
CUDA_VISIBLE_DEVICES=MIG-d8ac9850-d92d-5474-b238-0afeabac1652
----
+
These environment variables indicate that the GPU MIG slice has been properly allocated and is visible to the CUDA runtime within the container.
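+
If the container image includes the `nvidia-smi` utility, you can also list the allocated MIG device from inside the container. This is an optional sketch; the CUDA sample image used in this procedure might not ship `nvidia-smi`:
+
[source,terminal]
----
$ oc exec deployment/cuda-vectoradd -- nvidia-smi -L
----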