
TELCODOCS-2140: First draft

StephenJamesSmith
2025-08-08 13:00:16 -04:00
committed by openshift-cherrypick-robot
parent 8d17975692
commit 7cb7177663
10 changed files with 1213 additions and 0 deletions


@@ -3581,6 +3581,8 @@ Topics:
File: gaudi-ai-accelerator
- Name: Remote Direct Memory Access (RDMA)
File: rdma-remote-direct-memory-access
- Name: Dynamic Accelerator Slicer (DAS) Operator
File: das-about-dynamic-accelerator-slicer-operator
---
Name: Backup and restore
Dir: backup_and_restore


@@ -0,0 +1,81 @@
:_mod-docs-content-type: ASSEMBLY
[id="das-about-dynamic-accelerator-slicer-operator"]
= Dynamic Accelerator Slicer (DAS) Operator
include::_attributes/common-attributes.adoc[]
:context: das-about-dynamic-accelerator-slicer-operator
toc::[]
:FeatureName: Dynamic Accelerator Slicer Operator
include::snippets/technology-preview.adoc[]
The Dynamic Accelerator Slicer (DAS) Operator allows you to dynamically slice GPU accelerators in {product-title}, instead of relying on statically sliced GPUs defined when the node is booted. This allows you to dynamically slice GPUs based on specific workload demands, ensuring efficient resource utilization.
Dynamic slicing is useful if you do not know all the accelerator partitions needed in advance on every node on the cluster.
The DAS Operator currently includes a reference implementation for NVIDIA Multi-Instance GPU (MIG) and is designed to support additional technologies such as NVIDIA MPS or GPUs from other vendors in the future.
.Limitations
The following limitations apply when using the Dynamic Accelerator Slicer Operator:
* You must identify potential incompatibilities and verify that the Operator works with your GPU drivers and operating system.
* The Operator works only with specific MIG-compatible NVIDIA GPUs and drivers, such as H100 and A100.
* The Operator cannot manage only a subset of the GPUs on a node.
* The NVIDIA device plugin cannot be used together with the Dynamic Accelerator Slicer Operator to manage the GPU resources of a cluster.
[NOTE]
====
The DAS Operator is designed to work with MIG-enabled GPUs. It allocates MIG slices instead of whole GPUs. Installing the DAS Operator prevents the use of standard resource requests through the NVIDIA device plugin, such as `nvidia.com/gpu: "1"`, for allocating entire GPUs.
====
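For example, after the DAS Operator is installed, a workload requests a named MIG slice rather than a whole GPU. The following resource snippet is a minimal sketch; the `1g.5gb` profile is only an example, and the available profiles depend on your GPU model and MIG configuration:
[source,yaml]
----
    resources:
      limits:
        nvidia.com/mig-1g.5gb: "1"
----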
//Installing the Dynamic Accelerator Slicer Operator
include::modules/das-operator-installing.adoc[leveloffset=+1]
//Installing the Dynamic Accelerator Slicer Operator using the web console
include::modules/das-operator-installing-web-console.adoc[leveloffset=+2]
[role="_additional-resources"]
.Additional resources
* xref:../security/cert_manager_operator/cert-manager-operator-install.adoc#cert-manager-operator-install[{cert-manager-operator}]
* xref:../hardware_enablement/psap-node-feature-discovery-operator.adoc#psap-node-feature-discovery-operator[Node Feature Discovery (NFD) Operator]
* link:https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html[NVIDIA GPU Operator]
* link:https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/specialized_hardware_and_driver_enablement/psap-node-feature-discovery-operator#creating-nfd-cr-web-console_psap-node-feature-discovery-operator[NodeFeatureDiscovery CR]
//Installing the Dynamic Accelerator Slicer Operator using the CLI
include::modules/das-operator-installing-cli.adoc[leveloffset=+2]
[role="_additional-resources"]
.Additional resources
* xref:../security/cert_manager_operator/cert-manager-operator-install.adoc#cert-manager-operator-install[{cert-manager-operator}]
* xref:../hardware_enablement/psap-node-feature-discovery-operator.adoc#psap-node-feature-discovery-operator[Node Feature Discovery (NFD) Operator]
* link:https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html[NVIDIA GPU Operator]
* link:https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/specialized_hardware_and_driver_enablement/psap-node-feature-discovery-operator#creating-nfd-cr-cli_psap-node-feature-discovery-operator[NodeFeatureDiscovery CR]
//Uninstalling the Dynamic Accelerator Slicer Operator
include::modules/das-operator-uninstalling.adoc[leveloffset=+1]
//Uninstalling the Dynamic Accelerator Slicer Operator using the web console
include::modules/das-operator-uninstalling-web-console.adoc[leveloffset=+2]
//Uninstalling the Dynamic Accelerator Slicer Operator using the CLI
include::modules/das-operator-uninstalling-cli.adoc[leveloffset=+2]
//Deploying GPU workloads with the Dynamic Accelerator Slicer Operator
include::modules/das-operator-deploying-workloads.adoc[leveloffset=+1]
//Troubleshooting DAS Operator
include::modules/das-operator-troubleshooting.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* link:https://github.com/kubernetes/kubernetes/issues/128043[Kubernetes issue #128043]
* xref:../hardware_enablement/psap-node-feature-discovery-operator.adoc#psap-node-feature-discovery-operator[Node Feature Discovery Operator]
* link:https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/troubleshooting.html[NVIDIA GPU Operator troubleshooting]


@@ -0,0 +1,149 @@
// Module included in the following assemblies:
//
// * operators/user/das-dynamic-accelerator-slicer-operator.adoc
//
:_mod-docs-content-type: PROCEDURE
[id="das-operator-deploying-workloads_{context}"]
= Deploying GPU workloads with the Dynamic Accelerator Slicer Operator
You can deploy workloads that request GPU slices managed by the Dynamic Accelerator Slicer (DAS) Operator. The Operator dynamically partitions GPU accelerators and schedules workloads to available GPU slices.
.Prerequisites
* You have MIG-capable GPU hardware available in your cluster.
* The NVIDIA GPU Operator is installed and the `ClusterPolicy` shows a `Ready` state. A quick check is shown after this list.
* You have installed the DAS Operator.
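You can confirm the `ClusterPolicy` state from the CLI with the following check, which is the same command that the troubleshooting module uses:
[source,terminal]
----
$ oc get clusterpolicies.nvidia.com -o jsonpath='{.items[0].status.state}'
----
The output should show `ready`.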
.Procedure
. Create a namespace by running the following command:
+
[source,terminal]
----
$ oc new-project cuda-workloads
----
. Create a file named `cuda-vectoradd-deployment.yaml` that defines a deployment requesting GPU resources through the NVIDIA MIG resource:
+
[source,yaml]
----
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cuda-vectoradd
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cuda-vectoradd
  template:
    metadata:
      labels:
        app: cuda-vectoradd
    spec:
      restartPolicy: Always
      containers:
      - name: cuda-vectoradd
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubi8
        resources:
          limits:
            nvidia.com/mig-1g.5gb: "1"
        command:
        - sh
        - -c
        - |
          env && /cuda-samples/vectorAdd && sleep 3600
----
. Apply the deployment configuration by running the following command:
+
[source,terminal]
----
$ oc apply -f cuda-vectoradd-deployment.yaml
----
. Verify that the deployment is created and pods are scheduled by running the following command:
+
[source,terminal]
----
$ oc get deployment cuda-vectoradd
----
+
.Example output
[source,terminal]
----
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
cuda-vectoradd   2/2     2            2           2m
----
. Check the status of the pods by running the following command:
+
[source,terminal]
----
$ oc get pods -l app=cuda-vectoradd
----
+
.Example output
[source,terminal]
----
NAME                              READY   STATUS    RESTARTS   AGE
cuda-vectoradd-6b8c7d4f9b-abc12   1/1     Running   0          2m
cuda-vectoradd-6b8c7d4f9b-def34   1/1     Running   0          2m
----
.Verification
. Check that `AllocationClaim` resources were created for your deployment pods by running the following command:
+
[source,terminal]
----
$ oc get allocationclaims -n das-operator
----
+
.Example output
[source,terminal]
----
NAME AGE
13950288-57df-4ab5-82bc-6138f646633e-harpatil000034jma-qh5fm-worker-f-57md9-cuda-vectoradd-0 2m
ce997b60-a0b8-4ea4-9107-cf59b425d049-harpatil000034jma-qh5fm-worker-f-fl4wg-cuda-vectoradd-0 2m
----
. Verify that the GPU slices are properly allocated by checking one of the pod's resource allocation by running the following command:
+
[source,terminal]
----
$ oc describe pod -l app=cuda-vectoradd
----
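+
In the `Limits` and `Requests` sections of the output, you should see the `nvidia.com/mig-1g.5gb` resource instead of `nvidia.com/gpu`. As an alternative check, a jsonpath query can print just the resource limits of each pod (a sketch; adjust the label selector if you changed the deployment name):
+
[source,terminal]
----
$ oc get pods -l app=cuda-vectoradd -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources.limits}{"\n"}{end}'
----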
. Check the logs to verify the CUDA sample application runs successfully by running the following command:
+
[source,terminal]
----
$ oc logs -l app=cuda-vectoradd
----
+
.Example output
[source,terminal]
----
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
----
. Check the environment variables to verify that the GPU devices are properly exposed to the container by running the following command:
+
[source,terminal]
----
$ oc exec deployment/cuda-vectoradd -- env | grep -E "(NVIDIA_VISIBLE_DEVICES|CUDA_VISIBLE_DEVICES)"
----
+
.Example output
[source,terminal]
----
NVIDIA_VISIBLE_DEVICES=MIG-d8ac9850-d92d-5474-b238-0afeabac1652
CUDA_VISIBLE_DEVICES=MIG-d8ac9850-d92d-5474-b238-0afeabac1652
----
+
These environment variables indicate that the GPU MIG slice has been properly allocated and is visible to the CUDA runtime within the container.
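+
If the NVIDIA container toolkit also injects the management tools into the container (this depends on your toolkit and CDI configuration, and is not guaranteed for every image), you can list the MIG devices that the container sees:
+
[source,terminal]
----
$ oc exec deployment/cuda-vectoradd -- nvidia-smi -L
----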


@@ -0,0 +1,319 @@
// Module included in the following assemblies:
//
// * operators/user/das-dynamic-accelerator-slicer-operator.adoc
:_mod-docs-content-type: PROCEDURE
[id="das-operator-installing-cli_{context}"]
= Installing the Dynamic Accelerator Slicer Operator using the CLI
As a cluster administrator, you can install the Dynamic Accelerator Slicer (DAS) Operator using the OpenShift CLI.
.Prerequisites
* You have access to an {product-title} cluster using an account with `cluster-admin` permissions.
* You have installed the OpenShift CLI (`oc`).
* You have installed the required prerequisites:
** cert-manager Operator for Red Hat OpenShift
** Node Feature Discovery (NFD) Operator
** NVIDIA GPU Operator
** NodeFeatureDiscovery CR
.Procedure
. Configure the NVIDIA GPU Operator for MIG support:
.. Apply the following cluster policy to disable the default NVIDIA device plugin and enable MIG support. Create a file named `gpu-cluster-policy.yaml` with the following content:
+
[source,yaml]
----
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  daemonsets:
    rollingUpdate:
      maxUnavailable: "1"
    updateStrategy: RollingUpdate
  dcgm:
    enabled: true
  dcgmExporter:
    config:
      name: ""
    enabled: true
    serviceMonitor:
      enabled: true
  devicePlugin:
    config:
      default: ""
      name: ""
    enabled: false
    mps:
      root: /run/nvidia/mps
  driver:
    certConfig:
      name: ""
    enabled: true
    kernelModuleConfig:
      name: ""
    licensingConfig:
      configMapName: ""
      nlsEnabled: true
    repoConfig:
      configMapName: ""
    upgradePolicy:
      autoUpgrade: true
      drain:
        deleteEmptyDir: false
        enable: false
        force: false
        timeoutSeconds: 300
      maxParallelUpgrades: 1
      maxUnavailable: 25%
      podDeletion:
        deleteEmptyDir: false
        force: false
        timeoutSeconds: 300
      waitForCompletion:
        timeoutSeconds: 0
    useNvidiaDriverCRD: false
    useOpenKernelModules: false
    virtualTopology:
      config: ""
  gdrcopy:
    enabled: false
  gds:
    enabled: false
  gfd:
    enabled: true
  mig:
    strategy: mixed
  migManager:
    config:
      default: ""
      name: default-mig-parted-config
    enabled: true
    env:
    - name: WITH_REBOOT
      value: 'true'
    - name: MIG_PARTED_MODE_CHANGE_ONLY
      value: 'true'
  nodeStatusExporter:
    enabled: true
  operator:
    defaultRuntime: crio
    initContainer: {}
    runtimeClass: nvidia
    use_ocp_driver_toolkit: true
  sandboxDevicePlugin:
    enabled: true
  sandboxWorkloads:
    defaultWorkload: container
    enabled: false
  toolkit:
    enabled: true
    installDir: /usr/local/nvidia
  validator:
    plugin:
      env:
      - name: WITH_WORKLOAD
        value: "false"
    cuda:
      env:
      - name: WITH_WORKLOAD
        value: "false"
  vfioManager:
    enabled: true
  vgpuDeviceManager:
    enabled: true
  vgpuManager:
    enabled: false
----
.. Apply the cluster policy by running the following command:
+
[source,terminal]
----
$ oc apply -f gpu-cluster-policy.yaml
----
.. Verify the NVIDIA GPU Operator cluster policy reaches the `Ready` state by running the following command:
+
[source,terminal]
----
$ oc get clusterpolicies.nvidia.com gpu-cluster-policy -w
----
+
Wait until the `STATUS` column shows `ready`.
+
.Example output
[source,terminal]
----
NAME                 STATUS   AGE
gpu-cluster-policy   ready    2025-08-14T08:56:45Z
----
.. Verify that all pods in the NVIDIA GPU Operator namespace are running by running the following command:
+
[source,terminal]
----
$ oc get pods -n nvidia-gpu-operator
----
+
All pods should show a `Running` or `Completed` status.
.. Label nodes with MIG-capable GPUs to enable MIG mode by running the following command:
+
[source,terminal]
----
$ oc label node $NODE_NAME nvidia.com/mig.config=all-enabled --overwrite
----
+
Replace `$NODE_NAME` with the name of each node that has MIG-capable GPUs.
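+
If your GPU nodes carry the `nvidia.com/mig.capable=true` label that GPU Feature Discovery sets (verify that the label exists on your cluster before relying on it), you can label all MIG-capable nodes in one step:
+
[source,terminal]
----
$ oc label node -l nvidia.com/mig.capable=true nvidia.com/mig.config=all-enabled --overwrite
----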
+
[IMPORTANT]
====
After applying the MIG label, the labeled nodes reboot to enable MIG mode. Wait for the nodes to come back online before proceeding.
====
.. Verify that the nodes have successfully enabled MIG mode by running the following command:
+
[source,terminal]
----
$ oc get nodes -l nvidia.com/mig.config=all-enabled
----
. Create a namespace for the DAS Operator:
.. Create the following `Namespace` custom resource (CR) that defines the `das-operator` namespace, and save the YAML in the `das-namespace.yaml` file:
+
[source,yaml]
----
apiVersion: v1
kind: Namespace
metadata:
  name: das-operator
  labels:
    name: das-operator
    openshift.io/cluster-monitoring: "true"
----
.. Create the namespace by running the following command:
+
[source,terminal]
----
$ oc create -f das-namespace.yaml
----
. Install the DAS Operator in the namespace you created in the previous step by creating the following objects:
.. Create the following `OperatorGroup` CR and save the YAML in the `das-operatorgroup.yaml` file:
+
[source,yaml]
----
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  generateName: das-operator-
  name: das-operator
  namespace: das-operator
----
.. Create the `OperatorGroup` CR by running the following command:
+
[source,terminal]
----
$ oc create -f das-operatorgroup.yaml
----
.. Create the following `Subscription` CR and save the YAML in the `das-sub.yaml` file:
+
.Example Subscription
[source,yaml]
----
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: das-operator
  namespace: das-operator
spec:
  channel: "stable"
  installPlanApproval: Automatic
  name: das-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
----
.. Create the subscription object by running the following command:
+
[source,terminal]
----
$ oc create -f das-sub.yaml
----
.. Change to the `das-operator` project:
+
[source,terminal]
----
$ oc project das-operator
----
.. Create the following `DASOperator` CR and save the YAML in the `das-dasoperator.yaml` file:
+
.Example `DASOperator` CR
[source,yaml]
----
apiVersion: inference.redhat.com/v1alpha1
kind: DASOperator
metadata:
  name: cluster <1>
  namespace: das-operator
spec:
  managementState: Managed
  logLevel: Normal
  operatorLogLevel: Normal
----
<1> The name of the `DASOperator` CR must be `cluster`.
.. Create the `DASOperator` CR by running the following command:
+
[source,terminal]
----
$ oc create -f das-dasoperator.yaml
----
.Verification
* Verify that the Operator deployment is successful by running the following command:
+
[source,terminal]
----
$ oc get pods
----
+
.Example output
[source,terminal]
----
NAME                                    READY   STATUS    RESTARTS   AGE
das-daemonset-6rsfd                     1/1     Running   0          5m16s
das-daemonset-8qzgf                     1/1     Running   0          5m16s
das-operator-5946478b47-cjfcp           1/1     Running   0          5m18s
das-operator-5946478b47-npwmn           1/1     Running   0          5m18s
das-operator-webhook-59949d4f85-5n9qt   1/1     Running   0          68s
das-operator-webhook-59949d4f85-nbtdl   1/1     Running   0          68s
das-scheduler-6cc59dbf96-4r85f          1/1     Running   0          68s
das-scheduler-6cc59dbf96-bf6ml          1/1     Running   0          68s
----
+
A successful deployment shows all pods with a `Running` status. The deployment includes:
+
das-operator:: Main Operator controller pods
das-operator-webhook:: Webhook server pods for mutating pod requests
das-scheduler:: Scheduler plugin pods for MIG slice allocation
das-daemonset:: Daemonset pods that run only on nodes with MIG-compatible GPUs
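+
To see these components grouped by workload type rather than by pod, you can list the deployments and daemon sets in the namespace:
+
[source,terminal]
----
$ oc get deployments,daemonsets -n das-operator
----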
+
[NOTE]
====
The `das-daemonset` pods only appear on nodes that have MIG-compatible GPU hardware. If you do not see any daemonset pods, verify that your cluster has nodes with supported GPU hardware and that the NVIDIA GPU Operator is properly configured.
====


@@ -0,0 +1,256 @@
// Module included in the following assemblies:
//
// * operators/user/das-dynamic-accelerator-slicer-operator.adoc
:_mod-docs-content-type: PROCEDURE
[id="das-operator-installing-web-console_{context}"]
= Installing the Dynamic Accelerator Slicer Operator using the web console
As a cluster administrator, you can install the Dynamic Accelerator Slicer (DAS) Operator using the {product-title} web console.
.Prerequisites
* You have access to an {product-title} cluster using an account with `cluster-admin` permissions.
* You have installed the required prerequisites:
** cert-manager Operator for Red Hat OpenShift
** Node Feature Discovery (NFD) Operator
** NVIDIA GPU Operator
** NodeFeatureDiscovery CR
.Procedure
. Configure the NVIDIA GPU Operator for MIG support:
.. In the {product-title} web console, navigate to *Operators* -> *Installed Operators*.
.. Select the *NVIDIA GPU Operator* from the list of installed operators.
.. Click the *ClusterPolicy* tab and then click *Create ClusterPolicy*.
.. In the YAML editor, replace the default content with the following cluster policy configuration to disable the default NVIDIA device plugin and enable MIG support:
+
[source,yaml]
----
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  daemonsets:
    rollingUpdate:
      maxUnavailable: "1"
    updateStrategy: RollingUpdate
  dcgm:
    enabled: true
  dcgmExporter:
    config:
      name: ""
    enabled: true
    serviceMonitor:
      enabled: true
  devicePlugin:
    config:
      default: ""
      name: ""
    enabled: false
    mps:
      root: /run/nvidia/mps
  driver:
    certConfig:
      name: ""
    enabled: true
    kernelModuleConfig:
      name: ""
    licensingConfig:
      configMapName: ""
      nlsEnabled: true
    repoConfig:
      configMapName: ""
    upgradePolicy:
      autoUpgrade: true
      drain:
        deleteEmptyDir: false
        enable: false
        force: false
        timeoutSeconds: 300
      maxParallelUpgrades: 1
      maxUnavailable: 25%
      podDeletion:
        deleteEmptyDir: false
        force: false
        timeoutSeconds: 300
      waitForCompletion:
        timeoutSeconds: 0
    useNvidiaDriverCRD: false
    useOpenKernelModules: false
    virtualTopology:
      config: ""
  gdrcopy:
    enabled: false
  gds:
    enabled: false
  gfd:
    enabled: true
  mig:
    strategy: mixed
  migManager:
    config:
      default: ""
      name: default-mig-parted-config
    enabled: true
    env:
    - name: WITH_REBOOT
      value: 'true'
    - name: MIG_PARTED_MODE_CHANGE_ONLY
      value: 'true'
  nodeStatusExporter:
    enabled: true
  operator:
    defaultRuntime: crio
    initContainer: {}
    runtimeClass: nvidia
    use_ocp_driver_toolkit: true
  sandboxDevicePlugin:
    enabled: true
  sandboxWorkloads:
    defaultWorkload: container
    enabled: false
  toolkit:
    enabled: true
    installDir: /usr/local/nvidia
  validator:
    plugin:
      env:
      - name: WITH_WORKLOAD
        value: "false"
    cuda:
      env:
      - name: WITH_WORKLOAD
        value: "false"
  vfioManager:
    enabled: true
  vgpuDeviceManager:
    enabled: true
  vgpuManager:
    enabled: false
----
.. Click *Create* to apply the cluster policy.
.. Navigate to *Workloads* -> *Pods* and select the `nvidia-gpu-operator` namespace to monitor the cluster policy deployment.
.. Wait for the NVIDIA GPU Operator cluster policy to reach the `Ready` state. To monitor the status:
+
... Navigate to *Operators* -> *Installed Operators* -> *NVIDIA GPU Operator*.
... Click the *ClusterPolicy* tab and check that the status shows `ready`.
.. Verify that all pods in the NVIDIA GPU Operator namespace are running by selecting the `nvidia-gpu-operator` namespace and navigating to *Workloads* -> *Pods*.
.. Label nodes with MIG-capable GPUs to enable MIG mode:
+
... Navigate to *Compute* -> *Nodes*.
... Select a node that has MIG-capable GPUs.
... Click *Actions* -> *Edit Labels*.
... Add the label `nvidia.com/mig.config=all-enabled`.
... Click *Save*.
... Repeat for each node with MIG-capable GPUs.
+
[IMPORTANT]
====
After applying the MIG label, the labeled nodes reboot to enable MIG mode. Wait for the nodes to come back online before proceeding.
====
.. Verify that MIG mode is successfully enabled on the GPU nodes by checking that the `nvidia.com/mig.config=all-enabled` label appears in the *Labels* section. To locate the label, navigate to *Compute* -> *Nodes*, select the GPU node, and click the *Details* tab.
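+
Optional. You can run the same check from the CLI, which mirrors the verification step in the CLI installation procedure:
+
[source,terminal]
----
$ oc get nodes -l nvidia.com/mig.config=all-enabled
----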
. In the {product-title} web console, click *Operators* -> *OperatorHub*.
. Search for *Dynamic Accelerator Slicer* or *DAS* in the filter box to locate the DAS Operator.
. Select the *Dynamic Accelerator Slicer* and click *Install*.
. On the *Install Operator* page:
.. Select *All namespaces on the cluster (default)* for the installation mode.
.. Select *Installed Namespace* -> *Operator recommended Namespace: Project das-operator*.
.. If creating a new namespace, enter `das-operator` as the namespace name.
.. Select an update channel.
.. Select *Automatic* or *Manual* for the approval strategy.
. Click *Install*.
. In the {product-title} web console, click *Operators* -> *Installed Operators*.
. Select *DAS Operator* from the list.
. In the *Provided APIs* table column, click *DASOperator*. This takes you to the *DASOperator* tab of the *Operator details* page.
. Click *Create DASOperator*. This takes you to the *Create DASOperator* YAML view.
. In the YAML editor, paste the following example:
+
.Example `DASOperator` CR
[source,yaml]
----
apiVersion: inference.redhat.com/v1alpha1
kind: DASOperator
metadata:
  name: cluster <1>
  namespace: das-operator
spec:
  logLevel: Normal
  operatorLogLevel: Normal
  managementState: Managed
----
<1> The name of the `DASOperator` CR must be `cluster`.
. Click *Create*.
.Verification
To verify that the DAS Operator installed successfully:
. Navigate to the *Operators* -> *Installed Operators* page.
. Ensure that *Dynamic Accelerator Slicer* is listed in the `das-operator` namespace with a *Status* of *Succeeded*.
To verify that the `DASOperator` CR installed successfully:
* After you create the `DASOperator` CR, the web console brings you to the *DASOperator list view*. The *Status* field of the CR changes to *Available* when all of the components are running.
* Optional. You can verify that the `DASOperator` CR installed successfully by running the following command in the OpenShift CLI:
+
[source,terminal]
----
$ oc get dasoperator -n das-operator
----
+
.Example output
[source,terminal]
----
NAME      STATUS      AGE
cluster   Available   3m
----
[NOTE]
====
During installation, an Operator might display a *Failed* status. If the installation later succeeds with a *Succeeded* message, you can ignore the *Failed* message.
====
You can also verify the installation by checking the pods:
. Navigate to the *Workloads* -> *Pods* page and select the `das-operator` namespace.
. Verify that all DAS Operator component pods are running:
** `das-operator` pods (main operator controllers)
** `das-operator-webhook` pods (webhook servers)
** `das-scheduler` pods (scheduler plugins)
** `das-daemonset` pods (only on nodes with MIG-compatible GPUs)
[NOTE]
====
The `das-daemonset` pods only appear on nodes that have MIG-compatible GPU hardware. If you do not see any daemonset pods, verify that your cluster has nodes with supported GPU hardware and that the NVIDIA GPU Operator is properly configured.
====
.Troubleshooting
Use the following procedure if the Operator does not appear to be installed:
. Navigate to the *Operators* -> *Installed Operators* page and inspect the *Operator Subscriptions* and *Install Plans* tabs for any failure or errors under *Status*.
. Navigate to the *Workloads* -> *Pods* page and check the logs for pods in the `das-operator` namespace.


@@ -0,0 +1,10 @@
// Module included in the following assemblies:
//
// * operators/user/das-dynamic-accelerator-slicer-operator.adoc
:_mod-docs-content-type: CONCEPT
[id="das-operator-installing_{context}"]
= Installing the Dynamic Accelerator Slicer Operator
As a cluster administrator, you can install the Dynamic Accelerator Slicer (DAS) Operator by using the {product-title} web console or the OpenShift CLI.


@@ -0,0 +1,240 @@
// Module included in the following assemblies:
//
// * operators/user/das-dynamic-accelerator-slicer-operator.adoc
//
:_mod-docs-content-type: PROCEDURE
[id="das-operator-troubleshooting_{context}"]
= Troubleshooting the Dynamic Accelerator Slicer Operator
If you experience issues with the Dynamic Accelerator Slicer (DAS) Operator, use the following troubleshooting steps to diagnose and resolve problems.
.Prerequisites
* You have installed the DAS Operator.
* You have access to the {product-title} cluster as a user with the `cluster-admin` role.
== Debugging DAS Operator components
.Procedure
. Check the status of all DAS Operator components by running the following command:
+
[source,terminal]
----
$ oc get pods -n das-operator
----
+
.Example output
[source,terminal]
----
NAME                                    READY   STATUS    RESTARTS   AGE
das-daemonset-6rsfd                     1/1     Running   0          5m16s
das-daemonset-8qzgf                     1/1     Running   0          5m16s
das-operator-5946478b47-cjfcp           1/1     Running   0          5m18s
das-operator-5946478b47-npwmn           1/1     Running   0          5m18s
das-operator-webhook-59949d4f85-5n9qt   1/1     Running   0          68s
das-operator-webhook-59949d4f85-nbtdl   1/1     Running   0          68s
das-scheduler-6cc59dbf96-4r85f          1/1     Running   0          68s
das-scheduler-6cc59dbf96-bf6ml          1/1     Running   0          68s
----
. Inspect the logs of the DAS Operator controller by running the following command:
+
[source,terminal]
----
$ oc logs -n das-operator deployment/das-operator
----
. Check the logs of the webhook server by running the following command:
+
[source,terminal]
----
$ oc logs -n das-operator deployment/das-operator-webhook
----
. Check the logs of the scheduler plugin by running the following command:
+
[source,terminal]
----
$ oc logs -n das-operator deployment/das-scheduler
----
. Check the logs of the device plugin daemonset by running the following command:
+
[source,terminal]
----
$ oc logs -n das-operator daemonset/das-daemonset
----
== Monitoring AllocationClaims
.Procedure
. Inspect active `AllocationClaim` resources by running the following command:
+
[source,terminal]
----
$ oc get allocationclaims -n das-operator
----
+
.Example output
[source,terminal]
----
NAME AGE
13950288-57df-4ab5-82bc-6138f646633e-harpatil000034jma-qh5fm-worker-f-57md9-cuda-vectoradd-0 5m
ce997b60-a0b8-4ea4-9107-cf59b425d049-harpatil000034jma-qh5fm-worker-f-fl4wg-cuda-vectoradd-0 5m
----
. View detailed information about a specific `AllocationClaim` by running the following command:
+
[source,terminal]
----
$ oc get allocationclaims -n das-operator -o yaml
----
+
.Example output (truncated)
[source,yaml]
----
apiVersion: inference.redhat.com/v1alpha1
kind: AllocationClaim
metadata:
  name: 13950288-57df-4ab5-82bc-6138f646633e-harpatil000034jma-qh5fm-worker-f-57md9-cuda-vectoradd-0
  namespace: das-operator
spec:
  gpuUUID: GPU-9003fd9c-1ad1-c935-d8cd-d1ae69ef17c0
  migPlacement:
    size: 1
    start: 0
  nodename: harpatil000034jma-qh5fm-worker-f-57md9
  podRef:
    kind: Pod
    name: cuda-vectoradd-f4b84b678-l2m69
    namespace: default
    uid: 13950288-57df-4ab5-82bc-6138f646633e
  profile: 1g.5gb
status:
  conditions:
  - lastTransitionTime: "2025-08-06T19:28:48Z"
    message: Allocation is inUse
    reason: inUse
    status: "True"
    type: State
  state: inUse
----
. Check for claims in different states by running the following command:
+
[source,terminal]
----
$ oc get allocationclaims -n das-operator -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.state}{"\n"}{end}'
----
+
.Example output
[source,terminal]
----
13950288-57df-4ab5-82bc-6138f646633e-harpatil000034jma-qh5fm-worker-f-57md9-cuda-vectoradd-0 inUse
ce997b60-a0b8-4ea4-9107-cf59b425d049-harpatil000034jma-qh5fm-worker-f-fl4wg-cuda-vectoradd-0 inUse
----
. View events related to `AllocationClaim` resources by running the following command:
+
[source,terminal]
----
$ oc get events -n das-operator --field-selector involvedObject.kind=AllocationClaim
----
. Check `NodeAccelerator` resources to verify GPU hardware detection by running the following command:
+
[source,terminal]
----
$ oc get nodeaccelerator -n das-operator
----
+
.Example output
[source,terminal]
----
NAME                                     AGE
harpatil000034jma-qh5fm-worker-f-57md9   96m
harpatil000034jma-qh5fm-worker-f-fl4wg   96m
----
+
The `NodeAccelerator` resources represent the GPU-capable nodes detected by the DAS Operator.
.Additional information
The `AllocationClaim` custom resource tracks the following information:
GPU UUID:: The unique identifier of the GPU device.
Slice position:: The position of the MIG slice on the GPU.
Pod reference:: The pod that requested the GPU slice.
State:: The current state of the claim (`staged`, `created`, or `released`).
Claims start in the `staged` state and transition to `created` when all requests are satisfied. When a pod is deleted, the associated claim is automatically cleaned up.
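To observe claims moving through these states while pods are created and deleted, you can watch the resources:
[source,terminal]
----
$ oc get allocationclaims -n das-operator -w
----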
== Verifying GPU device availability
.Procedure
. On a node with GPU hardware, verify that CDI devices were created by running the following command:
+
[source,terminal]
----
$ oc debug node/<node-name>
----
+
[source,terminal]
----
sh-4.4# chroot /host
sh-4.4# ls -l /var/run/cdi/
----
. Check the NVIDIA GPU Operator status by running the following command:
+
[source,terminal]
----
$ oc get clusterpolicies.nvidia.com -o jsonpath='{.items[0].status.state}'
----
+
The output should show `ready`.
== Increasing log verbosity
.Procedure
To get more detailed debugging information:
. Edit the `DASOperator` resource to increase log verbosity by running the following command:
+
[source,terminal]
----
$ oc edit dasoperator -n das-operator
----
. Set the `operatorLogLevel` field to `Debug` or `Trace`:
+
[source,yaml]
----
spec:
  operatorLogLevel: Debug
----
. Save the changes and verify that the operator pods restart with increased verbosity.
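+
For example, you can watch the Operator pods roll out after you save the change:
+
[source,terminal]
----
$ oc get pods -n das-operator -w
----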
== Common issues and solutions
.Pods stuck in UnexpectedAdmissionError state
[NOTE]
====
Due to link:https://github.com/kubernetes/kubernetes/issues/128043[kubernetes/kubernetes#128043], pods might enter an `UnexpectedAdmissionError` state if admission fails. Pods managed by higher-level controllers such as Deployments are recreated automatically. Bare pods that are not managed by a controller, however, must be deleted manually with `oc delete pod`. Using controllers is recommended until the upstream issue is resolved.
====
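A minimal sketch for finding and cleaning up failed bare pods in a given namespace follows; it assumes the affected pods are in the `Failed` phase, so review the list before deleting anything:
[source,terminal]
----
$ oc get pods -n <namespace> --field-selector=status.phase=Failed
$ oc delete pods -n <namespace> --field-selector=status.phase=Failed
----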
.Prerequisites not met
If the DAS Operator fails to start or function properly, verify that all prerequisites are installed:
* cert-manager Operator for Red Hat OpenShift
* Node Feature Discovery (NFD) Operator
* NVIDIA GPU Operator


@@ -0,0 +1,96 @@
// Module included in the following assemblies:
//
// * operators/user/das-dynamic-accelerator-slicer-operator.adoc
//
:_mod-docs-content-type: PROCEDURE
[id="das-operator-uninstalling-cli_{context}"]
= Uninstalling the Dynamic Accelerator Slicer Operator using the CLI
You can uninstall the Dynamic Accelerator Slicer (DAS) Operator using the OpenShift CLI.
.Prerequisites
* You have access to an {product-title} cluster using an account with `cluster-admin` permissions.
* You have installed the OpenShift CLI (`oc`).
* The DAS Operator is installed in your cluster.
.Procedure
. List the installed operators to find the DAS Operator subscription by running the following command:
+
[source,terminal]
----
$ oc get subscriptions -n das-operator
----
+
.Example output
[source,terminal]
----
NAME           PACKAGE        SOURCE             CHANNEL
das-operator   das-operator   redhat-operators   stable
----
. Delete the subscription by running the following command:
+
[source,terminal]
----
$ oc delete subscription das-operator -n das-operator
----
. List and delete the cluster service version (CSV) by running the following commands:
+
[source,terminal]
----
$ oc get csv -n das-operator
----
+
[source,terminal]
----
$ oc delete csv <csv-name> -n das-operator
----
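+
If the DAS Operator CSV is the only CSV in the `das-operator` namespace (verify this with the `oc get csv` output above), you can instead delete all CSVs in the namespace in one step:
+
[source,terminal]
----
$ oc delete csv -n das-operator --all
----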
. Remove the operator group by running the following command:
+
[source,terminal]
----
$ oc delete operatorgroup das-operator -n das-operator
----
. Delete any remaining `AllocationClaim` resources by running the following command:
+
[source,terminal]
----
$ oc delete allocationclaims --all -n das-operator
----
. Remove the DAS Operator namespace by running the following command:
+
[source,terminal]
----
$ oc delete namespace das-operator
----
.Verification
. Verify that the DAS Operator resources have been removed by running the following command:
+
[source,terminal]
----
$ oc get namespace das-operator
----
+
The command should return an error indicating that the namespace is not found.
. Verify that no `AllocationClaim` custom resource definitions remain by running the following command:
+
[source,terminal]
----
$ oc get crd | grep allocationclaim
----
+
The command should return no output, indicating that no `AllocationClaim` custom resource definitions remain.
[WARNING]
====
Uninstalling the DAS Operator removes all GPU slice allocations and might cause running workloads that depend on GPU slices to fail. Ensure that no critical workloads are using GPU slices before proceeding with the uninstallation.
====
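To check for workloads that still request MIG slices before you uninstall, you can list the resource limits of all pods and filter for MIG resource names (an illustrative query):
[source,terminal]
----
$ oc get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.containers[*].resources.limits}{"\n"}{end}' | grep nvidia.com/mig
----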


@@ -0,0 +1,51 @@
// Module included in the following assemblies:
//
// * operators/user/das-dynamic-accelerator-slicer-operator.adoc
//
:_mod-docs-content-type: PROCEDURE
[id="das-operator-uninstalling-web-console_{context}"]
= Uninstalling the Dynamic Accelerator Slicer Operator using the web console
You can uninstall the Dynamic Accelerator Slicer (DAS) Operator using the {product-title} web console.
.Prerequisites
* You have access to an {product-title} cluster using an account with `cluster-admin` permissions.
* The DAS Operator is installed in your cluster.
.Procedure
. In the {product-title} web console, navigate to *Operators* -> *Installed Operators*.
. Locate the *Dynamic Accelerator Slicer* in the list of installed Operators.
. Click the *Options* menu {kebab} for the DAS Operator and select *Uninstall Operator*.
. In the confirmation dialog, click *Uninstall* to confirm the removal.
. Navigate to *Home* -> *Projects*.
. Search for *das-operator* in the search box to locate the DAS Operator project.
. Click the *Options* menu {kebab} next to the das-operator project, and select *Delete Project*.
. In the confirmation dialog, type `das-operator` in the dialog box, and click *Delete* to confirm the deletion.
.Verification
. Navigate to the *Operators* -> *Installed Operators* page.
. Verify that the Dynamic Accelerator Slicer (DAS) Operator is no longer listed.
. Optional. Verify that the `das-operator` namespace and its resources have been removed by running the following command:
+
[source,terminal]
----
$ oc get namespace das-operator
----
+
The command should return an error indicating that the namespace is not found.
[WARNING]
====
Uninstalling the DAS Operator removes all GPU slice allocations and might cause running workloads that depend on GPU slices to fail. Ensure that no critical workloads are using GPU slices before proceeding with the uninstallation.
====


@@ -0,0 +1,9 @@
// Module included in the following assemblies:
//
// * operators/user/das-dynamic-accelerator-slicer-operator.adoc
//
:_mod-docs-content-type: CONCEPT
[id="das-operator-uninstalling_{context}"]
= Uninstalling the Dynamic Accelerator Slicer Operator
Use one of the following procedures to uninstall the Dynamic Accelerator Slicer (DAS) Operator, depending on how the Operator was installed.