From 12892752a7cba094963401e5bd39ef732eef2f0e Mon Sep 17 00:00:00 2001
From: Pan Ousley
Date: Mon, 9 Oct 2023 23:55:48 -0400
Subject: [PATCH] CNV#24223: updating mediated devices assembly for NVIDIA operator

---
 _topic_maps/_topic_map.yml | 6 +-
 modules/about-using-gpu-operator.adoc | 16 +---
 modules/using-mediated-devices.adoc | 9 --
 ...ut-changing-removing-mediated-devices.adoc | 15 ++--
 modules/virt-about-using-virtual-gpus.adoc | 2 +-
 modules/virt-add-remove-mediated-devices.adoc | 9 --
 ...-adding-kernel-arguments-enable-iommu.adoc | 11 ++-
 .../virt-assign-vgpu-passthrough-to-vm.adoc | 31 -------
 ...e.adoc => virt-assigning-vgpu-vm-cli.adoc} | 11 +--
 modules/virt-assigning-vgpu-vm-web.adoc | 31 +++++++
 ...reating-and-exposing-mediated-devices.adoc | 89 +++++++++++++++----
 .../virt-how-virtual-gpus-assigned-nodes.adoc | 8 +-
 modules/virt-options-configuring-mdevs.adoc | 83 +++++++++++++++++
 ...-preparing-hosts-for-mediated-devices.adoc | 10 ---
 .../virt-prerequisites-mediated-devices.adoc | 10 ---
 ...-gpu-operands-from-deploying-on-nodes.adoc | 5 --
 ...ving-mediated-device-from-cluster-cli.adoc | 8 +-
 .../virt-virtual-gpus-config-overview.adoc | 64 -------------
 .../virt-configuring-mediated-devices.adoc | 48 ----------
 .../virt-configuring-vgpu-passthrough.adoc | 22 -----
 .../virt-configuring-virtual-gpus.adoc | 63 +++++++++++++
 21 files changed, 280 insertions(+), 271 deletions(-)
 delete mode 100644 modules/using-mediated-devices.adoc
 delete mode 100644 modules/virt-add-remove-mediated-devices.adoc
 delete mode 100644 modules/virt-assign-vgpu-passthrough-to-vm.adoc
 rename modules/{virt-assigning-mediated-device-virtual-machine.adoc => virt-assigning-vgpu-vm-cli.adoc} (84%)
 create mode 100644 modules/virt-assigning-vgpu-vm-web.adoc
 create mode 100644 modules/virt-options-configuring-mdevs.adoc
 delete mode 100644 modules/virt-preparing-hosts-for-mediated-devices.adoc
 delete mode 100644 modules/virt-prerequisites-mediated-devices.adoc
 delete mode 100644
modules/virt-virtual-gpus-config-overview.adoc delete mode 100644 virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc delete mode 100644 virt/virtual_machines/advanced_vm_management/virt-configuring-vgpu-passthrough.adoc create mode 100644 virt/virtual_machines/advanced_vm_management/virt-configuring-virtual-gpus.adoc diff --git a/_topic_maps/_topic_map.yml b/_topic_maps/_topic_map.yml index 933a6da435..309eeb7ca2 100644 --- a/_topic_maps/_topic_map.yml +++ b/_topic_maps/_topic_map.yml @@ -3793,10 +3793,8 @@ Topics: File: virt-schedule-vms - Name: Configuring PCI passthrough File: virt-configuring-pci-passthrough - - Name: Configuring vGPU passthrough - File: virt-configuring-vgpu-passthrough - - Name: Configuring mediated devices - File: virt-configuring-mediated-devices + - Name: Configuring virtual GPUs + File: virt-configuring-virtual-gpus - Name: Enabling descheduler evictions on virtual machines File: virt-enabling-descheduler-evictions - Name: About high availability for virtual machines diff --git a/modules/about-using-gpu-operator.adoc b/modules/about-using-gpu-operator.adoc index 4940afca09..0bc670744a 100644 --- a/modules/about-using-gpu-operator.adoc +++ b/modules/about-using-gpu-operator.adoc @@ -1,21 +1,11 @@ // Module included in the following assemblies: // -// * virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc - +// * virt/virtual_machines/advanced_vm_management/virt-configuring-virtual-gpus.adoc :_content-type: CONCEPT [id="about-using-nvidia-gpu_{context}"] = About using the NVIDIA GPU Operator -The NVIDIA GPU Operator manages NVIDIA GPU resources in a {product-title} cluster and automates tasks related to bootstrapping GPU nodes. Because the GPU is a special resource in the cluster, you must install some components before you can deploy application workloads to the GPU. 
These components include the NVIDIA drivers that enable the compute unified device architecture (CUDA), Kubernetes device plugin, container runtime, and other features such as automatic node labeling, monitoring, and more. +You can use the NVIDIA GPU Operator with {VirtProductName} to rapidly provision worker nodes for running GPU-enabled virtual machines (VMs). The NVIDIA GPU Operator manages NVIDIA GPU resources in an {product-title} cluster and automates tasks that are required when preparing nodes for GPU workloads. -[NOTE] -==== -The NVIDIA GPU Operator is supported only by NVIDIA. For more information about obtaining support from NVIDIA, see link:https://access.redhat.com/solutions/5174941[Obtaining Support from NVIDIA]. -==== - -There are two ways to enable GPUs with {product-title} {VirtProductName}: the {product-title}-native way described here and by using the NVIDIA GPU Operator. - -The NVIDIA GPU Operator is a Kubernetes Operator that uses {product-title} {VirtProductName} to provision GPUs for virtualized workloads running on {product-title}. With the Operator, you can easily provision and manage GPU-enabled virtual machines to run complex artificial intelligence/machine learning (AI/ML) workloads on the same platform as their other workloads. The Operator also provides an easy way to scale the GPU capacity of their infrastructure, enabling rapid growth of GPU-based workloads. - -For more information about using the NVIDIA GPU Operator to provision worker nodes for running GPU-accelerated VMs, see link:https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/openshift/openshift-virtualization.html[NVIDIA GPU Operator with OpenShift Virtualization]. +Before you can deploy application workloads to a GPU resource, you must install components such as the NVIDIA drivers that enable the compute unified device architecture (CUDA), Kubernetes device plugin, container runtime, and other features, such as automatic node labeling and monitoring. 
By automating these tasks, you can quickly scale the GPU capacity of your infrastructure. The NVIDIA GPU Operator can especially facilitate provisioning complex artificial intelligence and machine learning (AI/ML) workloads. \ No newline at end of file diff --git a/modules/using-mediated-devices.adoc b/modules/using-mediated-devices.adoc deleted file mode 100644 index c069e6cf5e..0000000000 --- a/modules/using-mediated-devices.adoc +++ /dev/null @@ -1,9 +0,0 @@ -// Module included in the following assemblies: -// -// * virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc - -:_content-type: CONCEPT -[id="virt-using-mediated-devices_{context}"] -= Using mediated devices - -A vGPU is a type of mediated device; the performance of the physical GPU is divided among the virtual devices. You can assign mediated devices to one or more virtual machines. \ No newline at end of file diff --git a/modules/virt-about-changing-removing-mediated-devices.adoc b/modules/virt-about-changing-removing-mediated-devices.adoc index a774689c45..753ce4caa5 100644 --- a/modules/virt-about-changing-removing-mediated-devices.adoc +++ b/modules/virt-about-changing-removing-mediated-devices.adoc @@ -1,23 +1,20 @@ // Module included in the following assemblies: // -// * virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc +// * virt/virtual_machines/advanced_vm_management/virt-configuring-virtual-gpus.adoc :_content-type: CONCEPT - [id="about-changing-removing-mediated-devices_{context}"] = About changing and removing mediated devices -The cluster's mediated device configuration can be updated with {VirtProductName} by: +You can reconfigure or remove mediated devices in several ways: -* Editing the `HyperConverged` CR and change the contents of the `mediatedDevicesTypes` stanza. +* Edit the `HyperConverged` CR and change the contents of the `mediatedDeviceTypes` stanza. 
-* Changing the node labels that match the `nodeMediatedDeviceTypes` node selector. +* Change the node labels that match the `nodeMediatedDeviceTypes` node selector. -* Removing the device information from the `spec.mediatedDevicesConfiguration` and `spec.permittedHostDevices` stanzas of the `HyperConverged` CR. +* Remove the device information from the `spec.mediatedDevicesConfiguration` and `spec.permittedHostDevices` stanzas of the `HyperConverged` CR. + [NOTE] ==== If you remove the device information from the `spec.permittedHostDevices` stanza without also removing it from the `spec.mediatedDevicesConfiguration` stanza, you cannot create a new mediated device type on the same node. To properly remove mediated devices, remove the device information from both stanzas. -==== - -Depending on the specific changes, these actions cause {VirtProductName} to reconfigure mediated devices or remove them from the cluster nodes. \ No newline at end of file +==== \ No newline at end of file diff --git a/modules/virt-about-using-virtual-gpus.adoc b/modules/virt-about-using-virtual-gpus.adoc index 0b389deb70..cb0a7e019c 100644 --- a/modules/virt-about-using-virtual-gpus.adoc +++ b/modules/virt-about-using-virtual-gpus.adoc @@ -1,6 +1,6 @@ // Module included in the following assemblies: // -// * virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc +// * virt/virtual_machines/advanced_vm_management/virt-configuring-virtual-gpus.adoc :_content-type: CONCEPT [id="virt-about-using-virtual-gpus_{context}"] diff --git a/modules/virt-add-remove-mediated-devices.adoc b/modules/virt-add-remove-mediated-devices.adoc deleted file mode 100644 index 5023ecbd66..0000000000 --- a/modules/virt-add-remove-mediated-devices.adoc +++ /dev/null @@ -1,9 +0,0 @@ -// Module included in the following assemblies: -// -// * virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc - -:_content-type: CONCEPT 
-[id="virt-adding-and-removing-mediated-devices_context"] -= Adding and removing mediated devices - -You can add or remove mediated devices. \ No newline at end of file diff --git a/modules/virt-adding-kernel-arguments-enable-iommu.adoc b/modules/virt-adding-kernel-arguments-enable-iommu.adoc index 18776245ce..8b106e24ca 100644 --- a/modules/virt-adding-kernel-arguments-enable-iommu.adoc +++ b/modules/virt-adding-kernel-arguments-enable-iommu.adoc @@ -1,19 +1,22 @@ // Module included in the following assemblies: // // * virt/virtual_machines/advanced_vm_management/configuring-pci-passthrough.adoc +// * virt/virtual_machines/advanced_vm_management/virt-configuring-virtual-gpus.adoc :_content-type: PROCEDURE [id="virt-adding-kernel-arguments-enable-IOMMU_{context}"] = Adding kernel arguments to enable the IOMMU driver -To enable the IOMMU (Input-Output Memory Management Unit) driver in the kernel, create the `MachineConfig` object and add the kernel arguments. +To enable the IOMMU driver in the kernel, create the `MachineConfig` object and add the kernel arguments. .Prerequisites -* Administrative privilege to a working {product-title} cluster. -* Intel or AMD CPU hardware. -* Intel Virtualization Technology for Directed I/O extensions or AMD IOMMU in the BIOS (Basic Input/Output System) is enabled. + +* You have cluster administrator permissions. +* Your CPU hardware is Intel or AMD. +* You enabled Intel Virtualization Technology for Directed I/O extensions or AMD IOMMU in the BIOS. .Procedure + . Create a `MachineConfig` object that identifies the kernel argument. The following example shows a kernel argument for an Intel CPU. 
+ diff --git a/modules/virt-assign-vgpu-passthrough-to-vm.adoc b/modules/virt-assign-vgpu-passthrough-to-vm.adoc deleted file mode 100644 index fc2dae0c30..0000000000 --- a/modules/virt-assign-vgpu-passthrough-to-vm.adoc +++ /dev/null @@ -1,31 +0,0 @@ -// Module included in the following assemblies: -// -// * virt/virtual_machines/advanced_vm_management/virt-configuring-vgpu-passthrough.adoc - -[id="virt-assign-vgpu-passthrough-to-vm_{context}"] -= Assigning vGPU passthrough devices to a virtual machine - -Use the {product-title} web console to assign vGPU passthrough devices to your virtual machine. - -.Prerequisites - -* The virtual machine must be stopped. - -.Procedure - -. In the {product-title} web console, click *Virtualization -> VirtualMachines* from the side menu. -. Select the virtual machine to which you want to assign the device. -. On the *Details* tab, click *GPU devices*. -+ -If you add a vGPU device as a host device, you cannot access the device with the VNC console. - -. Click *Add GPU device*, enter the *Name* and select the device from the *Device name* list. -. Click *Save*. -. Click the *YAML* tab to verify that the new devices have been added to your cluster configuration in the `hostDevices` section. - -[NOTE] -==== -You can add hardware devices to virtual machines created from customized templates or a YAML file. You cannot add devices to pre-supplied boot source templates for specific operating systems, such as Windows 10 or RHEL 7. - -To display resources that are connected to your cluster, click *Compute* -> *Hardware Devices* from the side menu. 
-==== diff --git a/modules/virt-assigning-mediated-device-virtual-machine.adoc b/modules/virt-assigning-vgpu-vm-cli.adoc similarity index 84% rename from modules/virt-assigning-mediated-device-virtual-machine.adoc rename to modules/virt-assigning-vgpu-vm-cli.adoc index e1a4037b79..b2f8f176da 100644 --- a/modules/virt-assigning-mediated-device-virtual-machine.adoc +++ b/modules/virt-assigning-vgpu-vm-cli.adoc @@ -1,16 +1,17 @@ // Module included in the following assemblies: // -// * virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc +// * virt/virtual_machines/advanced_vm_management/virt-configuring-virtual-gpus.adoc :_content-type: PROCEDURE -[id="virt-assigning-mediated-device-virtual-machine_{context}"] -= Assigning a mediated device to a virtual machine +[id="virt-assigning-mdev-vm-cli_{context}"] += Assigning a vGPU to a VM by using the CLI -Assign mediated devices such as virtual GPUs (vGPUs) to virtual machines. +Assign mediated devices such as virtual GPUs (vGPUs) to virtual machines (VMs). .Prerequisites * The mediated device is configured in the `HyperConverged` custom resource. +* The VM is stopped. .Procedure @@ -27,7 +28,7 @@ spec: gpus: - deviceName: nvidia.com/TU104GL_Tesla_T4 <1> name: gpu1 <2> - - deviceName: nvidia.com/GRID_T4-1Q + - deviceName: nvidia.com/GRID_T4-2Q name: gpu2 ---- <1> The resource name associated with the mediated device. diff --git a/modules/virt-assigning-vgpu-vm-web.adoc b/modules/virt-assigning-vgpu-vm-web.adoc new file mode 100644 index 0000000000..fe9581528c --- /dev/null +++ b/modules/virt-assigning-vgpu-vm-web.adoc @@ -0,0 +1,31 @@ +// Module included in the following assemblies: +// +// * virt/virtual_machines/advanced_vm_management/virt-configuring-virtual-gpus.adoc + +[id="virt-assigning-vgpu-vm-web_{context}"] += Assigning a vGPU to a VM by using the web console + +You can assign virtual GPUs to virtual machines by using the {product-title} web console. 
+[NOTE] +==== +You can add hardware devices to virtual machines created from customized templates or a YAML file. You cannot add devices to pre-supplied boot source templates for specific operating systems. +==== + +.Prerequisites + +* The vGPU is configured as a mediated device in your cluster. +** To view the devices that are connected to your cluster, click *Compute* -> *Hardware Devices* from the side menu. +* The VM is stopped. + +.Procedure + +. In the {product-title} web console, click *Virtualization* -> *VirtualMachines* from the side menu. +. Select the VM that you want to assign the device to. +. On the *Details* tab, click *GPU devices*. +. Click *Add GPU device*. +. Enter an identifying value in the *Name* field. +. From the *Device name* list, select the device that you want to add to the VM. +. Click *Save*. + +.Verification +* To confirm that the devices were added to the VM, click the *YAML* tab and review the `VirtualMachine` configuration. Mediated devices are added to the `spec.domain.devices` stanza. \ No newline at end of file diff --git a/modules/virt-creating-and-exposing-mediated-devices.adoc b/modules/virt-creating-and-exposing-mediated-devices.adoc index 6f19ddd98b..349ae04abc 100644 --- a/modules/virt-creating-and-exposing-mediated-devices.adoc +++ b/modules/virt-creating-and-exposing-mediated-devices.adoc @@ -1,29 +1,31 @@ // Module included in the following assemblies: // -// * virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc +// * virt/virtual_machines/advanced_vm_management/virt-configuring-virtual-gpus.adoc :_content-type: PROCEDURE -[id="virt-creating-and-exposing-mediated-devices_{context}"] +[id="virt-creating-exposing-mediated-devices_{context}"] = Creating and exposing mediated devices -You can expose and create mediated devices such as virtual GPUs (vGPUs) by editing the `HyperConverged` custom resource (CR). 
+As an administrator, you can create mediated devices and expose them to the cluster by editing the `HyperConverged` custom resource (CR). .Prerequisites -* You enabled the IOMMU (Input-Output Memory Management Unit) driver. +* You enabled the Input-Output Memory Management Unit (IOMMU) driver. +* If your hardware vendor provides drivers, you installed them on the nodes where you want to create mediated devices. +** If you use NVIDIA cards, you link:https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/openshift-virtualization.html[installed the NVIDIA GRID driver]. .Procedure -. Edit the `HyperConverged` CR in your default editor by running the following command: +. Open the `HyperConverged` CR in your default editor by running the following command: + [source,terminal,subs="attributes+"] ---- $ oc edit hyperconverged kubevirt-hyperconverged -n {CNVNamespace} ---- - -. Add the mediated device information to the `HyperConverged` CR `spec`, ensuring that you include the `mediatedDevicesConfiguration` and `permittedHostDevices` stanzas. For example: + -.Example configuration file +.Example configuration file with mediated devices configured +[%collapsible] +==== [source,yaml,subs="attributes+"] ---- apiVersion: hco.kubevirt.io/v1 @@ -32,15 +34,15 @@ metadata: name: kubevirt-hyperconverged namespace: {CNVNamespace} spec: - mediatedDevicesConfiguration: <.> - mediatedDevicesTypes: <.> + mediatedDevicesConfiguration: + mediatedDeviceTypes: - nvidia-231 - nodeMediatedDeviceTypes: <.> - - mediatedDevicesTypes: <.> + nodeMediatedDeviceTypes: + - mediatedDeviceTypes: - nvidia-233 nodeSelector: kubernetes.io/hostname: node-11.redhat.com - permittedHostDevices: <.> + permittedHostDevices: mediatedDevices: - mdevNameSelector: GRID T4-2Q resourceName: nvidia.com/GRID_T4-2Q @@ -48,17 +50,66 @@ spec: resourceName: nvidia.com/GRID_T4-8Q # ... ---- -<.> Creates mediated devices. -<.> Required: Global `mediatedDevicesTypes` configuration. 
-<.> Optional: Overrides the global configuration for specific nodes. -<.> Required if you use `nodeMediatedDeviceTypes`. -<.> Exposes mediated devices to the cluster. +==== + +. Create mediated devices by adding them to the `spec.mediatedDevicesConfiguration` stanza: ++ +.Example YAML snippet +[source,yaml] +---- +# ... +spec: + mediatedDevicesConfiguration: + mediatedDeviceTypes: <1> + - + nodeMediatedDeviceTypes: <2> + - mediatedDeviceTypes: <3> + - + nodeSelector: <4> + : +# ... +---- +<1> Required: Configures global settings for the cluster. +<2> Optional: Overrides the global configuration for a specific node or group of nodes. Must be used with the global `mediatedDeviceTypes` configuration. +<3> Required if you use `nodeMediatedDeviceTypes`. Overrides the global `mediatedDeviceTypes` configuration for the specified nodes. +<4> Required if you use `nodeMediatedDeviceTypes`. Must include a `key:value` pair. + +. Identify the name selector and resource name values for the devices that you want to expose to the cluster. You will add these values to the `HyperConverged` CR in the next step. +.. Find the `resourceName` value by running the following command: ++ +[source,terminal] +---- +$ oc get $NODE -o json \ + | jq '.status.allocatable \ + | with_entries(select(.key | startswith("nvidia.com/"))) \ + | with_entries(select(.value != "0"))' +---- + +.. Find the `mdevNameSelector` value by viewing the contents of `/sys/bus/pci/devices/::./mdev_supported_types//name`, substituting the correct values for your system. ++ +For example, the name file for the `nvidia-231` type contains the selector string `GRID T4-2Q`. Using `GRID T4-2Q` as the `mdevNameSelector` value allows nodes to use the `nvidia-231` type. + +. Expose the mediated devices to the cluster by adding the `mdevNameSelector` and `resourceName` values to the +`spec.permittedHostDevices.mediatedDevices` stanza of the `HyperConverged` CR: ++ +.Example YAML snippet +[source,yaml] +---- +# ... 
+ permittedHostDevices: + mediatedDevices: + - mdevNameSelector: GRID T4-2Q <1> + resourceName: nvidia.com/GRID_T4-2Q <2> +# ... +---- +<1> Exposes the mediated devices that map to this value on the host. +<2> Matches the resource name that is allocated on the node. . Save your changes and exit the editor. .Verification -* You can verify that a device was added to a specific node by running the following command: +* Optional: Confirm that a device was added to a specific node by running the following command: + [source,terminal] ---- diff --git a/modules/virt-how-virtual-gpus-assigned-nodes.adoc b/modules/virt-how-virtual-gpus-assigned-nodes.adoc index 682e997833..d4b04f9ff9 100644 --- a/modules/virt-how-virtual-gpus-assigned-nodes.adoc +++ b/modules/virt-how-virtual-gpus-assigned-nodes.adoc @@ -20,7 +20,7 @@ For example: ---- # ... mediatedDevicesConfiguration: - mediatedDevicesTypes: + mediatedDeviceTypes: - nvidia-222 - nvidia-228 - nvidia-105 @@ -45,15 +45,15 @@ On each node, {VirtProductName} creates the following vGPUs: * 16 vGPUs of type nvidia-105 on the first card. * 2 vGPUs of type nvidia-108 on the second card. -One node has a single card that supports more than one requested vGPU type:: {VirtProductName} uses the supported type that comes first on the `mediatedDevicesTypes` list. +One node has a single card that supports more than one requested vGPU type:: {VirtProductName} uses the supported type that comes first on the `mediatedDeviceTypes` list. + -For example, the card on a node card supports `nvidia-223` and `nvidia-224`. The following `mediatedDevicesTypes` list is configured: +For example, the card on a node card supports `nvidia-223` and `nvidia-224`. The following `mediatedDeviceTypes` list is configured: + [source,yaml] ---- # ... 
mediatedDevicesConfiguration: - mediatedDevicesTypes: + mediatedDeviceTypes: - nvidia-22 - nvidia-223 - nvidia-224 diff --git a/modules/virt-options-configuring-mdevs.adoc b/modules/virt-options-configuring-mdevs.adoc new file mode 100644 index 0000000000..bb06b6acff --- /dev/null +++ b/modules/virt-options-configuring-mdevs.adoc @@ -0,0 +1,83 @@ +// Module included in the following assemblies: +// +// * virt/virtual_machines/advanced_vm_management/virt-configuring-virtual-gpus.adoc + +:_content-type: REFERENCE +[id="virt-options-configuring-mdevs_{context}"] += Options for configuring mediated devices + +There are two available methods for configuring mediated devices when using the NVIDIA GPU Operator. The method that Red Hat tests uses {VirtProductName} features to schedule mediated devices, while the NVIDIA method only uses the GPU Operator. + +Using the NVIDIA GPU Operator to configure mediated devices:: +This method exclusively uses the NVIDIA GPU Operator to configure mediated devices. To use this method, refer to link:https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/openshift-virtualization.html[NVIDIA GPU Operator with {VirtProductName}] in the NVIDIA documentation. + +Using {VirtProductName} to configure mediated devices:: +This method, which is tested by Red Hat, uses {VirtProductName}'s capabilities to configure mediated devices. In this case, the NVIDIA GPU Operator is only used for installing drivers with the NVIDIA vGPU Manager. The GPU Operator does not configure mediated devices. ++ +When using the {VirtProductName} method, you still configure the GPU Operator by following link:https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/openshift-virtualization.html[the NVIDIA documentation]. However, this method differs from the NVIDIA documentation in the following ways: + +* You must not overwrite the default `disableMDEVConfiguration: false` setting in the `HyperConverged` custom resource (CR). 
++ +[IMPORTANT] +==== +Setting this feature gate as described in the link:https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/openshift-virtualization.html#prerequisites[NVIDIA documentation] prevents {VirtProductName} from configuring mediated devices. +==== +* You must configure your `ClusterPolicy` manifest so that it matches the following example: ++ +.Example manifest +[source,yaml] +---- +kind: ClusterPolicy +apiVersion: nvidia.com/v1 +metadata: + name: gpu-cluster-policy +spec: + operator: + defaultRuntime: crio + use_ocp_driver_toolkit: true + initContainer: {} + sandboxWorkloads: + enabled: true + defaultWorkload: vm-vgpu + driver: + enabled: false <1> + dcgmExporter: {} + dcgm: + enabled: true + daemonsets: {} + devicePlugin: {} + gfd: {} + migManager: + enabled: true + nodeStatusExporter: + enabled: true + mig: + strategy: single + toolkit: + enabled: true + validator: + plugin: + env: + - name: WITH_WORKLOAD + value: "true" + vgpuManager: + enabled: true <2> + repository: <3> + image: + version: nvidia-vgpu-manager + vgpuDeviceManager: + enabled: false <4> + config: + name: vgpu-devices-config + default: default + sandboxDevicePlugin: + enabled: false <5> + vfioManager: + enabled: false <6> +---- +<1> Set this value to `false`. Not required for VMs. +<2> Set this value to `true`. Required for using vGPUs with VMs. +<3> Substitute `` with your registry value. +<4> Set this value to `false` to allow {VirtProductName} to configure mediated devices instead of the NVIDIA GPU Operator. +<5> Set this value to `false` to prevent discovery and advertising of the vGPU devices to the kubelet. +<6> Set this value to `false` to prevent loading the `vfio-pci` driver. Instead, follow the {VirtProductName} documentation to configure PCI passthrough. 
diff --git a/modules/virt-preparing-hosts-for-mediated-devices.adoc b/modules/virt-preparing-hosts-for-mediated-devices.adoc deleted file mode 100644 index e5c3eb5f27..0000000000 --- a/modules/virt-preparing-hosts-for-mediated-devices.adoc +++ /dev/null @@ -1,10 +0,0 @@ -// Module included in the following assemblies: -// -// * virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc - -:_content-type: CONCEPT - -[id="virt-preparing-host-for-mdevs_{context}"] -= Preparing hosts for mediated devices - -You must enable the Input-Output Memory Management Unit (IOMMU) driver before you can configure mediated devices. \ No newline at end of file diff --git a/modules/virt-prerequisites-mediated-devices.adoc b/modules/virt-prerequisites-mediated-devices.adoc deleted file mode 100644 index 634c861a64..0000000000 --- a/modules/virt-prerequisites-mediated-devices.adoc +++ /dev/null @@ -1,10 +0,0 @@ -// Module included in the following assemblies: -// -// * virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc - -:_content-type: CONCEPT -[id="prerequisites_{context}"] -= Prerequisites - -* If your hardware vendor provides drivers, you installed them on the nodes where you want to create mediated devices. -** If you use NVIDIA cards, you link:https://access.redhat.com/solutions/6738411[installed the NVIDIA GRID driver]. 
diff --git a/modules/virt-preventing-nvidia-gpu-operands-from-deploying-on-nodes.adoc b/modules/virt-preventing-nvidia-gpu-operands-from-deploying-on-nodes.adoc index cec104514b..75dfc6830d 100644 --- a/modules/virt-preventing-nvidia-gpu-operands-from-deploying-on-nodes.adoc +++ b/modules/virt-preventing-nvidia-gpu-operands-from-deploying-on-nodes.adoc @@ -9,11 +9,6 @@ If you use the link:https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/openshift/contents.html[NVIDIA GPU Operator] in your cluster, you can apply the `nvidia.com/gpu.deploy.operands=false` label to nodes that you do not want to configure for GPU or vGPU operands. This label prevents the creation of the pods that configure GPU or vGPU operands and terminates the pods if they already exist. -ifdef::openshift-enterprise[] -:FeatureName: Using the NVIDIA GPU Operator with {VirtProductName} -include::snippets/technology-preview.adoc[] -endif::[] - .Prerequisites * The OpenShift CLI (`oc`) is installed. diff --git a/modules/virt-removing-mediated-device-from-cluster-cli.adoc b/modules/virt-removing-mediated-device-from-cluster-cli.adoc index 0418a4884d..fba05136d4 100644 --- a/modules/virt-removing-mediated-device-from-cluster-cli.adoc +++ b/modules/virt-removing-mediated-device-from-cluster-cli.adoc @@ -1,10 +1,10 @@ // Module included in the following assemblies: // -// * virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc +// * virt/virtual_machines/advanced_vm_management/virt-configuring-virtual-gpus.adoc :_content-type: PROCEDURE [id="virt-removing-mediated-device-from-cluster-cli_{context}"] -= Removing mediated devices from the cluster using the CLI += Removing mediated devices from the cluster To remove a mediated device from the cluster, delete the information for that device from the `HyperConverged` custom resource (CR). 
@@ -29,14 +29,14 @@ metadata: namespace: {CNVNamespace} spec: mediatedDevicesConfiguration: - mediatedDevicesTypes: <1> + mediatedDeviceTypes: <1> - nvidia-231 permittedHostDevices: mediatedDevices: <2> - mdevNameSelector: GRID T4-2Q resourceName: nvidia.com/GRID_T4-2Q ---- -<1> To remove the `nvidia-231` device type, delete it from the `mediatedDevicesTypes` array. +<1> To remove the `nvidia-231` device type, delete it from the `mediatedDeviceTypes` array. <2> To remove the `GRID T4-2Q` device, delete the `mdevNameSelector` field and its corresponding `resourceName` field. . Save your changes and exit the editor. \ No newline at end of file diff --git a/modules/virt-virtual-gpus-config-overview.adoc b/modules/virt-virtual-gpus-config-overview.adoc deleted file mode 100644 index cec080b90c..0000000000 --- a/modules/virt-virtual-gpus-config-overview.adoc +++ /dev/null @@ -1,64 +0,0 @@ -// Module included in the following assemblies: -// -// * virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc - -:_content-type: REFERENCE -[id="configuration-overview_{context}"] -= Configuration overview - -When configuring mediated devices, an administrator must complete the following tasks: - -* Create the mediated devices. -* Expose the mediated devices to the cluster. - -The `HyperConverged` CR includes APIs that accomplish both tasks. - -.Creating mediated devices - -[source,yaml] ----- -# ... -spec: - mediatedDevicesConfiguration: - mediatedDevicesTypes: <1> - - - nodeMediatedDeviceTypes: <2> - - mediatedDevicesTypes: <3> - - - nodeSelector: <4> - : -# ... ----- -<1> Required: Configures global settings for the cluster. -<2> Optional: Overrides the global configuration for a specific node or group of nodes. Must be used with the global `mediatedDevicesTypes` configuration. -<3> Required if you use `nodeMediatedDeviceTypes`. Overrides the global `mediatedDevicesTypes` configuration for the specified nodes. 
-<4> Required if you use `nodeMediatedDeviceTypes`. Must include a `key:value` pair. - -.Exposing mediated devices to the cluster - -[source,yaml] ----- -# ... - permittedHostDevices: - mediatedDevices: - - mdevNameSelector: GRID T4-2Q <1> - resourceName: nvidia.com/GRID_T4-2Q <2> -# ... ----- -<1> Exposes the mediated devices that map to this value on the host. -+ -[NOTE] -==== -You can see the mediated device types that your device supports by viewing the contents of `/sys/bus/pci/devices/::./mdev_supported_types//name`, substituting the correct values for your system. - -For example, the name file for the `nvidia-231` type contains the selector string `GRID T4-2Q`. Using `GRID T4-2Q` as the `mdevNameSelector` value allows nodes to use the `nvidia-231` type. -==== -<2> The `resourceName` should match that allocated on the node. Find the `resourceName` by using the following command: -+ -[source,terminal] ----- -$ oc get $NODE -o json \ - | jq '.status.allocatable \ - | with_entries(select(.key | startswith("nvidia.com/"))) \ - | with_entries(select(.value != "0"))' ----- \ No newline at end of file diff --git a/virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc b/virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc deleted file mode 100644 index 3d01845431..0000000000 --- a/virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc +++ /dev/null @@ -1,48 +0,0 @@ -:_content-type: ASSEMBLY -[id="virt-configuring-mediated-devices"] -= Configuring mediated devices -include::_attributes/common-attributes.adoc[] -:context: virt-configuring-mediated-devices -:toclevels: 3 - -toc::[] - -{VirtProductName} automatically creates mediated devices, such as virtual GPUs (vGPUs), if you provide a list of devices in the `HyperConverged` custom resource (CR). 
-
-ifdef::openshift-enterprise[]
-:FeatureName: Declarative configuration of mediated devices
-include::snippets/technology-preview.adoc[]
-endif::[]
-
-include::modules/about-using-gpu-operator.adoc[leveloffset=+1]
-
-include::modules/virt-about-using-virtual-gpus.adoc[leveloffset=+1]
-
-include::modules/virt-prerequisites-mediated-devices.adoc[leveloffset=+2]
-
-include::modules/virt-virtual-gpus-config-overview.adoc[leveloffset=+2]
-
-include::modules/virt-how-virtual-gpus-assigned-nodes.adoc[leveloffset=+2]
-
-include::modules/virt-about-changing-removing-mediated-devices.adoc[leveloffset=+2]
-
-include::modules/virt-preparing-hosts-for-mediated-devices.adoc[leveloffset=+2]
-
-include::modules/virt-adding-kernel-arguments-enable-iommu.adoc[leveloffset=+3]
-
-include::modules/virt-add-remove-mediated-devices.adoc[leveloffset=+2]
-
-include::modules/virt-creating-and-exposing-mediated-devices.adoc[leveloffset=+3]
-
-include::modules/virt-removing-mediated-device-from-cluster-cli.adoc[leveloffset=+3]
-
-// VM owner task:
-
-include::modules/using-mediated-devices.adoc[leveloffset=+1]
-
-include::modules/virt-assigning-mediated-device-virtual-machine.adoc[leveloffset=+2]
-
-[role="_additional-resources"]
-[id="additional-resources_virt-configuring-mediated-devices"]
-== Additional resources
-* link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/sect-troubleshooting-enabling_intel_vt_x_and_amd_v_virtualization_hardware_extensions_in_bios[Enabling Intel VT-X and AMD-V Virtualization Hardware Extensions in BIOS]
diff --git a/virt/virtual_machines/advanced_vm_management/virt-configuring-vgpu-passthrough.adoc b/virt/virtual_machines/advanced_vm_management/virt-configuring-vgpu-passthrough.adoc
deleted file mode 100644
index 8b86598a05..0000000000
--- a/virt/virtual_machines/advanced_vm_management/virt-configuring-vgpu-passthrough.adoc
+++ /dev/null
@@ -1,22 +0,0 @@
-:_content-type: ASSEMBLY
-[id="virt-configuring-vgpu-passthrough"]
-= Configuring vGPU passthrough
-include::_attributes/common-attributes.adoc[]
-:context: virt-configuring-vgpu-passthrough
-
-toc::[]
-
-Your virtual machines can access a virtual GPU (vGPU) hardware. Assigning a vGPU to your virtual machine allows you do the following:
-
-* Access a fraction of the underlying hardware's GPU to achieve high performance benefits in your virtual machine.
-
-* Streamline resource-intensive I/O operations.
-
-[IMPORTANT]
-====
-vGPU passthrough can only be assigned to devices that are connected to clusters running in a bare metal environment.
-====
-
-include::modules/virt-assign-vgpu-passthrough-to-vm.adoc[leveloffset=+1]
-
-
diff --git a/virt/virtual_machines/advanced_vm_management/virt-configuring-virtual-gpus.adoc b/virt/virtual_machines/advanced_vm_management/virt-configuring-virtual-gpus.adoc
new file mode 100644
index 0000000000..d2d31b0d11
--- /dev/null
+++ b/virt/virtual_machines/advanced_vm_management/virt-configuring-virtual-gpus.adoc
@@ -0,0 +1,63 @@
+:_content-type: ASSEMBLY
+[id="virt-configuring-virtual-gpus"]
+= Configuring virtual GPUs
+include::_attributes/common-attributes.adoc[]
+:context: virt-configuring-virtual-gpus
+
+toc::[]
+
+If you have graphics processing unit (GPU) cards, {VirtProductName} can automatically create virtual GPUs (vGPUs) that you can assign to virtual machines (VMs).
+
+include::modules/virt-about-using-virtual-gpus.adoc[leveloffset=+1]
+
+[id="preparing-hosts-mdevs_{context}"]
+== Preparing hosts for mediated devices
+
+You must enable the Input-Output Memory Management Unit (IOMMU) driver before you can configure mediated devices.
+
+include::modules/virt-adding-kernel-arguments-enable-iommu.adoc[leveloffset=+2]
+
+[id="configuring-nvidia-gpu-operator_{context}"]
+== Configuring the NVIDIA GPU Operator
+
+You can use the NVIDIA GPU Operator to provision worker nodes for running GPU-accelerated virtual machines (VMs) in {VirtProductName}.
+
+[NOTE]
+====
+The NVIDIA GPU Operator is supported only by NVIDIA. For more information, see link:https://access.redhat.com/solutions/5174941[Obtaining Support from NVIDIA] in the Red Hat Knowledgebase.
+====
+
+include::modules/about-using-gpu-operator.adoc[leveloffset=+2]
+
+include::modules/virt-options-configuring-mdevs.adoc[leveloffset=+2]
+
+[role="_additional-resources"]
+.Additional resources
+* xref:../../../virt/virtual_machines/advanced_vm_management/virt-configuring-pci-passthrough.adoc#virt-configuring-pci-passthrough[Configuring PCI passthrough]
+
+include::modules/virt-how-virtual-gpus-assigned-nodes.adoc[leveloffset=+1]
+
+[id="managing-mediated-devices_{context}"]
+== Managing mediated devices
+
+Before you can assign mediated devices to virtual machines, you must create the devices and expose them to the cluster. You can also reconfigure and remove mediated devices.
+
+include::modules/virt-creating-and-exposing-mediated-devices.adoc[leveloffset=+2]
+
+include::modules/virt-about-changing-removing-mediated-devices.adoc[leveloffset=+2]
+
+include::modules/virt-removing-mediated-device-from-cluster-cli.adoc[leveloffset=+2]
+
+[id="using-mediated-devices_{context}"]
+== Using mediated devices
+
+You can assign mediated devices to one or more virtual machines.
+
+include::modules/virt-assigning-vgpu-vm-cli.adoc[leveloffset=+2]
+
+include::modules/virt-assigning-vgpu-vm-web.adoc[leveloffset=+2]
+
+[role="_additional-resources"]
+[id="additional-resources_{context}"]
+== Additional resources
+* link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/sect-troubleshooting-enabling_intel_vt_x_and_amd_v_virtualization_hardware_extensions_in_bios[Enabling Intel VT-X and AMD-V Virtualization Hardware Extensions in BIOS]
\ No newline at end of file