TELCODOCS-2226: merge review
Commit 4b61e717cc (parent c0023c9c54), committed by openshift-cherrypick-robot
@@ -3615,6 +3615,8 @@ Topics:
   File: amd-gpu-operator
 - Name: Intel Gaudi AI accelerators
   File: gaudi-ai-accelerator
+- Name: Remote Direct Memory Access (RDMA)
+  File: rdma-remote-direct-memory-access
 ---
 Name: Backup and restore
 Dir: backup_and_restore

hardware_accelerators/rdma-remote-direct-memory-access.adoc (new file, 45 lines)
:_mod-docs-content-type: ASSEMBLY
[id="rdma-remote-direct-memory-access"]
= NVIDIA GPUDirect Remote Direct Memory Access (RDMA)
include::_attributes/common-attributes.adoc[]
:context: rdma-remote-direct-memory-access

toc::[]

NVIDIA GPUDirect Remote Direct Memory Access (RDMA) allows the memory of one computer to directly access the memory of another computer without going through the operating system. Because kernel involvement is bypassed, RDMA frees up resources and greatly reduces the CPU overhead that is normally needed to process network communications. This is useful for distributing GPU-accelerated workloads across clusters, and because RDMA is well suited to high-bandwidth, low-latency applications, it is ideal for big data and machine learning workloads.

There are currently three configuration methods for NVIDIA GPUDirect RDMA:

Shared device:: This method allows an NVIDIA GPUDirect RDMA device to be shared among multiple pods on the {product-title} worker node where the device is exposed.

Host device:: This method provides direct physical Ethernet access on the worker node by creating an additional host network on a pod. A plugin allows the network device to be moved from the host network namespace to the network namespace on the pod.

SR-IOV legacy device:: The Single Root I/O Virtualization (SR-IOV) method can share a single network device, such as an Ethernet adapter, with multiple pods. SR-IOV segments the device, recognized on the host node as a physical function (PF), into multiple virtual functions (VFs). The VF is used like any other network device.

Each of these methods can be used over either RDMA over Converged Ethernet (RoCE) or InfiniBand infrastructure, providing a total of six configuration methods.

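For orientation, the following is a minimal sketch of how a workload might consume a shared device after the configuration in this assembly is complete. It is not part of the configuration procedure; it assumes the `rdma/rdma_shared_device_eth` resource, the `rdmashared-net` secondary network, and the `rdma` service account that are created in the modules that follow, and it uses a placeholder workload image that you must replace with your own.

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: rdma-eth-workload
  namespace: default
  annotations:
    k8s.v1.cni.cncf.io/networks: rdmashared-net  # secondary macvlan network created later in this assembly
spec:
  serviceAccountName: rdma
  containers:
  - name: rdma-app
    image: quay.io/example/rdma-test:latest  # placeholder image; replace with an RDMA-capable workload image
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]  # RDMA applications typically need to lock memory for registration
    resources:
      requests:
        nvidia.com/gpu: 1
        rdma/rdma_shared_device_eth: 1
      limits:
        nvidia.com/gpu: 1
        rdma/rdma_shared_device_eth: 1
----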

:FeatureName: Remote Direct Memory Access

include::modules/rdma-prerequisites.adoc[leveloffset=+1]

* Install the xref:../hardware_enablement/psap-node-feature-discovery-operator.adoc#installing-the-node-feature-discovery-operator_node-feature-discovery-operator[Node Feature Discovery Operator].

* Install the xref:../networking/networking_operators/sr-iov-operator/installing-sriov-operator.adoc#installing-sriov-operator[SR-IOV Operator].

* Install the link:https://docs.nvidia.com/networking/display/kubernetes2501/getting-started-openshift.html#network-operator-installation-using-openshift-oc-cli[NVIDIA Network Operator] (NVIDIA documentation).

* Install the link:https://docs.nvidia.com/datacenter/cloud-native/openshift/24.9.2/install-gpu-ocp.html[NVIDIA GPU Operator] (NVIDIA documentation).

include::modules/rdma-disabling-irdma-kernel-module.adoc[leveloffset=+1]

include::modules/rdma-creating-persistent-naming-rules.adoc[leveloffset=+1]

include::modules/rdma-configuring-the-nfd-operator.adoc[leveloffset=+1]

include::modules/rdma-configuring-the-sriov-operator.adoc[leveloffset=+1]

include::modules/rdma-configuring-the-nvidia-network-operator.adoc[leveloffset=+1]

include::modules/rdma-configuring-the-gpu-operator.adoc[leveloffset=+1]

modules/rdma-configuring-the-gpu-operator.adoc (new file, 323 lines)
// Module included in the following assemblies:
//
// * hardware_accelerators/rdma-remote-direct-memory-access.adoc

:_mod-docs-content-type: PROCEDURE
[id="rdma-configuring-the-gpu-operator_{context}"]
= Configuring the GPU Operator

The GPU Operator automates the management of the NVIDIA drivers, device plugins for GPUs, the NVIDIA Container Toolkit, and other components required for GPU provisioning.

.Prerequisites

* You have installed the GPU Operator.

.Procedure

. Check that the Operator pod is running by looking at the pods in the `nvidia-gpu-operator` namespace. Run the following command:
+
[source,terminal]
----
$ oc get pods -n nvidia-gpu-operator
----
+
.Example output
[source,terminal]
----
NAME                          READY   STATUS    RESTARTS   AGE
gpu-operator-b4cb7d74-zxpwq   1/1     Running   0          32s
----

. Create a GPU cluster policy custom resource file similar to the following example:
+
[source,yaml]
----
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  vgpuDeviceManager:
    config:
      default: default
    enabled: true
  migManager:
    config:
      default: all-disabled
      name: default-mig-parted-config
    enabled: true
  operator:
    defaultRuntime: crio
    initContainer: {}
    runtimeClass: nvidia
    use_ocp_driver_toolkit: true
  dcgm:
    enabled: true
  gfd:
    enabled: true
  dcgmExporter:
    config:
      name: ''
    serviceMonitor:
      enabled: true
    enabled: true
  cdi:
    default: false
    enabled: false
  driver:
    licensingConfig:
      nlsEnabled: true
      configMapName: ''
    certConfig:
      name: ''
    rdma:
      enabled: false
    kernelModuleConfig:
      name: ''
    upgradePolicy:
      autoUpgrade: true
      drain:
        deleteEmptyDir: false
        enable: false
        force: false
        timeoutSeconds: 300
      maxParallelUpgrades: 1
      maxUnavailable: 25%
      podDeletion:
        deleteEmptyDir: false
        force: false
        timeoutSeconds: 300
      waitForCompletion:
        timeoutSeconds: 0
    repoConfig:
      configMapName: ''
    virtualTopology:
      config: ''
    enabled: true
    useNvidiaDriverCRD: false
    useOpenKernelModules: true
  devicePlugin:
    config:
      name: ''
      default: ''
    mps:
      root: /run/nvidia/mps
    enabled: true
  gdrcopy:
    enabled: true
  kataManager:
    config:
      artifactsDir: /opt/nvidia-gpu-operator/artifacts/runtimeclasses
  mig:
    strategy: single
  sandboxDevicePlugin:
    enabled: true
  validator:
    plugin:
      env:
      - name: WITH_WORKLOAD
        value: 'false'
  nodeStatusExporter:
    enabled: true
  daemonsets:
    rollingUpdate:
      maxUnavailable: '1'
    updateStrategy: RollingUpdate
  sandboxWorkloads:
    defaultWorkload: container
    enabled: false
  gds:
    enabled: true
    image: nvidia-fs
    version: 2.20.5
    repository: nvcr.io/nvidia/cloud-native
  vgpuManager:
    enabled: false
  vfioManager:
    enabled: true
  toolkit:
    installDir: /usr/local/nvidia
    enabled: true
----

. After you have generated the GPU `ClusterPolicy` custom resource file, create the resource on the cluster by running the following command:
+
[source,terminal]
----
$ oc create -f gpu-cluster-policy.yaml
----
+
.Example output
[source,terminal]
----
clusterpolicy.nvidia.com/gpu-cluster-policy created
----

. Validate that the Operator is installed and running by running the following command:
+
[source,terminal]
----
$ oc get pods -n nvidia-gpu-operator
----
+
.Example output
[source,terminal]
----
NAME                                                  READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-d5ngn                           1/1     Running     0          3m20s
gpu-feature-discovery-z42rx                           1/1     Running     0          3m23s
gpu-operator-6bb4d4b4c5-njh78                         1/1     Running     0          4m35s
nvidia-container-toolkit-daemonset-bkh8l              1/1     Running     0          3m20s
nvidia-container-toolkit-daemonset-c4hzm              1/1     Running     0          3m23s
nvidia-cuda-validator-4blvg                           0/1     Completed   0          106s
nvidia-cuda-validator-tw8sl                           0/1     Completed   0          112s
nvidia-dcgm-exporter-rrw4g                            1/1     Running     0          3m20s
nvidia-dcgm-exporter-xc78t                            1/1     Running     0          3m23s
nvidia-dcgm-nvxpf                                     1/1     Running     0          3m20s
nvidia-dcgm-snj4j                                     1/1     Running     0          3m23s
nvidia-device-plugin-daemonset-fk2xz                  1/1     Running     0          3m23s
nvidia-device-plugin-daemonset-wq87j                  1/1     Running     0          3m20s
nvidia-driver-daemonset-416.94.202410211619-0-ngrjg   4/4     Running     0          3m58s
nvidia-driver-daemonset-416.94.202410211619-0-tm4x6   4/4     Running     0          3m58s
nvidia-node-status-exporter-jlzxh                     1/1     Running     0          3m57s
nvidia-node-status-exporter-zjffs                     1/1     Running     0          3m57s
nvidia-operator-validator-l49hx                       1/1     Running     0          3m20s
nvidia-operator-validator-n44nn                       1/1     Running     0          3m23s
----

. Optional: When you have verified that the pods are running, remote shell into the NVIDIA driver daemonset pod and confirm that the NVIDIA modules are loaded. Specifically, ensure that the `nvidia_peermem` module is loaded.
+
[source,terminal]
----
$ oc rsh -n nvidia-gpu-operator $(oc -n nvidia-gpu-operator get pod -o name -l app.kubernetes.io/component=nvidia-driver)
sh-4.4# lsmod|grep nvidia
----
+
.Example output
[source,terminal]
----
nvidia_fs             327680  0
nvidia_peermem         24576  0
nvidia_modeset       1507328  0
video                  73728  1 nvidia_modeset
nvidia_uvm           6889472  8
nvidia               8810496  43 nvidia_uvm,nvidia_peermem,nvidia_fs,gdrdrv,nvidia_modeset
ib_uverbs             217088  3 nvidia_peermem,rdma_ucm,mlx5_ib
drm                   741376  5 drm_kms_helper,drm_shmem_helper,nvidia,mgag200
----

. Optional: Run the `nvidia-smi` utility to show the details about the driver and the hardware:
+
[source,terminal]
----
sh-4.4# nvidia-smi
----
+
.Example output
[source,terminal]
----
Wed Nov 6 22:03:53 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40                     On  |   00000000:61:00.0 Off |                    0 |
|  0%   37C    P0             88W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A40                     On  |   00000000:E1:00.0 Off |                    0 |
|  0%   28C    P8             29W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
----

. Set the GPU clock to maximum from within the driver pod by using the `nvidia-smi` command:
+
[source,terminal]
----
$ oc rsh -n nvidia-gpu-operator nvidia-driver-daemonset-416.94.202410172137-0-ndhzc
sh-4.4# nvidia-smi -i 0 -lgc $(nvidia-smi -i 0 --query-supported-clocks=graphics --format=csv,noheader,nounits | sort -h | tail -n 1)
----
+
.Example output
[source,terminal]
----
GPU clocks set to "(gpuClkMin 1740, gpuClkMax 1740)" for GPU 00000000:61:00.0
All done.
----
+
[source,terminal]
----
sh-4.4# nvidia-smi -i 1 -lgc $(nvidia-smi -i 1 --query-supported-clocks=graphics --format=csv,noheader,nounits | sort -h | tail -n 1)
----
+
.Example output
[source,terminal]
----
GPU clocks set to "(gpuClkMin 1740, gpuClkMax 1740)" for GPU 00000000:E1:00.0
All done.
----

. Validate that the GPU and RDMA resources are advertised on the worker nodes by running the following command:
+
[source,terminal]
----
$ oc describe node -l node-role.kubernetes.io/worker= | grep -E 'Capacity:|Allocatable:' -A9
----
+
.Example output
[source,terminal]
----
Capacity:
  cpu:                          128
  ephemeral-storage:            1561525616Ki
  hugepages-1Gi:                0
  hugepages-2Mi:                0
  memory:                       263596712Ki
  nvidia.com/gpu:               2
  pods:                         250
  rdma/rdma_shared_device_eth:  63
  rdma/rdma_shared_device_ib:   63
Allocatable:
  cpu:                          127500m
  ephemeral-storage:            1438028263499
  hugepages-1Gi:                0
  hugepages-2Mi:                0
  memory:                       262445736Ki
  nvidia.com/gpu:               2
  pods:                         250
  rdma/rdma_shared_device_eth:  63
  rdma/rdma_shared_device_ib:   63
--
Capacity:
  cpu:                          128
  ephemeral-storage:            1561525616Ki
  hugepages-1Gi:                0
  hugepages-2Mi:                0
  memory:                       263596672Ki
  nvidia.com/gpu:               2
  pods:                         250
  rdma/rdma_shared_device_eth:  63
  rdma/rdma_shared_device_ib:   63
Allocatable:
  cpu:                          127500m
  ephemeral-storage:            1438028263499
  hugepages-1Gi:                0
  hugepages-2Mi:                0
  memory:                       262445696Ki
  nvidia.com/gpu:               2
  pods:                         250
  rdma/rdma_shared_device_eth:  63
  rdma/rdma_shared_device_ib:   63
----

modules/rdma-configuring-the-nfd-operator.adoc (new file, 238 lines)
// Module included in the following assemblies:
//
// * hardware_accelerators/rdma-remote-direct-memory-access.adoc

:_mod-docs-content-type: PROCEDURE
[id="rdma-configuring-the-nfd-operator_{context}"]
= Configuring the NFD Operator

The Node Feature Discovery (NFD) Operator manages the detection of hardware features and configuration in an {product-title} cluster by labeling the nodes with hardware-specific information. NFD labels the host with node-specific attributes, such as PCI cards, kernel, operating system version, and so on.

.Prerequisites

* You have installed the NFD Operator.

.Procedure

. Validate that the Operator is installed and running by looking at the pods in the `openshift-nfd` namespace. Run the following command:
+
[source,terminal]
----
$ oc get pods -n openshift-nfd
----
+
.Example output
[source,terminal]
----
NAME                                      READY   STATUS    RESTARTS   AGE
nfd-controller-manager-8698c88cdd-t8gbc   2/2     Running   0          2m
----

. With the NFD controller running, generate the `NodeFeatureDiscovery` instance and add it to the cluster.
+
The `ClusterServiceVersion` specification for the NFD Operator provides default values, including the NFD operand image that is part of the Operator payload. Retrieve its value by running the following command:
+
[source,terminal]
----
$ NFD_OPERAND_IMAGE=`echo $(oc get csv -n openshift-nfd -o json | jq -r '.items[0].metadata.annotations["alm-examples"]') | jq -r '.[] | select(.kind == "NodeFeatureDiscovery") | .spec.operand.image'`
----

. Optional: Add entries to the default `deviceClassWhitelist` field to support more network adapters, such as the NVIDIA BlueField DPUs.
+
[source,yaml]
----
apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: openshift-nfd
spec:
  instance: ''
  operand:
    image: '${NFD_OPERAND_IMAGE}'
    servicePort: 12000
  prunerOnDelete: false
  topologyUpdater: false
  workerConfig:
    configData: |
      core:
        sleepInterval: 60s
      sources:
        pci:
          deviceClassWhitelist:
            - "02"
            - "03"
            - "0200"
            - "0207"
            - "12"
          deviceLabelFields:
            - "vendor"
----
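+
Note that `'${NFD_OPERAND_IMAGE}'` is a shell-style placeholder and is not expanded by `oc create -f`. One possible way to resolve it, assuming you saved the manifest above as `nfd-instance.yaml` (the file name used in the next step), is to substitute the value you retrieved earlier before creating the resource:
+
[source,terminal]
----
$ sed -i "s|\${NFD_OPERAND_IMAGE}|${NFD_OPERAND_IMAGE}|" nfd-instance.yaml
----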

. Create the `NodeFeatureDiscovery` instance by running the following command:
+
[source,terminal]
----
$ oc create -f nfd-instance.yaml
----
+
.Example output
[source,terminal]
----
nodefeaturediscovery.nfd.openshift.io/nfd-instance created
----

. Validate that the instance is up and running by looking at the pods in the `openshift-nfd` namespace. Run the following command:
+
[source,terminal]
----
$ oc get pods -n openshift-nfd
----
+
.Example output
[source,terminal]
----
NAME                                    READY   STATUS    RESTARTS   AGE
nfd-controller-manager-7cb6d656-jcnqb   2/2     Running   0          4m
nfd-gc-7576d64889-s28k9                 1/1     Running   0          21s
nfd-master-b7bcf5cfd-qnrmz              1/1     Running   0          21s
nfd-worker-96pfh                        1/1     Running   0          21s
nfd-worker-b2gkg                        1/1     Running   0          21s
nfd-worker-bd9bk                        1/1     Running   0          21s
nfd-worker-cswf4                        1/1     Running   0          21s
nfd-worker-kp6gg                        1/1     Running   0          21s
----

. Wait a short period of time and then verify that NFD has added labels to the node. The NFD labels are prefixed with `feature.node.kubernetes.io`, so you can easily filter them.
+
[source,terminal]
----
$ oc get node -o json | jq '.items[0].metadata.labels | with_entries(select(.key | startswith("feature.node.kubernetes.io")))'
{
  "feature.node.kubernetes.io/cpu-cpuid.ADX": "true",
  "feature.node.kubernetes.io/cpu-cpuid.AESNI": "true",
  "feature.node.kubernetes.io/cpu-cpuid.AVX": "true",
  "feature.node.kubernetes.io/cpu-cpuid.AVX2": "true",
  "feature.node.kubernetes.io/cpu-cpuid.CETSS": "true",
  "feature.node.kubernetes.io/cpu-cpuid.CLZERO": "true",
  "feature.node.kubernetes.io/cpu-cpuid.CMPXCHG8": "true",
  "feature.node.kubernetes.io/cpu-cpuid.CPBOOST": "true",
  "feature.node.kubernetes.io/cpu-cpuid.EFER_LMSLE_UNS": "true",
  "feature.node.kubernetes.io/cpu-cpuid.FMA3": "true",
  "feature.node.kubernetes.io/cpu-cpuid.FP256": "true",
  "feature.node.kubernetes.io/cpu-cpuid.FSRM": "true",
  "feature.node.kubernetes.io/cpu-cpuid.FXSR": "true",
  "feature.node.kubernetes.io/cpu-cpuid.FXSROPT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBPB": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBRS": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBRS_PREFERRED": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBRS_PROVIDES_SMP": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBS": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBSBRNTRGT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBSFETCHSAM": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBSFFV": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBSOPCNT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBSOPCNTEXT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBSOPSAM": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBSRDWROPCNT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBSRIPINVALIDCHK": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBS_FETCH_CTLX": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBS_OPFUSE": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBS_PREVENTHOST": "true",
  "feature.node.kubernetes.io/cpu-cpuid.INT_WBINVD": "true",
  "feature.node.kubernetes.io/cpu-cpuid.INVLPGB": "true",
  "feature.node.kubernetes.io/cpu-cpuid.LAHF": "true",
  "feature.node.kubernetes.io/cpu-cpuid.LBRVIRT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.MCAOVERFLOW": "true",
  "feature.node.kubernetes.io/cpu-cpuid.MCOMMIT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.MOVBE": "true",
  "feature.node.kubernetes.io/cpu-cpuid.MOVU": "true",
  "feature.node.kubernetes.io/cpu-cpuid.MSRIRC": "true",
  "feature.node.kubernetes.io/cpu-cpuid.MSR_PAGEFLUSH": "true",
  "feature.node.kubernetes.io/cpu-cpuid.NRIPS": "true",
  "feature.node.kubernetes.io/cpu-cpuid.OSXSAVE": "true",
  "feature.node.kubernetes.io/cpu-cpuid.PPIN": "true",
  "feature.node.kubernetes.io/cpu-cpuid.PSFD": "true",
  "feature.node.kubernetes.io/cpu-cpuid.RDPRU": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SEV": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SEV_64BIT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SEV_ALTERNATIVE": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SEV_DEBUGSWAP": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SEV_ES": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SEV_RESTRICTED": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SEV_SNP": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SHA": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SME": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SME_COHERENT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SPEC_CTRL_SSBD": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SSE4A": "true",
  "feature.node.kubernetes.io/cpu-cpuid.STIBP": "true",
  "feature.node.kubernetes.io/cpu-cpuid.STIBP_ALWAYSON": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SUCCOR": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SVM": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SVMDA": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SVMFBASID": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SVML": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SVMNP": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SVMPF": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SVMPFT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SYSCALL": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SYSEE": "true",
  "feature.node.kubernetes.io/cpu-cpuid.TLB_FLUSH_NESTED": "true",
  "feature.node.kubernetes.io/cpu-cpuid.TOPEXT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.TSCRATEMSR": "true",
  "feature.node.kubernetes.io/cpu-cpuid.VAES": "true",
  "feature.node.kubernetes.io/cpu-cpuid.VMCBCLEAN": "true",
  "feature.node.kubernetes.io/cpu-cpuid.VMPL": "true",
  "feature.node.kubernetes.io/cpu-cpuid.VMSA_REGPROT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.VPCLMULQDQ": "true",
  "feature.node.kubernetes.io/cpu-cpuid.VTE": "true",
  "feature.node.kubernetes.io/cpu-cpuid.WBNOINVD": "true",
  "feature.node.kubernetes.io/cpu-cpuid.X87": "true",
  "feature.node.kubernetes.io/cpu-cpuid.XGETBV1": "true",
  "feature.node.kubernetes.io/cpu-cpuid.XSAVE": "true",
  "feature.node.kubernetes.io/cpu-cpuid.XSAVEC": "true",
  "feature.node.kubernetes.io/cpu-cpuid.XSAVEOPT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.XSAVES": "true",
  "feature.node.kubernetes.io/cpu-hardware_multithreading": "false",
  "feature.node.kubernetes.io/cpu-model.family": "25",
  "feature.node.kubernetes.io/cpu-model.id": "1",
  "feature.node.kubernetes.io/cpu-model.vendor_id": "AMD",
  "feature.node.kubernetes.io/kernel-config.NO_HZ": "true",
  "feature.node.kubernetes.io/kernel-config.NO_HZ_FULL": "true",
  "feature.node.kubernetes.io/kernel-selinux.enabled": "true",
  "feature.node.kubernetes.io/kernel-version.full": "5.14.0-427.35.1.el9_4.x86_64",
  "feature.node.kubernetes.io/kernel-version.major": "5",
  "feature.node.kubernetes.io/kernel-version.minor": "14",
  "feature.node.kubernetes.io/kernel-version.revision": "0",
  "feature.node.kubernetes.io/memory-numa": "true",
  "feature.node.kubernetes.io/network-sriov.capable": "true",
  "feature.node.kubernetes.io/pci-102b.present": "true",
  "feature.node.kubernetes.io/pci-10de.present": "true",
  "feature.node.kubernetes.io/pci-10de.sriov.capable": "true",
  "feature.node.kubernetes.io/pci-15b3.present": "true",
  "feature.node.kubernetes.io/pci-15b3.sriov.capable": "true",
  "feature.node.kubernetes.io/rdma.available": "true",
  "feature.node.kubernetes.io/rdma.capable": "true",
  "feature.node.kubernetes.io/storage-nonrotationaldisk": "true",
  "feature.node.kubernetes.io/system-os_release.ID": "rhcos",
  "feature.node.kubernetes.io/system-os_release.OPENSHIFT_VERSION": "4.17",
  "feature.node.kubernetes.io/system-os_release.OSTREE_VERSION": "417.94.202409121747-0",
  "feature.node.kubernetes.io/system-os_release.RHEL_VERSION": "9.4",
  "feature.node.kubernetes.io/system-os_release.VERSION_ID": "4.17",
  "feature.node.kubernetes.io/system-os_release.VERSION_ID.major": "4",
  "feature.node.kubernetes.io/system-os_release.VERSION_ID.minor": "17"
}
----

. Confirm that a network device is discovered:
+
[source,terminal]
----
$ oc describe node | grep -E 'Roles|pci' | grep pci-15b3
feature.node.kubernetes.io/pci-15b3.present=true
feature.node.kubernetes.io/pci-15b3.sriov.capable=true
feature.node.kubernetes.io/pci-15b3.present=true
feature.node.kubernetes.io/pci-15b3.sriov.capable=true
----

modules/rdma-configuring-the-nmstate-operator.adoc (new file, 144 lines)
// Module included in the following assemblies:
//
// * hardware_accelerators/rdma-remote-direct-memory-access.adoc

:_mod-docs-content-type: PROCEDURE
[id="rdma-configuring-the-nmstate-operator_{context}"]
= Configuring the NMState Operator

You need to configure network interfaces on the nodes that were not configured when the cluster was initially created. The NMState Operator is designed for those use cases.

.Prerequisites

* You have installed the NMState Operator.

.Procedure

. Validate that the Operator is installed and running by looking at the pods in the `openshift-nmstate` namespace:
+
[source,terminal]
----
$ oc get pods -n openshift-nmstate
----
+
.Example output
[source,terminal]
----
NAME                               READY   STATUS    RESTARTS   AGE
nmstate-operator-d587966c9-qkl5m   1/1     Running   0          43s
----

. Create a custom resource file for the required `NMState` instance:
+
[source,yaml]
----
apiVersion: nmstate.io/v1
kind: NMState
metadata:
  name: nmstate
----

. Create the instance on the cluster:
+
[source,terminal]
----
$ oc create -f nmstate-instance.yaml
----
+
.Example output
[source,terminal]
----
nmstate.nmstate.io/nmstate created
----

. Validate that the instance is running:
+
[source,terminal]
----
$ oc get pods -n openshift-nmstate
----
+
.Example output
[source,terminal]
----
NAME                                      READY   STATUS    RESTARTS   AGE
nmstate-cert-manager-6dc78dc6bf-ds7kj     1/1     Running   0          17s
nmstate-console-plugin-5b7595c56c-tgzbw   1/1     Running   0          17s
nmstate-handler-lxkd5                     1/1     Running   0          17s
nmstate-operator-d587966c9-qkl5m          1/1     Running   0          3m27s
nmstate-webhook-54dbd47d9d-cvsf6          0/1     Running   0          17s
----

. Build a `NodeNetworkConfigurationPolicy`. The following example configures a static IP address on the `ens8f0np0` interface on the `nvd-srv-32` node.
+
[source,yaml]
----
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ens8f0np0-policy
spec:
  nodeSelector:
    kubernetes.io/hostname: nvd-srv-32.nvidia.eng.rdu2.dc.redhat.com
  desiredState:
    interfaces:
    - name: ens8f0np0
      description: Configuring ens8f0np0 on nvd-srv-32.nvidia.eng.rdu2.dc.redhat.com
      type: ethernet
      state: up
      ipv4:
        dhcp: false
        address:
        - ip: 10.6.145.32
          prefix-length: 24
        enabled: true
----

. Create the custom resource on the cluster:
+
[source,terminal]
----
$ oc create -f nncp-static-ip.yaml
----
+
.Example output
[source,terminal]
----
nodenetworkconfigurationpolicy.nmstate.io/ens8f0np0-policy created
----

. Validate that the policy is successfully configured:
+
[source,terminal]
----
$ oc get nncp -A
----
+
.Example output
[source,terminal]
----
NAME               STATUS      REASON
ens8f0np0-policy   Available   SuccessfullyConfigured
----
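+
Optionally, you can also check the per-node rollout of the policy through the enactment objects that NMState generates for each matching node. This is a sketch; the enactment names in your cluster are derived from your node and policy names:
+
[source,terminal]
----
$ oc get nnce
----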

. Validate that the IP address is set on the interface by viewing it from inside the node:
+
[source,terminal]
----
$ oc debug node/nvd-srv-32.nvidia.eng.rdu2.dc.redhat.com
Starting pod/nvd-srv-32nvidiaengrdu2dcredhatcom-debug-8mx6q ...
To use host binaries, run `chroot /host`
Pod IP: 10.6.135.11
If you don't see a command prompt, try pressing enter.
sh-5.1# chroot /host

sh-5.1# ip address show dev ens8f0np0
96: ens8f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 58:a2:e1:e1:42:78 brd ff:ff:ff:ff:ff:ff
    altname enp160s0f0np0
    inet 10.6.145.32/24 brd 10.6.145.255 scope global noprefixroute ens8f0np0
       valid_lft forever preferred_lft forever
    inet6 fe80::c397:5afa:d618:e752/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
----

modules/rdma-configuring-the-nvidia-network-operator.adoc (new file, 242 lines)
// Module included in the following assemblies:
//
// * hardware_accelerators/rdma-remote-direct-memory-access.adoc

:_mod-docs-content-type: PROCEDURE
[id="rdma-configuring-the-nvidia-network-operator_{context}"]
= Configuring the NVIDIA Network Operator

The NVIDIA Network Operator manages NVIDIA networking resources and networking-related components, such as drivers and device plugins, to enable NVIDIA GPUDirect RDMA workloads.

.Prerequisites

* You have installed the NVIDIA Network Operator.

.Procedure

. Validate that the Network Operator is installed and running by confirming that the controller is running in the `nvidia-network-operator` namespace. Run the following command:
+
[source,terminal]
----
$ oc get pods -n nvidia-network-operator
----
+
.Example output
[source,terminal]
----
NAME                                                          READY   STATUS    RESTARTS   AGE
nvidia-network-operator-controller-manager-6f7d6956cd-fw5wg   1/1     Running   0          5m
----

. With the Operator running, create the `NicClusterPolicy` custom resource file. The device you choose depends on your system configuration. In this example, the InfiniBand interface `ibs2f0` is hard-coded and is used as the shared NVIDIA GPUDirect RDMA device.
+
[source,yaml]
----
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  nicFeatureDiscovery:
    image: nic-feature-discovery
    repository: ghcr.io/mellanox
    version: v0.0.1
  docaTelemetryService:
    image: doca_telemetry
    repository: nvcr.io/nvidia/doca
    version: 1.16.5-doca2.6.0-host
  rdmaSharedDevicePlugin:
    config: |
      {
        "configList": [
          {
            "resourceName": "rdma_shared_device_ib",
            "rdmaHcaMax": 63,
            "selectors": {
              "ifNames": ["ibs2f0"]
            }
          },
          {
            "resourceName": "rdma_shared_device_eth",
            "rdmaHcaMax": 63,
            "selectors": {
              "ifNames": ["ens8f0np0"]
            }
          }
        ]
      }
    image: k8s-rdma-shared-dev-plugin
    repository: ghcr.io/mellanox
    version: v1.5.1
  secondaryNetwork:
    ipoib:
      image: ipoib-cni
      repository: ghcr.io/mellanox
      version: v1.2.0
  nvIpam:
    enableWebhook: false
    image: nvidia-k8s-ipam
    repository: ghcr.io/mellanox
    version: v0.2.0
  ofedDriver:
    readinessProbe:
      initialDelaySeconds: 10
      periodSeconds: 30
    forcePrecompiled: false
    terminationGracePeriodSeconds: 300
    livenessProbe:
      initialDelaySeconds: 30
      periodSeconds: 30
    upgradePolicy:
      autoUpgrade: true
      drain:
        deleteEmptyDir: true
        enable: true
        force: true
        timeoutSeconds: 300
        podSelector: ''
      maxParallelUpgrades: 1
      safeLoad: false
      waitForCompletion:
        timeoutSeconds: 0
    startupProbe:
      initialDelaySeconds: 10
      periodSeconds: 20
    image: doca-driver
    repository: nvcr.io/nvidia/mellanox
    version: 24.10-0.7.0.0-0
    env:
    - name: UNLOAD_STORAGE_MODULES
      value: "true"
    - name: RESTORE_DRIVER_ON_POD_TERMINATION
      value: "true"
    - name: CREATE_IFNAMES_UDEV
      value: "true"
----

. Create the `NicClusterPolicy` custom resource on the cluster by running the following command:
+
[source,terminal]
----
$ oc create -f network-sharedrdma-nic-cluster-policy.yaml
----
+
.Example output
[source,terminal]
----
nicclusterpolicy.mellanox.com/nic-cluster-policy created
----

. Validate that the `NicClusterPolicy` is deployed and that the DOCA/MOFED container is running by running the following command:
+
[source,terminal]
----
$ oc get pods -n nvidia-network-operator
----
+
.Example output
[source,terminal]
----
NAME                                                          READY   STATUS    RESTARTS   AGE
doca-telemetry-service-hwj65                                  1/1     Running   2          160m
kube-ipoib-cni-ds-fsn8g                                       1/1     Running   2          160m
mofed-rhcos4.16-9b5ddf4c6-ds-ct2h5                            2/2     Running   4          160m
nic-feature-discovery-ds-dtksz                                1/1     Running   2          160m
nv-ipam-controller-854585f594-c5jpp                           1/1     Running   2          160m
nv-ipam-controller-854585f594-xrnp5                           1/1     Running   2          160m
nv-ipam-node-xqttl                                            1/1     Running   2          160m
nvidia-network-operator-controller-manager-5798b564cd-5cq99   1/1     Running   2          5d23h
rdma-shared-dp-ds-p9vvg                                       1/1     Running   0          85m
----

. `rsh` into the `mofed` container to check the status by running the following commands:
+
[source,terminal]
----
$ MOFED_POD=$(oc get pods -n nvidia-network-operator -o name | grep mofed)
$ oc rsh -n nvidia-network-operator -c mofed-container ${MOFED_POD}
sh-5.1# ofed_info -s
----
+
.Example output
[source,terminal]
----
OFED-internal-24.07-0.6.1:
----
+
[source,terminal]
----
sh-5.1# ibdev2netdev -v
----
+
.Example output
[source,terminal]
----
0000:0d:00.0 mlx5_0 (MT41692 - 900-9D3B4-00EN-EA0) BlueField-3 E-series SuperNIC 400GbE/NDR single port QSFP112, PCIe Gen5.0 x16 FHHL, Crypto Enabled, 16GB DDR5, BMC, Tall Bracket fw 32.42.1000 port 1 (ACTIVE) ==> ibs2f0 (Up)
0000:a0:00.0 mlx5_1 (MT41692 - 900-9D3B4-00EN-EA0) BlueField-3 E-series SuperNIC 400GbE/NDR single port QSFP112, PCIe Gen5.0 x16 FHHL, Crypto Enabled, 16GB DDR5, BMC, Tall Bracket fw 32.42.1000 port 1 (ACTIVE) ==> ens8f0np0 (Up)
----

. Create an `IPoIBNetwork` custom resource file:
+
[source,yaml]
----
apiVersion: mellanox.com/v1alpha1
kind: IPoIBNetwork
metadata:
  name: example-ipoibnetwork
spec:
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.6.225/28",
      "exclude": [
        "192.168.6.229/30",
        "192.168.6.236/32"
      ]
    }
  master: ibs2f0
  networkNamespace: default
----

. Create the `IPoIBNetwork` resource on the cluster by running the following command:
+
[source,terminal]
----
$ oc create -f ipoib-network.yaml
----
+
.Example output
[source,terminal]
----
ipoibnetwork.mellanox.com/example-ipoibnetwork created
----

. Create a `MacvlanNetwork` custom resource file for your other interface:
+
[source,yaml]
----
apiVersion: mellanox.com/v1alpha1
kind: MacvlanNetwork
metadata:
  name: rdmashared-net
spec:
  networkNamespace: default
  master: ens8f0np0
  mode: bridge
  mtu: 1500
  ipam: '{"type": "whereabouts", "range": "192.168.2.0/24", "gateway": "192.168.2.1"}'
----

. Create the resource on the cluster by running the following command:
+
[source,terminal]
----
$ oc create -f macvlan-network.yaml
----
+
.Example output
[source,terminal]
----
macvlannetwork.mellanox.com/rdmashared-net created
----
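+
To confirm that the secondary networks are available for pods to attach to, you can list the `NetworkAttachmentDefinition` objects that the Operator generates in the target namespace. This is a quick sketch; it assumes the `default` network namespace used in the examples above:
+
[source,terminal]
----
$ oc get network-attachment-definitions -n default
----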

modules/rdma-configuring-the-sriov-operator.adoc (new file, 71 lines)
// Module included in the following assemblies:
//
// * hardware_accelerators/rdma-remote-direct-memory-access.adoc

:_mod-docs-content-type: PROCEDURE
[id="rdma-configuring-the-sriov-operator_{context}"]
= Configuring the SR-IOV Operator

Single Root I/O Virtualization (SR-IOV) enhances the performance of NVIDIA GPUDirect RDMA by allowing a single physical device to be shared across multiple pods.

.Prerequisites

* You have installed the SR-IOV Operator.

.Procedure

. Validate that the Operator is installed and running by looking at the pods in the `openshift-sriov-network-operator` namespace. Run the following command:
+
[source,terminal]
----
$ oc get pods -n openshift-sriov-network-operator
----
+
.Example output
[source,terminal]
----
NAME                                      READY   STATUS    RESTARTS   AGE
sriov-network-operator-7cb6c49868-89486   1/1     Running   0          22s
----

. For the default `SriovOperatorConfig` CR to work with the MLNX_OFED container, create a custom resource file that updates the following values:
+
[source,yaml]
----
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  enableInjector: true
  enableOperatorWebhook: true
  logLevel: 2
----

. Create the resource on the cluster by running the following command:
+
[source,terminal]
----
$ oc create -f sriov-operator-config.yaml
----
+
.Example output
[source,terminal]
----
sriovoperatorconfig.sriovnetwork.openshift.io/default created
----

. Patch the SR-IOV Operator configuration so that the MOFED container can work with it by running the following command:
+
[source,terminal]
----
$ oc patch sriovoperatorconfig default --type=merge -n openshift-sriov-network-operator --patch '{ "spec": { "configDaemonNodeSelector": { "network.nvidia.com/operator.mofed.wait": "false", "node-role.kubernetes.io/worker": "", "feature.node.kubernetes.io/pci-15b3.sriov.capable": "true" } } }'
----
+
.Example output
[source,terminal]
----
sriovoperatorconfig.sriovnetwork.openshift.io/default patched
----
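+
After the patch, the SR-IOV config daemon is scheduled only on nodes that carry the selector labels. As a quick check, which assumes the default `sriov-network-config-daemon` naming used by the Operator, you can confirm which nodes the daemon pods land on:
+
[source,terminal]
----
$ oc get pods -n openshift-sriov-network-operator -o wide
----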

modules/rdma-creating-a-service-user.adoc (new file, 51 lines)
// Module included in the following assemblies:
//
// * hardware_accelerators/rdma-remote-direct-memory-access.adoc

:_mod-docs-content-type: PROCEDURE
[id="rdma-creating-a-service-user_{context}"]
= Creating a service user

This section describes how to create a service account and grant it the privileges required for NVIDIA GPUDirect RDMA workloads.

.Procedure

. Generate a `ServiceAccount` resource file to use in the `default` namespace:
+
[source,yaml]
----
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rdma
  namespace: default
----

. Create the account on your cluster by running the following command:
+
[source,terminal]
----
$ oc create -f default-serviceaccount.yaml
----
+
.Example output
[source,terminal]
----
serviceaccount/rdma created
----

. Add privileges to the account by granting it the `privileged` security context constraint (SCC). Run the following command:
+
[source,terminal]
----
$ oc -n default adm policy add-scc-to-user privileged -z rdma
----
+
.Example output
[source,terminal]
----
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:privileged added: "rdma"
----
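+
You can confirm the grant by listing the role bindings in the namespace. This sketch assumes the RBAC-style SCC grant shown in the output above, where the command binds the service account to the `system:openshift:scc:privileged` cluster role:
+
[source,terminal]
----
$ oc -n default get rolebinding | grep 'system:openshift:scc:privileged'
----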

modules/rdma-creating-persistent-naming-rules.adoc (new file, 92 lines)
// Module included in the following assemblies:
//
// * hardware_accelerators/rdma-remote-direct-memory-access.adoc

:_mod-docs-content-type: PROCEDURE
[id="rdma-creating-persistent-naming-rules_{context}"]
= Creating persistent naming rules

In some cases, device names do not persist following a reboot. For example, on R760xa systems, Mellanox devices might be renamed after a reboot. You can avoid this problem by using a `MachineConfig` object to set persistent naming rules.

.Procedure

. Gather the MAC addresses from the worker nodes into a file, and provide the names that the interfaces must keep after a reboot. This example uses the file `70-persistent-net.rules` and stashes the details in it.
+
[source,terminal]
----
$ cat <<EOF > 70-persistent-net.rules
SUBSYSTEM=="net",ACTION=="add",ATTR{address}=="b8:3f:d2:3b:51:28",ATTR{type}=="1",NAME="ibs2f0"
SUBSYSTEM=="net",ACTION=="add",ATTR{address}=="b8:3f:d2:3b:51:29",ATTR{type}=="1",NAME="ens8f0np0"
SUBSYSTEM=="net",ACTION=="add",ATTR{address}=="b8:3f:d2:f0:36:d0",ATTR{type}=="1",NAME="ibs2f0"
SUBSYSTEM=="net",ACTION=="add",ATTR{address}=="b8:3f:d2:f0:36:d1",ATTR{type}=="1",NAME="ens8f0np0"
EOF
----

. Convert that file into a base64 string without line breaks and set the output to the variable `PERSIST`:
+
[source,terminal]
----
$ PERSIST=`cat 70-persistent-net.rules | base64 -w 0`

$ echo $PERSIST
U1VCU1lTVEVNPT0ibmV0IixBQ1RJT049PSJhZGQiLEFUVFJ7YWRkcmVzc309PSJiODozZjpkMjozYjo1MToyOCIsQVRUUnt0eXBlfT09IjEiLE5BTUU9ImliczJmMCIKU1VCU1lTVEVNPT0ibmV0IixBQ1RJT049PSJhZGQiLEFUVFJ7YWRkcmVzc309PSJiODozZjpkMjozYjo1MToyOSIsQVRUUnt0eXBlfT09IjEiLE5BTUU9ImVuczhmMG5wMCIKU1VCU1lTVEVNPT0ibmV0IixBQ1RJT049PSJhZGQiLEFUVFJ7YWRkcmVzc309PSJiODozZjpkMjpmMDozNjpkMCIsQVRUUnt0eXBlfT09IjEiLE5BTUU9ImliczJmMCIKU1VCU1lTVEVNPT0ibmV0IixBQ1RJT049PSJhZGQiLEFUVFJ7YWRkcmVzc309PSJiODozZjpkMjpmMDozNjpkMSIsQVRUUnt0eXBlfT09IjEiLE5BTUU9ImVuczhmMG5wMCIK
----

. Create a machine configuration and set the base64 encoding in the custom resource file by running the following command. The `MachineConfig` manifest is embedded in a heredoc so that the `$PERSIST` variable is expanded when the file is written:
+
[source,terminal]
----
$ cat <<EOF > 99-machine-config-udev-network.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-machine-config-udev-network
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;base64,$PERSIST
        filesystem: root
        mode: 420
        path: /etc/udev/rules.d/70-persistent-net.rules
EOF
----

. Create the machine configuration on the cluster by running the following command:
+
[source,terminal]
----
$ oc create -f 99-machine-config-udev-network.yaml
----
+
.Example output
[source,terminal]
----
machineconfig.machineconfiguration.openshift.io/99-machine-config-udev-network created
----

. Use the `oc get mcp` command to view the machine config pool status:
+
[source,terminal]
----
$ oc get mcp
----
+
.Example output
[source,terminal]
----
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-9adfe851c2c14d9598eea5ec3df6c187   True      False      False      1              1                   1                     0                      6h21m
worker   rendered-worker-4568f1b174066b4b1a4de794cf538fee   False     True       False      2              0                   0                     0                      6h21m
----

The nodes reboot. When the `UPDATING` field returns to `False`, you can validate the interface names on the nodes by looking at the devices in a debug pod, as shown in the sketch that follows.

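The following is a minimal sketch of that validation. It assumes the node and interface names used in the surrounding modules; substitute your own:

[source,terminal]
----
$ oc debug node/nvd-srv-32.nvidia.eng.rdu2.dc.redhat.com
sh-5.1# chroot /host
sh-5.1# ip link show | grep -E 'ibs2f0|ens8f0np0'
----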

modules/rdma-disabling-irdma-kernel-module.adoc (new file, 60 lines)
// Module included in the following assemblies:
//
// * hardware_accelerators/rdma-remote-direct-memory-access.adoc

:_mod-docs-content-type: PROCEDURE
[id="rdma-disabling-irdma-kernel-module_{context}"]
= Disabling the IRDMA kernel module

On some systems, including the Dell R750xa, the IRDMA kernel module creates problems for the NVIDIA Network Operator when the DOCA drivers are unloaded and loaded. Use the following procedure to disable the module.

.Procedure

. Generate the following machine configuration file by running the following command:
+
[source,terminal]
----
$ cat <<EOF > 99-machine-config-blacklist-irdma.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-blacklist-irdma
spec:
  kernelArguments:
    - "module_blacklist=irdma"
EOF
----

. Create the machine configuration on the cluster and wait for the nodes to reboot by running the following command:
+
[source,terminal]
----
$ oc create -f 99-machine-config-blacklist-irdma.yaml
----
+
.Example output
[source,terminal]
----
machineconfig.machineconfiguration.openshift.io/99-worker-blacklist-irdma created
----
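+
If you want to follow the rollout, you can watch the worker machine config pool while the nodes are drained and rebooted one at a time. This is an optional sketch; interrupt the watch when the pool reports `UPDATED`:
+
[source,terminal]
----
$ oc get mcp -w
----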

. Validate in a debug pod on each node that the module has not loaded, by running the following commands:
+
[source,terminal]
----
$ oc debug node/nvd-srv-32.nvidia.eng.rdu2.dc.redhat.com
Starting pod/nvd-srv-32nvidiaengrdu2dcredhatcom-debug-btfj2 ...
To use host binaries, run `chroot /host`
Pod IP: 10.6.135.11
If you don't see a command prompt, try pressing enter.
sh-5.1# chroot /host
sh-5.1# lsmod|grep irdma
sh-5.1#
----

modules/rdma-prerequisites.adoc (new file, 12 lines)
// Module included in the following assemblies:
//
// * hardware_accelerators/rdma-remote-direct-memory-access.adoc

:_mod-docs-content-type: PROCEDURE
[id="rdma-prerequisites_{context}"]
= NVIDIA GPUDirect RDMA prerequisites

All methods of NVIDIA GPUDirect RDMA configuration require that you install specific Operators. Use the following steps to install the Operators: