mirror of https://github.com/openshift/openshift-docs.git synced 2026-02-05 12:46:18 +01:00
openshift-docs/modules/nvidia-gpu-aws-deploying-the-node-feature-discovery-operator.adoc
2025-10-07 12:26:32 -04:00


// Module included in the following assemblies:
//
// * machine_management/creating_machinesets/creating-machineset-aws.adoc
// * machine_management/creating_machinesets/creating-machineset-gcp.adoc
// * machine_management/creating_machinesets/creating-machineset-azure.adoc
:_mod-docs-content-type: PROCEDURE
[id="nvidia-gpu-aws-deploying-the-node-feature-discovery-operator_{context}"]
= Deploying the Node Feature Discovery Operator
After you create the GPU-enabled node, the node must be discovered so that workloads can be scheduled on it. To do this, install the Node Feature Discovery (NFD) Operator. The NFD Operator identifies hardware device features in nodes. It solves the general problem of identifying and cataloging hardware resources in the infrastructure nodes so they can be made available to {product-title}.
.Procedure
. Install the Node Feature Discovery Operator from the software catalog in the {product-title} console.
. After installing the NFD Operator, select *Node Feature Discovery* from the installed Operators list and select *Create instance*. This installs the `nfd-master` and `nfd-worker` pods, one `nfd-worker` pod for each compute node, in the `openshift-nfd` namespace.
. Verify that the Operator is installed and running by running the following command:
+
[source,terminal]
----
$ oc get pods -n openshift-nfd
----
+
.Example output
+
[source,terminal]
----
NAME READY STATUS RESTARTS AGE
nfd-controller-manager-8646fcbb65-x5qgk 2/2 Running 7 (8h ago) 1d
----
. Browse to the installed Operator in the console and select *Create Node Feature Discovery*.
. Select *Create* to build an NFD custom resource. This creates NFD pods in the `openshift-nfd` namespace that poll the {product-title} nodes for hardware resources and catalog them.
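+
To confirm from the CLI that the custom resource exists, you can list it in the namespace. The following is a minimal sketch that assumes the custom resource kind is `NodeFeatureDiscovery`. The instance name that is returned depends on what you entered in the console:
+
[source,terminal]
----
$ oc get nodefeaturediscovery -n openshift-nfd
----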
.Verification
. After a successful build, verify that an NFD pod is running on each node by running the following command:
+
[source,terminal]
----
$ oc get pods -n openshift-nfd
----
+
.Example output
[source,terminal]
----
NAME READY STATUS RESTARTS AGE
nfd-controller-manager-8646fcbb65-x5qgk 2/2 Running 7 (8h ago) 12d
nfd-master-769656c4cb-w9vrv 1/1 Running 0 12d
nfd-worker-qjxb2 1/1 Running 3 (3d14h ago) 12d
nfd-worker-xtz9b 1/1 Running 5 (3d14h ago) 12d
----
+
The NFD Operator uses vendor PCI IDs to identify hardware in a node. NVIDIA uses the PCI ID `10de`.
. View the NVIDIA GPU discovered by the NFD Operator by running the following command:
+
[source,terminal]
----
$ oc describe node ip-10-0-132-138.us-east-2.compute.internal | grep -E 'Roles|pci'
----
+
.Example output
[source,terminal]
----
Roles: worker
feature.node.kubernetes.io/pci-1013.present=true
feature.node.kubernetes.io/pci-10de.present=true
feature.node.kubernetes.io/pci-1d0f.present=true
----
+
`10de` appears in the node feature list for the GPU-enabled node. This means that the NFD Operator correctly identified the node from the GPU-enabled MachineSet.
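+
To find every node that carries the NVIDIA vendor label without describing nodes one at a time, you can filter nodes with a label selector. The following is a minimal sketch that assumes the label key and value shown in the preceding example output. It returns results only after NFD has labeled the nodes:
+
[source,terminal]
----
$ oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true
----
+
If no nodes are returned, NFD did not apply the `pci-10de` label to any node in the cluster.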