1
0
mirror of https://github.com/openshift/openshift-docs.git synced 2026-02-05 21:46:22 +01:00
Files
openshift-docs/modules/nvidia-gpu-admin-dashboard-using.adoc
2023-01-17 20:34:15 +00:00

60 lines
2.4 KiB
Plaintext

// Module included in the following assemblies:
//
// * monitoring/nvidia-gpu-admin-dashboard.adoc
:_content-type: PROCEDURE
[id="nvidia-gpu-admin-dashboard-using_{context}"]
= Using the NVIDIA GPU administration dashboard
After deploying the OpenShift Console NVIDIA GPU plugin, log in to the OpenShift Container Platform web console using your login credentials to access the *Administrator* perspective.
To view the changes, you need to refresh the console to see the **GPUs** tab under **Compute**.
== Viewing the cluster GPU overview
You can view the status of your cluster GPUs in the Overview page by selecting
Overview in the Home section.
The Overview page provides information about the cluster GPUs, including:
* Details about the GPU providers
* Status of the GPUs
* Cluster utilization of the GPUs
== Viewing the GPUs dashboard
You can view the NVIDIA GPU administration dashboard by selecting GPUs
in the Compute section of the OpenShift Console.
Charts on the GPUs dashboard include:
* *GPU utilization*: Shows the ratio of time the graphics engine is active and is based on the ``DCGM_FI_PROF_GR_ENGINE_ACTIVE`` metric.
* *Memory utilization*: Shows the memory being used by the GPU and is based on the ``DCGM_FI_DEV_MEM_COPY_UTIL`` metric.
* *Encoder utilization*: Shows the video encoder rate of utilization and is based on the ``DCGM_FI_DEV_ENC_UTIL`` metric.
* *Decoder utilization*: *Encoder utilization*: Shows the video decoder rate of utilization and is based on the ``DCGM_FI_DEV_DEC_UTIL`` metric.
* *Power consumption*: Shows the average power usage of the GPU in Watts and is based on the ``DCGM_FI_DEV_POWER_USAGE`` metric.
* *GPU temperature*: Shows the current GPU temperature and is based on the ``DCGM_FI_DEV_GPU_TEMP`` metric. The maximum is set to ``110``, which is an empirical number, as the actual number is not exposed via a metric.
* *GPU clock speed*: Shows the average clock speed utilized by the GPU and is based on the ``DCGM_FI_DEV_SM_CLOCK`` metric.
* *Memory clock speed*: Shows the average clock speed utilized by memory and is based on the ``DCGM_FI_DEV_MEM_CLOCK`` metric.
== Viewing the GPU Metrics
You can view the metrics for the GPUs by selecting the metric at the bottom of
each GPU to view the Metrics page.
On the Metrics page, you can:
* Specify a refresh rate for the metrics
* Add, run, disable, and delete queries
* Insert Metrics
* Reset the zoom view