mirror of
https://github.com/openshift/openshift-docs.git
synced 2026-02-05 21:46:22 +01:00
60 lines
2.4 KiB
Plaintext
60 lines
2.4 KiB
Plaintext
// Module included in the following assemblies:
|
|
//
|
|
// * monitoring/nvidia-gpu-admin-dashboard.adoc
|
|
|
|
:_content-type: PROCEDURE
|
|
[id="nvidia-gpu-admin-dashboard-using_{context}"]
|
|
= Using the NVIDIA GPU administration dashboard
|
|
|
|
After deploying the OpenShift Console NVIDIA GPU plugin, log in to the OpenShift Container Platform web console using your login credentials to access the *Administrator* perspective.
|
|
|
|
To view the changes, you need to refresh the console to see the **GPUs** tab under **Compute**.
|
|
|
|
|
|
== Viewing the cluster GPU overview
|
|
|
|
You can view the status of your cluster GPUs in the Overview page by selecting
|
|
Overview in the Home section.
|
|
|
|
The Overview page provides information about the cluster GPUs, including:
|
|
|
|
* Details about the GPU providers
|
|
* Status of the GPUs
|
|
* Cluster utilization of the GPUs
|
|
|
|
== Viewing the GPUs dashboard
|
|
|
|
You can view the NVIDIA GPU administration dashboard by selecting GPUs
|
|
in the Compute section of the OpenShift Console.
|
|
|
|
|
|
Charts on the GPUs dashboard include:
|
|
|
|
* *GPU utilization*: Shows the ratio of time the graphics engine is active and is based on the ``DCGM_FI_PROF_GR_ENGINE_ACTIVE`` metric.
|
|
|
|
* *Memory utilization*: Shows the memory being used by the GPU and is based on the ``DCGM_FI_DEV_MEM_COPY_UTIL`` metric.
|
|
|
|
* *Encoder utilization*: Shows the video encoder rate of utilization and is based on the ``DCGM_FI_DEV_ENC_UTIL`` metric.
|
|
|
|
* *Decoder utilization*: *Encoder utilization*: Shows the video decoder rate of utilization and is based on the ``DCGM_FI_DEV_DEC_UTIL`` metric.
|
|
|
|
* *Power consumption*: Shows the average power usage of the GPU in Watts and is based on the ``DCGM_FI_DEV_POWER_USAGE`` metric.
|
|
|
|
* *GPU temperature*: Shows the current GPU temperature and is based on the ``DCGM_FI_DEV_GPU_TEMP`` metric. The maximum is set to ``110``, which is an empirical number, as the actual number is not exposed via a metric.
|
|
|
|
* *GPU clock speed*: Shows the average clock speed utilized by the GPU and is based on the ``DCGM_FI_DEV_SM_CLOCK`` metric.
|
|
|
|
* *Memory clock speed*: Shows the average clock speed utilized by memory and is based on the ``DCGM_FI_DEV_MEM_CLOCK`` metric.
|
|
|
|
== Viewing the GPU Metrics
|
|
|
|
You can view the metrics for the GPUs by selecting the metric at the bottom of
|
|
each GPU to view the Metrics page.
|
|
|
|
On the Metrics page, you can:
|
|
|
|
* Specify a refresh rate for the metrics
|
|
* Add, run, disable, and delete queries
|
|
* Insert Metrics
|
|
* Reset the zoom view
|