mirror of https://github.com/openshift/openshift-docs.git synced 2026-02-05 12:46:18 +01:00

TELCODOCS-1786: Updated AMD link

This commit is contained in:
StephenJamesSmith
2025-03-13 15:35:21 -04:00
committed by openshift-cherrypick-robot
parent 152a5458b2
commit ab6b99c5e9
7 changed files with 173 additions and 2 deletions


@@ -3594,6 +3594,8 @@ Topics:
File: about-hardware-accelerators
- Name: NVIDIA GPU architecture
File: nvidia-gpu-architecture
- Name: AMD GPU Operator
File: amd-gpu-operator
---
Name: Backup and restore
Dir: backup_and_restore


@@ -6,7 +6,7 @@ include::_attributes/common-attributes.adoc[]
toc::[]
Specialized hardware accelerators play a key role in the emerging generative artificial intelligence and machine learning (AI/ML) industry. Specifically, hardware accelerators are essential to the training and serving of large language and other foundational models that power this new technology. Data scientists, data engineers, ML engineers, and developers can take advantage of the specialized hardware acceleration for data-intensive transformations and model development and serving. Much of that ecosystem is open source, with a number of contributing partners and open source foundations.
Specialized hardware accelerators play a key role in the emerging generative artificial intelligence and machine learning (AI/ML) industry. Specifically, hardware accelerators are essential to the training and serving of large language and other foundational models that power this new technology. Data scientists, data engineers, ML engineers, and developers can take advantage of the specialized hardware acceleration for data-intensive transformations and model development and serving. Much of that ecosystem is open source, with several contributing partners and open source foundations.
Red{nbsp}Hat {product-title} provides support for cards and peripheral hardware that add processing units that comprise hardware accelerators:
@@ -39,7 +39,7 @@ include::modules/hardware-accelerators.adoc[leveloffset=+1]
* link:https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2-latest/html/introduction_to_red_hat_openshift_ai/index[Introduction to Red Hat OpenShift AI]
* link:https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html[NVIDIA GPU Operator on Red Hat OpenShift Container Platform]
* link:https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html[NVIDIA GPU Operator on Red Hat {product-title}]
* link:https://www.amd.com/en/products/accelerators/instinct.html[AMD Instinct Accelerators]


@@ -0,0 +1,12 @@
// Module included in the following assemblies:
//
// * hardware_accelerators/amd-gpu-operator.adoc
:_mod-docs-content-type: CONCEPT
[id="amd-about-amd-gpu-operator_{context}"]
= About the AMD GPU Operator
The hardware acceleration capabilities of the AMD GPU Operator provide enhanced performance and cost efficiency for data scientists and developers using Red Hat OpenShift AI for creating artificial intelligence and machine learning (AI/ML) applications. Accelerating specific areas of GPU functions can minimize CPU processing and memory usage, improving overall application speed and reducing memory consumption and bandwidth constraints.


@@ -0,0 +1,30 @@
:_mod-docs-content-type: ASSEMBLY
[id="amd-gpu-operator"]
= AMD GPU Operator
include::_attributes/common-attributes.adoc[]
:context: amd-gpu-operator
toc::[]
AMD Instinct GPU accelerators combined with the AMD GPU Operator within your {product-title} cluster let you seamlessly harness computing capabilities for machine learning, generative AI, and GPU-accelerated applications.
This documentation provides the information you need to enable, configure, and test the AMD GPU Operator. For more information, see link:https://www.amd.com/en/products/accelerators/instinct.html[AMD Instinct™ Accelerators].
:FeatureName: AMD GPU Operator
include::modules/amd-about-amd-gpu-operator.adoc[leveloffset=+1]
include::modules/amd-installing-gpu-operator.adoc[leveloffset=+1]
.Next steps
. Install the xref:../hardware_enablement/psap-node-feature-discovery-operator.adoc#installing-the-node-feature-discovery-operator_node-feature-discovery-operator[Node Feature Discovery Operator].
. Install the xref:../hardware_enablement/kmm-kernel-module-management.adoc#kmm-install_kernel-module-management-operator[Kernel Module Management Operator].
. Install and configure the link:https://instinct.docs.amd.com/projects/gpu-operator/en/main/installation/openshift-olm.html#install-amd-gpu-operator[AMD GPU Operator].
include::modules/amd-testing-the-amd-gpu-operator.adoc[leveloffset=+1]


@@ -0,0 +1,12 @@
// Module included in the following assemblies:
//
// * hardware_accelerators/amd-gpu-operator.adoc
:_mod-docs-content-type: CONCEPT
[id="amd-about-amd-gpu-operator_{context}"]
= About the AMD GPU Operator
The hardware acceleration capabilities of the AMD GPU Operator provide enhanced performance and cost efficiency for data scientists and developers using Red Hat OpenShift AI for creating artificial intelligence and machine learning (AI/ML) applications. Accelerating specific areas of GPU functions can minimize CPU processing and memory usage, improving overall application speed and reducing memory consumption and bandwidth constraints.


@@ -0,0 +1,14 @@
// Module included in the following assemblies:
//
// * hardware_accelerators/amd-gpu-operator.adoc
:_mod-docs-content-type: REFERENCE
[id="amd-installing-gpu-operator_{context}"]
= Installing the AMD GPU Operator
As a cluster administrator, you can install the AMD GPU Operator by using the OpenShift CLI or the web console. This is a multistep procedure that requires installing the Node Feature Discovery Operator, the Kernel Module Management Operator, and then the AMD GPU Operator. Use the following steps in succession to install the AMD community release of the Operator.
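For orientation, Operator installs like the ones above are typically driven by an OLM `Subscription` manifest. The following is a hypothetical sketch only: the package name, channel, namespace, and catalog source shown here are assumptions, not values taken from this commit or from AMD's documentation.

```yaml
# Hypothetical OLM Subscription sketch for a community Operator install.
# All names below are illustrative assumptions; use the values from the
# Operator's own install documentation.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amd-gpu-operator          # assumed package name
  namespace: openshift-operators  # assumed install namespace
spec:
  channel: alpha                  # assumed channel for a community release
  name: amd-gpu-operator          # assumed package name in the catalog
  source: community-operators     # assumed catalog source
  sourceNamespace: openshift-marketplace
```

A manifest like this would be applied with `oc apply -f subscription.yaml`, after which OLM resolves and installs the Operator from the named catalog.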


@@ -0,0 +1,101 @@
// Module included in the following assemblies:
//
// * hardware_accelerators/amd-gpu-operator.adoc
:_mod-docs-content-type: PROCEDURE
[id="amd-testing-the-amd-gpu-operator_{context}"]
= Testing the AMD GPU Operator
Use the following procedure to test the ROCmInfo installation and view the logs for the AMD MI210 GPU.
.Procedure
. Create a YAML file that tests ROCmInfo:
+
[source,terminal]
----
$ cat << EOF > rocminfo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: rocminfo
spec:
  containers:
  - image: docker.io/rocm/pytorch:latest
    name: rocminfo
    command: ["/bin/sh","-c"]
    args: ["rocminfo"]
    resources:
      limits:
        amd.com/gpu: 1
      requests:
        amd.com/gpu: 1
  restartPolicy: Never
EOF
----
. Create the `rocminfo` pod:
+
[source,terminal]
----
$ oc create -f rocminfo.yaml
----
+
.Example output
[source,terminal]
----
pod/rocminfo created
----
. Check the `rocminfo` log with one MI210 GPU:
+
[source,terminal]
----
$ oc logs rocminfo | grep -A5 "Agent"
----
+
.Example output
[source,terminal]
----
HSA Agents
==========
*******
Agent 1
*******
  Name:                    Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
  Uuid:                    CPU-XX
  Marketing Name:          Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
  Vendor Name:             CPU
--
Agent 2
*******
  Name:                    Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
  Uuid:                    CPU-XX
  Marketing Name:          Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
  Vendor Name:             CPU
--
Agent 3
*******
  Name:                    gfx90a
  Uuid:                    GPU-024b776f768a638b
  Marketing Name:          AMD Instinct MI210
  Vendor Name:             AMD
----
. Delete the pod:
+
[source,terminal]
----
$ oc delete -f rocminfo.yaml
----
+
.Example output
[source,terminal]
----
pod "rocminfo" deleted
----
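As an offline sanity check, the agent listing shown above can also be filtered programmatically. This is an illustrative sketch, not part of the commit: a small Python helper (function and sample text are hypothetical) that pulls the marketing name of each AMD GPU agent out of saved `rocminfo` output.

```python
import re

def gpu_agents(rocminfo_text: str) -> list[str]:
    """Return the Marketing Name of every agent whose Vendor Name is AMD
    (that is, the GPUs), given raw rocminfo output."""
    agents = []
    current_name = None
    for line in rocminfo_text.splitlines():
        m = re.match(r"\s*Marketing Name:\s*(.+)", line)
        if m:
            current_name = m.group(1).strip()
        m = re.match(r"\s*Vendor Name:\s*(.+)", line)
        if m and m.group(1).strip() == "AMD" and current_name:
            agents.append(current_name)
    return agents

# Hypothetical excerpt mirroring the example output above.
sample = """\
Agent 3
*******
  Name:                    gfx90a
  Uuid:                    GPU-024b776f768a638b
  Marketing Name:          AMD Instinct MI210
  Vendor Name:             AMD
"""
print(gpu_agents(sample))  # ['AMD Instinct MI210']
```

Running this against the full log would list one entry per AMD GPU agent, which is a quick way to confirm the expected accelerators were scheduled to the pod.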