% ramalama 7
# Setting Up RamaLama with CUDA Support on Linux systems
This guide walks through the steps required to set up RamaLama with CUDA support.
## Install the NVIDIA Container Toolkit
Follow the installation instructions provided in the [NVIDIA Container Toolkit installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
### Installation using dnf/yum (For RPM based distros like Fedora)
* Install the NVIDIA Container Toolkit packages
```bash
sudo dnf install -y nvidia-container-toolkit
```
> **Note:** The NVIDIA Container Toolkit is required on the host for running CUDA in containers.
> **Note:** If the above installation is not working for you and you are running Fedora, try removing the package and installing it from the [COPR](https://copr.fedorainfracloud.org/coprs/g/ai-ml/nvidia-container-toolkit/) instead.
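A possible sequence for switching to the COPR build is sketched below; it assumes the `dnf copr` plugin (from `dnf-plugins-core`) is available, and the COPR project name is taken from the link above:
```bash
# Remove the toolkit package installed from the NVIDIA repository
sudo dnf remove -y nvidia-container-toolkit

# Enable the @ai-ml COPR repository and install the toolkit from it
sudo dnf copr enable @ai-ml/nvidia-container-toolkit
sudo dnf install -y nvidia-container-toolkit
```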
### Installation using APT (For Debian based distros like Ubuntu)
* Configure the Production Repository
```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```
* Update the packages list from the repository
```bash
sudo apt-get update
```
* Install the NVIDIA Container Toolkit packages
```bash
sudo apt-get install -y nvidia-container-toolkit
```
> **Note:** The NVIDIA Container Toolkit is required for WSL to access CUDA resources while running a container.
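As an optional sanity check after installation, you can confirm the `nvidia-ctk` CLI is available before moving on:
```bash
# Prints the installed NVIDIA Container Toolkit CLI version
nvidia-ctk --version
```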
## Setting Up CUDA Support
For additional information see: [Support for Container Device Interface](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html)
### Generate the CDI specification file
```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```
### Check the names of the generated devices
List the generated CDI devices to confirm the specification was created:
```bash
nvidia-ctk cdi list
INFO[0000] Found 1 CDI devices
nvidia.com/gpu=all
```
> **Note:** Generate a new CDI specification after any configuration change, most notably when the driver is upgraded!
## Testing the Setup
**Based on this Documentation:** [Running a Sample Workload](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html)
---
### Test the Installation
Run the following command to verify the setup:
```bash
podman run --rm --device=nvidia.com/gpu=all fedora nvidia-smi
```
### Expected Output
If everything is configured correctly, the output should look similar to this:
```text
Thu Dec 5 19:58:40 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.72 Driver Version: 566.14 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3080 On | 00000000:09:00.0 On | N/A |
| 34% 24C P5 31W / 380W | 867MiB / 10240MiB | 7% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 35 G /Xwayland N/A |
| 0 N/A N/A 35 G /Xwayland N/A |
+-----------------------------------------------------------------------------------------+
```
> **NOTE:** On systems that have SELinux enabled, it may be necessary to turn on the `container_use_devices` boolean in order to run the `nvidia-smi` command successfully from a container.
To check the status of the boolean, run the following:
```bash
getsebool container_use_devices
```
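The output typically looks like the following; `off` means the boolean still needs to be enabled:
```text
container_use_devices --> off
```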
If the result of the command shows that the boolean is `off`, run the following to turn the boolean on:
```bash
sudo setsebool -P container_use_devices 1
```
### CUDA_VISIBLE_DEVICES
RamaLama respects the `CUDA_VISIBLE_DEVICES` environment variable if it is already set in your environment. If it is not set, RamaLama defaults to using all GPUs detected by nvidia-smi.
You can specify which GPU devices should be visible to RamaLama by setting this variable before running RamaLama commands:
```bash
export CUDA_VISIBLE_DEVICES="0,1" # Use GPUs 0 and 1
ramalama run granite
```
This is particularly useful in multi-GPU systems where you want to dedicate specific GPUs to different workloads.
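To see which indices are available, you can list the GPUs the driver detects on the host; the indices reported here generally correspond to the values accepted by `CUDA_VISIBLE_DEVICES`:
```bash
# List GPUs with their indices and UUIDs
nvidia-smi -L
```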
## Troubleshooting
### CUDA Updates
After some CUDA software updates, RamaLama stops working and complains about missing shared NVIDIA libraries, for example:
```bash
ramalama run granite
Error: crun: cannot stat `/lib64/libEGL_nvidia.so.565.77`: No such file or directory: OCI runtime attempted to invoke a command that was not found
```
Because the CUDA version has changed, the CDI specification file needs to be regenerated:
```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```
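After regenerating the specification, you can re-run the verification command from the Testing the Setup section to confirm GPU access is restored:
```bash
podman run --rm --device=nvidia.com/gpu=all fedora nvidia-smi
```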
## SEE ALSO
**[ramalama(1)](ramalama.1.md)**, **[podman(1)](https://github.com/containers/podman/blob/main/docs/source/markdown/podman.1.md)**
## HISTORY
Jan 2025, Originally compiled by Dan Walsh <dwalsh@redhat.com>