
README: fix model name and improve CUDA section

- Corrected the model name under the Benchmark section; the previous name was not available in Ollama's registry.

- Added instructions to switch between CPU-only mode and using all available GPUs via CUDA_VISIBLE_DEVICES.

Signed-off-by: Mario Antonio Bortoli Filho <mario@bortoli.dev>
Mario Antonio Bortoli Filho authored 2025-07-13 16:33:01 -03:00; committed by Mario Antonio Bortoli Filho
commit b5826c96e9, parent 1d2e1a1e01
4 changed files with 17 additions and 4 deletions

@@ -224,7 +224,7 @@ $ cat /usr/share/ramalama/shortnames.conf
<br>
```
-$ ramalama bench granite-moe3
+$ ramalama bench granite3-moe
```
</details>
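
Since the shortname resolves through RamaLama's transports, the fully qualified reference should work as well; a hypothetical equivalent invocation, assuming the `ollama://` transport prefix documented elsewhere in the README:
```
$ ramalama bench ollama://granite3-moe
```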
@@ -831,7 +831,7 @@ $ cat /usr/share/ramalama/shortnames.conf
Perplexity measures how well the model can predict the next token with lower values being better
```
-$ ramalama perplexity granite-moe3
+$ ramalama perplexity granite3-moe
```
</details>
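
For background on the metric itself: perplexity is conventionally defined as the exponentiated average negative log-likelihood of the evaluated tokens, which is why lower values indicate better next-token prediction:
```math
\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\left(x_i \mid x_{<i}\right)\right)
```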

@@ -148,7 +148,7 @@ Benchmark specified AI Model.
## EXAMPLES
```
-ramalama bench granite-moe3
+ramalama bench granite3-moe
```
## SEE ALSO

@@ -137,6 +137,19 @@ ramalama run granite
This is particularly useful in multi-GPU systems where you want to dedicate specific GPUs to different workloads.
+If the `CUDA_VISIBLE_DEVICES` environment variable is set to an empty string, RamaLama will default to using the CPU.
+```bash
+export CUDA_VISIBLE_DEVICES="" # Defaults to CPU
+ramalama run granite
+```
+To revert to using all available GPUs, unset the environment variable:
+```bash
+unset CUDA_VISIBLE_DEVICES
+```
## Troubleshooting
### CUDA Updates
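
`CUDA_VISIBLE_DEVICES` also accepts a comma-separated list of device indices, so a subset of GPUs can be dedicated to RamaLama as the docs describe; a minimal sketch (the device indices here are illustrative):
```bash
export CUDA_VISIBLE_DEVICES="0,1" # Expose only GPUs 0 and 1 to RamaLama
ramalama run granite
```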

@@ -156,7 +156,7 @@ Calculate the perplexity of an AI Model. Perplexity measures how well the model
## EXAMPLES
```
-ramalama perplexity granite-moe3
+ramalama perplexity granite3-moe
```
## SEE ALSO