
README: fix model name and improve CUDA section

- Corrected the model name under the Benchmark section; the previous name was not available in Ollama's registry.

- Added instructions to switch between CPU-only mode and using all available GPUs via CUDA_VISIBLE_DEVICES.

Signed-off-by: Mario Antonio Bortoli Filho <mario@bortoli.dev>
Mario Antonio Bortoli Filho authored 2025-07-13 16:33:01 -03:00; committed by Mario Antonio Bortoli Filho
commit b5826c96e9, parent 1d2e1a1e01
4 changed files with 17 additions and 4 deletions

@@ -224,7 +224,7 @@ $ cat /usr/share/ramalama/shortnames.conf
<br>
```
-$ ramalama bench granite-moe3
+$ ramalama bench granite3-moe
```
</details>
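
Since the shortname resolves through RamaLama's transports, the fully qualified reference should work as well; a hypothetical equivalent invocation, assuming the `ollama://` transport prefix documented elsewhere in the README:
```
$ ramalama bench ollama://granite3-moe
```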
@@ -831,7 +831,7 @@ $ cat /usr/share/ramalama/shortnames.conf
Perplexity measures how well the model can predict the next token with lower values being better
```
-$ ramalama perplexity granite-moe3
+$ ramalama perplexity granite3-moe
```
</details>
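
For background on the metric itself: perplexity is conventionally defined as the exponentiated average negative log-likelihood of the evaluated tokens, which is why lower values indicate better next-token prediction:
```math
\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\left(x_i \mid x_{<i}\right)\right)
```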

@@ -148,7 +148,7 @@ Benchmark specified AI Model.
## EXAMPLES
```
-ramalama bench granite-moe3
+ramalama bench granite3-moe
```
## SEE ALSO

@@ -137,6 +137,19 @@ ramalama run granite
This is particularly useful in multi-GPU systems where you want to dedicate specific GPUs to different workloads.
+If the `CUDA_VISIBLE_DEVICES` environment variable is set to an empty string, RamaLama will default to using the CPU.
+```bash
+export CUDA_VISIBLE_DEVICES="" # Defaults to CPU
+ramalama run granite
+```
+To revert to using all available GPUs, unset the environment variable:
+```bash
+unset CUDA_VISIBLE_DEVICES
+```
## Troubleshooting
### CUDA Updates
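
`CUDA_VISIBLE_DEVICES` also accepts a comma-separated list of device indices, so a subset of GPUs can be dedicated to RamaLama as the docs describe; a minimal sketch (the device indices here are illustrative):
```bash
export CUDA_VISIBLE_DEVICES="0,1" # Expose only GPUs 0 and 1 to RamaLama
ramalama run granite
```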

@@ -156,7 +156,7 @@ Calculate the perplexity of an AI Model. Perplexity measures how well the model
## EXAMPLES
```
-ramalama perplexity granite-moe3
+ramalama perplexity granite3-moe
```
## SEE ALSO