mirror of https://github.com/containers/ramalama.git synced 2026-02-05 15:47:26 +01:00

Remove generated doc files.

Clean the docsite directory as part of "make clean". Remove generated
doc files from the repository; their presence creates a risk of
publishing incorrect documentation.

Signed-off-by: John Wiele <jwiele@redhat.com>
This commit is contained in:
John Wiele
2026-01-30 08:31:30 -05:00
parent eea5c4f35f
commit 6ffa25ff89
28 changed files with 2 additions and 3780 deletions

View File

@@ -257,5 +257,6 @@ clean:
 @find . -name \*# -delete
 @find . -name \*.rej -delete
 @find . -name \*.orig -delete
-rm -rf $$(<.gitignore)
 make -C docs clean
+make -C docsite clean clean-generated
+rm -rf $$(<.gitignore)

View File

@@ -1,172 +0,0 @@
---
title: bench
description: benchmark specified AI Model
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-bench.1.md
---
# bench
## Synopsis
**ramalama bench** [*options*] *model* [arg ...]
## MODEL TRANSPORTS
| Transports | Prefix | Web Site |
| ------------- | ------ | --------------------------------------------------- |
| URL based | https://, http://, file:// | `https://web.site/ai.model`, `file://tmp/ai.model`|
| HuggingFace | huggingface://, hf://, hf.co/ | [`huggingface.co`](https://www.huggingface.co)|
| ModelScope | modelscope://, ms:// | [`modelscope.cn`](https://modelscope.cn/)|
| Ollama | ollama:// | [`ollama.com`](https://www.ollama.com)|
| rlcr | rlcr:// | [`ramalama.com`](https://registry.ramalama.com) |
| OCI Container Registries | oci:// | [`opencontainers.org`](https://opencontainers.org)|
|||Examples: [`quay.io`](https://quay.io), [`Docker Hub`](https://docker.io),[`Artifactory`](https://artifactory.com)|
RamaLama defaults to the Ollama registry transport. This default can be overridden in the `ramalama.conf` file or via the RAMALAMA_TRANSPORT
environment variable. `export RAMALAMA_TRANSPORT=huggingface` changes RamaLama to use the huggingface transport.
Modify the transport of an individual model by prefixing it with `huggingface://`, `oci://`, `ollama://`, `https://`, `http://`, or `file://`.
URL support means if a model is on a web site or even on your local system, you can run it directly.
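For example, the same benchmark can be pointed at different transports simply by changing the prefix; the model paths below are illustrative placeholders:
```bash
$ ramalama bench ollama://smollm:135m
$ ramalama bench hf://bartowski/google_gemma-3-1b-it-GGUF/google_gemma-3-1b-it-IQ2_M.gguf
$ ramalama bench file:///tmp/ai.model
```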
## Options
#### **--authfile**=*password*
path of the authentication file for OCI registries
#### **--device**
Add a host device to the container. Optional permissions parameter can
be used to specify device permissions by combining r for read, w for
write, and m for mknod(2).
Example: --device=/dev/dri/renderD128:/dev/xvdc:rwm
The device specification is passed directly to the underlying container engine. See documentation of the supported container engine for more information.
Pass '--device=none' to explicitly add no device to the container, e.g. for
running a CPU-only performance comparison.
#### **--env**=
Set environment variables inside of the container.
This option allows arbitrary environment variables to be made available to the
process launched inside of the container. If an environment variable is
specified without a value, the container engine checks the host environment
for a value and sets the variable only if it is set on the host.
#### **--help**, **-h**
show this help message and exit
#### **--image**=IMAGE
OCI container image to run with specified AI model. RamaLama defaults to using
images based on the accelerator it discovers. For example:
`quay.io/ramalama/ramalama`. See the table below for all default images.
The default image tag is based on the minor version of the RamaLama package.
Version 0.16.0 of RamaLama pulls an image with a `:0.16` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
The default can be overridden in the ramalama.conf file or via the
RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells
RamaLama to use the `quay.io/ramalama/aiimage:1.2` image.
Accelerated images:
| Accelerator | Image |
| ------------------------| -------------------------- |
| CPU, Apple | quay.io/ramalama/ramalama |
| HIP_VISIBLE_DEVICES | quay.io/ramalama/rocm |
| CUDA_VISIBLE_DEVICES | quay.io/ramalama/cuda |
| ASAHI_VISIBLE_DEVICES | quay.io/ramalama/asahi |
| INTEL_VISIBLE_DEVICES | quay.io/ramalama/intel-gpu |
| ASCEND_VISIBLE_DEVICES | quay.io/ramalama/cann |
| MUSA_VISIBLE_DEVICES | quay.io/ramalama/musa |
#### **--keep-groups**
pass --group-add keep-groups to podman (default: False)
If a GPU device on the host system is accessible to the user via group access, this option leaks those groups into the container.
#### **--name**, **-n**
name of the container to run the Model in
#### **--network**=*none*
set the network mode for the container
#### **--ngl**
number of GPU layers: 0 means CPU inferencing, 999 means use the maximum number of layers (default: -1)
The default, -1, means use whatever is automatically deemed appropriate (0 or 999)
#### **--oci-runtime**
Override the default OCI runtime used to launch the container. Container
engines like Podman and Docker have their own default OCI runtime that they
use. Using this option, RamaLama overrides these defaults.
On Nvidia based GPU systems, RamaLama defaults to using the
`nvidia-container-runtime`. Use this option to override this selection.
#### **--privileged**
By default, RamaLama containers are unprivileged (=false) and cannot, for
example, modify parts of the operating system. This is because by default a
container is only allowed limited access to devices. A "privileged" container
is given the same access to devices as the user launching the container, with
the exception of virtual consoles (/dev/tty\d+) when running in systemd mode
(--systemd=always).
A privileged container turns off the security features that isolate the
container from the host. Dropped Capabilities, limited devices, read-only
mount points, Apparmor/SELinux separation, and Seccomp filters are all
disabled. Due to the disabled security features, the privileged field should
almost never be set, as containers can easily break out of confinement.
Containers running in a user namespace (e.g., rootless containers) cannot
have more privileges than the user that launched them.
#### **--pull**=*policy*
- **always**: Always pull the image and throw an error if the pull fails.
- **missing**: Only pull the image when it does not exist in the local containers storage. Throw an error if no image is found and the pull fails.
- **never**: Never pull the image but use the one from the local containers storage. Throw an error when no image is found.
- **newer**: Pull if the image on the registry is newer than the one in the local containers storage. An image is considered to be newer when the digests are different. Comparing the time stamps is prone to errors. Pull errors are suppressed if a local image was found.
#### **--seed**=
Specify a seed rather than using a random seed for model interaction
#### **--selinux**=*true*
Enable SELinux container separation
#### **--temp**="0.8"
Temperature of the response from the AI Model
llama.cpp explains this as:
The lower the number is, the more deterministic the response.
The higher the number is, the more creative the response, but it is more likely to hallucinate when set too high.
Usage: Lower numbers are good for virtual assistants where we need deterministic responses. Higher numbers are good for roleplay or creative tasks like editing stories.
#### **--thinking**=*true*
Enable or disable thinking mode in reasoning models
#### **--threads**, **-t**
Maximum number of CPU threads to use.
The default is to use half the cores available on this system for the number of threads.
#### **--tls-verify**=*true*
require HTTPS and verify certificates when contacting OCI registries
## Description
Benchmark specified AI Model.
## Examples
```text
ramalama bench granite3-moe
```
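Options documented above can be combined; for example, a CPU-only benchmark pinned to eight threads (the model name is illustrative):
```text
ramalama bench --device none --threads 8 granite3-moe
```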
## See Also
[ramalama(1)](/docs/commands/ramalama/)
---
*Jan 2025, Originally compiled by Eric Curtin &lt;ecurtin&#64;redhat.com&gt;*

View File

@@ -1,83 +0,0 @@
---
title: chat
description: OpenAI chat with the specified REST API URL
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-chat.1.md
---
# chat
## Synopsis
**ramalama chat** [*options*] [arg...]
positional arguments:
ARGS overrides the default prompt, and the output is
returned without entering the chatbot
## Description
Chat with an OpenAI-compatible REST API
## Options
#### **--api-key**
OpenAI-compatible API key.
Can also be set via the RAMALAMA_API_KEY environment variable.
#### **--color**
Indicate whether or not to use color in the chat.
Possible values are "never", "always" and "auto". (default: auto)
#### **--help**, **-h**
Show this help message and exit
#### **--list**
List the available models at an endpoint
#### **--mcp**=SERVER_URL
MCP (Model Context Protocol) servers to use for enhanced tool calling capabilities.
Can be specified multiple times to connect to multiple MCP servers.
Each server provides tools that can be automatically invoked during chat conversations.
#### **--model**=MODEL
Model for inferencing (may not be required for endpoints that only serve one model)
#### **--prefix**
Prefix for the user prompt (default: 🦭 > )
#### **--rag**=path
A file or directory of files to be loaded and provided as local context in the chat history.
#### **--summarize-after**=*N*
Automatically summarize conversation history after N messages to prevent context growth.
When enabled, ramalama will periodically condense older messages into a summary,
keeping only recent messages and the summary. This prevents the context from growing
indefinitely during long chat sessions. Set to 0 to disable (default: 4).
#### **--url**=URL
The host to send requests to (default: http://127.0.0.1:8080)
## Examples
Communicate with the default local OpenAI REST API. (http://127.0.0.1:8080)
With Podman containers.
```bash
$ ramalama chat
🦭 >
```
Communicate with an alternative OpenAI REST API URL. With Docker containers.
```bash
$ ramalama chat --url http://localhost:1234
🐋 >
```
Send multiple lines at once.
```bash
$ ramalama chat
🦭 > Hi \
🦭 > tell me a funny story \
🦭 > please
```
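Local files can also be supplied as context and long sessions summarized automatically; the path below is a placeholder.
```bash
$ ramalama chat --rag ./docs --summarize-after 8
🦭 >
```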
## See Also
[ramalama(1)](/docs/commands/ramalama/)
---
*Jun 2025, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

View File

@@ -1,81 +0,0 @@
---
title: containers
description: list all RamaLama containers
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-containers.1.md
---
# containers
## Synopsis
**ramalama containers** [*options*]
**ramalama ps** [*options*]
## Description
List all containers running AI Models
Command conflicts with the --nocontainer option.
## Options
#### **--format**=*format*
pretty-print containers to JSON or using a Go template
Valid placeholders for the Go template are listed below:
| **Placeholder** | **Description** |
|--------------------|----------------------------------------------|
| .Command | Quoted command used |
| .Created ... | Creation time for container, Y-M-D H:M:S |
| .CreatedAt | Creation time for container (same as above) |
| .CreatedHuman | Creation time, relative |
| .ExitCode | Container exit code |
| .Exited | "true" if container has exited |
| .ExitedAt | Time (epoch seconds) that container exited |
| .ExposedPorts ... | Map of exposed ports on this container |
| .ID | Container ID |
| .Image | Image Name/ID |
| .ImageID | Image ID |
| .Label *string* | Specified label of the container |
| .Labels ... | All the labels assigned to the container |
| .Names | Name of container |
| .Networks | Show all networks connected to the container |
| .Pid | Process ID on host system |
| .Ports | Forwarded and exposed ports |
| .RunningFor | Time elapsed since container was started |
| .Size | Size of container |
| .StartedAt | Time (epoch seconds) the container started |
| .State | Human-friendly description of ctr state |
| .Status | Status of container |
#### **--help**, **-h**
Print usage message
#### **--no-trunc**
Display the extended information
#### **--noheading**, **-n**
Do not print heading
## EXAMPLE
```bash
$ ramalama containers
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
85ad75ecf866 quay.io/ramalama/ramalama:latest /usr/bin/ramalama... 5 hours ago Up 5 hours 0.0.0.0:8080->8080/tcp ramalama_s3Oh6oDfOP
85ad75ecf866 quay.io/ramalama/ramalama:latest /usr/bin/ramalama... 4 minutes ago Exited (0) 4 minutes ago granite-server
```
```bash
$ ramalama ps --noheading --format "{{ .Names }}"
ramalama_s3Oh6oDfOP
granite-server
```
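The --format option documented above also accepts JSON output or an arbitrary Go template built from the listed placeholders; a sketch:
```bash
$ ramalama containers --format json
$ ramalama containers --format "{{ .Names }} {{ .Status }}"
```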
## See Also
[ramalama(1)](/docs/commands/ramalama/)
---
*Aug 2024, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

View File

@@ -1,86 +0,0 @@
---
title: convert
description: convert AI Models from local storage to OCI Image
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-convert.1.md
---
# convert
## Synopsis
**ramalama convert** [*options*] *model* [*target*]
## Description
Convert specified AI Model to an OCI Formatted AI Model
The model can be from RamaLama model storage in Huggingface or Ollama, or a local model stored on disk. Converting from an OCI model is not supported.
:::note
The convert command must be run with containers. Use of the --nocontainer option is not allowed.
:::
## Options
#### **--gguf**=*Q2_K* | *Q3_K_S* | *Q3_K_M* | *Q3_K_L* | *Q4_0* | *Q4_K_S* | *Q4_K_M* | *Q5_0* | *Q5_K_S* | *Q5_K_M* | *Q6_K* | *Q8_0*
Convert Safetensor models into a GGUF with the specified quantization format. To learn more about model quantization, read llama.cpp documentation:
https://github.com/ggml-org/llama.cpp/blob/master/tools/quantize/README.md
#### **--help**, **-h**
Print usage message
#### **--image**=IMAGE
Image to use for model quantization when converting to GGUF format (when the `--gguf` option has been specified). The image must have the
`llama-quantize` executable available on the `PATH`. Defaults to the appropriate `ramalama` image based on available accelerators. If no
accelerators are available, the current `quay.io/ramalama/ramalama` image will be used.
#### **--network**=*none*
sets the configuration for network namespaces when handling RUN instructions
#### **--pull**=*policy*
Pull image policy. The default is **missing**.
#### **--rag-image**=IMAGE
Image to use when converting to GGUF format (when the `--gguf` option has been specified). The image must have the `convert_hf_to_gguf.py` script
executable and available on the `PATH`. The script is available from the `llama.cpp` GitHub repo. Defaults to the current
`quay.io/ramalama/ramalama-rag` image.
#### **--type**=*artifact* | *raw* | *car*
Convert the MODEL to the specified OCI Object
| Type | Description |
| -------- | ------------------------------------------------------------- |
| artifact | Store AI Models as artifacts |
| car | Traditional OCI image including base image with the model stored in a /models subdir |
| raw | Traditional OCI image including only the model and a link file `model.file` pointed at it stored at / |
## EXAMPLE
Generate an OCI model from an Ollama model.
```bash
$ ramalama convert ollama://tinyllama:latest oci://quay.io/rhatdan/tiny:latest
Building quay.io/rhatdan/tiny:latest...
STEP 1/2: FROM scratch
STEP 2/2: COPY sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 /model
--> Using cache 69db4a10191c976d2c3c24da972a2a909adec45135a69dbb9daeaaf2a3a36344
COMMIT quay.io/rhatdan/tiny:latest
--> 69db4a10191c
Successfully tagged quay.io/rhatdan/tiny:latest
69db4a10191c976d2c3c24da972a2a909adec45135a69dbb9daeaaf2a3a36344
```
Generate and run an OCI model with a quantized GGUF converted from Safetensors.
```bash
$ ramalama convert --gguf Q4_K_M hf://ibm-granite/granite-3.2-2b-instruct oci://quay.io/kugupta/granite-3.2-q4-k-m:latest
Converting /Users/kugupta/.local/share/ramalama/models/huggingface/ibm-granite/granite-3.2-2b-instruct to quay.io/kugupta/granite-3.2-q4-k-m:latest...
Building quay.io/kugupta/granite-3.2-q4-k-m:latest...
$ ramalama run oci://quay.io/kugupta/granite-3.2-q4-k-m:latest
```
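As a further sketch, the --type option above selects the kind of OCI object produced; the target image name here is a placeholder:
```bash
$ ramalama convert --type car ollama://tinyllama:latest oci://quay.io/rhatdan/tiny-car:latest
```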
## See Also
[ramalama(1)](/docs/commands/ramalama/), [ramalama-push(1)](/docs/commands/ramalama/push)
---
*Aug 2024, Originally compiled by Eric Curtin &lt;ecurtin&#64;redhat.com&gt;*

View File

@@ -1,83 +0,0 @@
---
title: daemon
description: run a RamaLama REST server
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-daemon.1.md
---
# daemon
## Synopsis
**ramalama daemon** [*options*] [start|run]
## Description
Run a RamaLama REST server.
## Options
#### **--help**, **-h**
Print usage message
## COMMANDS
#### **start**
prepares to run a new RamaLama REST server, which will run either inside a RamaLama container or on the host
#### **run**
start a new RamaLama REST server
## Examples
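Start a RamaLama REST server (a minimal sketch based on the synopsis above; default options assumed):
```bash
$ ramalama daemon start
```
Use `ramalama daemon run` to start the server directly.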
Inspect the smollm:135m model for basic information
```bash
$ ramalama inspect smollm:135m
smollm:135m
Path: /var/lib/ramalama/models/ollama/smollm:135m
Registry: ollama
Format: GGUF
Version: 3
Endianness: little
Metadata: 39 entries
Tensors: 272 entries
```
Inspect the smollm:135m model for all information in json format
```bash
$ ramalama inspect smollm:135m --all --json
{
"Name": "smollm:135m",
"Path": "/home/mengel/.local/share/ramalama/models/ollama/smollm:135m",
"Registry": "ollama",
"Format": "GGUF",
"Version": 3,
"LittleEndian": true,
"Metadata": {
"general.architecture": "llama",
"general.base_model.0.name": "SmolLM 135M",
"general.base_model.0.organization": "HuggingFaceTB",
"general.base_model.0.repo_url": "https://huggingface.co/HuggingFaceTB/SmolLM-135M",
...
},
"Tensors": [
{
"dimensions": [
576,
49152
],
"n_dimensions": 2,
"name": "token_embd.weight",
"offset": 0,
"type": 8
},
...
]
}
```
## See Also
[ramalama(1)](/docs/commands/ramalama/)
---
*Feb 2025, Originally compiled by Michael Engel &lt;mengel&#64;redhat.com&gt;*

View File

@@ -1,389 +0,0 @@
---
title: info
description: display RamaLama configuration information
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-info.1.md
---
# info
## Synopsis
**ramalama info** [*options*]
## Description
Display configuration information in JSON format.
## Options
#### **--help**, **-h**
show this help message and exit
## FIELDS
The `Accelerator` field indicates the accelerator type for the machine.
The `Config` field shows the list of paths to RamaLama configuration files used.
The `Engine` field indicates the OCI container engine used to launch the container in which to run the AI Model
The `Image` field indicates the default container image in which to run the AI Model
The `Inference` field lists the currently used inference engine as well as a list of available engine specification and schema files used for model inference.
For example:
- `llama.cpp`
- `vllm`
- `mlx`
The `Selinux` field indicates if SELinux is activated or not.
The `Shortnames` field shows the list of configuration files used to specify AI Model short names, as well as the merged list of short names.
The `Store` field indicates the directory path where RamaLama stores its persistent data, including downloaded models, configuration files, and cached data. By default, this is located in the user's local share directory.
The `UseContainer` field indicates whether RamaLama will use containers or run the AI Models natively.
The `Version` field shows the RamaLama version.
## EXAMPLE
Info with no container engine
```bash
$ ramalama info
{
"Accelerator": "cuda",
"Engine": {
"Name": ""
},
"Image": "quay.io/ramalama/cuda:0.7",
"Inference": {
"Default": "llama.cpp",
"Engines": {
"llama.cpp": "/usr/share/ramalama/inference-spec/engines/llama.cpp.yaml",
"mlx": "/usr/share/ramalama/inference-spec/engines/mlx.yaml",
"vllm": "/usr/share/ramalama/inference-spec/engines/vllm.yaml"
},
"Schema": {
"1-0-0": "/usr/share/ramalama/inference-spec/schema/schema.1-0-0.json"
}
},
"Shortnames": {
"Names": {
"cerebrum": "huggingface://froggeric/Cerebrum-1.0-7b-GGUF/Cerebrum-1.0-7b-Q4_KS.gguf",
"deepseek": "ollama://deepseek-r1",
"dragon": "huggingface://llmware/dragon-mistral-7b-v0/dragon-mistral-7b-q4_k_m.gguf",
"gemma3": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf",
"gemma3:12b": "hf://bartowski/google_gemma-3-12b-it-GGUF/google_gemma-3-12b-it-IQ2_M.gguf",
"gemma3:1b": "hf://bartowski/google_gemma-3-1b-it-GGUF/google_gemma-3-1b-it-IQ2_M.gguf",
"gemma3:27b": "hf://bartowski/google_gemma-3-27b-it-GGUF/google_gemma-3-27b-it-IQ2_M.gguf",
"gemma3:4b": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf",
"granite": "ollama://granite3.1-dense",
"granite-code": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
"granite-code:20b": "hf://ibm-granite/granite-20b-code-base-8k-GGUF/granite-20b-code-base.Q4_K_M.gguf",
"granite-code:34b": "hf://ibm-granite/granite-34b-code-base-8k-GGUF/granite-34b-code-base.Q4_K_M.gguf",
"granite-code:3b": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
"granite-code:8b": "hf://ibm-granite/granite-8b-code-base-4k-GGUF/granite-8b-code-base.Q4_K_M.gguf",
"granite-lab-7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite-lab-8b": "huggingface://ibm-granite/granite-8b-code-base-GGUF/granite-8b-code-base.Q4_K_M.gguf",
"granite-lab:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:2b": "ollama://granite3.1-dense:2b",
"granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:8b": "ollama://granite3.1-dense:8b",
"hermes": "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
"ibm/granite": "ollama://granite3.1-dense:8b",
"ibm/granite:2b": "ollama://granite3.1-dense:2b",
"ibm/granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"ibm/granite:8b": "ollama://granite3.1-dense:8b",
"merlinite": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab-7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"mistral": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b-v1": "huggingface://TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf",
"mistral:7b-v2": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b-v3": "huggingface://MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3.Q4_K_M.gguf",
"mistral_code_16k": "huggingface://TheBloke/Mistral-7B-Code-16K-qlora-GGUF/mistral-7b-code-16k-qlora.Q4_K_M.gguf",
"mistral_codealpaca": "huggingface://TheBloke/Mistral-7B-codealpaca-lora-GGUF/mistral-7b-codealpaca-lora.Q4_K_M.gguf",
"mixtao": "huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf",
"openchat": "huggingface://TheBloke/openchat-3.5-0106-GGUF/openchat-3.5-0106.Q4_K_M.gguf",
"openorca": "huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf",
"phi2": "huggingface://MaziyarPanahi/phi-2-GGUF/phi-2.Q4_K_M.gguf",
"smollm:135m": "ollama://smollm:135m",
"tiny": "ollama://tinyllama"
},
"Files": [
"/usr/share/ramalama/shortnames.conf",
"/home/dwalsh/.config/ramalama/shortnames.conf",
]
},
"Store": "/usr/share/ramalama",
"UseContainer": true,
"Version": "0.7.5"
}
```
Info with Podman engine
```bash
$ ramalama info
{
"Accelerator": "cuda",
"Engine": {
"Info": {
"host": {
"arch": "amd64",
"buildahVersion": "1.39.4",
"cgroupControllers": [
"cpu",
"io",
"memory",
"pids"
],
"cgroupManager": "systemd",
"cgroupVersion": "v2",
"conmon": {
"package": "conmon-2.1.13-1.fc42.x86_64",
"path": "/usr/bin/conmon",
"version": "conmon version 2.1.13, commit: "
},
"cpuUtilization": {
"idlePercent": 97.36,
"systemPercent": 0.64,
"userPercent": 2
},
"cpus": 32,
"databaseBackend": "sqlite",
"distribution": {
"distribution": "fedora",
"variant": "workstation",
"version": "42"
},
"eventLogger": "journald",
"freeLocks": 2043,
"hostname": "danslaptop",
"idMappings": {
"gidmap": [
{
"container_id": 0,
"host_id": 3267,
"size": 1
},
{
"container_id": 1,
"host_id": 524288,
"size": 65536
}
],
"uidmap": [
{
"container_id": 0,
"host_id": 3267,
"size": 1
},
{
"container_id": 1,
"host_id": 524288,
"size": 65536
}
]
},
"kernel": "6.14.2-300.fc42.x86_64",
"linkmode": "dynamic",
"logDriver": "journald",
"memFree": 65281908736,
"memTotal": 134690979840,
"networkBackend": "netavark",
"networkBackendInfo": {
"backend": "netavark",
"dns": {
"package": "aardvark-dns-1.14.0-1.fc42.x86_64",
"path": "/usr/libexec/podman/aardvark-dns",
"version": "aardvark-dns 1.14.0"
},
"package": "netavark-1.14.1-1.fc42.x86_64",
"path": "/usr/libexec/podman/netavark",
"version": "netavark 1.14.1"
},
"ociRuntime": {
"name": "crun",
"package": "crun-1.21-1.fc42.x86_64",
"path": "/usr/bin/crun",
"version": "crun version 1.21\ncommit: 10269840aa07fb7e6b7e1acff6198692d8ff5c88\nrundir: /run/user/3267/crun\nspec: 1.0.0\n+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL"
},
"os": "linux",
"pasta": {
"executable": "/bin/pasta",
"package": "passt-0^20250415.g2340bbf-1.fc42.x86_64",
"version": ""
},
"remoteSocket": {
"exists": true,
"path": "/run/user/3267/podman/podman.sock"
},
"rootlessNetworkCmd": "pasta",
"security": {
"apparmorEnabled": false,
"capabilities": "CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT",
"rootless": true,
"seccompEnabled": true,
"seccompProfilePath": "/usr/share/containers/seccomp.json",
"selinuxEnabled": true
},
"serviceIsRemote": false,
"slirp4netns": {
"executable": "/bin/slirp4netns",
"package": "slirp4netns-1.3.1-2.fc42.x86_64",
"version": "slirp4netns version 1.3.1\ncommit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236\nlibslirp: 4.8.0\nSLIRP_CONFIG_VERSION_MAX: 5\nlibseccomp: 2.5.5"
},
"swapFree": 8589930496,
"swapTotal": 8589930496,
"uptime": "116h 35m 40.00s (Approximately 4.83 days)",
"variant": ""
},
"plugins": {
"authorization": null,
"log": [
"k8s-file",
"none",
"passthrough",
"journald"
],
"network": [
"bridge",
"macvlan",
"ipvlan"
],
"volume": [
"local"
]
},
"registries": {
"search": [
"registry.fedoraproject.org",
"registry.access.redhat.com",
"docker.io"
]
},
"store": {
"configFile": "/home/dwalsh/.config/containers/storage.conf",
"containerStore": {
"number": 5,
"paused": 0,
"running": 0,
"stopped": 5
},
"graphDriverName": "overlay",
"graphOptions": {},
"graphRoot": "/usr/share/containers/storage",
"graphRootAllocated": 2046687182848,
"graphRootUsed": 399990419456,
"graphStatus": {
"Backing Filesystem": "btrfs",
"Native Overlay Diff": "true",
"Supports d_type": "true",
"Supports shifting": "false",
"Supports volatile": "true",
"Using metacopy": "false"
},
"imageCopyTmpDir": "/var/tmp",
"imageStore": {
"number": 297
},
"runRoot": "/run/user/3267/containers",
"transientStore": false,
"volumePath": "/usr/share/containers/storage/volumes"
},
"version": {
"APIVersion": "5.4.2",
"BuildOrigin": "Fedora Project",
"Built": 1743552000,
"BuiltTime": "Tue Apr 1 19:00:00 2025",
"GitCommit": "be85287fcf4590961614ee37be65eeb315e5d9ff",
"GoVersion": "go1.24.1",
"Os": "linux",
"OsArch": "linux/amd64",
"Version": "5.4.2"
}
},
"Name": "podman"
},
"Image": "quay.io/ramalama/cuda:0.7",
"Inference": {
"Default": "llama.cpp",
"Engines": {
"llama.cpp": "/usr/share/ramalama/inference-spec/engines/llama.cpp.yaml",
"mlx": "/usr/share/ramalama/inference-spec/engines/mlx.yaml",
"vllm": "/usr/share/ramalama/inference-spec/engines/vllm.yaml"
},
"Schema": {
"1-0-0": "/usr/share/ramalama/inference-spec/schema/schema.1-0-0.json"
}
},
"Shortnames": {
"Names": {
"cerebrum": "huggingface://froggeric/Cerebrum-1.0-7b-GGUF/Cerebrum-1.0-7b-Q4_KS.gguf",
"deepseek": "ollama://deepseek-r1",
"dragon": "huggingface://llmware/dragon-mistral-7b-v0/dragon-mistral-7b-q4_k_m.gguf",
"gemma3": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf",
"gemma3:12b": "hf://bartowski/google_gemma-3-12b-it-GGUF/google_gemma-3-12b-it-IQ2_M.gguf",
"gemma3:1b": "hf://bartowski/google_gemma-3-1b-it-GGUF/google_gemma-3-1b-it-IQ2_M.gguf",
"gemma3:27b": "hf://bartowski/google_gemma-3-27b-it-GGUF/google_gemma-3-27b-it-IQ2_M.gguf",
"gemma3:4b": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf",
"granite": "ollama://granite3.1-dense",
"granite-code": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
"granite-code:20b": "hf://ibm-granite/granite-20b-code-base-8k-GGUF/granite-20b-code-base.Q4_K_M.gguf",
"granite-code:34b": "hf://ibm-granite/granite-34b-code-base-8k-GGUF/granite-34b-code-base.Q4_K_M.gguf",
"granite-code:3b": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
"granite-code:8b": "hf://ibm-granite/granite-8b-code-base-4k-GGUF/granite-8b-code-base.Q4_K_M.gguf",
"granite-lab-7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite-lab-8b": "huggingface://ibm-granite/granite-8b-code-base-GGUF/granite-8b-code-base.Q4_K_M.gguf",
"granite-lab:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:2b": "ollama://granite3.1-dense:2b",
"granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:8b": "ollama://granite3.1-dense:8b",
"hermes": "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
"ibm/granite": "ollama://granite3.1-dense:8b",
"ibm/granite:2b": "ollama://granite3.1-dense:2b",
"ibm/granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"ibm/granite:8b": "ollama://granite3.1-dense:8b",
"merlinite": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab-7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"mistral": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b-v1": "huggingface://TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf",
"mistral:7b-v2": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b-v3": "huggingface://MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3.Q4_K_M.gguf",
"mistral_code_16k": "huggingface://TheBloke/Mistral-7B-Code-16K-qlora-GGUF/mistral-7b-code-16k-qlora.Q4_K_M.gguf",
"mistral_codealpaca": "huggingface://TheBloke/Mistral-7B-codealpaca-lora-GGUF/mistral-7b-codealpaca-lora.Q4_K_M.gguf",
"mixtao": "huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf",
"openchat": "huggingface://TheBloke/openchat-3.5-0106-GGUF/openchat-3.5-0106.Q4_K_M.gguf",
"openorca": "huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf",
"phi2": "huggingface://MaziyarPanahi/phi-2-GGUF/phi-2.Q4_K_M.gguf",
"smollm:135m": "ollama://smollm:135m",
"tiny": "ollama://tinyllama"
},
"Files": [
"/usr/share/ramalama/shortnames.conf",
"/home/dwalsh/.config/ramalama/shortnames.conf",
]
},
"Store": "/usr/share/ramalama",
"UseContainer": true,
"Version": "0.7.5"
}
```
Using jq to print specific `ramalama info` content.
```bash
$ ramalama info | jq .Shortnames.Names.mixtao
"huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf"
```
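Any other field shown in the output above can be selected the same way, for example the container engine name:
```bash
$ ramalama info | jq .Engine.Name
"podman"
```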
## See Also
[ramalama(1)](/docs/commands/ramalama/)
---
*Oct 2024, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

View File

@@ -1,123 +0,0 @@
---
title: inspect
description: inspect the specified AI Model
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-inspect.1.md
---
# inspect
## Synopsis
**ramalama inspect** [*options*] *model*
## Description
Inspect the specified AI Model for additional information such as the repository, its metadata, and tensor information.
## Options
#### **--all**
Print all available information about the AI Model.
By default, only a basic subset is printed.
#### **--get**=*field*
Print the value of a specific metadata field of the AI Model.
This option supports autocomplete with the available metadata
fields of the given model.
The special value `all` will print all available metadata
fields and values.
#### **--help**, **-h**
Print usage message
#### **--json**
Print the AI Model information in json format.
## Examples
Inspect the smollm:135m model for basic information
```bash
$ ramalama inspect smollm:135m
smollm:135m
Path: /var/lib/ramalama/models/ollama/smollm:135m
Registry: ollama
Format: GGUF
Version: 3
Endianness: little
Metadata: 39 entries
Tensors: 272 entries
```
Inspect the smollm:135m model for all information in json format
```bash
$ ramalama inspect smollm:135m --all --json
{
"Name": "smollm:135m",
"Path": "/home/mengel/.local/share/ramalama/models/ollama/smollm:135m",
"Registry": "ollama",
"Format": "GGUF",
"Version": 3,
"LittleEndian": true,
"Metadata": {
"general.architecture": "llama",
"general.base_model.0.name": "SmolLM 135M",
"general.base_model.0.organization": "HuggingFaceTB",
"general.base_model.0.repo_url": "https://huggingface.co/HuggingFaceTB/SmolLM-135M",
...
},
"Tensors": [
{
"dimensions": [
576,
49152
],
"n_dimensions": 2,
"name": "token_embd.weight",
"offset": 0,
"type": 8
},
...
]
}
```
Use the autocomplete function of `--get` to view a list of fields:
```bash
$ ramalama inspect smollm:135m --get general.
general.architecture general.languages
general.base_model.0.name general.license
general.base_model.0.organization general.name
general.base_model.0.repo_url general.organization
general.base_model.count general.quantization_version
general.basename general.size_label
general.datasets general.tags
general.file_type general.type
general.finetune
```
Print the value of a specific field of the smollm:135m model:
```bash
$ ramalama inspect smollm:135m --get tokenizer.chat_template
{% for message in messages %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
```
Print all key-value pairs of the metadata of the smollm:135m model:
```bash
$ ramalama inspect smollm:135m --get all
general.architecture: llama
general.base_model.0.name: SmolLM 135M
general.base_model.0.organization: HuggingFaceTB
general.base_model.0.repo_url: https://huggingface.co/HuggingFaceTB/SmolLM-135M
general.base_model.count: 1
...
```
## See Also
[ramalama(1)](/docs/commands/ramalama/)
---
*Feb 2025, Originally compiled by Michael Engel &lt;mengel&#64;redhat.com&gt;*

View File

@@ -1,62 +0,0 @@
---
title: list
description: list all downloaded AI Models
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-list.1.md
---
# list
## Synopsis
**ramalama list** [*options*]
**ramalama ls** [*options*]
## Description
List all the AI Models in local storage
## Options
#### **--all**
include partially downloaded Models
#### **--help**, **-h**
show this help message and exit
#### **--json**
print Model list in json format
#### **--noheading**, **-n**
do not print heading
#### **--order**
order used to sort the AI Models. Valid options are 'asc' and 'desc'
#### **--sort**
field used to sort the AI Models. Valid options are 'name', 'size', and 'modified'.
## Examples
List all Models downloaded to the user's home directory
```bash
$ ramalama list
NAME MODIFIED SIZE
ollama://smollm:135m 16 hours ago 5.5M
huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf 14 hours ago 460M
ollama://granite-code:3b (partial) 5 days ago 1.9G
ollama://granite-code:latest 1 day ago 1.9G
ollama://moondream:latest 6 days ago 791M
```
List all Models in json format
```bash
$ ramalama list --json
{"models": [{"name": "oci://quay.io/mmortari/gguf-py-example/v1/example.gguf", "modified": 427330, "size": "4.0K"}, {"name": "huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf", "modified": 427333, "size": "460M"}, {"name": "ollama://smollm:135m", "modified": 420833, "size": "5.5M"}, {"name": "ollama://mistral:latest", "modified": 433998, "size": "3.9G"}, {"name": "ollama://granite-code:latest", "modified": 2180483, "size": "1.9G"}, {"name": "ollama://tinyllama:latest", "modified": 364870, "size": "609M"}, {"name": "ollama://tinyllama:1.1b", "modified": 364866, "size": "609M"}]}
```
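The listing can also be ordered with the --sort and --order options documented above, for example largest models first:
```bash
$ ramalama list --sort size --order desc
```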
## See Also
[ramalama(1)](/docs/commands/ramalama/)
---
*Aug 2024, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

View File

@@ -1,76 +0,0 @@
---
title: login
description: login to remote registry
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-login.1.md
---
# login
## Synopsis
**ramalama login** [*options*] [*registry*]
## Description
login to remote model registry
By default, RamaLama uses the Ollama registry transport. You can override this default by configuring the `ramalama.conf` file or setting the `RAMALAMA_TRANSPORT` environment variable. Ensure a registry transport is set before attempting to log in.
## Options
Options are specific to registry types.
#### **--authfile**=*password*
path of the authentication file for OCI registries
#### **--help**, **-h**
show this help message and exit
#### **--password**, **-p**=*password*
password for registry
#### **--password-stdin**
take the password from stdin
#### **--tls-verify**=*true*
require HTTPS and verify certificates when contacting OCI registries
#### **--token**=*token*
token to be passed to Model registry
#### **--username**, **-u**=*username*
username for registry
## Examples
Login to quay.io/username oci registry
```bash
$ export RAMALAMA_TRANSPORT=quay.io/username
$ ramalama login -u username
```
Login to ollama registry
```bash
$ export RAMALAMA_TRANSPORT=ollama
$ ramalama login
```
Login to huggingface registry
```bash
$ export RAMALAMA_TRANSPORT=huggingface
$ ramalama login --token=XYZ
```
Logging in to Hugging Face requires the `hf` tool. For installation and usage instructions, see the documentation of the Hugging Face command line interface: [*https://huggingface.co/docs/huggingface_hub/en/guides/cli*](https://huggingface.co/docs/huggingface_hub/en/guides/cli).
Login to ModelScope registry
```bash
$ export RAMALAMA_TRANSPORT=modelscope
$ ramalama login --token=XYZ
```
Logging in to ModelScope requires the `modelscope` tool. For installation and usage instructions, see the documentation of the ModelScope command line interface: [*https://www.modelscope.cn/docs/Beginner-s-Guide/Environment-Setup*](https://www.modelscope.cn/docs/Beginner-s-Guide/Environment-Setup).
## See Also
[ramalama(1)](/docs/commands/ramalama/)
---
*Aug 2024, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

View File

@@ -1,48 +0,0 @@
---
title: logout
description: logout from remote registry
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-logout.1.md
---
# logout
## Synopsis
**ramalama logout** [*options*] [*registry*]
## Description
Logout from a remote model registry
## Options
Options are specific to registry types.
#### **--help**, **-h**
Print usage message
#### **--token**
Token to be passed to Model registry
## EXAMPLE
Logout from the quay.io/username OCI repository
```bash
$ ramalama logout quay.io/username
```
Logout from ollama repository
```bash
$ ramalama logout ollama
```
Logout from huggingface
```bash
$ ramalama logout huggingface
```
## See Also
[ramalama(1)](/docs/commands/ramalama/)
---
*Aug 2024, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

View File

@@ -1,185 +0,0 @@
---
title: perplexity
description: calculate the perplexity value of an AI Model
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-perplexity.1.md
---
# perplexity
## Synopsis
**ramalama perplexity** [*options*] *model* [arg ...]
## MODEL TRANSPORTS
| Transports | Prefix | Web Site |
| ------------- | ------ | --------------------------------------------------- |
| URL based | https://, http://, file:// | `https://web.site/ai.model`, `file://tmp/ai.model`|
| HuggingFace | huggingface://, hf://, hf.co/ | [`huggingface.co`](https://www.huggingface.co)|
| ModelScope | modelscope://, ms:// | [`modelscope.cn`](https://modelscope.cn/)|
| Ollama | ollama:// | [`ollama.com`](https://www.ollama.com)|
| rlcr | rlcr:// | [`ramalama.com`](https://registry.ramalama.com) |
| OCI Container Registries | oci:// | [`opencontainers.org`](https://opencontainers.org)|
|||Examples: [`quay.io`](https://quay.io), [`Docker Hub`](https://docker.io),[`Artifactory`](https://artifactory.com)|
RamaLama defaults to the Ollama registry transport. This default can be overridden in the `ramalama.conf` file or via the RAMALAMA_TRANSPORT
environment variable. `export RAMALAMA_TRANSPORT=huggingface` changes RamaLama to use the huggingface transport.
Modify the transport of an individual model by prefixing it with `huggingface://`, `oci://`, `ollama://`, `https://`, `http://`, or `file://`.
URL support means if a model is on a web site or even on your local system, you can run it directly.
## Options
#### **--authfile**=*password*
path of the authentication file for OCI registries
#### **--cache-reuse**=256
Min chunk size to attempt reusing from the cache via KV shifting
#### **--ctx-size**, **-c**
size of the prompt context. This option is also available as **--max-model-len**. Applies to llama.cpp and vllm regardless of alias (default: 4096, 0 = loaded from model)
#### **--device**
Add a host device to the container. Optional permissions parameter can
be used to specify device permissions by combining r for read, w for
write, and m for mknod(2).
Example: --device=/dev/dri/renderD128:/dev/xvdc:rwm
The device specification is passed directly to the underlying container engine. See documentation of the supported container engine for more information.
#### **--env**=
Set environment variables inside of the container.
This option allows arbitrary environment variables to be made available to the
process launched inside of the container. If an environment variable is
specified without a value, the container engine checks the host environment
for a value and sets the variable only if it is set on the host.
#### **--help**, **-h**
show this help message and exit
#### **--image**=IMAGE
OCI container image to run with specified AI model. RamaLama defaults to using
images based on the accelerator it discovers. For example:
`quay.io/ramalama/ramalama`. See the table below for all default images.
The default image tag is based on the minor version of the RamaLama package.
Version 0.16.0 of RamaLama pulls an image with a `:0.16` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
The default can be overridden in the ramalama.conf file or via the
RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells
RamaLama to use the `quay.io/ramalama/aiimage:1.2` image.
Accelerated images:
| Accelerator | Image |
| ------------------------| -------------------------- |
| CPU, Apple | quay.io/ramalama/ramalama |
| HIP_VISIBLE_DEVICES | quay.io/ramalama/rocm |
| CUDA_VISIBLE_DEVICES | quay.io/ramalama/cuda |
| ASAHI_VISIBLE_DEVICES | quay.io/ramalama/asahi |
| INTEL_VISIBLE_DEVICES | quay.io/ramalama/intel-gpu |
| ASCEND_VISIBLE_DEVICES | quay.io/ramalama/cann |
| MUSA_VISIBLE_DEVICES | quay.io/ramalama/musa |
#### **--keep-groups**
pass --group-add keep-groups to podman (default: False)
If a GPU device on the host system is accessible to the user via group access, this option leaks those groups into the container.
#### **--max-tokens**=*integer*
Maximum number of tokens to generate. Set to 0 for unlimited output (default: 0).
This parameter is mapped to the appropriate runtime-specific parameter:
- llama.cpp: `-n` parameter
- MLX: `--max-tokens` parameter
- vLLM: `--max-tokens` parameter
#### **--name**, **-n**
name of the container to run the Model in
#### **--network**=*none*
set the network mode for the container
#### **--ngl**
number of GPU layers: 0 means CPU inferencing, 999 means use the maximum number of layers (default: -1)
The default, -1, means use whatever is automatically deemed appropriate (0 or 999)
#### **--oci-runtime**
Override the default OCI runtime used to launch the container. Container
engines like Podman and Docker have their own default OCI runtime that they
use. Using this option, RamaLama overrides these defaults.
On Nvidia based GPU systems, RamaLama defaults to using the
`nvidia-container-runtime`. Use this option to override this selection.
#### **--privileged**
By default, RamaLama containers are unprivileged (=false) and cannot, for
example, modify parts of the operating system. This is because by default a
container is only allowed limited access to devices. A "privileged" container
is given the same access to devices as the user launching the container, with
the exception of virtual consoles (/dev/tty\d+) when running in systemd mode
(--systemd=always).
A privileged container turns off the security features that isolate the
container from the host. Dropped Capabilities, limited devices, read-only
mount points, Apparmor/SELinux separation, and Seccomp filters are all
disabled. Due to the disabled security features, the privileged field should
almost never be set, as containers can easily break out of confinement.
Containers running in a user namespace (e.g., rootless containers) cannot
have more privileges than the user that launched them.
#### **--pull**=*policy*
- **always**: Always pull the image and throw an error if the pull fails.
- **missing**: Only pull the image when it does not exist in the local containers storage. Throw an error if no image is found and the pull fails.
- **never**: Never pull the image but use the one from the local containers storage. Throw an error when no image is found.
- **newer**: Pull if the image on the registry is newer than the one in the local containers storage. An image is considered to be newer when the digests are different. Comparing the time stamps is prone to errors. Pull errors are suppressed if a local image was found.
#### **--runtime-args**="*args*"
Add *args* to the runtime (llama.cpp or vllm) invocation.
#### **--seed**=
Specify a seed rather than using a random seed for model interaction
#### **--selinux**=*true*
Enable SELinux container separation
#### **--temp**="0.8"
Temperature of the response from the AI Model
llama.cpp explains this as:
The lower the number is, the more deterministic the response.
The higher the number is, the more creative the response, but it is more likely to hallucinate when set too high.
Usage: Lower numbers are good for virtual assistants where we need deterministic responses. Higher numbers are good for roleplay or creative tasks like editing stories.
#### **--thinking**=*true*
Enable or disable thinking mode in reasoning models
#### **--threads**, **-t**
Maximum number of CPU threads to use.
The default is to use half the cores available on this system for the number of threads.
#### **--tls-verify**=*true*
require HTTPS and verify certificates when contacting OCI registries
## Description
Calculate the perplexity of an AI Model. Perplexity measures how well the model can predict the next token, with lower values being better.
## Examples
```text
ramalama perplexity granite3-moe
```
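Options documented above can be combined, for example to evaluate with a larger prompt context (the model name is illustrative):
```text
ramalama perplexity --ctx-size 8192 granite3-moe
```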
## See Also
[ramalama(1)](/docs/commands/ramalama/)
---
*Jan 2025, Originally compiled by Eric Curtin &lt;ecurtin&#64;redhat.com&gt;*

View File

@@ -1,50 +0,0 @@
---
title: pull
description: pull AI Models from Model registries to local storage
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-pull.1.md
---
# pull
## Synopsis
**ramalama pull** [*options*] *model*
## Description
Pull specified AI Model into local storage
## Options
#### **--authfile**=*password*
path of the authentication file for OCI registries
#### **--help**, **-h**
Print usage message
#### **--tls-verify**=*true*
require HTTPS and verify certificates when contacting OCI registries
#### **--verify**=*true*
verify the model after pulling; disable to allow pulling models with a different endianness
## PROXY SUPPORT
RamaLama supports HTTP, HTTPS, and SOCKS proxies via standard environment variables:
- **HTTP_PROXY** or **http_proxy**: Proxy for HTTP connections
- **HTTPS_PROXY** or **https_proxy**: Proxy for HTTPS connections
- **NO_PROXY** or **no_proxy**: Comma-separated list of hosts to bypass proxy
Example proxy URL formats:
- HTTP/HTTPS: `http://proxy.example.com:8080` or `https://proxy.example.com:8443`
- SOCKS4: `socks4://proxy.example.com:1080`
- SOCKS5: `socks5://proxy.example.com:1080` or `socks5h://proxy.example.com:1080` (DNS through proxy)
SOCKS proxy support requires the PySocks library (`pip install PySocks`).
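As a sketch, the proxy only needs to be present in the environment of the pull command; the proxy host below is a placeholder:
```bash
$ HTTPS_PROXY=http://proxy.example.com:8080 ramalama pull ollama://smollm:135m
```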
## See Also
[ramalama(1)](/docs/commands/ramalama/)
---
*Aug 2024, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

View File

@@ -1,81 +0,0 @@
---
title: push
description: push AI Models from local storage to remote registries
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-push.1.md
---
# push
## Synopsis
**ramalama push** [*options*] *model* [*target*]
## Description
Push specified AI Model (OCI-only at present)
The model can be from RamaLama model storage in Huggingface, Ollama, or OCI Model format.
The model can also just be a model stored on disk.
Users can convert without pushing using the `ramalama convert` command.
## Options
#### **--authfile**=*password*
path of the authentication file for OCI registries
#### **--help**, **-h**
Print usage message
#### **--network**=*none*
sets the configuration for network namespaces when handling RUN instructions
#### **--tls-verify**=*true*
require HTTPS and verify certificates when contacting OCI registries
#### **--type**=*raw* | *car*
type of OCI Model Image to push.
| Type | Description |
| ---- | ------------------------------------------------------------- |
| car | Includes base image with the model stored in a /models subdir |
| raw | Only the model and a link file model.file to it stored at / |
Only supported for pushing OCI Model Images.
## EXAMPLE
Push an OCI model to a registry
```bash
$ ramalama push oci://quay.io/rhatdan/tiny:latest
Pushing quay.io/rhatdan/tiny:latest...
Getting image source signatures
Copying blob e0166756db86 skipped: already exists
Copying config ebe856e203 done |
Writing manifest to image destination
```
Generate an OCI model from an Ollama model and push it to a registry
```bash
$ ramalama push ollama://tinyllama:latest oci://quay.io/rhatdan/tiny:latest
Building quay.io/rhatdan/tiny:latest...
STEP 1/2: FROM scratch
STEP 2/2: COPY sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 /model
--> Using cache 69db4a10191c976d2c3c24da972a2a909adec45135a69dbb9daeaaf2a3a36344
COMMIT quay.io/rhatdan/tiny:latest
--> 69db4a10191c
Successfully tagged quay.io/rhatdan/tiny:latest
69db4a10191c976d2c3c24da972a2a909adec45135a69dbb9daeaaf2a3a36344
Pushing quay.io/rhatdan/tiny:latest...
Getting image source signatures
Copying blob e0166756db86 skipped: already exists
Copying config 69db4a1019 done |
Writing manifest to image destination
```
## See Also
[ramalama(1)](/docs/commands/ramalama/), [ramalama-convert(1)](/docs/commands/ramalama/convert)
---
*Aug 2024, Originally compiled by Eric Curtin &lt;ecurtin&#64;redhat.com&gt;*

View File

@@ -1,127 +0,0 @@
---
title: rag
description: generate and convert Retrieval Augmented Generation (RAG) data from provided documents into an OCI Image
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-rag.1.md
---
# rag
## Synopsis
**ramalama rag** [options] [path ...] image
## Description
Generate RAG data from provided documents and convert into an OCI Image. This command uses a specific container image containing the docling
tool to convert the specified content into a RAG vector database. If the image does not exist locally, RamaLama will pull the image
down and launch a container to process the data.
:::note
this command does not work without a container engine.
:::
positional arguments:
*PATH* Files/Directory containing PDF, DOCX, PPTX, XLSX, HTML,
AsciiDoc & Markdown formatted files to be processed.
Can be specified multiple times.
*DESTINATION* Path or OCI Image name to contain processed rag data
## Options
#### **--env**=
Set environment variables inside of the container.
This option allows arbitrary environment variables to be made available to the
process launched inside of the container. If an environment variable is
specified without a value, the container engine checks the host environment
for a value and sets the variable only if it is set on the host.
#### **--format**=*json* | *markdown* | *qdrant* | *milvus*
Convert documents into the following formats:
| Type | Description |
| ------- | ---------------------------------------------------- |
| json | JavaScript Object Notation. lightweight format for exchanging data |
| markdown| Lightweight markup language using plain text editing |
| qdrant | Retrieval-Augmented Generation (RAG) Vector database Qdrant distribution |
| milvus | Retrieval-Augmented Generation (RAG) Vector database Milvus distribution |
#### **--help**, **-h**
Print usage message
#### **--image**=IMAGE
OCI container image to run with specified AI model. RamaLama defaults to using
images based on the accelerator it discovers. For example:
`quay.io/ramalama/ramalama-rag`. See the table below for all default images.
The default image tag is based on the minor version of the RamaLama package.
Version 0.16.0 of RamaLama pulls an image with a `:0.16` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
The default can be overridden in the ramalama.conf file or via the
RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells
RamaLama to use the `quay.io/ramalama/aiimage:1.2` image.
Accelerated images:
| Accelerator | Image |
| ------------------------| ------------------------------ |
| CPU, Apple | quay.io/ramalama/ramalama-rag |
| HIP_VISIBLE_DEVICES | quay.io/ramalama/rocm-rag |
| CUDA_VISIBLE_DEVICES | quay.io/ramalama/cuda-rag |
| ASAHI_VISIBLE_DEVICES | quay.io/ramalama/asahi-rag |
| INTEL_VISIBLE_DEVICES | quay.io/ramalama/intel-gpu-rag |
| ASCEND_VISIBLE_DEVICES | quay.io/ramalama/cann-rag |
| MUSA_VISIBLE_DEVICES | quay.io/ramalama/musa-rag |
#### **--keep-groups**
pass --group-add keep-groups to podman (default: False)
If a GPU device on the host system is accessible to the user via group access, this option leaks those groups into the container.
#### **--network**=*none*
sets the configuration for network namespaces when handling RUN instructions
#### **--ocr**
Sets the Docling OCR flag. OCR stands for Optical Character Recognition and is used to extract text from images within PDFs, converting it into raw text that an LLM can understand. This feature is useful if the PDFs being converted contain many embedded images with text. This process uses a large amount of RAM, so the default is false.
#### **--pull**=*policy*
Pull image policy. The default is **missing**.
- **always**: Always pull the image and throw an error if the pull fails.
- **missing**: Only pull the image when it does not exist in the local containers storage. Throw an error if no image is found and the pull fails.
- **never**: Never pull the image but use the one from the local containers storage. Throw an error when no image is found.
- **newer**: Pull if the image on the registry is newer than the one in the local containers storage. An image is considered to be newer when the digests are different. Comparing the time stamps is prone to errors. Pull errors are suppressed if a local image was found.
#### **--selinux**=*true*
Enable SELinux container separation
## Examples
```bash
$ ramalama rag ./README.md https://github.com/containers/podman/blob/main/README.md quay.io/rhatdan/myrag
100% |███████████████████████████████████████████████████████| 114.00 KB/ 0.00 B 922.89 KB/s 59m 59s
Building quay.io/ramalama/myrag...
adding vectordb...
c857ebc65c641084b34e39b740fdb6a2d9d2d97be320e6aa9439ed0ab8780fe0
```
```bash
$ ramalama rag --ocr README.md https://mysight.edu/document quay.io/rhatdan/myrag
```
```bash
$ ramalama rag --format markdown /tmp/internet.pdf /tmp/output
$ ls /tmp/output/docs/tmp/
/tmp/output/docs/tmp/internet.md
$ ramalama rag --format json /tmp/internet.pdf /tmp/output
$ ls /tmp/output/docs/tmp/
/tmp/output/docs/tmp/internet.md
/tmp/output/docs/tmp/internet.json
```
## See Also
[ramalama(1)](/docs/commands/ramalama/)
---
*Dec 2024, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

@@ -1,205 +0,0 @@
---
title: ramalama
description: Simple management tool for working with AI Models
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama.1.md
---
# ramalama
## Synopsis
**ramalama** [*options*] *command*
## Description
RamaLama: The goal of RamaLama is to make AI boring.
RamaLama tool facilitates local management and serving of AI Models.
On first run RamaLama inspects your system for GPU support, falling back to CPU support if no GPUs are present.
RamaLama uses container engines like Podman or Docker to pull the appropriate OCI image with all of the software necessary to run an AI Model for your system's setup.
Running in containers eliminates the need for users to configure the host
system for AI. After the initialization, RamaLama runs the AI Models within a
container based on the OCI image. RamaLama pulls a container image specific to
the GPUs discovered on the host system. These images are tied to the minor
version of RamaLama. For example, RamaLama version 1.2.3 on an NVIDIA system
pulls quay.io/ramalama/cuda:1.2. To override the default image, use the
`--image` option.
RamaLama pulls AI Models from model registries, starting a chatbot or a REST API service with a single command. Models are treated similarly to how Podman and Docker treat container images.
When both Podman and Docker are installed, RamaLama defaults to Podman. The `RAMALAMA_CONTAINER_ENGINE=docker` environment variable can override this behaviour. When neither is installed, RamaLama attempts to run the model with software on the local system.
:::note
On MacOS systems that use Podman for containers, configure the Podman machine to use the `libkrun` machine provider. The `libkrun` provider enables containers within the Podman Machine access to the Mac's GPU. See [ramalama-macos(7)](/docs/platform-guides/macos) for further information.
:::
:::note
On systems with NVIDIA GPUs, see [ramalama-cuda(7)](/docs/platform-guides/cuda) to correctly configure the host system.
:::
RamaLama CLI defaults can be modified via ramalama.conf files. Default settings for flags are defined in [ramalama.conf(5)](/docs/configuration/conf).
## SECURITY
### Test and run your models more securely
RamaLama defaults to running AI models inside rootless containers using Podman or Docker. These containers isolate the AI models from information on the underlying host. With RamaLama containers, the AI model is mounted as a volume into the container in read-only mode. This results in the process running the model, llama.cpp or vLLM, being isolated from the host. In addition, since `ramalama run` uses the --network=none option, the container cannot reach the network and leak information out of the system. Finally, containers are run with the --rm option, which means that any content written during the running of the container is wiped out when the application exits.
### Here's how RamaLama delivers a robust security footprint:
✅ Container Isolation: AI models run within isolated containers, preventing direct access to the host system.
✅ Read-Only Volume Mounts: The AI model is mounted in read-only mode, meaning that processes inside the container cannot modify host files.
✅ No Network Access: ramalama run is executed with --network=none, meaning the model has no outbound connectivity through which information could be leaked.
✅ Auto-Cleanup: Containers run with --rm, wiping out any temporary data once the session ends.
✅ Drop All Linux Capabilities: No access to Linux capabilities to attack the underlying host.
✅ No New Privileges: Linux Kernel feature which disables container processes from gaining additional privileges.
## MODEL TRANSPORTS
RamaLama supports multiple AI model registry types called transports. Supported transports:
| Transports | Prefix | Web Site |
| ------------- | ------ | --------------------------------------------------- |
| URL based | https://, http://, file:// | `https://web.site/ai.model`, `file://tmp/ai.model`|
| HuggingFace | huggingface://, hf://, hf.co/ | [`huggingface.co`](https://www.huggingface.co)|
| ModelScope | modelscope://, ms:// | [`modelscope.cn`](https://modelscope.cn/)|
| Ollama | ollama:// | [`ollama.com`](https://www.ollama.com)|
| rlcr | rlcr:// | [`ramalama.com`](https://registry.ramalama.com) |
| OCI Container Registries | oci:// | [`opencontainers.org`](https://opencontainers.org)|
|||Examples: [`quay.io`](https://quay.io), [`Docker Hub`](https://docker.io),[`Artifactory`](https://artifactory.com)|
RamaLama defaults to the Ollama registry transport. This default can be overridden in the `ramalama.conf` file or via the RAMALAMA_TRANSPORT
environment variable. `export RAMALAMA_TRANSPORT=huggingface` changes RamaLama to use the huggingface transport.
Modify individual model transports by specifying the `huggingface://`, `oci://`, `ollama://`, `https://`, `http://`, `file://` prefix to the model.
URL support means if a model is on a web site or even on your local system, you can run it directly.
ramalama pull `huggingface://`afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf
ramalama run `file://`$HOME/granite-7b-lab-Q4_K_M.gguf
To make it easier for users, RamaLama uses shortname files, which contain
alias names for fully specified AI Models, allowing users to specify the shorter
names when referring to models. RamaLama reads shortnames.conf files if they
exist. These files contain a list of name/value pairs for specification of
the model. The following table specifies the order in which RamaLama reads the
files. Any duplicate names that exist override previously defined shortnames.
| Shortnames type | Path |
| --------------- | ---------------------------------------- |
| Distribution | /usr/share/ramalama/shortnames.conf |
| Local install | /usr/local/share/ramalama/shortnames.conf |
| Administrators  | /etc/ramalama/shortnames.conf             |
| Users | $HOME/.config/ramalama/shortnames.conf |
```toml
$ cat /usr/share/ramalama/shortnames.conf
[shortnames]
"tiny" = "ollama://tinyllama"
"granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"granite:7b" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"ibm/granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"merlinite" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
"merlinite:7b" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
...
```
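As a minimal sketch, assuming the `tiny` entry shown above, the shortname can stand in for the full model reference anywhere a model is expected:

```bash
# "tiny" resolves to ollama://tinyllama via shortnames.conf
ramalama pull tiny
ramalama run tiny
```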
**ramalama [GLOBAL OPTIONS]**
## GLOBAL OPTIONS
#### **--debug**
print debug messages
#### **--dryrun**
show container runtime command without executing it (default: False)
#### **--engine**
run RamaLama using the specified container engine. The default is `podman` if installed, otherwise `docker`.
The default can be overridden in the ramalama.conf file or via the RAMALAMA_CONTAINER_ENGINE environment variable.
#### **--help**, **-h**
show this help message and exit
#### **--nocontainer**
Do not run RamaLama workloads in containers (default: False)
The default can be overridden in the ramalama.conf file.
:::note
OCI images cannot be used with the --nocontainer option. This option disables the following features: Automatic GPU acceleration, containerized environment isolation, and dynamic resource allocation. For a complete list of affected features, please see the RamaLama documentation at [link-to-feature-list].
:::
#### **--quiet**
Decrease output verbosity.
#### **--runtime**=*llama.cpp* | *vllm* | *mlx*
specify the runtime to use; valid options are 'llama.cpp', 'vllm', and 'mlx' (default: llama.cpp)
The default can be overridden in the ramalama.conf file.
#### **--store**=STORE
store AI Models in the specified directory (default rootless: `$HOME/.local/share/ramalama`, default rootful: `/var/lib/ramalama`)
The default can be overridden in the ramalama.conf file.
## COMMANDS
| Command | Description |
| ------------------------------------------------- | ---------------------------------------------------------- |
| [ramalama-bench(1)](/docs/commands/ramalama/bench) |benchmark specified AI Model|
| [ramalama-chat(1)](/docs/commands/ramalama/chat) |OpenAI chat with the specified REST API URL|
| [ramalama-containers(1)](/docs/commands/ramalama/containers)|list all RamaLama containers|
| [ramalama-convert(1)](/docs/commands/ramalama/convert) |convert AI Models from local storage to OCI Image|
| [ramalama-daemon(1)](/docs/commands/ramalama/daemon) |run a RamaLama REST server|
| [ramalama-info(1)](/docs/commands/ramalama/info) |display RamaLama configuration information|
| [ramalama-inspect(1)](/docs/commands/ramalama/inspect) |inspect the specified AI Model|
| [ramalama-list(1)](/docs/commands/ramalama/list) |list all downloaded AI Models|
| [ramalama-login(1)](/docs/commands/ramalama/login) |login to remote registry|
| [ramalama-logout(1)](/docs/commands/ramalama/logout) |logout from remote registry|
| [ramalama-perplexity(1)](/docs/commands/ramalama/perplexity)|calculate the perplexity value of an AI Model|
| [ramalama-pull(1)](/docs/commands/ramalama/pull) |pull AI Models from Model registries to local storage|
| [ramalama-push(1)](/docs/commands/ramalama/push) |push AI Models from local storage to remote registries|
| [ramalama-rag(1)](/docs/commands/ramalama/rag) |generate and convert Retrieval Augmented Generation (RAG) data from provided documents into an OCI Image|
| [ramalama-rm(1)](/docs/commands/ramalama/rm) |remove AI Models from local storage|
| [ramalama-run(1)](/docs/commands/ramalama/run) |run specified AI Model as a chatbot|
| [ramalama-serve(1)](/docs/commands/ramalama/serve) |serve REST API on specified AI Model|
| [ramalama-stop(1)](/docs/commands/ramalama/stop) |stop named container that is running AI Model|
| [ramalama-version(1)](/docs/commands/ramalama/version) |display version of RamaLama|
## CONFIGURATION FILES
**ramalama.conf** (`/usr/share/ramalama/ramalama.conf`, `/etc/ramalama/ramalama.conf`, `/etc/ramalama/ramalama.conf.d/*.conf`, `$HOME/.config/ramalama/ramalama.conf`, `$HOME/.config/ramalama/ramalama.conf.d/*.conf`)
RamaLama has builtin defaults for command line options. These defaults can be overridden using the ramalama.conf configuration files.
Distributions ship the `/usr/share/ramalama/ramalama.conf` file with their default settings. Administrators can override fields in this file by creating the `/etc/ramalama/ramalama.conf` file. Users can further modify defaults by creating the `$HOME/.config/ramalama/ramalama.conf` file. RamaLama merges its builtin defaults with the specified fields from these files, if they exist. Fields specified in the user's file override the administrator's file, which overrides the distribution's file, which overrides the built-in defaults.
RamaLama uses builtin defaults if no ramalama.conf file is found.
If the **RAMALAMA_CONFIG** environment variable is set, then its value is used for the ramalama.conf file rather than the default.
## ENVIRONMENT VARIABLES
RamaLama default behaviour can also be overridden via environment variables,
although the recommended way is to use the ramalama.conf file.
| ENV Name | Description |
| ------------------------- | ------------------------------------------ |
| HTTP_PROXY, http_proxy | proxy URL for HTTP connections |
| HTTPS_PROXY, https_proxy | proxy URL for HTTPS connections |
| NO_PROXY, no_proxy | comma-separated list of hosts to bypass proxy (e.g., localhost,127.0.0.1,.local) |
| RAMALAMA_CONFIG | specific configuration file to be used |
| RAMALAMA_CONTAINER_ENGINE | container engine (Podman/Docker) to use |
| RAMALAMA_FORCE_EMOJI | define whether `ramalama run` uses EMOJI |
| RAMALAMA_IMAGE | container image to use for serving AI Model|
| RAMALAMA_IN_CONTAINER | Run RamaLama in the default container |
| RAMALAMA_STORE | location to store AI Models |
| RAMALAMA_TRANSPORT | default AI Model transport (ollama, huggingface, OCI) |
| TMPDIR | directory for temporary files. Defaults to /var/tmp if unset.|
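For example, a shell session might override a few of these variables before invoking RamaLama; the values below are illustrative:

```bash
# Use Docker instead of Podman, keep models in a custom store,
# and default to the huggingface transport for this session
export RAMALAMA_CONTAINER_ENGINE=docker
export RAMALAMA_STORE=$HOME/ai-models
export RAMALAMA_TRANSPORT=huggingface
ramalama pull afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf
```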
## See Also
[podman(1)](https://github.com/containers/podman/blob/main/docs/source/markdown/podman.1.md), **docker(1)**, [ramalama.conf(5)](/docs/configuration/conf), [ramalama-cuda(7)](/docs/platform-guides/cuda), [ramalama-macos(7)](/docs/platform-guides/macos)
---
*Aug 2024, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

@@ -1,42 +0,0 @@
---
title: rm
description: remove AI Models from local storage
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-rm.1.md
---
# rm
## Synopsis
**ramalama rm** [*options*] *model* [...]
## Description
Specify one or more AI Models to be removed from local storage
## Options
#### **--all**, **-a**
remove all local Models
#### **--help**, **-h**
show this help message and exit
#### **--ignore**
ignore errors when specified Model does not exist
## Examples
```bash
$ ramalama rm ollama://tinyllama
$ ramalama rm --all
$ ramalama rm --ignore bogusmodel
```
## See Also
[ramalama(1)](/docs/commands/ramalama/)
---
*Aug 2024, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

@@ -1,272 +0,0 @@
---
title: run
description: run specified AI Model as a chatbot
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-run.1.md
---
# run
## Synopsis
**ramalama run** [*options*] *model* [arg ...]
## MODEL TRANSPORTS
| Transports | Prefix | Web Site |
| ------------- | ------ | --------------------------------------------------- |
| URL based | https://, http://, file:// | `https://web.site/ai.model`, `file://tmp/ai.model`|
| HuggingFace | huggingface://, hf://, hf.co/ | [`huggingface.co`](https://www.huggingface.co)|
| ModelScope | modelscope://, ms:// | [`modelscope.cn`](https://modelscope.cn/)|
| Ollama | ollama:// | [`ollama.com`](https://www.ollama.com)|
| rlcr | rlcr:// | [`ramalama.com`](https://registry.ramalama.com) |
| OCI Container Registries | oci:// | [`opencontainers.org`](https://opencontainers.org)|
|||Examples: [`quay.io`](https://quay.io), [`Docker Hub`](https://docker.io),[`Artifactory`](https://artifactory.com)|
RamaLama defaults to the Ollama registry transport. This default can be overridden in the `ramalama.conf` file or via the RAMALAMA_TRANSPORTS
environment. `export RAMALAMA_TRANSPORT=huggingface` Changes RamaLama to use huggingface transport.
Modify individual model transports by specifying the `huggingface://`, `oci://`, `ollama://`, `https://`, `http://`, `file://` prefix to the model.
URL support means if a model is on a web site or even on your local system, you can run it directly.
## Options
#### **--api**=**llama-stack** | **none**
Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry. (default: none)
The default can be overridden in the `ramalama.conf` file.
#### **--authfile**=*password*
path of the authentication file for OCI registries
#### **--cache-reuse**=256
Min chunk size to attempt reusing from the cache via KV shifting
#### **--color**
Indicate whether or not to use color in the chat.
Possible values are "never", "always" and "auto". (default: auto)
#### **--ctx-size**, **-c**
size of the prompt context. This option is also available as **--max-model-len**. Applies to llama.cpp and vllm regardless of alias (default: 4096, 0 = loaded from model)
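A minimal sketch of enlarging the prompt context for a single session (model name taken from the examples below):

```bash
# Run with an 8192-token prompt context instead of the 4096 default
ramalama run --ctx-size 8192 granite
```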
#### **--device**
Add a host device to the container. Optional permissions parameter can
be used to specify device permissions by combining r for read, w for
write, and m for mknod(2).
Example: --device=/dev/dri/renderD128:/dev/xvdc:rwm
The device specification is passed directly to the underlying container engine. See documentation of the supported container engine for more information.
Pass '--device=none' to explicitly add no device to the container, e.g. for
running a CPU-only performance comparison.
#### **--env**=
Set environment variables inside of the container.
This option allows arbitrary environment variables that are available for the
process to be launched inside of the container. If an environment variable is
specified without a value, the container engine checks the host environment
for a value and sets the variable only if it is set on the host.
#### **--help**, **-h**
Show this help message and exit
#### **--image**=IMAGE
OCI container image to run with specified AI model. RamaLama defaults to using
images based on the accelerator it discovers. For example:
`quay.io/ramalama/ramalama`. See the table below for all default images.
The default image tag is based on the minor version of the RamaLama package.
Version 0.16.0 of RamaLama pulls an image with a `:0.16` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
The default can be overridden in the `ramalama.conf` file or via the
RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells
RamaLama to use the `quay.io/ramalama/aiimage:1.2` image.
Accelerated images:
| Accelerator | Image |
| ------------------------| -------------------------- |
| CPU, Apple | quay.io/ramalama/ramalama |
| HIP_VISIBLE_DEVICES | quay.io/ramalama/rocm |
| CUDA_VISIBLE_DEVICES | quay.io/ramalama/cuda |
| ASAHI_VISIBLE_DEVICES | quay.io/ramalama/asahi |
| INTEL_VISIBLE_DEVICES | quay.io/ramalama/intel-gpu |
| ASCEND_VISIBLE_DEVICES | quay.io/ramalama/cann |
| MUSA_VISIBLE_DEVICES | quay.io/ramalama/musa |
#### **--keep-groups**
pass --group-add keep-groups to podman (default: False)
If GPU device on host system is accessible to user via group access, this option leaks the groups into the container.
#### **--keepalive**
duration to keep a model loaded (e.g. 5m)
#### **--max-tokens**=*integer*
Maximum number of tokens to generate. Set to 0 for unlimited output (default: 0).
This parameter is mapped to the appropriate runtime-specific parameter:
- llama.cpp: `-n` parameter
- MLX: `--max-tokens` parameter
- vLLM: `--max-tokens` parameter
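Following the mapping above, a hedged example of capping output length (model name and prompt are illustrative):

```bash
# Stop generation after roughly 256 tokens
ramalama run --max-tokens 256 granite "summarize what the GGUF format is"
```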
#### **--mcp**=SERVER_URL
MCP (Model Context Protocol) servers to use for enhanced tool calling capabilities.
Can be specified multiple times to connect to multiple MCP servers.
Each server provides tools that can be automatically invoked during chat conversations.
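A sketch of connecting to more than one MCP server; the server URLs are hypothetical:

```bash
# Repeat --mcp once per server; tools from both become available in the chat
ramalama run --mcp http://localhost:3001 --mcp http://localhost:3002 granite
```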
#### **--name**, **-n**
name of the container to run the Model in
#### **--network**=*none*
set the network mode for the container
#### **--ngl**
number of gpu layers, 0 means CPU inferencing, 999 means use max layers (default: -1)
The default of -1 means use whatever is automatically deemed appropriate (0 or 999)
#### **--oci-runtime**
Override the default OCI runtime used to launch the container. Container
engines like Podman and Docker, have their own default oci runtime that they
use. Using this option RamaLama will override these defaults.
On Nvidia based GPU systems, RamaLama defaults to using the
`nvidia-container-runtime`. Use this option to override this selection.
#### **--port**, **-p**=*port*
Port for AI Model server to listen on (default: 8080)
The default can be overridden in the `ramalama.conf` file.
#### **--prefix**
Prefix for the user prompt (default: 🦭 > )
#### **--privileged**
By default, RamaLama containers are unprivileged (=false) and cannot, for
example, modify parts of the operating system. This is because by default
a container is only allowed limited access to devices. A "privileged"
container is given the same access to devices as the user launching
the container, with the exception of virtual consoles (/dev/tty\d+)
when running in systemd mode (--systemd=always).
A privileged container turns off the security features that isolate the
container from the host. Dropped Capabilities, limited devices, read-only
mount points, Apparmor/SELinux separation, and Seccomp filters are
all disabled. Due to the disabled security features, the privileged
field should almost never be set as containers can easily break out of
confinement.
Containers running in a user namespace (e.g., rootless containers) cannot
have more privileges than the user that launched them.
#### **--pull**=*policy*
Pull image policy. The default is **missing**.
- **always**: Always pull the image and throw an error if the pull fails.
- **missing**: Only pull the image when it does not exist in the local containers storage. Throw an error if no image is found and the pull fails.
- **never**: Never pull the image but use the one from the local containers storage. Throw an error when no image is found.
- **newer**: Pull if the image on the registry is newer than the one in the local containers storage. An image is considered to be newer when the digests are different. Comparing the time stamps is prone to errors. Pull errors are suppressed if a local image was found.
#### **--rag**=
Specify path to Retrieval-Augmented Generation (RAG) database or an OCI Image containing a RAG database
#### **--rag-image**=
The image to use to process the RAG database specified by the `--rag` option. The image must contain the `/usr/bin/rag_framework` executable, which
creates a proxy that enriches client requests with RAG data before passing them on to the LLM and returns the responses.
#### **--runtime-args**="*args*"
Add *args* to the runtime (llama.cpp or vllm) invocation.
#### **--seed**=
Specify seed rather than using random seed model interaction
#### **--selinux**=*true*
Enable SELinux container separation
#### **--summarize-after**=*N*
Automatically summarize conversation history after N messages to prevent context growth.
When enabled, ramalama will periodically condense older messages into a summary,
keeping only recent messages and the summary. This prevents the context from growing
indefinitely during long chat sessions. Set to 0 to disable (default: 4).
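For example, to summarize less aggressively during a long session (model name illustrative):

```bash
# Condense older history every 8 messages instead of the default 4
ramalama run --summarize-after 8 granite
```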
#### **--temp**="0.8"
Temperature of the response from the AI Model
llama.cpp explains this as:
The lower the number is, the more deterministic the response.
The higher the number is the more creative the response is, but more likely to hallucinate when set too high.
Usage: Lower numbers are good for virtual assistants where we need deterministic responses. Higher numbers are good for roleplay or creative tasks like editing stories
#### **--thinking**=*true*
Enable or disable thinking mode in reasoning models
#### **--threads**, **-t**
Maximum number of cpu threads to use.
The default is to use half the cores available on this system for the number of threads.
#### **--tls-verify**=*true*
require HTTPS and verify certificates when contacting OCI registries
## Description
Run specified AI Model as a chat bot. RamaLama pulls specified AI Model from
registry if it does not exist in local storage. By default a prompt for a chat
bot is started. When arguments are specified, the arguments will be given
to the AI Model and the output returned without entering the chatbot.
## Examples
Run command without arguments starts a chatbot
```text
ramalama run granite
>
```
Run command with local downloaded model for 10 minutes
```text
ramalama run --keepalive 10m file:///tmp/mymodel
>
```
Run command with a custom port to allow multiple models running simultaneously
```text
ramalama run --port 8081 granite
>
```
```text
ramalama run merlinite "when is the summer solstice"
The summer solstice, which is the longest day of the year, will happen on June ...
```
Run command with a custom prompt and a file passed by the stdin
```text
cat file.py | ramalama run quay.io/USER/granite-code:1.0 'what does this program do?'
This program is a Python script that allows the user to interact with a terminal. ...
[end of text]
```
Run command and send multiple lines at once to the chatbot by adding a backslash `\`
at the end of the line
$ ramalama run granite
🦭 > Hi \
🦭 > tell me a funny story \
🦭 > please
## Exit Codes:
- **0**: Success
- **124**: RamaLama command did not exit within the keepalive time.
## NVIDIA CUDA Support
See [ramalama-cuda(7)](/docs/platform-guides/cuda) for setting up the host Linux system for CUDA support.
## See Also
[ramalama(1)](/docs/commands/ramalama/), [ramalama-cuda(7)](/docs/platform-guides/cuda), [ramalama.conf(5)](/docs/configuration/conf)
---
*Aug 2024, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

@@ -1,589 +0,0 @@
---
title: serve
description: serve REST API on specified AI Model
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-serve.1.md
---
# serve
## Synopsis
**ramalama serve** [*options*] _model_
## Description
Serve specified AI Model as a chat bot. RamaLama pulls specified AI Model from
registry if it does not exist in local storage.
## MODEL TRANSPORTS
| Transports | Prefix | Web Site |
| ------------- | ------ | --------------------------------------------------- |
| URL based | https://, http://, file:// | `https://web.site/ai.model`, `file://tmp/ai.model`|
| HuggingFace | huggingface://, hf://, hf.co/ | [`huggingface.co`](https://www.huggingface.co)|
| ModelScope | modelscope://, ms:// | [`modelscope.cn`](https://modelscope.cn/)|
| Ollama | ollama:// | [`ollama.com`](https://www.ollama.com)|
| rlcr | rlcr:// | [`ramalama.com`](https://registry.ramalama.com) |
| OCI Container Registries | oci:// | [`opencontainers.org`](https://opencontainers.org)|
|||Examples: [`quay.io`](https://quay.io), [`Docker Hub`](https://docker.io),[`Artifactory`](https://artifactory.com)|
RamaLama defaults to the Ollama registry transport. This default can be overridden in the `ramalama.conf` file or via the RAMALAMA_TRANSPORTS
environment. `export RAMALAMA_TRANSPORT=huggingface` Changes RamaLama to use huggingface transport.
Modify individual model transports by specifying the `huggingface://`, `oci://`, `ollama://`, `https://`, `http://`, `file://` prefix to the model.
URL support means if a model is on a web site or even on your local system, you can run it directly.
## REST API ENDPOINTS
Under the hood, `ramalama-serve` uses the `llama.cpp` HTTP server by default. When using `--runtime=vllm`, it uses the vLLM server. When using `--runtime=mlx`, it uses the MLX LM server.
For REST API endpoint documentation, see:
- llama.cpp: [https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md#api-endpoints](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md#api-endpoints)
- vLLM: [https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html)
- MLX LM: [https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/SERVER.md](https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/SERVER.md)
## Options
#### **--add-to-unit**
format: --add-to-unit section:key:value
Adds to the generated unit file (quadlet) in the section *section* the key *key* with the value *value*.
Useful, for instance, to add environment variables to the generated unit file, or to place the container in a specific pod/network (Container:Network:xxx.network).
**Only valid with *--generate* parameter.**
Section, key and value are required and must be separated by colons.
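As a sketch, extra unit settings can be layered onto a generated quadlet; the network name and environment variable below are assumptions, not defaults:

```bash
# Attach the generated container to a named network and inject an env var
ramalama serve --generate=quadlet \
  --add-to-unit Container:Network:ai.network \
  --add-to-unit Container:Environment:LLAMA_ARG_THREADS=10 \
  --name MyGraniteServer granite
```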
#### **--api**=**llama-stack** | **none**
Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry. (default: none)
The default can be overridden in the `ramalama.conf` file.
#### **--authfile**=*password*
Path of the authentication file for OCI registries
#### **--cache-reuse**=256
Min chunk size to attempt reusing from the cache via KV shifting
#### **--ctx-size**, **-c**
size of the prompt context. This option is also available as **--max-model-len**. Applies to llama.cpp and vllm regardless of alias (default: 4096, 0 = loaded from model)
#### **--detach**, **-d**
Run the container in the background and print the new container ID.
The default is TRUE. The --nocontainer option forces this option to False.
Use the `ramalama stop` command to stop the container running the served ramalama Model.
#### **--device**
Add a host device to the container. Optional permissions parameter can
be used to specify device permissions by combining r for read, w for
write, and m for mknod(2).
Example: --device=/dev/dri/renderD128:/dev/xvdc:rwm
The device specification is passed directly to the underlying container engine. See documentation of the supported container engine for more information.
Pass '--device=none' to explicitly add no device to the container, e.g. for
running a CPU-only performance comparison.
#### **--dri**=*on* | *off*
Enable or disable mounting `/dev/dri` into the container when running with `--api=llama-stack` (enabled by default). Use to prevent access to the host device when not required, or avoid errors in environments where `/dev/dri` is not available.
#### **--env**=
Set environment variables inside of the container.
This option allows arbitrary environment variables that are available for the
process to be launched inside of the container. If an environment variable is
specified without a value, the container engine checks the host environment
for a value and sets the variable only if it is set on the host.
#### **--generate**=type
Generate specified configuration format for running the AI Model as a service
| Key | Description |
| ------------ | -------------------------------------------------------------------------|
| quadlet | Podman supported container definition for running AI Model under systemd |
| kube | Kubernetes YAML definition for running the AI Model as a service |
| quadlet/kube | Kubernetes YAML definition for running the AI Model as a service and Podman supported container definition for running the Kube YAML specified pod under systemd|
| compose | Compose YAML definition for running the AI Model as a service |
Optionally, an output directory for the generated files can be specified by
appending the path to the type, e.g. `--generate kube:/etc/containers/systemd`.
#### **--help**, **-h**
show this help message and exit
#### **--host**="0.0.0.0"
IP address for llama.cpp to listen on.
#### **--image**=IMAGE
OCI container image to run with specified AI model. RamaLama defaults to using
images based on the accelerator it discovers. For example:
`quay.io/ramalama/ramalama`. See the table above for all default images.
The default image tag is based on the minor version of the RamaLama package.
Version 0.16.0 of RamaLama pulls an image with a `:0.16` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
The default can be overridden in the `ramalama.conf` file or via the
RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells
RamaLama to use the `quay.io/ramalama/aiimage:1.2` image.
Accelerated images:
| Accelerator | Image |
| ------------------------| -------------------------- |
| CPU, Apple | quay.io/ramalama/ramalama |
| HIP_VISIBLE_DEVICES | quay.io/ramalama/rocm |
| CUDA_VISIBLE_DEVICES | quay.io/ramalama/cuda |
| ASAHI_VISIBLE_DEVICES | quay.io/ramalama/asahi |
| INTEL_VISIBLE_DEVICES | quay.io/ramalama/intel-gpu |
| ASCEND_VISIBLE_DEVICES | quay.io/ramalama/cann |
| MUSA_VISIBLE_DEVICES | quay.io/ramalama/musa |
#### **--keep-groups**
pass --group-add keep-groups to podman (default: False)
If GPU device on host system is accessible to user via group access, this option leaks the groups into the container.
#### **--max-tokens**=*integer*
Maximum number of tokens to generate. Set to 0 for unlimited output (default: 0).
This parameter is mapped to the appropriate runtime-specific parameter:
- llama.cpp: `-n` parameter
- MLX: `--max-tokens` parameter
- vLLM: `--max-tokens` parameter
#### **--model-draft**
A draft model is a smaller, faster model that helps accelerate the decoding
process of larger, more complex models, like Large Language Models (LLMs). It
works by generating candidate sequences of tokens that the larger model then
verifies and refines. This approach, often referred to as speculative decoding,
can significantly improve the speed of inferencing by reducing the number of
times the larger model needs to be invoked.
Use --runtime-arg to pass the other draft model related parameters.
Make sure the sampling parameters like top_k on the web UI are set correctly.
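A minimal sketch of speculative decoding with a smaller draft model; both model names are illustrative:

```bash
# Serve granite while using smollm:135m to draft candidate tokens
ramalama serve --model-draft smollm:135m granite
```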
#### **--name**, **-n**
Name of the container to run the Model in.
#### **--network**=*""*
set the network mode for the container
#### **--ngl**
number of gpu layers, 0 means CPU inferencing, 999 means use max layers (default: -1)
The default of -1 means use whatever is automatically deemed appropriate (0 or 999)
#### **--oci-runtime**
Override the default OCI runtime used to launch the container. Container
engines like Podman and Docker, have their own default oci runtime that they
use. Using this option RamaLama will override these defaults.
On Nvidia based GPU systems, RamaLama defaults to using the
`nvidia-container-runtime`. Use this option to override this selection.
#### **--port**, **-p**
port for AI Model server to listen on. It must be available. If not specified,
a free port in the 8080-8180 range is selected, starting with 8080.
The default can be overridden in the `ramalama.conf` file.
#### **--privileged**
By default, RamaLama containers are unprivileged (=false) and cannot, for
example, modify parts of the operating system. This is because by default
a container is only allowed limited access to devices. A "privileged"
container is given the same access to devices as the user launching
the container, with the exception of virtual consoles (/dev/tty\d+)
when running in systemd mode (--systemd=always).
A privileged container turns off the security features that isolate the
container from the host. Dropped Capabilities, limited devices, read-only
mount points, Apparmor/SELinux separation, and Seccomp filters are
all disabled. Due to the disabled security features, the privileged
field should almost never be set as containers can easily break out of
confinement.
Containers running in a user namespace (e.g., rootless containers) cannot
have more privileges than the user that launched them.
#### **--pull**=*policy*
- **always**: Always pull the image and throw an error if the pull fails.
- **missing**: Only pull the image when it does not exist in the local containers storage. Throw an error if no image is found and the pull fails.
- **never**: Never pull the image but use the one from the local containers storage. Throw an error when no image is found.
- **newer**: Pull if the image on the registry is newer than the one in the local containers storage. An image is considered to be newer when the digests are different. Comparing the time stamps is prone to errors. Pull errors are suppressed if a local image was found.
#### **--rag**=
Specify path to Retrieval-Augmented Generation (RAG) database or an OCI Image containing a RAG database
:::note
RAG support requires AI Models to be run within containers; --nocontainer is not supported. Docker does not support image mounting, so Podman is required.
:::
#### **--rag-image**=
The image to use to process the RAG database specified by the `--rag` option. The image must contain the `/usr/bin/rag_framework` executable, which
creates a proxy that enriches client requests with RAG data before passing them on to the LLM and returns the responses.
#### **--runtime-args**="*args*"
Add *args* to the runtime (llama.cpp or vllm) invocation.
#### **--seed**=
Specify seed rather than using random seed model interaction
#### **--selinux**=*true*
Enable SELinux container separation
#### **--temp**="0.8"
Temperature of the response from the AI Model.
llama.cpp explains this as:
The lower the number is, the more deterministic the response.
The higher the number is the more creative the response is, but more likely to hallucinate when set too high.
Usage: Lower numbers are good for virtual assistants where we need deterministic responses. Higher numbers are good for roleplay or creative tasks like editing stories
#### **--thinking**=*true*
Enable or disable thinking mode in reasoning models
#### **--threads**, **-t**
Maximum number of cpu threads to use.
The default is to use half the cores available on this system for the number of threads.
#### **--tls-verify**=*true*
require HTTPS and verify certificates when contacting OCI registries
#### **--webui**=*on* | *off*
Enable or disable the web UI for the served model (enabled by default). When set to "on" (the default), the web interface is properly initialized. When set to "off", the `--no-webui` option is passed to the llama-server command to disable the web interface.
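For example, to expose only the REST API without the bundled web interface:

```bash
# Disable the llama-server web UI; the API endpoints remain available
ramalama serve --webui off granite
```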
## Examples
### Run two AI Models at the same time. Notice both are running within Podman Containers.
```bash
$ ramalama serve -d -p 8080 --name mymodel ollama://smollm:135m
09b0e0d26ed28a8418fb5cd0da641376a08c435063317e89cf8f5336baf35cfa
$ ramalama serve -d -n example --port 8081 oci://quay.io/mmortari/gguf-py-example/v1/example.gguf
3f64927f11a5da5ded7048b226fbe1362ee399021f5e8058c73949a677b6ac9c
$ podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
09b0e0d26ed2 quay.io/ramalama/ramalama:latest /usr/bin/ramalama... 32 seconds ago Up 32 seconds 0.0.0.0:8081->8081/tcp ramalama_sTLNkijNNP
3f64927f11a5 quay.io/ramalama/ramalama:latest /usr/bin/ramalama... 17 seconds ago Up 17 seconds 0.0.0.0:8082->8082/tcp ramalama_YMPQvJxN97
```
### Generate quadlet service off of HuggingFace granite Model
```bash
$ ramalama serve --name MyGraniteServer --generate=quadlet granite
Generating quadlet file: MyGraniteServer.container
$ cat MyGraniteServer.container
[Unit]
Description=RamaLama $HOME/.local/share/ramalama/models/huggingface/instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf AI Model Service
After=local-fs.target
[Container]
AddDevice=-/dev/accel
AddDevice=-/dev/dri
AddDevice=-/dev/kfd
Exec=llama-server --port 1234 -m $HOME/.local/share/ramalama/models/huggingface/instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf
Image=quay.io/ramalama/ramalama:latest
Mount=type=bind,src=/home/dwalsh/.local/share/ramalama/models/huggingface/instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf,target=/mnt/models/model.file,ro,Z
ContainerName=MyGraniteServer
PublishPort=8080
[Install]
# Start by default on boot
WantedBy=multi-user.target default.target
$ mv MyGraniteServer.container $HOME/.config/containers/systemd/
$ systemctl --user daemon-reload
$ systemctl start --user MyGraniteServer
$ systemctl status --user MyGraniteServer
● MyGraniteServer.service - RamaLama granite AI Model Service
Loaded: loaded (/home/dwalsh/.config/containers/systemd/MyGraniteServer.container; generated)
Drop-In: /usr/lib/systemd/user/service.d
└─10-timeout-abort.conf
Active: active (running) since Fri 2024-09-27 06:54:17 EDT; 3min 3s ago
Main PID: 3706287 (conmon)
Tasks: 20 (limit: 76808)
Memory: 1.0G (peak: 1.0G)
...
$ podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7bb35b97a0fe quay.io/ramalama/ramalama:latest llama-server --po... 3 minutes ago Up 3 minutes 0.0.0.0:43869->8080/tcp MyGraniteServer
```
### Generate quadlet service off of tiny OCI Model
```bash
$ ramalama --runtime=vllm serve --name tiny --generate=quadlet oci://quay.io/rhatdan/tiny:latest
Downloading quay.io/rhatdan/tiny:latest...
Trying to pull quay.io/rhatdan/tiny:latest...
Getting image source signatures
Copying blob 65ba8d40e14a skipped: already exists
Copying blob e942a1bf9187 skipped: already exists
Copying config d8e0b28ee6 done |
Writing manifest to image destination
Generating quadlet file: tiny.container
Generating quadlet file: tiny.image
Generating quadlet file: tiny.volume
$ cat tiny.container
[Unit]
Description=RamaLama /run/model/model.file AI Model Service
After=local-fs.target
[Container]
AddDevice=-/dev/accel
AddDevice=-/dev/dri
AddDevice=-/dev/kfd
Exec=vllm serve --port 8080 /run/model/model.file
Image=quay.io/ramalama/ramalama:latest
Mount=type=volume,source=tiny:latest.volume,dest=/mnt/models,ro
ContainerName=tiny
PublishPort=8080
[Install]
# Start by default on boot
WantedBy=multi-user.target default.target
$ cat tiny.volume
[Volume]
Driver=image
Image=tiny:latest.image
$ cat tiny.image
[Image]
Image=quay.io/rhatdan/tiny:latest
```
### Generate quadlet service off of tiny OCI Model and output to directory
```bash
$ ramalama --runtime=vllm serve --name tiny --generate=quadlet:~/.config/containers/systemd/ oci://quay.io/rhatdan/tiny:latest
Generating quadlet file: tiny.container
Generating quadlet file: tiny.image
Generating quadlet file: tiny.volume
$ ls ~/.config/containers/systemd/
tiny.container tiny.image tiny.volume
```
### Generate a kubernetes YAML file named MyTinyModel
```bash
$ ramalama serve --name MyTinyModel --generate=kube oci://quay.io/rhatdan/tiny-car:latest
Generating Kubernetes YAML file: MyTinyModel.yaml
$ cat MyTinyModel.yaml
# Save the output of this file and use kubectl create -f to import
# it into Kubernetes.
#
# Created with ramalama-0.0.21
apiVersion: v1
kind: Deployment
metadata:
name: MyTinyModel
labels:
app: MyTinyModel
spec:
replicas: 1
selector:
matchLabels:
app: MyTinyModel
template:
metadata:
labels:
app: MyTinyModel
spec:
containers:
- name: MyTinyModel
image: quay.io/ramalama/ramalama:latest
command: ["llama-server"]
args: ['--port', '8080', '-m', '/mnt/models/model.file']
ports:
- containerPort: 8080
volumeMounts:
- mountPath: /mnt/models
subPath: /models
name: model
- mountPath: /dev/dri
name: dri
volumes:
- image:
reference: quay.io/rhatdan/tiny-car:latest
pullPolicy: IfNotPresent
name: model
- hostPath:
path: /dev/dri
name: dri
```
### Generate Compose file
```bash
$ ramalama serve --name=my-smollm-server --port 1234 --generate=compose smollm:135m
Generating Compose YAML file: docker-compose.yaml
$ cat docker-compose.yaml
version: '3.8'
services:
my-smollm-server:
image: quay.io/ramalama/ramalama:latest
container_name: my-smollm-server
command: ramalama serve --host 0.0.0.0 --port 1234 smollm:135m
ports:
- "1234:1234"
volumes:
- ~/.local/share/ramalama/models/smollm-135m-instruct:/mnt/models/model.file:ro
environment:
- HOME=/tmp
cap_drop:
- ALL
security_opt:
- no-new-privileges
- label=disable
```
### Generate a Llama Stack Kubernetes YAML file named MyLamaStack
```bash
$ ramalama serve --api llama-stack --name MyLamaStack --generate=kube oci://quay.io/rhatdan/granite:latest
Generating Kubernetes YAML file: MyLamaStack.yaml
$ cat MyLamaStack.yaml
apiVersion: v1
kind: Deployment
metadata:
name: MyLamaStack
labels:
app: MyLamaStack
spec:
replicas: 1
selector:
matchLabels:
app: MyLamaStack
template:
metadata:
labels:
ai.ramalama: ""
app: MyLamaStack
ai.ramalama.model: oci://quay.io/rhatdan/granite:latest
ai.ramalama.engine: podman
ai.ramalama.runtime: llama.cpp
ai.ramalama.port: 8080
ai.ramalama.command: serve
spec:
containers:
- name: model-server
image: quay.io/ramalama/ramalama:0.8
command: ["llama-server"]
args: ['--port', '8081', '--model', '/mnt/models/model.file', '--alias', 'quay.io/rhatdan/granite:latest', '--temp', '0.8', '--jinja', '--cache-reuse', '256', '-v', '--threads', 16, '--host', '127.0.0.1']
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- CAP_CHOWN
- CAP_FOWNER
- CAP_FSETID
- CAP_KILL
- CAP_NET_BIND_SERVICE
- CAP_SETFCAP
- CAP_SETGID
- CAP_SETPCAP
- CAP_SETUID
- CAP_SYS_CHROOT
add:
- CAP_DAC_OVERRIDE
seLinuxOptions:
type: spc_t
volumeMounts:
- mountPath: /mnt/models
subPath: /models
name: model
- mountPath: /dev/dri
name: dri
- name: llama-stack
image: quay.io/ramalama/llama-stack:0.8
args:
- /bin/sh
- -c
- llama stack run --image-type venv /etc/ramalama/ramalama-run.yaml
env:
- name: RAMALAMA_URL
value: http://127.0.0.1:8081
- name: INFERENCE_MODEL
value: quay.io/rhatdan/granite:latest
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- CAP_CHOWN
- CAP_FOWNER
- CAP_FSETID
- CAP_KILL
- CAP_NET_BIND_SERVICE
- CAP_SETFCAP
- CAP_SETGID
- CAP_SETPCAP
- CAP_SETUID
- CAP_SYS_CHROOT
add:
- CAP_DAC_OVERRIDE
seLinuxOptions:
type: spc_t
ports:
- containerPort: 8321
hostPort: 8080
volumes:
- hostPath:
path: quay.io/rhatdan/granite:latest
name: model
- hostPath:
path: /dev/dri
name: dri
```
### Generate a kubernetes YAML file named MyTinyModel shown above, but also generate a quadlet to run it in.
```bash
$ ramalama --name MyTinyModel --generate=quadlet/kube oci://quay.io/rhatdan/tiny-car:latest
run_cmd: podman image inspect quay.io/rhatdan/tiny-car:latest
Generating Kubernetes YAML file: MyTinyModel.yaml
Generating quadlet file: MyTinyModel.kube
$ cat MyTinyModel.kube
[Unit]
Description=RamaLama quay.io/rhatdan/tiny-car:latest Kubernetes YAML - AI Model Service
After=local-fs.target
[Kube]
Yaml=MyTinyModel.yaml
[Install]
# Start by default on boot
WantedBy=multi-user.target default.target
```
## NVIDIA CUDA Support
See [ramalama-cuda(7)](/docs/platform-guides/cuda) for setting up the host Linux system for CUDA support.
## MLX Support
The MLX runtime is designed for Apple Silicon Macs and provides optimized performance on these systems. MLX support has the following requirements:
- **Operating System**: macOS only
- **Hardware**: Apple Silicon (M1, M2, M3, or later)
- **Container Mode**: MLX requires `--nocontainer` as it cannot run inside containers
- **Dependencies**: The `mlx-lm` uv package installed on the host system as a uv tool
To install MLX dependencies, use `uv`:
```bash
uv tool install mlx-lm
# or upgrade to the latest version:
uv tool upgrade mlx-lm
```
Example usage:
```bash
ramalama --runtime=mlx serve hf://mlx-community/Unsloth-Phi-4-4bit
```
## See Also
[ramalama(1)](/docs/commands/ramalama/), [ramalama-stop(1)](/docs/commands/ramalama/stop), **quadlet(1)**, **systemctl(1)**, **podman(1)**, **podman-ps(1)**, [ramalama-cuda(7)](/docs/platform-guides/cuda), [ramalama.conf(5)](/docs/configuration/conf)
---
*Aug 2024, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

View File

@@ -1,45 +0,0 @@
---
title: stop
description: stop named container that is running AI Model
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-stop.1.md
---
# stop
## Synopsis
**ramalama stop** [*options*] *name*
Tells container engine to stop the specified container.
The stop command conflicts with --nocontainer option.
## Options
#### **--all**, **-a**
Stop all containers
#### **--help**, **-h**
Print usage message
#### **--ignore**
Ignore missing containers when stopping
## Description
Stop specified container that is executing the AI Model.
The ramalama stop command conflicts with the --nocontainer option. The user needs to stop the RamaLama processes manually when running with --nocontainer.
## Examples
```bash
$ ramalama stop mymodel
$ ramalama stop --all
```
## See Also
[ramalama(1)](/docs/commands/ramalama/), [ramalama-run(1)](/docs/commands/ramalama/run), [ramalama-serve(1)](/docs/commands/ramalama/serve)
---
*Sep 2024, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

@@ -1,35 +0,0 @@
---
title: version
description: display version of RamaLama
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-version.1.md
---
# version
## Synopsis
**ramalama version**
## Description
Print version of RamaLama
## Options
#### **--help**, **-h**
Print usage message
## Examples
```bash
$ ramalama version
ramalama version 0.16.0
$ ramalama -q version
0.16.0
>
```
## See Also
[ramalama(1)](/docs/commands/ramalama/)
---
*Aug 2024, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

@@ -1,267 +0,0 @@
---
title: Configuration File
description: Configuration file reference
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama.conf.5.md
---
# Configuration File
# DESCRIPTION
RamaLama reads all ramalama.conf files, if they exist,
and modifies the defaults for running RamaLama on the host. ramalama.conf uses
a TOML format that can be easily modified and versioned.
RamaLama reads the following paths for global configuration that affects all users.
| Paths | Exception |
| ----------------------------------- | ----------------------------------- |
| __/usr/share/ramalama/ramalama.conf__ | On Linux |
| __/usr/local/share/ramalama/ramalama.conf__ | On Linux |
| __/etc/ramalama/ramalama.conf__ | On Linux |
| __/etc/ramalama/ramalama.conf.d/\*.conf__ | On Linux |
| __$HOME/.local/.pipx/venvs/usr/share/ramalama/ramalama.conf__ |On pipx installed macOS |
For user specific configuration it reads
| Paths | Exception |
| ----------------------------------- | ------------------------------ |
| __$XDG_CONFIG_HOME/ramalama/ramalama.conf__ | |
| __$XDG_CONFIG_HOME/ramalama/ramalama.conf.d/\*.conf__ | |
| __$HOME/.config/ramalama/ramalama.conf__ | `$XDG_CONFIG_HOME` not set |
| __$HOME/.config/ramalama/ramalama.conf.d/\*.conf__ | `$XDG_CONFIG_HOME` not set |
Fields specified in ramalama conf files override the default options, as well as
options in previously read ramalama conf files.
Config files in the `.d` directories are added in alphanumeric sorted order and must end in `.conf`.
## ENVIRONMENT VARIABLES
If the `RAMALAMA_CONFIG` environment variable is set, all system and user
config files are ignored and only the specified config file is loaded.
# FORMAT
The [TOML format][toml] is used as the encoding of the configuration file.
Every option is nested under its table. No bare options are used. The format of
TOML can be simplified to:
[table1]
option = value
[table2]
option = value
[table3]
option = value
[table3.subtable1]
option = value
## RAMALAMA TABLE
The ramalama table contains settings to configure and manage the OCI runtime.
`[[ramalama]]`
**api**="none"
Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.
Options: llama-stack, none
**api_key**=""
OpenAI-compatible API key. Can also be set via the RAMALAMA_API_KEY environment variable.
**carimage**="registry.access.redhat.com/ubi10-micro:latest"
OCI model car image
Image to be used when building and pushing --type=car models
**cache_reuse**=256
Min chunk size to attempt reusing from the cache via KV shifting
**container**=true
Run RamaLama in the default container.
RAMALAMA_IN_CONTAINER environment variable overrides this field.
**convert_type**="raw"
Convert the MODEL to the specified OCI Object
Options: artifact, car, raw
| Type | Description |
| -------- | ------------------------------------------------------------- |
| artifact | Store AI Models as artifacts |
| car | Traditional OCI image including base image with the model stored in a /models subdir |
| raw | Traditional OCI image including only the model and a link file `model.file` pointed at it stored at / |
**ctx_size**=0
Size of the prompt context (0 = loaded from model)
**engine**="podman"
Run RamaLama using the specified container engine.
Valid options are: Podman and Docker
This field can be overridden by the RAMALAMA_CONTAINER_ENGINE environment variable.
**env**=[]
Environment variables to be added to the environment used when running in a container engine (e.g., Podman, Docker). For example "LLAMA_ARG_THREADS=10".
**gguf_quantization_mode**="Q4_K_M"
The quantization mode used when creating OCI formatted AI Models.
Available options: Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0.
**host**="0.0.0.0"
IP address for llama.cpp to listen on.
**image**="quay.io/ramalama/ramalama:latest"
OCI container image to run with the specified AI model
RAMALAMA_IMAGE environment variable overrides this field.
`[[ramalama.images]]`
HIP_VISIBLE_DEVICES = "quay.io/ramalama/rocm"
CUDA_VISIBLE_DEVICES = "quay.io/ramalama/cuda"
ASAHI_VISIBLE_DEVICES = "quay.io/ramalama/asahi"
INTEL_VISIBLE_DEVICES = "quay.io/ramalama/intel-gpu"
ASCEND_VISIBLE_DEVICES = "quay.io/ramalama/cann"
MUSA_VISIBLE_DEVICES = "quay.io/ramalama/musa"
VLLM = "registry.redhat.io/rhelai1/ramalama-vllm"
Alternative images to use when RamaLama recognizes specific hardware or user
specified vllm model runtime.
**keep_groups**=false
Pass `--group-add keep-groups` to podman, when using podman.
In some cases this is needed to access the gpu from a rootless container
**log_level**=warning
Set the logging level of RamaLama application.
Valid Values:
debug, info, warning, error, critical
:::note
--debug option overrides this field and forces the system to debug
:::
**max_tokens**=0
Maximum number of tokens to generate. Set to 0 for unlimited output (default: 0).
This parameter is mapped to the appropriate runtime-specific parameter when executing models.
**ngl**=-1
number of gpu layers, 0 means CPU inferencing, 999 means use max layers (default: -1)
The default of -1 means use whatever is automatically deemed appropriate (0 or 999)
**prefix**=""
Specify default prefix for chat and run command. By default the prefix
is based on the container engine used.
| Container Engine| Prefix |
| --------------- | ------- |
| Podman | "🦭 > " |
| Docker | "🐋 > " |
| No Engine | "🦙 > " |
| No EMOJI support| "> " |
**port**="8080"
Specify initial port for a range of 101 ports for services to listen on.
If this port is unavailable, another free port from this range will be selected.
**pull**="newer"
- **always**: Always pull the image and throw an error if the pull fails.
- **missing**: Only pull the image when it does not exist in the local containers storage. Throw an error if no image is found and the pull fails.
- **never**: Never pull the image but use the one from the local containers storage. Throw an error when no image is found.
- **newer**: Pull if the image on the registry is newer than the one in the local containers storage. An image is considered to be newer when the digests are different. Comparing the time stamps is prone to errors. Pull errors are suppressed if a local image was found.
**rag_format**="qdrant"
Specify the default output format for output of the `ramalama rag` command.
Options: qdrant, json, markdown, milvus.
**rag_images**="quay.io/ramalama/ramalama-rag"
OCI container image to run with the specified AI model when using RAG content.
`[[ramalama.rag_images]]`
CUDA_VISIBLE_DEVICES = "quay.io/ramalama/cuda-rag"
HIP_VISIBLE_DEVICES = "quay.io/ramalama/rocm-rag"
INTEL_VISIBLE_DEVICES = "quay.io/ramalama/intel-gpu-rag"
GGML_VK_VISIBLE_DEVICES = "quay.io/ramalama/ramalama"
**runtime**="llama.cpp"
Specify the AI runtime to use; valid options are 'llama.cpp', 'vllm', and 'mlx' (default: llama.cpp)
Options: llama.cpp, vllm, mlx
**selinux**=false
SELinux container separation enforcement
**store**="$HOME/.local/share/ramalama"
Store AI Models in the specified directory
**summarize_after**=4
Automatically summarize conversation history after N messages to prevent context growth.
When enabled, ramalama will periodically condense older messages into a summary,
keeping only recent messages and the summary. This prevents the context from growing
indefinitely during long chat sessions. Set to 0 to disable (default: 4).
**temp**="0.8"
Temperature of the response from the AI Model
llama.cpp explains this as:
The lower the number is, the more deterministic the response.
The higher the number is the more creative the response is, but more likely to hallucinate when set too high.
Usage: Lower numbers are good for virtual assistants where we need deterministic responses. Higher numbers are good for roleplay or creative tasks like editing stories
**thinking**=true
Enable thinking mode on reasoning models
**threads**=-1
maximum number of cpu threads to use for inferencing
The default -1, uses the default of the underlying implementation
**transport**="ollama"
Specify the default transport to be used for pulling and pushing of AI Models.
Options: oci, ollama, huggingface.
RAMALAMA_TRANSPORT environment variable overrides this field.
`[[ramalama.http_client]]`
Http client configuration
**max_retries**=5
The maximum number of times to retry a failed download
**max_retry_delay**=30
The maximum delay between retry attempts in seconds
## RAMALAMA.USER TABLE
The ramalama.user table contains user preference settings.
`[[ramalama.user]]`
**no_missing_gpu_prompt**=false
Suppress the interactive prompt when running on macOS with a Podman VM that does not support GPU acceleration (e.g., applehv provider). When set to true, RamaLama will automatically proceed without GPU support instead of prompting the user for confirmation. This is useful for automation and scripting scenarios where interactive prompts are not desired.
Can also be set via the RAMALAMA_USER__NO_MISSING_GPU_PROMPT environment variable.
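A minimal sketch of the environment-variable form for automation (model name illustrative):

```bash
# Proceed without GPU support on macOS without prompting
export RAMALAMA_USER__NO_MISSING_GPU_PROMPT=true
ramalama run granite
```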

@@ -1,40 +0,0 @@
---
title: OCI Spec
description: Configuration file reference
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-oci.5.md
---
# OCI Spec
# DESCRIPTION
RamaLama's `oci://` transport uses [OpenContainers image registries](https://github.com/opencontainers/distribution-spec) to store AI models.
Each model is stored in an ordinary [container image](https://github.com/opencontainers/image-spec) (currently not using a specialized OCI artifact).
The image is, structurally, a single-platform image (the top-level element is an OCI Image Manifest, not an OCI Image Index).
## Model Data
Because the AI model is stored in an image, not an artifact, the data is, as in all OCI images, wrapped in the standard tar layer format.
The image must contain a `/models/model.file` file (or, usually, a symbolic link),
which contains an AI model in GGUF format (consumable by `llama-server`).
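As a quick sanity check of that layout, the image filesystem can be exported and listed; the image reference below is only a placeholder, not a real model image.
```bash
# Placeholder image reference; substitute a model image from local storage.
# The dummy /bin/true command lets `podman create` accept images that define
# no entrypoint or command.
podman create --name model-check quay.io/example/my-model:latest /bin/true
podman export model-check | tar -tvf - | grep 'models/'
podman rm model-check
```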
## Metadata
The image's config contains an `org.containers.type` label. The value of the label can be one of:
- `ai.image.model.raw`: The image contains only the AI model
- `ai.image.model.car`: The image also contains other software; more details of that software are currently unspecified in this document.
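One way to check the label on a locally stored image is with `podman image inspect`; the image reference below is only a placeholder.
```bash
# Placeholder image reference; substitute a model image from local storage.
podman image inspect \
    --format '{{ index .Labels "org.containers.type" }}' \
    quay.io/example/my-model:latest
# Prints "ai.image.model.raw" for a model-only image.
```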
## Local Image Storage
The model image may be pulled into, or created in, Podman's local image storage.
In such a situation, to simplify identification of AI models,
the model image may be wrapped in an OCI index pointing at the AI model image,
and in the index, the manifest's descriptor pointing at the AI model image contains an `org.cnai.model.model` annotation.
Note that the wrapping in an OCI index does not happen in all situations,
and in particular does not happen when RamaLama uses Docker instead of Podman.

View File

@@ -1,219 +0,0 @@
---
title: MACOS_INSTALL
description: RamaLama documentation
# This file is auto-generated from manpages. Do not edit manually.
# Source: MACOS_INSTALL.md
---
# MACOS_INSTALL
# macOS Installation Guide for RamaLama
This guide covers the different ways to install RamaLama on macOS.
## Method 1: Self-Contained Installer Package (Recommended)
The easiest way to install RamaLama on macOS is using our self-contained `.pkg` installer. This method includes Python and all dependencies, so you don't need to install anything else.
### Download and Install
1. Download the latest installer from the [Releases page](https://github.com/containers/ramalama/releases)
2. Double-click the downloaded `.pkg` file
3. Follow the installation wizard
Or via command line:
```bash
# Download the installer (replace VERSION with the actual version)
curl -LO https://github.com/containers/ramalama/releases/download/vVERSION/RamaLama-VERSION-macOS-Installer.pkg
# Verify the SHA256 checksum (optional but recommended)
curl -LO https://github.com/containers/ramalama/releases/download/vVERSION/RamaLama-VERSION-macOS-Installer.pkg.sha256
shasum -a 256 -c RamaLama-VERSION-macOS-Installer.pkg.sha256
# Install
sudo installer -pkg RamaLama-VERSION-macOS-Installer.pkg -target /
```
### What Gets Installed
The installer places files in:
- `/usr/local/bin/ramalama` - Main executable
- `/usr/local/share/ramalama/` - Configuration files
- `/usr/local/share/man/` - Man pages
- `/usr/local/share/bash-completion/` - Bash completions
- `/usr/local/share/fish/` - Fish completions
- `/usr/local/share/zsh/` - Zsh completions
### Verify Installation
```bash
# Check version
ramalama --version
# Get help
ramalama --help
```
## Method 2: Python Package (pip)
If you prefer to use Python package management:
```bash
# Install Python 3.10 or later (if not already installed)
brew install python@3.11
# Install ramalama
pip3 install ramalama
# Or install from source
git clone https://github.com/containers/ramalama.git
cd ramalama
pip3 install .
```
## Method 3: Build from Source
For developers or if you want the latest code:
```bash
# Clone the repository
git clone https://github.com/containers/ramalama.git
cd ramalama
# Install build dependencies
pip3 install build
# Build and install
make install
```
## Prerequisites
Before using RamaLama, you'll need a container engine:
### Option A: Podman (Recommended)
```bash
brew install podman
# Initialize Podman machine with libkrun for GPU access
podman machine init --provider libkrun
podman machine start
```
For more details, see [ramalama-macos(7)](/docs/platform-guides/macos).
### Option B: Docker
```bash
brew install docker
```
## Building the Installer Package (For Maintainers)
If you want to build the installer package yourself:
```bash
# Install PyInstaller
pip3 install pyinstaller
# Build the package
./scripts/build_macos_pkg.sh
# The built package will be in:
# build/macos-pkg/RamaLama-VERSION-macOS-Installer.pkg
```
## Uninstallation
To remove RamaLama:
```bash
# Remove the executable
sudo rm /usr/local/bin/ramalama
# Remove configuration and data files (optional)
sudo rm -rf /usr/local/share/ramalama
rm -rf ~/.local/share/ramalama
rm -rf ~/.config/ramalama
# Remove man pages (optional)
sudo rm /usr/local/share/man/man1/ramalama*.1
sudo rm /usr/local/share/man/man5/ramalama*.5
sudo rm /usr/local/share/man/man7/ramalama*.7
# Remove shell completions (optional)
sudo rm /usr/local/share/bash-completion/completions/ramalama
sudo rm /usr/local/share/fish/vendor_completions.d/ramalama.fish
sudo rm /usr/local/share/zsh/site-functions/_ramalama
```
## Troubleshooting
### "ramalama: command not found"
Make sure `/usr/local/bin` is in your PATH:
```bash
echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
```
### "Cannot verify developer" warning
macOS may show a security warning for unsigned packages.
:::note
We're working on getting keys to sign it.
:::
To bypass:
1. Right-click the `.pkg` file
2. Select "Open"
3. Click "Open" in the dialog
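If you prefer the command line, clearing the quarantine attribute on the downloaded package usually avoids the dialog; the filename below assumes the download is in the current directory.
```bash
# Assumes the package was downloaded to the current directory.
xattr -d com.apple.quarantine RamaLama-VERSION-macOS-Installer.pkg
sudo installer -pkg RamaLama-VERSION-macOS-Installer.pkg -target /
```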
### Podman machine issues
If Podman isn't working:
```bash
# Reset Podman machine
podman machine stop
podman machine rm
podman machine init --provider libkrun
podman machine start
```
## Getting Started
Once installed, try these commands:
```bash
# Check version
ramalama --version
# Pull a model
ramalama pull tinyllama
# Run a chatbot
ramalama run tinyllama
# Get help
ramalama --help
```
## Additional Resources
- [RamaLama Documentation](https://ramalama.ai)
- [GitHub Repository](https://github.com/containers/ramalama)
- [macOS-specific Documentation](/docs/platform-guides/macos)
- [Report Issues](https://github.com/containers/ramalama/issues)
## System Requirements
- macOS 10.15 (Catalina) or later
- Intel or Apple Silicon (M1/M2/M3) processor
- 4GB RAM minimum (8GB+ recommended for running models)
- 10GB free disk space
- Podman or Docker

View File

@@ -1,76 +0,0 @@
---
title: cann
description: Platform-specific setup guide
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-cann.7.md
---
# cann
# Setting Up RamaLama with Ascend NPU Support on Linux systems
This guide walks through the steps required to set up RamaLama with Ascend NPU support.
- [Background](#background)
- [Hardware](#hardware)
- [Model](#model)
- [Docker](#docker)
## Background
**Ascend NPU** is a range of AI processors built around a Neural Processing Unit (NPU). It efficiently handles matrix-matrix multiplication, dot products, and scalar operations.
**CANN** (Compute Architecture for Neural Networks) is a heterogeneous computing architecture for AI scenarios, providing support for multiple AI frameworks on the top and serving AI processors and programming at the bottom. It plays a crucial role in bridging the gap between upper and lower layers, and is a key platform for improving the computing efficiency of Ascend AI processors. Meanwhile, it offers a highly efficient and easy-to-use programming interface for diverse application scenarios, allowing users to rapidly build AI applications and services based on the Ascend platform.
## Hardware
### Ascend NPU
**Verified devices**
Table Supported Hardware List:
| Ascend NPU | Status |
| ----------------------------- | ------- |
| Atlas A2 Training series | Support |
| Atlas 800I A2 Inference series | Support |
*Notes:*
- If you have trouble with an Ascend NPU device, please create an issue with the **[CANN]** prefix/tag.
- If you are running successfully with an Ascend NPU device, please help update the "Supported Hardware List" table above.
## Model
Currently, Ascend NPU acceleration is only supported when the llama.cpp backend is selected. For supported models, please refer to the page [llama.cpp/backend/CANN.md](https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/CANN.md).
## Docker
### Install the Ascend driver
The driver provides NPU acceleration using the AI cores of your Ascend NPU. [CANN](https://www.hiascend.com/en/software/cann) is a set of hierarchical APIs that help you quickly build AI applications and services based on the Ascend NPU.
For more information about the Ascend NPU, see the [Ascend Community](https://www.hiascend.com/en/).
Make sure to have the CANN toolkit installed. You can download it from here: [CANN Toolkit](https://www.hiascend.com/developer/download/community/result?module=cann)
Make sure the Ascend Docker runtime is installed. You can download it from here: [Ascend-docker-runtime](https://www.hiascend.com/document/detail/en/mindx-dl/300/dluserguide/clusterscheduling/dlug_installation_02_000025.html)
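Before building, a quick sanity check that the host can see the NPU is helpful; this assumes the driver installation provides the `npu-smi` utility.
```bash
# Lists the installed Ascend NPU devices if the driver is set up correctly.
npu-smi info
```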
### Build Images
Go to the `ramalama` directory and build using make.
```bash
make build IMAGE=cann
make install
```
You can test with:
```bash
export ASCEND_VISIBLE_DEVICES=0
ramalama --image quay.io/ramalama/cann:latest serve -d -p 8080 -name ollama://smollm:135m
```
In another window, view the running Podman container:
```bash
$ podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
80fc31c131b0 quay.io/ramalama/cann:latest "/bin/bash -c 'expor…" About an hour ago Up About an hour ame
```
For other usage guides, see the RamaLama [README.md](https://github.com/containers/ramalama/blob/main/README.md).
---
*Mar 2025, Originally compiled*

View File

@@ -1,193 +0,0 @@
---
title: cuda
description: Platform-specific setup guide
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-cuda.7.md
---
# cuda
# Setting Up RamaLama with CUDA Support on Linux systems
This guide walks through the steps required to set up RamaLama with CUDA support.
## Install the NVIDIA Container Toolkit
Follow the installation instructions provided in the [NVIDIA Container Toolkit installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
### Installation using dnf/yum (For RPM based distros like Fedora)
* Install the NVIDIA Container Toolkit packages
```bash
sudo dnf install -y nvidia-container-toolkit
```
:::note
The NVIDIA Container Toolkit is required on the host for running CUDA in containers.
:::
:::note
If the above installation is not working for you and you are running Fedora, try removing it and using the [COPR](https://copr.fedorainfracloud.org/coprs/g/ai-ml/nvidia-container-toolkit/).
:::
### Installation using APT (For Debian based distros like Ubuntu)
* Configure the Production Repository
```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```
* Update the packages list from the repository
```bash
sudo apt-get update
```
* Install the NVIDIA Container Toolkit packages
```bash
sudo apt-get install -y nvidia-container-toolkit
```
:::note
The NVIDIA Container Toolkit is required for WSL to have CUDA resources while running a container.
:::
## Setting Up CUDA Support
For additional information see: [Support for Container Device Interface](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html)
# Generate the CDI specification file
```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```
# Check the names of the generated devices
List the generated CDI devices to check their names:
```bash
nvidia-ctk cdi list
INFO[0000] Found 1 CDI devices
nvidia.com/gpu=all
```
:::note
Generate a new CDI specification after any configuration change, most notably when the driver is upgraded!
:::
## Testing the Setup
**Based on this Documentation:** [Running a Sample Workload](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html)
---
# **Test the Installation**
Run the following command to verify setup:
```bash
podman run --rm --device=nvidia.com/gpu=all fedora nvidia-smi
```
# **Expected Output**
Verify everything is configured correctly, with output similar to this:
```text
Thu Dec 5 19:58:40 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.72 Driver Version: 566.14 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3080 On | 00000000:09:00.0 On | N/A |
| 34% 24C P5 31W / 380W | 867MiB / 10240MiB | 7% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 35 G /Xwayland N/A |
| 0 N/A N/A 35 G /Xwayland N/A |
+-----------------------------------------------------------------------------------------+
```
:::note
On systems that have SELinux enabled, it may be necessary to turn on the `container_use_devices` boolean in order to run the `nvidia-smi` command successfully from a container.
:::
To check the status of the boolean, run the following:
```bash
getsebool container_use_devices
```
If the result of the command shows that the boolean is `off`, run the following to turn the boolean on:
```bash
sudo setsebool -P container_use_devices 1
```
### CUDA_VISIBLE_DEVICES
RamaLama respects the `CUDA_VISIBLE_DEVICES` environment variable if it's already set in your environment. If not set, RamaLama defaults to using all of the GPUs detected by nvidia-smi.
You can specify which GPU devices should be visible to RamaLama by setting this variable before running RamaLama commands:
```bash
export CUDA_VISIBLE_DEVICES="0,1" # Use GPUs 0 and 1
ramalama run granite
```
This is particularly useful in multi-GPU systems where you want to dedicate specific GPUs to different workloads.
If the `CUDA_VISIBLE_DEVICES` environment variable is set to an empty string, RamaLama will default to using the CPU.
```bash
export CUDA_VISIBLE_DEVICES="" # Defaults to CPU
ramalama run granite
```
To revert to using all available GPUs, unset the environment variable:
```bash
unset CUDA_VISIBLE_DEVICES
```
## Troubleshooting
### CUDA Updates
After some CUDA software updates, RamaLama stops working and complains about missing shared NVIDIA libraries, for example:
```bash
ramalama run granite
Error: crun: cannot stat `/lib64/libEGL_nvidia.so.565.77`: No such file or directory: OCI runtime attempted to invoke a command that was not found
```
Because the CUDA version was updated, the CDI specification file needs to be regenerated:
```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```
## See Also
[ramalama(1)](/docs/commands/ramalama/), [podman(1)](https://github.com/containers/podman/blob/main/docs/source/markdown/podman.1.md)
---
*Jan 2025, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

View File

@@ -1,67 +0,0 @@
---
title: macos
description: Platform-specific setup guide
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-macos.7.md
---
# macos
# Configure Podman Machine on Mac for GPU Acceleration
Leveraging GPU acceleration on a Mac with Podman requires the configuration of
the `libkrun` machine provider.
This can be done by either setting an environment variable or modifying the
`containers.conf` file. On MacOS, you'll likely need to create a new Podman
machine with libkrun to access the GPU.
Previously created Podman Machines must be recreated to take
advantage of the `libkrun` provider.
## Configuration Methods:
### containers.conf
Open the containers.conf file, typically located at $HOME/.config/containers/containers.conf.
Add the following line within the [machine] section: provider = "libkrun".
This change will persist across sessions.
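For example, the relevant fragment of containers.conf would be:
```toml
[machine]
provider = "libkrun"
```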
### Environment Variable
Set the CONTAINERS_MACHINE_PROVIDER environment variable to libkrun. This will be a temporary change until you restart your terminal or session.
For example: export CONTAINERS_MACHINE_PROVIDER=libkrun
### ramalama.conf
RamaLama can also be run in a limited manner without using containers by
specifying the `--nocontainer` option. Open the ramalama.conf file, typically located at $HOME/.config/ramalama/ramalama.conf.
Add the following line within the `[ramalama]` section: `container = false`
This change will persist across sessions.
## Podman Desktop
Creating a Podman Machine with libkrun (MacOS):
Go to Settings > Resources in Podman Desktop.
In the Podman tile, click Create new.
In the Create a Podman machine screen, you can configure the machine's resources (CPU, Memory, Disk size) and enable Machine with root privileges if needed.
To use libkrun, ensure that the environment variable is set or the containers.conf file is configured before creating the machine.
Once the machine is created, Podman Desktop will manage the connection to the new machine.
## Important Notes:
On MacOS, `libkrun` is used to leverage the system's virtualization framework for running containers, and it requires a Podman machine to be created.
Refer to the [Podman Desktop documentation](https://podman-desktop.io/docs/podman/creating-a-podman-machine) for detailed instructions and troubleshooting tips.
## See Also
[ramalama(1)](/docs/commands/ramalama/), [podman-machine(1)](https://github.com/containers/podman/blob/main/docs/source/markdown/podman-machine.1.md)
---
*Apr 2025, Originally compiled by Dan Walsh &lt;dwalsh&#64;redhat.com&gt;*

View File

@@ -1,83 +0,0 @@
---
title: musa
description: Platform-specific setup guide
# This file is auto-generated from manpages. Do not edit manually.
# Source: ramalama-musa.7.md
---
# musa
# Setting Up RamaLama with MUSA Support on Linux systems
This guide walks through the steps required to set up RamaLama with MUSA support.
## Install the MT Linux Driver
Download the appropriate [MUSA SDK](https://developer.mthreads.com/sdk/download/musa) and follow the installation instructions provided in the [MT Linux Driver installation guide](https://docs.mthreads.com/musa-sdk/musa-sdk-doc-online/install_guide#2%E9%A9%B1%E5%8A%A8%E5%AE%89%E8%A3%85).
## Install the MT Container Toolkit
Obtain the latest [MT CloudNative Toolkits](https://developer.mthreads.com/sdk/download/CloudNative) and follow the installation instructions provided in the [MT Container Toolkit installation guide](https://docs.mthreads.com/cloud-native/cloud-native-doc-online/install_guide/#%E6%91%A9%E5%B0%94%E7%BA%BF%E7%A8%8B%E5%AE%B9%E5%99%A8%E8%BF%90%E8%A1%8C%E6%97%B6%E5%A5%97%E4%BB%B6).
## Setting Up MUSA Support
```bash
$ (cd /usr/bin/musa && sudo ./docker setup $PWD)
$ docker info | grep mthreads
Runtimes: mthreads mthreads-experimental runc
Default Runtime: mthreads
```
## Testing the Setup
# **Test the Installation**
Run the following command to verify setup:
```bash
docker run --rm --env MTHREADS_VISIBLE_DEVICES=all ubuntu:22.04 mthreads-gmi
```
# **Expected Output**
Verify everything is configured correctly, with output similar to this:
```text
Thu May 15 01:53:39 2025
---------------------------------------------------------------
mthreads-gmi:2.0.0 Driver Version:3.0.0
---------------------------------------------------------------
ID Name |PCIe |%GPU Mem
Device Type |Pcie Lane Width |Temp MPC Capable
| ECC Mode
+-------------------------------------------------------------+
0 MTT S80 |00000000:01:00.0 |0% 3419MiB(16384MiB)
Physical |16x(16x) |59C YES
| N/A
---------------------------------------------------------------
---------------------------------------------------------------
Processes:
ID PID Process name GPU Memory
Usage
+-------------------------------------------------------------+
No running processes found
---------------------------------------------------------------
```
### MUSA_VISIBLE_DEVICES
RamaLama respects the `MUSA_VISIBLE_DEVICES` environment variable if it's already set in your environment. If not set, RamaLama defaults to using all of the GPUs detected by mthreads-gmi.
You can specify which GPU devices should be visible to RamaLama by setting this variable before running RamaLama commands:
```bash
export MUSA_VISIBLE_DEVICES="0,1" # Use GPUs 0 and 1
ramalama run granite
```
This is particularly useful in multi-GPU systems where you want to dedicate specific GPUs to different workloads.
---
*May 2025, Originally compiled by Xiaodong Ye &lt;yeahdongcn&#64;gmail.com&gt;*