From 6ffa25ff89f622576d7a481aa56c68ea1fbf89a8 Mon Sep 17 00:00:00 2001 From: John Wiele Date: Fri, 30 Jan 2026 08:31:30 -0500 Subject: [PATCH] Remove generated doc files. Clean the docsite directory as part of "make clean". Remove generated doc files from the repository; their presence creates a risk of publishing incorrect documentation. Signed-off-by: John Wiele --- Makefile | 3 +- docsite/docs/commands/ramalama/bench.mdx | 172 ----- docsite/docs/commands/ramalama/chat.mdx | 83 --- docsite/docs/commands/ramalama/containers.mdx | 81 --- docsite/docs/commands/ramalama/convert.mdx | 86 --- docsite/docs/commands/ramalama/daemon.mdx | 83 --- docsite/docs/commands/ramalama/info.mdx | 389 ------------ docsite/docs/commands/ramalama/inspect.mdx | 123 ---- docsite/docs/commands/ramalama/list.mdx | 62 -- docsite/docs/commands/ramalama/login.mdx | 76 --- docsite/docs/commands/ramalama/logout.mdx | 48 -- docsite/docs/commands/ramalama/perplexity.mdx | 185 ------ docsite/docs/commands/ramalama/pull.mdx | 50 -- docsite/docs/commands/ramalama/push.mdx | 81 --- docsite/docs/commands/ramalama/rag.mdx | 127 ---- docsite/docs/commands/ramalama/ramalama.mdx | 205 ------ docsite/docs/commands/ramalama/rm.mdx | 42 -- docsite/docs/commands/ramalama/run.mdx | 272 -------- docsite/docs/commands/ramalama/serve.mdx | 589 ------------------ docsite/docs/commands/ramalama/stop.mdx | 45 -- docsite/docs/commands/ramalama/version.mdx | 35 -- docsite/docs/configuration/conf.mdx | 267 -------- docsite/docs/configuration/ramalama-oci.mdx | 40 -- docsite/docs/misc/MACOS_INSTALL.mdx | 219 ------- docsite/docs/platform-guides/cann.mdx | 76 --- docsite/docs/platform-guides/cuda.mdx | 193 ------ docsite/docs/platform-guides/macos.mdx | 67 -- docsite/docs/platform-guides/musa.mdx | 83 --- 28 files changed, 2 insertions(+), 3780 deletions(-) delete mode 100644 docsite/docs/commands/ramalama/bench.mdx delete mode 100644 docsite/docs/commands/ramalama/chat.mdx delete mode 100644 docsite/docs/commands/ramalama/containers.mdx delete mode 100644 docsite/docs/commands/ramalama/convert.mdx delete mode 100644 docsite/docs/commands/ramalama/daemon.mdx delete mode 100644 docsite/docs/commands/ramalama/info.mdx delete mode 100644 docsite/docs/commands/ramalama/inspect.mdx delete mode 100644 docsite/docs/commands/ramalama/list.mdx delete mode 100644 docsite/docs/commands/ramalama/login.mdx delete mode 100644 docsite/docs/commands/ramalama/logout.mdx delete mode 100644 docsite/docs/commands/ramalama/perplexity.mdx delete mode 100644 docsite/docs/commands/ramalama/pull.mdx delete mode 100644 docsite/docs/commands/ramalama/push.mdx delete mode 100644 docsite/docs/commands/ramalama/rag.mdx delete mode 100644 docsite/docs/commands/ramalama/ramalama.mdx delete mode 100644 docsite/docs/commands/ramalama/rm.mdx delete mode 100644 docsite/docs/commands/ramalama/run.mdx delete mode 100644 docsite/docs/commands/ramalama/serve.mdx delete mode 100644 docsite/docs/commands/ramalama/stop.mdx delete mode 100644 docsite/docs/commands/ramalama/version.mdx delete mode 100644 docsite/docs/configuration/conf.mdx delete mode 100644 docsite/docs/configuration/ramalama-oci.mdx delete mode 100644 docsite/docs/misc/MACOS_INSTALL.mdx delete mode 100644 docsite/docs/platform-guides/cann.mdx delete mode 100644 docsite/docs/platform-guides/cuda.mdx delete mode 100644 docsite/docs/platform-guides/macos.mdx delete mode 100644 docsite/docs/platform-guides/musa.mdx diff --git a/Makefile b/Makefile index 4d836d0c..ac03cd09 100644 --- a/Makefile +++ b/Makefile @@ -257,5 
+257,6 @@ clean: @find . -name \*# -delete @find . -name \*.rej -delete @find . -name \*.orig -delete - rm -rf $$(<.gitignore) make -C docs clean + make -C docsite clean clean-generated + rm -rf $$(<.gitignore) diff --git a/docsite/docs/commands/ramalama/bench.mdx b/docsite/docs/commands/ramalama/bench.mdx deleted file mode 100644 index 22b2a4bb..00000000 --- a/docsite/docs/commands/ramalama/bench.mdx +++ /dev/null @@ -1,172 +0,0 @@ ---- -title: bench -description: benchmark specified AI Model -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-bench.1.md ---- - -# bench - -## Synopsis -**ramalama bench** [*options*] *model* [arg ...] - -## MODEL TRANSPORTS - -| Transports | Prefix | Web Site | -| ------------- | ------ | --------------------------------------------------- | -| URL based | https://, http://, file:// | `https://web.site/ai.model`, `file://tmp/ai.model`| -| HuggingFace | huggingface://, hf://, hf.co/ | [`huggingface.co`](https://www.huggingface.co)| -| ModelScope | modelscope://, ms:// | [`modelscope.cn`](https://modelscope.cn/)| -| Ollama | ollama:// | [`ollama.com`](https://www.ollama.com)| -| rlcr | rlcr:// | [`ramalama.com`](https://registry.ramalama.com) | -| OCI Container Registries | oci:// | [`opencontainers.org`](https://opencontainers.org)| -|||Examples: [`quay.io`](https://quay.io), [`Docker Hub`](https://docker.io),[`Artifactory`](https://artifactory.com)| - -RamaLama defaults to the Ollama registry transport. This default can be overridden in the `ramalama.conf` file or via the RAMALAMA_TRANSPORTS -environment. `export RAMALAMA_TRANSPORT=huggingface` Changes RamaLama to use huggingface transport. - -Modify individual model transports by specifying the `huggingface://`, `oci://`, `ollama://`, `https://`, `http://`, `file://` prefix to the model. - -URL support means if a model is on a web site or even on your local system, you can run it directly. - -## Options - -#### **--authfile**=*password* -path of the authentication file for OCI registries - -#### **--device** -Add a host device to the container. Optional permissions parameter can -be used to specify device permissions by combining r for read, w for -write, and m for mknod(2). - -Example: --device=/dev/dri/renderD128:/dev/xvdc:rwm - -The device specification is passed directly to the underlying container engine. See documentation of the supported container engine for more information. - -Pass '--device=none' explicitly add no device to the container, eg for -running a CPU-only performance comparison. - -#### **--env**= - -Set environment variables inside of the container. - -This option allows arbitrary environment variables that are available for the -process to be launched inside of the container. If an environment variable is -specified without a value, the container engine checks the host environment -for a value and set the variable only if it is set on the host. - -#### **--help**, **-h** -show this help message and exit - -#### **--image**=IMAGE -OCI container image to run with specified AI model. RamaLama defaults to using -images based on the accelerator it discovers. For example: -`quay.io/ramalama/ramalama`. See the table below for all default images. -The default image tag is based on the minor version of the RamaLama package. -Version 0.16.0 of RamaLama pulls an image with a `:0.16` tag from the quay.io/ramalama OCI repository. The --image option overrides this default. 
- -The default can be overridden in the ramalama.conf file or via the -RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells -RamaLama to use the `quay.io/ramalama/aiimage:1.2` image. - -Accelerated images: - -| Accelerator | Image | -| ------------------------| -------------------------- | -| CPU, Apple | quay.io/ramalama/ramalama | -| HIP_VISIBLE_DEVICES | quay.io/ramalama/rocm | -| CUDA_VISIBLE_DEVICES | quay.io/ramalama/cuda | -| ASAHI_VISIBLE_DEVICES | quay.io/ramalama/asahi | -| INTEL_VISIBLE_DEVICES | quay.io/ramalama/intel-gpu | -| ASCEND_VISIBLE_DEVICES | quay.io/ramalama/cann | -| MUSA_VISIBLE_DEVICES | quay.io/ramalama/musa | - -#### **--keep-groups** -pass --group-add keep-groups to podman (default: False) -If GPU device on host system is accessible to user via group access, this option leaks the groups into the container. - -#### **--name**, **-n** -name of the container to run the Model in - -#### **--network**=*none* -set the network mode for the container - -#### **--ngl** -number of gpu layers, 0 means CPU inferencing, 999 means use max layers (default: -1) -The default -1, means use whatever is automatically deemed appropriate (0 or 999) - -#### **--oci-runtime** - -Override the default OCI runtime used to launch the container. Container -engines like Podman and Docker, have their own default oci runtime that they -use. Using this option RamaLama will override these defaults. - -On Nvidia based GPU systems, RamaLama defaults to using the -`nvidia-container-runtime`. Use this option to override this selection. - -#### **--privileged** -By default, RamaLama containers are unprivileged (=false) and cannot, for -example, modify parts of the operating system. This is because by de‐ -fault a container is only allowed limited access to devices. A "privi‐ -leged" container is given the same access to devices as the user launch‐ -ing the container, with the exception of virtual consoles (/dev/tty\d+) -when running in systemd mode (--systemd=always). - -A privileged container turns off the security features that isolate the -container from the host. Dropped Capabilities, limited devices, read- -only mount points, Apparmor/SELinux separation, and Seccomp filters are -all disabled. Due to the disabled security features, the privileged -field should almost never be set as containers can easily break out of -confinement. - -Containers running in a user namespace (e.g., rootless containers) can‐ -not have more privileges than the user that launched them. - -#### **--pull**=*policy* - -- **always**: Always pull the image and throw an error if the pull fails. -- **missing**: Only pull the image when it does not exist in the local containers storage. Throw an error if no image is found and the pull fails. -- **never**: Never pull the image but use the one from the local containers storage. Throw an error when no image is found. -- **newer**: Pull if the image on the registry is newer than the one in the local containers storage. An image is considered to be newer when the digests are different. Comparing the time stamps is prone to errors. Pull errors are suppressed if a local image was found. - -#### **--seed**= -Specify seed rather than using random seed model interaction - -#### **--selinux**=*true* -Enable SELinux container separation - -#### **--temp**="0.8" -Temperature of the response from the AI Model -llama.cpp explains this as: - - The lower the number is, the more deterministic the response. 
- - The higher the number is the more creative the response is, but more likely to hallucinate when set too high. - - Usage: Lower numbers are good for virtual assistants where we need deterministic responses. Higher numbers are good for roleplay or creative tasks like editing stories - -#### **--thinking**=*true* -Enable or disable thinking mode in reasoning models - -#### **--threads**, **-t** -Maximum number of cpu threads to use. -The default is to use half the cores available on this system for the number of threads. - -#### **--tls-verify**=*true* -require HTTPS and verify certificates when contacting OCI registries - -## Description -Benchmark specified AI Model. - -## Examples - -```text -ramalama bench granite3-moe -``` - -## See Also -[ramalama(1)](/docs/commands/ramalama/) - ---- - -*Jan 2025, Originally compiled by Eric Curtin <ecurtin@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/chat.mdx b/docsite/docs/commands/ramalama/chat.mdx deleted file mode 100644 index 8b96469c..00000000 --- a/docsite/docs/commands/ramalama/chat.mdx +++ /dev/null @@ -1,83 +0,0 @@ ---- -title: chat -description: OpenAI chat with the specified REST API URL -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-chat.1.md ---- - -# chat - -## Synopsis -**ramalama chat** [*options*] [arg...] - -positional arguments: - ARGS overrides the default prompt, and the output is - returned without entering the chatbot - -## Description -Chat with an OpenAI Rest API - -## Options - -#### **--api-key** -OpenAI-compatible API key. -Can also be set via the RAMALAMA_API_KEY environment variable. - -#### **--color** -Indicate whether or not to use color in the chat. -Possible values are "never", "always" and "auto". (default: auto) - -#### **--help**, **-h** -Show this help message and exit - -#### **--list** -List the available models at an endpoint - -#### **--mcp**=SERVER_URL -MCP (Model Context Protocol) servers to use for enhanced tool calling capabilities. -Can be specified multiple times to connect to multiple MCP servers. -Each server provides tools that can be automatically invoked during chat conversations. - -#### **--model**=MODEL -Model for inferencing (may not be required for endpoints that only serve one model) - -#### **--prefix** -Prefix for the user prompt (default: 🦭 > ) - -#### **--rag**=path -A file or directory of files to be loaded and provided as local context in the chat history. - -#### **--summarize-after**=*N* -Automatically summarize conversation history after N messages to prevent context growth. -When enabled, ramalama will periodically condense older messages into a summary, -keeping only recent messages and the summary. This prevents the context from growing -indefinitely during long chat sessions. Set to 0 to disable (default: 4). - -#### **--url**=URL -The host to send requests to (default: http://127.0.0.1:8080) - -## Examples - -Communicate with the default local OpenAI REST API. (http://127.0.0.1:8080) -With Podman containers. -```bash -$ ramalama chat -🦭 > - -Communicate with an alternative OpenAI REST API URL. With Docker containers. 
-$ ramalama chat --url http://localhost:1234 -🐋 > - -Send multiple lines at once -$ ramalama chat -🦭 > Hi \ -🦭 > tell me a funny story \ -🦭 > please -``` - -## See Also -[ramalama(1)](/docs/commands/ramalama/) - ---- - -*Jun 2025, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/containers.mdx b/docsite/docs/commands/ramalama/containers.mdx deleted file mode 100644 index da454395..00000000 --- a/docsite/docs/commands/ramalama/containers.mdx +++ /dev/null @@ -1,81 +0,0 @@ ---- -title: containers -description: list all RamaLama containers -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-containers.1.md ---- - -# containers - -## Synopsis -**ramalama containers** [*options*] - -**ramalama ps** [*options*] - -## Description -List all containers running AI Models - -Command conflicts with the --nocontainer option. - -## Options - -#### **--format**=*format* -pretty-print containers to JSON or using a Go template - -Valid placeholders for the Go template are listed below: - -| **Placeholder** | **Description** | -|--------------------|----------------------------------------------| -| .Command | Quoted command used | -| .Created ... | Creation time for container, Y-M-D H:M:S | -| .CreatedAt | Creation time for container (same as above) | -| .CreatedHuman | Creation time, relative | -| .ExitCode | Container exit code | -| .Exited | "true" if container has exited | -| .ExitedAt | Time (epoch seconds) that container exited | -| .ExposedPorts ... | Map of exposed ports on this container | -| .ID | Container ID | -| .Image | Image Name/ID | -| .ImageID | Image ID | -| .Label *string* | Specified label of the container | -| .Labels ... | All the labels assigned to the container | -| .Names | Name of container | -| .Networks | Show all networks connected to the container | -| .Pid | Process ID on host system | -| .Ports | Forwarded and exposed ports | -| .RunningFor | Time elapsed since container was started | -| .Size | Size of container | -| .StartedAt | Time (epoch seconds) the container started | -| .State | Human-friendly description of ctr state | -| .Status | Status of container | - -#### **--help**, **-h** -Print usage message - -#### **--no-trunc** -Display the extended information - -#### **--noheading**, **-n** -Do not print heading - -## EXAMPLE - -```bash -$ ramalama containers -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -85ad75ecf866 quay.io/ramalama/ramalama:latest /usr/bin/ramalama... 5 hours ago Up 5 hours 0.0.0.0:8080->8080/tcp ramalama_s3Oh6oDfOP -85ad75ecf866 quay.io/ramalama/ramalama:latest /usr/bin/ramalama... 4 minutes ago Exited (0) 4 minutes ago granite-server -``` - -```bash -$ ramalama ps --noheading --format "{{ .Names }}" -ramalama_s3Oh6oDfOP -granite-server -``` - -## See Also -[ramalama(1)](/docs/commands/ramalama/) - ---- - -*Aug 2024, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/convert.mdx b/docsite/docs/commands/ramalama/convert.mdx deleted file mode 100644 index ded4de71..00000000 --- a/docsite/docs/commands/ramalama/convert.mdx +++ /dev/null @@ -1,86 +0,0 @@ ---- -title: convert -description: convert AI Models from local storage to OCI Image -# This file is auto-generated from manpages. Do not edit manually. 
-# Source: ramalama-convert.1.md ---- - -# convert - -## Synopsis -**ramalama convert** [*options*] *model* [*target*] - -## Description -Convert specified AI Model to an OCI Formatted AI Model - -The model can be from RamaLama model storage in Huggingface, Ollama, or a local model stored on disk. Converting from an OCI model is not supported. - -:::note - The convert command must be run with containers. Use of the --nocontainer option is not allowed. -::: - -## Options - -#### **--gguf**=*Q2_K* | *Q3_K_S* | *Q3_K_M* | *Q3_K_L* | *Q4_0* | *Q4_K_S* | *Q4_K_M* | *Q5_0* | *Q5_K_S* | *Q5_K_M* | *Q6_K* | *Q8_0* - -Convert Safetensor models into a GGUF with the specified quantization format. To learn more about model quantization, read llama.cpp documentation: -https://github.com/ggml-org/llama.cpp/blob/master/tools/quantize/README.md - -#### **--help**, **-h** -Print usage message - -#### **--image**=IMAGE -Image to use for model quantization when converting to GGUF format (when the `--gguf` option has been specified). The image must have the -`llama-quantize` executable available on the `PATH`. Defaults to the appropriate `ramalama` image based on available accelerators. If no -accelerators are available, the current `quay.io/ramalama/ramalama` image will be used. - -#### **--network**=*none* -sets the configuration for network namespaces when handling RUN instructions - -#### **--pull**=*policy* -Pull image policy. The default is **missing**. - -#### **--rag-image**=IMAGE -Image to use when converting to GGUF format (when then `--gguf` option has been specified). The image must have the `convert_hf_to_gguf.py` script -executable and available in the `PATH`. The script is available from the `llama.cpp` GitHub repo. Defaults to the current -`quay.io/ramalama/ramalama-rag` image. - -#### **--type**="artifact" | *raw* | *car* - -Convert the MODEL to the specified OCI Object - -| Type | Description | -| -------- | ------------------------------------------------------------- | -| artifact | Store AI Models as artifacts | -| car | Traditional OCI image including base image with the model stored in a /models subdir | -| raw | Traditional OCI image including only the model and a link file `model.file` pointed at it stored at / | - -## EXAMPLE - -Generate an oci model out of an Ollama model. -```bash -$ ramalama convert ollama://tinyllama:latest oci://quay.io/rhatdan/tiny:latest -Building quay.io/rhatdan/tiny:latest... -STEP 1/2: FROM scratch -STEP 2/2: COPY sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 /model ---> Using cache 69db4a10191c976d2c3c24da972a2a909adec45135a69dbb9daeaaf2a3a36344 -COMMIT quay.io/rhatdan/tiny:latest ---> 69db4a10191c -Successfully tagged quay.io/rhatdan/tiny:latest -69db4a10191c976d2c3c24da972a2a909adec45135a69dbb9daeaaf2a3a36344 -``` - -Generate and run an oci model with a quantized GGUF converted from Safetensors. -```bash -$ ramalama convert --gguf Q4_K_M hf://ibm-granite/granite-3.2-2b-instruct oci://quay.io/kugupta/granite-3.2-q4-k-m:latest -Converting /Users/kugupta/.local/share/ramalama/models/huggingface/ibm-granite/granite-3.2-2b-instruct to quay.io/kugupta/granite-3.2-q4-k-m:latest... -Building quay.io/kugupta/granite-3.2-q4-k-m:latest... 
-$ ramalama run oci://quay.io/kugupta/granite-3.2-q4-k-m:latest -``` - -## See Also -[ramalama(1)](/docs/commands/ramalama/), [ramalama-push(1)](/docs/commands/ramalama/push) - ---- - -*Aug 2024, Originally compiled by Eric Curtin <ecurtin@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/daemon.mdx b/docsite/docs/commands/ramalama/daemon.mdx deleted file mode 100644 index 6718897b..00000000 --- a/docsite/docs/commands/ramalama/daemon.mdx +++ /dev/null @@ -1,83 +0,0 @@ ---- -title: daemon -description: run a RamaLama REST server -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-daemon.1.md ---- - -# daemon - -## Synopsis -**ramalama daemon** [*options*] [start|run] - -## Description -Inspect the specified AI Model about additional information -like the repository, its metadata and tensor information. - -## Options - -#### **--help**, **-h** -Print usage message - -## COMMANDS - -#### **start** -pepares to run a new RamaLama REST server so it will be run either inside a RamaLama container or on the host - -#### **run** -start a new RamaLama REST server - -## Examples - -Inspect the smollm:135m model for basic information -```bash -$ ramalama inspect smollm:135m -smollm:135m - Path: /var/lib/ramalama/models/ollama/smollm:135m - Registry: ollama - Format: GGUF - Version: 3 - Endianness: little - Metadata: 39 entries - Tensors: 272 entries -``` - -Inspect the smollm:135m model for all information in json format -```bash -$ ramalama inspect smollm:135m --all --json -{ - "Name": "smollm:135m", - "Path": "/home/mengel/.local/share/ramalama/models/ollama/smollm:135m", - "Registry": "ollama", - "Format": "GGUF", - "Version": 3, - "LittleEndian": true, - "Metadata": { - "general.architecture": "llama", - "general.base_model.0.name": "SmolLM 135M", - "general.base_model.0.organization": "HuggingFaceTB", - "general.base_model.0.repo_url": "https://huggingface.co/HuggingFaceTB/SmolLM-135M", - ... - }, - "Tensors": [ - { - "dimensions": [ - 576, - 49152 - ], - "n_dimensions": 2, - "name": "token_embd.weight", - "offset": 0, - "type": 8 - }, - ... - ] -} -``` - -## See Also -[ramalama(1)](/docs/commands/ramalama/) - ---- - -*Feb 2025, Originally compiled by Michael Engel <mengel@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/info.mdx b/docsite/docs/commands/ramalama/info.mdx deleted file mode 100644 index dc19d582..00000000 --- a/docsite/docs/commands/ramalama/info.mdx +++ /dev/null @@ -1,389 +0,0 @@ ---- -title: info -description: display RamaLama configuration information -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-info.1.md ---- - -# info - -## Synopsis -**ramalama info** [*options*] - -## Description -Display configuration information in a json format. - -## Options - -#### **--help**, **-h** -show this help message and exit - -## FIELDS - -The `Accelerator` field indicates the accelerator type for the machine. - -The `Config` field shows the list of paths to RamaLama configuration files used. - -The `Engine` field indicates the OCI container engine used to launch the container in which to run the AI Model - -The `Image` field indicates the default container image in which to run the AI Model - -The `Inference` field lists the currently used inference engine as well as a list of available engine specification and schema files used for model inference. 
-For example: - - - `llama.cpp` - - `vllm` - - `mlx` - -The `Selinux` field indicates if SELinux is activated or not. - -The `Shortnames` field shows the used list of configuration files specifying AI Model short names as well as the merged list of shortnames. - -The `Store` field indicates the directory path where RamaLama stores its persistent data, including downloaded models, configuration files, and cached data. By default, this is located in the user's local share directory. - -The `UseContainer` field indicates whether RamaLama will use containers or run the AI Models natively. - -The `Version` field shows the RamaLama version. - -## EXAMPLE - -Info with no container engine -```bash -$ ramalama info -{ - "Accelerator": "cuda", - "Engine": { - "Name": "" - }, - "Image": "quay.io/ramalama/cuda:0.7", - "Inference": { - "Default": "llama.cpp", - "Engines": { - "llama.cpp": "/usr/share/ramalama/inference-spec/engines/llama.cpp.yaml", - "mlx": "/usr/share/ramalama/inference-spec/engines/mlx.yaml", - "vllm": "/usr/share/ramalama/inference-spec/engines/vllm.yaml" - }, - "Schema": { - "1-0-0": "/usr/share/ramalama/inference-spec/schema/schema.1-0-0.json" - } - }, - "Shortnames": { - "Names": { - "cerebrum": "huggingface://froggeric/Cerebrum-1.0-7b-GGUF/Cerebrum-1.0-7b-Q4_KS.gguf", - "deepseek": "ollama://deepseek-r1", - "dragon": "huggingface://llmware/dragon-mistral-7b-v0/dragon-mistral-7b-q4_k_m.gguf", - "gemma3": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf", - "gemma3:12b": "hf://bartowski/google_gemma-3-12b-it-GGUF/google_gemma-3-12b-it-IQ2_M.gguf", - "gemma3:1b": "hf://bartowski/google_gemma-3-1b-it-GGUF/google_gemma-3-1b-it-IQ2_M.gguf", - "gemma3:27b": "hf://bartowski/google_gemma-3-27b-it-GGUF/google_gemma-3-27b-it-IQ2_M.gguf", - "gemma3:4b": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf", - "granite": "ollama://granite3.1-dense", - "granite-code": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf", - "granite-code:20b": "hf://ibm-granite/granite-20b-code-base-8k-GGUF/granite-20b-code-base.Q4_K_M.gguf", - "granite-code:34b": "hf://ibm-granite/granite-34b-code-base-8k-GGUF/granite-34b-code-base.Q4_K_M.gguf", - "granite-code:3b": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf", - "granite-code:8b": "hf://ibm-granite/granite-8b-code-base-4k-GGUF/granite-8b-code-base.Q4_K_M.gguf", - "granite-lab-7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf", - "granite-lab-8b": "huggingface://ibm-granite/granite-8b-code-base-GGUF/granite-8b-code-base.Q4_K_M.gguf", - "granite-lab:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf", - "granite:2b": "ollama://granite3.1-dense:2b", - "granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf", - "granite:8b": "ollama://granite3.1-dense:8b", - "hermes": "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf", - "ibm/granite": "ollama://granite3.1-dense:8b", - "ibm/granite:2b": "ollama://granite3.1-dense:2b", - "ibm/granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf", - "ibm/granite:8b": "ollama://granite3.1-dense:8b", - "merlinite": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf", - "merlinite-lab-7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf", - "merlinite-lab:7b": 
"huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf", - "merlinite:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf", - "mistral": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf", - "mistral:7b": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf", - "mistral:7b-v1": "huggingface://TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf", - "mistral:7b-v2": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf", - "mistral:7b-v3": "huggingface://MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3.Q4_K_M.gguf", - "mistral_code_16k": "huggingface://TheBloke/Mistral-7B-Code-16K-qlora-GGUF/mistral-7b-code-16k-qlora.Q4_K_M.gguf", - "mistral_codealpaca": "huggingface://TheBloke/Mistral-7B-codealpaca-lora-GGUF/mistral-7b-codealpaca-lora.Q4_K_M.gguf", - "mixtao": "huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf", - "openchat": "huggingface://TheBloke/openchat-3.5-0106-GGUF/openchat-3.5-0106.Q4_K_M.gguf", - "openorca": "huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf", - "phi2": "huggingface://MaziyarPanahi/phi-2-GGUF/phi-2.Q4_K_M.gguf", - "smollm:135m": "ollama://smollm:135m", - "tiny": "ollama://tinyllama" - }, - "Files": [ - "/usr/share/ramalama/shortnames.conf", - "/home/dwalsh/.config/ramalama/shortnames.conf", - ] - }, - "Store": "/usr/share/ramalama", - "UseContainer": true, - "Version": "0.7.5" -} -``` - -Info with Podman engine -```bash -$ ramalama info -{ - "Accelerator": "cuda", - "Engine": { - "Info": { - "host": { - "arch": "amd64", - "buildahVersion": "1.39.4", - "cgroupControllers": [ - "cpu", - "io", - "memory", - "pids" - ], - "cgroupManager": "systemd", - "cgroupVersion": "v2", - "conmon": { - "package": "conmon-2.1.13-1.fc42.x86_64", - "path": "/usr/bin/conmon", - "version": "conmon version 2.1.13, commit: " - }, - "cpuUtilization": { - "idlePercent": 97.36, - "systemPercent": 0.64, - "userPercent": 2 - }, - "cpus": 32, - "databaseBackend": "sqlite", - "distribution": { - "distribution": "fedora", - "variant": "workstation", - "version": "42" - }, - "eventLogger": "journald", - "freeLocks": 2043, - "hostname": "danslaptop", - "idMappings": { - "gidmap": [ - { - "container_id": 0, - "host_id": 3267, - "size": 1 - }, - { - "container_id": 1, - "host_id": 524288, - "size": 65536 - } - ], - "uidmap": [ - { - "container_id": 0, - "host_id": 3267, - "size": 1 - }, - { - "container_id": 1, - "host_id": 524288, - "size": 65536 - } - ] - }, - "kernel": "6.14.2-300.fc42.x86_64", - "linkmode": "dynamic", - "logDriver": "journald", - "memFree": 65281908736, - "memTotal": 134690979840, - "networkBackend": "netavark", - "networkBackendInfo": { - "backend": "netavark", - "dns": { - "package": "aardvark-dns-1.14.0-1.fc42.x86_64", - "path": "/usr/libexec/podman/aardvark-dns", - "version": "aardvark-dns 1.14.0" - }, - "package": "netavark-1.14.1-1.fc42.x86_64", - "path": "/usr/libexec/podman/netavark", - "version": "netavark 1.14.1" - }, - "ociRuntime": { - "name": "crun", - "package": "crun-1.21-1.fc42.x86_64", - "path": "/usr/bin/crun", - "version": "crun version 1.21\ncommit: 10269840aa07fb7e6b7e1acff6198692d8ff5c88\nrundir: /run/user/3267/crun\nspec: 1.0.0\n+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL" - }, - "os": "linux", - "pasta": { - "executable": 
"/bin/pasta", - "package": "passt-0^20250415.g2340bbf-1.fc42.x86_64", - "version": "" - }, - "remoteSocket": { - "exists": true, - "path": "/run/user/3267/podman/podman.sock" - }, - "rootlessNetworkCmd": "pasta", - "security": { - "apparmorEnabled": false, - "capabilities": "CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT", - "rootless": true, - "seccompEnabled": true, - "seccompProfilePath": "/usr/share/containers/seccomp.json", - "selinuxEnabled": true - }, - "serviceIsRemote": false, - "slirp4netns": { - "executable": "/bin/slirp4netns", - "package": "slirp4netns-1.3.1-2.fc42.x86_64", - "version": "slirp4netns version 1.3.1\ncommit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236\nlibslirp: 4.8.0\nSLIRP_CONFIG_VERSION_MAX: 5\nlibseccomp: 2.5.5" - }, - "swapFree": 8589930496, - "swapTotal": 8589930496, - "uptime": "116h 35m 40.00s (Approximately 4.83 days)", - "variant": "" - }, - "plugins": { - "authorization": null, - "log": [ - "k8s-file", - "none", - "passthrough", - "journald" - ], - "network": [ - "bridge", - "macvlan", - "ipvlan" - ], - "volume": [ - "local" - ] - }, - "registries": { - "search": [ - "registry.fedoraproject.org", - "registry.access.redhat.com", - "docker.io" - ] - }, - "store": { - "configFile": "/home/dwalsh/.config/containers/storage.conf", - "containerStore": { - "number": 5, - "paused": 0, - "running": 0, - "stopped": 5 - }, - "graphDriverName": "overlay", - "graphOptions": {}, - "graphRoot": "/usr/share/containers/storage", - "graphRootAllocated": 2046687182848, - "graphRootUsed": 399990419456, - "graphStatus": { - "Backing Filesystem": "btrfs", - "Native Overlay Diff": "true", - "Supports d_type": "true", - "Supports shifting": "false", - "Supports volatile": "true", - "Using metacopy": "false" - }, - "imageCopyTmpDir": "/var/tmp", - "imageStore": { - "number": 297 - }, - "runRoot": "/run/user/3267/containers", - "transientStore": false, - "volumePath": "/usr/share/containers/storage/volumes" - }, - "version": { - "APIVersion": "5.4.2", - "BuildOrigin": "Fedora Project", - "Built": 1743552000, - "BuiltTime": "Tue Apr 1 19:00:00 2025", - "GitCommit": "be85287fcf4590961614ee37be65eeb315e5d9ff", - "GoVersion": "go1.24.1", - "Os": "linux", - "OsArch": "linux/amd64", - "Version": "5.4.2" - } - }, - "Name": "podman" - }, - "Image": "quay.io/ramalama/cuda:0.7", - "Inference": { - "Default": "llama.cpp", - "Engines": { - "llama.cpp": "/usr/share/ramalama/inference-spec/engines/llama.cpp.yaml", - "mlx": "/usr/share/ramalama/inference-spec/engines/mlx.yaml", - "vllm": "/usr/share/ramalama/inference-spec/engines/vllm.yaml" - }, - "Schema": { - "1-0-0": "/usr/share/ramalama/inference-spec/schema/schema.1-0-0.json" - } - }, - "Shortnames": { - "Names": { - "cerebrum": "huggingface://froggeric/Cerebrum-1.0-7b-GGUF/Cerebrum-1.0-7b-Q4_KS.gguf", - "deepseek": "ollama://deepseek-r1", - "dragon": "huggingface://llmware/dragon-mistral-7b-v0/dragon-mistral-7b-q4_k_m.gguf", - "gemma3": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf", - "gemma3:12b": "hf://bartowski/google_gemma-3-12b-it-GGUF/google_gemma-3-12b-it-IQ2_M.gguf", - "gemma3:1b": "hf://bartowski/google_gemma-3-1b-it-GGUF/google_gemma-3-1b-it-IQ2_M.gguf", - "gemma3:27b": "hf://bartowski/google_gemma-3-27b-it-GGUF/google_gemma-3-27b-it-IQ2_M.gguf", - "gemma3:4b": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf", - "granite": "ollama://granite3.1-dense", - "granite-code": 
"hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf", - "granite-code:20b": "hf://ibm-granite/granite-20b-code-base-8k-GGUF/granite-20b-code-base.Q4_K_M.gguf", - "granite-code:34b": "hf://ibm-granite/granite-34b-code-base-8k-GGUF/granite-34b-code-base.Q4_K_M.gguf", - "granite-code:3b": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf", - "granite-code:8b": "hf://ibm-granite/granite-8b-code-base-4k-GGUF/granite-8b-code-base.Q4_K_M.gguf", - "granite-lab-7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf", - "granite-lab-8b": "huggingface://ibm-granite/granite-8b-code-base-GGUF/granite-8b-code-base.Q4_K_M.gguf", - "granite-lab:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf", - "granite:2b": "ollama://granite3.1-dense:2b", - "granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf", - "granite:8b": "ollama://granite3.1-dense:8b", - "hermes": "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf", - "ibm/granite": "ollama://granite3.1-dense:8b", - "ibm/granite:2b": "ollama://granite3.1-dense:2b", - "ibm/granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf", - "ibm/granite:8b": "ollama://granite3.1-dense:8b", - "merlinite": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf", - "merlinite-lab-7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf", - "merlinite-lab:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf", - "merlinite:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf", - "mistral": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf", - "mistral:7b": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf", - "mistral:7b-v1": "huggingface://TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf", - "mistral:7b-v2": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf", - "mistral:7b-v3": "huggingface://MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3.Q4_K_M.gguf", - "mistral_code_16k": "huggingface://TheBloke/Mistral-7B-Code-16K-qlora-GGUF/mistral-7b-code-16k-qlora.Q4_K_M.gguf", - "mistral_codealpaca": "huggingface://TheBloke/Mistral-7B-codealpaca-lora-GGUF/mistral-7b-codealpaca-lora.Q4_K_M.gguf", - "mixtao": "huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf", - "openchat": "huggingface://TheBloke/openchat-3.5-0106-GGUF/openchat-3.5-0106.Q4_K_M.gguf", - "openorca": "huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf", - "phi2": "huggingface://MaziyarPanahi/phi-2-GGUF/phi-2.Q4_K_M.gguf", - "smollm:135m": "ollama://smollm:135m", - "tiny": "ollama://tinyllama" - }, - "Files": [ - "/usr/share/ramalama/shortnames.conf", - "/home/dwalsh/.config/ramalama/shortnames.conf", - ] - }, - "Store": "/usr/share/ramalama", - "UseContainer": true, - "Version": "0.7.5" -} -``` - -Using jq to print specific `ramalama info` content. 
-```bash -$ ramalama info | jq .Shortnames.Names.mixtao -"huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf" -``` - -## See Also -[ramalama(1)](/docs/commands/ramalama/) - ---- - -*Oct 2024, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/inspect.mdx b/docsite/docs/commands/ramalama/inspect.mdx deleted file mode 100644 index 2b49c32c..00000000 --- a/docsite/docs/commands/ramalama/inspect.mdx +++ /dev/null @@ -1,123 +0,0 @@ ---- -title: inspect -description: inspect the specified AI Model -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-inspect.1.md ---- - -# inspect - -## Synopsis -**ramalama inspect** [*options*] *model* - -## Description -Inspect the specified AI Model about additional information -like the repository, its metadata and tensor information. - -## Options - -#### **--all** -Print all available information about the AI Model. -By default, only a basic subset is printed. - -#### **--get**=*field* -Print the value of a specific metadata field of the AI Model. -This option supports autocomplete with the available metadata -fields of the given model. -The special value `all` will print all available metadata -fields and values. - -#### **--help**, **-h** -Print usage message - -#### **--json** -Print the AI Model information in json format. - -## Examples - -Inspect the smollm:135m model for basic information -```bash -$ ramalama inspect smollm:135m -smollm:135m - Path: /var/lib/ramalama/models/ollama/smollm:135m - Registry: ollama - Format: GGUF - Version: 3 - Endianness: little - Metadata: 39 entries - Tensors: 272 entries -``` - -Inspect the smollm:135m model for all information in json format -```bash -$ ramalama inspect smollm:135m --all --json -{ - "Name": "smollm:135m", - "Path": "/home/mengel/.local/share/ramalama/models/ollama/smollm:135m", - "Registry": "ollama", - "Format": "GGUF", - "Version": 3, - "LittleEndian": true, - "Metadata": { - "general.architecture": "llama", - "general.base_model.0.name": "SmolLM 135M", - "general.base_model.0.organization": "HuggingFaceTB", - "general.base_model.0.repo_url": "https://huggingface.co/HuggingFaceTB/SmolLM-135M", - ... - }, - "Tensors": [ - { - "dimensions": [ - 576, - 49152 - ], - "n_dimensions": 2, - "name": "token_embd.weight", - "offset": 0, - "type": 8 - }, - ... - ] -} -``` - -Use the autocomplete function of `--get` to view a list of fields: -```bash -$ ramalama inspect smollm:135m --get general. 
-general.architecture general.languages -general.base_model.0.name general.license -general.base_model.0.organization general.name -general.base_model.0.repo_url general.organization -general.base_model.count general.quantization_version -general.basename general.size_label -general.datasets general.tags -general.file_type general.type -general.finetune -``` - -Print the value of a specific field of the smollm:135m model: -```bash -$ ramalama inspect smollm:135m --get tokenizer.chat_template -{% for message in messages %}{{'<|im_start|>' + message['role'] + ' -' + message['content'] + '<|im_end|>' + ' -'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant -' }}{% endif %} -``` - -Print all key-value pairs of the metadata of the smollm:135m model: -```bash -$ ramalama inspect smollm:135m --get all -general.architecture: llama -general.base_model.0.name: SmolLM 135M -general.base_model.0.organization: HuggingFaceTB -general.base_model.0.repo_url: https://huggingface.co/HuggingFaceTB/SmolLM-135M -general.base_model.count: 1 -... -``` - -## See Also -[ramalama(1)](/docs/commands/ramalama/) - ---- - -*Feb 2025, Originally compiled by Michael Engel <mengel@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/list.mdx b/docsite/docs/commands/ramalama/list.mdx deleted file mode 100644 index 302c4bb1..00000000 --- a/docsite/docs/commands/ramalama/list.mdx +++ /dev/null @@ -1,62 +0,0 @@ ---- -title: list -description: list all downloaded AI Models -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-list.1.md ---- - -# list - -## Synopsis -**ramalama list** [*options*] - -**ramalama ls** [*options*] - -## Description -List all the AI Models in local storage - -## Options - -#### **--all** -include partially downloaded Models - -#### **--help**, **-h** -show this help message and exit - -#### **--json** -print Model list in json format - -#### **--noheading**, **-n** -do not print heading - -#### **--order** -order used to sort the AI Models. Valid options are 'asc' and 'desc' - -#### **--sort** -field used to sort the AI Models. Valid options are 'name', 'size', and 'modified'. 
- -## Examples - -List all Models downloaded to users homedir -```bash -$ ramalama list -NAME MODIFIED SIZE -ollama://smollm:135m 16 hours ago 5.5M -huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf 14 hours ago 460M -ollama://granite-code:3b (partial) 5 days ago 1.9G -ollama://granite-code:latest 1 day ago 1.9G -ollama://moondream:latest 6 days ago 791M -``` - -List all Models in json format -```bash -$ ramalama list --json -{"models": [{"name": "oci://quay.io/mmortari/gguf-py-example/v1/example.gguf", "modified": 427330, "size": "4.0K"}, {"name": "huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf", "modified": 427333, "size": "460M"}, {"name": "ollama://smollm:135m", "modified": 420833, "size": "5.5M"}, {"name": "ollama://mistral:latest", "modified": 433998, "size": "3.9G"}, {"name": "ollama://granite-code:latest", "modified": 2180483, "size": "1.9G"}, {"name": "ollama://tinyllama:latest", "modified": 364870, "size": "609M"}, {"name": "ollama://tinyllama:1.1b", "modified": 364866, "size": "609M"}]} -``` - -## See Also -[ramalama(1)](/docs/commands/ramalama/) - ---- - -*Aug 2024, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/login.mdx b/docsite/docs/commands/ramalama/login.mdx deleted file mode 100644 index 9fb91989..00000000 --- a/docsite/docs/commands/ramalama/login.mdx +++ /dev/null @@ -1,76 +0,0 @@ ---- -title: login -description: login to remote registry -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-login.1.md ---- - -# login - -## Synopsis -**ramalama login** [*options*] [*registry*] - -## Description -login to remote model registry - -By default, RamaLama uses the Ollama registry transport. You can override this default by configuring the `ramalama.conf` file or setting the `RAMALAMA_TRANSPORTS` environment variable. Ensure a registry transport is set before attempting to log in. - -## Options -Options are specific to registry types. - -#### **--authfile**=*password* -path of the authentication file for OCI registries - -#### **--help**, **-h** -show this help message and exit - -#### **--password**, **-p**=*password* -password for registry - -#### **--password-stdin** -take the password from stdin - -#### **--tls-verify**=*true* -require HTTPS and verify certificates when contacting OCI registries - -#### **--token**=*token* -token to be passed to Model registry - -#### **--username**, **-u**=*username* -username for registry - -## Examples - -Login to quay.io/username oci registry -```bash -$ export RAMALAMA_TRANSPORT=quay.io/username -$ ramalama login -u username -``` - -Login to ollama registry -```bash -$ export RAMALAMA_TRANSPORT=ollama -$ ramalama login -``` - -Login to huggingface registry -```bash -$ export RAMALAMA_TRANSPORT=huggingface -$ ramalama login --token=XYZ -``` -Logging in to Hugging Face requires the `hf` tool. For installation and usage instructions, see the documentation of the Hugging Face command line interface: [*https://huggingface.co/docs/huggingface_hub/en/guides/cli*](https://huggingface.co/docs/huggingface_hub/en/guides/cli). - -Login to ModelScope registry -```bash -$ export RAMALAMA_TRANSPORT=modelscope -$ ramalama login --token=XYZ -``` - -Logging in to ModelScope requires the `modelscope` tool. 
For installation and usage instructions, see the documentation of the ModelScope command line interface: [*https://www.modelscope.cn/docs/Beginner-s-Guide/Environment-Setup*](https://www.modelscope.cn/docs/Beginner-s-Guide/Environment-Setup). - -## See Also -[ramalama(1)](/docs/commands/ramalama/) - ---- - -*Aug 2024, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/logout.mdx b/docsite/docs/commands/ramalama/logout.mdx deleted file mode 100644 index c0f948ac..00000000 --- a/docsite/docs/commands/ramalama/logout.mdx +++ /dev/null @@ -1,48 +0,0 @@ ---- -title: logout -description: logout from remote registry -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-logout.1.md ---- - -# logout - -## Synopsis -**ramalama logout** [*options*] [*registry*] - -## Description -Logout to remote model registry - -## Options - -Options are specific to registry types. - -#### **--help**, **-h** -Print usage message - -#### **--token** - -Token to be passed to Model registry - -## EXAMPLE - -Logout to quay.io/username oci repository -```bash -$ ramalama logout quay.io/username -``` - -Logout from ollama repository -```bash -$ ramalama logout ollama -``` - -Logout from huggingface -```bash -$ ramalama logout huggingface -``` -## See Also -[ramalama(1)](/docs/commands/ramalama/) - ---- - -*Aug 2024, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/perplexity.mdx b/docsite/docs/commands/ramalama/perplexity.mdx deleted file mode 100644 index 71034210..00000000 --- a/docsite/docs/commands/ramalama/perplexity.mdx +++ /dev/null @@ -1,185 +0,0 @@ ---- -title: perplexity -description: calculate the perplexity value of an AI Model -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-perplexity.1.md ---- - -# perplexity - -## Synopsis -**ramalama perplexity** [*options*] *model* [arg ...] - -## MODEL TRANSPORTS - -| Transports | Prefix | Web Site | -| ------------- | ------ | --------------------------------------------------- | -| URL based | https://, http://, file:// | `https://web.site/ai.model`, `file://tmp/ai.model`| -| HuggingFace | huggingface://, hf://, hf.co/ | [`huggingface.co`](https://www.huggingface.co)| -| ModelScope | modelscope://, ms:// | [`modelscope.cn`](https://modelscope.cn/)| -| Ollama | ollama:// | [`ollama.com`](https://www.ollama.com)| -| rlcr | rlcr:// | [`ramalama.com`](https://registry.ramalama.com) | -| OCI Container Registries | oci:// | [`opencontainers.org`](https://opencontainers.org)| -|||Examples: [`quay.io`](https://quay.io), [`Docker Hub`](https://docker.io),[`Artifactory`](https://artifactory.com)| - -RamaLama defaults to the Ollama registry transport. This default can be overridden in the `ramalama.conf` file or via the RAMALAMA_TRANSPORTS -environment. `export RAMALAMA_TRANSPORT=huggingface` Changes RamaLama to use huggingface transport. - -Modify individual model transports by specifying the `huggingface://`, `oci://`, `ollama://`, `https://`, `http://`, `file://` prefix to the model. - -URL support means if a model is on a web site or even on your local system, you can run it directly. - -## Options - -#### **--authfile**=*password* -path of the authentication file for OCI registries - -#### **--cache-reuse**=256 -Min chunk size to attempt reusing from the cache via KV shifting - -#### **--ctx-size**, **-c** -size of the prompt context. 
This option is also available as **--max-model-len**. Applies to llama.cpp and vllm regardless of alias (default: 4096, 0 = loaded from model) - -#### **--device** -Add a host device to the container. Optional permissions parameter can -be used to specify device permissions by combining r for read, w for -write, and m for mknod(2). - -Example: --device=/dev/dri/renderD128:/dev/xvdc:rwm - -The device specification is passed directly to the underlying container engine. See documentation of the supported container engine for more information. - -#### **--env**= - -Set environment variables inside of the container. - -This option allows arbitrary environment variables that are available for the -process to be launched inside of the container. If an environment variable is -specified without a value, the container engine checks the host environment -for a value and set the variable only if it is set on the host. - -#### **--help**, **-h** -show this help message and exit - -#### **--image**=IMAGE -OCI container image to run with specified AI model. RamaLama defaults to using -images based on the accelerator it discovers. For example: -`quay.io/ramalama/ramalama`. See the table below for all default images. -The default image tag is based on the minor version of the RamaLama package. -Version 0.16.0 of RamaLama pulls an image with a `:0.16` tag from the quay.io/ramalama OCI repository. The --image option overrides this default. - -The default can be overridden in the ramalama.conf file or via the -RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells -RamaLama to use the `quay.io/ramalama/aiimage:1.2` image. - -Accelerated images: - -| Accelerator | Image | -| ------------------------| -------------------------- | -| CPU, Apple | quay.io/ramalama/ramalama | -| HIP_VISIBLE_DEVICES | quay.io/ramalama/rocm | -| CUDA_VISIBLE_DEVICES | quay.io/ramalama/cuda | -| ASAHI_VISIBLE_DEVICES | quay.io/ramalama/asahi | -| INTEL_VISIBLE_DEVICES | quay.io/ramalama/intel-gpu | -| ASCEND_VISIBLE_DEVICES | quay.io/ramalama/cann | -| MUSA_VISIBLE_DEVICES | quay.io/ramalama/musa | - -#### **--keep-groups** -pass --group-add keep-groups to podman (default: False) -If GPU device on host system is accessible to user via group access, this option leaks the groups into the container. - -#### **--max-tokens**=*integer* -Maximum number of tokens to generate. Set to 0 for unlimited output (default: 0). -This parameter is mapped to the appropriate runtime-specific parameter: -- llama.cpp: `-n` parameter -- MLX: `--max-tokens` parameter -- vLLM: `--max-tokens` parameter - -#### **--name**, **-n** -name of the container to run the Model in - -#### **--network**=*none* -set the network mode for the container - -#### **--ngl** -number of gpu layers, 0 means CPU inferencing, 999 means use max layers (default: -1) -The default -1, means use whatever is automatically deemed appropriate (0 or 999) - -#### **--oci-runtime** - -Override the default OCI runtime used to launch the container. Container -engines like Podman and Docker, have their own default oci runtime that they -use. Using this option RamaLama will override these defaults. - -On Nvidia based GPU systems, RamaLama defaults to using the -`nvidia-container-runtime`. Use this option to override this selection. - -#### **--privileged** -By default, RamaLama containers are unprivileged (=false) and cannot, for -example, modify parts of the operating system. 
This is because by de‐ -fault a container is only allowed limited access to devices. A "privi‐ -leged" container is given the same access to devices as the user launch‐ -ing the container, with the exception of virtual consoles (/dev/tty\d+) -when running in systemd mode (--systemd=always). - -A privileged container turns off the security features that isolate the -container from the host. Dropped Capabilities, limited devices, read- -only mount points, Apparmor/SELinux separation, and Seccomp filters are -all disabled. Due to the disabled security features, the privileged -field should almost never be set as containers can easily break out of -confinement. - -Containers running in a user namespace (e.g., rootless containers) can‐ -not have more privileges than the user that launched them. - -#### **--pull**=*policy* - -- **always**: Always pull the image and throw an error if the pull fails. -- **missing**: Only pull the image when it does not exist in the local containers storage. Throw an error if no image is found and the pull fails. -- **never**: Never pull the image but use the one from the local containers storage. Throw an error when no image is found. -- **newer**: Pull if the image on the registry is newer than the one in the local containers storage. An image is considered to be newer when the digests are different. Comparing the time stamps is prone to errors. Pull errors are suppressed if a local image was found. - -#### **--runtime-args**="*args*" -Add *args* to the runtime (llama.cpp or vllm) invocation. - -#### **--seed**= -Specify seed rather than using random seed model interaction - -#### **--selinux**=*true* -Enable SELinux container separation - -#### **--temp**="0.8" -Temperature of the response from the AI Model -llama.cpp explains this as: - - The lower the number is, the more deterministic the response. - - The higher the number is the more creative the response is, but more likely to hallucinate when set too high. - - Usage: Lower numbers are good for virtual assistants where we need deterministic responses. Higher numbers are good for roleplay or creative tasks like editing stories - -#### **--thinking**=*true* -Enable or disable thinking mode in reasoning models - -#### **--threads**, **-t** -Maximum number of cpu threads to use. -The default is to use half the cores available on this system for the number of threads. - -#### **--tls-verify**=*true* -require HTTPS and verify certificates when contacting OCI registries - -## Description -Calculate the perplexity of an AI Model. Perplexity measures how well the model can predict the next token with lower values being better. - -## Examples - -```text -ramalama perplexity granite3-moe -``` - -## See Also -[ramalama(1)](/docs/commands/ramalama/) - ---- - -*Jan 2025, Originally compiled by Eric Curtin <ecurtin@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/pull.mdx b/docsite/docs/commands/ramalama/pull.mdx deleted file mode 100644 index 62061a9b..00000000 --- a/docsite/docs/commands/ramalama/pull.mdx +++ /dev/null @@ -1,50 +0,0 @@ ---- -title: pull -description: pull AI Models from Model registries to local storage -# This file is auto-generated from manpages. Do not edit manually. 
-# Source: ramalama-pull.1.md ---- - -# pull - -## Synopsis -**ramalama pull** [*options*] *model* - -## Description -Pull specified AI Model into local storage - -## Options - -#### **--authfile**=*password* -path of the authentication file for OCI registries - -#### **--help**, **-h** -Print usage message - -#### **--tls-verify**=*true* -require HTTPS and verify certificates when contacting OCI registries - -#### **--verify**=*true* -verify the model after pull, disable to allow pulling of models with different endianness - -## PROXY SUPPORT - -RamaLama supports HTTP, HTTPS, and SOCKS proxies via standard environment variables: - -- **HTTP_PROXY** or **http_proxy**: Proxy for HTTP connections -- **HTTPS_PROXY** or **https_proxy**: Proxy for HTTPS connections -- **NO_PROXY** or **no_proxy**: Comma-separated list of hosts to bypass proxy - -Example proxy URL formats: -- HTTP/HTTPS: `http://proxy.example.com:8080` or `https://proxy.example.com:8443` -- SOCKS4: `socks4://proxy.example.com:1080` -- SOCKS5: `socks5://proxy.example.com:1080` or `socks5h://proxy.example.com:1080` (DNS through proxy) - -SOCKS proxy support requires the PySocks library (`pip install PySocks`). - -## See Also -[ramalama(1)](/docs/commands/ramalama/) - ---- - -*Aug 2024, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/push.mdx b/docsite/docs/commands/ramalama/push.mdx deleted file mode 100644 index 7afd4332..00000000 --- a/docsite/docs/commands/ramalama/push.mdx +++ /dev/null @@ -1,81 +0,0 @@ ---- -title: push -description: push AI Models from local storage to remote registries -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-push.1.md ---- - -# push - -## Synopsis -**ramalama push** [*options*] *model* [*target*] - -## Description -Push specified AI Model (OCI-only at present) - -The model can be from RamaLama model storage in Huggingface, Ollama, or OCI Model format. -The model can also just be a model stored on disk. - -Users can convert without pushing using the `ramalama convert` command. - -## Options - -#### **--authfile**=*password* -path of the authentication file for OCI registries - -#### **--help**, **-h** -Print usage message - -#### **--network**=*none* -sets the configuration for network namespaces when handling RUN instructions - -#### **--tls-verify**=*true* -require HTTPS and verify certificates when contacting OCI registries - -#### **--type**=*raw* | *car* - -type of OCI Model Image to push. - -| Type | Description | -| ---- | ------------------------------------------------------------- | -| car | Includes base image with the model stored in a /models subdir | -| raw | Only the model and a link file model.file to it stored at / | - -Only supported for pushing OCI Model Images. - -## EXAMPLE - -Push and OCI model to registry -```bash -$ ramalama push oci://quay.io/rhatdan/tiny:latest -Pushing quay.io/rhatdan/tiny:latest... -Getting image source signatures -Copying blob e0166756db86 skipped: already exists -Copying config ebe856e203 done | -Writing manifest to image destination -``` - -Generate an oci model out of an Ollama model and push to registry -```bash -$ ramalama push ollama://tinyllama:latest oci://quay.io/rhatdan/tiny:latest -Building quay.io/rhatdan/tiny:latest... 
-STEP 1/2: FROM scratch -STEP 2/2: COPY sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 /model ---> Using cache 69db4a10191c976d2c3c24da972a2a909adec45135a69dbb9daeaaf2a3a36344 -COMMIT quay.io/rhatdan/tiny:latest ---> 69db4a10191c -Successfully tagged quay.io/rhatdan/tiny:latest -69db4a10191c976d2c3c24da972a2a909adec45135a69dbb9daeaaf2a3a36344 -Pushing quay.io/rhatdan/tiny:latest... -Getting image source signatures -Copying blob e0166756db86 skipped: already exists -Copying config 69db4a1019 done | -Writing manifest to image destination -``` - -## See Also -[ramalama(1)](/docs/commands/ramalama/), [ramalama-convert(1)](/docs/commands/ramalama/convert) - ---- - -*Aug 2024, Originally compiled by Eric Curtin <ecurtin@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/rag.mdx b/docsite/docs/commands/ramalama/rag.mdx deleted file mode 100644 index a7cfdabc..00000000 --- a/docsite/docs/commands/ramalama/rag.mdx +++ /dev/null @@ -1,127 +0,0 @@ ---- -title: rag -description: generate and convert Retrieval Augmented Generation (RAG) data from provided documents into an OCI Image -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-rag.1.md ---- - -# rag - -## Synopsis -**ramalama rag** [options] [path ...] image - -## Description -Generate RAG data from provided documents and convert into an OCI Image. This command uses a specific container image containing the docling -tool to convert the specified content into a RAG vector database. If the image does not exist locally, RamaLama will pull the image -down and launch a container to process the data. - -:::note - this command does not work without a container engine. -::: - -positional arguments: - - *PATH* Files/Directory containing PDF, DOCX, PPTX, XLSX, HTML, - AsciiDoc & Markdown formatted files to be processed. - Can be specified multiple times. - - *DESTINATION* Path or OCI Image name to contain processed rag data - -## Options - -#### **--env**= - -Set environment variables inside of the container. - -This option allows arbitrary environment variables that are available for the -process to be launched inside of the container. If an environment variable is -specified without a value, the container engine checks the host environment -for a value and set the variable only if it is set on the host. - -#### **--format**=*json* | *markdown* | *qdrant* | -Convert documents into the following formats - -| Type | Description | -| ------- | ---------------------------------------------------- | -| json | JavaScript Object Notation. lightweight format for exchanging data | -| markdown| Lightweight markup language using plain text editing | -| qdrant | Retrieval-Augmented Generation (RAG) Vector database Qdrant distribution | -| milvus | Retrieval-Augmented Generation (RAG) Vector database Milvus distribution | - -#### **--help**, **-h** -Print usage message - -#### **--image**=IMAGE -OCI container image to run with specified AI model. RamaLama defaults to using -images based on the accelerator it discovers. For example: -`quay.io/ramalama/ramalama-rag`. See the table below for all default images. -The default image tag is based on the minor version of the RamaLama package. -Version 0.16.0 of RamaLama pulls an image with a `:0.16` tag from the quay.io/ramalama OCI repository. The --image option overrides this default. - -The default can be overridden in the ramalama.conf file or via the -RAMALAMA_IMAGE environment variable. 
`export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells -RamaLama to use the `quay.io/ramalama/aiimage:1.2` image. - -Accelerated images: - -| Accelerator | Image | -| ------------------------| ------------------------------ | -| CPU, Apple | quay.io/ramalama/ramalama-rag | -| HIP_VISIBLE_DEVICES | quay.io/ramalama/rocm-rag | -| CUDA_VISIBLE_DEVICES | quay.io/ramalama/cuda-rag | -| ASAHI_VISIBLE_DEVICES | quay.io/ramalama/asahi-rag | -| INTEL_VISIBLE_DEVICES | quay.io/ramalama/intel-gpu-rag | -| ASCEND_VISIBLE_DEVICES | quay.io/ramalama/cann-rag | -| MUSA_VISIBLE_DEVICES | quay.io/ramalama/musa-rag | - -#### **--keep-groups** -pass --group-add keep-groups to podman (default: False) -If GPU device on host system is accessible to user via group access, this option leaks the groups into the container. - -#### **--network**=*none* -sets the configuration for network namespaces when handling RUN instructions - -#### **--ocr** -Sets the Docling OCR flag. OCR stands for Optical Character Recognition and is used to extract text from images within PDFs converting it into raw text that an LLM can understand. This feature is useful if the PDF's one is converting has a lot of embedded images with text. This process uses a great amount of RAM so the default is false. - -#### **--pull**=*policy* -Pull image policy. The default is **missing**. - -- **always**: Always pull the image and throw an error if the pull fails. -- **missing**: Only pull the image when it does not exist in the local containers storage. Throw an error if no image is found and the pull fails. -- **never**: Never pull the image but use the one from the local containers storage. Throw an error when no image is found. -- **newer**: Pull if the image on the registry is newer than the one in the local containers storage. An image is considered to be newer when the digests are different. Comparing the time stamps is prone to errors. Pull errors are suppressed if a local image was found. - -#### **--selinux**=*true* -Enable SELinux container separation - -## Examples - -```bash -$ ramalama rag ./README.md https://github.com/containers/podman/blob/main/README.md quay.io/rhatdan/myrag -100% |███████████████████████████████████████████████████████| 114.00 KB/ 0.00 B 922.89 KB/s 59m 59s -Building quay.io/ramalama/myrag... -adding vectordb... -c857ebc65c641084b34e39b740fdb6a2d9d2d97be320e6aa9439ed0ab8780fe0 -``` - -```bash -$ ramalama rag --ocr README.md https://mysight.edu/document quay.io/rhatdan/myrag -``` - -```bash -$ ramalama rag --format markdown /tmp/internet.pdf /tmp/output -$ ls /tmp/output/docs/tmp/ -/tmp/output/docs/tmp/internet.md -$ ramalama rag --format json /tmp/internet.pdf /tmp/output -$ ls /tmp/output/docs/tmp/ -/tmp/output/docs/tmp/internet.md -/tmp/output/docs/tmp/internet.json -``` - -## See Also -[ramalama(1)](/docs/commands/ramalama/) - ---- - -*Dec 2024, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/ramalama.mdx b/docsite/docs/commands/ramalama/ramalama.mdx deleted file mode 100644 index 19ebf040..00000000 --- a/docsite/docs/commands/ramalama/ramalama.mdx +++ /dev/null @@ -1,205 +0,0 @@ ---- -title: ramalama -description: Simple management tool for working with AI Models -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama.1.md ---- - -# ramalama - -## Synopsis -**ramalama** [*options*] *command* - -## Description -RamaLama : The goal of RamaLama is to make AI boring. 
- -The RamaLama tool facilitates local management and serving of AI Models. - -On first run RamaLama inspects your system for GPU support, falling back to CPU support if no GPUs are present. - -RamaLama uses container engines like Podman or Docker to pull the appropriate OCI image with all of the software necessary to run an AI Model for your system's setup. - -Running in containers eliminates the need for users to configure the host -system for AI. After the initialization, RamaLama runs the AI Models within a -container based on the OCI image. RamaLama pulls a container image specific to -the GPUs discovered on the host system. These images are tied to the minor -version of RamaLama. For example, RamaLama version 1.2.3 on an NVIDIA system -pulls quay.io/ramalama/cuda:1.2. To override the default image use the -`--image` option. - -RamaLama pulls AI Models from model registries, starting a chatbot or a REST API service with a single command. Models are treated similarly to how Podman and Docker treat container images. - -When both Podman and Docker are installed, RamaLama defaults to Podman. The `RAMALAMA_CONTAINER_ENGINE=docker` environment variable can override this behaviour. When neither is installed, RamaLama attempts to run the model with software on the local system. - -:::note - On MacOS systems that use Podman for containers, configure the Podman machine to use the `libkrun` machine provider. The `libkrun` provider enables containers within the Podman Machine access to the Mac's GPU. See [ramalama-macos(7)](/docs/platform-guides/macos) for further information. -::: - -:::note - On systems with NVIDIA GPUs, see [ramalama-cuda(7)](/docs/platform-guides/cuda) to correctly configure the host system. -::: - -RamaLama CLI defaults can be modified via ramalama.conf files. Default settings for flags are defined in [ramalama.conf(5)](/docs/configuration/conf). - -## SECURITY - -### Test and run your models more securely - -Because RamaLama defaults to running AI models inside of rootless containers using Podman or Docker, these containers isolate the AI models from information on the underlying host. With RamaLama containers, the AI model is mounted as a volume into the container in read-only mode. This results in the process running the model, llama.cpp or vLLM, being isolated from the host. In addition, since `ramalama run` uses the --network=none option, the container cannot reach the network, so it cannot leak information out of the system. Finally, containers are run with the --rm option, which means that any content written during the running of the container is wiped out when the application exits. - -### Here’s how RamaLama delivers a robust security footprint: - - ✅ Container Isolation – AI models run within isolated containers, preventing direct access to the host system. - ✅ Read-Only Volume Mounts – The AI model is mounted in read-only mode, meaning that processes inside the container cannot modify host files. - ✅ No Network Access – ramalama run is executed with --network=none, meaning the model has no outbound connectivity through which information could be leaked. - ✅ Auto-Cleanup – Containers run with --rm, wiping out any temporary data once the session ends. - ✅ Drop All Linux Capabilities – No access to Linux capabilities to attack the underlying host. - ✅ No New Privileges – Linux Kernel feature which prevents container processes from gaining additional privileges. - -## MODEL TRANSPORTS - -RamaLama supports multiple AI model registry types, called transports.
Supported transports: - -| Transports | Prefix | Web Site | -| ------------- | ------ | --------------------------------------------------- | -| URL based | https://, http://, file:// | `https://web.site/ai.model`, `file://tmp/ai.model`| -| HuggingFace | huggingface://, hf://, hf.co/ | [`huggingface.co`](https://www.huggingface.co)| -| ModelScope | modelscope://, ms:// | [`modelscope.cn`](https://modelscope.cn/)| -| Ollama | ollama:// | [`ollama.com`](https://www.ollama.com)| -| rlcr | rlcr:// | [`ramalama.com`](https://registry.ramalama.com) | -| OCI Container Registries | oci:// | [`opencontainers.org`](https://opencontainers.org)| -|||Examples: [`quay.io`](https://quay.io), [`Docker Hub`](https://docker.io),[`Artifactory`](https://artifactory.com)| - -RamaLama defaults to the Ollama registry transport. This default can be overridden in the `ramalama.conf` file or via the RAMALAMA_TRANSPORT -environment variable. `export RAMALAMA_TRANSPORT=huggingface` changes RamaLama to use the huggingface transport. - -Modify individual model transports by specifying the `huggingface://`, `oci://`, `ollama://`, `https://`, `http://`, `file://` prefix to the model. - -URL support means if a model is on a web site or even on your local system, you can run it directly. - -ramalama pull `huggingface://`afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf - -ramalama run `file://`$HOME/granite-7b-lab-Q4_K_M.gguf - -To make it easier for users, RamaLama uses shortname files, which contain -alias names for fully specified AI Models, allowing users to specify the shorter -names when referring to models. RamaLama reads shortnames.conf files if they -exist. These files contain a list of name value pairs for specification of -the model. The following table specifies the order in which RamaLama reads the files. -Any duplicate names that exist override previously defined shortnames. - -| Shortnames type | Path | -| --------------- | ---------------------------------------- | -| Distribution | /usr/share/ramalama/shortnames.conf | -| Local install | /usr/local/share/ramalama/shortnames.conf | -| Administrators | /etc/ramalama/shortnames.conf | -| Users | $HOME/.config/ramalama/shortnames.conf | - -```toml -$ cat /usr/share/ramalama/shortnames.conf -[shortnames] - "tiny" = "ollama://tinyllama" - "granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf" - "granite:7b" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf" - "ibm/granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf" - "merlinite" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf" - "merlinite:7b" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf" -... -``` -**ramalama [GLOBAL OPTIONS]** - -## GLOBAL OPTIONS - -#### **--debug** -print debug messages - -#### **--dryrun** -show container runtime command without executing it (default: False) - -#### **--engine** -run RamaLama using the specified container engine. Default is `podman` if installed, otherwise `docker`. -The default can be overridden in the ramalama.conf file or via the RAMALAMA_CONTAINER_ENGINE environment variable. - -#### **--help**, **-h** -show this help message and exit - -#### **--nocontainer** -Do not run RamaLama workloads in containers (default: False) -The default can be overridden in the ramalama.conf file. - -:::note - OCI images cannot be used with the --nocontainer option.
This option disables the following features: Automatic GPU acceleration, containerized environment isolation, and dynamic resource allocation. For a complete list of affected features, please see the RamaLama documentation at [link-to-feature-list]. -::: - -#### **--quiet** -Decrease output verbosity. - -#### **--runtime**=*llama.cpp* | *vllm* -specify the runtime to use, valid options are 'llama.cpp' and 'vllm' (default: llama.cpp) -The default can be overridden in the ramalama.conf file. - -#### **--store**=STORE -store AI Models in the specified directory (default rootless: `$HOME/.local/share/ramalama`, default rootful: `/var/lib/ramalama`) -The default can be overridden in the ramalama.conf file. - -## COMMANDS - -| Command | Description | -| ------------------------------------------------- | ---------------------------------------------------------- | -| [ramalama-bench(1)](/docs/commands/ramalama/bench) |benchmark specified AI Model| -| [ramalama-chat(1)](/docs/commands/ramalama/chat) |OpenAI chat with the specified REST API URL| -| [ramalama-containers(1)](/docs/commands/ramalama/containers)|list all RamaLama containers| -| [ramalama-convert(1)](/docs/commands/ramalama/convert) |convert AI Models from local storage to OCI Image| -| [ramalama-daemon(1)](/docs/commands/ramalama/daemon) |run a RamaLama REST server| -| [ramalama-info(1)](/docs/commands/ramalama/info) |display RamaLama configuration information| -| [ramalama-inspect(1)](/docs/commands/ramalama/inspect) |inspect the specified AI Model| -| [ramalama-list(1)](/docs/commands/ramalama/list) |list all downloaded AI Models| -| [ramalama-login(1)](/docs/commands/ramalama/login) |login to remote registry| -| [ramalama-logout(1)](/docs/commands/ramalama/logout) |logout from remote registry| -| [ramalama-perplexity(1)](/docs/commands/ramalama/perplexity)|calculate the perplexity value of an AI Model| -| [ramalama-pull(1)](/docs/commands/ramalama/pull) |pull AI Models from Model registries to local storage| -| [ramalama-push(1)](/docs/commands/ramalama/push) |push AI Models from local storage to remote registries| -| [ramalama-rag(1)](/docs/commands/ramalama/rag) |generate and convert Retrieval Augmented Generation (RAG) data from provided documents into an OCI Image| -| [ramalama-rm(1)](/docs/commands/ramalama/rm) |remove AI Models from local storage| -| [ramalama-run(1)](/docs/commands/ramalama/run) |run specified AI Model as a chatbot| -| [ramalama-serve(1)](/docs/commands/ramalama/serve) |serve REST API on specified AI Model| -| [ramalama-stop(1)](/docs/commands/ramalama/stop) |stop named container that is running AI Model| -| [ramalama-version(1)](/docs/commands/ramalama/version) |display version of RamaLama| - -## CONFIGURATION FILES - -**ramalama.conf** (`/usr/share/ramalama/ramalama.conf`, `/etc/ramalama/ramalama.conf`, `/etc/ramalama/ramalama.conf.d/*.conf`, `$HOME/.config/ramalama/ramalama.conf`, `$HOME/.config/ramalama/ramalama.conf.d/*.conf`) - -RamaLama has builtin defaults for command line options. These defaults can be overridden using the ramalama.conf configuration files. - -Distributions ship the `/usr/share/ramalama/ramalama.conf` file with their default settings. Administrators can override fields in this file by creating the `/etc/ramalama/ramalama.conf` file. Users can further modify defaults by creating the `$HOME/.config/ramalama/ramalama.conf` file. RamaLama merges its builtin defaults with the specified fields from these files, if they exist. 
Fields specified in the users file override the administrator's file, which overrides the distribution's file, which override the built-in defaults. - -RamaLama uses builtin defaults if no ramalama.conf file is found. - -If the **RAMALAMA_CONFIG** environment variable is set, then its value is used for the ramalama.conf file rather than the default. - -## ENVIRONMENT VARIABLES - -RamaLama default behaviour can also be overridden via environment variables, -although the recommended way is to use the ramalama.conf file. - -| ENV Name | Description | -| ------------------------- | ------------------------------------------ | -| HTTP_PROXY, http_proxy | proxy URL for HTTP connections | -| HTTPS_PROXY, https_proxy | proxy URL for HTTPS connections | -| NO_PROXY, no_proxy | comma-separated list of hosts to bypass proxy (e.g., localhost,127.0.0.1,.local) | -| RAMALAMA_CONFIG | specific configuration file to be used | -| RAMALAMA_CONTAINER_ENGINE | container engine (Podman/Docker) to use | -| RAMALAMA_FORCE_EMOJI | define whether `ramalama run` uses EMOJI | -| RAMALAMA_IMAGE | container image to use for serving AI Model| -| RAMALAMA_IN_CONTAINER | Run RamaLama in the default container | -| RAMALAMA_STORE | location to store AI Models | -| RAMALAMA_TRANSPORT | default AI Model transport (ollama, huggingface, OCI) | -| TMPDIR | directory for temporary files. Defaults to /var/tmp if unset.| - -## See Also -[podman(1)](https://github.com/containers/podman/blob/main/docs/source/markdown/podman.1.md), **docker(1)**, [ramalama.conf(5)](/docs/configuration/conf), [ramalama-cuda(7)](/docs/platform-guides/cuda), [ramalama-macos(7)](/docs/platform-guides/macos) - ---- - -*Aug 2024, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/rm.mdx b/docsite/docs/commands/ramalama/rm.mdx deleted file mode 100644 index eee75391..00000000 --- a/docsite/docs/commands/ramalama/rm.mdx +++ /dev/null @@ -1,42 +0,0 @@ ---- -title: rm -description: remove AI Models from local storage -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-rm.1.md ---- - -# rm - -## Synopsis -**ramalama rm** [*options*] *model* [...] - -## Description -Specify one or more AI Models to be removed from local storage - -## Options - -#### **--all**, **-a** -remove all local Models - -#### **--help**, **-h** -show this help message and exit - -#### **--ignore** -ignore errors when specified Model does not exist - -## Examples - -```bash -$ ramalama rm ollama://tinyllama - -$ ramalama rm --all - -$ ramalama rm --ignore bogusmodel -``` - -## See Also -[ramalama(1)](/docs/commands/ramalama/) - ---- - -*Aug 2024, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/run.mdx b/docsite/docs/commands/ramalama/run.mdx deleted file mode 100644 index 4dd9cbc7..00000000 --- a/docsite/docs/commands/ramalama/run.mdx +++ /dev/null @@ -1,272 +0,0 @@ ---- -title: run -description: run specified AI Model as a chatbot -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-run.1.md ---- - -# run - -## Synopsis -**ramalama run** [*options*] *model* [arg ...] 
- -## MODEL TRANSPORTS - -| Transports | Prefix | Web Site | -| ------------- | ------ | --------------------------------------------------- | -| URL based | https://, http://, file:// | `https://web.site/ai.model`, `file://tmp/ai.model`| -| HuggingFace | huggingface://, hf://, hf.co/ | [`huggingface.co`](https://www.huggingface.co)| -| ModelScope | modelscope://, ms:// | [`modelscope.cn`](https://modelscope.cn/)| -| Ollama | ollama:// | [`ollama.com`](https://www.ollama.com)| -| rlcr | rlcr:// | [`ramalama.com`](https://registry.ramalama.com) | -| OCI Container Registries | oci:// | [`opencontainers.org`](https://opencontainers.org)| -|||Examples: [`quay.io`](https://quay.io), [`Docker Hub`](https://docker.io),[`Artifactory`](https://artifactory.com)| - -RamaLama defaults to the Ollama registry transport. This default can be overridden in the `ramalama.conf` file or via the RAMALAMA_TRANSPORTS -environment. `export RAMALAMA_TRANSPORT=huggingface` Changes RamaLama to use huggingface transport. - -Modify individual model transports by specifying the `huggingface://`, `oci://`, `ollama://`, `https://`, `http://`, `file://` prefix to the model. - -URL support means if a model is on a web site or even on your local system, you can run it directly. - -## Options - -#### **--api**=**llama-stack** | none** -unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.(default: none) -The default can be overridden in the `ramalama.conf` file. - -#### **--authfile**=*password* -path of the authentication file for OCI registries - -#### **--cache-reuse**=256 -Min chunk size to attempt reusing from the cache via KV shifting - -#### **--color** -Indicate whether or not to use color in the chat. -Possible values are "never", "always" and "auto". (default: auto) - -#### **--ctx-size**, **-c** -size of the prompt context. This option is also available as **--max-model-len**. Applies to llama.cpp and vllm regardless of alias (default: 4096, 0 = loaded from model) - -#### **--device** -Add a host device to the container. Optional permissions parameter can -be used to specify device permissions by combining r for read, w for -write, and m for mknod(2). - -Example: --device=/dev/dri/renderD128:/dev/xvdc:rwm - -The device specification is passed directly to the underlying container engine. See documentation of the supported container engine for more information. - -Pass '--device=none' explicitly add no device to the container, eg for -running a CPU-only performance comparison. - -#### **--env**= - -Set environment variables inside of the container. - -This option allows arbitrary environment variables that are available for the -process to be launched inside of the container. If an environment variable is -specified without a value, the container engine checks the host environment -for a value and set the variable only if it is set on the host. - -#### **--help**, **-h** -Show this help message and exit - -#### **--image**=IMAGE -OCI container image to run with specified AI model. RamaLama defaults to using -images based on the accelerator it discovers. For example: -`quay.io/ramalama/ramalama`. See the table below for all default images. -The default image tag is based on the minor version of the RamaLama package. -Version 0.16.0 of RamaLama pulls an image with a `:0.16` tag from the quay.io/ramalama OCI repository. The --image option overrides this default. - -The default can be overridden in the `ramalama.conf` file or via the -RAMALAMA_IMAGE environment variable. 
`export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells -RamaLama to use the `quay.io/ramalama/aiimage:1.2` image. - -Accelerated images: - -| Accelerator | Image | -| ------------------------| -------------------------- | -| CPU, Apple | quay.io/ramalama/ramalama | -| HIP_VISIBLE_DEVICES | quay.io/ramalama/rocm | -| CUDA_VISIBLE_DEVICES | quay.io/ramalama/cuda | -| ASAHI_VISIBLE_DEVICES | quay.io/ramalama/asahi | -| INTEL_VISIBLE_DEVICES | quay.io/ramalama/intel-gpu | -| ASCEND_VISIBLE_DEVICES | quay.io/ramalama/cann | -| MUSA_VISIBLE_DEVICES | quay.io/ramalama/musa | - -#### **--keep-groups** -pass --group-add keep-groups to podman (default: False) -If GPU device on host system is accessible to user via group access, this option leaks the groups into the container. - -#### **--keepalive** -duration to keep a model loaded (e.g. 5m) - -#### **--max-tokens**=*integer* -Maximum number of tokens to generate. Set to 0 for unlimited output (default: 0). -This parameter is mapped to the appropriate runtime-specific parameter: -- llama.cpp: `-n` parameter -- MLX: `--max-tokens` parameter -- vLLM: `--max-tokens` parameter - -#### **--mcp**=SERVER_URL -MCP (Model Context Protocol) servers to use for enhanced tool calling capabilities. -Can be specified multiple times to connect to multiple MCP servers. -Each server provides tools that can be automatically invoked during chat conversations. - -#### **--name**, **-n** -name of the container to run the Model in - -#### **--network**=*none* -set the network mode for the container - -#### **--ngl** -number of gpu layers, 0 means CPU inferencing, 999 means use max layers (default: -1) -The default -1, means use whatever is automatically deemed appropriate (0 or 999) - -#### **--oci-runtime** - -Override the default OCI runtime used to launch the container. Container -engines like Podman and Docker, have their own default oci runtime that they -use. Using this option RamaLama will override these defaults. - -On Nvidia based GPU systems, RamaLama defaults to using the -`nvidia-container-runtime`. Use this option to override this selection. - -#### **--port**, **-p**=*port* -Port for AI Model server to listen on (default: 8080) - -The default can be overridden in the `ramalama.conf` file. - -#### **--prefix** -Prefix for the user prompt (default: 🦭 > ) - -#### **--privileged** -By default, RamaLama containers are unprivileged (=false) and cannot, for -example, modify parts of the operating system. This is because by de‐ -fault a container is only allowed limited access to devices. A "privi‐ -leged" container is given the same access to devices as the user launch‐ -ing the container, with the exception of virtual consoles (/dev/tty\d+) -when running in systemd mode (--systemd=always). - -A privileged container turns off the security features that isolate the -container from the host. Dropped Capabilities, limited devices, read- -only mount points, Apparmor/SELinux separation, and Seccomp filters are -all disabled. Due to the disabled security features, the privileged -field should almost never be set as containers can easily break out of -confinement. - -Containers running in a user namespace (e.g., rootless containers) can‐ -not have more privileges than the user that launched them. - -#### **--pull**=*policy* -Pull image policy. The default is **missing**. - -- **always**: Always pull the image and throw an error if the pull fails. -- **missing**: Only pull the image when it does not exist in the local containers storage. 
Throw an error if no image is found and the pull fails. -- **never**: Never pull the image but use the one from the local containers storage. Throw an error when no image is found. -- **newer**: Pull if the image on the registry is newer than the one in the local containers storage. An image is considered to be newer when the digests are different. Comparing the time stamps is prone to errors. Pull errors are suppressed if a local image was found. - -#### **--rag**= -Specify path to Retrieval-Augmented Generation (RAG) database or an OCI Image containing a RAG database - -#### **--rag-image**= -The image to use to process the RAG database specified by the `--rag` option. The image must contain the `/usr/bin/rag_framework` executable, which -will create a proxy which embellishes client requests with RAG data before passing them on to the LLM, and returns the responses. - -#### **--runtime-args**="*args*" -Add *args* to the runtime (llama.cpp or vllm) invocation. - -#### **--seed**= -Specify seed rather than using random seed model interaction - -#### **--selinux**=*true* -Enable SELinux container separation - -#### **--summarize-after**=*N* -Automatically summarize conversation history after N messages to prevent context growth. -When enabled, ramalama will periodically condense older messages into a summary, -keeping only recent messages and the summary. This prevents the context from growing -indefinitely during long chat sessions. Set to 0 to disable (default: 4). - -#### **--temp**="0.8" -Temperature of the response from the AI Model -llama.cpp explains this as: - - The lower the number is, the more deterministic the response. - - The higher the number is the more creative the response is, but more likely to hallucinate when set too high. - - Usage: Lower numbers are good for virtual assistants where we need deterministic responses. Higher numbers are good for roleplay or creative tasks like editing stories - -#### **--thinking**=*true* -Enable or disable thinking mode in reasoning models - -#### **--threads**, **-t** -Maximum number of cpu threads to use. -The default is to use half the cores available on this system for the number of threads. - -#### **--tls-verify**=*true* -require HTTPS and verify certificates when contacting OCI registries - -## Description -Run specified AI Model as a chat bot. RamaLama pulls specified AI Model from -registry if it does not exist in local storage. By default a prompt for a chat -bot is started. When arguments are specified, the arguments will be given -to the AI Model and the output returned without entering the chatbot. - -## Examples - -Run command without arguments starts a chatbot -```text -ramalama run granite -> -``` - -Run command with local downloaded model for 10 minutes -```text -ramalama run --keepalive 10m file:///tmp/mymodel -> -``` - -Run command with a custom port to allow multiple models running simultaneously -```text -ramalama run --port 8081 granite -> -``` - -```text -ramalama run merlinite "when is the summer solstice" -The summer solstice, which is the longest day of the year, will happen on June ... -``` - -Run command with a custom prompt and a file passed by the stdin -```text -cat file.py | ramalama run quay.io/USER/granite-code:1.0 'what does this program do?' - -This program is a Python script that allows the user to interact with a terminal. ... 
- [end of text] -``` - -Run command and send multiple lines at once to the chatbot by adding a backslash `\` -at the end of the line -$ ramalama run granite -🦭 > Hi \ -🦭 > tell me a funny story \ -🦭 > please - -## Exit Codes: - -0 Success -124 RamaLama command did not exit within the keepalive time. - -## NVIDIA CUDA Support - -See [ramalama-cuda(7)](/docs/platform-guides/cuda) for setting up the host Linux system for CUDA support. - -## See Also -[ramalama(1)](/docs/commands/ramalama/), [ramalama-cuda(7)](/docs/platform-guides/cuda), [ramalama.conf(5)](/docs/configuration/conf) - ---- - -*Aug 2024, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/serve.mdx b/docsite/docs/commands/ramalama/serve.mdx deleted file mode 100644 index bd9c2d24..00000000 --- a/docsite/docs/commands/ramalama/serve.mdx +++ /dev/null @@ -1,589 +0,0 @@ ---- -title: serve -description: serve REST API on specified AI Model -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-serve.1.md ---- - -# serve - -## Synopsis -**ramalama serve** [*options*] _model_ - -## Description -Serve specified AI Model as a chat bot. RamaLama pulls specified AI Model from -registry if it does not exist in local storage. - -## MODEL TRANSPORTS - -| Transports | Prefix | Web Site | -| ------------- | ------ | --------------------------------------------------- | -| URL based | https://, http://, file:// | `https://web.site/ai.model`, `file://tmp/ai.model`| -| HuggingFace | huggingface://, hf://, hf.co/ | [`huggingface.co`](https://www.huggingface.co)| -| ModelScope | modelscope://, ms:// | [`modelscope.cn`](https://modelscope.cn/)| -| Ollama | ollama:// | [`ollama.com`](https://www.ollama.com)| -| OCI Container Registries | oci:// | [`opencontainers.org`](https://opencontainers.org)| -| rlcr | rlcr:// | [`ramalama.com`](https://registry.ramalama.com) | -|||Examples: [`quay.io`](https://quay.io), [`Docker Hub`](https://docker.io),[`Artifactory`](https://artifactory.com)| - -RamaLama defaults to the Ollama registry transport. This default can be overridden in the `ramalama.conf` file or via the RAMALAMA_TRANSPORTS -environment. `export RAMALAMA_TRANSPORT=huggingface` Changes RamaLama to use huggingface transport. - -Modify individual model transports by specifying the `huggingface://`, `oci://`, `ollama://`, `https://`, `http://`, `file://` prefix to the model. - -URL support means if a model is on a web site or even on your local system, you can run it directly. - -## REST API ENDPOINTS -Under the hood, `ramalama-serve` uses the `llama.cpp` HTTP server by default. When using `--runtime=vllm`, it uses the vLLM server. When using `--runtime=mlx`, it uses the MLX LM server. - -For REST API endpoint documentation, see: -- llama.cpp: [https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md#api-endpoints](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md#api-endpoints) -- vLLM: [https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html) -- MLX LM: [https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/SERVER.md](https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/SERVER.md) - -## Options - -#### **--add-to-unit** - -format: --add-to-unit section:key:value - -Adds to the generated unit file (quadlet) in the section *section* the key *key* with the value *value*. 
- -Useful, for instance, to add environment variables to the generated unit file, or to place the container in a specific pod/network (Container:Network:xxx.network). - -**Only valid with *--generate* parameter.** - -Section, key and value are required and must be separated by colons. - -#### **--api**=**llama-stack** | none** -Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.(default: none) -The default can be overridden in the `ramalama.conf` file. - -#### **--authfile**=*password* -Path of the authentication file for OCI registries - -#### **--cache-reuse**=256 -Min chunk size to attempt reusing from the cache via KV shifting - -#### **--ctx-size**, **-c** -size of the prompt context. This option is also available as **--max-model-len**. Applies to llama.cpp and vllm regardless of alias (default: 4096, 0 = loaded from model) - -#### **--detach**, **-d** -Run the container in the background and print the new container ID. -The default is TRUE. The --nocontainer option forces this option to False. - -Use the `ramalama stop` command to stop the container running the served ramalama Model. - -#### **--device** -Add a host device to the container. Optional permissions parameter can -be used to specify device permissions by combining r for read, w for -write, and m for mknod(2). - -Example: --device=/dev/dri/renderD128:/dev/xvdc:rwm - -The device specification is passed directly to the underlying container engine. See documentation of the supported container engine for more information. - -Pass '--device=none' explicitly add no device to the container, eg for -running a CPU-only performance comparison. - -#### **--dri**=*on* | *off* -Enable or disable mounting `/dev/dri` into the container when running with `--api=llama-stack` (enabled by default). Use to prevent access to the host device when not required, or avoid errors in environments where `/dev/dri` is not available. - -#### **--env**= - -Set environment variables inside of the container. - -This option allows arbitrary environment variables that are available for the -process to be launched inside of the container. If an environment variable is -specified without a value, the container engine checks the host environment -for a value and set the variable only if it is set on the host. - -#### **--generate**=type -Generate specified configuration format for running the AI Model as a service - -| Key | Description | -| ------------ | -------------------------------------------------------------------------| -| quadlet | Podman supported container definition for running AI Model under systemd | -| kube | Kubernetes YAML definition for running the AI Model as a service | -| quadlet/kube | Kubernetes YAML definition for running the AI Model as a service and Podman supported container definition for running the Kube YAML specified pod under systemd| -| compose | Compose YAML definition for running the AI Model as a service | - -Optionally, an output directory for the generated files can be specified by -appending the path to the type, e.g. `--generate kube:/etc/containers/systemd`. - -#### **--help**, **-h** -show this help message and exit - -#### **--host**="0.0.0.0" -IP address for llama.cpp to listen on. - -#### **--image**=IMAGE -OCI container image to run with specified AI model. RamaLama defaults to using -images based on the accelerator it discovers. For example: -`quay.io/ramalama/ramalama`. See the table above for all default images. 
-The default image tag is based on the minor version of the RamaLama package. -Version 0.16.0 of RamaLama pulls an image with a `:0.16` tag from the quay.io/ramalama OCI repository. The --image option overrides this default. - -The default can be overridden in the `ramalama.conf` file or via the -RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells -RamaLama to use the `quay.io/ramalama/aiimage:1.2` image. - -Accelerated images: - -| Accelerator | Image | -| ------------------------| -------------------------- | -| CPU, Apple | quay.io/ramalama/ramalama | -| HIP_VISIBLE_DEVICES | quay.io/ramalama/rocm | -| CUDA_VISIBLE_DEVICES | quay.io/ramalama/cuda | -| ASAHI_VISIBLE_DEVICES | quay.io/ramalama/asahi | -| INTEL_VISIBLE_DEVICES | quay.io/ramalama/intel-gpu | -| ASCEND_VISIBLE_DEVICES | quay.io/ramalama/cann | -| MUSA_VISIBLE_DEVICES | quay.io/ramalama/musa | - -#### **--keep-groups** -pass --group-add keep-groups to podman (default: False) -If GPU device on host system is accessible to user via group access, this option leaks the groups into the container. - -#### **--max-tokens**=*integer* -Maximum number of tokens to generate. Set to 0 for unlimited output (default: 0). -This parameter is mapped to the appropriate runtime-specific parameter: -- llama.cpp: `-n` parameter -- MLX: `--max-tokens` parameter -- vLLM: `--max-tokens` parameter - -#### **--model-draft** - -A draft model is a smaller, faster model that helps accelerate the decoding -process of larger, more complex models, like Large Language Models (LLMs). It -works by generating candidate sequences of tokens that the larger model then -verifies and refines. This approach, often referred to as speculative decoding, -can significantly improve the speed of inferencing by reducing the number of -times the larger model needs to be invoked. - -Use --runtime-arg to pass the other draft model related parameters. -Make sure the sampling parameters like top_k on the web UI are set correctly. - -#### **--name**, **-n** -Name of the container to run the Model in. - -#### **--network**=*""* -set the network mode for the container - -#### **--ngl** -number of gpu layers, 0 means CPU inferencing, 999 means use max layers (default: -1) -The default -1, means use whatever is automatically deemed appropriate (0 or 999) - -#### **--oci-runtime** - -Override the default OCI runtime used to launch the container. Container -engines like Podman and Docker, have their own default oci runtime that they -use. Using this option RamaLama will override these defaults. - -On Nvidia based GPU systems, RamaLama defaults to using the -`nvidia-container-runtime`. Use this option to override this selection. - -#### **--port**, **-p** -port for AI Model server to listen on. It must be available. If not specified, -a free port in the 8080-8180 range is selected, starting with 8080. - -The default can be overridden in the `ramalama.conf` file. - -#### **--privileged** -By default, RamaLama containers are unprivileged (=false) and cannot, for -example, modify parts of the operating system. This is because by de‐ -fault a container is only allowed limited access to devices. A "privi‐ -leged" container is given the same access to devices as the user launch‐ -ing the container, with the exception of virtual consoles (/dev/tty\d+) -when running in systemd mode (--systemd=always). - -A privileged container turns off the security features that isolate the -container from the host. 
Dropped Capabilities, limited devices, read- -only mount points, Apparmor/SELinux separation, and Seccomp filters are -all disabled. Due to the disabled security features, the privileged -field should almost never be set as containers can easily break out of -confinement. - -Containers running in a user namespace (e.g., rootless containers) can‐ -not have more privileges than the user that launched them. - -#### **--pull**=*policy* - -- **always**: Always pull the image and throw an error if the pull fails. -- **missing**: Only pull the image when it does not exist in the local containers storage. Throw an error if no image is found and the pull fails. -- **never**: Never pull the image but use the one from the local containers storage. Throw an error when no image is found. -- **newer**: Pull if the image on the registry is newer than the one in the local containers storage. An image is considered to be newer when the digests are different. Comparing the time stamps is prone to errors. Pull errors are suppressed if a local image was found. - -#### **--rag**= -Specify path to Retrieval-Augmented Generation (RAG) database or an OCI Image containing a RAG database - -:::note - RAG support requires AI Models be run within containers, --nocontainer not supported. Docker does not support image mounting, meaning Podman support required. -::: - -#### **--rag-image**= -The image to use to process the RAG database specified by the `--rag` option. The image must contain the `/usr/bin/rag_framework` executable, which -will create a proxy which embellishes client requests with RAG data before passing them on to the LLM, and returns the responses. - -#### **--runtime-args**="*args*" -Add *args* to the runtime (llama.cpp or vllm) invocation. - -#### **--seed**= -Specify seed rather than using random seed model interaction - -#### **--selinux**=*true* -Enable SELinux container separation - -#### **--temp**="0.8" -Temperature of the response from the AI Model. -llama.cpp explains this as: - - The lower the number is, the more deterministic the response. - - The higher the number is the more creative the response is, but more likely to hallucinate when set too high. - - Usage: Lower numbers are good for virtual assistants where we need deterministic responses. Higher numbers are good for roleplay or creative tasks like editing stories - -#### **--thinking**=*true* -Enable or disable thinking mode in reasoning models - -#### **--threads**, **-t** -Maximum number of cpu threads to use. -The default is to use half the cores available on this system for the number of threads. - -#### **--tls-verify**=*true* -require HTTPS and verify certificates when contacting OCI registries - -#### **--webui**=*on* | *off* -Enable or disable the web UI for the served model (enabled by default). When set to "on" (the default), the web interface is properly initialized. When set to "off", the `--no-webui` option is passed to the llama-server command to disable the web interface. - -## Examples -### Run two AI Models at the same time. Notice both are running within Podman Containers. -```bash -$ ramalama serve -d -p 8080 --name mymodel ollama://smollm:135m -09b0e0d26ed28a8418fb5cd0da641376a08c435063317e89cf8f5336baf35cfa - -$ ramalama serve -d -n example --port 8081 oci://quay.io/mmortari/gguf-py-example/v1/example.gguf -3f64927f11a5da5ded7048b226fbe1362ee399021f5e8058c73949a677b6ac9c - -$ podman ps -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -09b0e0d26ed2 quay.io/ramalama/ramalama:latest /usr/bin/ramalama... 
32 seconds ago Up 32 seconds 0.0.0.0:8081->8081/tcp ramalama_sTLNkijNNP -3f64927f11a5 quay.io/ramalama/ramalama:latest /usr/bin/ramalama... 17 seconds ago Up 17 seconds 0.0.0.0:8082->8082/tcp ramalama_YMPQvJxN97 -``` - -### Generate quadlet service off of HuggingFace granite Model -```bash -$ ramalama serve --name MyGraniteServer --generate=quadlet granite -Generating quadlet file: MyGraniteServer.container - -$ cat MyGraniteServer.container -[Unit] -Description=RamaLama $HOME/.local/share/ramalama/models/huggingface/instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf AI Model Service -After=local-fs.target - -[Container] -AddDevice=-/dev/accel -AddDevice=-/dev/dri -AddDevice=-/dev/kfd -Exec=llama-server --port 1234 -m $HOME/.local/share/ramalama/models/huggingface/instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf -Image=quay.io/ramalama/ramalama:latest -Mount=type=bind,src=/home/dwalsh/.local/share/ramalama/models/huggingface/instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf,target=/mnt/models/model.file,ro,Z -ContainerName=MyGraniteServer -PublishPort=8080 - -[Install] -# Start by default on boot -WantedBy=multi-user.target default.target - -$ mv MyGraniteServer.container $HOME/.config/containers/systemd/ -$ systemctl --user daemon-reload -$ systemctl start --user MyGraniteServer -$ systemctl status --user MyGraniteServer -● MyGraniteServer.service - RamaLama granite AI Model Service - Loaded: loaded (/home/dwalsh/.config/containers/systemd/MyGraniteServer.container; generated) - Drop-In: /usr/lib/systemd/user/service.d - └─10-timeout-abort.conf - Active: active (running) since Fri 2024-09-27 06:54:17 EDT; 3min 3s ago - Main PID: 3706287 (conmon) - Tasks: 20 (limit: 76808) - Memory: 1.0G (peak: 1.0G) - -... -$ podman ps -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -7bb35b97a0fe quay.io/ramalama/ramalama:latest llama-server --po... 3 minutes ago Up 3 minutes 0.0.0.0:43869->8080/tcp MyGraniteServer -``` - -### Generate quadlet service off of tiny OCI Model -```bash -$ ramalama --runtime=vllm serve --name tiny --generate=quadlet oci://quay.io/rhatdan/tiny:latest -Downloading quay.io/rhatdan/tiny:latest... -Trying to pull quay.io/rhatdan/tiny:latest... 
-Getting image source signatures -Copying blob 65ba8d40e14a skipped: already exists -Copying blob e942a1bf9187 skipped: already exists -Copying config d8e0b28ee6 done | -Writing manifest to image destination -Generating quadlet file: tiny.container -Generating quadlet file: tiny.image -Generating quadlet file: tiny.volume - -$cat tiny.container -[Unit] -Description=RamaLama /run/model/model.file AI Model Service -After=local-fs.target - -[Container] -AddDevice=-/dev/accel -AddDevice=-/dev/dri -AddDevice=-/dev/kfd -Exec=vllm serve --port 8080 /run/model/model.file -Image=quay.io/ramalama/ramalama:latest -Mount=type=volume,source=tiny:latest.volume,dest=/mnt/models,ro -ContainerName=tiny -PublishPort=8080 - -[Install] -# Start by default on boot -WantedBy=multi-user.target default.target - -$ cat tiny.volume -[Volume] -Driver=image -Image=tiny:latest.image - -$ cat tiny.image -[Image] -Image=quay.io/rhatdan/tiny:latest -``` - -### Generate quadlet service off of tiny OCI Model and output to directory -```bash -$ ramalama --runtime=vllm serve --name tiny --generate=quadlet:~/.config/containers/systemd/ oci://quay.io/rhatdan/tiny:latest -Generating quadlet file: tiny.container -Generating quadlet file: tiny.image -Generating quadlet file: tiny.volume - -$ ls ~/.config/containers/systemd/ -tiny.container tiny.image tiny.volume -``` - -### Generate a kubernetes YAML file named MyTinyModel -```bash -$ ramalama serve --name MyTinyModel --generate=kube oci://quay.io/rhatdan/tiny-car:latest -Generating Kubernetes YAML file: MyTinyModel.yaml -$ cat MyTinyModel.yaml -# Save the output of this file and use kubectl create -f to import -# it into Kubernetes. -# -# Created with ramalama-0.0.21 -apiVersion: v1 -kind: Deployment -metadata: - name: MyTinyModel - labels: - app: MyTinyModel -spec: - replicas: 1 - selector: - matchLabels: - app: MyTinyModel - template: - metadata: - labels: - app: MyTinyModel - spec: - containers: - - name: MyTinyModel - image: quay.io/ramalama/ramalama:latest - command: ["llama-server"] - args: ['--port', '8080', '-m', '/mnt/models/model.file'] - ports: - - containerPort: 8080 - volumeMounts: - - mountPath: /mnt/models - subPath: /models - name: model - - mountPath: /dev/dri - name: dri - volumes: - - image: - reference: quay.io/rhatdan/tiny-car:latest - pullPolicy: IfNotPresent - name: model - - hostPath: - path: /dev/dri - name: dri -``` - -### Generate Compose file -```bash -$ ramalama serve --name=my-smollm-server --port 1234 --generate=compose smollm:135m -Generating Compose YAML file: docker-compose.yaml -$ cat docker-compose.yaml -version: '3.8' -services: - my-smollm-server: - image: quay.io/ramalama/ramalama:latest - container_name: my-smollm-server - command: ramalama serve --host 0.0.0.0 --port 1234 smollm:135m - ports: - - "1234:1234" - volumes: - - ~/.local/share/ramalama/models/smollm-135m-instruct:/mnt/models/model.file:ro - environment: - - HOME=/tmp - cap_drop: - - ALL - security_opt: - - no-new-privileges - - label=disable -``` - -### Generate a Llama Stack Kubernetes YAML file named MyLamaStack -```bash -$ ramalama serve --api llama-stack --name MyLamaStack --generate=kube oci://quay.io/rhatdan/granite:latest -Generating Kubernetes YAML file: MyLamaStack.yaml -$ cat MyLamaStack.yaml -apiVersion: v1 -kind: Deployment -metadata: - name: MyLamaStack - labels: - app: MyLamaStack -spec: - replicas: 1 - selector: - matchLabels: - app: MyLamaStack - template: - metadata: - labels: - ai.ramalama: "" - app: MyLamaStack - ai.ramalama.model: 
oci://quay.io/rhatdan/granite:latest - ai.ramalama.engine: podman - ai.ramalama.runtime: llama.cpp - ai.ramalama.port: 8080 - ai.ramalama.command: serve - spec: - containers: - - name: model-server - image: quay.io/ramalama/ramalama:0.8 - command: ["llama-server"] - args: ['--port', '8081', '--model', '/mnt/models/model.file', '--alias', 'quay.io/rhatdan/granite:latest', '--temp', '0.8', '--jinja', '--cache-reuse', '256', '-v', '--threads', 16, '--host', '127.0.0.1'] - securityContext: - allowPrivilegeEscalation: false - capabilities: - drop: - - CAP_CHOWN - - CAP_FOWNER - - CAP_FSETID - - CAP_KILL - - CAP_NET_BIND_SERVICE - - CAP_SETFCAP - - CAP_SETGID - - CAP_SETPCAP - - CAP_SETUID - - CAP_SYS_CHROOT - add: - - CAP_DAC_OVERRIDE - seLinuxOptions: - type: spc_t - volumeMounts: - - mountPath: /mnt/models - subPath: /models - name: model - - mountPath: /dev/dri - name: dri - - name: llama-stack - image: quay.io/ramalama/llama-stack:0.8 - args: - - /bin/sh - - -c - - llama stack run --image-type venv /etc/ramalama/ramalama-run.yaml - env: - - name: RAMALAMA_URL - value: http://127.0.0.1:8081 - - name: INFERENCE_MODEL - value: quay.io/rhatdan/granite:latest - securityContext: - allowPrivilegeEscalation: false - capabilities: - drop: - - CAP_CHOWN - - CAP_FOWNER - - CAP_FSETID - - CAP_KILL - - CAP_NET_BIND_SERVICE - - CAP_SETFCAP - - CAP_SETGID - - CAP_SETPCAP - - CAP_SETUID - - CAP_SYS_CHROOT - add: - - CAP_DAC_OVERRIDE - seLinuxOptions: - type: spc_t - ports: - - containerPort: 8321 - hostPort: 8080 - volumes: - - hostPath: - path: quay.io/rhatdan/granite:latest - name: model - - hostPath: - path: /dev/dri - name: dri -``` - -### Generate a kubernetes YAML file named MyTinyModel shown above, but also generate a quadlet to run it in. -```bash -$ ramalama --name MyTinyModel --generate=quadlet/kube oci://quay.io/rhatdan/tiny-car:latest -run_cmd: podman image inspect quay.io/rhatdan/tiny-car:latest -Generating Kubernetes YAML file: MyTinyModel.yaml -Generating quadlet file: MyTinyModel.kube -$ cat MyTinyModel.kube -[Unit] -Description=RamaLama quay.io/rhatdan/tiny-car:latest Kubernetes YAML - AI Model Service -After=local-fs.target - -[Kube] -Yaml=MyTinyModel.yaml - -[Install] -# Start by default on boot -WantedBy=multi-user.target default.target -``` - -## NVIDIA CUDA Support - -See [ramalama-cuda(7)](/docs/platform-guides/cuda) for setting up the host Linux system for CUDA support. - -## MLX Support - -The MLX runtime is designed for Apple Silicon Macs and provides optimized performance on these systems. 
MLX support has the following requirements: - -- **Operating System**: macOS only -- **Hardware**: Apple Silicon (M1, M2, M3, or later) -- **Container Mode**: MLX requires `--nocontainer` as it cannot run inside containers -- **Dependencies**: The `mlx-lm` uv package installed on the host system as a uv tool - -To install MLX dependencies, use `uv`: -```bash -uv tool install mlx-lm -# or upgrade to the latest version: -uv tool upgrade mlx-lm -``` - -Example usage: -```bash -ramalama --runtime=mlx serve hf://mlx-community/Unsloth-Phi-4-4bit -``` - -## See Also -[ramalama(1)](/docs/commands/ramalama/), [ramalama-stop(1)](/docs/commands/ramalama/stop), **quadlet(1)**, **systemctl(1)**, **podman(1)**, **podman-ps(1)**, [ramalama-cuda(7)](/docs/platform-guides/cuda), [ramalama.conf(5)](/docs/configuration/conf) - ---- - -*Aug 2024, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/stop.mdx b/docsite/docs/commands/ramalama/stop.mdx deleted file mode 100644 index 37fd8797..00000000 --- a/docsite/docs/commands/ramalama/stop.mdx +++ /dev/null @@ -1,45 +0,0 @@ ---- -title: stop -description: stop named container that is running AI Model -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-stop.1.md ---- - -# stop - -## Synopsis -**ramalama stop** [*options*] *name* - -Tells container engine to stop the specified container. - -The stop command conflicts with --nocontainer option. - -## Options - -#### **--all**, **-a** -Stop all containers - -#### **--help**, **-h** -Print usage message - -#### **--ignore** -Ignore missing containers when stopping - -## Description -Stop specified container that is executing the AI Model. - -The ramalama stop command conflicts with the --nocontainer option. The user needs to stop the RamaLama processes manually when running with --nocontainer. - -## Examples - -```bash -$ ramalama stop mymodel -$ ramalama stop --all -``` - -## See Also -[ramalama(1)](/docs/commands/ramalama/), [ramalama-run(1)](/docs/commands/ramalama/run), [ramalama-serve(1)](/docs/commands/ramalama/serve) - ---- - -*Sep 2024, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/commands/ramalama/version.mdx b/docsite/docs/commands/ramalama/version.mdx deleted file mode 100644 index fbaa0d16..00000000 --- a/docsite/docs/commands/ramalama/version.mdx +++ /dev/null @@ -1,35 +0,0 @@ ---- -title: version -description: display version of RamaLama -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-version.1.md ---- - -# version - -## Synopsis -**ramalama version** - -## Description -Print version of RamaLama - -## Options - -#### **--help**, **-h** -Print usage message - -## Examples - -```bash -$ ramalama version -ramalama version 0.16.0 -$ ramalama -q version -0.16.0 -> -``` -## See Also -[ramalama(1)](/docs/commands/ramalama/) - ---- - -*Aug 2024, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/configuration/conf.mdx b/docsite/docs/configuration/conf.mdx deleted file mode 100644 index 8cf0fc4f..00000000 --- a/docsite/docs/configuration/conf.mdx +++ /dev/null @@ -1,267 +0,0 @@ ---- -title: Configuration File -description: Configuration file reference -# This file is auto-generated from manpages. Do not edit manually. 
-# Source: ramalama.conf.5.md --- - -# Configuration File - -# DESCRIPTION -RamaLama reads all ramalama.conf files, if they exist, -and modifies the defaults for running RamaLama on the host. ramalama.conf uses -a TOML format that can be easily modified and versioned. - -RamaLama reads the following paths for global configuration that affects all users. - -| Paths | Exception | -| ----------------------------------- | ----------------------------------- | -| __/usr/share/ramalama/ramalama.conf__ | On Linux | -| __/usr/local/share/ramalama/ramalama.conf__ | On Linux | -| __/etc/ramalama/ramalama.conf__ | On Linux | -| __/etc/ramalama/ramalama.conf.d/\*.conf__ | On Linux | -| __$HOME/.local/.pipx/venvs/usr/share/ramalama/ramalama.conf__ | On pipx-installed macOS | - -For user-specific configuration it reads - -| Paths | Exception | -| ----------------------------------- | ------------------------------ | -| __$XDG_CONFIG_HOME/ramalama/ramalama.conf__ | | -| __$XDG_CONFIG_HOME/ramalama/ramalama.conf.d/\*.conf__ | | -| __$HOME/.config/ramalama/ramalama.conf__ | `$XDG_CONFIG_HOME` not set | -| __$HOME/.config/ramalama/ramalama.conf.d/\*.conf__ | `$XDG_CONFIG_HOME` not set | - -Fields specified in ramalama conf files override the default options, as well as -options in previously read ramalama conf files. - -Config files in the `.d` directories are added in alphanumeric sorted order and must end in `.conf`. - -## ENVIRONMENT VARIABLES -If the `RAMALAMA_CONFIG` environment variable is set, all system and user -config files are ignored and only the specified config file is loaded. - -# FORMAT -The [TOML format][toml] is used as the encoding of the configuration file. -Every option is nested under its table. No bare options are used. The format of -TOML can be simplified to: - - [table1] - option = value - - [table2] - option = value - - [table3] - option = value - - [table3.subtable1] - option = value - -## RAMALAMA TABLE -The ramalama table contains settings to configure and manage the OCI runtime. - -`[[ramalama]]` - -**api**="none" - -Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry. -Options: llama-stack, none - -**api_key**="" - -OpenAI-compatible API key. Can also be set via the RAMALAMA_API_KEY environment variable. - -**carimage**="registry.access.redhat.com/ubi10-micro:latest" - -OCI model car image - -Image to be used when building and pushing --type=car models - -**cache_reuse**=256 - -Min chunk size to attempt reusing from the cache via KV shifting - -**container**=true - -Run RamaLama in the default container. -RAMALAMA_IN_CONTAINER environment variable overrides this field. - -**convert_type**="raw" - -Convert the MODEL to the specified OCI Object -Options: artifact, car, raw - -| Type | Description | -| -------- | ------------------------------------------------------------- | -| artifact | Store AI Models as artifacts | -| car | Traditional OCI image including base image with the model stored in a /models subdir | -| raw | Traditional OCI image including only the model and a link file `model.file` pointed at it stored at / | - -**ctx_size**=0 - -Size of the prompt context (0 = loaded from model) - -**engine**="podman" - -Run RamaLama using the specified container engine. -Valid options are: Podman and Docker -This field can be overridden by the RAMALAMA_CONTAINER_ENGINE environment variable. - -**env**=[] - -Environment variables to be added to the environment used when running in a container engine (e.g., Podman, Docker).
For example "LLAMA_ARG_THREADS=10". - -**gguf_quantization_mode**="Q4_K_M" - -The quantization mode used when creating OCI formatted AI Models. -Available options: Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0. - -**host**="0.0.0.0" - -IP address for llama.cpp to listen on. - -**image**="quay.io/ramalama/ramalama:latest" - -OCI container image to run with the specified AI model -RAMALAMA_IMAGE environment variable overrides this field. - -`[[ramalama.images]]` - HIP_VISIBLE_DEVICES = "quay.io/ramalama/rocm" - CUDA_VISIBLE_DEVICES = "quay.io/ramalama/cuda" - ASAHI_VISIBLE_DEVICES = "quay.io/ramalama/asahi" - INTEL_VISIBLE_DEVICES = "quay.io/ramalama/intel-gpu" - ASCEND_VISIBLE_DEVICES = "quay.io/ramalama/cann" - MUSA_VISIBLE_DEVICES = "quay.io/ramalama/musa" - VLLM = "registry.redhat.io/rhelai1/ramalama-vllm" - -Alternative images to use when RamaLama recognizes specific hardware or user -specified vllm model runtime. - -**keep_groups**=false - -Pass `--group-add keep-groups` to podman, when using podman. -In some cases this is needed to access the gpu from a rootless container - -**log_level**=warning -Set the logging level of RamaLama application. -Valid Values: - debug, info, warning, error, critical -:::note - --debug option overrides this field and forces the system to debug -::: - -**max_tokens**=0 - -Maximum number of tokens to generate. Set to 0 for unlimited output (default: 0). -This parameter is mapped to the appropriate runtime-specific parameter when executing models. - -**ngl**=-1 - -number of gpu layers, 0 means CPU inferencing, 999 means use max layers (default: -1) -The default -1, means use whatever is automatically deemed appropriate (0 or 999) - -**prefix**="" -Specify default prefix for chat and run command. By default the prefix -is based on the container engine used. - -| Container Engine| Prefix | -| --------------- | ------- | -| Podman | "🦭 > " | -| Docker | "🐋 > " | -| No Engine | "🦙 > " | -| No EMOJI support| "> " | - -**port**="8080" - -Specify initial port for a range of 101 ports for services to listen on. -If this port is unavailable, another free port from this range will be selected. - -**pull**="newer" - -- **always**: Always pull the image and throw an error if the pull fails. -- **missing**: Only pull the image when it does not exist in the local containers storage. Throw an error if no image is found and the pull fails. -- **never**: Never pull the image but use the one from the local containers storage. Throw an error when no image is found. -- **newer**: Pull if the image on the registry is newer than the one in the local containers storage. An image is considered to be newer when the digests are different. Comparing the time stamps is prone to errors. Pull errors are suppressed if a local image was found. - -**rag_format**="qdrant" - -Specify the default output format for output of the `ramalama rag` command. -Options: qdrant, json, markdown, milvus. - -**rag_images**="quay.io/ramalama/ramalama-rag" - -OCI container image to run with the specified AI model when using RAG content. 
- -`[[ramalama.rag_images]]` - CUDA_VISIBLE_DEVICES = "quay.io/ramalama/cuda-rag" - HIP_VISIBLE_DEVICES = "quay.io/ramalama/rocm-rag" - INTEL_VISIBLE_DEVICES = "quay.io/ramalama/intel-gpu-rag" - GGML_VK_VISIBLE_DEVICES = "quay.io/ramalama/ramalama" - -**runtime**="llama.cpp" - -Specify the AI runtime to use; valid options are 'llama.cpp', 'vllm', and 'mlx' (default: llama.cpp) -Options: llama.cpp, vllm, mlx - -**selinux**=false - -SELinux container separation enforcement - -**store**="$HOME/.local/share/ramalama" - -Store AI Models in the specified directory - -**summarize_after**=4 - -Automatically summarize conversation history after N messages to prevent context growth. -When enabled, ramalama will periodically condense older messages into a summary, -keeping only recent messages and the summary. This prevents the context from growing -indefinitely during long chat sessions. Set to 0 to disable (default: 4). - -**temp**="0.8" -Temperature of the response from the AI Model -llama.cpp explains this as: - - The lower the number is, the more deterministic the response. - - The higher the number is the more creative the response is, but more likely to hallucinate when set too high. - - Usage: Lower numbers are good for virtual assistants where we need deterministic responses. Higher numbers are good for roleplay or creative tasks like editing stories - -**thinking**=true - -Enable thinking mode on reasoning models - -**threads**=-1 - -maximum number of cpu threads to use for inferencing -The default -1, uses the default of the underlying implementation - -**transport**="ollama" - -Specify the default transport to be used for pulling and pushing of AI Models. -Options: oci, ollama, huggingface. -RAMALAMA_TRANSPORT environment variable overrides this field. - -`[[ramalama.http_client]]` - -Http client configuration - -**max_retries**=5 - -The maximum number of times to retry a failed download - -**max_retry_delay**=30 - -The maximum delay between retry attempts in seconds - -## RAMALAMA.USER TABLE -The ramalama.user table contains user preference settings. - -`[[ramalama.user]]` - -**no_missing_gpu_prompt**=false - -Suppress the interactive prompt when running on macOS with a Podman VM that does not support GPU acceleration (e.g., applehv provider). When set to true, RamaLama will automatically proceed without GPU support instead of prompting the user for confirmation. This is useful for automation and scripting scenarios where interactive prompts are not desired. - -Can also be set via the RAMALAMA_USER__NO_MISSING_GPU_PROMPT environment variable. \ No newline at end of file diff --git a/docsite/docs/configuration/ramalama-oci.mdx b/docsite/docs/configuration/ramalama-oci.mdx deleted file mode 100644 index f19e1b84..00000000 --- a/docsite/docs/configuration/ramalama-oci.mdx +++ /dev/null @@ -1,40 +0,0 @@ ---- -title: OCI Spec -description: Configuration file reference -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-oci.5.md ---- - -# OCI Spec - -# DESCRIPTION -RamaLama’s `oci://` transport uses [OpenContainers image registries](https://github.com/opencontainers/distribution-spec) to store AI models. - -Each model is stored in an ordinary [container image](https://github.com/opencontainers/image-spec) (currently not using a specialized OCI artifact). - -The image is, structurally, a single-platform image (the top-level element is an OCI Image Manifest, not an OCI Image Index). 
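You can verify this layout with standard OCI tooling. The following is a minimal sketch, not part of the specification, assuming `skopeo` and `jq` are installed and that `quay.io/rhatdan/granite:latest` is a model image available via the `oci://` transport:

```bash
# Fetch the raw manifest of the model image. Because the image is a
# single-platform image, the top-level document is an OCI Image Manifest,
# not an OCI Image Index.
skopeo inspect --raw docker://quay.io/rhatdan/granite:latest | jq '.mediaType'
# Expected for an OCI-format image: "application/vnd.oci.image.manifest.v1+json"
```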
- -## Model Data - -Because the AI model is stored in an image, not an artifact, the data is, like in all OCI images, wrapped in the standard tar layer format. - -The contents of the image must contain a `/models/model.file` file (or, usually, a symbolic link), -which contains an AI model in GGUF format (consumable by `llama-server`). - -## Metadata - -The image’s config contains an `org.containers.type` label. The value of the label can be one of: - -- `ai.image.model.raw`: The image contains only the AI model -- `ai.image.model.car`: The image also contains other software; more details of that software are currently unspecified in this document. - -## Local Image Storage - -The model image may be pulled into, or created in, Podman’s local image storage. - -In such a situation, to simplify identification of AI models, -the model image may be wrapped in an OCI index pointing at the AI model image, -and in the index, the manifests’ descriptor pointing at the AI model image contains an `org.cnai.model.model` annotation. - -Note that the wrapping in an OCI index does not happen in all situations, -and in particular does not happen when RamaLama uses Docker instead of Podman. \ No newline at end of file diff --git a/docsite/docs/misc/MACOS_INSTALL.mdx b/docsite/docs/misc/MACOS_INSTALL.mdx deleted file mode 100644 index 44b6eb9b..00000000 --- a/docsite/docs/misc/MACOS_INSTALL.mdx +++ /dev/null @@ -1,219 +0,0 @@ ---- -title: MACOS_INSTALL -description: RamaLama documentation -# This file is auto-generated from manpages. Do not edit manually. -# Source: MACOS_INSTALL.md ---- - -# MACOS_INSTALL - -# macOS Installation Guide for RamaLama - -This guide covers the different ways to install RamaLama on macOS. - -## Method 1: Self-Contained Installer Package (Recommended) - -The easiest way to install RamaLama on macOS is using our self-contained `.pkg` installer. This method includes Python and all dependencies, so you don't need to install anything else. - -### Download and Install - -1. Download the latest installer from the [Releases page](https://github.com/containers/ramalama/releases) -2. Double-click the downloaded `.pkg` file -3. Follow the installation wizard - -Or via command line: - -```bash -# Download the installer (replace VERSION with the actual version) -curl -LO https://github.com/containers/ramalama/releases/download/vVERSION/RamaLama-VERSION-macOS-Installer.pkg - -# Verify the SHA256 checksum (optional but recommended) -curl -LO https://github.com/containers/ramalama/releases/download/vVERSION/RamaLama-VERSION-macOS-Installer.pkg.sha256 -shasum -a 256 -c RamaLama-VERSION-macOS-Installer.pkg.sha256 - -# Install -sudo installer -pkg RamaLama-VERSION-macOS-Installer.pkg -target / -``` - -### What Gets Installed - -The installer places files in: -- `/usr/local/bin/ramalama` - Main executable -- `/usr/local/share/ramalama/` - Configuration files -- `/usr/local/share/man/` - Man pages -- `/usr/local/share/bash-completion/` - Bash completions -- `/usr/local/share/fish/` - Fish completions -- `/usr/local/share/zsh/` - Zsh completions - -### Verify Installation - -```bash -# Check version -ramalama --version - -# Get help -ramalama --help -``` - -## Method 2: Python Package (pip) - -If you prefer to use Python package management: - -```bash -# Install Python 3.10 or later (if not already installed) -brew install python@3.11 - -# Install ramalama -pip3 install ramalama - -# Or install from source -git clone https://github.com/containers/ramalama.git -cd ramalama -pip3 install . 
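# Optional: an editable install (standard pip behavior, not specific to RamaLama)
# keeps the checkout linked, so source changes take effect without reinstalling.
pip3 install -e .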
-``` - -## Method 3: Build from Source - -For developers or if you want the latest code: - -```bash -# Clone the repository -git clone https://github.com/containers/ramalama.git -cd ramalama - -# Install build dependencies -pip3 install build - -# Build and install -make install -``` - -## Prerequisites - -Before using RamaLama, you'll need a container engine: - -### Option A: Podman (Recommended) - -```bash -brew install podman - -# Initialize Podman machine with libkrun for GPU access -podman machine init --provider libkrun -podman machine start -``` - -For more details, see [ramalama-macos(7)](/docs/platform-guides/macos). - -### Option B: Docker - -```bash -brew install docker -``` - -## Building the Installer Package (For Maintainers) - -If you want to build the installer package yourself: - -```bash -# Install PyInstaller -pip3 install pyinstaller - -# Build the package -./scripts/build_macos_pkg.sh - -# The built package will be in: -# build/macos-pkg/RamaLama-VERSION-macOS-Installer.pkg -``` - -## Uninstallation - -To remove RamaLama: - -```bash -# Remove the executable -sudo rm /usr/local/bin/ramalama - -# Remove configuration and data files (optional) -sudo rm -rf /usr/local/share/ramalama -rm -rf ~/.local/share/ramalama -rm -rf ~/.config/ramalama - -# Remove man pages (optional) -sudo rm /usr/local/share/man/man1/ramalama*.1 -sudo rm /usr/local/share/man/man5/ramalama*.5 -sudo rm /usr/local/share/man/man7/ramalama*.7 - -# Remove shell completions (optional) -sudo rm /usr/local/share/bash-completion/completions/ramalama -sudo rm /usr/local/share/fish/vendor_completions.d/ramalama.fish -sudo rm /usr/local/share/zsh/site-functions/_ramalama -``` - -## Troubleshooting - -### "ramalama: command not found" - -Make sure `/usr/local/bin` is in your PATH: - -```bash -echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.zshrc -source ~/.zshrc -``` - -### "Cannot verify developer" warning - -macOS may show a security warning for unsigned packages. -:::note - We're working on getting keys to sign it. -::: - -To bypass: - -1. Right-click the `.pkg` file -2. Select "Open" -3. Click "Open" in the dialog - -### Podman machine issues - -If Podman isn't working: - -```bash -# Reset Podman machine -podman machine stop -podman machine rm -podman machine init --provider libkrun -podman machine start -``` - -## Getting Started - -Once installed, try these commands: - -```bash -# Check version -ramalama --version - -# Pull a model -ramalama pull tinyllama - -# Run a chatbot -ramalama run tinyllama - -# Get help -ramalama --help -``` - -## Additional Resources - -- [RamaLama Documentation](https://ramalama.ai) -- [GitHub Repository](https://github.com/containers/ramalama) -- [macOS-specific Documentation](/docs/platform-guides/macos) -- [Report Issues](https://github.com/containers/ramalama/issues) - -## System Requirements - -- macOS 10.15 (Catalina) or later -- Intel or Apple Silicon (M1/M2/M3) processor -- 4GB RAM minimum (8GB+ recommended for running models) -- 10GB free disk space -- Podman or Docker \ No newline at end of file diff --git a/docsite/docs/platform-guides/cann.mdx b/docsite/docs/platform-guides/cann.mdx deleted file mode 100644 index 4cdbf3d4..00000000 --- a/docsite/docs/platform-guides/cann.mdx +++ /dev/null @@ -1,76 +0,0 @@ ---- -title: cann -description: Platform-specific setup guide -# This file is auto-generated from manpages. Do not edit manually. 
-# Source: ramalama-cann.7.md --- - -# cann - -# Setting Up RamaLama with Ascend NPU Support on Linux systems - -This guide walks through the steps required to set up RamaLama with Ascend NPU support. - - [Background](#background) - - [Hardware](#hardware) - - [Model](#model) - - [Docker](#docker) -## Background - -**Ascend NPU** is a range of AI processors built around a Neural Processing Unit. It efficiently handles matrix multiplication, dot products, and scalar operations. - -**CANN** (Compute Architecture for Neural Networks) is a heterogeneous computing architecture for AI scenarios, providing support for multiple AI frameworks on the top and serving AI processors and programming at the bottom. It plays a crucial role in bridging the gap between upper and lower layers, and is a key platform for improving the computing efficiency of Ascend AI processors. Meanwhile, it offers a highly efficient and easy-to-use programming interface for diverse application scenarios, allowing users to rapidly build AI applications and services based on the Ascend platform. - -## Hardware - -### Ascend NPU - -**Verified devices** - -Supported Hardware List: -| Ascend NPU | Status | -| ----------------------------- | ------- | -| Atlas A2 Training series | Support | -| Atlas 800I A2 Inference series | Support | - -*Notes:* - -- If you have trouble with an Ascend NPU device, please create an issue with the **[CANN]** prefix/tag. -- If you are running successfully with an Ascend NPU device, please help update the "Supported Hardware List" table above. - -## Model -Currently, Ascend NPU acceleration is only supported when the llama.cpp backend is selected. For supported models, please refer to the page [llama.cpp/backend/CANN.md](https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/CANN.md). - -## Docker -### Install the Ascend driver -The Ascend driver provides NPU acceleration using the AI cores of your Ascend NPU. [CANN](https://www.hiascend.com/en/software/cann) provides a hierarchical set of APIs to help you quickly build AI applications and services based on the Ascend NPU. - -For more information about the Ascend NPU, see the [Ascend Community](https://www.hiascend.com/en/). - -Make sure the CANN toolkit is installed. You can download it here: [CANN Toolkit](https://www.hiascend.com/developer/download/community/result?module=cann) -Make sure the Ascend Docker runtime is installed. You can download it here: [Ascend-docker-runtime](https://www.hiascend.com/document/detail/en/mindx-dl/300/dluserguide/clusterscheduling/dlug_installation_02_000025.html) - -### Build Images -Go to the `ramalama` directory and build using make. -```bash -make build IMAGE=cann -make install -``` - -You can test with: -```bash -export ASCEND_VISIBLE_DEVICES=0 -ramalama --image quay.io/ramalama/cann:latest serve -d -p 8080 -name ollama://smollm:135m -``` - -In another window, view the running Podman container:
-```bash -$ podman ps -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -80fc31c131b0 quay.io/ramalama/cann:latest "/bin/bash -c 'expor…" About an hour ago Up About an hour ame -``` - -For other usage guides, see the RamaLama [README.md](https://github.com/containers/ramalama/blob/main/README.md). - ---- - -*Mar 2025, Originally compiled* \ No newline at end of file diff --git a/docsite/docs/platform-guides/cuda.mdx b/docsite/docs/platform-guides/cuda.mdx deleted file mode 100644 index 00566f5b..00000000 --- a/docsite/docs/platform-guides/cuda.mdx +++ /dev/null @@ -1,193 +0,0 @@ ---- -title: cuda -description: Platform-specific setup guide -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-cuda.7.md ---- - -# cuda - -# Setting Up RamaLama with CUDA Support on Linux systems - -This guide walks through the steps required to set up RamaLama with CUDA support. - -## Install the NVIDIA Container Toolkit - -Follow the installation instructions provided in the [NVIDIA Container Toolkit installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html). - -### Installation using dnf/yum (For RPM-based distros like Fedora) - -* Install the NVIDIA Container Toolkit packages - - ```bash -sudo dnf install -y nvidia-container-toolkit -``` - -:::note - The NVIDIA Container Toolkit is required on the host for running CUDA in containers. -::: -:::note - If the above installation is not working for you and you are running Fedora, try removing it and using the [COPR](https://copr.fedorainfracloud.org/coprs/g/ai-ml/nvidia-container-toolkit/) build instead. -::: - -### Installation using APT (For Debian-based distros like Ubuntu) - -* Configure the Production Repository - - ```bash -curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \ - sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg - - curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \ - sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \ - sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list -``` - -* Update the package list from the repository - - ```bash -sudo apt-get update -``` - -* Install the NVIDIA Container Toolkit packages - - ```bash -sudo apt-get install -y nvidia-container-toolkit -``` - -:::note - The NVIDIA Container Toolkit is required for WSL to have CUDA resources while running a container. -::: - -## Setting Up CUDA Support - - For additional information see: [Support for Container Device Interface](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html) - -# Generate the CDI specification file - - ```bash -sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml -``` - -# Check the names of the generated devices - - List the generated CDI devices: - - ```bash -nvidia-ctk cdi list - INFO[0000] Found 1 CDI devices - nvidia.com/gpu=all -``` - - :::note - Generate a new CDI specification after any configuration change, most notably when the driver is upgraded!
-::: - -## Testing the Setup - -**Based on this Documentation:** [Running a Sample Workload](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html) - ---- - -# **Test the Installation** - - Run the following command to verify setup: - - ```bash -podman run --rm --device=nvidia.com/gpu=all fedora nvidia-smi -``` - -# **Expected Output** - - Verify everything is configured correctly, with output similar to this: - - ```text -Thu Dec 5 19:58:40 2024 - +-----------------------------------------------------------------------------------------+ - | NVIDIA-SMI 565.72 Driver Version: 566.14 CUDA Version: 12.7 | - |-----------------------------------------+------------------------+----------------------+ - | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | - | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | - | | | MIG M. | - |=========================================+========================+======================| - | 0 NVIDIA GeForce RTX 3080 On | 00000000:09:00.0 On | N/A | - | 34% 24C P5 31W / 380W | 867MiB / 10240MiB | 7% Default | - | | | N/A | - +-----------------------------------------+------------------------+----------------------+ - - +-----------------------------------------------------------------------------------------+ - | Processes: | - | GPU GI CI PID Type Process name GPU Memory | - | ID ID Usage | - |=========================================================================================| - | 0 N/A N/A 35 G /Xwayland N/A | - | 0 N/A N/A 35 G /Xwayland N/A | - +-----------------------------------------------------------------------------------------+ -``` - - :::note - On systems that have SELinux enabled, it may be necessary to turn on the `container_use_devices` boolean in order to run the `nvidia-smi` command successfully from a container. -::: - - To check the status of the boolean, run the following: - - ```bash -getsebool container_use_devices -``` - - If the result of the command shows that the boolean is `off`, run the following to turn the boolean on: - - ```bash -sudo setsebool -P container_use_devices 1 -``` - -### CUDA_VISIBLE_DEVICES - -RamaLama respects the `CUDA_VISIBLE_DEVICES` environment variable if it's already set in your environment. If not set, RamaLama will default to using all the GPU detected by nvidia-smi. - -You can specify which GPU devices should be visible to RamaLama by setting this variable before running RamaLama commands: - -```bash -export CUDA_VISIBLE_DEVICES="0,1" # Use GPUs 0 and 1 -ramalama run granite -``` - -This is particularly useful in multi-GPU systems where you want to dedicate specific GPUs to different workloads. - -If the `CUDA_VISIBLE_DEVICES` environment variable is set to an empty string, RamaLama will default to using the CPU. - -```bash -export CUDA_VISIBLE_DEVICES="" # Defaults to CPU -ramalama run granite -``` - -To revert to using all available GPUs, unset the environment variable: - -```bash -unset CUDA_VISIBLE_DEVICES -``` - -## Troubleshooting - -### CUDA Updates - -On some CUDA software updates, RamaLama stops working complaining about missing shared NVIDIA libraries for example: - -```bash -ramalama run granite -Error: crun: cannot stat `/lib64/libEGL_nvidia.so.565.77`: No such file or directory: OCI runtime attempted to invoke a command that was not found -``` - -Because the CUDA version is updated, the CDI specification file needs to be recreated. 
- - ```bash -sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml -``` - -## See Also - -[ramalama(1)](/docs/commands/ramalama/), [podman(1)](https://github.com/containers/podman/blob/main/docs/source/markdown/podman.1.md) - ---- - -*Jan 2025, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/platform-guides/macos.mdx b/docsite/docs/platform-guides/macos.mdx deleted file mode 100644 index 97a2bb18..00000000 --- a/docsite/docs/platform-guides/macos.mdx +++ /dev/null @@ -1,67 +0,0 @@ ---- -title: macos -description: Platform-specific setup guide -# This file is auto-generated from manpages. Do not edit manually. -# Source: ramalama-macos.7.md ---- - -# macos - -# Configure Podman Machine on Mac for GPU Acceleration - -Leveraging GPU acceleration on a Mac with Podman requires the configuration of -the `libkrun` machine provider. - -This can be done by either setting an environment variable or modifying the -`containers.conf` file. On MacOS, you'll likely need to create a new Podman -machine with libkrun to access the GPU. - -Previously created Podman Machines must be recreated to take -advantage of the `libkrun` provider. - -## Configuration Methods: - -### containers.conf - -Open the containers.conf file, typically located at $HOME/.config/containers/containers.conf. - -Add the following line within the [machine] section: provider = "libkrun". -This change will persist across sessions. - -### Environment Variable -Set the CONTAINERS_MACHINE_PROVIDER environment variable to libkrun. This will be a temporary change until you restart your terminal or session. - -For example: export CONTAINERS_MACHINE_PROVIDER=libkrun - -### ramalama.conf - -RamaLama can also be run in a limited manner without using Containers, by -specifying the --nocontainer option. Open the ramalama.conf file, typically located at $HOME/.config/ramalama/ramalama.conf. - -Add the following line within the [machine] section: `container = false` -This change will persist across sessions. - -## Podman Desktop - -Creating a Podman Machine with libkrun (MacOS): - - Go to Settings > Resources in Podman Desktop. - -In the Podman tile, click Create new. -In the Create a Podman machine screen, you can configure the machine's resources (CPU, Memory, Disk size) and enable Machine with root privileges if needed. -To use libkrun, ensure that the environment variable is set or the containers.conf file is configured before creating the machine. -Once the machine is created, Podman Desktop will manage the connection to the new machine. - -## Important Notes: - -On MacOS, `libkrun` is used to leverage the system's virtualization framework for running containers, and it requires a Podman machine to be created. - -Refer to the [Podman Desktop documentation](https://podman-desktop.io/docs/podman/creating-a-podman-machine) for detailed instructions and troubleshooting tips. - -## See Also - -[ramalama(1)](/docs/commands/ramalama/), [podman-machine(1)](https://github.com/containers/podman/blob/main/docs/source/markdown/podman-machine.1.md) - ---- - -*Apr 2025, Originally compiled by Dan Walsh <dwalsh@redhat.com>* \ No newline at end of file diff --git a/docsite/docs/platform-guides/musa.mdx b/docsite/docs/platform-guides/musa.mdx deleted file mode 100644 index 1f6f92e9..00000000 --- a/docsite/docs/platform-guides/musa.mdx +++ /dev/null @@ -1,83 +0,0 @@ ---- -title: musa -description: Platform-specific setup guide -# This file is auto-generated from manpages. Do not edit manually. 
-# Source: ramalama-musa.7.md ---- - -# musa - -# Setting Up RamaLama with MUSA Support on Linux systems - -This guide walks through the steps required to set up RamaLama with MUSA support. - -## Install the MT Linux Driver - -Download the appropriate [MUSA SDK](https://developer.mthreads.com/sdk/download/musa) and follow the installation instructions provided in the [MT Linux Driver installation guide](https://docs.mthreads.com/musa-sdk/musa-sdk-doc-online/install_guide#2%E9%A9%B1%E5%8A%A8%E5%AE%89%E8%A3%85). - -## Install the MT Container Toolkit - -Obtain the latest [MT CloudNative Toolkits](https://developer.mthreads.com/sdk/download/CloudNative) and follow the installation instructions provided in the [MT Container Toolkit installation guide](https://docs.mthreads.com/cloud-native/cloud-native-doc-online/install_guide/#%E6%91%A9%E5%B0%94%E7%BA%BF%E7%A8%8B%E5%AE%B9%E5%99%A8%E8%BF%90%E8%A1%8C%E6%97%B6%E5%A5%97%E4%BB%B6). - -## Setting Up MUSA Support - - ```bash -$ (cd /usr/bin/musa && sudo ./docker setup $PWD) - $ docker info | grep mthreads - Runtimes: mthreads mthreads-experimental runc - Default Runtime: mthreads -``` - -## Testing the Setup - -# **Test the Installation** - - Run the following command to verify setup: - - ```bash -docker run --rm --env MTHREADS_VISIBLE_DEVICES=all ubuntu:22.04 mthreads-gmi -``` - -# **Expected Output** - - Verify everything is configured correctly, with output similar to this: - - ```text -Thu May 15 01:53:39 2025 - --------------------------------------------------------------- - mthreads-gmi:2.0.0 Driver Version:3.0.0 - --------------------------------------------------------------- - ID Name |PCIe |%GPU Mem - Device Type |Pcie Lane Width |Temp MPC Capable - | ECC Mode - +-------------------------------------------------------------+ - 0 MTT S80 |00000000:01:00.0 |0% 3419MiB(16384MiB) - Physical |16x(16x) |59C YES - | N/A - --------------------------------------------------------------- - - --------------------------------------------------------------- - Processes: - ID PID Process name GPU Memory - Usage - +-------------------------------------------------------------+ - No running processes found - --------------------------------------------------------------- -``` - -### MUSA_VISIBLE_DEVICES - -RamaLama respects the `MUSA_VISIBLE_DEVICES` environment variable if it's already set in your environment. If not set, RamaLama will default to using all the GPU detected by mthreads-gmi. - -You can specify which GPU devices should be visible to RamaLama by setting this variable before running RamaLama commands: - -```bash -export MUSA_VISIBLE_DEVICES="0,1" # Use GPUs 0 and 1 -ramalama run granite -``` - -This is particularly useful in multi-GPU systems where you want to dedicate specific GPUs to different workloads. - ---- - -*May 2025, Originally compiled by Xiaodong Ye <yeahdongcn@gmail.com>* \ No newline at end of file