mirror of https://github.com/containers/ramalama.git synced 2026-02-05 06:46:39 +01:00

16 Commits

Author SHA1 Message Date
Oliver Walsh
e956d11d70 Use default (auto) value for llama.cpp flash-attn
Also fix the uses_nvidia logic, which was inverted.

Signed-off-by: Oliver Walsh <owalsh@redhat.com>
2026-01-29 12:21:11 +00:00
Ian Eaves
38e5e5cf8d adds benchmark metrics persistence
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
2026-01-24 22:06:46 -06:00
Oliver Walsh
867e1e6865 Fix vLLM inference spec and chat client
Omit the previous command ("binary" in the spec), as the vLLM images use the
entrypoint to run vLLM.

Fix --served-model-name option.
This referred to a non-existent model.model_name ctx variable.
Use model.alias instead.

Use exactly the same model name in the chat client.
vLLM will exit when this does not match the served-model-name.

Signed-off-by: Oliver Walsh <owalsh@redhat.com>
2026-01-24 00:51:23 +00:00
Mike Bonnet
162cc543f1 serve --rag: run rag_framework in a separate container
rag_framework is now a proxy that enriches requests to the LLM with RAG context. Run it in a separate
container and send requests from the chat interface to the RAG proxy.

Generate the rag_framework command using CommandFactory.

Signed-off-by: Mike Bonnet <mikeb@redhat.com>
2025-10-23 22:02:26 -07:00
Mike Bonnet
2be5d6f8eb run --rag: run rag_framework in a separate container
rag_framework is now a proxy that enriches requests to the LLM with RAG context. Run it in a separate
container and send requests from the chat interface to the RAG proxy.

Generate the rag_framework command using CommandFactory.

Signed-off-by: Mike Bonnet <mikeb@redhat.com>
2025-10-23 22:02:26 -07:00
Mike Bonnet
49cdf6e445 rag: performance improvements
Update doc2rag and rag_framework to load models from the local filesystem only, avoiding
unnecessary round-trips to external repos.

Convert rag_framework to use async vector db clients.

Pass the --debug option through to the scripts.

Signed-off-by: Mike Bonnet <mikeb@redhat.com>
2025-10-23 22:02:26 -07:00
Mike Bonnet
c0c75b38db convert: run conversion and quantization in runtime containers
Move model conversion and quantization from the build process into
separate runtime operations, and build the results into a container,
simplifying the process.

Use Engine and BuildEngine to handle the container manager operations
and reduce direct command execution.

Use the inference spec to define the "convert" and "quantize" interfaces.

Signed-off-by: Mike Bonnet <mikeb@redhat.com>
2025-10-23 16:16:32 -07:00
Mike Bonnet
7341685904 rag: generate doc2rag command-line with CommandFactory, and execute it with Engine
Use the CommandFactory to generate the doc2rag command-line, and create a subclass of Engine
to handle executing it in a container.

Add a dedicated mapping of env vars to rag images in the config, and use that to select the
rag image in the CLI, making image selection more consistent.

Signed-off-by: Mike Bonnet <mikeb@redhat.com>
2025-10-23 16:16:32 -07:00
Michael Engel
941ae1b2d9 Added --max-tokens to llama.cpp inference spec
Relates to: https://github.com/containers/ramalama/pull/1982

Previously, the --max-tokens parameter was integrated into the daemon's internal
command factory. With the introduction of the spec, this command factory
has been replaced by the spec, and the --max-tokens option has been added to
the llama.cpp spec.

Signed-off-by: Michael Engel <mengel@redhat.com>
2025-10-23 14:19:49 +02:00
Kush Gupta
52aaea4ae6 support safetensors models across runtimes
Signed-off-by: Kush Gupta <kushalgupta@gmail.com>
2025-10-06 15:26:08 -04:00
Oliver Walsh
89cd6c360b Fix typo in llama.cpp engine spec
Signed-off-by: Oliver Walsh <owalsh@redhat.com>
2025-10-02 10:59:17 +01:00
Michael Engel
0991c83259 Use yaml anchor for options
Signed-off-by: Michael Engel <mengel@redhat.com>
2025-10-01 16:02:43 +02:00
Michael Engel
403c310b07 Moved hard-coded bench to llama.cpp spec
Signed-off-by: Michael Engel <mengel@redhat.com>
2025-10-01 16:02:43 +02:00
Michael Engel
299e54ecb4 Moved hard-coded perplexity to llama.cpp spec
Signed-off-by: Michael Engel <mengel@redhat.com>
2025-10-01 16:02:43 +02:00
Michael Engel
09277eecb8 Replace eval() with more Jinja templating
Signed-off-by: Michael Engel <mengel@redhat.com>
2025-09-30 17:57:10 +02:00
Michael Engel
5aa484d4c4 Added command builder based on external specification
Signed-off-by: Michael Engel <mengel@redhat.com>
2025-09-30 17:57:10 +02:00