Omit the previous command ("binary" in the spec) as the vLLM images use the
entrypoint to run vLLM.
Fix the --served-model-name option.
It referred to a non-existent model.model_name ctx variable; use model.alias instead.
Use precisely the same model name from the chat client, since vLLM will exit when it
does not match the served-model-name.
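For illustration only, a minimal sketch of the name-matching requirement (the model
name, port, and paths below are assumptions, not values from this change):

    # Server side (illustrative): vLLM started with an explicit served model name.
    #   vllm serve /path/to/model --served-model-name granite-alias
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
    resp = client.chat.completions.create(
        model="granite-alias",  # must match --served-model-name exactly
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)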
Signed-off-by: Oliver Walsh <owalsh@redhat.com>
rag_framework is now a proxy that enriches requests to the LLM with RAG context. Run it in a separate
container and send requests from the chat interface to the RAG proxy.
Generate the rag_framework command using CommandFactory.
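As a rough illustration of the proxy pattern (endpoint paths, service names, and the
retrieval helper are assumptions, not rag_framework's actual code):

    # Hypothetical sketch: enrich the chat request with retrieved context,
    # then forward it to the backing LLM.
    import httpx
    from fastapi import FastAPI, Request

    app = FastAPI()
    LLM_URL = "http://llm:8080/v1/chat/completions"  # assumed backend address

    def retrieve_context(question: str) -> str:
        # Placeholder for the vector-store lookup performed by the RAG proxy.
        return "relevant documentation snippets for: " + question

    @app.post("/v1/chat/completions")
    async def chat(request: Request):
        body = await request.json()
        question = body["messages"][-1]["content"]
        body["messages"].insert(0, {
            "role": "system",
            "content": "Use this context when answering:\n" + retrieve_context(question),
        })
        async with httpx.AsyncClient() as client:
            resp = await client.post(LLM_URL, json=body)
        return resp.json()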
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
Update doc2rag and rag_framework to load models from the local filesystem only, avoiding
unnecessary round-trips to external repos.
Convert rag_framework to use async vector db clients.
Pass the --debug option through to the scripts.
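A hedged sketch of the two ideas, assuming a Qdrant store and a sentence-transformers
embedder (paths, collection name, and backend choices are assumptions, not the code
changed here):

    # Load the embedding model strictly from the local filesystem and query the
    # vector database with an async client.
    import os
    from qdrant_client import AsyncQdrantClient
    from sentence_transformers import SentenceTransformer

    os.environ["HF_HUB_OFFLINE"] = "1"                   # no round-trips to external repos
    embedder = SentenceTransformer("/models/embedder")   # local path only

    async def top_hits(question: str, k: int = 5):
        client = AsyncQdrantClient(url="http://localhost:6333")
        vector = embedder.encode(question).tolist()
        return await client.search(collection_name="docs", query_vector=vector, limit=k)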
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
Move model conversion and quantization from the build process into
separate runtime operations, and build the results into a container,
simplifying the process.
Use Engine and BuildEngine to handle the container manager operations
and reduce direct command execution.
Use the inference spec to define the "convert" and "quantize" interfaces.
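Purely illustrative sketch of the runtime flow; the image names, paths, and the
quantization tool invocation are assumptions and not the code added here:

    # Run the conversion/quantization step in a container at runtime, then build
    # the result into its own image.
    import subprocess

    def quantize_and_package(model_dir: str, tag: str) -> None:
        subprocess.run(
            ["podman", "run", "--rm", "-v", f"{model_dir}:/models",
             "quay.io/example/converter:latest",
             "llama-quantize", "/models/model.gguf", "/models/model-q4.gguf", "Q4_K_M"],
            check=True,
        )
        subprocess.run(["podman", "build", "-t", tag, model_dir], check=True)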
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
Use the CommandFactory to generate the doc2rag command-line, and create a subclass of Engine
to handle executing it in a container.
Add a dedicated mapping of env vars to rag images in the config, and use that to select the
rag image in the cli, making image selection more consistent.
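Hypothetical illustration of the selection logic (the image names and the exact set of
environment variables are assumptions): a config-level mapping from accelerator env
vars to RAG images, consulted by the CLI when picking the image to run.

    import os

    RAG_IMAGES = {
        "CUDA_VISIBLE_DEVICES": "quay.io/example/ramalama-rag-cuda",
        "HIP_VISIBLE_DEVICES": "quay.io/example/ramalama-rag-rocm",
    }
    DEFAULT_RAG_IMAGE = "quay.io/example/ramalama-rag"

    def select_rag_image() -> str:
        for env_var, image in RAG_IMAGES.items():
            if env_var in os.environ:
                return image
        return DEFAULT_RAG_IMAGE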
Signed-off-by: Mike Bonnet <mikeb@redhat.com>
Relates to: https://github.com/containers/ramalama/pull/1982
Previously, the --max-tokens param was handled by the daemon's internal command
factory. With the introduction of the spec, that command factory has been replaced
and the --max-tokens option is now added to the llama.cpp spec.
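A hedged sketch of the idea, not the spec's actual structure (the helper name is an
assumption; --n-predict is llama.cpp's server flag for limiting generated tokens):

    def llama_cpp_args(max_tokens: int | None) -> list[str]:
        args = ["llama-server", "--host", "0.0.0.0"]
        if max_tokens:
            args += ["--n-predict", str(max_tokens)]  # llama.cpp equivalent of --max-tokens
        return args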
Signed-off-by: Michael Engel <mengel@redhat.com>