* Start adding rpm/ramalama.spec for Fedora
Add a ramalama.spec to sit next to python-ramalama.spec while we get
this reviewed. Change various configs so they are aware of
ramalama.spec.
Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
* Add needed Obsoletes/Provides in the base rpm to start the process.
Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
* Try to fix CI problems with the initial MR
The initial MR put two spec files in the same directory, which was
causing problems with the CI. This splits them into separate
directories, which should allow the tooling to work.
Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
* Finish the move of the Fedora RPM package to its new name.
Update the various files needed to allow the new RPM package
`ramalama` to build in Fedora infrastructure instead of python3-ramalama.
Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
* Fix a problem with path names that lsm5 caught
Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
---------
Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
Co-authored-by: Stephen Smoogen <ssmoogen@redhat.com>
mlx_lm.server is the only one in my PATH, at least on my system.
Also, we were printing output like this, which doesn't make sense:
Downloading huggingface://RedHatAI/Llama-3.2-1B-Instruct-FP8-dynamic/model.safetensors:latest ...
Trying to pull huggingface://RedHatAI/Llama-3.2-1B-Instruct-FP8-dynamic/model.safetensors:latest ...
Also remove the recommendation to install via `brew install ramalama`, since it skips installing Apple-specific
dependencies.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
This checked-in file is an exact copy of:
curl -LsSfO https://astral.sh/uv/0.7.2/install.sh
Checking in the 0.7.2 version so that a user can install with access
to github.com alone, even if astral.sh is down for whatever reason.
We may want to update the uv installer from time to time.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
We are reaching the limits of what we can do in a "podman run"
line. Create wrapper functions so we can fork processes and do
other similar things that need to happen inside a container in
python3. There are some features coming up where, rather than
upstreaming separate solutions to all our engines like vLLM and
llama.cpp, we want to solve the problem in the python3 layer.
The "if True:"s will remain for a while; we may need to wait for
containers to be distributed around the place before we turn things
on.
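A minimal sketch of the kind of wrapper this enables, assuming a hypothetical run_engine() entry point rather than the actual RamaLama code: it runs inside the container, forks the engine as a child process and forwards signals to it.
```python
# Hypothetical python3-layer wrapper run inside the container; run_engine()
# and the example invocation are illustrative, not the real RamaLama API.
import signal
import subprocess
import sys


def run_engine(argv):
    """Fork the inference engine, forward signals, and return its exit code."""
    proc = subprocess.Popen(argv)

    def forward(signum, frame):
        proc.send_signal(signum)

    signal.signal(signal.SIGTERM, forward)
    signal.signal(signal.SIGINT, forward)
    return proc.wait()


if __name__ == "__main__":
    # e.g. python3 wrapper.py llama-server --port 8080
    sys.exit(run_engine(sys.argv[1:]))
```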
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
This makes sure the GPU detection techniques are the same
throughout the project. We do not display detailed accelerator info;
we leave that to tools like "fastfetch". It is hard to maintain and
there are no standards.
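For illustration, the shared helper boils down to something like the sketch below; the device probes here are assumptions for the sketch, not necessarily the exact checks the project uses.
```python
# Illustrative shared accelerator check; returns a coarse name only, no
# detailed hardware info. The probes below are assumptions for this sketch.
import os
import platform
import sys


def detect_accel():
    if sys.platform == "darwin" and platform.machine() == "arm64":
        return "metal"
    if os.path.exists("/dev/kfd"):        # AMD ROCm kernel driver node
        return "hip"
    if os.path.exists("/dev/nvidiactl"):  # NVIDIA driver control device
        return "cuda"
    return "cpu"
```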
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
Usually, the chat templates for GGUF models are written as Jinja templates.
Ollama, however, uses Go templates specific to Ollama. In order to use the
proper templates for models pulled from Ollama, the chat templates are
converted to Jinja templates and passed to llama-run.
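As a rough illustration of the conversion (the real converter covers many more constructs), a couple of common Ollama template forms map to Jinja like this:
```python
# Naive sketch of mapping two common Ollama Go-template constructs to Jinja;
# the converter shipped with this change handles far more cases.
import re


def go_to_jinja(tmpl):
    tmpl = re.sub(r"\{\{-?\s*if\s+\.(\w+)\s*-?\}\}",
                  lambda m: "{% if " + m.group(1).lower() + " %}", tmpl)
    tmpl = re.sub(r"\{\{-?\s*end\s*-?\}\}", "{% endif %}", tmpl)
    tmpl = re.sub(r"\{\{-?\s*\.(\w+)\s*-?\}\}",
                  lambda m: "{{ " + m.group(1).lower() + " }}", tmpl)
    return tmpl


print(go_to_jinja("{{ if .System }}{{ .System }}: {{ end }}{{ .Prompt }}"))
# -> {% if system %}{{ system }}: {% endif %}{{ prompt }}
```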
Signed-off-by: Michael Engel <mengel@redhat.com>
ramalama-serve-core is intended to act as a proxy and implement
multiple models. ramalama-client-core is intended to act as an OpenAI
client. ramalama-run-core is intended to act as ramalama-serve-core +
ramalama-client-core; both processes will die on completion of
ramalama-run-core.
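A rough sketch of the run-core behaviour; the port, flags and URL below are placeholders for this sketch, not the real CLI.
```python
# ramalama-run-core in a nutshell: start the serve process, run the client
# against it, and tear the server down when the client finishes.
import subprocess

server = subprocess.Popen(["ramalama-serve-core", "--port", "8080"])
try:
    subprocess.run(["ramalama-client-core", "http://127.0.0.1:8080"])
finally:
    server.terminate()
    server.wait()
```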
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
In cli.py we already load and merge configuration from various sources
and set defaults in load_and_merge_config(). However, we still define
defaults when getting config values in various places.
In order to streamline this, the merged config is now provided by a
dedicated config.py module. Also, access to values is changed from .get
to access by index, since a missing key is a bug and should throw an error.
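Roughly, the intent looks like this; the defaults, path and keys below are stand-ins, not the real configuration schema.
```python
# config.py idea in miniature: merge once, then access by index so a missing
# key raises instead of silently picking up yet another default. The defaults,
# path and keys here are stand-ins for the real ramalama.conf handling.
import os
import tomllib  # Python 3.11+

DEFAULTS = {"engine": "podman", "image": "quay.io/ramalama/ramalama"}


def load_and_merge_config(path="/etc/ramalama/ramalama.conf"):
    merged = dict(DEFAULTS)
    if os.path.exists(path):
        with open(path, "rb") as f:
            merged.update(tomllib.load(f).get("ramalama", {}))
    return merged


CONFIG = load_and_merge_config()
engine = CONFIG["engine"]  # a KeyError here is a bug, not something to paper over
```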
Signed-off-by: Michael Engel <mengel@redhat.com>
Defaulting to that on platforms that have dnf; if it fails for
whatever reason, fall back to this script.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
AI models are shipped with a lot of metadata, such as
the architecture used, the chat template required and so on.
In order to make these available to the user, the new CLI command
inspect, with support for the --all and --json options, has been
implemented.
At the moment the GGUF file format - which includes the model as
well as the metadata in one file - is fully supported.
Other formats, where the model and its information are stored in different
files, are not (yet) supported and only display basic information
such as the model name, path and registry.
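For reference, the metadata lives in the GGUF header itself; below is a minimal sketch of reading the fixed part of that header (the real inspect code also decodes the key/value pairs that follow it).
```python
# Minimal GGUF header read: magic, version, tensor count and metadata
# key/value count, per the GGUF layout. Sketch only; the actual inspect
# command goes on to parse the key/value pairs as well.
import struct


def read_gguf_header(path):
    with open(path, "rb") as f:
        magic = f.read(4)                           # b"GGUF"
        version, = struct.unpack("<I", f.read(4))
        n_tensors, = struct.unpack("<Q", f.read(8))
        n_kv, = struct.unpack("<Q", f.read(8))
    return magic, version, n_tensors, n_kv
```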
Signed-off-by: Michael Engel <mengel@redhat.com>
It shouldn't be there. Also remove the -qq; it makes it feel
like the script isn't making progress, as podman takes a while to
install.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
- If a timeout happens, we retry 5 times before
  sending a Timeout error to the user (see the retry sketch below).
- Improve the user experience when a timeout occurs.
- Add a console source tree for handling messages.
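A sketch of the retry loop; the pull callable and the fixed delay are placeholders for whatever performs the actual download.
```python
# Retry a pull up to 5 times on timeout before surfacing the error to the
# user. The pull callable and delay are placeholders for this sketch.
import time


def pull_with_retries(pull, retries=5, delay=1):
    for attempt in range(1, retries + 1):
        try:
            return pull()
        except TimeoutError:
            if attempt == retries:
                raise
            print(f"Timeout, retrying ({attempt}/{retries}) ...")
            time.sleep(delay)
```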
Resolves: https://github.com/containers/ramalama/issues/693
Signed-off-by: Douglas Schilling Landgraf <dougsland@redhat.com>
If you have git cloned the project, you now have the option of
doing:
./install.sh -l
And it will install the version from the git repo.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
Quadlets can directly execute a kube.yaml file; this option
makes it easy for users to generate a quadlet that executes the
kube.yaml file.
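For illustration, generating such a quadlet is little more than writing a .kube unit that points at the YAML. The keys follow podman-systemd.unit(5), while the unit name and paths here are made up for the sketch.
```python
# Write a .kube quadlet that runs a kube.yaml via podman's systemd generator.
# The unit name and YAML path below are made up for this sketch.
def kube_quadlet(name, yaml_path):
    return (
        "[Unit]\n"
        f"Description=RamaLama {name} kube.yaml\n"
        "\n"
        "[Kube]\n"
        f"Yaml={yaml_path}\n"
        "\n"
        "[Install]\n"
        "WantedBy=default.target\n"
    )


with open("myapp.kube", "w") as f:
    f.write(kube_quadlet("myapp", "/usr/share/ramalama/myapp.yaml"))
```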
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
We need to be able to handle different kinds of quadlets.
For path-based images this generates just a .container file, while
for OCI images it creates a .container, .volume and .image file.
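The dispatch amounts to something like the sketch below; the is_oci flag stands in for however the image type is actually determined.
```python
# Path-based images need only a .container unit; OCI images additionally get
# .volume and .image units. The is_oci flag is a stand-in for the real check.
def quadlet_files(name, is_oci):
    files = [f"{name}.container"]
    if is_oci:
        files += [f"{name}.volume", f"{name}.image"]
    return files


print(quadlet_files("tinyllama", is_oci=True))
# ['tinyllama.container', 'tinyllama.volume', 'tinyllama.image']
```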
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
We want to try installing via pipx, so I wrote a script to remove the
old install. Also converted the Python install script back to shell;
the Python experience on a new Mac is just not nice in comparison to
bash.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>