mirror of https://github.com/containers/ramalama.git synced 2026-02-05 15:47:26 +01:00

2360 Commits

Author SHA1 Message Date
Eric Curtin
057c19e8d2 Remove libexec files
This is breaking nocontainer invocations; Python package managers
don't recognize libexec files and replace the shebang.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-06-11 05:20:49 +01:00
Daniel J Walsh
d98adcbc9f Merge pull request #1499 from containers/update-shortnames
This is not a multi-model model
2025-06-10 23:43:49 -04:00
Eric Curtin
6959d73d30 Merge pull request #1501 from alaviss/push-tumrzqxpzvkn
amdkfd: add constants for heap types
2025-06-10 17:41:49 -05:00
Leorize
309766dd8c amdkfd: add constants for heap types
Signed-off-by: Leorize <leorize+oss@disroot.org>
2025-06-10 17:22:30 -05:00
Eric Curtin
4808a49de0 Merge pull request #1500 from alaviss/push-pwxuznmnqptr
Only enumerate ROCm-capable AMD GPUs
2025-06-10 17:02:17 -05:00
Leorize
db4a7d24af Apply formatting fixes
Signed-off-by: Leorize <leorize+oss@disroot.org>
2025-06-10 15:20:18 -05:00
Leorize
93e36ac24e Extract VRAM minimum into a constant
Signed-off-by: Leorize <leorize+oss@disroot.org>
2025-06-10 15:17:37 -05:00
Leorize
ecb9fb086f Extract amdkfd utilities to its own module
Signed-off-by: Leorize <leorize+oss@disroot.org>
2025-06-10 15:17:20 -05:00
Leorize
fab87654cb Only enumerate ROCm-capable AMD GPUs
Discover AMD graphics devices using AMDKFD topology instead of
enumerating the PCIe bus. This interface exposes a lot more information
about potential devices, allowing RamaLama to filter out unsupported
devices.

Currently, devices older than GFX9 are filtered, as they are no longer
supported by ROCm.

Signed-off-by: Leorize <leorize+oss@disroot.org>
2025-06-10 14:54:48 -05:00
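The filtering described in fab87654cb can be sketched roughly as below. This is not the code from the commit. The sysfs path, the `gfx_target_version` property name, and the 90000 threshold (assuming the version is encoded as major * 10000 + minor * 100 + stepping) are assumptions made for illustration, as are all function names.

```python
from pathlib import Path

# Assumed sysfs location of the AMDKFD topology; not taken from the commit.
KFD_NODES = Path("/sys/class/kfd/kfd/topology/nodes")
# Assumed encoding: major * 10000 + minor * 100 + stepping, so GFX9 starts at 90000.
MIN_GFX_TARGET_VERSION = 90000


def parse_properties(text):
    """Parse 'key value' lines from a node's properties file into a dict of ints."""
    props = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            props[parts[0]] = int(parts[1])
    return props


def is_rocm_capable(props, minimum=MIN_GFX_TARGET_VERSION):
    """Nodes without a GPU are assumed to report 0, so they are filtered out too."""
    return props.get("gfx_target_version", 0) >= minimum


def rocm_capable_nodes(root=KFD_NODES):
    """Yield (node_name, props) for each topology node that passes the filter."""
    if not root.is_dir():
        return
    for node in sorted(root.iterdir()):
        try:
            props = parse_properties((node / "properties").read_text())
        except OSError:
            continue  # unreadable node; skip it rather than fail discovery
        if is_rocm_capable(props):
            yield node.name, props
```

On a machine without the amdkfd driver, `list(rocm_capable_nodes())` is simply empty, which keeps the caller free of special cases.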
Eric Curtin
9bc76c2757 This is not a multi-model model
Although the other gemma ones are. Point the user towards a single
gguf.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-06-10 18:43:06 +01:00
Daniel J Walsh
83a75f16f7 Merge pull request #1492 from containers/renovate/registry.access.redhat.com-ubi9-ubi-9.x
chore(deps): update registry.access.redhat.com/ubi9/ubi docker tag to v9.6-1749542372
2025-06-10 08:42:14 -04:00
Daniel J Walsh
8a9f6a0291 Merge pull request #1496 from containers/fix-build
Install uv to fix build issue
2025-06-10 08:32:17 -04:00
Eric Curtin
b21556b513 Install uv to fix build issue
Run the install-uv.sh script.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-06-10 13:14:56 +01:00
Daniel J Walsh
4be8cbc71e Merge pull request #1495 from containers/dont-use-llvmpipe
There's a change that we want that avoids using software rasterizers
2025-06-10 08:08:50 -04:00
Eric Curtin
b4a3375d94 There's a change that we want that avoids using software rasterizers
It avoids using llvmpipe when Vulkan is built in and falls back to
ggml-cpu.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-06-10 13:05:31 +01:00
Daniel J Walsh
7bdd073b59 Merge pull request #1491 from makllama/xd/fix_hf
Fix #1489
2025-06-10 05:25:40 -04:00
renovate[bot]
5b849722cb chore(deps): update registry.access.redhat.com/ubi9/ubi docker tag to v9.6-1749542372
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2025-06-10 09:22:45 +00:00
Daniel J Walsh
5925bb6908 Merge pull request #1490 from rhatdan/llama-stack
Make sure llama-stack URL is shown to user
2025-06-10 05:22:05 -04:00
Xiaodong Ye
ae0775afd1 Address review comments
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-06-10 16:45:47 +08:00
Xiaodong Ye
6f020d361c Fix #1489
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-06-10 16:39:26 +08:00
Daniel J Walsh
764fc2d829 Make sure llama-stack URL is shown to user
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2025-06-10 09:50:04 +02:00
Daniel J Walsh
b64d82276c Merge pull request #1471 from rhatdan/oci
Throw exception when using OCI without engine
2025-06-10 03:36:20 -04:00
Daniel J Walsh
041c05d2b8 Throw exception when using OCI without engine
Fixes: https://github.com/containers/ramalama/issues/1463

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2025-06-10 08:46:01 +02:00
Daniel J Walsh
97a14e9c2d Merge pull request #1486 from containers/remove-duplicate-line-on-restapi
Only print this in the llama-stack case
2025-06-10 00:09:54 -04:00
Eric Curtin
2368da00ac Only print this in the llama-stack case
In the llama.cpp case it doesn't make as much sense, llama-server
prints this string when it's ready to be served like so:

main: server is listening on http://0.0.0.0:8080 - starting the main loop

This can be printed seconds or minutes too early potentially in
the llama.cpp case.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-06-09 15:25:08 +01:00
Daniel J Walsh
c62acfbba6 Merge pull request #1484 from rhatdan/VERSION
Bump to v0.9.1
v0.9.1
2025-06-09 08:37:35 -04:00
Daniel J Walsh
9c639fc651 Bump to v0.9.1
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2025-06-09 14:37:05 +02:00
Daniel J Walsh
bbcfb7c0f1 Fix llama-stack
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2025-06-09 14:37:05 +02:00
Daniel J Walsh
3317372625 Merge pull request #1474 from rhatdan/demos
Update demos to show serving models.
2025-06-09 03:35:06 -04:00
Daniel J Walsh
cd2a8c3539 Update demo scripts to show serve
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2025-06-09 09:34:36 +02:00
Daniel J Walsh
fe6d90461f Merge pull request #1472 from rhatdan/llama-stack
Fix handling of generate with llama-stack
2025-06-09 03:29:53 -04:00
Daniel J Walsh
e4ea40a1b8 Merge pull request #1483 from containers/renovate/huggingface-hub-0.x
fix(deps): update dependency huggingface-hub to ~=0.32.4
2025-06-09 00:14:15 -04:00
renovate[bot]
9627b5617b fix(deps): update dependency huggingface-hub to ~=0.32.4
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2025-06-08 20:35:25 +00:00
Eric Curtin
4a10c02716 Merge pull request #1481 from ieaves/imp/dev-dependency-groups
Adds dev dependency groups
2025-06-08 15:34:54 -05:00
Daniel J Walsh
4fe7ae73a1 Fix stopping of llama-stack based containers by name
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2025-06-08 11:54:24 +02:00
Daniel J Walsh
2ca6b57dc3 Fix handling of generate with llama-stack
llama-stack API is not working without --generate command.

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2025-06-07 10:36:46 +02:00
Ian Eaves
f65529bda7 adds dev dependency groups
Signed-off-by: Ian Eaves <ian.k.eaves@gmail.com>
2025-06-06 18:12:33 -05:00
Nathan Weinberg
268e47ccc0 Merge pull request #1478 from nathan-weinberg/stack-bump
chore: bump 'ramalama-stack' version to 0.2.0
2025-06-05 16:15:03 -04:00
Nathan Weinberg
c59a507426 chore: bump 'ramalama-stack' version to 0.2.0
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-06-05 15:00:11 -04:00
Daniel J Walsh
fc9b33e436 Merge pull request #1477 from containers/no-warmup
Don't warmup by default
2025-06-05 14:46:30 -04:00
Eric Curtin
8d2041a0bb Don't warmup by default
llama-server by default warms up the model with an empty run for
performance reasons. We can warm up ourselves with a real query.
Warming up was causing issues and delaying start time.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-06-05 19:42:41 +01:00
Daniel J Walsh
a67d8c1f6a Merge pull request #1476 from containers/env-var
Call set_gpu_type_env_vars rather than set_accel_env_vars
2025-06-05 14:05:08 -04:00
Eric Curtin
882011029c Call set_gpu_type_env_vars rather than set_accel_env_vars
For GPU detection.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-06-05 16:43:48 +01:00
Daniel J Walsh
f07a062124 Merge pull request #1475 from containers/env-var
Do not override a small subset of env vars
2025-06-05 11:00:31 -04:00
Eric Curtin
ff446f96fb Do not override a small subset of env vars
RamaLama does not try to detect GPU if the user has already set
certain env vars. Make this list smaller.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-06-05 14:01:45 +01:00
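The behaviour described in ff446f96fb (skip GPU detection when the user has already set certain variables) can be sketched as follows. The variable names in `USER_OVERRIDE_VARS` and the `detect` callable are hypothetical; the commit message does not list which variables remain in the reduced set.

```python
# Hypothetical names for illustration; the commit does not list the variables.
USER_OVERRIDE_VARS = ("CUDA_VISIBLE_DEVICES", "HIP_VISIBLE_DEVICES")


def user_override(environ):
    """Return the first override variable the user has set to a non-empty value."""
    for name in USER_OVERRIDE_VARS:
        if environ.get(name):
            return name
    return None


def gpu_env(environ, detect):
    """Run detect() only when no user override is present.

    detect is a hypothetical callable returning a dict of variables to export.
    """
    if user_override(environ) is not None:
        return {}  # respect the user's choice; export nothing extra
    return detect()
```

Keeping the override list short means fewer unrelated variables can accidentally disable detection, which appears to be the motivation for the change.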
Daniel J Walsh
ef7bd2a004 Merge pull request #1467 from rhatdan/llama-stack
llama-stack container build fails with == 1.5.0
2025-06-05 01:39:04 -04:00
Daniel J Walsh
b990ef0392 Merge pull request #1469 from containers/timeout-change
Change timeouts
2025-06-04 20:13:44 -04:00
Eric Curtin
0bcf3b8308 Merge pull request #1468 from waltdisgrace/documentation_improvements
Documentation improvements
2025-06-04 11:38:55 -05:00
Eric Curtin
0455e45073 Change timeouts
The most we want to sleep between request attempts is 100ms; a request
every 100ms isn't that expensive.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-06-04 17:37:11 +01:00
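A minimal sketch of the idea in 0455e45073: poll a server for readiness while never sleeping longer than 100ms between attempts. Only the 100ms cap comes from the commit message. The doubling backoff, the default timeout, and every name below are assumptions for illustration.

```python
import time

MAX_SLEEP = 0.1  # seconds; the 100ms cap named in the commit message


def wait_until_ready(probe, timeout=10.0, initial_delay=0.01,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or the timeout expires.

    The delay doubles after each failed attempt (an assumption) but is
    clamped to MAX_SLEEP, so attempts are never more than 100ms apart.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(min(delay, MAX_SLEEP))
        delay *= 2
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; a caller would normally pass just `probe`, for example a function that attempts an HTTP request and returns whether it succeeded.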
Grace Chin
c4777a9ccc Add documentation about running tests
Signed-off-by: Grace Chin <gchin@redhat.com>
2025-06-04 11:55:57 -04:00