# ramalama

The Ramalama project's goal is to make working with AI boring
through the use of OCI containers.

On first run, Ramalama inspects your system for GPU support, falling back to CPU
support if no GPUs are present. It then uses container engines like Podman to
pull the appropriate OCI image with all of the software necessary to run an
AI Model for your system's setup. This eliminates the need for the user to
configure the system for AI themselves. After this initialization, Ramalama
runs the AI Models within a container based on the OCI image.
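
For reference, the image pull Ramalama performs behind the scenes is roughly equivalent to pulling the runtime image yourself with Podman. A minimal sketch, assuming the default `quay.io/ramalama/ramalama` image that appears in the `podman ps` output later in this README:

```
# Hypothetical manual equivalent of Ramalama's first-run setup:
# pull the default runtime image, which bundles the inference software.
podman pull quay.io/ramalama/ramalama:latest
```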
## Install

Install Ramalama by running this one-liner (on macOS run without sudo):

Linux:

```
curl -fsSL https://raw.githubusercontent.com/containers/ramalama/s/install.py | sudo python3
```

macOS:

```
curl -fsSL https://raw.githubusercontent.com/containers/ramalama/s/install.py | python3
```
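
To confirm the install worked, you can print the installed version using the `version` command documented in the COMMANDS table below:

```
$ ramalama version
```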
## Hardware Support

| Hardware                           | Enabled            |
| ---------------------------------- | ------------------ |
| CPU                                | :white_check_mark: |
| Apple Silicon GPU (macOS)          | :white_check_mark: |
| Apple Silicon GPU (podman-machine) | :x:                |
| Nvidia GPU (cuda)                  | :x:                |
| AMD GPU (rocm)                     | :x:                |
## COMMANDS

| Command                                                  | Description                                   |
| -------------------------------------------------------- | --------------------------------------------- |
| [ramalama-containers(1)](docs/ramalama-containers.1.md)  | List all ramalama containers.                 |
| [ramalama-list(1)](docs/ramalama-list.1.md)              | List all AI models in local storage.          |
| [ramalama-login(1)](docs/ramalama-login.1.md)            | Login to remote model registry.               |
| [ramalama-logout(1)](docs/ramalama-logout.1.md)          | Logout from remote model registry.            |
| [ramalama-pull(1)](docs/ramalama-pull.1.md)              | Pull AI Models into local storage.            |
| [ramalama-push(1)](docs/ramalama-push.1.md)              | Push AI Model (OCI-only at present).          |
| [ramalama-rm(1)](docs/ramalama-rm.1.md)                  | Remove specified AI Model from local storage. |
| [ramalama-run(1)](docs/ramalama-run.1.md)                | Run specified AI Model as a chatbot.          |
| [ramalama-serve(1)](docs/ramalama-serve.1.md)            | Serve specified AI Model as an API server.    |
| [ramalama-stop(1)](docs/ramalama-stop.1.md)              | Stop ramalama container running an AI Model.  |
| [ramalama-version(1)](docs/ramalama-version.1.md)        | Display the ramalama version.                 |
## Usage

### Running Models

You can `run` a chatbot on a model using the `run` command. By default, it pulls from the ollama registry.

Note: Ramalama will inspect your machine for native GPU support and then will
use a container engine like Podman to pull an OCI container image with the
appropriate code and libraries to run the AI Model. This can take a long time to set up, but only on the first run.

```
$ ramalama run instructlab/merlinite-7b-lab
Copying blob 5448ec8c0696 [--------------------------------------] 0.0b / 63.6MiB (skipped: 0.0b = 0.00%)
Copying blob cbd7e392a514 [--------------------------------------] 0.0b / 65.3MiB (skipped: 0.0b = 0.00%)
Copying blob 5d6c72bcd967 done 208.5MiB / 208.5MiB (skipped: 0.0b = 0.00%)
Copying blob 9ccfa45da380 [--------------------------------------] 0.0b / 7.6MiB (skipped: 0.0b = 0.00%)
Copying blob 4472627772b1 [--------------------------------------] 0.0b / 120.0b (skipped: 0.0b = 0.00%)
>
```
After the initial container image has been downloaded, you can interact with
different models using the container image.

```
$ ramalama run granite-code
> Write a hello world application in python

print("Hello World")
```

In a different terminal window, see the running podman container:

```
$ podman ps
CONTAINER ID  IMAGE                             COMMAND               CREATED        STATUS        PORTS       NAMES
91df4a39a360  quay.io/ramalama/ramalama:latest  /home/dwalsh/rama...  4 minutes ago  Up 4 minutes              gifted_volhard
```
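
Model names can also carry an explicit transport prefix instead of relying on the default ollama registry. A hedged example, reusing one of the model references shown in the `ramalama list` output in the next section:

```
$ ramalama run ollama://granite-code:latest
```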
### Listing Models

You can `list` all models pulled into local storage.

```
$ ramalama list
NAME                                                                 MODIFIED      SIZE
ollama://tiny-llm:latest                                             16 hours ago  5.5M
huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf  14 hours ago  460M
ollama://granite-code:3b                                             5 days ago    1.9G
ollama://granite-code:latest                                         1 day ago     1.9G
ollama://moondream:latest                                            6 days ago    791M
```
### Pulling Models

You can `pull` a model using the `pull` command. By default, it pulls from the ollama registry.

```
$ ramalama pull granite-code
###################################################                       32.5%
```
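
Pulling is not limited to the ollama registry. A hedged example using the huggingface:// transport, with the repository path taken from the `ramalama list` output shown earlier:

```
$ ramalama pull huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf
```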
### Serving Models

You can `serve` a model as an API server using the `serve` command. By default, it pulls from the ollama registry.

```
$ ramalama serve llama3
```
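
Once the model is being served, you can send it requests over HTTP. This is a minimal sketch, assuming the container exposes llama.cpp's OpenAI-compatible chat endpoint on port 8080; check [ramalama-serve(1)](docs/ramalama-serve.1.md) for the actual port and options in your version:

```
# Hypothetical request; the port (8080) and OpenAI-style path are assumptions,
# not taken from this README.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Write a hello world application in python"}]}'
```

The serving container can be listed with `ramalama containers` and shut down with `ramalama stop` (see the COMMANDS table above).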
## Diagram

```
+---------------------------+
|                           |
| ramalama run granite-code |
|                           |
+-------+-------------------+
        |
        |
        |             +------------------+
        |             | Pull model layer |
        +------------>| granite-code     |
                      +------------------+
                      | Repo options:    |
                      +-+-------+------+-+
                        |       |      |
                        v       v      v
                +---------+ +------+ +----------+
                | Hugging | | quay | | Ollama   |
                | Face    | |      | | Registry |
                +-------+-+ +---+--+ +-+--------+
                        |       |      |
                        v       v      v
                      +------------------+
                      | Start with       |
                      | llama.cpp and    |
                      | granite-code     |
                      | model            |
                      +------------------+
```
## In development

This is alpha software and everything is under development, so expect breaking changes. Luckily, it's easy to reset everything and re-install:

```
rm -rf /var/lib/ramalama # only required if running as root user
rm -rf $HOME/.local/share/ramalama
```

and install again.
## Credit where credit is due

This project wouldn't be possible without the help of other projects like:

- llama.cpp
- whisper.cpp
- vllm
- podman
- omlmd
- huggingface

So if you like this tool, give some of these repos a :star:, and hey, give us a :star: too while you are at it.
## Contributors

Open to contributors

<a href="https://github.com/containers/ramalama/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=containers/ramalama" />
</a>