### [llama.cpp](https://github.com/ggerganov/llama.cpp)
> Handle: `llamacpp`
> URL: [http://localhost:33831](http://localhost:33831)

LLM inference in C/C++. Allows bypassing the Ollama release cycle when needed, for example to get access to the latest models or features.
#### Starting
The `llamacpp` Docker image is quite large due to its dependencies on CUDA and other libraries. You might want to pull it ahead of time.
```bash
# [Optional] Pull the llamacpp
# image ahead of starting the service
harbor pull llamacpp
# Start the llama.cpp service
harbor up llamacpp
# Tail service logs
harbor logs llamacpp
# Open llamacpp Web UI
harbor open llamacpp
```
- Harbor will automatically allocate GPU resources to the container if available, see [Capabilities Detection](./3.-Harbor-CLI-Reference.md#capabilities-detection).
- `llamacpp` will be connected to `aider`, `anythingllm`, `boost`, `chatui`, `cmdh`, `opint`, `optillm`, `plandex`, `traefik`, `webui` services when they are running together.
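Once the service is up, you can also talk to it directly over its OpenAI-compatible HTTP API on the host port above. A minimal check, assuming the default `33831` port and the standard `llama-server` endpoints:
```bash
# Liveness check of the llama.cpp server
curl http://localhost:33831/health
# List the model currently being served (OpenAI-compatible endpoint)
curl http://localhost:33831/v1/models
```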
#### Models
You can find GGUF models to run on HuggingFace [here](https://huggingface.co/models?sort=trending&search=gguf). After you find a model you want to run, grab the URL from the browser address bar and pass it to [`harbor config`](./3.-Harbor-CLI-Reference#harbor-config).
```bash
# Quick lookup for the models
harbor hf find gguf
# 1. With llama.cpp own cache:
#
# - Set the model to run, will be downloaded when llamacpp starts
# Accepts a full URL to the GGUF file (from Browser address bar)
harbor llamacpp model https://huggingface.co/user/repo/file.gguf
# TIP: You can monitor the download progress with a one-liner below
# Replace "<file>" with the unique portion from the "file.gguf" URL above
du -h $(harbor find .gguf | grep "<file>")
# 2. Shared HuggingFace Hub cache, single file:
#
# - Locate the GGUF to download, for example:
# https://huggingface.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF/blob/main/Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf
# - Download a single file:
harbor hf download bartowski/Meta-Llama-3.1-70B-Instruct-GGUF Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf
# - Locate the file in the cache
harbor find Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf
# - Set the GGUF to llama.cpp
# "/app/models/hub" is where the HuggingFace cache is mounted in the container
harbor llamacpp gguf /app/models/hub/models--bartowski--Meta-Llama-3.1-70B-Instruct-GGUF/snapshots/83fb6e83d0a8aada42d499259bc929d922e9a558/Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf
# 3. Shared HuggingFace Hub cache, whole repo:
#
# - Locate and download the repo in its entirety
harbor hf download av-codes/Trinity-2-Codestral-22B-Q4_K_M-GGUF
# - Find the files from the repo
harbor find Trinity-2-Codestral-22B-Q4_K_M-GGUF
# - Set the GGUF to llama.cpp
# "/app/models/hub" is where the HuggingFace cache is mounted in the container
harbor llamacpp gguf /app/models/hub/models--av-codes--Trinity-2-Codestral-22B-Q4_K_M-GGUF/snapshots/c0a1f7283809423d193025e92eec6f287425ed59/trinity-2-codestral-22b-q4_k_m.gguf
```
> [!NOTE]
> Please note that this procedure doesn't download the model. If the model is not found in the cache, it will be downloaded on the next start of the `llamacpp` service.

Downloaded models are stored in the global `llama.cpp` cache on your local machine (the same cache the native version uses). The server can only run one model at a time and must be restarted to switch models.
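For example, switching to another GGUF is a matter of pointing `llamacpp` at the new file and restarting the service (a sketch, reusing the placeholder URL format from above):
```bash
# Point llama.cpp at a different GGUF
harbor llamacpp model https://huggingface.co/user/repo/another-file.gguf
# Restart so the new model is loaded
harbor down && harbor up llamacpp
```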
#### Configuration
You can provide additional arguments to the `llama.cpp` CLI via `LLAMACPP_EXTRA_ARGS`. It can be set either with the Harbor CLI or in the `.env` file.
```bash
# See llama.cpp server args
harbor run llamacpp --server --help
# Set the extra arguments
harbor llamacpp args '--max-tokens 1024 -ngl 100'
# Edit the .env file
HARBOR_LLAMACPP_EXTRA_ARGS="--max-tokens 1024 -ngl 100"
```
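For illustration, a typical setup might raise the context size and offload all layers to the GPU; `-c`/`--ctx-size` and `-ngl`/`--n-gpu-layers` are standard `llama-server` flags, adjust the values to your model and hardware:
```bash
# Example: 16k context, offload all layers to the GPU
harbor llamacpp args '-c 16384 -ngl 99'
```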
You can add `llamacpp` to default services in Harbor:
```bash
# Add llamacpp to the default services
# Will always start when running `harbor up`
harbor defaults add llamacpp
# Remove llamacpp from the default services
harbor defaults rm llamacpp
```
The following options are available via [`harbor config`](./3.-Harbor-CLI-Reference#harbor-config):
```bash
# Location of the llama.cpp own cache, either global
# or relative to $(harbor home)
LLAMACPP_CACHE ~/.cache/llama.cpp
# The port on the host machine where the llama.cpp service
# will be available
LLAMACPP_HOST_PORT 33831
```
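When set via the `.env` file, these options use the `HARBOR_` prefix, same as `HARBOR_LLAMACPP_EXTRA_ARGS` above (a sketch with made-up values; adjust the path and port to your setup):
```bash
# .env - relocate the llama.cpp cache and change the host port
HARBOR_LLAMACPP_CACHE="/mnt/models/llama.cpp"
HARBOR_LLAMACPP_HOST_PORT=8080
```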
#### Environment Variables
Follow Harbor's [environment variables guide](./1.-Harbor-User-Guide#environment-variables) to set arbitrary variables for `llamacpp` service.
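For example, recent `llama.cpp` server builds can read `LLAMA_ARG_*` environment variables as an alternative to CLI flags. A sketch, assuming the `harbor env` workflow described in the linked guide:
```bash
# Offload all layers to the GPU via an environment variable
# (roughly equivalent to passing "-ngl 99" on the command line)
harbor env llamacpp LLAMA_ARG_N_GPU_LAYERS 99
```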
#### `llama.cpp` CLIs and scripts
`llama.cpp` comes with a number of helper tools/CLIs, which can all be accessed via the `harbor exec llamacpp` command (once the service is running).
```bash
# Show the list of available llama.cpp CLIs
harbor exec llamacpp ls
# See the help for one of the CLIs
harbor exec llamacpp ./scripts/llama-bench --help
```
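For instance, `llama-bench` can be pointed at a GGUF that is already visible inside the container; the path below reuses the mounted HuggingFace cache example from the Models section above:
```bash
# Benchmark prompt processing (512 tokens) and generation (128 tokens)
harbor exec llamacpp ./scripts/llama-bench \
  -m /app/models/hub/models--bartowski--Meta-Llama-3.1-70B-Instruct-GGUF/snapshots/83fb6e83d0a8aada42d499259bc929d922e9a558/Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf \
  -p 512 -n 128
```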