llama-swap ghcr.io/mostlygeek/llama-swap:cuda https://github.com/mostlygeek/llama-swap cuda CUDA GPU accelerated (NVIDIA) Extra Parameters: --gpus all cpu CPU only (no GPU required) rocm ROCm GPU accelerated (AMD) — best performance for AMD GPUs Extra Parameters: --device=/dev/kfd --device=/dev/dri --group-add video --security-opt seccomp=unconfined vulkan Vulkan GPU accelerated (AMD/Intel) Extra Parameters: --device=/dev/dri bridge false https://forums.unraid.net/topic/197997-support-llama-swap-hot-model-swapping-for-llamacpp/ https://github.com/mostlygeek/llama-swap https://raw.githubusercontent.com/PikkonMG/unraid-docker-templates/refs/heads/main/templates/llama-swap.xml llama-swap is a lightweight proxy server that provides automatic model swapping for llama.cpp (llama-server). It hot-swaps models on demand based on API requests so you can serve many GGUF models from a single endpoint without restarting. Features: automatic model loading/unloading, macros, aliases, groups for multi-model concurrency, TTL auto-unload, streaming log viewer, and OpenAI-compatible API. SETUP: 1. Place your GGUF model files in the Models path below. 2. Create a config.yaml (see template https://github.com/PikkonMG/unraid-docker-templates/blob/main/examples/llama-swap/example-llama-swap-config.yaml) and place it in the Config path. 3. In your config.yaml, reference models as /models/yourmodel.gguf 4. For NVIDIA GPU: install the Unraid Nvidia plugin, select the cuda tag, and set ExtraParams to: --gpus all 5. For AMD GPU: select the rocm tag and set ExtraParams to: --device /dev/kfd --device /dev/dri --group-add video --security-opt seccomp=unconfined 6. For Intel iGPU or other Vulkan-capable GPUs: select the vulkan tag and set ExtraParams to: --device /dev/dri 7. For CPU only: select the cpu tag and remove ExtraParams entirely AI: Productivity: Tools: Other http://[IP]:[PORT:8080]/ui https://raw.githubusercontent.com/PikkonMG/unraid-docker-templates/main/templates/img/llama-swap.png https://raw.githubusercontent.com/PikkonMG/unraid-docker-templates/main/docs/screenshots/llama-swap-screenshot1.png --gpus all -config /config/config.yaml -watch-config 8080 /mnt/user/appdata/llama-swap/models /mnt/user/appdata/llama-swap/config all all