nvidia-nim-single
2026-03-12
false
AI: Tools:
AI,LLM,local-ai,NVIDIA,NIM,CUDA,inference,OpenAI API,chatbot,language model
nvcr.io/nim/meta/llama-3.2-3b-instruct:latest
nvcr.io
https://github.com/PikkonMG/unraid-docker-templates/
https://github.com/PikkonMG/unraid-docker-templates/blob/main/docs/NVIDIA-NIM_SINGLE.md
bridge
bash
false
https://forums.unraid.net/topic/197721-support-nvidia-nim-on-unraid-gpu-ai-inference-server/
https://build.nvidia.com/
NVIDIA NIM AI inference server for running LLMs locally on NVIDIA GPUs with CUDA acceleration and an OpenAI-compatible API.
Be sure to check out for NIM related support https://developer.nvidia.com/nim
DEFAULT MODEL: meta/llama-3.2-3b-instruct -- recommended for GPUs with 12 GB VRAM or less (RTX 3060, 3070, etc).
TO CHANGE MODELS: Update BOTH the Repository image tag AND the NIM_MODEL_NAME variable to matching values. Browse available models at https://build.nvidia.com/models
VRAM REQUIREMENTS (approximate):
- Llama 3.2 3B ~6 GB -- fits 8-12 GB cards
- Mistral 7B ~14 GB -- needs 16 GB+ (fp16 uses more than expected)
- Llama 3.1 8B ~22 GB -- needs 24 GB+
- Llama 3.1 70B ~80 GB -- multi-GPU only
BEFORE FIRST START -- REQUIRED STEPS (run once in Unraid terminal):
Before you can pull the image you must have a NVIDIA API key from https://build.nvidia.com. Generate a Personal API Key from your profile.
Step 1: Login to NGC registry (only needed once, persists until reboot):
docker login nvcr.io
Username: $oauthtoken
Password: YOUR_NGC_API_KEY
REQUIRED BEFORE FIRST START -- run in Unraid terminal:
Step 2: Fix cache directory permissions:
chown -R 1000:1000 /mnt/user/appdata/nvidia-nim/cache
chmod -R 775 /mnt/user/appdata/nvidia-nim/cache
REQUIRES: NVIDIA GPU (Turing/RTX 20 series or newer) | nvidia-driver Unraid plugin | NGC API key from build.nvidia.com
URLS (replace YOUR_SERVER_IP with your Unraid IP):
WebUI / Swagger docs : http://YOUR_SERVER_IP:8000/docs
API base URL : http://YOUR_SERVER_IP:8000/v1 (use this in AnythingLLM, Open WebUI, etc.)
Models list : http://YOUR_SERVER_IP:8000/v1/models
http://[IP]:[PORT:8000]/docs
https://raw.githubusercontent.com/PikkonMG/unraid-docker-templates/main/templates/nvidia-nim-single.xml
https://raw.githubusercontent.com/PikkonMG/unraid-docker-templates/main/templates/img/nvidia-nim-single.png
--gpus all --shm-size=16gb --ulimit memlock=-1 --ulimit stack=67108864
NVIDIA GPU (Turing or newer) | nvidia-driver plugin (Community Applications) | NGC API Key (build.nvidia.com)
8000
/mnt/user/appdata/nvidia-nim/cache
meta/llama-3.2-3b-instruct
16384
/opt/nim/.cache
0
1
expandable_segments:True
INFO