--- name: deploy-deepseek-mlc description: "Deploy DeepSeek on Jetson Orin using MLC (Machine Learning Compilation) for optimized edge inference. Uses Docker/jetson-containers. Requires Jetson with >8GB RAM and JetPack 5.1.1+." --- # Deploy DeepSeek on Jetson with MLC --- ## Execution model Run one phase at a time. After each phase: - Relay all command output to the user. - If output contains `[STOP]` → stop immediately, consult the failure decision tree below. - If output ends with `[OK]` → tell the user "Phase N complete" and proceed to the next phase. --- ## Prerequisites | Requirement | Minimum | |-------------|---------| | Hardware | reComputer J4012 (Jetson Orin NX 16GB) or equivalent | | RAM | >8 GB (16 GB recommended for DeepSeek-R1 7B+) | | JetPack | 5.1.1+ (JetPack 6.x preferred) | | Storage | SSD strongly recommended — model weights are large | | Internet | Required for Docker pull and model download | --- ## Phase 1 — Preflight Verify JetPack version, available RAM, and disk space before touching Docker. ```bash cat /etc/nv_tegra_release free -h df -h / df -h /ssd 2>/dev/null || true ``` Expected: L4T R35.x (JP5) or R36.x (JP6), ≥8 GB RAM free, ≥50 GB disk available. `[OK]` when all three pass. `[STOP]` if RAM or disk is insufficient. --- ## Phase 2 — Install Docker + nvidia-container ```bash sudo apt update # JetPack 5.x sudo apt install -y nvidia-container # JetPack 6.x — also install curl, then Docker sudo apt install -y nvidia-container curl curl https://get.docker.com | sh sudo systemctl --now enable docker # Add current user to docker group sudo usermod -aG docker $USER newgrp docker ``` Verify: ```bash docker --version docker run --rm --runtime nvidia --gpus all ubuntu:22.04 nvidia-smi ``` Expected: `nvidia-smi` output shows the Jetson GPU. `[OK]` when GPU is visible inside the container. ### Move Docker storage to SSD (strongly recommended) Edit `/etc/docker/daemon.json`: ```json { "data-root": "/ssd/docker", "runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } } } ``` ```bash sudo systemctl restart docker docker info | grep "Docker Root Dir" ``` `[OK]` when `Docker Root Dir` points to your SSD path. --- ## Phase 3 — Pull MLC container and download DeepSeek model ```bash # JP5.x: docker pull dustynv/mlc-llm:r35.4.1 # JP6.x: docker pull dustynv/mlc-llm:r36.2.0 docker images | grep mlc-llm ``` Download model weights inside the container: ```bash docker run -it --rm \ --runtime nvidia \ --network host \ -v /ssd/models:/models \ dustynv/mlc-llm:r36.2.0 \ bash -c "huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --local-dir /models/deepseek-r1-7b" ``` `[OK]` when model files are present under `/ssd/models/`. `[STOP]` if download fails — see failure decision tree. --- ## Phase 4 — Launch inference ```bash docker run -it --rm \ --runtime nvidia \ --network host \ -v /ssd/models:/models \ dustynv/mlc-llm:r36.2.0 \ python3 -m mlc_llm serve /models/deepseek-r1-7b \ --device cuda \ --host 0.0.0.0 \ --port 8080 ``` Test the endpoint: ```bash curl http://localhost:8080/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{"model":"deepseek-r1-7b","messages":[{"role":"user","content":"Hello"}]}' ``` `[OK]` when the API returns a JSON response with a completion. > For full step-by-step commands, screenshots, and model configuration options, read `references/source.body.md`. --- ## Failure decision tree | Symptom | Action | |---------|--------| | `docker: command not found` | Re-run the `curl https://get.docker.com \| sh` step. Confirm `sudo systemctl enable --now docker`. | | `nvidia-container` install fails | Confirm JetPack version with `cat /etc/nv_tegra_release`. JP5 and JP6 have different package names — check `references/source.body.md` for the exact apt source. | | `nvidia-smi` not visible inside container | nvidia-container-runtime not configured. Verify `/etc/docker/daemon.json` has the `nvidia` runtime entry and restart Docker. | | OOM / killed during inference | Model too large for available RAM. Try a smaller distill variant (1.5B or 7B). Ensure no other heavy processes are running. | | Model download fails / times out | Check internet connectivity. Retry with `huggingface-cli download --resume-download`. If HuggingFace is blocked, use a mirror or pre-download on another machine. | | `docker pull` fails with no space | Docker root is on eMMC. Move Docker data root to SSD (Phase 2 SSD step). | | Inference endpoint returns 500 | Model path inside container may be wrong. Verify the `-v` mount and the path passed to `mlc_llm serve`. | --- ## Reference files - `references/source.body.md` — full original Seeed tutorial with complete MLC configuration, model options, and effect demonstration (reference only)