# Running Experiments

Configure SLURM (optional), pre-download model weights, then launch a run.

> ⬅ [Main README](../README.md)  ·  Prev: [« Installation](installation.md)  ·  Next: [Monitoring & logs »](monitoring.md)

All commands run from the project root (`SpatialClaw/`) with `conda activate spatialagent`.

---

## SLURM Account, Partition, and Memory

> Skip this section if you're running on a single machine without SLURM.

The shipped configs reference one specific cluster's partitions and accounts. Before submitting jobs, **edit these three files** to match your cluster:

| File | Fields to update |
|------|------------------|
| `spatial_agent/launch_managers/vllm_manager/models.json` | top-level `accounts`, and `partition` field of each model entry |
| `spatial_agent/launch_managers/agent_manager/config.json` | `accounts`, `default_slurm.partition` (agents themselves are CPU-only) |
| `spatial_agent/launch_managers/gpu_server_manager/config.json` | `accounts`, `default_slurm.partition` |

Check what your cluster offers:

```bash
sinfo -h -o "%P %f %G"     # available partitions and GPU types
sacctmgr show -P -n assoc where user=$USER format=Account | sort -u
```

**`--mem-per-gpu` hard cap.** vLLM's sbatch template requests `--mem-per-gpu=240G` and the GPU tool server requests `228G`. Some clusters refuse to allocate more than ~234 GB/GPU and abort submission. If you see:

```
sbatch: error: You requested 245760M RAM, but only 1 GPUs. For 1 GPUs, please only request MAX 234842M RAM
```

lower the `--mem-per-gpu` line in `spatial_agent/launch_managers/vllm_manager/server_chain.py` and `gpu_server_manager/server_chain.py` to fit your cluster's cap.

---

## Pre-download Model Weights

> ⚠️ **Pre-download is mandatory, not optional.** The vLLM SLURM script (`spatial_agent/launch_managers/vllm_manager/run_vllm.sh`) sets `HF_HUB_OFFLINE=1` so compute jobs cannot reach huggingface.co, and most HPC compute nodes have no outbound internet anyway.
> Anything you don't pre-fetch into the HF cache here will crash the SLURM job with `LocalEntryNotFoundError`.

All commands run from the login node with `conda activate spatialagent`.

### HuggingFace login

```bash
huggingface-cli login
```

### SAM3.1 weights (gated)

1. Accept the license at .
2. Download:

   ```bash
   mkdir -p tools/third_party/sam3/weights
   huggingface-cli download facebook/sam3.1 \
       --local-dir tools/third_party/sam3/weights \
       --include "*.pt" "*.txt.gz"
   ln -sf ../sam3/assets/bpe_simple_vocab_16e6.txt.gz \
       tools/third_party/sam3/weights/bpe_simple_vocab_16e6.txt.gz
   ```

### VLM backbones

The six paper backbones (Hopper / H100+ required for FP8):

```bash
huggingface-cli download Qwen/Qwen3.5-397B-A17B-FP8
huggingface-cli download Qwen/Qwen3.5-122B-A10B-FP8
huggingface-cli download Qwen/Qwen3.6-35B-A3B-FP8
huggingface-cli download Qwen/Qwen3.6-27B
huggingface-cli download google/gemma-4-26B-A4B-it
huggingface-cli download prithivMLmods/gemma-4-31B-it-FP8
```

On A100 / L40S, swap the FP8 entries for AWQ / GPTQ variants from `vllm_manager/models.json`:

```bash
huggingface-cli download cyankiwi/gemma-4-31B-it-AWQ-4bit    # for Gemma-4-31B-IT-AWQ
huggingface-cli download Qwen/Qwen3.5-397B-A17B-GPTQ-Int4    # for Qwen3.5-397B GPTQ
```

### Reconstruction backbones

The GPU tool server's `Reconstruct` ships with three backends; download only the one(s) you'll use:

```bash
# Pi3 (default: fast, monocular + multi-view)
huggingface-cli download yyfz233/Pi3X

# Depth-Anything-3 (alternative: DA3NESTED, requires the -1.1 suffix)
huggingface-cli download depth-anything/DA3NESTED-GIANT-LARGE-1.1

# MapAnything (multi-view, optional). Also requires the DINOv2-giant torch-hub
# cache; place it under tools/third_party/torch_hub/.
huggingface-cli download facebook/map-anything
```

Pick the backend at launch time via `--reconstruct_backend {pi3,da3,mapanything}` on the GPU server (see below).
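
Because a missing repo only surfaces as `LocalEntryNotFoundError` once the SLURM job is already running, it can help to check the cache from the login node first. The following is a minimal sketch, not part of the repo: the helper names (`hf_cache_dir`, `is_cached`, `missing_repos`) are hypothetical, and it assumes the standard HuggingFace hub cache layout (`models--<org>--<name>/snapshots/<revision>/`). It only checks that a non-empty snapshot exists, not that every file finished downloading, and it does not cover the SAM3.1 weights, which are written to `--local-dir` rather than the cache.

```python
import os
from pathlib import Path


def hf_cache_dir() -> Path:
    """Resolve the hub cache directory from the usual environment variables."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    xdg_cache = os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))
    hf_home = os.environ.get("HF_HOME", str(Path(xdg_cache) / "huggingface"))
    return Path(hf_home) / "hub"


def is_cached(repo_id: str, cache_dir: Path) -> bool:
    """Return True if at least one non-empty snapshot of the model repo exists."""
    repo_folder = "models--" + repo_id.replace("/", "--")
    snapshots = cache_dir / repo_folder / "snapshots"
    if not snapshots.is_dir():
        return False
    return any(rev.is_dir() and any(rev.iterdir()) for rev in snapshots.iterdir())


def missing_repos(repo_ids, cache_dir=None):
    """Return the repo ids that have no snapshot in the cache, in input order."""
    cache = Path(cache_dir) if cache_dir is not None else hf_cache_dir()
    return [repo for repo in repo_ids if not is_cached(repo, cache)]


if __name__ == "__main__":
    # Illustrative selection; list the repos you actually plan to serve.
    wanted = ["Qwen/Qwen3.6-27B", "yyfz233/Pi3X"]
    for repo in missing_repos(wanted):
        print(f"not in HF cache: {repo}")
```

Run it on the login node with the same environment (`HF_HOME` / `HF_HUB_CACHE`) that the compute jobs will see; otherwise the check may inspect a different cache than the one vLLM reads.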

---

## Launching a Run

The three services are launched by **three independent managers**. Start them in order: vLLM → GPU tool server → agent.

### Quickstart via launch managers (recommended)

```bash
# Terminal 1: start a vLLM server (interactive menu)
python -m spatial_agent.launch_managers.vllm_manager
#   [1] Dashboard  [2] Start Server  [3] Stop Server  [q] Quit
#   Pick [2], then choose the model, account, partition, and confirm.

# Terminal 2: start the GPU tool server
python -m spatial_agent.launch_managers.gpu_server_manager
#   [1] Dashboard  [2] Start GPU Server(s)  [3] Stop GPU Server(s)  [q] Quit
#   Pick [2], choose the number of GPUs and the Reconstruct backend (pi3 / da3 / mapanything).

# Terminal 3: start the agent run
python -m spatial_agent.launch_managers.agent_manager
#   [1] Dashboard  [2] Start Agent Experiment(s)  [3] Start CoT Experiment(s)  [4] Stop  [q] Quit
#   Pick [2], then choose benchmark(s), model, concurrency, and confirm.
```

Each run is checkpointed and auto-resumes when a 4-hour SLURM job rolls over.

> The GPU tool server is only required when the dataset config's `tools_to_use` is non-empty (i.e. uses `Reconstruct` / `SAM3`). For pure CoT runs, skip Terminal 2.

### Direct CLI (no SLURM)

Single-machine flow against either a self-hosted vLLM or a hosted endpoint:

```bash
# 1) (Optional) Start a local GPU tool server, only if your dataset config uses tools.
python -m spatial_agent.entrypoints.launch_gpu_server \
    --num_gpus 1 --reconstruct_backend pi3

# 2) Run an experiment.
python -m spatial_agent.entrypoints.run \
    --dataset spatial_agent/config/dataset/erqa.json \
    --model spatial_agent/config/model/gemini-3-pro.json \
    --concurrency 4
```

For a vLLM-served model, launch vLLM separately and ensure your model config has `"llm_base_url": "vllm"` and `"llm_model"` set to the **`served_name`** (e.g. `"gemma-4-31b"`, not the HF path). See [Configuration](configuration.md) for the config schema.
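
The two model-config requirements above are easy to get wrong, so a quick pre-flight check may be worthwhile. This is a rough sketch under stated assumptions: `check_vllm_model_config` is a hypothetical helper, not something the repo provides; it looks only at the two keys named above; and treating any `llm_model` containing a `/` as an HF path is a heuristic, not a rule taken from the config schema.

```python
def check_vllm_model_config(config: dict) -> list[str]:
    """Return human-readable problems found in a vLLM model config (empty if none)."""
    problems = []

    # The text above requires the literal value "vllm" for vLLM-served models.
    if config.get("llm_base_url") != "vllm":
        problems.append('"llm_base_url" should be "vllm" for a vLLM-served model')

    model = config.get("llm_model")
    if not isinstance(model, str) or not model:
        problems.append('"llm_model" is missing or empty')
    elif "/" in model:
        # Assumption: HF repo ids look like "org/name", served names do not.
        problems.append(
            f'"llm_model" looks like an HF path ({model!r}); use the served_name'
        )
    return problems


# Illustrative config that mistakenly uses the HF path instead of the served name.
example = {"llm_base_url": "vllm", "llm_model": "prithivMLmods/gemma-4-31B-it-FP8"}
for problem in check_vllm_model_config(example):
    print(problem)
```

To check a real file, load it with `json.load` and pass the resulting dict to the function before launching the run.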

### Reproducing paper tables

* **Table 1 (main results)**: run each benchmark in `spatial_agent/config/dataset/` with the corresponding vLLM-served model in `spatial_agent/config/model/`. The same hyperparameters / system prompt / tool set are used everywhere; no per-benchmark overrides are required.
* **Table 2 (action-interface comparison)**: pass `--executor_type {code,react,single_pass}` (default `code` is SpatialClaw; `react` is the structured tool-call interface; `single_pass` is the one-shot code baseline).

### CoT (no-tool) baseline

```bash
python -m spatial_agent.launch_managers.agent_manager
#   pick [3] Start CoT Experiment(s)
```

Or directly:

```bash
python -m spatial_agent.entrypoints.cot_baseline \
    --dataset spatial_agent/config/dataset/erqa.json \
    --model spatial_agent/config/model/qwen3.5-397b-a17b.json
```

---

> Next: [Monitoring & logs »](monitoring.md)  ·  Hitting errors? See [Troubleshooting](troubleshooting.md)