---
name: deepswarm
description: Use when running parallel AI workers for any long-running or multi-turn batch API task. Auto-calculates optimal workers + stagger. Supports tiered delegation (V4 Pro orchestrator → V4 Flash workers). 99.95% API success rate at scale.
version: 2.0.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [orchestration, parallel, workers, api-generation, swarm, batch, multi-turn, delegation, tiered-models]
    related_skills: [tmux-agent-orchestrator, hermes-agent]
---

# DeepSwarm: Task-Agnostic Parallel Worker Orchestration

Spawn N parallel API workers for **any** long-running or multi-turn batch task. Auto-calculates optimal worker count and stagger delay. Supports tiered model delegation: orchestrator plans with a frontier model (V4 Pro), workers execute with a cheaper model (V4 Flash).

## Overview

DeepSwarm 2.0 generalizes the proven orchestration pattern from the 19,331-trace generation project to any batch API task. You define a task (translations, reasoning traces, code reviews, summarization) and DeepSwarm parallelizes it across optimal workers with the right stagger for your API.

**The core insight:** API rate limits are a function of simultaneous connections, not total volume. Auto-calculated stagger + worker count = 99.95% success.

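
To make the core insight concrete, the sketch below starts each worker a fixed stagger after the previous one, so connections do not all open at the same instant. It is a minimal illustration and not DeepSwarm's actual launcher: `run_staggered`, `fake_call`, and the thread-per-worker design are assumptions introduced here.

```python
import threading
import time


def run_staggered(task_fn, batches, stagger):
    """Run task_fn(worker_id, batch) in one thread per batch.

    Worker i is started roughly i * stagger seconds after worker 0.
    Returns the per-worker results in worker order.
    """
    results = [None] * len(batches)

    def worker(worker_id, batch):
        results[worker_id] = task_fn(worker_id, batch)

    threads = []
    for worker_id, batch in enumerate(batches):
        if worker_id > 0:
            time.sleep(stagger)  # space out connection starts
        thread = threading.Thread(target=worker, args=(worker_id, batch))
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()  # wait for every worker before returning
    return results


def fake_call(worker_id, batch):
    # Hypothetical stand-in for a real API call: records when the worker began.
    started = time.monotonic()
    time.sleep(0.05)
    return {"worker": worker_id, "started": started, "done": len(batch)}


if __name__ == "__main__":
    for row in run_staggered(fake_call, [[1, 2], [3, 4], [5]], stagger=0.05):
        print(row["worker"], row["done"])
```

Threads are used only to keep the example short; the same spacing applies if each worker is a separate process.
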
## When to Use

- **Any** batch API task: generation, translation, summarization, extraction, classification
- Long-running individual calls (30s+) that benefit from parallelization
- Multi-turn tasks where each worker loops through conversation turns
- Cost optimization via tiered delegation (orchestrator ≠ worker model)
- Crash-resilient batch processing (checkpointed, idempotent)

**Don't use for:**

- Quick calls under 10s (overhead not worth it; just loop)
- Tasks requiring inter-worker coordination (use `delegate_task`)
- Real-time interactive sessions (use tmux-agent-orchestrator)

## Quick Start

```bash
# Install
hermes skills tap add amanning3390/deepswarm

# Define your task (task.yaml)

# Generate seeds
python3 scripts/seed.py --task task.yaml

# Launch: auto-optimizes workers, stagger, model routing
python3 scripts/swarm.py --task task.yaml --total 1000

# Filter: repair JSON, validate structure, apply length thresholds
python3 scripts/filter.py --input-dir output/ --output clean.jsonl --errors errors.jsonl
```

## Task Definition (task.yaml)

```yaml
# What to do
task_type: generation        # generation | translation | summarization | classification | custom
prompt_template: |
  You are an AI assistant.
  {{seed}}

# Model routing (tiered delegation)
orchestrator_model: deepseek-v4-pro    # Plans, monitors, handles errors
worker_model: deepseek-v4-flash        # Executes batches (cheaper!)
worker_api_base: https://api.deepseek.com/v1/chat/completions
worker_max_tokens: 4096

# Execution control
multi_turn: true             # Workers loop through conversation turns
max_turns: 20                # Max turns per worker conversation
seeds_file: seeds.jsonl      # Pre-generated task seeds

# Worker optimization (auto-calculated if omitted)
workers: auto                # auto | N
stagger: auto                # auto | seconds
batch_size: auto             # auto | tasks per worker

# Output
output_dir: output/
output_format: jsonl         # jsonl | json | parquet
checkpoint_every: 10         # Save progress every N tasks

# Optional: custom worker logic
worker_script: custom_worker.py   # Override default worker behavior
```

## Tiered Model Delegation

Orchestrator (V4 Pro) and workers (V4 Flash) can use different models:

```
User Task → V4 Pro (plans, monitors)
              ├─ V4 Flash Worker 0 → API → output/
              ├─ V4 Flash Worker 1 → API → output/
              ├─ V4 Flash Worker 2 → API → output/
              └─ ...
```

**Why tiered delegation matters:**

- V4 Pro costs ~3× V4 Flash per token
- Orchestrator only plans + monitors (few calls)
- Workers make thousands of calls, so use the cheapest model that works
- Typical savings: 60-70% vs using V4 Pro for everything

**When to use same model for both:**

- Task quality requires frontier reasoning at every step
- Worker model doesn't support the required format
- Budget allows it and quality is paramount

## Auto-Optimization

When `workers: auto` and `stagger: auto`:

1. DeepSwarm runs a single calibration call to measure call duration
2. Calculates optimal workers: `min(8, floor(rate_limit / call_duration))`
3. Sets stagger: `call_duration / workers × 2`
4. Adjusts batch_size: `total / workers`

**Calibration table (pre-computed):**

| Call Duration | Workers | Stagger | Success | Throughput |
|---------------|---------|---------|---------|------------|
| <10s          | 16      | 1s      | 99.9%   | ~5,760/hr  |
| 10-30s        | 12      | 2s      | 99.9%   | ~1,440/hr  |
| 30-60s        | 8       | 5s      | 99.95%  | ~440/hr    |
| 60-90s        | 6       | 10s     | 99.9%   | ~240/hr    |
| >90s          | 4       | 15s     | 99.9%   | ~96/hr     |

## Multi-Turn Task Support

For tasks requiring conversation loops (generation, debugging, interactive work):

```
Worker loop:
  for each seed:
    messages = [system_prompt, user_task]
    for turn in range(max_turns):
      response = api_call(messages, model=worker_model)
      messages.append({"role": "assistant", "content": response})
      if task_complete(response):
        break
      if needs_tool_call(response):
        messages.append(simulate_tool_response(response))
```

Each turn is an independent API call. Multi-turn tasks benefit most from parallelization because per-task latency is high.

## Task-Agnostic Worker Design

The worker (`worker.py`) accepts a YAML task definition and executes any pipeline:

```python
def run_task(seed, config):
    """Drive one seed through up to config["max_turns"] API turns."""
    # Build the initial conversation from the seed and the task config.
    messages = build_messages(seed, config)
    for turn in range(config["max_turns"]):
        response = call_api(messages, config)
        # Stop as soon as the task-specific completion check passes.
        if is_complete(response, config):
            return finish(response, messages)
        # Otherwise extend the conversation before the next turn.
        if needs_continuation(response, config):
            messages = append_turn(messages, response, config)
    # max_turns reached without completion: return the conversation so far.
    return messages
```

Built-in task types:

- `generation`: Generate content from seed (the trace generation pattern)
- `translation`: Translate each seed text
- `summarization`: Summarize each seed document
- `classification`: Classify each seed input
- `custom`: Uses `worker_script` for completely custom logic

## Common Pitfalls

1. **`workers: auto` choosing too many.** If the calibration call was fast but actual calls are slow, override manually.
2. **Forgetting to stagger between workers.** Even 2 workers starting in the same millisecond can trigger rate limits on slow APIs.
3. **Mixing models without checking format compatibility.** The worker model must support the same prompt format as the orchestrator.
4. **Not checkpointing.** A worker that dies at task 120 of 125 loses that work unless it checkpoints. Checkpoint every 10 tasks.
5. **Shell `&` without `wait`.** Without `wait`, the shell exits early and kills child workers.
6. **Using V4 Pro for workers when V4 Flash would work.** Check quality on a 10-sample test before committing to the expensive model.
7. **Not deleting error outputs before restarting.** Error files consume indices and inflate disk usage.

## Verification Checklist

- [ ] Task YAML has valid model names and API base URL
- [ ] API key exported for both orchestrator and worker models
- [ ] `workers: auto` or manual count ≤ 8 per batch
- [ ] `stagger: auto` or manual ≥ call_duration / workers × 2
- [ ] Worker model tested on 5-sample run before full batch
- [ ] Output directory exists and is writable
- [ ] Seeds file exists with correct format
- [ ] Checkpointing enabled for runs >100 tasks
- [ ] Orchestrator model is V4 Pro (or equivalent frontier) for planning
- [ ] Worker model is V4 Flash (or cheapest model that handles the task)
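
Several of the checklist items can be checked mechanically before launch. The sketch below is one hedged way to do that. `preflight`, its arguments, and the plain-dict config are assumptions for illustration: the keys mirror the `task.yaml` fields, loaded with whatever YAML parser you use, and the numeric rules are the ones stated in the checklist. It does not validate API keys or model quality.

```python
import os
import tempfile


def preflight(config, call_duration, total_tasks):
    """Return a list of problems found; an empty list means the checks passed.

    config: task.yaml contents as a dict (assumed shape).
    call_duration: measured seconds per API call.
    total_tasks: number of tasks in the planned run.
    """
    problems = []

    for key in ("orchestrator_model", "worker_model", "worker_api_base"):
        if not config.get(key):
            problems.append(f"missing {key}")

    workers = config.get("workers", "auto")
    if workers != "auto" and workers > 8:
        problems.append(f"workers={workers} exceeds 8 per batch")

    stagger = config.get("stagger", "auto")
    if workers != "auto" and stagger != "auto":
        minimum = call_duration / workers * 2  # checklist rule for manual stagger
        if stagger < minimum:
            problems.append(f"stagger={stagger}s is below {minimum:g}s")

    seeds_file = config.get("seeds_file")
    if not seeds_file or not os.path.isfile(seeds_file):
        problems.append("seeds file not found")

    output_dir = config.get("output_dir", "")
    if not (os.path.isdir(output_dir) and os.access(output_dir, os.W_OK)):
        problems.append("output directory missing or not writable")

    if total_tasks > 100 and not config.get("checkpoint_every"):
        problems.append("checkpointing should be enabled for runs >100 tasks")

    return problems


if __name__ == "__main__":
    # Demo with a throwaway directory standing in for a real project layout.
    with tempfile.TemporaryDirectory() as tmp:
        seeds = os.path.join(tmp, "seeds.jsonl")
        open(seeds, "w").close()
        cfg = {
            "orchestrator_model": "deepseek-v4-pro",
            "worker_model": "deepseek-v4-flash",
            "worker_api_base": "https://api.deepseek.com/v1/chat/completions",
            "workers": 12,
            "stagger": 1,
            "seeds_file": seeds,
            "output_dir": tmp,
            "checkpoint_every": 10,
        }
        for problem in preflight(cfg, call_duration=40, total_tasks=1000):
            print(problem)
```

Returning a list rather than raising on the first failure lets a single run report every item that needs attention.
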