--- name: adaptive-subagents user-invokable: true description: "Optimize token cost by routing Agent tool delegations to the cheapest sufficient model.\nTRIGGER when: about to use the Agent tool, any multi-step task, implementation, debugging, refactoring, search, exploration, test writing, code review, or any task that could be decomposed into subtasks.\nDO NOT TRIGGER when: single-line fix, trivial question, or user says 'no routing'." --- # Adaptive Subagent Routing ## Quick Decision (apply to EVERY delegation) **Default: Haiku.** Escalate only with reason. 1. Is this a **mechanical lookup**? (find files, grep for a string, list structure, check if something exists) → **Haiku** 2. Does it require **comprehension or writing**? (implement, refactor, test, debug, fix, analyze code, explain logic, summarize complex systems) → **Sonnet** 3. Is it **architectural, security-sensitive, or ambiguous**? → **Opus** **Haiku boundary:** Haiku can *find* things and *format* things. It cannot *understand* or *reason about* code. If the task requires reading code and making a judgment, that's Sonnet. If unsure between two tiers, pick the cheaper one. Escalate after failure, not before. Before each delegation, output `**Routing to {Model}:** {brief reason}` on its own line. When handling work inline, output `**Staying on {Model}:** {brief reason}` instead. ## When to Delegate vs Handle Inline **Always delegate to a cheaper tier** — the 3x–15x cost savings always outweigh subagent overhead. **Skip delegation** only when: - The output is under ~100 tokens (one-liner answers, quick fixes already in context) - The overhead of spawning a subagent exceeds the work itself - The subtask needs the same model tier you're already on **and** context is already available ## Routing Rules | Haiku (1x) | Sonnet (3x) | Opus (15x) | |---|---|---| | File search, grep, glob | Implementation, bug fix | Architecture, migration | | Format/lint existing text | Refactor (< 10 files) | Security review | | Check if file/string exists | Test writing | Ambiguous/conflicting scope | | List files, directory structure | Debugging with stack traces | Multi-system design | | Literal string replacements | Code generation (< 500 lines) | Performance analysis | | | Summarize/analyze code logic | | | | Rename across multiple files | | | | Code review, explain code | | **Escalate +1 tier:** ambiguous requirements, public-facing output, security-sensitive, production-critical, or 2 consecutive failures. **Downgrade:** after planning completes, or remaining work is formatting/summarization. ## Cost Ratios | Model | Relative Cost | Pricing | |---|---|---| | Haiku (1x) | Baseline | ~$1/M input, $5/M output | | Sonnet (3x) | 3x Haiku | ~$3/M input, $15/M output | | Opus (15x) | 15x Haiku | ~$15/M input, $75/M output | A 10-delegation task routed to Haiku instead of Opus saves ~93% on those calls. ## Per-Subtask Routing When a request involves multiple steps: 1. **Decompose first.** Break the request into a todo list of subtasks. 2. **Route each subtask independently.** Most items are cheaper than the overall task. 3. **Downgrade after planning.** If Opus produces a plan, implementation goes to Sonnet. Lookups and cleanup go to Haiku. Example — "design and implement a caching layer": - Plan the architecture → Opus - Search for existing cache usage → Haiku - Implement the cache module → Sonnet - Write tests → Sonnet - Update internal dev notes → Haiku - Write public API docs → Sonnet (escalated: public-facing) ## Delegation Choose **agent type by task** and **model by complexity** independently. | Agent Type | Use When | Tools Available | |---|---|---| | `Explore` | Search, grep, read-only exploration | Read, Glob, Grep (no Edit/Write) | | `general-purpose` | Implementation, edits, multi-step work | All tools | | `Plan` | Architecture design, implementation planning | Read, Glob, Grep (no Edit/Write) | Combine freely — `Explore` + `haiku` for file lookups, `general-purpose` + `sonnet` for implementation, `Explore` + `sonnet` for complex investigation, `Plan` + `sonnet` for straightforward design. ``` Agent(subagent_type: "Explore", model: "haiku", description: "...", prompt: "...") Agent(subagent_type: "general-purpose", model: "sonnet", description: "...", prompt: "...") Agent(subagent_type: "Explore", model: "sonnet", description: "...", prompt: "...") Agent(subagent_type: "Plan", model: "sonnet", description: "...", prompt: "...") ``` Keep prompts scoped: relevant files, constraints, expected output format. The `description` parameter is required — use a short 3-5 word summary. ### Expected Output Formats | Tier | Expected Output | |---|---| | Haiku | File paths, grep results, existence checks, formatted text | | Sonnet | Code diffs, implementation files, test files, error analysis | | Opus | Numbered plan steps, architecture decisions with rationale, risk assessment | ## Parallel Fan-Out When a request contains independent subtasks, launch multiple subagents in a single message. **Before splitting, build a dependency graph.** Does any subtask need another's output, or do they modify the same file? If yes, run them sequentially. | Pattern | Parallel? | Why | |---|---|---| | Separate files, separate concerns | Yes | No shared state | | Research + unrelated implementation | Yes | Read-only doesn't conflict with writes | | Implementation + its tests | **No** | Tests depend on the implementation | | Feature + docs describing that feature | **No** | Docs need to reflect what was built | | Two features touching different modules | Yes | Independent code paths | | Refactor + anything in same files | **No** | Refactor changes the baseline | After all phases return, review outputs for conflicts before applying. ## User Overrides If the user says any of the following, respect it for the current request: - **"no routing"** / **"don't delegate"** — handle everything inline on the current model. - **"use Opus"** / **"use Sonnet"** / **"use Haiku"** — force that tier for all delegations. Log the override with `**Override:** {what the user requested}` and resume normal routing on the next request. ## Routing Log After each routing decision, append a line to `routing.log` in the project root using the Bash tool: ``` echo "{model} | {agent_type or inline} | {estimated savings vs Opus, e.g. ~93%} | {brief task description}" >> routing.log ``` Log every `**Routing to**` and `**Staying on**` decision. ## Guardrails - Max 2 retries per subagent, then escalate or ask the user. - Max 5 sequential delegations per user request. No limit on parallel. - Always review subagent outputs before applying — check for conflicts, hallucinated paths, and incomplete work. - **Do NOT modify project files:** Never write routing rules to CLAUDE.md. Never add hooks to settings.json. This skill is loaded automatically by the plugin.