---
name: model-routing
description: Intelligent model selection for Claude Code — decision matrices, cost tables, budget planning, and subagent model assignment for optimal cost/quality tradeoffs
allowed-tools:
  - Read
  - Grep
  - Glob
  - Bash
triggers:
  - model selection
  - which model
  - cost estimate
  - token budget
  - routing
  - haiku vs sonnet
  - sonnet vs opus
  - model recommendation
  - expensive session
---

# Model Routing Intelligence

Select the right Claude model for each task to optimize the cost/quality tradeoff.

## Goal

Eliminate wasted spend by routing tasks to the cheapest model that produces acceptable quality, while ensuring complex tasks get the reasoning depth they need.

## Decision Matrix

### Task → Model mapping

| Task Type | Recommended Model | Reasoning |
|-----------|-------------------|-----------|
| Architecture decisions | Opus 4.6 | Needs deep multi-step reasoning, hidden coupling detection |
| Complex debugging | Opus 4.6 | Root cause analysis requires holding many hypotheses |
| Security review | Opus 4.6 | Must not miss subtle vulnerabilities |
| Standard implementation | Sonnet 4.6 | Best balance of speed, quality, and cost for code generation |
| Code review | Sonnet 4.6 | Good pattern recognition at reasonable cost |
| Refactoring | Sonnet 4.6 | Mechanical transformations with quality checks |
| Test writing | Sonnet 4.6 | Formulaic but needs understanding of code under test |
| File search / grep | Haiku 4.5 | Simple lookup, no deep reasoning needed |
| Documentation lookup | Haiku 4.5 | Reading and summarizing existing content |
| Commit message generation | Haiku 4.5 | Short, formulaic output |
| Simple Q&A | Haiku 4.5 | Direct answers, no complex analysis |
| Research subagents | Haiku 4.5 | Exploration tasks that return summaries |

### Complexity signals

Use these signals to decide when to escalate from Sonnet to Opus:
- Multiple interacting systems or modules
- Non-obvious failure modes
- "Why does this work?" questions
- Tasks where a wrong answer is expensive to fix
- Cross-cutting concerns (auth, caching, observability)
- Migration or backward-compatibility requirements

Use these signals to downgrade from Sonnet to Haiku:
- Single-file changes
- Mechanical transformations (rename, reformat)
- Reading and summarizing existing content (no new code generated)
- Answering factual questions about code

## Cost Tables

### Per-token pricing (USD per million tokens)

| Model | Input | Output | Cache Write | Cache Read |
|-------|------:|-------:|------------:|-----------:|
| Opus 4.6 | $15.00 | $75.00 | $18.75 | $1.50 |
| Sonnet 4.6 | $3.00 | $15.00 | $3.75 | $0.30 |
| Haiku 4.5 | $0.80 | $4.00 | $1.00 | $0.08 |

### Cost multipliers

| Comparison | Input | Output |
|-----------|------:|-------:|
| Opus vs Sonnet | 5x | 5x |
| Sonnet vs Haiku | 3.75x | 3.75x |
| Opus vs Haiku | 18.75x | 18.75x |
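
The multipliers follow directly from the pricing table. A minimal check, assuming the prices listed above (the `PRICES` dict and `multiplier` helper are names chosen for this example):

```python
# USD per million tokens, copied from the pricing table above.
PRICES = {
    "opus": {"input": 15.00, "output": 75.00},
    "sonnet": {"input": 3.00, "output": 15.00},
    "haiku": {"input": 0.80, "output": 4.00},
}


def multiplier(expensive: str, cheap: str, kind: str) -> float:
    """Price ratio between two models for 'input' or 'output' tokens."""
    return round(PRICES[expensive][kind] / PRICES[cheap][kind], 2)


for a, b in [("opus", "sonnet"), ("sonnet", "haiku"), ("opus", "haiku")]:
    print(f"{a} vs {b}: {multiplier(a, b, 'input')}x input, "
          f"{multiplier(a, b, 'output')}x output")
```

Because the input and output ratios match for each pair, the same multiplier applies to a whole uncached session regardless of its input/output mix.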

### Typical session costs

| Task | Model | Est. Tokens (in/out) | Est. Cost |
|------|-------|---------------------:|----------:|
| Simple bug fix | Sonnet | 50k/10k | ~$0.30 |
| Feature implementation | Sonnet | 200k/50k | ~$1.35 |
| Architecture review | Opus | 200k/30k | ~$5.25 |
| Quick lookup | Haiku | 20k/2k | ~$0.02 |
| Research subagent | Haiku | 80k/10k | ~$0.10 |
| Full code review (council) | Mixed | 500k/100k | ~$3-8 |
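
The single-model estimates above are plain arithmetic on the pricing table. A minimal sketch, assuming uncached tokens and the listed prices (`session_cost` is a name chosen for this example):

```python
# USD per million tokens, copied from the pricing table above.
PRICES = {
    "opus": {"input": 15.00, "output": 75.00},
    "sonnet": {"input": 3.00, "output": 15.00},
    "haiku": {"input": 0.80, "output": 4.00},
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Uncached cost in USD for one session on a single model."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Token volumes taken from the table rows above.
rows = [
    ("Simple bug fix", "sonnet", 50_000, 10_000),
    ("Feature implementation", "sonnet", 200_000, 50_000),
    ("Architecture review", "opus", 200_000, 30_000),
    ("Quick lookup", "haiku", 20_000, 2_000),
    ("Research subagent", "haiku", 80_000, 10_000),
]
for task, model, tokens_in, tokens_out in rows:
    print(f"{task}: ${session_cost(model, tokens_in, tokens_out):.2f}")
```

For the mixed council row, the same function gives $3.00 if all 500k/100k tokens run on Sonnet and $15.00 if they all run on Opus, so the ~$3-8 estimate corresponds to a mostly-Sonnet mix.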

## Subagent Model Assignment

### Orchestration patterns

When using `cc-orchestrate` or spawning subagents, assign models by role:

```
Research agents       → Haiku (cheap exploration, summary return)
Implementation agents → Sonnet (code generation quality)
Review/audit agents   → Sonnet or Opus (depends on risk)
Architecture agents   → Opus (deep reasoning required)
```
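
The role mapping above could be encoded as a lookup. This is a sketch under assumptions: the role names, the `model_for_role` function, and the `high_risk` flag are hypothetical and do not correspond to any `cc-orchestrate` API.

```python
def model_for_role(role: str, high_risk: bool = False) -> str:
    """Map a subagent role to a model tier, following the list above.

    `high_risk` only matters for review/audit agents, where the choice
    between Sonnet and Opus depends on how costly a missed issue would be.
    """
    if role == "research":
        return "haiku"
    if role == "implementation":
        return "sonnet"
    if role in ("review", "audit"):
        return "opus" if high_risk else "sonnet"
    if role == "architecture":
        return "opus"
    raise ValueError(f"unknown role: {role!r}")


print(model_for_role("research"))                # haiku
print(model_for_role("review"))                  # sonnet
print(model_for_role("audit", high_risk=True))   # opus
```

Raising on unknown roles, rather than defaulting silently, keeps a typo from quietly routing work to the wrong tier.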

### Example: builder-validator template

```
builder agent   → Sonnet 4.6 (writes code)
validator agent → Sonnet 4.6 (reviews code)
```

### Example: research-council template

```
researcher agents (3x) → Haiku 4.5 (parallel exploration)
synthesizer agent      → Sonnet 4.6 (combines findings)
```
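
To see what the Haiku researchers save, compare the template against an all-Sonnet council. The sketch assumes each researcher uses the 80k/10k "Research subagent" volume from the session cost table; the synthesizer volume (30k/5k) is a hypothetical figure chosen for illustration.

```python
# USD per million tokens, copied from the pricing table above.
PRICES = {
    "sonnet": {"input": 3.00, "output": 15.00},
    "haiku": {"input": 0.80, "output": 4.00},
}


def agent_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Uncached cost in USD for one agent run."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def council_cost(researcher_model: str, synthesizer_model: str,
                 researchers: int = 3) -> float:
    """Total cost of N parallel researchers plus one synthesizer."""
    research = researchers * agent_cost(researcher_model, 80_000, 10_000)
    # Hypothetical synthesizer volume: reads three summaries, writes one report.
    synthesis = agent_cost(synthesizer_model, 30_000, 5_000)
    return research + synthesis


print(f"Haiku researchers:  ${council_cost('haiku', 'sonnet'):.2f}")
print(f"Sonnet researchers: ${council_cost('sonnet', 'sonnet'):.2f}")
```

Under these assumptions the template costs about $0.48 versus about $1.34 for an all-Sonnet council; the synthesizer cost is identical in both, so the whole difference comes from the researcher tier.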

## Budget Planning

### Setting a session budget

Before starting a task, estimate cost:

1. **Classify the task** using the decision matrix above
2. **Estimate token volume** based on file count and task scope
3. **Calculate cost** using the pricing table
4. **Set the model** with `/model` inside a session or `claude --model` at launch

### Token estimation rules of thumb

| Content Type | Tokens per Line |
|-------------|----------------:|
| TypeScript/JavaScript | ~10 |
| Python | ~8 |
| JSON/YAML | ~6 |
| Markdown | ~5 |
| Minified code | ~15 |

### Cost control techniques

1. **Start with Haiku for research**, switch to Sonnet for implementation
2. **Use subagents** to isolate expensive research from main context
3. **Compact early**, at roughly 60-70% context usage, to avoid expensive re-reads
4. **Limit tool output**: avoid `cat`-ing entire large files; use Grep with result limits
5. **Batch related tasks** to benefit from prompt caching (cache read = 10% of input cost)
6. **Use `--max-turns`** in headless mode to cap automated sessions

### Model switching workflow

```
# Slash commands typed inside an interactive Claude Code session (not shell commands)
# Start with research on Haiku
/model claude-haiku-4-5-20251001
# "Find all files related to auth, summarize the architecture"

# Switch to Sonnet for implementation
/model claude-sonnet-4-6
# "Implement the new auth middleware based on the research above"

# Switch to Opus for the tricky part
/model claude-opus-4-6
# "Review the session handling for race conditions and edge cases"
```

## Environment Variables

```bash
CLAUDE_MODEL=claude-sonnet-4-6          # Default model for sessions
ANTHROPIC_MODEL=claude-sonnet-4-6       # Alternative env var
```

## Settings Configuration

```json
{
  "model": "claude-sonnet-4-6",
  "smallFastModel": "claude-haiku-4-5-20251001"
}
```

`smallFastModel` sets the model used for internal operations such as skill matching and context compression. Keep it on Haiku for cost efficiency.

## Anti-patterns

- Using Opus for everything: it costs 5x as much as Sonnet and adds little quality on simple tasks
- Using Haiku for complex implementation: the per-token savings are offset by lower-quality code that needs more iterations
- Not using subagents: research done in the main context inflates the token count of every subsequent turn
- Re-reading large files: each read costs tokens again; keep the key excerpts or a short summary in context instead
- Ignoring caching: keep stable content at the start of the prompt so more tokens are billed as cache reads (10% of input cost)
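
The subagent anti-pattern is easiest to see with numbers. The sketch below compares input cost only, assuming no caching or compaction and that the full main context is resent on every later turn; the token volumes and function names are hypothetical.

```python
# USD per million input tokens, from the pricing table above.
INPUT_PRICE = {"sonnet": 3.00, "haiku": 0.80}


def inline_research_cost(research_tokens: int, later_turns: int) -> float:
    """Research stays in the main Sonnet context and is resent every turn."""
    total = research_tokens * later_turns * INPUT_PRICE["sonnet"]
    return total / 1_000_000


def subagent_research_cost(research_tokens: int, summary_tokens: int,
                           later_turns: int) -> float:
    """A Haiku subagent reads the material once; only its summary is resent."""
    subagent = research_tokens * INPUT_PRICE["haiku"]
    main = summary_tokens * later_turns * INPUT_PRICE["sonnet"]
    return (subagent + main) / 1_000_000


# Hypothetical: 80k tokens of research, a 2k-token summary, 20 later turns.
print(f"inline:   ${inline_research_cost(80_000, 20):.2f}")
print(f"subagent: ${subagent_research_cost(80_000, 2_000, 20):.2f}")
```

Under these assumptions the inline approach costs about $4.80 in resent input against about $0.18 with a subagent; prompt caching narrows the gap but does not remove the extra context pressure.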