# Dynamic Model Routing

*Introduced in v2.19.0. Capability scoring introduced in v2.52.0.*

Dynamic model routing automatically selects cheaper models for simple work and reserves expensive models for complex tasks. This reduces token consumption by 20-50% on capped plans without sacrificing quality where it matters.

Starting in v2.52.0, the router uses **capability-aware scoring** to select the *best fit* model for each task, not just the cheapest one in the tier.

## How It Works

Each unit dispatched by auto-mode passes through a two-stage pipeline:

**Stage 1: Complexity classification** — classifies the work into a tier (light/standard/heavy).

**Stage 2: Capability scoring** — within the eligible tier, ranks available models by how well their capabilities match the task's requirements.

The key rule: **downgrade-only semantics**. The user's configured model is always the ceiling — routing never upgrades beyond what you've configured.

| Tier | Typical Work | Default Model Level |
|------|-------------|-------------------|
| **Light** | Slice completion, UAT, hooks | Cheapest capable light-tier model |
| **Standard** | Research, planning, execution, milestone completion | Balanced standard-tier model |
| **Heavy** | Replanning, roadmap reassessment, complex execution | Highest-capability heavy-tier model |

## Enabling

Dynamic routing is off by default. Enable it in preferences:

```yaml
---
version: 1
dynamic_routing:
  enabled: true
---
```

## Configuration

```yaml
dynamic_routing:
  enabled: true
  tier_models:                      # explicit model per tier (optional)
    light: claude-haiku-4-5
    standard: claude-sonnet-4-6
    heavy: claude-opus-4-6
  escalate_on_failure: true         # bump tier on task failure (default: true)
  budget_pressure: true             # auto-downgrade when approaching budget ceiling (default: true)
  cross_provider: true              # consider models from other providers (default: true)
  hooks: true                       # apply routing to post-unit hooks (default: true)
  capability_routing: true          # enable capability scoring within tier (default: true)
  allow_flat_rate_providers: false  # opt into routing for flat-rate providers (default: false)
```

### `allow_flat_rate_providers`

By default, dynamic routing is suppressed when the active provider is flat-rate (`claude-code`, GitHub Copilot, user-declared subscription proxies, or `externalCli` providers) because per-request cost is identical and downgrading only degrades quality.

Set to `true` to opt into routing across a flat-rate subscription — useful when you want intelligent per-task selection (e.g. haiku for research, opus for architecture) within a single subscription's token budget:

```yaml
dynamic_routing:
  enabled: true
  allow_flat_rate_providers: true
  cross_provider: false  # recommended: keep routing inside the subscription
```

Keep `cross_provider: false` when enabling this flag unless every flat-rate provider you use exposes the full tier of models — otherwise the router may attempt to escape to a provider you haven't configured.
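The suppression can be pictured as a small guard that runs before the tier pipeline. A minimal sketch, assuming illustrative `RoutingConfig` and `ProviderInfo` shapes and a `billing` field that are not the actual GSD internals:

```typescript
// Minimal sketch of the flat-rate guard; type and field names here are
// illustrative assumptions, not the real implementation.
interface RoutingConfig {
  enabled: boolean;
  allowFlatRateProviders: boolean; // maps to allow_flat_rate_providers
}

interface ProviderInfo {
  id: string;
  billing: "flat-rate" | "per-token";
}

// Dynamic routing only runs when enabled, and flat-rate providers are skipped
// unless the user has explicitly opted in, since downgrading there saves nothing.
function routingApplies(config: RoutingConfig, provider: ProviderInfo): boolean {
  if (!config.enabled) return false;
  if (provider.billing === "flat-rate" && !config.allowFlatRateProviders) return false;
  return true;
}
```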
### `tier_models`

Override which model is used for each tier. When omitted, the router uses a built-in capability mapping that knows common model families:

- **Light:** `claude-haiku-4-5`, `gpt-4o-mini`, `gpt-4.1-mini`, `gpt-4.1-nano`, `gpt-5-mini`, `gpt-5-nano`, `gpt-5.1-codex-mini`, `gpt-5.3-codex-spark`, `gpt-5.4-mini`, `gemini-2.0-flash`
- **Standard:** `claude-sonnet-4-6`, `gpt-4o`, `gpt-4.1`, `gpt-5.1-codex-max`, `gemini-2.5-pro`, `deepseek-chat`
- **Heavy:** `claude-opus-4-6`, `claude-opus-4-7`, `gpt-5`, `gpt-5-pro`, `gpt-5.1`, `gpt-5.2`, `gpt-5.2-codex`, `gpt-5.3-codex`, `gpt-5.4`, `gpt-5.5`, `o1`, `o3`, `o4-mini`

Token profiles use the same tier mapping. `budget`, `balanced`, and `quality` declare per-phase tier intentions, then GSD resolves those tiers against the models currently available from your configured providers. This means a profile can resolve to an OpenAI, Gemini, Anthropic, or other provider-specific model instead of hardcoding Claude-family defaults.

### `escalate_on_failure`

When a task fails at a given tier, the router escalates to the next tier on retry: Light → Standard → Heavy. This prevents cheap models from burning retries on work that needs more reasoning.

### `budget_pressure`

When approaching the budget ceiling, the router progressively downgrades:

| Budget Used | Effect |
|------------|--------|
| < 50% | No adjustment |
| 50-75% | Standard → Light |
| 75-90% | More aggressive downgrading |
| > 90% | Nearly everything → Light; only Heavy stays at Standard |

### `cross_provider`

When enabled, the router may select models from providers other than your primary. This uses the built-in cost table to find the cheapest model at each tier. Requires the target provider to be configured.

### `capability_routing`

When enabled (default: true), the router uses capability scoring to pick the best model in a tier rather than always defaulting to the cheapest. Set to `false` to revert to cheapest-in-tier behavior:

```yaml
dynamic_routing:
  enabled: true
  capability_routing: false  # disable scoring, use cheapest-in-tier
```

## Capability Profiles

Each model has a built-in **capability profile** — a 7-dimension score (0–100) representing how well it handles different task types:

| Dimension | What It Represents |
|-----------|-------------------|
| `coding` | Code generation and implementation accuracy |
| `debugging` | Diagnosing and fixing errors |
| `research` | Synthesizing information and exploring topics |
| `reasoning` | Multi-step logical reasoning |
| `speed` | Latency and throughput (inverse of capability depth) |
| `longContext` | Handling large codebases and long documents |
| `instruction` | Following structured instructions precisely |

**Built-in profiles** ship for the Claude 4.6/4.7 family, the OpenAI GPT-4.x and GPT-5.x lines (including GPT-5.5, added in v2.78), the o-series reasoning models (`o1`, `o3`, `o4-mini`, `o4-mini-deep-research`), Gemini 2.0/2.5, and `deepseek-chat`. The full table lives in `src/resources/extensions/gsd/model-router.ts` (`MODEL_CAPABILITY_PROFILES`).

Models without a built-in profile receive **uniform scores of 50** across all dimensions. This is a cold-start policy — unknown models compete but don't gain an advantage. For those models, routing behaves the same as it did before capability scoring was introduced.

**Profiles are heuristic rankings, not benchmarks.** They represent approximate relative strengths, not verified benchmark results. Use the user overrides described below to correct them for models you know well.
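As a sketch, a profile and the cold-start fallback might look like this in TypeScript; the type and constant names below are illustrative assumptions, not the exported API of `model-router.ts`:

```typescript
// Seven dimensions, each scored 0–100.
interface CapabilityProfile {
  coding: number;
  debugging: number;
  research: number;
  reasoning: number;
  speed: number;
  longContext: number;
  instruction: number;
}

// Cold-start policy: models without a built-in profile get a flat 50 on every
// dimension, so they remain eligible without outranking profiled models.
const UNKNOWN_MODEL_PROFILE: CapabilityProfile = {
  coding: 50, debugging: 50, research: 50, reasoning: 50,
  speed: 50, longContext: 50, instruction: 50,
};

function profileFor(
  modelId: string,
  profiles: Record<string, CapabilityProfile>,
): CapabilityProfile {
  return profiles[modelId] ?? UNKNOWN_MODEL_PROFILE;
}
```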
## How Scoring Works

The routing pipeline within a tier:

```
classify complexity tier
  ↓
filter eligible models for tier
  ↓
fire before_model_select hook (optional override)
  ↓
capability score eligible models
  ↓
select winner (or first eligible if scoring is disabled)
```

**Scoring formula:** weighted average of capability dimensions

```
score = Σ(weight × capability) / Σ(weights)
```

**Task requirements** are dynamic — different task types weight dimensions differently:

| Unit Type | Key Dimensions |
|-----------|---------------|
| `execute-task` | coding (0.9), instruction (0.7), speed (0.3) |
| `research-*` | research (0.9), longContext (0.7), reasoning (0.5) |
| `plan-*` | reasoning (0.9), coding (0.5) |
| `replan-slice` | reasoning (0.9), debugging (0.6), coding (0.5) |
| `complete-slice`, `run-uat` | instruction (0.8), speed (0.7) |

For `execute-task`, requirements are further refined by task metadata signals:

- Tags like `docs`, `config`, `readme` → boost instruction weight
- Keywords like `concurrency`, `compatibility` → boost debugging and reasoning
- Keywords like `migration`, `architecture` → boost reasoning and coding
- Large file counts (≥6) or large estimated line counts (≥500) → boost coding and reasoning

**Tie-breaking:** When two models score within 2 points of each other, the cheaper model wins. If costs are equal, lexicographic model ID breaks the tie (deterministic).
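Concrete numbers make the weighted average and the tie-break easier to see. A minimal sketch with illustrative profile values; the helper names and scores below are assumptions, not the router's actual implementation:

```typescript
type Profile = Record<string, number>; // dimension → capability (0–100)
type Weights = Record<string, number>; // dimension → task weight

// Weighted average: score = Σ(weight × capability) / Σ(weights).
function capabilityScore(profile: Profile, weights: Weights): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const [dim, weight] of Object.entries(weights)) {
    weighted += weight * (profile[dim] ?? 50); // unknown dimensions fall back to 50
    totalWeight += weight;
  }
  return totalWeight > 0 ? weighted / totalWeight : 0;
}

// execute-task weights from the table above: coding 0.9, instruction 0.7, speed 0.3.
const weights: Weights = { coding: 0.9, instruction: 0.7, speed: 0.3 };
const modelA: Profile = { coding: 88, instruction: 85, speed: 70 }; // illustrative values
const modelB: Profile = { coding: 74, instruction: 80, speed: 95 }; // illustrative values

capabilityScore(modelA, weights); // (0.9·88 + 0.7·85 + 0.3·70) / 1.9 ≈ 84.1
capabilityScore(modelB, weights); // (0.9·74 + 0.7·80 + 0.3·95) / 1.9 ≈ 79.5
// The gap exceeds 2 points, so the higher scorer wins outright; within
// 2 points the cheaper model would win instead.
```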
## User Overrides

Correct built-in capability profiles for models you know well using `modelOverrides` in your models configuration:

```json
{
  "providers": {
    "anthropic": {
      "modelOverrides": {
        "claude-sonnet-4-6": {
          "capabilities": {
            "debugging": 90,
            "research": 85
          }
        }
      }
    }
  }
}
```

Overrides are **deep-merged** with built-in defaults — only the specified dimensions are overridden; others retain their built-in values.

**Use case:** You've found that a model consistently outperforms its built-in profile on specific task types. Override the relevant dimensions to steer the router toward that model for those tasks.

## Verbose Output

When verbose mode is active, the router logs its routing decision. When capability scoring was used, the log includes a full scoring breakdown:

```
Dynamic routing [S]: claude-sonnet-4-6 (capability-scored) — claude-sonnet-4-6: 82.3, gpt-4o: 78.1, deepseek-chat: 72.0
```

When tier-only routing was used (scoring disabled, single eligible model, or routing guards applied):

```
Dynamic routing [S]: claude-sonnet-4-6 (standard complexity, multiple steps)
```

The `selectionMethod` field in the routing decision indicates which path was taken:

- `"capability-scored"` — capability scoring selected the winner
- `"tier-only"` — cheapest in tier (or explicit pin) was used

## Extension Hook

Extensions can intercept and override model selection using the `before_model_select` hook. The hook fires **after** tier filtering (eligible models are known) and **before** capability scoring (scores have not been computed yet). A hook can override selection entirely or return `undefined` to let scoring proceed normally.

**Registering a handler:**

```typescript
pi.on("before_model_select", async (event) => {
  const { unitType, unitId, classification, taskMetadata, eligibleModels, phaseConfig } = event;

  // Custom routing strategy: always use gemini for research tasks
  if (unitType.startsWith("research-")) {
    const gemini = eligibleModels.find(id => id.includes("gemini"));
    if (gemini) return { modelId: gemini };
  }

  // Return undefined to let capability scoring proceed
  return undefined;
});
```

**Event payload:**

| Field | Type | Description |
|-------|------|-------------|
| `unitType` | `string` | The unit type being dispatched (e.g., `"execute-task"`) |
| `unitId` | `string` | Unique identifier for this unit dispatch |
| `classification` | `{ tier, reason, downgraded }` | The complexity classification result |
| `taskMetadata` | `Record \| undefined` | Task metadata extracted from the unit plan |
| `eligibleModels` | `string[]` | Models eligible for the classified tier |
| `phaseConfig` | `{ primary, fallbacks } \| undefined` | The user's configured model for this phase |

**Return value:** `{ modelId: string }` to override selection, or `undefined` to defer to capability scoring.

**First-override-wins:** If multiple extensions register handlers, the first one to return a non-undefined result wins. Subsequent handlers are not called.

## Complexity Classification

Units are classified using pure heuristics — no LLM calls, sub-millisecond:

### Unit Type Defaults

| Unit Type | Default Tier |
|-----------|-------------|
| `complete-slice`, `run-uat` | Light |
| `research-*`, `plan-*`, `complete-milestone` | Standard |
| `execute-task` | Standard (upgraded by task analysis) |
| `replan-slice`, `reassess-roadmap` | Heavy |
| `hook/*` | Light |

### Task Plan Analysis

For `execute-task` units, the classifier analyzes the task plan:

| Signal | Simple → Light | Complex → Heavy |
|--------|---------------|----------------|
| Step count | ≤ 3 | ≥ 8 |
| File count | ≤ 3 | ≥ 8 |
| Description length | < 500 chars | > 2000 chars |
| Code blocks | — | ≥ 5 |
| Complexity keywords | None | Present |

**Complexity keywords:** `research`, `investigate`, `refactor`, `migrate`, `integrate`, `complex`, `architect`, `redesign`, `security`, `performance`, `concurrent`, `parallel`, `distributed`, `backward compat`

### Adaptive Learning

The routing history (`.gsd/routing-history.json`) tracks success/failure per tier per unit type. If a tier's failure rate exceeds 20% for a given pattern, future classifications are bumped up. User feedback (`over`/`under`/`ok`) is weighted 2× vs automatic outcomes.
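A rough sketch of the history-driven bump, assuming an illustrative record shape; the field names and the treatment of `under` feedback as a failure signal are assumptions, not the actual `.gsd/routing-history.json` schema:

```typescript
type Tier = "light" | "standard" | "heavy";

interface RoutingRecord {
  unitType: string;
  tier: Tier;
  outcome: "success" | "failure";
  feedback?: "over" | "under" | "ok"; // optional user feedback
}

const TIER_ORDER: Tier[] = ["light", "standard", "heavy"];

// Bump the classification up one tier when the weighted failure rate for this
// (unitType, tier) pattern exceeds 20%. User feedback counts double.
function bumpedTier(history: RoutingRecord[], unitType: string, tier: Tier): Tier {
  let failures = 0;
  let total = 0;
  for (const record of history) {
    if (record.unitType !== unitType || record.tier !== tier) continue;
    const weight = record.feedback ? 2 : 1;
    // Treat "under" feedback (tier was under-powered) like a failure (an assumption).
    const failed = record.feedback ? record.feedback === "under" : record.outcome === "failure";
    total += weight;
    if (failed) failures += weight;
  }
  if (total === 0 || failures / total <= 0.2) return tier;
  const next = TIER_ORDER.indexOf(tier) + 1;
  return TIER_ORDER[Math.min(next, TIER_ORDER.length - 1)];
}
```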
## Interaction with Token Profiles

Dynamic routing and token profiles are complementary:

- **Token profiles** (`budget`/`balanced`/`quality`) control phase skipping and context compression
- **Dynamic routing** controls per-unit model selection within the configured phase model

When both are active, token profiles set the baseline models and dynamic routing further optimizes within those baselines. The `budget` token profile plus dynamic routing provides maximum cost savings.

## Cost Table

The router includes a built-in cost table for common models, used for cross-provider cost comparison. Costs are per million tokens (input/output):

| Model | Input | Output |
|-------|-------|--------|
| claude-haiku-4-5 | $0.80 | $4.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-opus-4-6 | $15.00 | $75.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4o | $2.50 | $10.00 |
| gemini-2.0-flash | $0.10 | $0.40 |

The cost table is used for comparison only — actual billing comes from your provider.
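For cross-provider comparison, these entries can be folded into a single per-token figure and sorted. The sketch below shows the idea; the 3:1 input-to-output blend and the helper names are illustrative assumptions, not the router's actual weighting:

```typescript
interface ModelCost {
  inputPerMTok: number;  // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

// A few entries from the table above.
const COSTS: Record<string, ModelCost> = {
  "claude-haiku-4-5": { inputPerMTok: 0.80, outputPerMTok: 4.00 },
  "gpt-4o-mini": { inputPerMTok: 0.15, outputPerMTok: 0.60 },
  "gemini-2.0-flash": { inputPerMTok: 0.10, outputPerMTok: 0.40 },
};

// Blend input/output at an assumed 3:1 ratio to get one comparable number.
function blendedCost(cost: ModelCost): number {
  return (3 * cost.inputPerMTok + cost.outputPerMTok) / 4;
}

// Cheapest eligible model, with lexicographic model ID as the deterministic tie-break.
function cheapestInTier(modelIds: string[]): string | undefined {
  return modelIds
    .filter((id) => id in COSTS)
    .sort((a, b) => blendedCost(COSTS[a]) - blendedCost(COSTS[b]) || a.localeCompare(b))[0];
}

cheapestInTier(["claude-haiku-4-5", "gpt-4o-mini", "gemini-2.0-flash"]); // "gemini-2.0-flash"
```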