# ADR-073: Cost-Optimal Model Routing via @metaharness/router **Status**: Proposed **Date**: 2026-06-23 **Decision Makers**: RUV, Claude Flow Team **Related**: ADR-072 (RuVector Advanced Features), ADR-074 (Darwin TDR), ADR-075 (Harness Self-Evolution) **Affected packages**: `agentic-flow` (`src/router/`, `src/routing/`, `src/billing/`) ## Context `agentic-flow` already routes work three ways, but **none of them selects a model by predicted quality-per-cost from measured eval data**: - `src/router/router.ts` (`ModelRouter`) — picks **providers/models** (anthropic / openrouter / onnx / gemini / ollama) by **static config rules**. - `src/routing/TinyDancerRouter.ts` — FastGRNN routing to one of N **agent types** (`numAgents`); no `costPerMTok` / `qualityBar` concept. - `src/routing/SemanticRouter.ts` — HNSW intent → **agent-type** matching. `@metaharness/router` (`ruvnet/agent-harness-generator`, ADR-040/043) productizes the **DRACO Phase-2 finding**: routing each query to the *cheapest model predicted to clear a quality bar* is a measured Pareto win — a learned embedding router beat the best fixed model, with accuracy rising monotonically with training data. It exposes k-NN, a regularised kernel-ridge `TrainedRouter` (portable JSON, no model files), and an optional native FastGRNN backend via `@ruvector/tiny-dancer`. ### Why this is additive, not redundant - It selects **models by predicted cost-quality**, a capability `ModelRouter` lacks. - `@ruvector/tiny-dancer` is **already a dependency** of `agentic-flow` — the native backend ships for free. - The pure-TS path is **dependency-free** (no native deps, no network, no model files) — zero supply-chain risk. - It consumes data `agentic-flow` **already produces**: embeddings (`@xenova/transformers` / ruvector ONNX), per-model quality from `benchmark-results/`, and price tables from `src/billing/pricing`. ## Decision Introduce a **cost-optimal routing mode** in `src/router/` that wraps `@metaharness/router`: 1. Add an optional `routingStrategy: "cost-optimal"` to `RouterConfig` (default remains current config-rule behavior — no breaking change). 2. Build candidate sets from existing provider/model definitions, attaching `costPerMTok` from `src/billing/pricing` and labelled `examples` (`{ embedding, quality }`) from eval/benchmark logs. 3. At route time, embed the incoming query with the existing embedding service and call `Router.route(embedding)` → return the cheapest candidate predicted to clear `qualityBar`. 4. Use `resolveRouterBackend('auto')` — native FastGRNN when `@ruvector/tiny-dancer` is present (it is), pure-TS `TrainedRouter` otherwise. 5. Persist the trained model as portable JSON under `~/.agentic-flow/` alongside `router.config.json`. 6. Feed routing outcomes back into the eval log so accuracy improves with usage (the DRACO learning curve). ## Consequences **Positive** - Direct cost reduction (DRACO: cheap model matches frontier quality at ~10× lower cost) on the eval data already collected. - No new heavyweight dependency; pure-TS fallback keeps the security posture intact. - Quality-bar is explicit and tunable per deployment; falls back to best-predicted when no candidate clears the bar. **Negative / risks** - Requires a minimum of labelled eval examples per candidate to be useful (cold-start); until then it degrades to best-predicted, effectively current behavior. - Adds an embedding step on the hot path (mitigated: <10ms with existing ONNX/ruvector embeddings; cache by query hash). - Eval-log quality scoring must be trustworthy; garbage-in degrades routing (gate on the same scorer used in benchmarks). **Neutral** - Existing routers are untouched; this is a new opt-in strategy behind a config flag. ## Implementation sketch ``` src/router/cost-optimal-router.ts # adapter over @metaharness/router src/router/router.ts # add strategy switch; default unchanged src/billing/pricing # source of costPerMTok ~/.agentic-flow/router.model.json # persisted TrainedRouter (portable JSON) ``` Surface as `agentic-flow --route cost-optimal` and via `router.config.json` `{ "routingStrategy": "cost-optimal", "qualityBar": 0.8 }`.