# Intelligent Routing & Model Tiering

Lynkr automatically routes each request to the right model based on complexity — no caller changes, no manual labels.

---

## Overview

```
Request → Force Patterns → Tool Thresholds → Complexity Analysis → Agentic Detection → Tier Selection → Cost Optimization → Provider
```

**Benchmarked routing accuracy (June 2026):**

| Request | Lynkr routes to | Correct? |
|---|---|---|
| "What does git stash do?" | SIMPLE → local model | ✅ |
| "Edit config file to set DEBUG=true" | SIMPLE → local model | ✅ |
| "Analyse JWT vs httpOnly cookies security for banking" | COMPLEX → cloud model | ✅ |
| "Debug race condition in async auth middleware" | COMPLEX → cloud model | ✅ |

**Key benefits:**
- Routes simple requests to cheap/local models automatically
- Escalates complex and risk-sensitive requests to capable cloud models
- Automatic agentic workflow detection with tier upgrades
- 15-dimension complexity scorer — not just token count

---

## 4-Tier Model System

Every request is mapped to one of four complexity tiers:

| Tier | Score Range | Description | Example Tasks |
|------|-----------|-------------|---------------|
| **SIMPLE** | 0-25 | Greetings, simple Q&A, confirmations | "Hello", "What is a variable?", "Yes" |
| **MEDIUM** | 26-50 | Code reading, simple edits, research | "Read this file", "Fix this typo", "Search for X" |
| **COMPLEX** | 51-75 | Multi-file changes, debugging, architecture | "Refactor auth module", "Debug this race condition" |
| **REASONING** | 76-100 | Complex analysis, security audits, novel problems | "Security audit", "Design microservices architecture" |

### Configuration

Tiers are configured via environment variables in `provider:model` format. All four must be set for tiered routing to be active:

```bash
# Required - one per tier
TIER_SIMPLE=ollama:llama3.2
TIER_MEDIUM=openai:gpt-4o
TIER_COMPLEX=openai:o1-mini
TIER_REASONING=openai:o1

# Alternative example with other providers (set each TIER_* variable only once)
TIER_SIMPLE=ollama:qwen2.5-coder
TIER_MEDIUM=databricks:databricks-claude-sonnet-4-5
TIER_COMPLEX=azure-openai:gpt-5.2-chat
TIER_REASONING=databricks:databricks-claude-opus-4-6
```

If a model name is given without a provider prefix, the default provider (`MODEL_PROVIDER`) is used.
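
As an illustration of the `provider:model` format and the default-provider fallback, here is a small sketch. `parseTierValue` is a hypothetical helper, and splitting on the first colon is an assumption about how the value is parsed.

```javascript
// Hypothetical helper: split a TIER_* value into provider and model.
// Assumption: the first ":" separates the provider from the model name.
function parseTierValue(value, defaultProvider) {
  const separator = value.indexOf(":");
  if (separator === -1) {
    // No provider prefix: fall back to the MODEL_PROVIDER value.
    return { provider: defaultProvider, model: value };
  }
  return {
    provider: value.slice(0, separator),
    model: value.slice(separator + 1),
  };
}

console.log(parseTierValue("ollama:llama3.2", "openai"));
console.log(parseTierValue("llama3.2", "ollama"));
```

One caveat of this sketch: a bare model name that itself contains a colon (for example, a tagged local model) would be misread as `provider:model`, so such names need an explicit provider prefix here.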

### Routing Precedence

There are three routing-related settings. Here is exactly how they interact:

#### 1. `TIER_*` Environment Variables (Highest Priority)

When **all four** `TIER_*` vars are set (`TIER_SIMPLE`, `TIER_MEDIUM`, `TIER_COMPLEX`, `TIER_REASONING`), tiered routing is **active**. Every incoming request is scored for complexity (0-100), mapped to a tier, and routed to the `provider:model` specified in the matching `TIER_*` var.

In this mode, `MODEL_PROVIDER` is **not consulted** for routing decisions. The provider comes directly from the `TIER_*` value (e.g., `ollama:llama3.2` routes to Ollama, `openai:gpt-4o` routes to OpenAI).

If any of the four `TIER_*` vars are missing, tiered routing is **completely disabled** and the system falls back to `MODEL_PROVIDER`.

#### 2. `MODEL_PROVIDER` (Default / Fallback)

`MODEL_PROVIDER` controls routing in two scenarios:

- **When tiered routing is disabled** (any `TIER_*` var missing) — all requests go to the provider set in `MODEL_PROVIDER`, regardless of complexity. This is static routing.
- **When a `TIER_*` value has no provider prefix** (e.g., `TIER_SIMPLE=llama3.2` instead of `TIER_SIMPLE=ollama:llama3.2`) — `MODEL_PROVIDER` is used as the default provider for that tier.

Even when tiered routing is active and overrides it for request routing, `MODEL_PROVIDER` is still used for:
- **Startup checks** — e.g., if `MODEL_PROVIDER=ollama`, the server waits for Ollama to be reachable before accepting requests
- **Provider discovery API** (`/v1/providers`) — marks which provider is "primary" in the response
- **Embeddings routing** — the OpenAI-compatible router checks `MODEL_PROVIDER` for embedding provider selection

**Always set `MODEL_PROVIDER`** even when using tier routing.

#### 3. `PREFER_OLLAMA` (Removed)

`PREFER_OLLAMA` has been **removed and has no effect**. If it is still set, a deprecation warning is logged at startup:

```
[DEPRECATION] PREFER_OLLAMA is removed. Use TIER_* env vars for routing.
```

To route simple requests to Ollama, use `TIER_SIMPLE=ollama:<model>` instead.

#### Summary Table

| Configuration | Routing Behavior |
|---|---|
| All 4 `TIER_*` set | Tier routing active. Each request scored and routed to its tier's `provider:model`. `MODEL_PROVIDER` ignored for routing. |
| 1-3 `TIER_*` set | Tier routing **disabled**. All requests go to `MODEL_PROVIDER` (static). |
| No `TIER_*` set | Static routing. All requests go to `MODEL_PROVIDER`. |
| `TIER_*` value without provider prefix | `MODEL_PROVIDER` used as the default provider for that tier. |
| `PREFER_OLLAMA` set | No effect. Deprecation warning logged. |
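
The table can be condensed into a single check. This is a sketch under the rules stated above; `routingMode` and its return shape are hypothetical, not part of Lynkr's API.

```javascript
const TIER_VARS = ["TIER_SIMPLE", "TIER_MEDIUM", "TIER_COMPLEX", "TIER_REASONING"];

// Hypothetical helper: tiered routing is active only when all four
// TIER_* variables are set; otherwise routing is static.
function routingMode(env) {
  const missing = TIER_VARS.filter((name) => !env[name]);
  const warnings = [];
  if (env.PREFER_OLLAMA !== undefined) {
    warnings.push("[DEPRECATION] PREFER_OLLAMA is removed. Use TIER_* env vars for routing.");
  }
  return { mode: missing.length === 0 ? "tiered" : "static", missing, warnings };
}

console.log(routingMode({ MODEL_PROVIDER: "ollama", TIER_SIMPLE: "ollama:llama3.2" }).mode); // static
```

In a running server the argument would typically be `process.env`.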

#### Example: Mixed Local + Cloud Setup

```bash
MODEL_PROVIDER=ollama                        # Startup checks + default provider
TIER_SIMPLE=ollama:llama3.2                  # Score 0-25 → Ollama (free, local)
TIER_MEDIUM=openai:gpt-4o                    # Score 26-50 → OpenAI
TIER_COMPLEX=databricks:claude-sonnet-4-5    # Score 51-75 → Databricks
TIER_REASONING=databricks:claude-opus-4-6    # Score 76-100 → Databricks
```

In this setup, a "Hello" message (score ~5) routes to Ollama. A "Refactor the auth module" message (score ~65) routes to Databricks. `MODEL_PROVIDER=ollama` ensures the server waits for Ollama at startup but does not affect where complex requests go.

### Tier Config File

Additional tier preferences (fallback models per provider) can be defined in `config/model-tiers.json`:

```json
{
  "tiers": {
    "SIMPLE": { "preferred": { "ollama": ["llama3.2"], "openai": ["gpt-4o-mini"] } },
    "MEDIUM": { "preferred": { "openai": ["gpt-4o"], "anthropic": ["claude-sonnet-4-20250514"] } },
    "COMPLEX": { "preferred": { "openai": ["o1-mini"], "anthropic": ["claude-sonnet-4-20250514"] } },
    "REASONING": { "preferred": { "openai": ["o1"], "anthropic": ["claude-opus-4-20250514"] } }
  },
  "localProviders": {
    "ollama": { "free": true, "defaultTier": "SIMPLE" },
    "llamacpp": { "free": true, "defaultTier": "SIMPLE" },
    "lmstudio": { "free": true, "defaultTier": "SIMPLE" }
  }
}
```

---

## Complexity Scoring Algorithm

The complexity analyzer implements 5 phases to produce a score from 0-100.

### Phase 1: Basic Scoring

Three components are scored independently:

**Token Count (0-20 points):**

| Tokens | Score |
|--------|-------|
| < 500 | 0 |
| 500-999 | 4 |
| 1,000-1,999 | 8 |
| 2,000-3,999 | 12 |
| 4,000-7,999 | 16 |
| 8,000+ | 20 |

**Tool Count (0-20 points):**

| Tools | Score |
|-------|-------|
| 0 | 0 |
| 1-3 | 4 |
| 4-6 | 8 |
| 7-10 | 12 |
| 11-15 | 16 |
| 16+ | 20 |

**Task Type (0-25 points):**
- Greetings / yes-no: 0-2
- Simple questions: 3
- General non-technical: 5
- Technical content: 10
- Refactoring: 16
- New implementation: 18
- From scratch: 20
- Entire codebase scope: 22
- Force cloud patterns (security audit, architecture review): 25

### Phase 2: Advanced Classification

Additional scoring on top of Phase 1:

**Code Complexity (0-20 points):**

| Pattern | Points |
|---------|--------|
| Multi-file operations | +5 |
| Architecture concerns | +5 |
| Security | +4 |
| Concurrency | +3 |
| Performance | +3 |
| Database operations | +3 |
| Testing | +2 |

**Reasoning Requirements (0-15 points):**

| Pattern | Points |
|---------|--------|
| Step-by-step reasoning | +4 |
| Trade-off analysis | +4 |
| General analysis | +3 |
| Planning | +3 |
| Edge cases | +2 |

**Conversation Bonus:**
- 6-10 messages: +2
- 11+ messages: +5

The standard score is the sum of all components, capped at 100.
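
To make the additive model concrete, here is a rough sketch of the standard score. All names are hypothetical. The token and tool buckets come from the Phase 1 tables; the task type, code complexity, and reasoning points are passed in as already-computed numbers, and clamping the latter two to their stated ranges (20 and 15) is an assumption.

```javascript
// Lower bound of each bucket paired with its score, from the Phase 1 tables.
const TOKEN_BUCKETS = [[0, 0], [500, 4], [1000, 8], [2000, 12], [4000, 16], [8000, 20]];
const TOOL_BUCKETS = [[0, 0], [1, 4], [4, 8], [7, 12], [11, 16], [16, 20]];

// Returns the score of the last bucket whose lower bound the value reaches.
function bucketScore(value, buckets) {
  let score = 0;
  for (const [min, points] of buckets) {
    if (value >= min) score = points;
  }
  return score;
}

// Hypothetical helper: sum all components and cap the total at 100.
function standardScore({ tokens, tools, taskType, codeComplexity, reasoning, messages }) {
  const conversationBonus = messages >= 11 ? 5 : messages >= 6 ? 2 : 0;
  const total =
    bucketScore(tokens, TOKEN_BUCKETS) +
    bucketScore(tools, TOOL_BUCKETS) +
    taskType +
    Math.min(codeComplexity, 20) + // assumption: clamp to the stated 0-20 range
    Math.min(reasoning, 15) +      // assumption: clamp to the stated 0-15 range
    conversationBonus;
  return Math.min(total, 100);
}

// 1,200 tokens (8) + 5 tools (8) + refactoring (16) + multi-file and security (9)
console.log(standardScore({
  tokens: 1200, tools: 5, taskType: 16, codeComplexity: 9, reasoning: 0, messages: 3,
})); // 41
```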

### Weighted Scoring Mode (15 Dimensions)

When `ROUTING_WEIGHTED_SCORING=true`, the analyzer uses a 15-dimension weighted scoring system instead of the standard additive scoring:

```
Score = Sum of (dimension_value * weight) for all 15 dimensions
```

#### Dimension Weights

**Content Analysis (35% total):**

| Dimension | Weight | Measures |
|-----------|--------|----------|
| tokenCount | 0.08 | Request size (token estimate) |
| promptComplexity | 0.10 | Sentence structure, average length |
| technicalDepth | 0.10 | Technical keyword density |
| domainSpecificity | 0.07 | Number of specialized domains (security, ML, distributed, database, frontend, devops) |

**Tool Analysis (25% total):**

| Dimension | Weight | Measures |
|-----------|--------|----------|
| toolCount | 0.08 | Number of tools in request |
| toolComplexity | 0.10 | Weighted average of tool complexity (Bash=0.9, Write=0.8, Edit=0.7, Read=0.3, Glob/Grep=0.2) |
| toolChainPotential | 0.07 | Sequential operation indicators ("then", "after", "step 1") |

**Reasoning Requirements (25% total):**

| Dimension | Weight | Measures |
|-----------|--------|----------|
| multiStepReasoning | 0.10 | Step-by-step / planning patterns |
| codeGeneration | 0.08 | Code creation requests |
| analysisDepth | 0.07 | Trade-off / analysis patterns |

**Context Factors (15% total):**

| Dimension | Weight | Measures |
|-----------|--------|----------|
| conversationDepth | 0.05 | Message count in conversation |
| priorToolUsage | 0.05 | Tool results already in conversation |
| ambiguity | 0.05 | Inverse of request specificity |

Each dimension is scored 0-100 independently, then multiplied by its weight. The final score is the rounded sum.
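
A minimal sketch of the weighted sum, using only the dimensions and weights listed in the tables above. `weightedScore` is a hypothetical name, and treating a missing dimension as 0 is an assumption.

```javascript
// Weights copied from the dimension tables above.
const WEIGHTS = {
  tokenCount: 0.08, promptComplexity: 0.10, technicalDepth: 0.10, domainSpecificity: 0.07,
  toolCount: 0.08, toolComplexity: 0.10, toolChainPotential: 0.07,
  multiStepReasoning: 0.10, codeGeneration: 0.08, analysisDepth: 0.07,
  conversationDepth: 0.05, priorToolUsage: 0.05, ambiguity: 0.05,
};

// Hypothetical helper: each dimension value is expected in 0-100.
// Dimensions absent from the input count as 0 (assumption).
function weightedScore(dimensions) {
  let sum = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    sum += (dimensions[name] ?? 0) * weight;
  }
  return Math.round(sum);
}

// 80 * 0.10 + 60 * 0.10 + 90 * 0.10 = 23
console.log(weightedScore({ technicalDepth: 80, multiStepReasoning: 60, toolComplexity: 90 })); // 23
```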

### Phase 3: Metrics Tracking

Every routing decision is recorded in-memory (last 1,000 decisions) for analytics:
- Total decisions, local vs. cloud split
- Average complexity score
- Per-provider and per-tier distribution

Metrics are exposed via the `/metrics` endpoint and `X-Lynkr-*` response headers.

### Phase 4: Embeddings-Based Similarity (Optional)

When an embeddings model is configured (`OLLAMA_EMBEDDINGS_MODEL`), the analyzer can compare request content against reference embeddings for complex and simple tasks using cosine similarity. This produces a score adjustment of -10 to +10 points.
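
The exact mapping from similarity to adjustment is not described here. The sketch below shows one plausible reading: compute cosine similarity against a "complex" and a "simple" reference vector, scale the difference by 10, and clamp to the stated range. Both function names and the scaling rule are assumptions.

```javascript
// Standard cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumption: adjustment = (similarity to complex - similarity to simple) * 10,
// rounded and clamped to the documented -10..+10 range.
function similarityAdjustment(request, complexRef, simpleRef) {
  const diff = cosineSimilarity(request, complexRef) - cosineSimilarity(request, simpleRef);
  return Math.max(-10, Math.min(10, Math.round(diff * 10)));
}

console.log(similarityAdjustment([1, 0], [1, 0], [0, 1])); // 10
```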

### Phase 5: Structural Analysis via Graphify (Optional)

When [Graphify](https://github.com/safishamsi/graphify) is enabled (`CODE_GRAPH_ENABLED=true`), the analyzer extracts file paths from the request and queries Graphify's knowledge graph for structural complexity signals.

**How it works:**
1. File paths are extracted from tool_use blocks, system prompts, and message text (supports both Anthropic and OpenAI formats)
2. Three parallel queries are sent to Graphify: `get_neighbors` (blast radius), `god_nodes`, and `graph_stats`
3. Results are scored and added to the complexity score

**Scoring (capped at +35):**

| Signal | Points | Condition |
|--------|--------|-----------|
| High blast radius | +15 | > 30 affected files |
| Medium blast radius | +10 | > 10 affected files |
| Low blast radius | +5 | > 5 affected files |
| Deep dependencies | +5 | Dependency depth > 4 |
| Infrastructure file | +10 | Editing Docker, CI/CD, config files |
| Low test coverage | +5 | < 30% test files in affected set |
| God node touched | +10 | Editing a hub class many things depend on |
| Low community cohesion | +5 | Cohesion < 0.15 with multiple communities |

**God node detection:** Graphify identifies the most-connected entities in the codebase (hub classes, central modules). Editing these has outsized impact — the router upgrades the request to a stronger model.

**Community cohesion:** Graphify uses Leiden clustering to group related code. Low cohesion means loosely-coupled code where changes are harder to reason about safely.
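
The scoring table can be sketched as follows. The `signals` field names are illustrative and do not reflect Graphify's actual output format; treating the three blast-radius rows as mutually exclusive is also an assumption.

```javascript
// Hypothetical helper: turn structural signals into a score bonus.
function structuralBonus(signals) {
  let points = 0;
  // Blast radius: only the highest matching band applies (assumption).
  if (signals.affectedFiles > 30) points += 15;
  else if (signals.affectedFiles > 10) points += 10;
  else if (signals.affectedFiles > 5) points += 5;

  if (signals.dependencyDepth > 4) points += 5;
  if (signals.touchesInfrastructure) points += 10;
  if (signals.testFileRatio < 0.3) points += 5;
  if (signals.touchesGodNode) points += 10;
  if (signals.cohesion < 0.15 && signals.communityCount > 1) points += 5;

  return Math.min(points, 35); // capped at +35
}

console.log(structuralBonus({
  affectedFiles: 12, dependencyDepth: 2, touchesInfrastructure: false,
  testFileRatio: 0.5, touchesGodNode: false, cohesion: 0.5, communityCount: 1,
})); // 10
```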

**Configuration:**
```bash
CODE_GRAPH_ENABLED=true
CODE_GRAPH_COMMAND=graphify           # CLI command (default: graphify)
CODE_GRAPH_WORKSPACE=/path/to/repo    # Optional — auto-detected from file paths
CODE_GRAPH_TIMEOUT=10000              # Query timeout in ms (default: 10000)
```

**Workspace auto-detection:** You don't need to set `CODE_GRAPH_WORKSPACE`. Lynkr automatically detects the workspace from absolute file paths in the request by finding their common directory prefix. This works per-request, so different conversations about different repos route correctly.
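
Finding a common directory prefix could look roughly like this. `commonDirectory` is a hypothetical helper that assumes absolute POSIX-style paths; the real detection may apply further rules that are not described here.

```javascript
// Hypothetical helper: deepest directory shared by all given file paths.
function commonDirectory(paths) {
  if (paths.length === 0) return null;
  // Drop the file name from each path, keeping only directory segments.
  const dirs = paths.map((p) => p.split("/").slice(0, -1));
  let prefix = dirs[0];
  for (const segments of dirs.slice(1)) {
    let i = 0;
    while (i < prefix.length && i < segments.length && prefix[i] === segments[i]) i++;
    prefix = prefix.slice(0, i);
  }
  return prefix.join("/") || "/";
}

console.log(commonDirectory(["/repo/src/a.js", "/repo/src/routing/b.js"])); // /repo/src
```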

---

## Agentic Workflow Detection

The agentic detector identifies multi-step tool chains and autonomous agent patterns, boosting the complexity tier accordingly.

### Agent Types

| Type | Score Boost | Min Tier | Description |
|------|------------|----------|-------------|
| **SINGLE_SHOT** | +0 | SIMPLE | Simple request-response, no tool chains |
| **TOOL_CHAIN** | +15 | MEDIUM | Sequential tool usage (read -> edit -> test) |
| **ITERATIVE** | +25 | COMPLEX | Retry loops, debugging cycles, iterative refinement |
| **AUTONOMOUS** | +35 | REASONING | Open-ended tasks, full autonomy, complex decision making |

### Detection Signals

The detector evaluates 6 signal categories:

**1. Tool Count**
- 4-5 tools: +8
- 6-10 tools: +15
- 11+ tools: +25

**2. Agentic Tools Present** (Bash, Write, Edit, Task, Git, Test)
- 1 agentic tool: +8
- 2-3 agentic tools: +15
- 4+ agentic tools: +25

**3. Prior Tool Results** (already in an agentic loop)
- 1-2 tool results: +10
- 3-5 tool results: +20
- 6+ tool results: +30

**4. Content Pattern Matching**
- Autonomous patterns ("figure out", "solve", "make it work"): +25
- Iterative patterns ("keep trying", "debug", "retry"): +20
- Tool chain patterns ("then use", "next step", "step 1"): +15
- Multi-file work: +15
- Planning required: +10
- Implementation + testing: +15

**5. Conversation Depth**
- 5-8 messages: +6
- 9-15 messages: +12
- 16+ messages: +20

**6. Content Length**
- 2,000+ characters: +10

### Classification Thresholds

| Agent Type | Score Threshold | Additional Conditions |
|------------|----------------|----------------------|
| AUTONOMOUS | >= 60 | or autonomous pattern + score >= 40 |
| ITERATIVE | >= 40 | or deep tool loop + score >= 30 |
| TOOL_CHAIN | >= 20 | or many agentic tools present |
| SINGLE_SHOT | < 20 | Default |

When an agentic workflow is detected (`score >= 25`), the complexity score is boosted by the agent type's `scoreBoost` value, and the tier is upgraded to at least the agent type's `minTier`.
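
The classification and boost step can be sketched as below. All function names are hypothetical. The sketch uses only the score thresholds and leaves out the "Additional Conditions" column, which depends on signals not modeled here.

```javascript
// Boosts and minimum tiers from the Agent Types table.
const AGENT_TYPES = {
  SINGLE_SHOT: { scoreBoost: 0, minTier: "SIMPLE" },
  TOOL_CHAIN: { scoreBoost: 15, minTier: "MEDIUM" },
  ITERATIVE: { scoreBoost: 25, minTier: "COMPLEX" },
  AUTONOMOUS: { scoreBoost: 35, minTier: "REASONING" },
};
const TIER_ORDER = ["SIMPLE", "MEDIUM", "COMPLEX", "REASONING"];
const DETECTION_THRESHOLD = 25; // "score >= 25" from the paragraph above

function classifyAgent(agenticScore) {
  if (agenticScore >= 60) return "AUTONOMOUS";
  if (agenticScore >= 40) return "ITERATIVE";
  if (agenticScore >= 20) return "TOOL_CHAIN";
  return "SINGLE_SHOT";
}

// Returns the boosted complexity score (capped at 100) and the minimum tier.
function applyAgenticBoost(complexityScore, agenticScore) {
  const type = classifyAgent(agenticScore);
  if (agenticScore < DETECTION_THRESHOLD) {
    return { type, score: complexityScore, minTier: "SIMPLE" };
  }
  const { scoreBoost, minTier } = AGENT_TYPES[type];
  return { type, score: Math.min(100, complexityScore + scoreBoost), minTier };
}

// Keeps whichever of the two tiers is higher.
function enforceMinTier(tier, minTier) {
  const index = Math.max(TIER_ORDER.indexOf(tier), TIER_ORDER.indexOf(minTier));
  return TIER_ORDER[index];
}

console.log(applyAgenticBoost(30, 45)); // { type: 'ITERATIVE', score: 55, minTier: 'COMPLEX' }
```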

---

## Force Patterns

Certain requests bypass the scoring algorithm entirely:

### Force Local (always local model)
- Greetings: "hi", "hello", "thanks", "bye"
- Time queries: "what time is it"
- Confirmations: "yes", "no", "ok", "sure"
- Help requests: "help", "commands"

### Force Cloud (always cloud model)
- Security audits/reviews
- Architecture design/review
- Complete codebase refactoring
- Code/PR reviews
- Complex debugging
- Production incidents
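
As a rough illustration of how such a pre-check might work, the sketch below tests the message against two pattern lists before any scoring happens. The regular expressions are illustrative only; the actual pattern lists are not shown in this document.

```javascript
// Illustrative patterns only (assumptions), loosely based on the lists above.
const FORCE_LOCAL = [
  /^(hi|hello|thanks|bye)[.!]?$/i,
  /^what time is it\??$/i,
  /^(yes|no|ok|sure)[.!]?$/i,
  /^(help|commands)$/i,
];
const FORCE_CLOUD = [
  /security (audit|review)/i,
  /architecture (design|review)/i,
  /production incident/i,
];

// Hypothetical helper: returns "force_local", "force_cloud", or null
// when the request should go through normal scoring.
function forcePattern(text) {
  const trimmed = text.trim();
  if (FORCE_LOCAL.some((re) => re.test(trimmed))) return "force_local";
  if (FORCE_CLOUD.some((re) => re.test(trimmed))) return "force_cloud";
  return null;
}

console.log(forcePattern("hello"));                           // force_local
console.log(forcePattern("Run a security audit of the API")); // force_cloud
```

Anchoring the local patterns to the whole message keeps a long request that merely starts with "hi" from being forced to the local model.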

---

## Cost Optimization

When `ROUTING_COST_OPTIMIZATION=true`, the router checks if a cheaper model can handle the determined tier.

### Model Registry

Pricing data is fetched from three sources (in priority order):

1. **LiteLLM** (highest priority) - Community-maintained pricing from [BerriAI/litellm](https://github.com/BerriAI/litellm)
2. **models.dev** - API pricing aggregator
3. **Databricks Fallback** - Hardcoded pricing for common models (Claude, Llama, GPT, Gemini, DBRX)

Pricing data is cached locally in `data/model-prices-cache.json` with a 24-hour TTL. Background refresh happens automatically when the cache is stale.

### Cost Tracking

The optimizer tracks costs at both session and global levels:
- Per-request cost recording (input + output tokens)
- Per-model, per-provider, per-tier breakdowns
- Savings calculation when routing to cheaper alternatives

### Pricing Lookup

The registry supports flexible model name lookup:
- Direct match: `gpt-4o`
- Provider prefix stripping: `databricks-claude-sonnet-4-5` -> `claude-sonnet-4-5`
- Fuzzy matching for partial names
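
The three lookup steps could be sketched as follows. `lookupPrice`, the prefix list, and the substring-based fuzzy rule are all assumptions; the price values in the usage example are placeholders, not real pricing.

```javascript
// Assumption: provider prefixes that may be stripped from a model name.
const KNOWN_PREFIXES = ["databricks-"];

// Hypothetical helper: `prices` is a Map from model name to a pricing record.
function lookupPrice(name, prices) {
  // 1. Direct match.
  if (prices.has(name)) return { match: "direct", key: name };
  // 2. Provider prefix stripping.
  for (const prefix of KNOWN_PREFIXES) {
    if (name.startsWith(prefix)) {
      const stripped = name.slice(prefix.length);
      if (prices.has(stripped)) return { match: "prefix_stripped", key: stripped };
    }
  }
  // 3. Fuzzy fallback (assumption): first key that contains, or is contained in, the name.
  const fuzzy = [...prices.keys()].find((key) => key.includes(name) || name.includes(key));
  return fuzzy ? { match: "fuzzy", key: fuzzy } : null;
}

// Placeholder values for illustration only.
const prices = new Map([
  ["gpt-4o", { input: 1, output: 2 }],
  ["claude-sonnet-4-5", { input: 1, output: 2 }],
]);
console.log(lookupPrice("databricks-claude-sonnet-4-5", prices).key); // claude-sonnet-4-5
```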

---

## Routing Headers

Every response includes routing metadata in `X-Lynkr-*` headers:

| Header | Description | Example |
|--------|-------------|---------|
| `X-Lynkr-Routing-Method` | How the decision was made | `tier_config`, `force`, `tool_threshold`, `agentic`, `cost_optimized` |
| `X-Lynkr-Provider` | Selected provider | `databricks`, `ollama`, `openrouter` |
| `X-Lynkr-Complexity-Score` | Complexity score (0-100) | `42` |
| `X-Lynkr-Complexity-Threshold` | Score threshold for cloud routing | `40` |
| `X-Lynkr-Routing-Reason` | Human-readable reason | `force_local_pattern`, `autonomous_workflow` |
| `X-Lynkr-Tier` | Selected model tier | `SIMPLE`, `MEDIUM`, `COMPLEX`, `REASONING` |
| `X-Lynkr-Model` | Selected model | `llama3.2`, `gpt-4o`, `claude-opus-4-6` |
| `X-Lynkr-Agentic` | Agentic workflow type (if detected) | `TOOL_CHAIN`, `ITERATIVE`, `AUTONOMOUS` |
| `X-Lynkr-Cost-Optimized` | Whether cost optimization was applied | `true` |
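
A client can collect these headers into one object for logging or debugging. The sketch below is a hypothetical helper that accepts anything with a `get(name)` method, such as the `headers` of a `fetch` response.

```javascript
// Hypothetical helper: gather Lynkr routing metadata from response headers.
function readRoutingHeaders(headers) {
  const score = headers.get("X-Lynkr-Complexity-Score");
  return {
    method: headers.get("X-Lynkr-Routing-Method") ?? null,
    provider: headers.get("X-Lynkr-Provider") ?? null,
    model: headers.get("X-Lynkr-Model") ?? null,
    tier: headers.get("X-Lynkr-Tier") ?? null,
    score: score == null ? null : Number(score),
    agentic: headers.get("X-Lynkr-Agentic") ?? null,
    costOptimized: headers.get("X-Lynkr-Cost-Optimized") === "true",
  };
}

// A Map stands in for real response headers in this example.
const example = new Map([
  ["X-Lynkr-Tier", "COMPLEX"],
  ["X-Lynkr-Complexity-Score", "65"],
]);
console.log(readRoutingHeaders(example).tier); // COMPLEX
```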

---

## Configuration Reference

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `TIER_SIMPLE` | *required* | Model for simple tier (`provider:model`) |
| `TIER_MEDIUM` | *required* | Model for medium tier (`provider:model`) |
| `TIER_COMPLEX` | *required* | Model for complex tier (`provider:model`) |
| `TIER_REASONING` | *required* | Model for reasoning tier (`provider:model`) |
| `SMART_TOOL_SELECTION_MODE` | `heuristic` | Scoring mode: `aggressive` (threshold=60), `heuristic` (threshold=40), `conservative` (threshold=25) |
| `ROUTING_WEIGHTED_SCORING` | `false` | Enable 15-dimension weighted scoring |
| `ROUTING_AGENTIC_DETECTION` | `true` | Enable agentic workflow detection |
| `ROUTING_COST_OPTIMIZATION` | `false` | Enable cost-based model selection |
| `OLLAMA_MAX_TOOLS_FOR_ROUTING` | `3` | Max tools before routing away from Ollama |
| `OPENROUTER_MAX_TOOLS_FOR_ROUTING` | `15` | Max tools before routing away from OpenRouter |
| `OLLAMA_EMBEDDINGS_MODEL` | *(none)* | Embeddings model for Phase 4 similarity |
| `CODE_GRAPH_ENABLED` | `false` | Enable Graphify structural analysis (Phase 5) |
| `CODE_GRAPH_COMMAND` | `graphify` | Graphify CLI command |
| `CODE_GRAPH_WORKSPACE` | `process.cwd()` | Default workspace (auto-detected per request) |
| `CODE_GRAPH_TIMEOUT` | `10000` | Graphify query timeout in ms |

### Smart Tool Selection Modes

| Mode | Threshold | Behavior |
|------|-----------|----------|
| `aggressive` | 60 | More requests go to local (saves cost) |
| `heuristic` | 40 | Balanced local/cloud split |
| `conservative` | 25 | More requests go to cloud (better quality) |
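
The effect of the mode can be illustrated with a small lookup. `routeTarget` is a hypothetical helper, and treating a score equal to the threshold as "cloud" is an assumption.

```javascript
// Thresholds from the table above.
const MODE_THRESHOLDS = { aggressive: 60, heuristic: 40, conservative: 25 };

// Hypothetical helper. Assumption: scores at or above the threshold go to cloud.
function routeTarget(score, mode = "heuristic") {
  const threshold = MODE_THRESHOLDS[mode];
  if (threshold === undefined) throw new Error(`Unknown mode: ${mode}`);
  return score >= threshold ? "cloud" : "local";
}

console.log(routeTarget(42));               // cloud
console.log(routeTarget(42, "aggressive")); // local
```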

---

## Routing Safety Features

### Vision Capability Guard

Automatically upgrades to vision-capable models when images are detected in the request.

**When it activates:**
- Payload contains `type: 'image'` or `type: 'image_url'` content blocks
- Selected model lacks `vision: true` capability in model registry

**What it does:**
1. Searches for cheapest vision-capable model at or above current tier
2. Upgrades model and tier if necessary
3. Tags routing method with `+vision_guard`

**Example:**
```
Request: Image + "What's in this screenshot?"
Initial: MEDIUM → ollama:llama3.2 (no vision)
After guard: MEDIUM → anthropic:claude-sonnet-4-6 (vision: true)
```

**Tier escalation:** If no vision model exists at the current tier, the guard escalates to the next tier up (SIMPLE→MEDIUM→COMPLEX→REASONING). If the REASONING tier has no vision model either, it logs a warning and keeps the original selection (the request will likely fail upstream).

**No configuration needed** — automatic based on model registry vision field.
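
The guard's search can be sketched like this. The registry entry shape (`id`, `tier`, `vision`, `cost`) and the `visionGuard` function are hypothetical, and the cost values in any usage are placeholders.

```javascript
const TIER_ORDER = ["SIMPLE", "MEDIUM", "COMPLEX", "REASONING"];

// Hypothetical helper: pick the cheapest vision-capable model at or above
// the current tier, or keep the original selection if none exists.
function visionGuard(selection, registry, hasImages) {
  if (!hasImages || selection.vision) return selection;
  for (let i = TIER_ORDER.indexOf(selection.tier); i < TIER_ORDER.length; i++) {
    const cheapest = registry
      .filter((m) => m.tier === TIER_ORDER[i] && m.vision)
      .sort((a, b) => a.cost - b.cost)[0];
    if (cheapest) {
      return { ...cheapest, method: `${selection.method}+vision_guard` };
    }
  }
  console.warn("No vision-capable model found; keeping original selection");
  return selection;
}
```

Walking the tiers upward from the current one means a request is never downgraded just to gain vision support.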

---

### kNN Ambiguous Confidence Escalation

When kNN neighbor voting is split (no clear model winner), the router escalates the tier to prioritize quality over cost.

**Confidence thresholds:**
- **>0.7 (high):** Trust kNN model recommendation, override heuristic
- **0.4-0.7 (ambiguous):** Escalate tier one step for safety
- **≤0.4 (low):** Ignore kNN, use heuristic selection

**What it does (ambiguous range):**
1. Current tier bumped one step: SIMPLE→MEDIUM→COMPLEX→REASONING
2. Select model from upgraded tier
3. Tag routing method with `+knn_ambiguous_escalate`

**Example:**
```
Request: "Refactor the auth module"
Heuristic: MEDIUM → openai:gpt-4o-mini (score 42)
kNN: confidence=0.55 (neighbors split)
Result: COMPLEX → anthropic:claude-opus-4-7
```

**REASONING ceiling:** REASONING tier never escalates (already at top).

**Graceful fallback:** If the upgraded tier is unconfigured (e.g., missing `TIER_COMPLEX`), the router keeps the current tier.

**Requires:** kNN enabled (`ROUTING_KNN_ENABLED=true`) with index of 1000+ samples at `data/knn/index.hnsw`.
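
The confidence bands translate into a small decision function. `knnDecision` and its return shape are hypothetical; the boundaries follow the thresholds listed above (0.4 counts as low, 0.7 as ambiguous).

```javascript
const TIER_ORDER = ["SIMPLE", "MEDIUM", "COMPLEX", "REASONING"];

// Hypothetical helper: decide what to do with a kNN recommendation.
function knnDecision(currentTier, confidence, configuredTiers) {
  if (confidence > 0.7) return { action: "override", tier: currentTier };
  if (confidence <= 0.4) return { action: "ignore", tier: currentTier };
  // Ambiguous range: try to move one tier up.
  const next = TIER_ORDER[TIER_ORDER.indexOf(currentTier) + 1];
  if (next === undefined || !configuredTiers.includes(next)) {
    // REASONING ceiling or unconfigured target tier: keep the current tier.
    return { action: "keep", tier: currentTier };
  }
  return { action: "escalate", tier: next };
}

console.log(knnDecision("MEDIUM", 0.55, TIER_ORDER)); // { action: 'escalate', tier: 'COMPLEX' }
```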

---

## Routing Decision Flow

```
1. Are all 4 TIER_* env vars configured?
   └─ No → Return static provider (MODEL_PROVIDER), skip all routing

2. Risk analysis:
   └─ High risk → Force COMPLEX tier

3. Does content match FORCE_LOCAL patterns?
   └─ Yes → Route to SIMPLE tier

4. Does content match FORCE_CLOUD patterns?
   └─ Yes → Route to best cloud provider (requires FALLBACK_ENABLED)

5. Analyze complexity:
   └─ Calculate score 0-100 (standard or weighted mode)

6. Optional: Graphify structural analysis:
   └─ Query knowledge graph for blast radius, god nodes, community cohesion
   └─ Adjust score by up to +35

7. Optional: Embeddings adjustment:
   └─ Adjust score by -10 to +10 based on semantic similarity

8. Agentic detection:
   └─ If agentic → Boost score, enforce minimum tier
   └─ If AUTONOMOUS → Force cloud provider

9. Map score to tier (SIMPLE/MEDIUM/COMPLEX/REASONING)

10. Select provider:model from matching TIER_* env var

11. Cost optimization:
    └─ If enabled + not high-risk → find cheaper qualifying model

12. Context window escalation:
    └─ If estimated tokens > model context → escalate to larger-context model

13. Vision capability guard:
    └─ If payload has images + model lacks vision → upgrade to vision model

14. kNN routing:
    └─ If confidence > 0.7 → override with kNN model
    └─ If confidence 0.4-0.7 → escalate tier (ambiguous)
    └─ If confidence ≤ 0.4 → ignore kNN

15. LinUCB bandit:
    └─ If multiple candidates → pick best via UCB score

16. Deadline filter:
    └─ If LYNKR-Deadline-Ms header → pick fastest qualifying model

17. Tenant policy override:
    └─ If tenant blocks model → replace via cost optimizer

18. Record telemetry (provider, tier, latency, quality score)

19. Return { provider, model, tier, score, method }
```

---

## Routing Telemetry

Every routing decision is recorded in a SQLite telemetry store (`.lynkr/telemetry.db`) for analysis and continuous improvement.

### Telemetry Endpoints

| Endpoint | Description |
|----------|-------------|
| `GET /v1/routing/stats` | Aggregated stats with latency percentiles per provider |
| `GET /v1/routing/stats/:provider` | Per-provider statistics |
| `GET /v1/routing/telemetry` | Raw telemetry records with query filters |
| `GET /v1/routing/accuracy` | Over/under-provisioned routing percentage |

### Recorded Fields

Each telemetry record captures 20+ fields including: request ID, provider, tier, complexity score, latency, quality score (0-100), token usage, whether fallback was used, retry count, error type, and Graphify signals (blast radius, god node, cohesion).

### Quality Scoring

Every response is scored 0-100 for quality using heuristic signals:

| Signal | Points |
|--------|--------|
| HTTP 200 status | +10 |
| Output tokens > 100 | +5 |
| Tools used in response | +10 |
| No fallback triggered | +5 |
| No retries needed | +5 |
| Error occurred | -30 |
| Fallback was used | -10 |
| Multiple retries | -10 |
| Latency > 30s | -10 |
| Tier mismatch (REASONING request got low output) | -15 |
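
The signals combine additively, roughly as in the sketch below. The starting value is not stated in this document, so the sketch takes it as a `base` parameter. The field names and the reading of "multiple retries" as more than one retry are assumptions.

```javascript
// Hypothetical helper: apply the signal table to a starting value `base`
// and clamp the result to 0-100.
function qualityScore(result, base) {
  let score = base;
  if (result.status === 200) score += 10;
  if (result.outputTokens > 100) score += 5;
  if (result.toolsUsed) score += 10;
  score += result.fallbackUsed ? -10 : 5;
  if (result.retries === 0) score += 5;
  if (result.retries > 1) score -= 10; // assumption: "multiple" means more than one
  if (result.error) score -= 30;
  if (result.latencyMs > 30000) score -= 10;
  if (result.tierMismatch) score -= 15;
  return Math.max(0, Math.min(100, score));
}
```

Passing the starting value explicitly keeps the sketch honest about what the table does and does not specify.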

### Latency Tracking

Per-provider latency is tracked in a 200-sample circular buffer. Statistics exposed:
- P50, P95, P99 latency
- Average latency
- Latency-based score adjustment (-5 to +10 points)
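
A fixed-size circular buffer with percentile queries might look like this. `LatencyBuffer` is a hypothetical class, and the nearest-rank percentile method is an assumption; the real tracker may interpolate differently.

```javascript
// Hypothetical sketch of a per-provider latency window.
class LatencyBuffer {
  constructor(capacity = 200) {
    this.capacity = capacity;
    this.samples = [];
    this.next = 0; // index that the next write will use once the buffer is full
  }

  record(ms) {
    if (this.samples.length < this.capacity) this.samples.push(ms);
    else this.samples[this.next] = ms; // overwrite the oldest sample
    this.next = (this.next + 1) % this.capacity;
  }

  // Nearest-rank percentile over the current window (assumption).
  percentile(p) {
    if (this.samples.length === 0) return null;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const rank = Math.ceil((p * sorted.length) / 100);
    return sorted[Math.max(0, rank - 1)];
  }

  average() {
    if (this.samples.length === 0) return null;
    return this.samples.reduce((sum, ms) => sum + ms, 0) / this.samples.length;
  }
}

const buffer = new LatencyBuffer(3);
[10, 20, 30, 40].forEach((ms) => buffer.record(ms)); // 10 is evicted
console.log(buffer.percentile(50), buffer.average()); // 30 30
```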

---

## Source Files

| File | Description |
|------|-------------|
| `src/routing/index.js` | Main routing orchestrator (`determineProviderSmart()`) |
| `src/routing/complexity-analyzer.js` | 5-phase complexity analysis, 15-dimension weighted scoring, Graphify integration |
| `src/routing/agentic-detector.js` | Agentic workflow detection and classification |
| `src/routing/model-tiers.js` | Tier definitions, model selection from `TIER_*` env vars |
| `src/routing/model-registry.js` | Multi-source pricing (LiteLLM, models.dev, Databricks fallback) |
| `src/routing/cost-optimizer.js` | Cost tracking, cheapest model finder, savings calculation |
| `src/routing/telemetry.js` | SQLite-backed routing telemetry store |
| `src/routing/quality-scorer.js` | Response quality scoring (0-100) |
| `src/routing/latency-tracker.js` | Per-provider latency tracking with percentiles |
| `src/tools/code-graph.js` | Graphify integration — knowledge graph queries for structural analysis |

---

## Next Steps

- **[Features Overview](features.md)** - Architecture and request flow
- **[Token Optimization](token-optimization.md)** - Cost reduction strategies
- **[Provider Configuration](providers.md)** - Setting up providers
- **[Production Guide](production.md)** - Deploy with routing enabled