---
id: "openrouter-llm"
name: "OpenRouter LLM Routing"
type: "skill"
status: "active"
triggers:
  - "model"
  - "llm"
  - "openrouter"
  - "which model"
  - "ai model"
  - "token cost"
  - "cheaper model"
description: "Guides agents on model selection, cost awareness, and the LLM fallback chain."
execution:
  target: "internal"
  route: ""
dependencies:
  files:
    - "aims-skills/tools/openrouter.tool.md"
    - "backend/uef-gateway/src/llm/openrouter.ts"
    - "backend/uef-gateway/src/llm/vertex-ai.ts"
priority: "high"
---

# OpenRouter LLM Routing Skill

## When This Fires

Triggers when any agent needs to select a model, estimate costs, or understand the LLM routing chain.

## Model Selection Rules

### By Task Complexity

| Task Type | Recommended Model | Tier | Why |
|-----------|-------------------|------|-----|
| Simple routing/classification | `gemini-3.0-flash` | Fast | $0.10/M input, fastest |
| General chat/responses | `claude-sonnet-4.6` | Standard | Best quality/cost balance, adaptive thinking |
| Complex reasoning/coding | `claude-opus-4.6` | Premium | Most capable, 1M context, adaptive max effort |
| Research + web grounding | `gemini-3.1-pro` | Standard | **$2/$12**, 1M ctx, native Google Search grounding, ARC-AGI-2 77.1%. See `skills/gemini-3.1-pro.skill.md` |
| Quick extraction/parsing | `claude-haiku-4.5` | Fast | Fast, cheap, good enough |
| Budget frontier reasoning | `glm-5` | Economy | **$1/$3.20**, 744B MoE, MIT licensed, record-low hallucination. See `skills/glm-5.skill.md` |
| Budget-sensitive batch work | `llama-4-maverick` | Economy | $0.27/M, 1M context, MoE |
| Visual agentic / agent swarm | `moonshotai/kimi-k2.5` | Premium | 1T MoE, native multimodal, 256K ctx. See `skills/kimi-k2.5.skill.md` |
| Video understanding | `moonshotai/kimi-k2.5` | Premium | Only model with native video input (official API only) |
| Ultra-budget frontier reasoning | `seed-2.0-pro` | Economy | **$0.47/$2.37**, 98.3 AIME, 89.5 VideoMME — Volcano Engine only. See `skills/bytedance-seed-2.0.skill.md` |
| AI video generation | Seedance 2.0 | — | 20s @ 1080p, character consistency — Volcano Engine. See `skills/bytedance-seed-2.0.skill.md` |
| Competitive coding | `seed-2.0-code` | Economy | 3020 Codeforces, 87.8 LiveCodeBench — Volcano Engine. See `skills/bytedance-seed-2.0.skill.md` |
| Chinese-language tasks | `glm-5` | Economy | Natively bilingual, CLUE benchmark leader. See `skills/glm-5.skill.md` |
| Voice / real-time response | `claude-opus-4.6` (fast mode) | Premium | 2.5x speed, same intelligence. See `skills/claude-4.6.skill.md` |
| Security audits | `claude-opus-4.6` (max effort) | Premium | Found 500+ zero-days. See `skills/claude-4.6.skill.md` |

### Cost Awareness Rules

1. **Never use Premium tier for simple tasks** — Classification, routing, and yes/no questions use Fast tier
2. **Default to Gemini Flash** — The gateway default (`gemini-3.0-flash`) is correct for 80% of routing decisions
3. **Escalate only when needed** — Start with Fast/Standard, upgrade to Premium only for complex multi-step reasoning
4. **Track costs per job** — Every `LLMResult.cost.usd` feeds into LUC for per-job billing
5. **Monitor token usage** — `LLMResult.tokens.total` must be logged for every call

### Fallback Chain

```
1. Vertex AI (if GOOGLE_APPLICATION_CREDENTIALS set)
   ↓ on failure
2. OpenRouter (if OPENROUTER_API_KEY set)
   ↓ on failure
3. Stub response (returns error message, never silent)
```

### API Key Check

Before making any LLM call, verify:
```typescript
if (!process.env.OPENROUTER_API_KEY) {
  // Fall back to heuristic mode — no real LLM calls
  // Log warning: agents operating in degraded mode
}
```

## Anti-Patterns

- Do NOT hardcode model IDs in frontend code — always route through UEF Gateway
- Do NOT use Premium models for logging/telemetry
- Do NOT retry 402 errors (payment required) — alert user to add credits
- Do NOT stream when response is <100 tokens — overhead exceeds benefit