---
name: auto-router-patterns
description: Auto router patterns for this project. Intelligent model selection via task classification, cost tier diversity, high-stakes override, weighted tier selection. Triggers on "auto router", "model selection", "classification", "cost tier", "exploration", "high stakes", "routing", "router".
---

# Auto Router Patterns

System intelligently routes user messages to optimal models via task classification, cost-based diversity, and high-stakes safety overrides.

## Classification with gpt-oss-120b

Router uses `gpt-oss-120b` via Cerebras for fast classification (~1000 tokens/sec):

```typescript
// From autoRouter.ts
function getRouterModel() {
  const openai = createOpenAI({
    apiKey: process.env.AI_GATEWAY_API_KEY,
    baseURL:
      "https://gateway.ai.cloudflare.com/v1/planetaryescape/blah-chat-dev-gateway/openai",
  });
  return openai("gpt-oss-120b"); // Via Cerebras
}

const ROUTER_MODEL_ID = "openai:gpt-oss-120b";
```

Classification schema with generateObject:

```typescript
// From autoRouter.ts
const classificationSchema = z.object({
  primaryCategory: z.enum(TASK_CATEGORIES),
  secondaryCategory: z.enum(TASK_CATEGORIES).optional().nullable(),
  complexity: z.enum(["simple", "moderate", "complex"]),
  requiresVision: z.boolean(),
  requiresLongContext: z.boolean(),
  requiresReasoning: z.boolean(),
  confidence: z.number().min(0).max(1),
  isHighStakes: z.boolean(),
  highStakesDomain: z.enum(HIGH_STAKES_DOMAINS).optional().nullable(),
});
```

Task categories: coding, reasoning, creative, factual, analysis, conversation, multimodal, research.
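
To make the schema concrete, here is a dependency-free sketch of the same shape with one example value. It is an illustration, not code from `autoRouter.ts`: `ClassificationSketch`, `checkClassification`, and the example values are hypothetical, and the category list is copied from the sentence above.

```typescript
// Illustrative mirror of the schema fields; not the project's zod schema.
const TASK_CATEGORIES = [
  "coding", "reasoning", "creative", "factual",
  "analysis", "conversation", "multimodal", "research",
] as const;
type TaskCategory = (typeof TASK_CATEGORIES)[number];

interface ClassificationSketch {
  primaryCategory: TaskCategory;
  secondaryCategory?: TaskCategory | null;
  complexity: "simple" | "moderate" | "complex";
  requiresVision: boolean;
  requiresLongContext: boolean;
  requiresReasoning: boolean;
  confidence: number; // the schema constrains this to [0, 1]
  isHighStakes: boolean;
  highStakesDomain?: string | null;
}

// Hypothetical helper: runtime checks for values parsed from untyped JSON.
// Returns a list of problems, empty when the object satisfies the constraints.
function checkClassification(c: ClassificationSketch): string[] {
  const problems: string[] = [];
  const known: readonly string[] = TASK_CATEGORIES;
  if (!known.includes(c.primaryCategory)) {
    problems.push("unknown primaryCategory");
  }
  if (c.secondaryCategory != null && !known.includes(c.secondaryCategory)) {
    problems.push("unknown secondaryCategory");
  }
  if (!(c.confidence >= 0 && c.confidence <= 1)) {
    problems.push("confidence outside [0, 1]");
  }
  return problems;
}

// What the router model might return for "Am I having a heart attack?"
// (all values are made up for illustration).
const example: ClassificationSketch = {
  primaryCategory: "factual",
  secondaryCategory: null,
  complexity: "moderate",
  requiresVision: false,
  requiresLongContext: false,
  requiresReasoning: false,
  confidence: 0.9,
  isHighStakes: true,
  highStakesDomain: "medical",
};
```

In the real router, zod enforces these constraints through `generateObject`; the sketch only shows which fields the downstream selection logic can rely on.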

## Cost Tier Categorization

Models categorized by average pricing, `(input + output) / 2`:

```typescript
// From autoRouter.ts
type CostTier = "cheap" | "mid" | "premium";

function getCostTier(pricing: { input: number; output: number }): CostTier {
  const avgCost = (pricing.input + pricing.output) / 2;
  if (avgCost < 1.0) return "cheap";
  if (avgCost < 5.0) return "mid";
  return "premium";
}
```

Examples (input/output price, tier as computed by `getCostTier`):

- Cheap: gpt-5-nano ($0.04/$0.16), gemini-2.0-flash ($0.1/$0.4), gpt-5-mini ($0.15/$0.6)
- Mid: claude-3.5-haiku ($0.8/$4.0)
- Premium: gpt-5 ($2.5/$10.0), claude-opus-4 ($15.0/$75.0)

## Weighted Tier Selection by Complexity

Diversity via weighted random selection, NOT top-N then random:

```typescript
// From autoRouter.ts
const TIER_WEIGHTS: Record<string, Record<CostTier, number>> = {
  simple: { cheap: 0.6, mid: 0.25, premium: 0.15 },
  moderate: { cheap: 0.5, mid: 0.3, premium: 0.2 },
  complex: { cheap: 0.3, mid: 0.4, premium: 0.3 },
};
```

Critical: Groups ALL models by tier, not just top N. Simple tasks get cheap models 60% of the time, premium 15%.

Selection logic:

```typescript
// From autoRouter.ts
function selectWithExploration(
  scoredModels: Array<{ modelId: string; score: number }>,
  classification: { complexity: string; isHighStakes?: boolean },
) {
  // Highest score first (sort order is inferred; `sorted[0]` is used as the best pick)
  const sorted = [...scoredModels].sort((a, b) => b.score - a.score);

  // Group ALL models by tier
  const tiers: Record<CostTier, Array<{ modelId: string; score: number }>> = {
    cheap: [],
    mid: [],
    premium: [],
  };
  for (const model of sorted) {
    const tier = getCostTier(MODEL_CONFIG[model.modelId].pricing);
    tiers[tier].push(model);
  }

  // Get weights for complexity
  const weights = TIER_WEIGHTS[classification.complexity] ??
    TIER_WEIGHTS.simple;
  const roll = Math.random();

  // Select tier based on weighted random
  let selectedTier: CostTier | undefined;
  let explorationPick = false;
  if (roll < weights.cheap && tiers.cheap.length > 0) {
    selectedTier = "cheap";
  } else if (roll < weights.cheap + weights.mid && tiers.mid.length > 0) {
    selectedTier = "mid";
    explorationPick = true;
  } else if (tiers.premium.length > 0) {
    selectedTier = "premium";
    explorationPick = true;
  }

  // Random selection within chosen tier; fall back to all models if no tier matched
  const pool = selectedTier ? tiers[selectedTier] : sorted;
  const picked = pool[Math.floor(Math.random() * pool.length)];
  return { ...picked, explorationPick };
}
```

## High-Stakes Override

Medical/legal/financial/safety questions force premium tier for accuracy:

```typescript
// From autoRouter.ts
const HIGH_STAKES_DOMAINS = [
  "medical",
  "legal",
  "financial",
  "safety",
  "mental_health",
  "privacy",
  "immigration",
  "domestic_abuse",
] as const;

// HIGH-STAKES OVERRIDE at top of selectWithExploration
if (classification.isHighStakes) {
  if (tiers.premium.length > 0) {
    const pool = tiers.premium;
    const picked = pool[Math.floor(Math.random() * pool.length)];
    return { ...picked, explorationPick: false };
  }
  // Fallback with warning if no premium models
  logger.warn("High-stakes query but no premium models available");
  return { ...sorted[0], explorationPick: false };
}
```

Classification prompt emphasizes advice vs information:

```typescript
// From routerPrompts.ts
RULES:
1. Must seek ADVICE or ACTION, not just information
2. "What is a heart attack?" = NOT high stakes (educational)
3. "Am I having a heart attack?" = HIGH STAKES (medical)
4. "What does liability mean?" = NOT high stakes (definition)
5. "Can my employer fire me for this?" = HIGH STAKES (legal)
```

## Diversity vs Top-Score Trade-off

System balances model quality with cost/speed diversity:

**Scoring phase** (autoRouter.ts):

- Base: category match score (0-100 from MODEL_PROFILES)
- Secondary category bonus: +30% of secondary score
- Complexity: simple tasks penalized 0.7x (prefer cheap), complex boosted 1.2x (prefer capable)
- Cost bias: `-(avgCost / 30) * (costBias / 100) * 20`
- Speed bias: `+speedBonus * (speedBias / 100)`
- Stickiness: +25 if model already selected in conversation
- Reasoning bonus: +15 if task requires thinking and model has it
- Research bonus: +25 for Perplexity models on research tasks

**Selection phase** (selectWithExploration):

- NOT greedy (always top score)
- NOT pure random (chaos)
- Weighted probabilistic by cost tier AND complexity
- Ensures variety across conversations without sacrificing appropriateness

## Excluded Models Tracking for Retries

Failed models excluded from retry attempts:

```typescript
// From autoRouter.ts routeMessage args
export const routeMessage = internalAction({
  args: {
    // ...
    excludedModels: v.optional(v.array(v.string())), // Failed models
  },
  handler: async (ctx, args) => {
    // Filter eligible models
    const eligibleModels = getEligibleModels(
      classification,
      args.currentContextTokens ?? 0,
      args.excludedModels, // ← Passed to filter
    );

    // Check if all models exhausted
    if (eligibleModels.length === 0) {
      const fallbackModel = "openai:gpt-5-mini";
      if (args.excludedModels?.includes(fallbackModel)) {
        throw new Error("All models exhausted including fallback");
      }
      return { selectedModelId: fallbackModel, /* ... */ };
    }
  },
});

function getEligibleModels(
  classification: TaskClassification,
  currentContextTokens: number,
  excludedModels?: string[],
): string[] {
  return Object.keys(MODEL_CONFIG).filter((modelId) => {
    // Exclude failed models from retry attempts
    if (excludedModels?.includes(modelId)) return false;
    // ... other filters
  });
}
```

Caller (chat.ts or generation retry logic) tracks failed models and passes them to router.

## Key Files

- `packages/backend/convex/ai/autoRouter.ts` - Main routing action, classification, selection
- `packages/backend/convex/ai/modelProfiles.ts` - MODEL_CONFIG, MODEL_PROFILES, category scores
- `packages/backend/convex/ai/routerPrompts.ts` - Classification prompt, reasoning template
- `packages/backend/convex/chat.ts` - Calls routeMessage when user has "auto" selected
- `packages/backend/convex/generation.ts` - Retry logic with excludedModels

## Avoid

- Don't use top-N greedy selection - breaks diversity
- Don't skip high-stakes override - safety critical
- Don't hardcode tier weights - use complexity-based config
- Don't forget to track router usage for cost monitoring (recordTextGeneration)
- Don't use classification model for generation - gpt-oss-120b is internal-only
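
## Caller-Side Retry Sketch

The caller-side half of "Excluded Models Tracking for Retries" is described above only in prose, so here is a minimal sketch of how a caller might track failed models and hand them back to the router. It is not the code in `chat.ts` or `generation.ts`: `generateWithRetries`, `RouteFn`, `GenerateFn`, and `maxAttempts` are hypothetical names, and the router and generation calls are injected as plain functions so the loop stays self-contained.

```typescript
// Sketch only: `route` stands in for the routeMessage action and `generate`
// for the model call. The real signatures may differ.
type RouteFn = (excludedModels: string[]) => Promise<{ selectedModelId: string }>;
type GenerateFn = (modelId: string) => Promise<string>;

async function generateWithRetries(
  route: RouteFn,
  generate: GenerateFn,
  maxAttempts = 3, // assumed limit, not taken from the project
): Promise<{ modelId: string; text: string; excludedModels: string[] }> {
  const excludedModels: string[] = [];
  let lastError: unknown;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // The router sees every model that has already failed in this request.
    const { selectedModelId } = await route([...excludedModels]);
    try {
      const text = await generate(selectedModelId);
      return { modelId: selectedModelId, text, excludedModels };
    } catch (err) {
      // Remember the failure so the next routing call skips this model.
      lastError = err;
      excludedModels.push(selectedModelId);
    }
  }

  throw new Error(
    `Generation failed after ${maxAttempts} attempts: ${String(lastError)}`,
  );
}
```

If the router itself throws (for example "All models exhausted including fallback"), the error propagates to the caller unchanged, which keeps the exhaustion case distinct from an ordinary generation failure.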