--- name: agent-model-selection description: Guidelines for selecting appropriate language models for agents based on task-specific benchmarks, availability, and cost efficiency. --- # Agent Model Selection Skill ## Purpose Provides data-driven guidance for selecting the most appropriate language model when creating or modifying agent definitions. ## When to Use - When creating a new agent and need to assign a model - When modifying an existing agent's model assignment - When troubleshooting agent performance issues related to model capabilities - When optimizing costs across the agent ecosystem ## Reference Data **Always consult [docs/ai-model-reference.md](../../docs/ai-model-reference.md)** for: - Current performance benchmarks by category (Coding, Reasoning, Language, Instruction Following, etc.) - Model availability in GitHub Copilot Pro - Premium request multipliers (cost) - Recommended model assignments by agent type - **Task-based guidance and tutorials** (via external links with descriptions) This reference is updated periodically with latest benchmark data. ## Critical Learnings 1. **Use task-specific benchmarks, not overall scores** - Different models excel at different tasks - Example: GPT-5.2-Codex excels in Coding while Claude Sonnet 4.5 is better for Language (76.00) 2. **Claude Sonnet 4.5 has poor Instruction Following** (score: 23.52) - Unsuitable for agents that follow templates (Task Planner, Quality Engineer) - Use Gemini models instead for structured output (scores: 65-75) 3. **Gemini 3 Flash offers best value for many tasks** - 0.33x premium multiplier (cost-effective) - Strong Instruction Following (74.86) - Good Language performance (84.56) - Ideal for: Task Planner, Release Manager, high-frequency agents 4. **GPT-5.2-Codex is the latest coding model** - Latest generation Codex model (improved over 5.1 Codex Max) - Specialized for agentic coding tasks - Primary choice for Developer agent - Also solid for Code Reviewer 5. **Always verify model availability** - Check against official GitHub Copilot documentation - Model names must match exactly (case-sensitive) - Include "(Preview)" suffix for preview models (e.g., "Gemini 3 Pro (Preview)") 6. **⚠️ Coding agents must NOT have `model:` in frontmatter** - The `model:` property is only valid for VS Code agents (files without `-coding-agent` suffix) - On GitHub.com, the `model:` property in `*-coding-agent.agent.md` files causes a hard `CAPIError: 400 The requested model is not supported` error (confirmed by experiment) - The GitHub docs say this property is "ignored" but in practice it prevents the agent from running - Apply model selection only to the corresponding VS Code agent file (e.g., `developer.agent.md`) ## Model Selection Process When selecting or changing a model: 1. **Identify the agent's primary task categories** (from ai-model-reference.md) - Coding, Reasoning, Language, Instruction Following, etc. 2. **Check category-specific performance** - Look up relevant benchmarks in ai-model-reference.md - Compare top 3-5 performers in that category 3. **Consider cost vs frequency** - High-frequency agents → favor lower multipliers (0.33x, 0x) - Critical accuracy agents → favor best performer regardless of cost 4. **Verify availability** - Confirm model is listed in "Available Models" section - Check it's available for VS Code (required) 5. **Document your reasoning** - Include benchmark scores in proposal - Explain trade-offs made ## Example Model Selection **Scenario**: Selecting model for Quality Engineer agent 1. **Primary tasks**: Define test plans following specific template format 2. **Key categories**: Instruction Following (critical), Reasoning (important) 3. **Benchmark lookup** (from ai-model-reference.md): - Gemini 3 Flash: Instruction Following 74.86, 0.33x cost ✅ - Gemini 3 Pro: Instruction Following 65.85, 1x cost ✅ - Claude Sonnet 4.5: Instruction Following 23.52 ❌ (disqualified) 4. **Decision**: Gemini 3 Pro (balance of performance and cost) 5. **Rationale**: Strong instruction following (65.85), reasonable cost (1x), good for template-based work ## When to Update Model Assignments Reassess models when: - New benchmark data shows significant performance changes - Agent is underperforming its tasks consistently - New models are released with better performance - Cost optimization is needed - ai-model-reference.md is updated with new data ## Key Principles - **Task-specific benchmarks matter more than overall scores** - **Balance cost with performance** based on agent frequency and criticality - **Always verify availability** against official documentation - **Document your rationale** for model selection decisions