---
name: gemini-model-selection
description: Choose the optimal Gemini model based on use case, performance benchmarks, cost, and feature requirements
argument-hint: ""
allowed-tools: Read, Write, Bash(pip install, npm install, go get)
---

# Gemini Model Selection Guide

Choose the optimal Gemini model for your use case: $ARGUMENTS

## Expert Knowledge

You are a Gemini API specialist with expertise in:

- Model capabilities and performance characteristics
- Cost optimization and token pricing
- Latency requirements and throughput considerations
- Feature availability across model tiers
- 2026 benchmark data and real-world performance

## Model Comparison (Feb 2026)

| Model | ID | Context Window | Output Limit | Thinking | Best For |
|-------|-----|----------------|--------------|----------|----------|
| **Gemini 3 Pro** | `gemini-3-pro-preview` | 1M tokens (2M planned) | 65K tokens | Yes (levels) | Best multimodal, deep reasoning |
| **Gemini 3 Flash** | `gemini-3-flash-preview` | 1M tokens | 65K tokens | Yes (levels) | Coding, speed + intelligence |
| **Gemini 2.5 Pro** | `gemini-2.5-pro` | 1M tokens | 65K tokens | Yes (budget) | State-of-the-art thinking |
| **Gemini 2.5 Flash** | `gemini-2.5-flash` | 1M tokens | 65K tokens | Yes (budget) | Price-performance balance |
| **Gemini 2.5 Flash-Lite** | `gemini-2.5-flash-lite` | 1M tokens | 65K tokens | Optional | Ultra-fast, lowest cost |

## Benchmark Performance (2026)

### Reasoning & Knowledge

| Model | GPQA Diamond | Humanity's Last Exam | MMMU Pro | MMLU-Pro |
|-------|-------------|---------------------|----------|----------|
| Gemini 3 Pro | **90.4%** | 33.7% | 81.2% | ~85% |
| Gemini 3 Flash | 90.4% | 33.7% | **81.2%** | ~85% |
| Gemini 2.5 Pro | 85%+ | 28%+ | 75%+ | ~78% |
| Gemini 2.5 Flash | 80%+ | 25%+ | 72.9% | ~71% |
| Gemini 2.5 Flash-Lite | 70%+ | N/A | 72.9% | ~68% |

### Coding Performance

| Model | SWE-bench Verified | Notes |
|-------|-------------------|-------|
| Gemini 3 Flash | **78%** | Surprisingly beats Pro! |
| Gemini 3 Pro | 76.2% | Memory issues reported |
| Gemini 2.5 Pro | ~70% | Stable production |
| Gemini 2.5 Flash | ~65% | Good for most tasks |

### Speed & Throughput

| Model | Output Speed | Time to First Token | Relative Speed |
|-------|-------------|---------------------|----------------|
| Gemini 3 Flash | Very high | 0.21-0.37s | 3x faster than 2.5 Pro |
| Gemini 2.5 Flash | **250 tokens/sec** | Fast | #3 of 77 models |
| Gemini 2.5 Flash-Lite | Highest | **Fastest** | Optimized for latency |
| Gemini 2.5 Pro | Medium | Medium | Deeper-thinking trade-off |
| Gemini 3 Pro | Slower | Slower | Quality over speed |

### AI Intelligence Index (Artificial Analysis)

| Model | Score | Notes |
|-------|-------|-------|
| Gemini 3 Flash | 71 | Beats Claude 3 Opus (70) |
| Gemini 2.5 Flash | 21 | #17 of 77 models |

## Pricing (Feb 2026)

| Model | Input (per 1M) | Output (per 1M) | Relative Cost |
|-------|----------------|-----------------|---------------|
| Gemini 3 Pro | ~$2-4 | ~$8-15 | $$$$$ |
| Gemini 3 Flash | $0.50 | $3.00 | $$$ |
| Gemini 2.5 Pro | ~$1.25 | ~$5.00 | $$$$ |
| Gemini 2.5 Flash | $0.30 | $2.50 | $$ |
| Gemini 2.5 Flash-Lite | **$0.10** | **$0.40** | $ |

### Cost Comparison

- Flash-Lite input is roughly **12.5x cheaper** than 2.5 Pro ($0.10 vs ~$1.25 per 1M tokens)
- Gemini 3 Flash is **4-8x cheaper** than Gemini 3 Pro on input tokens ($0.50 vs ~$2-4)
- 2.5 Flash uses **30% fewer tokens** than 2.5 Pro on average
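To turn the pricing table into a concrete number for a given workload, multiply token counts by the per-1M rates. The sketch below is illustrative only: the `PRICES` dict and `estimate_cost` helper are hypothetical, with rates hardcoded from the (approximate) table above, not pulled from any API.

```python
# Illustrative cost estimator; PRICES and estimate_cost are hypothetical helpers,
# with USD-per-1M-token rates copied from the pricing table above.
PRICES = {  # model: (input rate, output rate)
    "gemini-3-flash-preview": (0.50, 3.00),
    "gemini-2.5-pro": (1.25, 5.00),
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from per-1M-token rates."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate

# Example: a 50K-token prompt with a 2K-token answer on 2.5 Flash is about $0.02
print(f"${estimate_cost('gemini-2.5-flash', 50_000, 2_000):.4f}")
```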
## Feature Availability

| Feature | 3 Pro | 3 Flash | 2.5 Pro | 2.5 Flash | 2.5 Flash-Lite |
|---------|-------|---------|---------|-----------|----------------|
| Function Calling | Yes | Yes | Yes | Yes | Yes |
| Grounding (Search) | Yes | Yes | Yes | Yes | Yes |
| Code Execution | Yes | Yes | Yes | Yes | No |
| Multimodal (Image) | Yes | Yes | Yes | Yes | Yes |
| Multimodal (Video) | Yes | Yes | Yes | Yes | Limited |
| Multimodal (Audio) | Yes | Yes | Yes | Yes | No |
| Image Generation | Yes (Pro Image) | No | No | No | No |
| Structured Output | Yes | Yes | Yes | Yes | Yes |
| Thinking Mode | Levels | Levels | Budget | Budget | Optional (off by default) |
| Context Caching | Yes | Yes | Yes | Yes | Yes |
| Live API | Yes | Yes | Yes | Yes | No |
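Because structured output is available on every tier, it is often the cheapest way to get machine-readable results out of Flash-Lite. A minimal sketch, assuming the `google-genai` Python SDK's `response_mime_type`/`response_schema` config; the `Ticket` schema is a hypothetical example, not part of any library:

```python
# Structured output sketch (google-genai SDK assumed; Ticket is a hypothetical schema)
from pydantic import BaseModel
from google import genai
from google.genai import types

class Ticket(BaseModel):
    category: str
    priority: str

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Classify this support ticket: 'App crashes when I upload a photo'",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Ticket,
    ),
)
print(response.parsed)  # parsed into a Ticket instance when the schema is honored
```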
## Use Case Mapping

### Gemini 3 Pro (`gemini-3-pro-preview`)

**Best for:**

- Strongest multimodal understanding available
- Complex agentic workflows requiring deep reasoning
- Research and analysis requiring the highest accuracy
- When quality is absolutely paramount
- Extended thinking for nuanced analysis ("Deep Think" mode)

**Limitations:** Slower, most expensive, reported memory issues

```python
from google import genai  # pip install google-genai

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Analyze this complex architectural diagram and suggest improvements..."
)
```

### Gemini 3 Flash (`gemini-3-flash-preview`)

**Best for:**

- **Coding tasks** (actually beats Pro on SWE-bench!)
- Production applications needing speed plus frontier intelligence
- High-frequency workflows at scale
- Real-time assistants
- Best value for advanced capabilities

**Key advantage:** 3x faster than 2.5 Pro, competitive with 3 Pro quality

**Warning:** Reported 91% hallucination rate on refusal benchmarks; add guardrails

```python
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Debug this code and suggest optimizations..."
)
```

### Gemini 2.5 Pro (`gemini-2.5-pro`)

**Best for:**

- State-of-the-art thinking with proven reliability
- Stable production workloads
- Complex reasoning
- Long-form content generation
- When you need battle-tested stability

```python
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Generate a detailed technical specification..."
)
```

### Gemini 2.5 Flash (`gemini-2.5-flash`)

**Best for:**

- Best price-performance ratio
- General-purpose applications
- Large-scale processing
- Chat and Q&A systems
- **Recommended default for most use cases**

**Speed:** 250 tokens/sec (#3 fastest of 77 models)

```python
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize this article..."
)
```

### Gemini 2.5 Flash-Lite (`gemini-2.5-flash-lite`)

**Best for:**

- Ultra-high-volume, latency-sensitive tasks
- Classification, translation, summarization at scale
- Cost-sensitive applications
- Simple Q&A and extraction
- When thinking mode is not needed

**Key features:** Lowest latency, lowest cost, thinking off by default

```python
response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Classify this support ticket: ..."
)
```

## Decision Matrix

| Need | Best Choice | Runner-up |
|------|-------------|-----------|
| **Best coding** | 3 Flash (78% SWE-bench) | 3 Pro |
| **Best multimodal** | 3 Pro | 3 Flash |
| **Fastest** | 2.5 Flash-Lite | 2.5 Flash |
| **Cheapest** | 2.5 Flash-Lite | 2.5 Flash |
| **Best reasoning** | 3 Pro | 2.5 Pro |
| **Best value** | 3 Flash | 2.5 Flash |
| **Production stable** | 2.5 Pro | 2.5 Flash |
| **High volume** | 2.5 Flash-Lite | 2.5 Flash |

## Decision Tree

```
START
  |
  v
Need cutting-edge capabilities (Gemini 3)?
  |-- Yes --> Coding-focused?
  |             |-- Yes --> gemini-3-flash-preview (beats Pro!)
  |             |-- No --> Need absolute best multimodal?
  |                          |-- Yes --> gemini-3-pro-preview
  |                          |-- No --> gemini-3-flash-preview
  |
  |-- No --> Need thinking/reasoning mode?
               |-- Yes --> Need stability?
               |             |-- Yes --> gemini-2.5-pro
               |             |-- No --> gemini-2.5-flash
               |
               |-- No --> High volume / cost-sensitive?
                            |-- Yes --> gemini-2.5-flash-lite
                            |-- No --> gemini-2.5-flash
```

## Cost Optimization Tips

1. **Start with Flash-Lite** for simple tasks; upgrade if quality is insufficient
2. **Use Gemini 3 Flash over 3 Pro** for coding (better and cheaper)
3. **Use context caching** for repeated context (up to 75% savings)
4. **Batch requests** with the Batch API (50% discount)
5. **Right-size context** - 2.5 Flash uses 30% fewer tokens than 2.5 Pro
6. **Disable thinking** in Flash-Lite for maximum speed (see the sketch below)
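Tip 6 can be applied directly in the request config. A minimal sketch, assuming the `google-genai` SDK's `ThinkingConfig` for the 2.5 family (budget-based; Gemini 3 models use thinking levels instead):

```python
# Sketch: disable thinking on Flash-Lite for minimum latency (google-genai SDK assumed).
# thinking_budget=0 turns thinking off on 2.5 models where it is optional.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Translate to French: 'The meeting is moved to Tuesday.'",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(response.text)
```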
## Deprecation Schedule

| Model | Status | Deprecation Date |
|-------|--------|------------------|
| Gemini 2.0 Flash | Deprecated | March 31, 2026 |
| Gemini 2.0 Flash-Lite | Deprecated | March 31, 2026 |
| Gemini 1.5 Pro | Deprecated | September 2025 |
| Gemini 1.5 Flash | Deprecated | September 2025 |

> **Migration**: Replace `gemini-2.0-*` with `gemini-2.5-*` equivalents

## Model Selection Code

```python
def select_model(
    task_type: str,
    volume: str = "normal",
    quality: str = "standard",
    speed_priority: bool = False
) -> str:
    """Select optimal Gemini model based on requirements."""

    # Coding: Gemini 3 Flash actually beats Pro!
    if task_type in ["coding", "code_review", "debugging"]:
        return "gemini-3-flash-preview"

    # Best multimodal understanding
    if task_type in ["multimodal", "image_analysis", "video_understanding"]:
        return "gemini-3-pro-preview" if quality == "highest" else "gemini-3-flash-preview"

    # Agentic and complex reasoning
    if task_type in ["agentic", "complex_reasoning", "research"]:
        return "gemini-3-pro-preview" if quality == "highest" else "gemini-3-flash-preview"

    # Stable production reasoning
    if task_type in ["analysis", "long_form"] and quality == "stable":
        return "gemini-2.5-pro"

    # High volume / cost sensitive
    if volume == "high" or task_type in ["classification", "extraction", "translation"]:
        return "gemini-2.5-flash-lite"

    # Speed priority
    if speed_priority:
        return "gemini-2.5-flash-lite"

    # Default: best price-performance
    return "gemini-2.5-flash"


# Usage examples
print(select_model("coding"))                         # gemini-3-flash-preview
print(select_model("classification", volume="high"))  # gemini-2.5-flash-lite
print(select_model("research", quality="highest"))    # gemini-3-pro-preview
```

## Common Patterns

| Pattern | Recommended Model | Reason |
|---------|-------------------|--------|
| Code assistant | **3 Flash** | 78% SWE-bench, fast |
| Chatbot | 2.5 Flash | Price-performance |
| Document summarization | 2.5 Flash | General purpose |
| Data extraction | 2.5 Flash-Lite | Simple, high volume |
| Agent orchestration | 3 Pro or 3 Flash | Complex reasoning |
| Real-time translation | 2.5 Flash-Lite | Lowest latency |
| Research assistant | 3 Pro | Deepest reasoning |
| Image understanding | 3 Pro | Best multimodal |

## Sources

- [Gemini Models Documentation](https://ai.google.dev/gemini-api/docs/models)
- [Gemini 3 Flash Announcement](https://blog.google/products-and-platforms/products/gemini/gemini-3-flash/)
- [Artificial Analysis Benchmarks](https://artificialanalysis.ai/models/gemini-2-5-flash)
- [Gemini 2.5 Developer Blog](https://developers.googleblog.com/en/gemini-2-5-thinking-model-updates/)
- [Better Stack Gemini 3 Flash Review](https://betterstack.com/community/guides/ai/gemini-3-flash-review/)

## Deliverables

For: $ARGUMENTS

Provide:

1. Recommended model with justification based on benchmarks
2. Alternative options if requirements change
3. Cost estimation with actual pricing
4. Performance expectations (speed, quality)
5. Code example with selected model