---
name: gemini-3-pro-api
description: Gemini 3 Pro API/SDK integration for text generation, reasoning, and chat. Covers setup, authentication, thinking levels, streaming, and production deployment. Use when working with Gemini 3 Pro API, Python SDK, Node.js SDK, text generation, chat applications, or advanced reasoning tasks.
---

# Gemini 3 Pro API Integration

Comprehensive guide for integrating Google's Gemini 3 Pro API/SDK into your applications. Covers setup, authentication, text generation, advanced reasoning with dynamic thinking, chat applications, streaming responses, and production deployment patterns.

## Overview

**Gemini 3 Pro** (`gemini-3-pro-preview`) is Google's most intelligent model designed for complex tasks requiring advanced reasoning and broad world knowledge. This skill provides complete workflows for API integration using Python or Node.js SDKs.

### Key Capabilities

- **Massive Context:** 1M token input, 64k token output
- **Dynamic Thinking:** Adaptive reasoning with high/low modes
- **Streaming:** Real-time token delivery
- **Chat:** Multi-turn conversations with history
- **Production-Ready:** Error handling, retry logic, cost optimization

### When to Use This Skill

- Setting up Gemini 3 Pro API access
- Building text generation applications
- Implementing chat applications with reasoning
- Configuring advanced thinking modes
- Deploying production Gemini applications
- Optimizing API usage and costs

---

## Quick Start

### Prerequisites

- **API Key:** Get from [Google AI Studio](https://aistudio.google.com/)
- **Python 3.9+** or **Node.js 18+**

### Python Quick Start

```python
# Install SDK
pip install google-genai

# Basic usage
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-3-pro-preview")

response = model.generate_content("Explain quantum computing")
print(response.text)
```

### Node.js Quick Start

```typescript
// Install SDK
npm install @google/generative-ai

// Basic usage
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI("YOUR_API_KEY");
const model = genAI.getGenerativeModel({ model: "gemini-3-pro-preview" });

const result = await model.generateContent("Explain quantum computing");
console.log(result.response.text());
```

---

## Core Workflows

### Workflow 1: Quick Start Setup

**Goal:** Get from zero to first successful API call in < 5 minutes.

**Steps:**

1. **Get API Key**
   - Visit [Google AI Studio](https://aistudio.google.com/)
   - Create or select project
   - Generate API key
   - Copy key securely

2. **Install SDK**
   ```bash
   # Python
   pip install google-genai

   # Node.js
   npm install @google/generative-ai
   ```

3. **Configure Authentication**
   ```python
   # Python - using environment variable (recommended)
   import os
   import google.generativeai as genai

   genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
   ```

   ```typescript
   // Node.js - using environment variable (recommended)
   const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
   ```

4. **Make First API Call**
   ```python
   # Python
   model = genai.GenerativeModel("gemini-3-pro-preview")
   response = model.generate_content("Write a haiku about coding")
   print(response.text)
   ```

5. **Verify Success**
   - Check response received
   - Verify text output
   - Note token usage
   - Confirm API key working

**Expected Outcome:** Working API integration in under 5 minutes.

---

### Workflow 2: Chat Application Development

**Goal:** Build a production-ready chat application with conversation history and streaming.

**Steps:**

1. **Initialize Chat Model**
   ```python
   # Python
   model = genai.GenerativeModel(
       "gemini-3-pro-preview",
       generation_config={
           "thinking_level": "high",  # Dynamic reasoning
           "temperature": 1.0,  # Keep at 1.0 for best results
           "max_output_tokens": 8192
       }
   )
   ```

2. **Start Chat Session**
   ```python
   chat = model.start_chat(history=[])
   ```

3. **Send Message with Streaming**
   ```python
   response = chat.send_message(
       "Explain how neural networks learn",
       stream=True
   )

   # Stream tokens in real-time
   for chunk in response:
       print(chunk.text, end="", flush=True)
   ```

4. **Manage Conversation History**
   ```python
   # History is automatically maintained
   # Access it anytime
   print(f"Conversation turns: {len(chat.history)}")

   # Continue conversation
   response = chat.send_message("Can you give an example?")
   ```

5. **Handle Thought Signatures**
   - SDKs handle automatically in standard chat flows
   - No manual intervention needed for basic use
   - See `references/thought-signatures.md` for advanced cases

6. **Implement Error Handling**
   ```python
   import time
   from google.api_core import retry, exceptions

   @retry.Retry(predicate=retry.if_exception_type(
       exceptions.ResourceExhausted,
       exceptions.ServiceUnavailable
   ))
   def send_with_retry(chat, message):
       return chat.send_message(message)

   try:
       response = send_with_retry(chat, user_input)
   except exceptions.GoogleAPIError as e:
       print(f"API error: {e}")
   ```

**Expected Outcome:** Production-ready chat application with streaming, history, and error handling.

---

### Workflow 3: Production Deployment

**Goal:** Deploy Gemini 3 Pro integration with monitoring, cost control, and reliability.

**Steps:**

1. **Setup Authentication (Production)**
   ```python
   # Use environment variables (never hardcode keys)
   import os
   from pathlib import Path

   # Option 1: Environment variable
   api_key = os.getenv("GEMINI_API_KEY")

   # Option 2: Secrets manager (recommended for production)
   # Use Google Secret Manager, AWS Secrets Manager, etc.
   ```

2. **Configure Production Settings**
   ```python
   model = genai.GenerativeModel(
       "gemini-3-pro-preview",
       generation_config={
           "thinking_level": "high",  # or "low" for simple tasks
           "temperature": 1.0,  # CRITICAL: Keep at 1.0
           "max_output_tokens": 4096,
           "top_p": 0.95,
           "top_k": 40
       },
       safety_settings={
           # Configure content filtering as needed
       }
   )
   ```

3. **Implement Comprehensive Error Handling**
   ```python
   from google.api_core import exceptions, retry
   import logging

   logging.basicConfig(level=logging.INFO)
   logger = logging.getLogger(__name__)

   def generate_with_fallback(prompt, max_retries=3):
       @retry.Retry(
           predicate=retry.if_exception_type(
               exceptions.ResourceExhausted,
               exceptions.ServiceUnavailable,
               exceptions.DeadlineExceeded
           ),
           initial=1.0,
           maximum=10.0,
           multiplier=2.0,
           deadline=60.0
       )
       def _generate():
           return model.generate_content(prompt)

       try:
           return _generate()
       except exceptions.InvalidArgument as e:
           logger.error(f"Invalid argument: {e}")
           raise
       except exceptions.PermissionDenied as e:
           logger.error(f"Permission denied: {e}")
           raise
       except Exception as e:
           logger.error(f"Unexpected error: {e}")
           # Fallback to simpler model or cached response
           return None
   ```

4. **Monitor Usage and Costs**
   ```python
   def log_usage(response):
       usage = response.usage_metadata
       logger.info(f"Tokens - Input: {usage.prompt_token_count}, "
                   f"Output: {usage.candidates_token_count}, "
                   f"Total: {usage.total_token_count}")

       # Estimate cost (for prompts ≤200k tokens)
       input_cost = (usage.prompt_token_count / 1_000_000) * 2.00
       output_cost = (usage.candidates_token_count / 1_000_000) * 12.00
       total_cost = input_cost + output_cost

       logger.info(f"Estimated cost: ${total_cost:.6f}")

   response = model.generate_content(prompt)
   log_usage(response)
   ```

5. **Implement Rate Limiting**
   ```python
   import time
   from collections import deque

   class RateLimiter:
       def __init__(self, max_requests_per_minute=60):
           self.max_rpm = max_requests_per_minute
           self.requests = deque()

       def wait_if_needed(self):
           now = time.time()
           # Remove requests older than 1 minute
           while self.requests and self.requests[0] < now - 60:
               self.requests.popleft()

           # Check if at limit
           if len(self.requests) >= self.max_rpm:
               sleep_time = 60 - (now - self.requests[0])
               if sleep_time > 0:
                   time.sleep(sleep_time)

           self.requests.append(now)

   limiter = RateLimiter(max_requests_per_minute=60)

   def generate_with_rate_limit(prompt):
       limiter.wait_if_needed()
       return model.generate_content(prompt)
   ```

6. **Setup Logging and Monitoring**
   ```python
   import logging
   from datetime import datetime

   # Configure logging
   logging.basicConfig(
       level=logging.INFO,
       format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
       handlers=[
           logging.FileHandler('gemini_api.log'),
           logging.StreamHandler()
       ]
   )

   logger = logging.getLogger(__name__)

   def monitored_generate(prompt):
       start_time = datetime.now()
       try:
           response = model.generate_content(prompt)
           duration = (datetime.now() - start_time).total_seconds()

           logger.info(f"Success - Duration: {duration}s, "
                       f"Tokens: {response.usage_metadata.total_token_count}")
           return response
       except Exception as e:
           duration = (datetime.now() - start_time).total_seconds()
           logger.error(f"Failed - Duration: {duration}s, Error: {e}")
           raise
   ```

**Expected Outcome:** Production-ready deployment with monitoring, cost control, error handling, and rate limiting.

---

## Thinking Levels

### Dynamic Thinking System

Gemini 3 Pro introduces `thinking_level` to control reasoning depth:

**`thinking_level: "high"` (default)**
- Maximum reasoning depth
- Best quality for complex tasks
- Slower first-token response
- Higher cost
- **Use for:** Complex reasoning, coding, analysis, research

**`thinking_level: "low"`**
- Minimal reasoning overhead
- Faster response
- Lower cost
- Simpler output
- **Use for:** Simple questions, factual answers, quick queries

### Configuration

```python
# Python
model = genai.GenerativeModel(
    "gemini-3-pro-preview",
    generation_config={
        "thinking_level": "high"  # or "low"
    }
)
```

```typescript
// Node.js
const model = genAI.getGenerativeModel({
  model: "gemini-3-pro-preview",
  generationConfig: {
    thinking_level: "high"  // or "low"
  }
});
```

### Critical Notes

⚠️ **Temperature MUST stay at 1.0** - Changing temperature can cause looping or degraded performance on complex reasoning tasks.

⚠️ **Cannot combine** `thinking_level` with legacy `thinking_budget` parameter.

See `references/thinking-levels.md` for detailed guide.

---

## Streaming Responses

### Python Streaming

```python
response = model.generate_content(
    "Write a long article about AI",
    stream=True
)

for chunk in response:
    print(chunk.text, end="", flush=True)
```

### Node.js Streaming

```typescript
const result = await model.generateContentStream("Write a long article about AI");

for await (const chunk of result.stream) {
    process.stdout.write(chunk.text());
}
```

### Benefits

- Lower perceived latency
- Real-time user feedback
- Better UX for long responses
- Can process tokens as they arrive

See `references/streaming.md` for advanced patterns.

---

## Cost Optimization

### Pricing (Gemini 3 Pro)

| Context Size | Input | Output |
|-------------|-------|--------|
| ≤ 200k tokens | $2/1M | $12/1M |
| > 200k tokens | $4/1M | $18/1M |

### Optimization Strategies

1. **Keep prompts under 200k tokens** (50% cheaper)
2. **Use `thinking_level: "low"` for simple tasks** (faster, lower cost)
3. **Implement context caching** for reusable contexts (see `gemini-3-advanced` skill)
4. **Monitor token usage** and set budgets
5. **Use Gemini 1.5 Flash** for simple tasks (20x cheaper)

See `references/best-practices.md` for comprehensive cost optimization.

---

## Model Selection

### Gemini 3 Pro vs Other Models

| Model | Context | Output | Input Price | Best For |
|-------|---------|--------|-------------|----------|
| **gemini-3-pro-preview** | 1M | 64k | $2-4/1M | Complex reasoning, coding |
| gemini-1.5-pro | 1M | 8k | $7-14/1M | General use, multimodal |
| gemini-1.5-flash | 1M | 8k | $0.35-0.70/1M | Simple tasks, cost-sensitive |

### When to Use Gemini 3 Pro

✅ Complex reasoning tasks
✅ Advanced coding problems
✅ Long-context analysis (up to 1M tokens)
✅ Large output requirements (up to 64k tokens)
✅ Tasks requiring dynamic thinking

### When to Use Alternatives

- **Gemini 1.5 Flash:** Simple tasks, cost-sensitive applications
- **Gemini 1.5 Pro:** Multimodal tasks, general use
- **Gemini 2.5 models:** Experimental features, specific capabilities

---

## Error Handling

### Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `ResourceExhausted` | Rate limit exceeded | Implement retry with backoff |
| `InvalidArgument` | Invalid parameters | Validate input, check docs |
| `PermissionDenied` | Invalid API key | Check authentication |
| `DeadlineExceeded` | Request timeout | Reduce context, retry |

### Production Error Handling

```python
from google.api_core import exceptions, retry

@retry.Retry(
    predicate=retry.if_exception_type(
        exceptions.ResourceExhausted,
        exceptions.ServiceUnavailable
    ),
    initial=1.0,
    maximum=60.0,
    multiplier=2.0
)
def safe_generate(prompt):
    try:
        return model.generate_content(prompt)
    except exceptions.InvalidArgument as e:
        logger.error(f"Invalid argument: {e}")
        raise
    except exceptions.PermissionDenied as e:
        logger.error(f"Permission denied - check API key: {e}")
        raise
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
        raise
```

See `references/error-handling.md` for comprehensive patterns.

---

## References

**Setup & Configuration**
- [Setup Guide](references/setup-guide.md) - Installation, authentication, configuration
- [Best Practices](references/best-practices.md) - Optimization, cost control, tips

**Features**
- [Text Generation](references/text-generation.md) - Detailed text generation patterns
- [Chat Patterns](references/chat-patterns.md) - Chat conversation management
- [Thinking Levels](references/thinking-levels.md) - Dynamic thinking system guide
- [Streaming](references/streaming.md) - Streaming response patterns

**Production**
- [Error Handling](references/error-handling.md) - Error handling and retry strategies

**Official Resources**
- [Gemini 3 Documentation](https://ai.google.dev/gemini-api/docs/gemini-3)
- [Python SDK Docs](https://googleapis.github.io/python-genai/)
- [Node.js SDK Docs](https://github.com/google/generative-ai-js)
- [API Reference](https://ai.google.dev/gemini-api/docs)
- [Pricing](https://ai.google.dev/pricing)

---

## Next Steps

### After Basic Setup

1. **Explore chat applications** - Build conversational interfaces
2. **Add multimodal capabilities** - Use `gemini-3-multimodal` skill
3. **Add image generation** - Use `gemini-3-image-generation` skill
4. **Add advanced features** - Use `gemini-3-advanced` skill (caching, tools, batch)

### Common Integration Patterns

- **Simple Chatbot:** This skill only
- **Multimodal Assistant:** This skill + `gemini-3-multimodal`
- **Creative Bot:** This skill + `gemini-3-image-generation`
- **Production App:** All 4 Gemini 3 skills

---

## Troubleshooting

### Issue: API key not working
**Solution:** Verify API key in Google AI Studio, check environment variable

### Issue: Rate limit errors
**Solution:** Implement rate limiting, upgrade to paid tier, reduce request frequency

### Issue: Slow responses
**Solution:** Use `thinking_level: "low"` for simple tasks, enable streaming, reduce context size

### Issue: High costs
**Solution:** Keep prompts under 200k tokens, use appropriate thinking level, consider Gemini 1.5 Flash for simple tasks

### Issue: Temperature warnings
**Solution:** Keep temperature at 1.0 (default) - do not modify for complex reasoning tasks

---

## Summary

This skill provides everything needed to integrate Gemini 3 Pro API into your applications:

✅ Quick setup (< 5 minutes)
✅ Production-ready chat applications
✅ Dynamic thinking configuration
✅ Streaming responses
✅ Error handling and retry logic
✅ Cost optimization strategies
✅ Monitoring and logging patterns

For multimodal, image generation, and advanced features, see the companion skills.

**Ready to build?** Start with **Workflow 1: Quick Start Setup** above!