---
name: invoking-gemini
description: Invokes Google Gemini models for structured outputs, multi-modal tasks, and Google-specific features. Use when users request Gemini, structured JSON output, Google API integration, or cost-effective parallel processing.
metadata:
  version: 0.0.3
---

# Invoking Gemini

Delegate tasks to Google's Gemini models when they offer advantages over Claude.

## When to Use Gemini

**Structured outputs:**
- JSON Schema validation with property ordering guarantees
- Pydantic model compliance
- Strict schema adherence (enum values, required fields)

**Cost optimization:**
- Parallel batch processing (Gemini Flash is lightweight)
- High-volume simple tasks
- Budget-constrained operations

**Google ecosystem:**
- Integration with Google services
- Vertex AI workflows
- Google-specific APIs

**Multi-modal tasks:**
- Image analysis with JSON output
- Video processing
- Audio transcription with structure

## Available Models

**gemini-2.0-flash-exp** (Recommended):
- Fast, cost-effective
- Native JSON Schema support
- Good for structured outputs

**gemini-1.5-pro**:
- More capable reasoning
- Better for complex tasks
- Higher cost

**gemini-1.5-flash**:
- Balanced speed/quality
- Good for most tasks

See [references/models.md](references/models.md) for full model details.

## Setup

**Prerequisites:**

1. Install `google-generativeai` and `pydantic`:
   ```bash
   uv pip install google-generativeai pydantic
   ```

2. Configure your API key in a project knowledge file (either option works):

   **Option 1 (recommended): Individual file**
   - Create document: `GOOGLE_API_KEY.txt`
   - Content: Your API key (e.g., `AIzaSy...`)

   **Option 2: Combined file**
   - Create document: `API_CREDENTIALS.json`
   - Content:
     ```json
     {
       "google_api_key": "AIzaSy..."
     }
     ```

   Get your API key: https://console.cloud.google.com/apis/credentials
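The client's key lookup isn't shown in this document. The sketch below resolves the key the way the two options describe: `GOOGLE_API_KEY.txt` first, then the `google_api_key` field in `API_CREDENTIALS.json`. The function name and the `knowledge_dir` parameter are hypothetical; they stand in for wherever project knowledge files are actually mounted.

```python
import json
from pathlib import Path
from typing import Optional


def load_google_api_key(knowledge_dir: str) -> Optional[str]:
    """Return the Gemini API key from project knowledge files, or None.

    Hypothetical helper: `knowledge_dir` is an assumed location, and the
    real gemini_client may search elsewhere or in a different order.
    """
    base = Path(knowledge_dir)

    # Option 1: GOOGLE_API_KEY.txt containing only the key
    txt_file = base / "GOOGLE_API_KEY.txt"
    if txt_file.is_file():
        key = txt_file.read_text(encoding="utf-8").strip()
        if key:
            return key

    # Option 2: API_CREDENTIALS.json with a "google_api_key" field
    json_file = base / "API_CREDENTIALS.json"
    if json_file.is_file():
        try:
            data = json.loads(json_file.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return None  # malformed JSON counts as "not configured"
        key = data.get("google_api_key") if isinstance(data, dict) else None
        if isinstance(key, str) and key.strip():
            return key.strip()

    return None
```

Returning `None` instead of raising lets the caller produce the "API key not configured" message described under Troubleshooting.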

## Basic Usage

Add the client's scripts directory to `sys.path`, then import it:

```python
import sys
sys.path.append('/mnt/skills/invoking-gemini/scripts')
from gemini_client import invoke_gemini

# Simple prompt
response = invoke_gemini(
    prompt="Explain quantum computing in 3 bullet points",
    model="gemini-2.0-flash-exp"
)
print(response)
```

## Structured Output

Use Pydantic models for guaranteed JSON Schema compliance:

```python
from pydantic import BaseModel, Field
# Assumes the scripts directory is already on sys.path (see Basic Usage)
from gemini_client import invoke_with_structured_output

class BookAnalysis(BaseModel):
    title: str
    genre: str = Field(description="Primary genre")
    key_themes: list[str] = Field(max_length=5)
    rating: int = Field(ge=1, le=5)

result = invoke_with_structured_output(
    prompt="Analyze the book '1984' by George Orwell",
    pydantic_model=BookAnalysis
)

# result is a validated BookAnalysis instance
print(result.title)  # e.g. "1984"
print(result.genre)  # e.g. "Dystopian Fiction" (exact wording varies by run)
```

**Advantages over Claude:**
- Guaranteed property ordering in JSON
- Strict enum enforcement
- Native schema validation (no prompt engineering)
- Lower cost for simple extractions
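To check the ordering claim on a raw JSON response, a minimal stdlib sketch can compare the returned keys against the schema's declared order. `json.loads` keeps keys in document order because Python dicts preserve insertion order. The field names mirror `BookAnalysis`; the response string is a made-up sample, not real Gemini output.

```python
import json


def keys_in_schema_order(raw_json: str, expected_order: list[str]) -> bool:
    """Return True if the top-level keys appear exactly in `expected_order`."""
    obj = json.loads(raw_json)  # dict keeps keys in the order they appear in the text
    return list(obj.keys()) == expected_order


# Field order declared by the BookAnalysis model above
expected = ["title", "genre", "key_themes", "rating"]

# Hypothetical response text, for illustration only
sample = '{"title": "1984", "genre": "Dystopian Fiction", "key_themes": ["surveillance"], "rating": 5}'
print(keys_in_schema_order(sample, expected))
```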

## Parallel Invocation

Process multiple prompts concurrently:

```python
from gemini_client import invoke_parallel

prompts = [
    "Summarize the plot of Hamlet",
    "Summarize the plot of Macbeth",
    "Summarize the plot of Othello"
]

results = invoke_parallel(
    prompts=prompts,
    model="gemini-2.0-flash-exp"
)

for prompt, result in zip(prompts, results):
    print(f"Q: {prompt[:30]}...")
    # Guard against a failed call before slicing the text
    print(f"A: {(result or '<no response>')[:100]}...\n")
```

**Use cases:**
- Batch classification tasks
- Data labeling
- Multiple independent analyses
- A/B testing prompts
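How `invoke_parallel` works internally isn't documented here. One plausible way to get this behavior is a thread pool that fans prompts out and returns results in input order. The sketch below is an assumption, not the client's actual code: `call` stands in for a single-prompt function such as `invoke_gemini`, and `max_workers=8` is an arbitrary default.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def run_parallel(
    prompts: list[str],
    call: Callable[[str], Optional[str]],
    max_workers: int = 8,
) -> list[Optional[str]]:
    """Run `call` on each prompt concurrently; results keep the input order."""
    def safe_call(prompt: str) -> Optional[str]:
        try:
            return call(prompt)
        except Exception:
            return None  # one failed prompt should not sink the whole batch

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as `prompts`
        return list(pool.map(safe_call, prompts))


# Usage with a stand-in for the real API call
def fake_call(prompt: str) -> str:
    return prompt.upper()


print(run_parallel(["a", "b", "c"], fake_call))
```

Threads suit this workload because each API call spends most of its time waiting on the network, so the GIL is not a bottleneck.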

## Error Handling

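The client handles common failures internally. Apart from the `ValueError` raised for an invalid model, a failed call returns `None` rather than raising, so check the result: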

```python
from gemini_client import invoke_gemini

response = invoke_gemini(
    prompt="Your prompt here",
    model="gemini-2.0-flash-exp"
)

if response is None:
    print("Error: API call failed")
    # Check project knowledge file for valid google_api_key
```

**Common issues:**
- Missing API key → Add GOOGLE_API_KEY.txt to project knowledge (see Setup above)
- Invalid model → Raises ValueError
- Rate limit → Automatically retries with backoff
- Network error → Returns None after retries
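The client's retry logic isn't shown in this document. A minimal sketch of the described behavior (retry rate limits with exponential backoff, return `None` once retries run out) could look like this; `RateLimitError`, the retry count, and the delays are assumptions, not the client's real values.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the SDK's rate-limit error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `fn`, retrying transient failures with exponential backoff.

    Returns None once retries are exhausted, mirroring the
    "returns None after retries" behavior listed above.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except (RateLimitError, ConnectionError):
            if attempt == max_retries:
                return None
            # 1s, 2s, 4s, ... plus up to 10% jitter so parallel callers don't retry in lockstep
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    return None
```

`ValueError` (the invalid-model case) is deliberately not caught, so it still propagates. Passing `sleep` as a parameter keeps the function testable without real waits.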

## Advanced Features

### Custom Generation Config

```python
response = invoke_gemini(
    prompt="Write a haiku",
    model="gemini-2.0-flash-exp",
    temperature=0.9,
    max_output_tokens=100,
    top_p=0.95
)
```
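Out-of-range sampling values are easier to catch before a request is sent. This hypothetical helper collects the three parameters shown above into a dict and range-checks them; the bounds (temperature from 0 to 2, `top_p` from 0 to 1, positive `max_output_tokens`) are assumptions based on current Gemini documentation and may differ by model.

```python
from typing import Optional


def build_generation_config(
    temperature: Optional[float] = None,
    max_output_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
) -> dict:
    """Return only the parameters that were set, after range-checking them."""
    config = {}
    if temperature is not None:
        if not 0.0 <= temperature <= 2.0:  # assumed range; check the model docs
            raise ValueError(f"temperature out of range: {temperature}")
        config["temperature"] = temperature
    if max_output_tokens is not None:
        if max_output_tokens <= 0:
            raise ValueError("max_output_tokens must be positive")
        config["max_output_tokens"] = max_output_tokens
    if top_p is not None:
        if not 0.0 <= top_p <= 1.0:
            raise ValueError(f"top_p out of range: {top_p}")
        config["top_p"] = top_p
    return config


print(build_generation_config(temperature=0.9, max_output_tokens=100, top_p=0.95))
# {'temperature': 0.9, 'max_output_tokens': 100, 'top_p': 0.95}
```

Leaving unset parameters out of the dict lets the model's own defaults apply.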

### Multi-modal Input

```python
# Image analysis with structured output
from pydantic import BaseModel
from gemini_client import invoke_with_structured_output

class ImageDescription(BaseModel):
    objects: list[str]
    scene: str
    colors: list[str]

result = invoke_with_structured_output(
    prompt="Describe this image",
    pydantic_model=ImageDescription,
    image_path="/mnt/user-data/uploads/photo.jpg"
)
```
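How the client forwards `image_path` to the model isn't shown. To illustrate what an inline image looks like in a request, this sketch base64-encodes a file into a part shaped like the Gemini REST API's `inline_data` object (`mime_type` plus base64 `data`). The function name is hypothetical, and the real client may rely on the SDK's own helpers instead.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_inline_part(image_path: str) -> dict:
    """Build an inline-image part shaped like the REST API's inline_data."""
    mime_type, _ = mimetypes.guess_type(image_path)  # e.g. "image/jpeg" for .jpg
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {image_path}")
    data = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return {"inline_data": {"mime_type": mime_type, "data": data}}
```

Inline data travels inside the request body, so very large files are usually uploaded through Gemini's File API instead.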

See [references/advanced.md](references/advanced.md) for more patterns.

## Comparison: Gemini vs Claude

**Use Gemini when:**
- Structured output is primary goal
- Cost is a constraint
- Property ordering matters
- Batch processing many simple tasks

**Use Claude when:**
- Complex reasoning required
- Long context fits within Claude's 200K-token window (larger inputs may suit Gemini's 1M+ window; see Limitations)
- Code generation quality matters
- Nuanced instruction following

**Use both:**
- Claude for planning/reasoning
- Gemini for structured extraction
- Parallel workflows with different strengths
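The guidance above can be condensed into a routing rule. This is a hypothetical sketch, not part of the client: the `Task` fields and the batch threshold of 10 are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Task:
    needs_structured_output: bool = False
    needs_complex_reasoning: bool = False
    batch_size: int = 1


def choose_model(task: Task) -> str:
    """Hypothetical routing rule that encodes the Gemini vs Claude guidance."""
    if task.needs_complex_reasoning:
        return "claude"  # planning, nuanced instructions, code generation
    if task.needs_structured_output or task.batch_size > 10:  # threshold is arbitrary
        return "gemini-2.0-flash-exp"  # cheap structured extraction at volume
    return "claude"
```

Checking reasoning first keeps hard tasks with Claude even when they also need structured output; a hybrid workflow can then hand the extraction step to Gemini.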

## Token Efficiency Pattern

Gemini Flash is cost-effective for sub-tasks:

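```python
from pathlib import Path
from typing import Optional
from pydantic import BaseModel
from gemini_client import invoke_with_structured_output

# Claude (you) plans the approach; Gemini executes structured extractions
class ContactInfo(BaseModel):
    # Example schema; adjust the fields to match your documents
    name: str
    email: Optional[str] = None
    phone: Optional[str] = None

# Example input: plain-text uploads (adjust the glob for your files)
uploaded_files = sorted(Path("/mnt/user-data/uploads").glob("*.txt"))

data_points = []
for file in uploaded_files:
    # Send the file's contents to Gemini, not just its path
    text = file.read_text(encoding="utf-8")
    result = invoke_with_structured_output(
        prompt=f"Extract contact info from this document:\n\n{text}",
        pydantic_model=ContactInfo
    )
    if result is not None:  # skip failed extractions
        data_points.append(result)

# Claude synthesizes results
# ... your analysis here ...
```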

## Limitations

**Not suitable for:**
- Tasks requiring deep reasoning
- Inputs larger than the model's context window (~1M tokens for 2.0 Flash, ~2M for 1.5 Pro)
- Complex code generation
- Subjective creative writing

**Token limits:**
- gemini-2.0-flash-exp: ~1M input tokens
- gemini-1.5-pro: ~2M input tokens
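A quick pre-flight check can flag inputs that are likely too large. This sketch uses the approximate limits listed above and the rough heuristic of about 4 characters per token for English text; both the heuristic and the 90% headroom are assumptions. For exact numbers, the SDK's `count_tokens` call asks the API directly.

```python
# Approximate input-token limits from the list above
INPUT_TOKEN_LIMITS = {
    "gemini-2.0-flash-exp": 1_000_000,
    "gemini-1.5-pro": 2_000_000,
}


def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text (heuristic)."""
    return max(1, len(text) // 4)


def fits_in_context(text: str, model: str, headroom: float = 0.9) -> bool:
    """True if the estimated token count stays under `headroom` of the model's limit."""
    limit = INPUT_TOKEN_LIMITS[model]
    return estimate_tokens(text) <= limit * headroom
```

The headroom leaves space for the prompt template and any schema the client adds around the document.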

**Rate limits:**
- Vary by API tier
- Client handles automatic retry

## Examples

See [references/examples.md](references/examples.md) for:
- Data extraction from documents
- Batch classification
- Multi-modal analysis
- Hybrid Claude+Gemini workflows

## Troubleshooting

**"API key not configured":**
- Add project knowledge file `GOOGLE_API_KEY.txt` with your API key
- Or add to `API_CREDENTIALS.json`: `{"google_api_key": "AIzaSy..."}`
- See Setup section above for details

**Import errors:**
```bash
uv pip install google-generativeai pydantic
```

**Schema validation failures:**
- Check Pydantic model definitions
- Ensure prompt is clear about expected structure
- Add examples to prompt if needed
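Adding an example to the prompt can be as simple as appending one serialized instance of the expected shape. In this sketch, the helper name is hypothetical and the example deliberately describes a different book, so it shows the format without leaking an answer.

```python
import json


def prompt_with_example(task: str, example: dict) -> str:
    """Append one JSON example so the model sees the exact shape expected."""
    return (
        f"{task}\n\n"
        "Respond with JSON shaped like this example:\n"
        f"{json.dumps(example, indent=2)}"
    )


prompt = prompt_with_example(
    "Analyze the book '1984' by George Orwell",
    {"title": "Brave New World", "genre": "Dystopian Fiction",
     "key_themes": ["conditioning", "consumerism"], "rating": 4},
)
print(prompt)
```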

## Cost Comparison

Approximate pricing (as of 2024):

**Gemini 2.0 Flash:**
- Input: $0.15 / 1M tokens
- Output: $0.60 / 1M tokens

**Claude Sonnet:**
- Input: $3.00 / 1M tokens
- Output: $15.00 / 1M tokens

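For 1000 simple extraction tasks (about 100 input and 100 output tokens each, so 100K tokens in and 100K out):
- Gemini Flash: ~$0.08 (0.1 × $0.15 + 0.1 × $0.60)
- Claude Sonnet: ~$1.80 (0.1 × $3.00 + 0.1 × $15.00)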

**Strategy:** Use Claude for complex reasoning, Gemini for high-volume simple tasks.
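The batch estimate can be reproduced for other workloads with a few lines of arithmetic. Prices are copied from the figures above and will drift over time; the 100-input/100-output token split per task is an assumption.

```python
# Prices in USD per 1M tokens, copied from the figures above (subject to change)
PRICES = {
    "gemini-2.0-flash": {"input": 0.15, "output": 0.60},
    "claude-sonnet": {"input": 3.00, "output": 15.00},
}


def batch_cost(model: str, tasks: int, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of `tasks` calls with the given per-call token counts."""
    p = PRICES[model]
    total_in = tasks * input_tokens
    total_out = tasks * output_tokens
    return (total_in * p["input"] + total_out * p["output"]) / 1_000_000


# Assumption: each task uses 100 input and 100 output tokens
for model in PRICES:
    print(model, round(batch_cost(model, 1000, 100, 100), 3))
```

With these inputs the totals come to about $0.075 for Gemini Flash and $1.80 for Claude Sonnet. Output tokens dominate both bills, so tasks with short answers favor the cheaper model even more.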