---
name: gemini-image-gen
description: Guide for implementing Google Gemini API image generation - create high-quality images from text prompts using gemini-2.5-flash-image model. Use when generating images, creating visual content, or implementing text-to-image features. Supports text-to-image, image editing, multi-image composition, and iterative refinement.
license: MIT
version: 1.0.0
allowed-tools:
  - Bash
  - Read
  - Write
---

# Gemini Image Generation Skill

Generate high-quality images using Google's Gemini 2.5 Flash Image model with text prompts, image editing, and multi-image composition capabilities.

## When to Use This Skill

Use this skill when you need to:
- Generate images from text descriptions
- Edit existing images by adding/removing elements or changing styles
- Combine multiple source images into new compositions
- Iteratively refine images through conversational editing
- Create visual content for documentation, design, or creative projects

## Prerequisites

### API Key Setup

The skill automatically detects your `GEMINI_API_KEY` in this order:

1. **Process environment**: `export GEMINI_API_KEY="your-key"`
2. **Skill directory**: `.claude/skills/gemini-image-gen/.env`
3. **Project directory**: `./.env` (project root)

**Get your API key**: Visit [Google AI Studio](https://aistudio.google.com/apikey)

Create `.env` file with:
```bash
GEMINI_API_KEY=your_api_key_here
```

### Python Setup

Install required package:
```bash
pip install google-genai
```

## Quick Start

### Basic Text-to-Image Generation

```python
from google import genai
from google.genai import types
import os

# API key detection handled automatically by helper script
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        aspect_ratio='16:9'
    )
)

# Save to ./docs/assets/
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'./docs/assets/generated-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)
```

### Using the Helper Script

For convenience, use the provided helper script that handles API key detection and file saving:

```bash
# Generate single image
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "A futuristic city with flying cars" \
  --aspect-ratio 16:9 \
  --output ./docs/assets/city.png

# Generate with specific modalities
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "Modern architecture design" \
  --response-modalities image text \
  --aspect-ratio 1:1
```

## Key Features

### Aspect Ratios

| Ratio | Resolution | Use Case | Token Cost |
|-------|-----------|----------|------------|
| 1:1 | 1024×1024 | Social media, avatars | 1290 |
| 16:9 | 1344×768 | Landscapes, banners | 1290 |
| 9:16 | 768×1344 | Mobile, portraits | 1290 |
| 4:3 | 1152×896 | Traditional media | 1290 |
| 3:4 | 896×1152 | Vertical posters | 1290 |

### Response Modalities

- **`['image']`**: Generate only images
- **`['text']`**: Generate only text descriptions
- **`['image', 'text']`**: Generate both images and descriptions

### Image Editing

Provide existing image + text instructions to modify:

```python
import PIL.Image

img = PIL.Image.open('original.png')
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img
    ]
)
```

### Multi-Image Composition

Combine up to 3 source images (recommended):

```python
img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2
    ]
)
```

## Prompt Engineering Tips

**Structure effective prompts** with three elements:
1. **Subject**: What to generate ("a robot")
2. **Context**: Environmental setting ("in a futuristic city")
3. **Style**: Artistic treatment ("cyberpunk style, neon lighting")

**Example**: "A robot in a futuristic city, cyberpunk style with neon lighting and rain-slicked streets"

**Quality modifiers**:
- Add terms like "4K", "HDR", "high-quality", "professional photography"
- Specify camera settings: "35mm lens", "shallow depth of field", "golden hour lighting"

**Text in images**:
- Limit to 25 characters maximum
- Use up to 3 distinct phrases
- Specify font styles: "bold sans-serif title" or "handwritten script"

See `references/prompting-guide.md` for comprehensive prompt engineering strategies.

## Safety Settings

The model includes adjustable safety filters. Configure per-request:

```python
config = types.GenerateContentConfig(
    response_modalities=['image'],
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)
```

See `references/safety-settings.md` for detailed configuration options.

## Output Management

All generated images should be saved to `./docs/assets/` directory:

```bash
# Create directory if needed
mkdir -p ./docs/assets
```

The helper script automatically saves to this location with timestamped filenames.

## Model Specifications

**Model**: `gemini-2.5-flash-image`
- **Input tokens**: Up to 65,536
- **Output tokens**: Up to 32,768
- **Supported inputs**: Text and images
- **Supported outputs**: Text and images
- **Knowledge cutoff**: June 2025
- **Features**: Image generation, structured outputs, batch API, caching

## Limitations

- Maximum 3 input images recommended for best results
- Text rendering works best when generated separately first
- Does not support audio/video inputs
- Regional restrictions on child image uploads (EEA, CH, UK)
- Optimal language support: English, Spanish (Mexico), Japanese, Mandarin, Hindi

## Error Handling

Common issues and solutions:

**API key not found**:
```bash
# Check environment variables
echo $GEMINI_API_KEY

# Verify .env file exists
cat .claude/skills/gemini-image-gen/.env
# or
cat .env
```

**Safety filter blocking**:
- Review `response.prompt_feedback.block_reason`
- Adjust safety settings if appropriate for your use case
- Modify prompt to avoid triggering filters

**Token limit exceeded**:
- Reduce prompt length
- Use fewer input images
- Simplify image editing instructions

## Reference Documentation

For detailed information, see:
- `references/api-reference.md` - Complete API specifications
- `references/prompting-guide.md` - Advanced prompt engineering
- `references/safety-settings.md` - Safety configuration details
- `references/code-examples.md` - Additional implementation examples

## Resources

- [Official Documentation](https://ai.google.dev/gemini-api/docs/image-generation)
- [API Reference](https://ai.google.dev/api/generate-content)
- [Get API Key](https://aistudio.google.com/apikey)
- [Google AI Studio](https://aistudio.google.com) - Interactive testing