---
name: nano-banana-pro
description: |
  Image generation and editing using Google Gemini's Nano Banana Pro (gemini-3-pro-image-preview) model.
  Use when user requests: "Generate an image", "Create an image", "Make me a picture", "Draw",
  "Edit that image", "Change the color", "Remove background", "Add transparency", "Modify this image",
  "Make it transparent", "Change the style", "Add text to image", or any image creation/manipulation task.
  Supports text-to-image generation, image editing, multi-turn conversations, and transparency extraction
  via difference matting technique.
---

# Nano Banana Pro Image Generation & Editing

Generate and edit images using Google's Gemini 3 Pro model with advanced transparency support.

## Prerequisites

1. **Dependencies**:
   ```bash
   pip install google-genai Pillow numpy python-dotenv
   ```

2. **API Key**: The script loads from `.env` automatically. Only ask the user if the script fails with "No API key found".

## CLI Usage (REQUIRED)

**ALWAYS use the CLI script. Do NOT write Python code or create .py files.**

Run `scripts/generate.py` directly:

```bash
# Basic generation
python scripts/generate.py "a cute banana sticker" -o banana.png

# With transparency (for game assets, stickers, icons)
python scripts/generate.py "pixel art sword" -o sword.png --transparent

# Custom size and aspect ratio
python scripts/generate.py "game logo" -o logo.png --size 4K --ratio 16:9
```

**Options:**
- `-o, --output` - Output filename (default: output.png)
- `--transparent` - Extract true alpha channel using difference matting
- `--size` - 1K, 2K, or 4K (default: 2K)
- `--ratio` - Aspect ratio: 1:1, 16:9, 9:16, etc. (default: 1:1)
- `--model` - Model override (default: gemini-3-pro-image-preview)

**Note:** The script loads the API key from `.env` automatically. Do not check for API keys manually or ask the user about them - just run the script and it will error with instructions if the key is missing.

## Intent Detection

Analyze the user's request to determine the intent:

| Intent | Triggers | Action |
|--------|----------|--------|
| **Generate** | "create", "generate", "make", "draw", "design" | Text-to-image |
| **Edit** | "edit", "change", "modify", "update", "fix" | Image-to-image |
| **Transparency** | "transparent", "remove background", "alpha", "cutout", "PNG with transparency" | Use difference matting |
| **Text overlay** | "add text", "write on", "label", "caption" | Use Gemini 3 Pro for accurate text |
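
The table can be read as a simple keyword lookup. Below is a minimal sketch, assuming naive case-insensitive substring matching; the function name `detect_intents` and the intent keys are hypothetical and not part of `scripts/generate.py`. The trigger phrases are copied from the table.

```python
# Trigger phrases copied from the table above; the matching strategy is an assumption.
INTENT_TRIGGERS = {
    "transparency": ["transparent", "remove background", "alpha", "cutout",
                     "png with transparency"],
    "text_overlay": ["add text", "write on", "label", "caption"],
    "edit": ["edit", "change", "modify", "update", "fix"],
    "generate": ["create", "generate", "make", "draw", "design"],
}


def detect_intents(request: str) -> list[str]:
    """Return every intent whose trigger phrase appears in the request."""
    text = request.lower()
    return [
        intent
        for intent, triggers in INTENT_TRIGGERS.items()
        if any(trigger in text for trigger in triggers)
    ]
```

A request can match several intents at once (for example an edit that also adds text), so the sketch returns a list rather than a single label. Substring matching is deliberately crude: "make it transparent" matches both `transparency` and `generate`, so treat the result as a hint rather than a decision.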

## Resolution Selection

Choose resolution based on use case:

| Resolution | Best For | Pixel Output |
|------------|----------|--------------|
| **1K** | Quick previews, thumbnails, web icons | ~1024px |
| **2K** | Social media, standard web images | ~2048px |
| **4K** | Print, professional assets, sprite sheets | ~4096px |

**Heuristics:**
- Sprite sheets, game assets, print materials → **4K**
- Social media, blog images, presentations → **2K**
- Quick tests, thumbnails, prototypes → **1K**

When uncertain, ask the user or default to **2K**.
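
When the user states a target pixel size, the choice can be made mechanically. This is a small sketch, assuming the approximate pixel figures from the table refer to the long edge; `pick_size` is a hypothetical helper, not part of the CLI.

```python
# Approximate long-edge pixel counts taken from the table above.
SIZE_PIXELS = {"1K": 1024, "2K": 2048, "4K": 4096}


def pick_size(required_px=None):
    """Return the smallest size that covers required_px, or 2K when unknown."""
    if required_px is None:
        return "2K"  # documented default when the need is unclear
    for name, pixels in SIZE_PIXELS.items():
        if pixels >= required_px:
            return name
    return "4K"  # largest size available
```

The returned string is intended to be passed to the `--size` option.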

## Aspect Ratios

Available: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`

**Selection guide:**
- Square content (icons, avatars, social posts) → `1:1`
- Portrait (mobile, vertical video) → `9:16` or `3:4`
- Landscape (desktop, presentations) → `16:9` or `3:2`
- Cinematic/ultrawide → `21:9`
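
If the target dimensions are known, the nearest supported ratio can be computed instead of guessed. The sketch below compares ratios on a log scale so that portrait and landscape are treated symmetrically; `closest_ratio` is a hypothetical helper, and the ratio list is the one given above.

```python
import math

SUPPORTED_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4",
                    "9:16", "16:9", "21:9"]


def closest_ratio(width, height):
    """Return the supported ratio closest to width:height."""
    target = math.log(width / height)

    def distance(ratio):
        w, h = map(int, ratio.split(":"))
        return abs(math.log(w / h) - target)

    return min(SUPPORTED_RATIOS, key=distance)
```

For example, `closest_ratio(1920, 1080)` returns `"16:9"`, which can be passed to `--ratio`.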

## Core Implementation

### Basic Generation

```python
from google import genai
from google.genai import types
from PIL import Image
import io

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Your descriptive prompt here",
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="1:1",  # or other ratio
            image_size="2K"     # 1K, 2K, or 4K
        ),
    ),
)

# Extract image from response
for part in response.parts:
    if part.inline_data is not None:
        image = Image.open(io.BytesIO(part.inline_data.data))
        image.save("output.png")
        break
```

### Image Editing

```python
# Load existing image (reuses client, types, and Image from Basic Generation)
input_image = Image.open("input.png")

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        input_image,
        "Edit instruction: Change the background to sunset colors"
    ],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="1:1",
            image_size="2K"
        ),
    ),
)
```
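
With `response_modalities=['TEXT', 'IMAGE']`, the response may carry text parts alongside the image. The sketch below separates the two, assuming parts expose `text` and `inline_data` as in Basic Generation; `split_response_parts` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in for a real response.

```python
from types import SimpleNamespace


def split_response_parts(response):
    """Return (texts, image_bytes) from a response with TEXT and IMAGE parts."""
    texts, image_bytes = [], None
    for part in response.parts or []:
        if getattr(part, "text", None):
            texts.append(part.text)
        elif getattr(part, "inline_data", None) is not None and image_bytes is None:
            # Keep only the first image found
            image_bytes = part.inline_data.data
    return texts, image_bytes


# Stand-in for a real response, used only to show the expected shape.
fake_response = SimpleNamespace(parts=[
    SimpleNamespace(text="Here is the edited image.", inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b"\x89PNG...")),
])
texts, image_bytes = split_response_parts(fake_response)
```

The returned bytes can be opened with `Image.open(io.BytesIO(image_bytes))` and saved, as in Basic Generation.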

### Multi-Turn Editing

Preserve context across edits using thought signatures:

```python
# First edit (image is a loaded PIL image, config a GenerateContentConfig)
response1 = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[image, "Add a red hat"],
    config=config,
)

# Continue editing: replay the earlier turn, including the model's reply
response2 = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        image,
        "Add a red hat",
        # Pass the model's Content (not the response object itself) so its
        # thought signatures travel with the conversation history
        response1.candidates[0].content,
        "Now make the hat blue instead"
    ],
    config=config,
)
```

## Transparency Extraction

When the user needs a transparent image, use **difference matting**. See `scripts/transparency.py`.

**When to use:**
- User explicitly asks for transparency
- Game sprites, icons, logos
- Assets that will be composited
- Cutouts and stickers

**Process:**
1. Generate image on pure white background (#FFFFFF)
2. Edit same image to pure black background (#000000)
3. Calculate alpha from pixel differences
4. Recover original colors

**Key insight:** Opaque pixels look identical on both backgrounds (distance ≈ 0), while fully transparent pixels show only the background color (maximum distance). Partially transparent pixels fall in between.

```python
from scripts.transparency import extract_alpha_difference_matting

# After generating white and black background versions
final_image = extract_alpha_difference_matting(img_on_white, img_on_black)
final_image.save("output.png")  # RGBA with true transparency
```
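
Steps 3 and 4 of the process can be sketched per pixel. This is an illustration of the math only, not the contents of `scripts/transparency.py`: it assumes straight alpha compositing over exact white and black backgrounds, and `matte_pixel` is a hypothetical name. Under that assumption the on-white pixel is `alpha * color + (1 - alpha) * 255` and the on-black pixel is `alpha * color`, so the distance between them grows linearly as alpha falls.

```python
import math

# RGB distance between pure white and pure black
MAX_DISTANCE = math.sqrt(3) * 255.0


def matte_pixel(on_white, on_black, eps=1e-6):
    """Return (r, g, b, a) in 0-255 for one pixel seen on white and on black."""
    distance = math.dist(on_white, on_black)
    # Distance 0 means opaque; maximum distance means fully transparent
    alpha = 1.0 - min(distance / MAX_DISTANCE, 1.0)
    if alpha <= eps:
        return (0, 0, 0, 0)
    # on_black equals alpha * color, so dividing by alpha recovers the color
    color = tuple(min(255, round(channel / alpha)) for channel in on_black)
    return (*color, round(alpha * 255))
```

A real implementation would likely vectorize this over whole images with NumPy and tolerate small differences between the two generated versions, since the model does not reproduce the subject pixel for pixel.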

## Prompt Engineering

### Fundamental Principle

> "Describe the scene, don't just list keywords."

Narrative paragraphs outperform disconnected word lists.

### Effective Prompt Structure

```
[Style/Medium] of [Subject] in [Context/Setting], [Lighting], [Additional details]
```

**Examples:**

```
# Photorealistic
A professional studio photograph of a brass steampunk pocket watch,
shot with a 50mm lens, soft diffused lighting from the left,
shallow depth of field with bokeh background, 4K HDR quality.

# Illustration
A detailed digital illustration of a medieval blacksmith's forge,
isometric perspective, warm orange glow from the furnace,
dieselpunk aesthetic with exposed pipes and riveted metal plates.

# Product mockup
A product photography shot of a ceramic coffee mug on a marble surface,
natural window lighting, minimalist Scandinavian style, clean white background.
```
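
The structure above can also be filled in programmatically when the parts come from separate inputs. A minimal sketch, assuming the template order shown earlier; `build_prompt` and its parameter names are hypothetical.

```python
def build_prompt(style, subject, setting, lighting=None, details=None):
    """Assemble a prompt following the Style/Subject/Setting/Lighting/Details order."""
    prompt = f"{style} of {subject} in {setting}"
    extras = [item for item in (lighting, details) if item]
    if extras:
        prompt += ", " + ", ".join(extras)
    return prompt + "."
```

Each argument should itself be a descriptive phrase rather than a bare keyword, so the assembled prompt still reads as a sentence.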

### Text in Images

For images containing text, use Gemini 3 Pro (not Imagen):
- Keep text to 25 characters or less per element
- Use 2-3 distinct text phrases maximum
- Specify font style generally (bold, elegant, handwritten)
- Indicate size (small, medium, large)
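
The first two limits are easy to check before sending a prompt. A small sketch, assuming the limits listed above; `check_text_elements` is a hypothetical helper.

```python
MAX_CHARS = 25    # per text element
MAX_PHRASES = 3   # distinct phrases per image


def check_text_elements(phrases):
    """Return a list of problems; an empty list means the text fits the limits."""
    problems = []
    if len(phrases) > MAX_PHRASES:
        problems.append(f"{len(phrases)} phrases; keep to {MAX_PHRASES} or fewer")
    for phrase in phrases:
        if len(phrase) > MAX_CHARS:
            problems.append(
                f"{phrase!r} is {len(phrase)} characters; keep to {MAX_CHARS} or fewer"
            )
    return problems
```

If the list is non-empty, shorten or split the text before generating rather than relying on the model to render it accurately.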

### Quality Modifiers

Add these for enhanced output:
- **Photography:** 4K, HDR, studio photo, professional lighting
- **Art:** detailed, by a professional, high-quality illustration
- **General:** high-fidelity, crisp details, polished finish

## Error Handling

```python
import time

from google.genai import errors

def generate_with_retry(client, *, model, contents, config, max_attempts=5):
    """Call generate_content, retrying rate-limit and server errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return client.models.generate_content(
                model=model, contents=contents, config=config
            )
        except errors.APIError as e:
            code = getattr(e, "code", None) or getattr(e, "status", None)
            # Re-raise non-retryable errors, or any error on the final attempt
            if code not in (429, 500, 502, 503, 504) or attempt >= max_attempts:
                raise
            # Exponential backoff: 1s, 2s, 4s, ... capped at 30s
            delay = min(30, 2 ** (attempt - 1))
            time.sleep(delay)
```

## Model Selection

| Model | Use Case |
|-------|----------|
| `gemini-3-pro-image-preview` | Complex edits, text rendering, multi-turn, transparency workflows |
| `gemini-2.5-flash-image` | Quick generation, high volume, simple tasks |
| `imagen-4.0-generate-001` | Photorealistic images, no editing needed |

Default to **gemini-3-pro-image-preview** for most tasks.

## File References

- `scripts/generate.py` - CLI for image generation (use this instead of writing code)
- `scripts/transparency.py` - Difference matting implementation
- `references/prompts.md` - Extended prompt examples by category