---
name: gemini-image-generator
description: Generate images using Google's Gemini API. Use when creating images from text prompts, editing existing images, or combining reference images for AI-generated visual content.
---

# Gemini Image Generator

## Overview

Generate images with Google's Gemini API: text-to-image generation, image editing, and multi-image reference inputs. Two models are supported: the fast Gemini 2.5 Flash model and the high-quality Gemini 3 Pro model, which outputs up to 4K resolution.

## When to Use

- Generating app icons, logos, and UI assets
- Creating marketing visuals and promotional graphics
- Prototyping UI designs with AI-generated placeholders
- Generating game sprites and 2D assets
- Creating concept art and mood boards
- Editing or modifying existing images with text prompts
- Style transfer using reference images

## Prerequisites

- Python 3.9+
- `google-genai` package
- `GEMINI_API_KEY` environment variable

## Installation

```bash
pip install google-genai
```

### Getting an API Key

1. Go to [Google AI Studio](https://aistudio.google.com/apikey)
2. Sign in with your Google account
3. Click "Create API Key"
4. Copy the key and set it as an environment variable:

```bash
export GEMINI_API_KEY="your-api-key"
```

To persist the key across sessions, append the export line to your shell profile (`~/.zshrc` shown here; use `~/.bashrc` for Bash):

```bash
echo 'export GEMINI_API_KEY="your-api-key"' >> ~/.zshrc
```
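If you call the API from your own Python code, it can help to fail early with a clear message when the key is missing. The helper below is a minimal sketch; `get_api_key` is a hypothetical name and is not taken from `scripts/generate_image.py`.

```python
import os


def get_api_key(env=None):
    """Return the Gemini API key, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY environment variable not set; "
            'run: export GEMINI_API_KEY="your-api-key"'
        )
    return key
```

Calling `get_api_key()` with no arguments reads the real environment.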

## Quick Start

Generate a simple image:

```bash
python scripts/generate_image.py -p "A fluffy orange cat sitting on a windowsill, warm sunlight, cozy atmosphere"
```

Generate with a specific aspect ratio:

```bash
python scripts/generate_image.py -p "Modern tech startup banner" -a 16:9 -o banner.png
```

Edit an existing image:

```bash
python scripts/generate_image.py -p "Make the sky more dramatic with sunset colors" -i photo.jpg -o edited.png
```

## Command Reference

```
python scripts/generate_image.py [options]

Required:
  -p, --prompt TEXT         Text prompt describing the image

Optional:
  -o, --output PATH         Output file path (default: auto-generated)
  -m, --model MODEL         Model to use (default: gemini-3-pro-image-preview)
  -a, --aspect-ratio RATIO  Aspect ratio (default: 1:1)
  -s, --size SIZE           Image size: 1K, 2K, 4K (default: 1K, Pro only)
  -i, --input-image PATH    Input image for editing mode
  -r, --reference-images    Reference image(s), can be repeated (max 14)
  -v, --verbose             Show detailed progress
```
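For readers who want to build a similar CLI, the options above could be declared with `argparse` roughly as follows. The script's source is not shown here, so this is only an assumption about its structure; `build_parser` is a hypothetical name, and the flags and defaults are copied from the reference above.

```python
import argparse


def build_parser():
    """Sketch of a parser mirroring the documented options (not the real script)."""
    parser = argparse.ArgumentParser(prog="generate_image.py")
    parser.add_argument("-p", "--prompt", required=True,
                        help="Text prompt describing the image")
    parser.add_argument("-o", "--output", default=None,
                        help="Output file path (auto-generated when omitted)")
    parser.add_argument("-m", "--model", default="gemini-3-pro-image-preview")
    parser.add_argument("-a", "--aspect-ratio", default="1:1")
    parser.add_argument("-s", "--size", choices=["1K", "2K", "4K"], default="1K")
    parser.add_argument("-i", "--input-image", default=None)
    # action="append" lets -r be repeated; the value is None when -r is absent.
    parser.add_argument("-r", "--reference-images", action="append", default=None)
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser
```

For example, `build_parser().parse_args(["-p", "cat", "-a", "16:9"])` yields a namespace with `aspect_ratio == "16:9"` and the documented defaults for everything else.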

## Models

| Model | Resolution | Best For |
|-------|------------|----------|
| `gemini-3-pro-image-preview` | Up to 4K | Final assets, high quality, professional work |
| `gemini-2.5-flash-image` | 1024px | Quick iterations, prototyping, batch generation |

The Pro model is used by default. Use the Flash model for faster generation when quality is less critical:

```bash
python scripts/generate_image.py -p "Quick concept sketch" -m gemini-2.5-flash-image
```

## Aspect Ratios

| Ratio | Use Case |
|-------|----------|
| `1:1` | App icons, profile pictures, thumbnails |
| `2:3` | Portrait photos, book covers |
| `3:2` | Landscape photos, postcards |
| `3:4` | Portrait photos, social media posts |
| `4:3` | Traditional photos, presentations |
| `4:5` | Instagram posts, portrait social media |
| `5:4` | Large format prints |
| `9:16` | Stories, vertical videos, mobile wallpapers |
| `16:9` | Widescreen banners, video thumbnails, headers |
| `21:9` | Ultrawide banners, cinematic headers |

## Image Sizes

Available for the Gemini 3 Pro model (`gemini-3-pro-image-preview`) only:

| Size | Resolution | Use Case |
|------|------------|----------|
| `1K` | 1024px | Web graphics, thumbnails |
| `2K` | 2048px | Print materials, detailed graphics |
| `4K` | 4096px | High-resolution prints, large displays |

```bash
python scripts/generate_image.py -p "Detailed landscape" -s 4K -o landscape_4k.png
```

## Prompt Engineering Guide

### Prompt Structure

Use this formula for effective prompts:

```
[Subject] + [Style] + [Details] + [Quality modifiers]
```
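The formula can be applied mechanically when prompts are assembled in code. The sketch below joins the four parts with commas and skips any that are empty; `build_prompt` is a hypothetical helper, and the comma separator is a stylistic assumption rather than an API requirement.

```python
def build_prompt(subject, style="", details="", quality=""):
    """Join [Subject] + [Style] + [Details] + [Quality modifiers] into one prompt."""
    if not subject.strip():
        raise ValueError("subject is required")
    parts = [p.strip() for p in (subject, style, details, quality)]
    # Drop empty parts so optional fields do not leave stray commas.
    return ", ".join(p for p in parts if p)


prompt = build_prompt(
    subject="a fluffy orange tabby cat sitting on a windowsill",
    style="watercolor",
    details="golden hour lighting",
    quality="high quality, detailed",
)
print(prompt)
```

The resulting string can be passed to the script with `-p`.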

### Techniques

**1. Be Specific About the Subject**

```
Bad:  "a cat"
Good: "a fluffy orange tabby cat sitting on a windowsill"
```

**2. Specify Art Style**

- Photorealistic, cartoon, anime, oil painting, watercolor
- Digital art, 3D render, pixel art, vector illustration
- Specific styles: "in the style of Studio Ghibli", "cyberpunk aesthetic"

**3. Include Environment and Lighting**

- "golden hour lighting", "dramatic shadows", "soft ambient light"
- "neon-lit cityscape", "cozy interior", "misty forest"

**4. Add Quality Modifiers**

- "high quality", "detailed", "professional"
- "sharp focus", "studio lighting", "cinematic"

**5. Specify Composition**

- "centered composition", "rule of thirds"
- "close-up", "wide shot", "bird's eye view", "isometric"

### Example Prompts by Use Case

**App Icon**
```
Minimalist app icon for a weather app, blue gradient background,
white cloud with golden sun rays, flat design, rounded corners,
iOS style, clean and modern
```

**Marketing Banner**
```
Professional tech startup banner, abstract geometric shapes
flowing from left to right, purple and blue gradient,
modern and clean aesthetic, corporate style
```

**Game Sprite**
```
Pixel art character sprite, fantasy warrior with glowing sword,
32x32 style, transparent background, retro 16-bit game aesthetic,
vibrant colors
```

**Product Photo**
```
Professional product photo of wireless earbuds on white background,
soft shadows, studio lighting, minimalist composition,
commercial photography style
```

**Concept Art**
```
Futuristic city skyline at sunset, flying vehicles between
towering skyscrapers, neon lights reflecting on wet streets,
cyberpunk atmosphere, cinematic composition, detailed
```

**UI Mockup Asset**
```
Abstract gradient background for mobile app, soft purple to pink
transition, subtle geometric patterns, modern and minimal,
suitable for dark text overlay
```

## Generation Modes

### Text-to-Image

Generate images from text descriptions:

```bash
python scripts/generate_image.py -p "Your description here" -o output.png
```

### Image Editing

Modify an existing image with a text prompt:

```bash
python scripts/generate_image.py \
  -p "Change the background to a tropical beach at sunset" \
  -i original.jpg \
  -o edited.png
```

### Multi-Image Reference

Use up to 14 reference images to guide style or content:

```bash
python scripts/generate_image.py \
  -p "Create a new character in this art style" \
  -r style_ref1.png \
  -r style_ref2.png \
  -o new_character.png
```
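When the script is driven from Python, the 14-image limit can be enforced before anything is launched. This is a minimal sketch: `build_command` is a hypothetical wrapper, and it assumes the script is invoked with the flags documented in the Command Reference.

```python
import sys

MAX_REFERENCE_IMAGES = 14  # limit stated in the Command Reference


def build_command(prompt, output, reference_images=()):
    """Build the argument list for a multi-reference run of the script."""
    refs = list(reference_images)
    if len(refs) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"at most {MAX_REFERENCE_IMAGES} reference images are allowed, got {len(refs)}"
        )
    cmd = [sys.executable, "scripts/generate_image.py", "-p", prompt]
    for path in refs:
        cmd += ["-r", path]  # -r is repeated once per reference image
    cmd += ["-o", output]
    return cmd
```

The returned list can be run with `subprocess.run(cmd, check=True)`.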

## Examples

### Generate App Icons

```bash
# iOS-style weather icon
python scripts/generate_image.py \
  -p "Minimalist weather app icon, blue sky gradient, white fluffy cloud, sun peeking out, flat design, rounded square, iOS 17 style" \
  -a 1:1 \
  -o weather_icon.png

# Fitness app icon
python scripts/generate_image.py \
  -p "Fitness app icon, running figure silhouette, orange to red gradient background, energetic and dynamic, modern flat design" \
  -a 1:1 \
  -o fitness_icon.png
```

### Create Marketing Assets

```bash
# Website hero banner
python scripts/generate_image.py \
  -p "Abstract tech hero banner, flowing data visualization, dark blue background with glowing cyan accents, futuristic and professional" \
  -a 21:9 \
  -s 2K \
  -o hero_banner.png

# Social media post
python scripts/generate_image.py \
  -p "Motivational quote background, soft sunrise gradient, minimalist mountain silhouette, peaceful and inspiring" \
  -a 4:5 \
  -o social_post_bg.png
```

### Generate Game Assets

```bash
# Character sprite
python scripts/generate_image.py \
  -p "Pixel art hero character, knight with blue cape and silver armor, idle pose, transparent background, 16-bit retro style" \
  -a 1:1 \
  -o knight_sprite.png

# Environment tile
python scripts/generate_image.py \
  -p "Grass tile for top-down RPG, seamless pattern, vibrant green with small flowers, pixel art style, 32x32 aesthetic" \
  -a 1:1 \
  -o grass_tile.png
```

### Edit Photos

```bash
# Change background
python scripts/generate_image.py \
  -p "Replace background with a cozy coffee shop interior" \
  -i portrait.jpg \
  -o portrait_coffee_shop.png

# Style enhancement
python scripts/generate_image.py \
  -p "Enhance with dramatic cinematic color grading, increase contrast, add film grain" \
  -i landscape.jpg \
  -o landscape_cinematic.png
```

## Troubleshooting

### "GEMINI_API_KEY environment variable not set"

Set your API key:
```bash
export GEMINI_API_KEY="your-api-key"
```

### "Rate limit exceeded"

Wait a few minutes and try again. For batch operations, add delays between requests.
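For batch jobs, one common way to add those delays is exponential backoff. The sketch below is generic: `retry_with_backoff` and the `is_rate_limit` predicate are hypothetical, and how a rate-limit error is detected depends on how you invoke the generator.

```python
import time


def retry_with_backoff(call, is_rate_limit, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Run call(), retrying on rate-limit errors with doubling delays.

    Delays are base_delay, 2*base_delay, 4*base_delay, ... seconds.
    Any other error, or a rate-limit error on the last attempt, is re-raised.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            if not is_rate_limit(exc) or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so the delay can be replaced in tests; in normal use, leave it at its default.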

### "Content policy violation"

Modify your prompt to avoid content that violates Google's usage policies. Try:
- Using more generic descriptions
- Avoiding specific brand names or copyrighted characters
- Removing potentially sensitive content

### "No image in response"

The model sometimes returns text instead of an image. Try:
- Making your prompt more specific
- Adding "generate an image of" to your prompt
- Using a different aspect ratio
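If you process responses in your own code, separating image parts from text parts makes this case easy to detect. The sketch below assumes each response part exposes `inline_data.data` (bytes) for images and `text` for text, as parts in the `google-genai` SDK do; `split_parts` is a hypothetical helper, and the demo uses stand-in objects rather than a real API call.

```python
from types import SimpleNamespace


def split_parts(parts):
    """Return (image_bytes, text) from an iterable of response parts."""
    image_bytes = None
    texts = []
    for part in parts:
        inline = getattr(part, "inline_data", None)
        data = getattr(inline, "data", None)
        if data:
            if image_bytes is None:  # keep only the first image
                image_bytes = data
        elif getattr(part, "text", None):
            texts.append(part.text)
    return image_bytes, "\n".join(texts)


# Stand-in for a text-only reply (hypothetical data, no API call is made).
fake_parts = [SimpleNamespace(text="I can describe that scene.", inline_data=None)]
image, text = split_parts(fake_parts)
if image is None:
    print(f"No image in response; model said: {text}")
```

Logging the returned text often shows why no image was produced, which helps when rewording the prompt.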

### "Unsupported image format"

Input images must be PNG, JPEG, or WebP. Convert other formats to one of these before passing them to `-i` or `-r`.
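A quick pre-flight check on file extensions can catch this before a request is made. This sketch looks only at the file name, not the file contents, and the script itself may validate inputs differently; `is_supported_input` is a hypothetical helper.

```python
from pathlib import Path

# Extensions for the formats listed above (PNG, JPEG, WebP).
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def is_supported_input(path):
    """Return True if the file extension matches a supported input format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_input("photo.JPG")` is true, while `is_supported_input("scan.tiff")` is false.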

### Size option not working

The `-s`/`--size` option only takes effect with `gemini-3-pro-image-preview`. The Flash model always generates 1024px images.
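A wrapper can flag this combination before a request is sent. The following is a minimal sketch; `check_size` is a hypothetical helper, and whether the actual script warns, errors, or silently ignores the size is not specified here.

```python
PRO_MODEL = "gemini-3-pro-image-preview"
VALID_SIZES = ("1K", "2K", "4K")


def check_size(model, size):
    """Return a warning string if `size` will not take effect, else None."""
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    if model != PRO_MODEL and size != "1K":
        return f"size {size} is only available with {PRO_MODEL}; {model} outputs 1024px"
    return None
```

A caller might print the warning and continue, or switch to the Pro model when a larger size is required.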

## Best Practices

- **Start simple**: Begin with clear, concise prompts and iterate
- **Use the right model**: Flash for speed, Pro for quality
- **Match aspect ratio to use case**: 16:9 for banners, 1:1 for icons
- **Save high-quality versions**: Use 4K when you need detailed assets
- **Iterate on prompts**: Small changes can significantly affect results
- **Use reference images**: For consistent style across multiple generations
- **Add quality modifiers**: "high quality", "detailed", "professional"
- **Specify what you don't want**: "no text", "no people", "no busy background"