--- name: gemini-image description: Invoke Google Gemini for image generation and understanding using the Python google-genai SDK. Supports gemini-3-pro-image-preview (generation + understanding), gemini-2.5-flash-image (fast generation), and vision models for analysis. --- # Gemini Image Skill Invoke Google Gemini models for image generation, image understanding, and visual analysis using the Python `google-genai` SDK. ## Available Models | Model ID | Description | Best For | Output Format | |----------|-------------|----------|---------------| | `gemini-3-pro-image-preview` | Best image generation + understanding | High-quality image gen, complex visual analysis | JPEG | | `gemini-2.5-flash-image` | Fast image generation | Quick image creation | PNG | | `gemini-3-pro-preview` | Multimodal understanding | Image analysis without generation | N/A | | `gemini-2.5-flash` | Fast vision | Quick image analysis | N/A | ## Configuration **API Key**: `${GEMINI_API_KEY}` ## Usage ### Image Generation ```bash python -c " from google import genai from google.genai import types client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY")) response = client.models.generate_content( model='gemini-3-pro-image-preview', # Returns JPEG | Use gemini-2.5-flash-image for PNG contents='Generate an image of a sunset over mountains', config=types.GenerateContentConfig( response_modalities=['IMAGE', 'TEXT'] ) ) # Map mime types to file extensions mime_to_ext = {'image/png': '.png', 'image/jpeg': '.jpg', 'image/gif': '.gif', 'image/webp': '.webp'} # Save generated image if response.candidates and response.candidates[0].content: for part in response.candidates[0].content.parts: if hasattr(part, 'inline_data') and part.inline_data: ext = mime_to_ext.get(part.inline_data.mime_type, '.png') filename = f'output{ext}' # Data is already raw bytes - no base64 decode needed with open(filename, 'wb') as f: f.write(part.inline_data.data) print(f'Image saved to {filename} ({part.inline_data.mime_type})') elif hasattr(part, 'text'): print(part.text) " ``` ### Image Understanding (Analyze Image from File) ```bash python -c " from google import genai from google.genai import types import base64 client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY")) # Read image file - must be base64 encoded for INPUT with open('IMAGE_PATH', 'rb') as f: image_data = base64.b64encode(f.read()).decode('utf-8') response = client.models.generate_content( model='gemini-3-pro-preview', contents=[ types.Content(parts=[ types.Part(text='Describe this image in detail'), types.Part(inline_data=types.Blob(mime_type='image/png', data=image_data)) ]) ] ) print(response.text) " ``` ### Image Understanding (From URL) ```bash python -c " from google import genai from google.genai import types import urllib.request import base64 client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY")) # Fetch image from URL - must be base64 encoded for INPUT url = 'IMAGE_URL_HERE' with urllib.request.urlopen(url) as response: image_data = base64.b64encode(response.read()).decode('utf-8') response = client.models.generate_content( model='gemini-3-pro-preview', contents=[ types.Content(parts=[ types.Part(text='What is in this image?'), types.Part(inline_data=types.Blob(mime_type='image/jpeg', data=image_data)) ]) ] ) print(response.text) " ``` ## Workflow When this skill is invoked: 1. **Determine the task type**: - **Image Generation**: User wants to create an image - **Image Understanding**: User wants to analyze an existing image - **Image Editing**: User wants to modify an image (generation with reference) 2. **Select the appropriate model**: - Image generation → `gemini-3-pro-image-preview` (JPEG) or `gemini-2.5-flash-image` (PNG) - Image analysis → `gemini-3-pro-preview` or `gemini-2.5-flash` 3. **Prepare the input**: - For generation: Text prompt describing desired image - For understanding: Load image file as base64 4. **Execute and handle output**: - Generation: Save binary image data to file - Understanding: Return text description ## Example Invocations ### Generate Product Image ```bash python -c " from google import genai from google.genai import types client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY")) response = client.models.generate_content( model='gemini-3-pro-image-preview', contents='Create a professional product photo of a sleek wireless headphone on a white background, studio lighting', config=types.GenerateContentConfig( response_modalities=['IMAGE', 'TEXT'] ) ) mime_to_ext = {'image/png': '.png', 'image/jpeg': '.jpg', 'image/gif': '.gif', 'image/webp': '.webp'} if response.candidates and response.candidates[0].content: for part in response.candidates[0].content.parts: if hasattr(part, 'inline_data') and part.inline_data: ext = mime_to_ext.get(part.inline_data.mime_type, '.png') with open(f'headphone{ext}', 'wb') as f: f.write(part.inline_data.data) print(f'Image saved to headphone{ext}') " ``` ### Analyze Screenshot ```bash python -c " from google import genai from google.genai import types import base64 client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY")) with open('screenshot.png', 'rb') as f: image_data = base64.b64encode(f.read()).decode('utf-8') response = client.models.generate_content( model='gemini-3-pro-preview', contents=[ types.Content(parts=[ types.Part(text='Analyze this UI screenshot. Identify any usability issues and suggest improvements.'), types.Part(inline_data=types.Blob(mime_type='image/png', data=image_data)) ]) ] ) print(response.text) " ``` ### OCR / Extract Text from Image ```bash python -c " from google import genai from google.genai import types import base64 client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY")) with open('document.png', 'rb') as f: image_data = base64.b64encode(f.read()).decode('utf-8') response = client.models.generate_content( model='gemini-3-pro-preview', contents=[ types.Content(parts=[ types.Part(text='Extract all text from this image. Preserve formatting where possible.'), types.Part(inline_data=types.Blob(mime_type='image/png', data=image_data)) ]) ] ) print(response.text) " ``` ### Compare Two Images ```bash python -c " from google import genai from google.genai import types import base64 client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY")) with open('image1.png', 'rb') as f: img1_data = base64.b64encode(f.read()).decode('utf-8') with open('image2.png', 'rb') as f: img2_data = base64.b64encode(f.read()).decode('utf-8') response = client.models.generate_content( model='gemini-3-pro-preview', contents=[ types.Content(parts=[ types.Part(text='Compare these two images. What are the key differences?'), types.Part(inline_data=types.Blob(mime_type='image/png', data=img1_data)), types.Part(inline_data=types.Blob(mime_type='image/png', data=img2_data)) ]) ] ) print(response.text) " ``` ## Image Generation Parameters When generating images, you can customize: ```python config=types.GenerateContentConfig( response_modalities=['IMAGE', 'TEXT'], # Request both image and description temperature=1.0, # Higher = more creative # Additional parameters may be model-specific ) ``` ## Supported Image Formats **Input (for understanding)**: - PNG (`image/png`) - JPEG (`image/jpeg`) - GIF (`image/gif`) - WebP (`image/webp`) **Output (from generation)**: - PNG (default, `image/png`) - The API returns raw bytes in `part.inline_data.data` (NOT base64 encoded) - Check `part.inline_data.mime_type` to determine the actual format returned ## Error Handling Common errors and solutions: - **Image too large**: Resize image before sending (max varies by model) - **Unsupported format**: Convert to PNG/JPEG - **Generation blocked**: Adjust prompt to comply with safety guidelines - **Rate limiting**: Implement retry with exponential backoff ## Notes - Image generation requires `response_modalities=['IMAGE', 'TEXT']` in config - For best results with generation, be specific and descriptive in prompts - Image understanding works with both local files and URLs - Multiple images can be sent in a single request for comparison - Gemini 3 Pro Image is NOT available via CLI - must use Python SDK ## Tools to Use - **Bash**: Execute Python commands - **Read**: Load image files (binary mode) - **Write**: Save generated images - **Glob**: Find image files in directories