---
name: gemini-image
description: Analyze images using Gemini's vision capabilities. Use for image analysis, text extraction from screenshots, and visual content understanding.
---

# Gemini Image Analysis

Analyze images using Gemini Pro's vision capabilities.

## Prerequisites

```bash
pip install google-generativeai
export GEMINI_API_KEY=your_api_key
```

## CLI Reference

### Basic Image Analysis

```bash
# Analyze an image
gemini -m pro -f /path/to/image.png "Describe this image in detail"

# With specific question
gemini -m pro -f screenshot.png "What error message is shown?"

# Multiple images
gemini -m pro -f image1.png -f image2.png "Compare these two images"
```

## Analysis Operations

### General Description

```bash
gemini -m pro -f image.png "Describe this image comprehensively:
1. Main subject/content
2. Colors and composition
3. Text visible (if any)
4. Context and purpose
5. Notable details"
```

### Extract Text (OCR)

```bash
gemini -m pro -f screenshot.png "Extract all text from this image.
Format as plain text, preserving layout where possible.
Include any text in buttons, labels, or UI elements."
```

### Code from Screenshot

```bash
gemini -m pro -f code-screenshot.png "Extract the code from this screenshot.
Provide as properly formatted code with correct indentation.
Note any parts that are unclear or partially visible."
```

### UI Analysis

```bash
gemini -m pro -f ui-screenshot.png "Analyze this UI:
1. What application/website is this?
2. What page/screen is shown?
3. Main UI elements and their purpose
4. User flow/actions available
5. Any UX issues or suggestions"
```

### Error Analysis

```bash
gemini -m pro -f error-screenshot.png "Analyze this error:
1. What error is shown?
2. What is the likely cause?
3. How to fix it?
4. Any related information visible?"
```

### Diagram Understanding

```bash
gemini -m pro -f diagram.png "Explain this diagram:
1. What type of diagram is this?
2. Main components and their relationships
3. Data/process flow
4. Key takeaways"
```

## Specific Use Cases

### Debug Screenshot

```bash
gemini -m pro -f debug-screen.png "I'm debugging an issue. From this screenshot:
1. What is the current state?
2. What errors or warnings are visible?
3. What should I look at?
4. Suggested next steps"
```

### Compare Before/After

```bash
gemini -m pro -f before.png -f after.png "Compare these before and after images:
1. What changed?
2. Is this an improvement?
3. Any issues in the 'after' version?
4. Anything missing?"
```

### Design Feedback

```bash
gemini -m pro -f design.png "Provide design feedback:
1. Visual hierarchy
2. Color usage
3. Typography
4. Spacing and alignment
5. Accessibility concerns
6. Suggestions for improvement"
```

### Data Extraction

```bash
gemini -m pro -f chart.png "Extract data from this chart:
1. Chart type
2. Data series and values
3. Axes labels and ranges
4. Key trends or insights
5. Output as structured data if possible"
```

### Form Analysis

```bash
gemini -m pro -f form.png "Analyze this form:
1. Form purpose
2. Fields and their types
3. Required vs optional
4. Validation rules visible
5. UX suggestions"
```

## Workflow Patterns

### Screenshot to Issue

```bash
# Capture screenshot (macOS)
screencapture -i /tmp/bug.png

# Analyze and format as issue
gemini -m pro -f /tmp/bug.png "Create a bug report from this screenshot:

## Summary
[One-line description]

## Steps to Reproduce
[Inferred from screenshot]

## Expected Behavior
[What should happen]

## Actual Behavior
[What the screenshot shows]

## Environment
[Any visible system info]"
```

### UI to Code

```bash
gemini -m pro -f ui-design.png "Generate React component code that recreates this UI:
- Use Tailwind CSS for styling
- Make it responsive
- Include proper TypeScript types
- Add appropriate accessibility attributes"
```

### Documentation

```bash
gemini -m pro -f app-screen.png "Write user documentation for this screen:
- What this screen is for
- How to use each feature
- Common tasks
- Tips and notes"
```

## Image Types Supported

- PNG, JPEG, GIF, WebP
- Screenshots
- Photos
- Diagrams and charts
- UI mockups
- Code snippets
- Documents

## Best Practices

1. **Use clear images** - Higher quality = better analysis
2. **Crop to relevant area** - Remove unnecessary context
3. **Ask specific questions** - Vague prompts get vague answers
4. **Provide context** - Tell Gemini what you're looking for
5. **Verify extracted text** - OCR isn't perfect
6. **Multiple angles** - Use multiple images for complex subjects