--- name: image-prompt-generator description: Generate AI images using Gemini image generation API. Use this skill when content needs images - thumbnails, social posts, blog headers, or creative visuals. Follows an iterative workflow - brainstorm concepts, select direction, generate in multiple styles, then produce via API. --- # Image Prompt Generator Generate professional, non-generic images using Google's Gemini API for image generation. ## Prerequisites & Setup ### Getting Your Gemini API Key 1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey) 2. Sign in with your Google account 3. Click "Create API Key" 4. Copy the generated key ### Configuring the API Key **Option 1: Environment file (recommended)** Create a `.env` file in your project root: ```bash GEMINI_API_KEY=your_api_key_here ``` **Option 2: Direct environment variable** ```bash export GEMINI_API_KEY=your_api_key_here ``` ### Install Dependencies ```bash pip install google-generativeai python-dotenv pillow ``` ### Available Models | Model | API Name | Best For | |-------|----------|----------| | **Flash** | `gemini-2.5-flash-image` | Speed, drafts, iteration | | **Pro** | `gemini-3-pro-image-preview` | **Final assets, 16:9 aspect ratio, quality** | ### Image Size (CRITICAL for quality) | Size | Output | Use For | |------|--------|---------| | 1K | ~400KB | Drafts, iteration | | **2K** | ~2MB | **Thumbnails, web assets (default)** | | 4K | ~7MB | Print, high-resolution needs | **CRITICAL**: Always specify `--size 2K` or `--size 4K` for final assets. Without this, output defaults to 1K which looks low-quality. ### Skill Stack Thumbnails - MANDATORY SETTINGS For ANY Skill Stack thumbnail, you MUST use: ```bash python scripts/images/generate_image.py "YOUR PROMPT" \ --model pro \ --size 2K \ --aspect 16:9 \ --output public/images/thumbnails \ --name your-slug ``` And your prompt MUST include risograph style instructions: ``` STYLE: Risograph / screen print aesthetic. Visible halftone dots throughout. Slight misregistration between color layers. Indie printmaking quality - warm, tactile, handmade. COLOR: Warm cream base. Charcoal and gray tones. Terracotta (#D4654A) accent on ONE key element. TEXTURE: Visible halftone pattern. Paper grain. NOT smooth digital gradients. AVOID: Empty white borders, photorealism, glossy renders, smooth gradients, clip art aesthetic. ``` **WHY**: Without `--size 2K`, output is ~400KB garbage that looks like it was made with Flash. Without risograph style, output looks generic and AI-generated. --- ## Workflow Overview 1. **Brainstorm Concepts** - Generate 4-6 high-level visual ideas 2. **Select Direction** - User picks the concept they like 3. **Optimize Prompt** - Refine into a strong, detailed prompt 4. **Style Variations** - Adapt to 2-3 different visual styles 5. **Generate Images** - Run via Gemini API ## Step 1: Brainstorm Concepts When the user provides a topic or use case, generate **3 high-level visual concepts** and indicate your pick. Each concept should be: - **One sentence** describing the visual idea - **Concrete and immediate** - you can picture it instantly - **Conceptual but not abstract** - a clear object/scene with meaning - **Non-generic** - avoid cliches (no lightbulbs for ideas, no handshakes for partnership) **Format:** ``` **Concept A:** One sentence description of the visual concept and why it works. **Concept B:** One sentence description... **Concept C:** One sentence description... **My pick: Concept X** — Brief reasoning why this one is strongest. ``` **Example for "newsletter about personal productivity":** ``` **Concept A:** A vintage compass where the needle points toward a coffee ring stain on a map—direction emerges from daily rituals. **Concept B:** A clock where the 12 hours show seasonal changes—time management over long arcs, not just hours. **Concept C:** One small key attached to dozens of decorative keychains—we overcomplicate simple solutions. **My pick: Concept A** — The coffee stain is unexpected and personal, the compass is concrete without being cliché. ``` Wait for user to confirm or redirect before proceeding. ## Step 2: Optimize the Prompt Once the user selects a concept, develop it into a full prompt. Structure: ``` Create a [style type] illustration of [subject]. CONCEPT: [Expand the one-sentence idea into a clear visual description] STYLE: [Artistic approach - load from references/styles/ if brand-specific] COMPOSITION: [Framing, focal point, negative space, balance] COLORS: [Palette - describe by name, not hex codes which may render as text] TEXTURE: [Surface qualities, analog/digital feel] AVOID: [What should NOT appear - be specific] FORMAT: [Aspect ratio] ``` **Key principles:** - Natural language, full sentences - no tag soup - Describe colors by name (burnt orange, sky blue, near-black) not hex codes - Maximum 2-3 elements - if it feels busy, remove something - Favor metaphor over literal depiction ## Step 3: Style Variations **MANDATORY for Skill Stack: Risograph style.** Every thumbnail must include risograph style instructions in the prompt. See `references/styles/risograph.md`. Key risograph elements to ALWAYS include in prompt: - "Visible halftone dots throughout" - "Slight misregistration between color layers" - "Indie printmaking quality - warm, tactile, handmade" - "NOT smooth digital gradients" Available styles in `references/styles/` (for non-Skill-Stack projects): - **risograph.md** - DEFAULT. Halftone dots, misregistration, indie printmaking aesthetic. - **minimalist-ink.md** - High-contrast black and white, crosshatching. - **watercolor-line.md** - Ink linework with watercolor washes. - **editorial-conceptual.md** - Conceptual, sophisticated, editorial wit. ## Step 4: Generate via API ### Running the Script ```bash # Generate 2K thumbnail (recommended for web) python scripts/images/generate_image.py "prompt here" --model pro --aspect 16:9 --size 2K # Generate 4K for print/high-res python scripts/images/generate_image.py "prompt here" --model pro --size 4K # Save to specific folder python scripts/images/generate_image.py "prompt" --output "./images" --name "my_image" --size 2K ``` **Options:** - `--model pro` (default, higher quality) or `--model flash` (faster) - `--size 2K` (default), `1K`, or `4K` - **always use 2K or 4K for final assets** - `--aspect 16:9` (default), `1:1`, `9:16`, `3:4`, `4:3` - `--variations N` - generate N versions - `--output ./path` - save location - `--name prefix` - filename prefix **Output location:** Save images alongside the content they belong to - not a generic images dump. ## Step 5: Iterate After user reviews generated images: - **80% good?** Request specific edits conversationally rather than regenerating - **Composition off?** Adjust framing or element placement in prompt - **Wrong style?** Try a different style reference - **Too busy?** Simplify to fewer elements - **Colors wrong?** Be more explicit about palette ## Prompting Principles ### Write Like a Creative Director Brief the model like a human artist. Use proper grammar, full sentences, and descriptive adjectives. | Don't | Do | |-------|-----| | "Cool car, neon, city, night, 8k" | "A cinematic wide shot of a futuristic sports car speeding through a rainy Tokyo street at night. The neon signs reflect off the wet pavement and the car's metallic chassis." | **Be specific about:** - **Subject:** Instead of "a woman," say "a sophisticated elderly woman wearing a vintage chanel-style suit" - **Materiality:** Describe textures - "matte finish," "brushed steel," "soft velvet," "crumpled paper" - **Setting:** Define location, time of day, weather - **Lighting:** Specify mood and light source - **Mood:** Emotional tone of the image ### Provide Context Context helps the model make logical artistic decisions. Include the "why" or "for whom." **Example:** "Create an image of a sandwich for a Brazilian high-end gourmet cookbook." *(Model infers: professional plating, shallow depth of field, perfect lighting)* ### Keep It Simple - One clear focal point - Maximum 2-3 elements total - Generous negative space - If it feels busy, remove something ### Avoid the Generic **Hard rules:** - **NEVER include text in images** - Text renders poorly and looks amateur - No lightbulbs for "ideas" - No handshakes for "partnership" - No audio waveforms for "podcasts" or "audio" - No gears/cogs for "systems" or "process" - No happy stock photo poses - No glossy AI aesthetic - No puzzle pieces for "connection" or "integration" ### Metaphors That Work Strong concepts from past generations: - **Orange cut open revealing neural network** - organic exterior, systematic interior (knowledge systems, RAG) - **Mail slot with origami bird emerging** - delivery transforms into something alive (newsletters) - **Exercise club crossed with quill pen** - physical culture meets intellectual work (body + mind) - **Typewriter key extreme close-up with radiating impact** - decisive action, command (books, authority) - **Medicine dropper with mandala bloom** - clinical precision meets transcendence (therapy, transformation) - **Straight razor on leather strop** - precision tool, polishing, refinement (editing, production) - **Prism dispersing light into distinct beams** - one input, multiple outputs (frameworks) - **Compass with coffee stain on map** - direction from daily rituals (productivity) ## Resources ### references/styles/ Aesthetic style definitions: - `risograph.md` - **DEFAULT** - Halftone, misregistration, indie printmaking - `minimalist-ink.md` - Black and white ink illustration - `watercolor-line.md` - Ink with watercolor washes - `editorial-conceptual.md` - Conceptual editorial style ### scripts/ - `generate_image.py` - Gemini API image generation ## Prompt Modifiers Reference | Category | Examples | |----------|----------| | **Lighting** | golden hour, dramatic shadows, soft diffused light, neon glow, overcast | | **Style** | cinematic, editorial, technical diagram, hand-drawn, photorealistic | | **Texture** | matte finish, brushed steel, soft velvet, crumpled paper, weathered wood | | **Composition** | wide shot, close-up, bird's eye view, dutch angle, symmetrical | | **Mood** | energetic, serene, dramatic, playful, sophisticated | | **Quality** | 4K, high-fidelity, pixel-perfect, professional grade | ## Advanced Capabilities ### Text Rendering & Infographics Put exact text in quotes. Specify style: "polished editorial," "technical diagram," or "hand-drawn whiteboard." **Example prompts:** ``` Earnings Report Infographic: "Generate a clean, modern infographic summarizing the key financial highlights from this earnings report. Include charts for 'Revenue Growth' and 'Net Income', and highlight the CEO's key quote in a stylized pull-quote box." ``` ``` Whiteboard Summary: "Summarize the concept of 'Transformer Neural Network Architecture' as a hand-drawn whiteboard diagram suitable for a university lecture. Use different colored markers for the Encoder and Decoder blocks, and include legible labels for 'Self-Attention' and 'Feed Forward'." ``` ### Character Consistency & Thumbnails Use reference images and state "Keep the person's facial features exactly the same as Image 1." Describe expression/action changes while maintaining identity. **Example prompt:** ``` Viral Thumbnail: "Design a viral video thumbnail using the person from Image 1. Face Consistency: Keep the person's facial features exactly the same as Image 1, but change their expression to look excited and surprised. Action: Pose the person on the left side, pointing their finger towards the right side of the frame. Subject: On the right side, place a high-quality image of a delicious avocado toast. Graphics: Add a bold yellow arrow connecting the person's finger to the toast. Text: Overlay massive, pop-style text in the middle: 'Done in 3 mins!'. Use a thick white outline and drop shadow. Background: A blurred, bright kitchen background. High saturation and contrast." ``` ### Image Reworking (Edit Existing Images) The `--input` flag enables "rework mode" - pass an existing image to Gemini and describe the changes you want. **Key use cases:** - **Small tweaks** - Adjust colors, add/remove elements, change lighting - **Style transfer** - Keep composition but change artistic style - **Object manipulation** - Remove, add, or modify specific objects - **Seasonal/temporal changes** - Same scene, different time/season **Running in rework mode:** ```bash # Basic edit - add something python scripts/generate_image.py "Add snow to the roof and yard" \ --input ./house.png \ --model pro # Color adjustment python scripts/generate_image.py "Change the accent color from red to teal, keep everything else identical" \ --input ./thumbnail.png \ --model pro # Style transfer - keep composition, change aesthetic python scripts/generate_image.py "Convert this to risograph style with halftone dots and slight color misregistration" \ --input ./photo.png \ --model pro # Generate variations of an edit python scripts/generate_image.py "Make the lighting warmer, like golden hour" \ --input ./portrait.png \ --variations 3 \ --model pro ``` **Prompting tips for rework mode:** 1. **Be specific about what to preserve:** - "Keep the person's facial features exactly the same" - "Maintain the composition and framing" - "Don't change the background" 2. **Be explicit about what to change:** - "Change ONLY the color of the shirt from blue to red" - "Add snow to the roof and nothing else" - "Remove the text overlay" 3. **Use comparative language:** - "Make the colors more vibrant" - "Increase the contrast slightly" - "Make the lighting softer and more diffused" **Output naming:** Files from rework mode are named `{prefix}_{timestamp}_edit_{model}.png` to distinguish from generated images (`_gen_`). ### Advanced Editing Examples **Object Removal:** ```bash python scripts/generate_image.py \ "Remove the tourists from the background and fill with matching cobblestones and storefronts" \ --input ./street-photo.png \ --model pro ``` **Seasonal Control:** ```bash python scripts/generate_image.py \ "Turn this into winter. Add snow to the roof and yard. Change lighting to cold, overcast afternoon. Keep architecture identical." \ --input ./house-summer.png \ --model pro ``` **Character Consistency (thumbnail series):** ```bash python scripts/generate_image.py \ "Keep the person's face exactly the same. Change expression to surprised. Add a pointing gesture toward the right side of the frame." \ --input ./person-reference.png \ --model pro ``` --- ## Related Skills - **youtube-title-creator** - Pair generated images with optimized titles - **social-content-creation** - Use images in platform-optimized posts --- *For custom brand styles, create new style files in references/styles/ following the existing format*