{ "cells": [ { "cell_type": "markdown", "id": "2a89ade7", "metadata": {}, "source": [ "# GPT Image Generation Models Prompting Guide" ] }, { "cell_type": "markdown", "id": "093e2be4", "metadata": {}, "source": [ "## 1. Introduction" ] }, { "cell_type": "markdown", "id": "f8e7cdd4", "metadata": {}, "source": [ "OpenAI's gpt-image generation models are designed for production-quality visuals and highly controllable creative workflows. They are well-suited for both professional design tasks and iterative content creation, and support both high-quality rendering and lower-latency use cases depending on the workflow.\n", "\n", "Key Capabilities include: \n", "\n", "- **High-fidelity photorealism** with natural lighting, accurate materials, and rich color rendering\n", "- **Flexible quality–latency tradeoffs**, allowing faster generation at lower settings while still exceeding the visual quality of prior-generation image models\n", "- **Robust facial and identity preservation** for edits, character consistency, and multi-step workflows\n", "- **Reliable text rendering** with crisp lettering, consistent layout, and strong contrast inside images\n", "- **Complex structured visuals**, including infographics, diagrams, and multi-panel compositions\n", "- **Precise style control and style transfer** with minimal prompting, supporting everything from branded design systems to fine-art styles\n", "- **Strong real-world knowledge and reasoning**, enabling accurate depictions of objects, environments, and scenarios\n", "\n", "This guide highlights prompting patterns, best practices, and example prompts drawn from real production use cases for `gpt-image-2`. It is our most capable image model, with stronger image quality, improved editing performance, and broader support for production workflows. The `low` quality setting is especially strong for latency-sensitive use cases, while `medium` and `high` remain good fits when maximum fidelity matters. 
\n" ] }, { "cell_type": "markdown", "id": "d312c387", "metadata": {}, "source": [ "## 1.1 OpenAI Image Model Parameters\n", "\n", "This section is a reference for the image models covered in this guide, focused on:\n", "\n", "- model name\n", "- supported `quality` values\n", "- supported `input_fidelity` values\n", "- supported `size` / resolution behavior\n", "- recommended use cases by workflow\n", "\n", "### Model summary\n", "\n", "As of April 21, 2026, OpenAI has the following image models available.\n", "\n", "| Model | `quality` | `input_fidelity` | Resolutions | Recommended use |\n", "| --- | --- | --- | --- | --- |\n", "| `gpt-image-2` | `low`, `medium`, `high` | Not supported; output is high fidelity by default | Any resolution that satisfies the constraints below | Recommended default for new builds. Use for highest-quality generation and editing, text-heavy images, photorealism, compositing, identity-sensitive edits, and workflows where fewer retries matter more than the lowest possible cost. |\n", "| `gpt-image-1.5` | `low`, `medium`, `high` | `low`, `high` | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | Keep for existing validated workflows during migration. For new work, prefer `gpt-image-2`, especially when quality, editing reliability, or flexible sizing matter. |\n", "| `gpt-image-1` | `low`, `medium`, `high` | `low`, `high` | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | Legacy compatibility only. If you are starting a new workflow or refreshing prompts, move to `gpt-image-2`; keep `gpt-image-1` only when you need short-term stability while validating the upgrade.
|\n", "| `gpt-image-1-mini` | `low`, `medium`, `high` | `low`, `high` | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | Use when cost and throughput are the main constraint: large batch variant generation, rapid ideation, previews, lightweight personalization, and draft assets that do not require the strongest generation or editing performance. |\n", "\n", "### `gpt-image-2` size options\n", "\n", "`gpt-image-2` supports any resolution passed in the `size` parameter as long as all of these constraints are met:\n", "\n", "- Maximum edge length must be less than `3840px`\n", "- Both edges must be a multiple of `16`\n", "- Ratio between the long edge and short edge must not be greater than `3:1`\n", "- Total pixels must not exceed `8,294,400`\n", "- Total pixels must not be less than `655,360`\n", "\n", "If the output image exceeds `2560x1440` pixels (`3,686,400` total pixels), commonly referred to as 2K, treat it as experimental because results can be more variable above this size.\n", "\n", "### Popular `gpt-image-2` sizes\n", "\n", "These are useful reference points that fit the constraints above:\n", "\n", "| Label | Resolution | Notes |\n", "| --- | --- | --- |\n", "| HD portrait | `1024x1536` | Standard portrait option |\n", "| HD landscape | `1536x1024` | Standard landscape option |\n", "| Square | `1024x1024` | Good general-purpose default |\n", "| 2K / QHD | `2560x1440` | Popular widescreen format and recommended upper reliability boundary for `gpt-image-2` |\n", "| 4K / UHD | `3840x2160` | Experimental upper-end target. If the max-edge rule is enforced literally as `< 3840`, round down to the nearest valid size such as `3824x2144` |\n", "\n", "### When to use which model\n", "\n", "- Choose `gpt-image-2` as the default for most production workflows. 
It is the strongest overall model and the right upgrade target for teams currently using `gpt-image-1.5` or `gpt-image-1` for high-quality outputs.\n", "- Choose `gpt-image-2` with `quality=\"low\"` when speed and unit economics dominate the decision. This setting delivers good quality for many use cases and is a strong fit for high-volume generation and experimentation. You can also try `gpt-image-1-mini` for these use cases, but in our testing `quality=\"low\"` works just as well.\n", "- Keep `gpt-image-1.5` or `gpt-image-1` only for backward compatibility while you validate prompt migrations, regression-test outputs, or maintain older workflows that are not yet ready to move.\n", "\n", "### Recommended upgrade path from `gpt-image-1.5` and `gpt-image-1`\n", "\n", "For workflows currently using `gpt-image-1.5` or `gpt-image-1`, the recommendation is:\n", "\n", "- Upgrade to `gpt-image-2` for customer-facing assets, photorealistic generation, editing-heavy flows, brand-sensitive creative, text-in-image work, and any workflow where better first-pass quality reduces manual review or reruns.\n", "- Consider `gpt-image-1-mini` instead of legacy models only when the main goal is lowering cost for large batches of exploratory or lower-stakes images.\n", "- During migration, keep prompts largely the same at first, then retune only after you have compared output quality, latency, and retry rates on your real workload." ] }, { "cell_type": "markdown", "id": "90b8bc80", "metadata": {}, "source": [ "## 2. Prompting Fundamentals\n", "\n", "The following prompting fundamentals are applicable to GPT image generation models.
They are based on patterns that showed up repeatedly in alpha testing across generation, edits, infographics, ads, human images, UI mockups, and compositing workflows.\n", "\n", "* **Structure + goal:** Write prompts in a consistent order (background/scene → subject → key details → constraints) and include the intended use (ad, UI mock, infographic) to set the “mode” and level of polish. For complex requests, use short labeled segments or line breaks instead of one long paragraph.\n", "\n", "* **Prompt format:** Use the format that is easiest to maintain. Minimal prompts, descriptive paragraphs, JSON-like structures, instruction-style prompts, and tag-based prompts can all work well as long as the intent and constraints are clear. For production systems, prioritize a skimmable template over clever prompt syntax.\n", "\n", "* **Specificity + quality cues:** Be concrete about materials, shapes, textures, and the visual medium (photo, watercolor, 3D render), and add targeted “quality levers” only when needed (e.g., *film grain*, *textured brushstrokes*, *macro detail*). For photorealism, include the word “photorealistic” directly in the prompt to strongly engage the model’s photorealistic mode. Similar phrases like “real photograph,” “taken on a real camera,” “professional photography,” or “iPhone photo” can also help, but detailed camera specs may be interpreted loosely, so use them mainly for high-level look and composition rather than exact physical simulation.\n", "\n", "* **Latency vs fidelity:** For latency-sensitive or high-volume use cases, start with `quality=\"low\"` and evaluate whether it meets your visual requirements. In many cases, it provides sufficient fidelity with significantly faster generation. 
For small or dense text, detailed infographics, close-up portraits, identity-sensitive edits, and high-resolution outputs, compare `medium` or `high` before shipping.\n", "\n", "* **Composition:** Specify framing and viewpoint (close-up, wide, top-down), perspective/angle (eye-level, low-angle), and lighting/mood (soft diffuse, golden hour, high-contrast) to control the shot. If layout matters, call out placement (e.g., “logo top-right,” “subject centered with negative space on left”). For wide, cinematic, low-light, rain, or neon scenes, add extra detail about scale, atmosphere, and color so the model does not trade mood for surface realism.\n", "\n", "* **People, pose, and action:** For people in scenes, describe scale, body framing, gaze, and object interactions. Examples: “full body visible, feet included,” “child-sized relative to the table,” “looking down at the open book, not at the camera,” or “hands naturally gripping the handlebars.” These details help with body proportion, action geometry, and gaze alignment.\n", "\n", "* **Constraints (what to change vs preserve):** State exclusions and invariants explicitly (e.g., “no watermark,” “no extra text,” “no logos/trademarks,” “preserve identity/geometry/layout/brand elements”). For edits, use “change only X” + “keep everything else the same,” and repeat the preserve list on each iteration to reduce drift. If the edit should be surgical, also say not to alter saturation, contrast, layout, arrows, labels, camera angle, or surrounding objects.\n", "\n", "* **Text in images:** Put literal text in **quotes** or **ALL CAPS** and specify typography details (font style, size, color, placement) as constraints. For tricky words (brand names, uncommon spellings), spell them out letter-by-letter to improve character accuracy. 
Use `medium` or `high` quality for small text, dense information panels, and multi-font layouts.\n", "\n", "* **Multi-image inputs:** Reference each input by **index and description** (“Image 1: product photo… Image 2: style reference…”) and describe how they interact (“apply Image 2’s style to Image 1”). When compositing, be explicit about which elements move where (“put the bird from Image 1 on the elephant in Image 2”).\n", "\n", "* **Iterate instead of overloading:** Long prompts can work well, but debugging is easier when you start with a clean base prompt and refine with small, single-change follow-ups (“make lighting warmer,” “remove the extra tree,” “restore the original background”). Use references like “same style as before” or “the subject” to leverage context, but re-specify critical details if they start to drift." ] }, { "cell_type": "markdown", "id": "45bdaee0", "metadata": {}, "source": [ "## 3. Setup\n", "\n", "Run this once. It:\n", "- creates the API client\n", "- creates `input_images/` and `output_images/` in the images folder\n", "- adds a small helper to save base64 images\n", "\n", "Put any reference images used for edits into `input_images/` (or update the paths in the examples)."
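, "\n", "Once the setup cell has run, a quick smoke test exercises the client and the `save_image` helper together. The prompt below is only an illustrative placeholder following the background → subject → details → constraints order from section 2:\n", "\n", "```python\n", "# Smoke test: one fast, low-quality square generation through the shared client.\n", "result = client.images.generate(\n", "    model=\"gpt-image-2\",\n", "    prompt=(\n", "        \"Sunlit wooden desk by a window. \"\n", "        \"Subject: a ceramic mug of coffee, centered. \"\n", "        \"Details: soft morning light, shallow depth of field, photorealistic. \"\n", "        \"Constraints: no text, no watermark.\"\n", "    ),\n", "    size=\"1024x1024\",\n", "    quality=\"low\",\n", ")\n", "save_image(result, \"smoke_test.png\")\n", "```\n"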
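, "\n", "Before calling the API with a custom resolution, it can help to validate a candidate `size` against the `gpt-image-2` constraints from section 1.1 so that invalid requests fail fast locally. This is a small sketch based on those documented rules; the function name is ours, not part of the SDK:\n", "\n", "```python\n", "def is_valid_gpt_image_2_size(width: int, height: int) -> bool:\n", "    # Encodes the documented gpt-image-2 size constraints from section 1.1.\n", "    long_edge, short_edge = max(width, height), min(width, height)\n", "    return (\n", "        long_edge < 3840                                # max edge under 3840px\n", "        and width % 16 == 0 and height % 16 == 0        # edges must be multiples of 16\n", "        and long_edge <= 3 * short_edge                 # aspect ratio at most 3:1\n", "        and 655_360 <= width * height <= 8_294_400      # total pixel bounds\n", "    )\n", "\n", "assert is_valid_gpt_image_2_size(2560, 1440)      # 2K is within bounds\n", "assert not is_valid_gpt_image_2_size(3840, 2160)  # fails the strict < 3840 edge rule\n", "```\n"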
] }, { "cell_type": "code", "execution_count": null, "id": "faa04870", "metadata": {}, "outputs": [], "source": [ "import os\n", "import base64\n", "from openai import OpenAI\n", "\n", "client = OpenAI()\n", "\n", "os.makedirs(\"../../images/input_images\", exist_ok=True)\n", "os.makedirs(\"../../images/output_images\", exist_ok=True)\n", "\n", "def save_image(result, filename: str) -> None:\n", " \"\"\"\n", " Saves the first returned image to the given filename inside the output_images folder.\n", " \"\"\"\n", " image_base64 = result.data[0].b64_json\n", " out_path = os.path.join(\"../../images/output_images\", filename)\n", " with open(out_path, \"wb\") as f:\n", " f.write(base64.b64decode(image_base64))\n", "\n", "from IPython.display import HTML, Image, display\n", "\n", "def display_image_grid(items, width=240):\n", " cards = []\n", " for item in items:\n", " title = item.get(\"title\", \"\")\n", " label = f'