--- name: image-generation description: Generate or edit images from text prompts. Use when the user asks to create, draw, design, or edit an image, illustration, photo, icon, poster, or any visual content. metadata: cowagent: requires: anyEnv: - OPENAI_API_KEY - GEMINI_API_KEY - ARK_API_KEY - DASHSCOPE_API_KEY - MINIMAX_API_KEY - LINKAI_API_KEY --- # Image Generation Generate and edit images using AI models. The script automatically picks a backend based on which API keys are configured — **you don't need to specify a model unless the user explicitly names one**. Supported models (passed via `model` only when the user asks for a specific one): - **OpenAI** — `gpt-image-2`, `gpt-image-1` - **Gemini Nano Banana** — `nano-banana-2`, `nano-banana-pro`, `nano-banana` - **Seedream (Volcengine Ark)** — `seedream-5.0-lite`, `seedream-4.5` - **Qwen (DashScope)** — `qwen-image-2.0`, `qwen-image-2.0-pro` - **MiniMax** — `image-01` ## Usage Run `scripts/generate.py` with a JSON argument. The path is relative to this skill's `base_dir`. ```bash python /scripts/generate.py '' ``` **Set bash timeout to at least 600 seconds**, as image generation can take 30–200s per provider, and the script may try multiple providers sequentially. ### Parameters | Parameter | Type | Required | Default | Description | |-----------|------|----------|---------|-------------| | `prompt` | string | yes | — | Image description | | `image_url` | string / list | no | null | Input image(s) for editing: local file path or URL. Multi-image fusion is supported (pass a list) | | `quality` | string | no | auto | `low` / `medium` / `high` (only some backends honour this) | | `size` | string | no | auto | `512` / `1K` / `2K` / `3K` / `4K`, or pixel value (`1024x1024`) | | `aspect_ratio` | string | no | null | `1:1` / `3:2` / `2:3` / `16:9` / `9:16` / `21:9` (some backends also support extreme ratios like `1:4` / `8:1`) | **Higher `quality` and larger `size` cost more and run slower.** In normal cases, when the user does not explicitly specify, `low` or `medium` is sufficient. Only use `high` when the user asks for it. ### Example — generate ```bash python /scripts/generate.py '{"prompt": "A corgi astronaut floating in space"}' ``` With aspect ratio: ```bash python /scripts/generate.py '{"prompt": "Isometric miniature city of Shanghai at sunset", "size": "2K", "aspect_ratio": "16:9"}' ``` ### Important: Editing vs Generating When the user asks to **edit, modify, or improve an existing image**, pass the original image via `image_url`. Prefer **local file paths** directly — the script handles file reading internally. Without `image_url`, the script generates a brand-new image instead of editing. ### Example — edit (image-to-image) ```bash python /scripts/generate.py '{"prompt": "Add a Santa hat to the dog", "image_url": "/path/to/dog.png"}' ``` Multi-image fusion — pass a list: ```bash python /scripts/generate.py '{"prompt": "Combine these characters into a group photo", "image_url": ["/path/a.png", "/path/b.png"]}' ``` ### Output Prints JSON to stdout: ```json { "model": "doubao-seedream-5-0-260128", "images": [ {"url": "/path/to/output.png"} ] } ``` After success, display the image to the user. You can either embed it in markdown (`![description](/path/to/output.png)`) or use the `send` tool. On error: ```json { "error": "error message" } ``` ### Setup The script needs **at least one** of these API keys (set via `env_config` or `config.json`): `OPENAI_API_KEY` / `GEMINI_API_KEY` / `ARK_API_KEY` / `DASHSCOPE_API_KEY` / `MINIMAX_API_KEY` / `LINKAI_API_KEY` Each also has an optional `*_API_BASE` for custom endpoints. The script automatically picks the first configured backend and falls back to the next if it fails — no need to specify a model. ### Error Handling If the script returns an error after trying all configured backends, **do NOT retry with the same parameters** — the failure is almost always a configuration issue (wrong API key, unsupported API base). Tell the user to fix it via `env_config`, then retry. ### Notes - HTTP timeout is 300s — high-resolution generation can take over 200s. - Omit `quality` / `size` to let the model pick automatically (`auto`). - Input images for editing are auto-compressed to ≤ 4MB / longest edge ≤ 4096px.