---
name: baoyu-image-gen
description: AI image generation with OpenAI, Google, DashScope, Replicate and APIMart APIs. Supports text-to-image, reference images, aspect ratios. Sequential by default; parallel generation available on request. Use when user asks to generate, create, or draw images.
---

# Image Generation (AI SDK)

Official API-based image generation. Supports OpenAI, Google, DashScope (Alibaba Tongyi Wanxiang), Replicate and APIMart providers.

## Script Directory

**Agent Execution**:
1. `SKILL_DIR` = this SKILL.md file's directory
2. Script path = `${SKILL_DIR}/scripts/main.ts`

## Preferences (EXTEND.md)

Use Bash to check whether EXTEND.md exists (priority order):

```bash
# Check project-level first
test -f .baoyu-skills/baoyu-image-gen/EXTEND.md && echo "project"

# Then user-level (cross-platform: $HOME works on macOS/Linux/WSL)
test -f "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "user"
```

| Path | Location |
|------|----------|
| `.baoyu-skills/baoyu-image-gen/EXTEND.md` | Project directory |
| `$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md` | User home |

| Result | Action |
|--------|--------|
| Found | Read, parse, apply settings |
| Not found | Use defaults |

**EXTEND.md Supports**: Default provider | Default quality | Default aspect ratio | Default image size | Default models

Schema: `references/config/preferences-schema.md`

## Usage

```bash
# Basic
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9

# High quality
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k

# From prompt files
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference images (Google multimodal or OpenAI edits)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

# With reference images (explicit provider/model)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png

# Specific provider
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider openai

# DashScope (Alibaba Tongyi Wanxiang)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "一只可爱的猫" --image out.png --provider dashscope

# Replicate (google/nano-banana-pro)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Replicate with specific model
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana

# APIMart (Gemini-3-Pro-Image-preview, async task polling)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat in watercolor style" --image out.png --provider apimart

# APIMart with reference image
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make this into anime style" --image out.png --provider apimart --ref source.png

# APIMart with doubao-seedream-5-0-lite model
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cinematic mountain sunrise" --image out.png --provider apimart --model doubao-seedream-5-0-lite
```

## Options

| Option | Description |
|--------|-------------|
| `--prompt`, `-p` | Prompt text |
| `--promptfiles` | Read prompt from files (concatenated) |
| `--image` | Output image path (required) |
| `--provider google\|openai\|dashscope\|replicate\|apimart` | Force provider (default: auto-detect, usually google if available) |
| `--model`, `-m` | Model ID (`--ref` with OpenAI requires a GPT Image model, e.g. `gpt-image-1.5`; APIMart example: `doubao-seedream-5-0-lite`) |
| `--ar` | Aspect ratio (e.g., `16:9`, `1:1`, `4:3`) |
| `--size` | Size (e.g., `1024x1024`) |
| `--quality normal\|2k` | Quality preset (default: 2k) |
| `--imageSize 1K\|2K\|4K` | Image size hint for Google/APIMart (APIMart maps it to `resolution`) |
| `--ref` | Reference images. Supported by Google multimodal, OpenAI edits (GPT Image models), Replicate and APIMart |
| `--n` | Number of images (APIMart currently supports only `1`) |
| `--json` | JSON output |

## Environment Variables

| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | OpenAI API key |
| `GOOGLE_API_KEY` | Google API key |
| `DASHSCOPE_API_KEY` | DashScope API key (Alibaba Cloud) |
| `REPLICATE_API_TOKEN` | Replicate API token |
| `APIMART_API_KEY` | APIMart API key |
| `OPENAI_IMAGE_MODEL` | OpenAI model override |
| `GOOGLE_IMAGE_MODEL` | Google model override |
| `DASHSCOPE_IMAGE_MODEL` | DashScope model override (default: z-image-turbo) |
| `REPLICATE_IMAGE_MODEL` | Replicate model override (default: google/nano-banana-pro) |
| `APIMART_IMAGE_MODEL` | APIMart model override (default: gemini-3-pro-image-preview; alternative example: doubao-seedream-5-0-lite) |
| `OPENAI_BASE_URL` | Custom OpenAI endpoint |
| `GOOGLE_BASE_URL` | Custom Google endpoint |
| `DASHSCOPE_BASE_URL` | Custom DashScope endpoint |
| `REPLICATE_BASE_URL` | Custom Replicate endpoint |
| `APIMART_BASE_URL` | Custom APIMart endpoint (default: https://api.apimart.ai) |
| `APIMART_TASK_LANGUAGE` | Task status language for polling (default: en) |

**Load Priority**: CLI args > EXTEND.md > env vars > `/.baoyu-skills/.env` > `~/.baoyu-skills/.env`

## Replicate Model Configuration

When using `--provider replicate`, the model can be configured in the following ways (highest priority first):

1. CLI flag: `--model`
2. EXTEND.md: `default_model.replicate`
3. Env var: `REPLICATE_IMAGE_MODEL`
4. Built-in default: `google/nano-banana-pro`

Supported model formats:

- `owner/name` (recommended for official models), e.g. `google/nano-banana-pro`
- `owner/name:version` (community models by version), e.g. `stability-ai/sdxl:`

Examples:

```bash
# Use Replicate default model
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Override model explicitly
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
```

## Provider Selection

1. `--ref` provided + no `--provider` → auto-select Google first, then OpenAI, then Replicate, then APIMart
2. `--provider` specified → use it (if `--ref`, must be `google`, `openai`, `replicate`, or `apimart`)
3. Only one API key available → use that provider
4. Multiple available → default to Google

## Quality Presets

| Preset | Google imageSize | OpenAI Size | Use Case |
|--------|------------------|-------------|----------|
| `normal` | 1K | 1024px | Quick previews |
| `2k` (default) | 2K | 2048px | Covers, illustrations, infographics |

**Google/APIMart imageSize**: Can be overridden with `--imageSize 1K|2K|4K`

## Aspect Ratios

Supported: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2.35:1`

- Google multimodal: uses `imageConfig.aspectRatio`
- Google Imagen: uses `aspectRatio` parameter
- OpenAI: maps to closest supported size

## Generation Mode

**Default**: Sequential generation (one image at a time). This ensures stable output and easier debugging.

**Parallel Generation**: Use only when the user explicitly requests parallel/concurrent generation.

| Mode | When to Use |
|------|-------------|
| Sequential (default) | Normal usage, single images, small batches |
| Parallel | User explicitly requests, large batches (10+) |

**Parallel Settings** (when requested):

| Setting | Value |
|---------|-------|
| Recommended concurrency | 4 subagents |
| Max concurrency | 8 subagents |
| Use case | Large batch generation when user requests parallel |

**Agent Implementation** (parallel mode only):

```
# Launch multiple generations in parallel using Task tool
# Each Task runs as background subagent with run_in_background=true
# Collect results via TaskOutput when all complete
```

## Error Handling

- Missing API key → error with setup instructions
- Generation failure → auto-retry once
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint (switch to Google multimodal, OpenAI GPT Image edits, Replicate, or APIMart)

## Extension Support

Custom configurations via EXTEND.md. See **Preferences** section for paths and supported options.
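
## Sketch: Provider Selection

The four rules under **Provider Selection** can be read as a small decision function. The block below is a minimal sketch of that reading, not the code in `scripts/main.ts`: the names `Provider`, `SelectionInput`, `REF_ORDER` and `selectProvider` are hypothetical. It also makes two assumptions the rules do not state: the `--ref` auto-select order skips providers without an API key, and when several keys are present but none is Google's, the first available provider is used.

```typescript
// Hypothetical sketch: names and types are illustrative, not taken from scripts/main.ts.
type Provider = "google" | "openai" | "dashscope" | "replicate" | "apimart";

interface SelectionInput {
  provider?: Provider;   // value of --provider, if given
  hasRef: boolean;       // true when --ref was passed
  available: Provider[]; // providers whose API key is configured
}

// Providers that accept reference images, in the documented auto-select order.
const REF_ORDER: Provider[] = ["google", "openai", "replicate", "apimart"];

function selectProvider(input: SelectionInput): Provider {
  const { provider, hasRef, available } = input;

  // Rule 2: an explicit --provider wins, but must support --ref when one is given.
  if (provider) {
    if (hasRef && !REF_ORDER.includes(provider)) {
      throw new Error(`--ref is not supported by provider "${provider}"`);
    }
    return provider;
  }

  // Rule 1: with --ref and no --provider, take the first ref-capable provider.
  // Assumption: providers without an API key are skipped.
  if (hasRef) {
    const match = REF_ORDER.find((p) => available.includes(p));
    if (!match) throw new Error("No available provider supports --ref");
    return match;
  }

  if (available.length === 0) throw new Error("No API key configured");

  // Rule 3: a single configured key decides the provider.
  if (available.length === 1) return available[0];

  // Rule 4: several keys, default to Google.
  // Assumption: fall back to the first available provider if Google has no key.
  return available.includes("google") ? "google" : available[0];
}
```

For example, `selectProvider({ hasRef: true, available: ["openai", "replicate"] })` yields `"openai"` under these assumptions, because Google has no key and OpenAI is next in the reference-image order.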
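
## Sketch: Replicate Model Resolution

The priority list under **Replicate Model Configuration** amounts to "first non-empty source wins", followed by a check of the model ID format. The block below is a rough sketch of that idea, assuming each source has already been read into a string. `ModelSources`, `resolveReplicateModel` and `parseReplicateModel` are hypothetical names, and the regular expression is one possible interpretation of the `owner/name` and `owner/name:version` formats, not the validation the script actually performs.

```typescript
// Hypothetical sketch: not the actual implementation in scripts/main.ts.
interface ModelSources {
  cliModel?: string;    // --model
  extendModel?: string; // default_model.replicate from EXTEND.md
  envModel?: string;    // REPLICATE_IMAGE_MODEL
}

const REPLICATE_DEFAULT = "google/nano-banana-pro";

// First non-empty value wins, in the documented priority order.
function resolveReplicateModel(src: ModelSources): string {
  return src.cliModel || src.extendModel || src.envModel || REPLICATE_DEFAULT;
}

interface ReplicateModelRef {
  owner: string;
  name: string;
  version?: string;
}

// Accepts "owner/name" or "owner/name:version" (assumed grammar).
function parseReplicateModel(id: string): ReplicateModelRef {
  const match = /^([^/:\s]+)\/([^/:\s]+)(?::([^/:\s]+))?$/.exec(id);
  if (!match) throw new Error(`Invalid Replicate model ID: ${id}`);
  const [, owner, name, version] = match;
  return { owner, name, version };
}
```

In this sketch, a caller would populate `envModel` from the `REPLICATE_IMAGE_MODEL` environment variable and pass the resolved string to `parseReplicateModel` before building the API request.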