--- name: IMA Studio — All-in-One AI Creation description: "All-in-One AI creation: images (SeeDream 4.5, Midjourney, Nano Banana 2), videos (Wan 2.6, Kling, Veo 3.1, Sora, Pixverse, Hailuo, SeeDance, Vidu), music (Suno sonic, DouBao), TTS (seed-tts-2.0). Unified access with intelligent model selection and cross-media workflow orchestration. BEFORE using: READ ima-knowledge-ai skill for workflow and best practices. Use for: any AI content creation including images, videos, music, text-to-speech, multi-media projects, character consistency, product demos, social campaigns, complete creative workflows." --- # IMA AI Creation ## ⚠️ 重要：模型 ID 参考 **CRITICAL:** When calling the script, you MUST use the exact **model_id** (second column), NOT the friendly model name. Do NOT infer model_id from the friendly name (e.g., ❌ `nano-banana-pro` is WRONG; ✅ `gemini-3-pro-image` is CORRECT). **Quick Reference Table:** ### 图像模型 (Image Models) | 友好名称 (Friendly Name) | model_id | 说明 (Notes) | |-------------------------|----------|-------------| | Nano Banana2 | `gemini-3.1-flash-image` | ❌ NOT nano-banana-2, 预算选择 4-13 pts | | Nano Banana Pro | `gemini-3-pro-image` | ❌ NOT nano-banana-pro, 高质量 10-18 pts | | SeeDream 4.5 | `doubao-seedream-4.5` | ✅ Recommended default, 5 pts | | Midjourney | `midjourney` | ✅ Same as friendly name, 8-10 pts | ### 视频模型 (Video Models) | 友好名称 (Friendly Name) | model_id (t2v) | model_id (i2v) | 说明 (Notes) | |-------------------------|---------------|----------------|-------------| | Wan 2.6 | `wan2.6-t2v` | `wan2.6-i2v` | ⚠️ Note -t2v/-i2v suffix | | Kling O1 | `kling-video-o1` | `kling-video-o1` | ⚠️ Note video- prefix | | Kling 2.6 | `kling-v2-6` | `kling-v2-6` | ⚠️ Note v prefix | | Hailuo 2.3 | `MiniMax-Hailuo-2.3` | `MiniMax-Hailuo-2.3` | ⚠️ Note MiniMax- prefix | | Hailuo 2.0 | `MiniMax-Hailuo-02` | `MiniMax-Hailuo-02` | ⚠️ Note 02 not 2.0 | | Google Veo 3.1 | `veo-3.1-generate-preview` | `veo-3.1-generate-preview` | ⚠️ Note -generate-preview suffix | | Sora 2 Pro | `sora-2-pro` | `sora-2-pro` | ✅ Straightforward | | Pixverse | `pixverse` | `pixverse` | ✅ Same as friendly name | ### 音乐模型 (Music Models) | 友好名称 (Friendly Name) | model_id | 说明 (Notes) | |-------------------------|----------|-------------| | Suno (sonic v4) | `sonic` | ⚠️ Simplified to sonic | | DouBao BGM | `GenBGM` | ❌ NOT doubao-bgm | | DouBao Song | `GenSong` | ❌ NOT doubao-song | ### 语音模型 (Speech/TTS Models) | 友好名称 (Friendly Name) | model_id | 说明 (Notes) | |-------------------------|----------|-------------| | seed-tts-2.0 | `seed-tts-2.0` | ✅ Same as friendly name (default) | **How to get the correct model_id:** 1. Check this table first 2. Use `--list-models --task-type ` to query available models 3. Refer to command examples in this SKILL.md **Example:** ```bash # ❌ WRONG: Inferring from friendly name --model-id nano-banana-pro # ✅ CORRECT: Using exact model_id from table --model-id gemini-3-pro-image ``` --- ## ⚠️ MANDATORY PRE-CHECK: Read Knowledge Base First! **If ima-knowledge-ai is not installed:** Skip all "Read …" steps below; use only this SKILL's **📥 User Input Parsing** (media type → task_type) and the Recommended Defaults / model tables for each media type. **BEFORE executing ANY multi-media generation task, you MUST:** 1. **Check for workflow complexity** — Read `ima-knowledge-ai/references/workflow-design.md` if: - User mentions: "MV"、"宣传片"、"完整作品"、"配乐"、"soundtrack" - Task spans multiple media types (image + video, video + music, etc.) - Complex multi-step workflows that need task decomposition 2. **Check for visual consistency needs** — Read `ima-knowledge-ai/references/visual-consistency.md` if: - User mentions: "系列"、"多张"、"同一个"、"角色"、"续"、"series"、"same" - Task involves: multiple images/videos, character continuity, product shots - Second+ request about same subject (e.g., "旺财在游泳" after "生成旺财照片") 3. **Check video modes** — Read `ima-knowledge-ai/references/video-modes.md` if: - Any video generation task - Need to understand: image_to_video vs reference_image_to_video difference 4. **Check model selection** — Read `ima-knowledge-ai/references/model-selection.md` if: - Unsure which model to use - Need cost/quality trade-off guidance - User specifies budget or quality requirements **Why this matters:** - Multi-media workflows need proper task sequencing (e.g., video duration → matching music duration) - AI generation defaults to **独立生成** each time — without reference images, results will be inconsistent - Wrong video mode = wrong result (image_to_video ≠ reference_image_to_video) - Model choice affects cost and quality significantly **Example multi-media workflow:** ``` User: "帮我做个产品宣传MV，有背景音乐，主角是旺财小狗" ❌ Wrong: 1. Generate dog image (random look) 2. Generate video (different dog) 3. Generate music (unrelated) ✅ Right: 1. Read workflow-design.md + visual-consistency.md 2. Generate Master Reference: 旺财小狗图片 3. Generate video shots using image_to_video with 旺财 as first frame 4. Get video duration (e.g., 15s) 5. Generate BGM with matching duration and mood ``` **How to check:** ```python # Step 0: Determine media type first (image / video / music / speech) # From user request: "画"/"生成图"/"image" → image; "视频"/"video" → video; "音乐"/"歌"/"music"/"BGM" → music; "语音"/"朗读"/"TTS"/"speech" → speech # Then choose task_type and model from the corresponding section (image: text_to_image/image_to_image; video: text_to_video/...; music: text_to_music; speech: text_to_speech) # Step 1: Read knowledge base based on task type if multi_media_workflow: read("~/.openclaw/skills/ima-knowledge-ai/references/workflow-design.md") if "same subject" or "series" or "character": read("~/.openclaw/skills/ima-knowledge-ai/references/visual-consistency.md") if video_generation: read("~/.openclaw/skills/ima-knowledge-ai/references/video-modes.md") # Step 2: Execute with proper sequencing and reference images # (see workflow-design.md for specific patterns) ``` **No exceptions** — for simple single-media requests, you can proceed directly. For complex multi-media workflows, read the knowledge base first. --- ## 📥 User Input Parsing (Media Type & Task Routing) **Purpose:** So that any agent parses user intent consistently, **first determine the media type** from the user's request, then choose task_type and model. ### 1. User phrasing → media type (do this first) | User intent / keywords | Media type | task_type examples | |------------------------|------------|---------------------| | 画 / 生成图 / 图片 / image / 画一张 / 图生图 | **image** | `text_to_image`, `image_to_image` | | 视频 / 生成视频 / video / 图生视频 / 文生视频 | **video** | `text_to_video`, `image_to_video`, `first_last_frame_to_video`, `reference_image_to_video` | | 音乐 / 歌 / BGM / 背景音乐 / music / 作曲 | **music** | `text_to_music` | | 语音 / 朗读 / TTS / 语音合成 / 配音 / speech / read aloud / text-to-speech | **speech** | `text_to_speech` | If the request mixes media (e.g. "宣传片+配乐"), treat as **multi-media workflow**: read `workflow-design.md`, then plan image → video → music steps and use the correct task_type for each step. ### 2. Model and parameter parsing - **Image:** For model name → model_id and size/aspect_ratio parsing, follow the same rules as in **ima-image-ai** skill (User Input Parsing section). - **Video:** For task_type (t2v / i2v / first_last / reference), model alias → model_id, and duration/resolution/aspect_ratio, follow **ima-video-ai** skill (User Input Parsing section). - **Music:** Suno (`sonic`) vs DouBao BGM/Song — infer from "BGM"/"背景音乐" → BGM; "带歌词"/"人声" → Suno or Song. Use model_id `sonic`, `GenBGM`, `GenSong` per "Recommended Defaults" and "Music Generation" tables below. - **Speech (TTS):** Get model_id from `GET /open/v1/product/list?category=text_to_speech` or run script with `--task-type text_to_speech --list-models`. Map user intent to parameters using product `form_config`: | User intent / phrasing | Parameter (if in form_config) | Notes | |------------------------|--------------------------------|--------| | 女声 / 女声朗读 / female voice | voice_id / voice_type | Use value from form_config options | | 男声 / 男声朗读 / male voice | voice_id / voice_type | Use value from form_config options | | 语速快/慢 / speed up/slow | speed | e.g. 0.8–1.2 | | 音调 / pitch | pitch | If supported | | 大声/小声 / volume | volume | If supported | If the user does not specify, use form_config defaults. Pass extra params via `--extra-params '{"speed":1.0}'`. Only send parameters present in the product’s credit_rules/attributes or form_config (script reflection strips others on retry). --- ## ⚙️ How This Skill Works **For transparency:** This skill uses a bundled Python script (`scripts/ima_create.py`) to call the IMA Open API. The script: - Sends your prompt to **two IMA-owned domains** (see "Network Endpoints" below) - Uses `--user-id` **only locally** as a key for storing your model preferences - Returns image/video/music URLs when generation is complete **What gets sent to IMA servers:** - ✅ Your prompt/description (image/video/music) - ✅ Model selection (SeeDream/Wan/Suno/etc.) - ✅ Generation parameters (size, duration, style, etc.) - ❌ NO API key in prompts (key is used for authentication only) - ❌ NO user_id (it's only used locally) **What's stored locally:** - `~/.openclaw/memory/ima_prefs.json` - Your model preferences (< 1 KB) - `~/.openclaw/logs/ima_skills/` - Generation logs (auto-deleted after 7 days) --- ## 🌐 Network Endpoints Used | Domain | Owner | Purpose | Data Sent | Privacy | |--------|-------|---------|-----------|---------| | `api.imastudio.com` | IMA Studio | Main API (product list, task creation, task polling) | Prompts, model IDs, generation params, **your API key** | Standard HTTPS, data processed for AI generation | | `imapi.liveme.com` | IMA Studio | Image/Video upload service (presigned URL generation) | **Your API key**, file metadata (MIME type, extension) | Standard HTTPS, used for image/video tasks only | | `*.aliyuncs.com`, `*.esxscloud.com` | Alibaba Cloud (OSS) | Image/video storage (file upload, CDN delivery) | Raw image/video bytes (via presigned URL, **NO API key**) | IMA-managed OSS buckets, presigned URLs expire after 7 days | **Key Points:** - **Music tasks** (`text_to_music`) and **TTS tasks** (`text_to_speech`) only use `api.imastudio.com`. - **Image/video tasks** require `imapi.liveme.com` to obtain presigned URLs for uploading input images. - Your API key is sent to **both `api.imastudio.com` and `imapi.liveme.com`** (both owned by IMA Studio). - Verify network calls: `tcpdump -i any -n 'host api.imastudio.com or host imapi.liveme.com'`. See this document: **🌐 Network Endpoints Used** and **⚠️ Credential Security Notice** for full disclosure. --- ## ⚠️ Credential Security Notice **Your API key is sent to both IMA-owned domains:** 1. `Authorization: Bearer ima_xxx...` → `api.imastudio.com` (main API) 2. Query param `appUid=ima_xxx...` → `imapi.liveme.com` (upload service) **Security best practices:** - 🧪 **Use test keys for experiments:** Generate a separate API key for testing. - 🔍 **Monitor usage:** Check `https://imastudio.com/dashboard` for unauthorized activity. - ⏱️ **Rotate keys:** Regenerate your API key periodically (monthly recommended). - 📊 **Review logs:** Check `~/.openclaw/logs/ima_skills/` for unexpected API calls. **Why two domains?** IMA Studio uses a microservices architecture: - **`api.imastudio.com`**: Core AI generation API - **`imapi.liveme.com`**: Specialized image/video upload service (shared infrastructure) Both domains are operated by IMA Studio. The same API key grants access to both services. ### Agent Execution (Internal Reference) > **Note for users:** You can review the script source at `scripts/ima_create.py` anytime. > The agent uses this script to simplify API calls. **Music tasks** use only `api.imastudio.com`, while **image/video tasks** also call `imapi.liveme.com` for file uploads (see "Network Endpoints" above). Use the bundled script internally for all task types — it ensures correct parameter construction: ```bash # ─── Image Generation ────────────────────────────────────────────────────────── # Basic text-to-image (default model) python3 {baseDir}/scripts/ima_create.py \ --api-key $IMA_API_KEY --task-type text_to_image \ --model-id doubao-seedream-4.5 --prompt "a cute puppy on grass, photorealistic" \ --user-id {user_id} --output-json # Text-to-image with size override (Nano Banana2) python3 {baseDir}/scripts/ima_create.py \ --api-key $IMA_API_KEY --task-type text_to_image \ --model-id gemini-3.1-flash-image --prompt "city skyline at sunset, 4K" \ --size 2k --user-id {user_id} --output-json # Image-to-image with input URL python3 {baseDir}/scripts/ima_create.py \ --api-key $IMA_API_KEY --task-type image_to_image \ --model-id doubao-seedream-4.5 --prompt "turn into oil painting style" \ --input-images https://example.com/photo.jpg --user-id {user_id} --output-json # ─── Video Generation ────────────────────────────────────────────────────────── # Basic text-to-video (default model, 5s 720P) python3 {baseDir}/scripts/ima_create.py \ --api-key $IMA_API_KEY --task-type text_to_video \ --model-id wan2.6-t2v --prompt "a puppy dancing happily, cinematic" \ --user-id {user_id} --output-json # Text-to-video with extra params (10s 1080P) python3 {baseDir}/scripts/ima_create.py \ --api-key $IMA_API_KEY --task-type text_to_video \ --model-id wan2.6-t2v --prompt "dramatic ocean waves, sunset" \ --extra-params '{"duration":10,"resolution":"1080P","aspect_ratio":"16:9"}' \ --user-id {user_id} --output-json # Image-to-video (animate static image) python3 {baseDir}/scripts/ima_create.py \ --api-key $IMA_API_KEY --task-type image_to_video \ --model-id wan2.6-i2v --prompt "camera slowly zooms in, gentle movement" \ --input-images https://example.com/photo.jpg --user-id {user_id} --output-json # First-last frame video (two images) python3 {baseDir}/scripts/ima_create.py \ --api-key $IMA_API_KEY --task-type first_last_frame_to_video \ --model-id kling-video-o1 --prompt "smooth transition between frames" \ --input-images https://example.com/frame1.jpg https://example.com/frame2.jpg \ --user-id {user_id} --output-json # ─── Music Generation ────────────────────────────────────────────────────────── # Basic text-to-music (Suno default) python3 {baseDir}/scripts/ima_create.py \ --api-key $IMA_API_KEY --task-type text_to_music \ --model-id sonic --prompt "upbeat electronic music, 120 BPM, no vocals" \ --user-id {user_id} --output-json # Music with custom lyrics (Suno custom mode) python3 {baseDir}/scripts/ima_create.py \ --api-key $IMA_API_KEY --task-type text_to_music \ --model-id sonic --prompt "pop ballad, emotional" \ --extra-params '{"custom_mode":true,"lyrics":"Your custom lyrics here...","vocal_gender":"female"}' \ --user-id {user_id} --output-json # Background music (DouBao BGM) python3 {baseDir}/scripts/ima_create.py \ --api-key $IMA_API_KEY --task-type text_to_music \ --model-id GenBGM --prompt "relaxing ambient music for meditation" \ --user-id {user_id} --output-json # ─── Text-to-Speech (TTS) ───────────────────────────────────────────────────── # List TTS models first to get model_id, then generate speech python3 {baseDir}/scripts/ima_create.py \ --api-key $IMA_API_KEY --task-type text_to_speech --list-models # TTS: use model_id from list above (prompt = text to speak) python3 {baseDir}/scripts/ima_create.py \ --api-key $IMA_API_KEY --task-type text_to_speech \ --model-id --prompt "Text to be spoken here." \ --user-id {user_id} --output-json ``` The script outputs JSON with `url`, `model_name`, `credit` — use these values in the UX protocol messages below. The script internals (product list query, parameter construction, polling) are invisible to users. --- ## Overview Call IMA Open API to create AI-generated content. All endpoints require an `ima_*` API key. The core flow is: **query products → create task → poll until done**. --- ## 🔒 Security & Transparency Policy > **This skill is community-maintained and open for inspection.** ### ✅ What Users CAN Do **Full transparency:** - ✅ **Review all source code**: Check `scripts/ima_create.py` and `ima_logger.py` anytime - ✅ **Verify network calls**: Music tasks use `api.imastudio.com` only; image/video tasks also use `imapi.liveme.com` (see "Network Endpoints" section) - ✅ **Inspect local data**: View `~/.openclaw/memory/ima_prefs.json` and log files - ✅ **Control privacy**: Delete preferences/logs anytime, or disable file writes (see below) **Configuration allowed:** - ✅ **Set API key** in environment or agent config: - Environment variable: `export IMA_API_KEY=ima_your_key_here` - OpenClaw/MCP config: Add `IMA_API_KEY` to agent's environment configuration - Get your key at: https://imastudio.com - ✅ **Use scoped/test keys**: Test with limited API keys, rotate after testing - ✅ **Disable file writes**: Make prefs/logs read-only or symlink to `/dev/null` **Data control:** - ✅ **View stored data**: `cat ~/.openclaw/memory/ima_prefs.json` - ✅ **Delete preferences**: `rm ~/.openclaw/memory/ima_prefs.json` (resets to defaults) - ✅ **Delete logs**: `rm -rf ~/.openclaw/logs/ima_skills/` (auto-cleanup after 7 days anyway) ### ⚠️ Advanced Users: Fork & Modify If you need to modify this skill for your use case: 1. **Fork the repository** (don't modify the original) 2. **Update your fork** with your changes 3. **Test thoroughly** with limited API keys 4. **Document your changes** for troubleshooting **Note:** Modified skills may break API compatibility or introduce security issues. Official support only covers the unmodified version. ### ❌ What to AVOID (Security Risks) **Actions that could compromise security:** - ❌ Sharing API keys publicly or in skill files - ❌ Modifying API endpoints to unknown servers - ❌ Disabling SSL/TLS certificate verification - ❌ Logging sensitive user data (prompts, IDs, etc.) - ❌ Bypassing authentication or billing mechanisms **Why this matters:** 1. **API Compatibility**: Skill logic aligns with IMA Open API schema 2. **Security**: Malicious modifications could leak credentials or bypass billing 3. **Support**: Modified skills may not be supported 4. **Community**: Breaking changes affect all users ### 📋 Privacy & Data Handling Summary **What this skill does with your data:** | Data Type | Sent to IMA? | Stored Locally? | User Control | |-----------|-------------|-----------------|--------------| | Prompts (image/video/music) | ✅ Yes (required for generation) | ❌ No | None (required) | | API key | ✅ Yes (authentication header) | ❌ No | Set via env var | | user_id (optional CLI arg) | ❌ **Never** (local preference key only) | ✅ Yes (as prefs file key) | Change `--user-id` value | | Model preferences | ❌ No | ✅ Yes (~/.openclaw) | Delete anytime | | Generation logs | ❌ No | ✅ Yes (~/.openclaw) | Auto-cleanup 7 days | **Privacy recommendations:** 1. **Use test/scoped API keys** for initial testing 2. **Note**: `--user-id` is **never sent to IMA servers** - it's only used locally as a key for storing preferences in `~/.openclaw/memory/ima_prefs.json` 3. **Review source code** at `scripts/ima_create.py` to verify network calls (search for `create_task` function) 4. **Rotate API keys** after testing or if compromised **Get your IMA API key:** Visit https://imastudio.com to register and get started. ### 🔧 For Skill Maintainers Only **Version control:** - All changes must go through Git with proper version bumps (semver) - CHANGELOG.md must document all changes - Production deployments require code review **File checksums (optional):** ```bash # Verify skill integrity sha256sum SKILL.md scripts/ima_create.py ``` If users report issues, verify file integrity first. --- ## 🧠 User Preference Memory (Image) > User preferences have **highest priority** when they exist. But preferences are only saved when users **explicitly express** model preferences — not from automatic model selection. ### Storage: `~/.openclaw/memory/ima_prefs.json` Single file, shared across all IMA skills: ```json { "user_{user_id}": { "text_to_image": { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5, "last_used": "2026-02-27T03:07:27Z" }, "image_to_image": { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5, "last_used": "2026-02-27T03:07:27Z" }, "text_to_speech": { "model_id": "", "model_name": "...", "credit": 2, "last_used": "..." } } } ``` ### Model Selection Flow (Image Generation) **Step 1: Get knowledge-ai recommendation** (if installed) ```python knowledge_recommended_model = read_ima_knowledge_ai() # e.g., "SeeDream 4.5" ``` **Step 2: Check user preference** ```python user_pref = load_prefs().get(f"user_{user_id}", {}).get(task_type) # e.g., {"model_id": "midjourney", ...} ``` **Step 3: Decide which model to use** ```python if user_pref exists: use_model = user_pref["model_id"] # Highest priority else: use_model = knowledge_recommended_model or fallback_default ``` **Step 4: Check for mismatch (for later hint)** ```python if user_pref exists and knowledge_recommended_model != user_pref["model_id"]: mismatch = True # Will add hint in success message ``` ### When to Write (User Explicit Preference ONLY) **✅ Save preference when user explicitly specifies a model:** | User says | Action | |-----------|--------| | `用XXX` / `换成XXX` / `改用XXX` | Switch to model XXX + save as preference | | `以后都用XXX` / `默认用XXX` / `always use XXX` | Save + confirm: `✅ 已记住！以后图片生成默认用 [XXX]` | | `我喜欢XXX` / `我更喜欢XXX` | Save as preference | **❌ Do NOT save when:** - Agent auto-selects from knowledge-ai → not user preference - Agent uses fallback default → not user preference - User says generic quality requests (see "Clear Preference" below) → clear preference instead ### When to Clear (User Abandons Preference) **🗑️ Clear preference when user wants automatic selection:** | User says | Action | |-----------|--------| | `用最好的` / `用最合适的` / `best` / `recommended` | Clear pref + use knowledge-ai recommendation | | `推荐一个` / `你选一个` / `自动选择` | Clear pref + use knowledge-ai recommendation | | `用默认的` / `用新的` | Clear pref + use knowledge-ai recommendation | | `试试别的` / `换个试试` (without specific model) | Clear pref + use knowledge-ai recommendation | | `重新推荐` | Clear pref + use knowledge-ai recommendation | **Implementation:** ```python del prefs[f"user_{user_id}"][task_type] save_prefs(prefs) ``` --- ## ⭐ Model Selection Priority (Image) **Selection flow:** 1. **User preference** (if exists) → Highest priority, always respect 2. **ima-knowledge-ai skill** (if installed) → Professional recommendation based on task 3. **Fallback defaults** → Use table below (only if neither 1 nor 2 exists) **Important notes:** - User preference is only saved when user **explicitly specifies** a model (see "When to Write" above) - Knowledge-ai is **always consulted** (even when user pref exists) to detect mismatches - When mismatch detected → add gentle hint in success message (does NOT interrupt generation) > The defaults below are FALLBACK only. User preferences have highest priority, then knowledge-ai recommendations. When using user preference for image generation, show a line like: ``` 🎨 根据你的使用习惯，将用 [Model Name] 帮你生成… • 模型：[Model Name]（你的常用模型） • 预计耗时：[X ~ Y 秒] • 消耗积分：[N pts] ``` ### Preference Change Confirmation When user switches to a different model than their saved preference: ``` 💡 你之前喜欢用 [Old Model]，这次换成了 [New Model]。要把 [New Model] 设为以后的默认吗？回复「是」保存 / 回复「否」仅本次使用 ``` --- ## ⭐ Recommended Defaults > **These are fallback defaults — only used when no user preference exists.** > **Always default to the newest and most popular model. Do NOT default to the cheapest.** | Task Type | Default Model | model_id | version_id | Cost | Why | |-----------|--------------|----------|------------|------|-----| | **text_to_image** | **SeeDream 4.5** | `doubao-seedream-4.5` | `doubao-seedream-4-5-251128` | 5 pts | Latest doubao flagship, photorealistic 4K | | text_to_image (budget) | **Nano Banana2** | `gemini-3.1-flash-image` | `gemini-3.1-flash-image` | 4 pts | Fastest and cheapest option | | text_to_image (premium) | **Nano Banana Pro** | `gemini-3-pro-image` | `gemini-3-pro-image-preview` | 10/10/18 pts | Premium quality, 1K/2K/4K options | | text_to_image (artistic) | **Midjourney** 🎨 | `midjourney` | `v6` | 8/10 pts | Artist-level aesthetics, creative styles | | **image_to_image** | **SeeDream 4.5** | `doubao-seedream-4.5` | `doubao-seedream-4-5-251128` | 5 pts | Latest, best i2i quality | | image_to_image (budget) | **Nano Banana2** | `gemini-3.1-flash-image` | `gemini-3.1-flash-image` | 4 pts | Cheapest option | | image_to_image (premium) | **Nano Banana Pro** | `gemini-3-pro-image` | `gemini-3-pro-image-preview` | 10 pts | Premium quality | | image_to_image (artistic) | **Midjourney** 🎨 | `midjourney` | `v6` | 8/10 pts | Artist-level aesthetics, style transfer | | **text_to_video** | **Wan 2.6** | `wan2.6-t2v` | `wan2.6-t2v` | 25 pts | 🔥 Most popular t2v, balanced cost | | text_to_video (premium) | **Hailuo 2.3** | `MiniMax-Hailuo-2.3` | `MiniMax-Hailuo-2.3` | 38 pts | Higher quality | | text_to_video (budget) | **Vidu Q2** | `viduq2` | `viduq2` | 5 pts | Lowest cost t2v | | **image_to_video** | **Wan 2.6** | `wan2.6-i2v` | `wan2.6-i2v` | 25 pts | 🔥 Most popular i2v, 1080P | | image_to_video (premium) | **Kling 2.6** | `kling-v2-6` | `kling-v2-6` | 40-160 pts | Premium Kling i2v | | **first_last_frame_to_video** | **Kling O1** | `kling-video-o1` | `kling-video-o1` | 48 pts | Newest Kling reasoning model | | **reference_image_to_video** | **Kling O1** | `kling-video-o1` | `kling-video-o1` | 48 pts | Best reference fidelity | | **text_to_music** | **Suno (sonic-v4)** | `sonic` | `sonic` | 25 pts | Latest Suno engine, best quality | | **text_to_speech** | (query product list) | — | — | — | Run `--task-type text_to_speech --list-models`; use first or user-preferred model_id | **Premium options:** - **Image**: Nano Banana Pro — Highest quality with size control (1K/2K/4K), higher cost (10-18 pts for text_to_image, 10 pts for image_to_image) - **Video**: Kling O1, Sora 2 Pro, Google Veo 3.1 — Premium quality with longer duration options **Quick selection guide (production as of 2026-02-27, sorted by popularity):** - **Image (4 models available)** → **SeeDream 4.5** (5, default); artistic → Midjourney 🎨 (8-10); budget → Nano Banana2 (4, 512px); premium → Nano Banana Pro (10-18) - **🔥 Video from text (most popular)** → **Wan 2.6** (25, balanced); premium → Hailuo 2.3 (38); budget → Vidu Q2 (5) - **🔥 Video from image (most popular)** → **Wan 2.6** (25) - Music → **Suno** (25); DouBao BGM/Song (30 each) - Cheapest → Nano Banana2 512px (4) for image; Vidu Q2 (5) for video **Selection guide by use case:** **Image Generation:** - General image generation → **SeeDream 4.5** (5pts) - **Custom aspect ratio (16:9, 9:16, 4:3, etc.)** → **SeeDream 4.5** 🌟 or **Nano Banana Pro/2** 🆕 (native support) - Budget-conscious / fast generation → **Nano Banana2** (4pts) - Highest quality with size control (1K/2K/4K) → **Nano Banana Pro** (text_to_image: 10-18pts, image_to_image: 10pts) - **Artistic/creative styles, illustrations, paintings** → **Midjourney** 🎨 (8-10pts) - Style transfer / image editing → **SeeDream 4.5** (5pts) or **Midjourney** 🎨 (artistic) **Video Generation:** - General video generation → **Wan 2.6** (25pts, most popular) - Premium cinematic quality → **Google Veo 3.1** (70-330pts) or **Sora 2 Pro** (122+pts) - Budget video → **Vidu Q2** (5pts) or **Hailuo 2.0** (5pts) - With audio support → **Kling O1** (48+pts) or **Google Veo 3.1** (70+pts) - First/last frame animation → **Kling O1** (48+pts) - Reference image consistency → **Kling O1** (48+pts) or **Google Veo 3.1** (70+pts) **Music Generation:** - **Custom song with lyrics, vocals, style** → **Suno sonic-v5** (25pts, default, ~2min) - Full control: custom_mode, lyrics, vocal_gender, tags, negative_tags - Best for: complete songs, vocal tracks, artistic compositions - **Background music / ambient loop** → **DouBao BGM** (30pts, ~30s) - Simplified: prompt-only, no advanced parameters - Best for: video backgrounds, ambient music, short loops - **Simple song generation** → **DouBao Song** (30pts, ~30s) - Simplified: prompt-only - Best for: quick song generation, structured vocal compositions - **User explicitly asks for cheapest** → DouBao BGM/Song (6pts each) — only if explicitly requested **Speech (TTS) Generation:** - **Text-to-speech / 语音合成 / 朗读** → `text_to_speech`. Always query `GET /open/v1/product/list?category=text_to_speech` (or `--list-models`) to get current model_id and credit. No fixed default; use first available or user preference. Voice/speed/format parameters: see "Model and parameter parsing" (TTS table) and "Speech (TTS) — text_to_speech" in this document. **⚠️ Technical Note for Suno:** > `model_version` inside `parameters.parameters` (e.g., `"sonic-v5"`) is different from the outer `model_version` field (which is `"sonic"`). Always set both correctly when creating Suno tasks. **⚠️ Production Image Models (4 available):** - SeeDream 4.5 (`doubao-seedream-4.5`) — 5 pts, default - Midjourney 🎨 (`midjourney`) — 8/10 pts for 480p/720p, artistic styles - Nano Banana2 (`gemini-3.1-flash-image`) — 4/6/10/13 pts for 512px/1K/2K/4K - Nano Banana Pro (`gemini-3-pro-image`) — 10/10/18 pts for 1K/2K/4K **All other image models mentioned in older documentation are no longer available in production.** **🌟 Parameter Support Notes (All Task Types):** ### Image Models (text_to_image / image_to_image) **🆕 MAJOR UPDATE: Nano Banana series now has NATIVE aspect_ratio support!** - **Nano Banana Pro**: ✅ Supports `aspect_ratio` (1:1, 16:9, 9:16, 4:3, 3:4) NATIVELY - **Nano Banana2**: ✅ Supports `aspect_ratio` (1:1, 16:9, 9:16, 4:3, 3:4) NATIVELY - **SeeDream 4.5**: ✅ Supports 8 ratios via virtual params (1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9) - **Midjourney**: ❌ 1:1 only (fixed 1024x1024) **aspect_ratio support details:** - ✅ **aspect_ratio**: - **SeeDream 4.5**: ✅ Supports 8 ratios via virtual params (1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9) - **Nano Banana2**: ✅ **Native support** for 5 ratios (1:1, 16:9, 9:16, 4:3, 3:4) - **Nano Banana Pro**: ✅ **Native support** for 5 ratios (1:1, 16:9, 9:16, 4:3, 3:4) - **Midjourney**: ❌ 1:1 only (fixed 1024x1024) - ✅ **size**: - **Nano Banana2**: 512px, 1K, 2K, 4K (via different `attribute_id`s, 4-13 pts) - **Nano Banana Pro**: 1K, 2K, 4K (via different `attribute_id`s, 10-18 pts) - **SeeDream 4.5**: Adaptive default (5 pts) - **Midjourney**: 480p/720p (via `attribute_id`, 8/10 pts) - ❌ **8K**: No model supports 8K (max is 4K via Nano Banana Pro) - ❌ **Non-standard aspect ratios** (7:3, 8:5, etc.): Not supported. Use closest supported ratio or video models. - ✅ **n**: Multiple outputs supported (1-4), credit × n **When user requests unsupported combinations for images:** - **Midjourney + aspect_ratio (16:9, etc.)**: Recommend **SeeDream 4.5** or **Nano Banana series** instead ``` ❌ Midjourney 暂不支持自定义 aspect_ratio（仅支持 1024x1024 方形） ✅ 推荐方案： 1. SeeDream 4.5（支持虚拟参数 aspect_ratio） • 支持比例：1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9 • 成本：5 积分（性价比最佳） 2. Nano Banana Pro/2（原生支持 aspect_ratio） • 支持比例：1:1, 16:9, 9:16, 4:3, 3:4 • 成本：4-18 积分（按尺寸）需要我帮你用 SeeDream 4.5 生成吗？ ``` - **Any model + 8K**: Inform user no model supports 8K, max is 4K (Nano Banana Pro) - **Any model + non-standard ratio (7:3, 8:5, etc.)**: Non-standard ratio, not supported. Suggest closest supported ratio (e.g., 21:9 for ultra-wide, 2:3 for portrait) ### Video Models (text_to_video / image_to_video / first_last_frame / reference_image) - ✅ **resolution**: 540P, 720P, 1080P, 2K, 4K (model-dependent, higher res = higher cost) - ✅ **aspect_ratio**: 16:9, 9:16, 1:1, 4:3 (model-dependent, check `form_config`) - ✅ **duration**: 4s, 5s, 10s, 15s (model-dependent, longer = higher cost) - ⚠️ **generate_audio**: Supported by Veo 3.1, Kling O1, Hailuo (check `form_config`) - ✅ **prompt_extend**: AI-powered prompt enhancement (most models support) - ✅ **negative_prompt**: Content exclusion (most models support) - ✅ **shot_type**: Single/multi-shot control (model-dependent) - ✅ **seed**: Reproducibility control (most models support, -1 = random) - ✅ **n**: Multiple outputs (1-4), credit × n #### 🆕 Special Case: Pixverse Model Parameter (v1.0.7+) **Auto-Inference Logic for Pixverse V5.5/V5/V4:** - **Problem**: Pixverse V5.5, V5, V4 lack `model` field in `form_config` from Product List API - **Backend Requirement**: Backend requires `model` parameter (e.g., `"v5.5"`, `"v5"`, `"v4"`) - **Auto-Fix**: System automatically extracts version from `model_name` and injects it - Example: `model_name: "Pixverse V5.5"` → auto-inject `model: "v5.5"` - Example: `model_name: "Pixverse V4"` → auto-inject `model: "v4"` - **Note**: V4.5 and V3.5 include `model` in `form_config` (no auto-inference needed) - **Relevant Task Types**: All video modes (text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video) **Error Prevention:** - Without auto-inference: `err_code=400017 err_msg=Invalid value for model` - With auto-inference (v1.0.7+): Pixverse V5.5/V5/V4 work seamlessly ✅ ### Music Models (text_to_music) **Suno sonic-v5 (Full-Featured):** - ✅ **custom_mode**: Suno only (enables vocal_gender, lyrics, tags support) - ✅ **vocal_gender**: Suno only (male/female/mixed, requires custom_mode=True) - ✅ **lyrics**: Suno only (custom lyrics support, requires custom_mode=True) - ✅ **make_instrumental**: Suno only (force instrumental, no vocals) - ✅ **auto_lyrics**: Suno only (AI-generated lyrics) - ✅ **tags**: Suno only (genre/style tags) - ✅ **negative_tags**: Suno only (exclude unwanted styles) - ✅ **title**: Suno only (song title) - ❌ **duration**: Fixed-length output (DouBao ~30s, Suno ~2min, not user-controllable) - ✅ **n**: Multiple outputs supported (1-2), credit × n **DouBao BGM/Song (Simplified):** - ✅ **prompt**: Text description only - ❌ **No advanced parameters** (no custom_mode, lyrics, vocal control) - ❌ **duration**: Fixed ~30s output **🎵 Suno Prompt Writing Guide (for `gpt_description_prompt`):** When using Suno, structure your prompt with these elements: 1. **Genre/Style:** - Examples: `"lo-fi hip hop"`, `"orchestral cinematic"`, `"upbeat pop"`, `"dark ambient"`, `"indie folk"`, `"electronic dance"` 2. **Tempo/BPM:** - Examples: `"80 BPM"`, `"fast tempo"`, `"slow ballad"`, `"moderate pace 110 BPM"` 3. **Vocals Control:** - **No vocals**: `"no vocals"` → set `make_instrumental=true` - **With vocals**: `"female vocals"` → set `vocal_gender="female"` - **Male vocals**: `"male vocals"` → set `vocal_gender="male"` - **Mixed**: Set `vocal_gender="mixed"` 4. **Mood/Emotion:** - Examples: `"happy and energetic"`, `"melancholic"`, `"tense and dramatic"`, `"peaceful and calming"` 5. **Negative Tags (exclude styles):** - Use `negative_tags`: `"heavy metal, distortion, screaming"` to exclude unwanted elements 6. **Duration Hint:** - Examples: `"60 seconds"`, `"30 second loop"`, `"2 minute track"` - Note: Suno typically generates ~2min, not strictly controllable **Example Suno prompts:** ``` "upbeat lo-fi hip hop, 90 BPM, no vocals, relaxed and chill" → Set: make_instrumental=true "emotional pop ballad, slow tempo, female vocals, melancholic" → Set: vocal_gender="female" "orchestral cinematic trailer music, epic and dramatic, 120 BPM, no vocals" → Set: make_instrumental=true, tags="orchestral,cinematic,epic" "acoustic indie folk, gentle guitar, male vocals, warm and nostalgic" → Set: vocal_gender="male", tags="acoustic,indie,folk" ``` **⚠️ Technical Note for Suno:** > `model_version` inside `parameters.parameters` (e.g., `"sonic-v5"`) is different from the outer `model_version` field (which is `"sonic"`). Always set both correctly. ### Common Parameter Patterns - **n (batch generation)**: Supported by ALL models. Cost = base_credit × n. Creates n independent resources. - **seed**: Supported by most models (-1 = random, >0 = reproducible results) - **prompt_extend**: AI-powered prompt enhancement (video models only) ### Decision Tree: When User Requests Unsupported Features ``` User asks for custom aspect ratio image (e.g. "7:3 landscape") → ❌ Image models don't support custom ratios → ✅ Solution: "图片模型不支持自定义比例。建议用视频模型(Wan 2.6 t2v)生成16:9视频，然后截取首帧作为图片。" User asks for 8K image → ❌ No model supports 8K → ✅ Solution: "当前最高支持4K分辨率(Nano Banana Pro，18积分)。要使用吗？" User asks for video with audio → Check model: Veo 3.1 / Kling O1 / Hailuo have generate_audio → ✅ Solution: "Veo 3.1 和 Kling O1 支持音频生成(需在参数中设置 generate_audio=True)。要用哪个？" User asks for long music (e.g. "5 minute track") → ❌ Duration not user-controllable → ✅ Solution: "Suno 生成约2分钟音乐。需要更长时长可以生成多段后拼接。" User asks for 30s video → Check model: Most models max 15s → ✅ Solution: "当前最长15秒。可选模型：Wan 2.6(15s, 75积分), Kling O1(10s, 96积分)。" ``` **When user requests unsupported combinations:** - Video + audio (unsupported model) → "该模型不支持音频。建议用 Veo 3.1 或 Kling O1 (支持 generate_audio 参数)" - Music + custom duration → "音乐时长由模型固定(Suno约2分钟,DouBao约30秒),无法自定义" - Video duration > 15s → "当前最长15秒。可选模型：Wan 2.6(15s, 75积分), Kling O1(10s, 96积分)" > **Note:** Image-specific unsupported combinations (Midjourney + aspect_ratio, 8K, non-standard ratios) are documented in the "Image Models" section above. --- ## 🧠 User Preference Memory (Video) > User preferences have **highest priority** when they exist. But preferences are only saved when users **explicitly express** model preferences — not from automatic model selection. ### Storage: `~/.openclaw/memory/ima_prefs.json` ```json { "user_{user_id}": { "text_to_video": { "model_id": "wan2.6-t2v", "model_name": "Wan 2.6", "credit": 25, "last_used": "..." }, "image_to_video": { "model_id": "wan2.6-i2v", "model_name": "Wan 2.6", "credit": 25, "last_used": "..." }, "first_last_frame_to_video": { "model_id": "kling-video-o1", "model_name": "Kling O1", "credit": 48, "last_used": "..." }, "reference_image_to_video": { "model_id": "kling-video-o1", "model_name": "Kling O1", "credit": 48, "last_used": "..." } } } ``` ### Model Selection Flow (Video Generation) **Step 1: Get knowledge-ai recommendation** (if installed) ```python knowledge_recommended_model = read_ima_knowledge_ai() # e.g., "Wan 2.6" ``` **Step 2: Check user preference** ```python user_pref = load_prefs().get(f"user_{user_id}", {}).get(task_type) # e.g., {"model_id": "kling-video-o1", ...} ``` **Step 3: Decide which model to use** ```python if user_pref exists: use_model = user_pref["model_id"] # Highest priority else: use_model = knowledge_recommended_model or fallback_default ``` **Step 4: Check for mismatch (for later hint)** ```python if user_pref exists and knowledge_recommended_model != user_pref["model_id"]: mismatch = True # Will add hint in success message ``` ### When to Write (User Explicit Preference ONLY) **✅ Save preference when user explicitly specifies a model:** | User says | Action | |-----------|--------| | `用XXX` / `换成XXX` / `改用XXX` | Switch to model XXX + save as preference | | `以后都用XXX` / `默认用XXX` / `always use XXX` | Save + confirm: `✅ 已记住！以后视频生成默认用 [XXX]` | | `我喜欢XXX` / `我更喜欢XXX` | Save as preference | **❌ Do NOT save when:** - Agent auto-selects from knowledge-ai → not user preference - Agent uses fallback default → not user preference - User says generic quality requests (see "Clear Preference" below) → clear preference instead ### When to Clear (User Abandons Preference) **🗑️ Clear preference when user wants automatic selection:** | User says | Action | |-----------|--------| | `用最好的` / `用最合适的` / `best` / `recommended` | Clear pref + use knowledge-ai recommendation | | `推荐一个` / `你选一个` / `自动选择` | Clear pref + use knowledge-ai recommendation | | `用默认的` / `用新的` | Clear pref + use knowledge-ai recommendation | | `试试别的` / `换个试试` (without specific model) | Clear pref + use knowledge-ai recommendation | | `重新推荐` | Clear pref + use knowledge-ai recommendation | **Implementation:** ```python del prefs[f"user_{user_id}"][task_type] save_prefs(prefs) ``` --- ## ⭐ Model Selection Priority (Video) **Selection flow:** 1. **User preference** (if exists) → Highest priority, always respect 2. **ima-knowledge-ai skill** (if installed) → Professional recommendation based on task 3. **Fallback defaults** → Use table below (only if neither 1 nor 2 exists) **Important notes:** - User preference is only saved when user **explicitly specifies** a model (see "When to Write" above) - Knowledge-ai is **always consulted** (even when user pref exists) to detect mismatches - When mismatch detected → add gentle hint in success message (does NOT interrupt generation) > The defaults below are FALLBACK only. User preferences have highest priority, then knowledge-ai recommendations. --- ## 💬 User Experience Protocol (IM / Feishu / Discord) v2.0 🆕 > **v2.0 Updates (aligned with ima-image-ai v1.3):** > - Added Step 0 for correct message ordering (fixes group chat bug) > - Added Step 5 for explicit task completion > - Enhanced Midjourney support with proper timing estimates > - Now 6 steps total (0-5): Acknowledgment → Pre-Gen → Progress → Success/Failure → Done > > This skill runs inside IM platforms (Feishu, Discord via OpenClaw). > Generation takes 10 seconds (music) up to 6 minutes (video). **Never let users wait in silence.** > Always follow all 6 steps below, every single time. ### 🚫 Never Say to Users The following are **internal implementation details**. Never mention them in any user-facing message, under any circumstances: | ❌ Never say | ✅ What users care about | |-------------|--------------------------| | `ima_create.py` / 脚本 / script | — | | 自动化脚本 / automation script | — | | 自动处理产品列表查询 | — | | 自动解析参数和配置 | — | | 智能轮询 / polling / 轮询 | — | | product list / 商品列表接口 | — | | attribute_id / model_version / form_config | — | | API 调用 / HTTP 请求 | — | | 任何技术参数名 | 模型名称、积分、生成时间 | User messages must only contain: **model name, estimated/actual time, credits consumed, result URL, and natural language status updates.** --- ### Estimated Generation Time (All Task Types) | Task Type | Model | Estimated Time | Poll Every | Send Progress Every | |-----------|-------|---------------|------------|---------------------| | **text_to_image** | SeeDream 4.5 | 25~60s | 5s | 20s | | | Nano Banana2 💚 | 20~40s | 5s | 15s | | | Nano Banana Pro | 60~120s | 5s | 30s | | | Midjourney 🎨 | 40~90s | 8s | 25s | | **image_to_image** | SeeDream 4.5 | 25~60s | 5s | 20s | | | Nano Banana2 💚 | 20~40s | 5s | 15s | | | Nano Banana Pro | 60~120s | 5s | 30s | | | Midjourney 🎨 | 40~90s | 8s | 25s | | **text_to_video** | Wan 2.6, Hailuo 2.0/2.3, Vidu Q2, Pixverse | 60~120s | 8s | 30s | | | SeeDance 1.5 Pro, Kling 2.6, Veo 3.1 | 90~180s | 8s | 40s | | | Kling O1, Sora 2 Pro | 180~360s | 8s | 60s | | **image_to_video** | Same ranges as text_to_video | — | 8s | 40s | | **first_last_frame / reference** | Kling O1, Veo 3.1 | 180~360s | 8s | 60s | | **text_to_music** | DouBao BGM / Song | 10~25s | 5s | 10s | | | Suno (sonic-v5) | 20~45s | 5s | 15s | | **text_to_speech** | (varies by model) | 5~30s | 3s | 10s | `estimated_max_seconds` = upper bound of the range (e.g. 60 for SeeDream 4.5, 40 for Nano Banana2, 120 for Nano Banana Pro, 90 for Midjourney, 180 for Kling 2.6, 360 for Kling O1). --- ### Step 0 — Initial Acknowledgment Reply (Normal Reply) 🆕 **⚠️ CRITICAL:** This step is essential for correct message ordering in IM platforms (Feishu, Discord). **Before doing anything else**, reply to the user with a friendly acknowledgment message using your **normal reply** (not `message` tool). This reply will automatically appear FIRST in the conversation. **Example acknowledgment messages:** For images: ``` 好的!来帮你画一只萌萌的猫咪 🐱 ``` ``` 收到！马上为你生成一张 16:9 的风景照 🏔️ ``` ``` OK! Starting image generation with SeeDream 4.5 🎨 ``` For videos: ``` 好的!来帮你生成一段视频 🎬 ``` ``` 收到！开始用 Wan 2.6 生成视频 🎥 ``` For music: ``` 好的!来帮你创作一首音乐 🎵 ``` **Rules:** - Keep it short and warm (< 15 words) - Match the user's language (Chinese/English) - Include relevant emoji (🐱/🎨/🎬/🎵/✨) - This is your ONLY normal reply — all subsequent updates use `message` tool **Why this matters:** - Normal replies automatically appear FIRST in the conversation thread - `message` tool pushes appear in chronological order AFTER your initial reply - This ensures users see: "好的!" → "🎨 开始生成..." → "⏳ 进度..." → "✅ 成功!" (correct order) - Without Step 0, the confirmation might appear LAST, confusing users --- ### Step 1 — Pre-Generation Notification (Push via message tool) **After Step 0 reply**, use the `message` tool to push a notification immediately: ``` [Emoji] 开始生成 [内容类型]，请稍候… • 模型：[Model Name] • 预计耗时：[X ~ Y 秒] • 消耗积分：[N pts] ``` **Emoji by content type:** - 图片 → `🎨` - 视频 → `🎬`（加注:视频生成需要较长时间，我会定时汇报进度） - 音乐 → `🎵` **Cost transparency (new requirement):** - Always show credit cost with model tier context - For expensive models (>50 pts), offer cheaper alternative proactively - Examples: - Balanced (default): "使用 Wan 2.6（25 积分，最新 Wan）" - Premium (user explicit): "使用高端模型 Kling O1（48-120 积分），质量最佳" - Premium (auto-selected): "使用 Wan 2.6（25 积分）。若需更高质量可选 Kling O1（48 积分起）" - Budget (user asked): "使用 Vidu Q2（5 积分，最省钱）" > Adapt language to match the user (Chinese / English). For video, always add a note that it takes longer. For expensive models, always mention cheaper alternatives unless user explicitly requested premium. --- ### Step 2 — Progress Updates Poll the task detail API every `[Poll Every]` seconds per the table. Send a progress update every `[Send Progress Every]` seconds. ``` ⏳ 正在生成中… [P]% 已等待 [elapsed]s，预计最长 [max]s ``` **Progress formula:** ``` P = min(95, floor(elapsed_seconds / estimated_max_seconds * 100)) ``` - **Cap at 95%** — never reach 100% until the API confirms `success` - If `elapsed > estimated_max`: freeze at 95%, append `「快了，稍等一下…」` - For video with max=360s: at 120s → 33%, at 250s → 69%, at 400s → 95% (frozen) --- ### Step 3 — Success Notification When task status = `success`: #### For Video Tasks (text_to_video / image_to_video / first_last_frame / reference_image) **3.1 Send video player first** (IM platforms like Feishu will render inline player): ```python # Get result URL from script output or task detail API result = get_task_result(task_id) video_url = result["medias"][0]["url"] # Build caption caption = f"""✅ 视频生成成功！ • 模型：[Model Name] • 耗时：预计 [X~Y]s，实际 [actual]s • 消耗积分：[N pts] [视频描述]""" # Add mismatch hint if user pref conflicts with knowledge-ai recommendation if user_pref_exists and knowledge_recommended_model != used_model: caption += f""" 💡 提示：当前任务也许用 {knowledge_recommended_model} 也会不错（{reason}，{cost} pts）""" # Send video with caption (use message tool if available) message( action="send", media=video_url, # ⚠️ Use HTTPS URL directly, NOT local file path caption=caption ) ``` **Important:** - Hint is **non-intrusive** — does NOT interrupt generation - Only shown when user pref conflicts with knowledge-ai recommendation - User can ignore the hint; video is already delivered **3.2 Then send link as text** (for copying/sharing): ```python # Send link message immediately after video message(action="send", text=f"🔗 视频链接（可复制分享）：\n{video_url}") ``` **⚠️ Critical for video:** - Send video player FIRST (inline preview) - Send text link SECOND (for copying) - Include first-frame thumbnail URL if available: `result["medias"][0]["cover"]` #### For Image Tasks (text_to_image / image_to_image) ```python # Build caption caption = f"""✅ 图片生成成功！ • 模型：[Model Name] • 耗时：预计 [X~Y]s，实际 [actual]s • 消耗积分：[N pts] 🔗 原始链接：{image_url}""" # Add mismatch hint if user pref conflicts with knowledge-ai recommendation if user_pref_exists and knowledge_recommended_model != used_model: caption += f""" 💡 提示：当前任务也许用 {knowledge_recommended_model} 也会不错（{reason}，{cost} pts）""" # Send image with caption message( action="send", media=image_url, caption=caption ) ``` **Important:** - Hint is **non-intrusive** — does NOT interrupt generation - Only shown when user pref conflicts with knowledge-ai recommendation - User can ignore the hint; image is already delivered #### For Music Tasks (text_to_music) Send audio file with player: ``` ✅ 音乐生成成功！ • 模型：[Model Name] • 耗时：预计 [X~Y]s，实际 [actual]s • 消耗积分：[N pts] • 时长：约 [duration] [音频URL或直接发送音频文件] ``` #### For TTS Tasks (text_to_speech) — Full UX Protocol (Steps 0–5) **Step 0 — Initial acknowledgment (normal reply)** First reply with a short acknowledgment, e.g.: 好的，正在帮你把这段文字转成语音。 / OK, converting this text to speech. **Step 1 — Pre-generation (message tool)** Push once: ``` 🔊 开始语音合成，请稍候… • 模型：[Model Name] • 预计耗时：[X ~ Y 秒] • 消耗积分：[N pts] ``` **Step 2 — Progress** Poll every 2–5s. Every 10–15s send: `⏳ 语音合成中… [P]%`，已等待 [elapsed]s，预计最长 [max]s. Cap progress at 95% until API returns success. **Step 3 — Success (message tool)** When `resource_status == 1` and `status != "failed"`, send **media** = `medias[0].url` and **caption**: ``` ✅ 语音合成成功！ • 模型：[Model Name] • 耗时：实际 [actual]s • 消耗积分：[N pts] 🔗 原始链接：[url] ``` Use the **URL** from the API (do not use local file paths). **Step 4 — Failure (message tool)** On failure, send user-friendly message. **TTS error translation (do not expose raw API errors):** | Technical | ✅ Say (CN) | ✅ Say (EN) | |-----------|-------------|-------------| | 401 Unauthorized | 密钥无效或未授权，请至 imaclaw.ai 生成新密钥 | API key invalid; generate at imaclaw.ai | | 4008 Insufficient points | 积分不足，请至 imaclaw.ai 购买积分 | Insufficient points; buy at imaclaw.ai | | Invalid product attribute | 参数配置异常，请稍后重试 | Configuration error, try again later | | Error 6006 / 6010 | 积分或参数不匹配，请换模型或重试 | Points/params mismatch, try another model | | resource_status == 2 / status failed | 语音合成失败，建议换模型或缩短文本 | Synthesis failed, try another model or shorter text | | timeout | 合成超时，请稍后重试 | Timed out, try again later | | Network error | 网络不稳定，请检查后重试 | Network unstable, check and retry | | Text too long (TTS) | 文本过长，请缩短后重试 | Text too long, please shorten | Links: API key — https://www.imaclaw.ai/imaclaw/apikey ；Credits — https://www.imaclaw.ai/imaclaw/subscription **Step 5 — Done** After Step 0–4, no further reply needed. Do not send duplicate confirmations. --- ### Step 4 — Failure Notification When task status = `failed` or any API/network error, send: ``` ❌ [内容类型]生成失败 • 原因：[natural_language_error_message] • 建议改用： - [Alt Model 1]（[特点]，[N pts]） - [Alt Model 2]（[特点]，[N pts]）需要我帮你用其他模型重试吗？ ``` **⚠️ CRITICAL: Error Message Translation** **NEVER show technical error messages to users.** Always translate API errors into natural language. **API key & credits:** 密钥与积分管理入口为 imaclaw.ai（与 imastudio.com 同属 IMA 平台）。Key and subscription management: imaclaw.ai (same IMA platform as imastudio.com). | Technical Error | ❌ Never Say | ✅ Say Instead (Chinese) | ✅ Say Instead (English) | |----------------|-------------|------------------------|------------------------| | `401 Unauthorized` 🆕 | Invalid API key / 401 Unauthorized | ❌ API密钥无效或未授权
💡 **生成新密钥**: https://www.imaclaw.ai/imaclaw/apikey | ❌ API key is invalid or unauthorized
💡 **Generate API Key**: https://www.imaclaw.ai/imaclaw/apikey | | `4008 Insufficient points` 🆕 | Insufficient points / Error 4008 | ❌ 积分不足，无法创建任务
💡 **购买积分**: https://www.imaclaw.ai/imaclaw/subscription | ❌ Insufficient points to create this task
💡 **Buy Credits**: https://www.imaclaw.ai/imaclaw/subscription | | `"Invalid product attribute"` / `"Insufficient points"` | Invalid product attribute | 生成参数配置异常，请稍后重试 | Configuration error, please try again later | | `Error 6006` (credit mismatch) | Error 6006 | 积分计算异常，系统正在修复 | Points calculation error, system is fixing | | `Error 6009` (no matching rule) | Error 6009 | 参数组合不匹配，已自动调整 | Parameter mismatch, auto-adjusted | | `Error 6010` (attribute_id mismatch) | Attribute ID does not match | 模型参数不匹配，请尝试其他模型 | Model parameters incompatible, try another model | | `error 400` (bad request) | error 400 / Bad request | 请求参数有误，请稍后重试 | Invalid request parameters, please try again | | `resource_status == 2` | Resource status 2 / Failed | 生成过程遇到问题，建议换个模型试试 | Generation failed, please try another model | | `status == "failed"` (no details) | Task failed | 这次生成没成功，要不换个模型试试？ | Generation unsuccessful, try a different model? | | `timeout` | Task timed out / Timeout error | 生成时间过长已超时，建议用更快的模型 | Generation took too long, try a faster model | | Network error / Connection refused | Connection refused / Network error | 网络连接不稳定，请检查网络后重试 | Network connection unstable, check network and retry | | Rate limit exceeded | 429 Too Many Requests / Rate limit | 请求过于频繁，请稍等片刻再试 | Too many requests, please wait a moment | | Prompt moderation (Sora only) | Content policy violation | 提示词包含敏感内容，请修改后重试 | Prompt contains restricted content, please modify | | Model unavailable | Model not available / 503 Service Unavailable | 当前模型暂时不可用，建议换个模型 | Model temporarily unavailable, try another model | | **Lyrics format error (Suno only)** 🎵 | Invalid lyrics format | 歌词格式有误，请调整后重试 | Lyrics format error, adjust and retry | | **Prompt too short/long (Music)** 🎵 | Prompt length invalid | 音乐描述过短或过长，请调整到合适长度 (建议20-100字) | Music description too short or long, adjust to appropriate length (20-100 chars recommended) | | **Text too long (TTS)** 🔊 | TTS text length | 文本过长，请缩短后重试 | Text too long, please shorten and retry | **Generic fallback (when error is unknown):** - Chinese: `生成过程遇到问题，请稍后重试或换个模型试试` - English: `Generation encountered an issue, please try again or use another model` **Best Practices:** 1. **Focus on user action**: Tell users what to do next, not what went wrong technically 2. **Be reassuring**: Use phrases like "建议换个模型试试" instead of "失败了" 3. **Avoid blame**: Never say "你的提示词有问题" → say "提示词需要调整一下" 4. **Provide alternatives**: Always suggest 1-2 alternative models in the failure message 5. **🆕 Include actionable links (v1.0.8+)**: For 401/4008 errors, provide clickable links to API key generation or credit purchase pages 6. **🎵 Music-specific (v1.2.0+)**: - For Suno lyrics errors, suggest simplifying lyrics or using auto-generated lyrics (`auto_lyrics=true`) - For prompt length errors, give example length (e.g., "建议20-100字") - For BGM requests, recommend DouBao BGM over Suno 7. **🔊 TTS-specific:** Use the TTS error translation table in "For TTS Tasks (text_to_speech)" above; suggest another model via `--list-models` or shortening text. --- ### Step 5 — Done (No Further Action Needed) 🆕 **After sending Step 3 (success) or Step 4 (failure):** 1. **DO NOT send any additional messages** unless the user asks a follow-up question 2. **The task is complete** — wait for the user's next request 3. **User preference has been saved** (if generation succeeded) 4. **The conversation is ready** for the next generation request **Why this step matters:** - Prevents unnecessary "anything else?" messages that clutter the chat - Allows users to naturally continue the conversation when ready - Respects the asynchronous nature of IM platforms **Exception:** If the user explicitly asks "还有别的吗？" or similar, then respond naturally. --- **🆕 Enhanced Error Handling (v1.0.8):** The Reflection mechanism (3 automatic retries) now provides **specific, actionable suggestions** for common errors: - **401 Unauthorized**: System suggests generating a new API key with clickable link - **4008 Insufficient Points**: System suggests purchasing credits with clickable link - **500 Internal Server Error**: Automatic parameter degradation (size, resolution, duration, quality) - **6009 No Rule Match**: Automatic parameter completion from credit_rules - **6010 Attribute Mismatch**: Automatic credit_rule reselection - **Timeout**: Helpful info with dashboard link for background task status All error handling is **automatic and transparent** — users receive natural language explanations with next steps. **Failure fallback by task type:** | Task Type | Failed Model | First Alt | Second Alt | |-----------|-------------|-----------|------------| | text_to_image | SeeDream 4.5 | Nano Banana2 (4pts, fast) | Nano Banana Pro (10-18pts, premium) | | text_to_image | Nano Banana2 | SeeDream 4.5 (5pts, better quality) | Nano Banana Pro (10-18pts) | | text_to_image | Nano Banana Pro | SeeDream 4.5 (5pts) | Nano Banana2 (4pts, budget) | | image_to_image | SeeDream 4.5 | Nano Banana2 (4pts, fast) | Nano Banana Pro (10pts) | | image_to_image | Nano Banana2 | SeeDream 4.5 (5pts) | Nano Banana Pro (10pts) | | image_to_image | Nano Banana Pro | SeeDream 4.5 (5pts) | Nano Banana2 (4pts) | | text_to_video | Kling O1 | Wan 2.6 (25pts) | Vidu Q2 (5pts) | | text_to_video | Google Veo 3.1 | Kling O1 (48pts) | Sora 2 Pro (122pts) | | text_to_video | Any | Wan 2.6 (25pts, most popular) | Hailuo 2.0 (5pts) | | image_to_video | Wan 2.6 | Kling O1 (48pts) | Hailuo 2.0 i2v (25pts) | | image_to_video | Any | Wan 2.6 (25pts, most popular) | Vidu Q2 Pro (20pts) | | first_last / reference | Kling O1 | Kling 2.6 (80pts) | Veo 3.1 (70pts+) | | **text_to_music** 🎵 | **Suno** | **DouBao BGM (30pts, 背景音乐)** | **DouBao Song (30pts, 歌曲生成)** | | **text_to_music** 🎵 | **DouBao BGM** | **DouBao Song (30pts)** | **Suno (25pts, 功能最强)** | | **text_to_music** 🎵 | **DouBao Song** | **DouBao BGM (30pts)** | **Suno (25pts, 功能最强)** | | **text_to_speech** 🔊 | (any) | Query `--list-models` for alternatives | Use another model_id from product list | **Music-specific failure guidance:** - If Suno fails → Recommend DouBao BGM (for background music) or DouBao Song (for songs) - If DouBao BGM fails → Try DouBao Song first (similar pricing), then Suno (more powerful) - If DouBao Song fails → Try DouBao BGM first (similar pricing), then Suno (more powerful) - For lyrics errors in Suno → Suggest simplifying lyrics or using `auto_lyrics=true` - For prompt length errors → Recommend 20-100 characters **TTS-specific failure guidance:** - If TTS fails → Run `--task-type text_to_speech --list-models` and suggest another model_id; or shorten text / simplify content. Use the TTS error translation table in "For TTS Tasks" above for user-facing messages. --- ## Supported Models at a Glance > **Source: production `GET /open/v1/product/list` (2026-02-27).** Model count reduced significantly. Always query product list API at runtime. ### Image Generation (4 models each) | Category | Name | model_id | Cost | |----------|------|----------|------| | **text_to_image** | SeeDream 4.5 🌟 | `doubao-seedream-4.5` | 5 pts | | text_to_image | Midjourney 🎨 | `midjourney` | 8/10 pts (480p/720p) | | text_to_image | Nano Banana2 💚 | `gemini-3.1-flash-image` | 4/6/10/13 pts | | text_to_image | Nano Banana Pro | `gemini-3-pro-image` | 10/10/18 pts | | **image_to_image** | SeeDream 4.5 🌟 | `doubao-seedream-4.5` | 5 pts | | image_to_image | Midjourney 🎨 | `midjourney` | 8/10 pts (480p/720p) | | image_to_image | Nano Banana2 💚 | `gemini-3.1-flash-image` | 4/6/10/13 pts | | image_to_image | Nano Banana Pro | `gemini-3-pro-image` | 10 pts | **Midjourney attribute_ids**: 5451/5452 (text_to_image), 5453/5454 (image_to_image) **Nano Banana2 size options**: 512px (4pts), 1K (6pts), 2K (10pts), 4K (13pts) **Nano Banana Pro size options**: 1K (10pts), 2K (10pts), 4K (18pts for t2i / 10pts for i2i) #### Image Model Capabilities (Parameter Support) ⚠️ **Critical**: Models have **varying parameter support**. Custom aspect ratios are now **supported by multiple models**. | Model | Custom Aspect Ratio | Max Resolution | Size Options | Notes | |-------|---------------------|----------------|--------------|-------| | **SeeDream 4.5** | ✅ (via virtual params) | 4K (adaptive) | 8 aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9 (5 pts) | | **Nano Banana2** | ✅ **Native support** 🆕 | 4K (4096×4096) | 512px/1K/2K/4K + aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4; size via `attribute_id` | | **Nano Banana Pro** | ✅ **Native support** 🆕 | 4K (4096×4096) | 1K/2K/4K + aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4; size via `attribute_id` | | **Midjourney** 🎨 | ❌ (1:1 only) | 1024px (square) | 480p/720p via `attribute_id` | Fixed 1024x1024, artistic style focus | **Key Capabilities**: - ✅ **Aspect ratio control**: **SeeDream 4.5** (virtual params), **Nano Banana Pro/2** (native support) - ❌ **8K**: Not supported by any model (max is 4K) - ✅ **Size control**: **Nano Banana2**, **Nano Banana Pro**, and **Midjourney** support multiple size options via different `attribute_id`s - ✅ **Budget option**: **Nano Banana2** is the cheapest at 4 pts for 512px, but 4K costs 13pts - 🎨 **Artistic styles**: **Midjourney** excels at creative, artistic, and illustration styles - 💡 **Best value**: **SeeDream 4.5** at 5pts offers aspect ratio flexibility; **Nano Banana2** 512px at 4pts for fastest/cheapest ### Video Generation | Category | Name | model_id | Cost Range | |----------|------|----------|-----------| | **text_to_video (14)** | Wan 2.6 🔥 | `wan2.6-t2v` | 25-120 pts | | | Hailuo 2.3 | `MiniMax-Hailuo-2.3` | 32+ pts | | | Hailuo 2.0 | `MiniMax-Hailuo-02` | 5+ pts | | | Vidu Q2 | `viduq2` | 5-70 pts | | | SeeDance 1.5 Pro | `doubao-seedance-1.5-pro` | 20+ pts | | | Sora 2 Pro | `sora-2-pro` | 122+ pts | | | Kling O1 | `kling-video-o1` | 48-120 pts | | | Kling 2.6 | `kling-v2-6` | 80+ pts | | | Google Veo 3.1 | `veo-3.1-generate-preview` | 70-330 pts | | | Pixverse V5.5 / V5 / V4.5 / V4 / V3.5 | `pixverse` | 12-48 pts | | **image_to_video (14)** | Wan 2.6 🔥 | `wan2.6-i2v` | 25-120 pts | | | Hailuo 2.3 / 2.0 | `MiniMax-Hailuo-2.3/02` | 25-32 pts | | | Vidu Q2 Pro | `viduq2-pro` | 20-70 pts | | | SeeDance 1.5 Pro | `doubao-seedance-1.5-pro` | 47+ pts | | | Sora 2 Pro | `sora-2-pro` | 122+ pts | | | Kling O1 / 2.6 | `kling-video-o1/v2-6` | 48-120 pts | | | Google Veo 3.1 | `veo-3.1-generate-preview` | 70-330 pts | | | Pixverse V5.5-V3.5 | `pixverse` | 12-48 pts | | **first_last_frame (11)** | Kling O1 🌟 | `kling-video-o1` | 48-120 pts | | | Kling 2.6 | `kling-v2-6` | 80+ pts | | | Others (9) | Hailuo 2.0, Vidu Q2 Pro, SeeDance 1.5 Pro, Veo 3.1, Pixverse V5.5-V3.5 | — | | **reference_image (6)** | Kling O1 🌟 | `kling-video-o1` | 48-120 pts | | | Google Veo 3.1 | `veo-3.1-generate-preview` | 70-330 pts | | | Others (4) | Vidu Q2, Pixverse V5.5/V5/V4.5 | — | | text_to_video | SeeDance 1.5 Pro / 1.0 Pro | `doubao-seedance-1.5-pro` / `doubao-seedance-1.0-pro` | 16 / 15 pts | | text_to_video | Sora 2 Pro / Sora 2 | `sora-2-pro` / `sora-2` | 120 / 35 pts | | text_to_video | Kling O1 / 2.6 / 2.5 Turbo / 1.6 | `kling-video-o1` / `kling-v2-6` / `kling-v2-5-turbo` / `kling-v1-6` | 48 / 80 / 24 / 32 pts | | text_to_video | Google Veo 3.1 Fast / 3.1 / 3.0 | `veo-3.1-fast-generate-preview` / `veo-3.1-generate-preview` / `veo-3.0-generate-preview` | 55 / 140 / 280 pts | | text_to_video | Pixverse V3.5–V5.5 | `pixverse` | 12 pts | | image_to_video | Wan 2.6 / 2.6 Flash / 2.5 / 2.2 Plus | `wan2.6-i2v` / `wan2.6-i2v-flash` / `wan2.5-i2v-preview` / `wan2.2-i2v-plus` | 25 / 12 / 12 / 10 pts | | image_to_video | Kling 2.1 Master | `kling-v2-1-master` | 150 pts | | first_last_frame_to_video | Kling O1 | `kling-video-o1` | 70 pts | | reference_image_to_video | Kling O1 / Vidu Q2 / Q1 | `kling-video-o1` / `viduq2` / `viduq1` | 48 / 10 / 25 pts | ### Music Generation | Category | Name | model_id | Cost | Notes | |----------|------|----------|------|-------| | text_to_music | Suno | `sonic` | 25 pts | sonic-v5; custom_mode, lyrics, vocal_gender | | text_to_music | DouBao BGM | `GenBGM` | 30 pts | Background music | | text_to_music | DouBao Song | `GenSong` | 30 pts | Song generation | ### Speech (TTS) — text_to_speech Models and credits are **not fixed**. Always call `GET /open/v1/product/list?category=text_to_speech` (or run the script with `--task-type text_to_speech --list-models`) to get current model_id, attribute_id, and credit. **ima-all-ai has complete TTS capability:** This document and the bundled `ima_create.py` provide full TTS support (routing, parameters, create/poll, UX protocol Steps 0–5, error translation). The **ima-tts-ai** skill is an optional standalone package with the same specification. #### TTS Task Detail — Response Shape Poll `POST /open/v1/tasks/detail` until completion. For TTS, `medias[]` uses the same structure as other IMA audio tasks: | Field | Type | Meaning | |-------|------|--------| | `resource_status` | int or null | 0=处理中, 1=可用, 2=失败, 3=已删除；null 视为 0 | | `status` | string | "pending" / "processing" / "success" / "failed" | | `url` | string | Audio URL when resource_status=1 (mp3/wav) | | `duration_str` | string | Optional, e.g. "12s" | | `format` | string | Optional, e.g. "mp3", "wav" | **Success example:** When all medias have `resource_status == 1` and `status != "failed"`, read `medias[0].url` (or `watermark_url`). Example: `{"medias":[{"resource_status":1,"status":"success","url":"https://cdn.../output.mp3","duration_str":"12s","format":"mp3"}]}`. #### TTS Create Task — Request Shape `task_type`: `"text_to_speech"`. No image input: `src_img_url: []`, `input_images: []`. `prompt` (text to speak) must be inside `parameters[].parameters`, not at top level. Extra fields (e.g. voice_id, speed) come from product `form_config`; pass via `--extra-params` and only include params present in the product’s credit_rules/form_config. #### TTS Common Mistakes | Mistake | Fix | |---------|-----| | prompt at top level | Put prompt inside `parameters[].parameters` (script does this) | | Wrong or missing attribute_id | Always call product list first; use credit_rules | | Single poll | Poll until all medias have resource_status == 1 | | Ignoring status when resource_status=1 | Check status != "failed" | | Sending params not in form_config/credit_rules | Use only params from product list; script reflection strips others on retry | > Always call `GET /open/v1/product/list?category=` first to get the live `attribute_id` and `form_config` defaults required for task creation. There are two equivalent route systems serving the same backend logic: | Route | Auth | Use Case | |-------|------|----------| | `/open/v1/` | `Authorization: Bearer ima_*` only | Third-party / agent access | | `/api/v3/` | Token + API Key (dual auth) | Frontend App | **This skill documents the `/open/v1/` Open API.** All business logic (credit validation, N-flattening, risk control) runs identically on both paths. ## Environment Base URL: `https://api.imastudio.com` Required/recommended headers for all `/open/v1/` endpoints: | Header | Required | Value | Notes | |--------|----------|-------|-------| | `Authorization` | ✅ | `Bearer ima_your_api_key_here` | API key authentication | | `x-app-source` | ✅ | `ima_skills` | Fixed value — identifies skill-originated requests | | `x_app_language` | recommended | `en` / `zh` | Product label language; defaults to `en` if omitted | ``` Authorization: Bearer ima_your_api_key_here x-app-source: ima_skills x_app_language: en ``` --- ## 📤 When to Upload Images (Quick Reference) **The IMA Open API does NOT accept raw bytes or base64 images. All image inputs must be public HTTPS URLs.** | Task Type | Input Required? | Upload Before Create? | Notes | |-----------|----------------|----------------------|-------| | **text_to_image** | ❌ No | — | Prompt only | | **image_to_image** | ✅ Yes (1 image) | ✅ Upload first | Single input image | | **text_to_video** | ❌ No | — | Prompt only | | **image_to_video** | ✅ Yes (1 image) | ✅ Upload first | Single input image | | **first_last_frame_to_video** | ✅ Yes (2 images) | ✅ Upload first | First + last frame | | **reference_image_to_video** | ✅ Yes (1+ images) | ✅ Upload first | Reference image(s) | | **text_to_music** | ❌ No | — | Prompt only | | **text_to_speech** | ❌ No | — | Prompt only (text to speak) | **Upload flow:** 1. User provides local file path or bytes → call `prepare_image_url()` (see section below) 2. User provides HTTPS URL → use directly, no upload needed 3. Use the returned CDN URL (`fdl`) as the value for `input_images` / `src_img_url` **Example workflow (image_to_image):** ```python # User provides local file image_url = prepare_image_url("/path/to/photo.jpg", api_key) # → Returns: https://ima-ga.esxscloud.com/webAgent/privite/2026/02/27/..._uuid.jpeg # Then create task with this URL create_task( task_type="image_to_image", input_images=[image_url], # Use uploaded URL prompt="turn into oil painting" ) ``` --- ## ⚠️ MANDATORY: Always Query Product List First > **CRITICAL**: You MUST call `/open/v1/product/list` BEFORE creating any task. > The `attribute_id` field is REQUIRED in the create request. If it is `0` or missing, you get: > `"Invalid product attribute"` → `"Insufficient points"` → task fails completely. > **NEVER construct a create request from the model table alone. Always fetch the product first.** ### How to get attribute_id (all task types) ```python # Query product list with the correct category GET /open/v1/product/list?app=ima&platform=web&category= # task_type: text_to_image | image_to_image | text_to_video | image_to_video | # first_last_frame_to_video | reference_image_to_video | text_to_music | text_to_speech # Walk the V2 tree to find your target model (type=3 leaf nodes only) for group in response["data"]: for version in group.get("children", []): if version["type"] == "3" and version["model_id"] == target_model_id: attribute_id = version["credit_rules"][0]["attribute_id"] credit = version["credit_rules"][0]["points"] model_version = version["id"] # = version_id / model_version model_name = version["name"] form_defaults = {f["field"]: f["value"] for f in version["form_config"]} break ``` ### Quick Reference: Known attribute_ids > Pre-queried values for convenience. **Always call the product list at runtime for accuracy.** | Model | Task Type | model_id | attribute_id | credit | Notes | |-------|-----------|----------|-------------|--------|-------| | **text_to_image** |||||| | | SeeDream 4.5 | text_to_image | `doubao-seedream-4.5` | **2341** | 5 pts | Default, balanced | | Nano Banana Pro (1K) | text_to_image | `gemini-3-pro-image` | **2399** | 10 pts | 1024×1024 | | Nano Banana Pro (2K) | text_to_image | `gemini-3-pro-image` | **2400** | 10 pts | 2048×2048 | | Nano Banana Pro (4K) | text_to_image | `gemini-3-pro-image` | **2401** | 18 pts | 4096×4096 | | **text_to_video** |||||| | | Wan 2.6 (720P, 5s) | text_to_video | `wan2.6-t2v` | **2057** | 25 pts | Default, balanced | | Wan 2.6 (1080P, 5s) | text_to_video | `wan2.6-t2v` | **2058** | 40 pts | — | | Wan 2.6 (720P, 10s) | text_to_video | `wan2.6-t2v` | **2059** | 50 pts | — | | Wan 2.6 (1080P, 10s) | text_to_video | `wan2.6-t2v` | **2060** | 80 pts | — | | Wan 2.6 (720P, 15s) | text_to_video | `wan2.6-t2v` | **2061** | 75 pts | — | | Wan 2.6 (1080P, 15s) | text_to_video | `wan2.6-t2v` | **2062** | 120 pts | — | | Kling O1 (5s, std) | text_to_video | `kling-video-o1` | **2313** | 48 pts | Latest Kling | | Kling O1 (5s, pro) | text_to_video | `kling-video-o1` | **2314** | 60 pts | — | | Kling O1 (10s, std) | text_to_video | `kling-video-o1` | **2315** | 96 pts | — | | Kling O1 (10s, pro) | text_to_video | `kling-video-o1` | **2316** | 120 pts | — | | **text_to_music** |||||| | | Suno (sonic-v4) | text_to_music | `sonic` | **2370** | 25 pts | Default | | DouBao BGM | text_to_music | `GenBGM` | **4399** | 30 pts | — | | DouBao Song | text_to_music | `GenSong` | **4398** | 30 pts | — | | **All others** | any | — | → query `/open/v1/product/list` | — | Always runtime query | ⚠️ **Production warning**: `attribute_id` and `credit` values change frequently in production. Always call `/open/v1/product/list` at runtime; above table is pre-queried reference only (2026-02-27). ### Common Mistakes (and resulting errors) | Mistake | Error | |---------|-------| | `attribute_id` is `0` or missing | `"Invalid product attribute"` + `"Insufficient points"` | | `attribute_id` outdated (production changed) | Same errors; always query product list first | | **`attribute_id` doesn't match parameter combination** | **Error 6010: "Attribute ID does not match the calculated rule"** | | `prompt` at outer `parameters[]` level | Prompt ignored; wrong routing | | `cast` missing from inner `parameters.parameters` | Billing validation failure | | `credit` value wrong or missing | Error 6006 | | `model_name` / `model_version` missing | Wrong backend routing | | Skipped product list, used table values directly | All of the above | **⚠️ Critical for Google Veo 3.1 and multi-rule models:** Models like Google Veo 3.1 have **multiple `credit_rules`**, each with a different `attribute_id` for different parameter combinations: - `720p + 4s + optimized` → attribute_id A - `720p + 8s + optimized` → attribute_id B - `4K + 4s + high` → attribute_id C The script automatically selects the correct `attribute_id` by matching your parameters (`duration`, `resolution`, `compression_quality`, `generate_audio`) against each rule's `attributes`. If the match fails, you get error 6010. **Fix**: The bundled script now checks these video-specific parameters for smart credit_rule selection. Always use the script, not manual API construction. --- ## Core Flow ``` 1. GET /open/v1/product/list?app=ima&platform=web&category= → REQUIRED: Get attribute_id, credit, model_version, model_name, form_config defaults [If input image required] 2. Upload image → get public HTTPS URL → See "Image Upload" section below 3. POST /open/v1/tasks/create → Must include: attribute_id, model_name, model_version, credit, cast, prompt (nested!) 4. POST /open/v1/tasks/detail {"task_id": "..."} → Poll until medias[].resource_status == 1 → Extract url from completed media ``` --- ## Image Upload (Required Before Image Tasks) **The IMA Open API does NOT accept raw bytes or base64 images. All image inputs must be public HTTPS URLs.** When a user provides an image (local file, bytes, base64), you must upload it first and get a URL. This is exactly what the IMA frontend does before every image task. ### Real Upload Flow (from IMA Frontend Source) The frontend uses a **two-step presigned URL flow** via the IM platform: ``` Step 1: GET /api/rest/oss/getuploadtoken → returns { ful, fdl } ful = presigned PUT URL (upload destination, expires ~7 days) fdl = final CDN download URL (use this as input_images value) Step 2: PUT {ful} with raw image bytes + Content-Type header → image is stored in Aliyun OSS: zhubite-imagent-bot.oss-us-east-1.aliyuncs.com → accessible via CDN: https://ima-ga.esxscloud.com/... ``` ### Step 1: Get Upload Token ``` GET https://imapi-qa.liveme.com/api/rest/oss/getuploadtoken ``` Required query parameters (11 total — sourced directly from frontend `generateUploadInfo`): | Parameter | Example | Description | |-----------|---------|-------------| | `appUid` | `ima_xxx...` | **Use IMA API key directly** — no separate login needed | | `appId` | `webAgent` | App identifier (fixed) | | `appKey` | `32jdskjdk320eew` | App secret (fixed, used for `sign` generation) | | `cmimToken` | `ima_xxx...` | **Use IMA API key directly** — same as appUid | | `sign` | `117CF6CF...` | IM auth HMAC: `SHA1("webAgent\|32jdskjdk320eew\|{timestamp}\|{nonce}").upper()` | | `timestamp` | `1772042430` | Unix timestamp (seconds), generated per request | | `nonce` | `CxI1FLI5ajLJZ1jlxZmeg` | Random nonce string, generated per request | | `fService` | `privite` | Fixed: storage service type | | `fType` | `picture` | `picture` for images, `video`, `audio` | | `fSuffix` | `jpeg` | File extension: `jpeg`, `png`, `mp4`, `mp3` | | `fContentType` | `image/jpeg` | MIME type of the file | > **简化认证**：直接使用 IMA API key 填充 `appUid` 和 `cmimToken` 参数，无需单独获取凭证。 Response: ```json { "ful": "https://zhubite-imagent-bot.oss-us-east-1.aliyuncs.com/webAgent/privite/2026/02/26/..._uuid.jpeg?Expires=...&OSSAccessKeyId=...&Signature=...", "fdl": "https://ima-ga.esxscloud.com/webAgent/privite/2026/02/26/..._uuid.jpeg", "ful_expire": "...", "fdl_expire": "...", "fdl_key": "..." } ``` ### Step 2: Upload Image via Presigned URL ``` PUT {ful} Content-Type: image/jpeg Body: [raw image bytes] ``` No auth headers needed — the presigned URL already encodes the credentials. ### Step 3: Use `fdl` as the Image URL After the PUT succeeds, use `fdl` (the CDN URL) as the value for `input_images` / `src_img_url`. ### Python Implementation ```python import hashlib, time, uuid, requests, mimetypes # ── 🌐 IMA Upload Service Endpoint (IMA-owned, for image/video uploads) ────── IMA_IM_BASE = "https://imapi-qa.liveme.com" # prod: https://imapi.liveme.com # ── 🔑 Hardcoded APP_KEY (Public, Shared Across All Users) ────────────────── # This APP_KEY is a PUBLIC identifier used by IMA Studio's image/video upload # service. It is NOT a secret—it's intentionally shared across all users and # embedded in the IMA web frontend. This key is used to generate HMAC signatures # for upload token requests, but your IMA API key (ima_xxx...) is the ACTUAL # authentication credential. Think of APP_KEY as a "client ID" rather than a # "client secret." # # ⚠️ Security Note: Your ima_xxx... API key is the sensitive credential. It is # sent to imapi.liveme.com as query parameters (appUid, cmimToken). Always use # test keys for experiments and rotate your API key regularly. # # 📖 See SECURITY.md for complete disclosure and network verification guide. APP_ID = "webAgent" APP_KEY = "32jdskjdk320eew" # Public shared key (used for HMAC sign generation) APP_UID = "" # POST /api/v3/login/app → data.user_id APP_TOKEN = "" # POST /api/v3/login/app → data.token def _gen_sign() -> tuple[str, str, str]: """Generate per-request (sign, timestamp, nonce).""" nonce = uuid.uuid4().hex[:21] ts = str(int(time.time())) raw = f"{APP_ID}|{APP_KEY}|{ts}|{nonce}" sign = hashlib.sha1(raw.encode()).hexdigest().upper() return sign, ts, nonce def get_upload_token(app_uid: str, app_token: str, suffix: str, content_type: str) -> dict: """Step 1: Get presigned upload URL from IMA's upload service. Calls GET imapi.liveme.com/api/rest/oss/getuploadtoken with exactly 11 params. Returns: { "ful": "", "fdl": "" } Args: app_uid: Your IMA API key (ima_xxx...), used as appUid parameter app_token: Your IMA API key (ima_xxx...), used as cmimToken parameter suffix: File extension (jpeg, png, mp4, mp3) content_type: MIME type (image/jpeg, video/mp4, etc.) Security Note: Your IMA API key (ima_xxx...) is sent to imapi.liveme.com as query parameters (appUid, cmimToken). This is IMA Studio's image/video upload service, separate from the main api.imastudio.com API. Both domains are owned by IMA Studio—this is part of IMA's microservices architecture. Why two domains? - api.imastudio.com: Core AI generation API (product list, task creation) - imapi.liveme.com: Specialized upload service (presigned URL generation) Your API key grants access to both services. For security verification, see SECURITY.md section "Network Traffic Verification." """ sign, ts, nonce = _gen_sign() r = requests.get( f"{IMA_IM_BASE}/api/rest/oss/getuploadtoken", params={ # App Key params "appUid": app_uid, # APP_UID "appId": APP_ID, "appKey": APP_KEY, "cmimToken": app_token, # APP_TOKEN "sign": sign, "timestamp": ts, "nonce": nonce, # File params "fService": "privite", # fixed "fType": "picture", # picture / video / audio "fSuffix": suffix, # jpeg / png / mp4 / mp3 "fContentType": content_type, }, ) r.raise_for_status() return r.json()["data"] def upload_image_to_oss(image_bytes: bytes, content_type: str, ful: str) -> None: """Step 2: PUT image bytes to the presigned OSS URL. No auth needed.""" resp = requests.put(ful, data=image_bytes, headers={"Content-Type": content_type}) resp.raise_for_status() def prepare_image_url(source, api_key: str) -> str: """ Full workflow: upload any image and return the CDN URL (fdl). Args: source: file path (str), raw bytes, or already-public HTTPS URL api_key: IMA API key for upload authentication Returns: public HTTPS CDN URL ready to use as input_images value """ # Already a public URL → use directly, no upload needed if isinstance(source, str) and source.startswith("https://"): return source # Read file bytes if isinstance(source, str): ext = source.rsplit(".", 1)[-1].lower() if "." in source else "jpeg" with open(source, "rb") as f: image_bytes = f.read() content_type = mimetypes.guess_type(source)[0] or "image/jpeg" else: image_bytes = source ext = "jpeg" content_type = "image/jpeg" # Step 1: Get presigned URL using API key directly token_data = get_upload_token(api_key, ext, content_type) ful = token_data["ful"] fdl = token_data["fdl"] # Step 2: Upload to OSS upload_image_to_oss(image_bytes, content_type, ful) # Step 3: Return CDN URL return fdl # use this as input_images / src_img_url value ``` > **OSS path format**: `webAgent/privite/{YYYY}/{MM}/{DD}/{timestamp}_{uid}_{uuid}.{ext}` > **CDN base**: `https://ima-ga.esxscloud.com/` > **OSS bucket**: `zhubite-imagent-bot.oss-us-east-1.aliyuncs.com` --- ## Quick Reference ### Task Types (category values) | category | Capability | Input | |----------|------------|-------| | `text_to_image` | Text → Image | prompt | | `image_to_image` | Image → Image | prompt + input_images | | `text_to_video` | Text → Video | prompt | | `image_to_video` | Image → Video | prompt + input_images | | `first_last_frame_to_video` | First+Last Frame → Video | prompt + src_img_url[2] | | `reference_image_to_video` | Reference Image → Video | prompt + src_img_url[1+] | | `text_to_music` | Text → Music | prompt | | `text_to_speech` | Text → Speech | prompt (text to speak) | ### Detail API status values Each media in `medias[]` has two fields: | Field | Type | Values | Description | |-------|------|--------|-------------| | **`resource_status`** | int (or `null`) | `0`, `1`, `2`, `3` | 0=处理中, 1=可用, 2=失败, 3=已删除。API 可能返回 `null`，需当作 0。 | | **`status`** | string | `"pending"`, `"processing"`, `"success"`, `"failed"` | 任务状态文案。轮询时以 `resource_status` 为准；`status == "failed"` 表示失败。 | Poll on `resource_status` first, then ensure `status` is not `"failed"`: | `resource_status` | `status` | Meaning | Action | |-------------------|----------|---------|--------| | `0` or **`null`** | `pending` / `processing` | 处理中 | Keep polling; do not stop (null = 0) | | `1` | `success` (or `completed`) | **完成** | Read `url`; stop only when **all** medias are 1 | | `1` | `failed` | 失败 (status 优先) | Stop, handle error | | `2` | any | 失败 | Stop, handle error | | `3` | any | 已删除 | Stop | > **Important**: (1) Treat `resource_status: null` as 0. (2) Stop only when **all** medias have `resource_status == 1`. (3) When `resource_status=1`, still check `status != "failed"`. --- ## API 1: Product List ``` GET /open/v1/product/list?app=ima&platform=web&category=text_to_image ``` Internally calls downstream `/v1/products/listv2`. Returns a **V2 tree structure**: `type=2` nodes are model groups, `type=3` nodes are versions (leaves). Only `type=3` nodes contain `credit_rules` and `form_config`. > `webAgent` is auto-converted to `ima` by the gateway — you can use either value for `app`. ```json [ { "id": "SeeDream", "type": "2", "name": "SeeDream", "model_id": "", "children": [ { "id": "doubao-seedream-4-0-250828", "type": "3", "name": "SeeDream 4.0", "model_id": "doubao-seedream-4.0", "credit_rules": [ { "attribute_id": 332, "points": 5, "attributes": { "default": "enabled" } } ], "form_config": [ { "field": "size", "type": "tags", "value": "1K", "options": [{"label":"1K","value":"1K"}, {"label":"2K","value":"2K"}] } ] } ] } ] ``` **How to pick a version for task creation:** 1. Traverse nodes to find `type=3` leaves (versions) 2. Use `model_id` and `id` (= `model_version`) from the leaf 3. Pick `credit_rules[].attribute_id` matching your desired quality/size (`attributes` field shows the config) 4. Use `form_config[].value` as default `parameters` values > `credit_rules[].attribute_id` → required for task creation as `attribute_id`. > `credit_rules[].points` → required for task creation as `credit` and `cast.points`. --- ## API 2: Create Task ``` POST /open/v1/tasks/create ``` ### Request Structure ```json { "task_type": "text_to_image", "enable_multi_model": false, "src_img_url": [], "upload_img_src": "", "parameters": [ { "attribute_id": 8538, "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "model_version": "doubao-seedream-4-5-251128", "app": "ima", "platform": "web", "category": "text_to_image", "credit": 5, "parameters": { "prompt": "a beautiful mountain sunset, photorealistic", "size": "4k", "n": 1, "input_images": [], "cast": {"points": 5, "attribute_id": 8538} } } ] } ``` ### Field Reference | Field | Required | Description | |-------|----------|-------------| | `task_type` | ✅ | Must match `parameters[].category` | | `parameters[].attribute_id` | ✅ | From `credit_rules[].attribute_id` in product list | | `parameters[].model_id` | ✅ | From `type=3` leaf node `model_id` | | `parameters[].model_version` | ✅ | From `type=3` leaf node `id` | | `parameters[].app` | ✅ | Use `ima` (or `webAgent`, auto-converted) | | `parameters[].platform` | ✅ | Use `web` | | `parameters[].category` | ✅ | Must match top-level `task_type` | | `parameters[].credit` | ✅ | Must equal `credit_rules[].points`. Error 6006 if wrong. | | `parameters[].parameters.prompt` | ✅ | The actual prompt text used by downstream service | | `parameters[].parameters.cast` | ✅ | `{"points": N, "attribute_id": N}` — mirrors credit | | `parameters[].parameters.n` | ✅ | Number of outputs (usually `1`). Gateway flattens N>1 into separate resources. | | `parameters[].parameters.input_images` | image tasks | Array of input image URLs | | top-level `src_img_url` | multi-image | Array for first_last_frame / reference tasks | ### N-Field Flattening (Gateway Internal Logic) When `n > 1`, the gateway automatically: 1. Generates `n` independent `resourceBizId` values 2. Deducts credits `n` times (one per resource) 3. Creates `n` separate tasks in the downstream service Response `medias[]` will contain `n` items. Poll until **all** have `resource_status == 1`. ### Response ```json { "code": 0, "data": { "id": "task_abc123", "biz_id": "biz_xxx", "task_type": "text_to_image", "medias": [], "generate_count": 1, "created_at": 1700000000000, "timeout_at": 1700000300000 } } ``` `data.id` = task ID for polling. `timeout_at` = Unix ms deadline. --- ## API 3: Task Detail (Poll) ``` POST /open/v1/tasks/detail {"task_id": ""} ``` Poll every 2–5s (8s+ for video). Completed response: ```json { "id": "task_abc", "medias": [{ "resource_status": 1, "status": "success", "url": "https://cdn.../output.jpg", "cover": "https://cdn.../cover.jpg", "format": "jpg", "width": 1024, "height": 1024 }] } ``` **Polling stop condition (must implement exactly):** - Treat `resource_status: null` (or missing) as **0** (processing). Do **not** stop when you see null; backend may serialize Go `*int` as null. - Stop **only when ALL** `medias[].resource_status == 1` and no `status == "failed"`. If you return on the first media with `resource_status == 1` while others are still 0, the task is not fully done and you will keep polling or get inconsistent state. - Stop immediately if any `status == "failed"` or `resource_status == 2` or `resource_status == 3`. --- ## Task Type Examples ### text_to_image ✅ Verified No image input. `src_img_url: []`, `input_images: []`. See API 2 for full example. ### text_to_video ✅ Verified Extra fields vs text_to_image — all from `form_config` defaults: ```json { "task_type": "text_to_video", "src_img_url": [], "parameters": [{ "attribute_id": 4838, "model_id": "wan2.6-t2v", "model_name": "Wan 2.6", "model_version": "wan2.6-t2v", "category": "text_to_video", "credit": 3, "app": "ima", "platform": "web", "parameters": { "prompt": "a puppy dancing happily, sunny meadow", "negative_prompt": "", "prompt_extend": false, "duration": 5, "resolution": "1080P", "aspect_ratio": "16:9", "shot_type": "single", "seed": -1, "n": 1, "input_images": [], "cast": {"points": 3, "attribute_id": 4838} } }] } ``` > Video-specific fields from `form_config`: `duration` (seconds), `resolution`, `aspect_ratio`, `shot_type`, `negative_prompt`, `prompt_extend`. > Poll every 8s (video generation is slower). Response `medias[].cover` = first-frame thumbnail. ### text_to_music No image input. `src_img_url: []`, `input_images: []`. ### image_to_image ✅ Verified ```json { "task_type": "image_to_image", "src_img_url": ["https://...input.jpg"], "parameters": [{ "attribute_id": 8560, "model_id": "doubao-seedream-4.5", "model_version": "doubao-seedream-4-5-251128", "category": "image_to_image", "credit": 5, "app": "ima", "platform": "web", "parameters": { "prompt": "turn into oil painting style", "size": "4k", "n": 1, "input_images": ["https://...input.jpg"], "cast": {"points": 5, "attribute_id": 8560} } }] } ``` > ⚠️ `size` must be from `form_config` options (e.g. `"2k"`, `"4k"`, `"2048x2048"`). `"adaptive"` is NOT valid for SeeDream 4.5 i2i — causes error 400. > Top-level `src_img_url` **and** `parameters.input_images` must both contain the input image URL. > Some i2i models (e.g. `doubao-seededit-3.0-i2i`) may not be available in test environments — fall back to SeeDream 4.5. ### image_to_video / first_last_frame_to_video / reference_image_to_video ```json { "src_img_url": ["https://first-frame.jpg", "https://last-frame.jpg"] } ``` Index 0 = first frame (or reference), index 1 = last frame (first_last_frame only). --- ## Common Mistakes | Mistake | Fix | |---------|-----| | `attribute_id` not from `credit_rules` | Always fetch product list first | | `credit` value wrong | Must exactly match `credit_rules[].points` — error 6006 | | `prompt` at wrong location | Put prompt in `parameters[].parameters.prompt` (nested), not only at top level | | Polling `biz_id` instead of `id` | Use `id` (task ID) for `/tasks/detail` | | Single-poll instead of loop | Poll until `resource_status == 1` for ALL medias | | Missing `app` / `platform` in parameters | Required fields — use `ima` / `web` | | `category` mismatch | `parameters[].category` must match top-level `task_type` | | `resource_status == 2` not handled | Check for failure, don't loop forever | | `status == "failed"` ignored | `resource_status=1` + `status="failed"` means actual failure | | `n > 1` and only checking first media | All `n` media items must reach `resource_status == 1` | --- ## Complete Python Example See the Python example sections throughout this documentation for implementation guidance covering all 7 task types. --- ## Supported Models & Search Terms **Image:** SeeDream 4.5 (see dream), Midjourney (MJ), Nano Banana 2, Nano Banana Pro **Video:** Wan 2.6, Kling O1, Kling 2.6, Google Veo 3.1 (veo), Sora 2 Pro, Pixverse V5.5, Hailuo 2.0, Hailuo 2.3, MiniMax Hailuo, SeeDance 1.5 Pro, Vidu Q2 **Music:** Suno sonic v4, Suno sonic v5, DouBao BGM (GenBGM), DouBao Song (GenSong) **TTS:** seed-tts-2.0 (seed tts, text-to-speech) **Capabilities:** multimodal AI creation, all-in-one, image generation, video generation, music generation, text-to-speech, text-to-image, image-to-video, text-to-music