---
name: IMA Studio TTS
description: "TTS (text-to-speech) via IMA Open API with seed-tts-2.0. Voice synthesis, speech from text, dubbing, audio content creation. Output: audio URL (mp3/wav). Flow: query products, create task, poll until done. Requires IMA API key."
---

# IMA TTS (Text-to-Speech)

## Overview

Call IMA Open API to create **text-to-speech** audio. Same flow as other IMA creation skills: **query products → create task → poll until done**. Task type is `text_to_speech`. **This skill targets seed-tts-2.0 only** — seed-tts-1.1 is not supported; the script defaults to `seed-tts-2.0` when no model is specified.

## ⚙️ How This Skill Works

This skill uses a bundled Python script `scripts/ima_tts_create.py` to call the IMA Open API:

- Sends **text (prompt)** to `https://api.imastudio.com`
- Uses `--user-id` only locally for preference storage
- Returns an **audio URL** when synthesis is complete
- **Reflection mechanism**: on create failure, retries up to 3 times with parameter adjustments

**What gets sent to IMA:** prompt (text to speak), model selection, parameters (e.g. voice_id, speed). **Not sent in the request body:** the API key (it travels only in the `Authorization` header). `user_id` stays local and is never sent.

### Agent Execution

Use the bundled script:

```bash
# List available TTS models (optional; default is seed-tts-2.0)
python3 {baseDir}/scripts/ima_tts_create.py --api-key $IMA_API_KEY --list-models

# Generate speech (default model: seed-tts-2.0; omit --model-id to use default)
python3 {baseDir}/scripts/ima_tts_create.py \
  --api-key $IMA_API_KEY \
  --model-id seed-tts-2.0 \
  --prompt "Text to be spoken here." \
  --user-id {user_id} \
  --output-json
```

The script outputs JSON; parse it for `url` and pass that URL to the user following the User Experience Protocol below.
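
The parsing step can be sketched as follows. This is a minimal illustration, assuming the script prints a single JSON object with a top-level `url` key on stdout; the helper name `extract_audio_url` is hypothetical and the full output schema is not documented here.

```python
import json


def extract_audio_url(stdout: str) -> str:
    """Return the audio URL from the script's JSON output.

    Assumption: stdout holds one JSON object with a top-level "url" key.
    """
    result = json.loads(stdout)
    url = result.get("url")
    if not url:
        raise ValueError("script output has no 'url' field")
    return url
```

In practice the `stdout` string would come from running the script with `--output-json`, for example through `subprocess.run(..., capture_output=True, text=True)`.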

---

## Environment

Base URL: `https://api.imastudio.com`

| Header | Required | Value |
|--------|----------|-------|
| `Authorization` | ✅ | `Bearer ima_your_api_key_here` |
| `x-app-source` | ✅ | `ima_skills` |
| `x_app_language` | recommended | `en` / `zh` |
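
As a small sketch of how these headers might be assembled, the hypothetical helper `build_headers` below takes the key as an argument (for example from the `IMA_API_KEY` environment variable). The `Content-Type` entry follows the Python example later in this document.

```python
def build_headers(api_key: str, language: str = "en") -> dict:
    """Build the request headers listed in the table above."""
    if not api_key:
        raise ValueError("an IMA API key is required")
    return {
        "Authorization": f"Bearer {api_key}",
        "x-app-source": "ima_skills",
        "x_app_language": language,  # recommended: "en" or "zh"
        "Content-Type": "application/json",
    }
```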

---

## ⚠️ MANDATORY: Always Query Product List First

You **MUST** call `/open/v1/product/list` with `category=text_to_speech` before creating any task. `attribute_id` is required: if it is 0 or missing, the API returns `"Invalid product attribute"` and the task fails.

```
GET /open/v1/product/list?app=ima&platform=web&category=text_to_speech
```

Then traverse the V2 tree: `type=2` = model groups, `type=3` = versions (leaves). Only `type=3` nodes have `credit_rules` and `form_config`. Use a leaf’s `model_id`, `id` (= model_version), and `credit_rules[0].attribute_id` / `points` for create.
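
The traversal described above could look roughly like this. It is a sketch under one loud assumption: child nodes are nested under a `children` key, which this document does not confirm. The helper names `iter_version_leaves` and `create_fields` are hypothetical.

```python
from typing import Any, Iterator


def iter_version_leaves(node: Any) -> Iterator[dict]:
    """Yield every type=3 (version) node in the product tree.

    Assumption: children live under a "children" key.
    """
    if isinstance(node, list):
        for child in node:
            yield from iter_version_leaves(child)
    elif isinstance(node, dict):
        if node.get("type") == 3:
            yield node
        else:
            yield from iter_version_leaves(node.get("children"))


def create_fields(leaf: dict) -> dict:
    """Extract the values the create call needs from a leaf node."""
    rules = leaf.get("credit_rules") or []
    if not rules or not rules[0].get("attribute_id"):
        raise ValueError("leaf has no usable attribute_id")
    return {
        "model_id": leaf["model_id"],
        "model_version": leaf["id"],
        "attribute_id": rules[0]["attribute_id"],
        "credit": rules[0]["points"],
    }
```

Rejecting a missing or zero `attribute_id` locally surfaces the problem before the API responds with `"Invalid product attribute"`.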

---

## Core Flow

```
1. GET /open/v1/product/list?app=ima&platform=web&category=text_to_speech
   → Get attribute_id, credit, model_version, form_config

2. POST /open/v1/tasks/create
   → task_type: "text_to_speech", parameters[].parameters.prompt = text to speak

3. POST /open/v1/tasks/detail  { "task_id": "..." }
   → Poll every 2–5s until medias[].resource_status == 1 and status != "failed"
   → Read medias[].url (and optional duration_str, format)
```

---

## Task Detail API — Actual Response Shape

Poll `POST /open/v1/tasks/detail` until completion. Response uses the same structure as other IMA audio tasks:

| Field | Type | Meaning |
|-------|------|--------|
| `resource_status` | int or null | 0 = processing, 1 = available, 2 = failed, 3 = deleted; treat null as 0 |
| `status` | string | "pending" / "processing" / "success" / "failed" |
| `url` | string | Audio URL when resource_status=1 (mp3/wav) |
| `duration_str` | string | Optional, e.g. "30s" |
| `format` | string | Optional, e.g. "mp3", "wav" |

**Completed success example:**

```json
{
  "id": "task_xxx",
  "medias": [{
    "resource_status": 1,
    "status": "success",
    "url": "https://cdn.../output.mp3",
    "duration_str": "12s",
    "format": "mp3"
  }]
}
```

**Rules:**

- Treat `resource_status: null` as 0 (processing).
- Success only when **all** medias have `resource_status == 1` and `status != "failed"`.
- On `resource_status == 2` or `status == "failed"`, stop and handle error (e.g. use `error_msg` / `remark`).
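
These rules can be condensed into one small function. The sketch below is illustrative only; the name `medias_state` and its three return labels are hypothetical, and an empty `medias` list is treated as still processing.

```python
def medias_state(medias: list) -> str:
    """Classify a task's medias as "failed", "success", or "processing"."""
    for media in medias:
        failed_status = (media.get("status") or "").lower() == "failed"
        if media.get("resource_status") == 2 or failed_status:
            return "failed"
    # A null resource_status counts as 0 (processing).
    if medias and all((m.get("resource_status") or 0) == 1 for m in medias):
        return "success"
    return "processing"
```

Checking for failure first means a media with `resource_status == 1` but `status == "failed"` is never reported as a success.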

---

## API 2: Create Task

```
POST /open/v1/tasks/create
```

**text_to_speech** — no image input. `src_img_url: []`, `input_images: []`.

```json
{
  "task_type": "text_to_speech",
  "enable_multi_model": false,
  "src_img_url": [],
  "parameters": [{
    "attribute_id":  "<from credit_rules>",
    "model_id":      "<model_id>",
    "model_name":    "<model_name>",
    "model_version": "<version_id>",
    "app":           "ima",
    "platform":      "web",
    "category":      "text_to_speech",
    "credit":        "<points>",
    "parameters": {
      "prompt":       "Text to be spoken.",
      "n":            1,
      "input_images": [],
      "cast":         {"points": "<points>", "attribute_id": "<attribute_id>"}
    }
  }]
}
```

`prompt` must be inside `parameters[].parameters`, not at top level. Extra fields (e.g. voice_id, speed) come from product `form_config`; include only those present in the product’s credit_rules/form_config.

Response: `data.id` = task_id for polling.

---

## Supported Task Type & Models

| category | Capability | Input |
|----------|------------|-------|
| `text_to_speech` | Text → Speech | prompt (text to speak) |

**Models:** This skill supports **seed-tts-2.0** only (seed-tts-1.1 is not supported). The script defaults to `--model-id seed-tts-2.0` when none is provided. For current `attribute_id` and `credit`, the script reads from the product list at runtime.

### seed-tts-2.0 — Verified request parameters

The following `parameters[].parameters` shape has been verified to work for **seed-tts-2.0** (attribute_id/credit come from product list and may differ by app/platform):

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `prompt` | string | ✅ | Text to synthesize. |
| `n` | int | ✅ | Usually 1. |
| `model` | string | ✅ | Sub-model: `seed-tts-2.0-expressive` (default) or `seed-tts-2.0-standard`. |
| `speaker` | string | optional | Speaker (voice) ID, e.g. `zh_male_sophie_uranus_bigtts` (the native `voice_type` from the [voice list 1257544](https://www.volcengine.com/docs/6561/1257544)). **Note:** use the native format (e.g. `zh_male_*_uranus_bigtts`); the `BV*_streaming` format is not supported. |
| `audio_params` | object | optional | `emotion`, `speech_rate` (speed, range [-50, 100]), `loudness_rate` (volume, range [-50, 100]), etc.; see the [1598757 request body](https://www.volcengine.com/docs/6561/1598757?lang=zh). |
| `additions` | object | optional | e.g. `{"explicit_language": "crosslingual", "context_texts": []}`. |
| `cast` | object | ✅ | `{"points": <credit>, "attribute_id": <attribute_id>}` from product list. |
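
As a hedged illustration of the table above, the hypothetical helper `build_tts_parameters` assembles the inner `parameters[].parameters` object and checks the documented ranges before anything is sent. It covers only the fields listed in the table; `points` and `attribute_id` must still come from the product list.

```python
from typing import Optional


def build_tts_parameters(
    prompt: str,
    points: int,
    attribute_id: int,
    *,
    model: str = "seed-tts-2.0-expressive",
    speaker: Optional[str] = None,
    emotion: Optional[str] = None,
    speech_rate: Optional[int] = None,
    loudness_rate: Optional[int] = None,
) -> dict:
    """Assemble the inner parameters object for seed-tts-2.0."""
    if not prompt:
        raise ValueError("prompt is required")
    if speaker and speaker.startswith("BV") and speaker.endswith("_streaming"):
        raise ValueError("BV*_streaming speaker IDs are not supported")
    audio_params = {}
    if emotion is not None:
        audio_params["emotion"] = emotion
    for name, value in (("speech_rate", speech_rate), ("loudness_rate", loudness_rate)):
        if value is None:
            continue
        if not -50 <= value <= 100:
            raise ValueError(f"{name} must be within [-50, 100]")
        audio_params[name] = value
    params = {
        "prompt": prompt,
        "n": 1,
        "model": model,
        "cast": {"points": points, "attribute_id": attribute_id},
    }
    if speaker:
        params["speaker"] = speaker
    if audio_params:
        params["audio_params"] = audio_params
    return params
```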

**Script example with extra params:**

```bash
python3 ima_tts_create.py --api-key $IMA_API_KEY --model-id seed-tts-2.0 \
  --prompt "阳光青年音色测试，你好世界。" \
  --extra-params '{"model":"seed-tts-2.0-expressive","speaker":"zh_male_sophie_uranus_bigtts","audio_params":{"emotion":"neutral"},"additions":{"explicit_language":"crosslingual","context_texts":[]}}' \
  --output-json
```

**Note:** The script gets `attribute_id` and `credit` from the product list (e.g. `app=ima&platform=web` → often 2 pts / attribute_id 4419 for seed-tts-2.0). If you have a different app/platform (e.g. webAgent), the product list may return different credit_rules (e.g. 5 pts / attribute_id 8987); the script uses whatever the product list returns for the chosen model.

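**Speaker / voice list (seed-tts-2.0 is compatible with Volcengine voices):** For the full set of voice IDs and scenario categories, see `volcengine_tts_timbre_list.json` in the project. The file is derived from the [Volcengine Doubao speech synthesis voice list](https://www.volcengine.com/docs/6561/1257544) and uses the native `voice_type` format (e.g. `zh_male_sophie_uranus_bigtts` 魅力苏菲, `zh_female_vv_uranus_bigtts` Vivi). **⚠️ Note:** The IMA API supports only the native format (the `*_uranus_bigtts` series); `BV*_streaming` Doubao voice IDs are not supported.

**Mapping to the Volcengine 2.0 docs:** The parameters above match the [HTTP Chunked/SSE unidirectional streaming V3 request body](https://www.volcengine.com/docs/6561/1598757?lang=zh): `req_params.text` → prompt, `req_params.speaker` → speaker (a required field in the Volcengine API), `req_params.model` → model (expressive/standard), `req_params.audio_params` (emotion, speech_rate, loudness_rate, etc.), and `req_params.additions` (e.g. explicit_language). For 2.0 capabilities (voice instructions, referencing preceding context, voice tags, etc.), see the [Doubao Speech Synthesis 2.0 capability overview](https://www.volcengine.com/docs/6561/1871062?lang=zh).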

---

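## 🎤 What to Ask When the User Says 「帮我制作旁白/配音」 ("Make me a narration / voiceover")

When the user expresses an intent such as 「帮我制作旁白」 ("make me a narration"), 「做一段配音」 ("do a voiceover"), or 「把这段文字读出来」 ("read this text aloud"), **collect the key information before calling the script** to avoid missing parameters or blind defaults.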

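### Must ask

| Question | Parameter | Notes |
|----------|-----------|-------|
| **Content to read / narration script** | `prompt` | Text to synthesize; required. If the user gives only a topic, ask for the exact script, or draft one and have the user confirm it. |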

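### Recommended (let the user choose)

| Question | Parameter | Option source and examples |
|----------|-----------|----------------------------|
| **Voice / speaker** | `speaker` | Recommend by scenario from the project's `volcengine_tts_timbre_list.json` (or the [voice list 1257544](https://www.volcengine.com/docs/6561/1257544)): **general** (魅力苏菲 `zh_male_sophie_uranus_bigtts`, Vivi `zh_female_vv_uranus_bigtts`, 云舟 `zh_male_m191_uranus_bigtts`), **video dubbing** (大壹 `zh_male_dayi_uranus_bigtts`, 猴哥 `zh_male_sunwukong_uranus_bigtts`), **role play** (知性灿灿 `zh_female_cancan_uranus_bigtts`, 撒娇学妹 `zh_female_sajiaoxuemei_uranus_bigtts`). Briefly list 3–5 candidates for the user to pick from, or narrow the range first by asking "Male or female voice? Closer to commentary, audiobook reading, or an assistant?". **⚠️ Use the native format** (`*_uranus_bigtts`). |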

### Optional (ask as needed)

| Question | Parameter | Notes and values |
|----------|-----------|------------------|
| **Emotion** | `audio_params.emotion` | Supported by some voices, e.g. neutral, sad, angry; see [voice list: multi-emotion voices](https://www.volcengine.com/docs/6561/1257544). |
| **Speech rate** | `audio_params.speech_rate` | Range [-50, 100]; 0 is normal, 100 is roughly 2x speed. Pass via `--extra-params '{"audio_params":{"speech_rate":20}}'`. |
| **Volume** | `audio_params.loudness_rate` | Range [-50, 100]; 0 is normal (not supported by mix voices). |
| **Model style** | `model` | `seed-tts-2.0-expressive` (default, more expressive) or `seed-tts-2.0-standard` (more stable). |

**Script mapping:** `--prompt` is required; `--speaker` and `--emotion` are supported directly; speech rate, volume, model, and similar options are passed as JSON via `--extra-params` (see the script example above).
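
To make the flag mapping concrete, the sketch below builds an argument list for the bundled script from the collected answers. The function name `build_script_args` is hypothetical; the flags themselves are the ones named in this document, and the relative script path is an assumption about the working directory.

```python
import json
from typing import Optional


def build_script_args(
    api_key: str,
    prompt: str,
    *,
    speaker: Optional[str] = None,
    emotion: Optional[str] = None,
    speech_rate: Optional[int] = None,
    loudness_rate: Optional[int] = None,
    model: Optional[str] = None,
) -> list:
    """Build the argv list for scripts/ima_tts_create.py."""
    args = [
        "python3", "scripts/ima_tts_create.py",
        "--api-key", api_key,
        "--model-id", "seed-tts-2.0",
        "--prompt", prompt,
    ]
    if speaker:
        args += ["--speaker", speaker]
    if emotion:
        args += ["--emotion", emotion]
    # Speech rate, volume, and sub-model go through --extra-params as JSON.
    extra = {}
    audio_params = {
        key: value
        for key, value in (("speech_rate", speech_rate), ("loudness_rate", loudness_rate))
        if value is not None
    }
    if audio_params:
        extra["audio_params"] = audio_params
    if model:
        extra["model"] = model
    if extra:
        args += ["--extra-params", json.dumps(extra)]
    args.append("--output-json")
    return args
```

Passing the list to `subprocess.run` without a shell avoids quoting problems when the prompt contains quotes or line breaks.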

---

## 📥 User Input Parsing (Parameter Recognition)

Map user intent to parameters using product `form_config` (e.g. voice, speed):

| User intent / phrasing | Parameter (if in form_config) | Notes |
|-------------------------|---------------------------------|--------|
| 旁白 / 配音 / 朗读 / 把这段读出来 (narration / voiceover / read aloud) | prompt + speaker (recommended question) | **Ask for the content and voice first**, then call the script; see the 「帮我制作旁白/配音」 section above. |
| 女声 / 女声朗读 / female voice | voice_id / voice_type / speaker | Use value from form_config or e.g. speaker ID |
| 男声 / 男声朗读 / male voice | voice_id / voice_type / speaker | Use value from form_config or e.g. speaker ID |
| 发音人 / 音色 / speaker | speaker | seed-tts-2.0: e.g. zh_male_sophie_uranus_bigtts; see volcengine_tts_timbre_list.json (native format) |
| 情感 / 情绪 / emotion | audio_params.emotion | e.g. "neutral", "sad"; supported by some voices only |
| 语速快/慢 / speed up/slow | audio_params.speech_rate | Range [-50, 100]; 0 is normal |
| 音调 / pitch | pitch | If supported |
| 大声/小声 / volume | audio_params.loudness_rate | Range [-50, 100] |
| 风格 (style) expressive/standard | model | seed-tts-2.0: seed-tts-2.0-expressive / seed-tts-2.0-standard |

If the user does not specify, use form_config defaults. Do not send parameters not present in the product’s credit_rules/attributes or form_config (reflection will strip them on retry).
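
The stripping behaviour can be pictured with the sketch below. It is not the script's actual reflection logic; `strip_unsupported` and `ALWAYS_KEEP` are hypothetical names, and the caller is assumed to have already derived the set of allowed parameter names from the product's `form_config`, whose schema is not documented here.

```python
# Fields the create call always needs, regardless of form_config.
ALWAYS_KEEP = {"prompt", "n", "cast"}


def strip_unsupported(params: dict, allowed: set) -> dict:
    """Drop parameters that the product does not declare.

    `allowed` is the set of parameter names taken from the product's
    form_config (how to extract them is left to the caller).
    """
    return {
        key: value
        for key, value in params.items()
        if key in ALWAYS_KEEP or key in allowed
    }
```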

---

## 🧠 User Preference Memory

Storage: `~/.openclaw/memory/ima_prefs.json`

```json
{
  "user_{user_id}": {
    "text_to_speech": {
      "model_id": "...",
      "model_name": "...",
      "credit": 2,
      "last_used": "..."
    }
  }
}
```

- **Before generation:** Load prefs; if `user_{user_id}.text_to_speech` exists, use that model and optionally mention it.
- **After success:** Save used model to `user_{user_id}.text_to_speech`.
- **On explicit change:** e.g. “换成XXX” ("switch to XXX") / “以后都用XXX” ("always use XXX from now on") → switch and save.
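
A minimal sketch of the load and save steps follows. The function names are hypothetical, the ISO 8601 timestamp for `last_used` is an assumption (the format is not specified above), and an unreadable or missing file is simply treated as "no preference".

```python
import json
from datetime import datetime, timezone
from pathlib import Path

PREFS_PATH = Path.home() / ".openclaw" / "memory" / "ima_prefs.json"


def _read_prefs(path: Path) -> dict:
    """Read the prefs file; return {} if it is missing or invalid."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def load_tts_pref(user_id: str, path: Path = PREFS_PATH):
    """Return the saved text_to_speech preference, or None."""
    return _read_prefs(Path(path)).get(f"user_{user_id}", {}).get("text_to_speech")


def save_tts_pref(user_id: str, model_id: str, model_name: str,
                  credit: int, path: Path = PREFS_PATH) -> None:
    """Store the model that was just used for this user."""
    path = Path(path)
    prefs = _read_prefs(path)
    prefs.setdefault(f"user_{user_id}", {})["text_to_speech"] = {
        "model_id": model_id,
        "model_name": model_name,
        "credit": credit,
        # Assumption: ISO 8601 UTC timestamp.
        "last_used": datetime.now(timezone.utc).isoformat(),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(prefs, ensure_ascii=False, indent=2), encoding="utf-8")
```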

---

## 💬 User Experience Protocol (IM / Feishu / Discord)

TTS usually completes in a few seconds to tens of seconds. **Do not leave users without feedback.**

### Step 0 — Initial Acknowledgment (Normal Reply)

First reply with a short acknowledgment (normal reply, not message tool), e.g.:

- 好的，正在帮你把这段文字转成语音。
- OK, converting this text to speech.

### Step 1 — Pre-Generation Notification (message tool)

Push once:

```
🔊 开始语音合成，请稍候…
• 模型：[Model Name]
• 预计耗时：[X ~ Y 秒]
• 消耗积分：[N pts]
```

### Step 2 — Progress

Poll every 2–5s. Every 10–15s send a progress update, e.g.:

```
⏳ 语音合成中… [P]%
已等待 [elapsed]s，预计最长 [max]s
```

Cap progress at 95% until API returns success.
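
One way to compute the `[P]` value is a linear estimate from elapsed time, capped at 95. The linear model and the name `progress_percent` are assumptions for illustration; the API itself is not described as reporting a percentage.

```python
def progress_percent(elapsed_s: float, max_s: float) -> int:
    """Estimate progress from elapsed time, capped at 95%."""
    if max_s <= 0:
        raise ValueError("max_s must be positive")
    estimate = int(elapsed_s / max_s * 100)
    return max(0, min(95, estimate))
```

Only an actual success response from the API should move the displayed progress past the cap.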

### Step 3 — Success (message tool)

When `resource_status == 1` and `status != "failed"`, send the audio and caption:

- **media** = `medias[0].url`
- **caption** example:

```
✅ 语音合成成功！
• 模型：[Model Name]
• 耗时：[actual]s
• 消耗积分：[N pts]
🔗 原始链接：[url]
```

Use the **URL** from the API (do not use local file paths).

### Step 4 — Failure (message tool)

On failure or API/network error, send a short, user-friendly message and suggestions:

```
❌ 语音合成失败
• 原因：[自然语言原因]
• 建议：换个模型重试或检查文本长度/内容

需要我帮你用其他模型重试吗？
```

**Error translation (do not expose raw API/technical errors):**

| Technical | ✅ Say (CN) | ✅ Say (EN) |
|-----------|-------------|-------------|
| 401 Unauthorized | 密钥无效或未授权，请至 imaclaw.ai 生成新密钥 | API key invalid; generate at imaclaw.ai |
| 4008 Insufficient points | 积分不足，请至 imaclaw.ai 购买积分 | Insufficient points; buy at imaclaw.ai |
| Invalid product attribute | 参数配置异常，请稍后重试 | Configuration error, try again later |
| Error 6006 / 6010 | 积分或参数不匹配，请换模型或重试 | Points/params mismatch, try another model |
| resource_status == 2 / status failed | 合成失败，建议换模型或缩短文本 | Synthesis failed, try another model or shorter text |
| timeout | 合成超时，请稍后重试 | Timed out, try again later |
| Network error | 网络不稳定，请检查后重试 | Network unstable, check and retry |

Links: API key — https://www.imaclaw.ai/imaclaw/apikey ；Credits — https://www.imaclaw.ai/imaclaw/subscription
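
The translation table can be applied with a simple lookup, sketched below. Matching on substrings of the raw error text is an assumption, as is falling back to the generic "synthesis failed" message for anything unrecognized; `friendly_error` is a hypothetical name.

```python
# (needles to look for in the raw error, Chinese message, English message)
FRIENDLY_ERRORS = [
    (("401", "unauthorized"),
     "密钥无效或未授权，请至 imaclaw.ai 生成新密钥", "API key invalid; generate at imaclaw.ai"),
    (("4008", "insufficient points"),
     "积分不足，请至 imaclaw.ai 购买积分", "Insufficient points; buy at imaclaw.ai"),
    (("invalid product attribute",),
     "参数配置异常，请稍后重试", "Configuration error, try again later"),
    (("6006", "6010"),
     "积分或参数不匹配，请换模型或重试", "Points/params mismatch, try another model"),
    (("timeout", "timed out"),
     "合成超时，请稍后重试", "Timed out, try again later"),
    (("network", "connection"),
     "网络不稳定，请检查后重试", "Network unstable, check and retry"),
]
FALLBACK = ("合成失败，建议换模型或缩短文本",
            "Synthesis failed, try another model or shorter text")


def friendly_error(raw: str, lang: str = "en") -> str:
    """Map a raw technical error to a user-facing message."""
    text = (raw or "").lower()
    for needles, cn, en in FRIENDLY_ERRORS:
        if any(needle in text for needle in needles):
            return cn if lang == "zh" else en
    return FALLBACK[0] if lang == "zh" else FALLBACK[1]
```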

### Step 5 — Done

After Steps 0–4, no further reply is needed. Do not send duplicate confirmations.

---

## Common Mistakes

| Mistake | Fix |
|---------|-----|
| prompt at top level | Put prompt inside `parameters[].parameters` |
| Wrong or missing attribute_id | Always call product list first; use credit_rules[0] |
| Single poll | Poll until all medias have resource_status == 1 |
| Ignoring status when resource_status=1 | Check status != "failed" |
| Sending params not in form_config/credit_rules | Use only params from product list; script reflection will strip others on retry |

---

## Security & Local Data

- **Network:** This skill uses only `https://api.imastudio.com` (no image upload domain for TTS).
- **Local files:** `~/.openclaw/memory/ima_prefs.json` (preferences), `~/.openclaw/logs/ima_skills/` (logs, e.g. 7-day retention). No prompts or API keys stored.
- **API key:** Set via environment (e.g. `IMA_API_KEY`) or agent config; never hardcode.

---

## Python Example (Minimal)

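```python
import os
import time
import requests

BASE = "https://api.imastudio.com"
HEADERS = {
    # Read the key from the environment; never hardcode it.
    "Authorization": f"Bearer {os.environ['IMA_API_KEY']}",
    "Content-Type": "application/json",
    "x-app-source": "ima_skills",
}


def find_leaf(node, model_id):
    """Depth-first search for the type=3 (version) node of a model.

    Assumption: child nodes are nested under a "children" key.
    """
    if isinstance(node, list):
        for child in node:
            found = find_leaf(child, model_id)
            if found:
                return found
        return None
    if not isinstance(node, dict):
        return None
    if node.get("type") == 3 and node.get("model_id") == model_id:
        return node
    return find_leaf(node.get("children"), model_id)


# 1. Product list
r = requests.get(
    f"{BASE}/open/v1/product/list",
    headers=HEADERS,
    params={"app": "ima", "platform": "web", "category": "text_to_speech"},
    timeout=30,
)
r.raise_for_status()
leaf = find_leaf(r.json()["data"], "seed-tts-2.0")
if leaf is None:
    raise RuntimeError("seed-tts-2.0 not found in product list")
rule = leaf["credit_rules"][0]
attribute_id, credit = rule["attribute_id"], rule["points"]
model_id, model_version = leaf["model_id"], leaf["id"]
model_name = leaf.get("name") or model_id  # assumption: display name key is "name"

# 2. Create task
body = {
    "task_type": "text_to_speech",
    "enable_multi_model": False,
    "src_img_url": [],
    "parameters": [{
        "attribute_id": attribute_id,
        "model_id": model_id,
        "model_name": model_name,
        "model_version": model_version,
        "app": "ima", "platform": "web",
        "category": "text_to_speech",
        "credit": credit,
        "parameters": {
            "prompt": "Hello, world.",
            "n": 1,
            "input_images": [],
            "cast": {"points": credit, "attribute_id": attribute_id},
        },
    }],
}
r = requests.post(f"{BASE}/open/v1/tasks/create", headers=HEADERS, json=body, timeout=30)
r.raise_for_status()
task_id = r.json()["data"]["id"]

# 3. Poll until every media is ready (the 5-minute limit is an arbitrary choice)
deadline = time.monotonic() + 300
while time.monotonic() < deadline:
    r = requests.post(f"{BASE}/open/v1/tasks/detail", headers=HEADERS,
                      json={"task_id": task_id}, timeout=30)
    r.raise_for_status()
    medias = r.json()["data"].get("medias") or []
    for m in medias:
        if m.get("resource_status") == 2 or (m.get("status") or "").lower() == "failed":
            raise RuntimeError(m.get("error_msg") or m.get("remark") or "failed")
    # A null resource_status counts as 0 (processing).
    if medias and all((m.get("resource_status") or 0) == 1 for m in medias):
        url = medias[0].get("url") or medias[0].get("watermark_url")
        if url:
            print(url)  # e.g. https://cdn.../output.mp3
            break
    time.sleep(3)
else:
    raise TimeoutError("TTS task did not finish in time")
```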

---

## Quick Reference

| Item | Value |
|------|--------|
| Task type | `text_to_speech` |
| Product list | `GET /open/v1/product/list?category=text_to_speech` |
| Create | `POST /open/v1/tasks/create` (prompt inside parameters[].parameters) |
| Poll | `POST /open/v1/tasks/detail` every 2–5s |
| Done when | All medias: resource_status=1, status≠"failed", url present |
| Script | `scripts/ima_tts_create.py` (--list-models, --model-id, --prompt, --output-json) |

---

## Supported Models & Search Terms

**Model:** seed-tts-2.0 (also known as: seed tts, seed-tts, ByteDance TTS)

**Capabilities:** text-to-speech (TTS), speech synthesis, voice synthesis, voice generation, text to speech, dubbing, narration, voiceover, audio generation