---
name: wonda-cli
description: Using the Wonda CLI to generate images, videos, music, and audio from the terminal — plus LinkedIn, Reddit, and X/Twitter research and automation
---

# Wonda CLI

Wonda CLI is a content creation toolkit for terminal-based agents. Use it to generate images, videos, music, and audio; edit and compose media; publish to social platforms; and research/automate across LinkedIn, Reddit, and X/Twitter.

## Install

If `wonda` is not found on PATH, install it first:

```bash
# npm
npm i -g @degausai/wonda

# Homebrew
brew tap degausai/tap && brew install wonda
```

## Setup

- **Auth**: `wonda auth login` (opens browser, recommended) or set the `WONDA_API_KEY` env var
- **Verify**: `wonda auth check`

### Access tiers

Not all commands are available to every account type:

| Tier | Access |
| --- | --- |
| **Anonymous** (temporary account, no login) | Media upload/download, editing (`video/edit`, `image/edit`, `audio/edit`), transcription, social publishing, scraping, analytics |
| **Free** (logged in, Basic/Free plan) | Everything above + **generation** (`image/generate`, `video/generate`, etc.), styles, recipes, brand |
| **Paid** (Plus, Pro, or Absolute plan) | Everything above + **video analysis** (requires credits), **skill commands** (`wonda skill install/list/get`) |
| **Flagged** (per-account PostHog flags) | `wonda transitions` (transitionsEnabled), `wonda clipping` (clippingEnabled). Flip the flag in PostHog for the account. |

If a command returns a `403` error, check your plan at https://app.wondercat.ai/settings/billing.

### Social signups (Instagram, TikTok, etc.)

Drive them with the `wonda device` primitives + a throwaway mailbox from `wonda email`. The screenshot → decide → tap/type/swipe loop is how these flows work — there's no shortcut command, and that's fine: social apps change their UI constantly and any canned flow would drift faster than you could maintain it.

Standard loop (a minimal sketch of one iteration follows the list):

1. `wonda email account create --random` → save `{email, password}`. Persist the resulting platform login with `wonda credentials create --website instagram.com --username <handle> --email <email> --password-stdin <<< "<password>"` (passwords are AES-256-GCM encrypted at rest; retrieve later with `wonda credentials get <id>`).
2. `wonda device create` → pick a `ready` device (poll `wonda device get <device-id> --fields status`).
3. `wonda device launch com.instagram.android` (or `com.zhiliaoapp.musically` for TikTok). Fall back to `wonda device open-url` if you'd rather start in the web flow.
4. Loop: `wonda device screenshot > s.json` → decode the base64 PNG → read → pick an action → `tap | type | swipe | key` → screenshot again. Use `--text "SomeButtonLabel"` on `tap` before guessing coordinates; fall back to `--x <x> --y <y>` read off the screenshot for elements without matching text (number pickers, date spinners, etc.).
5. When the app sends a verification email, `wonda email inbox wait --timeout 120` — returns `{codes: ["483921"], links: [...]}` with the 6-digit code already extracted. `wonda device type --text "<code>"` to feed it back. **Race-safety**: capture a timestamp _before_ triggering the signup (`SINCE=$(date -u +%FT%TZ)`) and pass `--since "$SINCE"` — otherwise a fast mail server can land the email before your wait call and the old snapshot filters it out.
6. For number/date spinners: tap on the highlighted cell, Android pops up a numeric or alphabetic keyboard, and `wonda device type --text "<value>"` replaces the selected text. `wonda device key --code 4` dismisses the keyboard when done.
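A minimal sketch of one loop iteration, assuming the screenshot JSON exposes the PNG as a base64 `image` field (that field name is an assumption; inspect the real payload on your device first):

```bash
# One iteration of the screenshot → decide → act loop (sketch)
wonda device screenshot > s.json
jq -r '.image' s.json | base64 -d > screen.png   # field name assumed; check s.json
# ...inspect screen.png and decide on the next action...
wonda device tap --text "Sign up"                # prefer text-matched taps
wonda device type --text "myhandle"              # then type into the focused field
```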
**Consent-like taps** — anything that accepts Terms/Privacy/Cookies, grants permissions, or publishes something. Before starting an automation that may hit these, ask the user once in chat whether to auto-accept them. If they say yes, tap through without pausing; if they say no, stop at each one and confirm. This does not apply to CAPTCHAs or "prove you're human" puzzles — always hand those off via `wonda device stream` (see "Handing off to a human" below).

**Rate-limit signals** — if the app shows you a visual puzzle ("we want to make sure you're a real person"), stop and hand off to the user with `wonda device stream <device-id>` (see "Handing off to a human" below). Don't click through puzzles yourself.

### Voice cloning

Clone a voice from a 10s+ audio clip and use it in TTS. Hard limit: 20 cloned voices per account. Cost: $1.50 per clone.

```bash
# Clone from a local file (auto-uploads to media library first)
wonda voice create "Andu" --file ./sample.mp3 --description "My voice"

# Clone from existing wonda media
wonda voice create "Brand" --media-id <media-id>

# Optional source-audio preprocessing
wonda voice create "Clean" --file ./raw.wav --noise-reduction --normalize-volume

# List cloned voices (each row reports isExpired and expiresInDays)
wonda voice list

# One voice
wonda voice get <voice-id>

# Rename / re-describe (local only, no provider call)
wonda voice update <voice-id> --name "New Name" --description "..."

# Delete
wonda voice delete <voice-id>
```

**Use a cloned voice in TTS** by passing the `providerVoiceId` from `voice get` as `voiceId` to `/audio/speech`:

```bash
wonda audio speech "Hello world" \
  --model minimax-speech-2-8-hd \
  --params '{"voiceId":"<providerVoiceId>"}'
```

**7-day expiry**: cloned voices that haven't been used in TTS within 7 days are automatically expired. Running TTS with a cloned voice refreshes its expiry. Idle voices that lapse must be re-cloned ($1.50 again).

### Credentials vault

Persist logins created on external platforms (Instagram, TikTok, Twitter, etc.) so they can be reused on the next run. Passwords are AES-256-GCM encrypted with a server-side key and only decrypted on `get`.

```bash
# Create
wonda credentials create --website instagram.com --username myhandle \
  --email me@example.com --password-stdin <<< "hunter2" \
  --metadata '{"signup_source":"wonda-email"}'

# List (passwords omitted)
wonda credentials list --website instagram.com

# Get full record including decrypted password
wonda credentials get <id>

# Update any field (use --password-stdin to rotate; --username "" to clear)
wonda credentials update <id> --username newhandle

# Delete
wonda credentials delete <id>

# Fetch + record why you're using it in one call — POST, not GET, because
# it writes a 'used' event with the reason. Prefer this over `get` whenever
# you can articulate the reason.
wonda credentials use <id> --reason "instagram signup flow"

# See recent events (created / used / rotated / updated) for audit
wonda credentials events <id>
```

Fields: `website` (required — typed input like `insta` is canonicalized to `instagram.com`), `username`, `email`, `password` (required), `metadata` (arbitrary JSON). At least one of `username` / `email` must be present. Multiple records per `(website, username)` are allowed — dedupe on your side if you need to.

**Event log**: every `credentials get`/`use`, `create`, password rotation, and other update is recorded as an event on the credential (actor: `cli` | `web` | `system`). Use `credentials events <id>` or the web UI's history icon to audit. The event log is append-only and cascades on credential delete.
### Handing off to a human

If automation hits a screen that requires a human to take over (a consent flow you shouldn't auto-accept, ambiguous UI, or a step where the user prefers to act themselves), use `wonda device stream <device-id>` — it returns a `playerUrl` signed with a short-lived JWT (1h). Give that URL to the user, let them act in their own browser, then resume automation afterward.

```bash
wonda device stream <device-id>
# → { "streamUrl": "wss://…", "playerUrl": "https://…", "deviceType": "social" }
```

### Global output flags

All commands support these output control flags:

- `--json` — Force JSON output (auto-enabled when stdout is piped)
- `--quiet` — Only output the primary identifier (job ID, media ID, etc.) — ideal for scripting
- `-o <file>` — Download output to file (implies `--wait`)
- `--fields status,outputs` — Select specific JSON fields
- `--jq '.outputs[0].media.url'` — Filter JSON output with a jq expression
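A quick sketch of how these flags compose in scripts; the commands and jq paths are the same ones used throughout this document:

```bash
# --quiet captures just the job ID; --jq extracts one field from the job JSON
JOB=$(wonda generate image --model gpt-image-2 --prompt "red sneaker on concrete" --wait --quiet)
wonda jobs get inference $JOB --jq '.outputs[0].media.url'     # output URL only
wonda jobs get inference $JOB --fields status,outputs --json   # trimmed JSON
```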
## How to think about content creation

You are a marketing director with access to a full production toolkit. Before touching any tool, think:

1. **What product category?** (beauty, food, tech, fashion, fitness, etc.)
2. **What format performs for this category?** (UGC memes for everyday products, cinematic for luxury, before/after for transformations, testimonial for services)
3. **What's the hook?** (relatable scenario, surprising twist, aspirational lifestyle, social proof)
4. **What specific scene?** (not "product on table" but "person discovering the product in a funny situation")

## Decision flow

When asked to create content, follow this order:

### Step 1: Gather context

```bash
wonda brand                 # Brand identity, colors, products, audience
wonda analytics instagram   # What content performs well
wonda scrape social --handle @competitor --platform instagram --wait   # Competitive research (if relevant)

# Cross-platform research (if relevant)
wonda x search "topic OR keyword"                    # Find conversations on X/Twitter
wonda x user-tweets @competitor                      # Competitor's recent tweets
wonda reddit search "topic" --sort top --time week   # Reddit discussions
wonda reddit feed marketing --sort hot               # Subreddit trends
wonda linkedin search "topic" --type COMPANIES       # LinkedIn company/people research
wonda linkedin profile competitor-vanity-name        # LinkedIn profile intel
```

### Step 2: Check content skills

Content skills are step-by-step guides for common content types. Each skill tells you exactly which models, prompts, and editing operations to use — and in what order. ALWAYS check skills before building from scratch.

```bash
wonda skill list         # Browse all content skills
wonda skill get <slug>   # Full step-by-step guide for a skill
```

**Full skill index:**

| Slug | Description | Input |
| --- | --- | --- |
| product-video | Product/scene video — prompt library for all categories | optional product image |
| ugc-talking | Talking-head UGC — single clip, two-angle PIP, or 20s+ with B-roll | optional reference |
| ugc-reaction-batch | Batch TikTok-native UGC reactions with viral strategy | optional product image |
| tiktok-ugc-pipeline | Scrape viral reel → generate 5 UGC → post as drafts | reel or TikTok URL |
| ugc-dance-motion | Dance/motion transfer | image + video |
| marketing-brain | Marketing strategy brain — hooks, visuals, ads | user brief |
| reddit-subreddit-intel | Scrape top posts, analyze virality, generate ideas | subreddit + product |
| twitter-influencer-search | Find X influencers and amplifiers | competitor/niche keywords |
| tiktok-slideshow-carousel | 3-slide TikTok carousel — hook, bridge, product reveal | app screenshot + audience |
| creative-static-ads | Single-frame static ad images — 6 conversion pillars, 8 archetypes, 8 psychological hooks | product + optional image |
| ffmpeg | All local ffmpeg recipes — trim, audio swap, captions, social formats, scene split, silence cut, frame extraction, analysis artifacts | local video path or mediaId |
| image-edit | All image edit paths — img2img, background removal, crop, text overlay, vectorize | image mediaId or local path |
| remotion-local-render | Render editorPipeline blueprint steps locally via @remotion/renderer | manifest JSON + editor job id |

**If a skill matches** → `wonda skill get <slug>`, read it, adapt to context, execute each step.

**If no skill matches** → build from scratch (Step 3).

### Step 2.5: Decide whether finishing should be local

Not every media task should go back through Wonda editing. Use this routing rule:

- Use `wonda` for AI generation, AI transcription/alignment, scraping, publishing, hosted transitions, and workflows that need media IDs or remote jobs.
- Use local `ffmpeg` for deterministic transforms on files you already have or can download: trim, crop/scale/pad, concat (merging multiple clips), replace audio, extract audio/frame, reverse, normalize for delivery, burn captions, split scenes, cut silence, and build analysis artifacts. **Always merge clips locally** — server-side merge can hang for 30+ minutes once any input exceeds ~7MB.

When a task starts from a Wonda media ID but the actual edit is deterministic, move it to local files first:

```bash
wonda media download <media-id> -o ./input.mp4
```

Before any local ffmpeg work:

```bash
which ffmpeg
which ffprobe
ffmpeg -version
ffprobe -v error -show_format -show_streams -of json ./input.mp4
```

Font rule for local caption/text work (see the sketch after this list):

- Prefer an explicit font file path over a family name.
- Never assume a font exists. Check first with `fc-match`, `fc-list`, `/System/Library/Fonts`, `/Library/Fonts`, `~/Library/Fonts`, or `/usr/share/fonts`.
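A minimal sketch of the font rule in practice: burn a static caption with `drawtext`, passing an explicit font file discovered beforehand (the font name and path here are illustrative; check your own system first):

```bash
# Find a real font file first; never hardcode a family name
fc-list | grep -i "DejaVuSans" | head -1

# Burn a static caption using an explicit fontfile path (path illustrative)
ffmpeg -y -i ./input.mp4 \
  -vf "drawtext=fontfile=/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf:text='Summer Sale':fontcolor=white:fontsize=72:borderw=4:x=(w-text_w)/2:y=h*0.8" \
  -c:a copy ./captioned.mp4
```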
If the task is mainly local finishing/captions/formatting/splitting/artifact extraction, check the `ffmpeg` skill before inventing commands.

`wonda edit video` renders locally by default for single-video ops (`trim`, `crop`, `speed`, `volume`, `textOverlay`, `animatedCaptions` with supplied captions, `editAudio`). The server returns a manifest; the CLI runs `@remotion/renderer` against a CloudFront-hosted bundle, uploads the output, and finalizes the editor_job. No flag needed. Pass `--render-server` only to force Lambda.

Multi-video ops (`overlay`, `splitScreen`, `splitScenes`, `motionDesign`) auto-reject with a 400 — the CLI will tell you to use `--render-server`. **`merge` is also rejected locally, but do NOT fall back to `--render-server` — use the local `ffmpeg -f concat` recipe in the `ffmpeg` skill** (server merge can hang for 30+ minutes on inputs >~7MB).

See the `remotion-local-render` content skill for the full Remotion recipe (including the STT-free TikTok-style caption flow via `wonda alignment extract-timestamps` → `--caption-segments`).

Default local export target unless the user asked otherwise:

```bash
-c:v libx264 -preset medium -crf 18 -pix_fmt yuv420p -movflags +faststart -c:a aac -b:a 192k
```

Always pass `-y` as the first flag so the command auto-overwrites the output. `ffmpeg` prompts interactively when the output path exists, and agent shells hang on that prompt until timeout.

### Step 3: Build from scratch (chain endpoints)

When no skill matches, chain individual CLI commands. Each step produces an output that feeds into the next.

**Single asset:**

```bash
wonda generate image --model gpt-image-2 --prompt "..." --aspect-ratio 9:16 --wait -o out.png
# --params '{"quality":"high"}'   — auto/low/medium/high (default auto)
# --negative-prompt "..."         — override what to exclude (models like cookie have good defaults)
# --seed <int>                    — pin the seed for reproducible results (model-dependent)

wonda generate video --model seedance-2 --prompt "..." --duration 5 --params '{"quality":"high"}' --wait -o out.mp4
wonda generate text --model <model-id> --prompt "..." --wait
wonda generate music --model suno-music --prompt "upbeat lo-fi" --wait -o music.mp3
```

**Audio (speech, transcription, dialogue):**

```bash
# Text-to-speech
wonda audio speech --model elevenlabs-tts --prompt "Your script here" \
  --params '{"voiceId":"21m00Tcm4TlvDq8ikWAM"}' --wait -o speech.mp3
# elevenlabs-tts always requires a voiceId param
# Common voice: Rachel (female) "21m00Tcm4TlvDq8ikWAM"

# Transcribe audio/video to text
wonda audio transcribe --model elevenlabs-stt --attach $MEDIA --wait

# Multi-speaker dialogue
wonda audio dialogue --model elevenlabs-dialogue --prompt "Speaker A: Hi! Speaker B: Hello!" \
  --wait -o dialogue.mp3
```

**Audio AI operations (direct-inference, NOT editor ops):**

```bash
# Denoise / dereverberate speech
wonda audio enhance --model replicate-resemble-enhance --attach $MEDIA \
  --params '{"denoise":true,"chunkSeconds":10}' --wait -o enhanced.wav

# Split a track into voice and instrumental stems
wonda audio extract-voice --model replicate-demucs --attach $MEDIA \
  --wait -o vocals.wav
```

DO NOT use `wonda edit video --operation enhanceAudio` or `--operation voiceExtractor` — those paths are deprecated. They still work but emit a warning, and they route through the heavier editor_job pipeline for no functional reason.
**Add animated captions to a video:**

The `animatedCaptions` operation handles everything in one step — it extracts audio, transcribes for word-level timing, and renders animated word-by-word captions onto the video.

```bash
# Generate a video with speech audio
VID_JOB=$(wonda generate video --model seedance-2 --prompt "..." \
  --duration 5 --aspect-ratio 9:16 --params '{"quality":"high"}' --wait --quiet)
VID_MEDIA=$(wonda jobs get inference $VID_JOB --jq '.outputs[0].media.mediaId')

# Add animated captions (single step)
wonda edit video --operation animatedCaptions --media $VID_MEDIA \
  --params '{"fontFamily":"TikTok Sans SemiCondensed","position":"bottom-center","sizePercent":80,"strokeWidth":2.5,"fontSizeScale":0.8,"highlightColor":"rgb(252, 61, 61)"}' \
  --wait -o final.mp4
```

The video's original audio is preserved. Do NOT replace the audio with TTS — the video model already generated the speech.

**Transitions (effects pipelines on a single video):**

```bash
wonda transitions presets              # List built-in presets (JSON)
wonda transitions operations           # Grouped by category (analysis/effect/...)
wonda transitions operations --json    # Full per-param metadata
wonda transitions llms                 # Full reference (presets + ops + dependencies)

wonda transitions run --media $VID --preset flash_glow --wait -o out.mp4

# Or send an agent-generated timeline of clips (inline JSON):
wonda transitions run --media $VID \
  --clips '[{"layer_type":"video","start_frame":0,"end_frame":60}]' --wait -o out.mp4

# Or from a file (handy for long agent timelines):
wonda transitions run --media $VID --clips ./timeline.json --wait -o out.mp4

wonda transitions job <job-id>         # Poll a transition job
```

Use exactly one of `--preset` or `--clips`. Requires a full (logged-in) account.

**Always read `wonda transitions llms` first when composing a clips timeline.** It documents the detect/segment/effect dependencies, which ops need masks, and the full clip-spec shape (layer types, tracks, effects, transforms).

**Preset variables (`variables` block).** Each preset declares the template variables it accepts under `variables` in `wonda transitions presets`. Each entry has `name`, `description`, and `required`. Required variables MUST be supplied or the job is rejected with a 400 — no more silent skipping. Pass them with `--var name=value` (repeatable) or, for the common `prompt` case, the `--prompt` shortcut:

```bash
# flash_glow_prompted requires { prompt }
wonda transitions run --media $VID --preset flash_glow_prompted \
  --prompt "woman in white dress" --wait -o out.mp4

# text_behind_person requires { prompt, text }
wonda transitions run --media $VID --preset text_behind_person \
  --var prompt="the person" --var text="HELLO WORLD" --wait -o out.mp4

# Numeric-typed vars: bare digits are decoded as numbers, "true"/"false" as
# bools, everything else stays a string. Presets that compare frame indices
# numerically (border_frame, marquee_text, quick_motion_text, bg_remove_scale)
# need this — quoting an int turns it back into a string.
wonda transitions run --media $VID --preset border_frame \
  --var exit_start_frame=200 --var exit_end_frame=251 --wait -o out.mp4
```

The `prompt` variable is a **detection text query** describing which subject to mask, fed to SAM3 to produce per-frame segmentation masks. It is not a content-generation prompt.

Building a custom `--clips` timeline that needs detection masks? Add a clip with `layer_type: "video"` and a `mask: {layer_type: "mask", analysis_steps: [{name: segment, params: {prompt: "..."}}]}`. SAM3 handles both detection and segmentation in one step from the prompt, so no separate `detect` step is needed.

**Multi-scene presets (`requiresMultiScene: true`).** Some presets use scene-aware logic and expect a video with multiple cuts/scenes. Check `requiresMultiScene` in `wonda transitions presets`. If true, feeding a single continuous shot will produce only one scene and the effect may look underwhelming. Combine clips first or use a video with natural cuts.
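A small sketch for checking this up front; it assumes the `--json` payload uses the same `.presets[]` shape shown in the next section:

```bash
# List presets that expect multi-scene input (JSON shape assumed from below)
wonda transitions presets --json | \
  jq -r '.presets[] | select(.requiresMultiScene == true) | .name'
```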
**Tweaking preset params.** Two cases depending on the preset's response shape from `wonda transitions presets --json`:

1. **Clip-shape preset** (response has `clips:` or `tracks:`): copy the JSON, edit any clip param, and submit as `--clips`. The request body shape matches the response shape.

   ```bash
   wonda transitions presets --json | jq '.presets[] | select(.name=="flash_glow_montage") | .clips' > /tmp/clips.json
   # edit /tmp/clips.json
   wonda transitions run --media $VID --clips "$(cat /tmp/clips.json)" --wait -o out.mp4
   ```

2. **Step-shape preset** (response has `steps:`): these are call-only. Submit by name with `--preset` and accept the published defaults; param tweaking awaits migration to clip-shape. Affected presets today: `bg_remove_scale`, `bullet_time`, `chromatic_aberration`, `psychedelic`, `vhs_fisheye`, `diagonal_wipe`, `nostalgic_summer`, `speed_ramp_transition`.

**Output URL paths differ by job type:**

- Inference jobs (generate, audio): `.outputs[0].media.url` and `.outputs[0].media.mediaId`
- Editor jobs (edit): `.outputs[0].url` and `.outputs[0].mediaId`

## Model waterfall

### Image

Default: `gpt-image-2`. OpenAI's flagship — strongest prompt adherence, best text-in-image, high-fidelity edits via reference images. Handles 1-4 reference images. Quality tiers: `auto` (default), `low`, `medium`, `high` — pass via `--params '{"quality":"high"}'`. Caps at 1536px output.

For img2img editing specifically (change, add/remove, restyle, bg-remove, crop, text overlay, vectorize), use `wonda skill get image-edit` — it has the full edit-specific decision tree.

Pick something else only when one of these applies:

- User explicitly requests another model
- **More than 4 reference images** → `nano-banana-2` (gpt-image-2 caps at 4 refs; nano-banana-2 accepts up to 14). For 1-4 refs, stay on `gpt-image-2`.
- Need vector output → `runware-vectorize`
- Need background removal → `birefnet-bg-removal`
- Cheapest possible / fastest drafts → `z-image`
- Need >1536px / true 4K output → `nano-banana-pro` (1K/2K/4K) or `nano-banana-2` (1K/2K/4K). gpt-image-2 caps at 1536px.
- gpt-image-2 unavailable / OpenAI down → `nano-banana-2` or `seedream-4-5` or `grok-imagine-pro`
- Spicy content → `cookie` (SDXL-based, tag-based or natural language prompts) — **ONLY select when the user explicitly asks for spicy content. Never auto-select.**

**Cookie model (`cookie`):** SDXL with DMD acceleration and hires fix. **Restricted: only use when the user explicitly requests spicy content.** Accepts both danbooru-style tags (`1cat, portrait, soft lighting`) and natural language. Supports `--negative-prompt` (has sensible defaults; override only when needed) and `--seed` for reproducibility.

```bash
wonda generate image --model cookie --prompt "1cat, portrait, soft lighting" --wait -o out.png
wonda generate image --model cookie --prompt "a woman in a garden, golden hour" \
  --negative-prompt "ugly, blurry, watermark" --seed 42 --wait -o out.png
```

### Video

Default: `seedance-2` (duration 5/10/15s, default 5s, quality: high). Escalation:

- Quality complaint or different style → `sora2` or `sora2pro`
- Max single-clip duration is **15s** for Seedance 2, **20s** for Sora → for longer content, stitch multiple clips via merge
- Veo (`veo3_1`, `veo3_1-fast`) is available but NOT in the default waterfall. Only pick Veo when the user explicitly asks for Veo by name.
**Image-to-video routing (MANDATORY when attaching a reference image):**

- Person/face visible in the **reference image** → MUST use `kling_3_pro` (preserves identity better for faces)
- No person in reference image → use `seedance-2`
- **Text-to-video (no reference image):** Seedance 2 generates people fine. This rule ONLY applies when you `--attach` an image.

**Kling model family:**

- `kling_3_pro` — Text-to-video and image-to-video, supports start/end images, custom elements (@Element1, @Element2), 3-15s duration, 16:9/9:16/1:1
- `kling_2_6_pro` — General purpose, 5-10s, 16:9/9:16/1:1, text-to-video and image-to-video
- `kling_2_6_motion_control` — Motion transfer: requires both a reference image AND a reference video, recreates the video's motion with the image's appearance
- `kling2_5-pro` — Budget Kling option, 5-10s, supports first/last frame images

**Kling prompt rules (important):** Kling's prompt field caps at **2,500 characters**, and Kling responds poorly to Sora-style structured briefs (`SCENE:` / `SUBJECT:` / `MOTION:` / `BANNED LOOK:` section headers). In that format Kling latches onto atmosphere nouns and silently drops the central subject (verified empirically: the same 2,842-char Sora-style prompt that rendered correctly on Sora 2 Pro and Seedance 2 produced no phone at all on Kling — even when trimmed to 2,250 chars). When escalating Seedance → Kling, or targeting Kling directly, **rewrite the prompt as short natural-language prose (~1,000–1,500 chars)** and **lead with the hero subject in the opening sentence** rather than burying it inside a `SUBJECT:` block. Do NOT pass a Sora-formatted prompt through to Kling unchanged.

**Other video models:**

- `grok-imagine-video` — xAI video generation, 5-15s, supports 7 aspect ratios including 4:3 and 3:2
- `topaz-video-upscale` — Upscale video resolution (1-4x factor, supports fps conversion)
- `sync-lipsync-v2-pro` — Legacy lipsync for user-supplied video + audio pairs. Inferior to native-audio generation and almost never the right choice for new content. See the "Native synced speech" note below for rules.

Seedance family (DEFAULT video model, watermarks automatically removed):

- `seedance-2` — Base Seedance 2.0 (T2V/I2V, 5-15s, high=standard/basic=fast)
- `seedance-2-omni` — Multi-reference generation (images, audio refs)
- `seedance-2-video-edit` — Edit existing video via text prompt

**Video durations:** Accepted `--duration` values vary by model. Check with `wonda capabilities` or `wonda models info <model-id>`.

### Audio

- Music: `suno-music` (set `--params '{"instrumental":true}'` for no vocals — see the sketch after this list)
- Text-to-speech: `elevenlabs-tts` — only for explicit narrator/voice-over asks over silent footage. Do NOT use it to "make a UGC character talk" — Sora / Sora 2 Pro / Veo 3.1 / Kling 3 / Seedance 2 generate native synced speech in any language, which looks and sounds far better. Always set `voiceId` in params. Default female voice: `--params '{"voiceId":"21m00Tcm4TlvDq8ikWAM"}'` (Rachel).
- Transcription: `elevenlabs-stt`
- Multi-speaker dialogue: `elevenlabs-dialogue`
- Enhance audio (clean up noisy speech): `replicate-resemble-enhance` via `wonda audio enhance` — denoise + dereverberate. Use when a voice recording sounds muffled, echoey, or has background noise. NOT a general "sounds better" button; if the source is already clean this can soften it.
- Extract voice (isolate vocals / split stems): `replicate-demucs` via `wonda audio extract-voice` — splits into voice and instrumental tracks. Use it to pull a speaker or singer off a track, or to isolate the music behind a vocal.
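For instance, an instrumental-only background track (the prompt text is illustrative):

```bash
# Instrumental-only track for background use
wonda generate music --model suno-music \
  --prompt "warm lo-fi hip hop, vinyl crackle" \
  --params '{"instrumental":true}' --wait -o bg-music.mp3
```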
**Native synced speech (preferred over TTS + lipsync):** Sora, Sora 2 Pro, Veo 3.1, Kling 3, and Seedance 2 all generate dialogue in any language directly inside the video, with mouth movements baked in. Put the line (and language) in the video model's `--prompt`. Never chain `elevenlabs-tts` → `sync-lipsync-v2-pro` to fake speech over a silent generation.

## Characters

Characters are reusable saved combos (image + optional voice audio) you can mention in Kling prompts with `@name`. The server auto-injects the image as `start_image` and the audio (with its Kling `voice_id`) as `voice_audio` whenever a Kling model is selected. Name rules: must start with a letter, 1–31 chars, alphanumeric + `_`/`-`.

**From a Kling clip** — extract a frame + voice from a generation you like:

```bash
VID=$(wonda generate video --model kling_3_pro --prompt "young man, grey tshirt, talking to camera" --wait --quiet)
VID_MEDIA=$(wonda jobs get inference $VID --jq '.outputs[0].media.mediaId')
wonda character from-media alex --source $VID_MEDIA --frame-ms 2500
wonda generate video --model kling_3_pro --prompt "@alex welcomes viewers to the channel" --wait -o alex-welcome.mp4
```

**From scratch** — generate a portrait and a TTS sample, then bind them:

```bash
IMG=$(wonda generate image --model nano-banana-2 --prompt "young woman, studio portrait" --wait --quiet)
IMG_MEDIA=$(wonda jobs get inference $IMG --jq '.outputs[0].media.mediaId')
AUD=$(wonda audio speech --model elevenlabs-tts --prompt "Hi, this is me" --params '{"voiceId":"21m00Tcm4TlvDq8ikWAM"}' --wait --quiet)
AUD_MEDIA=$(wonda jobs get inference $AUD --jq '.outputs[0].media.mediaId')
wonda character create maya --image $IMG_MEDIA --audio $AUD_MEDIA
```

List / inspect / update / delete: `wonda character list`, `wonda character get <name>`, `wonda character update <name> --audio $NEW`, `wonda character delete <name>`. Only one character with audio can be referenced per Kling generation.

## Prompt writing rules

Follow this waterfall top-to-bottom. Use the FIRST matching rule and stop.

1. **PASSTHROUGH** — If the user says "use my exact prompt" / "verbatim" / "no enhancements" → copy their words exactly. Zero modifications.
2. **IMAGE-TO-VIDEO** — When a source image feeds into a video model, describe MOTION ONLY. The model can see the image. Do NOT describe the image content.
   - Good: `"gentle breathing motion, camera slowly pushes in, atmospheric lighting shifts"`
   - Bad: `"Two cats on a lavender background breathing softly"` (describes the image)
3. **EMPTY PROMPT (from scratch)** — Use the user's exact request as the prompt. Do NOT add style descriptors, lighting, composition, or mood.
   - User says "create an image of a cat with sunglasses" → prompt: `"create an image of a cat with sunglasses"`
   - Do NOT enhance to `"A playful orange tabby wearing oversized reflective sunglasses, studio lighting, shallow depth of field"`
4. **NON-EMPTY PROMPT (adapting a template)** — Keep the structure and style, only swap content to match the user's request. Keep prompts literal and constraint-heavy.

## Aspect ratio rules

Three cases, no exceptions:

1. User specifies a ratio → use it: `--aspect-ratio 16:9`
2. User doesn't mention ratio → explicitly set `--aspect-ratio 9:16` for social content (UGC, TikTok, Reels, Stories). Portrait is the default for any social/marketing video.
3. Editing existing media → use `--aspect-ratio auto` to preserve source dimensions
**UGC and social content is ALWAYS portrait (9:16).** If someone asks for a TikTok, Reel, Story, or UGC video, always use `--aspect-ratio 9:16`. Landscape is only for YouTube, presentations, or when explicitly requested.

**Square (1:1)** is supported by all Kling models and some image models — use it for Instagram feed posts when requested.

## Common chaining patterns

These patterns show how to compose multi-step pipelines by chaining CLI commands. Each step's output feeds into the next.

> **No need to download and re-upload between steps.** Every generation and edit
> produces a media ID in its output. Pass that ID directly to the next command
> via `--media` or `--audio-media`. Use `--jq '.outputs[0].media.mediaId'`
> for inference jobs and `--jq '.outputs[0].mediaId'` for editor jobs.
> Only use `-o <file>` on the FINAL step to download the finished output.

### Animate an image to video

```bash
MEDIA=$(wonda media upload ./product.jpg --quiet)

# No person in image → Seedance 2
wonda generate video --model seedance-2 --prompt "camera slowly pushes in, product rotates" \
  --attach $MEDIA --duration 5 --params '{"quality":"high"}' --wait -o animated.mp4

# Person in image → Kling (ONLY when attaching a reference image with a person)
wonda generate video --model kling_3_pro --prompt "the person turns and smiles" \
  --attach $MEDIA --duration 5 --wait -o person.mp4
```

### Replace audio on a video (TTS voiceover or music)

```bash
# Generate TTS
TTS_JOB=$(wonda audio speech --model elevenlabs-tts --prompt "The script" \
  --params '{"voiceId":"21m00Tcm4TlvDq8ikWAM"}' --wait --quiet)
TTS_MEDIA=$(wonda jobs get inference $TTS_JOB --jq '.outputs[0].media.mediaId')

# Mix onto video (mute original, full voiceover)
wonda edit video --operation editAudio --media $VID_MEDIA --audio-media $TTS_MEDIA \
  --params '{"videoVolume":0,"audioVolume":100}' --wait -o with-voice.mp4
```

Only use this when you need to REPLACE the video's audio. Sora, Sora 2 Pro, Veo 3.1, Kling 3, and Seedance 2 all generate native synced speech in any language — don't replace it with TTS unless the user explicitly asks for a different voiceover. Never reach for this step to "add speech" to a UGC/talking-head clip; put the dialogue in the video model's prompt instead.

### Add static text overlay

Static overlays (meme text, "chat did i cook", etc.) use smaller font sizes than captions. They're ambient, not meant to dominate the frame.

```bash
wonda edit video --operation textOverlay --media $VID_MEDIA \
  --prompt-text "chat, did i cook" \
  --params '{"fontFamily":"TikTok Sans SemiCondensed","position":"top-center","sizePercent":66,"fontSizeScale":0.5,"strokeWidth":4.5,"paddingTop":10}' \
  --wait -o with-text.mp4
```

**Featured textOverlay + animatedCaptions presets.** `wonda edit {video,image,audio}` accepts `--preset <name>` (scoped to `--operation`). `--params` fields override preset values on key collisions.

`textOverlay` (static, top-centered):

- `TikTok White Highlight` — black text on a slightly rounded white box.
- `TikTok Black Highlight` — white text on a slightly rounded black box.
- `TikTok Red Highlight` — white text on a slightly rounded red (`#E14135`) box.

`animatedCaptions` (STT-driven, bottom-centered):

- `TikTok White Captions` — black text, white highlight on the active word.
- `TikTok Black Captions` — white text, black highlight on the active word.
- `TikTok Red Captions` — white text, red (`#E14135`) highlight on the active word.
```bash
wonda edit video --operation textOverlay \
  --preset "TikTok Red Highlight" --media <media-id> \
  --params '{"text":"YOUR HEADLINE"}' --wait -o ./out.mp4
```

Image `textOverlay` requires `--render-server`; video renders locally by default.

**Font sizing guide:**

- Static overlays: `sizePercent: 66`, `fontSizeScale: 0.5`, `strokeWidth: 4.5`
- Animated captions: `sizePercent: 80`, `fontSizeScale: 0.8`, `strokeWidth: 2.5`, `highlightColor: rgb(252, 61, 61)`
- Font: `TikTok Sans SemiCondensed` for both

### Add animated captions (word-by-word with timing)

The `animatedCaptions` operation extracts audio, transcribes, and renders animated word-by-word captions — all in one step.

```bash
wonda edit video --operation animatedCaptions --media $VIDEO_MEDIA \
  --params '{"fontFamily":"TikTok Sans SemiCondensed","position":"bottom-center","sizePercent":80,"strokeWidth":2.5,"fontSizeScale":0.8,"highlightColor":"rgb(252, 61, 61)"}' \
  --wait -o with-captions.mp4
```

For quick static captions (no timing, just text on screen), use `textOverlay` with `--prompt-text`:

```bash
wonda edit video --operation textOverlay --media $VIDEO_MEDIA \
  --prompt-text "Summer Sale - 50% Off" \
  --params '{"fontFamily":"TikTok Sans SemiCondensed","position":"bottom-center","sizePercent":80}' \
  --wait -o captioned.mp4
```

### Add background music

```bash
MUSIC_JOB=$(wonda generate music --model suno-music \
  --prompt "upbeat lo-fi hip hop, warm vinyl crackle" --wait --quiet)
MUSIC_MEDIA=$(wonda jobs get inference $MUSIC_JOB --jq '.outputs[0].media.mediaId')

wonda edit video --operation editAudio --media $VID_MEDIA --audio-media $MUSIC_MEDIA \
  --params '{"videoVolume":100,"audioVolume":30}' --wait -o with-music.mp4
```

### Editor output chaining

When chaining multiple editor operations (e.g., editAudio → animatedCaptions → textOverlay), extract the media ID from each editor job output and pass it to the next step. Note the jq path differs from inference jobs:

```bash
# Inference jobs: .outputs[0].media.mediaId
# Editor jobs:    .outputs[0].mediaId
EDIT_JOB=$(wonda edit video --operation editAudio --media $VID --audio-media $AUDIO \
  --params '{"videoVolume":0,"audioVolume":100}' --wait --quiet)
STEP1_MEDIA=$(wonda jobs get editor $EDIT_JOB --jq '.outputs[0].mediaId')

CAP_JOB=$(wonda edit video --operation animatedCaptions --media $STEP1_MEDIA \
  --params '{"fontFamily":"TikTok Sans SemiCondensed","position":"bottom-center","sizePercent":80,"strokeWidth":2.5,"fontSizeScale":0.8,"highlightColor":"rgb(252, 61, 61)"}' --wait --quiet)
STEP2_MEDIA=$(wonda jobs get editor $CAP_JOB --jq '.outputs[0].mediaId')

wonda edit video --operation textOverlay --media $STEP2_MEDIA \
  --prompt-text "Hook text" --params '{"position":"top-center","fontFamily":"TikTok Sans SemiCondensed","sizePercent":66,"fontSizeScale":0.5,"strokeWidth":4.5}' --wait -o final.mp4
```

### Merge multiple clips

**Always merge locally with ffmpeg.** Server-side merge (`wonda edit video --operation merge`) can hang for 30+ minutes once any input exceeds ~7MB. Download every Wonda media ID, then concat. Stream-copy is fast but requires matching codec/profile/resolution; fall back to re-encode if it errors:

```bash
wonda media download $CLIP1 -o /tmp/clip-1.mp4
wonda media download $CLIP2 -o /tmp/clip-2.mp4
wonda media download $CLIP3 -o /tmp/clip-3.mp4

cat > /tmp/concat.txt <<EOF
file '/tmp/clip-1.mp4'
file '/tmp/clip-2.mp4'
file '/tmp/clip-3.mp4'
EOF

# Fast path: stream copy (requires matching codec/profile/resolution)
ffmpeg -y -f concat -safe 0 -i /tmp/concat.txt -c copy /tmp/merged.mp4

# Fallback: re-encode to the default export target if stream copy errors
ffmpeg -y -f concat -safe 0 -i /tmp/concat.txt \
  -c:v libx264 -preset medium -crf 18 -pix_fmt yuv420p -movflags +faststart \
  -c:a aac -b:a 192k /tmp/merged.mp4
```

## Clipping

Turn a long source video into short social clips. The job plans clip selections against your `--brief` (transcription + speaker diarization feed the selection), renders them, and with `--wait --output <dir>` the CLI downloads each rendered clip + a `plan.json`. Auth: requires the `clippingEnabled` PostHog feature flag in prod; local dev bypasses automatically.
**Source — never pass YouTube URLs to `--url`.** The flag exists on the CLI, but the underlying `--url` flow shells out to `yt-dlp` on the **video-worker container** (Cloud Run / GCP datacenter IP). YouTube blocks datacenter IPs with the "Sign in to confirm you're not a bot" challenge and the worker has no cookie store, so YouTube ingest fails at progress 0.05 with that error and the LLM hold has to be released. For YouTube, always download locally and upload first:

```bash
yt-dlp -o /tmp/source.mp4 \
  -f "bv*[ext=mp4][height<=720]+ba[ext=m4a]/b[ext=mp4][height<=720]" \
  --merge-output-format mp4 "<youtube-url>"
MEDIA=$(wonda media upload /tmp/source.mp4 --quiet)
```

`--url` is fine for **direct mp4 URLs** (no JS, no anti-bot cookies).

```bash
# Plan only — fast, no render
wonda clipping --media $MEDIA --brief "the most controversial moments" --dry-run --wait

# Full pipeline: select + render + download
wonda clipping --media $MEDIA \
  --brief "the most controversial moments" \
  --caption-preset "TikTok Red Captions" \
  --hook auto \
  --wait --output ./clips/

# Filter by speaker (uses ElevenLabs diarization labels)
wonda clipping --media $MEDIA --speaker SPEAKER_00 --wait --output ./clips/

# Speaker rename for readable rationales
wonda clipping --media $MEDIA --speaker Joe \
  --speaker-map '{"SPEAKER_00":"Joe","SPEAKER_01":"Guest"}' --wait --output ./clips/

# Tune count and durations — pick a target length with a tolerance
wonda clipping --media $MEDIA --brief "punchy one-liners" \
  --count 5 --duration 20 --tolerance 5 --wait --output ./clips/

# Or specify an explicit min/max range instead (mutually exclusive
# with --duration/--tolerance)
wonda clipping --media $MEDIA --brief "punchy one-liners" \
  --count 5 --min-duration 8 --max-duration 30 --wait --output ./clips/

# Auto-pick FX preset per clip from a catalog
wonda clipping --media $MEDIA --auto-preset \
  --preset-catalog '[{"slug":"flash_glow","description":"glow + scene flash"},{"slug":"text_glow","description":"per-word text glow"}]' \
  --wait --output ./clips/
```

Job-status shape (returned by GET `/api/v1/clipping/jobs/{id}`):

```json
{
  "clippingJobId": "...",
  "status": "succeeded",
  "stage": "succeeded",
  "progress": 1,
  "plan": {
    "sourceDurationSec": 1800.5,
    "speakers": ["SPEAKER_00", "SPEAKER_01"],
    "clips": [
      {
        "start": 12.4,
        "end": 38.7,
        "title": "Why he quit the agency",
        "hookText": "He admits…",
        "rationale": "Concedes \"the agency model is dead\" then explains why...",
        "score": 87,
        "dominantSpeaker": "SPEAKER_00",
        "reframeMode": "blur-fill",
        "preset": null,
        "mediaId": "uuid-of-rendered-clip",
        "url": "https://storage.googleapis.com/.../clip.mp4"
      }
    ]
  },
  "error": null
}
```

## Editor operations reference

| Operation | Inputs | Key Params |
| --- | --- | --- |
| `animatedCaptions` | video_0 | fontFamily, position, sizePercent, fontSizeScale, strokeWidth, highlightColor |
| `textOverlay` | video_0 + prompt | fontFamily, position, sizePercent, fontSizeScale, strokeWidth |
| `editAudio` | video_0 + audio_0 | videoVolume (0-100), audioVolume (0-100) |
| `merge` | video_0..video_4 | Handle order = playback order |
| `overlay` | video_0 (bg) + video_1 (fg) | position, resizePercent |
| `splitScreen` | video_0 + video_1 | targetAspectRatio (16:9 or 9:16) |
| `trim` | video_0 | trimStartMs, trimEndMs (milliseconds) |
| `splitScenes` | video_0 | mode (split/omit), threshold, outputSelection |
| `speed` | video_0 | speed (multiplier: 2 = 2x faster) |
| `extractAudio` | video_0 | Extracts audio track |
| `reverseVideo` | video_0 | Plays backwards |
| `skipSilence` | video_0 | maxSilenceDuration (default 0.03) |
| `imageCrop` | video_0 | aspectRatio |
| `textOverlay` | video_0 (image) | Same as video textOverlay — works on images, outputs image (png/jpg) |

Valid textOverlay fonts: Inter, Montserrat, Bebas Neue, Oswald, TikTok Sans, TikTok Sans Condensed, TikTok Sans SemiCondensed, TikTok Sans SemiExpanded, TikTok Sans Expanded, TikTok Sans ExtraExpanded, Nohemi, Poppins, Raleway, Anton, Comic Cat, Gavency

Valid positions: top-left, top-center, top-right, center-left, center, center-right, bottom-left, bottom-center, bottom-right
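For example, invoking one of the table's operations with its documented params (the values are illustrative):

```bash
# Keep 2.5s–10s of the source clip, per the trim row above
wonda edit video --operation trim --media <media-id> \
  --params '{"trimStartMs":2500,"trimEndMs":10000}' --wait -o trimmed.mp4
```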
## Marketing & distribution

```bash
# Connected social accounts
wonda accounts instagram
wonda accounts tiktok

# Analytics
wonda analytics instagram
wonda analytics tiktok
wonda analytics meta-ads

# Scrape competitors
wonda scrape social --handle @nike --platform instagram --wait
wonda scrape social-status <job-id>   # Get results of a social scrape
wonda scrape ads --query "sneakers" --country US --wait
wonda scrape ads --query "sneakers" --country US --search-type keyword \
  --active-status active --sort-by impressions_desc --period last30d \
  --media-type video --max-results 50 --wait
wonda scrape ads-status <job-id>      # Get results of an ads search

# Download a single reel or TikTok video
SCRAPE=$(wonda scrape video --url "https://www.instagram.com/reel/ABC123/" --wait --quiet)
# → returns scrape result with mediaId in the media array

# Publish
wonda publish instagram --media <media-id> --account <account-id> --caption "New drop"
wonda publish instagram --media <media-id> --account <account-id> --caption "..." --alt-text "..." --product IMAGE --share-to-feed
wonda publish instagram-carousel --media <id1>,<id2>,<id3> --account <account-id> --caption "..."
wonda tiktok creator-info --account <account-id>   # Live privacy options + comment/duet/stitch defaults
wonda publish tiktok --media <media-id> --account <account-id> --caption "New drop" --privacy PUBLIC_TO_EVERYONE
wonda publish tiktok --media <media-id> --account <account-id> --caption "..." --privacy PUBLIC_TO_EVERYONE \
  --disable-comment --commercial-disclose --brand-organic
wonda publish tiktok-carousel --media <id1>,<id2> --account <account-id> --caption "..." \
  --privacy PUBLIC_TO_EVERYONE --cover-index 0

# History
wonda publish history instagram --limit 10
wonda publish history tiktok --limit 10

# Browse media library
wonda media list --kind image --limit 20
wonda media info <media-id>
```
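A minimal sketch chaining a scrape into a publish. The `.media[0].mediaId` jq path is an assumption based on the "mediaId in the media array" note above; inspect the real scrape JSON first:

```bash
# Download a reel, then republish it (jq path assumed; check the scrape output)
MEDIA=$(wonda scrape video --url "https://www.instagram.com/reel/ABC123/" \
  --wait --jq '.media[0].mediaId')
wonda publish instagram --media $MEDIA --account <account-id> --caption "New drop"
```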
### X/Twitter

Supports reads, writes, and the social graph.

```bash
# Auth setup (run `wonda x auth --help` for details)
wonda x auth set --auth-token <auth_token> --ct0 <ct0>
wonda x auth set --account burner --auth-token <...> --ct0 <...>   # multi-account
wonda x auth check

# Read
wonda x search "sneakers" -n 20     # Search tweets
wonda x user @nike                  # User profile
wonda x user-tweets @nike -n 20     # User's recent tweets
wonda x read <tweet-id>             # Single tweet
wonda x replies <tweet-id>          # Replies to a tweet
wonda x thread <tweet-id>           # Full thread (author's self-replies)
wonda x home                        # Home timeline (--following for Following tab)
wonda x bookmarks                   # Your bookmarks
wonda x likes                       # Your liked tweets
wonda x following @handle           # Who a user follows
wonda x followers @handle           # A user's followers
wonda x lists @handle               # User's lists (--member-of for memberships)
wonda x list-timeline <list-id>     # Tweets from a list
wonda x news --tab trending         # Trending topics (tabs: for_you, trending, news, sports, entertainment)

# Write (uses internal API — use on secondary accounts)
wonda x tweet "Hello world"                       # Post a tweet
wonda x tweet "Hello world" --browser             # Full stealth via real browser (Patchright)
wonda x tweet "Hello world" --attach ~/clip.mp4   # Attach image/gif/video (up to 4)
wonda x reply <tweet-id> "Great point"            # Reply
wonda x like <tweet-id>                           # Like
wonda x unlike <tweet-id>                         # Unlike
wonda x retweet <tweet-id>                        # Retweet
wonda x unretweet <tweet-id>                      # Unretweet
wonda x follow @handle                            # Follow
wonda x unfollow @handle                          # Unfollow

# Maintenance
wonda x refresh-ids   # Refresh cached GraphQL query IDs from X's JS bundles
```

All paginated commands support: `-n <count>`, `--cursor`, `--all`, `--max-pages`, `--delay <interval>`.

**Tweet modes:** The `tweet` command has two modes:

- **Default (API):** X's internal GraphQL (`CreateTweet` for ≤280 chars, `CreateNoteTweet` for long-form Premium). Fast (<1s), supports `--attach` for media. Occasionally fails with error 226 when X rotates query IDs or feature flags — when that happens, recapture via `twitter-tone-research/_artifacts/scripts/capture-ct-bw.mjs` and bump the three knobs in `xclient/`.
- **`--browser` (Patchright):** Launches a real undetected Chrome browser, opens x.com compose, types with human-style jitter, clicks Post. Supports `--attach` (image/gif/video, up to 4) — files are driven through the hidden compose input via Playwright's `setInputFiles`, so no native picker dialog opens; the script waits for X's upload pipeline to finalize (up to 5 min for video) before submitting. Zero fingerprinting risk. Slower (~10s text, ~30-90s with video) but fully drift-proof — no queryIds, feature flags, or request shape to maintain. Requires: `npm i patchright && npx patchright install chromium`.
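As one example of the pagination flags above (the page count is illustrative):

```bash
# Pull several pages of a competitor's timeline into one JSON file
wonda x user-tweets @nike --all --max-pages 5 --json > nike-tweets.json
```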
### LinkedIn

Supports search, profiles, companies, messaging, and engagement.

```bash
# Auth setup (run `wonda linkedin auth --help` for details)
wonda linkedin auth set --li-at-value <li_at> --jsessionid-value <jsessionid>
wonda linkedin auth set --account brand-A --li-at-value <...> --jsessionid-value <...>   # multi-account
wonda linkedin auth check

# Read
wonda linkedin me                                     # Your identity
wonda linkedin search "data engineer" --type PEOPLE   # Search (types: PEOPLE, COMPANIES, ALL)
wonda linkedin profile johndoe                        # View profile (vanity name or URL)
wonda linkedin company google                         # View company page
wonda linkedin conversations                          # List message threads
wonda linkedin messages <thread-id>                   # Read messages in a thread
wonda linkedin notifications -n 20                    # Recent notifications
wonda linkedin connections                            # Your connections
wonda linkedin reactions <post-id>                    # Reactions with reactor profiles + type

# Write
wonda linkedin connect <profile> --message "Hey!"     # Send connection request with note
wonda linkedin connect <profile> -m "Hey!" --browser  # Full stealth via real browser (Patchright)
wonda linkedin like <post-id>                         # Like a post
wonda linkedin unlike <post-id>                       # Remove a like
wonda linkedin send-message <thread-id> "Hi!"         # Send a message
wonda linkedin post "Excited to announce..."          # Create a post
wonda linkedin delete-post <post-id>                  # Delete a post
```

Paginated commands support: `-n <count>`, `--start`, `--all`, `--max-pages`, `--delay <interval>`.

**Connection request modes:** The `connect` command has two modes:

- **Default (API):** Voyager REST API with fingerprint mitigations (profile visit → drawer warm-up → connect). Fast (~3s), supports notes via `customMessage`.
- **`--browser` (Patchright):** Launches a real undetected Chrome browser, navigates to the profile, and clicks through the UI. Zero fingerprinting risk. Slower (~10s) but fully safe. Use it as a fallback when you want full protection. Requires: `npm i patchright && npx patchright install chromium`.

### Reddit

Auth is optional — many reads work unauthenticated. Supports search, feeds, users, posts, trending, and chat/DMs.

```bash
# Auth setup (run `wonda reddit auth --help` for details)
wonda reddit auth set --session-value <token>
wonda reddit auth set --account burner-1 --session-value <token>   # multi-account
wonda reddit auth check

# Read (works without auth)
wonda reddit search "AI video" --sort top --time week   # Search posts (sort: relevance, hot, top, new, comments)
wonda reddit subreddit marketing            # Subreddit info
wonda reddit feed marketing --sort hot      # Subreddit posts (sort: hot, new, top, rising)
wonda reddit user spez                      # User profile
wonda reddit user-posts spez --sort top     # User's posts
wonda reddit user-comments spez             # User's comments
wonda reddit post <post-id> -n 50           # Post with comments
wonda reddit trending --sort hot            # Popular/trending posts

# Read (requires auth)
wonda reddit home --sort best               # Your home feed

# Write (requires auth)
wonda reddit submit marketing --title "Great tool" --text "Check this out..."   # Self post
wonda reddit submit marketing --title "Great tool" --url "https://..."          # Link post
wonda reddit comment <post-id> --text "Nice post!"   # Reply
wonda reddit vote <id> --up                 # Upvote (--down, --unvote)
wonda reddit subscribe marketing            # Subscribe (--unsub to unsubscribe)
wonda reddit save <id>                      # Save a post or comment
wonda reddit unsave <id>                    # Unsave
wonda reddit delete <id>                    # Delete your post or comment
```

Paginated commands support: `-n <count>`, `--after <id>`, `--all`, `--max-pages`, `--delay <interval>`.

### Reddit chat / DMs

Direct messaging via the Matrix protocol. Requires a separate chat token.

```bash
# Auth setup (run `wonda reddit chat auth-set --help` for details)
wonda reddit chat auth-set <chat-token>

# Read
wonda reddit chat inbox                       # List DM conversations with latest messages
wonda reddit chat messages <room-id> -n 50    # Fetch messages from a room
wonda reddit chat all-rooms                   # List ALL joined rooms (not limited to sync window)

# Write
wonda reddit chat send <room-id> --text "Hey!"   # Send a DM (mimics browser typing behavior)

# Management
wonda reddit chat accept-all                  # Accept all pending chat requests
wonda reddit chat refresh                     # Force-refresh the Matrix chat token
```

**Important**: The chat token expires every ~24h. The CLI auto-refreshes on use, but if it expires fully, re-run `auth-set`. Rate-limit DM sends to 15-20/day with varied text to avoid detection. The `send` command includes a typing delay (1-5s) to mimic human behavior.
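A small sketch of a triage pass over DMs. The `.rooms[0].roomId` jq path is an assumption; inspect the real inbox JSON before relying on it:

```bash
# Accept pending requests, then reply in the most recent room
wonda reddit chat accept-all
ROOM=$(wonda reddit chat inbox --json | jq -r '.rooms[0].roomId')   # path assumed
wonda reddit chat send $ROOM --text "Thanks for reaching out!"
```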
## Workflow & discovery

### Video analysis

Analyze a video to extract a composite frame grid (visual) and an audio transcript (text). Useful for understanding video content before creating variations. Requires a **full account** (not anonymous) and costs credits based on video duration (ElevenLabs STT pricing). If the video was just uploaded and is still normalizing, the CLI auto-retries until the media is ready.

```bash
# Analyze a video — returns composite grid image + transcript
ANALYSIS_JOB=$(wonda analyze video --media $VIDEO_MEDIA --wait --quiet)

# The job output contains:
# - compositeGrid: image showing 24 evenly-spaced frames
# - transcript: full text of any speech
# - wordTimestamps: word-level timing [{word, start, end}]
# - videoMetadata: {width, height, durationMs, fps, aspectRatio}

# Download the composite grid for visual inspection
wonda analyze video --media $VIDEO_MEDIA --wait -o /tmp/grid.jpg

# Get just the transcript
wonda analyze video --media $VIDEO_MEDIA --wait --jq '.outputs[] | select(.outputKey=="transcript") | .outputValue'
```

**Error handling**: 402 = insufficient credits, 409 = media still processing (CLI auto-retries).

### Chat (AI assistant)

Interactive chat sessions for content creation — the AI handles generation, editing, and iteration.

```bash
wonda chat create --title "Product launch"   # New session
wonda chat list                              # List sessions (--limit, --offset)
wonda chat messages <session-id>             # Get messages
wonda chat send <session-id> --message "Create a UGC reaction video"
wonda chat send <session-id> --message "Edit it" --media <media-id>
wonda chat send <session-id> --message "..." --aspect-ratio 9:16 --quality-tier max
wonda chat send <session-id> --message "..." --style <style-id>
wonda chat send <session-id> --message "..." --passthrough-prompt   # Use exact prompt, no AI enhancement
```

### Jobs & runs

```bash
wonda jobs get inference <job-id>                  # Inference job status
wonda jobs get editor <job-id>                     # Editor job status
wonda jobs get publish <job-id>                    # Publish job status
wonda jobs wait inference <job-id> --timeout 20m   # Wait for completion
wonda run get <run-id>                             # Run status
wonda run wait <run-id> --timeout 30m              # Wait for run completion
```

### Discovery

```bash
wonda models list                    # All available models
wonda models info <model-id>         # Model details and params
wonda operations list                # All editor operations
wonda operations info <operation>    # Operation details
wonda capabilities                   # Full platform capabilities
wonda pricing list                   # Pricing for all models
wonda pricing estimate --model seedance-2 --prompt "..."   # Cost estimate
wonda style list                     # Available visual styles
wonda topup                          # Top up credits (opens Stripe checkout)
```

### Editing audio & images

```bash
# Edit audio
wonda edit audio --operation <operation> --media <media-id> --wait -o out.mp3
```

For any image edit (crop, text overlay, img2img, background removal, vectorize), pull the dedicated skill: `wonda skill get image-edit`.

### Alignment (timestamp extraction)

```bash
wonda alignment extract-timestamps --model <model-id> --attach <media-id> --wait
```
## Quality tiers

| Tier | Image Model | Resolution | Video Model | When |
| --- | --- | --- | --- | --- |
| Standard | `gpt-image-2` (auto) — alt: `nano-banana-2` 1K | 1024×1024 / 1024×1536 (gpt) / 1K (nano) | `seedance-2` (high, 5s) | Default. gpt-image-2 for strongest prompt adherence + text-in-image; nano-banana-2 for faster Gemini iteration with multi-reference support. |
| High | `gpt-image-2` (high) — alt: `nano-banana-2` 2K | 1024×1024 / 1024×1536 (gpt) / 2K (nano) | `seedance-2` (high, 15s) | Crisp output. Use `--params '{"quality":"high"}'` on gpt-image-2 or bump `--params '{"resolution":"2K"}'` on nano-banana-2. Also offer `sora2pro`. |
| Max | `nano-banana-pro` 4K — alt: `nano-banana-2` 4K | 4K | `seedance-2` (high, 15s) | True 4K (gpt-image-2 caps at 1536px). Use `--params '{"resolution":"4K"}'`. Also offer `sora2pro` (1080p) for video. |

## Troubleshooting

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| Sora rejected image | Person in image | Switch to `kling_3_pro` |
| Video adds objects not in source | Motion prompt describes elements not in image | Simplify to camera movement and atmosphere only |
| Text unreadable in video | AI tried to render text in generation | Remove text from video prompt, use textOverlay instead |
| Hands look wrong | Complex hand actions in prompt | Simplify to passive positions or frame to exclude |
| Style inconsistent across series | No shared anchor | Use same reference image via `--attach` |
| Changes to step A not in step B | Stale render | Re-run all downstream steps |

## Timing expectations

- Image: 30s - 2min
- Video (Sora): 2 - 5min
- Video (Sora Pro): 5 - 10min
- Video (Veo 3.1): 1 - 3min
- Video (Kling): 3 - 8min
- Video (Grok): 2 - 5min
- Music (Suno): 1 - 3min
- TTS: 10 - 30s
- Editor operations: 30s - 2min
- Lip sync: 1 - 3min
- Video upscale: 2 - 5min

## Error recovery

- **Unknown model**: `wonda models list`
- **No API key**: `wonda auth login` or set the `WONDA_API_KEY` env var
- **Job failed**: `wonda jobs get inference <job-id>` for error details
- **Bad params**: `wonda models info <model-id>` for valid params
- **Timeout**: `wonda jobs wait inference <job-id> --timeout 20m`
- **Insufficient credits (402)**: `wonda topup` to add credits
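A minimal wait-then-inspect sketch tying these together, assuming `jobs wait` exits nonzero on failure or timeout:

```bash
# Wait with a generous timeout; on failure, dump the job for error details
if ! wonda jobs wait inference $JOB --timeout 20m; then
  wonda jobs get inference $JOB --json   # inspect status / error details
fi
```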