--- name: minimax-music-gen description: > Use when user wants to generate music, songs, or audio tracks. Triggers on any request involving music creation, song writing, lyrics generation, audio production, or covers. Also triggers when user provides lyrics and wants them turned into a song, or describes a mood/scene and wants background music. Supports multilingual triggers — match equivalent phrases in any language. Do NOT use for music playback of existing files, music theory questions, or music recommendation without generation. license: MIT metadata: version: "1.1" category: creative --- # MiniMax Music Generation Skill Generate songs (vocal or instrumental) using the MiniMax Music API. Supports two creation modes: **Basic** (one-sentence-in, song-out) and **Advanced Control** (edit lyrics, refine prompt, plan before generating). ## Prerequisites - **mmx CLI** (required): Music generation uses the `mmx` command-line tool. **Check if installed:** ```bash command -v mmx && mmx --version || echo "mmx not found" ``` **Install (requires Node.js):** ```bash npm install -g mmx-cli ``` **Authenticate (first time only):** ```bash mmx auth login --api-key ``` The API key can be obtained from [MiniMax Platform](https://platform.minimaxi.com/). Credentials are saved to `~/.mmx/credentials.json` and persist across sessions. **Verify:** ```bash mmx quota show ``` - **Audio player** (recommended): `mpv`, `ffplay`, or `afplay` (macOS built-in) for local playback. `mpv` is preferred for its interactive controls. ## CLI Tool This skill uses the `mmx` CLI for all music generation: - **Music Generation**: `mmx music generate` — model: `music-2.6-free` - Supports `--lyrics-optimizer` to auto-generate lyrics from prompt - Supports `--instrumental` for instrumental tracks - Supports `--lyrics` for user-provided lyrics - Structured params: `--genre`, `--mood`, `--vocals`, `--instruments`, `--bpm`, `--key`, `--tempo`, `--structure`, `--references` - **Cover**: `mmx music cover` — model: `music-cover-free` - Takes reference audio via `--audio-file ` or `--audio ` - `--prompt` describes the target cover style **Agent flags**: Always add `--quiet --non-interactive` when calling mmx from agents. **Pipeline**: - Vocal: `User description -> mmx music generate --lyrics-optimizer -> MP3` - Instrumental: `User description -> mmx music generate --instrumental -> MP3` - Cover: `Source audio + style -> mmx music cover -> MP3` ## Storage All generated music is saved to `~/Music/minimax-gen/`. Create the directory if it doesn't exist. Files are named with a timestamp and a short slug derived from the prompt: `YYYYMMDD_HHMMSS_.mp3` --- ## Language & Interaction Detect the user's language from their first message and respond in that language for the entire session. This applies to all interaction text, questions, confirmations, and feedback prompts. **User-facing text localization rule**: - ALL text shown to the user — including preview labels, field names, confirmations, status messages, playback info, feedback prompts, **and the prompt/description preview** — MUST be fully translated into the user's language. - The **API prompt** sent to the model should always be written in English for best generation quality. However, when previewing the prompt to the user, show a localized description in the user's language instead of the raw English prompt. The English prompt is an internal implementation detail — the user does not need to see it. - The templates below are written in English as reference. At runtime, translate every label and message into the user's detected language. **Lyrics language rule**: - Default lyrics language = the user's language. A Chinese-speaking user gets Chinese lyrics; an English-speaking user gets English lyrics. - Only generate lyrics in a different language if the user **explicitly** requests it. - When a different lyrics language is needed, embed it naturally into the vocal or genre description in the prompt. For example, instead of appending "with Korean lyrics", use "featuring a Korean female vocalist" or specify a genre that implies the language (e.g., "K-pop", "J-rock", "Mandopop", "Latin pop"). --- ## Workflow ### Step 0: Detect Intent Parse the user's message to determine: 1. **Song category**: vocal (with lyrics), instrumental (no vocals), or cover 2. **Creation mode preference**: did they provide detailed requirements (Advanced) or a casual one-liner (Basic)? If ambiguous, ask using this decision tree: ``` Q1: What type of music? - Vocal (with lyrics) - Instrumental (no vocals) - Cover Q2: Creation mode? - Basic — one-line description, auto-generate - Advanced — edit lyrics, refine prompt, plan ``` If the user gives a clear one-liner like "make me a sad piano piece", skip the questions — infer instrumental + basic mode and proceed. --- ### Step 1: Basic Mode **Goal**: User provides a short description, the skill auto-generates everything, then calls the API. 1. **Expand the description into a prompt**: Take the user's one-liner and expand it into a rich music prompt. Refer to the **Prompt Writing Guide** appendix at the end of this document for style vocabulary, genre/instrument references, and prompt structure. **The API prompt should always be written in English** for best generation quality, regardless of the user's language. Follow this pattern: ``` A [mood] [BPM optional] [genre] song, featuring [vocal description], about [narrative/theme], [atmosphere], [key instruments and production]. ``` 2. **Show the user a preview** before generating. Translate all labels AND the prompt description into the user's language. The English prompt is only used internally when calling the API — the user should never see it. Example template (English reference — localize everything at runtime): ``` About to generate: Type: Vocal / Instrumental Description: indie folk, melancholy, acoustic guitar, gentle female voice Lyrics: Auto-generated (--lyrics-optimizer) Confirm? (press enter to confirm, or tell me what to change) ``` 3. **Call mmx**: Generate the music directly. --- ### Step 2: Advanced Control Mode **Goal**: User has full control over every parameter before generation. 1. **Lyrics phase**: - If user provided lyrics: display them formatted with section markers, ask for edits. The final lyrics will be passed via `--lyrics` to mmx. - If user has a theme but no lyrics: will use `--lyrics-optimizer` to auto-generate. - Support iterative editing: "change the second chorus" -> only rewrite that section. - User can also write lyrics themselves and pass via `--lyrics`. 2. **Prompt phase**: - Generate a recommended prompt based on the lyrics' mood and content. - Present it as editable tags the user can add/remove/modify. - Refer to the **Prompt Writing Guide** appendix for the full vocabulary. 3. **Advanced planning** (optional, offer but don't force): - Song structure: verse-chorus-verse-chorus-bridge-chorus or custom - BPM suggestion (encode in prompt as tempo descriptor) - Reference style: "something like X style" -> map to prompt tags - Vocal character description 4. **Final confirmation**: Show complete parameter summary, then generate. --- ### Step 3: Call mmx Generate music using the mmx CLI: **Vocal with auto-generated lyrics:** ```bash mmx music generate \ --prompt "" \ --lyrics-optimizer \ --genre "" --mood "" --vocals "" \ --instruments "" --bpm \ --out ~/Music/minimax-gen/.mp3 \ --quiet --non-interactive ``` **Vocal with user-provided lyrics:** ```bash mmx music generate \ --prompt "" \ --lyrics "" \ --genre "" --mood "" --vocals "" \ --out ~/Music/minimax-gen/.mp3 \ --quiet --non-interactive ``` **Instrumental (no vocal):** ```bash mmx music generate \ --prompt "" \ --instrumental \ --genre "" --mood "" --instruments "" \ --out ~/Music/minimax-gen/.mp3 \ --quiet --non-interactive ``` Use structured flags (`--genre`, `--mood`, `--vocals`, `--instruments`, `--bpm`, `--key`, `--tempo`, `--structure`, `--references`, `--avoid`, `--use-case`) to give the API fine-grained control instead of cramming everything into `--prompt`. Display a progress indicator while waiting. Typical generation takes 30-120 seconds. --- ### Step 4: Playback After generation, detect an available audio player and play the file. **Detect player:** ```bash command -v mpv || command -v ffplay || command -v afplay ``` **Play based on detected player (in priority order):** | Player | Command | Controls | |--------|---------|----------| | `mpv` (preferred) | `mpv --no-video ~/Music/minimax-gen/.mp3` | space = pause/resume, q = quit, left/right = seek | | `ffplay` | `ffplay -nodisp -autoexit ~/Music/minimax-gen/.mp3` | q = quit | | `afplay` (macOS) | `afplay ~/Music/minimax-gen/.mp3` | Ctrl+C = stop | | None found | Do not attempt playback | Show file path only | After starting playback, tell the user (localize all text): ``` Now playing: .mp3 Saved to: ~/Music/minimax-gen/.mp3 ``` Do NOT show playback controls (e.g. keyboard shortcuts) — they don't work in this environment since the player runs in the background. If no player is found (localize all text): ``` No audio player detected. File saved to: ~/Music/minimax-gen/.mp3 Tip: Install mpv for the best playback experience (brew install mpv). ``` --- ### Step 5: Feedback & Iteration After playback, ask for feedback: ``` How was this song? 1. Love it, keep it! 2. Not quite, adjust and regenerate 3. Fine-tune lyrics/style then regenerate 4. Don't want it, start over ``` Based on feedback: - **Satisfied**: Done. Mention the file path again. - **Adjust & regenerate**: Ask what to change (prompt? lyrics? style?), apply edits, re-run generation. Keep the old file with a `_v1` suffix for comparison. - **Fine-tune**: Enter Advanced Control Mode with the current parameters pre-filled. - **Delete & restart**: Remove the file, go back to Step 0. --- ## Cover Mode Generate a cover version of a song based on reference audio. Model: `music-cover-free`. **Reference audio requirements**: mp3, wav, flac — duration 6s to 6min, max 50MB. If no lyrics are provided, the original lyrics are extracted via ASR automatically. ### Workflow When the user selects Cover mode: 1. Ask for the source audio — a local file path or URL 2. Ask for the target cover style (e.g., "acoustic cover, stripped-down, intimate vocal") 3. Optionally ask for custom lyrics or lyrics file ### Commands **Cover from local file:** ```bash mmx music cover \ --prompt "" \ --audio-file \ --out ~/Music/minimax-gen/.mp3 \ --quiet --non-interactive ``` **Cover from URL:** ```bash mmx music cover \ --prompt "" \ --audio \ --out ~/Music/minimax-gen/.mp3 \ --quiet --non-interactive ``` **With custom lyrics (text):** ```bash mmx music cover \ --prompt "