--- name: ac-tools-setup-voice-mode description: "Installs and configures VoiceMode MCP for voice interactions in the target client. Triggers on keywords: setup voice, voice mode, install voicemode, configure voice" project-agnostic: true allowed-tools: - Bash - Read --- # Setup VoiceMode Install and configure VoiceMode MCP for voice interactions in the target client. ## Steps 1. Install VoiceMode: ```bash uvx voice-mode-install --yes ``` 2. Add the MCP server to the target client: ```bash claude mcp add --scope user voicemode -- uvx --refresh voice-mode ``` 3. Configure local endpoints (Kokoro TTS + Whisper STT): ```bash voicemode config set VOICEMODE_TTS_BASE_URLS http://127.0.0.1:8880/v1 voicemode config set VOICEMODE_STT_BASE_URLS http://127.0.0.1:2022/v1 voicemode config set VOICEMODE_PREFER_LOCAL true voicemode config set VOICEMODE_ALWAYS_TRY_LOCAL true ``` **This is critical.** Without explicit `_BASE_URLS`, the default includes `https://api.openai.com/v1` as fallback, which crashes with `OPENAI_API_KEY` errors even when local services are running. 4. Verify installation: ```bash claude mcp list ``` 5. Test voice mode: - Restart the target client - If the target runtime exposes a VoiceMode tool, use it to verify; otherwise restart the target client and confirm voice input/output there ## First Run Note **Kokoro TTS may take 5+ minutes to load on first run** while it downloads and initializes the model (~111MB). Check status with: ```bash voicemode service kokoro status ``` **Two MCP restarts required:** 1. After initial setup (step 5) 2. After Kokoro model finishes downloading Without the second restart, you may get "OpenAI API key" errors even with local config. ## Configuration Options Edit config with: ```bash voicemode config edit ``` List all options: ```bash voicemode config list ``` ### Key Settings | Setting | Description | |---------|-------------| | `VOICEMODE_PREFER_LOCAL` | Prefer local providers over cloud (true/false) | | `VOICEMODE_ALWAYS_TRY_LOCAL` | Always attempt local providers first (true/false) | | `VOICEMODE_SAVE_AUDIO` | Save audio files (true/false, default: false) | | `VOICEMODE_WHISPER_MODEL` | Whisper model (tiny, base, small, medium, large-v2) | | `VOICEMODE_KOKORO_DEFAULT_VOICE` | Default voice (e.g., af_sky) | | `OPENAI_API_KEY` | Required only for cloud processing | ### Provider Options - **Local-only** (default, recommended): Set `VOICEMODE_TTS_BASE_URLS=http://127.0.0.1:8880/v1` and `VOICEMODE_STT_BASE_URLS=http://127.0.0.1:2022/v1` (no API key needed) - **Cloud-only**: Set `OPENAI_API_KEY` and set URLs to `https://api.openai.com/v1` - **Hybrid** (local-first, cloud fallback): Set `OPENAI_API_KEY` and set URLs to `http://127.0.0.1:8880/v1,https://api.openai.com/v1` (TTS) and `http://127.0.0.1:2022/v1,https://api.openai.com/v1` (STT) ## Troubleshooting - **OpenAI API key error**: Ensure `VOICEMODE_TTS_BASE_URLS` and `VOICEMODE_STT_BASE_URLS` point to local endpoints only (step 3). The `PREFER_LOCAL` flag alone is NOT sufficient — it does not remove OpenAI from the fallback chain - **Kokoro stuck "starting up"**: Wait 5+ mins on first run, or check logs: `voicemode service kokoro logs` - **macOS M3 crash**: Known issue with ggml_metal - use CPU mode - **WSL audio issues**: Install PulseAudio packages - **Slow transcription**: Use GPU acceleration or smaller Whisper model ## Improved Accuracy (Optional) The default `tiny` model is fast but less accurate. For better transcription: | Model | Size | Accuracy | Speed | |-------|------|----------|-------| | tiny | 75MB | ~70% | Fastest | | small | 466MB | ~82% | Fast | | medium | 1.4GB | ~88% | Moderate | ```bash voicemode config set VOICEMODE_WHISPER_MODEL small # or for best accuracy: voicemode config set VOICEMODE_WHISPER_MODEL medium ``` Restart Whisper service after changing: ```bash voicemode service whisper restart ``` ## macOS Metal GPU Acceleration (Optional) For significantly faster transcription on Apple Silicon, convert Whisper to Core ML: ### Prerequisites ```bash # Install whisper.cpp via Homebrew brew install whisper-cpp # Set Whisper directory WHISPER_DIR=~/.voicemode/services/whisper ``` ### Steps **1. Download model** ```bash cd $WHISPER_DIR/models ./download-ggml-model.sh medium ``` **2. Install Python dependencies** ```bash pip3 install torch coremltools openai-whisper ane_transformers ``` **3. Convert to Core ML** ```bash cd $WHISPER_DIR ./models/generate-coreml-model.sh medium ``` **4. Update config** ```bash voicemode config set VOICEMODE_WHISPER_MODEL medium ``` **5. Restart Whisper** ```bash voicemode service whisper restart ``` ### Verification ```bash # Check Core ML model exists ls -la $WHISPER_DIR/models/ggml-medium-encoder.mlmodelc ``` When running, logs should show: `GPU: Metal, Core ML: Enabled` ## Links - GitHub: https://github.com/mbailey/voicemode - Docs: https://voice-mode.readthedocs.io - LiveKit Cloud: https://cloud.livekit.io