# Getting Started with VoiceMode VoiceMode brings voice conversations to AI coding assistants. It works as both an MCP server for Claude Code and as a standalone CLI tool. ## What is VoiceMode? VoiceMode provides: - **MCP Server**: Adds voice tools to Claude Code - no installation needed - **CLI Tool**: Use VoiceMode's tools directly from your terminal - **Local Services**: Optional privacy-focused speech processing ## Quick Start: Using with Claude Code The fastest way to get started is using VoiceMode with Claude Code. ### Installation Install UV package manager (if not already installed), then run the VoiceMode installer: ```bash # Install UV package manager (if not already installed) curl -LsSf https://astral.sh/uv/install.sh | sh # Install VoiceMode and configure services uvx voice-mode-install # Add to Claude Code MCP claude mcp add --scope user voicemode -- uvx --refresh --from voice-mode voicemode-mcp-launcher ``` The installer will: - Install missing system dependencies (FFmpeg, PortAudio, etc.) - Set up your environment for VoiceMode - Offer to install local voice services (Whisper STT and Kokoro TTS) **Alternative UV installation methods:** - **macOS**: `brew install uv` - **With pip**: `pip install uv` Learn more: [UV Installation Guide](https://docs.astral.sh/uv/getting-started/installation/) ### 2. Configure Your API Key Set your OpenAI API key as an environment variable: ```bash export OPENAI_API_KEY="sk-your-api-key-here" ``` Or add it to your shell configuration file (`~/.bashrc`, `~/.zshrc`, etc.) ### 3. Verify Installation ```bash # Check that VoiceMode is connected claude mcp list ``` You should see `voicemode` in the list of connected servers. ### 4. Configure Permissions (Optional) By default, Claude Code prompts for permission each time VoiceMode tools are used. To enable automatic approval, add to `~/.claude/settings.json`: ```json { "permissions": { "allow": [ "mcp__voicemode__converse", "mcp__voicemode__service" ] } } ``` This allows voice conversations and service management without prompts. For more permission options, see the [Permissions Guide](../guides/permissions.md). ### 5. Start Using Voice In Claude Code, simply type: ``` converse ``` Speak when you hear the chime, and Claude will respond with voice! ## Alternative: Using as a CLI Tool If you want to use VoiceMode from the command line: ### Installation ```bash # Install with pip uv tool install voice-mode # Or install from source in editable mode git clone https://github.com/mbailey/voicemode cd voicemode uv tool install -e . ``` ### Basic Usage ```bash # Set your API key export OPENAI_API_KEY="sk-your-api-key-here" # Start a voice conversation voicemode converse ``` ## Setting Up Local Services (Optional) For complete privacy, you can run voice services locally instead of using OpenAI. ### Quick Setup ```bash # Install local services voicemode service install whisper # Speech-to-text voicemode service install kokoro # Text-to-speech # Start services voicemode service start whisper voicemode service start kokoro # Check status of all services voicemode service status ``` VoiceMode will automatically detect and use these local services when available. ### Enable Auto-Start (Recommended) To have services start automatically at login: ```bash # Enable services to start at boot/login voicemode service enable whisper voicemode service enable kokoro ``` On macOS, this creates launchd agents. On Linux, it creates systemd user services. ### Download Sizes and Requirements | Service | Download Size | Disk Space | First Start Time | |---------|---------------|------------|------------------| | Whisper (tiny) | ~75MB | ~150MB | 30 seconds | | Whisper (base) | ~150MB | ~300MB | 1-2 minutes | | Whisper (small) | ~460MB | ~1GB | 2-3 minutes | | Kokoro TTS | ~350MB | ~700MB | 2-3 minutes | **Recommended**: Whisper base + Kokoro = ~500MB download, ~1GB disk space. ### Waiting for Services After installation, services download models on first start. Wait for them to be ready: ```bash # Wait for Whisper (port 2022) while ! nc -z localhost 2022 2>/dev/null; do sleep 2; done echo "Whisper ready" # Wait for Kokoro (port 8880) while ! nc -z localhost 8880 2>/dev/null; do sleep 2; done echo "Kokoro ready" ``` Learn more: [Whisper Setup Guide](../guides/whisper-setup.md) | [Kokoro Setup Guide](../guides/kokoro-setup.md) ## Configuration VoiceMode works out of the box with sensible defaults. To customize: ### Select Your Voice ```bash # OpenAI voices export VOICEMODE_VOICES="nova,shimmer" # Or Kokoro voices (if using local TTS) export VOICEMODE_VOICES="af_sky,am_adam" ``` Available OpenAI voices: alloy, echo, fable, onyx, nova, shimmer ### Project-Specific Settings Create `.voicemode.env` in your project: ```bash export VOICEMODE_VOICES="af_nova,nova" export VOICEMODE_TTS_SPEED=1.2 ``` Learn more: [Configuration Guide](../guides/configuration.md) ## Troubleshooting ### Voice Not Working in Claude? 1. **Check MCP connection**: ```bash claude mcp list ``` 2. **Verify OPENAI_API_KEY** is set in your MCP configuration 3. Add to your MCP config: ```json "env": { "OPENAI_API_KEY": "sk-...", } ``` ### No Audio Input? ```bash # List audio devices voicemode diag devices # Test TTS and STT voicemode converse ``` ### Service Issues? ```bash # Check service status voicemode service status # All services voicemode service status whisper # Specific service # View logs voicemode service logs whisper -n 50 voicemode service logs kokoro -n 50 # Check if service is responding voicemode service health whisper voicemode service health kokoro ``` ## Running VoiceMode as a Service (Advanced) For remote access or persistent operation, run VoiceMode as a background service: ```bash # Start the VoiceMode HTTP server voicemode service start voicemode # Enable auto-start at boot/login voicemode service enable voicemode # Check all services voicemode service status ``` The HTTP server enables remote access from other machines on your network or via secure tunnels. For security best practices when running remotely, see the [Configuration Guide](../guides/configuration.md#http-server-security). ## Next Steps - **[Configuration Guide](../guides/configuration.md)** - Customize VoiceMode - **[Development Setup](development-setup.md)** - Contribute to VoiceMode - **[Service Guides](../guides/)** - Set up Whisper, Kokoro, or LiveKit - **[CLI Reference](../reference/cli.md)** - All available commands ## Getting Help - **GitHub Issues**: [github.com/mbailey/voicemode/issues](https://github.com/mbailey/voicemode/issues) - **Discord**: Join our community for support Welcome to voice-enabled AI coding! 🎙️