# Speech.sh

A text-to-speech CLI and MCP server using the Groq TTS API (OpenAI-compatible).

## Features

- Convert text to speech with a simple command
- Multiple voice options (troy, austin, hannah, autumn)
- Adjustable speech speed
- Hash-based caching to avoid duplicate API calls (cached audio is cleaned up automatically after 24 hours)
- Retry with exponential backoff
- Audio playback via ffplay, mplayer, or VLC
- MCP server for integration with AI assistants (Claude Desktop, Claude Code)

## Quick Start

```bash
git clone https://github.com/j3k0/speech.sh.git
cd speech.sh
export OPENAI_API_KEY="your-groq-api-key"
./speech.sh --text "Hello, world!"
```

### Dependencies

- `curl`, `jq` (for the shell version)
- One audio player: `ffplay` (from ffmpeg), `mplayer`, or `vlc`

## CLI Usage

```bash
# Basic
./speech.sh --text "Hello, world!"

# With options
./speech.sh --text "Hello!" --voice austin --speed 1.2 --verbose
```

### Options

```
-t, --text TEXT       Text to convert to speech (required)
-v, --voice VOICE     Voice to use (default: troy)
-s, --speed SPEED     Speech speed (default: 1.0)
-o, --output FILE     Output file path (default: auto-generated)
-a, --api_key KEY     API key
-m, --model MODEL     TTS model (default: canopylabs/orpheus-v1-english)
-p, --player PLAYER   Audio player: auto, ffmpeg, mplayer, vlc (default: auto)
-r, --retries N       Retry attempts (default: 3)
-T, --timeout N       Timeout in seconds (default: 30)
    --verbose         Enable verbose logging
```
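The `--retries` and `--timeout` options work together with the retry-with-exponential-backoff feature. The Python sketch below illustrates the general technique only. The helper name `retry_with_backoff`, the 1-second base delay, and the doubling factor are assumptions for illustration, not values taken from `speech.sh`.

```python
import time


def retry_with_backoff(call, retries=3, base_delay=1.0, sleep=time.sleep):
    """Run call() up to `retries` times, doubling the wait after each failure.

    Hypothetical helper: base_delay and the factor of 2 are illustrative
    assumptions, not the values speech.sh actually uses.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.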

### API Key

Provide your Groq API key in one of three ways, listed from highest to lowest precedence:
1. `--api_key "your-key"`
2. `export OPENAI_API_KEY="your-key"`
3. A file named `API_KEY` in the script's directory
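
The precedence order above can be sketched in Python as follows. This is an illustration of the lookup order, not the code the scripts use; the function name `resolve_api_key` and its parameters are hypothetical.

```python
import os
from pathlib import Path


def resolve_api_key(cli_key=None, env=None, script_dir="."):
    """Return the first key found: CLI flag, then env var, then API_KEY file.

    Hypothetical helper that mirrors the documented precedence order.
    """
    env = os.environ if env is None else env
    if cli_key:  # 1. --api_key on the command line
        return cli_key
    if env.get("OPENAI_API_KEY"):  # 2. environment variable
        return env["OPENAI_API_KEY"]
    key_file = Path(script_dir) / "API_KEY"  # 3. file next to the script
    if key_file.is_file():
        return key_file.read_text().strip()
    return None  # no key configured anywhere
```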

## MCP Server

Two implementations are available:

### Python (recommended)

Uses FastMCP from the official [MCP Python SDK](https://github.com/modelcontextprotocol/python-sdk). Requires Python 3.10+ and `uv`.

```bash
# Setup
uv venv --python python3 .venv
uv pip install --python .venv/bin/python "mcp[cli]" httpx

# Run
OPENAI_API_KEY="your-key" .venv/bin/python server.py
```

#### Claude Desktop / Claude Code configuration

```json
{
  "mcpServers": {
    "speak": {
      "command": "/path/to/speech.sh/.venv/bin/python",
      "args": ["/path/to/speech.sh/server.py"],
      "env": {
        "OPENAI_API_KEY": "your-groq-api-key",
        "SPEECH_VOICE": "troy",
        "SPEECH_SPEED": "1.0",
        "SPEECH_MODEL": "canopylabs/orpheus-v1-english"
      }
    }
  }
}
```

### Shell (legacy)

The original shell-based MCP server (`mcp.sh`). Works in environments without Python but may hit macOS sandboxing issues with Claude Desktop.

```bash
./mcp.sh
```

### MCP Tool

The server exposes a single `speak` tool:

| Parameter | Type   | Required | Default | Description              |
|-----------|--------|----------|---------|--------------------------|
| text      | string | yes      |         | The text to speak        |
| voice     | string | no       | troy    | Voice to use             |
| speed     | number | no       | 1.0     | Speech speed             |
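
MCP clients invoke tools by sending a JSON-RPC 2.0 `tools/call` request. As a rough sketch, the message a client would send to call `speak` looks like the output of the snippet below. The helper name `build_speak_request` and the argument values are illustrative assumptions; a real client also performs the MCP initialization handshake first.

```python
import json


def build_speak_request(text, voice=None, speed=None, request_id=1):
    """Build a JSON-RPC 2.0 tools/call message for the `speak` tool.

    Hypothetical helper; optional parameters are omitted when unset so
    that the server's own defaults apply.
    """
    arguments = {"text": text}
    if voice is not None:
        arguments["voice"] = voice
    if speed is not None:
        arguments["speed"] = speed
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "speak", "arguments": arguments},
    }


# Over stdio, each message is sent as one line of JSON.
print(json.dumps(build_speak_request("Hello, world!", voice="austin", speed=1.2)))
```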

### Environment Variables

| Variable         | Description            | Default                                     |
|------------------|------------------------|---------------------------------------------|
| OPENAI_API_KEY   | Groq API key           | (required)                                  |
| SPEECH_VOICE     | Default voice          | troy                                        |
| SPEECH_SPEED     | Default speed          | 1.0                                         |
| SPEECH_MODEL     | TTS model              | canopylabs/orpheus-v1-english               |
| SPEECH_API_URL   | API endpoint (Python)  | https://api.groq.com/openai/v1/audio/speech |
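
To show how these variables and their defaults fit together, here is a minimal Python sketch of reading them. It is not the code from `server.py`; the function name `load_config` and the returned dictionary keys are assumptions made for this example.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "SPEECH_VOICE": "troy",
    "SPEECH_SPEED": "1.0",
    "SPEECH_MODEL": "canopylabs/orpheus-v1-english",
    "SPEECH_API_URL": "https://api.groq.com/openai/v1/audio/speech",
}


def load_config(env=None):
    """Read settings from the environment, falling back to the defaults.

    Hypothetical helper: the key names in the returned dict are illustrative.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required")
    values = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    return {
        "api_key": api_key,
        "voice": values["SPEECH_VOICE"],
        "speed": float(values["SPEECH_SPEED"]),  # env values are strings
        "model": values["SPEECH_MODEL"],
        "api_url": values["SPEECH_API_URL"],
    }
```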

## Architecture

- **speech.sh** - Shell-based TTS engine (API calls, caching, playback)
- **mcp.sh** - Shell-based MCP wrapper over speech.sh (JSON-RPC 2.0 over stdio)
- **server.py** - Python MCP server, self-contained replacement for both scripts above
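
The caching mentioned above is hash-based with a 24-hour cleanup window. One way such a cache can work is sketched below in Python. The choice of SHA-256, the set of hashed fields, and the helper names `cache_key` and `is_fresh` are assumptions for illustration; the actual scripts may derive their keys differently.

```python
import hashlib
import time
from pathlib import Path

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, matching the cleanup window


def cache_key(text, voice, speed, model):
    """Hash every input that changes the audio (assumed set of fields)."""
    # A separator that cannot appear in normal input keeps fields distinct.
    payload = "\x1f".join([text, voice, str(speed), model])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_fresh(path, now=None):
    """Return True if the cached file exists and is younger than the TTL."""
    now = time.time() if now is None else now
    cached = Path(path)
    return cached.is_file() and now - cached.stat().st_mtime < CACHE_TTL_SECONDS
```

Identical requests map to the same key, so the audio file can be reused instead of calling the API again, while any change to the text, voice, speed, or model produces a different key.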

## License

GPL