# FL ChatterBox

High-quality text-to-speech nodes for ComfyUI powered by ResembleAI's Chatterbox models. Features voice cloning, multilingual synthesis, paralinguistic expressions, and voice conversion.

[![Chatterbox](https://img.shields.io/badge/Chatterbox-Original%20Repo-blue?style=for-the-badge&logo=github&logoColor=white)](https://github.com/resemble-ai/chatterbox)
[![Patreon](https://img.shields.io/badge/Patreon-Support%20Me-F96854?style=for-the-badge&logo=patreon&logoColor=white)](https://www.patreon.com/Machinedelusions)

![Workflow Preview](assets/workflow_preview.png)

## Features

- **Zero-Shot Voice Cloning** - Clone any voice from a few seconds of reference audio
- **3 TTS Models** - Standard, Turbo (faster), and Multilingual variants
- **23 Languages** - Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, Turkish
- **Paralinguistic Tags** - Express emotions with tags like `[laugh]`, `[sigh]`, `[gasp]`, `[chuckle]` (Turbo model)
- **Voice Conversion** - Transform one voice to sound like another
- **Dialog Synthesis** - Multi-speaker conversations with up to 4 voices
- **Model Caching** - Keep models loaded between runs for faster iteration

## Nodes

| Node | Description |
|------|-------------|
| **FL Chatterbox TTS** | Standard high-quality text-to-speech with voice cloning |
| **FL Chatterbox Turbo TTS** | Faster GPT2-based TTS with paralinguistic tag support |
| **FL Chatterbox Multilingual TTS** | 23-language TTS with voice cloning |
| **FL Chatterbox VC** | Voice conversion - transform source audio to target voice |
| **FL Chatterbox Dialog TTS** | Multi-speaker dialog synthesis with up to 4 voices |

## Installation

### ComfyUI Manager
In ComfyUI Manager, search for "FL ChatterBox" and install it.

### Manual
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/filliptm/ComfyUI_Fill-ChatterBox.git
cd ComfyUI_Fill-ChatterBox
pip install -r requirements.txt
```

### Optional: Watermarking Support
```bash
pip install resemble-perth
```
**Note**: The `resemble-perth` package may have compatibility issues with Python 3.12+. If the import fails, the nodes still work, just without watermarking.
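
The fallback described in the note can be sketched as a guarded import. This is a minimal illustration under stated assumptions, not the nodes' actual code: `optional_import` and `WATERMARKING_AVAILABLE` are hypothetical names, and the module name `perth` is assumed to be what `resemble-perth` installs.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it cannot be loaded."""
    try:
        return importlib.import_module(module_name)
    except Exception:
        # Catch broadly: an incompatible package can fail with errors
        # other than ImportError while it initializes.
        return None


# Assumption: `resemble-perth` is importable as `perth`.
perth = optional_import("perth")
WATERMARKING_AVAILABLE = perth is not None
```

Code that applies a watermark would then check `WATERMARKING_AVAILABLE` first and return the audio unchanged when it is `False`.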

## Quick Start

1. Add an **FL Chatterbox TTS** node (or the Turbo or Multilingual variant) to your workflow
2. Enter your text in the text field
3. Optionally connect reference audio for voice cloning
4. Set `keep_model_loaded = True` for faster subsequent runs
5. Generate!

### Turbo Model with Expressions
```
Hello there! [laugh] Isn't this amazing? [sigh] I just love text to speech.
```
Supported tags: `[laugh]`, `[sigh]`, `[gasp]`, `[chuckle]`, `[cough]`, `[sniff]`, `[groan]`, `[shush]`, `[clear throat]`
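
To catch typos in tags before generating, a prompt can be checked against the list above. This is a small sketch, not part of the nodes; `find_unsupported_tags` is a hypothetical helper, and it assumes tags must match the listed spelling exactly.

```python
import re

# Tags copied from the supported list above.
SUPPORTED_TAGS = {
    "laugh", "sigh", "gasp", "chuckle", "cough",
    "sniff", "groan", "shush", "clear throat",
}

# Matches anything written as [text] with no nested brackets.
_TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_unsupported_tags(text: str) -> list[str]:
    """Return bracketed tags in text that are not in SUPPORTED_TAGS."""
    # Exact, case-sensitive comparison against the list above.
    return [tag for tag in _TAG_PATTERN.findall(text) if tag not in SUPPORTED_TAGS]
```

For example, `find_unsupported_tags("Oh no [scream] [gasp]")` reports `scream` and accepts `gasp`.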

## Models

| Model | Speed | Languages | Notes |
|-------|-------|-----------|-------|
| Standard | Normal | English | Highest quality |
| Turbo | Fast | English | Paralinguistic tags, GPT2-based |
| Multilingual | Normal | 23 languages | Cross-lingual voice cloning |

Models are downloaded automatically to `ComfyUI/models/chatterbox/` on first use.

## Parameters

### TTS Parameters
| Parameter | Range | Description |
|-----------|-------|-------------|
| `exaggeration` | 0.25-2.0 | Emotion intensity |
| `cfg_weight` | 0.2-1.0 | Classifier-free guidance weight (also affects pace) |
| `temperature` | 0.05-5.0 | Randomness in generation |
| `seed` | 0-4.29B | Reproducible generation |
| `keep_model_loaded` | bool | Cache model between runs |

### Turbo Parameters
| Parameter | Range | Description |
|-----------|-------|-------------|
| `temperature` | 0.05-2.0 | Randomness in generation |
| `top_k` | 1-5000 | Top-k sampling |
| `top_p` | 0.1-1.0 | Nucleus sampling threshold |
| `repetition_penalty` | 1.0-3.0 | Token repetition penalty |
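
When driving the nodes from a script, it can help to clamp values to the ranges in the two tables above before submitting them. The sketch below only encodes those documented ranges; `clamp_params` and the two dictionaries are hypothetical names, not part of the node API.

```python
from typing import Dict, Tuple, Union

Number = Union[int, float]

# Ranges copied from the TTS Parameters table.
TTS_RANGES: Dict[str, Tuple[Number, Number]] = {
    "exaggeration": (0.25, 2.0),
    "cfg_weight": (0.2, 1.0),
    "temperature": (0.05, 5.0),
}

# Ranges copied from the Turbo Parameters table.
TURBO_RANGES: Dict[str, Tuple[Number, Number]] = {
    "temperature": (0.05, 2.0),
    "top_k": (1, 5000),
    "top_p": (0.1, 1.0),
    "repetition_penalty": (1.0, 3.0),
}


def clamp_params(params: Dict[str, Number],
                 ranges: Dict[str, Tuple[Number, Number]]) -> Dict[str, Number]:
    """Clamp each known parameter to its range; pass unknown ones through."""
    clamped = {}
    for name, value in params.items():
        if name in ranges:
            low, high = ranges[name]
            value = max(low, min(high, value))
        clamped[name] = value
    return clamped
```

Note that `temperature` has a narrower upper bound for Turbo (2.0) than for the other models (5.0), so the same input can clamp differently depending on which table is used.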

## Limitations

- Maximum audio length: ~40 seconds per generation
- Reference audio: Minimum 5-6 seconds recommended
- Turbo paralinguistic tags: English only
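
One way to work within the length limit is to split long text at sentence boundaries and generate each chunk separately. The sketch below is an assumption-laden illustration: `split_into_chunks` is a hypothetical helper, and `max_chars=300` is a placeholder, since the number of characters that fits in ~40 seconds depends on the voice and pacing.

```python
import re


def split_into_chunks(text: str, max_chars: int = 300) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole in its own chunk.
    """
    # Split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent through a TTS node with the same reference audio and seed, and the resulting clips concatenated.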

## Requirements

- Python 3.10+
- 8GB RAM minimum (16GB+ recommended)
- NVIDIA GPU with 8GB+ VRAM recommended
- CPU and Mac MPS supported
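
The device preference implied by this list (NVIDIA GPU first, then Mac MPS, then CPU) can be sketched as follows. `pick_device` and `detect_device` are hypothetical helpers rather than the nodes' actual logic; the sketch assumes PyTorch is the backend and falls back to CPU if it is not installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is absent in older PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    mps_available = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_available)
```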

## License

MIT License - See [Chatterbox repo](https://github.com/resemble-ai/chatterbox) for model licenses.

## Changelog

### 2025-12-28
- Added Turbo TTS node (faster, GPT2-based with paralinguistic tags)
- Added Multilingual TTS node (23 languages)
- Improved model caching using module-level globals
- Centralized model downloads to `ComfyUI/models/chatterbox/`

### 2025-07-24
- Added Dialog TTS node for multi-speaker conversations (up to 4 speakers)
- Extended all nodes with seed parameters for reproducible generation
- Isolated audio track outputs per speaker

### 2025-06-24
- Added seed parameter for reproducible generation
- Made Perth watermarking optional for Python 3.12+ compatibility

### 2025-05-31
- Added persistent model loading and a loading bar
- Added Mac MPS support
- Switched to native inference code (removed the `chatterbox-tts` library dependency)