# AbstractVoice (full)
> Modular Python voice I/O for AI apps. The base install is remote-first (OpenAI/OpenAI-compatible TTS/STT by default); `abstractvoice[apple]` and `[gpu]` install the local Piper/Supertonic/faster-whisper/audio/cloning stack.

This is an expanded, agent-oriented repo index. For the short version, see `llms.txt`.

Sources of truth:
- Human entry flow: `README.md`
- Supported integrator contract: `docs/api.md`
- Internal map + diagrams: `docs/architecture.md`

Requires Python `>=3.9` (see `pyproject.toml`).

Offline-first rule: the REPL runs with `allow_downloads=False`, so downloads should be explicit (via the prefetch/download commands) and a missing artifact should produce a clear error rather than an implicit download.
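
The offline-first rule can be pictured as a guard that fails fast instead of downloading silently. The sketch below is illustrative only: `require_cached`, `MissingModelError`, and the example path are hypothetical names, not AbstractVoice's implementation. Only `allow_downloads` and the `abstractvoice-prefetch` command come from this document.

```python
from pathlib import Path


class MissingModelError(RuntimeError):
    """Hypothetical error type: a required model artifact is not cached."""


def require_cached(model_path: Path, *, allow_downloads: bool, prefetch_hint: str) -> Path:
    """Return model_path if it exists; otherwise refuse to download implicitly.

    Hypothetical helper for illustration, not part of AbstractVoice.
    """
    if model_path.exists():
        return model_path
    if not allow_downloads:
        # Clear, actionable error that names the explicit prefetch command.
        raise MissingModelError(
            f"{model_path} is not cached and downloads are disabled. "
            f"Run `{prefetch_hint}` first."
        )
    # A real implementation would download here; this sketch stops short of that.
    raise NotImplementedError("download step intentionally omitted from this sketch")
```

The point of the design is that the caller learns which explicit command to run, rather than waiting on a surprise network fetch.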

Format: follows the [llms.txt spec](https://llmstxt.org/) (`## Optional` is skippable when context is tight).

Positioning:
- AbstractVoice is a **voice I/O library** (TTS/STT + optional cloning). It does not implement an agent loop or an LLM server.
- In the AbstractFramework ecosystem, AbstractVoice is meant to be used with **AbstractCore** via the capability plugin entry point (`abstractvoice/integrations/abstractcore_plugin.py`), including TTS/STT and voice catalog discovery.
- The REPL (`abstractvoice/examples/cli_repl.py`) and local FastAPI web UI (`abstractvoice/examples/web_ui.py`) are **examples/smoke-test harnesses**. They share a minimal OpenAI-compatible LLM HTTP client (`abstractvoice/examples/llm_provider.py`) for local providers such as Ollama/LM Studio.
- REPL voice selection is centered on `/voices`; `/profile`, `/tts_voice`, and `/setvoice` remain compatibility/direct commands.
- Plain `abstractvoice` is remote/OpenAI by default. `abstractvoice[all-apple]` and `abstractvoice[all-gpu]` include Supertonic, and with those installs the REPL/web examples default to Supertonic via their install-aware `auto` resolver.
- One-shot TTS is available without entering the REPL: `abstractvoice --provider openai --model tts-1 --voice alloy --prompt "Hello" --output hello.wav`.
- Library `VoiceManager()` remains remote-first unless a provider is selected explicitly: `tts_engine="openai"` and `stt_engine="openai"` by default. OpenAI uses `https://api.openai.com/v1` and reads `OPENAI_API_KEY` or `remote_api_key=...`; compatible endpoints use `OPENAI_BASE_URL` or `remote_base_url=...`.
- Local inference/listening is explicit: install `abstractvoice[apple]` / `abstractvoice[gpu]`, or granular extras such as `abstractvoice[supertonic,stt,audio-io]`, and select `tts_engine="supertonic"` / `stt_engine="faster_whisper"` (providers). Supertonic is the recommended local base TTS path; OmniVoice is the recommended/default local cloning backend.
- The AbstractCore plugin exposes clean provider/model/voice discovery for TTS, STT, and cloning:
  - provider lists: `available_providers()`
  - model lists: `list_models(...)` / `list_tts_models(...)` / `list_stt_models(...)` / `list_cloning_models(...)`
  - voices: `list_tts_voices(...)` / `list_cloned_voices(...)`
  - feature filtering: `get_capability_support(...)` / `find_compatible_models(...)`
  - clone creation: `clone(...)` / `clone_voice(...)`
  - richer nested discovery payload: `voice_catalog()`
- Local engine preload/unload is explicit:
  - library: `VoiceManager.preload_tts_engine(...)` / `preload_stt_engine(...)` and `unload_tts_engine()` / `unload_stt_engine()`
  - AbstractCore plugin: `load_resident_model(...)` / `list_resident_models(...)` / `unload_resident_model(...)` (local engines only; remote providers remain configured)
- The STT-only plugin backend (`abstractvoice:stt`) also implements AbstractCore's generic capability discovery contract (`available_providers(task=None)`, `list_models(task=None, provider=...|provider_id=...)`) so `llm.capabilities.*` can query audio discovery without special-casing.
- The plugin accepts both split and combined selectors: `tts(..., provider="openai", model="tts-1", voice="alloy")`, `tts(..., provider="openai:tts-1", voice="voice_...")`, `stt(..., provider="faster-whisper", model="large")`, or `stt(..., provider="transformers-asr:Qwen/Qwen3-ASR-1.7B")`.
- `voice_catalog()` includes `tts_model_variants` and `stt_engine_variants` so AbstractCore can expose clean OpenAI-compatible `/v1/audio/voices`, `/v1/audio/speech/models`, and `/v1/audio/transcriptions/models` discovery endpoints.
- The local web UI has assistant/user voice selectors, browser voice cloning from uploaded or recorded reference audio, message/conversation playback, an example-only `/api/chat` bridge, local `/api/*` routes, and compatible extension routes `/v1/audio/voices` + `/v1/voice/clone`; production `/v1/audio/*` endpoints remain owned by AbstractCore Server.
- Web extras compose directly: `abstractvoice[web]` is lightweight; use `abstractvoice[all-apple]` / `abstractvoice[all-gpu]` for the full platform local lab or `abstractvoice[web,supertonic]` / `abstractvoice[web,omnivoice]` for narrower engine installs.
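
To make the split vs combined `provider:model` selectors above concrete, here is a minimal sketch of how a caller might normalize them. `split_selector` is a hypothetical helper, and both the "split on the first colon" rule and the conflict check are assumptions of this sketch; the plugin's actual parsing may differ.

```python
from typing import Optional, Tuple


def split_selector(provider: str, model: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Normalize a split or combined selector to (provider, model).

    Hypothetical helper; not the plugin's real parsing code.
    """
    # Split on the first colon only, so model ids containing "/" stay intact.
    head, sep, tail = provider.partition(":")
    if not sep:
        return provider, model
    if model is not None and model != tail:
        # Assumption: a combined selector plus a different model= is an error.
        raise ValueError(f"conflicting models: {tail!r} vs {model!r}")
    return head, tail


print(split_selector("openai", "tts-1"))
print(split_selector("openai:tts-1"))
print(split_selector("transformers-asr:Qwen/Qwen3-ASR-1.7B"))
```

Both of the first two calls resolve to the same `("openai", "tts-1")` pair, which is the property the two selector styles are meant to share.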

Quick commands:
```bash
# Tests
python -m pytest -q
# (defaults to skipping `integration`, `model_download`, and `slow` marked tests)

# Bench preload vs cold start (local engines only; requires cached models)
python examples/bench_preload_local_models.py --runs 3

# REPL smoke test (mic is OFF by default; plain install uses OpenAI, all-* uses Supertonic)
abstractvoice --verbose
python -m abstractvoice cli --verbose

# One-shot TTS to file
OPENAI_API_KEY=... abstractvoice --provider openai --model tts-1 --voice alloy --prompt "Hello from AbstractVoice." --output hello.wav

# Local browser smoke test (requires abstractvoice[web])
abstractvoice web --port 5000

# Remote OpenAI-compatible audio from the web example
abstractvoice web --tts-engine openai-compatible --stt-engine openai-compatible --remote-base-url http://localhost:8000/v1

# Fully local REPL after local/full install
abstractvoice --tts-engine supertonic --stt-engine faster_whisper --verbose

# Explicit prefetch (offline-first friendly)
abstractvoice-prefetch --supertonic
abstractvoice-prefetch --piper en
abstractvoice-prefetch --stt small
abstractvoice-prefetch --openf5   # optional; requires abstractvoice[cloning]
abstractvoice-prefetch --chroma   # optional; requires abstractvoice[chroma] (GPU-heavy)
abstractvoice-prefetch --audiodit # optional; requires abstractvoice[audiodit]
abstractvoice-prefetch --omnivoice # optional; requires abstractvoice[omnivoice]

# Equivalent explicit downloads
python -m abstractvoice download --piper en
python -m abstractvoice download --supertonic
python -m abstractvoice download --stt small
python -m abstractvoice download --openf5   # optional; requires abstractvoice[cloning]
python -m abstractvoice download --chroma   # optional; requires abstractvoice[chroma] (GPU-heavy)
python -m abstractvoice download --audiodit # optional; requires abstractvoice[audiodit]
python -m abstractvoice download --omnivoice # optional; requires abstractvoice[omnivoice]
```

Minimal library usage:
```python
from abstractvoice import VoiceManager

vm = VoiceManager(language="en")
vm.speak("Hello from AbstractVoice.")
wav_bytes = vm.speak_to_bytes("Headless TTS.", format="wav")
```
This example reads `OPENAI_API_KEY` from the environment; pass `remote_api_key=...` explicitly when environment variables are not available.
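
When credentials come from somewhere other than the process environment, one option is to resolve them up front and pass them explicitly. The helper below is a sketch: `remote_voice_kwargs` is a hypothetical name, and the precedence (explicit argument, then environment variable, then the documented OpenAI default URL) is an assumption. Only the `remote_api_key` / `remote_base_url` keyword names and the `OPENAI_API_KEY` / `OPENAI_BASE_URL` variables come from this document.

```python
import os
from typing import Dict, Mapping, Optional

DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1"  # default stated in this document


def remote_voice_kwargs(
    env: Mapping[str, str] = os.environ,
    *,
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
) -> Dict[str, str]:
    """Build explicit remote settings to pass to VoiceManager(...).

    Hypothetical helper; explicit arguments win over environment variables.
    """
    key = api_key or env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY or pass api_key=...")
    url = base_url or env.get("OPENAI_BASE_URL") or DEFAULT_OPENAI_BASE_URL
    return {"remote_api_key": key, "remote_base_url": url}


# Intended use (requires abstractvoice to be installed):
#   vm = VoiceManager(language="en", **remote_voice_kwargs(api_key="..."))
```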

Optional dependency groups (see `pyproject.toml`):
- `abstractvoice[apple]` / `abstractvoice[gpu]`: platform local stack (Piper, Supertonic, faster-whisper, audio I/O, AEC where supported, and local cloning/TTS engines gated by Python markers)
- `abstractvoice[piper]`: local Piper TTS only
- `abstractvoice[supertonic]`: local Supertonic 3 ONNX TTS only
- `abstractvoice[stt]`: local faster-whisper STT path
- `abstractvoice[audio-io]`: microphone/playback/VAD dependencies
- `abstractvoice[cloning]`: explicit F5‑TTS cloning backend (still requires explicit OpenF5 artifact downloads)
- `abstractvoice[chroma]`: Chroma-4B runtime deps (GPU-heavy; artifacts still prefetched explicitly)
- `abstractvoice[audiodit]`: LongCat-AudioDiT TTS + prompt-audio cloning backend (heavy; artifacts still prefetched explicitly)
- `abstractvoice[omnivoice]`: OmniVoice omnilingual TTS + recommended/default cloning backend (very heavy; artifacts still prefetched explicitly)
- `abstractvoice[aec]`: optional acoustic echo cancellation (true barge-in)
- `abstractvoice[audio-fx]`: speed change without pitch change (librosa)
- `abstractvoice[openai]`: no-op intent extra for remote OpenAI/OpenAI-compatible audio (uses core requests)
- `abstractvoice[openai-compatible]`: no-op intent extra for remote compatible audio
- `abstractvoice[remote]`: no-op intent extra for remote compatible audio
- `abstractvoice[web]`: local FastAPI web example; production HTTP audio endpoints live in AbstractCore Server
- `abstractvoice[dev]`: dev/test tooling
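
Because extras decide which engines are importable, an install-aware `auto` choice can be sketched as "pick the first engine whose modules are present, else fall back to remote". The function and the module names passed to it are assumptions for illustration, not AbstractVoice's resolver or its real dependency list.

```python
from importlib.util import find_spec
from typing import Dict, Sequence


def resolve_auto_engine(candidates: Dict[str, Sequence[str]], fallback: str) -> str:
    """Return the first engine whose required top-level modules are importable.

    Hypothetical helper: `candidates` maps an engine name to the modules it
    needs, in priority order (dicts preserve insertion order).
    """
    for engine, modules in candidates.items():
        if all(find_spec(name) is not None for name in modules):
            return engine
    return fallback


# Example call; "onnxruntime" is a guess at what a local ONNX engine would need.
print(resolve_auto_engine({"supertonic": ["onnxruntime"]}, fallback="openai"))
```

The printed value depends on what is installed in the current environment, which is exactly the behavior an install-aware default is meant to have.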

## Start here
- [README](README.md): install + smoke tests + AbstractFramework ecosystem notes
- [Docs index](docs/README.md): map of user-facing vs internal docs
- [Getting started](docs/getting-started.md): recommended setup + first smoke tests
- [API (integrator contract)](docs/api.md): supported surface area (incl. integrations)
- [FAQ](docs/faq.md): cache/history reset + common install/runtime issues
- [Known issues](docs/known-issues.md): active release caveats and workarounds
- [Installation](docs/installation.md): platform notes + extras
- [REPL guide](docs/repl_guide.md): end-to-end validation + commands

## AbstractFramework integrations
- [AbstractFramework](https://github.com/lpalbou/AbstractFramework): umbrella ecosystem
- [AbstractCore](https://github.com/lpalbou/abstractcore): capabilities/plugins
- [AbstractRuntime](https://github.com/lpalbou/abstractruntime): runtime + ArtifactStore
- [AbstractCore capability plugin](abstractvoice/integrations/abstractcore_plugin.py): registers voice/audio backends via entry points; exposes provider/model/voice discovery, clone creation, plus TTS/STT execution with split or combined `provider:model` selectors
- [AbstractCore tool helpers](abstractvoice/integrations/abstractcore.py): `make_voice_tools(...)` (manual wiring)
- [Artifact store adapter](abstractvoice/artifacts.py): AbstractRuntime-like ArtifactStore adapter (duck-typed)
- [Entry points + scripts](pyproject.toml): `abstractvoice`, `abstractvoice-prefetch`, plugin registration
- [CI workflow](.github/workflows/ci.yml): Python 3.9-3.12 tests + build check
- [Release workflow](.github/workflows/release.yml): tag/manual release to PyPI + GitHub Release
- [Known issues](docs/known-issues.md): current bug tracker mirror for release-facing caveats

## Core architecture (internal)
- [Architecture](docs/architecture.md): component diagram + code map
- [Acronyms](docs/acronyms.md)
- [ADR 0001](docs/adr/0001-local_assistant_out_of_box.md): out-of-box local assistant
- [ADR 0002](docs/adr/0002_barge_in_interruption.md): voice modes + stop phrase + optional AEC
- [ADR 0003](docs/adr/0003_cloning_reference_text_fallback.md): cloning `reference_text` auto-fallback
- [ADR 0004](docs/adr/0004_streaming_and_cancellation_for_cloned_tts.md): streaming + cancellation for cloned TTS
- [ADR 0005](docs/adr/0005_torch_device_and_dtype_policy.md): torch device + dtype selection policy (torch engines)

## Code map (start points)
- [Version source](abstractvoice/_version.py): single release version source
- [Public package entry](abstractvoice/__init__.py): main public exports
- [Module CLI entry](abstractvoice/__main__.py): `python -m abstractvoice ...` (incl. `download`)
- [VoiceManager façade](abstractvoice/voice_manager.py): public import target
- [VoiceManager wiring](abstractvoice/vm/manager.py): constructor + engine selection
- [Voice-mode callbacks](abstractvoice/vm/core.py): behavior while speaking
- [TTS methods + cloning orchestration](abstractvoice/vm/tts_mixin.py)
- [Runtime TTS switching](abstractvoice/vm/tts_mixin.py): `set_tts_engine(...)` resets the base profile to the provider/language default
- [Voice profiles abstraction](abstractvoice/voice_profiles.py): cross-engine `VoiceProfile` ids + metadata
- [Remote audio HTTP helpers](abstractvoice/adapters/openai_compatible_http.py): lightweight requests-based OpenAI-compatible audio client
- [Remote TTS adapter](abstractvoice/adapters/tts_openai_compatible.py): `/audio/speech` + remote profile discovery
- [Remote STT adapter](abstractvoice/adapters/stt_openai_compatible.py): `/audio/transcriptions`
- [TTS delivery mode](abstractvoice/tts/delivery_mode.py): normalize `"buffered"` vs `"streamed"`
- [Text chunking](abstractvoice/tts/text_chunking.py): `split_text_batches(...)` + `TextStreamChunker`
- [Text→audio streaming bridge](abstractvoice/tts/text_to_speech_stream.py): `TextToSpeechStream` (LLM streaming → TTS)
- [Audio chunk smoothing](abstractvoice/audio/fade.py): edge fades + headroom scaling
- [STT + listening methods](abstractvoice/vm/stt_mixin.py)
- [Explicit prefetch tool](abstractvoice/prefetch.py): `abstractvoice-prefetch ...`
- [Local web example](abstractvoice/examples/web_ui.py): browser TTS/STT smoke test

## Engines & adapters
- [Adapters base types](abstractvoice/adapters/base.py)
- [Piper TTS adapter](abstractvoice/adapters/tts_piper.py): voice selection + caching + synthesis
- [Supertonic TTS adapter](abstractvoice/adapters/tts_supertonic.py): fixed-profile ONNX TTS (`M1`-`M5`, `F1`-`F5`)
- [Supertonic runtime](abstractvoice/supertonic/runtime.py): internal ONNX loader + explicit `--supertonic` prefetch; no external Supertonic SDK dependency
- [AudioDiT TTS adapter (optional)](abstractvoice/adapters/tts_audiodit.py)
- [AudioDiT runtime + vendored model](abstractvoice/audiodit/): LongCat-AudioDiT implementation (MIT)
- [OmniVoice TTS adapter (optional)](abstractvoice/adapters/tts_omnivoice.py)
- [OmniVoice runtime wrapper (optional)](abstractvoice/omnivoice/runtime.py)
- [OmniVoice cloning engine (optional)](abstractvoice/cloning/engine_omnivoice.py)
- [faster-whisper STT adapter](abstractvoice/adapters/stt_faster_whisper.py)
- [Remote OpenAI-compatible TTS adapter](abstractvoice/adapters/tts_openai_compatible.py)
- [Remote OpenAI-compatible STT adapter](abstractvoice/adapters/stt_openai_compatible.py)
- [TTS engine wrapper](abstractvoice/tts/adapter_tts_engine.py): wraps adapters + uses `NonBlockingAudioPlayer`
- [Audio playback](abstractvoice/tts/tts_engine.py): pause/resume/stop + optional audio-fx
- [Mic capture + VAD + stop phrase](abstractvoice/recognition.py)
- [Stop phrase helper](abstractvoice/stop_phrase.py)
- [VAD](abstractvoice/vad/voice_detector.py)
- [AEC (optional)](abstractvoice/aec/webrtc_apm.py)

## Voice profiles + streaming TTS (cross-engine)
- [Preset profile catalogs](abstractvoice/assets/voice_profiles/): shipped presets (engine-specific)
- [TTS delivery mode](abstractvoice/tts/delivery_mode.py): `"buffered"` vs `"streamed"`
- [Text chunking](abstractvoice/tts/text_chunking.py): sentence + soft-boundary segmentation
- [Text→audio streaming bridge](abstractvoice/tts/text_to_speech_stream.py): incremental LLM → TTS pipelining
- [Audio chunk smoothing](abstractvoice/audio/fade.py): mitigate clicks/clipping at chunk boundaries
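
The idea behind incremental LLM → TTS pipelining is to hand complete sentences to the synthesizer while later tokens are still arriving. The generator below is a simplified sketch of that idea and is not the code in `text_chunking.py` or `text_to_speech_stream.py`; `stream_sentences` and its regex boundary rule are assumptions, and it does not handle abbreviations such as "Dr." or soft boundaries.

```python
import re
from typing import Iterable, Iterator

# A sentence boundary here is whitespace that follows ".", "!" or "?".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as the token stream reveals them."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # All parts except the last are finished sentences; the last may grow.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()


for sentence in stream_sentences(["Hello", " world.", " How", " are", " you?", " Fine"]):
    print(sentence)  # each sentence could be sent to TTS at this point
```

Note that a sentence is only emitted once the whitespace after its punctuation arrives, so the final sentence is held until the stream ends.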

## Voice cloning (optional; heavy)
- [REPL cloning workflow](docs/repl_guide.md): commands + explicit downloads
- [Voices and licenses](docs/voices-and-licenses.md): licensing caveats for models/voices
- [VoiceCloner + engine dispatch](abstractvoice/cloning/manager.py)
- [Clone store + bundles](abstractvoice/cloning/store.py)
- [F5‑TTS engine](abstractvoice/cloning/engine_f5.py)
- [Chroma engine](abstractvoice/cloning/engine_chroma.py)
- [AudioDiT engine](abstractvoice/cloning/engine_audiodit.py)
- [OmniVoice engine](abstractvoice/cloning/engine_omnivoice.py)
- [Remote cloning bridge](abstractvoice/cloning/engine_remote.py): compatible `/voice/clone` + stored remote voice ids

## Tests
- [tests/](tests/): CI-safe run is `python -m pytest -q` (defaults to skipping `integration`, `model_download`, and `slow` marked tests)
- [Contributing](CONTRIBUTING.md): dev setup + env vars for heavy tests (`ABSTRACTVOICE_RUN_CLONING_TESTS=1`, `ABSTRACTVOICE_RUN_CHROMA_TESTS=1`)
- [Bug report template](.github/ISSUE_TEMPLATE/bug_report.yml): structured issue intake
- [test_voice_cloner_engine_dispatch.py](tests/test_voice_cloner_engine_dispatch.py)
- [test_cloning_reference_text_autofallback.py](tests/test_cloning_reference_text_autofallback.py)
- [test_cloned_tts_cancellation.py](tests/test_cloned_tts_cancellation.py)
- [test_chroma_cloning_integration.py](tests/test_chroma_cloning_integration.py): heavy; gated by env var + optional deps
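
Environment-variable gating for heavy tests can be sketched as a small predicate. `heavy_tests_enabled` is a hypothetical helper, and treating exactly `"1"` as "enabled" is an assumption based on the `ABSTRACTVOICE_RUN_CLONING_TESTS=1` form shown above; the repository's own gating code may differ.

```python
import os
from typing import Mapping


def heavy_tests_enabled(flag: str, env: Mapping[str, str] = os.environ) -> bool:
    """Return True only when the opt-in flag is set to "1" (assumed convention)."""
    return env.get(flag, "") == "1"


# Possible use in a test module (requires pytest):
#   pytestmark = pytest.mark.skipif(
#       not heavy_tests_enabled("ABSTRACTVOICE_RUN_CLONING_TESTS"),
#       reason="set ABSTRACTVOICE_RUN_CLONING_TESTS=1 to run cloning tests",
#   )
```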

## Dependencies & licensing
- [pyproject.toml](pyproject.toml): deps, extras, entry points
- [Acknowledgments](ACKNOWLEDGMENTS.md)
- [Security policy](SECURITY.md)
- [License](LICENSE)
- [LongCat AudioDiT license](third_party_licenses/longcat_audiodit_license.txt)
- [Supertonic model notice](third_party_licenses/supertone_supertonic_notice.txt)
- [OmniVoice notice](third_party_licenses/omnivoice_notice.txt)

## Optional
- [Model / voice management](docs/model-management.md)
- [Multilingual notes](docs/multilingual.md)
- [REPL demonstrator](abstractvoice/examples/cli_repl.py) + minimal LLM client (`abstractvoice/examples/llm_provider.py`)
- [Backlog](docs/backlog/): internal planning (not an API contract)
- [Reports](docs/reports/): historical snapshots
- [Voice cloning research notes](docs/voice_cloning_2026.md): non-contract research notes