# AbstractVoice (full)

> Modular Python voice I/O for AI apps. The base install is remote-first (OpenAI/OpenAI-compatible TTS/STT by default); `abstractvoice[apple]` and `[gpu]` install the local Piper/Supertonic/faster-whisper/audio/cloning stack.

This is an expanded, agent-oriented repo index. For the short version, see `llms.txt`.

Sources of truth:
- Human entry flow: `README.md`
- Supported integrator contract: `docs/api.md`
- Internal map + diagrams: `docs/architecture.md`

Requires Python `>=3.9` (see `pyproject.toml`).

Offline-first rule: the REPL runs with `allow_downloads=False`, so downloads should be explicit (prefetch/download commands + clear errors).

Format: follows the [llms.txt spec](https://llmstxt.org/) (`## Optional` is skippable when context is tight).

Positioning:
- AbstractVoice is a **voice I/O library** (TTS/STT + optional cloning). It does not implement an agent loop or an LLM server.
- In the AbstractFramework ecosystem, AbstractVoice is meant to be used with **AbstractCore** via the capability plugin entry point (`abstractvoice/integrations/abstractcore_plugin.py`), including TTS/STT and voice catalog discovery.
- The REPL (`abstractvoice/examples/cli_repl.py`) and local FastAPI web UI (`abstractvoice/examples/web_ui.py`) are **examples/smoke-test harnesses**. They share a minimal OpenAI-compatible LLM HTTP client (`abstractvoice/examples/llm_provider.py`) for local providers such as Ollama/LM Studio.
- REPL voice selection is centered on `/voices`; `/profile`, `/tts_voice`, and `/setvoice` remain compatibility/direct commands.
- Plain `abstractvoice` is remote/OpenAI by default; `abstractvoice[all-apple]` and `abstractvoice[all-gpu]` include Supertonic and the REPL/web examples default to Supertonic via their install-aware `auto` resolver.
- One-shot TTS is available without entering the REPL: `abstractvoice --provider openai --model tts-1 --voice alloy --prompt "Hello" --output hello.wav`.
- Library `VoiceManager()` remains remote-first unless a provider is selected explicitly: `tts_engine="openai"` and `stt_engine="openai"` by default. OpenAI uses `https://api.openai.com/v1` and reads `OPENAI_API_KEY` or `remote_api_key=...`; compatible endpoints use `OPENAI_BASE_URL` or `remote_base_url=...`.
- Local inference/listening is explicit: install `abstractvoice[apple]` / `abstractvoice[gpu]`, or granular extras such as `abstractvoice[supertonic,stt,audio-io]`, and select `tts_engine="supertonic"` / `stt_engine="faster_whisper"` (providers). Supertonic is the recommended local base TTS path; OmniVoice is the recommended/default local cloning backend.
- The AbstractCore plugin exposes clean provider/model/voice discovery for TTS, STT, and cloning. Use `available_providers()` for provider lists, `list_models(...)` / `list_tts_models(...)` / `list_stt_models(...)` / `list_cloning_models(...)` for model lists, `list_tts_voices(...)` / `list_cloned_voices(...)` for voices, `get_capability_support(...)` / `find_compatible_models(...)` for feature filtering, `clone(...)` / `clone_voice(...)` for clone creation, and `voice_catalog()` for the richer nested discovery payload.
- Local engine preload/unload is explicit:
  - library: `VoiceManager.preload_tts_engine(...)` / `preload_stt_engine(...)` and `unload_tts_engine()` / `unload_stt_engine()`
  - AbstractCore plugin: `load_resident_model(...)` / `list_resident_models(...)` / `unload_resident_model(...)` (local engines only; remote providers remain configured)
- The STT-only plugin backend (`abstractvoice:stt`) also implements AbstractCore's generic capability discovery contract (`available_providers(task=None)`, `list_models(task=None, provider=...|provider_id=...)`) so `llm.capabilities.*` can query audio discovery without special-casing.
- The plugin accepts both split and combined selectors: `tts(..., provider="openai", model="tts-1", voice="alloy")`, `tts(..., provider="openai:tts-1", voice="voice_...")`, `stt(..., provider="faster-whisper", model="large")`, or `stt(..., provider="transformers-asr:Qwen/Qwen3-ASR-1.7B")`.
- `voice_catalog()` includes `tts_model_variants` and `stt_engine_variants` so AbstractCore can expose clean OpenAI-compatible `/v1/audio/voices`, `/v1/audio/speech/models`, and `/v1/audio/transcriptions/models` discovery endpoints.
- The local web UI has assistant/user voice selectors, browser voice cloning from uploaded or recorded reference audio, message/conversation playback, an example-only `/api/chat` bridge, local `/api/*` routes, and compatible extension routes `/v1/audio/voices` + `/v1/voice/clone`; production `/v1/audio/*` endpoints remain owned by AbstractCore Server.
- Web extras compose directly: `abstractvoice[web]` is lightweight; use `abstractvoice[all-apple]` / `abstractvoice[all-gpu]` for the full platform local lab or `abstractvoice[web,supertonic]` / `abstractvoice[web,omnivoice]` for narrower engine installs.

Quick commands:

```bash
# Tests
python -m pytest -q
# (defaults to skipping `integration`, `model_download`, and `slow` marked tests)

# Bench preload vs cold start (local engines only; requires cached models)
python examples/bench_preload_local_models.py --runs 3

# REPL smoke test (mic is OFF by default; plain install uses OpenAI, all-* uses Supertonic)
abstractvoice --verbose
python -m abstractvoice cli --verbose

# One-shot TTS to file
OPENAI_API_KEY=... abstractvoice --provider openai --model tts-1 --voice alloy --prompt "Hello from AbstractVoice." --output hello.wav

# Local browser smoke test (requires abstractvoice[web])
abstractvoice web --port 5000

# Remote OpenAI-compatible audio from the web example
abstractvoice web --tts-engine openai-compatible --stt-engine openai-compatible --remote-base-url http://localhost:8000/v1

# Fully local REPL after local/full install
abstractvoice --tts-engine supertonic --stt-engine faster_whisper --verbose

# Explicit prefetch (offline-first friendly)
abstractvoice-prefetch --supertonic
abstractvoice-prefetch --piper en
abstractvoice-prefetch --stt small
abstractvoice-prefetch --openf5     # optional; requires abstractvoice[cloning]
abstractvoice-prefetch --chroma     # optional; requires abstractvoice[chroma] (GPU-heavy)
abstractvoice-prefetch --audiodit   # optional; requires abstractvoice[audiodit]
abstractvoice-prefetch --omnivoice  # optional; requires abstractvoice[omnivoice]

# Equivalent explicit downloads
python -m abstractvoice download --piper en
python -m abstractvoice download --supertonic
python -m abstractvoice download --stt small
python -m abstractvoice download --openf5     # optional; requires abstractvoice[cloning]
python -m abstractvoice download --chroma     # optional; requires abstractvoice[chroma] (GPU-heavy)
python -m abstractvoice download --audiodit   # optional; requires abstractvoice[audiodit]
python -m abstractvoice download --omnivoice  # optional; requires abstractvoice[omnivoice]
```

Minimal library usage:

```python
from abstractvoice import VoiceManager

vm = VoiceManager(language="en")
vm.speak("Hello from AbstractVoice.")
wav_bytes = vm.speak_to_bytes("Headless TTS.", format="wav")
```

This reads `OPENAI_API_KEY`; pass `remote_api_key=...` explicitly when env vars are not available.
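
When environment variables may or may not be present, it can help to resolve the remote settings once and fail early with a clear message. The sketch below is illustrative only: `resolve_remote_audio_kwargs` is a hypothetical helper (not part of AbstractVoice), and the precedence it applies (explicit argument first, then `OPENAI_API_KEY` / `OPENAI_BASE_URL`, then the OpenAI default URL) is an assumption rather than documented library behavior. Only the keyword names `remote_api_key` and `remote_base_url` come from this index.

```python
import os
from typing import Dict, Mapping, Optional

# Default OpenAI endpoint as stated in this index.
OPENAI_DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_remote_audio_kwargs(
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: build remote_* keyword arguments.

    Assumption: explicit arguments take precedence over environment
    variables, and a key is only mandatory for the default OpenAI endpoint
    (local compatible servers often accept requests without one).
    """
    env = os.environ if env is None else env
    key = api_key or env.get("OPENAI_API_KEY")
    url = (base_url or env.get("OPENAI_BASE_URL") or OPENAI_DEFAULT_BASE_URL).rstrip("/")

    if not key and url == OPENAI_DEFAULT_BASE_URL:
        raise RuntimeError(
            "No API key found: set OPENAI_API_KEY or pass api_key explicitly."
        )

    kwargs = {"remote_base_url": url}
    if key:
        kwargs["remote_api_key"] = key
    return kwargs
```

Under these assumptions, the result could be passed straight through, for example `VoiceManager(language="en", **resolve_remote_audio_kwargs())`, so a missing key surfaces before any audio request is attempted.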

Optional dependency groups (see `pyproject.toml`):
- `abstractvoice[apple]` / `abstractvoice[gpu]`: platform local stack (Piper, Supertonic, faster-whisper, audio I/O, AEC where supported, and local cloning/TTS engines gated by Python markers)
- `abstractvoice[piper]`: local Piper TTS only
- `abstractvoice[supertonic]`: local Supertonic 3 ONNX TTS only
- `abstractvoice[stt]`: local faster-whisper STT path
- `abstractvoice[audio-io]`: microphone/playback/VAD dependencies
- `abstractvoice[cloning]`: explicit F5‑TTS cloning backend (still requires explicit OpenF5 artifact downloads)
- `abstractvoice[chroma]`: Chroma-4B runtime deps (GPU-heavy; artifacts still prefetched explicitly)
- `abstractvoice[audiodit]`: LongCat-AudioDiT TTS + prompt-audio cloning backend (heavy; artifacts still prefetched explicitly)
- `abstractvoice[omnivoice]`: OmniVoice omnilingual TTS + recommended/default cloning backend (very heavy; artifacts still prefetched explicitly)
- `abstractvoice[aec]`: optional acoustic echo cancellation (true barge-in)
- `abstractvoice[audio-fx]`: speed change without pitch change (librosa)
- `abstractvoice[openai]`: no-op intent extra for remote OpenAI/OpenAI-compatible audio (uses core requests)
- `abstractvoice[openai-compatible]`: no-op intent extra for remote compatible audio
- `abstractvoice[remote]`: no-op intent extra for remote compatible audio
- `abstractvoice[web]`: local FastAPI web example; production HTTP audio endpoints live in AbstractCore Server
- `abstractvoice[dev]`: dev/test tooling

## Start here

- [README](README.md): install + smoke tests + AbstractFramework ecosystem notes
- [Docs index](docs/README.md): map of user-facing vs internal docs
- [Getting started](docs/getting-started.md): recommended setup + first smoke tests
- [API (integrator contract)](docs/api.md): supported surface area (incl. integrations)
- [FAQ](docs/faq.md): cache/history reset + common install/runtime issues
- [Known issues](docs/known-issues.md): active release caveats and workarounds
- [Installation](docs/installation.md): platform notes + extras
- [REPL guide](docs/repl_guide.md): end-to-end validation + commands

## AbstractFramework integrations

- [AbstractFramework](https://github.com/lpalbou/AbstractFramework): umbrella ecosystem
- [AbstractCore](https://github.com/lpalbou/abstractcore): capabilities/plugins
- [AbstractRuntime](https://github.com/lpalbou/abstractruntime): runtime + ArtifactStore
- [AbstractCore capability plugin](abstractvoice/integrations/abstractcore_plugin.py): registers voice/audio backends via entry points; exposes provider/model/voice discovery, clone creation, plus TTS/STT execution with split or combined `provider:model` selectors
- [AbstractCore tool helpers](abstractvoice/integrations/abstractcore.py): `make_voice_tools(...)` (manual wiring)
- [Artifact store adapter](abstractvoice/artifacts.py): AbstractRuntime-like ArtifactStore adapter (duck-typed)
- [Entry points + scripts](pyproject.toml): `abstractvoice`, `abstractvoice-prefetch`, plugin registration
- [CI workflow](.github/workflows/ci.yml): Python 3.9-3.12 tests + build check
- [Release workflow](.github/workflows/release.yml): tag/manual release to PyPI + GitHub Release
- [Known issues](docs/known-issues.md): current bug tracker mirror for release-facing caveats

## Core architecture (internal)

- [Architecture](docs/architecture.md): component diagram + code map
- [Acronyms](docs/acronyms.md)
- [ADR 0001](docs/adr/0001-local_assistant_out_of_box.md): out-of-box local assistant
- [ADR 0002](docs/adr/0002_barge_in_interruption.md): voice modes + stop phrase + optional AEC
- [ADR 0003](docs/adr/0003_cloning_reference_text_fallback.md): cloning `reference_text` auto-fallback
- [ADR 0004](docs/adr/0004_streaming_and_cancellation_for_cloned_tts.md): streaming + cancellation for cloned TTS
- [ADR 0005](docs/adr/0005_torch_device_and_dtype_policy.md): torch device + dtype selection policy (torch engines)

## Code map (start points)

- [Version source](abstractvoice/_version.py): single release version source
- [Public package entry](abstractvoice/__init__.py): main public exports
- [Module CLI entry](abstractvoice/__main__.py): `python -m abstractvoice ...` (incl. `download`)
- [VoiceManager façade](abstractvoice/voice_manager.py): public import target
- [VoiceManager wiring](abstractvoice/vm/manager.py): constructor + engine selection
- [Voice-mode callbacks](abstractvoice/vm/core.py): behavior while speaking
- [TTS methods + cloning orchestration](abstractvoice/vm/tts_mixin.py)
- [Runtime TTS switching](abstractvoice/vm/tts_mixin.py): `set_tts_engine(...)` resets the base profile to the provider/language default
- [Voice profiles abstraction](abstractvoice/voice_profiles.py): cross-engine `VoiceProfile` ids + metadata
- [Remote audio HTTP helpers](abstractvoice/adapters/openai_compatible_http.py): lightweight requests-based OpenAI-compatible audio client
- [Remote TTS adapter](abstractvoice/adapters/tts_openai_compatible.py): `/audio/speech` + remote profile discovery
- [Remote STT adapter](abstractvoice/adapters/stt_openai_compatible.py): `/audio/transcriptions`
- [TTS delivery mode](abstractvoice/tts/delivery_mode.py): normalize `"buffered"` vs `"streamed"`
- [Text chunking](abstractvoice/tts/text_chunking.py): `split_text_batches(...)` + `TextStreamChunker`
- [Text→audio streaming bridge](abstractvoice/tts/text_to_speech_stream.py): `TextToSpeechStream` (LLM streaming → TTS)
- [Audio chunk smoothing](abstractvoice/audio/fade.py): edge fades + headroom scaling
- [STT + listening methods](abstractvoice/vm/stt_mixin.py)
- [Explicit prefetch tool](abstractvoice/prefetch.py): `abstractvoice-prefetch ...`
- [Local web example](abstractvoice/examples/web_ui.py): browser TTS/STT smoke test

## Engines & adapters

- [Adapters base types](abstractvoice/adapters/base.py)
- [Piper TTS adapter](abstractvoice/adapters/tts_piper.py): voice selection + caching + synthesis
- [Supertonic TTS adapter](abstractvoice/adapters/tts_supertonic.py): fixed-profile ONNX TTS (`M1`-`M5`, `F1`-`F5`)
- [Supertonic runtime](abstractvoice/supertonic/runtime.py): internal ONNX loader + explicit `--supertonic` prefetch; no external Supertonic SDK dependency
- [AudioDiT TTS adapter (optional)](abstractvoice/adapters/tts_audiodit.py)
- [AudioDiT runtime + vendored model](abstractvoice/audiodit/): LongCat-AudioDiT implementation (MIT)
- [OmniVoice TTS adapter (optional)](abstractvoice/adapters/tts_omnivoice.py)
- [OmniVoice runtime wrapper (optional)](abstractvoice/omnivoice/runtime.py)
- [OmniVoice cloning engine (optional)](abstractvoice/cloning/engine_omnivoice.py)
- [faster-whisper STT adapter](abstractvoice/adapters/stt_faster_whisper.py)
- [Remote OpenAI-compatible TTS adapter](abstractvoice/adapters/tts_openai_compatible.py)
- [Remote OpenAI-compatible STT adapter](abstractvoice/adapters/stt_openai_compatible.py)
- [TTS engine wrapper](abstractvoice/tts/adapter_tts_engine.py): wraps adapters + uses `NonBlockingAudioPlayer`
- [Audio playback](abstractvoice/tts/tts_engine.py): pause/resume/stop + optional audio-fx
- [Mic capture + VAD + stop phrase](abstractvoice/recognition.py)
- [Stop phrase helper](abstractvoice/stop_phrase.py)
- [VAD](abstractvoice/vad/voice_detector.py)
- [AEC (optional)](abstractvoice/aec/webrtc_apm.py)

## Voice profiles + streaming TTS (cross-engine)

- [Preset profile catalogs](abstractvoice/assets/voice_profiles/): shipped presets (engine-specific)
- [TTS delivery mode](abstractvoice/tts/delivery_mode.py): `"buffered"` vs `"streamed"`
- [Text chunking](abstractvoice/tts/text_chunking.py): sentence + soft-boundary segmentation
- [Text→audio streaming bridge](abstractvoice/tts/text_to_speech_stream.py): incremental LLM → TTS pipelining
- [Audio chunk smoothing](abstractvoice/audio/fade.py): mitigate clicks/clipping at chunk boundaries

## Voice cloning (optional; heavy)

- [REPL cloning workflow](docs/repl_guide.md): commands + explicit downloads
- [Voices and licenses](docs/voices-and-licenses.md): licensing caveats for models/voices
- [VoiceCloner + engine dispatch](abstractvoice/cloning/manager.py)
- [Clone store + bundles](abstractvoice/cloning/store.py)
- [F5‑TTS engine](abstractvoice/cloning/engine_f5.py)
- [Chroma engine](abstractvoice/cloning/engine_chroma.py)
- [AudioDiT engine](abstractvoice/cloning/engine_audiodit.py)
- [OmniVoice engine](abstractvoice/cloning/engine_omnivoice.py)
- [Remote cloning bridge](abstractvoice/cloning/engine_remote.py): compatible `/voice/clone` + stored remote voice ids

## Tests

- [tests/](tests/): CI-safe run is `python -m pytest -q` (defaults to skipping `integration`, `model_download`, and `slow` marked tests)
- [Contributing](CONTRIBUTING.md): dev setup + env vars for heavy tests (`ABSTRACTVOICE_RUN_CLONING_TESTS=1`, `ABSTRACTVOICE_RUN_CHROMA_TESTS=1`)
- [Bug report template](.github/ISSUE_TEMPLATE/bug_report.yml): structured issue intake
- [test_voice_cloner_engine_dispatch.py](tests/test_voice_cloner_engine_dispatch.py)
- [test_cloning_reference_text_autofallback.py](tests/test_cloning_reference_text_autofallback.py)
- [test_cloned_tts_cancellation.py](tests/test_cloned_tts_cancellation.py)
- [test_chroma_cloning_integration.py](tests/test_chroma_cloning_integration.py): heavy; gated by env var + optional deps

## Dependencies & licensing

- [pyproject.toml](pyproject.toml): deps, extras, entry points
- [Acknowledgments](ACKNOWLEDGMENTS.md)
- [Security policy](SECURITY.md)
- [License](LICENSE)
- [LongCat AudioDiT license](third_party_licenses/longcat_audiodit_license.txt)
- [Supertonic model notice](third_party_licenses/supertone_supertonic_notice.txt)
- [OmniVoice notice](third_party_licenses/omnivoice_notice.txt)

## Optional

- [Model / voice management](docs/model-management.md)
- [Multilingual notes](docs/multilingual.md)
- [REPL demonstrator](abstractvoice/examples/cli_repl.py) + minimal LLM client (`abstractvoice/examples/llm_provider.py`)
- [Backlog](docs/backlog/): internal planning (not an API contract)
- [Reports](docs/reports/): historical snapshots
- [Voice cloning research notes](docs/voice_cloning_2026.md): non-contract research notes