---
name: "prog-expert-tts-stt-integration"
description: "Use when you need to implement TTS (text-to-speech) or STT (speech-to-text) integrations in Nuxt 3/Nitro applications, including Deepgram streaming STT via WebSocket, ElevenLabs TTS, browser MediaRecorder, and audio settings management. Expert-level programming and pattern management."
metadata:
  stage: "alpha"
  source: "MIGRATED"
  requires:
    - "impl-experts-js-contract"
  env_vars:
    DEEPGRAM_API_KEY: "Required for STT. Your Deepgram API key."
    ELEVENLABS_API_KEY: "Required for TTS. Your ElevenLabs API key."
---

# Prog Expert TTS/STT Integration

## Goal

Expert-level implementation of audio I/O integrations in Nuxt 3 + Nitro applications. Covers Deepgram streaming STT via WebSocket proxy, ElevenLabs TTS, browser MediaRecorder API, audio settings, and multi-provider configuration.

## Core Workflow (Progressive Disclosure)

1. **Context Analysis**: Identify whether the task is STT (input/transcription) or TTS (output/speech), and which provider is involved.
2. **Knowledge Retrieval**:
   - Check `recipes/` for matching implementation patterns.
   - Read `references/patterns.md` for architectural patterns and code snippets.
   - Check `references/versions.json` for current provider SDK versions.
3. **Implementation**: Follow the layered architecture (see below). Leverage `atomic-examples/` for isolated building blocks.
4. **Learning**:
   - After successful implementation, extract patterns via `programming/prog-expert-advisor/scripts/learn.py`.
   - Create/refine recipes in `recipes/` using `scripts/manage_recipes.py`.

## Architecture Overview

```
Browser                        Nitro Server                     External APIs
──────────────────────────────────────────────────────────────────────────
MediaRecorder (WebM/Opus)  →   WS /api/transcribe-stream    →   Deepgram WSS
                           ←   {type,transcript,is_final}   ←
POST /api/speak (text)     →   AgentManager.speak()         →   ElevenLabs REST
                           ←   ReadableStream (mp3)         ←
Audio() .play()            ←   sendStream(event, stream)    ←
```

## Instructions

- **Package manager**: Always use `bun` for this project.
- **Before starting**: Search `recipes/` for relevant patterns.
- **STT streaming**: Always proxy through a Nitro WebSocket route; never expose Deepgram API keys to the browser.
- **TTS streaming**: Use Nitro's `sendStream()` to pipe the ElevenLabs ReadableStream directly to the client.
- **Settings resolution**: API keys come from env vars first, then `data/settings.json`. Throw clear errors if a key is missing from both.
- **Cleanup**: Always stop MediaRecorder tracks and close the WebSocket on component unmount.
- **MIME type**: Use `audio/webm;codecs=opus` for MediaRecorder; Deepgram accepts it natively.
- **Language**: Support `auto` (`detect_language=true`), `multi` (`language=multi`, Deepgram's multilingual mode; this is unrelated to multichannel audio), or a specific language code.
- **After new recipe**: Add it to the `recipes/README.md` index.

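
The settings-resolution rule above (environment variables first, then `data/settings.json`, with a clear error when neither is set) could be sketched as follows. This is a minimal illustration, not the Pichi implementation: `resolveApiKey`, the `deepgramApiKey` field name, and the shape of the settings object are assumptions; only the env var names and the lookup order come from this skill. The environment and the parsed settings are passed in as plain objects so the function can be exercised outside Nitro.

```typescript
// Sketch only: names below are hypothetical, not taken from the real codebase.
type KeyLookup = Record<string, string | undefined>;

interface ResolvedKey {
  key: string;
  // Where the key came from; a settings UI could use this to mask env-sourced keys.
  source: "env" | "settings";
}

/**
 * Resolve a provider API key.
 * Order: environment variable first, then the parsed data/settings.json.
 * Throws a descriptive error when neither location provides a non-empty value.
 */
function resolveApiKey(
  envVar: string,        // e.g. "DEEPGRAM_API_KEY"
  env: KeyLookup,        // e.g. process.env in a Nitro handler
  settings: KeyLookup,   // parsed content of data/settings.json (shape assumed)
  settingsField: string, // assumed field name, e.g. "deepgramApiKey"
): ResolvedKey {
  const fromEnv = (env[envVar] ?? "").trim();
  if (fromEnv) return { key: fromEnv, source: "env" };

  const fromFile = (settings[settingsField] ?? "").trim();
  if (fromFile) return { key: fromFile, source: "settings" };

  throw new Error(
    `Missing API key: set ${envVar} or "${settingsField}" in data/settings.json`,
  );
}
```

In a Nitro handler this might be called as `resolveApiKey("DEEPGRAM_API_KEY", process.env, settings, "deepgramApiKey")`, where `settings` is whatever the project loads from `data/settings.json`.
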
## Layered Implementation Checklist

### Backend (Nitro)

- [ ] `server/api/speak.post.ts`: TTS REST endpoint
- [ ] `server/api/transcribe.post.ts`: STT one-shot endpoint (optional)
- [ ] `server/routes/api/transcribe-stream.ts`: STT WebSocket proxy (preferred)
- [ ] `server/utils/agent-manager.ts` or equivalent: `speak()` and `transcribe()` methods
- [ ] `nuxt.config.ts`: enable `nitro.experimental.websocket: true`

### Frontend (Vue 3)

- [ ] Recording state: `isRecording`, `isTranscribing`, `mediaRecorder`, `audioChunks`
- [ ] WebSocket state: `sttSocket`, `sttCursorPos`, `sttUtterance`, `sttInterimText`
- [ ] `startRecording()`: mic access, WS connection, MediaRecorder setup
- [ ] `stopRecording()`: send `{type:'stop'}`, stop tracks
- [ ] `handleSttTranscript()`: merge interim + final results at cursor
- [ ] `playMessage(text)`: `POST /api/speak`, blob URL, `Audio().play()`
- [ ] `onUnmounted()`: full cleanup

### Settings

- [ ] `defaultSttProvider` (deepgram)
- [ ] `defaultTtsProvider` (elevenlabs)
- [ ] `sttLanguage` (auto | multi | en | de | fr | es | it | pt | nl | ja | ko | zh)
- [ ] API key masking in UI for env-sourced keys

## Self-Learning & Research

- Gather knowledge from web research and the `use-context7-api` skill.
- Learn from successful implementations and other codebases.
- Track latest SDK/API versions.
- Refine recipes based on implementation experience.

## Auto-Improvement

- Every time this skill is used, analyze the usage chat to determine whether further improvement is advisable.
- Ask the user whether those changes should be made.
- If approved, store improvement ideas in `resources/improvement_ideas.md`.

## References

- [Patterns](references/patterns.md): Complete code patterns from Pichi: WebSocket proxy, MediaRecorder, TTS stream, settings.
- [Versions](references/versions.json): Tracked provider/library versions.
- [Recipes](recipes/README.md): Recipes index for recurring TTS/STT tasks.

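
As an illustration of the `handleSttTranscript()` item in the frontend checklist, the merge of interim and final results at the cursor might look like the sketch below. It is written as pure functions over a plain state object so it can be tested without Vue; in a component the fields would map onto `sttCursorPos`, `sttUtterance`, and `sttInterimText`. The message shape `{type, transcript, is_final}` comes from the architecture overview, but the `"transcript"` value for `type`, the function names, and the space-joining of segments are assumptions.

```typescript
// Sketch only: names and the "transcript" type value are assumed for illustration.
interface SttMessage {
  type: string;
  transcript: string;
  is_final: boolean;
}

interface SttState {
  text: string;      // input text as it was when recording started
  cursorPos: number; // insertion point captured at start (cf. sttCursorPos)
  utterance: string; // finalized speech so far (cf. sttUtterance)
  interim: string;   // latest non-final hypothesis (cf. sttInterimText)
}

/** Fold one server message into the state without mutating the input. */
function applyTranscript(state: SttState, msg: SttMessage): SttState {
  if (msg.type !== "transcript") return state; // ignore other message types
  const chunk = msg.transcript.trim();
  if (msg.is_final) {
    // A final result is appended to the utterance and replaces any interim text.
    const utterance = chunk
      ? (state.utterance ? `${state.utterance} ${chunk}` : chunk)
      : state.utterance;
    return { ...state, utterance, interim: "" };
  }
  // An interim result only overwrites the previous interim hypothesis.
  return { ...state, interim: chunk };
}

/** Text to display: original text with the spoken words inserted at the cursor. */
function renderText(state: SttState): string {
  const before = state.text.slice(0, state.cursorPos);
  const after = state.text.slice(state.cursorPos);
  const spoken = [state.utterance, state.interim].filter(Boolean).join(" ");
  return before + spoken + after;
}
```

Keeping the original text and cursor position fixed for the duration of a recording means each interim update re-renders from the same base, so earlier hypotheses never accumulate in the input.
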
## Example Code

When learning or implementing, use these code examples. ALWAYS load them via `view_file` to maintain Progressive Disclosure:

- **Atomic Examples** (Small code chunks from docs):
  - *(Add examples here)*: `view_file(/atomic-examples/...)`
- **Recipes** (Larger patterns like Auth, Pages):
  - *(Add recipes here)*: `view_file(/recipes/...)`
- **Reference Implementations** (Complex pseudo-code from existing real-world codebases):
  - *(Add references here)*: `view_file(/reference-implementations/...)`

## Constraints

- Never expose provider API keys to the browser.
- Do not overwrite existing files without explicit user confirmation.
- Do not add Pinia/Vuex; use local `ref()` state only.

## Script Integration

- **Research**: `uv run scripts/research_knowledge.py ""`
- **Version Tracking**: `uv run scripts/track_versions.py`
- **Recipe Management**: `uv run scripts/manage_recipes.py [args]`
- **Example Management**: `uv run scripts/manage_examples.py [args]`
- **Pattern Learning**: `uv run programming/prog-expert-advisor/scripts/learn.py`