---
name: voice-system-expert
description: Use when working with Quetrex's voice interface, OpenAI Realtime API, WebRTC, or echo cancellation. Knows Quetrex's specific voice architecture decisions and patterns. CRITICAL - prevents breaking working voice system.
allowed-tools: Read
---

# Quetrex Voice System Expert

## CRITICAL: Read This First

Quetrex's voice system architecture is **extensively documented and battle-tested**. Before making ANY changes to voice-related code, you MUST read:

1. **ADR-001-VOICE-ECHO-CANCELLATION.md** (definitive architectural decision)
2. **docs/architecture/VOICE-SYSTEM.md** (technical implementation)
3. **docs/features/voice-interface.md** (user-facing features)

**Location:** `src/lib/openai-realtime.ts`

## Core Architecture Decision

### ALWAYS-ON MICROPHONE + BROWSER AEC

This is **Decision 4** from VOICE-SYSTEM.md and the definitive approach documented in ADR-001.

```typescript
// ✅ CORRECT: Always-on microphone
const mediaStream = await navigator.mediaDevices.getUserMedia({
  audio: {
    echoCancellation: true,  // CRITICAL - browser handles echo cancellation
    noiseSuppression: true,
    autoGainControl: true,
  },
})

// Microphone track stays ENABLED throughout conversation
// Browser's native AEC prevents feedback loops
// Server-side VAD (Voice Activity Detection) handles turn detection
```

## How It Works

### Audio Pipeline
```
User speaks
    ↓
Microphone (always enabled, echoCancellation: true)
    ↓
WebRTC → OpenAI Realtime API
    ↓
Server-side VAD detects speech
    ↓
OpenAI processes and responds
    ↓
Audio response via WebRTC
    ↓
HTMLAudioElement playback (stays in browser pipeline)
    ↓
Browser AEC compares mic input + speaker output
    ↓
Echo automatically canceled (no feedback loop)
```

### Why This Works

**Browser Echo Cancellation Requirements:**
1. Both microphone input AND speaker output must be in browser's audio graph
2. Audio must flow through browser's WebRTC stack
3. No manual intervention needed (browser handles it)

**This is the industry standard:**
- ChatGPT voice mode
- Google Meet
- Zoom
- Discord
- Microsoft Teams

They ALL use always-on microphone + browser AEC.

## DO NOT DO THESE THINGS

### ❌ DON'T: Toggle Microphone Track

```typescript
// ❌ WRONG - This breaks echo cancellation
async function pauseRecording() {
  microphone.enabled = false // DON'T DO THIS
}

async function resumeRecording() {
  microphone.enabled = true // DON'T DO THIS
}
```

**Why this fails:**
- Not industry standard
- Causes audio glitches
- Can break echo cancellation
- Adds artificial delays
- More complex state management
- No real benefit

### ❌ DON'T: Route Audio Outside Browser

```typescript
// ❌ WRONG - AudioWorklet bypass to native audio
const workletNode = new AudioWorkletNode(audioContext, 'bypass-processor')
workletNode.port.postMessage({ cmd: 'route-to-native' })
```

**Why this fails:**
- Breaks browser AEC (audio leaves browser pipeline)
- Causes echo/feedback loops
- Requires complex native audio handling
- Platform-specific implementations
- Already tried and failed (see abandoned-approaches/)

### ❌ DON'T: Implement Custom Echo Cancellation

```typescript
// ❌ WRONG - Custom AEC implementation
class CustomEchoCanceller {
  cancelEcho(input: AudioBuffer, output: AudioBuffer) {
    // Complex DSP code...
  }
}
```

**Why this fails:**
- Reinventing the wheel
- Browser AEC is battle-tested by billions of users
- Extremely complex to implement correctly
- Requires deep DSP knowledge
- Performance issues
- Platform-specific tuning needed

### ❌ DON'T: Add Artificial Delays

```typescript
// ❌ WRONG - Delays for echo prevention
await new Promise(resolve => setTimeout(resolve, 500))
await playAudio()
await new Promise(resolve => setTimeout(resolve, 500))
resumeRecording()
```

**Why this fails:**
- Not necessary (browser AEC handles it)
- Degrades user experience
- Adds latency
- Still doesn't prevent echo if implementation is wrong

## What You CAN Change

### ✅ DO: Adjust Audio Settings

```typescript
// ✅ Can tweak these settings
const constraints = {
  audio: {
    echoCancellation: true,      // MUST be true
    noiseSuppression: true,       // Can adjust
    autoGainControl: true,        // Can adjust
    sampleRate: 24000,            // Can change for quality
    channelCount: 1,              // Mono is fine for voice
  },
}
```

### ✅ DO: Handle Connection States

```typescript
// ✅ Can manage connection lifecycle
async function connectVoice() {
  // Setup WebRTC connection
  // Start streaming
  // Handle connection events
}

async function disconnectVoice() {
  // Clean up WebRTC connection
  // Stop streaming
  // Release microphone
}
```

### ✅ DO: Add UI Feedback

```typescript
// ✅ Can add visual indicators
function onVoiceActivity(active: boolean) {
  if (active) {
    // Show "listening" indicator
    // Animate microphone icon
  } else {
    // Show "idle" indicator
  }
}
```

### ✅ DO: Handle Errors

```typescript
// ✅ Can improve error handling
try {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
} catch (error) {
  if (error.name === 'NotAllowedError') {
    // Show permission request UI
  } else if (error.name === 'NotFoundError') {
    // Show "no microphone found" error
  }
}
```

## Implementation Location

**Primary file:** `src/lib/openai-realtime.ts`

**Key functions:**
- `setupMediaStream()` - Initializes microphone with correct constraints
- `connectToOpenAI()` - Establishes WebRTC connection
- `handleAudioResponse()` - Plays AI responses via HTMLAudioElement

**DO NOT modify:**
- Microphone enable/disable logic (should stay always-on)
- Echo cancellation settings (must be true)
- Audio routing (must stay in browser pipeline)

**CAN modify:**
- UI feedback and visual indicators
- Error handling and user messages
- Connection retry logic
- Audio quality settings (sample rate, etc.)

## Why This Architecture Was Chosen

From ADR-001:

**Option A: Trust Industry Pattern (CHOSEN)**
- ✅ Used by ChatGPT, Google Meet, Zoom, Discord
- ✅ Browser vendors optimize for this pattern
- ✅ Works across all platforms (macOS, Windows, Linux, mobile)
- ✅ No custom implementation needed
- ✅ Proven reliability

**Options Rejected:**
- ❌ Manual microphone toggling (not industry standard)
- ❌ AudioWorklet native bypass (breaks AEC)
- ❌ Custom echo cancellation (reinventing wheel)
- ❌ Native Rust WebRTC (2-4 weeks work for no benefit)

## Platform Context: Web Application

**IMPORTANT:** Quetrex is now a **pure web application**, not a Tauri desktop app.

**Why this matters for voice:**
- Browser echo cancellation works perfectly in all browsers
- No WKWebView limitations (that was the Tauri problem)
- Universal compatibility (Chrome, Safari, Firefox, Edge)
- No platform-specific audio handling needed
- Just works™️

**The WKWebView Problem (Historical):**
Quetrex was originally a Tauri desktop app. On macOS, Tauri uses WKWebView, which has a bug: **WebRTC audio playback doesn't work**. This forced us to try workarounds (AudioWorklet bypass, native audio routing), which all broke echo cancellation.

**Solution:** Convert to web application. Now echo cancellation works perfectly everywhere.

## Testing Voice Changes

If you must modify voice code:

1. **Test on real browsers** (not just dev tools)
2. **Test the echo scenario:**
   - Turn up speaker volume
   - Start voice conversation
   - Verify no feedback loop/echo
3. **Test across platforms:**
   - Chrome (most common)
   - Safari (macOS, iOS)
   - Firefox
   - Edge
4. **Test edge cases:**
   - Poor network connection
   - Microphone permission denied
   - Mid-conversation disconnect

## When to Consult Documentation

**Before any voice changes, read:**
1. `docs/decisions/ADR-001-VOICE-ECHO-CANCELLATION.md`
2. `docs/architecture/VOICE-SYSTEM.md`
3. `docs/development/abandoned-approaches/`

**If you see mentions of:**
- Tauri → Ignore (Quetrex is web app now)
- WKWebView → Ignore (not relevant anymore)
- AudioWorklet bypass → Don't implement (already tried, failed)
- Manual mic toggling → Don't implement (not industry standard)

## Summary

**The Golden Rule:** Trust browser echo cancellation. Always-on microphone + `echoCancellation: true` + audio in browser pipeline = Perfect echo cancellation.

**DO:**
- Keep microphone enabled throughout conversation
- Use `echoCancellation: true`
- Keep audio in browser (HTMLAudioElement)
- Let OpenAI Realtime API handle VAD
- Trust the industry pattern

**DON'T:**
- Toggle microphone track
- Route audio outside browser
- Implement custom echo cancellation
- Add artificial delays
- Try to "improve" what already works

**If in doubt:** Read ADR-001 and VOICE-SYSTEM.md. The architecture is thoroughly documented for a reason.