--- name: mindwork-transcribe description: Transcribe therapy session recordings to formatted text. Converts audio to clean, speaker-labeled transcripts (Me/Therapist format) with grammar correction and English translation. Use when processing therapy recordings, session audio, or any two-person conversation recording. --- # Therapy Session Transcriber Part of the **mindwork** suite. Converts therapy session recordings into clean, formatted transcripts. ## What It Does 1. **Chunks** large audio files at natural silence points (sentence boundaries) 2. **Transcribes** using OpenAI Whisper API 3. **Formats** as two-person conversation with **Me:** / **Therapist:** labels 4. **Corrects** grammar and transcription errors 5. **Translates** to English (for non-English sessions) ## Prerequisites - Docker installed and running - `OPENAI_API_KEY` environment variable set - The `mindwork-transcribe` Docker image built (see Setup) ## Setup (One-Time) Build the transcription Docker image from the plugin's transcribe directory: ```bash # Find the mindwork plugin location and build the image docker build -t mindwork-transcribe ~/src/mindwork/transcribe ``` Or if installed as a plugin, find the plugin path first: ```bash # The transcribe tool is in the 'transcribe/' directory of this plugin docker build -t mindwork-transcribe /path/to/mindwork/transcribe ``` ## Usage ### Full Therapy Session Processing (Recommended) Transcribe, format as conversation, and translate to English: ```bash docker run --rm \ -e OPENAI_API_KEY \ -v $(pwd):/data \ mindwork-transcribe /data/session.m4a --format-conversation --output /data/transcript.txt ``` ### Raw Transcription Only Just transcribe without formatting or translation: ```bash docker run --rm \ -e OPENAI_API_KEY \ -v $(pwd):/data \ mindwork-transcribe /data/session.m4a --output /data/transcript.txt ``` ### With Speaker Diarization For automatic speaker detection (alternative to --format-conversation): ```bash docker run --rm \ -e OPENAI_API_KEY \ -v $(pwd):/data \ mindwork-transcribe /data/session.m4a --diarize --output /data/transcript.txt ``` ### Only Chunk (No Transcription) Split a large file into chunks for later processing: ```bash docker run --rm \ -v $(pwd):/data \ mindwork-transcribe /data/session.m4a --no-transcribe --keep-chunks ``` ### Process Existing Chunks Resume from previously created chunks: ```bash docker run --rm \ -e OPENAI_API_KEY \ -v $(pwd):/data \ mindwork-transcribe /data/chunks/ --format-conversation --output /data/transcript.txt ``` ## Options Reference | Option | Description | |--------|-------------| | `--output FILE` | Save transcript to file (default: stdout) | | `--format-conversation` | Format as Me/Therapist dialogue + translate to English | | `--diarize` | Auto-detect speakers (uses gpt-4o-transcribe-diarize) | | `--no-transcribe` | Only chunk, skip transcription | | `--keep-chunks` | Preserve chunk files after processing | | `--model MODEL` | `whisper-1` (default, fast) or `gpt-4o-transcribe` (better accuracy) | ## Supported Audio Formats mp3, mp4, m4a, wav, webm, ogg, flac ## Configuration (mindwork.yaml) If a `mindwork.yaml` config file exists, use it to determine output paths: ```yaml vault: ~/Therapy sources: recordings: paths: [recordings/] outputs: transcriptions: transcriptions/ ``` **Config locations** (checked in order): 1. `./mindwork.yaml` (current directory) 2. `~/.config/mindwork/config.yaml` 3. `~/.mindwork.yaml` **Default behavior** (no config): - Save to current directory or user-specified `--output` path **With config**: - Save to `{vault}/{outputs.transcriptions}/{date}-{filename}.md` - Example: `~/Therapy/transcriptions/2024-01-15-session-001.md` See `config/mindwork.example.yaml` for full configuration options. ## Output Format With `--format-conversation`, output looks like: ``` **Me:** I've been feeling anxious about work lately. The deadlines keep piling up. **Therapist:** That sounds overwhelming. Can you tell me more about what specifically triggers that anxiety? **Me:** It's mostly when I have multiple projects due at the same time... ``` ## Cost Estimate OpenAI Whisper API: ~$0.006/minute of audio GPT-4o for formatting/translation: ~$0.01-0.02 per session (varies by length) A typical 50-minute session costs approximately $0.30-0.50 total. ## Troubleshooting **"Docker image not found"** Build the image from the plugin's transcribe directory: ```bash docker build -t mindwork-transcribe /path/to/mindwork/transcribe ``` **"OPENAI_API_KEY not set"** ```bash export OPENAI_API_KEY="sk-..." ``` **"File not found"** Ensure you're in the directory containing your audio file, or use absolute paths. **Transcription quality issues** Try `--model gpt-4o-transcribe` for better accuracy (same price as whisper-1).