# Whisper Model: Download, Caching, and Lifecycle

**Version:** 2.0.0
**Last Updated:** November 26, 2025
**Author:** Mantej Singh Dhanjal

---

## Table of Contents

1. [Overview](#overview)
2. [Model Download Process](#model-download-process)
3. [Caching Strategy](#caching-strategy)
4. [Lifecycle Management](#lifecycle-management)
5. [Shutdown Behavior](#shutdown-behavior)
6. [Storage Locations](#storage-locations)
7. [Troubleshooting](#troubleshooting)

---

## Overview

STT-CLI v2.0 uses **OpenAI Whisper** via the `faster-whisper` library for offline speech recognition. The model is downloaded once and cached locally for subsequent use.

### Model Specifications

| Property | Value |
|----------|-------|
| **Model Name** | Systran/faster-whisper-tiny |
| **Repository** | Hugging Face (https://huggingface.co/Systran) |
| **License** | MIT (100% free, commercial use allowed) |
| **Size** | ~75MB (model.bin + tokenizer + vocab files) |
| **Parameters** | 39 million (smallest Whisper model) |
| **Quantization** | INT8 (2-3x speedup on CPU) |
| **Backend** | CTranslate2 (optimized inference engine) |

---

## Model Download Process

### First-Time Download Flow

```
User starts STT-CLI (first run with Whisper mode)
    ↓
User double-taps Left Alt (start recording)
    ↓
recording_loop() captures audio
    ↓
transcribe_with_whisper(audio) called
    ↓
get_whisper_model() - FIRST CALL
    ↓
Check: Is whisper_model already loaded in memory?
    NO → Proceed with download
    ↓
Thread lock acquired (whisper_model_lock)
    ↓
WhisperModel("tiny", device="cpu", compute_type="int8", download_root=...)
    ↓
faster-whisper checks: Is model cached locally?
    NO → Download from Hugging Face CDN
    ↓
Download progress (via huggingface_hub library):
    1. config.json (~1KB)
    2. tokenizer.json (~2MB)
    3. vocabulary.txt (~450KB)
    4. model.bin (~75MB) ← Main file, takes 5-10s on broadband
    ↓
Files saved to: %APPDATA%\stt-cli\models\models--Systran--faster-whisper-tiny\
    ↓
Model loaded into memory (whisper_model global variable)
    ↓
Thread lock released
    ↓
Model ready for transcription (~8-12 seconds total for first use)
```

### Subsequent Use Flow

```
User starts STT-CLI (model already downloaded)
    ↓
User double-taps Left Alt (start recording)
    ↓
transcribe_with_whisper(audio) called
    ↓
get_whisper_model() - SUBSEQUENT CALL
    ↓
Check: Is whisper_model already loaded in memory?
    YES → Return cached model (instant, no disk access)
    ↓
Model ready for transcription (~0ms overhead)
```

**Key Insight:** The model is loaded **once per app session** and kept in memory. No disk I/O or download occurs on subsequent transcriptions.

---

## Caching Strategy

### Memory Cache (whisper_model global variable)

**Code Location:** `main.pyw:whisper_model` (line ~74)

```python
whisper_model: Optional["WhisperModel"] = None  # Global in-memory cache
```

**Lifecycle:**
- **Created:** On first call to `get_whisper_model()`
- **Exists:** Throughout app lifetime (never unloaded)
- **Destroyed:** When the app quits (the OS reclaims the process's memory)

**Benefits:**
- **Fast repeated transcriptions** (no reload penalty)
- **Low latency** after first use (<1s transcription time)

**Trade-off:**
- **High RAM usage** (~1.5GB while the model is in memory)
- Not suitable for memory-constrained devices

### Disk Cache (%APPDATA%\stt-cli\models\)

**Purpose:** Persistent storage of downloaded model files

**Directory Structure:**

```
%APPDATA%\stt-cli\models\
└── models--Systran--faster-whisper-tiny\
    ├── blobs\
    │   ├── [hash1] → config.json
    │   ├── [hash2] → tokenizer.json
    │   ├── [hash3] → vocabulary.txt
    │   └── [hash4] → model.bin (75MB)
    ├── refs\
    │   └── main → points to specific commit
    └── snapshots\
        └── [commit-hash]\
            ├── config.json → symlink to blob
            ├── tokenizer.json → symlink to blob
            ├── vocabulary.txt → symlink to blob
            └── model.bin → symlink to blob
```

**Managed By:** `huggingface_hub` library (automatic)

**Benefits:**
- **One-time download** (subsequent runs skip the download)
- **Version tracking** (commit-based; the model can be updated)
- **Atomic updates** (the blob + snapshot design prevents corruption)

---

## Lifecycle Management

### App Startup

```python
# At startup, NO model loading occurs.
# The model is loaded LAZILY on the first Whisper transcription.
```

**Why Lazy Loading?**
- **Fast startup** (<2s to system tray)
- **Efficient for Google-only users** (they never pay the Whisper cost)
- **Reduced memory footprint** (model loaded only when needed)

### During Execution

**Scenario 1: User switches to Whisper mode**

```
1. User: Right-click tray → Engine → Whisper
2. set_engine("whisper") updates global current_engine
3. Settings saved to %APPDATA%\stt-cli\settings.json
4. Next recording uses Whisper (model loaded if not already)
```

**Scenario 2: User switches from Whisper to Google**

```
1. User: Right-click tray → Engine → Google
2. set_engine("google") updates global current_engine
3. Model REMAINS in memory (not unloaded)
4. Next recording uses Google (Whisper model idle but ready)
```

**Memory Optimization Insight:** Even if the user switches away from Whisper, the model stays in RAM for fast re-activation. This is acceptable because:
- Model loading is expensive (~5s)
- RAM is cheaper than latency
- Users may switch back frequently

---

## Shutdown Behavior

### Normal Shutdown (User clicks "Quit")

```python
def quit_program(icon_param: Optional[pystray.Icon] = None) -> None:
    logging.info("Exiting application...")

    # 1. Stop recording gracefully
    recording_event.clear()

    # 2. Wait for recording thread (max 1s)
    if recording_thread and recording_thread.is_alive():
        recording_thread.join(timeout=1.0)

    # 3. Stop system tray icon
    if icon_param:
        icon_param.stop()

    # 4. Exit process
    os._exit(0)  # Immediate exit, no Python cleanup
```

**What Happens to the Whisper Model:**
- ❌ **NOT explicitly unloaded** (no `model.close()` or `del whisper_model`)
- ✅ **Automatic cleanup** by `os._exit(0)` → Process terminates → OS reclaims all memory
- ✅ **Disk cache preserved** → Model files remain in `%APPDATA%\stt-cli\models\`

**Why `os._exit(0)` instead of `sys.exit()`?**
- `sys.exit()` → Raises SystemExit → Python cleanup runs → Slower (1-2s)
- `os._exit(0)` → Immediate process termination → OS cleanup → Fast (<100ms)

### Windows Shutdown / PC Restart

```
User shuts down Windows
    ↓
Windows sends WM_QUERYENDSESSION to all apps
    ↓
STT-CLI (a GUI app with no window) receives the signal
    ↓
Python process terminated by the OS (no custom handler)
    ↓
All memory released (including the Whisper model)
    ↓
Disk cache remains intact (%APPDATA% persists)
    ↓
Next boot: Model loads from the disk cache (no re-download)
```

**Key Points:**
- ✅ **No data loss** (model cache is on disk)
- ✅ **No corruption** (huggingface_hub uses atomic writes)
- ✅ **Auto-start works** (if enabled, the app restarts on login)

### Crash / Task Manager Kill

```
User force-kills pythonw.exe via Task Manager
    ↓
Process terminated immediately (no cleanup)
    ↓
Whisper model unloaded (OS reclaims memory)
    ↓
Disk cache MAY be incomplete if a download was in progress
    ↓
Next run: faster-whisper detects the incomplete download → Re-downloads
```

**Safety:** Hugging Face Hub uses `.incomplete` files during download. If interrupted, the next run detects this and resumes or restarts the download.
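The memory-cache check described above (global `whisper_model` guarded by `whisper_model_lock`) amounts to a double-checked lazy loader. Below is a minimal sketch of that pattern, assuming the names used throughout this document; the actual code in `main.pyw` may differ in detail, and the `ImportError` guard is an assumption about how a missing `faster-whisper` is handled:

```python
import os
import threading
from typing import Optional

try:
    from faster_whisper import WhisperModel  # optional dependency
except ImportError:
    WhisperModel = None  # Whisper unavailable; the Google engine still works

whisper_model: Optional["WhisperModel"] = None  # global in-memory cache
whisper_model_lock = threading.Lock()           # guards the first-time load

# Download target per the "Storage Locations" section
MODELS_DIR = os.path.join(os.environ.get("APPDATA", "."), "stt-cli", "models")


def get_whisper_model() -> Optional["WhisperModel"]:
    """Return the cached model; download/load it on the first call only."""
    global whisper_model
    if whisper_model is not None:   # fast path: no lock, no disk I/O
        return whisper_model
    if WhisperModel is None:        # faster-whisper not installed
        return None
    with whisper_model_lock:        # slow path: one thread loads the model
        if whisper_model is None:   # re-check: another thread may have won
            whisper_model = WhisperModel(
                "tiny", device="cpu", compute_type="int8",
                download_root=MODELS_DIR,
            )
    return whisper_model
```

The re-check inside the lock is what makes first use safe when two recordings race: without it, both threads could construct a `WhisperModel` and one ~1.5GB copy would be wasted.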
---

## Storage Locations

### Model Cache Directory

**Location:** `%APPDATA%\stt-cli\models\`

**Typical Path:**

```
C:\Users\<username>\AppData\Roaming\stt-cli\models\
```

**Size:** ~75MB for the tiny model

**Can I Delete It?**
- ✅ **Yes** - The app re-downloads on the next Whisper use
- ⚠️ **Trade-off** - The re-download takes 5-10 seconds on broadband

**How to Clear:**

```bat
rmdir /s /q "%APPDATA%\stt-cli\models"
```

### Settings File

**Location:** `%APPDATA%\stt-cli\settings.json`

**Contains:**

```json
{
  "stt_engine": "whisper",
  "whisper_model": "tiny",
  "auto_start": true,
  "first_run": false,
  "version": "2.0.0"
}
```

**If Deleted:** The app recreates it with defaults on the next run.

### Logs

**Location:** `%TEMP%\stt-cli\app.log`

**Typical Path:**

```
C:\Users\<username>\AppData\Local\Temp\stt-cli\app.log
```

**Log Messages Related to Whisper:**

```
INFO - Whisper model loaded successfully
DEBUG - [WHISPER] Transcribed: hello world
WARNING - faster-whisper not installed. Only Google Web Speech API will be available.
```

---

## Troubleshooting

### Issue 1: Model Download Fails

**Symptoms:**

```
ERROR - Failed to load Whisper model: HTTPError 404
```

**Causes:**
- No internet connection during first use
- Corporate firewall blocks the Hugging Face CDN
- Antivirus quarantines downloaded files

**Solutions:**
1. **Check internet:** Ensure a connection during the first Whisper use
2. **Try a different network:** Use a personal hotspot if a corporate firewall blocks the CDN
3. **Whitelist in antivirus:** Add `%APPDATA%\stt-cli\models\` to exclusions
4. **Manual download:**

```python
# Run in a Python console; download_root matches the app's cache directory
import os
from faster_whisper import WhisperModel
WhisperModel("tiny", download_root=os.path.expandvars(r"%APPDATA%\stt-cli\models"))
```

### Issue 2: Model Not Found After Download

**Symptoms:**

```
WARNING - faster-whisper not installed
```

**Cause:** The `faster-whisper` library is not installed in the Python environment

**Solution:**

```bash
pip install -r requirements.txt --force-reinstall
```

### Issue 3: Slow Transcription (>3s)

**Symptoms:**

```
INFO - Whisper transcription took 4.2s
```

**Causes:**
- CPU overloaded (other applications)
- Model not using INT8 quantization
- Antivirus scanning model files

**Solutions:**
1. Close other applications (free up CPU)
2. Verify INT8 in the logs: `compute_type="int8"`
3. Exclude `%APPDATA%\stt-cli\models\` from real-time scanning
4. If accuracy rather than speed is the real concern, upgrade to the "base" model (better accuracy, slightly slower)

### Issue 4: High RAM Usage (>3GB)

**Symptoms:**
- Task Manager shows pythonw.exe using 3GB+ RAM
- System becomes sluggish

**Causes:**
- Whisper model in memory (~1.5GB)
- Multiple recording sessions (memory not released)
- Memory leak (rare)

**Solutions:**
1. **Normal:** Whisper uses ~1.5GB RAM (expected)
2. **Switch to Google mode:** Uses <100MB RAM
3. **Restart the app:** Right-click tray → Quit → Restart
4. **Upgrade RAM:** 8GB minimum recommended for Whisper mode

### Issue 5: Model Cache Corrupt

**Symptoms:**

```
ERROR - Failed to load model: FileNotFoundError
```

**Solution:** Delete the cache and re-download:

```bat
rmdir /s /q "%APPDATA%\stt-cli\models"
:: Restart STT-CLI; it will re-download automatically
```

---

## Model Upgrade Path (Future)

**Current:** Only the "tiny" model is supported

**Planned (v2.1):**
- **User-selectable models:** tiny, base, small
- **Settings menu:** Change the model without editing JSON
- **Hot-swapping:** Unload the old model, load the new one
- **Trade-off UI:** Show size, speed, and accuracy comparisons

**How to Upgrade the Model Manually (Advanced):**

Edit `%APPDATA%\stt-cli\settings.json` and change `"whisper_model"` from `"tiny"` to `"base"`:

```json
{
  "whisper_model": "base"
}
```

Then restart STT-CLI; it will download the 140MB "base" model.

**Model Comparison:**

| Model | Size | RAM | Speed | Accuracy |
|-------|------|-----|-------|----------|
| tiny | 75MB | 1.5GB | ⚡⚡⚡ | ⭐⭐ |
| base | 140MB | 2GB | ⚡⚡ | ⭐⭐⭐ |
| small | 460MB | 2.5GB | ⚡ | ⭐⭐⭐⭐ |

---

## Related Documentation

- [ARCHITECTURE.md](./ARCHITECTURE.md) - System architecture overview
- [THREADING.md](./THREADING.md) - Threading model and synchronization
- [STT-CLI-Architecture.drawio](./STT-CLI-Architecture.drawio) - Visual flow diagrams

---

**End of Whisper Model Documentation**
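---

**Appendix:** The cache layout and `.incomplete` behavior described in Storage Locations and Troubleshooting can be sanity-checked from a Python console. `cache_status` below is a hypothetical helper, not part of STT-CLI; the repository directory name follows the layout shown earlier, and blob hash filenames vary per download:

```python
import os

def cache_status(models_dir: str) -> str:
    """Report whether the tiny model is cached under the app's models dir."""
    repo = os.path.join(models_dir, "models--Systran--faster-whisper-tiny")
    if not os.path.isdir(repo):
        return "not downloaded (fetched on first Whisper use)"
    total, partial = 0, False
    for root, _dirs, files in os.walk(repo):
        for name in files:
            if name.endswith(".incomplete"):  # interrupted huggingface_hub download
                partial = True
            path = os.path.join(root, name)
            if not os.path.islink(path):      # count blobs once, skip symlinks
                total += os.path.getsize(path)
    if partial:
        return "download interrupted (resumed/restarted on next run)"
    return f"cached (~{total / 1_000_000:.0f} MB on disk)"

if __name__ == "__main__":
    models_dir = os.path.join(os.environ.get("APPDATA", "."), "stt-cli", "models")
    print(cache_status(models_dir))
```

For a healthy cache this should report roughly the ~75MB footprint listed in Model Specifications; a "not downloaded" result after a supposed download points at Issue 1 or Issue 5 above.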