Log for Standalone Faster-Whisper & Faster-Whisper-XXL ### Note: It's not a comprehensive log. ## r245.4 [XXL]: Fixed: 'unload_model' error when using batched inference ## r245.3 [XXL]: Fixed: Model didn't free VRAM when --model_preload False Fixed: "cuda:1" didn't work with MDX Fixed: Some issues in "One Click Transcribe.bat" Fixed: Matplotlib backend error [Colab] Fixed: cuDNN libs not found error [Linux] Changed: `--ff_mdx_kim2` to `--ff_vocal_extract mdx_kim2` Changed: `--mdx_device` to `--voc_device` Added: `distil-large-v3.5` model Updated: pyinstaller to 6.12.0 ## r245.2 [XXL]: Fixed: Audio track selection with `--ff_track` didn't work with `--ff_mdx_kim2`. New feature: Deal with inverted polarity/phase in audios with new args: `--ff_lc` & `--ff_invert`. New feature: `--model_preload` [its automatic]. Fixes FW model interference with `--ff_mdx_kim2`. New feature: `--diarize_ff` [its automatic]. Enables diarization after `--ff_...` filters. New feature: Output diarization embeddings with `--return_embeddings`/`-embeddings`. New feature: Diarize without transcription with `--diarize_only`. ## r245.1 [XXL]: Clean-up temp folder after Ctrl+C Fixed OOM error for `silero_v5_fw` on long audios New arg `--japanese` or `-ja` ## r239.1 [XXL]: New features/args: `--batched` `--unmerged` `--batch_size` `--multilingual` `--hotwords` `--rehot` `--ignore_dupe_prompt` New `vad_method`'s: `silero_v5` `silero_v4_fw` `silero_v5_fw` Removed not useful experimental options: no_speech_strict_lvl nullify_non_speech prompt_max reprompt's first option ## r194.5 [XXL]: Change: Enable auto/default prompt presets for translation. ## r194.4 [XXL]: Fix: --sentence was splitting digits containing dot. Fix: auditok Fix: Error with empty audio and pyannote_v3 vad Fix: Disable prompt_reset_on_no_end when HST is triggered Removed experimental condition in HST algo ## r194.2 [XXL]: Bugfix: Broken outputs when diarize/sentence and batch input [194.1]. Fix: --sentence was splitting words on hyphens. Fix: --sentence was splitting digits containing comma. Change: --sentence affects all output formats except json [previously only srt & vtt]. Improvements and bugfixes for json input. Added "One Click Transcribe" tool: https://github.com/Purfview/whisper-standalone-win/discussions/337 ## r194.1 [XXL]: New feature: CUDA support for `pyannote_v3` and `pyannote_onnx_v3` VADs New feature: Selective output formats, for example: `--output_format srt json` New feature: `--diarize` will auto-activate `--sentence` and all output formats will be affected except json New `--diarize` options: `pyannote_v3.0`, `pyannote_v3.1`, `reverb_v1`, `reverb_v2` New diarization args: `--num_speakers`, `--min_speakers`, `--max_speakers` `--diarize_dump` Bugfix: Bug in alternative subtitle writer with max_line_width [r189.1]. Bugfix: Bug in --sentence subtitle writer if "word" is space. Change: .cache dir is bound to exe dir instead of cwd. pyinstaller updated to 6.11.1 CTranslate2 updated to 4.4.0 [+ Compute Capability 5.0 support] onnxruntime_gpu updated to 1.18.0 [CUDA 12.x cuDNN 8.x] Python updated to 3.10.11 ## r193.1 [XXL]: New feature: Speaker Diarization with `--diarize`. New feature: `large-v3-turbo` model autodownload. New feature: JSON output has additional key: 'language'. New feature: `--postfix` works without --language. Bigfix: Rare bug in alternative subtitle writer if "words" is empty [r189.1]. Change: `--vad_alt_method` shorted to `--vad_method`. Change: Windows 7 not supported anymore. ## r192.3.4 [XXL]: New feature: Write intermediate files to temp folder. [ignored on dump] New feature: Can take JSON files as input to generate subtitles from it according to the settings. New alternative VAD method : `pyannote_v3` [previous same named option renamed to "pyannote_onnx_v3"] New arg: `--nullify_non_speech` [for WIP experiments] ## r192.3.3 [XXL]: CTranslate2 updated to v4.2.0 Transcription on CPU is up to ~26% faster. ## r192.3.2 [XXL]: Various improvements and bugfixes to `--vad_alt_method` ## r192.3: Bugfix: 'one_word' was broken in r192.2. ## r192.2: Allowed to use 'max_line_width' and 'max_line_count' without '--sentence'. ## r192.1.1 [XXL]: `--ff_mdx_kim2`: Preprocess audio with MDX23 Kim vocal v2 model. `--vad_alt_method`: Alternative VAD methods - "silero_v3", "silero_v4", "pyannote_v3", "auditok", "webrtc". ## r192.1: Bugfix: It was unnecessarily going through ffmpeg routines. [introduced in r189.1] ## r189.1: Supports `distil-large-v3` model. Switched auto prompt for Serbian to the latin alphabet. Auto-disable VAD when --clip_timestamps is in use. JSON output has additional 'text' key with all text aggregated. Fixed --task=translate and --postfix bug. --highlight_words now supports --max_line_width and --max_line_count Changed patience default. Adjusted "--hallucination_silence_threshold"s score algo, and new `--hallucination_silence_th_temp` Auto pseudo-vad th offsets for "large-v3", can be disabled with --v3_offsets_off. `--language_detection_threshold` `--language_detection_segments` `--ff_track` - Audio track selection. `--ff_fc` - Front-Center channel selection. `--ff_dump` - Additionally prevents deletion of some intermediate audio files. `--vad_dump` - Writes VAD timings to srt file. Updated PyInstaller to v6.5.0 Fix Linux exec issue with forward compatibility. v2 ## r186.1: Fix quirks with "The last progress update is passed to the title bar." [r160.11] Fix Linux exec issue with forward compatibility. ## r182.2: Bugfix: PR705 was incorporated a bit incorrectly. Bugfix: Bug with the custom prompt presets and task "translate". ## r182.1: Includes PR705 & PR706 Excludes PR694 [aka CUDA12] Skip micro chunks. New args: `--hallucination_silence_threshold` `--clip_timestamps` Updated PyAV to v11.0.0 [ffmpeg 6.0] Updated PyInstaller to v6.4.0 Updated CTranslate2 to v3.24.0 ## r172.4: Bugfix: With some CPUs rnndn filters were not working. Improved: `--sentence` doesn't treat ellipses (...) as the end of sentence, unless ellipses are at the end of a segment. Changed default of `--initial_prompt` to `auto`. ## r172.3: Better error handling for the audio filters. ## r172.2: Increased "recursion" limit from 1000 to 10000. ## r172.1: Supports "Distil-Whisper" models: `distil-large-v2` `distil-medium.en` `distil-small.en` New args: `--max_new_tokens` `--chunk_length` ## r167.5: Additional option for `--prompt_reset_on_no_end` Various audio filters: `--ff_dump` `--ff_mp3` `--ff_sync` `--ff_rnndn_sh` `--ff_rnndn_xiph` `--ff_fftdn` `--ff_tempo` `--ff_gate` `--ff_speechnorm` `--ff_loudnorm` `--ff_silence_suppress` `--ff_lowhighpass` ## r167.4: Bugfix: Bug if couldn't detect the number of CPU's cores. Updated psutil to v5.9.7 ## r167.3: Bugfix: Illogical "Avoid computing higher temperatures on no_speech". [It could cause bad hallucination loops] https://github.com/openai/whisper/pull/1903 https://github.com/SYSTRAN/faster-whisper/pull/625 ## r167.2: Fix: Commit #165 was missing in `r167.1` release. ## r167.1: `large-v3` now is using the official model. Updated PyInstaller to v6.3.0 Updated CTranslate2 to v3.23.0 ## r160.12: Doesn't add to prompt segments triggered by the common hallucinations in hallucinations_list. New parameters : `--max_comma_cent` `--min_dist_to_end` ## r160.11: Bugfix: `--prompt_reset_on_temperature` was broken. Regression since r139. Improved `--initial_prompt` default preset, there is alternative experimental `auto` preset. Improved `--sentence` with the ignore words list, words like `Mr.` and ect.. The last progress update is passed to the title bar. New parameters : `--prompt_max` `--reprompt` [enabled by default] `--prompt_reset_on_no_end` [enabled by default] ## r160.10: Bugfix: The final line disappeared if the final word wasn't punctuated when using `--sentence`. ## r160.9: Bugfix: The final word could disappear in some combination of the new parameters from r160.8. New parameter : --standard_asia ## r160.8: Improved --skip to work with all output folders. New parameters : `--sentence` `--standard` `--max_comma` `--max_gap` `--max_line_width` `--max_line_count` ## r160.7: Bugfix: Wildcard input could fail if brackets were in a folder's name. Improved `--one_word` with the second option. Improved `--skip` to work with all output formats. If output is "all" - checks only "srt". New `--check_files` argument. Added a catch for invalid `audio` input. Turns off word_timestamps if output format is "text". Exposed options : `--prefix` `--suppress_blank` `--without_timestamps` `--max_initial_timestamp` ## r160.6: Bugfix: Autodownload wasn't working for `large-v3` [ regression in r160.5 ] ## r160.5: large-v3 use fp16 model by default. [Removed other v3 options] New switch: --one_word Updated Pyinstaller to v6.2.0 ## r160.4: Bugfix: -prompt=None wasn't working. UTF-8 chars are now supported in SE's "console". ## r160.3: 'large-v3' model. 'Cantonese' language. ## r160.2: Failed release with a code typo. Was online for ~15 mins... ## r160.1: Bugfix: Fixed bug on macOS -> https://github.com/Purfview/whisper-standalone-win/issues/86