Voice-Pro

The best AI speech recognition, translation, and multilingual dubbing solution 🚀


## 🎙️ An AI-powered web application for speech recognition, translation, and dubbing

한국어 ∙ English ∙ 中文简体 ∙ 中文繁體 ∙ 日本語 ∙ Deutsch ∙ Español ∙ Português

Voice-Pro is a state-of-the-art web app that transforms multimedia content creation. It integrates YouTube video downloading, voice separation, speech recognition, translation, and text-to-speech into a single, powerful tool for creators, researchers, and multilingual professionals.

- 🔊 Top-tier speech recognition: **Whisper**, **Faster-Whisper**, **Whisper-Timestamped**, **WhisperX**
- 🎤 Zero-shot voice cloning: **F5-TTS**, **E2-TTS**, **CosyVoice**
- 📢 Multilingual text-to-speech: **Edge-TTS**, **kokoro** (Paid version includes **Azure TTS**)
- 🎥 YouTube processing & audio extraction: **yt-dlp**
- 🌍 Instant translation for 100+ languages: **Deep-Translator** (Paid version includes **Azure Translator**)

A robust alternative to **ElevenLabs**, Voice-Pro empowers podcasters, developers, and creators with advanced voice solutions.

## ⚠️ Please Note

- Due to [WeConnect](https://www.wctokyoseoul.com) development work, Voice-Pro development and updates are not possible for the time being.
- We have made all Voice-Pro code open source and completely free. Voice-Pro can now be freely distributed and modified by anyone.
- It works well on Windows with NVIDIA GPU. Operation on Mac and Linux has not been verified.
- Please leave your requests on the [![GitHub Issues](https://img.shields.io/github/issues/abus-aikorea/voice-pro)](https://github.com/abus-aikorea/voice-pro/issues) or [![GitHub Discussions](https://img.shields.io/github/discussions/abus-aikorea/voice-pro)](https://github.com/abus-aikorea/voice-pro/discussions) pages.
- **Troubleshooting**: In most cases, issues can be resolved by deleting the `installer_files` folder and then running `configure.bat` followed by `start.bat`.

## 📰 News & History

version 3.2

- We have been focusing on [WeConnect](https://www.wctokyoseoul.com) development for the past few months and have not been able to manage Voice-Pro at all.
- We have decided to open source all Voice-Pro code.
- Voice-Pro is completely free and supports Windows, Mac, Linux.
- [WeConnect](https://www.wctokyoseoul.com) is an application for global cultural exchange.
- Connect with people from all over the world for meaningful cultural exchanges, language learning, and international friendships.


version 3.1

- 🪄 Support for fine-tuned models of **F5-TTS**
- 🌍 Supported languages
  - English & Chinese: SWivid/F5-TTS_v1
  - Finnish: AsmoKoskinen/F5-TTS_Finnish_Model
  - French: RASPIAUDIO/F5-French-MixedSpeakers-reduced
  - Hindi: SPRINGLab/F5-Hindi-24KHz
  - Italian: alien79/F5-TTS-italian
  - Japanese: Jmica/F5TTS/JA_21999120
  - Russian: hotstone228/F5-TTS-Russian
  - Spanish: jpgallegoar/F5-Spanish

version 3.0

- 🔥 Removed the **AI Cover** feature.
- 🚀 Added support for **m-bain/whisperX**.

version 2.0

- 🐍 Built with Python 3.10.15, Torch 2.5.1+cu124, and Gradio 5.14.0.
- 🆓 Free trial supports media up to **60 seconds** in length.
- 🔥 Added the **AI Cover** feature.
- 🎤 Introduced support for **CosyVoice** and **kokoro**.
- ⏳ Initial run downloads **CosyVoice2-0.5B (9GB)**, which may take over an hour depending on network speed.
- 🎧 Voice samples for cloning will be continuously updated.
- 📝 Added **spaCy** for natural sentence-by-sentence translation and TTS.
- ☁️ Subscription version includes **Microsoft Azure** Translator and TTS.
- 🏪 Subscription offers **unlimited usage** (no 60-second limit) during the subscription period, available via [![Shopify](https://img.shields.io/badge/Shopify-7ab55c.svg?style=flat-square&logo=shopify&logoColor=white)](https://r17wvy-t2.myshopify.com).

## 🎥 YouTube Showcase

- Demo for Voice-Pro (v2.0)
- F5-TTS: Voice Cloning
- Live Transcription & Translation
- Multi-Lingual Voice Cloning: Korean - German
- Multi-Lingual Voice Cloning: English - Korean
- Multi-Lingual Voice Cloning: Korean - Japanese
- NVIDIA RTX Video Super-Resolution
- AI Karaoke
- Multi-Lingual Voice Cloning: English - Korean

## ⭐ Key Features

### 1. Dubbing Studio

- YouTube video downloads & audio extraction
- Voice separation with **Demucs**
- Supports 100+ languages for speech recognition & translation

### 2. Speech Technologies

- **Speech-to-Text:** **Whisper**, **Faster-Whisper**, **Whisper-Timestamped**, **WhisperX**
- **Text-to-Speech:**
  - **Edge-TTS**: 100+ languages, 400+ voices
  - **E2-TTS**, **F5-TTS**, **CosyVoice**: Zero-shot cloning
  - **kokoro**: Ranked #2 in HuggingFace TTS Arena

### 3. Real-Time Translation

- Instant speech recognition
- Multilingual translation on the fly
- Customizable audio inputs

## 🤖 WebUI

### `Dubbing Studio` Tab

- All-in-one hub: YouTube downloads, noise removal, subtitles, translation, & TTS
- Supports all ffmpeg-compatible formats
- Output options: WAV, FLAC, MP3
- Subtitles & recognition for 100+ languages
- TTS with speed, volume, & pitch controls

Multilingual Voice Conversion and Subtitle Generation Web UI Interface
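As a rough illustration of the download-and-extract step behind the `Dubbing Studio` tab, the sketch below builds options for the yt-dlp Python API that keep only the audio track and convert it to WAV, FLAC, or MP3. The function names, the `downloads` folder, and the output template are assumptions made for this example; this is not Voice-Pro's own code.

```python
from pathlib import Path

# Output formats listed for the Dubbing Studio tab.
SUPPORTED_CODECS = {"wav", "flac", "mp3"}


def build_audio_opts(codec: str = "wav", out_dir: str = "downloads") -> dict:
    """Build yt-dlp options that keep only the audio track, converted to `codec`.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    codec = codec.lower()
    if codec not in SUPPORTED_CODECS:
        raise ValueError(f"unsupported codec: {codec}")
    return {
        "format": "bestaudio/best",  # prefer the best audio-only stream
        "outtmpl": str(Path(out_dir) / "%(title)s.%(ext)s"),
        "postprocessors": [
            # ffmpeg-based post-processor that extracts and converts the audio
            {"key": "FFmpegExtractAudio", "preferredcodec": codec}
        ],
    }


def download_audio(url: str, codec: str = "wav", out_dir: str = "downloads") -> None:
    """Download `url` and extract its audio. Requires yt-dlp and ffmpeg."""
    import yt_dlp  # imported lazily so build_audio_opts works without yt-dlp

    with yt_dlp.YoutubeDL(build_audio_opts(codec, out_dir)) as ydl:
        ydl.download([url])
```

Calling `download_audio(url, "flac")` with a video URL would write a FLAC file into `downloads/`, provided yt-dlp and ffmpeg are installed.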

### `Whisper Caption` Tab

- Subtitle-focused: 90+ languages
- Video-integrated subtitle display
- Word-level highlighting & denoise options

### `Translate` Tab

- Translation for 100+ languages
- Supports subtitle files (ASS, SSA, SRT, etc.)
- Real-time voice recognition & translation

WebUI for Real-Time Speech Recognition and Translation
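To make subtitle-file translation concrete, here is a minimal sketch of how an SRT file could be translated cue by cue while its indices and timings stay untouched. It is an illustration under stated assumptions, not the `Translate` tab's implementation: `translate_srt` is a hypothetical helper, the translator is passed in as a plain callable, and multi-line cues are joined into one line before translation.

```python
import re
from typing import Callable

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate the text of each SRT cue, keeping indices and timings as-is."""
    out_blocks = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Locate the timing line; everything after it is subtitle text.
        t = next((i for i, ln in enumerate(lines) if TIMING.match(ln.strip())), None)
        if t is None:
            out_blocks.append(block)  # not a cue; copy through unchanged
            continue
        text = " ".join(ln.strip() for ln in lines[t + 1:])
        translated = translate(text) if text else ""
        out_blocks.append("\n".join(lines[: t + 1] + ([translated] if translated else [])))
    return "\n\n".join(out_blocks) + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:02,500\nhello\nworld\n"
    # str.upper stands in for a real translator in this demo.
    print(translate_srt(sample, str.upper))
```

With Deep-Translator installed, a real translator can be plugged in as the callable, for example `GoogleTranslator(source="auto", target="en").translate` from the `deep_translator` package.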

### `Speech Generation` Tab

- Options: **Edge-TTS**, **F5-TTS**, **CosyVoice**, **kokoro**
- Celeb voice podcasts & multilingual support

Podcast Production WebUI Using Voice-Cloning Technology

## 🎤✨ Reference Voice

- Please request the voice you want to add on the Issues page. [Issues](https://github.com/abus-aikorea/voice-pro/issues/50)

English

Andrew Bustamante	Andrew Huberman	Avi Loeb	Ben Shapiro	Brett Johnson	Brian Keating
Coffeezilla	Dan Carlin	David Buss	David Fravor	David Kipping	Dennis Whyte
Donald Hoffman	Donald Trump	Douglas Murray	Duncan Trussell	Elon Musk	Garry Nolan
Jack Barsky	James Sexton	Jeff Bezos	Joe Rogan	John Mearsheimer	Jordan Peterson
Kanye 'Ye' West	Mark Zuckerberg	Michael Levin	Michael Saylor	Michio Kaku	MrBeast
Nick Lane	Paul Rosolie	Ryan Graves	Sam Altman	Sam Harris	Stephen Wolfram
Tucker Carlson	Vitalik Buterin	Yuval Harari

Chinese

迪丽热巴 (Dílì Rèbā)

蔡依林 (Cài Yīlín)

吴亦凡 (Wú Yìfán)

李易峰 (Lǐ Yìfēng)

杨幂 (Yáng Mì)

赵丽颖 (Zhào Lìyǐng)

Korean

BTS 진 (Jin)

BTS RM

IU (아이유)

이병헌

이정재

유재석

Japanese

綾瀬はるか (Ayase Haruka)

## 💻 System Requirements

- **OS:** Windows 10/11 (64-bit), Linux, Mac
- **GPU:** NVIDIA with CUDA 12.4 (recommended)
- **VRAM:** 4GB+ (8GB+ preferred)
- **RAM:** 4GB+
- **Storage:** 20GB+ free space
- **Internet:** Required

## 📀 Installation

Install Voice-Pro with ease using **configure.bat** and **start.bat** (use configure.sh and start.sh on Mac/Linux).

### 1. Get the Package

+ Clone or download the latest release (**Source code (zip)**) from [![GitHub Release](https://img.shields.io/github/v/release/abus-aikorea/voice-pro)](https://github.com/abus-aikorea/voice-pro/)

```bash
git clone https://github.com/abus-aikorea/voice-pro.git
```

### 2. Install & Run

1. 🚀 **configure.bat**
   - Sets up git, ffmpeg, and CUDA (if NVIDIA GPU)
   - Run once; takes an hour or more and needs an internet connection
   - Don’t close the command window
2. 🚀 **start.bat**
   - Launches Voice-Pro WebUI
   - First run installs dependencies (1+ hour)
   - Retry after deleting **installer_files** if issues arise

### 3. Update

- 🚀 **update.bat**: Refreshes Python environment (faster than reinstall)

### 4. Uninstall

- Run **uninstall.bat** or delete the folder (portable install)

## ❓Tips & Tricks

#### If the browser does not open automatically

- Close the Windows Command window and run start.bat again.
- Open the browser yourself and enter the address displayed in the Windows Command window (e.g. **http://127.0.0.1:7870**) in the address bar.

#### If a CUDA Out-Of-Memory error occurs

- Check the GPU memory status in Windows Task Manager, on the Performance tab.
- Set the Denoise level to 0 or 1. Denoise level 2 requires at least 8GB of GPU memory.
- Set Compute Type to an int type. The float type has better quality, but requires more GPU memory.

#### How to improve the quality of subtitles?

- Larger Whisper models tend to produce better subtitles, though this is not always the case: large > medium > small > base > tiny
- Among compute types, the float type gives better quality. The int type uses model quantization to reduce GPU usage and increase speed, at some cost in quality.
- If you increase the denoise level, more background sound is removed and only the remaining voice is used for speech recognition. This does not always guarantee better results.

## 🚨 Notice

- Due to [WeConnect](https://www.wctokyoseoul.com) development work, there will be no Voice-Pro updates for the time being.
- All Voice-Pro code has been made open source. It is now completely free to use.
- [WeConnect](https://www.wctokyoseoul.com) is a communication platform for global cultural exchange.
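The out-of-memory tips under Tips & Tricks can be read as a fallback order: lower the denoise level first, then switch to an int compute type. The sketch below encodes that order in Python. The `Settings` class, the `relax_after_oom` helper, and the `"float16"`/`"int8"` labels are assumptions for illustration (they mirror common faster-whisper compute type names, not necessarily the WebUI's own option names), and the final step of dropping to a smaller model is an added assumption rather than a documented tip.

```python
from dataclasses import dataclass

# Whisper model sizes from smallest to largest.
MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]


@dataclass
class Settings:
    model_size: str
    compute_type: str   # "float16" or "int8" (hypothetical labels for float/int)
    denoise_level: int  # 0, 1, or 2


def relax_after_oom(s: Settings, vram_gb: float) -> Settings:
    """Return a lighter configuration to retry after a CUDA out-of-memory error."""
    # Denoise level 2 needs at least 8 GB of GPU memory, so cap it at 1 below that.
    if s.denoise_level == 2 and vram_gb < 8:
        return Settings(s.model_size, s.compute_type, 1)
    # Next, trade some quality for memory by switching to an int compute type.
    if s.compute_type != "int8":
        return Settings(s.model_size, "int8", s.denoise_level)
    # Last resort (an assumption, not a documented tip): step down one model size.
    i = MODEL_SIZES.index(s.model_size)
    return Settings(MODEL_SIZES[max(i - 1, 0)], s.compute_type, s.denoise_level)


if __name__ == "__main__":
    current = Settings("large", "float16", 2)
    for _ in range(3):
        current = relax_after_oom(current, vram_gb=6)
        print(current)
```

Each call changes one setting at a time, so you can retry after every step and stop as soon as the error goes away.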
## ⏳ SaaS Platforms for Subtitling, Translation, and TTS

The following table lists SaaS platforms supporting subtitling, translation, and text-to-speech (TTS/dubbing) functionalities. Costs are calculated for processing a 60-minute Korean video, including subtitle generation, English translation, and English dubbing, based on the latest available pricing data as of April 15, 2025.

| Platform | Subtitling | Translation | TTS/Dubbing | Cost for 60-min Video (USD, Approx.) | Key Features |
|----------|------------|-------------|-------------|--------------------------------------|--------------|
| **[Maestra](https://maestra.ai)** | ✅ | ✅ | ✅ | $23.70 | 125+ languages, real-time captions, SEO keyword extraction, 15-min free trial. |
| **[Kapwing](https://www.kapwing.com)** | ✅ | ✅ | ✅ | $30~$40 (Pro plan, per minute) | AI subtitles, 100+ language translations, auto lip-sync dubbing, free tier. |
| **[VEED.IO](https://www.veed.io)** | ✅ | ✅ | ❌ | $24~$36 (Pro plan, partial) | 99.9% accurate subtitles, Instagram-optimized captions, intuitive editor. |
| **[HappyScribe](https://happyscribe.com)** | ✅ | ✅ | ✅ | $36~$48 (Pay-as-you-go) | 120+ languages, professional proofreading, secure, meeting transcription. |
| **[Sonix](https://sonix.ai)** | ✅ | ✅ | ✅ | $30~$40 (Standard plan) | 54+ languages, 30-min free transcription, YouTube/Zoom integration. |
| **[Descript](https://descript.com)** | ✅ | ✅ | ✅ | $36~$48 (Creator plan) | Text-based editing, Overdub TTS, filler word removal, 1-hour free transcription. |
| **[AppTek](https://apptek.ai)** | ✅ | ✅ | ✅ | Custom pricing (Contact) | Media-focused, custom models, metadata generation, cloud-based Workbench. |
| **[Transkriptor](https://transkriptor.com)** | ✅ | ✅ | ❌ | $12~$18 (Pay-as-you-go) | 100+ languages, YouTube link transcription, 99% accuracy, simple editor. |

### Cost Calculation Details

- **[Maestra](https://maestra.ai/)**: Premium Plan ($158/month, 1200 credits). 60-min video: 60 credits (subtitles) + 60 credits (translation) + 60 credits (dubbing) = 180 credits. Cost = (180/1200) * $158 = $23.70. See [maestra.ai/pricing](https://maestra.ai/pricing).
- **[Kapwing](https://www.kapwing.com)**: Pro plan (\~$24/month, limited minutes). Estimated $0.50\~$0.67/min for subtitles+translation+dubbing (based on per-minute pricing trends). 60-min cost: $30\~$40. Exact pricing requires confirmation.
- **[VEED.IO](https://www.veed.io)**: Pro plan (\~$24/month). Subtitles+translation estimated at $0.40\~$0.60/min. No TTS, so partial processing. 60-min cost: $24\~$36. Confirm at [veed.io](https://veed.io).
- **[HappyScribe](https://happyscribe.com)**: Pay-as-you-go (\~$0.20/min transcription, $0.20/min translation, $0.20/min dubbing). 60-min cost: $36\~$48 (assuming combined services). Confirm at [happyscribe.com](https://happyscribe.com).
- **[Sonix](https://sonix.ai)**: Standard plan (\~$10/hour transcription, additional for translation/dubbing). Estimated $0.50\~$0.67/min total. 60-min cost: $30\~$40. Confirm at [sonix.ai](https://sonix.ai).
- **[Descript](https://descript.com)**: Creator plan (\~$24/month, limited hours). Estimated $0.60\~$0.80/min for subtitles+translation+dubbing. 60-min cost: $36\~$48. Confirm at [descript.com](https://descript.com).
- **[AppTek](https://apptek.ai)**: Custom pricing for enterprise. No public per-minute rates. Contact [apptek.ai](https://apptek.ai) for quotes.
- **[Transkriptor](https://transkriptor.com)**: Pay-as-you-go ($0.05\~$0.10/min transcription, similar for translation). No TTS, so partial processing. 60-min cost: $12\~$18. Confirm at [transkriptor.com](https://transkriptor.com).

### Notes

- **Cost for 60-min Video**: Costs are approximate and assume processing a 60-minute Korean video for subtitles, English translation, and English dubbing (where available). Platforms without TTS (e.g., VEED.IO, Transkriptor) reflect partial processing costs.
- **Language Support**: Most platforms support Korean and English. Verify specific language availability on their websites.
- **Use Cases**:
  - Media/Entertainment: AppTek, Maestra
  - Social Media: Kapwing, VEED.IO
  - Podcasts/Interviews: Sonix, Descript
  - E-learning/Global Content: Transkriptor, HappyScribe
- **Pricing Updates**: Pricing may vary due to plan changes or promotions. Check official websites for the latest details.
- For contributions or specific use case recommendations, open an issue or submit a pull request in this repository!

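The arithmetic under Cost Calculation Details can be reproduced with a short Python sketch. It covers the two patterns used there: prorating a credit-based plan (Maestra) and multiplying an estimated per-minute rate range by the video length (Kapwing and similar). The function names are hypothetical, and the one-credit-per-minute-per-step rate is taken from the Maestra line above rather than from any official API.

```python
def credit_plan_cost(minutes, credits_per_minute, steps, plan_price, plan_credits):
    """Cost of one video on a credit-based plan, prorated by credits used."""
    credits_used = minutes * credits_per_minute * steps
    return credits_used / plan_credits * plan_price


def per_minute_range(minutes, low_rate, high_rate):
    """Low and high cost estimates for a per-minute price range."""
    return (minutes * low_rate, minutes * high_rate)


if __name__ == "__main__":
    # Maestra: 60 minutes x 3 steps (subtitles, translation, dubbing) = 180 credits
    # out of a 1200-credit, $158/month plan.
    maestra = credit_plan_cost(60, 1, 3, 158.0, 1200)
    print(f"Maestra: ${maestra:.2f}")

    # Kapwing: estimated $0.50 to $0.67 per minute for all three steps.
    low, high = per_minute_range(60, 0.50, 0.67)
    print(f"Kapwing: ${low:.2f} to ${high:.2f}")
```

The Kapwing upper bound comes out slightly above $40 before rounding, which is why the table figures are labeled approximate.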
## ☕ Contributions

Hello, I'm David from the Voice-Pro team.

Our team discovers the best AI technologies in the industry and provides them for anyone to use easily and conveniently. We are a small startup in Korea that has only been around for a year. We are working hard to help you and other creators produce great content.

Your ⭐⭐⭐⭐⭐ review would be greatly appreciated as it helps our business grow with you. Please help support our small team.

Thank you,
ABUS Customer Service

- If you want to participate in and help us with this project, feel free to open an [issue](https://github.com/abus-aikorea/voice-pro/issues).
- If something goes wrong, please submit a [pull request](https://github.com/abus-aikorea/voice-pro/pulls) to improve this project.
- Any type of contribution is welcome.
- For inquiries related to purchases, business partnerships, technical tuning, investments, and other matters, please contact us by email.
- If you like this project, please star this repository. We would greatly appreciate it. ⭐⭐⭐
- You can support Voice-Pro with a donation here:

## 📬 Contact

- Email:
- Homepage (Korean):
- Paid Version Purchase: [Shopify (Global)](https://r17wvy-t2.myshopify.com), [Naver (Korean)](https://smartstore.naver.com/abus)

## 🙏 Credits

* Demucs
* yt-dlp
* gradio
* edge-TTS
* F5-TTS
* openai-whisper
* faster-whisper
* whisper-timestamped
* whisperX
* CosyVoice
* kokoro
* Deep-Translator
* spaCy

## ©️ Copyright

by [ABUS](https://www.wctokyoseoul.com)