Voice-Pro
The best AI speech recognition, translation, and multilingual dubbing solution 🚀
## 🎙️ An AI-powered web application for speech recognition, translation, and dubbing
한국어 ∙ English ∙ 中文简体 ∙ 中文繁體 ∙ 日本語 ∙ Deutsch ∙ Español ∙ Português
Voice-Pro is a state-of-the-art web app that transforms multimedia content creation. It integrates YouTube video downloading, voice separation, speech recognition, translation, and text-to-speech into a single, powerful tool for creators, researchers, and multilingual professionals.
- 🔊 Top-tier speech recognition: **Whisper**, **Faster-Whisper**, **Whisper-Timestamped**, **WhisperX**
- 🎤 Zero-shot voice cloning: **F5-TTS**, **E2-TTS**, **CosyVoice**
- 📢 Multilingual text-to-speech: **Edge-TTS**, **kokoro** (Paid version includes **Azure TTS**)
- 🎥 YouTube processing & audio extraction: **yt-dlp**
- 🌍 Instant translation for 100+ languages: **Deep-Translator** (Paid version includes **Azure Translator**)
A robust alternative to **ElevenLabs**, Voice-Pro empowers podcasters, developers, and creators with advanced voice solutions.
## ⚠️ Please Note
- Due to [WeConnect](https://www.wctokyoseoul.com) development work, Voice-Pro development and updates are not possible for the time being.
- We have made all Voice-Pro code open source and completely free. Voice-Pro can now be freely distributed and modified by anyone.
- It works well on Windows with NVIDIA GPU. Operation on Mac and Linux has not been verified.
- Please leave your requests on the [Issues](https://github.com/abus-aikorea/voice-pro/issues) or [Discussions](https://github.com/abus-aikorea/voice-pro/discussions) pages.
- **Troubleshooting**: In most cases, issues can be resolved by deleting the `installer_files` folder and then running `configure.bat` followed by `start.bat`.
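The reset described in the troubleshooting note can also be scripted. The following is a minimal sketch, not part of Voice-Pro: the function name `reset_install` and its parameters are hypothetical, and it assumes you pass the folder that contains `installer_files`, `configure.bat`, and `start.bat`.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def reset_install(root: Path, run_scripts: bool = True) -> bool:
    """Delete root/installer_files, then optionally re-run the setup scripts.

    Returns True if the folder existed and was removed.
    """
    target = root / "installer_files"
    existed = target.is_dir()
    if existed:
        shutil.rmtree(target)

    # The .bat scripts only exist for Windows; skip them elsewhere.
    if run_scripts and sys.platform == "win32":
        for script in ("configure.bat", "start.bat"):
            # start.bat keeps running while the WebUI is up, so this call
            # blocks until the WebUI is closed.
            subprocess.run(["cmd", "/c", script], cwd=root, check=True)
    return existed
```

Calling `reset_install(Path("voice-pro"), run_scripts=False)` only removes the folder, which is useful if you prefer to start the scripts by hand.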
## 📰 News & History
version 3.2
- We have been focusing on [WeConnect](https://www.wctokyoseoul.com) development for the past few months and have not been able to manage Voice-Pro at all.
- We have decided to open source all Voice-Pro code.
- Voice-Pro is completely free and supports Windows, Mac, Linux.
- [WeConnect](https://www.wctokyoseoul.com) is an application for global cultural exchange.
- Connect with people from all over the world for meaningful cultural exchanges, language learning, and international friendships.
version 3.1
- 🪄 Support for fine-tuned models of **F5-TTS**
- 🌍 Supported languages
  - English & Chinese: SWivid/F5-TTS_v1
  - Finnish: AsmoKoskinen/F5-TTS_Finnish_Model
  - French: RASPIAUDIO/F5-French-MixedSpeakers-reduced
  - Hindi: SPRINGLab/F5-Hindi-24KHz
  - Italian: alien79/F5-TTS-italian
  - Japanese: Jmica/F5TTS/JA_21999120
  - Russian: hotstone228/F5-TTS-Russian
  - Spanish: jpgallegoar/F5-Spanish
version 3.0
- 🔥 Removed the **AI Cover** feature.
- 🚀 Added support for **m-bain/whisperX**.
version 2.0
- 🐍 Built with Python 3.10.15, Torch 2.5.1+cu124, and Gradio 5.14.0.
- 🆓 Free trial supports media up to **60 seconds** in length.
- 🔥 Added the **AI Cover** feature.
- 🎤 Introduced support for **CosyVoice** and **kokoro**.
- ⏳ Initial run downloads **CosyVoice2-0.5B (9GB)**, which may take over an hour depending on network speed.
- 🎧 Voice samples for cloning will be continuously updated.
- 📝 Added **spaCy** for natural sentence-by-sentence translation and TTS.
- ☁️ Subscription version includes **Microsoft Azure** Translator and TTS.
- 🏪 Subscription offers **unlimited usage** (no 60-second limit) during the subscription period, available via [Shopify](https://r17wvy-t2.myshopify.com).
## ⭐ Key Features
### 1. Dubbing Studio
- YouTube video downloads & audio extraction
- Voice separation with **Demucs**
- Supports 100+ languages for speech recognition & translation
### 2. Speech Technologies
- **Speech-to-Text:** **Whisper**, **Faster-Whisper**, **Whisper-Timestamped**, **WhisperX**
- **Text-to-Speech:**
  - **Edge-TTS**: 100+ languages, 400+ voices
  - **E2-TTS**, **F5-TTS**, **CosyVoice**: Zero-shot cloning
  - **kokoro**: Ranked #2 in HuggingFace TTS Arena
### 3. Real-Time Translation
- Instant speech recognition
- Multilingual translation on the fly
- Customizable audio inputs
## 🤖 WebUI
### `Dubbing Studio` Tab
- All-in-one hub: YouTube downloads, noise removal, subtitles, translation, & TTS
- Supports all ffmpeg-compatible formats
- Output options: WAV, FLAC, MP3
- Subtitles & recognition for 100+ languages
- TTS with speed, volume, & pitch controls
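To illustrate the WAV, FLAC, and MP3 output options, here is a small sketch that builds an ffmpeg command for audio extraction. It is not Voice-Pro's own code: `build_extract_cmd` is a hypothetical helper, and it assumes `ffmpeg` is on the PATH. It relies only on standard ffmpeg options (`-y` overwrite, `-i` input, `-vn` drop video), with the encoder chosen by ffmpeg from the output extension.

```python
from pathlib import Path

SUPPORTED_FORMATS = ("wav", "flac", "mp3")


def build_extract_cmd(src: str, fmt: str = "wav") -> list[str]:
    """Return an ffmpeg command that extracts the audio track of `src`."""
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    out = Path(src).with_suffix("." + fmt)
    if out == Path(src):
        # ffmpeg cannot read from and write to the same file.
        raise ValueError("output would overwrite the input file")
    return ["ffmpeg", "-y", "-i", src, "-vn", str(out)]
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`.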
### `Whisper Caption` Tab
- Subtitle-focused: 90+ languages
- Video-integrated subtitle display
- Word-level highlighting & denoise options
### `Translate` Tab
- Translation for 100+ languages
- Supports subtitle files (ASS, SSA, SRT, etc.)
- Real-time voice recognition & translation
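Translating a subtitle file means changing only the text lines while leaving cue numbers and timings untouched. The sketch below shows that idea for SRT only. It is an illustration under stated assumptions, not the implementation used by Voice-Pro: `translate_srt` is a hypothetical name, and the `translate` argument is any callable you supply.

```python
import re
from typing import Callable

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
_TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Apply `translate` to subtitle text lines, keeping structure lines as-is."""
    out = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        # Cue numbers, timing lines, and blank separators pass through.
        # Note: a subtitle line made only of digits is also left unchanged.
        if not stripped or stripped.isdigit() or _TIMING.match(stripped):
            out.append(line)
        else:
            out.append(translate(line))
    return "\n".join(out)
```

With Deep-Translator installed, something like `GoogleTranslator(source="auto", target="en").translate` could be passed as the `translate` callable.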
### `Speech Generation` Tab
- Options: **Edge-TTS**, **F5-TTS**, **CosyVoice**, **kokoro**
- Celeb voice podcasts & multilingual support
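For Edge-TTS, speed, volume, and pitch are passed as signed offset strings such as `+10%` or `-5Hz`. The sketch below assumes a recent version of the `edge-tts` package (one whose `Communicate` accepts `rate`, `volume`, and `pitch`). The helper names `signed` and `speak` and the default voice are our own choices for illustration and do not come from Voice-Pro.

```python
import asyncio


def signed(value: int, unit: str) -> str:
    """Format an offset with an explicit sign, e.g. +10% or -5Hz."""
    return f"{value:+d}{unit}"


async def speak(text: str, out_path: str, voice: str = "en-US-AriaNeural",
                rate: int = 0, volume: int = 0, pitch: int = 0) -> None:
    """Synthesize `text` to `out_path` with Edge-TTS (needs network access)."""
    # Imported lazily so `signed` can be used without the package installed.
    import edge_tts  # pip install edge-tts

    communicate = edge_tts.Communicate(
        text,
        voice,
        rate=signed(rate, "%"),
        volume=signed(volume, "%"),
        pitch=signed(pitch, "Hz"),
    )
    await communicate.save(out_path)


# Example call, left commented out because it contacts the online service:
# asyncio.run(speak("Hello from Voice-Pro", "hello.mp3", rate=10, pitch=-5))
```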
## 🎤✨ Reference Voice
- To request a new reference voice, post on the [Issues](https://github.com/abus-aikorea/voice-pro/issues/50) page.
- **English:** Andrew Bustamante, Andrew Huberman, Avi Loeb, Ben Shapiro, Brett Johnson, Brian Keating, Coffeezilla, Dan Carlin, David Buss, David Fravor, David Kipping, Dennis Whyte, Donald Hoffman, Donald Trump, Douglas Murray, Duncan Trussell, Elon Musk, Garry Nolan, Jack Barsky, James Sexton, Jeff Bezos, Joe Rogan, John Mearsheimer, Jordan Peterson, Kanye 'Ye' West, Mark Zuckerberg, Michael Levin, Michael Saylor, Michio Kaku, MrBeast, Nick Lane, Paul Rosolie, Ryan Graves, Sam Altman, Sam Harris, Stephen Wolfram, Tucker Carlson, Vitalik Buterin, Yuval Harari
- **Chinese:** 迪丽热巴 (Dílì Rèbā), 蔡依林 (Cài Yīlín), 吴亦凡 (Wú Yìfán), 李易峰 (Lǐ Yìfēng), 杨幂 (Yáng Mì), 赵丽颖 (Zhào Lìyǐng)
- **Korean:** BTS 진 (Jin), BTS RM, IU (아이유), 이병헌, 이정재, 유재석
- **Japanese:** 綾瀬はるか (Ayase Haruka)
## 💻 System Requirements
- **OS:** Windows 10/11 (64-bit), Linux, Mac
- **GPU:** NVIDIA with CUDA 12.4 (recommended)
- **VRAM:** 4GB+ (8GB+ preferred)
- **RAM:** 4GB+
- **Storage:** 20GB+ free space
- **Internet:** Required
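A few of these requirements can be checked before installing. The sketch below is a rough pre-flight check using only the Python standard library. The function names are hypothetical, the 20GB figure comes from the list above, and looking for `nvidia-smi` on the PATH is only a heuristic for "an NVIDIA driver is installed"; it says nothing about the CUDA version or VRAM.

```python
import shutil
import sys

MIN_FREE_GB = 20  # storage requirement stated above


def free_space_gb(path: str = ".") -> float:
    """Free disk space at `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def check_requirements(path: str = ".") -> dict[str, bool]:
    """Return a pass/fail flag for each check this sketch can make."""
    return {
        "python_64bit": sys.maxsize > 2**32,
        "free_space_ok": free_space_gb(path) >= MIN_FREE_GB,
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'check this'}")
```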
## 📀 Installation
Install Voice-Pro with ease using **configure.bat** and **start.bat** (use configure.sh and start.sh on Mac/Linux).
### 1. Get the Package
- Clone the repository or download the latest release (**Source code (zip)**) from [GitHub](https://github.com/abus-aikorea/voice-pro/):
```bash
git clone https://github.com/abus-aikorea/voice-pro.git
```
### 2. Install & Run
1. 🚀 **configure.bat**
   - Sets up git, ffmpeg, and CUDA (if an NVIDIA GPU is present)
   - Run once; takes 1+ hour and requires an internet connection
   - Don’t close the command window while it runs
2. 🚀 **start.bat**
   - Launches the Voice-Pro WebUI
   - First run installs dependencies (1+ hour)
   - If issues arise, delete **installer_files** and retry
### 3. Update
- 🚀 **update.bat**: Refreshes Python environment (faster than reinstall)
### 4. Uninstall
- Run **uninstall.bat** or delete the folder (portable install)
## ❓ Tips & Tricks
#### If the browser does not open automatically
- Close the Command Prompt window and run start.bat again.
- Alternatively, open the browser yourself and enter the address shown in the Command Prompt window (e.g. **http://127.0.0.1:7870**) in the address bar.
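If you are unsure whether the WebUI is actually running, you can test the address before opening it. This is a small standard-library sketch. The function names are hypothetical, and port 7870 is only the example value from above, so use whatever address the command window prints.

```python
import socket
import webbrowser


def is_listening(host: str = "127.0.0.1", port: int = 7870,
                 timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def open_webui(host: str = "127.0.0.1", port: int = 7870) -> bool:
    """Open the WebUI in the default browser if the server is reachable."""
    if not is_listening(host, port):
        return False
    return webbrowser.open(f"http://{host}:{port}")
```

If `open_webui()` returns `False`, the server is most likely not up yet, and restarting start.bat is the next step.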
#### If a CUDA Out-Of-Memory error occurs
- Check GPU memory usage in the Performance tab of Windows Task Manager.
- Set the Denoise level to 0 or 1. Denoise level 2 requires at least 8GB of GPU memory.
- Set Compute Type to an int type. Float types give better quality but require more GPU memory.
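The advice above can be summarized as a simple rule of thumb. In the sketch below, only the 8GB threshold for Denoise level 2 comes from this README. The 4GB cutoff and the `float16`/`int8` names (borrowed from faster-whisper's compute type options) are assumptions for illustration, and `recommend_settings` is a hypothetical helper.

```python
def recommend_settings(free_vram_gb: float) -> dict:
    """Suggest a denoise level and compute type for the available GPU memory."""
    if free_vram_gb >= 8:
        # Denoise level 2 needs at least 8GB (stated above).
        return {"denoise_level": 2, "compute_type": "float16"}
    if free_vram_gb >= 4:
        # Assumed cutoff: keep light denoising, switch to a quantized type.
        return {"denoise_level": 1, "compute_type": "int8"}
    # Very little memory: turn denoising off entirely.
    return {"denoise_level": 0, "compute_type": "int8"}


print(recommend_settings(6))
# {'denoise_level': 1, 'compute_type': 'int8'}
```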
#### How to improve the quality of subtitles?
- Larger Whisper models generally produce better subtitles (large > medium > small > base > tiny), although a larger model is not guaranteed to do better on every recording.
- Float compute types give the best accuracy. Int types use quantized models, which lowers GPU memory use and increases speed at some cost in accuracy.
- Raising the denoise level removes more background sound, so that only the remaining voice is passed to speech recognition. This does not always improve results.
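Since larger models also need more GPU memory, one practical way to use the size ordering is to start with the largest model and step down when it does not fit. The sketch below shows that pattern. `transcribe_with_fallback` and the `run` callable are hypothetical, and catching `RuntimeError` assumes a PyTorch-style backend, where CUDA out-of-memory errors are reported as `RuntimeError` subclasses.

```python
from typing import Callable, Optional, Tuple

# Ordering taken from the list above, largest first.
MODEL_SIZES = ["large", "medium", "small", "base", "tiny"]


def transcribe_with_fallback(run: Callable[[str], str],
                             start: str = "large") -> Tuple[str, str]:
    """Try `run(size)` from `start` downward; return (size_used, result)."""
    last_error: Optional[Exception] = None
    for size in MODEL_SIZES[MODEL_SIZES.index(start):]:
        try:
            return size, run(size)
        except RuntimeError as err:
            # Treated as "did not fit"; a real script should inspect the
            # message, since other failures also raise RuntimeError.
            last_error = err
    raise RuntimeError("no model size could be run") from last_error
```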
## 🚨 Notice
- Due to [WeConnect](https://www.wctokyoseoul.com) development work, there will be no Voice-Pro updates for the time being.
- All Voice-Pro code has been made open source. It is now completely free to use.
- [WeConnect](https://www.wctokyoseoul.com) is a communication platform for global cultural exchange.
## ⏳ SaaS Platforms for Subtitling, Translation, and TTS
The following table lists SaaS platforms supporting subtitling, translation, and text-to-speech (TTS/dubbing) functionalities. Costs are calculated for processing a 60-minute Korean video, including subtitle generation, English translation, and English dubbing, based on the latest available pricing data as of April 15, 2025.
| Platform | Subtitling | Translation | TTS/Dubbing | Cost for 60-min Video (USD, Approx.) | Key Features |
|-----------------|------------|-------------|-------------|-------------------------------------|------------------------------------------------------------------------------|
| **[Maestra](https://maestra.ai)** | ✅ | ✅ | ✅ | $23.70 | 125+ languages, real-time captions, SEO keyword extraction, 15-min free trial. |
| **[Kapwing](https://www.kapwing.com)** | ✅ | ✅ | ✅ | $30~$40 (Pro plan, per-minute estimate) | AI subtitles, 100+ language translations, auto lip-sync dubbing, free tier. |
| **[VEED.IO](https://www.veed.io)** | ✅ | ✅ | ❌ | $24~$36 (Pro plan, partial) | 99.9% accurate subtitles, Instagram-optimized captions, intuitive editor. |
| **[HappyScribe](https://happyscribe.com)** | ✅ | ✅ | ✅ | $36~$48 (Pay-as-you-go) | 120+ languages, professional proofreading, secure, meeting transcription. |
| **[Sonix](https://sonix.ai)** | ✅ | ✅ | ✅ | $30~$40 (Standard plan) | 54+ languages, 30-min free transcription, YouTube/Zoom integration. |
| **[Descript](https://descript.com)** | ✅ | ✅ | ✅ | $36~$48 (Creator plan) | Text-based editing, Overdub TTS, filler word removal, 1-hour free transcription. |
| **[AppTek](https://apptek.ai)** | ✅ | ✅ | ✅ | Custom pricing (Contact) | Media-focused, custom models, metadata generation, cloud-based Workbench. |
| **[Transkriptor](https://transkriptor.com)**| ✅ | ✅ | ❌ | $12~$18 (Pay-as-you-go) | 100+ languages, YouTube link transcription, 99% accuracy, simple editor. |
### Cost Calculation Details
- **[Maestra](https://maestra.ai/)**: Premium Plan ($158/month, 1200 credits). 60-min video: 60 credits (subtitles) + 60 credits (translation) + 60 credits (dubbing) = 180 credits. Cost = (180/1200) * $158 = $23.70. See [maestra.ai/pricing](https://maestra.ai/pricing).
- **[Kapwing](https://www.kapwing.com)**: Pro plan (\~$24/month, limited minutes). Estimated $0.50\~$0.67/min for subtitles+translation+dubbing (based on per-minute pricing trends). 60-min cost: $30\~$40. Exact pricing requires confirmation.
- **[VEED.IO](https://www.veed.io)**: Pro plan (\~$24/month). Subtitles+translation estimated at $0.40\~$0.60/min. No TTS, so partial processing. 60-min cost: $24\~$36. Confirm at [veed.io](https://veed.io).
- **[HappyScribe](https://happyscribe.com)**: Pay-as-you-go (\~$0.20/min transcription, $0.20/min translation, $0.20/min dubbing). 60-min cost: $36\~$48 (assuming combined services). Confirm at [happyscribe.com](https://happyscribe.com).
- **[Sonix](https://sonix.ai)**: Standard plan (\~$10/hour transcription, additional for translation/dubbing). Estimated $0.50\~$0.67/min total. 60-min cost: $30\~$40. Confirm at [sonix.ai](https://sonix.ai).
- **[Descript](https://descript.com)**: Creator plan (\~$24/month, limited hours). Estimated $0.60\~$0.80/min for subtitles+translation+dubbing. 60-min cost: $36\~$48. Confirm at [descript.com](https://descript.com).
- **[AppTek](https://apptek.ai)**: Custom pricing for enterprise. No public per-minute rates. Contact [apptek.ai](https://apptek.ai) for quotes.
- **[Transkriptor](https://transkriptor.com)**: Pay-as-you-go ($0.05\~$0.10/min transcription, similar for translation). No TTS, so partial processing. 60-min cost: $12\~$18. Confirm at [transkriptor.com](https://transkriptor.com).
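The two kinds of estimate used above, credit-based and per-minute, reduce to simple arithmetic. The sketch below reproduces the Maestra and VEED.IO figures. The function names are ours, and the inputs are the approximate prices quoted in this section rather than verified current rates.

```python
def credit_cost(credits_used: float, plan_credits: float, plan_price: float) -> float:
    """Share of a monthly plan's price consumed by `credits_used` credits."""
    return round(credits_used / plan_credits * plan_price, 2)


def per_minute_range(minutes: float, low_rate: float, high_rate: float) -> tuple:
    """Low and high cost for `minutes` of video at the given per-minute rates."""
    return (round(minutes * low_rate, 2), round(minutes * high_rate, 2))


# Maestra: 3 services x 60 credits on a 1200-credit, $158 plan.
print(credit_cost(3 * 60, 1200, 158))     # 23.7

# VEED.IO: 60 minutes at an estimated $0.40 to $0.60 per minute.
print(per_minute_range(60, 0.40, 0.60))   # (24.0, 36.0)
```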
### Notes
- **Cost for 60-min Video**: Costs are approximate and assume processing a 60-minute Korean video for subtitles, English translation, and English dubbing (where available). Platforms without TTS (e.g., VEED.IO, Transkriptor) reflect partial processing costs.
- **Language Support**: Most platforms support Korean and English. Verify specific language availability on their websites.
- **Use Cases**:
  - Media/Entertainment: AppTek, Maestra
  - Social Media: Kapwing, VEED.IO
  - Podcasts/Interviews: Sonix, Descript
  - E-learning/Global Content: Transkriptor, HappyScribe
- **Pricing Updates**: Pricing may vary due to plan changes or promotions. Check official websites for the latest details.
- For contributions or specific use case recommendations, open an issue or submit a pull request in this repository!
## ☕ Contributions
Hello, I'm David from the Voice-Pro team.
Our team discovers the best AI technologies in the industry and provides them for anyone to use easily and conveniently.
We are a small startup in Korea that has only been around for a year. We are working hard to help you and other creators produce great content.
Your ⭐⭐⭐⭐⭐ review would be greatly appreciated as it helps our business grow with you. Please help support our small team.
Thank you,
ABUS Customer Service
- To get involved or suggest an improvement, feel free to open an [issue](https://github.com/abus-aikorea/voice-pro/issues).
- If you find a bug and can fix it, please submit a [pull request](https://github.com/abus-aikorea/voice-pro/pulls).
- Any type of contribution is welcome.
- For inquiries related to purchases, business partnerships, technical tuning, investments, and other matters, please contact us by email.
- If you like this project, please star this repository. We would greatly appreciate it. ⭐⭐⭐
- You can support Voice-Pro with a donation here:
## 📬 Contact
- Email:
- Homepage (Korean):
- Paid Version Purchase: [Shopify (Global)](https://r17wvy-t2.myshopify.com), [Naver (Korean)](https://smartstore.naver.com/abus)
## 🙏 Credits
* Demucs:
* yt-dlp:
* gradio:
* edge-TTS:
* F5-TTS:
* openai-whisper:
* faster-whisper:
* whisper-timestamped:
* whisperX:
* CosyVoice:
* kokoro:
* Deep-Translator:
* spaCy:
## ©️ Copyright
by [ABUS](https://www.wctokyoseoul.com)