
# Video Translation & Dubbing Tool for Humans / Agents (Skills Included)

**[English](/README.md)|[简体中文](/docs/zh/README.md)|[日本語](/docs/jp/README.md)|[한국어](/docs/kr/README.md)|[Tiếng Việt](/docs/vi/README.md)|[Français](/docs/fr/README.md)|[Deutsch](/docs/de/README.md)|[Español](/docs/es/README.md)|[Português](/docs/pt/README.md)|[Русский](/docs/rus/README.md)|[اللغة العربية](/docs/ar/README.md)**
## Project Introduction (v2.0 with Agent support — now released)
[**Quick Start**](#-quick-start)
KrillinAI is an audio and video localization and enhancement tool developed by the Krillin AI team for both human users and AI Agents. It covers the complete pipeline: video download, speech transcription, subtitle translation, TTS dubbing, portrait conversion, and cover generation. Both landscape and portrait formats are supported, so output fits the major platforms (Bilibili, Xiaohongshu, Douyin, WeChat Video, Kuaishou, YouTube, TikTok, etc.). Human users can complete end-to-end content localization with one click in the client. Each capability can also be invoked independently via the CLI, and AI Agents can orchestrate single or multiple stages on demand to compose automated workflows.
## New Features
🤖 **CLI Support**: Provides a phased command-line interface where each stage executes independently and outputs structured results, supporting cross-stage artifact reuse.
🧩 **Skills Collection**: The `skills/` directory provides per-stage Skills that AI Agents can invoke directly under a stable contract, so there is no need to parse CLI documentation.
🔗 **Pipeline Orchestration**: Chain multiple stages in one command, enabling full automation from download to rendering.
🖼️ **Cover Generation**: Automatically generate platform cover images from the original video thumbnail and a prompt template.
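
The stage model described above (independent stages, structured results, artifact reuse across stages) can be sketched roughly as follows. This is a conceptual illustration only and not KrillinAI's actual CLI or API: the stage functions `transcribe` and `translate`, the artifact keys, and `run_pipeline` are all hypothetical names chosen for the example.

```python
import json
from typing import Callable, Dict, List

# Hypothetical types: each stage reads named artifacts and returns new ones.
Artifacts = Dict[str, str]
Stage = Callable[[Artifacts], Artifacts]


def transcribe(artifacts: Artifacts) -> Artifacts:
    # Placeholder stage: a real one would run speech recognition on the video.
    return {"transcript": f"transcript-of:{artifacts['video']}"}


def translate(artifacts: Artifacts) -> Artifacts:
    # Placeholder stage: a real one would translate the transcript.
    return {"subtitles": f"translated:{artifacts['transcript']}"}


def run_pipeline(stages: List[Stage], artifacts: Artifacts) -> Artifacts:
    # Run stages in order, merging each stage's structured result
    # into the shared artifact set so later stages can reuse it.
    merged = dict(artifacts)
    for stage in stages:
        merged.update(stage(merged))
    return merged


if __name__ == "__main__":
    # Full chain: start from a video file.
    full = run_pipeline([transcribe, translate], {"video": "input.mp4"})
    print(json.dumps(full, indent=2))

    # Artifact reuse: skip transcription by supplying an existing transcript.
    partial = run_pipeline([translate], {"transcript": "existing transcript"})
    print(json.dumps(partial, indent=2))
```

Because every stage only depends on named artifacts, an Agent could run a single stage, resume from a saved intermediate result, or chain several stages, which is the kind of flexibility the features above describe.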
## Key Features and Functions
📥 **Video Acquisition**: Supports yt-dlp downloads or local file uploads
📜 **Accurate Recognition**: High-accuracy speech recognition based on Whisper
🧠 **Intelligent Segmentation**: Subtitle segmentation and alignment using an LLM
🔄 **Terminology Replacement**: One-click replacement of domain-specific vocabulary
🌍 **Professional Translation**: Context-aware LLM translation that keeps the phrasing natural
🎙️ **Voice Cloning**: Offers a selection of CosyVoice voices or custom voice cloning
🎬 **Video Composition**: Automatically processes landscape and portrait videos and subtitle layout
💻 **Cross-Platform**: Supports Windows, Linux, macOS, providing desktop, server, and CLI modes
## Effect Demonstration
The image below shows the subtitle file generated from a 46-minute local video in a single one-click run, with no manual adjustments. There are no omissions or overlaps, the sentence segmentation is natural, and the translation quality is high.
