English | [简体中文](./readme/README_cn.md) | [繁體中文](./readme/README_tcn.md) | [日本語](./readme/README_ja.md) | [한국어](./readme/README_ko.md) | [Français](./readme/README_fr.md) | [Русский](./readme/README_ru.md) | [Español](./readme/README_es.md) | [العربية](./readme/README_ar.md)
[](https://github.com/PaddlePaddle/PaddleOCR)
[](https://github.com/PaddlePaddle/PaddleOCR)
[](https://arxiv.org/pdf/2507.05595)
[](https://arxiv.org/abs/2510.14528)
[](https://pepy.tech/projectsproject/paddleocr)
[](https://pepy.tech/projects/paddleocr)
[](https://github.com/PaddlePaddle/PaddleOCR/network/dependents)
[](https://pypi.org/project/paddleocr/)



[](../LICENSE)
[](https://deepwiki.com/PaddlePaddle/PaddleOCR)
[](https://www.paddleocr.com)
**PaddleOCR is an industry-leading, production-ready OCR and document AI engine, offering end-to-end solutions from text extraction to intelligent document understanding**
# PaddleOCR
[](https://www.paddlepaddle.org.cn/en)
[](#)
[](#)
[](#)
[](#)
> [!TIP]
> PaddleOCR now provides an MCP server that supports integration with Agent applications like Claude Desktop. For details, please refer to [PaddleOCR MCP Server](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/deployment/mcp_server.html).
>
> The PaddleOCR 3.0 Technical Report is now available. See details at: [PaddleOCR 3.0 Technical Report](https://arxiv.org/abs/2507.05595).
>
> The PaddleOCR-VL Technical Report is now available. See details at [PaddleOCR-VL Technical Report](https://arxiv.org/abs/2510.14528).
>
> The Beta version of the PaddleOCR official website is now live, offering a more convenient online experience and large-scale PDF file parsing, as well as free API and MCP services. For more details, please visit the [PaddleOCR official website](https://www.paddleocr.com).
**PaddleOCR** converts documents and images into **structured, AI-friendly data** (like JSON and Markdown) with **industry-leading accuracy**—powering AI applications for everyone from indie developers and startups to large enterprises worldwide. With over **60,000 stars** and deep integration into leading projects like **MinerU, RAGFlow, pathway and cherry-studio**, PaddleOCR has become the **premier solution** for developers building intelligent document applications in the **AI era**.
### PaddleOCR 3.0 Core Features
[](https://aistudio.baidu.com/paddleocr)
[](https://huggingface.co/spaces/PaddlePaddle/PaddleOCR-VL-1.5_Online_Demo)
[](https://www.modelscope.cn/studios/PaddlePaddle/PaddleOCR-VL-1.5_Online_Demo)
[](https://aistudio.baidu.com/community/app/91660/webUI)
[](https://aistudio.baidu.com/community/app/518494/webUI)
[](https://aistudio.baidu.com/community/app/518493/webUI)
- **PaddleOCR-VL-1.5: 0.9B VLM for Real-World Document Parsing and Text Spotting**
A SOTA and resource-efficient model designed for real-world document parsing and text spotting tasks. It achieves comprehensive leadership across six major scenarios: normal, skew, warping, scanning, varied lighting, and screen photography in document parsing task. It introduces the leading new capabilities for text spotting and seal recognition, strengthens the parsing of complex elements (such as text, tables, formulas, and charts), and expands language support to 111 languages—all while maintaining extremely low resource consumption.
- **PaddleOCR-VL - Multilingual Document Parsing via a 0.9B VLM**
**The SOTA and resource-efficient model tailored for document parsing**, that supports 109 languages and excels in recognizing complex elements (e.g., text, tables, formulas, and charts), while maintaining minimal resource consumption.
- **PP-OCRv5 — Universal Scene Text Recognition**
**Single model supports five text types** (Simplified Chinese, Traditional Chinese, English, Japanese, and Pinyin) with **13% accuracy improvement**. Solves multilingual mixed document recognition challenges.
- **PP-StructureV3 — Complex Document Parsing**
Intelligently converts complex PDFs and document images into **Markdown and JSON files that preserve original structure**. **Outperforms** numerous commercial solutions in public benchmarks. **Perfectly maintains document layout and hierarchical structure**.
- **PP-ChatOCRv4 — Intelligent Information Extraction**
Natively integrates ERNIE 4.5 to **precisely extract key information** from massive documents, with 15% accuracy improvement over previous generation. Makes documents "**understand**" your questions and provide accurate answers.
In addition to providing an outstanding model library, PaddleOCR 3.0 also offers user-friendly tools covering model training, inference, and service deployment, so developers can rapidly bring AI applications to production.
| Project Name | Description |
| ------------ | ----------- |
| [RAGFlow](https://github.com/infiniflow/ragflow)

|RAG engine based on deep document understanding.|
| [pathway](https://github.com/pathwaycom/pathway)

|Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.|
| [MinerU](https://github.com/opendatalab/MinerU)

|Multi-type Document to Markdown Conversion Tool|
| [Umi-OCR](https://github.com/hiroi-sora/Umi-OCR)

|Free, Open-source, Batch Offline OCR Software.|
| [cherry-studio](https://github.com/CherryHQ/cherry-studio)

|A desktop client that supports for multiple LLM providers.|
| [OmniParser](https://github.com/microsoft/OmniParser)

|OmniParser: Screen Parsing tool for Pure Vision Based GUI Agent.|
| [QAnything](https://github.com/netease-youdao/QAnything)

|Question and Answer based on Anything.|
| [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit)

|A powerful open-source toolkit designed to efficiently extract high-quality content from complex and diverse PDF documents.|
| [Dango-Translator](https://github.com/PantsuDango/Dango-Translator)

|Recognize text on the screen, translate it and show the translation results in real time.|
| [Learn more projects](./awesome_projects.md) | [More projects based on PaddleOCR](./awesome_projects.md)|
## 👩👩👧👦 Contributors