
Auto-Training · Auto-Labeling · Detect Anything · Segment Anything · Promptable Concept Grounding · VQA · Chatbot · Image Classifier
## What's New
- Added [PP-DocLayoutV3](./examples/optical_character_recognition/document_layout_analysis/README.md), supporting multi-point localization (quadrilaterals/polygons) and logical reading order prediction
- Added [PaddleOCR-VL-1.5](./examples/optical_character_recognition/multi_task/README.md), supporting OCR, table recognition, formula recognition, chart recognition, text spotting, and seal recognition
- Added [YOLO26](https://github.com/ultralytics/ultralytics) series models for object detection, instance segmentation, pose estimation, and rotated object detection
- Added Compare View feature for split-screen image comparison (ideal for infrared/visible fusion, mask preview, and super-resolution) [[docs](./docs/en/user_guide.md#36-compare-view)]
- Added multimodal large language model [Rex-Omni](https://github.com/IDEA-Research/Rex-Omni) with support for grounding, keypoints, referring pointing, OCR, and visual prompting tasks [[docs](./examples/vision_language/rexomni/README.md)]
- Added a powerful file search feature supporting text search, regular expression search, and attribute-based filtering [[docs](./docs/en/user_guide.md#25-searching-images)]
- Added semi-transparent mask rendering for polygon, rectangle, rotation, and circle shapes with toggle support (`Ctrl+M`)
- Added one-click text and visual prompt video detection and segmentation tracking based on Segment Anything 3 [[docs](./examples/interactive_video_object_segmentation/sam3/README.md)]
- For more details, please refer to the [CHANGELOG](./CHANGELOG.md)
## X-AnyLabeling
**X-AnyLabeling** is a powerful annotation tool that integrates an AI engine for fast and automatic labeling. It's designed for multi-modal data engineers, offering industrial-grade solutions for complex tasks.
Also, we highly recommend trying out [X-AnyLabeling-Server](https://github.com/CVHub520/X-AnyLabeling-Server), a simple, lightweight, and extensible framework that enables remote inference capabilities for X-AnyLabeling.
## Features
- Supports remote inference service.
- Processes both `images` and `videos`.
- Accelerates inference with `GPU` support.
- Allows custom models and secondary development.
- Supports one-click inference for all images in the current task.
- Supports import/export for formats like COCO, VOC, YOLO, DOTA, MOT, MASK, PPOCR, MMGD, VLM-R1.
- Handles tasks like `classification`, `detection`, `segmentation`, `caption`, `rotation`, `tracking`, `estimation`, `ocr`, `vqa`, `grounding` and so on.
- Supports diverse annotation styles: `polygons`, `rectangles`, `rotated boxes`, `circles`, `lines`, `points`, and annotations for `text detection`, `recognition`, and `KIE`.
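Beyond the built-in exporters, annotations are also easy to post-process by hand: X-AnyLabeling stores labels in LabelMe-compatible JSON (a `shapes` list with `label`, `shape_type`, and `points`, plus `imageWidth`/`imageHeight`). As a rough sketch (the field names follow the LabelMe convention this tool inherits; verify them against a JSON file produced by your own version), converting rectangle shapes to YOLO-format lines might look like:

```python
import json

def shapes_to_yolo(label_json: str) -> list[str]:
    """Convert rectangle shapes from a LabelMe-style JSON document
    (the schema X-AnyLabeling inherits from AnyLabeling/LabelMe)
    into YOLO 'class cx cy w h' lines normalized to the image size.

    The field names used here are assumptions based on the LabelMe
    convention; check them against your own label files.
    """
    data = json.loads(label_json)
    img_w, img_h = data["imageWidth"], data["imageHeight"]
    rects = [s for s in data["shapes"] if s["shape_type"] == "rectangle"]
    # Stable class index derived from the labels present in this file.
    class_id = {name: i for i, name in enumerate(sorted({s["label"] for s in rects}))}
    lines = []
    for s in rects:
        # Rectangles may be stored as 2 corners or 4 vertices; take the extremes.
        xs = [p[0] for p in s["points"]]
        ys = [p[1] for p in s["points"]]
        cx = (min(xs) + max(xs)) / 2 / img_w
        cy = (min(ys) + max(ys)) / 2 / img_h
        w = (max(xs) - min(xs)) / img_w
        h = (max(ys) - min(ys)) / img_h
        lines.append(f"{class_id[s['label']]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

Polygon and keypoint shapes would need their own converters; for production exports, the built-in format support listed above remains the canonical path.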
### Model library
| **Task Category** | **Supported Models** |
| :--- | :--- |
| Image Classification | YOLOv5-Cls, YOLOv8-Cls, YOLO11-Cls, InternImage, PULC |
| Object Detection | YOLOv5/6/7/8/9/10, YOLO11/12/26, YOLOX, YOLO-NAS, D-FINE, DAMO-YOLO, Gold_YOLO, RT-DETR, RF-DETR, DEIMv2 |
| Instance Segmentation | YOLOv5-Seg, YOLOv8-Seg, YOLO11-Seg, YOLO26-Seg, Hyper-YOLO-Seg, RF-DETR-Seg |
| Pose Estimation | YOLOv8-Pose, YOLO11-Pose, YOLO26-Pose, DWPose, RTMO |
| Tracking | Bot-SORT, ByteTrack, SAM2/3-Video |
| Rotated Object Detection | YOLOv5-Obb, YOLOv8-Obb, YOLO11-Obb, YOLO26-Obb |
| Depth Estimation | Depth Anything |
| Segment Anything | SAM 1/2/3, SAM-HQ, SAM-Med2D, EdgeSAM, EfficientViT-SAM, MobileSAM |
| Image Matting | RMBG 1.4/2.0 |
| Proposal | UPN |
| Tagging | RAM, RAM++ |
| OCR | PP-OCRv4, PP-OCRv5, PP-DocLayoutV3, PaddleOCR-VL-1.5 |
| Vision Foundation Models | Rex-Omni, Florence2 |
| Vision Language Models | Qwen3-VL, Gemini, ChatGPT |
| Lane Detection | CLRNet |
| Grounding | CountGD, GeCO, Grounding DINO, YOLO-World, YOLOE |
| Other | See the full [model_zoo](./docs/en/model_zoo.md) |
## Docs
0. [Remote Inference Service](https://github.com/CVHub520/X-AnyLabeling-Server)
1. [Installation & Quickstart](./docs/en/get_started.md)
2. [Usage](./docs/en/user_guide.md)
3. [Command Line Interface](./docs/en/cli.md)
4. [Customize a model](./docs/en/custom_model.md)
5. [Chatbot](./docs/en/chatbot.md)
6. [VQA](./docs/en/vqa.md)
7. [Multi-class Image Classifier](./docs/en/image_classifier.md)
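The [Customize a model](./docs/en/custom_model.md) guide is the authoritative reference for loading your own models; as a rough illustration only, a model definition is a small YAML file along these lines (every field name and value below is an assumption for a hypothetical ONNX detector, not a verbatim schema):

```yaml
# Hypothetical custom-model config; consult docs/en/custom_model.md
# for the exact fields your X-AnyLabeling version expects.
type: yolov8                       # inference adapter to use
name: yolov8n-custom               # unique model identifier
display_name: "YOLOv8n (custom)"   # name shown in the model dropdown
model_path: /path/to/yolov8n.onnx  # local path or download URL
input_width: 640
input_height: 640
confidence_threshold: 0.25
nms_threshold: 0.45
classes:
  - person
  - helmet
```

Pointing the GUI's model loader at such a file is how custom and fine-tuned models are plugged in without touching the application code.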
## Examples
- [Classification](./examples/classification/)
- [Image-Level](./examples/classification/image-level/README.md)
- [Shape-Level](./examples/classification/shape-level/README.md)
- [Detection](./examples/detection/)
- [HBB Object Detection](./examples/detection/hbb/README.md)
- [OBB Object Detection](./examples/detection/obb/README.md)
- [Segmentation](./examples/segmentation/README.md)
- [Instance Segmentation](./examples/segmentation/instance_segmentation/)
- [Binary Semantic Segmentation](./examples/segmentation/binary_semantic_segmentation/)
- [Multiclass Semantic Segmentation](./examples/segmentation/multiclass_semantic_segmentation/)
- [Description](./examples/description/)
- [Tagging](./examples/description/tagging/README.md)
- [Captioning](./examples/description/captioning/README.md)
- [Estimation](./examples/estimation/)
- [Pose Estimation](./examples/estimation/pose_estimation/README.md)
- [Depth Estimation](./examples/estimation/depth_estimation/README.md)
- [OCR](./examples/optical_character_recognition/)
- [Text Recognition](./examples/optical_character_recognition/text_recognition/)
- [Key Information Extraction](./examples/optical_character_recognition/key_information_extraction/README.md)
- [MOT](./examples/multiple_object_tracking/README.md)
- [Tracking by HBB Object Detection](./examples/multiple_object_tracking/README.md)
- [Tracking by OBB Object Detection](./examples/multiple_object_tracking/README.md)
- [Tracking by Instance Segmentation](./examples/multiple_object_tracking/README.md)
- [Tracking by Pose Estimation](./examples/multiple_object_tracking/README.md)
- [iVOS](./examples/interactive_video_object_segmentation)
- [SAM2-Video](./examples/interactive_video_object_segmentation/sam2/README.md)
- [SAM3-Video](./examples/interactive_video_object_segmentation/sam3/README.md)
- [Matting](./examples/matting/)
- [Image Matting](./examples/matting/image_matting/README.md)
- [Vision-Language](./examples/vision_language/)
- [Rex-Omni](./examples/vision_language/rexomni/README.md)
- [Florence 2](./examples/vision_language/florence2/README.md)
- [Counting](./examples/counting/)
- [GeCo](./examples/counting/geco/README.md)
- [Grounding](./examples/grounding/)
- [YOLOE](./examples/grounding/yoloe/README.md)
- [SAM 3](./examples/grounding/sam3/README.md)
- [Training](./examples/training/)
- [Ultralytics](./examples/training/ultralytics/README.md)
## Contribute
We believe in open collaboration! **X-AnyLabeling** continues to grow with the support of the community. Whether you're fixing bugs, improving documentation, or adding new features, your contributions make a real impact.
To get started, please read our [Contributing Guide](./CONTRIBUTING.md) and make sure to agree to the [Contributor License Agreement (CLA)](./CLA.md) before submitting a pull request.
If you find this project helpful, please consider giving it a star! Have questions or suggestions? Open an [issue](https://github.com/CVHub520/X-AnyLabeling/issues) or email us at cv_hub@163.com.
A huge thank you to everyone helping to make X-AnyLabeling better.
## License
This project is licensed under the [GPL-3.0 license](./LICENSE) and is completely open source and free. Our goal is to make this AI application platform easily available to developers, researchers, and enterprises, and to help advance the industry as a whole. You are free to use it (including commercially) and to build additional features on top of it and commercialize them, provided you retain the brand identity and credit the source project address.
Additionally, to help us understand the X-AnyLabeling ecosystem and how it is used, please fill out the [registration form](https://forms.gle/MZCKhU7UJ4TRSWxR7) if you use this project for academic, research, teaching, or enterprise purposes. Registration is for statistical purposes only, is free of charge, and all information will be kept strictly confidential.
X-AnyLabeling is independently developed and maintained by an individual. If this project has been helpful to you, you are welcome to support its continued development through the donation links below; your support is the greatest encouragement! For questions or collaboration, feel free to reach out via WeChat (ww10874) or the email address above.
## Sponsors
- [buy-me-a-coffee](https://ko-fi.com/cvhub520)
- [Wechat/Alipay](https://github.com/CVHub520/X-AnyLabeling/blob/main/README_zh-CN.md#%E8%B5%9E%E5%8A%A9)
## Acknowledgement
I extend my heartfelt thanks to the developers and contributors of [AnyLabeling](https://github.com/vietanhdev/anylabeling), [LabelMe](https://github.com/wkentaro/labelme), [LabelImg](https://github.com/tzutalin/labelImg), [roLabelImg](https://github.com/cgvict/roLabelImg), [PPOCRLabel](https://github.com/PFCCLab/PPOCRLabel) and [CVAT](https://github.com/opencv/cvat), whose work has been crucial to the success of this project.
## Citing
If you use this software in your research, please cite it as below:
```bibtex
@misc{X-AnyLabeling,
  title        = {Advanced Auto Labeling Solution with Added Features},
  author       = {Wei Wang},
  year         = {2023},
  publisher    = {GitHub},
  organization = {CVHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/CVHub520/X-AnyLabeling}}
}
```
---
