🤗 HuggingFace Demo | 🌐 Homepage | 📄 arXiv

Today, we are excited to introduce **Seed1.5-VL** 🚀, a powerful and efficient vision-language foundation model designed for advanced general-purpose multimodal understanding and reasoning. ## 🌟 Highlights * 🧠 **Efficient Powerhouse:** Achieves top performance with a relatively modest architecture, 532M vision encoder & 20B active parameter MoE LLM. * 🏆 **Exceptional Benchmark Performance:** Delivers State-of-the-Art results on 38 out of 60 public VLM benchmarks, demonstrating broad competence. * 💡 **Versatile Capabilities:** Excels across diverse capabilities including complex reasoning (e.g., visual puzzles like Rebus), OCR, diagram understanding, visual grounding, 3D spatial understanding, and video comprehension. * 🤖 **Advanced Agent-Centric Abilities:** Demonstrates leading performance in interactive agent tasks, showcasing strong capabilities in GUI control and gameplay. **This repository offers usage cookbook and best practices designed to help developers effectively use Seed1.5-VL.** ## 📢 News * `2025-05-13:` We have deployed our Seed1.5-VL on [🤗 HuggingFace Spaces](https://huggingface.co/spaces/ByteDance-Seed/Seed1.5-VL), Welcome to try out our model! * `2025-05-12:` We have released the [Seed1.5-VL Technical Report](./Seed1.5-VL-Technical-Report.pdf). * `2025-05-12:` We are extremely delighted to release the flagship Seed1.5-VL on [Volcano Engine](https://www.volcengine.com/product/doubao). The Model ID is `doubao-1-5-thinking-vision-pro-250428`. You can try it now! ## 📮 Notice **Call for Bad Cases:** If you have encountered any cases where the model performs poorly, we would greatly appreciate it if you could share them in the issue [https://github.com/ByteDance-Seed/Seed1.5-VL/issues/12] ## 📖 Seed1.5-VL Cookbook The Seed1.5-VL cookbook is designed to help you start using the Seed1.5-VL API with diverse code samples. Our flagship Seed1.5-VL has been deployed on [Volcano Engine](https://www.volcengine.com/product/doubao). After obtaining your `API_KEY`, you can use the examples in this cookbook to rapidly understand and leverage the diverse capabilities of our Seed1.5-VL. ### Quick Start - [x] Cookbook for online/offline [Gradio Demo](./GradioDemo) - [x] Cookbook for turning on/off [LongCoT](./longCoT) - [x] Cookbook for [2D Grounding](./Grounding) - [x] Cookbook for [3D Understanding](./3D-Understanding) - [x] Cookbook for [Video Understanding](./Video) - [X] Cookbook for [GUI Agents](./GUI) ## Citations If you Seed1.5-VL useful in your research or applications, please consider giving us a star 🌟 and citing it by the following BibTeX entry. ```bibtex @article{seed2025seed1_5vl, title={Seed1.5-VL Technical Report}, author={ByteDance Seed Team}, journal={arXiv preprint arXiv:2505.07062}, year={2025} } ``` ## License This repo is under [Apache-2.0 License](./LICENSE). ## About [ByteDance Seed Team](https://seed.bytedance.com/) Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.