AI-powered, vision-driven UI automation for every platform.
## Midscene Skills is here!

Use [Midscene Skills](https://github.com/web-infra-dev/midscene-skills) to control any platform with [OpenClaw](https://github.com/OpenClaw/OpenClaw).

## Showcases

* [Web Automation - Automatically fill out the GitHub sign-up form in a web browser and pass all field validations](https://midscenejs.com/showcases#web)
* [iOS Automation - Meituan coffee order](https://midscenejs.com/showcases#ios)
* [iOS Automation - Auto-like the first @midscene_ai tweet](https://midscenejs.com/showcases#ios)
* [Android Automation - DCar: Xiaomi SU7 specs](https://midscenejs.com/showcases#android)
* [Android Automation - Booking a hotel for Christmas](https://midscenejs.com/showcases#android)
* [MCP Integration - Midscene MCP UI prepatch release](https://midscenejs.com/showcases#mcp)
* [Robotic arm + vision + voice for in-vehicle testing](https://midscenejs.com/showcases#community-showcases)

## Features

### Write Automation with Natural Language

- Describe your goals and steps, and Midscene will plan and operate the user interface for you.
- Use the JavaScript SDK or YAML to write your automation script.

### Web & Mobile App & Any Interface

- **Web Automation**: Integrate with [Puppeteer](https://midscenejs.com/integrate-with-puppeteer) or [Playwright](https://midscenejs.com/integrate-with-playwright), or use [Bridge Mode](https://midscenejs.com/bridge-mode) to control your desktop browser.
- **Android Automation**: Use the [JavaScript SDK](https://midscenejs.com/android-getting-started) with adb to control your local Android device.
- **iOS Automation**: Use the [JavaScript SDK](https://midscenejs.com/ios-getting-started) with WebDriverAgent to control your local iOS devices and simulators.
- **Any Interface Automation**: Use the [JavaScript SDK](https://midscenejs.com/integrate-with-any-interface) to control your own interface.
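As a rough illustration of what controlling your own interface could involve, the sketch below assumes a custom interface only needs to provide a screenshot and a way to tap a coordinate, with a separate function standing in for the model that turns a description into a position. All names here (`CustomInterface`, `Locator`, `tapByDescription`, `FakeInterface`, `fakeLocate`) are hypothetical and are not part of the Midscene SDK; refer to the linked docs for the real integration API.

```typescript
// Hypothetical types for illustration only; they are NOT the Midscene SDK.
interface Point {
  x: number;
  y: number;
}

// Assumed minimal contract a custom interface might expose.
interface CustomInterface {
  screenshot(): Promise<string>; // e.g. a base64-encoded image
  tap(point: Point): Promise<void>;
}

// Stand-in for a model call: screenshot + description -> coordinates (or null).
type Locator = (screenshot: string, description: string) => Promise<Point | null>;

// Take a screenshot, locate the described element, then tap it.
async function tapByDescription(
  ui: CustomInterface,
  locate: Locator,
  description: string,
): Promise<Point> {
  const shot = await ui.screenshot();
  const point = await locate(shot, description);
  if (point === null) {
    throw new Error(`Could not locate: ${description}`);
  }
  await ui.tap(point);
  return point;
}

// In-memory fake so the sketch runs without a device or a model.
class FakeInterface implements CustomInterface {
  readonly taps: Point[] = [];

  async screenshot(): Promise<string> {
    return "fake-screenshot";
  }

  async tap(point: Point): Promise<void> {
    this.taps.push(point);
  }
}

// Fake locator that "recognizes" exactly one description.
const fakeLocate: Locator = async (_shot, description) =>
  description === "the Sign up button" ? { x: 120, y: 48 } : null;

// Usage: resolves to the list of taps recorded by the fake interface.
async function demo(): Promise<Point[]> {
  const ui = new FakeInterface();
  await tapByDescription(ui, fakeLocate, "the Sign up button");
  return ui.taps;
}
```

Keeping the locator separate from the interface means the same `tapByDescription` flow can be exercised with a fake in tests and with a real device and model elsewhere.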
### For Developers

- **Three kinds of APIs**:
  - [Interaction API](https://midscenejs.com/api#interaction-methods): interact with the user interface.
  - [Data Extraction API](https://midscenejs.com/api#data-extraction): extract data from the user interface and DOM.
  - [Utility API](https://midscenejs.com/api#more-apis): utility functions like `aiAssert()`, `aiLocate()`, `aiWaitFor()`.
- **MCP**: Midscene provides MCP services that expose atomic Midscene Agent actions as MCP tools, so upper-layer agents can inspect and operate UIs with natural language. [Docs](https://midscenejs.com/mcp)
- [**Caching for Efficiency**](https://midscenejs.com/caching): Replay your script with the cache to get results faster.
- **Debugging Experience**: Midscene.js offers a visual replay report, a built-in playground, and a Chrome Extension to simplify debugging. These are the tools most developers truly need.

## Zero-code Quick Experience

- **[Chrome Extension](https://midscenejs.com/quick-experience)**: Start the in-browser experience immediately through [the Chrome Extension](https://midscenejs.com/quick-experience), without writing any code.
- **[Android Playground](https://midscenejs.com/android-getting-started)**: There is also a built-in Android playground to control your local Android device.
- **[iOS Playground](https://midscenejs.com/ios-getting-started)**: There is also a built-in iOS playground to control your local iOS device.

## Driven by Visual Language Models

Midscene.js is all-in on the pure-vision route for UI actions: element localization and interactions are based on screenshots only. It supports visual-language models like `Qwen3-VL`, `Doubao-1.6-vision`, `gemini-3-pro`, and `UI-TARS`. For data extraction and page understanding, you can still opt in to include the DOM when needed.

* Pure-vision localization for UI actions; the DOM extraction mode is removed.
* Works across web, mobile, desktop, and even `