Task-oriented AI Agent productivity platform — redefining operational boundaries and memory evolution, one WorkSpace at a time.

English | 简体中文
 Website · Live Demo · Tutorial · Quick Start · Highlights · Use Cases · Community

---

**News** 🔥

- **[2026.05.28]** PilotDeck is now open source! Visit our official website at [pilotdeck.openbmb.cn](https://pilotdeck.openbmb.cn). We welcome contributions, feedback, and stars from the community.

---

## 💡 About PilotDeck

**PilotDeck** is an open-source agent operating system designed around the concept of the "WorkSpace". It is jointly developed and open-sourced by Tsinghua University [THUNLP](https://nlp.csai.tsinghua.edu.cn/), [ModelBest](https://modelbest.cn/), [OpenBMB](https://www.openbmb.cn/), and [AI9Stars](https://github.com/AI9Stars). Targeting general-purpose, multi-task scenarios, PilotDeck is built to be a true *productivity tool* for the Agent era.

A wave of excellent AI Agent harnesses has emerged in recent years, each with its own focus: **Claude Code / Cursor / Trae Solo** brought model reasoning deep into the programming IDE; **Claude Cowork** introduced project-level isolation to desktop knowledge work; **WorkBuddy** connected agents to IM ecosystems such as WeCom and Feishu, putting AI one message away.

When we shift the lens from "one-shot programming" or "immediate Q&A" to **long-running, multi-project productivity work**, however, several questions remain open:

- When many projects run in parallel, can memory be **white-box and traceable**? When the AI gets something wrong, can you pinpoint which memory entry caused it and edit it directly, without starting a new chat from scratch?
- Can token cost be **tracked per task**, so that running agents in the background actually becomes economically viable?
- Can tasks of different difficulty **automatically be matched to different models**, instead of burning the flagship model on trivial calls?
- When you step away from the keyboard, can the work keep moving? Can the agent **proactively discover what's worth doing, report progress, and land results as files on disk**?

PilotDeck is an incremental exploration of exactly these questions.
It uses the WorkSpace as the fundamental unit, fully isolating files, memory, and skills per project, and pairs it with three pillar capabilities: **White-box Memory**, **Smart Routing**, and **Always-on**. The entire system natively supports the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) and behaves consistently across front-ends (Web / CLI / IM).

### ✨ Key Highlights

- **WorkSpace-Level Isolation & Accretion**: Every project gets its own file system, memory store, and skill set. Parallel projects no longer interfere with each other, retrieval has a bounded scope, and skills accrete naturally as each task grows, with no global context pollution.
- **Traceable White-box Memory**: Memory generation, extraction, storage, and retrieval are visible end-to-end. When the AI misremembers, you can pinpoint and fix the offending entry. The built-in Dream Mode consolidates memory during idle windows and supports one-click rollback.
- **Smart Routing & Cost Optimization**: Task difficulty is detected automatically; complex calls go to flagship models (e.g. Claude 3.5 Sonnet / GPT-4o), while simple ones drop to lighter models. Through on-device / cloud co-orchestration and precise matching, token spend shrinks dramatically without sacrificing quality.
- **Always-on Background Execution**: PilotDeck breaks the "you ask, it answers" loop. After you sign off, the agent keeps discovering candidate tasks, runs long-horizon monitors, and finally lands deliverables as local files, with a summary report waiting for you.
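The routing idea in the Smart Routing highlight can be illustrated with a minimal sketch. It assumes a hypothetical keyword-and-size heuristic for scoring difficulty and placeholder model names (`flagship-model`, `light-model`); the README does not describe how PilotDeck (which credits ClawXRouter for model routing) actually scores tasks, so treat this as an illustration of the pattern, not its implementation.

```python
from dataclasses import dataclass

# Placeholder tier names (hypothetical, not PilotDeck config keys).
FLAGSHIP = "flagship-model"
LIGHT = "light-model"

# Hypothetical signals that a task needs planning-level reasoning.
HARD_MARKERS = ("plan", "architecture", "research", "debug", "refactor")


@dataclass
class Task:
    prompt: str
    tool_calls_expected: int = 0


def difficulty(task: Task) -> float:
    """Return a crude difficulty score in [0, 1] (illustrative heuristic only)."""
    text = task.prompt.lower()
    score = 0.0
    if any(marker in text for marker in HARD_MARKERS):
        score += 0.5  # reasoning-heavy wording
    if len(text.split()) > 200:
        score += 0.25  # long prompts tend to carry more context to juggle
    if task.tool_calls_expected > 5:
        score += 0.25  # multi-step tool use
    return min(score, 1.0)


def route(task: Task, threshold: float = 0.5) -> str:
    """Send tasks at or above the threshold to the flagship tier, the rest to the light tier."""
    return FLAGSHIP if difficulty(task) >= threshold else LIGHT


if __name__ == "__main__":
    print(route(Task("Plan the architecture of a fine-tuning platform")))  # flagship-model
    print(route(Task("Fix the typos in this caption")))  # light-model
```

A fixed threshold keeps each routing decision cheap and easy to audit; a real router could instead use a learned classifier or ask a small model to rate the task.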

### 📊 Real-world Numbers

The three pillar capabilities have shown clear advantages in production-grade workflows:

#### 1. Smart Routing: ~70% cost savings on social-media workloads

In Xiaohongshu-style social-media operations, enabling Smart Routing automatically demotes simple polishing / layout tasks to a sub-agent (e.g. Sonnet 4.5) and invokes Opus 4.5 only at planning checkpoints:

| Setup | Model configuration | Cost | Multiplier |
|---|---|---|---|
| Smart Routing ON | Opus 4.5 (main) + Sonnet 4.5 (sub) | $2.83 | 1.1× |
| Smart Routing OFF | All Opus 4.5 (main + sub) | $12.58 | 5.0× |
| Monolithic | Single Opus 4.5 long-react (estimated) | $12.20 | 4.8× |
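The headline percentage can be checked directly against the table. This short calculation uses only the three costs listed above; the reference cost behind the Multiplier column is not stated in the README, so it is left out.

```python
# Costs in USD, copied from the Smart Routing table.
costs = {
    "routing_on": 2.83,
    "routing_off": 12.58,
    "monolithic": 12.20,
}


def savings(cost: float, baseline: float) -> float:
    """Fractional saving of `cost` relative to `baseline` (0.5 means half the price)."""
    return 1 - cost / baseline


for name in ("routing_off", "monolithic"):
    pct = savings(costs["routing_on"], costs[name]) * 100
    print(f"Smart Routing ON vs {name}: {pct:.1f}% cheaper")
# Smart Routing ON vs routing_off: 77.5% cheaper
# Smart Routing ON vs monolithic: 76.8% cheaper
```

Both comparisons land at roughly 77%, so the "~70%" in the heading is, if anything, a conservative round-down.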

#### 2. Smart Routing: 1/6 the cost while beating frontier models on hard tasks

The research team benchmarked 7 complex tasks (multilingual podcast push, multi-source data reports, domain-specific literature reviews, codebase architecture docs, etc.). The "strong main + light sub" routing setup matches or beats the frontier single-model setup at a fraction of the cost:

| Setting | Score | Cost |
|---|---|---|
| MiniMax-M2.7 single-agent | 37.1 | $1.90 |
| Claude Sonnet 4.6 single-agent | 69.1 | $18.36 |
| Sonnet 4.6 (main) + MiniMax-M2.7 (sub) | 70.6 | $3.15 |
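The "1/6 the cost" claim can be verified from the same table. The calculation below compares the routed setup with the Sonnet 4.6 single-agent run, using only the scores and costs shown above.

```python
# (score, cost in USD), copied from the benchmark table.
results = {
    "MiniMax-M2.7 single-agent": (37.1, 1.90),
    "Claude Sonnet 4.6 single-agent": (69.1, 18.36),
    "Sonnet 4.6 (main) + MiniMax-M2.7 (sub)": (70.6, 3.15),
}

frontier_score, frontier_cost = results["Claude Sonnet 4.6 single-agent"]
routed_score, routed_cost = results["Sonnet 4.6 (main) + MiniMax-M2.7 (sub)"]

cost_ratio = routed_cost / frontier_cost  # fraction of the frontier setup's cost
score_delta = routed_score - frontier_score  # positive means the routed setup scores higher

print(f"cost ratio: {cost_ratio:.3f} (about 1/{1 / cost_ratio:.1f})")
print(f"score delta: {score_delta:+.1f}")
# cost ratio: 0.172 (about 1/5.8)
# score delta: +1.5
```

The routed setup costs about 17% of the frontier run (close to 1/6) while scoring 1.5 points higher.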

#### 3. White-box Memory: layout & tone never bleed across projects

In black-box agents, mixing tasks in a shared context pool inevitably pollutes memory. PilotDeck's WorkSpace-scoped white-box memory addresses this end-to-end:

| Dimension | Current AI Agents (black-box) | PilotDeck (white-box) |
|---|---|---|
| Visibility | You can't see what the AI remembers, only what it outputs | View every memory entry: what was stored, when, and in which WorkSpace |
| Control | Once written, memory can't be edited or removed | Edit / delete entries, pin critical decisions so they don't drift |
| Traceability | When it goes wrong, you can't find the root cause | Generation → extraction → storage → retrieval, all auditable |
| Isolation | One shared pool, so projects bleed into each other | Scoped per WorkSpace; A's memory never reaches B |
| Reversibility | After compression, the original is gone | Dream Mode supports one-click rollback to the prior state |
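To make the table concrete, here is a minimal sketch of a WorkSpace-scoped memory store supporting the operations it lists: inspect, edit, delete, pin, consolidate, and roll back. Every name here (`WorkspaceMemory`, `consolidate`, `rollback`) is hypothetical, and the naive string-join merge merely stands in for whatever Dream Mode actually does; it shows the shape of the idea, not PilotDeck's implementation. Isolation comes from keeping one store instance per WorkSpace.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MemoryEntry:
    entry_id: int
    workspace: str
    text: str
    created_at: datetime
    pinned: bool = False  # pinned entries are never merged by consolidation


@dataclass
class WorkspaceMemory:
    """Hypothetical per-WorkSpace store: inspectable, editable, reversible."""
    workspace: str
    entries: dict[int, MemoryEntry] = field(default_factory=dict)
    _snapshots: list[dict[int, MemoryEntry]] = field(default_factory=list)
    _next_id: int = 1

    def add(self, text: str, pinned: bool = False) -> int:
        entry = MemoryEntry(self._next_id, self.workspace, text,
                            datetime.now(timezone.utc), pinned)
        self.entries[entry.entry_id] = entry
        self._next_id += 1
        return entry.entry_id

    def edit(self, entry_id: int, text: str) -> None:
        self.entries[entry_id].text = text

    def delete(self, entry_id: int) -> None:
        del self.entries[entry_id]

    def consolidate(self) -> None:
        """Stand-in for a Dream Mode pass: snapshot, then merge all unpinned entries."""
        self._snapshots.append(copy.deepcopy(self.entries))
        unpinned = [e for e in self.entries.values() if not e.pinned]
        if len(unpinned) < 2:
            return
        merged_text = "; ".join(e.text for e in unpinned)
        for e in unpinned:
            del self.entries[e.entry_id]
        self.add(merged_text)

    def rollback(self) -> None:
        """Restore the state captured before the most recent consolidation."""
        if self._snapshots:
            self.entries = self._snapshots.pop()


if __name__ == "__main__":
    a, b = WorkspaceMemory("A"), WorkspaceMemory("B")  # one store per WorkSpace
    a.add("Use a formal tone", pinned=True)
    a.add("Headings in title case")
    a.add("Max 3 bullets per slide")
    a.consolidate()
    print(len(a.entries), len(b.entries))  # 2 0: merged in A, B untouched
    a.rollback()
    print(len(a.entries))  # 3: pre-consolidation state restored
```

Snapshotting before each consolidation is what makes rollback a single pop; the pinned flag keeps critical decisions out of the merge.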

---

## 🖥️ UI & Demo

PilotDeck ships an out-of-the-box Web UI with full WorkSpace management, white-box memory editing, and visualization of multi-agent collaboration.

### Use Cases

> All demos below are generated entirely by edge-side models via PilotDeck's Smart Routing; no cloud-side frontier model is required.

#### Work Document Generation

> *"Survey the Chinese LLM application market and turn it into a formal HTML white paper."*


#### Mini-Game Development

> *"Walk me through building an iOS AR mini-game Ball Finder in Vibe Coding mode."*


#### AI Engineering Platform Development

> *"Build a low-code embedding fine-tuning platform from scratch."*


#### Audio-Video Editing & Social Media Operations

> *"Push this English podcast to a global audience in Chinese / Japanese / French / Korean / Spanish / Arabic."*

Result (with audio):

https://github.com/user-attachments/assets/a7245467-ee3c-4939-a055-c56576ac56d1

---

## 📦 Installation & Quick Start

We provide a one-line installer for macOS / Linux, plus a source-based workflow for developers.

### Option A: One-line install (recommended, macOS / Linux)

```bash
curl -fsSL https://raw.githubusercontent.com/OpenBMB/PilotDeck/main/install.sh | bash
```

The script auto-installs Node.js 22, clones the repo, installs dependencies, and builds the frontend. Once it finishes:

```bash
pilotdeck          # starts the server at http://localhost:3001
pilotdeck status   # check runtime status
```

### Option B: From source (for developers)

**1. Clone and install dependencies**

> This repo uses [Git LFS](https://git-lfs.com/) for large media assets. Make sure `git lfs` is installed before cloning.
> If you don't need the demo videos/GIFs, prefix `git clone` with `GIT_LFS_SKIP_SMUDGE=1` to skip downloading them.

```bash
git clone https://github.com/OpenBMB/PilotDeck.git
cd PilotDeck
npm install            # root deps (Gateway runtime)
cd ui && npm install   # UI deps
cd ..
```

**2. Configure a model provider**

PilotDeck reads `~/.pilotdeck/pilotdeck.yaml`. You can create it manually, let the bootstrap script generate one, **or simply open the Web UI and configure providers visually in the settings panel.** Supported providers include OpenAI, Anthropic, DeepSeek, Qwen, Kimi, MiniMax, and other OpenAI-compatible endpoints.

```yaml
schemaVersion: 1
agent:
  model: deepseek/deepseek-v4-pro
model:
  providers:
    deepseek:
      protocol: openai
      url: https://api.deepseek.com/v1
      apiKey: sk-your-api-key
```

**3. Start the services**

```bash
cd ui && npm run dev     # dev mode (HMR), visit http://localhost:5173
# or
cd ui && npm run start   # production mode, visit http://localhost:3001
```

### Option C: Docker Compose

If Docker is installed, you can start PilotDeck with:

```bash
docker compose up -d
```

---

## 🛠️ Extension Protocol

PilotDeck has an open plugin architecture with a strict boundary between the open-source core and plugin customization.
Extending the system is a `plugin.json` away:

- **MCP Servers**: first-class integration with any Model Context Protocol server.
- **Tools & Skills**: register custom tools, or pull community skills via [ClawHub](https://www.npmjs.com/package/clawhub).
- **Lifecycle Hooks**: intercept `PreToolUse`, `UserPromptSubmit`, and other critical lifecycle events.
- **Custom Memory**: plug in your own memory store provider.

---

## 🤝 Contributing

Thanks to everyone who has contributed code, feedback, and ideas. New contributors are warmly welcome; let's build the next-generation agent OS together.

Workflow: **Fork → feature branch → PR**.

---

## 💬 Community

- For bugs and feature requests, please open a [GitHub Issue](https://github.com/OpenBMB/PilotDeck/issues).
- Join our community channels:

WeChat Community · Feishu Community · Discord Community

---

## 🙏 Acknowledgements

We thank Agent OS pioneers such as OpenClaw, Claude Code, Codex, Cursor, and Hermes for explorations that helped shape this field. PilotDeck builds upon the following outstanding open-source projects:

- [ClawXRouter](https://github.com/OpenBMB/ClawXRouter): Intelligent model routing
- [ClawXMemory](https://github.com/OpenBMB/ClawXMemory): Agent memory system
- [Claude Code UI](https://github.com/siteboon/claudecodeui): Web UI reference
- [Claude Code Router](https://github.com/musistudio/claude-code-router): Model routing reference
- [UltraRAG](https://github.com/OpenBMB/UltraRAG): RAG framework
- [Anthropic Skills](https://github.com/anthropics/skills): Agent skill framework and built-in skills (skill-creator)
- [Vercel Labs Skills](https://github.com/vercel-labs/skills): find-skills skill
- [MiniMax-AI Skills](https://github.com/MiniMax-AI/skills): minimax-pdf skill
- [frontend-slides](https://github.com/zarazhangrui/frontend-slides): Create beautiful slides on the web using a coding agent's frontend skills
- [Karpathy Guidelines](https://x.com/karpathy/status/2015883857489522876): LLM coding behavioral guidelines
- [Vite](https://github.com/vitejs/vite): Frontend build tool
- [React](https://github.com/facebook/react): UI framework
- [Tailwind CSS](https://github.com/tailwindlabs/tailwindcss): Utility-first CSS framework
- [shadcn/ui](https://github.com/shadcn-ui/ui): Accessible component primitives for React

---

## 🏢 Joint Development

PilotDeck is jointly developed by Tsinghua University [THUNLP](https://nlp.csai.tsinghua.edu.cn/), [ModelBest](https://modelbest.cn/), [OpenBMB](https://www.openbmb.cn/), and [AI9Stars](https://github.com/AI9Stars).

---

## ⭐ Support Us

If PilotDeck has been helpful in your work or research, please consider giving us a Star on GitHub!
---

## 📝 Citation

```bibtex
@misc{pilotdeck2026,
  author       = {PilotDeck Team},
  title        = {PilotDeck: A WorkSpace-Centric Open-Source Agent Operating System},
  howpublished = {\url{https://github.com/OpenBMB/PilotDeck}},
  year         = {2026},
  note         = {Accessed: 2026-05-29}
}
```

## 📄 License

This project is licensed under the [GNU Affero General Public License v3.0](LICENSE).
