---
name: vllm-studio-backend
description: Use when working on vLLM Studio backend architecture (controller runtime, Pi-mono agent loop, OpenAI-compatible endpoints, LiteLLM gateway, inference process, and debugging commands).
---

# vLLM Studio Backend Architecture

## Overview

This skill explains how the backend is wired: the controller runtime, the OpenAI-compatible proxy, the Pi-mono agent loop, the LiteLLM gateway, and inference process management.

## When To Use

- Modifying controller routes or run streaming.
- Debugging OpenAI-compatible endpoint behavior.
- Updating the Pi-mono agent runtime or tool execution.
- Understanding how the inference process and LiteLLM fit together.

## Quick Start

- Read `references/backend-architecture.md` for the component map and data flow.
- Read `references/openai-compat.md` for `/v1/models` and `/v1/chat/completions` behavior.
- Read `references/backend-commands.md` for useful run/debug commands.
- See the example sketches at the end of this file for hedged client-side usage of the OpenAI endpoints and the run stream.

## Core Guarantees

- Keep the OpenAI-compatible endpoints stable (`/v1/models`, `/v1/chat/completions`).
- The `/chat` UI uses the controller run stream (`/chats/:id/turn`) and the Pi-mono runtime.
- Tool execution happens server-side (MCP + AgentFS + plan tools).

## References

- `references/backend-architecture.md`
- `references/openai-compat.md`
- `references/backend-commands.md`
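
## Example Sketches

A quick way to sanity-check the OpenAI-compatible endpoints is to point the standard `openai` Python client at the controller. This is a minimal sketch, not the project's test suite; the base URL, port, API key, and model name below are assumptions and should be replaced with your local values (see `references/openai-compat.md` for the documented behavior).

```python
# Minimal smoke test for the OpenAI-compatible endpoints.
# Assumptions: the controller listens on localhost:8000 and a model
# named "my-model" appears in /v1/models -- adjust both as needed.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed controller address
    api_key="sk-local",                   # placeholder; may be ignored locally
)

# /v1/models: list the models the gateway exposes.
for model in client.models.list():
    print(model.id)

# /v1/chat/completions: stream a short completion.
stream = client.chat.completions.create(
    model="my-model",  # assumed model id; pick one from the list above
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```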
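
The `/chats/:id/turn` run stream that backs the `/chat` UI is internal, and its actual wire format is documented in `references/backend-architecture.md`. The sketch below only illustrates the general shape of consuming a streamed turn, assuming an SSE-style response; the URL, payload fields, and framing are assumptions, not the documented contract.

```python
# Hypothetical client for the controller run stream.
# Assumptions: POST /chats/:id/turn accepts a JSON body carrying the
# user message and responds with SSE-framed "data: ..." lines -- verify
# the real contract in references/backend-architecture.md first.
import json

import requests

CHAT_ID = "demo-chat"  # placeholder chat id
resp = requests.post(
    f"http://localhost:8000/chats/{CHAT_ID}/turn",  # assumed controller address
    json={"message": "List the files in the workspace."},
    stream=True,
    timeout=60,
)
resp.raise_for_status()

# Iterate the streamed response line by line and decode each SSE event.
for raw in resp.iter_lines():
    if not raw:
        continue
    line = raw.decode("utf-8")
    if line.startswith("data: "):
        event = json.loads(line[len("data: "):])
        print(event)  # e.g. token deltas, tool-call events, run status
```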