# AutoGLM-GUI - AI Agent Usage Guide

**If you are an AI Agent:** This is your complete guide to installing, configuring, and using AutoGLM-GUI via command line and API. If you are a human user, please refer to [README.md](./README.md).

## Overview

AutoGLM-GUI is a phone agent platform that lets you control Android devices using natural language. You send a text instruction (e.g., "open WeChat and send a message"), and the vision model executes it by interacting with the device screen.

**Two ways to interact programmatically:**

| Method | Best For | Protocol |
|--------|----------|----------|
| **MCP** | Claude, Cursor, and other MCP-compatible AI agents | Model Context Protocol over HTTP |
| **REST API** | Any agent that can make HTTP calls | JSON over HTTP |

---

## Prerequisites

Before starting, confirm the following with the user:

- [ ] **Model API access**: The user must provide one of:
  - A third-party API endpoint (e.g., ZhiPu BigModel, ModelScope)
  - A self-hosted model server URL (e.g., vLLM, SGLang)
- [ ] **Android device**: Connected via USB or WiFi (ADB debugging enabled)
- [ ] **Python 3.11+** or **uv** installed on the system

> If the user has `uv` installed, you do NOT need Python pre-installed; `uv` will manage Python automatically.

---

## Step 1: Install AutoGLM-GUI

Install and start the server in one command. Choose based on what's available:

### Option A: Using uvx (Recommended)

No permanent installation needed. `uvx` runs in an isolated environment.

```bash
uvx autoglm-gui \
  --base-url {MODEL_API_URL} \
  --model {MODEL_NAME} \
  --apikey {API_KEY} \
  --no-browser \
  --port 8000
```

### Option B: Using pip

```bash
pip install autoglm-gui
autoglm-gui \
  --base-url {MODEL_API_URL} \
  --model {MODEL_NAME} \
  --apikey {API_KEY} \
  --no-browser \
  --port 8000
```

### Option C: Using Docker

```bash
docker run -d --name autoglm --network host \
  -e AUTOGLM_BASE_URL={MODEL_API_URL} \
  -e AUTOGLM_MODEL_NAME={MODEL_NAME} \
  -e AUTOGLM_API_KEY={API_KEY} \
  -v autoglm_config:/root/.config/autoglm \
  ghcr.io/suyiiyii/autoglm-gui:main
```

### Parameter Reference

Replace these placeholders with actual values from the user:

| Placeholder | Description | Example |
|-------------|-------------|---------|
| `{MODEL_API_URL}` | Base URL of the vision model API | `https://open.bigmodel.cn/api/paas/v4` |
| `{MODEL_NAME}` | Model name | `autoglm-phone` |
| `{API_KEY}` | API key (omit `--apikey` if not required) | `sk-xxxxxxxx` |

### Common Model Configurations

**ZhiPu BigModel (recommended for beginners):**

```bash
uvx autoglm-gui \
  --base-url https://open.bigmodel.cn/api/paas/v4 \
  --model autoglm-phone \
  --apikey {API_KEY} \
  --no-browser --port 8000
```

**ModelScope:**

```bash
uvx autoglm-gui \
  --base-url https://api-inference.modelscope.cn/v1 \
  --model ZhipuAI/AutoGLM-Phone-9B \
  --apikey {API_KEY} \
  --no-browser --port 8000
```

**Self-hosted (vLLM/SGLang):**

```bash
uvx autoglm-gui \
  --base-url http://localhost:8080/v1 \
  --model autoglm-phone-9b \
  --no-browser --port 8000
```

**Expected output (all options):**

```
==================================================
  AutoGLM-GUI - Phone Agent Web Interface
==================================================
  Version: 1.5.12
  Server: http://127.0.0.1:8000
  Model Configuration:
    Source: CLI arguments
    Base URL: https://open.bigmodel.cn/api/paas/v4
    Model: autoglm-phone
    API Key: (configured)
==================================================
  Press Ctrl+C to stop
==================================================
```

**If you see `WARNING: base_url is not configured!`:** The `--base-url` parameter is missing or incorrect. Re-run the command with the correct URL.

> IMPORTANT: The server runs in the foreground. To run it in the background, append `&` (Unix) or use `nohup`. For Docker, the `-d` flag already handles this.

---

## Step 2: Verify Server is Running

```bash
curl -s http://127.0.0.1:8000/api/health
```

**Expected output:**

```json
{"status":"healthy","version":"1.5.12"}
```

**If it fails:** The server is not running or the port is different. Check the server output for the actual port number (auto-detected if 8000 is occupied).

---

## Step 3: Verify Android Device Connection

```bash
curl -s http://127.0.0.1:8000/api/devices | python3 -m json.tool
```

**Expected output (device connected):**

```json
{
  "devices": [
    {
      "id": "192.168.1.100:5555",
      "serial": "192.168.1.100:5555",
      "model": "Pixel 7",
      "status": "device",
      "connection_type": "wifi",
      "state": "online",
      "agent": null
    }
  ]
}
```

**If `"devices": []` (empty list):**

1. No Android device is connected. Ask the user to connect a device via USB or WiFi.
2. Ensure USB debugging is enabled on the device (Settings > Developer Options > USB Debugging).
3. ADB is automatically downloaded on first startup. If it fails, the user can install ADB manually.

Save the `id` field value; you will need it for all subsequent API calls. We refer to this as `{DEVICE_ID}` below.

---

## Using AutoGLM-GUI

Choose one of the two integration methods below.

---

## Method A: MCP Integration (for Claude, Cursor, etc.)

MCP (Model Context Protocol) is the cleanest integration path for AI agents that support it.

### MCP Endpoint

```
http://127.0.0.1:8000/mcp
```

### Available MCP Tools

| Tool | Parameters | Description |
|------|-----------|-------------|
| `list_devices()` | None | List all connected Android devices and their status |
| `chat(device_id, message)` | `device_id`: str, `message`: str | Send a natural language task to the phone agent (max 5 steps) |

### Claude Desktop Configuration

Add this to your Claude Desktop MCP config file:

**macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`

**Windows:** `%APPDATA%\Claude\claude_desktop_config.json`

```json
{
  "mcpServers": {
    "autoglm-gui": {
      "url": "http://127.0.0.1:8000/mcp"
    }
  }
}
```

### Claude Code (CLI) Configuration

```bash
claude mcp add autoglm-gui --transport http http://127.0.0.1:8000/mcp
```

### MCP Usage Example

Once configured, you can use the tools directly in conversation:

1. Call `list_devices()` to get the device ID
2. Call `chat(device_id="{DEVICE_ID}", message="open Settings")` to execute a task

**Example response from `chat`:**

```json
{
  "result": "Successfully opened Settings app",
  "steps": 2,
  "success": true
}
```

### MCP Limitations

- Each `chat` call is limited to **5 steps** maximum
- If a task requires more steps, break it into smaller subtasks
- Each `chat` call resets the agent state (no conversation memory between calls)

---

## Method B: REST API (for Any Agent)

### Core Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/health` | GET | Health check |
| `/api/devices` | GET | List connected devices |
| `/api/chat` | POST | Execute a task (synchronous, waits for completion) |
| `/api/chat/stream` | POST | Execute a task with SSE streaming progress |
| `/api/chat/abort` | POST | Abort a running task |
| `/api/screenshot` | POST | Capture device screenshot (base64 PNG) |
| `/api/status` | GET | Get agent status |
| `/api/reset` | POST | Reset agent state |
| `/api/config` | GET | Get current configuration |
| `/api/config` | POST | Update configuration |

### Execute a Task (Synchronous)

```bash
curl -X POST http://127.0.0.1:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"device_id": "{DEVICE_ID}", "message": "open Settings"}'
```

**Expected output:**

```json
{
  "result": "Successfully opened Settings app",
  "steps": 2,
  "success": true
}
```

**If `"success": false`:** Check the `result` field for the error message. Common causes:

- Device is disconnected → verify with `GET /api/devices`
- Model API error → check the model configuration

### Execute a Task (Streaming)

For long-running tasks, use SSE streaming to get real-time progress:

```bash
curl -N -X POST http://127.0.0.1:8000/api/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"device_id": "{DEVICE_ID}", "message": "open WeChat and send hello to File Transfer"}'
```

**SSE event types:**

| Event | Data Fields | Meaning |
|-------|-------------|---------|
| `step` | `thinking`, `action`, `step` | Agent executed one step |
| `done` | `message`, `success`, `steps` | Task completed |
| `error` | `message`, `hint` | Error occurred |
| `cancelled` | `message` | Task was aborted |

### Take a Screenshot

```bash
curl -X POST http://127.0.0.1:8000/api/screenshot \
  -H "Content-Type: application/json" \
  -d '{"device_id": "{DEVICE_ID}"}'
```

**Expected output:**

```json
{
  "success": true,
  "image": "iVBORw0KGgoAAAANSUhEUgAA...",
  "width": 1080,
  "height": 2400,
  "is_sensitive": false
}
```

The `image` field is a base64-encoded PNG. Decode it to view the screenshot.

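
The decoding step can be sketched in Python as follows. This is a minimal illustration, not part of AutoGLM-GUI: the helper name `decode_screenshot` is hypothetical, and it assumes the response has already been parsed into a dict with the `success` and `image` fields shown in the expected output. It also checks the standard PNG file signature so that a truncated or non-image payload fails early.

```python
import base64

# Standard 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(payload: dict) -> bytes:
    """Return raw PNG bytes from a parsed /api/screenshot response.

    Hypothetical helper: assumes `payload` carries the `success` and
    `image` fields documented above.
    """
    if not payload.get("success"):
        raise ValueError("screenshot request did not succeed")
    # validate=True rejects characters outside the base64 alphabet.
    png_bytes = base64.b64decode(payload["image"], validate=True)
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with a PNG signature")
    return png_bytes
```

To view the image, write the returned bytes to a file opened in binary mode, for example `open("screenshot.png", "wb").write(decode_screenshot(payload))`.
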
### Abort a Running Task

```bash
curl -X POST http://127.0.0.1:8000/api/chat/abort \
  -H "Content-Type: application/json" \
  -d '{"device_id": "{DEVICE_ID}"}'
```

### Update Configuration at Runtime

```bash
curl -X POST http://127.0.0.1:8000/api/config \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "https://open.bigmodel.cn/api/paas/v4",
    "model_name": "autoglm-phone",
    "api_key": "sk-xxxxxxxx"
  }'
```

---

## Layered Agent API (Advanced)

The layered agent uses a **decision model** (e.g., GPT-4, Claude) for planning and the **vision model** for execution. This is useful for complex multi-step tasks.

### Prerequisites

The layered agent requires a separate decision model configuration. Set it via the config API:

```bash
curl -X POST http://127.0.0.1:8000/api/config \
  -H "Content-Type: application/json" \
  -d '{
    "decision_base_url": "{DECISION_MODEL_URL}",
    "decision_model_name": "{DECISION_MODEL_NAME}",
    "decision_api_key": "{DECISION_API_KEY}"
  }'
```

### Execute a Complex Task

```bash
curl -N -X POST http://127.0.0.1:8000/api/layered-agent/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "open WeChat, find the contact Alice, and send her: meeting at 3pm tomorrow",
    "device_id": "{DEVICE_ID}",
    "session_id": "session-001"
  }'
```

**SSE event types:**

| Event | Data Fields | Meaning |
|-------|-------------|---------|
| `tool_call` | `tool_name`, `tool_args` | Planner is calling a tool |
| `tool_result` | `tool_name`, `result` | Tool returned a result |
| `message` | `content` | Planner message |
| `done` | `content`, `success` | Task completed |
| `error` | `message` | Error occurred |

The `session_id` parameter maintains conversation context across multiple calls. Use the same `session_id` for follow-up tasks within the same session.

---

## Verification Checklist

After setup, verify everything works by running these checks in order:

```bash
# 1. Server is running
curl -s http://127.0.0.1:8000/api/health
# Expected: {"status":"healthy","version":"..."}

# 2. Device is connected
curl -s http://127.0.0.1:8000/api/devices
# Expected: "devices" array contains at least one device with "state":"online"

# 3. Agent can execute a task
curl -X POST http://127.0.0.1:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"device_id": "{DEVICE_ID}", "message": "what is on the screen right now?"}'
# Expected: {"result":"...","steps":1,"success":true}
```

**Success criteria:** All three checks pass. The third check returns `"success": true`.

---

## Troubleshooting

| Error | Cause | Fix |
|-------|-------|-----|
| `connection refused` on port 8000 | Server not running or port conflict | Start the server; check if port 8000 is in use with `lsof -i :8000` |
| `"devices": []` empty list | No Android device connected | Connect device via USB; enable USB debugging in Developer Options |
| `Device {id} is busy` (HTTP 409) | Another task is running on this device | Wait for the current task to finish, or call `/api/chat/abort` |
| `初始化失败` ("initialization failed", HTTP 500) | Model API misconfigured | Verify `base_url`, `model_name`, and `api_key` via `GET /api/config` |
| `WARNING: base_url is not configured!` | Missing `--base-url` flag | Restart with `--base-url` or set via `POST /api/config` |
| `Max steps reached` with `success: false` | Task too complex for step limit | Break the task into smaller subtasks |
| `SCRCPY_SERVER_PATH` error | scrcpy-server binary not found | This is bundled in the pip package; reinstall with `pip install --force-reinstall autoglm-gui` |
| ADB download fails | Network issue during auto-download | Set `AUTOGLM_ADB_PATH` to a manually installed ADB path, or install ADB via system package manager |

---

## Environment Variables

All settings can be configured via environment variables instead of CLI flags:

| Variable | CLI Flag | Default |
|----------|----------|---------|
| `AUTOGLM_BASE_URL` | `--base-url` | (none) |
| `AUTOGLM_MODEL_NAME` | `--model` | `autoglm-phone-9b` |
| `AUTOGLM_API_KEY` | `--apikey` | (none) |
| `AUTOGLM_ADB_PATH` | (none) | auto-detect |
| `AUTOGLM_LOG_LEVEL` | `--log-level` | `INFO` |
| `AUTOGLM_CORS_ORIGINS` | (none) | `http://localhost:3000` |

---

## Command Quick Reference

```bash
# Install and start (one-liner)
uvx autoglm-gui --base-url {MODEL_API_URL} --model {MODEL_NAME} --apikey {API_KEY} --no-browser --port 8000

# Health check
curl -s http://127.0.0.1:8000/api/health

# List devices
curl -s http://127.0.0.1:8000/api/devices

# Execute task
curl -X POST http://127.0.0.1:8000/api/chat -H "Content-Type: application/json" -d '{"device_id":"{DEVICE_ID}","message":"open Settings"}'

# Screenshot
curl -X POST http://127.0.0.1:8000/api/screenshot -H "Content-Type: application/json" -d '{"device_id":"{DEVICE_ID}"}'

# Abort task
curl -X POST http://127.0.0.1:8000/api/chat/abort -H "Content-Type: application/json" -d '{"device_id":"{DEVICE_ID}"}'

# Get config
curl -s http://127.0.0.1:8000/api/config

# MCP endpoint (for Claude/Cursor)
# http://127.0.0.1:8000/mcp
```

---

## Notes

- **No authentication required:** The API server does not require Bearer tokens or API keys for its own endpoints. The `--apikey` flag is for the upstream model API, not for accessing AutoGLM-GUI itself.
- **Single device concurrency:** Each device can only run one task at a time. Attempting to send a second task to a busy device returns HTTP 409.
- **ADB auto-download:** If ADB is not found in PATH, AutoGLM-GUI will automatically download Android Platform Tools (~12MB) to `~/.cache/autoglm/platform-tools/`.
- **Coordinate system:** Touch coordinates (`/api/control/tap`, `/api/control/swipe`) use a 0-10000 normalized range, not pixel coordinates.

## ⚠️ Important

- **CRITICAL:** AutoGLM-GUI controls a real Android device. Tasks like "delete all messages" or "uninstall apps" will execute immediately with no undo. Always confirm destructive actions with the user before sending them to the agent.
- **CRITICAL:** The `--apikey` value is sensitive. Do not log it or include it in error reports.
- **IMPORTANT:** The `message` field in `/api/chat` has a 10,000 character limit.
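
Two of the limits noted above can be checked on the client before a request is sent: the 10,000-character cap on `message` and the 0-10000 normalized coordinate range. The Python sketch below is illustrative only. The helper names `check_message` and `to_normalized` are hypothetical, and the code rests on two assumptions not confirmed by this guide: that the server counts characters the way Python's `len()` does, and that pixel coordinates map linearly onto the normalized range with rounding to the nearest integer.

```python
MAX_MESSAGE_CHARS = 10_000  # limit stated for the `message` field of /api/chat
NORMALIZED_MAX = 10_000     # upper bound of the normalized touch coordinate range


def check_message(message: str) -> str:
    """Return the message unchanged, or raise if it exceeds the documented limit.

    Assumption: the limit is measured in characters as counted by len().
    """
    if len(message) > MAX_MESSAGE_CHARS:
        raise ValueError(
            f"message has {len(message)} characters; the limit is {MAX_MESSAGE_CHARS}"
        )
    return message


def to_normalized(x_px: int, y_px: int, width: int, height: int) -> tuple[int, int]:
    """Map a pixel position on a width x height screen to the 0-10000 range.

    Assumption: linear scaling, rounded to the nearest integer.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if not (0 <= x_px <= width and 0 <= y_px <= height):
        raise ValueError("pixel position lies outside the screen")
    return (
        round(x_px * NORMALIZED_MAX / width),
        round(y_px * NORMALIZED_MAX / height),
    )
```

Under these assumptions, the center of a 1080x2400 screen, pixel (540, 1200), maps to (5000, 5000). The request body expected by `/api/control/tap` and `/api/control/swipe` is not described in this guide, so the sketch stops at the coordinate conversion.
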