# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

AutoGLM-GUI is a modern web-based graphical interface for AutoGLM Phone Agent, enabling AI-powered Android device automation through a conversational interface with real-time screen monitoring.

**Key Technologies:**
- **Backend**: FastAPI (Python 3.10+) with WebSocket support
- **Frontend**: React 19 + TanStack Router + Tailwind CSS 4
- **Phone Integration**: ADB (Android Debug Bridge) + scrcpy for video streaming
- **Package Manager**: `uv` for Python, `pnpm` for frontend

## Development Commands

### Backend Development

All Python commands MUST use `uv run python` in the project root directory. Never execute `python` directly.

```bash
# Install dependencies
uv sync

# Run backend with auto-reload (development)
uv run autoglm-gui --base-url http://localhost:8080/v1 --reload

# Run backend (production mode)
uv run autoglm-gui --base-url https://open.bigmodel.cn/api/paas/v4 \
  --model autoglm-phone \
  --apikey sk-xxxxx

# Run with custom log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
uv run autoglm-gui --base-url http://localhost:8080/v1 --log-level DEBUG

# Disable file logging (console only)
uv run autoglm-gui --base-url http://localhost:8080/v1 --no-log-file

# Custom log file path
uv run autoglm-gui --base-url http://localhost:8080/v1 --log-file logs/custom.log
```

### Frontend Development

```bash
# Install dependencies
cd frontend && pnpm install

# Development server (runs on port 3000)
cd frontend && pnpm dev

# Type checking
cd frontend && pnpm type-check

# Linting
cd frontend && pnpm lint
cd frontend && pnpm lint:fix

# Format code
cd frontend && pnpm format
cd frontend && pnpm format:check
```

### Building and Packaging

```bash
# Build frontend only (required before running backend)
uv run python scripts/build.py

# Build frontend + create Python package
uv run python scripts/build.py --pack

# Test built package locally
uvx --from dist/autoglm_gui-*.whl autoglm-gui

# Publish to PyPI
uv publish
```

### Electron Desktop Application

```bash
# One-click build (all platforms)
uv run python scripts/build_electron.py

# Build with skip options (faster incremental builds)
uv run python scripts/build_electron.py --skip-frontend  # Skip frontend rebuild
uv run python scripts/build_electron.py --skip-adb       # Skip ADB download
uv run python scripts/build_electron.py --skip-backend   # Skip backend repackaging

# Development mode (test Electron without building)
cd electron && npm run dev

# Build Electron only (requires resources prepared)
cd electron && npm run build
```

**Build Output**:
- **macOS**: `electron/dist/AutoGLM GUI-{version}-arm64.dmg`
- **Windows**: `electron/dist/AutoGLM GUI Setup {version}.exe` (installer)
- **Windows**: `electron/dist/AutoGLM GUI {version}.exe` (portable)
- **Linux**: `electron/dist/AutoGLM GUI-{version}.AppImage` (universal)
- **Linux**: `electron/dist/autoglm-gui_{version}_amd64.deb` (Debian/Ubuntu)
- **Linux**: `electron/dist/AutoGLM GUI-{version}.tar.gz` (portable)

## Configuration Management

### Configuration File

AutoGLM-GUI supports persistent configuration stored in `~/.config/autoglm/config.json`:

```json
{
  "base_url": "http://localhost:8080/v1",
  "model_name": "autoglm-phone-9b",
  "api_key": "sk-xxxxx"
}
```

### Configuration Priority

Configuration is loaded with the following priority (highest to lowest):

1. **CLI Arguments** (highest priority) - Override everything else
2. **Config File** (`~/.config/autoglm/config.json`) - Persistent settings
3. **Default Values** (lowest priority) - Built-in defaults
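
The priority order above amounts to a layered merge. The following is a minimal sketch, assuming each source is a plain dict and that an unset option is `None`. `resolve_config` and `DEFAULTS` are hypothetical names for illustration, not the actual `config_manager.py` API, and the default values shown are only examples.

```python
from typing import Any, Optional

# Illustrative defaults only; the real built-in values live in the backend code.
DEFAULTS: dict[str, Any] = {
    "base_url": None,
    "model_name": "autoglm-phone-9b",
    "api_key": "EMPTY",
}


def resolve_config(
    cli: dict[str, Optional[str]],
    file: dict[str, Optional[str]],
    defaults: dict[str, Any] = DEFAULTS,
) -> dict[str, Any]:
    """Merge sources so that CLI beats the config file, which beats defaults."""
    merged = dict(defaults)
    for source in (file, cli):  # later sources override earlier ones
        for key, value in source.items():
            if value is not None:  # an unset option must not mask a lower layer
                merged[key] = value
    return merged
```

Skipping `None` values is the important detail: a CLI flag that was not passed should fall through to the config file instead of overwriting it.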

### Usage Examples

**First Time Setup (via Frontend)**:
1. Start: `uv run autoglm-gui`
2. Frontend opens config modal automatically (if no base_url configured)
3. Fill in `base_url`, `model_name`, `api_key`
4. Click "保存配置" (Save Configuration)
5. Configuration is saved to `~/.config/autoglm/config.json`

**Using Config File**:
```bash
# Start with saved configuration
uv run autoglm-gui

# The startup banner will show:
#   Configuration Source: config file (~/.config/autoglm/config.json)
```

**Using CLI Arguments (Override Config)**:
```bash
# CLI arguments override config file
uv run autoglm-gui --base-url http://localhost:8080/v1 --model autoglm-phone-9b

# The startup banner will show:
#   Configuration Source: CLI arguments
```

**Managing Config**:
- **View**: Click "全局配置" (Global Config) button in frontend sidebar
- **Edit**: Update via frontend modal and click "保存配置"
- **Delete**: Remove `~/.config/autoglm/config.json` manually
- **Check Current**: Backend startup banner shows config source

### Configuration API Endpoints

The frontend uses these API endpoints for configuration management:

- `GET /api/config` - Read current effective configuration
- `POST /api/config` - Save configuration to file
- `DELETE /api/config` - Delete configuration file

## ADB Keyboard Auto-Management

### Automatic Installation

AutoGLM-GUI automatically checks and installs ADB Keyboard when the Phone Agent is initialized:

1. **Per-Device Check**: Checks ADB Keyboard status only for the device being initialized
2. **Status Check**: Checks installation and enablement status on the device
3. **Auto Install**: If not installed, automatically installs the APK
4. **Auto Enable**: If not enabled, automatically enables the IME
5. **Logging**: All operations are logged to the log file

### APK Sources

Priority order:
1. **Bundled APK**: `AutoGLM_GUI/resources/apks/ADBKeyboard.apk` (included in wheel)
2. **Cached APK**: `~/.cache/autoglm/ADBKeyboard.apk`
3. **GitHub Download**: https://github.com/senzhk/ADBKeyBoard
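
The lookup order above can be sketched as "first existing file wins". This is an illustration only: `pick_local_apk` and `default_candidates` are hypothetical helpers, not the real `keyboard_installer.py` functions, and resolving the bundled path relative to the working directory is an assumption.

```python
from pathlib import Path
from typing import Iterable, Optional


def pick_local_apk(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None."""
    for path in candidates:
        if path.is_file():
            return path
    return None  # caller would fall back to the GitHub download


def default_candidates() -> list[Path]:
    """Local APK locations in priority order (paths taken from the list above)."""
    return [
        Path("AutoGLM_GUI/resources/apks/ADBKeyboard.apk"),
        Path.home() / ".cache" / "autoglm" / "ADBKeyboard.apk",
    ]
```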

### Auto-Setup Timing

ADB Keyboard is now checked and installed automatically when the frontend initializes the Phone Agent via `/api/init` endpoint, not during server startup. This provides:

- **Faster server startup**: No device scanning during startup
- **Per-device checking**: Only checks devices when they are actually used
- **Better user experience**: Installation progress is visible in the frontend

### Manual Installation

If automatic installation fails, you can install manually:

1. Download APK: https://github.com/senzhk/ADBKeyBoard/releases
2. Install: `adb install -r ADBKeyboard.apk`
3. Enable: Settings → Language & Input → Enable "ADB Keyboard"

### License Notice

ADB Keyboard uses the **GPL-2.0** license, which differs from AutoGLM-GUI's Apache 2.0 license.
The APK file is bundled as an independent third-party component. When using it, you must comply with GPL-2.0 terms.

See: `AutoGLM_GUI/resources/apks/ADBKeyBoard.LICENSE.txt`

### Troubleshooting

**Issue**: Xiaomi devices cannot enable ADB Keyboard without root

**Solution**: See https://github.com/zai-org/Open-AutoGLM/issues/24

**Issue**: APK download fails (network unreachable)

**Solution**: The APK is bundled in the wheel, so no download is needed under normal circumstances

**Issue**: Device doesn't support ADB Keyboard

**Solution**: Check if the device allows third-party input methods, or try rooting the device

## Architecture

### Request Flow

**Basic Agent Flow**:
1. **User Chat Request** → Frontend (`ChatKitPanel.tsx`) → API (`/api/chat`) → Backend (`api/agents.py`)
2. **PhoneAgentManager** → `run_chat()` acquires device lock, gets or creates agent
3. **Agent.run()** → Orchestrates multi-step task execution
4. **Each Step**: Screenshot → LLM API (with vision) → `ActionHandler` → ADB execution
5. **Streaming Updates** → SSE (Server-Sent Events) → Frontend updates in real-time

**Layered Agent Flow** (NEW):
1. **User Request** → Frontend → API (`/api/layered/chat`) → Backend (`api/layered_agent.py`)
2. **Decision Model** → Plans high-level strategy using `openai-agents` session
3. **Function Tools** → Calls `do()` tool for device actions or `chat()` for information
4. **Vision Model** → PhoneAgent executes `do()` actions on device
5. **Session Persistence** → SQLiteSession stores conversation history
6. **Streaming** → SSE streams both decision thinking and execution updates

**Video Streaming Flow**:
1. **Frontend** → Socket.IO `connect-device` event → Backend (`socketio_server.py`)
2. **ScrcpyStreamer** → Starts scrcpy server on device, connects TCP socket (port 27183)
3. **H.264 Stream** → NAL units → Backend caches SPS/PPS/IDR frames
4. **Socket.IO** → Emits `video-data` events → Frontend (`ScrcpyPlayer.tsx`)
5. **jmuxer** → Decodes H.264 → Canvas rendering with letterbox
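
Step 3 of the video flow (caching SPS/PPS/IDR so a late-joining client can start decoding immediately) can be sketched as follows. This is a simplified illustration, not the `ScrcpyStreamer` implementation: it assumes an Annex B byte stream with 4-byte start codes only, and `split_nal_units` and `ParameterCache` are hypothetical names. The NAL unit type values (7 = SPS, 8 = PPS, 5 = IDR slice) come from the H.264 specification.

```python
from typing import Iterator

START_CODE = b"\x00\x00\x00\x01"
NAL_SPS, NAL_PPS, NAL_IDR = 7, 8, 5  # H.264 nal_unit_type values


def split_nal_units(stream: bytes) -> Iterator[bytes]:
    """Yield NAL units (without start codes) from an Annex B byte stream.

    Simplification: only 4-byte start codes are handled.
    """
    for chunk in stream.split(START_CODE):
        if chunk:
            yield chunk


class ParameterCache:
    """Remember the latest SPS, PPS and IDR NAL units for late joiners."""

    def __init__(self) -> None:
        self.cached: dict[int, bytes] = {}

    def feed(self, nal: bytes) -> None:
        nal_type = nal[0] & 0x1F  # low 5 bits of the NAL header byte
        if nal_type in (NAL_SPS, NAL_PPS, NAL_IDR):
            self.cached[nal_type] = nal

    def init_data(self) -> bytes:
        """Bytes to send first to a newly connected client."""
        order = (NAL_SPS, NAL_PPS, NAL_IDR)
        return b"".join(START_CODE + self.cached[t] for t in order if t in self.cached)
```

A new client would receive `init_data()` before the live NAL units, so the decoder has its parameter sets and a keyframe without waiting for the next IDR.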

### Backend Architecture (`AutoGLM_GUI/`)

**Modular FastAPI Application**:
- **`server.py`**: Wrapper that imports the FastAPI app from `api/__init__.py`
- **`api/__init__.py`**: App factory pattern with modular routers:
  - `agents.py` - Agent lifecycle (init, chat, reset, abort, status)
  - `layered_agent.py` - Hierarchical execution with planning and execution layers
  - `devices.py` - Device discovery/management (list, WiFi, mDNS, QR pairing)
  - `control.py` - Direct device control (tap, swipe, screenshot)
  - `media.py` - Screenshot/video endpoints
  - `metrics.py` - Prometheus metrics
  - `version.py` - Version information
  - `workflows.py` - Workflow execution

**Core Backend Modules**:
- **`device_manager.py`**: Singleton managing device discovery and state
  - Two-layer device ID system (device_id for ADB, serial for aggregation)
  - Background polling thread (~2s intervals)
  - Connection priority: USB > WiFi > mDNS
  - mDNS discovery support
- **`phone_agent_manager.py`**: Singleton managing PhoneAgent lifecycle
  - Agent states: IDLE, BUSY, ERROR, INITIALIZING
  - Per-device locking (RLock) for concurrency control
  - Configuration hot-reload support
  - Streaming chat with abort capability
  - **NEW**: Agent storage (agents, configs) is now internal to the singleton
  - **Removed**: Dependency on global state.agents and state.agent_configs
- **`scrcpy_stream.py`**: `ScrcpyStreamer` class manages scrcpy server lifecycle and H.264 video streaming
  - Spawns scrcpy-server process on device
  - Handles TCP socket for video data
  - Caches SPS/PPS/IDR frames for new client connections
  - Critical: Uses bundled `scrcpy-server-v3.3.3` binary (must be in project root and package)
- **`socketio_server.py`**: Socket.IO integration for real-time video streaming
  - Events: `connect-device`, `disconnect`
  - Emits: `video-metadata`, `video-data`, `error`
- **`config_manager.py`**: Type-safe configuration management with Pydantic
  - Config file: `~/.config/autoglm/config.json`
  - Hot-reload support (mtime checking)
  - Validation for URLs and model names
- **`logger.py`**: Centralized logging configuration using loguru
  - Provides colorized console output with timestamps, levels, and source locations
  - Automatic file logging with rotation (100MB) and retention (7 days)
  - Separate error log files (50MB rotation, 30 days retention)
  - Configurable via CLI parameters (--log-level, --log-file, --no-log-file)
- **`platform_utils.py`**: Cross-platform subprocess management
  - Async command execution (event loop safe)
  - Windows compatibility (subprocess.run vs asyncio)
- **`adb_plus/`**: Extended ADB utilities
  - `device.py` - Device availability and info
  - `screenshot.py` - Screenshot capture
  - `keyboard_installer.py` - ADB Keyboard auto-setup (GPL-2.0)
  - `qr_pair.py` - QR code pairing for wireless debugging
  - `serial.py` - Hardware serial extraction
  - `ip.py` - WiFi IP retrieval
  - `mdns.py` - mDNS device discovery
  - `touch.py` - Touch/swipe primitives
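
The mtime-based hot reload listed for `config_manager.py` above can be illustrated with a small sketch. It assumes a plain JSON file and skips the Pydantic validation the real module performs; `ReloadingConfig` is a hypothetical class name.

```python
import json
import os
from pathlib import Path
from typing import Any, Optional


class ReloadingConfig:
    """Re-read a JSON file only when its modification time changes."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._mtime_ns: Optional[int] = None
        self._data: dict[str, Any] = {}

    def get(self) -> dict[str, Any]:
        try:
            mtime_ns = os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            # No file yet: behave as an empty config.
            self._mtime_ns, self._data = None, {}
            return self._data
        if mtime_ns != self._mtime_ns:  # file is new or was rewritten
            self._data = json.loads(self.path.read_text(encoding="utf-8"))
            self._mtime_ns = mtime_ns
        return self._data
```

Comparing modification times keeps repeated reads cheap: the file is parsed again only after it has been saved.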

### Internal Agents (`AutoGLM_GUI/agents/`)

Internal implementations of automation agents:

- **`factory.py`**: Agent factory using registry pattern for creating different agent types.
- **`protocols.py`**: Base interfaces for all agents.
  - `BaseAgent`: Synchronous agent interface (legacy)
  - `AsyncAgent`: Asynchronous agent interface (new, supports immediate cancellation)
- **`glm/`**: GLM-based agent implementation.
  - `async_agent.py`: **AsyncGLMAgent** - Default async implementation using `AsyncOpenAI`
    - Native streaming with `async for event in agent.stream()`
    - Immediate cancellation with `await agent.cancel()` (<1s response)
    - Uses `asyncio.to_thread()` for sync device operations
  - `agent.py`: GLMAgent - Legacy sync implementation (use `agent_type: "glm-sync"` to enable)
- **`mai/`**: Internalized MAI Agent (Mobile Agent) with multi-image support.
- **`stream_runner.py`**: SSE streamer for agent execution steps (legacy, for BaseAgent compatibility).

### Action System (`AutoGLM_GUI/actions/`)

Executes actions parsed from LLM outputs:
- **`handler.py`**: Maps high-level actions (Tap, Swipe, Type) to ADB commands.
- **`types.py`**: Data models for action results.

### Device Identification (Two-Layer System)

AutoGLM-GUI uses a two-layer device identification system:

**Layer 1: `device_id` (ADB Execution Layer)**
- **Purpose**: Identifier passed to ADB commands (`adb -s {device_id}`)
- **Format depends on connection type**:
  - USB: Hardware serial number (e.g., `ABC123DEF456`)
  - WiFi: IP address and port (e.g., `192.168.1.100:5555`)
  - mDNS: Service name (e.g., `adb-243a09b7._adb-tls-connect._tcp`)
- **Usage**: API endpoints, PhoneAgent initialization, ADB command execution
- **Note**: `device_id` changes when connection method changes

**Layer 2: `serial` (Device Aggregation Layer)**
- **Purpose**: Stable, unique identifier for device aggregation in DeviceManager
- **Format**: Hardware serial number from `ro.serialno` property (e.g., `ABC123DEF456`)
- **Usage**: Internal device management, connection aggregation
- **Note**: `serial` never changes regardless of connection method

**Connection Switching Behavior**:
```
Example: Device initially connected via USB
  - device_id: "ABC123DEF456" (USB serial)
  - serial: "ABC123DEF456"

User switches to WiFi debugging
  - device_id: "192.168.1.100:5555" (WiFi IP:port) ← Changed!
  - serial: "ABC123DEF456" ← Unchanged

DeviceManager aggregates both connections:
  - Maintains device identity via serial
  - Automatically selects primary connection (USB > WiFi)
  - API continues using current device_id
```

**Important for API Integration**:
- When calling `/api/init`, `/api/chat`, etc., use the current `device_id`
- `device_id` may change during connection switches
- PhoneAgent instances are indexed by `device_id` in PhoneAgentManager
- Connection switches may require agent reinitialization (future improvement: automatic migration)
- API layer coordinates device and agent information by iterating through device.connections
- PhoneAgentManager does not expose serial-based queries (maintains domain boundary)
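
The two-layer scheme can be sketched as grouping connections by `serial` and picking a primary `device_id` by connection priority. This is an illustration under assumptions: `connection_type`, `Device`, and `aggregate` are hypothetical names, and guessing the connection type from the `device_id` format is a heuristic chosen for this example, not necessarily how `device_manager.py` does it.

```python
import re
from dataclasses import dataclass, field

# Lower number = preferred, mirroring "USB > WiFi > mDNS".
PRIORITY = {"usb": 0, "wifi": 1, "mdns": 2}


def connection_type(device_id: str) -> str:
    """Guess the connection type from the device_id format (heuristic)."""
    if "_adb-tls-connect" in device_id:
        return "mdns"
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}:\d+", device_id):
        return "wifi"
    return "usb"


@dataclass
class Device:
    serial: str
    connections: list[str] = field(default_factory=list)

    @property
    def primary_device_id(self) -> str:
        """The device_id of the most preferred connection."""
        return min(self.connections, key=lambda d: PRIORITY[connection_type(d)])


def aggregate(pairs: list[tuple[str, str]]) -> dict[str, Device]:
    """Group (device_id, serial) pairs by the stable hardware serial."""
    devices: dict[str, Device] = {}
    for device_id, serial in pairs:
        devices.setdefault(serial, Device(serial)).connections.append(device_id)
    return devices
```

With one phone reachable over both USB and WiFi, `aggregate` yields a single `Device` whose `primary_device_id` is the USB serial.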

### Frontend Architecture (`frontend/src/`)

**Routing (TanStack Router)**:
```
/ (index.tsx)
├── /chat (chat.tsx) - Main chat interface
├── /workflows (workflows.tsx) - Workflow management
└── /about (about.tsx) - About page
```

**Root Layout** (`__root.tsx`):
- Theme provider (light/dark mode)
- i18n context (Chinese/English)
- Global error boundary
- Navigation sidebar

**Key Components**:
- **`ScrcpyPlayer.tsx`**: H.264 video player with Socket.IO streaming
  - Uses jmuxer for H.264 NAL unit decoding
  - WebCodecs Video Decoder API fallback
  - Canvas rendering with letterbox calculation
  - Touch coordinate transformation
  - Ripple animation on tap
- **`ChatKitPanel.tsx`**: Multi-mode chat interface
  - Basic mode: Direct PhoneAgent execution
  - Layered mode: Hierarchical task execution
- **`DevicePanel.tsx`**: Device info and initialization UI
  - Agent configuration (model, base URL, API key)
  - Connection status display
  - Initialization controls
- **`DeviceSidebar.tsx`**: Device list with connection management
  - USB/WiFi device listing
  - WiFi pairing controls
  - QR code pairing (wireless debugging)
  - mDNS device discovery
- **`api.ts`**: API client functions (uses `redaxios` - lightweight axios alternative)

### Electron Desktop Application (`electron/`)

AutoGLM-GUI can be packaged as a standalone desktop application using Electron, bundling the Python backend, ADB tools, and frontend into a single distributable package.

**Architecture**:
- **`main.js`**: Electron main process
  - Manages backend process lifecycle (spawn, health check, cleanup)
  - Dynamic port allocation (8000-8100 range)
  - Window creation and management
  - Environment setup (ADB PATH, PYTHONIOENCODING)
  - Error handling with user dialogs
  - Development and production mode support
- **`preload.js`**: Context isolation bridge between main and renderer
- **`afterPack.js`**: Post-build hook to set executable permissions (ADB, backend)

**Packaging Flow**:
1. **Frontend Build**: React app → `AutoGLM_GUI/static/`
2. **ADB Download**: Platform-specific tools → `resources/adb/{platform}/`
3. **Backend Packaging**: PyInstaller → `resources/backend/`
4. **Electron Build**: electron-builder → platform installers (DMG, NSIS, AppImage/deb)

**Key Features**:
- ✅ **Cross-platform**: Windows (x64) + macOS (ARM64) + Linux (x64)
- ✅ **No dependencies**: Bundles Python runtime, ADB, scrcpy-server
- ✅ **Auto-configuration**: Backend starts with bundled resources
- ✅ **Portable mode**: Windows supports portable .exe
- ✅ **UTF-8 handling**: PyInstaller runtime hook for Windows encoding
- ✅ **Auto-update**: Automatic updates from GitHub Releases using electron-updater

**Auto-Update System**:

AutoGLM-GUI uses `electron-updater` for automatic updates from GitHub Releases:

- **Update Check**: On app startup (packaged mode only), delayed 5 seconds to avoid performance impact
- **Update Metadata**: `latest.yml` / `latest-mac.yml` / `latest-linux.yml` auto-generated by electron-builder
- **Supported Formats**:
  - ✅ Windows NSIS installer (full support)
  - ⚠️ macOS DMG (partial support, unsigned apps need manual confirmation)
  - ✅ Linux AppImage (requires AppImageLauncher)
  - ❌ Windows Portable (not supported by design)
- **Update Flow**:
  1. App startup → delayed 5s → check GitHub Releases for latest version
  2. If update available → auto-download in background with progress logging
  3. Download complete → show dialog offering "Restart Now" or "Later"
  4. If "Restart Now" → quitAndInstall(), otherwise auto-install on next app quit

**Configuration**:
- `electron-builder.yml`:
  - `publish` points to GitHub Releases
  - `dmg.writeUpdateInfo: true` enables metadata generation
  - `electronUpdaterCompatibility: ">=2.16"` for latest format
- `main.js`:
  - `autoUpdater.checkForUpdatesAndNotify()` on startup
  - Event listeners for download progress and completion
  - Uses `electron-log` for debugging
- `release.yml`: CI uploads `latest*.yml` files to GitHub Releases

**Testing**:
- Create `electron/dev-app-update.yml` for local testing with production releases
- Use staging releases (e.g., `v1.5.2-beta`) for end-to-end testing
- Verify update flow: detection → download → install → restart

**DevTools Log Output**:

Auto-update logs are output to the DevTools console by default:

- **View Method**: Open app → Right-click → "Inspect" (or Cmd+Option+I / Ctrl+Shift+I) → Console tab
- **Log Format**:
  - Green `[Updater]` prefix: Normal information (checking for updates, download progress, installation complete)
  - Red `[Updater]` prefix: Error messages
- **Throttling Strategy**: Download progress only shows key percentages (0%, 25%, 50%, 75%, 100%) to avoid flooding the console
- **Disable Method**: To disable DevTools logs, set the environment variable `DEBUG_UPDATER=0`:
  ```bash
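  # macOS: launch the executable inside the .app bundle (default bundle layout assumed);
  # on Linux, prefix the AppImage command with the variable in the same way
  DEBUG_UPDATER=0 "./AutoGLM GUI.app/Contents/MacOS/AutoGLM GUI"

  # Windows (cmd.exe)
  set DEBUG_UPDATER=0
  "AutoGLM GUI.exe"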
  ```
- **File Logs**: All logs are still written to log files (see LOG_LOCATION.md), unaffected by this setting

## Critical Implementation Details

### Video Streaming (Scrcpy)

- **Server Binary**: `scrcpy-server-v3.3.3` must exist at project root
- **Deployment**: Binary is bundled in wheel via `pyproject.toml` force-include
- **Stream Protocol**: Socket.IO-based H.264 NAL unit streaming
  - Backend: `socketio_server.py` emits `video-data` events
  - Frontend: `ScrcpyPlayer.tsx` receives and decodes via jmuxer
- **Stream Format**: The backend reads raw H.264 NAL units from the scrcpy TCP socket (port 27183) and relays them to the frontend over Socket.IO
- **Parameter Sets**: SPS/PPS parameter sets and IDR frames are cached on first capture and sent to new clients for immediate playback
- **Coordinate Mapping**: Frontend gets device resolution (e.g., 1080x2400) and video size (e.g., 576x1280), calculates letterbox offsets, transforms click coords back to device scale
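
The coordinate mapping described above can be sketched as follows. The real logic lives in the TypeScript frontend (`ScrcpyPlayer.tsx`). This Python version only illustrates the arithmetic, assuming the video is scaled uniformly and centered on the canvas; `canvas_to_device` is a hypothetical name.

```python
from typing import Optional


def canvas_to_device(
    click_x: float,
    click_y: float,
    canvas_size: tuple[float, float],
    video_size: tuple[int, int],
    device_size: tuple[int, int],
) -> Optional[tuple[int, int]]:
    """Map a click on a letterboxed canvas to device pixels (None if in the bars)."""
    canvas_w, canvas_h = canvas_size
    video_w, video_h = video_size
    device_w, device_h = device_size

    # The video is scaled uniformly to fit, then centered on the canvas.
    scale = min(canvas_w / video_w, canvas_h / video_h)
    shown_w, shown_h = video_w * scale, video_h * scale
    offset_x, offset_y = (canvas_w - shown_w) / 2, (canvas_h - shown_h) / 2

    # Position inside the displayed video, as a fraction of its size.
    rel_x = (click_x - offset_x) / shown_w
    rel_y = (click_y - offset_y) / shown_h
    if not (0 <= rel_x <= 1 and 0 <= rel_y <= 1):
        return None  # click landed on a letterbox bar
    return round(rel_x * device_w), round(rel_y * device_h)
```

For example, on an 800x1280 canvas showing a 576x1280 video of a 1080x2400 device, the horizontal offset is 112, so a click at (400, 640) maps to (540, 1200) and a click at (50, 640) falls on the left bar.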

### Model API Integration

**Basic PhoneAgent Mode**:
- **Compatible APIs**: Any OpenAI-compatible endpoint (智谱 BigModel, ModelScope, vLLM, SGLang)
- **Vision Messages**: Each step sends current screenshot as base64 PNG in message content
- **Response Format**: LLM returns JSON with `thinking` and `action` fields
- **Action Schema**: `{type: "do"|"finish"|"takeover", ...params}` parsed by `ActionHandler`
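
A minimal sketch of validating such a reply, assuming it arrives as a JSON object with `thinking` and `action` fields as described above. `parse_step` is a hypothetical helper, not the real `ActionHandler` API, and any parameter names inside `action` other than `type` are illustrative.

```python
import json
from typing import Any

VALID_TYPES = {"do", "finish", "takeover"}


def parse_step(raw: str) -> tuple[str, dict[str, Any]]:
    """Return (thinking, action) from a JSON reply, validating the action type."""
    data = json.loads(raw)
    action = data["action"]
    if action.get("type") not in VALID_TYPES:
        raise ValueError(f"unknown action type: {action.get('type')!r}")
    return data.get("thinking", ""), action
```

Rejecting unknown action types early keeps malformed model output from reaching the ADB execution layer.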

**Layered Agent Mode** (NEW):
- **Architecture**: Hierarchical execution with planning and execution layers
- **Decision Layer**: Uses `openai-agents` library for session management and planning
- **Execution Layer**: PhoneAgent executes planned actions on device
- **Function Tools**: `do()` for device actions, `chat()` for information extraction
- **Session Persistence**: SQLiteSession stores conversation history
- **API Endpoint**: `/api/layered/chat` with streaming support

### ADB Device Control

- **Connection**: Uses `adb` CLI tool (must be in PATH)
- **Platform Utilities**: Always use `AutoGLM_GUI/platform_utils.py` for ADB command execution
  - Async command execution (event loop safe)
  - Cross-platform compatibility (Windows vs Unix)
- **Coordinate System**: LLM outputs normalized coords (0-1000), converted to pixels based on device resolution
- **Keyboard Handling**: Temporarily switches to ADB keyboard for text input, restores original after
- **Screenshot**: Captures via ADB screencap, converts to PNG with Pillow
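
The coordinate conversion above is a simple scaling. A minimal sketch, assuming integer truncation (the rounding mode the real code uses is not stated here); `normalized_to_pixels` is a hypothetical name.

```python
def normalized_to_pixels(nx: int, ny: int, width: int, height: int) -> tuple[int, int]:
    """Map normalized (0-1000) coordinates to device pixels."""
    if not (0 <= nx <= 1000 and 0 <= ny <= 1000):
        raise ValueError(f"normalized coordinates out of range: ({nx}, {ny})")
    # Scale each axis independently by the device resolution.
    return nx * width // 1000, ny * height // 1000
```

On a 1080x2400 device, `normalized_to_pixels(500, 500, 1080, 2400)` gives `(540, 1200)`. Values above 1000 are rejected because they cannot be valid normalized coordinates.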

### Concurrency Control and State Management

**PhoneAgentManager Locking**:
- **Manager Lock**: RLock for thread-safe manager operations
- **Per-Device Locks**: Dictionary of locks indexed by `device_id`
- **Prevents**: Concurrent execution on same device (would corrupt state)
- **Pattern**: Use `use_agent(device_id)` context manager for safe access
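
The locking pattern above (a manager lock guarding a dictionary of per-device locks) can be sketched like this. It is an illustration, not the `PhoneAgentManager` code: `DeviceLocks` and `use_device` are hypothetical names, and the timeout behaviour is an assumption of the sketch.

```python
import threading
from collections.abc import Iterator
from contextlib import contextmanager


class DeviceLocks:
    """One re-entrant lock per device_id, created lazily under a manager lock."""

    def __init__(self) -> None:
        self._manager_lock = threading.RLock()
        self._device_locks = {}  # device_id -> RLock

    def _lock_for(self, device_id: str):
        with self._manager_lock:  # protect the dictionary itself
            return self._device_locks.setdefault(device_id, threading.RLock())

    @contextmanager
    def use_device(self, device_id: str, timeout: float = -1) -> Iterator[None]:
        """Hold the device's lock for the duration of the with-block."""
        lock = self._lock_for(device_id)
        if not lock.acquire(timeout=timeout):  # -1 means wait indefinitely
            raise TimeoutError(f"device {device_id} is busy")
        try:
            yield
        finally:
            lock.release()
```

Because the locks are re-entrant, nested use on the same thread is safe, while a second thread targeting the same device waits or times out.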

**DeviceManager Polling**:
- **Background Thread**: Runs every ~2 seconds in daemon thread
- **State Tracking**: Monitors device connections without blocking API
- **Connection Aggregation**: Groups connections by hardware serial
- **Primary Selection**: Automatically selects best connection (USB > WiFi)

**Streaming State**:
- **Per-Device Streamers**: Dictionary in `state.scrcpy_streamers`
- **Stream Locks**: Async locks in `state.scrcpy_locks` prevent concurrent starts
- **Cleanup**: Automatic cleanup on disconnect or error

**Agent State Management** (NEW):
- **Storage**: Agent instances and configs are stored internally in PhoneAgentManager singleton
- **Thread Safety**: All state access is protected by `self._manager_lock` (RLock)
- **No Global State**: Removed dependency on `state.agents` and `state.agent_configs` in 2026 refactoring
- **Benefits**:
  - **Encapsulation**: Manager owns its state completely
  - **Testability**: Easier to test in isolation
  - **Clarity**: Single source of truth for agent lifecycle
  - **Safety**: No risk of external code accidentally modifying global state
- **API**: Always use PhoneAgentManager methods (get_agent, use_agent, etc.) for state access

### Logging System

- **Library**: loguru - modern Python logging with zero configuration
- **Scope**: AutoGLM_GUI/ directory
- **Console Output**:
  - Colorized output with timestamps, log levels, and source locations
  - Default level: INFO (adjustable via --log-level)
  - Format: `YYYY-MM-DD HH:mm:ss.SSS | LEVEL | module:function:line - message`
- **File Output**:
  - Main log: `logs/autoglm_{time:YYYY-MM-DD}.log` (all levels ≥ DEBUG)
  - Error log: `logs/errors_{time:YYYY-MM-DD}.log` (only ERROR and above)
  - Rotation: 100MB for main log, 50MB for error log
  - Retention: 7 days for main log, 30 days for error log
  - Compression: zip format for rotated logs
- **Usage in Code**:
  ```python
  from AutoGLM_GUI.logger import logger

  logger.debug("Detailed information for debugging")
  logger.info("Normal operation messages")
  logger.warning("Warning messages")
  logger.error("Error messages")
  logger.exception("Exception with full stack trace")
  ```
- **Log Levels**:
  - DEBUG: NAL unit caching, initialization data details, sent NAL counts
  - INFO: Server startup, device connections, stream lifecycle events
  - WARNING: Retries, failed operations with recovery, takeover requests
  - ERROR: Failed starts, connection errors, unexpected exceptions

## Configuration

### Environment Variables

```bash
# Optional defaults (overridden by CLI args)
AUTOGLM_BASE_URL=http://localhost:8080/v1
AUTOGLM_MODEL_NAME=autoglm-phone-9b
AUTOGLM_API_KEY=EMPTY

# Optional scrcpy server path
SCRCPY_SERVER_PATH=/path/to/scrcpy-server
```

### CLI Arguments

See `AutoGLM_GUI/__main__.py` for full list. Key args:
- `--base-url`: Model API endpoint (required unless `base_url` is already set in the config file)
- `--model`: Model name (default: autoglm-phone-9b)
- `--apikey`: API key
- `--host`: Server host (default: 127.0.0.1)
- `--port`: Server port (default: 8000, auto-finds if occupied)
- `--log-level`: Console log level - DEBUG, INFO, WARNING, ERROR, CRITICAL (default: INFO)
- `--log-file`: Log file path (default: logs/autoglm_{time:YYYY-MM-DD}.log)
- `--no-log-file`: Disable file logging (console only)
- `--no-browser`: Skip auto-opening browser
- `--reload`: Enable uvicorn auto-reload (development only)
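
The "auto-finds if occupied" behaviour of `--port` can be illustrated with a bind probe. This is a sketch under assumptions: `find_free_port` is a hypothetical helper, and the 8000-8100 search range is borrowed from the Electron main process description, not confirmed for the CLI.

```python
import socket


def find_free_port(host: str = "127.0.0.1", start: int = 8000, end: int = 8100) -> int:
    """Return the first port in [start, end] that can be bound on host."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use, try the next one
            return port
    raise RuntimeError(f"no free port in {start}-{end}")
```

The probe socket is closed before the function returns, so another process could still claim the port before the server binds it. A real implementation has to tolerate that race.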

## Package Structure

```
AutoGLM_GUI/           # Backend FastAPI app (entry point)
  __main__.py          # CLI entry point
  server.py            # FastAPI + Socket.IO wrapper
  api/                 # Modular route handlers
    __init__.py        # App factory
    agents.py          # Agent lifecycle
    layered_agent.py   # Layered agent API
    devices.py         # Device management
    control.py         # Direct device control
    media.py           # Screenshot/video
    metrics.py         # Prometheus metrics
    version.py         # Version info
    workflows.py       # Workflow execution
  scrcpy_stream.py     # Scrcpy video streaming
  socketio_server.py   # Socket.IO integration
  device_manager.py    # Device discovery singleton
  phone_agent_manager.py # Agent lifecycle singleton
  config_manager.py    # Type-safe config management
  logger.py            # Loguru logging setup
  platform_utils.py    # Cross-platform utilities
  adb_plus/            # Extended ADB utilities
    device.py
    screenshot.py
    keyboard_installer.py
    qr_pair.py
    serial.py
    ip.py
    mdns.py
    touch.py
  agents/              # Internal agent implementations
    glm/
    mai/
    factory.py
  static/              # Built frontend (copied from frontend/dist)
  resources/           # Bundled resources
    apks/              # ADB Keyboard APK (GPL-2.0)

frontend/              # React frontend
  src/
    routes/
      chat.tsx         # Main chat interface
      workflows.tsx    # Workflow management
      __root.tsx       # Layout + theme
    components/
      ScrcpyPlayer.tsx # Video player
      ChatKitPanel.tsx # Multi-mode chat
      DevicePanel.tsx  # Device UI
      DeviceSidebar.tsx # Device list
    api.ts             # API client
  dist/                # Build output (not in git)

electron/              # Electron desktop application
  main.js              # Main process (backend lifecycle)
  preload.js           # Context bridge
  afterPack.js         # Post-build hook (permissions)
  electron-builder.yml # Packaging configuration
  package.json         # Electron dependencies
  dist/                # Built installers (not in git)

resources/             # Bundled resources (not in git)
  backend/             # PyInstaller output
  adb/                 # Platform-specific ADB tools
    windows/
    darwin/
    linux/

scripts/
  build.py             # Web app build
  build_electron.py    # Electron one-click build
  autoglm.spec         # PyInstaller configuration
  download_adb.py      # ADB downloader
  pyi_rth_utf8.py      # PyInstaller runtime hook (UTF-8)
  lint.py              # Code linting

scrcpy-server-v3.3.3   # Scrcpy server binary (bundled)
```

## Common Pitfalls

### Web Application
1. **Missing scrcpy-server**: Video streaming fails if binary is missing or not bundled correctly in wheel
2. **Coordinate Mismatch**: Frontend must fetch actual device resolution via `/api/scrcpy/info` before sending taps
3. **Python Execution**: Always use `uv run python`, never plain `python`
4. **Frontend Not Built**: Backend serves static files from `AutoGLM_GUI/static/` - must run `scripts/build.py` first
5. **ADB Not in PATH**: All ADB operations will fail silently or with cryptic errors
6. **Model API Compatibility**: LLM must support vision inputs (base64 images) and follow action schema conventions
7. **Direct State Access**: Never access `state.agents` directly - use `PhoneAgentManager.use_agent()` context manager
8. **ADB Command Execution**: Always use `platform_utils.py` functions instead of direct subprocess calls
9. **Device ID vs Serial**: Remember `device_id` changes with connection type, `serial` is stable
10. **Concurrent Execution**: PhoneAgentManager prevents concurrent tasks on same device - respect the locks
11. **Respecting Domain Boundaries**:
    - PhoneAgentManager should only deal with device_id (not serial)
    - DeviceManager should only deal with device connections (not agents)
    - API layer coordinates between domains using public interfaces only

### Electron Desktop Application
1. **Resources Not Prepared**: Electron build requires `resources/backend/` and `resources/adb/` - use `build_electron.py`
2. **Executable Permissions**: On macOS/Linux, ADB and backend must have execute permissions - handled by `afterPack.js`
3. **Windows Encoding**: Python backend uses PyInstaller runtime hook (`pyi_rth_utf8.py`) for UTF-8, don't modify `__main__.py` encoding
4. **macOS Unsigned App**: First launch may be blocked by Gatekeeper - use `xattr -cr "AutoGLM GUI.app"` or right-click → Open
5. **Port Conflicts**: Electron auto-finds available port (8000-8100), but may fail if all ports occupied
6. **Backend Startup Timeout**: If the backend doesn't respond within 30s, check the logs and ensure all dependencies are bundled correctly
7. **Path Issues in PyInstaller**: Always use `sys._MEIPASS` for bundled resource paths, see `scrcpy_stream.py` and `api/__init__.py`
8. **Runtime Dependencies Missing**: electron-updater and electron-log must be in `electron/package.json` `dependencies` (not devDependencies). Symptom: "Cannot find module 'electron-updater'" after packaging. Fix: Run `cd electron && npm run verify` before building.
9. **Files Configuration**: Don't explicitly exclude node_modules in `electron-builder.yml` unless you use asarUnpack for runtime dependencies. Let electron-builder auto-manage dependencies from package.json.
10. **Package Manager**: Electron directory MUST use npm (not pnpm), because electron-builder requires npm's package structure. Frontend uses pnpm, electron uses npm - they are separate environments.

## Development Workflow

### Web Application Development
1. Make frontend changes → `cd frontend && pnpm dev` (hot reload)
2. Make backend changes → `uv run autoglm-gui --reload` (auto-reload enabled)
3. Before committing code, run linting: `uv run python scripts/lint.py`
4. Before package release:
   - Build frontend: `uv run python scripts/build.py`
   - Test locally: `uv run autoglm-gui`
   - Build package: `uv run python scripts/build.py --pack`
   - Test wheel: `uvx --from dist/autoglm_gui-*.whl autoglm-gui`
   - Publish: `uv publish`

### Electron Desktop Application Development
1. **Initial Setup**:
   ```bash
   cd electron && npm install
   ```

2. **Development Mode** (without packaging):
   ```bash
   # Terminal 1: Run backend directly
   uv run autoglm-gui --base-url http://localhost:8080/v1

   # Terminal 2: Run Electron in dev mode
   cd electron && npm run dev
   ```

3. **Test Full Build** (with packaging):
   ```bash
   # One-click build everything
   uv run python scripts/build_electron.py

   # Or incremental build (skip unchanged parts)
   uv run python scripts/build_electron.py --skip-frontend --skip-adb
   ```

4. **Test Built Application**:
   - **macOS**: `open "electron/dist/mac-arm64/AutoGLM GUI.app"`
   - **Windows**: Run `electron\dist\AutoGLM GUI Setup {version}.exe`

5. **CI/CD**: Push to `main` or `dev` branch triggers GitHub Actions
   - Builds Windows + macOS installers automatically
   - Download artifacts from the Actions tab

### Important Notes
- **Encoding**: Use PyInstaller runtime hook for Windows UTF-8, not application code
- **Resources**: Always check `sys._MEIPASS` exists in PyInstaller environment
- **ADB**: Use `AutoGLM_GUI/platform_utils.py` for executing commands
- **Refactoring**: Prefer internal agent implementations in `AutoGLM_GUI/agents/`

## Lessons Learned: Common Pitfalls

### 🚨 Case Study: Coordinate System Confusion in Integration Tests

- **Date**: 2026-01-15
- **PR**: #181 (Integration test fixes)
- **Impact**: Introduced incorrect coordinate conversions that broke the semantic meaning of coordinates

#### The Error

I attempted to "improve" the coordinate system by converting pixel coordinates to normalized coordinates (0-1000), but made a critical error in the assumption about the original coordinate system.

**What I did wrong:**

1. **False assumption**: Assumed original pixel coordinates `[487, 2516, 721, 2667]` were based on a 1080x2400 screen
2. **Incorrect conversion**: Converted to `[451, 1048, 667, 1111]` thinking these were normalized coordinates
3. **Failed validation**: Didn't notice that 1048 and 1111 both exceed 1000, which is impossible for valid normalized coordinates
4. **Broke working system**: The original coordinates were already correct for the actual screenshot size (1200x2670)

**The root cause**:
```yaml
# Original (CORRECT): Pixel coordinates for 1200x2670 screen
click_region: [487, 2516, 721, 2667]  # ✓ Valid pixel coordinates

# My change (WRONG): "Normalized" coordinates
click_region: [451, 1048, 667, 1111]  # ✗ 1048>1000 and 1111>1000!
```

#### The Correct Analysis

**Actual screenshot dimensions**: 1200x2670 (from `file state_home.jpg`)

**Original coordinates**:
```
[487, 2516, 721, 2667] - Pixel coordinates
✓ x range: 487-721 (within 0-1200)
✓ y range: 2516-2667 (within 0-2670)
✓ y2=2667 is 3 px above the screen bottom at 2670 (bottom navigation button)
```

**True normalized coordinates** (if we wanted them):
```
x1 = 487/1200*1000 = 405.8 ≈ 406
y1 = 2516/2670*1000 = 942.3 ≈ 942
x2 = 721/1200*1000 = 600.8 ≈ 601
y2 = 2667/2670*1000 = 998.9 ≈ 999
```
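
The arithmetic above can be reproduced with a few lines of Python. This is an illustrative sketch; `pixel_to_normalized` is a hypothetical name and not a function in this repository.

```python
def pixel_to_normalized(region, width, height, scale=1000):
    """Map [x1, y1, x2, y2] in pixels onto a 0..scale grid."""
    x1, y1, x2, y2 = region
    return [
        round(x1 / width * scale),
        round(y1 / height * scale),
        round(x2 / width * scale),
        round(y2 / height * scale),
    ]


print(pixel_to_normalized([487, 2516, 721, 2667], 1200, 2670))
# → [406, 942, 601, 999]
```

Passing the wrongly assumed 1080x2400 size to the same function yields y values above 1000, which is the red flag that was missed.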

#### What I Should Have Done

1. **First principle**: Understand the existing system before changing it
   - Check: What coordinate system is being used?
   - Verify: Are the coordinates valid for their claimed system?
   - Test: Do the coordinates make sense for the actual screenshot dimensions?

2. **Validation checklist**:
   - [ ] Verify screenshot dimensions with `file` or PIL
   - [ ] Check if coordinates are within valid ranges
   - [ ] For normalized coordinates: all values must be 0-1000
   - [ ] For pixel coordinates: must be within actual screenshot dimensions
   - [ ] Calculate the conversion both ways to verify

3. **Red flags I missed**:
   - ❌ Normalized coordinates exceeding 1000 (1048, 1111)
   - ❌ Assumed screen size (1080x2400) without verification
   - ❌ Didn't check actual screenshot size first
   - ❌ Made assumptions instead of measuring
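
The checklist and red flags above could be automated along these lines. This is a sketch under assumptions: `region_problems` is a hypothetical helper, and it assumes regions are ordered `[x1, y1, x2, y2]` as in the normalized calculation earlier.

```python
def region_problems(region, system, size=None):
    """Return a list of problems; an empty list means the region looks valid.

    system is "normalized" (0-1000) or "pixel"; size is (width, height),
    required for pixel checks.
    """
    x1, y1, x2, y2 = region
    if system == "normalized":
        max_x = max_y = 1000
    elif system == "pixel":
        if size is None:
            return ["pixel coordinates need the measured screenshot size"]
        max_x, max_y = size
    else:
        return [f"unknown coordinate system: {system!r}"]

    problems = []
    # Check x values against the width limit and y values against the height limit.
    for name, value, limit in (("x1", x1, max_x), ("y1", y1, max_y),
                               ("x2", x2, max_x), ("y2", y2, max_y)):
        if not 0 <= value <= limit:
            problems.append(f"{name}={value} is outside 0-{limit}")
    if x1 > x2 or y1 > y2:
        problems.append("corners are not ordered top-left to bottom-right")
    return problems


print(region_problems([451, 1048, 667, 1111], "normalized"))
# → ['y1=1048 is outside 0-1000', 'y2=1111 is outside 0-1000']
print(region_problems([487, 2516, 721, 2667], "pixel", (1200, 2670)))
# → []
```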

#### Prevention Guidelines

**When working with coordinate systems**:

1. **Always verify dimensions first**:
   ```bash
   file screenshot.jpg  # Check actual dimensions
   ```

2. **Validate coordinate ranges**:
   ```python
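   # For normalized (0-1000)
   assert all(0 <= v <= 1000 for v in coordinates)

   # For pixel coordinates [x1, y1, x2, y2]: check x against width, y against height
   x1, y1, x2, y2 = coordinates
   assert 0 <= x1 <= x2 <= screen_width
   assert 0 <= y1 <= y2 <= screen_height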
   ```

3. **Document the coordinate system**:
   ```yaml
   # Clearly document what system you're using
   click_region: [487, 2516, 721, 2667]  # Pixel coordinates for 1200x2670 screen
   ```

4. **Test assumptions**:
   ```python
   # Verify the conversion is correct
   screen_width, screen_height = get_screenshot_dimensions()
   assert 0 <= x <= screen_width
   assert 0 <= y <= screen_height
   ```

5. **When in doubt, measure twice**:
   - Use PIL to get exact image dimensions
   - Calculate conversions explicitly
   - Verify with actual test runs
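
Steps 1, 3, and 5 can be combined: measure the image with PIL, then emit the documented scenario line only if the region fits. This is a minimal sketch assuming Pillow is installed; `screenshot_size` and `annotate_region` are hypothetical names, not functions in this repository.

```python
def screenshot_size(path):
    """Return (width, height) of an image file as reported by Pillow."""
    from PIL import Image  # third-party Pillow; imported lazily, assumed installed

    with Image.open(path) as img:
        return img.size


def annotate_region(region, size):
    """Format a scenario line that records the coordinate system and screen size."""
    width, height = size
    x1, y1, x2, y2 = region
    # Refuse to document a region that does not fit the measured screenshot.
    if not (0 <= x1 <= x2 <= width and 0 <= y1 <= y2 <= height):
        raise ValueError(f"{region} does not fit a {width}x{height} screenshot")
    return f"click_region: {list(region)}  # Pixel coordinates for {width}x{height} screen"
```

Given a path to the screenshot, `annotate_region(region, screenshot_size(path))` produces the commented line shown in step 3 or raises `ValueError` when the region and the measured size disagree.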

#### Key Takeaway

**Don't optimize what you don't understand.**

The original pixel coordinate system was:
- ✓ Correct for the actual screenshots (1200x2670)
- ✓ Simple and direct
- ✓ Working in production

My "improvement":
- ✗ Based on wrong assumptions
- ✗ Introduced invalid coordinates (>1000)
- ✗ Broke the semantic meaning
- ✗ Made the system more complex

**Lesson**: When fixing bugs, focus on understanding the root cause first, not on "architectural improvements" that may be unnecessary.

#### Related Code

- **Coordinate conversion**: `AutoGLM_GUI/devices/mock_device.py`
- **State machine**: `tests/integration/state_machine.py`
- **Test scenarios**: `tests/integration/fixtures/scenarios/meituan_message/scenario.yaml`
- **Coordinate validation**: Always check ranges against actual dimensions