--- name: browser-agent-server description: Deploy a full-stack multi-model Browser Agent system with FastAPI server, real-time dashboard, VNC streaming, and LLM Council mode. Use when the user asks to set up browser automation, build a browser agent, deploy an AI web agent, create a browser-use server, or needs multi-model browser automation with strategies like council, consensus, fallback chain, or planner-executor. --- # Browser Agent Server Full-stack AI browser automation system: FastAPI backend + real-time dashboard + Xvfb/VNC display + multi-model LLM strategies. ## Architecture ``` agent_server.py (FastAPI + WebSocket + browser-use 0.11.9) dashboard.html (single-file SPA: dark UI, live logs, VNC embed, model picker) start.sh (startup script with prerequisite checks) ``` **5 Strategies**: single, fallback_chain, planner_executor, consensus (per-step judge), council (multi-model failure recovery with loop detection) **Display**: Xvfb :98 (configurable) -> x11vnc :5999 -> noVNC/websockify :6080 **Live Screen**: The dashboard uses WebSocket-streamed screenshots (0.5s intervals) as the primary embedded view. VNC is available via pop-out for interactive control. The server manages its own Xvfb, x11vnc, and noVNC processes automatically on startup. ## Setup ### 1. Install system dependencies ```bash sudo apt-get update && sudo apt-get install -y xvfb x11vnc x11-apps imagemagick novnc ``` ### 2. Create Python venv and install packages ```bash python3 -m venv /home/node/browser-agent-venv /home/node/browser-agent-venv/bin/pip install browser-use==0.11.9 fastapi uvicorn[standard] websockets websockify /home/node/browser-agent-venv/bin/python3 -m playwright install chromium ``` **IMPORTANT**: `websockify` must be installed in the venv (or available system-wide). The server auto-detects it from the venv's `bin/` directory first, then falls back to system PATH. ### 2b. CRITICAL: Install Chromium shared library dependencies Without this step, Chromium will fail with `libatk-1.0.so.0: cannot open shared object file` or similar errors, causing a 30-second timeout on browser launch. ```bash /home/node/browser-agent-venv/bin/python3 -m playwright install-deps chromium ``` This installs ~40 system libraries (libatk, libasound, libxkbcommon, fonts, etc.) that Chromium requires at runtime. **This is separate from `playwright install chromium`** which only downloads the browser binary. ### 2c. Fix broken venv symlinks (if needed) If the venv's `python3` symlink is broken (e.g., after system upgrades), fix it: ```bash ln -sf /usr/bin/python3 /home/node/browser-agent-venv/bin/python3 ``` ### 3. Deploy application files Copy bundled scripts to a project directory: ```bash DEST="./outputs/browser-agent" mkdir -p "$DEST" cp ~/.claude/skills/happycapy-browser-agent/scripts/agent_server.py "$DEST/" cp ~/.claude/skills/happycapy-browser-agent/scripts/dashboard.html "$DEST/" cp ~/.claude/skills/happycapy-browser-agent/scripts/start.sh "$DEST/" chmod +x "$DEST/start.sh" ``` ### 4. Configure environment ```bash # Required: LLM API key (OpenAI-compatible gateway) export AI_GATEWAY_API_KEY="your-key" # Optional: custom port (default 8888) export AGENT_PORT=8888 # Optional: display number (default 98, avoids conflict with system Xvfb on :99) export DISPLAY_NUM=98 # Optional: virtual display resolution (default 1280x1024) export SCREEN_WIDTH=1280 export SCREEN_HEIGHT=1024 # REQUIRED for sandbox environments: set the public noVNC URL for dashboard VNC pop-out # Replace with the actual exported URL from step 6 export NOVNC_PUBLIC_URL="https://YOUR-NOVNC-URL/vnc.html?host=YOUR-HOST&port=443&encrypt=1&autoconnect=true&resize=scale&scaleViewport=true" ``` ### 5. Start ```bash cd "$DEST" /home/node/browser-agent-venv/bin/python3 agent_server.py ``` The server automatically starts Xvfb, x11vnc, and noVNC. If an Xvfb is already running on the target display, it reuses it instead of failing. ### 6. Export ports (sandbox environments) ```bash /app/export-port.sh $AGENT_PORT # Dashboard (default 8888) /app/export-port.sh 6080 # noVNC (for VNC pop-out, set NOVNC_PUBLIC_URL with exported URL) ``` **Note**: Port 3001 is reserved. Do not use it. If port 8888 is already in use, set `AGENT_PORT` to another value (e.g., 9222). ## API Reference | Endpoint | Method | Description | |----------|--------|-------------| | `/` | GET | Dashboard HTML | | `/api/models` | GET | Available models + strategies | | `/api/agent/start` | POST | Start task (JSON body below) | | `/api/agent/stop` | POST | Stop running task | | `/api/agent/status` | GET | Current status, action_log, result | | `/ws` | WebSocket | Real-time updates (step, status, screenshot, judge_verdict, council_verdict) | ### Start task body ```json { "task": "Go to google.com and search for AI", "max_steps": 50, "model_config_data": { "strategy": "council", "primary_model": "openai/gpt-4o", "secondary_model": "", "council_members": ["moonshotai/kimi-k2.5", "google/gemini-2.5-flash", "google/gemini-2.5-pro"] } } ``` ### WebSocket start (from dashboard) ```json { "type": "start_task", "task": "...", "max_steps": 50, "model_config": { "strategy": "council", "primary_model": "openai/gpt-4o", "council_members": [...] } } ``` ## Strategy Details | Strategy | How it works | When to use | |----------|-------------|-------------| | `single` | One model, all steps | Simple tasks, cost-sensitive | | `fallback_chain` | Primary runs; switches to secondary on error/rate-limit | Reliability | | `planner_executor` | Strong model plans first; fast model executes | Complex multi-step | | `consensus` | Primary acts; judge model validates every step in real-time | Quality-critical | | `council` | Primary runs; on repeated failure/loop/stall, ALL council models convene to diagnose, advise, and replan | Hard tasks, anti-stall | ### Council Mode Details - **Failure trigger**: `consecutive_failures >= 2` - **Loop trigger (3-tier)**: Strict fingerprint match (3 repeats), loose action-type match (4 repeats), same-URL stall with no progress (5 repeats) - **Stall trigger**: Single step running > 60 seconds - **Feedback injection**: Council verdict injected via `ActionResult.long_term_memory` (agent sees it next step) - **Replan**: Council can replace `agent.state.plan` with revised steps - **Cooldown**: 3 steps between loop-triggered councils to prevent meta-loops ## Available Models (AI Gateway) Configure in `AVAILABLE_MODELS` list in agent_server.py: ```python AVAILABLE_MODELS = [ {"id": "openai/gpt-4o", "name": "GPT-4o", "tier": "fast", "vision": True}, {"id": "moonshotai/kimi-k2.5", "name": "Kimi K2.5", "tier": "fast", "vision": True}, {"id": "google/gemini-2.5-flash", "name": "Gemini 2.5 Flash", "tier": "fast", "vision": True}, {"id": "google/gemini-2.5-pro", "name": "Gemini 2.5 Pro", "tier": "reasoning", "vision": True}, ] ``` To add models: add to this list and they appear in dashboard dropdown + available as council members. ## Troubleshooting ### Browser launch timeout (`BrowserStartEvent timed out after 30.0s`) Chromium is missing shared libraries. Fix: ```bash /home/node/browser-agent-venv/bin/python3 -m playwright install-deps chromium ``` This installs libatk, libasound, libxkbcommon, fonts, etc. **Must run after `playwright install chromium`.** ### Verify Chromium works ```bash DISPLAY=:98 /home/node/.cache/ms-playwright/chromium-*/chrome-linux64/chrome --version ``` If it prints a version, it's working. If it errors with `cannot open shared object file`, run `install-deps` above. ### Xvfb lock file error (`Server is already active for display :98`) The server now auto-detects and reuses existing Xvfb processes. If you still get lock file errors: ```bash rm -f /tmp/.X98-lock ``` To use a different display number: ```bash export DISPLAY_NUM=97 # or any unused display number ``` ### websockify not found The server auto-detects websockify from the Python venv first, then falls back to system PATH. Ensure it's installed: ```bash /home/node/browser-agent-venv/bin/pip install websockify ``` ### Broken Python venv symlinks If `python3` in the venv is a broken symlink: ```bash ln -sf /usr/bin/python3 /home/node/browser-agent-venv/bin/python3 ``` ### Port already in use ```bash export AGENT_PORT=9222 # or any free port (avoid 3001 - reserved) ``` ### noVNC not loading Ensure `novnc` system package is installed (`/usr/share/novnc/` must exist): ```bash sudo apt-get install -y novnc ``` ### Dashboard screen not showing / tiny / wrong size The dashboard uses WebSocket-streamed screenshots as the primary live view. If the screen appears wrong: 1. Hard-refresh the browser (Ctrl+Shift+R) to clear cached CSS 2. Increase Xvfb resolution: `export SCREEN_WIDTH=1280 SCREEN_HEIGHT=1024` 3. The default display `:98` avoids conflicts with system-managed Xvfb on `:99` ## Key Implementation Notes - `browser-use` ChatOpenAI returns `ChatInvokeCompletion` with `.completion` field (NOT `.content`) - `agent.state.plan` is mutable from `on_step_end` hook -- changes affect next step - `ActionResult.long_term_memory` gets injected into next step's context via MessageManager - `agent.state.consecutive_failures` tracks errors; reset on success - The `on_step_end` hook signature: `AgentHookFunc = Callable[['Agent'], Awaitable[None]]` - Dashboard is a single HTML file with inline CSS/JS (no build step) - noVNC served from system install at `/usr/share/novnc/` - Dashboard live screen uses `object-fit: fill` with absolute positioning for full panel coverage - Body uses flexbox layout (`display: flex; flex-direction: column`) to prevent viewport overflow - CSS Grid cells use `min-width: 0` to prevent grid blowout from oversized content - Screenshots are streamed at native Xvfb resolution (no server-side resize) for best quality - The server reuses existing Xvfb if one is already running on the target display