--- name: agent-browser description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction. allowed-tools: Bash(npx agent-browser:*), Bash(agent-browser:*) --- # Browser Automation with agent-browser ## Core Workflow Every browser automation follows this pattern: 1. **Navigate**: `agent-browser open ` 2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`) 3. **Interact**: Use refs to click, fill, select 4. **Re-snapshot**: After navigation or DOM changes, get fresh refs ```bash agent-browser open https://example.com/form agent-browser snapshot -i # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit" agent-browser fill @e1 "user@example.com" agent-browser fill @e2 "password123" agent-browser click @e3 agent-browser wait --load networkidle agent-browser snapshot -i # Check result ``` ## Essential Commands ```bash # Navigation agent-browser open # Navigate (aliases: goto, navigate) agent-browser close # Close browser # Snapshot agent-browser snapshot -i # Interactive elements with refs (recommended) agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer) agent-browser snapshot -s "#selector" # Scope to CSS selector # Interaction (use @refs from snapshot) agent-browser click @e1 # Click element agent-browser fill @e2 "text" # Clear and type text agent-browser type @e2 "text" # Type without clearing agent-browser select @e1 "option" # Select dropdown option agent-browser check @e1 # Check checkbox agent-browser press Enter # Press key agent-browser scroll down 500 # Scroll page # Get information agent-browser get text @e1 # Get element text agent-browser get url # Get current URL agent-browser get title # Get page title # Wait agent-browser wait @e1 # Wait for element agent-browser wait --load networkidle # Wait for network idle agent-browser wait --url "**/page" # Wait for URL pattern agent-browser wait 2000 # Wait milliseconds # Capture agent-browser screenshot # Screenshot to temp dir agent-browser screenshot --full # Full page screenshot agent-browser pdf output.pdf # Save as PDF ``` ## Common Patterns ### Form Submission ```bash agent-browser open https://example.com/signup agent-browser snapshot -i agent-browser fill @e1 "Jane Doe" agent-browser fill @e2 "jane@example.com" agent-browser select @e3 "California" agent-browser check @e4 agent-browser click @e5 agent-browser wait --load networkidle ``` ### Authentication with State Persistence ```bash # Login once and save state agent-browser open https://app.example.com/login agent-browser snapshot -i agent-browser fill @e1 "$USERNAME" agent-browser fill @e2 "$PASSWORD" agent-browser click @e3 agent-browser wait --url "**/dashboard" agent-browser state save auth.json # Reuse in future sessions agent-browser state load auth.json agent-browser open https://app.example.com/dashboard ``` ### Session Persistence ```bash # Auto-save/restore cookies and localStorage across browser restarts agent-browser --session-name myapp open https://app.example.com/login # ... login flow ... agent-browser close # State auto-saved to ~/.agent-browser/sessions/ # Next time, state is auto-loaded agent-browser --session-name myapp open https://app.example.com/dashboard # Encrypt state at rest export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32) agent-browser --session-name secure open https://app.example.com # Manage saved states agent-browser state list agent-browser state show myapp-default.json agent-browser state clear myapp agent-browser state clean --older-than 7 ``` ### Data Extraction ```bash agent-browser open https://example.com/products agent-browser snapshot -i agent-browser get text @e5 # Get specific element text agent-browser get text body > page.txt # Get all page text # JSON output for parsing agent-browser snapshot -i --json agent-browser get text @e1 --json ``` ### Parallel Sessions ```bash agent-browser --session site1 open https://site-a.com agent-browser --session site2 open https://site-b.com agent-browser --session site1 snapshot -i agent-browser --session site2 snapshot -i agent-browser session list ``` ### Connect to Existing Chrome ```bash # Auto-discover running Chrome with remote debugging enabled agent-browser --auto-connect open https://example.com agent-browser --auto-connect snapshot # Or with explicit CDP port agent-browser --cdp 9222 snapshot ``` ### Visual Browser (Debugging) ```bash agent-browser --headed open https://example.com agent-browser highlight @e1 # Highlight element agent-browser record start demo.webm # Record session ``` ### Local Files (PDFs, HTML) ```bash # Open local files with file:// URLs agent-browser --allow-file-access open file:///path/to/document.pdf agent-browser --allow-file-access open file:///path/to/page.html agent-browser screenshot output.png ``` ### iOS Simulator (Mobile Safari) ```bash # List available iOS simulators agent-browser device list # Launch Safari on a specific device agent-browser -p ios --device "iPhone 16 Pro" open https://example.com # Same workflow as desktop - snapshot, interact, re-snapshot agent-browser -p ios snapshot -i agent-browser -p ios tap @e1 # Tap (alias for click) agent-browser -p ios fill @e2 "text" agent-browser -p ios swipe up # Mobile-specific gesture # Take screenshot agent-browser -p ios screenshot mobile.png # Close session (shuts down simulator) agent-browser -p ios close ``` **Requirements:** macOS with Xcode, Appium (`npm install -g appium && appium driver install xcuitest`) **Real devices:** Works with physical iOS devices if pre-configured. Use `--device ""` where UDID is from `xcrun xctrace list devices`. ## Timeouts and Slow Pages The default Playwright timeout is 60 seconds for local browsers. For slow websites or large pages, use explicit waits instead of relying on the default timeout: ```bash # Wait for network activity to settle (best for slow pages) agent-browser wait --load networkidle # Wait for a specific element to appear agent-browser wait "#content" agent-browser wait @e1 # Wait for a specific URL pattern (useful after redirects) agent-browser wait --url "**/dashboard" # Wait for a JavaScript condition agent-browser wait --fn "document.readyState === 'complete'" # Wait a fixed duration (milliseconds) as a last resort agent-browser wait 5000 ``` When dealing with consistently slow websites, use `wait --load networkidle` after `open` to ensure the page is fully loaded before taking a snapshot. If a specific element is slow to render, wait for it directly with `wait ` or `wait @ref`. ## Session Management and Cleanup When running multiple agents or automations concurrently, always use named sessions to avoid conflicts: ```bash # Each agent gets its own isolated session agent-browser --session agent1 open site-a.com agent-browser --session agent2 open site-b.com # Check active sessions agent-browser session list ``` Always close your browser session when done to avoid leaked processes: ```bash agent-browser close # Close default session agent-browser --session agent1 close # Close specific session ``` If a previous session was not closed properly, the daemon may still be running. Use `agent-browser close` to clean it up before starting new work. ## Ref Lifecycle (Important) Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after: - Clicking links or buttons that navigate - Form submissions - Dynamic content loading (dropdowns, modals) ```bash agent-browser click @e5 # Navigates to new page agent-browser snapshot -i # MUST re-snapshot agent-browser click @e1 # Use new refs ``` ## Semantic Locators (Alternative to Refs) When refs are unavailable or unreliable, use semantic locators: ```bash agent-browser find text "Sign In" click agent-browser find label "Email" fill "user@test.com" agent-browser find role button click --name "Submit" agent-browser find placeholder "Search" type "query" agent-browser find testid "submit-btn" click ``` ## JavaScript Evaluation (eval) Use `eval` to run JavaScript in the browser context. **Shell quoting can corrupt complex expressions** -- use `--stdin` or `-b` to avoid issues. ```bash # Simple expressions work with regular quoting agent-browser eval 'document.title' agent-browser eval 'document.querySelectorAll("img").length' # Complex JS: use --stdin with heredoc (RECOMMENDED) agent-browser eval --stdin <<'EVALEOF' JSON.stringify( Array.from(document.querySelectorAll("img")) .filter(i => !i.alt) .map(i => ({ src: i.src.split("/").pop(), width: i.width })) ) EVALEOF # Alternative: base64 encoding (avoids all shell escaping issues) agent-browser eval -b "$(echo -n 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64)" ``` **Why this matters:** When the shell processes your command, inner double quotes, `!` characters (history expansion), backticks, and `$()` can all corrupt the JavaScript before it reaches agent-browser. The `--stdin` and `-b` flags bypass shell interpretation entirely. **Rules of thumb:** - Single-line, no nested quotes -> regular `eval 'expression'` with single quotes is fine - Nested quotes, arrow functions, template literals, or multiline -> use `eval --stdin <<'EVALEOF'` - Programmatic/generated scripts -> use `eval -b` with base64 ## Deep-Dive Documentation | Reference | When to Use | |-----------|-------------| | [references/commands.md](references/commands.md) | Full command reference with all options | | [references/snapshot-refs.md](references/snapshot-refs.md) | Ref lifecycle, invalidation rules, troubleshooting | | [references/session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping | | [references/authentication.md](references/authentication.md) | Login flows, OAuth, 2FA handling, state reuse | | [references/video-recording.md](references/video-recording.md) | Recording workflows for debugging and documentation | | [references/proxy-support.md](references/proxy-support.md) | Proxy configuration, geo-testing, rotating proxies | ## Ready-to-Use Templates | Template | Description | |----------|-------------| | [templates/form-automation.sh](templates/form-automation.sh) | Form filling with validation | | [templates/authenticated-session.sh](templates/authenticated-session.sh) | Login once, reuse state | | [templates/capture-workflow.sh](templates/capture-workflow.sh) | Content extraction with screenshots | ```bash ./templates/form-automation.sh https://example.com/form ./templates/authenticated-session.sh https://app.example.com/login ./templates/capture-workflow.sh https://example.com ./output ```