---
name: chrome-automation
description: Automate Chrome browser tasks using agent-browser CLI. Navigate pages, fill forms, click buttons, take screenshots, extract data, and replay recorded workflows, all inside the user's real Chrome session.
---

# Skill: Chrome Automation (agent-browser)

Automate browser tasks in the user's real Chrome session via the [agent-browser](https://github.com/vercel-labs/agent-browser) CLI.

> **Prerequisite**: agent-browser must be installed and Chrome must have remote debugging enabled. See `references/agent-browser-setup.md` if unsure.

---

## Core Principle: Reuse the User's Existing Chrome

This skill operates on a **single Chrome process**: the user's real browser. There is no session management, no separate profiles, and no launching of a fresh Playwright browser.

### Always Start by Listing Tabs

Before opening any new page, **always list existing tabs first**:

```bash
agent-browser --auto-connect tab list
```

This returns all open tabs with their index numbers, titles, and URLs. Check whether the page you need is already open:

- **If the target page is already open** → switch to that tab directly instead of opening a new one. The user likely has it open because they are already logged in and the page is in the right state.

  ```bash
  agent-browser --auto-connect tab <index>
  ```

- **If the target page is NOT open** → open it in the current tab or a new tab.

  ```bash
  agent-browser --auto-connect open <url>
  ```

### Why This Matters

- The user's Chrome has their cookies, login sessions, and browser state
- Opening a new page when one is already available wastes time and may lose login state
- Many marketing platforms (social media dashboards, ad managers, CMS tools) require login; reusing an existing logged-in tab avoids re-authentication

---

## Connection

Always use `--auto-connect` to connect to the user's running Chrome instance:

```bash
agent-browser --auto-connect
```

This auto-discovers Chrome with remote debugging enabled.
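When auto-discovery fails, one quick diagnostic is to probe Chrome's DevTools HTTP endpoint directly. This is a minimal sketch, assuming Chrome was launched with `--remote-debugging-port=9222`; the port number and the `CDP_PORT` override variable are assumptions for illustration, not part of agent-browser. Chrome setups that enable remote debugging some other way may not expose this endpoint, so treat a failed probe as a hint rather than proof.

```bash
# Assumption: Chrome was started with --remote-debugging-port=9222.
# CDP_PORT is a hypothetical override for setups that use a different port.
PORT="${CDP_PORT:-9222}"

# /json/version is served by Chrome's DevTools HTTP endpoint when a debugging port is open.
# -f makes curl fail on HTTP errors; --max-time keeps the probe from hanging.
if curl -fsS --max-time 2 "http://127.0.0.1:${PORT}/json/version" >/dev/null 2>&1; then
  STATUS="reachable"
else
  STATUS="unreachable"
fi

echo "DevTools endpoint on port ${PORT}: ${STATUS}"
```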
If connection fails, guide the user through enabling remote debugging (see `references/agent-browser-setup.md`).

---

## Common Workflows

### 1. Navigate and Interact

```bash
# List tabs to find existing pages
agent-browser --auto-connect tab list

# Switch to an existing tab (if found)
agent-browser --auto-connect tab <index>

# Or open a new page
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect wait --load networkidle

# Take a snapshot to see interactive elements
agent-browser --auto-connect snapshot -i

# Click, fill, etc.
agent-browser --auto-connect click @e3
agent-browser --auto-connect fill @e5 "some text"
```

### 2. Extract Data from a Page

```bash
# Get all text content
agent-browser --auto-connect get text body

# Take a screenshot for visual inspection
agent-browser --auto-connect screenshot

# Execute JavaScript for structured data (here: each table row as an array of trimmed cell texts)
agent-browser --auto-connect eval "JSON.stringify([...document.querySelectorAll('table tr')].map(tr => [...tr.cells].map(c => c.textContent.trim())))"
```

### 3. Replay a Chrome DevTools Recording

The user may provide a recording exported from the Chrome DevTools Recorder (JSON, Puppeteer JS, or @puppeteer/replay JS format). See [Replaying Recordings](#replaying-recordings) below.

---

## Step-by-Step Interaction Guide

### Taking Snapshots

Use `snapshot -i` to see all interactive elements with refs (`@e1`, `@e2`, ...):

```bash
agent-browser --auto-connect snapshot -i
```

The output lists each interactive element with its role, text, and ref. Use these refs for subsequent actions.

### Step Type Mapping

| Action | Command |
|--------|---------|
| Navigate | `agent-browser --auto-connect open <url>` (optionally follow with `wait --load networkidle`, but some sites, such as Reddit, never reach networkidle; skip the wait if `open` already shows the page title) |
| Click | `snapshot -i` → find ref → `click @eN` |
| Fill standard input | `click @eN` → `fill @eN "text"` |
| Fill rich text editor | `click @eN` → `keyboard inserttext "text"` |
| Press key | `press <key>` (Enter, Tab, Escape, etc.) |
| Scroll | `scroll down <pixels>` or `scroll up <pixels>` |
| Wait for element | `wait @eN` or `wait "<selector>"` |
| Screenshot | `screenshot` or `screenshot --annotate` |
| Get page text | `get text body` |
| Get current URL | `get url` |
| Run JavaScript | `eval <js>` |

### How to Distinguish Input Types

- **Standard input/textarea** → use `fill`
- **Contenteditable div / rich text editor** (LinkedIn message box, Gmail compose, Slack, CMS editors) → click/focus it first, then use `keyboard inserttext`

### Ref Lifecycle

Refs (`@e1`, `@e2`, ...) are **invalidated when the page changes**. Always re-snapshot after:

- Clicking links or buttons that trigger navigation
- Submitting forms
- Triggering dynamic content loads (AJAX, SPA navigation)

### Verification

After each significant action, verify the result:

```bash
agent-browser --auto-connect snapshot -i   # check interactive state
agent-browser --auto-connect screenshot    # visual verification
```

---

## Replaying Recordings

### Accepted Formats

1. **JSON** (recommended). It is structured, so it can be read progressively:

   ```bash
   # Count steps
   jq '.steps | length' recording.json

   # Read the first 5 steps
   jq '.steps[0:5]' recording.json
   ```

2. **@puppeteer/replay JS** (`import { createRunner }`)
3. **Puppeteer JS** (`require('puppeteer')`, `page.goto`, `Locator.race`)

### How to Replay

1. **Parse the recording**: understand the full intent before acting, and summarize what the recording does.
2. **List tabs first**: check whether the target page is already open.
3. **Navigate**: execute `navigate` steps, reusing existing tabs when possible.
4. **For each interaction step**:
   - Take a snapshot (`snapshot -i`) to see the current interactive elements
   - Match the recording's `aria/...` selectors against the snapshot
   - Fall back to `text/...` selectors, then CSS class hints, then a screenshot
   - **Do not rely on ember IDs, numeric IDs, or exact XPaths**: these change on every page load
5. **Verify after each step**: take a snapshot or screenshot to confirm the result.

---

## Iframe-Heavy Sites

`snapshot -i` operates on the main frame only and **cannot penetrate iframes**. Sites like LinkedIn, Gmail, and embedded editors render content inside iframes.

### Detecting Iframe Issues

- `snapshot -i` returns unexpectedly short or empty results
- The recording references elements that do not appear in the snapshot output
- `get text body` content doesn't match what a screenshot shows

### Workarounds

1. **Use `eval` to access iframe content**:

```bash
agent-browser --auto-connect eval --stdin <<'EVALEOF'
// The iframe selector and button label are site-specific; adjust them per page
const frame = document.querySelector('iframe[data-testid="interop-iframe"]');
// contentDocument is null for cross-origin iframes, so guard each lookup
const doc = frame ? frame.contentDocument : null;
const btn = doc ? doc.querySelector('button[aria-label="Send"]') : null;
if (btn) btn.click();
EVALEOF
```

Note: this only works for same-origin iframes.

2. **Use `keyboard` for blind input**: if the iframe element has focus, `keyboard inserttext "..."` sends text regardless of frame boundaries.
3. **Use `get text body`** to read full page content including iframes.
4. **Use `screenshot`** for visual verification when the snapshot is unreliable.

### When to Ask the User

If the workarounds fail after 2 attempts on the same step, pause and explain:

- That the page uses iframes that cannot be accessed via snapshot
- Which element you need and what you expected

Then ask the user to perform that step manually, and continue afterward.

---

## Handling Unexpected Situations

### Handle Automatically (do not stop):

- Popups or banners → dismiss them (`find text "Dismiss" click` or `find text "Close" click`)
- Cookie consent dialogs → accept or dismiss
- Tooltip overlays → close them first
- Element not in snapshot → try `find text "..."
  click`, or scroll to reveal it with `scroll down 300`

### Pause and Ask the User:

- Login or authentication is required
- A CAPTCHA appears
- The page structure is completely different from what you expected
- A destructive action is about to happen (deleting data, sending real content); confirm first
- You have been stuck for more than 2 attempts on the same step
- All iframe workarounds have failed

When pausing, explain clearly what step you are on, what you expected, and what you see.

---

## Key Commands Reference

| Command | Description |
|---------|-------------|
| `tab list` | List all open tabs with index, title, and URL |
| `tab <index>` | Switch to an existing tab by index |
| `tab new` | Open a new empty tab |
| `tab close` | Close the current tab |
| `open <url>` | Navigate to URL |
| `snapshot -i` | List interactive elements with refs |
| `click @eN` | Click element by ref |
| `fill @eN "text"` | Clear and fill a standard input/textarea |
| `type @eN "text"` | Type without clearing |
| `keyboard inserttext "text"` | Insert text (best for contenteditable) |
| `press <key>` | Press a keyboard key |
| `scroll down/up <pixels>` | Scroll the page by a number of pixels |
| `wait @eN` | Wait for element to appear |
| `wait --load networkidle` | Wait for network to settle |
| `wait <ms>` | Wait for a duration |
| `screenshot [path]` | Take screenshot |
| `screenshot --annotate` | Screenshot with numbered labels |
| `eval <js>` | Execute JavaScript in page |
| `get text body` | Get all text content |
| `get url` | Get current URL |
| `set viewport <width> <height>` | Set viewport size |
| `find text "..." click` | Semantic find and click |
| `close` | Close browser session |

---

## Known Limitations

1. **Iframe blindness**: `snapshot -i` cannot see inside iframes. See [Iframe-Heavy Sites](#iframe-heavy-sites).
2. **`find text` strict mode**: fails when multiple elements match. Use `snapshot -i` to locate the specific ref instead.
3. **`fill` vs contenteditable**: `fill` only works on `<input>` and `