---
name: browser-automation
description: Browser automation — analyze pages with a sub-agent, then perform DOM operations (click, fill, navigate, screenshot, scroll). Use when the user wants to interact with web pages, fill forms, extract data, or automate browser tasks.
scripts:
  - browser_action.js
  - click_and_wait.js
  - navigate.js
  - screenshot.js
  - scroll.js
  - smart_fill.js
  - wait_for.js
---

# Browser Automation Skill

You now have tools to control the browser. They are split into **primitive tools** (direct single operations) and **compound tools** (multi-step operations with sub-agent analysis).

## Tools

### Primitive Tools

| Tool | What it does |
|------|-------------|
| `navigate` | Go to a URL |
| `screenshot` | Capture a screenshot (returned as image attachment) |
| `scroll` | Scroll up, down, to the top, or to the bottom; returns the scroll position plus `atTop`/`atBottom` flags |
| `wait_for` | Wait for a CSS selector to appear in the DOM |

> **Note:** Use the built-in `list_tabs` tool (always available) to find tabIds; this skill does not provide its own tab-listing tool.

### Compound Tools

| Tool | What it does |
|------|-------------|
| `browser_action` | Sub-agent analyzes page structure and returns CSS selectors / extracted data / action suggestions — **read-only, no side effects**. Use this instead of `get_tab_content` when you need to locate interactive elements (buttons, inputs, forms) for subsequent click/fill operations. |
| `smart_fill` | Fill a form field using CDP trusted input, then automatically verify the resulting value |
| `click_and_wait` | CDP trusted click + wait for page changes (navigation, new tabs, DOM mutations) — sub-agent summarizes what changed |

## When to use `browser_action` vs `get_tab_content`

- **`get_tab_content`** (built-in, always available): Read page text and extract information via LLM — requires `prompt` to specify what to extract. Best for reading articles, extracting text content, or summarizing.
- **`browser_action`** (this skill): Analyze page structure — for locating interactive elements (buttons, inputs, links) and getting CSS selectors to use with `smart_fill`, `click_and_wait`, etc.

Rule of thumb: if your next step is to **read/summarize**, use `get_tab_content`; if your next step is to **click/fill/interact**, use `browser_action`.

## Workflow

Follow the **analyze → act → analyze → act** loop:

1. `list_tabs` → pick the target `tabId`
2. `browser_action` → understand the page, get CSS selectors
3. Act on the selectors:
   - **Form fields** → `smart_fill` (triggers proper framework events)
   - **Clicks** → `click_and_wait` (captures navigation, popups, async UI)
   - **Wait for loading** → `wait_for`
   - **Load more content** → `scroll`
4. `browser_action` → verify the result or analyze the next step
5. Repeat until done
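
The loop above can be sketched in JavaScript. This is only an illustration: the `tools` object, its method signatures, and the structured return value of `browser_action` are assumptions made so the sketch can run standalone (the examples in this document show `browser_action` answering in free text), and the stub exists purely to record call order.

```javascript
// Sketch of the analyze → act loop. `tools` is a hypothetical object whose
// async methods stand in for this skill's tools; the real invocation
// mechanism and return shapes may differ.
async function fillAndSubmit(tools, tabId, query) {
  // Step 2: analyze. Assumed to return selectors as an object for brevity.
  const { input, button } = await tools.browser_action({
    scenario: "find the search input and search button selectors",
    tabId,
  });

  // Step 3: act on the selectors.
  const filled = await tools.smart_fill({ selector: input, value: query, tabId });
  if (!filled.success) throw new Error(`smart_fill failed for ${input}`);
  const clicked = await tools.click_and_wait({ selector: button, tabId });

  // Step 4: analyze again to verify the result.
  const summary = await tools.browser_action({
    scenario: "extract the top 5 search result titles and links",
    tabId,
  });
  return { clicked, summary };
}

// Stub tools (hypothetical) so the sketch runs without a browser.
function makeStubTools(log) {
  return {
    browser_action: async ({ scenario }) => {
      log.push("browser_action");
      return scenario.startsWith("find")
        ? { input: "textarea[name=q]", button: "input[name=btnK]" }
        : "1. ScriptCat - https://scriptcat.org/ ...";
    },
    smart_fill: async ({ value }) => {
      log.push("smart_fill");
      return { success: true, value };
    },
    click_and_wait: async () => {
      log.push("click_and_wait");
      return { clicked: true, navigated: true };
    },
  };
}
```

Each pass through the loop starts from a fresh analysis rather than reusing old selectors, which matches the stateless behavior of `browser_action` described under Important Notes.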

## Examples

**Search task**:
```
→ list_tabs()
← tabId=123 [active] | Google | https://www.google.com

→ browser_action("find the search input and search button selectors", tabId=123)
← "Search input: `textarea[name=q]`, Search button: `input[name=btnK]`"

→ smart_fill("textarea[name=q]", "ScriptCat", tabId=123)
← { success: true, value: "ScriptCat" }

→ click_and_wait("input[name=btnK]", tabId=123)
← { clicked: true, navigated: true, url: "https://www.google.com/search?q=ScriptCat" }

→ browser_action("extract the top 5 search result titles and links", tabId=123)
← "1. ScriptCat - https://scriptcat.org/ ..."
```

**Navigate + screenshot**:
```
→ navigate("https://example.com", tabId=123)
← { success: true, url: "https://example.com", tabId: 123 }

→ screenshot(tabId=123)
← [image attachment]
```

**Scroll to load more**:
```
→ scroll("down", tabId=123)
← { success: true, atBottom: false, scrollTop: 800 }

→ wait_for(".lazy-loaded-item", tabId=123, timeout=5000)
← { found: true, tagName: "DIV", text: "..." }

→ scroll("down", tabId=123)
← { success: true, atBottom: true }
```
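
When the number of scroll steps is not known in advance, the same pattern can be wrapped in a bounded loop. The sketch below assumes hypothetical async wrappers `tools.scroll` and `tools.wait_for` whose results carry the `atBottom` and `found` fields shown in the example above; the parameter names and the step limit are illustrative, not part of the skill.

```javascript
// Scroll down until atBottom is reported or a step limit is reached.
// `tools.scroll` and `tools.wait_for` are hypothetical wrappers; the result
// fields mirror the example above.
async function scrollToEnd(tools, tabId, itemSelector, maxSteps = 10) {
  for (let step = 1; step <= maxSteps; step++) {
    const pos = await tools.scroll({ direction: "down", tabId });
    if (pos.atBottom) return { reachedBottom: true, steps: step };
    // Give lazily loaded items a chance to render before scrolling again.
    // A real caller may also want to inspect the wait_for result.
    await tools.wait_for({ selector: itemSelector, tabId, timeout: 5000 });
  }
  return { reachedBottom: false, steps: maxSteps };
}

// Stub tools for a standalone run: the third scroll reports atBottom.
function makeScrollStub() {
  let calls = 0;
  return {
    scroll: async () => ({ success: true, atBottom: ++calls >= 3 }),
    wait_for: async () => ({ found: true }),
  };
}

scrollToEnd(makeScrollStub(), 123, ".lazy-loaded-item").then((r) => console.log(r));
```

The step limit guards against pages with infinite feeds, where `atBottom` may never become true.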

**Click that triggers a popup (sub-agent analyzes DOM changes)**:
```
→ click_and_wait(".delete-btn", tabId=123)
← { clicked: true, navigated: false,
     pageChanges: "Confirmation dialog appeared: 'Are you sure?'. Click `.modal .btn-ok` to confirm." }
```

**Data extraction**:
```
→ browser_action("extract the product list with names and prices", tabId=123)
← "1. Product A ¥99  2. Product B ¥199 ..."
```

**New tab opened by click**:
```
→ click_and_wait("a.detail-link", tabId=123, timeout=5000)
← { clicked: true, navigated: false, newTabs: [{ tabId: 456, url: "https://..." }] }

→ browser_action("find the main heading and the primary action button selectors on the newly opened page", tabId=456)
```
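
Choosing which tab to analyze after a click can be reduced to a small helper. This is a sketch under the assumption that the `click_and_wait` result has the shape shown above (`newTabs` as an array of objects with a `tabId`); `nextTabId` is a hypothetical name, not a tool provided by this skill.

```javascript
// Decide which tab to analyze next from a click_and_wait result.
// The field names (newTabs, tabId) follow the example above; treat the exact
// result shape as an assumption.
function nextTabId(clickResult, currentTabId) {
  const opened = clickResult.newTabs ?? [];
  // Prefer the most recently opened tab; otherwise stay on the current one.
  return opened.length > 0 ? opened[opened.length - 1].tabId : currentTabId;
}

// A click that opened a new tab: continue in that tab.
console.log(nextTabId({ clicked: true, navigated: false, newTabs: [{ tabId: 456 }] }, 123));
// A click that navigated in place: keep using the original tab.
console.log(nextTabId({ clicked: true, navigated: true }, 123));
```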

## Tips for the `browser_action` `scenario` parameter

The `scenario` parameter should be **specific and goal-oriented**:

- Good: "find the login form's username input, password input, and submit button selectors"
- Good: "extract the first 5 search results with title, URL, and snippet"
- Good: "check if the user is logged in; if yes, find the search box selector"
- Bad: "analyze this page" — too vague, the sub-agent won't know what to look for

After an action, state the outcome you expect so the sub-agent knows what to verify:
- "verify the form was submitted successfully and identify the next page"
- "check if the item was added to cart — look for a success toast or badge update"

## Important Notes

- **Popup blocking**: Some clicks open new windows/tabs (`window.open`, `target="_blank"`). If the expected new tab does not appear, the browser has probably blocked it: ask the user to open Site settings from the address bar, allow "Pop-ups and redirects" for the site, and then retry the click.
- `browser_action` is **read-only** — it never clicks, fills, or modifies the page.
- Each `browser_action` call is **stateless** — it does not remember previous analyses.
- `click_and_wait` auto-detects DOM changes and JS dialogs; its `pageChanges` summary often makes a follow-up `browser_action` unnecessary.
- If `browser_action` reports "element not found", check its suggestions — it may say you need to click something first to reveal the element.
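
The notes on popup blocking and `pageChanges` suggest a simple triage after each click. The helper below is a hypothetical sketch: the function name, the `expectNewTab` option, and the returned labels are all illustrative, and the result fields are assumed to match the `click_and_wait` examples in this document.

```javascript
// Hypothetical post-click triage based on the notes above.
function triageClick(result, { expectNewTab = false } = {}) {
  const openedTabs = result.newTabs?.length ?? 0;
  // An expected tab that never appeared points to a blocked popup.
  if (expectNewTab && openedTabs === 0) return "ask_user_to_allow_popups";
  // A pageChanges summary often makes a follow-up analysis unnecessary.
  if (result.pageChanges) return "use_page_changes";
  // Otherwise fall back to a fresh, stateless analysis.
  return "run_browser_action";
}

console.log(triageClick({ clicked: true, navigated: false }, { expectNewTab: true }));
console.log(triageClick({ clicked: true, pageChanges: "Confirmation dialog appeared" }));
```

Treat the `use_page_changes` branch as a default rather than a rule: when the summary is ambiguous, a follow-up `browser_action` call is still the safer choice.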