--- name: opencli-browser description: Use when an agent needs to drive a real Chrome window via opencli — inspect a page, fill forms, click through logged-in flows, or extract data ad-hoc. Covers the selector-first target contract, compound form fields, stale-ref handling, network capture, and the agent-native envelopes the CLI returns. Not for writing adapters — see opencli-adapter-author for that. allowed-tools: Bash(opencli:*), Read, Edit, Write --- # opencli-browser The first reader of this CLI is an agent, not a human. Every subcommand returns a structured envelope that tells you exactly what matched, how confident the match is, and what to do if it didn't. Lean on those envelopes — do not guess. This skill is for **driving a live browser** to accomplish an agent task. If you are building a reusable adapter under `~/.opencli/clis//` use `opencli-adapter-author` instead. --- ## Prerequisites ```bash opencli doctor ``` Until `doctor` is green, nothing else will work. Typical failures: Chrome not running, extension not installed, debug port blocked by 1Password / other extensions. The doctor output tells you which. --- ## Session lifecycle - `opencli browser *` commands require a `` positional immediately after `browser`. Use the same session name for a multi-step flow; use a different name to isolate parallel browser work. - Use a stable session name for any multi-command or human-paced browser workflow. Example: `opencli browser fb-yaya-warmup open https://example.com`, then reuse `opencli browser fb-yaya-warmup state`, `extract`, `click`, etc. - Owned browser sessions keep a tab lease alive between calls. Release it with `opencli browser close` or let the idle timeout expire. - `opencli browser bind` binds the Chrome tab you already have open to that session. Use this for logged-in pages, SSO flows, or pages you manually positioned before handing control to the agent. - `--window foreground|background` (or `OPENCLI_WINDOW=foreground|background`) chooses whether OpenCLI creates/focuses a foreground browser window or uses a background browser window for owned sessions. ### Bind Tab ```bash opencli browser gmail bind opencli browser gmail state opencli browser gmail click "Search" opencli browser gmail network opencli browser gmail unbind ``` Binding never owns the user window and never closes the user tab. It fails closed if the tab is closed or becomes non-debuggable. Re-run `opencli browser bind` when you switch to a different real tab. Navigation is allowed on bound sessions because the session now represents explicit agent ownership of that tab. Tab mutation (`tab new`, `tab select`, `tab close`) is still blocked for bound sessions. Use an owned session when you want OpenCLI to manage tab lifecycle. Bound sessions have no OpenCLI idle-close timer; the binding lasts until `unbind`, tab close, window close, or daemon restart. --- ## Mental model 1. **Selector-first target contract.** Every interaction command (`click`, `type`, `select`, `get text/value/attributes`) takes one ``, which is *either* a numeric ref from `state`/`find` *or* a CSS selector. Use `--nth ` to disambiguate multiple CSS matches. 2. **Every envelope reports `matches_n` and `match_level`.** `match_level` is `exact`, `stable`, or `reidentified` — the CLI already rescued moderate DOM drift for you, but the level tells you how confident to be. 3. **Compact output first, full payload on demand.** `state` is a budget-aware snapshot; `get html --as json` supports `--depth/--children-max/--text-max`; `network` returns shape previews and you re-fetch a single body with `--detail `. If you emit a giant payload you are burning context you did not need to burn. 4. **Structured errors are machine-readable.** On failure the CLI emits `{error: {code, message, hint?, candidates?}}`. Branch on `code`, not on message strings. --- ## Critical rules 1. **Always inspect before you act.** Run `state` or `find` first. Never hard-code a ref or selector from memory across sessions — indices are per-snapshot. 2. **Prefer site adapters before raw browser driving.** If `opencli ` already covers the task, use that adapter command first (`opencli facebook notifications`, `opencli reddit read`, etc.). Use `opencli browser ...` only for gaps, debugging, or one-off UI flows the adapter does not expose. 3. **Prefer numeric ref over CSS once you have it.** Numeric refs survive mild DOM shifts because the CLI fingerprints each tagged element. A CSS selector written by hand will break the first time the site re-renders. 4. **Read `match_level` after every write.** `exact` = all good. `stable` = the element is the same but some soft attrs drifted — your action still applied. `reidentified` = the original ref was gone and the CLI found a unique replacement; double-check you hit the right element. 5. **Use the `compound` field for form controls.** Do not regex-guess a date format, do not `state` twice to get the full ``. 6. **Verify writes that matter.** After `type `, run `get value `. After `select`, run `get value`. Autocomplete widgets, React controlled inputs, and masked fields all silently eat characters. The CLI cannot detect this for you. 7. **`state` → action → `state` after a page change.** Navigations, form submits, and SPA route changes invalidate refs. Take a fresh snapshot. Do not reuse refs from before the transition. 8. **Chain with `&&` when reusing freshly parsed refs.** A chained sequence runs in one shell so the ref you just read from output can be passed directly to the next command. Separate shell invocations keep the named browser session, but any shell-local variables or copied refs from the previous command can go stale after page changes. 9. **`eval` is read-only.** Wrap the JS in an IIFE and return JSON. If you need to *change* the page, use the structured `click` / `type` / `select` / `keys` commands instead — they produce structured output and fingerprints, `eval` does not. 10. **Prefer `network` to screen-scraping.** If a page you care about fetches its data from a JSON API, the API is almost always more reliable than scraping the rendered DOM. Capture once, inspect the shape, then `--detail ` the body you need. --- ## Target contract (`` for click / type / select / get text|value|attributes) ``` ::= | ``` - **Numeric ref** — the `[N]` index from `state` or `find`. Cheap, resilient to soft DOM drift. - **CSS selector** — anything `querySelectorAll` accepts. Must be unambiguous on write ops, or pair with `--nth `. ### Envelope on success ```json { "clicked": true, "target": "3", "matches_n": 1, "match_level": "exact" } ``` ```json { "value": "kalevin@example.com", "matches_n": 1, "match_level": "stable" } ``` ### match_level | level | meaning | you should | |-------|---------|------------| | `exact` | Fingerprint agreed on tag + strong IDs with at most one soft drift | Proceed. | | `stable` | Tag + strong IDs still agree, soft signals (aria-label, role, text) drifted | Proceed, but if *what* you typed/clicked matters, re-check with `get value` or `state`. | | `reidentified` | Original ref was gone; a unique live element matched the fingerprint and was re-tagged with the old ref | Double-check you hit the right element before chaining more writes. | ### Structured error codes Branch on these, not on the human message: | code | meaning | |------|---------| | `not_found` | Numeric ref is no longer in the DOM. Re-`state`. | | `stale_ref` | Ref exists but the element at that ref changed identity. Re-`state`. | | `invalid_selector` | CSS was rejected by `querySelectorAll`. Fix the selector. | | `selector_not_found` | CSS matches 0 elements. Try `find` with a looser selector. | | `selector_ambiguous` | CSS matches >1 and no `--nth`. Add `--nth` or narrow the selector. | | `selector_nth_out_of_range` | `--nth` beyond match count. | | `option_not_found` | `select` couldn't find an option matching that label/value. Error envelope includes `available: string[]` of the real option labels. | | `not_a_select` | `select` was called on a non-`` option by label first, then value. With semantic flags, omit `target` and pass option as the only positional. Use `compound` from `find`/`state` to see exactly what labels are available. | | `browser keys ` | `Enter`, `Escape`, `Tab`, `Control+a`, etc. Runs against the focused element. | | `browser scroll [--amount px]` | `up` / `down`. Default amount `500`. | ### Wait ```bash browser wait selector "" [--timeout ms] # wait until the selector matches browser wait text "" [--timeout ms] # wait until the text appears browser wait download [pattern] [--timeout ms] # wait for a Chrome download whose filename/URL/mime contains pattern browser wait time # hard sleep, last resort ``` Default timeout `10000` ms. SPA routes, login redirects, and lazy-loaded lists need `wait` before `state`/`get`. `browser wait download` requires Browser Bridge extension 1.0.8+ because it uses Chrome's downloads lifecycle API. Pass a narrow filename or URL substring such as `receipt.pdf` when possible; an empty pattern waits for the next/recent download in the timeout window. The command reports `{downloaded, filename, url, state, elapsedMs}` on success and a JSON error envelope on timeout/failure. ### Extract - **`web read --url `** — One-shot Markdown reader for arbitrary pages. It expands relevant same-origin iframes by default, so old iframe-shell sites work better than with a top-document-only scrape. Use `--frames all-same-origin` when completeness matters more than Markdown noise. For AJAX shell pages use `opencli web read --url --wait-for "" --wait-until networkidle --diagnose`; diagnostics show frame URLs, empty containers, and API-like XHRs. If the value you need is table/API data, switch to `browser network` or a dedicated adapter instead of relying on Markdown. - **`browser eval [--frame N]`** — Run an expression in the page (or in a cross-origin frame via `--frame`). Wrap in an IIFE and return JSON. Read-only: no `document.forms[0].submit()`, no clicks, no navigations. If the result is a string, stdout is the raw string; otherwise it's JSON. - **`browser extract [--selector ] [--chunk-size N] [--start N]`** — Markdown extraction of long-form content with a continuation cursor. Returns `{url, title, selector, total_chars, chunk_size, start, end, next_start_char, content}`. Loop on `next_start_char` until it is `null`. Auto-scopes to `
`/`
`/`` if you don't pass `--selector`. ### Network ```bash browser network # shape preview + cache key list browser network --detail # full body for one cached entry browser network --filter "field1,field2" # keep only entries whose body shape contains ALL fields as path segments browser network --all # include static resources (usually noise) browser network --raw # full bodies inline — large; use sparingly browser network --ttl # cache TTL (default 24h) ``` List entries look like `{key, method, status, url, ct, size, shape, body_truncated?}`. Detail envelope is `{key, url, method, status, ct, size, shape, body, body_truncated?, body_full_size?, body_truncation_reason}`. Cache lives in `~/.opencli/cache/browser-network/` so you can re-inspect without re-triggering the request. Default output keeps JSON/XML/plain-text and JS-like API responses, then drops obvious static assets and telemetry by URL. If an expected endpoint is missing, run `browser network --all` once and check whether an unusual content type or URL filter hid it. ### Tabs & session | command | purpose | |---------|---------| | `browser tab list` | JSON array of `{index, page, url, title, active}`. The `page` string is the tab identity you pass as `` to `tab select` / `tab close`, or to `--tab ` on any subcommand. (`--tab`'s placeholder is historical — the value is always `page`.) | | `browser tab new [url]` | Open a new tab. Prints the new `page` string. | | `browser tab select [targetId]` | Make a tab the default. All subcommands accept `--tab ` to target one without changing the default. | | `browser tab close [targetId]` | Close by `page`. | | `browser back` | History back on the active tab. | | `browser close` | Release the current owned browser session when done. | | `browser bind` | Bind the current Chrome tab to the named browser session. | | `browser unbind` | Detach the named bound session without closing the user tab/window. | --- ## Compound form controls Every date/time, select, and file input carries a `compound` field. Use it — do not regex attributes. ### Date family ```json { "control": "date", "format": "YYYY-MM-DD", "current": "2026-04-21", "min": "2026-01-01", "max": "2026-12-31" } ``` `control` is one of `date | time | datetime-local | month | week`. `format` is a concrete template string — type into the field using that exact format, or `select` by label if the site wraps the native input in a custom widget. ### Select ```json { "control": "select", "multiple": false, "current": "United States", "options": [ { "label": "United States", "value": "us", "selected": true }, { "label": "Canada", "value": "ca" } ], "options_total": 137 } ``` `options[]` is capped at 50 entries. **`current` is always correct** even when the selected option is past the cap — it's computed by scanning every option, not from the truncated list. If `options_total > options.length` and you need an option that isn't in `options[]`, call `browser select "