`, `

` - Platform-specific components (see [Platform Selectors](#platform-selectors)) - Data attributes: `[data-testid="..."]`, `[data-id="..."]` 4. **Verify with `get box`** to confirm the element has a reasonable bounding box: ```bash agent-browser --auto-connect get box "" ``` This returns `{ x, y, width, height }`. Sanity-check: - Width should be > 100px and < viewport width - Height should be > 50px - If the box is the entire page, the selector is too broad — refine it 5. **If the selector is hard to find**, use `eval` to explore the DOM: ```bash agent-browser --auto-connect eval "document.querySelector('article')?.getBoundingClientRect()" ``` ### Platform Selectors Common container selectors for popular platforms: | Platform | Target | Typical Selector | |----------|--------|-----------------| | Reddit | A post | `shreddit-post`, `[data-testid="post-container"]` | | X / Twitter | A tweet | `article[data-testid="tweet"]` | | LinkedIn | A feed post | `.feed-shared-update-v2` | | Hacker News | A story + comments | `#hnmain .fatitem` | | GitHub | A repo card | `[data-hpc]`, `.repository-content` | | YouTube | Video player area | `#player-container-outer` | | Generic article | Main content | `article`, `main`, `[role="main"]`, `.post-content`, `.article-body` | > These selectors may change over time. Always verify with `get box` before using. ### Multiple Matching Elements If the selector matches multiple elements (e.g., multiple tweets on a timeline), narrow it down: ```bash # Count matches agent-browser --auto-connect get count "article[data-testid='tweet']" # Use nth-child or :first-of-type, or a more specific selector # Or use eval to find the right one by text content: agent-browser --auto-connect eval --stdin <<'EOF' const posts = document.querySelectorAll('article[data-testid="tweet"]'); for (let i = 0; i < posts.length; i++) { const text = posts[i].textContent.substring(0, 80); console.log(i, text); } EOF ``` Then target a specific one using `:nth-of-type(N)` or a unique parent selector. --- ## Step 3: Capture the Focused Screenshot ### Method A: Scroll + Viewport Screenshot (Preferred for Viewport-Sized Targets) Best when the target element fits within the viewport. ```bash # Scroll the target into view agent-browser --auto-connect scrollintoview "" agent-browser --auto-connect wait 500 # Take viewport screenshot agent-browser --auto-connect screenshot /tmp/browser-screenshot-raw.png ``` Then crop using the bounding box (see [Cropping](#cropping)). ### Method B: Full-Page Screenshot + Crop (For Any Size Target) Best when the target might be larger than the viewport or when precise cropping is needed. ```bash # Take full-page screenshot agent-browser --auto-connect screenshot --full /tmp/browser-screenshot-full.png # Get the target element's bounding box agent-browser --auto-connect get box "" # Output: { x: 200, y: 450, width: 680, height: 520 } ``` Then crop (see [Cropping](#cropping)). ### Cropping Use ImageMagick (`magick` on IMv7, `convert` is deprecated) to crop the screenshot to the target region. Add padding for visual breathing room. #### Retina Display Handling **Critical**: On macOS Retina displays, screenshots are captured at 2x resolution. A 1728x940 viewport produces a 3456x1880 image. You MUST account for this: 1. **Detect the scale factor**: Compare viewport size vs actual image dimensions: ```bash # Check actual image dimensions magick identify /tmp/screenshot.png # → 3456x1880 means 2x scale on a 1728x940 viewport ``` 2. **Multiply `get box` coordinates by the scale factor** before cropping: ```bash # get box returns viewport coordinates: { x: 200, y: 450, width: 680, height: 520 } # For 2x Retina, actual image coordinates are: SCALE=2 X=$((200 * SCALE)) Y=$((450 * SCALE)) W=$((680 * SCALE)) H=$((520 * SCALE)) PADDING=$((16 * SCALE)) ``` #### Crop Command ```bash magick /tmp/browser-screenshot-full.png \ -crop $((W + PADDING*2))x$((H + PADDING*2))+$((X - PADDING))+$((Y - PADDING)) \ +repage \ .png ``` > **Important**: `get box` returns floating-point values. Round them to integers before passing to ImageMagick. > **Padding**: Use 12–20px (viewport px). Increase to ~30px if the target has a distinct visual boundary (card, bordered box). Use 0 if the user wants a tight crop. ### Output Path - If the user specifies an output path, use that - Otherwise, save to a descriptive name in the current directory, e.g., `reddit-post-screenshot.png`, `tweet-screenshot.png` --- ## Step 4: Verify the Result After cropping, **read the output image** to verify it captured the right content: ```bash # Use the Read tool to visually inspect the cropped screenshot ``` If the crop is wrong (missed content, too much whitespace, wrong element), adjust the selector or bounding box and retry. --- ## Fallback: Visual Highlight Confirmation When DOM-based location is uncertain — the selector might be wrong, multiple candidates exist, or the target is ambiguous — use **JS-injected highlighting** to visually confirm before cropping. ### How It Works 1. **Inject a highlight border** on the candidate element: ```bash agent-browser --auto-connect eval --stdin <<'EOF' (function() { const el = document.querySelector(''); if (!el) { console.log('NOT_FOUND'); return; } el.style.outline = '4px solid red'; el.style.outlineOffset = '2px'; el.scrollIntoView({ block: 'center' }); })(); EOF ``` 2. **Take a screenshot** and visually inspect: ```bash agent-browser --auto-connect screenshot /tmp/highlight-check.png ``` Read the screenshot to check if the red border surrounds the correct content. 3. **If correct**, remove the highlight and proceed with cropping: ```bash agent-browser --auto-connect eval "document.querySelector('').style.outline = ''; document.querySelector('').style.outlineOffset = '';" ``` 4. **If wrong**, try the next candidate or refine the selector, re-highlight, and re-check. ### When to Use This Fallback - The page has complex/nested components and you're not sure which container is right - Multiple similar elements exist and you need to pick the correct one - The user's description is vague ("that chart in the middle of the page") - The `get box` result looks suspicious (too large, too small, zero-sized) --- ## Page Preparation: Clean Up Before Capture Before taking the final screenshot, clean up the page for a better result: ```bash # Dismiss cookie banners, popups, overlays agent-browser --auto-connect eval --stdin <<'EOF' (function() { // Common cookie/popup selectors const selectors = [ '[class*="cookie"] button', '[class*="consent"] button', '[class*="banner"] [class*="close"]', '[class*="modal"] [class*="close"]', '[class*="popup"] [class*="close"]', '[aria-label="Close"]', '[data-testid="close"]' ]; selectors.forEach(sel => { document.querySelectorAll(sel).forEach(el => { if (el.offsetParent !== null) el.click(); }); }); // Hide fixed/sticky elements that overlay content (nav bars, banners) document.querySelectorAll('*').forEach(el => { const style = getComputedStyle(el); if ((style.position === 'fixed' || style.position === 'sticky') && el.tagName !== 'HTML' && el.tagName !== 'BODY') { el.style.display = 'none'; } }); })(); EOF ``` > **Use with caution**: Hiding fixed elements might remove important context. Only run this when overlays visibly obstruct the target region. ### Cookie Banners That Won't Dismiss Some cookie consent banners (e.g., Jina AI's Usercentrics) live in shadow DOM or iframes and cannot be dismissed via JS `click()` or `remove()`. Don't waste time with multiple JS attempts. Instead: 1. **Crop it out** — if the banner is at the top or bottom, simply adjust the crop region to exclude it. This is the fastest and most reliable approach. 2. **Scroll past it** — scroll the target content away from the banner area before capturing. --- ## Viewport Sizing For consistent, high-quality screenshots, set the viewport before capturing: ```bash # Standard desktop viewport agent-browser --auto-connect set viewport 1280 800 # Wider for dashboard/data-heavy pages agent-browser --auto-connect set viewport 1440 900 # Narrower for mobile-like content (social media posts) agent-browser --auto-connect set viewport 800 600 ``` Choose a viewport width that makes the target content render cleanly — not too cramped, not too stretched. --- ## Complete Example: Screenshot a Reddit Post User: "Screenshot the top post on r/programming" ```bash # 1. List existing tabs agent-browser --auto-connect tab list # 2. Navigate to subreddit agent-browser --auto-connect open https://www.reddit.com/r/programming/ agent-browser --auto-connect wait 2000 # 3. Find the first post container agent-browser --auto-connect eval "document.querySelector('shreddit-post')?.getBoundingClientRect()" # 4. Scroll it into view agent-browser --auto-connect scrollintoview "shreddit-post" agent-browser --auto-connect wait 500 # 5. Get bounding box agent-browser --auto-connect get box "shreddit-post" # → { x: 312, y: 80, width: 656, height: 420 } # 6. Take full-page screenshot agent-browser --auto-connect screenshot --full /tmp/reddit-raw.png # 7. Crop with padding convert /tmp/reddit-raw.png \ -crop 688x452+296+64 +repage \ reddit-post-screenshot.png # 8. Verify by reading the output image ``` --- ## Key Commands Quick Reference | Command | Purpose | |---------|---------| | `tab list` | List open tabs | | `open ` | Navigate to URL | | `wait 2000` | Wait for content to settle | | `snapshot -i` | See interactive elements | | `screenshot --annotate` | Visual overview with labels | | `screenshot --full ` | Full-page screenshot | | `get box ""` | Get element bounding box | | `scrollintoview ""` | Scroll element into view | | `eval ` | Run JavaScript in page | | `set viewport ` | Set viewport dimensions | --- ## Troubleshooting ### `get box` returns null or zero-sized - The selector doesn't match any element. Use `get count ""` to verify. - The element may be hidden or not yet rendered. Try `wait 2000` and retry. ### Cropped image is blank or wrong area - The full-page screenshot coordinates may differ from viewport coordinates. Use `screenshot --full` with `get box` (they use the same coordinate system). - Check if the page has horizontal scroll — `get box` x values may be offset. ### Target element is inside an iframe - `get box` and `snapshot -i` cannot see inside iframes. - Use `eval` to access iframe content: ```bash agent-browser --auto-connect eval "document.querySelector('iframe').contentDocument.querySelector('').getBoundingClientRect()" ``` Note: Only works for same-origin iframes. ### `open` succeeded but page content is wrong - The browser may have switched to a different tab (e.g., a popup or redirect opened a new tab). Always verify after navigation: ```bash agent-browser --auto-connect eval "document.location.href" ``` - If the URL is wrong, use `tab list` to find the correct tab and `tab goto ` to switch. ### Screenshot command times out on fonts - Some pages (e.g., Google developer docs) hang on `document.fonts.ready`. Force-resolve it first: ```bash agent-browser --auto-connect eval "document.fonts.ready.then(() => 'ok')" ``` Then retry the screenshot. ### Page has lazy-loaded content - Scroll down to trigger loading before taking the screenshot: ```bash agent-browser --auto-connect scroll down 1000 agent-browser --auto-connect wait 1500 agent-browser --auto-connect scroll up 1000 ```