---
name: atl-browser
description: Mobile browser and native app automation via ATL (iOS Simulator). Navigate, click, screenshot, and automate web and native app tasks on iPhone/iPad simulators.
metadata:
  openclaw:
    emoji: "📱"
    requires:
      bins: ["xcrun", "xcodebuild", "curl"]
    install:
      - id: "atl-clone"
        kind: "shell"
        command: "git clone https://github.com/JordanCoin/Atl ~/Atl"
        label: "Clone ATL repository"
      - id: "atl-setup"
        kind: "shell"
        command: "~/.openclaw/skills/atl-browser/scripts/setup.sh"
        label: "Build and install ATL to simulator"
---

# ATL: Agent Touch Layer

> The automation layer between AI agents and iOS

ATL provides HTTP-based automation for the iOS Simulator, covering both **browser** (mobile Safari) and **native apps**. Think Playwright, but for mobile.

## 🔀 Two Servers: Browser & Native

ATL uses **two separate servers** for browser and native app automation:

| Server | Port | Use Case | Key Commands |
|--------|------|----------|--------------|
| **Browser** | `9222` | Web automation in mobile Safari | `goto`, `markElements`, `clickMark`, `evaluate` |
| **Native** | `9223` | iOS app automation (Settings, Contacts, any app) | `openApp`, `snapshot`, `tapRef`, `find` |

```
┌─────────────────────────────────────────────────────────────┐
│  BROWSER SERVER (9222)       │  NATIVE SERVER (9223)        │
│  (mobile Safari/WebView)     │  (iOS apps via XCTest)       │
│                              │                              │
│  markElements + clickMark    │  snapshot + tapRef           │
│  CSS selectors               │  accessibility tree          │
│  DOM evaluation              │  element references          │
│  tap, swipe, screenshot      │  tap, swipe, screenshot      │
└─────────────────────────────────────────────────────────────┘
```

**Why two ports?** Native app automation requires XCTest APIs (XCUIApplication,
XCUIElement) which are only available in UI Test bundles. The native server runs as a UI Test that exposes an HTTP API.

### Starting the Servers

```bash
# Browser server (starts automatically with AtlBrowser app)
xcrun simctl launch booted com.atl.browser
curl http://localhost:9222/ping
# → {"status":"ok"}

# Native server (run as UI Test)
# Put the simulator UDID after id= (list candidates with: xcrun simctl list devices available)
cd ~/Atl/core/AtlBrowser
xcodebuild test -workspace AtlBrowser.xcworkspace \
  -scheme AtlBrowser \
  -destination 'id=' \
  -only-testing:AtlBrowserUITests/NativeServer/testNativeServer &

# Wait for it to start, then:
curl http://localhost:9223/ping
# → {"status":"ok","mode":"native"}
```

### Quick Port Reference

| Task | Port | Example |
|------|------|---------|
| Browse websites | 9222 | `curl localhost:9222/command -d '{"method":"goto",...}'` |
| Open native app | 9223 | `curl localhost:9223/command -d '{"method":"openApp",...}'` |
| Screenshot (browser) | 9222 | `curl localhost:9222/command -d '{"method":"screenshot"}'` |
| Screenshot (native) | 9223 | `curl localhost:9223/command -d '{"method":"screenshot"}'` |

---

## 📱 Native App Automation (Port 9223)

Native automation uses **port 9223** and automates **any iOS app** using the accessibility tree: no DOM, no JavaScript, just direct element interaction.
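
Because the native server is started in the background as a UI Test, it may not be listening yet when the first command is sent. A minimal sketch of a polling helper, assuming only the `/ping` endpoint shown above; the function name `atl_wait_for_server` and its timeout argument are hypothetical and not part of ATL or its helper script:

```bash
# Hypothetical helper (not shipped with ATL): poll /ping until a server answers.
# Usage: atl_wait_for_server PORT [TIMEOUT_SECONDS]   (timeout defaults to 60)
atl_wait_for_server() {
  local port="$1"
  local timeout="${2:-60}"
  local elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # -f: treat HTTP errors as failure; --max-time: bound each individual probe
    if curl -sf --max-time 2 "http://localhost:${port}/ping" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "ATL server on port ${port} did not respond within ${timeout}s" >&2
  return 1
}
```

For example, `atl_wait_for_server 9223 120 && echo "native server ready"` would block until the native server answers or two minutes pass. A polling loop is used here instead of a fixed `sleep` because build and test startup times vary between machines.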

### Opening & Closing Apps

```bash
# Open an app by bundle ID
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"openApp","params":{"bundleId":"com.apple.Preferences"}}'
# → {"success":true,"result":{"bundleId":"com.apple.Preferences","mode":"native","state":"running"}}

# Check current app state
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"appState"}'
# → {"success":true,"result":{"mode":"native","bundleId":"com.apple.Preferences","state":"running"}}

# Close current app
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"closeApp"}'
# → {"success":true,"result":{"closed":true}}
```

### Common Bundle IDs

| App | Bundle ID |
|-----|-----------|
| Settings | `com.apple.Preferences` |
| Contacts | `com.apple.MobileAddressBook` |
| Calculator | `com.apple.calculator` |
| Calendar | `com.apple.mobilecal` |
| Photos | `com.apple.mobileslideshow` |
| Notes | `com.apple.mobilenotes` |
| Reminders | `com.apple.reminders` |
| Clock | `com.apple.mobiletimer` |
| Maps | `com.apple.Maps` |
| Safari | `com.apple.mobilesafari` |

### The `snapshot` Command

`snapshot` returns the accessibility tree: all visible elements with their properties and tappable references.

```bash
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"snapshot","params":{"interactiveOnly":true}}' | jq '.result'
```

**Example output:**

```json
{
  "count": 12,
  "elements": [
    {
      "ref": "e0", "type": "cell", "label": "Wi-Fi", "value": "MyNetwork", "identifier": "",
      "x": 0, "y": 142, "width": 393, "height": 44,
      "isHittable": true, "isEnabled": true
    },
    {
      "ref": "e1", "type": "cell", "label": "Bluetooth", "value": "On", "identifier": "",
      "x": 0, "y": 186, "width": 393, "height": 44,
      "isHittable": true, "isEnabled": true
    },
    {
      "ref": "e2", "type": "button", "label": "Back", "value": null, "identifier": "Back",
      "x": 0, "y": 44, "width": 80, "height": 44,
      "isHittable": true, "isEnabled": true
    }
  ]
}
```

**Parameters:**
- `interactiveOnly` (bool, default: `false`): only return hittable elements
- `maxDepth` (int, optional): limit tree traversal depth

### The `tapRef` Command

Tap an element by its reference from the last `snapshot`:

```bash
# Take snapshot first
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"snapshot","params":{"interactiveOnly":true}}'

# Tap element e0 (Wi-Fi cell from example above)
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"tapRef","params":{"ref":"e0"}}'
# → {"success":true}
```

### The `find` Command

Find and interact with elements by text, with no need to parse the snapshot manually:

```bash
# Find and tap "Wi-Fi"
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Wi-Fi","action":"tap"}}'
# → {"success":true,"result":{"found":true,"ref":"e0"}}

# Check if an element exists
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Bluetooth","action":"exists"}}'
# → {"success":true,"result":{"found":true,"ref":"e1"}}

# Find and fill a text field
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"First name","action":"fill","value":"John"}}'

# Get element info without interacting
curl -s -X POST \
  http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Cancel","action":"get"}}'
# → {"success":true,"result":{"found":true,"ref":"e5","element":{...}}}
```

**Parameters:**
- `text` (string): text to search for (matches label, value, or identifier)
- `action` (string): one of `tap`, `fill`, `exists`, `get`
- `value` (string, optional): text to fill (required for `action:"fill"`)
- `by` (string, optional): narrow the search to `label`, `value`, `identifier`, `type`, or `any` (default)

---

## 🔄 Native App Workflow Example

Here's a complete flow: open Settings, navigate to Wi-Fi, take a screenshot:

```bash
# 1. Open Settings app
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"openApp","params":{"bundleId":"com.apple.Preferences"}}'

# 2. Wait for app to launch
sleep 1

# 3. Take snapshot to see available elements
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"snapshot","params":{"interactiveOnly":true}}' | jq '.result.elements[:5]'

# 4. Find and tap Wi-Fi
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Wi-Fi","action":"tap"}}'

# 5. Wait for navigation
sleep 0.5

# 6. Take screenshot of Wi-Fi settings
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"screenshot"}' | jq -r '.result.data' | base64 -d > /tmp/wifi-settings.png

# 7. Navigate back (swipe right from left edge)
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"swipe","params":{"direction":"right"}}'

# 8.
# Close the app
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"closeApp"}'
```

### Helper Script Version

```bash
source ~/.openclaw/skills/atl-browser/scripts/atl-helper.sh

atl_openapp "com.apple.Preferences"
sleep 1
atl_find "Wi-Fi" tap
sleep 0.5
atl_screenshot /tmp/wifi-settings.png
atl_swipe right
atl_closeapp
```

---

## 💡 Core Insight: Vision-Free Automation

ATL's killer feature is **spatial understanding without vision models**:

```
┌─────────────────────────────────────────────────────────────┐
│  markElements + captureForVision = COMPLETE PAGE KNOWLEDGE  │
└─────────────────────────────────────────────────────────────┘

1. markElements      → Numbers every interactive element [1] [2] [3]
2. captureForVision  → PDF with text layer + element coordinates
3. tap x=234 y=567   → Pixel-perfect touch at exact position
```

**Why this matters:**
- **No vision API calls**: zero token cost for "seeing" the page
- **Faster**: no round-trip to GPT-4V/Claude Vision
- **Deterministic**: same page = same coordinates, every time
- **Reliable**: pixel-perfect coordinates vs. vision interpretation

### The Vision-Free Workflow

```bash
# 1. Mark elements (adds numbered labels + stores coordinates)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"1","method":"markElements","params":{}}'

# 2. Capture PDF with text layer (machine-readable, has coordinates)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"2","method":"captureForVision","params":{"savePath":"/tmp","name":"page"}}' \
  | jq -r '.result.path'
# → /tmp/page.pdf (text-selectable, contains element positions)

# 3.
# Get specific element's position by mark label
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"3","method":"getMarkInfo","params":{"label":5}}' | jq '.result'
# → {"label":5, "tag":"button", "text":"Add to Cart", "x":187, "y":432, "width":120, "height":44}

# 4. Tap at exact coordinates
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"4","method":"tap","params":{"x":187,"y":432}}'
```

**The marks tell you WHERE everything is. The PDF tells you WHAT everything says. Together = full page understanding.**

## 🎯 The Escalation Ladder

When automation gets stuck, escalate through these levels:

```
┌─────────────────────────────────────────────────────────────┐
│  Level 1: COORDINATES (fast, cheap, no API calls)           │
│  markElements → getMarkInfo → tap x,y                       │
│                                                             │
│         ↓ If stuck after 2-3 tries...                       │
│                                                             │
│  Level 2: VISION FALLBACK (screenshot to understand state)  │
│  screenshot → analyze UI → identify blockers (modals, etc)  │
│                                                             │
│         ↓ If still stuck...                                 │
│                                                             │
│  Level 3: JS INJECTION (direct DOM manipulation)            │
│  evaluate → dispatchEvent → force interactions              │
└─────────────────────────────────────────────────────────────┘
```

### When to Escalate

| Symptom | Likely Cause | Action |
|---------|--------------|--------|
| Tap succeeds but nothing changes | Modal/overlay opened | Screenshot → find new button |
| Cart count doesn't update | Site needs login or has bot detection | Try JS click with events |
| Element not found after scroll | Marks are page-relative, not viewport | Use `getBoundingClientRect` via evaluate |
| Same error 3+ times | UI state changed unexpectedly | Screenshot to see actual state |

### Real-World Pattern: E-commerce Checkout

```bash
# 1. Search and find product
atl_goto "https://store.com/search?q=headphones"
atl_mark

# 2. First, dismiss any modals/banners (ALWAYS DO THIS)
# Look for: close, dismiss, continue, accept, no thanks, got it
CLOSE=$(atl_find "close")
[ -n "$CLOSE" ] && atl_click "$CLOSE"

# 3. Find and click Add to Cart
ATC=$(atl_find "Add to cart")
atl_click "$ATC"

# 4. Wait, then CHECK if it worked
sleep 2
atl_screenshot /tmp/after-click.png

# 5. If cart didn't update, LOOK at the screenshot
# Maybe a "Choose options" modal opened - find the NEW Add to Cart button
# This is the vision fallback - you need to SEE what happened
```

### Key Insight: Modals Change Everything

When you click "Add to cart" on sites like Target, Amazon, etc., they often:
1. Open a "Choose options" modal (size, color, quantity)
2. Show an upsell (protection plans, accessories)
3. Display a confirmation with "View cart" or "Continue shopping"

**Your original tap WORKED**; you just can't see the result without a screenshot.

## 🚀 Quick Start (30 seconds)

```bash
# 1. Setup (boots sim, installs ATL)
~/.openclaw/skills/atl-browser/scripts/setup.sh

# 2.
# Navigate somewhere
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"1","method":"goto","params":{"url":"https://example.com"}}'

# 3. Mark elements (shows [1], [2], [3] labels)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"2","method":"markElements","params":{}}'

# 4. Take screenshot
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"3","method":"screenshot","params":{}}' | jq -r '.result.data' | base64 -d > /tmp/page.png

# 5. Click element [1]
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"4","method":"clickMark","params":{"label":1}}'
```

**Or use the helper functions:**

```bash
source ~/.openclaw/skills/atl-browser/scripts/atl-helper.sh

atl_goto "https://example.com"
atl_mark
atl_screenshot /tmp/page.png
atl_click 1
```

## Quick Reference

**Base URL:** `http://localhost:9222`

### Common Commands

```bash
# Check if ATL is running
curl -s http://localhost:9222/ping

# Navigate to URL
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"1","method":"goto","params":{"url":"https://example.com"}}'

# Wait for page ready
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"2","method":"waitForReady","params":{"timeout":10}}'

# Take screenshot (returns base64 PNG)
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"3","method":"screenshot","params":{}}' | jq -r '.result.data' | base64 -d > screenshot.png

# Mark interactive elements (shows numbered labels)
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"4","method":"markElements","params":{}}'

# Click by mark label
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"5","method":"clickMark","params":{"label":3}}'

# Scroll page
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d \
  '{"id":"6","method":"evaluate","params":{"script":"window.scrollBy(0, 500)"}}'

# Type text
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"7","method":"type","params":{"text":"Hello world"}}'

# Click by CSS selector
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"8","method":"click","params":{"selector":"button.submit"}}'
```

## Setup (First Time)

### 1. Start Simulator

```bash
# Boot iPhone 17 simulator (or another device)
xcrun simctl boot "iPhone 17"

# Open Simulator app
open -a Simulator
```

### 2. Build & Install AtlBrowser

```bash
cd ~/Atl/core/AtlBrowser

# Build for simulator (RECOMMENDED: target by UDID)
# Why: name-based destinations can cause Xcode to pick an older iOS runtime (15/16)
# and fail if AtlBrowser has an iOS 17+ deployment target.
#
# 1) Find a suitable simulator UDID (iOS 17+):
#    xcrun simctl list devices available
#
# 2) Build targeting that UDID (append it after id=):
xcodebuild -workspace AtlBrowser.xcworkspace \
  -scheme AtlBrowser \
  -destination 'id=' \
  -derivedDataPath /tmp/atl-dd \
  build

# NOTE: simctl install and launch take a device argument before the app path or
# bundle ID. It is missing from the two commands below; supply the UDID (or `booted`).

# Install to a specific simulator (preferred)
xcrun simctl install \
  /tmp/atl-dd/Build/Products/Debug-iphonesimulator/AtlBrowser.app

# Launch the app
xcrun simctl launch com.atl.browser
```

### 3. Verify Server

```bash
curl -s http://localhost:9222/ping
# Should return: {"status":"ok"}
```

## All Available Methods

### App Control (Native Mode)

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| `openApp` | `{bundleId}` | Any→Native | Open app, switch to native mode |
| `closeApp` | - | Native | Close current app, return to browser mode |
| `appState` | - | Any | Get current mode and bundleId |
| `openBrowser` | - | Native→Browser | Switch back to browser mode |

### Native Accessibility

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| `snapshot` | `{interactiveOnly?, maxDepth?}` | Native | Get accessibility tree |
| `tapRef` | `{ref}` | Native | Tap element by ref (e.g., "e0") |
| `find` | `{text, action, value?, by?}` | Native | Find element and interact |
| `fillRef` | `{ref, text}` | Native | Tap element and type text |
| `focusRef` | `{ref}` | Native | Focus element without typing |

### Navigation (Browser)

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| `goto` | `{url}` | Browser | Navigate to URL |
| `reload` | - | Browser | Reload page |
| `goBack` | - | Browser | Go back |
| `goForward` | - | Browser | Go forward |
| `getURL` | - | Browser | Get current URL |
| `getTitle` | - | Browser | Get page title |

### Interactions (Browser)

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| `click` | `{selector}` | Browser | Click element |
| `doubleClick` | `{selector}` | Browser | Double-click |
| `type` | `{text}` | Both | Type text |
| `fill` | `{selector, value}` | Browser | Fill input field |
| `press` | `{key}` | Both | Press key |
| `hover` | `{selector}` | Browser | Hover over element |
| `scrollIntoView` | `{selector}` | Browser | Scroll to element |

### Mark System (Browser)

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| `markElements` | - | Browser | Mark visible interactive elements |
| `markAll` | - | Browser | Mark ALL interactive elements |
| `unmarkElements` | - | Browser | Remove marks |
| `clickMark` | `{label}` | Browser | Click by label number |
| `getMarkInfo` | `{label}` | Browser | Get element info by label |

### Screenshots & Capture

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| `screenshot` | `{fullPage?, selector?}` | Both | Take screenshot |
| `captureForVision` | `{savePath?, name?}` | Browser | Full page PDF |
| `captureJPEG` | `{quality?, fullPage?}` | Both | JPEG capture |
| `captureLight` | - | Browser | Text + interactives only |

### Waiting (Browser)

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| `waitForSelector` | `{selector, timeout?}` | Browser | Wait for element |
| `waitForNavigation` | - | Browser | Wait for navigation |
| `waitForReady` | `{timeout?, stabilityMs?}` | Browser | Wait for page ready |
| `waitForAny` | `{selectors, timeout?}` | Browser | Wait for any selector |

### JavaScript (Browser)

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| `evaluate` | `{script}` | Browser | Run JavaScript |
| `querySelector` | `{selector}` | Browser | Find element |
| `querySelectorAll` | `{selector}` | Browser | Find all elements |
| `getDOMSnapshot` | - | Browser | Get page HTML |

### Cookies (Browser)

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| `getCookies` | - | Browser | Get all cookies |
| `setCookies` | `{cookies}` | Browser | Set cookies |
| `deleteCookies` | - | Browser | Delete all cookies |

### Touch Gestures (Both Modes)

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| `tap` | `{x, y}` | Both | Tap at coordinates |
| `longPress` | `{x, y, duration?}` | Both | Long press (default 0.5s) |
| `swipe` | `{direction}` | Both | Swipe up/down/left/right |
| `swipe` | `{fromX, fromY, toX, toY}` | Both | Swipe between points |
| `pinch` | `{scale, duration?}` | Both | Pinch zoom (scale > 1 = zoom in) |

#### Swipe Examples

```bash
# Swipe up (scroll down)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"1","method":"swipe","params":{"direction":"up"}}'

# Swipe left (next page in carousel)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"2","method":"swipe","params":{"direction":"left","distance":400}}'

# Custom swipe path
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"3","method":"swipe","params":{"fromX":200,"fromY":600,"toX":200,"toY":200}}'

# Long press for context menu
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"4","method":"longPress","params":{"x":150,"y":300,"duration":1.0}}'

# Pinch to zoom in
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"5","method":"pinch","params":{"scale":2.0}}'
```

## Typical Workflow

```bash
# 1. Navigate to site
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"1","method":"goto","params":{"url":"https://www.apple.com/shop"}}'

# 2. Wait for page to load
sleep 2
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"2","method":"waitForReady","params":{"timeout":10}}'

# 3. Mark elements to see what's clickable
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"3","method":"markElements","params":{}}'

# 4. Take screenshot to see the marks
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"4","method":"screenshot","params":{}}' | jq -r '.result.data' | base64 -d > /tmp/page.png

# 5. Click a marked element (e.g., label 14)
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"5","method":"clickMark","params":{"label":14}}'

# 6.
# Repeat as needed
```

## Troubleshooting

### Navigation not working (goto returns success but page doesn't change)

Known issue: the `goto` command may return success without navigating. Use this JS workaround:

```bash
# Instead of goto, use evaluate to navigate
curl -s -X POST http://localhost:9222/command -H "Content-Type: application/json" \
  -d '{"id":"1","method":"evaluate","params":{"script":"location.href = \"https://example.com\"; true"}}'

# Wait for page load
sleep 3
curl -s -X POST http://localhost:9222/command -H "Content-Type: application/json" \
  -d '{"id":"2","method":"waitForReady","params":{"timeout":10}}'
```

### Server not responding

```bash
# Check that the app is installed (listapps shows installed apps, not running ones)
xcrun simctl listapps booted | grep atl

# Restart the app
xcrun simctl terminate booted com.atl.browser
xcrun simctl launch booted com.atl.browser

# Check logs
xcrun simctl spawn booted log show --predicate 'process == "AtlBrowser"' --last 1m
```

### Need to rebuild (iOS version changes)

```bash
cd ~/Atl/core/AtlBrowser
xcodebuild -workspace AtlBrowser.xcworkspace -scheme AtlBrowser -sdk iphonesimulator build
xcrun simctl install booted ~/Library/Developer/Xcode/DerivedData/AtlBrowser-*/Build/Products/Debug-iphonesimulator/AtlBrowser.app
xcrun simctl launch booted com.atl.browser
```

### Port 9222 in use

The ATL server runs inside the simulator app. If port 9222 is blocked, check for other processes:

```bash
lsof -i :9222
```

## Best Practices

### 1. Clean UI Before Acting

Real users dismiss popups. You should too.

```bash
# Before any workflow, check for and dismiss:
# - Cookie consent banners
# - Newsletter popups
# - Health/privacy consent modals
# - "Download our app" prompts
atl_mark
for KEYWORD in "close" "dismiss" "no thanks" "accept" "got it" "continue"; do
  LABEL=$(atl_find "$KEYWORD")
  [ -n "$LABEL" ] && atl_click "$LABEL" && sleep 1
done
```

### 2. Verify State After Actions

Don't assume. Confirm.

```bash
atl_click $ADD_TO_CART
sleep 2

# Check if cart updated
CART=$(atl_find "cart [1-9]")
if [ -z "$CART" ]; then
  # Didn't work - take screenshot to see why
  atl_screenshot /tmp/debug.png
  echo "Action may have opened a modal - check screenshot"
fi
```

### 3. Use Viewport Coordinates for Taps

Marks give page-relative coordinates. For tap to work, the element must be visible.

```bash
# Option A: Scroll element into view first
curl -s -X POST http://localhost:9222/command -H "Content-Type: application/json" \
  -d '{"id":"1","method":"evaluate","params":{"script":"document.querySelector(\"#my-button\").scrollIntoView()"}}'

# Option B: Get viewport-relative coords via JS
curl -s -X POST http://localhost:9222/command -H "Content-Type: application/json" \
  -d '{"id":"2","method":"evaluate","params":{"script":"var r = document.querySelector(\"#my-button\").getBoundingClientRect(); JSON.stringify({x: r.x + r.width/2, y: r.y + r.height/2})"}}'
```

### 4. Screenshot is Your Debugging Superpower

When in doubt, look.

```bash
atl_screenshot /tmp/current-state.png
# Then analyze with vision or just open the file
```

## Notes

- ATL runs inside the iOS Simulator, sharing the host's network
- Port 9222 is the default (matches Chrome DevTools Protocol convention)
- The mark system shows red numbered labels on interactive elements
- Screenshots are PNG base64-encoded; use `base64 -d` to decode
- iOS 26+ compatible (fixed NWListener binding issue)

## Requirements

- **macOS** with Xcode installed
- **iOS Simulator** (comes with Xcode)
- That's it!

## Examples

See `examples/` folder:
- `test-browse.sh` - Quick bash test workflow

## API Reference

For a machine-readable API spec, see [openapi.yaml](../api/openapi.yaml), which includes all commands, parameters, and response schemas.

## Source

- GitHub: https://github.com/JordanCoin/Atl
- Author: [@JordanCoin](https://github.com/JordanCoin)