--- description: Computer use and GUI automation patterns — when to use GUI automation vs shell/MCP/browser tools, visual validation techniques, native app testing, and guardrails for visual regression workflows model: claude-opus-4-6 allowed-tools: - Bash - Read --- # Computer Use & GUI Automation Computer use lets Claude interact with GUIs: click buttons, fill forms, take screenshots, and navigate native apps. This is powerful but expensive and slow — use it only when a more precise tool doesn't exist. ## Tool Selection Priority Before reaching for computer use, exhaust these options first: | Task | Prefer This | Over Computer Use | |------|------------|-------------------| | API endpoint testing | Bash + curl | Clicking through UI | | Database inspection | MCP postgres/sqlite | Navigating admin UI | | File operations | Read/Write/Edit | Drag-and-drop UI | | Web scraping | Firecrawl MCP | Screenshot + parse | | Browser automation | Playwright MCP | Computer use click | | CI status | GitHub API / gh CLI | Browser navigation | | Log inspection | Bash + grep | Terminal screenshot | **Rule:** If you can express the task as a shell command or API call, do that. Computer use is the fallback for GUI-only workflows. --- ## When Computer Use Is the Right Choice ### 1. Native App Validation Testing a desktop app that has no API or CLI interface. ``` # Example: Validate Electron app UI after a build Take a screenshot of the app after launch. Click the "New Project" button. Verify the dialog opens with the correct fields. Fill in project name: "Test Project 2026" Click Create and verify the project appears in the list. ``` ### 2. Visual Regression Checks Detecting layout regressions that unit tests can't catch. ``` # Workflow: 1. Take baseline screenshot of the current UI state 2. Apply the change 3. Take comparison screenshot 4. Highlight pixel differences > 1% 5. Human reviews diff ``` ### 3. GUI-Only Admin Tools Admin panels, legacy enterprise software, and embedded UIs with no API. ``` # Example: Generate a report from a legacy admin panel Navigate to: http://admin.internal/reports Click: "Export" → "CSV" → "Last 30 days" Wait for download Move file to: /tmp/report-{date}.csv ``` ### 4. Local Simulator Flows Mobile simulator or desktop app testing that requires visual interaction. ``` # Example: iOS simulator validation Launch: xcrun simctl launch booted com.example.MyApp Take screenshot Verify: "Welcome" text is visible in the header Tap: "Get Started" button (coordinates or element description) Verify: onboarding screen loads ``` --- ## Result Verification Computer use output is inherently visual and unstructured. Always verify results with a structured check after GUI actions: ### Verification Pattern ``` After each GUI action: 1. Take a screenshot 2. Verify the expected visual state (specific text, element position, color) 3. If verification fails: log "FAIL: {what was expected vs. what was seen}" 4. If unsure: take another screenshot from a wider viewport At the end: - List each action and its verification result - Count: {N} actions taken, {M} verified OK, {K} failed ``` ### Confidence Levels | Confidence | Verification | Action | |------------|-------------|--------| | HIGH | Text matches exactly / element found by ID | Proceed | | MEDIUM | Visual match but element found by position | Log and proceed | | LOW | Can't find element / ambiguous screenshot | Stop, report to human | --- ## Safety Guardrails Computer use can cause irreversible actions (delete files, send emails, submit forms). Apply these guardrails: ### Never Without Confirmation - Form submissions in production environments - Delete or "Archive" actions - Payment or billing interactions - Sending emails or messages - Anything involving real user data ### Screenshot Audit Trail Keep screenshots of: - State before any action - State after each major action - Final state ### Dry-Run First For complex GUI flows, describe the steps and ask for confirmation before executing: ``` Before I click "Submit", here's what will happen: - Form data: {summary} - This action cannot be undone - Proceeding? (yes/no) ``` --- ## Computer Use vs. Playwright MCP For web UIs, Playwright MCP is almost always better than computer use: | | Playwright MCP | Computer Use | |--|---------------|-------------| | Reliability | High (DOM-based) | Medium (pixel-based) | | Speed | Fast | Slow (screenshot per action) | | Testability | Scriptable, repeatable | Hard to reproduce exactly | | Cost | Low | High (vision model per screenshot) | | Works on | Web browsers | Any visual surface | **Use Playwright MCP for:** Web app testing, scraping, form automation on websites. **Use Computer Use for:** Native desktop apps, embedded UIs, legacy apps with no API. --- ## Cost Awareness Computer use is expensive: - Each screenshot = vision model inference (high token cost) - A 10-step GUI flow = 10+ vision inferences - Compare: a 10-step shell script = near-zero cost **Estimate before using:** If a GUI flow has N steps, expect N × (screenshot tokens + generation tokens). For flows > 20 steps, consider whether a shell/API approach exists. --- ## Claude Desktop Requirement Computer use requires the Claude Desktop app (not CLI or Web). The Desktop app has the screen capture and input simulation capabilities that CLI lacks. ``` CLI: ❌ Computer use not available Web: ❌ Computer use not available Desktop: ✅ Computer use available ```