--- name: agent-tui description: > Drive terminal UI (TUI) applications programmatically for testing, automation, and inspection. Use when: automating CLI/TUI interactions, regression testing terminal apps, verifying interactive behavior, extracting structured data from terminal UIs. Also use when: user asks "what is agent-tui", "what does agent-tui do", "demo agent-tui", "show me agent-tui", "how does agent-tui work", or wants to see it in action. Do NOT use for: web browsers, GUI apps, or non-terminal interfaces—those need different tools. --- # Terminal Automation Mastery ## Prerequisites - **Supported OS**: macOS or Linux (Windows not supported yet). - **Verify install**: ```bash agent-tui --version ``` If not installed, use one of: ```bash # Recommended: one-line install (macOS/Linux) curl -fsSL https://raw.githubusercontent.com/pproenca/agent-tui/master/install.sh | sh ``` ```bash # Package manager npm i -g agent-tui pnpm add -g agent-tui bun add -g agent-tui ``` ```bash # Build from source cargo install --git https://github.com/pproenca/agent-tui.git --path cli/crates/agent-tui ``` If you used the install script, ensure `~/.local/bin` is on your PATH. ## Philosophy: Why Terminal Automation Is Different Terminal UIs are **stateless from the observer's perspective**. Unlike web browsers with a persistent DOM, terminal automation works with a constantly-refreshed character grid. This fundamental difference shapes everything: | Web Automation | Terminal Automation | |----------------|---------------------| | DOM persists across interactions | Screen buffer redraws constantly | | Stable UI IDs | No stable IDs; re-snapshot frequently | | Query once, act many times | Re-snapshot before each decision | | Network events signal completion | Detect visual stability or text | **The Core Insight**: agent-tui gives you vision without memory. Each screenshot is a fresh observation. Previous screenshots can become stale after any UI change. ## Mental Model: The Feedback Loop Think of terminal automation as a **closed-loop control system**: ``` ┌──────────────────────────────────────────────┐ │ │ ▼ │ OBSERVE ──► DECIDE ──► ACT ──► WAIT ──► VERIFY ───┘ │ │ │ │ └─────── NEVER skip ◄────────────────────┘ ``` **Each phase is mandatory.** Skipping verification is the #1 cause of flaky automation. ### The "Fresh Eyes" Principle Every time you need to interact with the UI: 1. **Take a fresh screenshot** — your previous one can be stale 2. **Re-read your target** — the screen may have shifted 3. **Verify the state** — the UI may have changed unexpectedly 4. **Act only when stable** — animations and loading states cause failures This feels slower, but it's the only reliable approach. Optimistic reuse of stale state causes intermittent failures that are painful to debug. ## Critical Rules (Non-Negotiable) > **RULE 1: Re-snapshot after EVERY action** > Screens can change after any interaction. Always take a fresh screenshot before deciding again. > **RULE 2: Never act on unstable UI** > If the UI is animating, loading, or transitioning, `wait --stable` first. Acting during transitions causes race conditions. > **RULE 3: Verify before claiming success** > Use `wait "expected text" --assert` to confirm outcomes. Don't assume an action worked—prove it. > **RULE 4: Clean up sessions** > Always end with `agent-tui kill`. Orphaned sessions consume resources and can interfere with future runs. ## Decision Framework ### Which Screenshot Mode? ``` Need only raw text? ├─► YES: Use `screenshot` (plain text, faster) │ └─► NO: Need machine-readable output? └─► Use `screenshot --json` ``` ### How to Wait? ``` What are you waiting for? │ ├─► Specific text to appear │ └─► `wait "text" --assert` │ ├─► Specific text to disappear │ └─► `wait "text" --gone` │ └─► UI to stop changing (animations, loading) └─► `wait --stable` ``` ### How to Act? ``` What do you need to do? │ ├─► Type text into focused input │ └─► `input "text"` (or `type "text"`) │ ├─► Send keyboard shortcuts/navigation │ └─► `press Ctrl+C` or `press ArrowDown Enter` │ └─► Scroll the viewport └─► `scroll down 5` ``` ## Core Workflow The canonical automation loop: ```bash # 1. START: Launch the TUI app agent-tui run [-- args...] # 2. OBSERVE: Get current UI state agent-tui screenshot --format json # 3. DECIDE: Based on text/tree, determine next action # (This happens in your head/code) # 4. ACT: Execute the action agent-tui press Enter # or input/scroll # 5. WAIT: Synchronize with UI changes agent-tui wait "Expected" --assert # or wait --stable # 6. VERIFY: Confirm the outcome (often combined with step 5) # 7. REPEAT: Go back to step 2 until done # 8. CLEANUP: Always clean up agent-tui kill ``` ## Anti-Patterns (What NOT to Do) ### ❌ Acting on Stale Screens ```bash # WRONG: Acting based on old information agent-tui screenshot --json # ...some time passes and UI updates... agent-tui press Enter # ❌ Might be the wrong action now # RIGHT: Re-snapshot before acting agent-tui screenshot --json agent-tui press Enter # Now based on fresh state ``` ### ❌ Acting During Animation/Loading ```bash # WRONG: Acting immediately on dynamic UI agent-tui run my-app agent-tui press Enter # ❌ Might miss or hit wrong state # RIGHT: Wait for stability first agent-tui run my-app agent-tui wait --stable # Let UI settle agent-tui press Enter ``` ### ❌ Assuming Success Without Verification ```bash # WRONG: Assuming the action worked agent-tui press Enter # ...proceed as if success... # ❌ What if it failed silently? # RIGHT: Verify the outcome agent-tui press Enter agent-tui wait "Success" --assert # ✓ Proves the action worked ```