# Changelog

All notable changes to Clawd Cursor will be documented in this file.

## [0.8.3] - 2026-04-19 — Hotfix: "Outlook keeps opening" + runaway guard

User reported Outlook launching repeatedly during a test. Root-cause diagnosis traced to three compounding failures: (1) `PlatformAdapter.openApp` spawned a new instance even when the app was already running, (2) the escalation ladder (router → blind → hybrid → vision) re-ran `open_app` at each rung because earlier rungs couldn't verify success through New Outlook's sparse WebView2 accessibility tree, (3) `clawdcursor stop` only killed the `start` process on port 3847, missing `serve` (different port / same port different process) and `mcp` (stdio, no port) entirely. A stale `serve` kept receiving MCP traffic after the user thought they'd stopped everything.

### Fixed

- **`openApp` / `launchApp` idempotency** (Windows + macOS + Linux). When the target app already has a visible window AND the caller didn't set `alwaysNewInstance: true` AND no `url` is passed, the adapter now focuses the existing window and returns its pid instead of spawning another instance. Match policy: case-insensitive exact processName → processName substring → title substring → UWP AppId tail. Closes the "N windows of Outlook stacking up" class of bug under any retry loop. `src/v2/platform/{windows,macos,linux}.ts`.
- **Agent runaway guard** — if the agent calls the same tool + identical args ≥ 3 times within the last 6 turns, the loop exits with `give_up` and a targeted message suggesting `detect_webview_apps` when the target is likely Electron/WebView2. Prevents the generalized "retry-loop-because-a11y-is-opaque" anti-pattern. `src/pipeline/agent/agent.ts`.
- **`clawdcursor stop` now sweeps all modes.** After the graceful `/stop` on port 3847, iterates every pidfile in `~/.clawdcursor/*.pid`, SIGTERMs any live pid, SIGKILLs after 500ms if still running, and unlinks the pidfile. Catches `mcp` (stdio-only), zombie `serve`, and any start/serve on a non-default port. `src/index.ts`.

### Notes

- Stale-pidfile cleanup at startup was already correct via `claimPidFile` (checks `isProcessAlive(existingPid)` and overwrites when dead) — no code change needed there; the issue was exclusively `stop`.
- Tests: 429 / 430 pass (1 skipped, same as 0.8.2). No schema snapshot change — these are behavioral fixes, not catalog changes.

---

## [0.8.2] - 2026-04-19 — Session reliability, force-focus, Electron bridge

First-time-user review surfaced six concrete pain points. This release fixes every one.

### Fixed

- **Silent 401 mid-session** (the session-killer). Previous versions compared the incoming Bearer token against an in-memory `SERVER_TOKEN` only. A second clawdcursor process (stale pidfile takeover, or a concurrent mode) rewrote the token FILE without updating the first server's in-memory copy — clients reading the file silently lost auth. `/health` kept returning 200 so the failure was invisible. Fix: `requireAuth` now accepts EITHER the in-memory token OR the current on-disk token (mtime-cached, ~free). Drift is logged once with a recovery hint. `src/server.ts`.
- **`focus_window` force-to-front on Windows.** Previous implementation called `SetForegroundWindow` which the OS blocks when the caller isn't the current foreground process. New implementation uses the full sequence: `ShowWindow(SW_RESTORE)` → topmost-toggle → `AttachThreadInput` with the current foreground thread → `AllowSetForegroundWindow(ASFW_ANY)` → `BringWindowToTop` → `SetForegroundWindow`, with an Alt-key synthetic fallback. Raises any window through Windows' foreground lock. `scripts/ps-bridge.ps1`.
- **Richer validation errors.** REST `/execute` rejections now carry the full expected tool signature. A missing param returns `Missing required parameter "target". Expected smart_click(target: string, processId?: number).` — agents no longer have to roundtrip to `/docs`. `src/tool-server.ts`.

### Added

- **Electron / WebView2 detection.** New MCP tools `detect_webview_apps` and `relaunch_with_cdp` (also exposed via compact `system({"action":"detect_webview"})` / `system({"action":"relaunch_with_cdp"})`). Recognises olk (New Outlook), Teams, Discord, Slack, VS Code, GitHub Desktop, Notion, Obsidian, Spotify. When detected, probes ports 9222/9223/9229/8315 for a live CDP endpoint; if found, tells the agent to attach via `browser({"action":"connect"})`. If not, shows the exact relaunch command (e.g. `discord --remote-debugging-port=9222`) so CDP can be enabled and the sparse UIA tree bypassed entirely. `src/tools/electron_bridge.ts`.
- **`drag_path` documentation clarity.** Existing `mouse_drag_stepped` / compact `computer({"action":"drag_path","path":"[...]"})` now explicitly documented for freehand curve drawing (Paint, Figma, canvas apps). SKILL.md "Quick reference" covers when to use `drag_path` vs `drag`.

### Changed

- **SKILL.md pushes compact mode harder.** Top of doc now carries a directive callout: *"If you are an LLM reading this: YOU SHOULD BE USING COMPACT MODE."* with MCP config + REST URL. Granular stays available but is explicitly labeled the power-user / larger-prompt option.
- **SKILL.md web-app keyboard warning.** Web-wrapped apps (Outlook, Teams, Gmail) treat `Escape` as "close dialog/modal" — sometimes closing the compose window. Documented: do not use Escape to dismiss autocompletes in web apps; use arrow keys + Enter or click-away.
- **Error-recovery table** expanded with Electron-vs-true-canvas split, v0.8.2 auth recovery, v0.8.2 force-focus note, and the `drag_path` vs `drag` distinction.

### Tests

- 429 / 430 passing (one skipped, same as 0.8.0).
- Schema snapshot regenerated → 74 granular tools (72 + 2 Electron bridge).
- Live smoke: token auth survives a second `clawdcursor serve`; `focus_window` raises Paint through a full-screen window; `detect_webview_apps` correctly flags Outlook / Teams / VS Code when any are open.

### Consolidates v0.8.1 (never tagged)

0.8.1-alpha.0 through -alpha.N shipped unified-pipeline + compact-MCP + Linux AT-SPI + Wayland routing on the feature branch. They roll into 0.8.2 as a single stable release. See the v0.8.1-alpha tag range in the git history for per-tranche detail; headline features:

- **Unified blind/hybrid/vision agent** — one loop, three modes. Replaces the v0.8.0 split `text-agent` + `vision-agent` with a single harness using native `tool_use` (Anthropic) / `tool_calls` (OpenAI) / prose-JSON fallback.
- **Compact MCP surface** — 6 compound tools (`computer`, `accessibility`, `window`, `system`, `browser`, `task`) that collapse the full capability into ~1,500 tokens of catalog. Anthropic-Computer-Use shape extended across the whole product. `clawdcursor mcp --compact` or `GET /tools?mode=compact`.
- **PlatformAdapter widened** — `mouseDown/Up`, `keyDown/Up`, `setWindowState`, `setWindowBounds`, `listDisplays`, `waitForElement`, widened `InvokeAction` (`expand`/`collapse`/`toggle`/`select`/`get-value`), richer `UiElement` state flags.
- **Linux AT-SPI bridge** — read-only first pass via `python3-gi` + `gir1.2-atspi-2.0`. Linux a11y methods (`getUiTree`, `findElements`, `getFocusedElement`, `waitForElement`) now return real data on boxes where the bridge dependencies are present. `invokeElement` still stubbed — tracked for a follow-up pass.
- **Linux Wayland input routing** — `ydotool` (mouse + keyboard) or `wtype` (keyboard fallback) detected at init. X11 path unchanged; Wayland no longer silently mis-fires through nut-js.
- **Per-capability palettes + compound vision tools** — text-agent turns now see a 6-10 tool scoped palette based on the subtask's capability (`app_launch` / `text_input` / `navigation` / `form_fill` / `spatial` / `file_ops` / `window_mgmt` / `general`). Vision-agent turns see 3 compound `mouse` / `keyboard` / `window` tools with action enums. ~12× fewer catalog tokens per turn.
- **Pretty TTY logs with HH:MM:SS timestamps** — layer-tagged (`[router]`, `[blind]`, `[vision]`, `[safety]`, etc.), no per-line repetition, `CLAWD_LOG=pretty` default on TTY.
- **SKILL.md rewrite** — reviewed by a Sonnet subagent against legacy v0.6.3/v0.7.14 tone, verified model-agnostic + OS-agnostic, restored "USE AS A FALLBACK" + "IMPORTANT — READ THIS BEFORE ANYTHING ELSE" directive callouts and Sensitive App Policy.

---

## [0.8.0] - 2026-04-16 — V2 Architecture (opt-in)

A ground-up reimagining of the internal pipeline. Opt in with `clawdcursor start --v2`. The legacy pipeline is unchanged and remains the default.

### Added

- **`--v2` flag on `clawdcursor start`** — activates the new 3-layer architecture: Router → VisionAgent → Verifier. No effect on MCP, `serve`, or legacy `start`.
- **`src/v2/platform/`** — platform abstraction. Single `PlatformAdapter` interface with `macos.ts`, `windows.ts`, `linux.ts` implementations. Replaces 142+ scattered `if (process.platform === 'darwin')` branches across 34 files. Business logic no longer sees `process.platform`. Adding a new OS = one file.
- **`src/v2/verifier/`** — `GroundTruthVerifier`. Six independent signals decide whether a task actually completed: pixel diff, window change, focus change, OCR delta, task-specific assertions (`send_email`, `navigate_url`, `open_app`, `type_text`, `search`, `compose_message`, `create_file`), and anti-patterns (error dialogs, "cannot send", "draft saved", invalid recipient, auth failed). Weighted voting with hard-fail rules on anti-patterns. Cannot be fooled by LLM self-reported "done".
- **`src/v2/agent/`** — `VisionAgent`: a single vision-first tool-use loop. 16 tools (`screenshot`, `read_screen`, `list_windows`, `click`, `drag`, `scroll`, `type`, `key`, `invoke_element`, `set_field_value`, `open_app`, `focus_window`, `read_clipboard`, `write_clipboard`, `wait`, `done`). 6-rule system prompt (down from 36). Model-agnostic via existing `callVisionLLM`.
- **`src/v2/orchestrator.ts`** — `PipelineV2` wires Router → VisionAgent → Verifier with before/after state capture.
- **Hardened JSON parser** — tolerates trailing braces, markdown code fences, and other common LLM malformations. Balanced-brace extraction as fallback.

### Fixed

- **False positives** — legacy pipeline reports `UNVERIFIED_SUCCESS` when the agent claims "done" but the screen didn't change. V2 verifier catches this class: in a live email-send test the agent said "Email sent" but a "Cannot send" dialog was on screen. V2 correctly rejected the claim. (Legacy still does what it does; this fix only applies when `--v2` is set.)

### Testing

Smoke-tested on macOS with Anthropic Claude Haiku (text) + Sonnet (vision):

| Task | Time | Verdict |
|------|------|---------|
| Open TextEdit and type | 30s | ✅ (4/6 signals) |
| Calculator: 47+53=100 | 65s | ✅ (5/6 signals, zero parse errors) |
| Safari → github.com | 45s | ✅ (6/6 signals) |
| Notes: create note | 182s | ✅ (6/6 signals) |
| Email send (failing server) | 86s | ❌ **Correctly rejected** — legacy would have reported success |

### Platform Safety

No legacy code modified. Windows, Linux, and MCP paths untouched. v2 code is entirely under `src/v2/`.

## [0.7.14] - 2026-04-13 — Full macOS Keyboard Automation + Platform-Aware Pipeline

### Fixed
- **macOS keystrokes silently dropped** — root cause: `CGEvent.post()` from the Swift helper is blocked by macOS TCC when the helper is spawned as a child of Node.js. `keyPress()` and `typeText()` on macOS now route through `osascript` + System Events (the Apple-sanctioned method). All keyboard shortcuts (Cmd+V, Cmd+N, Shift+Cmd+D, etc.) now work correctly.
- **Single-char keys losing modifiers** — `keycodeForCharacter()` lookup added to `ClawdCursorHelper`; modifiers are no longer discarded for Cmd+letter combos.
- **`asDouble()` coercion** — click/drag coordinates sent as integers (common from some LLMs) no longer fail with a type mismatch in the Swift helper.
- **`keycodeForCharacter` fallback** — now returns an error for unmapped characters instead of silently falling back to the 'v' keycode.
- **Permission check inconsistency** — `doctor`, `status`, and `readiness.ts` all now query the same canonical path: Host `/status` → `permission-check` binary → direct fallback. No more false "granted" reports.
- **Screenshot capture CPU spin** — replaced `CGWindowListCreateImage` (triggers ReplayKit CPU spin bug on macOS 14+) with a delegated `screenshot-helper` subprocess.
- **A11y false positive** — `isShellAvailable()` now tests actual window access (`p.windows.length`) instead of `processes.length`, which worked without Accessibility permission.
- **Node.js v25 crash** — `EINVAL`/`setTypeOfService` socket error from undici's internal QoS call is now caught and suppressed (non-fatal).
- **Dock click zone** — reduced from 60px to 30px on macOS (Dock is thinner than the Windows taskbar).
- **Browser URL bar shortcut** — `Cmd+L` used on macOS (was `Ctrl+L`, which does nothing in macOS browsers).

### Added
- **`macMailEmailFlow`** — deterministic email flow for macOS Mail.app (Cmd+N, Tab to subject/body, Cmd+Shift+D to send).
- **`clawdcursor grant` command** — triggers macOS system permission dialogs directly from the CLI.
- **115 Apple shortcuts** — Mail, Safari, Notes, Messages, Terminal added to the shortcut database.
- **`scripts/test-macos-fixes.sh`** — one-shot E2E verification script: rebuild, binary check, permission consistency, screenshot capture, doctor cross-check.
- **`--request-screen-recording` flag** on `permission-check` binary — optional TCC dialog trigger for Screen Recording.
- **`processPath` + `bundleId`** in all permission check responses — aids TCC debugging.
- **30s TTL cache** on A11y shell availability — permission grants mid-session are now detected without restart.
- **macOS native binary verification** in `scripts/verify-install.js` — warns on missing binaries at `npm install` time.
- **`setup` script auto-builds** native binaries on macOS (inside `npm run setup`).

### Changed
- **`build.sh`** — marked executable in git, fails fast on missing binaries (was silently warning), better error guidance.
- **Installer** — verifies all 4 required binaries (not just `ClawdCursorHost`), uses `bash ./build.sh` for portability.
- **`doctor.ts`** — permission check unified via `native-helper` module; triggers system permission dialogs if denied.
- **Email flow keyboard shortcuts** — platform-aware: `Ctrl+Enter` → `Shift+Cmd+D` on macOS, `Ctrl+H` → `Cmd+Option+F` for Find & Replace.
- **`sharp`** bumped `^0.33.0` → `^0.33.5`.

### Platform Safety
No Windows or Linux code paths affected. All macOS changes are gated behind `IS_MAC` / `process.platform === 'darwin'` / `isMacOS()`.

## [0.7.13] - 2026-04-10 — Unified Permission Checks + Screenshot Helper

### Fixed
- **Permission check fragmentation** — doctor, status, and readiness each used different permission APIs, producing contradictory results. All now route through `ClawdCursorHost /status` → `permission-check` binary → direct `AXIsProcessTrusted` fallback.
- **Screenshot CPU spin** — delegated `takeScreenshot()` to `screenshot-helper` subprocess, eliminating the ReplayKit CPU spike on macOS 14+.
- **Installer binary verification** — now checks all 4 required binaries (`ClawdCursorHost`, `clawdcursor-helper`, `screenshot-helper`, `permission-check`) instead of just `ClawdCursorHost`.
- **`build.sh` silent failures** — `swift build` errors now fail the build immediately with actionable guidance.

### Added
- **`clawdcursor grant` command** — triggers macOS system permission dialogs for Accessibility and Screen Recording.
- **`processPath` + `bundleId`** in permission check responses for TCC debugging.
- **`--request-screen-recording` flag** on `permission-check` binary.

## [0.7.12] - 2026-04-09 — Comprehensive macOS TCC Fix

### Fixed
- **Bash pipeline bug** — `set -o pipefail` added; build failures now properly detected (was silently passing due to pipeline exit status bug)
- **Ad-hoc signing by default** — build.sh now always signs the app (required for TCC on macOS 26+ Tahoe where unsigned binaries don't appear in privacy settings)
- **Build error capture** — uses temp file instead of pipe to properly capture exit status
- **TCC permission check** — runs permission-check after build to show current accessibility/screen recording status

### Changed
- **build.sh rewritten** — cleaner structure, ad-hoc signing is default (not optional), signature verification added
- **Codesign uses --deep** — ensures all nested binaries are signed
- **Installer shows TCC status** — tells user exactly which permissions need to be granted and where

### Technical Details
The core issue was TCC (Transparency, Consent, and Control) on macOS binds permissions to the code signing identity. Without signing:
- On macOS 26+ (Tahoe), unsigned binaries don't appear in System Settings privacy panels at all
- Users saw "ClawdCursorHost binary not found" errors even though install appeared to succeed

Reference: mediar-ai/mcp-server-macos-use for TCC permission handling patterns.

## [0.7.11] - 2026-04-09 — macOS Installer Fix

### Fixed
- **macOS installer now fails loudly if native host build fails** — was silently swallowing build errors and claiming "optional fallback" that doesn't exist
- **Added verification step** — installer explicitly checks ClawdCursorHost binary exists before declaring success
- **Show build output** — Swift build errors are now visible instead of redirected to /dev/null
- **Clear error messages** — tells users exactly what went wrong and how to fix it (xcode-select --install, manual rebuild, etc.)

### Changed
- macOS native host is now correctly marked as REQUIRED, not optional
- Installer exits with error code 1 if native build fails on macOS

## [0.7.10] - 2026-04-08 — Guided Setup Flow

### Changed
- **Installer shows next steps** — after install, displays clear guidance: `clawdcursor doctor` → `clawdcursor start`
- **Doctor shows run options** — after passing all checks, shows both `start` (full agent) and `serve` (tools-only) modes
- **Consent shows next step** — after granting consent, directs users to `clawdcursor doctor`

## [0.7.9] - 2026-04-08 — UX Improvements

### Changed
- **macOS permission messages** — now direct users to enable "ClawdCursor" instead of "Terminal/Node"
- **Screen Recording path** — updated to "Screen & System Audio Recording" (macOS Sequoia naming)

## [0.7.8] - 2026-04-08 — Documentation Fix

### Fixed
- **Installer comments updated** — example version references now point to v0.7.8

## [0.7.7] - 2026-04-08 — Installer Fixes

### Fixed
- **Installers default to main branch** — install.sh and install.ps1 now use `main` instead of hardcoded non-existent tag
- **macOS installer builds native helper** — install.sh now runs `./native/build.sh` on Darwin if Swift is available
- **Version override support** — `VERSION=v0.7.7 curl ... | bash` or `$env:VERSION='v0.7.7'` to install specific release
- **Auto-pull on update** — installers now run `git pull` after checkout to get latest changes

## [0.7.6] - 2026-04-08 — macOS Native Host App

### Added
- **macOS Host App (ClawdCursorHost)** — new native Swift executable that runs as the app bundle's main process, owning all TCC permissions (Accessibility, Screen Recording) under a single app identity
- **Localhost IPC server** — host app exposes `GET /health`, `GET /status`, `POST /rpc` on `127.0.0.1:3848` for CLI→host communication
- **Token-based authentication** — `~/.clawdcursor/host-token` (mode 0600) secures the IPC channel
- **Auto-launch/stop** — `clawdcursor start` ensures host is running; `clawdcursor stop` gracefully quits it
- **New Swift helper methods** — `moveMouse`, `dragMouse`, `captureScreen` for smoother native macOS automation
- **Menu bar presence** — host app shows 🐾 icon in menu bar for visibility

### Security
- **Localhost-only binding** — IPC server uses `NWParameters.requiredLocalEndpoint` to bind to `127.0.0.1` only, rejecting connections from other machines
- **Token file permissions** — host-token created with mode 0600 (owner read/write only)

### Changed
- `src/native-helper.ts` — routes all macOS desktop operations through host IPC instead of direct stdio
- `src/native-desktop.ts` — 11 platform-guarded code paths delegate to host on macOS
- `src/index.ts` — start/stop commands manage host app lifecycle
- `native/ClawdCursor.app/Contents/Info.plist` — bundle identifier changed to `com.clawdcursor.app`, executable to `ClawdCursorHost`

### Unchanged
- **Windows/Linux** — all macOS code behind `IS_MAC && this.helper` guards; no behavior changes on other platforms
- **172 tests pass** — full test suite unchanged

## [0.6.3] - 2026-03-01 — Universal Pipeline, Multi-App Workflows, Provider-Agnostic

### Added
- **LLM-based universal task pre-processor** — one cheap text LLM call decomposes any natural language into `{app, navigate, task, contextHints}`, replacing brittle regex parsing
- **Multi-app workflow support** — copy/paste between apps (e.g. Wikipedia → Notepad) with 6-checkpoint tracking: first_app_focused → first_app_action_done → content_copied → second_app_opened → content_pasted → result_visible
- **Site-specific keyboard shortcuts** — Reddit (j/k/a/c), Twitter/X (j/k/l/t/r), YouTube (Space/f/m), Gmail (j/k/e/r/c), GitHub (s/t/l), Slack (Ctrl+k), plus generic hints
- **OS-level default browser detection** — reads Windows registry (HKCU ProgId) or macOS LaunchServices instead of hardcoded Edge/Safari
- **3 verification retries with step log analysis** — when verification fails, builds a digest of recent actions + checkpoint status so the vision LLM can fix the specific missed step
- **Mixed-provider pipeline support** — e.g. kimi for text, anthropic for Computer Use, with per-layer API key resolution from OpenClaw auth-profiles
- **`ComputerUseOverrides` interface** — apiKey, model, baseUrl per-layer for mixed-provider setups
- **`resolveProviderApiKey()` helper** — reads OpenClaw auth-profiles to find the right API key per provider

### Fixed
- **Checkpoint system overhaul** — removed auto-termination (completionRatio ≥ 0.90 early exit and isComplete() mid-loop kill), strict detection: content_pasted requires Ctrl+V, content_copied requires Ctrl+C, second_app_opened detects any window switch universally
- **Pipeline context passing** — `priorContext[]` accumulator flows from pre-processing through to Computer Use (no more amnesia between layers)
- **Credential resolution order** — .clawdcursor-config → auth-profiles.json → openclaw.json (with template expansion) → env vars
- **`loadPipelineConfig()` path resolution** — checks package dir first, then cwd (fixes global npm installs)
- **Smart Interaction model lookup** — uses `PROVIDERS` registry instead of hardcoded model/baseUrl maps; fixes stale `claude-haiku-3-5-20241022` fallback
- **Scroll behavior** — system prompts instruct PageDown/Space instead of tiny mouse scrolls; default scroll delta 3 → 15
- **Provider-agnostic internals** — all comments and logs say "vision LLM" instead of "Claude"
- **Verification retry limit** — max 3 retries prevents infinite verification loops
- **Universal checkpoint detection** — no hardcoded app lists; `detectTaskType()` uses action patterns only

### Changed
- Pipeline architecture: LLM Pre-processor → Pre-open app + navigate → L0 Browser → L1 Action Router + Shortcuts → L1.5 Smart Interaction → L2 A11y Reasoner → L3 Computer Use
- Pre-processor prompt hardened with NEVER rules (never summarize, never drop steps) and VALIDATION RULE
- MULTI-APP WORKFLOWS section added to both Mac and Windows Computer Use system prompts
- Checkpoint thresholds tightened: early completion 75% → 90%, skip-verification 50% → 80%

## [0.6.5] - 2026-02-28 — Checkpoint System, Task Completion Detection

### Added
- **Checkpoint-based task completion** — Computer Use tracks milestones (compose opened → fields filled → send pressed → compose closed) and stops when all checkpoints are met. No more wasted calls after successful completion.
- **Task type detection** — auto-classifies tasks (email, form, navigate, draw, file_save) and applies appropriate checkpoint templates.
- **Smart early termination** — when Claude says "done" and ≥75% checkpoints confirmed, accepts completion immediately.
- **Auto-config on first run** — `clawdcursor start` auto-detects providers without needing `clawdcursor doctor`.
- **Universal provider support** — any OpenAI-compatible endpoint works via `--base-url`.
- **CLI model selection** — `--text-model` and `--vision-model` flags.

### Fixed
- **Email domain extraction bug** — "send to user@hotmail.com" no longer navigates to hotmail.com. Email addresses are stripped before URL matching.
- **Verification override bug** — verification no longer contradicts confirmed checkpoint completion. Skipped when ≥50% checkpoints met.
- **Context loss between layers** — Computer Use now receives full context of what pre-processing already did.
- **Drawing quality** — minimum 50px drag distances enforced via system prompt.
- **OpenClaw credential discovery** — multi-provider scan, template variable resolution, no false overrides.
- **Pipeline gate** — Action Router always runs, shortcuts work everywhere.

### Changed
- Pipeline pre-processes "open X and Y" tasks — opens app via Action Router (free), then hands remaining task to deeper layers.
- Smart Interaction detects visual loop tasks (draw, paint) and skips to Computer Use.
- Computer Use system prompt includes Snap Assist handling and drawing guidelines.

## [0.6.2] - 2026-02-28 — Universal Provider Support, Auto-Config

### Added
- **Auto-config on first run** — `clawdcursor start` auto-detects and configures providers without needing `clawdcursor doctor` first. Doctor is now optional for fine-tuning.
- **Universal provider support** — any OpenAI-compatible endpoint works. Not limited to 7 hardcoded providers. Use `--base-url` + `--api-key` for custom endpoints.
- **CLI model selection** — `--text-model` and `--vision-model` flags on start command.
- **Dynamic OpenClaw provider mapping** — reads ALL providers from OpenClaw config, not just known ones. NVIDIA, Fireworks, Mistral, etc. work automatically.

### Changed
- `clawdcursor start` now auto-runs setup if no config exists (non-interactive)
- Provider detection accepts any provider name, falling back to OpenAI-compatible API
- `detectProvider()` returns 'generic' for unknown providers instead of defaulting to 'openai'

## [0.6.1] - 2026-02-28 — Keyboard Shortcuts, Pipeline Fixes

### Added
- **Keyboard shortcuts registry** (`src/shortcuts.ts`) — 30+ common actions mapped to direct keystrokes. Scroll, copy, paste, undo, reddit upvote/downvote, browser shortcuts, and more. Zero LLM calls.
- **Fuzzy shortcut matching** — "scroll the page down" fuzzy-matches to scroll-down shortcut. Context-aware matching for social media actions.
- **Router telemetry** — Action Router now logs match type, confidence, and shortcut hits.
- **CDP→UIDriver fallback** — Smart Interaction falls back to accessibility tree automation when browser CDP path fails.
- **Gmail, Outlook, Hotmail** added to Browser Layer site map.

### Fixed
- **Pipeline gate bug** — Action Router was gated behind `!isBrowserTask`, causing shortcuts to be skipped for browser-context tasks (e.g., "reddit upvote" matched browser regex but should use shortcut). Action Router now always runs after Browser Layer.
- **URL extraction false positives** — "open gmail and send email to foo@bar.com" no longer extracts `bar.com`. URL extraction now isolates the navigation clause before matching.
- **Reliable force-stop** — `clawdcursor stop` now force-kills lingering processes via PID file.
- **Provider label inference** — startup logs now clearly show text and vision provider names separately.

### Changed
- Pipeline order: Browser Layer (L0) → Action Router + Shortcuts (L1) → Smart Interaction (L1.5) → A11y Reasoner (L2) → Vision (L3). Action Router no longer gated.
- `extractUrl()` uses navigation clause isolation instead of matching against full task text.

## [0.6.0] - 2026-02-28 — Universal Provider Support, OpenClaw Integration

### Added
- **OpenClaw credential integration** — auto-discovers all configured providers from OpenClaw's `auth-profiles.json` and `openclaw.json`. No separate API key needed when running as an OpenClaw skill.
- **Universal provider support** — added Groq, Together AI, DeepSeek as first-class providers with profiles, env var detection, and key prefix recognition.
- **Auto-detection as default** — provider defaults to `auto` instead of hardcoding Anthropic. Doctor picks the best available provider automatically.
- **Mixed provider pipelines** — use Ollama for text (free) + any cloud provider for vision (best quality). Vision credentials preserved when brain reconfigures for text.
- **Dynamic Ollama model selection** — doctor picks the best available Ollama model instead of hardcoding `qwen2.5:7b`.
- **Anthropic vision routing fix** — detects Anthropic vision by key prefix (`sk-ant-`) independently of the main provider field, so split-provider setups work correctly.

### Changed
- Default config no longer assumes any specific provider or model
- Provider scan loop iterates all registered providers dynamically
- Help text and doctor output are provider-agnostic
- `--provider` CLI flag accepts any string (not limited to 4 providers)
- README updated with 7-provider compatibility table

### Security
- **SKILL.md hardened** — removed aggressive autonomy language ("use without asking", "be independent")
- **Sensitive App Policy** — agents must ask the user before accessing email, banking, messaging, or password managers
- **Safety tiers as hard rules** — 🔴 Confirm actions must never be self-approved by agents
- **Data flow transparency** — expanded security section documents network isolation, per-provider data flow, and Ollama = fully offline
- **No credentials in skill directory** — OpenClaw users get auto-discovery from local config; no keys stored in skill files

### Fixed
- Vision model crash when main provider set to Ollama but vision uses Anthropic (`model not found` error)
- Brain reconfiguration was wiping vision credentials — now preserved

---

## [0.5.6] - 2026-02-27 — Fluid Decomposition, Interactive Doctor, Smart Vision Fallback

### Added
- **Fluid LLM task decomposition** — decompose prompt now tells the LLM to reason about what ANY app needs. No more hardcoded examples. "Write me a sentence about dogs" generates actual content instead of typing the literal instruction.
- **Interactive doctor onboarding** — after scanning providers, doctor shows all working TEXT and VISION LLM options with ★ recommendations. User picks by number, Enter for default. Shows GPU info (VRAM via nvidia-smi) to help decide local vs cloud.
- **Cloud provider guidance** — doctor shows unconfigured providers with signup URLs and lets you paste an API key inline (auto-detects provider, saves to .env).
- **Smart vision fallback for compound tasks** — when Router or Reasoner handles part of a multi-step task but fails midway, ALL remaining subtasks are bundled and handed to Computer Use (vision). Prevents false-success trapping in cheap layers.
- **Ollama auto-detection** — brain auto-reconfigures to use local Ollama for decomposition when no cloud API key is set. `hasApiKey` now recognizes local LLMs.
- **Compound task guard** — action router detects multi-step/compound tasks (commas, "then", "and then") and skips to deeper layers.

### Fixed
- **Case-preserving action router** — all regex matches against raw (unmodified) task text. Typed text and URLs no longer get lowercased.
- **Flexible click matching** — `click Blank document` works without quotes (was requiring `click "Blank document"`). Single unified regex for quoted and unquoted element names.
- **PowerShell encoding** — replaced emoji (🐾) and em dash (—) in task console title that broke on Windows PowerShell due to encoding.
- **Stale config** — `.clawdcursor-config.json` now correctly reflects Ollama when doctor detects it (was stuck on Anthropic).
- **Brain provider mismatch** — decomposition no longer calls Anthropic API when only Ollama is available.

### Changed
- **`npm run setup`** — new script that builds and registers `clawdcursor` as a global command via `npm link`. Works on Windows, macOS, and Linux.
- **Stop/kill port validation** — port input is now sanitized (parseInt + range check 1-65535) to prevent command injection
- **Kill health verification** — kill command now verifies `/health` returns a Clawd Cursor response before force-killing
- **Install instructions updated** — README and docs now use `npm run setup`

### Test Results
| Task | Pipeline Path | Steps | LLM Calls | Time | Result |
|------|--------------|-------|-----------|------|--------|
| Open Notepad | Action Router | 1 | 0 | 1.5s | ✅ |
| Open Notepad + write haiku | Router → Smart Interaction → Computer Use | 6 | 7 | 58.8s | ✅ Verified |
| Open Google Doc in Edge + write sentence | Browser → Computer Use | 17 | 9 | 78.8s | ✅ Verified |

## [0.5.5] - 2026-02-26 — Install/Uninstall, OpenClaw Auto-Registration, Doctor UX

### Added
- **`clawdcursor install`** — one command to set up API key, configure pipeline, and register as OpenClaw skill
- **`clawdcursor uninstall`** — clean removal of all config, data, and OpenClaw skill registration
- **Doctor auto-registers as OpenClaw skill** — symlinks into `~/.openclaw/workspace/skills/clawdcursor`
- **Doctor quick fix commands** — shows exact commands for missing text LLM and vision LLM in summary
- **Dashboard favorites** — star commands to save them, click to re-run, persists across server restarts
- **Credential detection** — warns when starring tasks that contain API keys or passwords
- **OS tabs on website** — Windows/macOS/Linux with auto-detect
- **Post-build help message** — shows all available commands after `npm run build`
- **Dynamic OS detection** — system prompt uses actual OS instead of hardcoded "Windows 11" (thanks @molty)

### Fixed
- **Windows skill detection** — removed `requires.bins` from SKILL.md; OpenClaw's `hasBinary()` doesn't handle Windows PATHEXT (`.exe`/`.cmd`), causing the skill to show as "missing" even when node is installed

### Changed
- **SKILL.md rewritten** — agent identity shift framing, trigger lists, CDP direct path, async polling, error recovery
- **Security hardened** — agents cannot self-approve confirm-tier actions, autonomous use scoped to read-only
- **Privacy language clarified** — explicit per-provider data flow
- **Website Get Started simplified** — 3 lines, commands shown in terminal post-build
- **Anthropic text model updated** — `claude-haiku-4-5` (was `claude-3-5-haiku-20241022`)

## [0.5.4] - 2026-02-25 — SKILL.md Rewrite + Security Hardening

### Changed
- **Privacy language clarified** — explicit per-provider data flow (Ollama = fully local, cloud = data to that API only)
- **Added homepage and source URLs** to skill metadata
- **Removed hard-coded paths** from SKILL.md
- **Security section expanded** — includes localhost bind verification command
- **Security scan addressed** — all flagged documentation gaps resolved

## [0.5.3] - 2026-02-25 — SKILL.md Rewrite for Agent Autonomy

### Changed
- **SKILL.md rewritten** — agents now understand they have full desktop control and stop asking users to do things they can do themselves
- **Agent identity shift framing** — blockquote at top overrides default "I can't do desktop things" behavior
- **"When to Use This" trigger list** — comprehensive decision framework for when to reach for Clawd Cursor
- **Two paths documented** — REST API (port 3847) for full desktop control, CDP Direct (port 9222) for fast browser reads
- **Async flow clarified** — concrete polling pattern agents can follow step-by-step
- **Error recovery table** — 8 common problems with exact solutions
- **Expanded task examples** — cross-app workflows, data extraction, verification scenarios
- **README** — added OpenClaw Integration section

## [0.5.2] - 2026-02-25 — Web Dashboard + Browser Foreground Focus

### Added
- **Web Dashboard** — full single-page UI served at `GET /` (port 3847). Task submission, real-time logs, status indicators, approve/reject for safety confirmations, kill switch. Dark theme, fully responsive, zero external dependencies.
- **`clawdcursor dashboard`** — CLI command to open the dashboard in your default browser
- **`clawdcursor kill`** — CLI command to send a stop signal to the running server
- **`GET /logs`** — API endpoint returning last 200 log entries with timestamps and levels
- **Browser foreground focus** — Playwright navigation now brings Chrome to the front via `page.bringToFront()` + OS-level window activation (PowerShell `SetForegroundWindow` on Windows, `osascript` on macOS). The AI acts like a visible cursor — you see everything it does.
- **Console hook** — `hookConsole()` intercepts all server logs for the dashboard log feed with auto-classification (error/success/warn/info)

### Changed
- **Smart task handoff** — Browser layer no longer uses regex word lists to detect multi-step tasks. Pure navigation ("open youtube") completes in browser layer; anything more complex falls through to SmartInteraction where the LLM plans the steps. No more missed verbs.

### Architecture
```
Layer 0: Browser (Playwright) — navigate + foreground focus
    ↓ more than navigation? → fall through
Layer 1: Action Router — regex patterns, zero LLM calls
    ↓ no match? → fall through
Layer 1.5: Smart Interaction — 1 LLM call plans steps, CDP/UIDriver executes
    ↓ failed? → fall through
Layer 2: Accessibility Reasoner — reads UI tree, cheap LLM
    ↓ failed? → fall through
Layer 3: Screenshot + Vision — full screenshot, Computer Use API
```

## [0.5.1] - 2026-02-23 — HD Screenshots + Focus Stability

### Fixed
- **HD screenshots** — LLM resolution increased from 1024px to 1280px (scale 2x instead of 2.5x). Claude can now reliably identify toolbar icons, buttons, and small UI elements.
- **JPEG quality** — bumped from 55 to 65 for clearer icon identification
- **Window focus stability** — `Win+D` minimizes all windows before task execution, preventing the Clawd terminal from stealing focus from target apps
- **Paint drawing reliability** — pencil tool guidance in system prompt, mandatory checkpoint after tool selection
- **Stale file cleanup** — restored `get-windows.ps1` shim (still referenced by accessibility.ts), removed dead `setup.ps1` and `get-ui-tree.ps1`

### Performance (Paint stickman benchmark)
| Metric | v0.5.0 | v0.5.1 |
|--------|--------|--------|
| Time | ~250s | **55s** |
| API calls | 30 | **6** |
| Success rate | ~50% | ~90% |

## [0.5.0] - 2026-02-23 — Smart Pipeline + Doctor + Batch Execution

### Added
- **`clawdcursor doctor`** — auto-diagnoses setup, tests models, configures optimal pipeline
- **3-layer pipeline** — Action Router → Accessibility Reasoner → Screenshot fallback
- **Layer 2: Accessibility Reasoner** (`src/a11y-reasoner.ts`) — text-only LLM reads the UI tree, no screenshots needed. Uses cheap models (Haiku, Qwen, GPT-4o-mini).
- **Batch action execution** — Claude returns multiple actions per response (3.6 avg), skipping screenshots between batched actions. Drawing tasks execute 10+ actions in a single API call.
- **Focus hints** — each screenshot includes a FOCUS directive telling Claude where to look, reducing output tokens and decision time
- **Auto-maximize** — apps launched via Action Router are automatically maximized (`Win+Up`) for consistent layout
- **Region capture** — `captureRegionForLLM()` crops screenshots to specific areas (2-30KB vs 58KB full)
- **Checkpoint strategy** — screenshots only after critical state changes (app open, dialog appear), not after every action
- **Multi-provider support** — Anthropic, OpenAI, Ollama (local/free), Kimi. Same codebase, auto-detected.
- **Provider model map** (`src/providers.ts`) — auto-selects cheap/expensive models per provider
- **Self-healing** — doctor falls back if a model is unavailable (e.g., Haiku → Qwen). Circuit breaker disables failing layers at runtime.
- **Streaming LLM responses** — early JSON return saves 1-3s per call
- **Combined accessibility script** (`scripts/get-screen-context.ps1`) — 1 PowerShell spawn instead of 3
- **Benchmark harness** (`test-perf-comparison.ts`)

### Performance
- Screenshots: 120KB → ~80KB, 1280px target (HD for reliable icon identification)
- JPEG quality: 70 → 65
- Delays: 200-1500ms → 50-600ms across the board
- System prompts: ~60% smaller (fewer tokens per call)
- Accessibility tree: filtered to interactive elements only, 3000 char cap
- Taskbar cache: 30s TTL (was queried every call)
- Screen context cache: 500ms → 2s TTL

### Benchmarks

| Task | v0.4 | v0.5 (Ollama, $0) | v0.5 (Anthropic) | v0.5 + Batch |
|------|------|--------|---------|---------|
| Calculator | 43s | 2.6s | 20.1s | — |
| Notepad | 73s | 2.0s | 54.2s | — |
| File Explorer | 53s | 1.9s | 22.1s | — |
| Paint stickman | ~250s (30 calls) | — | ~124s (19 calls) | **101s (11 calls)** |
| GitHub profile | — | — | ~106s (15 calls) | — |

## [0.4.0] - 2026-02-22 — Native Desktop Control

**VNC removed.** Clawd Cursor now controls the desktop natively via @nut-tree-fork/nut-js. No VNC server required.

### Breaking Changes
- `--vnc-host`, `--vnc-port`, `--vnc-password` CLI flags removed
- `VNC_PASSWORD`, `VNC_HOST`, `VNC_PORT` environment variables no longer used
- `rfb2` dependency removed
- `setup.ps1` no longer installs TightVNC

### Added
- `NativeDesktop` class (`src/native-desktop.ts`) — drop-in replacement for VNCClient
- Direct screen capture via @nut-tree-fork/nut-js (~50ms vs ~850ms)
- Direct mouse/keyboard control via OS-level APIs
- Simplified onboarding: `npm install && npm start`

### Performance
- Screenshots: ~850ms → ~50ms (17× faster)
- Connect time: ~200ms → ~38ms (5× faster)
- Simple task (Google Docs sentence): ~120s → ~102s
- Complex task (GitHub → Notepad → save): ~200s → ~156s

### Removed
- VNC server dependency (TightVNC)
- `rfb2` npm package
- VNC-related CLI flags and environment variables
- BGRA→RGBA color swap (nut-js returns RGBA natively)

## [0.3.3] - 2025-03-15

### Bulletproof Headless Setup
- setup.ps1 now completes end-to-end in a single run on fresh systems, even in non-interactive/headless AI agent shells
- Generate random VNC password when `--vnc-password` not provided non-interactively
- Replace `Start-Process -NoNewWindow -Wait` with `-PassThru -WindowStyle Hidden` + try/catch (msiexec crash fix)
- Wrap `Start-Service` in its own try/catch (post-install crash fix)
- Replace all emoji with ASCII tags for cp1252 headless terminal compatibility

## [0.3.1] - 2025-03-10

### SKILL.md Security Hardening
- Added YAML frontmatter, explicit credential declarations, privacy disclosure, and security considerations for ClaWHub publishing.

## [0.3.0] - 2025-03-01

### Performance Optimizations (~70% faster)
- Screenshot hash cache — skips LLM calls when the screen hasn't changed
- Adaptive VNC frame wait — captures in ~200ms instead of fixed 800ms
- Parallel screenshot + accessibility fetch — runs concurrently via Promise.all
- Accessibility context cache — 500ms TTL eliminates redundant PowerShell queries
- Async debug writes — no longer blocks the event loop
- Exponential backoff with jitter — better retry resilience for API calls

## [0.2.0] - 2025-02-21

### 🚀 Major: Anthropic Computer Use API

Clawd Cursor now supports Anthropic's native Computer Use API (`computer_20250124`) as the **primary execution path**. This is a fundamentally different approach — the full task goes directly to Claude with native computer use tools. No decomposition, no routing. Claude sees screenshots, plans, and executes natively.

### Dual Execution Paths

The agent now has two separate code paths selected by provider:

- **Path A — Computer Use API** (`--provider anthropic`): Full task sent to Claude with `computer_20250124` tool. Claude sees the screen, plans multi-step sequences, and executes them natively. Handles complex, multi-app workflows reliably.
- **Path B — Decompose + Action Router** (`--provider openai` / offline): Original approach from v0.1.0. Parse task → subtasks → Action Router (UI Automation, zero LLM) → Vision fallback. Faster and cheaper for simple tasks, works without an API key.

### Added

- **Anthropic Computer Use integration** — native `computer_20250124` tool type with `anthropic-beta: computer-use-2025-01-24` header
- **Adaptive delays** — per-action timing: 1000ms for app launch, 800ms for navigation, 100ms for typing, 300ms default
- **Verification hints** — post-action verification prompts after each Computer Use step
- **Mouse drag** — `mouseDrag`, `mouseDown`, `mouseUp` with smooth interpolation between points
- **Bulletproof system prompt** — planning rules, ctrl+l for URL navigation, recovery strategies for failed actions
- **Display scaling** — automatic resolution scaling to 1280×720 for Computer Use API compatibility
- **Vision model** — `claude-sonnet-4-20250514` for Computer Use path

### Test Results

| Task | Time | API Calls | Result |
|------|------|-----------|--------|
| Google Docs: open Chrome, go to Docs, write a paragraph | 187s | 14 | ✅ All succeeded |
| GitHub: open Chrome, navigate to profile, screenshot | 102s | — | ✅ All succeeded |
| Notepad: open, write haiku, save to desktop | ~180s | — | ✅ File saved correctly |
| Paint: draw a stick figure | ~90s | 16 | ✅ Drawing completed |

### Breaking Changes

- **Provider selection now determines execution path.** `--provider anthropic` uses Computer Use API (Path A). `--provider openai` or no provider uses the original Decompose + Action Router pipeline (Path B). This is a fundamental change in behavior — the same task will execute via completely different code paths depending on the provider.

### Performance Characteristics

| | Path A (Computer Use) | Path B (Action Router) |
|---|---|---|
| Best for | Complex multi-step tasks | Simple single-action tasks |
| Reliability | Very high | Good for supported patterns |
| Speed | ~90–190s for complex tasks | ~2s for simple tasks |
| Cost | Higher (multiple API calls with screenshots) | Lower (1 text call or zero) |
| Offline | No | Yes (for common patterns) |

## [0.1.0] - 2025-01-15

### Initial Release

- Action Router with Windows UI Automation — 80% of common tasks with zero LLM calls
- Vision fallback for complex/unfamiliar UI
- Smart task decomposition (single text-only LLM call)
- Three-tier safety system (Auto / Preview / Confirm)
- REST API and CLI interface
- Windows setup script