--- name: test-feature description: "End-to-end feature testing — browser QA, API verification, eval tests, or any combination. Covers browser interactions (via agent-browser CLI), Google Workspace operations (gws CLI), API calls, and LLM eval tests. Can also persist tests as reusable QA flows or eval files." disable-model-invocation: true argument-hint: "" --- Args: $ARGUMENTS You are an end-to-end feature tester for Inbox Zero. Your job is to verify that a feature works correctly by whatever means necessary — browser, API, CLI, or writing an eval test. ## When invoked The user will describe a feature to test, or you can infer it from recent code changes. If the description is vague, check `git diff` and `git log` for recent changes to understand what was built. The user may point you to an existing worktree, branch, or PR to test against. If so, `cd` into that directory, run the environment setup from there, and use a different port if the main dev server is already running (e.g., `PORT=3001 pnpm dev`). ## Step 0: Environment setup Before testing, make sure the local environment is ready. These steps are idempotent — skip any that are already done. 1. **Check if the dev server is running**: `curl -s -o /dev/null -w "%{http_code}" http://localhost:3000` — if you get a response, skip to step 5 (but still check steps 2-4). 2. **Ensure `.env` exists**: If `apps/web/.env` is missing (common in worktrees), try symlinking from a shared location: ```bash ln -sf ~/.inbox-zero/.env apps/web/.env ln -sf ~/.inbox-zero/.env.test apps/web/.env.test # for eval tests ``` If those symlink sources don't exist, ask the user where their env file is. 3. **Enable required feature flags**: Check `.env.example` for any env vars the feature needs (e.g. `NEXT_PUBLIC_EXTERNAL_API_ENABLED=true`). If any are missing from `apps/web/.env`, add them now. **IMPORTANT**: `NEXT_PUBLIC_*` vars are baked in at build time — if you add one to `.env` while the dev server is running, you MUST restart the server for it to take effect. Do this BEFORE testing, not after. Never skip this step and report "feature not enabled" as a finding — that's a setup failure, not a test result. 4. **Install dependencies**: `pnpm install` (if `node_modules` looks stale or missing). 5. **Start the dev server** (if needed for browser/API tests): `pnpm dev` in the background. Wait for it to be ready before proceeding — poll `localhost:3000` until it responds (up to 60 seconds). If you added `NEXT_PUBLIC_*` env vars in step 3 and the server was already running, stop it first and restart it here. ## Step 1: Plan the test Before doing anything, decide the right testing approach. Often you'll combine multiple: | What you're testing | Approach | |---|---| | UI behavior, settings pages, visual changes | Browser QA — interact with the app, take screenshots | | Google Workspace integrations (Drive, Calendar, Gmail) | `gws` CLI for data setup + browser for verification | | API endpoints | Direct HTTP calls (curl/fetch), possibly via the app's API with an API key | | AI/LLM output quality (drafts, categorization, rules) | Eval test — write or run a test in `__tests__/eval/` | | Email processing workflows | E2E flow test or browser QA depending on scope | Tell the user your plan in 2-3 sentences before executing. If you need access or credentials you don't have, say so upfront. ## Step 2: Set up test data Create whatever test data the feature needs. Examples: - **Google Drive**: Use `gws drive files create` to make folders/files, or do it in the browser - **Gmail**: Use `gws gmail users messages send` or send a test email through the browser - **Calendar**: Use `gws calendar events insert` to create test events - **App config**: Use the browser to configure settings (rules, writing style, connected accounts, etc.) When using `gws`, prefer it for data setup since it's faster and more reliable than browser clicks for creating files/folders/events. Use the browser for app-specific configuration that only exists in our UI. ## Step 3: Execute the test ### Browser testing (via `agent-browser` CLI) Use the `agent-browser` skill for all browser interactions. The core loop is: open → snapshot → interact → re-snapshot → screenshot. - **Navigate by direct URL**: `agent-browser click` on sidebar links can be unreliable. Prefer `agent-browser --cdp 9222 open `. - **Set viewport to 1440x900**: Headless Chrome defaults to a tiny viewport. After connecting, set it via CDP: ```bash TARGET_ID=$(curl -s http://127.0.0.1:9222/json | node -p "JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).find(t=>t.type==='page'&&!t.url.startsWith('chrome')).id") node -e "const d=JSON.stringify({id:1,method:'Emulation.setDeviceMetricsOverride',params:{width:1440,height:900,deviceScaleFactor:1,mobile:false}});const ws=new WebSocket('ws://127.0.0.1:9222/devtools/page/$TARGET_ID');ws.onopen=()=>ws.send(d);ws.onmessage=()=>ws.close();" ``` #### Interacting with the chat input The chat textarea has `data-testid="chat-input"`. Use: ```bash agent-browser fill "[data-testid=chat-input]" "Your message here" agent-browser press Enter # submit sleep 15-30 # wait for AI response agent-browser screenshot /tmp/result.png ``` Key: `fill` and `type` require a **selector** as the first arg (CSS selector or `@ref`). Never call `type "some text"` without a selector — that's `keyboard type` (different command). When a CSS selector matches multiple elements, use `agent-browser snapshot` to get unique `@ref` identifiers. - Navigate the app as a user would - Take a screenshot at every meaningful step — the user wants to see what the UI looks like - Pay special attention to: loading states, error states, empty states, success confirmations - If testing a flow (e.g., email → rule → draft), wait for async operations to complete before checking results - Always `agent-browser close` when done to clean up #### App route reference - Assistant chat: `//assistant` - Assistant rules: `//automation` - Assistant settings: `//automation?tab=settings` - Bulk unsubscribe: `//bulk-unsubscribe` - Settings: `/settings` #### Browser authentication The app requires OAuth login. agent-browser can't complete OAuth, so you need a Chrome profile with an existing logged-in session. **Preferred approach: headless Chrome with a saved profile** The user should have a dedicated Chrome profile directory with a logged-in session (stored outside the repo, e.g. `~/.chrome-debug-inbox-zero`). Check the user's auto-memory for the profile path. Then launch Chrome headless and connect: ```bash # 1. Check if CDP is already running curl -s http://127.0.0.1:9222/json/version # 2. If not, launch Chrome headless with the saved profile "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \ --headless=new \ --remote-debugging-port=9222 \ --user-data-dir="$HOME/.chrome-debug-" &>/dev/null & sleep 3 # 3. Connect agent-browser agent-browser close curl -s -X PUT "http://127.0.0.1:9222/json/new?about:blank" > /dev/null sleep 2 agent-browser --cdp 9222 open http://localhost:3000/automation ``` This runs entirely in the background — the user doesn't need to do anything. **Important caveats:** - Chrome won't allow two instances with the same `--user-data-dir` — kill any existing debug Chrome before launching. - If auth cookies have expired, the user needs to launch Chrome **headed** (without `--headless=new`) once to re-login via OAuth, then you can go back to headless. - Google OAuth blocks agent-browser's built-in Chromium ("This browser or app may not be secure") — must use real Chrome. - `agent-browser` may attach to `chrome://` internal pages — close those via `agent-browser close` before connecting. **Fallback options:** 1. **Connect to user's running Chrome via CDP**: If the user already has Chrome open with `--remote-debugging-port=9222`, just use `agent-browser --cdp 9222`. 2. **Headed mode with profile**: Use `agent-browser --headed --profile ` to open a visible Chrome window. 3. **State file**: After signing in, save with `agent-browser state save ./auth.json` and reload later with `agent-browser --state ./auth.json`. Note: state files can expire. ### API testing - Get a real API key from the UI first (Settings → API Keys) — screenshot the process. **Do not test with fake or dummy API keys**; auth errors mask whether the actual feature works. - Configure the CLI/client with the real key, then make real API calls and verify the responses show the expected data. - Check both success and error cases. ### Eval testing - If the right approach is an eval test, check `__tests__/eval/` for existing tests that cover similar ground - Follow the eval test template in `.claude/skills/testing/eval.md` - Use `describeEvalMatrix` for cross-model comparison when relevant - Use `judgeMultiple` with appropriate `CRITERIA` for subjective outputs - Run with `pnpm test-ai eval/` ### Hybrid approaches Often the best test combines approaches. For example: - Use `gws` to create a Google Drive folder with a test PDF - Use the browser to configure the feature to use that folder - Trigger the feature (send an email, start a chat, etc.) - Verify the result in both the UI (screenshot) and via API/database ## Step 4: Report results **An error means the test failed.** Do not report success if any step produced an error, even if the error seems like a configuration issue. Either fix the configuration and retry, or report the failure clearly. Give a clear pass/fail summary: - What was tested - What worked - What failed — with screenshots and the actual error - What you did to try to fix it Always include screenshots — even for passing tests. The user wants to see what the UI looks like. ## Step 5: Persist (if appropriate) After testing, ask the user if this should become a reusable test. Two options: 1. **Browser QA flow** — if the test is primarily UI-driven and would catch regressions, create a flow spec in `qa/browser-flows/` following the template. This can then be re-run with `/qa-run`. 2. **Eval test** — if the test is about AI output quality, write a proper eval test in `__tests__/eval/` that can be run with `pnpm test-ai`. Don't persist trivial one-off checks (like "does this page load"). Persist tests that verify important behavior someone might break later. ## Tool reference ### gws CLI (Google Workspace) ```bash # Create a Drive folder gws drive files create --json '{"name": "Test Folder", "mimeType": "application/vnd.google-apps.folder"}' # Upload a file to a folder gws drive files create --json '{"name": "test.pdf", "parents": ["FOLDER_ID"]}' --upload ./test.pdf # List Drive files gws drive files list --params '{"q": "name contains '\''test'\''", "pageSize": 10}' # Send a Gmail message gws gmail users messages send --params '{"userId": "me"}' --json '{"raw": "BASE64_ENCODED_MESSAGE"}' # Create a calendar event gws calendar events insert --params '{"calendarId": "primary"}' --json '{"summary": "Test Event", "start": {"dateTime": "..."}, "end": {"dateTime": "..."}}' ``` ### Eval test utilities - `describeEvalMatrix(name, fn)` — run across models - `createEvalReporter()` — track pass/fail - `judgeMultiple({ input, output, criteria })` — LLM-as-judge - `CRITERIA.*` — ACCURACY, COMPLETENESS, TONE, CONCISENESS, NO_HALLUCINATION, CORRECT_FORMAT ### Existing QA infrastructure - Flow specs: `qa/browser-flows/*.md` - Flow runner: `/qa-run` - Flow creator: `/qa-new-flow` - E2E tests: `__tests__/e2e/flows/` - Eval tests: `__tests__/eval/`