---
name: reproduce
description: >
  Reproduce a bug with evidence before fixing it. Root cause hypothesis,
  tool-aware reproduction, and a failing test that proves the bug exists.
when_to_use: >
  After /create-bug (or when a bug spec exists) and before /fix. Triggered by
  "/reproduce", "/reproduce pN", or when /pick-flow recommends it. Skip only
  for trivial one-liner bugs where root cause is self-evident from the code.
version: 1.0.0
---

# /reproduce

Reproduce a bug with evidence. Output: a confirmed root cause hypothesis + a failing test.

> **Principle:** If you can't make it fail on demand, you don't understand it well enough to fix it.

## Usage

```bash
/reproduce p714                              # From bug spec
/reproduce features/p714_letter_link.md      # Full path
/reproduce "counts show 0 after refresh"     # Inline — auto-files via /create-bug first
```

**Announce at start:** "I'm using the reproduce skill to confirm this bug and write a failing test."

---

## How to Think

### The Falsification Lens

> "What's the cheapest test that would disprove my hypothesis?"

Never say "Root cause: X." Say "Hypothesis: X. Cheapest disproof: [test]. Run it?" One DB query might kill the hypothesis in 10 seconds before you spend an hour building a fix on the wrong assumption.

### The Tool Selection Lens

> "What's the fastest path to observable evidence?"

Different bugs need different tools.
Pick the right combination:

| Bug type | Primary tool | Why |
|----------|-------------|-----|
| Visual/UI (layout, styling, state) | **Claude in Chrome** (real browser, has cookies) | Sees what the user sees, auth-aware |
| Network/API (wrong response, 4xx/5xx) | **Chrome DevTools MCP** (headless) | Network panel, console logs, no auth needed for public routes |
| E2E flow (multi-step, navigation) | **Playwright** | Scriptable, repeatable, becomes the canary test |
| Two-party flow (sender + recipient) | **Playwright multi-context** or **Claude in Chrome + incognito** | Each party needs its own session |
| DB/RLS (wrong data, permission denied) | **curl REST API** or **Supabase MCP** (test DB) | Direct data verification, no UI needed |
| Auth/token (expired, consumed, wrong scope) | **Playwright** with test account + **curl** for token state | Token lifecycle needs both browser and API |
| Server-side (edge function, webhook) | **curl** + **console logs** | No browser needed |

**Two-party infrastructure:** For bugs in flows involving two users (letter sender/recipient, story creator/responder):

- **Playwright:** Use `browser.newContext()` for each party — separate cookies, separate auth
- **Claude in Chrome:** Use main window for party A, incognito for party B
- **Test accounts:** Check `e2e/` fixtures and `.env.local` for existing test credentials before creating new ones
- **Service role:** For verifying DB state across both parties, use service role key from `.env.local`

**Auth-gated pages:** Chrome DevTools MCP is headless with no cookies — blank page on auth routes. Use Claude in Chrome (real browser) or Playwright with test accounts.

---

## Workflow

### Phase 0: Setup + Context Load

**Phase 0.pre: Ensure spec exists**

If inline description (no P-number): invoke `/create-bug` first from main (w0) to file a tracked spec. Then continue with the resulting P-number.

**Phase 0.0: Branch management**

`/reproduce` does NOT create worktrees.
It runs on the current branch (typically `main`). The canary test and spec updates are committed to `main`. When `/fix` later creates a worktree, it branches from `main` and picks up the canary test automatically.

**Exception:** If you're already in a worktree for this bug (e.g., `/fix` was attempted and failed, sending you back to `/reproduce`), stay in the worktree.

**Phase 0.1: Context load**

1. Read the full bug spec — reproduction steps, symptoms, affected files, any prior root cause notes.
2. **Status gate:** If the spec shows `status: qa` or `status: done` → STOP. "P{N} is already at {status}. Nothing to reproduce."
3. Read the source file(s) mentioned in the spec. Verify current state matches assumptions.
4. If the bug involves the DB: read the schema from `docs/technical/database.md` and relevant migration files. Do NOT query the live DB for schema discovery.

**Phase 0.2: Pipeline Stamp (P659)**

1. Read spec frontmatter.
2. Set `delivery_stage: reproduce` and `status: in-progress`.
3. Append `reproduce` to the `pipeline_ran` inline list. Edit pattern: match `pipeline_ran: [existing, items]`, replace with `pipeline_ran: [existing, items, reproduce]`. If `pipeline_ran` doesn't exist, add `pipeline_ran: [reproduce]`. Always use the inline format.
4. **Predecessor check:** If `pipeline_plan` exists, find the skill before `reproduce` in the plan. If that skill is NOT in `pipeline_ran` (exact match) → stop: "Run `/{predecessor}` first." Skip the check if: (a) `pipeline_plan` is absent, (b) this skill is first in the plan, or (c) `pipeline_ran` is absent/empty and this is the first planned skill.
5. If this skill is NOT in `pipeline_plan` → warn: "This skill wasn't in the planned flow. Proceed anyway?"

---

### Phase 1: Root Cause Hypothesis

**Goal:** Form a testable hypothesis about why the bug exists.

**Steps:**

1. Read the code paths involved in the bug. Trace from user action to symptom.
2. Form 1-3 hypotheses, ranked by likelihood.
3. For each hypothesis, name the **cheapest disproof** — the single test/query/check that would kill it fastest.

**Output:**

```
## Root Cause Analysis

**Hypothesis 1** (most likely): [description]
Cheapest disproof: [specific command/query/check]

**Hypothesis 2**: [description]
Cheapest disproof: [specific command/query/check]

→ Run cheapest disproof for H1?
```

4. **Wait for user approval** before running expensive checks. Run cheap ones (<10 seconds) immediately.
5. Execute the disproof. If the hypothesis survives, it's confirmed. If killed, move to the next hypothesis.
6. If all hypotheses are killed: report "All hypotheses disproved. Recommend deeper investigation with the debugging protocol." Do NOT proceed to Phase 2.

**Hard rule:** A hypothesis that survives one disproof is "confirmed for now" — not "proven." State a confidence level.

---

### Phase 2: Live Reproduction

**Goal:** Trigger the bug reliably and capture evidence.

**Steps:**

1. Select tool(s) based on bug type (see the Tool Selection Lens above).
2. Follow the reproduction steps from the spec. If the steps are vague, refine them as you go.
3. Capture evidence:
   - **Visual bugs:** Screenshot showing the symptom
   - **Data bugs:** Query output showing the wrong state
   - **Flow bugs:** Step-by-step log of what happened vs. what should have happened
   - **Error bugs:** Console output, network response, or error message

**Output:**

```
Bug reproduced: [yes/no]
Evidence: [screenshot path / query output / error message]
Reproduction rate: [100% / intermittent — N/M attempts]

Refined reproduction steps:
1. [step]
2. [step]
3. Bug occurs: [what happens]
```

**If you can't reproduce after 3 attempts:**

- Flag: "Unable to reproduce after 3 attempts. Possible causes: [list]. Recommend: [next investigation step]."
- Do NOT proceed to Phase 3. The user decides whether to dig deeper or close.

---

### Phase 2b: Surface + Scenario Audit

**Goal:** Find every place and every way this bug can occur — not just the reported one.

This phase has two tracks.
Run the one that fits the bug type; run both if the bug straddles both.

---

#### Track A — Surface Audit (UI/layout/rendering bugs)

**Why:** UI behavior bugs almost always affect multiple components. Fixing only the reported surface leaves the bug alive elsewhere and you file it again next month.

**Steps:**

1. Identify the core symptom as a grep pattern. Examples:
   - "position counts show 0" → grep for `PositionButtons`, `positionCounts`, `userPosition`
   - "button not highlighted" → grep for `useState.*null` near position rendering
2. Search the codebase for every component that renders the affected behavior.
3. For each match, assess: is the bug present here too?
4. Present the full list:

   ```
   Surface audit: [symptom pattern]

   Found on 3 surfaces:
   1. Profile > Points tab (reported)
   2. Profile > Stories tab (same pattern — affected)
   3. Point detail page (different pattern — NOT affected)

   Which do you want fixed in this ticket?
   - Fix all affected now?
   - Fix 1 now, defer others? (I'll create tickets immediately)
   ```

5. Wait for user confirmation before proceeding.
6. For deferred surfaces: create bug tickets NOW via `./scripts/next-p-number.sh`. "Out of scope" without a ticket number is not allowed.

**Skip Track A for:** Infrastructure bugs (build, CI, migrations), pure logic bugs with zero UI behavior.

---

#### Track B — Scenario Audit (auth/flow/async/security bugs)

**Why:** Guards that work for one scenario can be bypassed by a different trigger sequence. Fixing one scenario leaves identical guards broken in others — same bug, different path. This is how "it worked once, then broke again" happens.

**Use Track B when the bug involves:** authentication state, tokens, URL parameters, async timing, race conditions, multi-party flows, or any guard that checks a condition before allowing access.

**Steps:**

1. **Enumerate trigger scenarios** — all the ways a user could arrive at this code path. Ask:
   - Who could be logged in? (correct user / wrong user / anon / briefly-anon due to race)
   - What URL state could be present? (token, hash fragment, redirect param, error hash, expired token)
   - What async timing windows exist? (auth settling, SIGNED_OUT→SIGNED_IN flicker, RPC delay)
   - What data state variations exist? (null field, already-completed, unclaimed, mode=one-to-many vs 1-to-1)
2. **Map scenarios to guards** — for each scenario, trace which guard would fire. Name the exact line/condition.
3. **Find unguarded scenarios** — scenarios where no guard fires or the guard condition is false:

   ```
   Scenario audit: [guard being tested]

   Scenario 1: Wrong user, receiver_email set, auth settled → guard fires (line 292) ✓
   Scenario 2: Wrong user, receiver_email null → guard skipped (null check) ✗
   Scenario 3: Wrong user, auth briefly null (race condition) → guard skipped (currentUser null) ✗
   Scenario 4: Wrong user, hash error fragment from expired OTP → [need to trace] ?

   Unguarded: scenarios 2, 3, 4 — all must be covered by fix or explicitly deferred with tickets.
   ```

4. Wait for user confirmation on scope before proceeding.
5. For deferred scenarios: create bug tickets NOW. "Out of scope" without a ticket number is not allowed.

**Output:**

```
Scenario audit complete.
In scope for this ticket: [scenario 1, 2, 3]
Deferred (tickets created): [P-XXX: scenario 4]
Canary test must cover: [all in-scope scenarios]
```

**Skip Track B for:** Pure UI/layout bugs with no guard logic, no async state, no auth.

---

> **Note:** This is the authoritative surface/scenario audit. `/fix` Phase 1b references this phase as a fallback for legacy specs. The canary test in Phase 3 must cover all in-scope items from both tracks.

---

### Phase 3: Write Canary Test

**Goal:** A test that FAILS now (proving the bug exists) and will PASS after the fix.

**Hard gate:** Run the test BEFORE any fix code exists. It MUST fail. If it passes → the test is wrong or the bug isn't what you think. Loop back to Phase 1.
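This fail-first gate can be sketched as a small decision helper. The types and names below are illustrative only — they are not part of the skill's tooling — but they capture the three outcomes: a pass means the hypothesis is wrong, a symptom-related failure proves the bug, and any other failure means the test setup (not the assertion) needs repair.

```typescript
// Hypothetical shapes — not a real API. A canary run either passed or
// failed with some message from the test runner.
type CanaryRun = { passed: boolean; failureMessage?: string };

type GateVerdict =
  | 'proceed'           // failed on the symptom — bug is proven, hand off to /fix
  | 'wrong-hypothesis'  // passed — loop back to Phase 1
  | 'fix-test-setup';   // failed, but not on the symptom — repair setup, keep the assertion

function gateCanary(run: CanaryRun, symptomPattern: RegExp): GateVerdict {
  if (run.passed) return 'wrong-hypothesis';
  if (run.failureMessage !== undefined && symptomPattern.test(run.failureMessage)) {
    return 'proceed';
  }
  return 'fix-test-setup';
}

// A symptom-related failure ("counts show 0") clears the gate:
const verdict = gateCanary(
  { passed: false, failureMessage: 'Expected "1" but received "0"' },
  /received "0"/,
);
console.log(verdict); // prints "proceed"
```

The point of the third verdict is that a setup failure (missing fixture, auth error) proves nothing about the bug, so it must not be "fixed" by loosening the assertion.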
**Test selection:**

- **E2E test** (Playwright) — if the bug affects user-visible behavior. File: `e2e/p{N}-reproduce.spec.ts`
- **Unit test** — if the bug is in an isolated function/logic. File: `src/tests/p{N}-reproduce.test.ts`
- **Integration test** — if the bug involves DB/RLS/auth. File: `src/tests/p{N}-reproduce.test.ts`

**Critical rule:** Test the USER-VISIBLE SYMPTOM, not the code mechanism.

```typescript
// WRONG (tests mechanism — passes even when the bug exists):
await expect(page.getByText('Test point')).toBeVisible()

// RIGHT (tests symptom — fails when the bug exists):
await expect(countLabel).toHaveText('1')
await expect(agreeButton).toHaveAttribute('aria-pressed', 'true')
```

**For two-party bugs:** The test must set up both parties. Use Playwright's `browser.newContext()` for isolation:

```typescript
const senderContext = await browser.newContext({ storageState: senderAuth });
const recipientContext = await browser.newContext({ storageState: recipientAuth });
const senderPage = await senderContext.newPage();
const recipientPage = await recipientContext.newPage();
```

**Steps:**

0. **Before writing the test:** name where the function-under-test is invoked from in the UI (`grep -r "functionName" src/` — paste the result). If the call site is more than one step removed from the user action (e.g., button → handler → submitX), trace the chain explicitly. A test that never reaches the call site cannot prove the bug.
1. Write the test asserting the expected (correct) behavior.
2. Run it: `npx playwright test e2e/p{N}-reproduce.spec.ts` or `npm test -- src/tests/p{N}-reproduce.test.ts`
3. Confirm it FAILS with the right error (symptom-related, not setup-related).
4. If it passes → the hypothesis is wrong. Return to Phase 1.
5. If it fails for the wrong reason (setup error, missing fixture) → fix the test setup, not the assertion.
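Because the canary must cover every in-scope scenario from the Phase 2b audit, one way to keep coverage visible is to derive one test title per scenario, so a dropped scenario shows up as a missing test in the runner. The scenario data below is hypothetical (loosely echoing the Track B example), not pulled from a real spec:

```typescript
// Hypothetical in-scope scenarios from a Track B scenario audit.
type Scenario = { id: string; description: string };

const inScope: Scenario[] = [
  { id: 's1', description: 'wrong user, receiver_email set, auth settled' },
  { id: 's2', description: 'wrong user, receiver_email null' },
  { id: 's3', description: 'wrong user, auth briefly null (race)' },
];

// One title per scenario; in the real spec file each title would drive
// a loop of test(title, ...) calls.
const titles = inScope.map((s) => `p714 canary [${s.id}]: ${s.description}`);
console.log(titles.length); // prints 3
```

The naming convention is an assumption; the useful property is only that the test list and the audit's in-scope list are generated from the same data.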
**Output:**

```
Canary test: e2e/p714-reproduce.spec.ts
Result: FAILS (expected)
Failure message: Expected "1" but received "0"
  ↑ confirms the bug: counts show 0

Test proves the bug exists. Ready for /fix.
```

---

### Phase 4: Write Reproduce Artifact

**Goal:** Stamp the spec so `/fix` knows reproduction is done.

Update the bug spec frontmatter:

```yaml
reproduce_artifact:
  test_file: e2e/p714-reproduce.spec.ts
  root_cause: "LetterReadingFlow passes consumed URL token to hook after auth — hook sets tokenExpired=true"
  confidence: high | medium
  surfaces_in_scope: [profile-points, profile-stories]
  surfaces_deferred: [P720, P721]
  reproduced_at: 2026-04-16
  post_fix_timeout: 20000  # optional (ms). Set when the canary uses a tight timeout
                           # to prove a staleness/no-update bug. /fix reads this
                           # and updates the assertion timeout before running.
```

> **`post_fix_timeout` rule:** When the canary assertion uses `toBeVisible({ timeout: N })` or `toHaveText()` with a short N (≤ 6000ms) to prove that an update does NOT appear within N ms — set `post_fix_timeout` to `expected_polling_interval + 5000`. This signals to /fix that the timeout is a staleness sentinel, not a UX budget.

Update the spec body — add or update a `## Root Cause` section with the confirmed hypothesis.

Commit: `chore(p{N}): reproduce — failing test + root cause confirmed`

**Tell user:**

```
Reproduction complete for P{N}.
- Root cause: [one-liner]
- Canary test: [file path] (FAILS — proves bug exists)
- Surfaces in scope: [list]

Next step: /fix p{N}
```

---

## Completion Criteria

Before marking reproduction as done:

- [ ] Root cause hypothesis stated and survived at least one disproof attempt
- [ ] Bug reproduced with observable evidence (screenshot, query output, or error log)
- [ ] Surface audit complete (or explicitly skipped with a reason)
- [ ] Canary test written, runs, and FAILS for the right reason
- [ ] `reproduce_artifact` written to spec frontmatter
- [ ] Spec body updated with confirmed root cause
- [ ] Changes committed

**Never skip the failing test.** If you can't write one, you don't understand the bug.

---

## Relationship to Other Skills

```
/create-bug → /reproduce → /fix → /ship
                  ↑
             YOU ARE HERE
```

**Before /reproduce:**

- `/create-bug` — files the spec (or `/reproduce` auto-invokes it for inline descriptions)
- `/screenshot-debug` — for initial triage from a screenshot when you don't yet have a spec or reproduction steps
- Debugging protocol (`docs/technical/debugging.md`) — for deep investigation when even hypotheses are hard to form
- `/dd:frame-analyze` — when the root cause is truly unclear and needs structured problem framing

**After /reproduce:**

- `/fix` — reads the `reproduce_artifact`, skips its own reproduce phases, goes straight to fixing code
- `/kdd` — capture learnings if the root cause revealed patterns

---

## Examples

### Visual Bug (Chrome Extension)

```
/reproduce p714

Phase 1: Hypothesis: consumed token passed to hook after auth.
         Cheapest disproof: check if effectiveToken is undefined when authenticated.
         → Confirmed: token always passed regardless of auth state.

Phase 2: Reproduced via Claude in Chrome on localhost:5173/letter/read?token=abc
         Screenshot: ~/Screenshots/p714-expired-toast.png
         Rate: 100%

Phase 3: Canary test: e2e/p714-reproduce.spec.ts
         FAILS: Expected no error toast, got "This sign-in link has expired"

Phase 4: reproduce_artifact written.
         Next: /fix p714
```

### Two-Party Flow (Playwright)

```
/reproduce p700

Phase 1: Hypothesis: recipient sees stale sender name because fetch races with render.
         Cheapest disproof: add 500ms delay before assertion.
         → Confirmed: with delay, name resolves correctly.

Phase 2: Reproduced via Playwright with two contexts (sender + recipient).
         Sender creates letter → recipient opens → sees "Someone" for 2s.
         Rate: 80% (timing-dependent)

Phase 3: Canary test: e2e/p700-reproduce.spec.ts
         Uses browser.newContext() for each party.
         FAILS: Expected "Alice" but received "Someone"

Phase 4: reproduce_artifact written.
         Next: /fix p700
```

### Data/RLS Bug (curl + Supabase)

```
/reproduce p707

Phase 1: Hypothesis: RLS blocks unverified users from reading positions.
         Cheapest disproof: query positions table as unverified user.
         → KILLED: unverified user CAN read. Wrong hypothesis.

         Hypothesis 2: Missing join in position count RPC.
         Cheapest disproof: compare RPC output vs direct query.
         → Confirmed: RPC returns 0, direct query returns 3.

Phase 2: Reproduced via curl against test DB.
         Rate: 100%

Phase 3: Canary test: src/tests/p707-reproduce.test.ts
         FAILS: Expected count 3, got 0

Phase 4: reproduce_artifact written.
         Next: /fix p707
```
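The hypothesis loop these examples walk through (Phase 1, steps 5-6) can be sketched as a first-survivor scan. The `Hypothesis` shape and the disproof closures are illustrative, not a real API; each "cheapest disproof" here returns `true` if the hypothesis survives the check:

```typescript
// Hypothetical shape: a ranked hypothesis plus its cheapest disproof,
// returning true if the hypothesis SURVIVES the check.
type Hypothesis = { name: string; cheapestDisproof: () => boolean };

// First survivor is "confirmed for now"; if every disproof kills its
// hypothesis, escalate instead of proceeding to Phase 2.
function runHypotheses(ranked: Hypothesis[]): string {
  for (const h of ranked) {
    if (h.cheapestDisproof()) return `Confirmed for now: ${h.name}`;
  }
  return 'All hypotheses disproved. Recommend deeper investigation with the debugging protocol.';
}

// Mirrors the p707 example: H1 (RLS block) is killed, H2 (missing join) survives.
const result = runHypotheses([
  { name: 'RLS blocks unverified users', cheapestDisproof: () => false },
  { name: 'Missing join in position count RPC', cheapestDisproof: () => true },
]);
console.log(result); // prints "Confirmed for now: Missing join in position count RPC"
```

Note the asymmetry the Hard rule insists on: the survivor is only "confirmed for now" — one more disproof could still kill it.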