--- name: diagnose type: reference description: "Diagnostic pipeline for complex/intermittent bugs. Uses diagnostics roles for Investigation, Verification, and Solution before Lead Programmer handoff. Use ONLY for non-obvious failures (root cause unclear, reproduction unstable, fixes reverted). NOT for trivial bugs with known cause — fix them directly." paths: [] effort: 4 allowed-tools: Read, Glob, Grep, Write, Edit, Bash, Task user-invocable: true when_to_use: "When a bug is reproducible but cause is unknown, when a 'fix' has been reverted 2+ times, or when a symptom appears in unfamiliar code. Do NOT use for typos, obvious nulls, or one-line logic errors." --- # Skill: /diagnose — Complex Bug Diagnostic Pipeline ## When to invoke (and when NOT to) ### Use `/diagnose` when: - Bug reproduces but **root cause is unclear** after one read-pass of the failing code - Previous fix attempts have been **reverted ≥ 2 times** (symptoms return) - Failure is **intermittent** (flaky test, race condition, timing-dependent) - Failure occurs in **unfamiliar code** (agent has no prior context) - User has explicitly requested `/diagnose` or "deep investigation" - Circuit Breaker (Rule 14) tripped on the specialist agent that normally handles this domain ### Do NOT use `/diagnose` when: - Cause is obvious (null ref, typo, missing import, incorrect import path) - Fix is < 10 LOC and has a clear success check - Bug is in code you just wrote this session (read-pass + local reasoning is faster) - User wants a quick patch and has accepted the tradeoff ## Pipeline overview ``` Feedback Loop -> Investigation -> Verification -> Solution -> Lead Programmer (signal) (hypothesis) (devil's adv.) (tradeoffs) (assign + exec) repro/check command investigation.json verification.json solution.json implementation (fast deterministic (root_cause, (status: confirmed | (3 options: (delegates to pass/fail signal) evidence[], refuted | inconclusive, Quick/Strategic/ backend-developer, confidence) reproduction_steps) Future-Proof) qa-engineer, etc.) ``` Each stage produces a **required artifact** saved to `.investigations//` and a **handoff contract** (per Rule 16) to the next agent. ## Stage 0 — Feedback Loop **Goal:** Build the fastest reliable pass/fail signal for the exact symptom before explaining the cause. The feedback loop is the highest-leverage part of diagnosis. Do not proceed to root-cause analysis until there is a loop that can reproduce the user's symptom or a documented reason why no loop is possible. Try these in roughly this order: 1. Failing test at the seam that reaches the bug. 2. CLI or script invocation with fixture input and expected output. 3. Curl/HTTP request against a running service. 4. Headless browser script with DOM, console, or network assertions. 5. Replay of a captured payload, event, HAR, log, or trace. 6. Throwaway harness that calls the affected code path in isolation. 7. Property/fuzz loop for broad wrong-output symptoms. 8. Bisection or differential loop between known-good and known-bad states. Improve the loop before investigating: - Faster: remove unrelated setup and narrow the command. - Sharper: assert the specific symptom, not merely "does not crash". - More deterministic: pin time, seed randomness, isolate filesystem/network, or raise intermittent reproduction frequency with stress runs. Do not treat Stage 0 as warm-up. It is the main leverage point. A bad loop produces fake certainty, weak hypotheses, and symptom-only fixes. If no credible loop can be built, stop and report what was tried. Ask for access to the reproducing environment, a captured artifact, or permission to add temporary instrumentation. Do not proceed on a vibe. ## Stage 1 — Investigation **Agent:** `diagnostics` (Investigation role) **Goal:** Produce ranked falsifiable root-cause hypotheses backed by empirical evidence. ### Inputs - Symptom description (from user or TODO.md bug ID) - Reproduction steps (or "cannot reproduce" + environment) - Relevant log lines, stack traces, error IDs - Feedback loop command/check from Stage 0, or a documented reason no loop can currently be built ### Required output — `investigation.json` ```json { "task_id": "BUG-417", "symptom": "POST /api/orders returns 500 when cart has ≥10 items", "reproduction": { "steps": ["...", "..."], "frequency": "100% | intermittent (~30%) | once", "environment": "staging-eu-west-1" }, "feedback_loop": { "command": "npm test -- checkout.e2e.test.ts", "signal": "Fails with timeout before hydration marker appears", "reliable": true }, "ranked_hypotheses": [ { "rank": 1, "cause": "Test clicks #submit before React hydration completes on slow CI runners", "prediction": "Waiting for the hydration marker will make the failure disappear without adding a fixed sleep" }, { "rank": 2, "cause": "Submit button selector matches a hidden stale node", "prediction": "Asserting the visible button count will expose multiple matching nodes" } ], "hypothesis": { "root_cause": "OrderService.calculateTotal() N+1 query exhausts pool when cart.items.length > 9", "confidence": "high | medium | low", "falsifiable_by": "Run with pool_size=50; if error disappears, cause confirmed" }, "evidence": [ {"type": "log", "ref": ".investigations/BUG-417/pg-pool-exhausted.log", "summary": "..."}, {"type": "code", "ref": "src/services/order.service.ts:142", "summary": "Unbounded .map+await"} ], "unknowns": ["Why only eu-west-1?", "When did this start?"], "next_agent": "diagnostics", "next_stage": "verification" } ``` ### Quality gate (Lead Programmer rejects if): - `feedback_loop` is missing and no blocked-loop explanation exists - `feedback_loop.signal` is vague, broad, or does not isolate the user's symptom - `ranked_hypotheses` has fewer than 3 items unless the evidence makes a single cause unavoidable - Any hypothesis lacks a falsifiable prediction - `hypothesis.falsifiable_by` is vague ("check if it works") - `evidence` has fewer than 2 items (unverifiable) - `unknowns` is empty but `confidence: low` (contradictory) ## Stage 2 — Verification **Agent:** `diagnostics` (Verification role) **Goal:** Attempt to **refute** the hypothesis. Only confirmed if refutation fails. ### Inputs - `investigation.json` (from Stage 1) - Access to staging/test environment - The Stage 0 feedback loop, rerun before and after each meaningful probe Verification is invalid if it does not go back through the Stage 0 loop. A fix-looking local observation that bypasses the loop is not confirmation. ### Required output — `verification.json` ```json { "task_id": "BUG-417", "status": "confirmed | refuted | inconclusive", "triangulation": [ {"method": "reproduce_with_fix_applied", "result": "Error gone with pool_size=50"}, {"method": "reproduce_without_fix", "result": "Error returns at 10 items"}, {"method": "adjacent_test_case", "result": "9 items = OK, 10 items = fail → threshold confirmed"} ], "counter_hypotheses_ruled_out": [ "DB slowness (ruled out: p99 < 50ms)", "Network flaps (ruled out: no packet loss in window)" ], "confidence": "high", "recommendation": "Proceed to Solution stage — cause confirmed necessary AND sufficient" } ``` ### Decision flow | `status` | Next action | | -------------- | -------------------------------------------------------------------- | | `confirmed` | Proceed to Solution stage (same `diagnostics` agent) | | `refuted` | Return to Investigation stage with counter-evidence. Max 2 round-trips. | | `inconclusive` | STOP. Surface to user with all evidence. Do NOT proceed to Solution stage. | ## Stage 3 — Solution **Agent:** `diagnostics` (Solution role) **Goal:** Generate 3 solution options with explicit tradeoffs; never pick silently. ### Required output — `solution.json` ```json { "task_id": "BUG-417", "options": [ { "name": "Quick", "description": "Increase pool_size from 20 → 50 in db.ts", "scope_loc": 1, "risk_tier": "Low", "tradeoff": "Masks root cause; higher RAM; future growth hits same wall" }, { "name": "Strategic", "description": "Rewrite calculateTotal() to batch via IN-clause", "scope_loc": 40, "risk_tier": "Medium", "tradeoff": "Fixes N+1 permanently; requires regression test on discount logic" }, { "name": "Future-Proof", "description": "Introduce DataLoader pattern across service layer", "scope_loc": 300, "risk_tier": "High", "tradeoff": "Eliminates entire class of N+1 bugs; 2-3 day refactor; needs ADR" } ], "recommendation": "Strategic — best risk/value ratio. Quick only if release is < 24h." } ``` ### Quality gate - All 3 options must have distinct scope (not three flavors of the same fix) - `tradeoff` must state what is **sacrificed**, not just "takes longer" - `recommendation` must cite a criterion (time budget, risk tier, blast radius) - The selected option's acceptance criteria must include rerunning the original feedback loop from Stage 0 and a regression test when a correct seam exists ## Stage 4 — Finalization **Agent:** `lead-programmer` **Goal:** Select option, assign specialist, track execution. ### Actions 1. Review `solution.json` with user (if `risk_tier: High` or `scope_loc > 100`) 2. Select option → write selection to `.investigations//decision.md` 3. Create A2A handoff contract (Rule 16) via `/handoff`: - `lead-programmer → backend-developer` (or `frontend-developer`, `data-engineer`) - Acceptance criteria derived from `investigation.hypothesis.falsifiable_by` 4. Append ledger entry (Rule 15) to `production/traces/decision_ledger.jsonl`: ```jsonl {"ts":"2026-04-17T14:22:00Z","session":"main","agent_id":"lead-programmer","task_id":"BUG-417","request":"/diagnose BUG-417","reasoning":"Verified N+1 as necessary+sufficient; selected Strategic per Solution-stage recommendation","choice":"Strategic refactor of calculateTotal()","outcome":"pass","risk_tier":"Medium","duration_s":1840} ``` ## Artifact storage All intermediate reports MUST be saved to `.investigations//`: ``` .investigations/ └── BUG-417/ ├── investigation.json # Stage 1 output ├── verification.json # Stage 2 output ├── solution.json # Stage 3 output ├── decision.md # Stage 4 — human-readable rationale ├── evidence/ # logs, screenshots, traces referenced in reports └── handoffs/ # A2A contracts (copied from .tasks/handoffs/) ``` **Retention:** Keep until bug is closed + 30 days, then archive to `.investigations/archive/`. ## Escalation paths | Trigger | Escalate to | | ------------------------------------------------- | ---------------------------------------------- | | `diagnostics` circuit OPEN in Investigation (Rule 14) | Surface raw symptom to user; skip pipeline, request manual triage | | Verification stage returns `inconclusive` twice | Surface to user; request manual reproduction | | Solution stage cannot produce 3 distinct options | Escalate to `technical-director` — scope unclear | | User rejects all 3 options | Return to Investigation stage; hypothesis likely wrong | | Bug reoccurs after fix merges | Restart `/diagnose` with new `task_id`; link prior investigation in `unknowns[]` | ## Integration with coordination rules - **Rule 6 (Layered Recovery):** If any stage fails once, retry with fresh context before escalating - **Rule 14 (Circuit Breaker):** Read `production/session-state/circuit-state.json` before invoking each agent - **Rule 15 (Decision Ledger):** Every stage transition logs to `decision_ledger.jsonl` - **Rule 16 (A2A Handoff):** Stage 1→2, 2→3, 3→4 each require a handoff contract in `.tasks/handoffs/` ## Concrete example — "Flaky checkout test" **Symptom:** `checkout.e2e.test.ts` fails ~20% of CI runs; local always passes. ``` /diagnose flaky-checkout-e2e ↓ Stage 1 → Investigation (diagnostics) hypothesis: "Test clicks #submit before React hydration completes on slow CI runners" evidence: [CI traces showing hydration marker missing, local has DevTools overhead masking timing] confidence: medium (cannot reproduce locally) ↓ Stage 2 → Verification (diagnostics) triangulation: - Inject 500ms delay before click → test passes 50/50 runs ✓ - Remove delay → fails 9/50 ✗ - Check for hydration marker instead of fixed delay → passes 50/50 ✓ status: confirmed ↓ Stage 3 → Solution (diagnostics) Quick: add sleep(500ms) [masks issue] Strategic: waitFor hydration marker [addresses root cause] Future-Proof: custom test util that always waits for RSC boundary [reusable] recommendation: Strategic ↓ Stage 4 → lead-programmer selects Strategic; assigns to qa-engineer handoff contract: "qa-engineer updates checkout.e2e.test.ts to use waitFor(hydrationMarker)" acceptance_criteria: ["10 CI runs in a row pass", "no sleep() in test"] ledger entry written ``` ## Common pitfalls | Pitfall | Fix | | --------------------------------------------- | --------------------------------------------------------------------------- | | Hypothesizing before building a loop | Return to Stage 0. A diagnosis without a signal is speculation. | | Keeping a slow or blurry loop | Rework Stage 0 until the signal is fast, specific, and trustworthy. | | Skipping Verification ("cause is obvious") | Verifier exists specifically to catch "obvious but wrong" hypotheses | | Investigator produces only 1 hypothesis | Reject unless evidence makes alternatives impossible; require ranked hypotheses and predictions | | Solver picks Quick fix without naming tradeoff| Reject — all 3 options required for explicit tradeoff comparison | | No artifact written to `.investigations/` | Reject — verbal diagnosis is not auditable | | Running `/diagnose` in parallel on same bug | Only one active investigation per `task_id`; concurrent runs create race | ## Output to user After Stage 4 completes, summarize in ≤ 5 lines: ``` /diagnose BUG-417 complete. Root cause: Unbounded .map+await in OrderService.calculateTotal() exhausts pg pool. Selected: Strategic (batch via IN-clause, ~40 LOC, Medium risk). Assigned: @backend-developer; acceptance = load test with 50 items passes. Artifacts: .investigations/BUG-417/ ```