---
name: debug
description: Use when encountering any bug, test failure, build break, or unexpected behavior — runs a 4-phase systematic debugging process before proposing any fix. Iron Law: NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST. Produces a `.orchestrator/debug/` artifact the fixer agent must reference.
model: inherit
color: red
tools: Read, Grep, Glob, Bash, Write
---
# Debug Skill
> 4-phase root-cause investigation. Iron Law: no fix without root cause.
## When to use
- Any test failure (unit, integration, E2E)
- Build break or compile error
- Runtime exception in production code
- Performance regression
- Integration issue between modules
- "It worked yesterday" bugs
## When NOT to use
- Feature implementation (use `/plan` or wave-executor)
- Pure refactoring with no broken behavior
- Style/lint fixes with no behavioral impact
## Phase 0: Bootstrap Gate
Read `skills/_shared/bootstrap-gate.md` and execute the gate check. If the gate is CLOSED, invoke `skills/bootstrap/SKILL.md` and wait for completion before proceeding. If the gate is OPEN, continue to Phase 1.
Do NOT proceed past Phase 0 if GATE_CLOSED. There is no bypass. Refer to `skills/_shared/bootstrap-gate.md` for the full HARD-GATE constraints.
## The Iron Law
> **NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.**
No matter how obvious the fix looks, complete Phase 1 (Root Cause Investigation) and produce the artifact BEFORE writing any fix code. This is non-negotiable.
The "2-hour obviously a typo" case has been the root cause of every multi-day debugging session in every project's history. Treat the obvious fix as a hypothesis to verify, not a conclusion to act on.
## Phase 1: Root Cause Investigation
### 1.1 Read the error carefully
Quote the exact error message, full stack trace, and exit code in the artifact. Do not paraphrase. Paraphrasing introduces your assumptions before investigation begins.
### 1.2 Reproduce reliably
Document the exact command and environment that triggers it. If reproduction is flaky, that itself is a data point — write down the failure rate.
```bash
# minimum reproduction attempt
# env: Node version, OS, env vars relevant to the failure
```
### 1.3 Check recent changes
```bash
git log --oneline -20
git diff HEAD~5..HEAD --
```
Identify the smallest set of commits that could have introduced this. Do not assume the most recent commit is the culprit without checking.
### 1.4 Instrument every component boundary
Add temporary logging at each input/output point along the failing path's modules. Capture actual values, not assumptions about what they should be. Run with instrumentation in place and record results.
### 1.5 Trace data flow backward
Start from the failure point and walk upstream. At each step, verify what you think is true with an actual Read or Bash call. "I assume X is Y" is not data.
### Phase 1 artifact
Write to `.orchestrator/debug/-.md` (sequence = 1, 2, 3 within session).
**Artifact MUST be written before proceeding to Phase 2.** See "Artifact contract" section below.
## Phase 2: Pattern Identification
After the Phase 1 artifact is written:
- Is this a recurrence of a previously fixed bug? Search `git log --all --grep=""`.
- Is there a class of bug this exemplifies (race condition, off-by-one, null check, encoding, async ordering)?
- Is there a missing test that would have caught this? Identify what the test would look like.
- Search the codebase for similar patterns:
```bash
grep -rn "" --include="*.ts" --include="*.mjs"
git grep ""
```
Record findings in the artifact's Phase 2 section.
**Optional operator-side companion:** `/tmux-layout --layout debug` renders a 4-pane layout (scratch shell + `npm test --watch` + `tail -F .orchestrator/debug/*.md` + `watch -n 2 'git diff --stat'`) so you can observe hypothesis tests, debug artifacts, and diff progression peripherally. Pure observability — the `/debug` skill works identically with or without the layout (per ADR-0007 + GitLab #562).
## Phase 3: Impact Analysis
- What else does the root-caused code touch?
```bash
grep -rn ""
```
- What other code paths could share the same root cause?
- Will the fix break anything else? Identify callers, dependents, transitive effects.
- List the affected files explicitly in the artifact.
## Phase 4: Solution
Only after Phases 1-3 are complete and the artifact is written:
1. Propose the minimal fix — the smallest code change that addresses the confirmed root cause
2. Identify test cases that would have caught this; write them alongside or before the fix if appropriate
3. Document the fix decision in the artifact under a "Resolution" section
4. Verify the fix: run the test command from Session Config (`test-command`) and confirm the failure no longer reproduces
Do not gold-plate the fix. Scope creep during a bugfix introduces new bugs.
## Artifact contract
`.orchestrator/debug/-.md`:
```markdown
# Debug session:
Created:
Session:
## Phase 1 — Root Cause
### Error
[exact error message + stack trace + exit code — verbatim]
### Reproduction
[exact command]
[environment: Node version, platform, relevant env vars]
[frequency: always | intermittent (~N/10)]
### Suspect commits
- — reason
### Instrumentation data
[per-boundary observations with actual values]
### Hypothesized root cause
[ONE sentence] · Confidence: [low|medium|high]
## Phase 2 — Pattern
[recurrence? class? missing test? similar code paths?]
## Phase 3 — Impact
[affected files, callers, dependents]
## Phase 4 — Solution
[proposed fix + test cases identified]
## Resolution
[commit SHA, files changed, tests added — filled in after fix lands]
```
**Rules for the artifact:**
- "Hypothesized root cause" is always ONE sentence. Multiple hypotheses signal you have none — pick the most evidence-backed one and lower confidence if uncertain.
- Do not write fix code in Phase 1-3 sections. Those sections are investigation only.
- The artifact is the deliverable — not an optional note.
## Integration with wave-executor
When `wave-executor` classifies a task as `bugfix`, it routes through this skill. `code-implementer` agents handling bugfix-classified tasks must reference an existing Phase-1 artifact path in their task description before editing any production code. Wire-up happens in wave-executor's routing layer (Wave 3 P1 cross-reference).
## Anti-Patterns
- **Skipping Phase 1 because "it's obviously a typo"** — see Iron Law. The obvious hypothesis goes in Phase 1 as a hypothesis to verify, not a fix to apply.
- **Multiple hypotheses in "Hypothesized root cause"** — pick ONE. Lower confidence rating if uncertain. "Either X or Y" means you don't know yet.
- **Writing fix code in Phase 1** — fixes belong in Phase 4. Investigation and implementation are separate acts.
- **Skipping the artifact write** — the artifact IS the work product. No artifact = no investigation.
- **Fixing the symptom** — the error location is the symptom; the root cause may be several frames upstream. Trace backward.
- **Assuming the most recent commit is the cause** — check `git log -20` and the diff before assuming recency.
## See Also
- `skills/discovery/SKILL.md` — discovery surfaces problems; debug investigates them
- `.claude/rules/verification-before-completion.md` — once fixed, verify fresh (issue #38)
- `agents/code-implementer.md` — references this skill for bugfix-classified tasks