--- name: debug-systematic description: Systematic 4-phase debugging methodology for complex, intermittent, or mysterious issues. Use when investigating bugs, race conditions, or unexplained failures. --- # Systematic Debugging Protocol A disciplined, evidence-based approach to debugging that prevents guessing and ensures root cause discovery. ## The 4-Phase Protocol ### Phase 1: REPRODUCE (Establish Ground Truth) **Goal**: Create reliable reproduction steps before ANY investigation. **Actions**: 1. Document exact steps to trigger the bug 2. Record environment specifics (OS, versions, config, memory, network) 3. Determine frequency: Always? Sometimes? Specific conditions? 4. Capture exact error messages, stack traces, screenshots 5. Test on different environments to isolate variables **Key Questions**: - When did it last work correctly? - What changed since then? (code, deps, config, infrastructure) - Is it environment-specific? - Is it data-specific? - Is it timing-specific? **Output**: Clear reproduction steps that reliably trigger the issue. --- ### Phase 2: ISOLATE (Narrow the Scope) **Goal**: Reduce the search space from "entire codebase" to "specific component." **Techniques**: **Binary Search**: 1. Identify two points: working state and broken state 2. Test the midpoint 3. Recurse into the broken half 4. Continue until the change is identified **Git Bisect** (for regressions): ```bash git bisect start git bisect bad HEAD git bisect good # Git will checkout commits for testing # After each test: git bisect good # or git bisect bad # Continue until culprit found ``` **Code Elimination**: - Comment out sections to isolate the problem - Create minimal reproduction case - Strip away everything non-essential **Environment Isolation**: - Test in isolation (unit test the failing path) - Compare working vs broken environments - Use fresh installs to eliminate pollution **Output**: "The bug is in [specific component/function/line range]" --- ### Phase 3: DIAGNOSE (Understand Root Cause) **Goal**: Know exactly WHY the bug occurs, not just WHERE. **Scientific Method**: 1. **Observe**: What exactly is happening? 2. **Hypothesize**: Why might this be happening? 3. **Predict**: If hypothesis is correct, what else would be true? 4. **Test**: Verify predictions with evidence 5. **Iterate**: Refine hypothesis based on results **Logging Strategy**: ```javascript // Add strategic logging at boundaries console.log("[DEBUG] Function entry:", { input, state }); console.log("[DEBUG] After processing:", { result, sideEffects }); console.log("[DEBUG] Function exit:", { returnValue }); ``` **Common Root Causes**: | Symptom | Likely Causes | | -------------------------- | -------------------------------------------------- | | Works locally, fails in CI | Environment differences, timing, resources | | Intermittent failure | Race condition, flaky network, resource contention | | Works then stops working | State mutation, memory leak, cache poisoning | | Wrong data | Type coercion, encoding, timezone, precision | | Silent failure | Swallowed exception, async error, missing await | **Output**: Clear explanation of the root cause with evidence. --- ### Phase 4: FIX & VERIFY (Resolve and Prevent) **Goal**: Fix the issue and prevent regression. **Fix Process**: 1. **Write a failing test** that captures the bug 2. **Implement minimal fix** - change as little as possible 3. **Verify test passes** - confirms fix works 4. **Check for similar patterns** - same bug elsewhere? 5. **Review fix for side effects** - does it break anything? 6. **Document the fix** - why it happened, how to prevent **Verification Checklist**: - [ ] Test passes that specifically catches this bug - [ ] Existing tests still pass - [ ] Manual verification confirms fix - [ ] Fix works in all affected environments - [ ] No new warnings or errors introduced **Prevention**: - Add guards/validation at boundaries - Improve error messages for easier future debugging - Document gotchas for other developers - Consider if architectural change prevents similar bugs --- ## Debugging Anti-Patterns **DO NOT**: - Guess and hope (change things randomly) - Assume you know the problem without evidence - Trust comments/docs over actual code behavior - Debug production with print statements you'll forget to remove - Fix the symptom instead of the root cause - Make multiple changes at once **DO**: - Verify assumptions with evidence - Change one thing at a time - Log actual values, not what you expect - Trust the code over documentation - Take breaks when stuck (fresh eyes help) --- ## Quick Reference ``` 1. REPRODUCE → Can I reliably trigger this? 2. ISOLATE → Where exactly is it failing? 3. DIAGNOSE → Why is it failing? 4. FIX → How do I fix it permanently? ``` --- ## Output Template ```markdown ## Bug Investigation: [Title] ### Reproduction - Steps to reproduce - Environment details - Frequency ### Isolation - Search method used - Scope narrowed to ### Root Cause - What's actually wrong - Why it happens - Evidence ### Fix - Code changes made - Test added ### Prevention - How to prevent similar bugs - Documentation updates ```