--- name: debugging description: Four-phase debugging framework that ensures root cause identification before fixes. Use when encountering bugs, test failures, unexpected behavior, or when previous fix attempts failed. Enforces investigate-first discipline ('debug this', 'fix this error', 'test is failing', 'not working'). allowed-tools: '*' --- # Systematic Debugger Find root cause before fixing. Symptom fixes are failure. **Iron Law:** NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST ## When to Use Answer IN ORDER. Stop at first match: 1. Bug, error, or test failure? → Use this skill 2. Unexpected behavior? → Use this skill 3. Previous fix didn't work? → Use this skill (especially important) 4. Performance problem? → Use this skill 5. None of above? → Skip this skill **Use especially when:** - Under time pressure (emergencies make guessing tempting) - "Quick fix" seems obvious (red flag) - Already tried 1+ fixes that didn't work ## The Four Phases Complete each phase before proceeding. ### Phase 1: Root Cause Investigation **BEFORE attempting ANY fix:** **1. Read Error Messages Completely** ```text Don't skip past errors. They often contain the exact solution. - Full stack trace (note line numbers, file paths) - Error codes and messages - Warnings that preceded the error ``` **2. Reproduce Consistently** | Can reproduce? | Action | | --------------- | ---------------------------------------------------- | | Yes, every time | Proceed to step 3 | | Sometimes | Gather more data - when does it happen vs not? | | Never | Cannot debug what you cannot reproduce - gather logs | **3. Check Recent Changes** ```bash git diff HEAD~5 # Recent code changes git log --oneline -10 # Recent commits ``` What changed that could cause this? Dependencies? Config? Environment? **4. Trace Data Flow (Root Cause Tracing)** When error is deep in call stack: ```text Symptom: Error at line 50 in utils.js ↑ Called by handler.js:120 ↑ Called by router.js:45 ↑ Called by app.js:10 ← ROOT CAUSE: bad input here ``` **Technique:** 1. Find where error occurs (symptom) 2. Ask: "What called this with bad data?" 3. Trace up until you find the SOURCE 4. Fix at source, not at symptom **5. Multi-Component Systems** When system has multiple layers (API → service → database): ```bash # Log at EACH boundary before proposing fixes echo "=== Layer 1 (API): request=$REQUEST ===" echo "=== Layer 2 (Service): input=$INPUT ===" echo "=== Layer 3 (DB): query=$QUERY ===" ``` Run once to find WHERE it breaks. Then investigate that layer. ### Phase 2: Pattern Analysis **1. Find Working Examples** Locate similar working code in same codebase. What works that's similar? **2. Identify Differences** | Working code | Broken code | Could this matter? | | ---------------- | -------------- | ------------------ | | Uses async/await | Uses callbacks | Yes - timing | | Validates input | No validation | Yes - bad data | List ALL differences. Don't assume "that can't matter." ### Phase 3: Hypothesis Testing **1. Form Single Hypothesis** Write it down: "I think X is the root cause because Y" Be specific: - ❌ "Something's wrong with the database" - ✅ "Connection pool exhausted because connections aren't released in error path" **2. Test Minimally** | Rule | Why | | ------------------------ | ---------------------- | | ONE change at a time | Isolate what works | | Smallest possible change | Avoid side effects | | Don't bundle fixes | Can't tell what helped | **3. Evaluate Result** | Result | Action | | --------------- | --------------------------------------- | | Fixed | Phase 4 (verify) | | Not fixed | NEW hypothesis (return to 3.1) | | Partially fixed | Found one issue, continue investigating | ### Phase 4: Implementation **1. Create Failing Test** Before fixing, write test that fails due to the bug: ```javascript it('handles empty input without crashing', () => { // This test should FAIL before fix, PASS after expect(() => processData('')).not.toThrow(); }); ``` **2. Implement Fix** - Address ROOT CAUSE identified in Phase 1 - ONE change - No "while I'm here" improvements **3. Verify** - [ ] New test passes - [ ] Existing tests still pass - [ ] Issue actually resolved (not just test passing) **4. If Fix Doesn't Work** | Fix attempts | Action | | ------------ | ---------------------------------------- | | 1-2 | Return to Phase 1 with new information | | 3+ | STOP - Question architecture (see below) | **5. After 3+ Failed Fixes: Question Architecture** Pattern indicating architectural problem: - Each fix reveals new coupling/shared state - Fixes require "massive refactoring" - Each fix creates new symptoms elsewhere **STOP and ask:** - Is this pattern fundamentally sound? - Should we refactor vs. continue patching? - Discuss with user before more fix attempts ## Red Flags - STOP Immediately If you catch yourself thinking: | Thought | Reality | | ---------------------------------------------- | --------------------------------- | | "Quick fix for now, investigate later" | Investigate NOW or you never will | | "Just try changing X" | That's guessing, not debugging | | "I'll add multiple fixes and test" | Can't isolate what worked | | "I don't fully understand but this might work" | You need to understand first | | "One more fix attempt" (after 2+ failures) | 3+ failures = wrong approach | **ALL mean: STOP. Return to Phase 1.** ## Quick Reference | Phase | Key Question | Success Criteria | | ----------------- | ------------------------------------- | ---------------------------------- | | 1. Root Cause | "WHY is this happening?" | Understand cause, not just symptom | | 2. Pattern | "What's different from working code?" | Identified key differences | | 3. Hypothesis | "Is my theory correct?" | Confirmed or formed new theory | | 4. Implementation | "Does the fix work?" | Test passes, issue resolved |