---
name: debug-systematic
description: Systematic debugging workflow with hypothesis testing
disable-model-invocation: false
---

# Systematic Debugging Workflow

I'll help you debug issues systematically using the scientific method: hypothesis formation, testing, and iterative refinement.

Arguments: `$ARGUMENTS` - error description, reproduction steps, or context

## Token Optimization

**Target:** 50% reduction (4,000-6,000 → 1,500-3,000 tokens)

### Core Optimization Strategies

**1. Hypothesis-Driven Debugging (Not Exhaustive Analysis)**
- ❌ **AVOID:** Reading entire codebase to find bugs
- ✅ **DO:** Form hypotheses about likely causes, test top 2-3 first
- **Token savings:** 90% (200 tokens vs 2,000+ tokens)
- **Pattern:** Prioritize recently changed files, common failure patterns

**2. Git Diff for Recently Changed Files (Likely Bug Source)**
- ❌ **AVOID:** `ls -R` then reading all files
- ✅ **DO:** `git diff --name-only HEAD~3..HEAD` to find changed files
- ✅ **DO:** `git log --oneline --since="3 days ago"` for recent commits
- **Token savings:** 85% (300 tokens vs 2,000+ tokens)
- **Pattern:** Bugs are often introduced in recent changes

**3. Stack Trace Parsing with Grep**
- ❌ **AVOID:** Reading entire log files with Read tool
- ✅ **DO:** `grep -i "error\|exception\|fatal" logs/*.log | tail -20`
- ✅ **DO:** Parse stack traces to extract file paths and line numbers
- **Token savings:** 95% (100 tokens vs 2,000+ tokens for large logs)
- **Pattern:** Stack traces reveal exact failure locations

**4. Test Failure Analysis Caching**
- ✅ Cache test results in `debug/state.json`
- ✅ Cache hypothesis outcomes to avoid retesting
- ✅ Cache reproduction steps once confirmed
- **Token savings:** 70% on subsequent debugging turns
- **Pattern:** Multi-turn debugging sessions benefit from state

**5. Progressive Investigation (Narrow Before Deep)**
- ✅ Start with stack trace → identify file → read specific function
- ✅ Hypothesis testing: test most likely causes first
- ✅ Binary search through git history when needed
- **Token savings:** 60% (stop early when cause found)
- **Pattern:** Most bugs have obvious causes in changed code

**6. Session State Tracking for Multi-Turn Debugging**
- ✅ Session files in `debug/` directory
- ✅ Track tested hypotheses to avoid repetition
- ✅ Resume from last checkpoint on subsequent runs
- **Token savings:** 80% on resumed sessions (skip completed work)
- **Pattern:** Complex bugs require multiple debugging turns

### Token Usage by Operation

| Operation | Unoptimized | Optimized | Savings |
|-----------|-------------|-----------|---------|
| Initial bug analysis | 2,000-3,000 | 500-1,000 | 60-75% |
| Hypothesis formation | 1,500-2,000 | 400-800 | 60-73% |
| Stack trace parsing | 2,000+ | 100-200 | 90-95% |
| File investigation | 2,000+ | 300-600 | 70-85% |
| Test reproduction | 1,000-1,500 | 200-400 | 73-80% |
| Session resume | 2,000-3,000 | 300-600 | 80-85% |

**Average Reduction:** 50% (4,000-6,000 → 1,500-3,000 tokens)

### Debugging-Specific Patterns

**Stack Trace Analysis:**
```bash
# Extract file paths and line numbers from stack traces
grep -E "at .+ \(.+:[0-9]+:[0-9]+\)" error.log | head -10

# Focus investigation on these specific files/lines
```

**Recent Changes Focus:**
```bash
# Find files changed in the last 10 commits (likely bug sources)
git diff --name-only HEAD~10..HEAD

# Only read files that changed recently
```

**Hypothesis Prioritization:**
1. **Recent changes** (80% of bugs) - Check git diff first
2. **Stack trace files** (90% reliability) - Read exact failure locations
3. **Error message patterns** (70% of bugs) - Grep for similar errors
4. **Environment/config** (20% of bugs) - Check if configs changed
5. **External dependencies** (10% of bugs) - Check updates

**Binary Search for Regressions:**
```bash
# Use git bisect to find regression commit
# (git bisect start takes the bad revision first, then the good one)
git bisect start HEAD v1.2.3
git bisect run npm test # Automated testing

# Saves 95% tokens vs manual testing each commit
```

### Caching Behavior

**Session Location:** `debug/` (in project root)
- `debug/plan.md` - Debugging plan with hypotheses and results
- `debug/state.json` - Session state and test results
- `debug/reproduction.log` - Issue reproduction steps and logs

**Cache Location:** `.claude/cache/debug/`
- `hypotheses.json` - Tested hypotheses and outcomes
- `stack-traces.json` - Parsed stack trace information
- `changed-files.json` - Recently changed files analysis

**Cache Validity:**
- Until issue resolved (status: "solved" in state.json)
- Until source files change (checksum-based)
- 7 days maximum for stale sessions

**Shared With:**
- `/debug-root-cause` - Root cause analysis skill
- `/debug-session` - Debug session documentation
- `/test` - Test execution for verification

### Usage Examples

**Start New Debugging Session:**
```
/debug-systematic "API returns 500 on POST /users"
# Expected tokens: 1,500-3,000 (full analysis)
```

**Resume Existing Session:**
```
/debug-systematic resume
# Expected tokens: 800-1,500 (skips completed hypotheses)
```

**Test Specific Hypothesis:**
```
/debug-systematic test 1
# Expected tokens: 500-1,000 (focused testing)
```

**Check Debugging Progress:**
```
/debug-systematic status
# Expected tokens: 200-500 (read session state only)
```

**Mark Issue as Solved:**
```
/debug-systematic solved
# Expected tokens: 300-600 (generate summary)
```

### Early Exit Conditions

**Exit immediately (saves 90% tokens) when:**
- ✅ Issue already solved (check `debug/state.json` status)
- ✅ No test framework available (can't reproduce)
- ✅ Not a git repository (can't check recent changes)
- ✅ Root cause already identified in session state

**Progressive disclosure saves 60-80% tokens:**
- Show hypothesis
formation → wait for user confirmation
- Test one hypothesis at a time → report results
- Only deep dive when a hypothesis is confirmed

### Implementation Checklist

- ✅ Git diff analysis for recent changes (PRIMARY optimization)
- ✅ Stack trace parsing with Grep (saves 90-95%)
- ✅ Session-based hypothesis tracking (saves 70-80% on reruns)
- ✅ Progressive hypothesis testing (most likely → least likely)
- ✅ Bash-based log analysis (minimal tokens)
- ✅ Test failure result caching
- ✅ Early exit when issue resolved
- ✅ Binary search for regressions (git bisect)
- ✅ Focus area flags (specific file/function debugging)

**Optimization Status:** ✅ Optimized (Phase 2 Batch 2, 2026-01-26)
**Expected Tokens:** 1,500-3,000 (vs. 4,000-6,000 unoptimized)
**Achieved Reduction:** 50% average across all debugging operations

## Session Intelligence

I'll maintain debugging session continuity:

**Session Files (in current project directory):**
- `debug/plan.md` - Debugging plan with hypotheses and results
- `debug/state.json` - Session state and test results
- `debug/reproduction.log` - Issue reproduction steps and logs

**IMPORTANT:** Session files are stored in a `debug` folder in your current project root.

**Auto-Detection:**
- If session exists: Resume debugging from last hypothesis
- If no session: Create debugging plan and initial reproduction
- Commands: `resume`, `reproduce`, `status`, `solved`

## Phase 1: Issue Reproduction & Information Gathering

### Extended Thinking for Complex Debugging

For complex or elusive bugs, I'll use extended thinking to explore debugging strategies.

When debugging complex issues, I'll consider:
- Multiple potential root causes that interact
- Timing-sensitive or race condition bugs
- Environment-specific failures
- Subtle state corruption scenarios
- Performance degradation patterns
- Security vulnerability exploitation paths

**Triggers for Extended Analysis:**
- Intermittent or non-deterministic bugs
- Production-only failures
- Performance issues without obvious cause
- Security vulnerabilities
- Multi-component system failures

**MANDATORY FIRST STEPS:**
1. Check if `debug` directory exists in current working directory
2. If directory exists, check for session files:
   - Look for `debug/state.json`
   - Look for `debug/plan.md`
   - If found, resume from last hypothesis
3. If no directory or session exists:
   - Gather error information
   - Create reproduction steps
   - Initialize debugging session

**Information Gathering (Token-Efficient):**
```bash
#!/bin/bash
# Systematic Debugging - Information Gathering

gather_debug_info() {
    local errors mem disk procs

    echo "=== Issue Reproduction Information ==="
    echo ""

    # 1. Error logs (use Grep, not cat)
    # Output is captured first: a `|| echo` fallback placed after a pipeline
    # ending in tail/head/awk never fires, because that last command succeeds.
    echo "Recent error logs:"
    if [ -d "logs" ]; then
        errors=$(grep -i "error\|exception\|fatal" logs/*.log 2>/dev/null | tail -20)
        echo "${errors:-  No errors in logs}"
    fi

    # 2. Git status (what changed recently)
    echo ""
    echo "Recent changes:"
    if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        git log --oneline --since="3 days ago" | head -10
    else
        echo "  Not a git repository"
    fi

    # 3. Environment info
    echo ""
    echo "Environment:"
    if [ -f "package.json" ]; then
        echo "  Node: $(node --version 2>/dev/null || echo 'not installed')"
        echo "  NPM: $(npm --version 2>/dev/null || echo 'not installed')"
    elif [ -f "requirements.txt" ]; then
        echo "  Python: $(python --version 2>/dev/null || echo 'not installed')"
    fi

    # 4. System resources
    echo ""
    echo "System resources:"
    mem=$(free -h 2>/dev/null | awk '/Mem/ {print $3 "/" $2}')
    disk=$(df -h . 2>/dev/null | tail -1 | awk '{print $3 "/" $2 " (" $5 ")"}')
    echo "  Memory: ${mem:-N/A}"
    echo "  Disk: ${disk:-N/A}"

    # 5. Running processes (if server issue)
    echo ""
    echo "Relevant processes:"
    procs=$(ps aux | grep -E "node|python|java" | grep -v grep | head -5)
    echo "${procs:-  No relevant processes}"
}

# Make sure the session directory exists before writing into it
mkdir -p debug
gather_debug_info > debug/initial-state.log
cat debug/initial-state.log
```

**Reproduction Steps:**
```bash
#!/bin/bash
# Create reproducible test case

create_reproduction() {
    mkdir -p debug
    cat > debug/reproduction.sh << 'EOF'
#!/bin/bash
# Minimal reproduction script

echo "=== Bug Reproduction Steps ==="
echo ""

echo "Step 1: Setup environment"
# TODO: Add setup commands

echo "Step 2: Execute actions that trigger bug"
# TODO: Add trigger commands

echo "Step 3: Verify bug occurs"
# TODO: Add verification

echo ""
echo "Expected: [describe expected behavior]"
echo "Actual: [describe actual behavior]"
EOF

    chmod +x debug/reproduction.sh
    echo "Created reproduction script: debug/reproduction.sh"
}

create_reproduction
```

## Phase 2: Hypothesis Formation

I'll formulate testable hypotheses about the root cause:

**Hypothesis Generation Framework:**
````markdown
# Debugging Plan - [timestamp]

## Issue Description
**Summary**: [brief description]
**Severity**: Critical | High | Medium | Low
**Impact**: [affected users/systems]
**Frequency**: Always | Intermittent | Rare

## Error Details
```
[Full error message/stack trace]
```

## Environment
- **Platform**: [OS, runtime version]
- **Configuration**: [relevant settings]
- **Recent Changes**: [commits/deployments]

## Hypotheses (Prioritized)

### Hypothesis 1: [Most likely cause] - PRIORITY: HIGH
**Theory**: [explanation of suspected cause]
**Evidence**: [supporting observations]
**Test**: [how to verify/disprove]
**Expected**: [what should happen if correct]
**Result**: [ ] Pending | [ ] Confirmed | [ ] Disproved

### Hypothesis 2: [Second most likely] - PRIORITY: MEDIUM
**Theory**: [explanation]
**Evidence**: [observations]
**Test**: [verification method]
**Expected**: [expected outcome]
**Result**: [ ] Pending | [ ] Confirmed | [ ] Disproved

### Hypothesis 3: [Alternative cause] - PRIORITY: LOW
**Theory**: [explanation]
**Evidence**: [observations]
**Test**: [verification method]
**Expected**: [expected outcome]
**Result**: [ ] Pending | [ ] Confirmed | [ ] Disproved

## Investigation Log
- [timestamp]: Initial reproduction successful
- [timestamp]: Hypothesis 1 testing in progress
````

**Hypothesis Prioritization:**
1. **Recent changes** - Check git history
2. **Common patterns** - Known bug categories
3. **Environment issues** - Dependencies, config
4. **Logic errors** - Code analysis
5. **External factors** - Third-party services

## Phase 3: Systematic Testing

I'll test each hypothesis methodically:

**Testing Framework:**
```bash
#!/bin/bash
# Hypothesis Testing Script

test_hypothesis() {
    local hypothesis_num="$1"
    local test_description="$2"

    echo "=== Testing Hypothesis $hypothesis_num ==="
    echo "Test: $test_description"
    echo ""

    # Create checkpoint before testing
    git stash push -m "Debug checkpoint before hypothesis $hypothesis_num"

    # Run test (placeholder: set result to CONFIRMED or DISPROVED)
    local result="PENDING"

    # Log result
    echo "[$hypothesis_num] $test_description: $result" >> debug/test-results.log
}

# Example: Test hypothesis about missing dependency
test_dependency_hypothesis() {
    echo "Hypothesis: Missing or incompatible dependency"

    # Check dependency versions
    if [ -f "package.json" ]; then
        echo "Checking npm dependencies..."
        npm list --depth=0 2>&1 | grep -i "missing\|error" && {
            echo "❌ CONFIRMED: Missing dependencies detected"
            return 0
        }
    fi

    echo "✓ DISPROVED: All dependencies present"
    return 1
}

# Example: Test hypothesis about race condition
test_race_condition_hypothesis() {
    echo "Hypothesis: Race condition in async code"

    # Add delays to test timing sensitivity
    echo "Running test with delays..."
    # TODO: Add test with deliberate delays

    echo "Running test rapidly..."
    for i in {1..10}; do
        # TODO: Run test in tight loop
        true
    done
}

# Test each hypothesis in priority order
test_dependency_hypothesis
test_race_condition_hypothesis
```

**Binary Search Debugging:**
```bash
#!/bin/bash
# Binary search through git history to find regression

git_bisect_debug() {
    echo "=== Git Bisect Debugging ==="

    # Find last known good commit
    read -r -p "Enter last known good commit (or tag): " good_commit
    read -r -p "Enter first known bad commit (or 'HEAD'): " bad_commit

    git bisect start
    git bisect bad "$bad_commit"
    git bisect good "$good_commit"

    mkdir -p debug
    cat > debug/bisect-test.sh << 'EOF'
#!/bin/bash
# Automated bisect test script

# Run test: exit 1 if bad, 0 if good
npm test || exit 1

# Or manual verification (as written, only reached when the test above passes)
echo "Test the current commit and press:"
echo "  g - if this commit is good"
echo "  b - if this commit is bad"
read -n 1 response

[ "$response" = "g" ] && exit 0 || exit 1
EOF

    chmod +x debug/bisect-test.sh

    echo "Run: git bisect run ./debug/bisect-test.sh"
}
```

## Phase 4: Isolation & Simplification

I'll create minimal test cases:

**Issue Isolation:**
```bash
#!/bin/bash
# Create minimal reproducible example

create_minimal_reproduction() {
    local issue_type="$1"

    mkdir -p debug/minimal-case

    case "$issue_type" in
        "api")
            cat > debug/minimal-case/test.js << 'EOF'
// Minimal API test case
const fetch = require('node-fetch');

async function testIssue() {
    const response = await fetch('http://localhost:3000/api/endpoint');
    const data = await response.json();
    console.log('Response:', data);
    // Add assertion that fails
}

testIssue().catch(console.error);
EOF
            ;;
        "frontend")
            cat > debug/minimal-case/test.html << 'EOF'
Minimal Test Case
EOF
            ;;
        "database")
            cat > debug/minimal-case/test.sql << 'EOF'
-- Minimal database query to reproduce issue
BEGIN TRANSACTION;

-- Setup test data
CREATE TEMP TABLE test_data (id INT, value TEXT);
INSERT INTO test_data VALUES (1, 'test');

-- Query that demonstrates issue
SELECT * FROM test_data WHERE condition;

ROLLBACK;
EOF
            ;;
    esac

    echo "Created minimal test case in debug/minimal-case/"
}
```

## Phase 5: Solution Implementation

Once root cause is identified, I'll implement the fix:

**Fix Validation:**
```bash
#!/bin/bash
# Validate fix before committing

validate_fix() {
    echo "=== Fix Validation ==="

    # 1. Run original reproduction - should now pass
    echo "Step 1: Run original reproduction..."
    if [ -f "debug/reproduction.sh" ]; then
        ./debug/reproduction.sh && echo "✓ Original issue resolved" || {
            echo "❌ Issue still reproduces"
            return 1
        }
    fi

    # 2. Run full test suite
    echo "Step 2: Run test suite..."
    npm test 2>&1 | tee debug/post-fix-tests.log

    # 3. Check for regressions
    echo "Step 3: Check for regressions..."
    git diff HEAD -- . | grep -E "^\+" | grep -v "^+++" | head -20

    # 4. Verify no new errors
    echo "Step 4: Lint check..."
    npm run lint 2>&1 | grep -i "error" && {
        echo "⚠️ New linting errors introduced"
    } || echo "✓ No new linting errors"

    echo ""
    echo "✓ Fix validation complete"
}

validate_fix
```

**Fix Documentation:**
````markdown
## Solution

### Root Cause
[Detailed explanation of what caused the issue]

### Fix Applied
[Description of the solution]

```diff
// Before
- problematic code
// After
+ corrected code
```

### Verification
- [x] Original reproduction no longer triggers issue
- [x] All tests passing
- [x] No regressions introduced
- [x] Edge cases handled

### Prevention
[How to prevent similar issues in the future]
- Add test coverage for [scenario]
- Update validation to catch [condition]
- Add monitoring for [metric]
````

## Phase 6: Regression Prevention

I'll add safeguards to prevent recurrence:

**Test Addition:**
```bash
#!/bin/bash
# Add regression test

add_regression_test() {
    local test_framework="$1"

    case "$test_framework" in
        "jest")
            cat >> tests/regression.test.js << 'EOF'
describe('Regression: [Issue Description]', () => {
    test('should not reproduce issue #123', async () => {
        // Reproduce the scenario that previously failed
        const result = await functionThatHadBug();

        // Assert correct behavior
        expect(result).toBe(expectedValue);
    });
});
EOF
            ;;
        "pytest")
            cat >> tests/test_regression.py << 'EOF'
def test_issue_123_regression():
    """Regression test for [issue description]"""
    # Reproduce the scenario
    result = function_that_had_bug()

    # Assert correct behavior
    assert result == expected_value
EOF
            ;;
    esac

    echo "Added regression test to prevent future occurrence"
}
```

## Context Continuity

**Session Resume:**
When you return and run `/debug-systematic` or `/debug-systematic resume`:
- Load debugging plan and hypothesis results
- Show which hypotheses have been tested
- Continue from next untested hypothesis
- Track full debugging timeline

**Progress Example:**
```
RESUMING DEBUGGING SESSION
├── Issue: API timeout on user search
├── Hypotheses: 5 total
├── Tested: 3 (2 disproved, 1 confirmed)
├── Current: Testing database query optimization
└── Status: Root cause identified

Continuing investigation...
```

## Practical Examples

**Start Debugging:**
```
/debug-systematic "API returns 500 on POST /users"
/debug-systematic reproduce    # Create reproduction steps
/debug-systematic              # Auto-resume if session exists
```

**Hypothesis Testing:**
```
/debug-systematic test 1       # Test specific hypothesis
/debug-systematic isolate      # Create minimal reproduction
/debug-systematic bisect       # Git bisect to find regression
```

**Session Control:**
```
/debug-systematic resume       # Continue debugging
/debug-systematic status       # Show current progress
/debug-systematic solved       # Mark as solved and summarize
```

## Debugging Techniques

**Common Debugging Patterns:**

1. **Print Debugging:**
   ```bash
   add_debug_logging() {
       echo "Adding strategic debug points..."
       # Add before suspected issue
       # Add after suspected issue
       # Compare outputs
   }
   ```

2. **Rubber Duck Debugging:**
   ```markdown
   ## Explain to Rubber Duck
   1. What the code should do: [expected behavior]
   2. What the code actually does: [actual behavior]
   3. Step-by-step execution: [trace through]
   4. Where it diverges: [AHA moment]
   ```

3. **Divide and Conquer:**
   ```bash
   # Comment out half the code
   # Does issue persist?
   # - Yes: Issue in remaining half
   # - No: Issue in commented half
   # Repeat until isolated
   ```

## Safety Guarantees

**Protection Measures:**
- Git checkpoints before each test
- Automated state restoration
- No destructive operations without confirmation
- Clear rollback paths

**Important:** I will NEVER:
- Modify production code without validation
- Skip hypothesis testing
- Apply fixes without verification
- Add AI attribution

## Skill Integration

When appropriate, I may suggest:
- `/test` - Run comprehensive test suite
- `/security-scan` - Check if bug is security-related
- `/commit` - Commit fix with clear message

## Advanced Debugging Tools

**Performance Profiling:**
```bash
profile_performance() {
    # Node.js profiling
    node --prof app.js
    node --prof-process isolate-*.log > profile.txt

    # Python profiling
    python -m cProfile -o profile.stats script.py
    python -m pstats profile.stats
}
```

**Memory Leak Detection:**
```bash
detect_memory_leak() {
    # Monitor memory over time (RSS column of ps, one sample every 5 seconds).
    # The loop is bounded so the analysis step below is actually reached;
    # 60 samples is an arbitrary choice, adjust as needed.
    for _ in $(seq 1 60); do
        ps aux | grep node | grep -v grep | awk '{print $6}' | head -1
        sleep 5
    done | tee memory.log

    # Analyze pattern
    gnuplot << 'EOF'
set terminal png
set output 'memory-usage.png'
plot 'memory.log' with lines
EOF
}
```

**Network Debugging:**
```bash
debug_network() {
    # Capture network traffic (runs until interrupted with Ctrl+C;
    # usually requires elevated privileges)
    tcpdump -i any -w debug/network.pcap port 3000

    # Analyze with tshark
    tshark -r debug/network.pcap -Y "http.response.code >= 400"
}
```

## What I'll Actually Do

1. **Gather information** - Comprehensive context using Grep
2. **Reproduce issue** - Create reliable reproduction
3. **Form hypotheses** - Prioritized theories about cause
4. **Test systematically** - Validate each hypothesis
5. **Isolate problem** - Minimal reproducible case
6. **Implement fix** - Targeted solution
7. **Prevent regression** - Add tests and monitoring

I'll maintain complete debugging session continuity, tracking all hypotheses and results across sessions.
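
The session checks this workflow relies on (start fresh when no session exists, resume when one does, exit early when the issue is already solved) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the function name `session_action` is hypothetical, and the sketch assumes `debug/state.json` contains a `"status"` string field, for which this document only names the value `"solved"`.

```bash
#!/bin/bash
# Sketch: decide how to proceed based on the session state file.
# ASSUMPTION: state.json contains a "status" string field; only the
# value "solved" is named in this document. session_action is a
# hypothetical helper, not part of the skill.

session_action() {
    local dir="${1:-debug}"
    local state="$dir/state.json"

    if [ ! -f "$state" ]; then
        echo "new"       # no session yet: gather info and create a plan
    elif grep -Eq '"status"[[:space:]]*:[[:space:]]*"solved"' "$state"; then
        echo "solved"    # early exit: nothing left to investigate
    else
        echo "resume"    # continue from the next untested hypothesis
    fi
}
```

Calling `session_action` with no argument inspects `debug/` in the current directory. Using `grep` instead of a JSON parser keeps the sketch dependency-free, at the cost of not validating the file's structure.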
**Credits:** Systematic debugging methodology based on scientific method and debugging best practices from "Debugging: The 9 Indispensable Rules" by David Agans.