---
name: debug-investigator
description: >
  Hypothesis-driven debugging methodology: ranked hypotheses with confirming/refuting tests,
  git bisect strategy, log analysis, instrumentation point planning, and minimal reproduction
  design. Triggers on: "debug this systematically", "root cause analysis", "bisect this bug",
  "rank hypotheses for this error", "help me isolate this issue", "create a minimal reproduction",
  "instrumentation plan for this bug", "why does this keep failing". The differentiator is the
  structured investigation methodology (hypothesis ranking, bisection strategy, instrumentation
  points) — use this skill for non-obvious bugs that need systematic investigation, not simple
  errors the model diagnoses directly. NOT for abstract reasoning or problem decomposition
  without a specific error — the model handles general reasoning natively.
metadata:
  version: 1.1.0
---

# Debug Investigator

Structured debugging methodology that replaces ad-hoc exploration with hypothesis-driven
investigation. Captures symptoms, analyzes evidence (stacktraces, logs, state), generates
ranked hypotheses, designs bisection strategies, identifies instrumentation points, and
produces minimal reproductions — documenting every step so dead ends are never revisited.

> **When to use this skill vs native debugging:** The base model handles straightforward
> debugging (clear stacktraces, obvious errors) natively. Use this skill for non-obvious bugs
> requiring systematic investigation: intermittent failures, bugs with no clear stacktrace,
> performance regressions, or issues requiring git bisection and hypothesis ranking.
## Reference Files

| File                                   | Contents                                                                      | Load When                       |
| -------------------------------------- | ----------------------------------------------------------------------------- | ------------------------------- |
| `references/stacktrace-patterns.md`    | Exception taxonomy, traceback reading, common Python/JS error signatures      | Stacktrace or exception present |
| `references/hypothesis-templates.md`   | Bug category catalog, probability ranking, confirmation/refutation tests      | Always                          |
| `references/bisection-guide.md`        | git bisect workflow, binary search debugging, narrowing techniques            | Bug appeared after a change     |
| `references/log-analysis.md`           | Log pattern extraction, anomaly detection, timeline correlation               | Log output available            |
| `references/instrumentation-points.md` | Strategic logging placement, breakpoint strategy, state inspection techniques | Investigation plan needed       |

## Prerequisites

- **git** — for bisection and history analysis
- **Access to source code** — cannot debug opaque binaries
- **Reproducible environment** — or at minimum, error output (stacktrace, logs)

## Workflow

### Phase 1: Symptom Capture

Before touching code, document the observable problem:

1. **What is happening?** — Describe the observed behavior precisely. "It crashes" is
   insufficient. "Raises `KeyError('user_id')` on line 42 of `auth.py` when calling
   `get_current_user()` with a valid session token" is actionable.
2. **What should happen?** — Define the expected behavior. If unknown, state that.
3. **Reproducibility** — Always, intermittent (with frequency), or one-time? Intermittent
   bugs require different strategies than deterministic ones.
4. **Recency** — When did this start? Correlate with recent changes: `git log --oneline -20`.
   If the bug appeared after a specific commit, bisection is the fastest path.
5. **Environment** — Python version, OS, dependency versions, configuration differences
   between working and broken environments.
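The environment comparison in step 5 can be sketched in Python as follows. This is a minimal illustration, not part of the skill: `environment_snapshot` and `diff_snapshots` are hypothetical helper names, and the package list is an assumption supplied by the caller. It relies only on the standard library (`platform`, `sys`, `importlib.metadata`).

```python
from __future__ import annotations

import platform
import sys
from importlib import metadata


def environment_snapshot(packages: list[str]) -> dict[str, str]:
    """Collect interpreter, OS, and package versions (hypothetical helper)."""
    snapshot = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "executable": sys.executable,
    }
    for name in packages:
        try:
            snapshot[f"pkg:{name}"] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly so it shows up in a diff.
            snapshot[f"pkg:{name}"] = "not installed"
    return snapshot


def diff_snapshots(
    working: dict[str, str], broken: dict[str, str]
) -> dict[str, tuple[str | None, str | None]]:
    """Return only the keys whose values differ, as (working, broken) pairs."""
    keys = sorted(set(working) | set(broken))
    return {
        key: (working.get(key), broken.get(key))
        for key in keys
        if working.get(key) != broken.get(key)
    }


if __name__ == "__main__":
    # The package names here are placeholders; list the ones your project uses.
    for key, value in environment_snapshot(["pip", "requests"]).items():
        print(f"{key}: {value}")
```

Running the snapshot in both the working and the broken environment and passing the two results to `diff_snapshots` leaves only the differences, which are the candidates worth recording under **Environment**.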
### Phase 2: Evidence Analysis

Examine all available evidence before forming hypotheses:

1. **Stacktrace interpretation** — If a traceback exists, read it bottom-up. The last frame
   is where the error manifested, but the cause is often several frames up. Identify:
   - Exception type and message
   - The frame where the error originated vs. where it was raised
   - Any familiar patterns (see `references/stacktrace-patterns.md`)
2. **Log pattern extraction** — Search logs for:
   - Temporal anomalies (timestamps out of sequence, gaps)
   - Repeated errors (same error appearing in bursts)
   - State transitions that didn't complete
   - Correlation with external events (deploys, config changes)
3. **State inspection** — If the system is running, inspect:
   - Variable values at the failure point
   - Database state (missing rows, unexpected values)
   - Configuration values (environment variables, config files)
   - External dependency status (API availability, DB connectivity)
4. **Code diff analysis** — If the bug is recent:
   - `git diff HEAD~5` — what changed?
   - Focus on files touched by the error's call chain
   - Look for typos, wrong variable names, missing null checks

### Phase 3: Hypothesis Generation

Generate ranked hypotheses — never start fixing without a hypothesis:

1. **List 3-5 hypotheses** ranked by likelihood. Each hypothesis must include:
   - A concrete claim about what is wrong
   - What evidence supports it
   - What evidence would confirm it (a test you can run)
   - What evidence would refute it
2. **Rank by likelihood** using:
   - Proximity to recent changes (most bugs are in new code)
   - Simplicity (typos before race conditions)
   - Evidence fit (does the hypothesis explain ALL symptoms?)
3. **Common bug categories** (see `references/hypothesis-templates.md`):
   - State bugs: wrong value, missing initialization, stale cache
   - Logic bugs: off-by-one, wrong operator, inverted condition
   - Integration bugs: API contract mismatch, serialization error
   - Concurrency bugs: race condition, deadlock, resource starvation
   - Environment bugs: missing dependency, wrong config, version mismatch

### Phase 4: Investigation Plan

Design specific steps to test each hypothesis:

1. **Test H1 first** — Always test the most likely hypothesis first. Design a single action
   that will confirm or refute it.
2. **Bisection** — If the bug appeared after a change and H1 fails:
   - Identify the known-good and known-bad commits
   - Run `git bisect start <bad> <good>`
   - Define the test command for each commit
   - See `references/bisection-guide.md` for workflow
3. **Isolation** — Remove variables one at a time:
   - Simplify input data
   - Disable features/plugins
   - Replace external calls with hardcoded values
   - Run in a clean environment
4. **Instrumentation** — Add targeted logging/breakpoints:
   - At function entry/exit points in the call chain
   - Before and after state mutations
   - At decision points (if/else branches)
   - See `references/instrumentation-points.md`

### Phase 5: Execution

Execute the investigation plan, updating hypotheses as evidence arrives:

1. **Test one variable at a time** — Changing multiple things simultaneously makes results
   uninterpretable.
2. **Record results** — Document what each test revealed, even negative results. Dead-end
   documentation prevents revisiting failed paths.
3. **Update probabilities** — After each test, re-rank hypotheses. If H1 is refuted, H2
   becomes the new priority.
4. **Know when to escalate** — If all hypotheses are exhausted, the bug is in a category you
   haven't considered. Step back and re-examine assumptions.

### Phase 6: Resolution Documentation

After finding the root cause:

1. **Root cause** — What was actually wrong, precisely.
2. **Fix** — What was changed and why.
3. **Prevention** — How to prevent recurrence (test, lint rule, type check, etc.).
4. **Lessons** — What was learned that applies beyond this specific bug.

## Output Format

````
## Debug Investigation: {Brief Description}

### Symptom

**Observed:** {What is happening — precise description}
**Expected:** {What should happen}
**Reproducibility:** {Always | Intermittent (~N% of attempts) | Once}
**First noticed:** {Date/time or triggering event}
**Environment:** {Relevant versions and configuration}

### Evidence Analysis

#### Stacktrace
- **Exception:** {type}: {message}
- **Origin:** {file}:{line} in {function}
- **Call chain:** {caller} → {caller} → {failure point}
- **Key insight:** {What the traceback reveals about the cause}

#### Logs
- **Anomaly:** {What is unusual}
- **Timeline:** {When the anomaly started}
- **Correlation:** {Related events}

#### Code Changes
- **Recent commits:** {relevant commits since last known-good state}
- **Files in error path:** {which changed files appear in the traceback}

### Hypotheses

| # | Hypothesis | Likelihood | Confirming Test | Refuting Test |
|---|------------|------------|-----------------|---------------|
| H1 | {Specific claim} | High | {What to check} | {What would disprove} |
| H2 | {Specific claim} | Medium | {What to check} | {What would disprove} |
| H3 | {Specific claim} | Low | {What to check} | {What would disprove} |

### Investigation Plan

#### Step 1: Test H1 — {action}
- **Command/action:** {specific step}
- **If confirmed:** {next action — fix}
- **If refuted:** proceed to Step 2

#### Step 2: Bisection
- **Good commit:** {hash}
- **Bad commit:** {hash}
- **Test:** {command to verify each commit}
- **Command:** `git bisect start {bad} {good}`

#### Step 3: Isolation
- **Remove:** {variable to eliminate}
- **Expected change:** {what should happen}

### Instrumentation Points

1. {file}:{line} — log {variable/state} to observe {what}
2. {file}:{line} — breakpoint to inspect {what}

### Minimal Reproduction

```{language}
# Minimal code that triggers the bug
{code}
```

### Resolution

**Root cause:** {What was wrong}
**Fix:** {What was changed — file:line, diff summary}
**Prevention:** {Test added, lint rule, type annotation, etc.}
**Lessons:** {What generalizes beyond this bug}
````

## Configuring Scope

| Mode | Scope | Depth | When to Use |
|------|-------|-------|-------------|
| `quick` | Single error | H1 test + fix | Clear stacktrace, obvious cause |
| `standard` | Full investigation | 3 hypotheses + bisection plan | Default for non-obvious bugs |
| `deep` | Systemic analysis | 5+ hypotheses + instrumentation + reproduction | Intermittent bugs, no stacktrace, production issues |

## Calibration Rules

1. **Hypotheses before code changes.** Never start modifying code without at least one
   explicit hypothesis. "Let me try this" is not debugging — it's guessing.
2. **One variable at a time.** Each investigation step should change exactly one thing. If
   you change two things and the bug disappears, you don't know which fixed it.
3. **Document dead ends.** Failed hypotheses are valuable — they narrow the search space.
   Record what was tested and what was learned.
4. **Simplest explanation first.** Test typos, wrong variable names, and missing imports
   before considering race conditions, compiler bugs, or cosmic rays.
5. **Reproduce before fixing.** If you cannot reproduce the bug in a controlled environment,
   any fix is speculative. Invest in reproduction first.
6. **Root cause, not symptoms.** A fix that addresses the symptom (adding a null check)
   without understanding the root cause (why was it null?) leaves the real bug alive.

## Error Handling

| Problem | Resolution |
|---------|------------|
| No stacktrace available | Focus on log analysis and state inspection. Use instrumentation to generate diagnostic output. |
| Bug is intermittent | Add persistent logging at key decision points. Run under stress (high load, concurrent requests) to increase reproduction rate. |
| Cannot reproduce locally | Compare environments systematically: versions, config, data, timing. Use `docker` or VM to mirror production. |
| Multiple hypotheses equally likely | Design a single test that distinguishes between them. Binary decision: "If X, then H1; if Y, then H2." |
| Fix attempted but bug persists | The hypothesis was wrong. Revert the fix, update hypothesis rankings, and proceed to the next hypothesis. Do not stack fixes. |
| Bug is in a dependency | Confirm with a minimal reproduction that uses only the dependency. Check issue trackers. Pin to last known-good version while awaiting upstream fix. |

## When NOT to Investigate

Push back if:

- The error message already contains the fix ("missing module X" → install X)
- The issue is a known environment setup problem (wrong Python version, missing env var)
- The "bug" is actually a feature request or design disagreement — redirect to ADR or discussion
- The code is not under the user's control (third-party SaaS, managed service) — file a support ticket instead
- The user wants to debug generated/minified code — debug the source, not the output
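
The first two push-back cases above lend themselves to a mechanical pre-check. The sketch below is one possible shape for it, under stated assumptions: `should_investigate` is a hypothetical helper (not part of this skill), and the list of required environment variables is supplied by the caller. It uses the `name` attribute that Python sets on `ModuleNotFoundError`.

```python
from __future__ import annotations

import os


def should_investigate(
    exc: BaseException, required_env: tuple[str, ...] = ()
) -> tuple[bool, str]:
    """Return (investigate?, reason). Hypothetical helper for illustration."""
    # Case 1: the error message already contains the fix.
    if isinstance(exc, ModuleNotFoundError):
        return False, f"install missing module {exc.name!r}"
    # Case 2: known environment setup problem (missing env var).
    missing = [name for name in required_env if name not in os.environ]
    if missing:
        return False, "set missing environment variables: " + ", ".join(missing)
    # Anything else is a candidate for the full workflow.
    return True, "no trivial cause found; start at Phase 1"
```

A caller would typically invoke this inside an `except` block and only begin Phase 1 when the first element of the result is `True`. The remaining push-back cases (feature requests, third-party services, minified code) need human judgment and are deliberately left out of the sketch.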