---
name: hypothesis-debugging
description: Structured code debugging through hypothesis formation and falsification planning. Use when diagnosing bugs, unexpected behaviour, or system failures where the root cause is unclear. Produces a hypothesis document for execution by another agent rather than performing the investigation directly. Triggers on requests to debug issues, diagnose problems, investigate failures, or create debugging plans.
---

# Hypothesis-Driven Debugging

Generate a structured debugging document that identifies candidate root causes and provides falsification plans for each. The output document instructs a separate execution agent; do not perform the investigation yourself.

## Philosophical Foundation

Apply Popperian falsificationism: hypotheses cannot be proven true, only disproven. Design tests that could definitively rule out each hypothesis rather than confirm it. A good falsification test produces a clear negative result if the hypothesis is wrong.

## Process

### 1. Gather Context

Before forming hypotheses, collect:

- **Symptom description**: What behaviour is observed vs expected?
- **Reproduction conditions**: When does it occur? Intermittent or consistent?
- **Recent changes**: Deployments, configuration changes, dependency updates
- **Error artefacts**: Stack traces, logs, error messages, screenshots
- **Environmental factors**: OS, runtime versions, network conditions

If information is missing, note gaps in the output document.

### 2. Form Hypotheses

Generate 1–5 hypotheses ranked by plausibility. Each hypothesis must be:

- **Specific**: Name the component, function, or interaction suspected
- **Falsifiable**: A concrete test could disprove it
- **Independent**: Falsifying one should not automatically falsify others

Common hypothesis categories:

| Category | Examples |
|----------|----------|
| State | Race condition, stale cache, corrupted data |
| Input | Malformed payload, encoding issue, boundary case |
| Environment | Missing dependency, version mismatch, resource exhaustion |
| Logic | Off-by-one, incorrect predicate, missing null check |
| Integration | API contract violation, timeout, auth failure |

Avoid vague hypotheses ("something wrong with the database"). Pin down the specific failure mode.

### 3. Design Falsification Plans

For each hypothesis, specify:

1. **Prediction**: If this hypothesis is correct, what observable outcome follows?
2. **Falsification test**: What action would produce a contradicting observation?
3. **Expected negative result**: What outcome would disprove the hypothesis?
4. **Tooling required**: Commands, scripts, or instrumentation needed
5. **Confidence impact**: How decisively would a negative result rule this out?

Prefer tests that are:

- Quick to execute
- Minimally invasive
- Deterministic rather than probabilistic

A minimal sketch of such a test appears after step 4 below.

### 4. Output Document

Generate a Markdown document following the template in `assets/debugging-plan.md`. Save to the working directory as `debugging-plan-{timestamp}.md`.
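To make steps 2 and 3 concrete, here is a minimal sketch of the kind of falsification test a plan entry might specify. It is illustrative only: the hypothesis, `parse_payload`, and the payloads are hypothetical stand-ins, not part of the template.

```python
import json


def parse_payload(raw: bytes) -> dict:
    """Hypothetical system under test: decode and parse a JSON payload."""
    return json.loads(raw.decode("utf-8"))


def test_bom_hypothesis() -> None:
    # Hypothesis (Input category): parse_payload() mishandles a UTF-8
    # byte-order mark. Prediction: a BOM-prefixed payload raises or yields
    # output that differs from the plain payload.
    plain = b'{"id": 1}'
    with_bom = b"\xef\xbb\xbf" + plain  # identical payload, UTF-8 BOM prefix

    expected = parse_payload(plain)
    try:
        result = parse_payload(with_bom)
    except Exception as exc:  # any failure here supports the hypothesis
        print(f"Hypothesis survives: BOM input raised {exc!r}")
        return
    # Expected negative result: identical output falsifies the hypothesis.
    if result == expected:
        print("Hypothesis falsified: BOM input parses identically")
    else:
        print(f"Hypothesis survives: BOM input produced {result!r}")


if __name__ == "__main__":
    test_bom_hypothesis()
```

A single run is decisive either way, which satisfies the preference for quick, deterministic tests: the output either falsifies the hypothesis or leaves it standing.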
## Quality Criteria

A well-formed debugging plan exhibits:

- **Mutual exclusivity**: Hypotheses do not overlap, so falsifying all but one leaves a clear survivor
- **Collective exhaustiveness**: Together, the hypotheses cover the likely failure space
- **Ordered efficiency**: The cheapest decisive tests appear first
- **Clear success criteria**: The executing agent knows when to stop

## Anti-Patterns

- Confirmation bias: Designing tests that can only confirm a hypothesis, never falsify it
- Hypothesis creep: Adding new hypotheses mid-execution instead of revising the plan
- Coupling: Tests whose outcomes cannot isolate individual hypotheses
- Vagueness: "Check the logs" without specifying what pattern would falsify the hypothesis

## References

- `references/examples.md`: Worked examples of hypothesis-falsification pairs across common debugging scenarios (API timeouts, flaky tests, memory leaks)
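## Example: A Falsifying Log Check

In contrast to the vagueness anti-pattern above, a log check can be falsifying if the pattern and verdict are specified up front. The sketch below is illustrative only: the hypothesis, the `app.log` path, and the message patterns are assumptions, not drawn from the template or references.

```python
import re
from pathlib import Path

# Hypothesis (Integration category, hypothetical): requests time out because
# the connection pool is exhausted. Prediction: pool-exhaustion warnings
# appear in the log during the incident window. Expected negative result:
# zero matches falsifies the hypothesis, assuming the warning is actually
# emitted when the pool saturates.
PATTERN = re.compile(r"connection pool exhausted|QueuePool limit")


def count_matches(log_path: str) -> int:
    """Count log lines matching the falsifying pattern."""
    with Path(log_path).open() as log:
        return sum(1 for line in log if PATTERN.search(line))


if __name__ == "__main__":
    hits = count_matches("app.log")  # hypothetical log file
    verdict = "survives" if hits else "is falsified"
    print(f"{hits} pool-exhaustion line(s): hypothesis {verdict}")
```

The caveat in the comment matters: zero matches only falsifies the hypothesis if the warning is known to be logged at this level, so the plan should state that assumption explicitly.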