--- name: competing-hypotheses description: Debug problems by investigating multiple hypotheses in parallel. Use when you have a bug, unexpected behaviour, or mystery where the root cause is unclear. Spawns parallel investigator agents each pursuing a different theory, then compares evidence to identify the most likely cause and fix. --- # Competing Hypotheses Debug problems by racing multiple theories in parallel. Each investigator pursues a different hypothesis, gathers evidence, and reports back. The lead compares findings to identify the root cause. ## When to Use - "I have no idea why this is broken" - A bug that could have multiple root causes - Unexpected behaviour with no obvious source - Performance regressions with unclear origin - Intermittent failures that are hard to reproduce --- ## Instructions for Claude You are the **lead investigator** coordinating a parallel hypothesis investigation. ### Coordination Protocol Messages between teammates are **asynchronous** — a message sent now may not be read until the recipient finishes their current work. You cannot rely on message timing for coordination. Instead, **task status is the shared state** that tells every agent where things stand. #### Task Status as Position Marker When a teammate receives a message, they determine where it sits in the conversation by checking their task status — not by assuming it arrived "just now." | Status | Who sets it | Meaning | |--------|------------|---------| | `pending` | Lead | Not started, waiting for assignment | | `in_progress` | Teammate | Working, or finished and **parked** waiting for lead to acknowledge | | `completed` | **Lead only** | Lead has read the teammate's report — this IS the acknowledgment | **The lead marks tasks `completed` — not the teammate.** When a teammate sees their task marked `completed`, they know the lead has processed their report and any new message is current. #### Teammate Protocol Include these rules in every teammate's spawn prompt: 1. Mark your task `in_progress` when you begin work 2. When done, send your report via `SendMessage`, then **park** — stop all work, do not check `TaskList` or claim new tasks. Just wait. 3. Before acting on any received message, **check your task status via `TaskGet`**: - Still `in_progress` → lead hasn't acknowledged your report yet. This message may pre-date your report. Reply with your current state instead of re-executing. - `completed` → lead has processed your report. If a new task is assigned to you, this message contains current instructions — proceed. 4. Wait for all spawned subagents to finish before sending your report. Do not leave background work running. #### Lead Protocol 1. After reading a teammate's report, mark their task `completed` (your acknowledgment) 2. Before sending new instructions, ensure the previous task is `completed` and the new task is created/assigned 3. Verify phase completion via `TaskList` — check that all relevant tasks show the expected status, don't track messages mentally 4. Between implementation steps, run `git status` to confirm a clean working tree before proceeding ### Phase 1: Hypothesize 1. Understand the problem from the user's input: - What's the symptom? (error message, wrong output, unexpected behaviour) - When does it happen? (always, sometimes, after a recent change) - What's already been tried? 2. Generate 2-5 plausible hypotheses for the root cause - Each should be distinct and testable - Cover different areas (data, logic, infrastructure, external dependencies, timing) 3. Present the hypotheses to the user: - List each hypothesis with a brief rationale - Ask: "I'll spin up N investigators to pursue these in parallel. Proceed?" - Incorporate any hypotheses the user wants to add or remove ### Phase 2: Parallel Investigation 1. Create a team with `TeamCreate` 2. Create tasks for each hypothesis with `TaskCreate` 3. Spawn one `general-purpose` teammate per hypothesis using `Task` with `team_name` - Name them after their hypothesis (e.g., `race-condition-investigator`, `data-corruption-investigator`) - Each investigator's prompt should include: - The overall problem description - Their specific hypothesis to pursue - Instruction to **investigate only, do not make changes** - The **Teammate Protocol** from the Coordination Protocol above (copy it into their prompt verbatim) - What evidence to look for (see Investigation Guide below) - Instruction to report findings via `SendMessage` 4. Spawn all investigators in parallel 5. As investigators report back, mark each investigation task `completed` (acknowledging the report) and give the user brief progress updates 6. If an investigator discovers a recent commit already resolved the issue, report the finding to the user and end early if they confirm it's fixed #### Subagent Guidance for Investigators Include the following in each investigator's prompt: > **Use subagents (`Task` tool) to keep your context focused.** Spawn subagents for: > - Exploring specific files, modules, or subsystems > - Searching through git history, logs, or large codebases > - Any research tangent that might not pan out > > Each subagent should report back: > 1. **Relevant findings** — what it discovered that matters to your investigation > 2. **Red herrings** (1-2 sentences) — anything that *looks* related but *isn't*, and why. Calling these out early prevents wasted cycles re-exploring dead ends. > > Report red herrings even when your main findings are conclusive — they prevent other agents from re-exploring the same dead ends. > > After receiving a subagent's report, decide whether to: > - **Use its findings directly** — if the summary gives you enough to proceed > - **Dive in yourself** — if the subagent found something promising and you want full, first-hand context in that area before drawing conclusions. Examples: conflicting evidence that needs direct examination, low confidence in the subagent's assessment, or complex state/flow where first-hand context matters. > > When choosing subagent types, prefer read-only or exploration-focused types for open-ended codebase searches, and full-capability types for targeted analysis or tasks that need write access. ### Investigation Guide Each investigator should: 1. **Search for evidence** supporting their hypothesis - Read relevant code paths - Check logs, error messages, stack traces if available - Look at recent changes (git log, git diff) that could be related - Examine configuration, environment, data 2. **Search for counter-evidence** that would disprove their hypothesis 3. **Rate their confidence** based on what they found 4. **Report** using the output format below ### Investigator Output Format ``` ## Hypothesis: {description} ### Evidence For - {evidence point}: {where found, what it means} ### Evidence Against - {evidence point}: {where found, what it means} ### Red Herrings - {code paths or areas explored that looked related but weren't, and why} ### Confidence: {high/medium/low} ### Root Cause (if found) {specific root cause, file, line, mechanism} ### Suggested Fix {what to change and why} ### Open Questions - {anything unresolved that could help narrow it down} ``` ### Phase 3: Compare & Conclude 1. Once all investigation tasks show `completed` in `TaskList`, compare findings: - Which hypothesis has the strongest evidence? - Did any investigator find definitive proof? - Do findings from different investigators corroborate each other? - Are there open questions that could be quickly resolved? - **Compound bugs** — if multiple hypotheses are confirmed, present as a multi-root-cause scenario and propose fixing in dependency order (fix the cause that enables the others first) 2. Present the analysis to the user: - Rank hypotheses by evidence strength - Highlight the most likely root cause - Note any surprising findings or ruled-out theories - Recommend next steps (fix, further investigation, or targeted test) ### Phase 4: Fix (Optional) Skip this phase if the user only wanted diagnosis, not a fix. 1. If the root cause is clear and the user wants to proceed, follow the **Lead Protocol**: a. Create an implementation task and assign it to the investigator who found the root cause b. Send them an implementation message with the fix details c. **Wait** — the investigator will implement, send a report, and park d. Read the report. Mark the implementation task `completed` (your acknowledgment). e. Run `git status` to confirm a clean working tree 2. If the root cause is unclear: - Propose targeted experiments to disambiguate - Ask the user which direction to pursue 3. For **compound bugs** (multiple root causes), implement fixes one at a time — repeat step 1 for each, verifying clean git state between each fix 4. After all fixes, verify via `TaskList` that all implementation tasks are `completed` and `git status` shows a clean working tree. Then spawn a fresh `validator` teammate. The validator's spawn prompt must include: the **Teammate Protocol** (verbatim), the original symptom, the confirmed hypothesis/root cause, and what the fix was intended to do. 5. If validation fails, route the failure back to the investigator who implemented the fix for corrections, then re-validate ### Rules - **Task status is the source of truth** — coordinate through `TaskUpdate` status, not message timing. Always check `TaskList` to verify state. - **Teammates park after reporting** — after sending a report, stop and wait. Do not self-assign new work or act on queued messages without checking task status first. - **Lead owns `completed`** — only the lead marks tasks `completed`. This is the acknowledgment that closes the loop. - **Keep investigators alive** until the conclusion — they may need follow-up questions - **2-5 hypotheses max** — too many dilutes focus - **Investigators don't communicate** — they work independently to avoid confirmation bias - **Evidence over intuition** — rank hypotheses by concrete evidence, not plausibility - **Counter-evidence matters** — a hypothesis with strong counter-evidence should be deprioritized even if it seems likely - **Finish subagents before reporting** — wait for all spawned subagents to complete before sending your report - **Shut down when done** — after validation passes, or after the user declines to fix, send shutdown requests and **wait for confirmations** before reporting final results - **Unresponsive teammate?** — if a teammate hasn't reported within a reasonable timeframe, check their task status. If stuck, spawn a replacement and inform the user.