--- name: counterexample-debugger description: "Debug proof failures using counterexamples from Nitpick (Isabelle) or QuickChick (Coq) to identify specification errors, missing preconditions, and proof strategy issues. Use when: (1) A proof attempt fails and you need to understand why, (2) Counterexamples are generated by Nitpick or QuickChick, (3) Specifications may be incorrect or incomplete, (4) Theorems need validation before proving, (5) Missing preconditions or lemmas need identification, or (6) Proof failures need explanation and correction suggestions. Supports both Isabelle/HOL and Coq equally." --- # Counterexample-Guided Proof Debugger Analyze counterexamples from Nitpick or QuickChick to explain proof failures and suggest corrections to specifications or proofs. ## Workflow ### 1. Receive Counterexample Information Identify what information is provided: - **Counterexample output**: From Nitpick or QuickChick - **Failed theorem**: The statement that couldn't be proven - **Proof attempt**: Any partial proof or tactics tried - **Context**: Definitions and lemmas involved ### 2. Choose Target System Determine which proof assistant is being used: - **Isabelle/HOL with Nitpick**: Finite model finder - **Coq with QuickChick**: Property-based testing - **Both**: Provide analysis for both systems ### 3. Analyze the Counterexample Examine the counterexample systematically: **Verify the counterexample**: - Manually compute the result for the given values - Confirm it actually violates the theorem - Check if it's a genuine counterexample or tool limitation **Identify the violation**: - Which part of the theorem fails? - What values cause the failure? - Is it an edge case or fundamental issue? **Determine the root cause**: - Missing precondition? - Incorrect specification? - Wrong quantifier order? - Implementation bug? - Off-by-one error? - Type constraint issue? ### 4. Explain the Failure Provide clear explanation: **What went wrong**: - Describe why the counterexample violates the theorem - Show the computation step-by-step - Highlight the specific point of failure **Why it happened**: - Explain the underlying cause - Identify the conceptual error - Note any common patterns (empty list, boundary values, etc.) **Impact assessment**: - Is the theorem fundamentally wrong? - Does it need preconditions? - Is the specification incomplete? ### 5. Suggest Corrections Provide actionable fixes based on the root cause: **For missing preconditions**: ```isabelle (* Before *) lemma "hd xs ∈ set xs" (* After *) lemma "xs ≠ [] ⟹ hd xs ∈ set xs" ``` **For incorrect specifications**: ```coq (* Before: uses < instead of <= *) x < y && is_sorted (y :: ys) (* After *) x <= y && is_sorted (y :: ys) ``` **For quantifier order**: ```isabelle (* Before *) "∃y. ∀x. P x y" (* After *) "∀x. ∃y. P x y" ``` **For incomplete specifications**: ```coq (* Before: only checks sortedness *) is_sorted (sort l) (* After: also checks permutation *) is_sorted (sort l) && permutation l (sort l) ``` ### 6. Recommend Next Steps Guide the user on what to do: **Retest with fix**: - Run Nitpick/QuickChick again - Verify no counterexample found - Check if fix is sufficient **Identify additional issues**: - Are there other edge cases? - Do other lemmas need fixing? - Is the specification now complete? **Proceed with proof**: - If no counterexample, attempt proof - Suggest proof strategy - Identify needed helper lemmas ## Counterexample Analysis Patterns ### Pattern 1: Empty Structures **Symptom**: Counterexample is `[]`, `{}`, or `None` **Common causes**: - Missing non-empty precondition - Undefined behavior on empty input - Base case not handled **Fix**: Add precondition or handle empty case explicitly ### Pattern 2: Boundary Values **Symptom**: Counterexample is `0`, `1`, or type limits **Common causes**: - Off-by-one errors - Boundary condition not considered - Edge case in arithmetic **Fix**: Adjust bounds or add special case handling ### Pattern 3: Duplicate Elements **Symptom**: Counterexample has repeated values like `[0, 0]` **Common causes**: - Using `<` instead of `≤` - Assuming distinctness - Set vs. multiset confusion **Fix**: Use appropriate comparison or add distinctness assumption ### Pattern 4: Small Counterexamples **Symptom**: Very small counterexample (1-2 elements) **Common causes**: - Fundamental specification error - Wrong base case - Incorrect inductive step **Fix**: Review base definitions and inductive structure ### Pattern 5: Type-Specific Values **Symptom**: Counterexample at type boundaries **Common causes**: - Type constraints not considered - Overflow/underflow issues - Finite vs. infinite types **Fix**: Add type constraints or adjust specification ## Tool-Specific Guidance ### Nitpick (Isabelle/HOL) For detailed Nitpick usage and interpretation: - See [nitpick_guide.md](references/nitpick_guide.md) Key points: - Searches for finite models - Configurable cardinality bounds - May miss counterexamples beyond bounds - "No counterexample" ≠ proof ### QuickChick (Coq) For detailed QuickChick usage and interpretation: - See [quickchick_guide.md](references/quickchick_guide.md) Key points: - Random testing with shrinking - Configurable test count - May miss rare counterexamples - "Success" ≠ proof ## Common Root Causes ### Specification Errors **Symptoms**: - Counterexample shows spec doesn't match intent - Multiple unrelated counterexamples - Spec too weak or too strong **Fixes**: - Strengthen postconditions - Add completeness requirements - Review specification against intent ### Missing Preconditions **Symptoms**: - Counterexample is edge case - Empty structures or boundary values - Undefined behavior **Fixes**: - Add non-empty constraints - Add type bounds - Add well-formedness conditions ### Quantifier Issues **Symptoms**: - Counterexample shows wrong order - Existential/universal confusion - Skolem constant issues **Fixes**: - Swap quantifier order - Review logical structure - Check variable dependencies ### Implementation Bugs **Symptoms**: - Counterexample shows function doesn't work - Output doesn't match specification - Logic error in definition **Fixes**: - Fix the implementation - Review algorithm correctness - Test implementation separately ## Debugging Checklist When analyzing a counterexample: 1. **Verify manually**: Compute the result for the counterexample values 2. **Identify violation**: Which part of the theorem fails? 3. **Find root cause**: Why does it fail? (precondition, spec, impl, quantifiers) 4. **Suggest fix**: What needs to change? 5. **Check completeness**: Are there other issues? 6. **Retest**: Run tool again after fix 7. **Proceed**: If no counterexample, attempt proof ## Examples For complete debugging examples including: - Incorrect sortedness definition - Missing preconditions - Wrong quantifier order - Off-by-one errors - Incomplete specifications See [examples.md](references/examples.md) ## Tips - **Trust the counterexample**: If tool finds one, investigate thoroughly - **Verify manually**: Always check the counterexample by hand - **Look for patterns**: Empty, boundary, duplicates are common - **Fix root cause**: Don't just patch symptoms - **Retest after fixing**: Ensure fix is complete - **Use tools early**: Run Nitpick/QuickChick before proving - **Iterate**: May need multiple rounds of debugging - **Document assumptions**: Make preconditions explicit - **Check completeness**: Ensure specification is sufficient