---
name: prove-it
description: |
  Before declaring any task complete, actually verify the outcome. Run the code.
  Test the fix. Check the output. Claude's training optimizes for plausible-looking
  output, not verified-correct output. This skill forces the verification step that
  doesn't come naturally. No victory laps without proof.
allowed-tools: |
  bash: python, node, npm, pytest, jest, cargo, go, make, cat, ls, grep
  file: read
---

# Prove It

Claude generates code by pattern-matching on training data. Something can look syntactically perfect, follow best practices, and still be wrong. The model optimizes for "looks right," not "works right." Verification is a separate cognitive step that must be explicitly triggered. This skill closes the loop between implementation and proof.

## Why This Matters (Technical Reality)

Claude's limitations that this skill addresses:

**1. Generation vs Execution**

I generate code but don't run it. I predict what it would do based on patterns. My confidence comes from "this looks like working code I've seen," not from "I executed this and observed the result."

**2. Training Signal Mismatch**

My training optimizes for plausible next-token prediction, not outcome verification. Saying "Done!" feels natural. Verifying feels like extra work. But verification is where correctness actually lives.

**3. Pattern-Matching Blindness**

Code that matches common patterns feels correct. But subtle bugs hide in the gaps between patterns. Off-by-one errors. Wrong variable names. Missing edge cases. These "look right" but aren't.

**4. Confidence-Correctness Gap**

High confidence in my output doesn't correlate with actual correctness. I'm often most confident when I'm most wrong, because the wrong answer pattern-matched strongly.

**5. No Feedback Loop**

I generate sequentially. I don't naturally go back and check. Without explicit verification, errors compound silently.

## When To Verify

ALWAYS verify before declaring complete:

**Code Changes:**
- New functions or modules
- Bug fixes
- Refactoring
- Configuration changes
- Build/deploy scripts

**Fixes:**
- "Fixed the bug" - did you reproduce and confirm it's gone?
- "Resolved the error" - did you trigger the error path again?
- "Updated the config" - did you restart and test?

**Claims:**
- Factual statements that matter to the decision
- "This will work because..." - did you prove it?
- "The file contains..." - did you actually read it?

## Instructions

### Step 1: Catch The Victory Lap

Before saying any of these:

- "Done!"
- "That should work"
- "I've implemented..."
- "The fix is..."
- "Complete"

STOP. You haven't verified yet.
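To make the stakes concrete, here is a hypothetical snippet (the function and its bug are invented for this illustration, not taken from any real project) that pattern-matches as correct, passes a careful read-through, and still fails the first edge case it meets. Only running it exposes the problem:

```python
# Hypothetical illustration only -- last_n_lines() and its bug are invented here.
def last_n_lines(text: str, n: int) -> list[str]:
    """Return the last n lines of text."""
    lines = text.splitlines()
    return lines[-n:]  # reads as obviously correct

# Smoke test: the n == 0 edge case.
# lines[-0:] is lines[0:], so this returns every line instead of none.
print(last_n_lines("a\nb\nc", 0))  # ['a', 'b', 'c'] -- wrong, and only execution reveals it
```

A one-line run catches what a careful read misses. That gap is what Steps 2-4 are for.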
### Step 2: Determine Verification Method

| Change Type | Verification |
|-------------|--------------|
| New code | Run it with test input |
| Bug fix | Reproduce original bug, confirm fixed |
| Function change | Call the function, check output |
| Config change | Restart service, test affected feature |
| Build script | Run the build |
| API endpoint | Make a request |
| UI change | Describe what user should see, or screenshot |

### Step 3: Actually Verify

```bash
# Don't just write the test - run it
python -m pytest tests/test_new_feature.py

# Don't just fix the code - prove the fix
python -c "from module import func; print(func(edge_case))"

# Don't just update config - verify it loads
node -e "console.log(require('./config.js'))"
```

### Step 4: Report With Evidence

```
Verified:

What I changed:
- Added input validation to user_signup()

How I verified:
- Ran: python -c "from auth import user_signup; user_signup('')"
- Expected: ValidationError
- Got: ValidationError("Email required")

Proof that it works. Done.
```

## Verification Patterns

### Pattern 1: The Smoke Test

Minimal test that proves basic functionality:

```bash
# After writing a new function
python -c "from new_module import new_func; print(new_func('test'))"
```

If this crashes, you're not done.

### Pattern 2: The Regression Check

After fixing a bug, trigger the original failure:

```bash
# Bug was: crash on empty input
python -c "from module import func; func('')"
# Should not crash anymore
```

### Pattern 3: The Build Gate

Before claiming code is complete:

```bash
# Does it at least compile/parse?
python -m py_compile new_file.py
npm run build
cargo check
```

### Pattern 4: The Integration Smell Test

After changes that affect multiple components:

```bash
# Start the service
npm run dev &

# Hit the affected endpoint
curl http://localhost:3000/affected-route

# Check for expected response
```

## The Verification Checklist

Before declaring done:

```
[ ] Code runs without syntax errors
[ ] Function produces expected output on happy path
[ ] Edge case handled (empty, null, boundary)
[ ] Original bug/issue no longer reproduces
[ ] No new errors introduced (check logs)
[ ] Build passes (if applicable)
```

## NEVER

- Say "done" without running the code
- Assume it works because it looks right
- Skip verification because you're confident
- Claim a bug is fixed without reproducing it first
- Trust that generated code is correct
- Let pattern-matching substitute for proof

## ALWAYS

- Run new code at least once
- Test the specific thing that was broken
- Show verification output in your response
- Admit when you can't verify (and say why)
- Treat verification as required, not optional

## Examples

### Example 1: Bad (No Verification)

```
User: "Fix the divide by zero error"

*edits code to add check*

Done! Added a check for zero before dividing. The function now returns None
if the divisor is zero.
```

This is a lie. I didn't verify anything.

### Example 1: Good (With Verification)

```
User: "Fix the divide by zero error"

*edits code to add check*

Let me verify the fix:

$ python -c "from math_utils import safe_divide; print(safe_divide(10, 0))"
None

$ python -c "from math_utils import safe_divide; print(safe_divide(10, 2))"
5.0

Verified:
- Division by zero returns None (was: crashed)
- Normal division still works
- Fix confirmed.
```

### Example 2: Admitting Inability To Verify

```
User: "Update the production config"

I've updated config/production.json with the new API endpoint.

Verification status: UNABLE TO VERIFY
- This is production config
- I can't safely test against production
- Recommend: Test in staging first, or have ops verify after deploy

Change is made but unverified. Proceed with caution.
```
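Even when end-to-end verification is off the table, verify what you safely can. Below is a minimal sketch of that idea; the file path and the `api_endpoint` key are assumptions taken from the example above, not a real project layout. It at least proves the edited file still parses and contains the field that was changed:

```python
# Partial verification sketch -- the path and key name are assumptions from the example above.
import json

with open("config/production.json") as f:
    config = json.load(f)  # proves the edited file is still valid JSON

assert "api_endpoint" in config, "api_endpoint key is missing"
print("config parses; api_endpoint =", config["api_endpoint"])
```

That isn't full verification, but it is evidence, and it belongs in the report alongside the "UNABLE TO VERIFY" caveat.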
### Example 3: Build Verification

```
User: "Add TypeScript types to the utils"

*adds types*

Verifying:

$ npm run typecheck
src/utils.ts - 0 errors

$ npm run build
Build successful - dist/utils.js generated

Verified: Types compile without errors, build passes.
```

What DOESN'T work:

- **"I'll verify later"**: Later never comes. Verify immediately or it won't happen.
- **"It looks right"**: Pattern-matching is not verification. Code that looks perfect can be completely wrong.
- **"The tests will catch it"**: Tests you haven't run don't catch anything. Run them.
- **"It's a small change"**: Small changes cause big breaks. One-character typos crash production.
- **Verifying only the happy path**: The bug is in the edge case you didn't test.
- **Reading the code as verification**: Reading is not running. Execute it.
- **"It worked before"**: Before is not now. Dependencies change. Environments differ.
- **Trusting the type checker**: Code that type-checks can still be logically wrong.

## The Hard Truth

When I say "Done!" without verifying, I'm not reporting a fact. I'm making a prediction based on pattern-matching. Sometimes that prediction is wrong.

Verification converts prediction into observation. It's the difference between "this should work" and "this works." One is a guess. One is proof.

Prove it.