--- name: strict-enforcement description: Use when running claudikins-kernel:verify, checking implementation quality, deciding pass/fail verdicts, or enforcing cross-command gates — requires actual evidence of code working, not just passing tests allowed-tools: - Read - Grep - Glob - Bash - WebFetch - Skill - mcp__plugin_claudikins-tool-executor_tool-executor__search_tools - mcp__plugin_claudikins-tool-executor_tool-executor__get_tool_schema - mcp__plugin_claudikins-tool-executor_tool-executor__execute_code --- # Strict Enforcement Verification Methodology ## When to use this skill Use this skill when you need to: - Run the `claudikins-kernel:verify` command - Validate implementation before shipping - Decide pass/fail verdicts - Check code integrity after changes - Enforce cross-command gates ## Core Philosophy > "Evidence before assertions. Always." - Verification philosophy Never claim code works without seeing it work. Tests passing is not enough. Claude must SEE the output. ### The Three Laws 1. **See it working** - Screenshots, curl responses, CLI output. Actual evidence. 2. **Human checkpoint** - No auto-shipping. Human reviews evidence and decides. 3. **Exit code 2 gates** - Verification failures block claudikins-kernel:ship. No exceptions. ## Verification Phases ### Phase 1: Automated Quality Checks Run the automated checks first. Fast feedback. | Check | Command Pattern | What It Catches | | ----- | ------------------------------------ | ----------------------------------- | | Tests | `npm test` / `pytest` / `cargo test` | Logic errors, regressions | | Lint | `npm run lint` / `ruff` / `clippy` | Style issues, common bugs | | Types | `tsc` / `mypy` / `cargo check` | Type mismatches, interface drift | | Build | `npm run build` / `cargo build` | Compilation errors, bundling issues | **Flaky Test Detection (C-12):** ``` Test fails? ├── Re-run failed tests ├── Pass 2nd time? │ └── Yes → STOP: [Accept flakiness] [Fix tests] [Abort] └── Fail 2nd time? ├── Run isolated └── Still fail? → STOP: [Fix] [Skip] [Abort] ``` ### Phase 2: Output Verification (catastrophiser) This is the feedback loop that makes Claude's code actually work. | Project Type | Verification Method | Evidence | | ------------ | ------------------------------------ | ------------------------------ | | Web app | Start server, screenshot, test flows | Screenshots, console logs | | API | Curl endpoints, check responses | Status codes, response bodies | | CLI | Run commands, capture output | stdout, stderr, exit codes | | Library | Run examples, check results | Output values, test coverage | | Service | Check logs, verify health endpoint | Log patterns, health responses | **Fallback Hierarchy (A-3):** If primary method unavailable, fall back: 1. Start server + screenshot (preferred for web) 2. Curl endpoints (preferred for API) 3. Run CLI commands (preferred for CLI) 4. Run tests only (fallback) 5. Code review only (last resort) **Timeout:** 30 seconds per verification method (CMD-30). ### Phase 3: Code Simplification (Optional) After verification passes, optionally run cynic for polish. **Prerequisites:** - Phase 2 (catastrophiser) must PASS - Human approves: "Run cynic for polish pass?" **cynic Rules:** - Preserve exact behaviour (tests MUST still pass) - Remove unnecessary abstraction - Improve naming clarity - Delete dead code - Flatten nested conditionals **If tests fail after simplification:** - Log failure reasons - Show human - Proceed anyway (A-5) with caveat See [cynic-rollback.md](references/cynic-rollback.md) for recovery patterns. ### Phase 4: Klaus Escalation If stuck during verification: ``` Is mcp__claudikins-klaus available? (E-16) ├── No → │ Offer: [Manual review] [Ask Claude differently] (E-17) │ Fallback: [Accept with uncertainty] [Max retries, abort] (E-18) └── Yes → Spawn klaus via SubagentStop hook ``` ### Phase 5: Human Checkpoint The final gate. Present comprehensive evidence. ``` Verification Report ------------------- Tests: ✓ 47/47 passed Lint: ✓ 0 issues Types: ✓ 0 errors Build: ✓ success Evidence: - Screenshot: .claude/evidence/login-flow.png - API test: POST /api/auth → 200 OK - CLI test: mycli --help → exit 0 [Ready to Ship] [Needs Work] [Accept with Caveats] ``` Human decides. If approved, set `unlock_ship = true`. ## Rationalizations to Resist Agents under pressure find excuses. These are all violations: | Excuse | Reality | | ------------------------------------------ | --------------------------------------------------------------------- | | "Tests pass, that's good enough" | Tests aren't enough. SEE it working. Screenshots, curl, output. | | "I'll verify after shipping" | Verify BEFORE ship. That's the whole point. | | "The type checker caught everything" | Types don't catch runtime issues. Get evidence. | | "Screenshot failed but it probably works" | "Probably" isn't evidence. Fix the screenshot or use fallback. | | "Human checkpoint is just a formality" | Human checkpoint is the gate. No auto-shipping. | | "Code review is enough for this change" | Code review is last resort fallback. Try harder. | | "Tests are flaky, I'll ignore the failure" | Flaky tests hide real failures. Fix or explicitly accept with caveat. | | "Exit code 2 is too strict" | Exit code 2 exists to block bad ships. Pass properly. | **All of these mean: Get evidence. Human decides. No shortcuts.** ## Red Flags — STOP and Reassess If you're thinking any of these, you're about to violate the methodology: - "It should work because..." - "The tests pass so..." - "I'm confident that..." - "It worked before..." - "The types check so..." - "I'll just skip verification this once" - "Human will approve anyway" - "Evidence isn't necessary for this change" **All of these mean: STOP. Get evidence. Present to human. Let them decide.** ## Exit Code 2 Pattern (CRITICAL) The verify-gate.sh hook enforces the gate: ```bash # Both conditions MUST be true ALL_PASSED=$(jq -r '.all_checks_passed' "$STATE") HUMAN_APPROVED=$(jq -r '.human_checkpoint.decision' "$STATE") if [ "$ALL_PASSED" != "true" ]; then exit 2 # Blocks claudikins-kernel:ship fi if [ "$HUMAN_APPROVED" != "ready_to_ship" ]; then exit 2 # Blocks claudikins-kernel:ship fi ``` **File Manifest (C-6):** At verification completion, generate SHA256 hashes of all source files: ```bash find . \( -name '*.ts' -o -name '*.py' -o -name '*.rs' \) \ | xargs sha256sum > .claude/verify-manifest.txt ``` This lets claudikins-kernel:ship detect if code was modified after verification. ## Cross-Command Gate (C-14) claudikins-kernel:verify requires claudikins-kernel:execute to have completed: ```bash if [ ! -f "$EXECUTE_STATE" ]; then echo "ERROR: claudikins-kernel:execute has not been run" exit 2 fi ``` This enforces the claudikins-kernel:outline → claudikins-kernel:execute → claudikins-kernel:verify → claudikins-kernel:ship flow. ## Agent Integration | Agent | Role | When | | -------------- | ---------------- | ---------------------------------- | | catastrophiser | See code working | Phase 2: Output verification | | cynic | Polish pass | Phase 3: Simplification (optional) | Both agents run with `context: fork` and `background: true`. See [agent-integration.md](references/agent-integration.md) for coordination patterns. ## State Tracking ### verify-state.json ```json { "session_id": "verify-2026-01-16-1100", "execute_session_id": "execute-2026-01-16-1030", "branch": "execute/task-1-auth-middleware", "phases": { "test_suite": { "status": "PASS", "count": 47 }, "lint": { "status": "PASS", "issues": 0 }, "type_check": { "status": "PASS", "errors": 0 }, "output_verification": { "status": "PASS", "agent": "catastrophiser" }, "code_simplification": { "status": "PASS", "agent": "cynic" } }, "all_checks_passed": true, "human_checkpoint": { "decision": "ready_to_ship", "caveats": [] }, "unlock_ship": true, "verified_manifest": "sha256:...", "verified_commit_sha": "abc123..." } ``` ## Anti-Patterns **Don't do these:** - Trusting test results without seeing code run - Skipping output verification because "tests pass" - Auto-approving verification without human checkpoint - Modifying code after verification passes - Ignoring flaky test warnings - Proceeding when lint/type checks fail ## Edge Case Handling | Situation | Reference | | -------------------------- | ----------------------------------------------------------------------------- | | Tests hang or timeout | [test-timeout-handling.md](references/test-timeout-handling.md) | | Auto-fix breaks code | [lint-fix-validation.md](references/lint-fix-validation.md) | | Primary verification fails | [verification-method-fallback.md](references/verification-method-fallback.md) | | Type-check results unclear | [type-check-confidence.md](references/type-check-confidence.md) | | cynic breaks tests | [cynic-rollback.md](references/cynic-rollback.md) | | Large project state | [verify-state-compression.md](references/verify-state-compression.md) | ## References Full documentation in this skill's references/ folder: - [verification-checklist.md](references/verification-checklist.md) - Complete verification checklist - [red-flags.md](references/red-flags.md) - Common rationalisation patterns - [agent-integration.md](references/agent-integration.md) - How catastrophiser and cynic coordinate - [advanced-verification.md](references/advanced-verification.md) - Complex verification scenarios - [test-timeout-handling.md](references/test-timeout-handling.md) - When tests hang (S-13) - [lint-fix-validation.md](references/lint-fix-validation.md) - Validating auto-fix safety (S-14) - [verification-method-fallback.md](references/verification-method-fallback.md) - Fallback strategies (S-15) - [type-check-confidence.md](references/type-check-confidence.md) - Interpreting type results (S-16) - [cynic-rollback.md](references/cynic-rollback.md) - Rolling back failed simplifications (S-17) - [verify-state-compression.md](references/verify-state-compression.md) - State management for large projects (S-18)