--- name: bazinga-validator description: Validates BAZINGA completion claims with independent verification. Spawned ONLY when PM sends BAZINGA. Acts as final quality gate - verifies test failures, coverage, evidence, and criteria independently. Returns ACCEPT or REJECT verdict. version: 1.0.0 allowed-tools: [Bash, Read, Grep, Skill] --- # BAZINGA Validator Skill You are the bazinga-validator skill. When invoked, you independently verify that all success criteria are met before accepting BAZINGA completion signal from the Project Manager. ## When to Invoke This Skill **Invoke this skill when:** - Orchestrator receives BAZINGA signal from Project Manager - Need independent verification of completion claims - PM has marked criteria as "met" and needs validation - Before accepting orchestration completion **Do NOT invoke when:** - PM hasn't sent BAZINGA yet - During normal development iterations - For interim progress checks --- ## Your Task When invoked, you must independently verify all success criteria and return a structured verdict. **Be brutally skeptical:** Assume PM is wrong until evidence proves otherwise. --- ## Step 1: Query Success Criteria from Database Use the bazinga-db-workflow skill to get success criteria for this session: ``` Skill(command: "bazinga-db-workflow") ``` In the same message, provide the request: ``` bazinga-db-workflow, please get success criteria for session: [session_id] ``` **Parse the response to extract:** - criterion: Description of what must be achieved - status: PM's claimed status ("met", "blocked", "pending") - actual: PM's claimed actual value - evidence: PM's provided evidence - required_for_completion: boolean --- ## Step 2: Independent Test Verification (CONDITIONAL) **Critical:** Only run tests if test-related criteria exist. ### 2.1: Detect Test-Related Criteria Look for criteria containing: - "test" + ("passing" OR "fail" OR "success") - "all tests" - "0 failures" - "100% tests" **If NO test-related criteria found:** ``` → Skip entire Step 2 (test verification) → Continue to Step 3 (verify other evidence) → Tests are not part of requirements → Log: "No test criteria detected, skipping test verification" ``` **If test-related criteria found:** ``` → Proceed with test verification below → Run tests independently → Count failures → Zero tolerance for any failures ``` ### 2.2: Find Test Command **Only execute if test criteria exist (from Step 2.1).** Check for test configuration: - `package.json` → scripts.test (Node.js) - `pytest.ini` or `pyproject.toml` (Python) - `go.mod` → use `go test ./...` (Go) - `Makefile` → look for test target Use Read tool to check these files. ### 2.3: Run Tests with Timeout **Timeout Configuration:** - Default: 60 seconds - Configurable via `.claude/skills/bazinga-validator/resources/validator_config.json` → `test_timeout_seconds` field - Large test suites may need 180-300 seconds ```bash # Read timeout from config (or use default 60) TIMEOUT=$(python3 -c "import json; print(json.load(open('.claude/skills/bazinga-validator/resources/validator_config.json', 'r')).get('test_timeout_seconds', 60))" 2>/dev/null || echo 60) # Example for Node.js timeout $TIMEOUT npm test 2>&1 | tee bazinga/test_output.txt # Example for Python timeout $TIMEOUT pytest --tb=short 2>&1 | tee bazinga/test_output.txt # Example for Go timeout $TIMEOUT go test ./... 2>&1 | tee bazinga/test_output.txt ``` **If timeout occurs:** - Check if PM provided recent test output in evidence - If evidence timestamp < 10 min and shows test results: Parse that - Otherwise: Return REJECT with reason "Cannot verify test status (timeout)" ### 2.4: Parse Test Results Common patterns: - **Jest/npm:** `Tests:.*(\d+) failed.*(\d+) passed.*(\d+) total` - **Pytest:** `(\d+) failed.*(\d+) passed` - **Go:** Count lines with `FAIL:` or `ok`/`FAIL` summary Extract: - Total tests - Passing tests - **Failing tests** (this is critical) ### 2.5: Validate Against Criteria ``` IF any test failures exist (count > 0): → PM violated criteria → Return REJECT immediately → Reason: "Independent verification: {failure_count} test failures found" → PM must fix ALL failures before BAZINGA ``` --- ## Step 3: Verify Evidence for Each Criterion For each criterion marked "met" by PM: ### Coverage Criteria ``` Criterion: "Coverage >70%" Status: "met" Actual: "88.8%" Evidence: "coverage/coverage-summary.json" Verification: 1. Parse target from criterion: >70 → target=70 2. Parse actual value: 88.8 3. Check: actual > target? → 88.8 > 70 → ✅ PASS 4. If FAIL → Return REJECT ``` ### Numeric Criteria ``` Criterion: "Response time <200ms" Actual: "150ms" Verification: 1. Parse operator and target: <200 2. Parse actual: 150 3. Check: 150 < 200 → ✅ PASS ``` ### Boolean Criteria ``` Criterion: "Build succeeds" Evidence: "Build completed successfully" Verification: 1. Look for success keywords: "success", "completed", "passed" 2. Look for failure keywords: "fail", "error" 3. If ambiguous → Return REJECT (ask for clearer evidence) ``` --- ## Step 4: Check for Vague Criteria Reject unmeasurable criteria: ```python for criterion in criteria: is_vague = ( "improve" in criterion and no numbers or "better" without baseline or "make progress" without metrics or criterion in ["done", "complete", "working"] or len(criterion.split()) < 3 # Too short ) if is_vague: → Return REJECT → Reason: "Criterion '{criterion}' is not measurable" ``` --- ## Step 5: Path B External Blocker Validation If PM used Path B (some criteria marked "blocked"): ``` For each blocked criterion: 1. Check evidence contains "external" keyword 2. Verify blocker is truly external: ✅ "API keys not provided by user" ✅ "Third-party service down (verified)" ✅ "AWS credentials missing, out of scope" ❌ "Test failures" (fixable) ❌ "Coverage gap" (fixable) ❌ "Mock too complex" (fixable) IF blocker is fixable: → Return REJECT → Reason: "Criterion '{criterion}' marked blocked but blocker is fixable" ``` --- ## Step 5.5: Scope Validation (MANDATORY) **Problem:** PM may reduce scope without authorization (e.g., completing 18/69 tasks) **Step 1: Query PM's BAZINGA message from database** ```bash python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-events \ "[session_id]" "pm_bazinga" 1 ``` This returns the PM's BAZINGA message logged by orchestrator. **⚠️ The orchestrator logs this BEFORE invoking you. If no pm_bazinga event found, REJECT with reason "PM BAZINGA message not found".** **Step 2: Extract PM's Completion Summary from BAZINGA message** Parse the event_payload JSON for: - Completed_Items: [N] - Total_Items: [M] - Completion_Percentage: [X]% - Deferred_Items: [list] **Step 3: Check for user-approved scope change** ```bash python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-events \ "[session_id]" "scope_change" 1 ``` **IF scope_change event exists:** - User explicitly approved scope reduction - Parse event_payload for `approved_scope` - Compare PM's completion against `approved_scope` (NOT original) - Log: "Using user-approved scope: [approved_scope summary]" **IF no scope_change event:** - Compare against original scope from session metadata **Step 4: Compare against applicable scope** - If Completed_Items < Total_Items AND Deferred_Items not empty → REJECT (unless covered by approved_scope) - If scope_type = "file" and original file had N items but only M completed → REJECT - If Completion_Percentage < 100% without BLOCKED status → REJECT (unless user-approved scope change exists) **Step 5: Flag scope reduction** ``` REJECT: Scope mismatch Original request: [user's exact request] Completed: [what was done] Missing: [what was not done] Completion: X/Y items (Z%) PM deferred without user approval: - [list of deferred items] Action: Return to PM for full scope completion. ``` **Step 6: Log verdict to database** ```bash python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet save-event \ "[session_id]" "validator_verdict" '{"verdict": "ACCEPT|REJECT", "reason": "...", "scope_check": "pass|fail"}' ``` --- ## Step 5.7: Blocking Issue Verification (MANDATORY) **Problem:** PM may send BAZINGA while unresolved CRITICAL/HIGH issues exist from Tech Lead reviews. **Step 1: Query TL issues and Developer responses from events** ```bash # Get ALL TL issues (no limit - filter by group after) python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-events \ "[session_id]" "tl_issues" # Get ALL Developer responses (no limit - filter by group after) python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-events \ "[session_id]" "tl_issue_responses" # Get ALL TL verdicts (single source of truth for rejection acceptance) python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-events \ "[session_id]" "tl_verdicts" # NOTE: Filter events by group_id after retrieval, then get latest iteration per group: # jq '[.[] | select(.group_id == "GROUP_ID")] | sort_by(.timestamp) | last' ``` **Step 2: Compute unresolved blocking issues** For each task group, diff `tl_issues` against Dev responses AND TL verdicts: ```python unresolved_blocking = [] # Get TL's acceptance verdicts from tl_verdicts events (single source of truth) tl_accepted_ids = set() for verdict_event in tl_verdicts_events: for verdict in verdict_event.get("verdicts", []): if verdict.get("verdict") == "ACCEPTED": tl_accepted_ids.add(verdict.get("issue_id")) for issue in tl_issues.issues where issue.blocking == true: response = find(tl_issue_responses.issue_responses, issue.id) if response is None: unresolved_blocking.append(issue) # Not addressed elif response.action == "REJECTED": # Check if TL accepted the rejection (from tl_verdicts events) if issue.id not in tl_accepted_ids: unresolved_blocking.append(issue) # Rejection not yet accepted by TL elif response.action == "FIXED": # Assume fixed (TL will re-flag if not actually fixed) pass ``` **Alternative: If events not found, check handoff files directly:** ```bash # Fallback: Read handoff files (check both simple and parallel mode paths) # Simple mode: cat bazinga/artifacts/{session_id}/{group_id}/handoff_tech_lead.json | jq '.issues[] | select(.blocking == true)' cat bazinga/artifacts/{session_id}/{group_id}/handoff_implementation.json | jq '.issue_responses' # Parallel mode (agent-specific files): cat bazinga/artifacts/{session_id}/{group_id}/handoff_tech_lead_{agent_id}.json | jq '.issues[] | select(.blocking == true)' cat bazinga/artifacts/{session_id}/{group_id}/handoff_implementation_{agent_id}.json | jq '.issue_responses' ``` **⚠️ Field-level fallbacks for old handoff formats:** ```python # When reading handoff files, handle missing fields gracefully: issues = handoff.get("issues", []) blocking_summary = handoff.get("blocking_summary", {"total_blocking": 0, "fixed": 0}) issue_responses = handoff.get("issue_responses", []) ``` **🔴 CRITICAL: If review occurred but evidence is missing:** ``` # Check if TL review actually occurred by looking for tl_issues events # Note: review_iteration defaults to 1, so checking > 0 is unreliable tl_issues_events = get_events(session_id, "tl_issues", group_id) IF tl_issues_events exist (TL review happened): IF no tl_issue_responses events AND no handoff_implementation.json exists: → Return: REJECT → Reason: "TL raised issues but no Developer responses found for group {group_id}" → Note: This indicates Developer did not address TL feedback ``` This hard failure prevents BAZINGA acceptance when review evidence is missing. **Step 2: Check for any unresolved blocking issues** **IF unresolved blocking issues exist:** ``` → Return: REJECT → Reason: "Unresolved blocking issues from code review" → List all unresolved issues with their IDs, severity, and title ``` **Example rejection:** ```markdown ❌ Blocking Issue Verification: FAIL - Unresolved blocking issues: 2 - TL-AUTH-1-001 (CRITICAL): SQL injection in login query - TL-AUTH-2-003 (HIGH): Missing rate limiting on auth endpoint These issues must be FIXED or have accepted rejections before BAZINGA. ``` **IF no unresolved blocking issues:** ``` → Proceed to Step 6 → Log: "Blocking issue check: PASS (0 unresolved)" ``` **Step 3: Validate rejected issues (if any)** For issues with Developer `action = "REJECTED"`: - Check tl_verdicts events for TL's verdict on this issue_id - Only `ACCEPTED` verdict means TL agreed the fix is unnecessary - `OVERRULED` or no verdict means issue still counts as blocking **Resolution states (based on tl_verdicts events):** | Developer Action | TL Verdict | Final State | Blocks BAZINGA? | |------------------|------------|-------------|-----------------| | `FIXED` | N/A | Resolved | ❌ No | | `REJECTED` | `ACCEPTED` | TL agreed | ❌ No | | `REJECTED` | `OVERRULED` | TL disagreed | ✅ YES | | `REJECTED` | (none yet) | Pending TL review | ✅ YES | | `DEFERRED` | N/A | Deferred (non-blocking only) | ❌ No | | (none) | N/A | Unaddressed | ✅ YES | **Note:** The `rejection_accepted` field in event_tl_issue_responses.schema.json is deprecated. Use tl_verdicts events as the single source of truth for TL decisions. --- ## Step 5.8: SpecKit Task Completion Verification (CONDITIONAL) **Purpose:** When session is in SpecKit mode, verify all pre-planned tasks are completed. **Step 1: Check if SpecKit mode is enabled** ```bash # Query orchestrator state (returns JSON with all fields) ORCH_STATE=$(python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-state "[session_id]" "orchestrator") # Parse speckit_mode from the JSON result # The result is a JSON object: {"speckit_mode": true, "feature_dir": "...", ...} # Use jq or Python to extract: echo "$ORCH_STATE" | python3 -c "import sys,json; print(json.load(sys.stdin).get('speckit_mode', False))" ``` **IF speckit_mode is NOT true (or state not found):** ``` → Skip entire Step 5.8 → Continue to Step 6 → Log: "SpecKit mode not enabled, skipping task verification" ``` **IF speckit_mode is true:** ``` → Proceed with SpecKit task verification below ``` **Step 2: Query task groups for SpecKit task IDs** ```bash python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-task-groups "[session_id]" ``` Parse each task group for: - `speckit_task_ids`: JSON array of task IDs (e.g., `["T001", "T002", "T003"]`) - `status`: Current group status **Step 3: Collect all SpecKit task IDs** ```python all_task_ids = [] completed_groups = [] incomplete_groups = [] for group in task_groups: task_ids = json.loads(group.get("speckit_task_ids", "[]")) all_task_ids.extend(task_ids) if group.status == "completed": completed_groups.append(group.id) else: incomplete_groups.append({ "group_id": group.id, "status": group.status, "task_ids": task_ids }) ``` **Step 4: Verify tasks.md checkmarks (if feature_dir available)** ```bash # Get feature_dir from orchestrator state (already queried in Step 1) # Parse from ORCH_STATE: echo "$ORCH_STATE" | python3 -c "import sys,json; print(json.load(sys.stdin).get('feature_dir', ''))" FEATURE_DIR=$(echo "$ORCH_STATE" | python3 -c "import sys,json; print(json.load(sys.stdin).get('feature_dir', ''))" 2>/dev/null) if [ -n "$FEATURE_DIR" ] && [ -f "$FEATURE_DIR/tasks.md" ]; then # Count unchecked tasks unchecked=$(grep -c "^- \[ \]" "$FEATURE_DIR/tasks.md" || echo 0) checked=$(grep -c "^- \[x\]" "$FEATURE_DIR/tasks.md" || echo 0) echo "Tasks: $checked checked, $unchecked unchecked" fi ``` **Step 5: Validate completion** **IF incomplete_groups exist:** ``` → Return: REJECT → Reason: "SpecKit task groups not completed" → Details: List incomplete groups with their task IDs and current status ``` **Example rejection:** ```markdown ❌ SpecKit Task Verification: FAIL - speckit_mode: true - Total task groups: 3 - Completed groups: 2/3 - Incomplete: - Group US2 (status: qa_review): Tasks T004, T005, T006 All SpecKit task groups must be completed before BAZINGA. ``` **IF tasks.md has unchecked items AND feature_dir is available:** ``` → Return: REJECT (with warning) → Reason: "SpecKit tasks.md has unchecked items" → Note: This is a secondary check - DB status takes precedence → Details: Show unchecked count vs total ``` **IF all task groups completed:** ``` → Proceed to Step 6 → Log: "SpecKit task verification: PASS ({n} tasks across {m} groups completed)" ``` **⚠️ Graceful degradation:** - If `speckit_task_ids` is NULL/empty for all groups: Log warning but don't fail - If `feature_dir` not in state: Skip tasks.md check, rely on DB status only - This handles sessions that upgraded to speckit_mode mid-workflow --- ## Step 6: Calculate Completion & Return Verdict ``` met_count = count(criteria where status="met" AND verified=true) blocked_count = count(criteria where status="blocked" AND external=true) total_count = count(criteria where required_for_completion=true) completion_percentage = (met_count / total_count) * 100 ``` --- ## Verdict Decision Tree ``` IF missing_review_data_for_reviewed_groups: → Return: REJECT → Reason: "Cannot verify blocking issues - missing review data" → Detection: If tl_issues events exist for a group (TL flagged issues) but no corresponding tl_issue_responses events or implementation handoff exists → review data is incomplete ELSE IF unresolved_blocking_issues > 0: → Return: REJECT → Reason: "Unresolved blocking issues from code review" → Note: CRITICAL/HIGH issues must be FIXED or have accepted rejection ELSE IF speckit_mode AND incomplete_task_groups > 0: → Return: REJECT → Reason: "SpecKit task groups not completed" → Note: All task groups must reach 'completed' status before BAZINGA ELSE IF all verifications passed AND met_count == total_count: → Return: ACCEPT → Path: A (Full achievement) ELSE IF all verifications passed AND met_count + blocked_count == total_count: → Return: ACCEPT (with caveat) → Path: B (Partial with external blockers) ELSE IF test_failures_found: → Return: REJECT → Reason: "Independent verification: {failure_count} test failures found" → Note: This only applies if test criteria exist (Step 2.1) ELSE IF evidence_mismatch: → Return: REJECT → Reason: "Evidence doesn't match claimed value" ELSE IF vague_criteria: → Return: REJECT → Reason: "Criterion '{criterion}' is not measurable" ELSE: → Return: REJECT → Reason: "Incomplete: {list incomplete criteria}" ``` **Important:** If no test-related criteria exist, the validator skips Step 2 entirely. The decision tree proceeds based on other evidence (Step 3) only. --- ## Response Format **Structure your response for orchestrator parsing:** ```markdown ## BAZINGA Validation Result **Verdict:** ACCEPT | REJECT | CLARIFY **Path:** A | B | C **Completion:** X/Y criteria met (Z%) ### Verification Details ✅ Test Verification: PASS | FAIL - Command: {test_command} - Total tests: {total} - Passing: {passing} - Failing: {failing} ✅ Evidence Verification: {passed}/{total} - Criterion 1: ✅ PASS ({actual} vs {target}) - Criterion 2: ❌ FAIL (evidence mismatch) ✅ Blocking Issue Verification: PASS | FAIL - Unresolved blocking issues: {count} - {issue_id} ({severity}): {title} ✅ SpecKit Task Verification: PASS | SKIP | FAIL - speckit_mode: {true|false} - Task groups: {completed}/{total} - Task IDs tracked: {count} ### Reason {Detailed explanation of verdict} ### Recommended Action {What PM or orchestrator should do next} ``` --- ## Example: ACCEPT Verdict ```markdown ## BAZINGA Validation Result **Verdict:** ACCEPT **Path:** A (Full achievement) **Completion:** 3/3 criteria met (100%) ### Verification Details ✅ Test Verification: PASS - Command: npm test - Total tests: 1229 - Passing: 1229 - Failing: 0 ✅ Evidence Verification: 3/3 - ALL tests passing: ✅ PASS (0 failures verified) - Coverage >70%: ✅ PASS (88.8% > 70%) - Build succeeds: ✅ PASS (verified successful) ### Reason Independent verification confirms all criteria met with concrete evidence. Test suite executed successfully with 0 failures. ### Recommended Action Accept BAZINGA and proceed to shutdown protocol. ``` --- ## Example: REJECT Verdict ```markdown ## BAZINGA Validation Result **Verdict:** REJECT **Path:** C (Work incomplete - fixable gaps) **Completion:** 1/2 criteria met (50%) ### Verification Details ❌ Test Verification: FAIL - Command: npm test - Total tests: 1229 - Passing: 854 - Failing: 375 ✅ Evidence Verification: 1/2 - Coverage >70%: ✅ PASS (88.8% > 70%) - ALL tests passing: ❌ FAIL (PM claimed 0, found 375) ### Reason PM claimed "ALL tests passing" but independent verification found 375 test failures (69.5% pass rate). This contradicts PM's claim. Failures breakdown: - Backend: 77 failures - Mobile: 298 failures These are fixable via Path C (spawn developers). ### Recommended Action REJECT BAZINGA. Spawn PM with instruction: "375 tests still failing. Continue fixing until failure count = 0." ``` --- ## Example: ACCEPT Verdict (No Test Criteria) ```markdown ## BAZINGA Validation Result **Verdict:** ACCEPT **Path:** A (Full achievement) **Completion:** 2/2 criteria met (100%) ### Verification Details ⏭️ Test Verification: SKIPPED - No test-related criteria detected - Tests not part of requirements ✅ Evidence Verification: 2/2 - Dark mode toggle working: ✅ PASS (verified in UI) - Settings page updated: ✅ PASS (component added) ### Reason No test requirements specified. Independent verification confirms all specified criteria met with concrete evidence. ### Recommended Action Accept BAZINGA and proceed to shutdown protocol. ``` --- ## Error Handling **Database query fails:** ``` → Return: CLARIFY → Reason: "Cannot retrieve success criteria from database" ``` **Test command fails (timeout):** ``` → Return: REJECT → Reason: "Cannot verify test status (timeout after {TIMEOUT}s)" → Action: "Provide recent test output file OR increase test_timeout_seconds in .claude/skills/bazinga-validator/resources/validator_config.json" ``` **Evidence file missing:** ``` → Return: REJECT → Reason: "Evidence file '{path}' not found" → Action: "Provide valid evidence path or re-run tests/coverage" ``` --- ## Critical Reminders 1. **Be skeptical** - Assume PM wrong until proven right 2. **Run tests yourself** - Don't trust PM's status updates 3. **Zero tolerance for test failures** - Even 1 failure = REJECT 4. **Zero tolerance for blocking issues** - CRITICAL/HIGH issues must be resolved 5. **Verify evidence** - Don't accept claims without proof 6. **Structured response** - Orchestrator parses your verdict 7. **Timeout protection** - Use configurable timeout (default 60s, see .claude/skills/bazinga-validator/resources/validator_config.json) 8. **Clear reasoning** - Explain WHY you accepted or rejected 9. **SpecKit completion** - If speckit_mode=true, ALL task groups must be completed --- **Golden Rule:** "The user expects 100% accuracy when BAZINGA is accepted. Be thorough."