---
name: task-quality-kpi
description: "Objective task quality evaluation framework using quantitative KPIs. KPIs are automatically calculated by a hook when task files are modified and saved to TASK-XXX--kpi.json. Use when: reading KPI data for task evaluation, understanding quality metrics, deciding whether to iterate or approve based on data."
allowed-tools: Read, Write
---

# Task Quality KPI Framework

## Overview

The **Task Quality KPI Framework** provides **objective, quantitative metrics** for evaluating task implementation quality.

**Key Architecture**: KPIs are **auto-generated by a hook**; you read the results rather than running scripts.

```
┌─────────────────────────────────────────────────────────────┐
│ HOOK (auto-executes)                                         │
│ Trigger: PostToolUse on TASK-*.md                            │
│ Script: task-kpi-analyzer.py                                 │
│ Output: TASK-XXX--kpi.json                                   │
├─────────────────────────────────────────────────────────────┤
│ SKILL / AGENT (reads output)                                 │
│ Input: TASK-XXX--kpi.json                                    │
│ Action: Make evaluation decisions                            │
└─────────────────────────────────────────────────────────────┘
```

### Why This Architecture?

| Problem | Solution |
|---------|----------|
| Skills can't execute scripts | Hook auto-runs on file save |
| Subjective review_status | Quantitative 0-10 scores |
| "Looks good to me" | Evidence-based evaluation |
| Binary pass/fail | Graduated quality levels |

## KPI File Location

After any task file modification, find KPI data at:

```
docs/specs/[ID]/tasks/TASK-XXX--kpi.json
```

## KPI Categories

```
┌─────────────────────────────────────────────────────────────┐
│ OVERALL SCORE (0-10)                                         │
├─────────────────────────────────────────────────────────────┤
│ Spec Compliance (30%)                                        │
│ ├── Acceptance Criteria Met (0-10)                           │
│ ├── Requirements Coverage (0-10)                             │
│ └── No Scope Creep (0-10)                                    │
├─────────────────────────────────────────────────────────────┤
│ Code Quality (25%)                                           │
│ ├── Static Analysis (0-10)                                   │
│ ├── Complexity (0-10)                                        │
│ └── Patterns Alignment (0-10)                                │
├─────────────────────────────────────────────────────────────┤
│ Test Coverage (25%)                                          │
│ ├── Unit Tests Present (0-10)                                │
│ ├── Test/Code Ratio (0-10)                                   │
│ └── Coverage Percentage (0-10)                               │
├─────────────────────────────────────────────────────────────┤
│ Contract Fulfillment (20%)                                   │
│ ├── Provides Verified (0-10)                                 │
│ └── Expects Satisfied (0-10)                                 │
└─────────────────────────────────────────────────────────────┘
```

### Category Weights

| Category | Weight | Why |
|----------|--------|-----|
| Spec Compliance | 30% | Most important - did we build what was asked? |
| Code Quality | 25% | Technical excellence |
| Test Coverage | 25% | Verification and confidence |
| Contract Fulfillment | 20% | Integration with other tasks |

## When to Use

- Reading KPI data for task quality evaluation
- Understanding quality metrics and scoring breakdown
- Deciding whether to iterate or approve based on quantitative data
- Integrating KPI checks into automated loops (`agents_loop.py`)
- Generating evidence-based evaluation reports

## Instructions

### 1. Reading KPI Data (Primary Use)

**DO NOT run scripts** - read the auto-generated file:

```markdown
Read the KPI file:
docs/specs/001-feature/tasks/TASK-001--kpi.json
```
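When the KPI data is consumed programmatically (for example from `agents_loop.py` rather than through the Read tool), the same file can be loaded and its weighted composition checked against the Category Weights table above. The following is a minimal sketch, assuming the JSON layout shown in the next section; it only reads the hook's output and recomputes the category weighting, it does not replace the hook:

```python
import json
from pathlib import Path

# Example path from this document; real paths follow
# docs/specs/[ID]/tasks/TASK-XXX--kpi.json
kpi_file = Path("docs/specs/001-feature/tasks/TASK-001--kpi.json")
kpi = json.loads(kpi_file.read_text())

# Each category contributes score * (weight / 100); the weights sum to 100,
# so the sum reproduces the overall_score reported by the hook.
recomputed = sum(c["score"] * c["weight"] / 100 for c in kpi["kpi_scores"])

print(f"Reported overall score: {kpi['overall_score']}/10")
print(f"Recomputed from categories: {recomputed:.1f}/10")
for category in kpi["kpi_scores"]:
    print(f"  {category['category']}: {category['score']} (weight {category['weight']}%)")
```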
### 2. Understanding the Data

The KPI file contains:

```json
{
  "task_id": "TASK-001",
  "evaluated_at": "2026-01-15T10:30:00Z",
  "overall_score": 8.2,
  "passed_threshold": true,
  "threshold": 7.5,
  "kpi_scores": [
    {
      "category": "Spec Compliance",
      "weight": 30,
      "score": 8.5,
      "weighted_score": 2.55,
      "metrics": {
        "acceptance_criteria_met": 9.0,
        "requirements_coverage": 8.0,
        "no_scope_creep": 8.5
      },
      "evidence": [
        "Acceptance criteria: 9/10 checked",
        "Requirements coverage: 8/10"
      ]
    }
  ],
  "recommendations": [
    "Code Quality: Moderate improvements possible"
  ],
  "summary": "Score: 8.2/10 - PASSED"
}
```

### 3. Making Decisions

Use `overall_score` and `passed_threshold`:

```
IF passed_threshold == true:
  → Task meets quality standards
  → Approve and proceed

IF passed_threshold == false:
  → Task needs improvement
  → Check recommendations for specific targets
  → Create fix specification
```

## Integration with Workflow

### In Task Review (evaluator-agent)

```markdown
## Review Process
1. Read KPI file: TASK-XXX--kpi.json
2. Extract overall_score and kpi_scores
3. Read task file to validate
4. Generate evaluation report
5. Decision based on passed_threshold
```

### In agents_loop

```python
# Check KPI file exists
kpi_path = spec_path / "tasks" / f"{task_id}--kpi.json"

if kpi_path.exists():
    kpi_data = json.loads(kpi_path.read_text())

    if kpi_data["passed_threshold"]:
        # Quality threshold met
        advance_state("update_done")
    else:
        # Need more work
        fix_targets = kpi_data["recommendations"]
        create_fix_task(fix_targets)
        advance_state("fix")
else:
    # KPI not generated yet - task may not be implemented
    log_warning("No KPI data found")
```

### Multi-Iteration Loop

Instead of a fixed maximum of 3 retries, iterate until the quality threshold is met:

```
Iteration 1: Score 6.2 → FAILED → Fix: Improve test coverage
Iteration 2: Score 7.1 → FAILED → Fix: Refactor complex functions
Iteration 3: Score 7.8 → PASSED → Proceed
```

Each iteration updates the KPI file automatically on task save.
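Expressed on the orchestrator side, the iteration pattern above is a small loop. The following is a minimal sketch, not the actual `agents_loop.py`: it assumes the JSON fields shown in section 2, takes the project-specific `implement` and `create_fix_task` steps as injected callables (both hypothetical here), and adds a safety cap so a stuck task cannot loop forever:

```python
import json
from pathlib import Path
from typing import Callable

def iterate_until_threshold(
    spec_path: Path,
    task_id: str,
    implement: Callable[[str], None],
    create_fix_task: Callable[[list[str]], None],
    max_iterations: int = 5,
) -> bool:
    """Re-run implement/fix cycles until the task's KPI threshold is met."""
    kpi_path = spec_path / "tasks" / f"{task_id}--kpi.json"

    for iteration in range(1, max_iterations + 1):
        implement(task_id)  # implement the task or apply the current fix targets
        if not kpi_path.exists():
            return False  # hook did not produce KPI data; nothing to evaluate

        kpi = json.loads(kpi_path.read_text())
        print(f"Iteration {iteration}: score {kpi['overall_score']}/10")

        if kpi["passed_threshold"]:
            return True  # quality threshold met - approve and proceed
        create_fix_task(kpi["recommendations"])  # target the specific weak categories

    return False  # safety cap reached without meeting the threshold
```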
## Threshold Guidelines

| Score | Quality Level | Action |
|-------|---------------|--------|
| 9.0-10.0 | Exceptional | Approve, document best practices |
| 8.0-8.9 | Good | Approve with minor notes |
| 7.0-7.9 | Acceptable | Approve only if ≥ configured threshold (e.g., 7.5) |
| 6.0-6.9 | Below Standard | Request specific improvements |
| < 6.0 | Poor | Significant rework required |

### Recommended Thresholds

| Project Type | Threshold | Rationale |
|--------------|-----------|-----------|
| Production MVP | 8.0 | High quality required |
| Internal Tool | 7.0 | Good enough |
| Prototype | 6.0 | Functional over perfect |
| Critical System | 8.5 | No compromises |

## Metric Details

### Spec Compliance Metrics

**Acceptance Criteria Met**
- Calculates: `(checked_criteria / total_criteria) * 10`
- Source: Task file checkbox count
- Example: 9/10 checked = 9.0

**Requirements Coverage**
- Calculates: Count of REQ-IDs this task covers
- Source: `traceability-matrix.md`
- Example: 4 requirements covered = 8.0

**No Scope Creep**
- Calculates: `(implemented_files / expected_files) * 10`
- Source: Task "Files to Create" vs actual files
- Penalizes: Missing files or unexpected additions

### Code Quality Metrics

**Static Analysis**
- Java: Maven Checkstyle
- TypeScript: ESLint
- Python: ruff
- Score: 10 if passes, 5 if issues found

**Complexity**
- Calculates: Functions >50 lines
- Score: `10 - (long_functions_ratio * 5)`
- Penalizes: Large, complex functions

**Patterns Alignment**
- Checks: Knowledge Graph patterns
- Source: `knowledge-graph.json`
- Validates: Implementation follows project patterns

### Test Coverage Metrics

**Unit Tests Present**
- Calculates: `min(10, test_files * 5)`
- 2 test files = maximum score
- Penalizes: Missing tests

**Test/Code Ratio**
- Calculates: `(test_count / code_count) * 10`
- 1:1 ratio = 10/10
- Ideal: At least 1 test file per code file

**Coverage Percentage**
- Source: Coverage reports (JaCoCo, lcov, etc.)
- Calculates: `coverage_percent / 10`
- 80% coverage = 8.0

### Contract Fulfillment Metrics

**Provides Verified**
- Checks: Files exist and export expected symbols
- Source: Task `provides` frontmatter
- Validates: Contract satisfied

**Expects Satisfied**
- Checks: Dependencies provide required files/symbols
- Source: Task `expects` frontmatter
- Validates: Prerequisites met

## When KPI File is Missing

If `TASK-XXX--kpi.json` doesn't exist:

1. **Task was never modified** - Hook runs on file save
2. **Hook failed** - Check Claude Code logs
3. **Task is new** - Save the file first to trigger hook

**DO NOT** try to calculate KPIs manually. The hook runs automatically when:

- Task file is saved (Write tool)
- Task file is edited (Edit tool)

## Best Practices

### 1. Always Check KPI File Exists

Before evaluating:

```markdown
Check if KPI file exists:
docs/specs/[ID]/tasks/TASK-XXX--kpi.json

If missing:
- Task may not be implemented yet
- Ask user to save the task file first
```

### 2. Trust the Metrics

The KPIs are objective. Only override with documented evidence:

- Critical security issue not in metrics
- Logic error not caught by static analysis
- Exceptional quality not measured

### 3. Iterate on Low KPIs

Target specific categories:

```
❌ "Fix code quality issues"

✅ "Improve Code Quality KPI from 5.2 to 7.0:
    - Complexity: Refactor processData() (5→8)
    - Patterns: Add error handling (6→8)"
```
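To phrase fix requests at the level of specific metrics (as in the ✅ example above), the low-scoring entries can be pulled straight out of the KPI file. A minimal sketch, assuming the `kpi_scores`/`metrics` layout from section 2 and an illustrative 7.0 cut-off (not a value mandated by the framework):

```python
import json
from pathlib import Path

def low_kpi_targets(kpi_file: Path, target: float = 7.0) -> list[str]:
    """Return 'category / metric: current -> target' strings for weak areas."""
    kpi = json.loads(kpi_file.read_text())
    targets = []
    for category in kpi["kpi_scores"]:
        if category["score"] >= target:
            continue  # category already at or above the target
        for metric, score in category["metrics"].items():
            if score < target:
                targets.append(f"{category['category']} / {metric}: {score} -> {target}")
    return targets
```

These strings can be dropped directly into a fix specification, so the next iteration targets concrete metrics rather than a vague "improve quality".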
### 4. Track KPI Trends

Monitor quality over time:

```
Sprint 1: Average KPI 6.8
Sprint 2: Average KPI 7.3 (+0.5)
Sprint 3: Average KPI 7.9 (+0.6)
```

## Troubleshooting

### KPI File Not Generated

**Check:**
1. Hook enabled in `hooks.json`
2. Task file name matches pattern `TASK-*.md`
3. File was actually saved (not just viewed)

### KPI Scores Seem Wrong

**Validate:**
1. Check evidence field for data sources
2. Verify files exist at expected paths
3. Some metrics need build tools (Maven, npm)

### Low Scores Despite Good Code

**Possible causes:**
- Missing test files
- No coverage report generated
- Acceptance criteria not checked
- Lint rules too strict

Fix the root cause, not just the score.

## Examples

### Example 1: Reading KPI Data

```markdown
Read the KPI file to evaluate task quality:
docs/specs/001-feature/tasks/TASK-042--kpi.json

Based on the data:
- Overall score: 6.8/10 (below threshold)
- Lowest KPI: Test Coverage (5.0/10)
- Recommendation: Add unit tests

Decision: REQUEST FIXES - target Test Coverage improvement
```

### Example 2: Iteration Decision

```markdown
Iteration 1 KPI: Score 6.2 → FAILED
- Spec Compliance: 7.0 ✓
- Code Quality: 5.5 ✗
- Test Coverage: 6.0 ✗

Fix targets:
1. Refactor complex functions (Code Quality)
2. Add test coverage (Test Coverage)

Iteration 2 KPI: Score 7.8 → PASSED ✓
```

### Example 3: agents_loop Integration

```python
# In agents_loop, after implementation step
kpi_file = spec_dir / "tasks" / f"{task_id}--kpi.json"

if kpi_file.exists():
    kpi = json.loads(kpi_file.read_text())

    if kpi["passed_threshold"]:
        print(f"✅ Task passed quality check: {kpi['overall_score']}/10")
        advance_state("update_done")
    else:
        print(f"❌ Task failed quality check: {kpi['overall_score']}/10")
        print("Recommendations:")
        for rec in kpi["recommendations"]:
            print(f" - {rec}")
        advance_state("fix")
```

## References

- `evaluator-agent.md` - Agent that uses KPI data for evaluation
- `hooks.json` - Hook configuration for auto-generation
- `task-kpi-analyzer.py` - Hook script (do not execute directly)
- `agents_loop.py` - Orchestrator that reads KPI for decisions
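As a complement to Best Practice 4 (Track KPI Trends), average scores can be computed directly from the generated KPI files. A minimal sketch, assuming the `docs/specs/[ID]/tasks/` layout described under KPI File Location; grouping tasks into sprints is project-specific and therefore omitted here:

```python
import json
from pathlib import Path
from statistics import mean

def average_kpi(specs_root: Path = Path("docs/specs")) -> float | None:
    """Average overall_score across all TASK-*--kpi.json files, or None if none exist."""
    scores = [
        json.loads(f.read_text())["overall_score"]
        for f in specs_root.glob("*/tasks/TASK-*--kpi.json")
    ]
    return mean(scores) if scores else None
```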