--- name: performance-metrics description: Log task completion data to metrics/. Use at the end of every task to record tokens, cost, agents used, rework cycles, and hallucination events. Also use for periodic reporting to identify efficiency and quality trends. role: worker user-invocable: true --- # Performance Metrics ## Overview Schema and procedures for capturing performance data in `metrics/`. Metrics enable evidence-based evaluation of agent effectiveness, cost efficiency, and quality outcomes. ## Constraints - Never log credentials, API keys, or PII in metric entries - Log entries are append-only; do not modify or delete existing JSONL records - Log at task completion, not mid-task; mid-task state belongs in `memory/` progress files - Use the defined JSONL schema; do not invent new top-level fields without updating the reference ## Metric Categories ### Efficiency Metrics | Metric | Description | Target | | --- | --- | --- | | Task completion time | Wall-clock time from request to delivery | Track trend, no fixed target | | Token usage per task | Total input + output tokens consumed | Minimize for comparable quality | | Agent loading overhead | Tokens spent on agent/skill file reads | < 5% of total task tokens | | Context summarization frequency | How often summarization triggers per task | < 2 per standard task | ### Quality Metrics | Metric | Description | Target | | --- | --- | --- | | First-pass acceptance rate | Tasks accepted without rework | > 80% | | Rework count | Number of revision cycles per task | < 2 | | Hallucination incidents | Outputs containing fabricated information | < 5% of tasks | | Accuracy score | Correctness of structured data extraction | > 95% | | Test coverage | Percentage of code covered by generated tests | Track per project | ### Cost Metrics | Metric | Description | Target | | --- | --- | --- | | Cost per task | Total API cost (input + output tokens at rate) | Track trend | | LLM routing ratio | Percentage of tasks routed to each LLM | Track distribution | | Selective loading savings | Tokens saved vs. loading all agents | > 50% reduction | ## Log Format Metrics are stored in `metrics/` as JSONL files (one JSON object per line). ### File Naming ``` metrics/{date}-task-log.jsonl ``` Example: `metrics/2026-02-20-task-log.jsonl` ### Task Completion Entry Logged at the end of each task: ```json { "timestamp": "2026-02-20T14:30:00Z", "task_id": "unique-id", "task_type": "implementation", "task_description": "Build REST API for user authentication", "agents_used": ["software-engineer", "architect"], "skills_used": ["hexagonal-architecture"], "tokens": { "input": 12500, "output": 3200, "total": 15700 }, "cost_usd": 0.043, "llm": "claude-opus-4-6", "context_summarizations": 0, "phases": 2, "rework_cycles": 1, "accepted": true, "hallucination_detected": false, "duration_seconds": 180 } ``` ### Field Reference | Field | Type | Description | | --- | --- | --- | | `timestamp` | string | ISO 8601 completion time | | `task_id` | string | Unique identifier for this task | | `task_type` | string | `implementation`, `design`, `bugfix`, `testing`, `documentation`, `analysis` | | `task_description` | string | Brief description of the task | | `agents_used` | string[] | Agent names that were loaded | | `skills_used` | string[] | Skill names that were loaded | | `tokens.input` | number | Input tokens from API usage field | | `tokens.output` | number | Output tokens from API usage field | | `tokens.total` | number | Sum of input + output | | `cost_usd` | number | Estimated cost based on token rates | | `llm` | string | Model ID used | | `context_summarizations` | number | Times summarization was triggered | | `phases` | number | Number of loading phases | | `rework_cycles` | number | Number of revision cycles | | `accepted` | boolean | Whether the user accepted the output | | `hallucination_detected` | boolean | Whether a hallucination was flagged | | `duration_seconds` | number | Wall-clock seconds from start to delivery | ## When to Log | Event | Action | | --- | --- | | Task completed | Log full task completion entry | | Configuration change | Log in `metrics/config-changelog.jsonl` (see Feedback & Learning skill) | | Hallucination detected | Flag in task entry + log separately if correction applied | | Context summarization triggered | Increment counter in current task entry | ## Output JSONL log entries written to `metrics/` and/or a summary report of metric trends. Be concise — report anomalies and trend signals; omit entries within normal range. ## Reporting Periodically review metrics to identify patterns: 1. **Weekly**: Review task completion entries for rework trends and hallucination rate 2. **Monthly**: Aggregate cost metrics and LLM routing distribution 3. **Per-project**: Compare first-pass acceptance rate across task types Summaries can be written to `metrics/reports/` for historical reference.