--- name: agent-performance description: Track and report agent invocation metrics including usage counts, success/failure rates, and completion times. Use for understanding which agents are utilized, identifying underused agents, and optimizing agent delegation patterns. source_urls: - https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices --- # Agent Performance Dashboard ## Purpose Provides visibility into agent usage patterns to optimize delegation and identify improvement opportunities. ## When I Activate I automatically load when you mention: - "agent performance" or "agent metrics" - "agent dashboard" or "agent usage" - "which agents are used" or "underutilized agents" - "agent success rate" or "agent statistics" ## What I Do 1. **Track Invocations**: Record agent usage via workflow tracker 2. **Measure Success**: Track completion rates per agent 3. **Analyze Patterns**: Identify usage trends and gaps 4. **Generate Reports**: Create actionable dashboards ## Quick Start ``` User: "Show me agent performance metrics" Skill: *activates automatically* "Generating agent performance report..." ``` ## Core Capabilities ### 1. Report Generation Generate a performance report by reading workflow logs and aggregating agent metrics: ``` User: "Generate agent performance report" ``` Report includes: - Invocation counts per agent - Success/failure rates - Average completion times (when tracked) - Underutilized agents list - Recommendations for optimization ### 2. Live Tracking Track agent invocations during workflow execution using the existing `workflow_tracker`: ```python # Already available in .claude/tools/amplihack/hooks/workflow_tracker.py from workflow_tracker import log_agent_invocation log_agent_invocation( agent_name="architect", purpose="Design authentication module", step_number=2 ) ``` ### 3. Metrics Storage Metrics are stored in: - **Raw logs**: `.claude/runtime/logs/workflow_adherence/workflow_execution.jsonl` - **Aggregated**: `.claude/runtime/metrics/agent_performance.yaml` ## Report Format ### Summary Dashboard ```yaml # Agent Performance Summary # Generated: 2025-11-25 total_invocations: 142 agents: architect: invocations: 45 success_rate: 95.6% avg_duration_ms: 2340 trend: increasing builder: invocations: 38 success_rate: 89.5% avg_duration_ms: 4520 trend: stable reviewer: invocations: 25 success_rate: 100% avg_duration_ms: 1890 trend: increasing underutilized: - database (0 invocations in last 30 days) - integration (2 invocations in last 30 days) - patterns (3 invocations in last 30 days) recommendations: - Consider using database agent for schema work - Integration agent available for external service connections - Patterns agent can identify reusable solutions ``` ## Implementation Guide ### To Generate a Report 1. Read workflow execution logs: ``` Read: .claude/runtime/logs/workflow_adherence/workflow_execution.jsonl ``` 2. Filter for `agent_invoked` events: ```json { "event": "agent_invoked", "agent": "architect", "purpose": "...", "step": 2 } ``` 3. Aggregate by agent name: - Count invocations - Calculate success rates from workflow_end events - Compute average durations 4. Identify underutilized agents: - List all available agents from `.claude/agents/amplihack/` - Compare against invocation counts - Flag agents with <5 invocations in analysis period 5. Write report to: ``` .claude/runtime/metrics/agent_performance.yaml ``` ### Available Agents Inventory **Core Agents** (6): - architect, builder, reviewer, tester, optimizer, api-designer **Specialized Agents** (25): - ambiguity, amplifier-cli-architect, analyzer, azure-kubernetes-expert - ci-diagnostic-workflow, cleanup, database, documentation-writer - fallback-cascade, fix-agent, integration, knowledge-archaeologist - memory-manager, multi-agent-debate, n-version-validator, patterns - philosophy-guardian, pre-commit-diagnostic, preference-reviewer - prompt-writer, rust-programming-expert, security, visualization-architect - worktree-manager, xpia-defense **Note**: Agent count may change as specialized agents are added/removed. Use `ls .claude/agents/amplihack/specialized/` for current count. ## Tracking Best Practices ### When Invoking Agents Always log invocations for accurate tracking: ```python # Before invoking an agent via Task tool log_agent_invocation( agent_name="security", purpose="Audit authentication implementation", step_number=7 # Optional: link to workflow step ) # Then invoke the agent Task(subagent_type="security", prompt="...") ``` ### Workflow Integration The DEFAULT_WORKFLOW.md specifies agent delegation at each step. This skill helps verify adherence: - Step 1: prompt-writer - Step 2: architect - Step 3: builder - Step 4: tester - Step 5: reviewer - etc. ## Configuration | Setting | Default | Description | | ------------------------- | ------------------------ | -------------------------------------- | | `ANALYSIS_DAYS` | 30 | Days of history to analyze | | `UNDERUTILIZED_THRESHOLD` | 5 | Invocations below this = underutilized | | `METRICS_FILE` | `agent_performance.yaml` | Output file name | ## Philosophy Alignment This skill follows: - **Ruthless Simplicity**: Uses existing infrastructure (workflow_tracker) - **Zero-BS**: No placeholders, working aggregation logic - **Modular Design**: Self-contained skill, clear boundaries - **Emergence**: Insights emerge from simple tracking patterns ## Interpreting Metrics ### Success Rate Guidelines | Rate | Assessment | Action | | --------- | --------------- | ------------------------------------------ | | 95-100% | Excellent | Maintain current patterns | | 85-94% | Good | Review occasional failures for patterns | | 70-84% | Needs Attention | Investigate failure causes, adjust prompts | | Below 70% | Critical | Agent may need redesign or prompt overhaul | ### Invocation Volume Interpretation - **High volume (30+ in 30 days)**: Core workflow agent, ensure reliability - **Medium volume (10-29)**: Regular use, monitor for optimization opportunities - **Low volume (5-9)**: Specialized use case, verify still needed - **Very low (<5)**: Consider if agent is discoverable or relevant ### Duration Benchmarks - **< 2 seconds**: Fast execution, typical for simple analysis - **2-10 seconds**: Normal for moderate complexity - **10-60 seconds**: Expected for deep analysis or multi-step tasks - **> 60 seconds**: May indicate inefficiency, consider optimization ## Empty State Handling When no log data exists (new project or logs cleared): ```yaml # Agent Performance Report # Period: Last 30 days # Status: No data available summary: total_invocations: 0 message: "No agent invocations logged yet" getting_started: - "Agent tracking begins when workflow_tracker logs invocations" - "Ensure agents are invoked via Task tool with proper logging" - "First report available after initial workflow execution" next_steps: - "Run a workflow task to generate initial data" - "Verify workflow_tracker is properly configured" - "Check .claude/runtime/logs/ directory exists" ``` ## Limitations This skill has the following constraints: 1. **Depends on workflow_tracker**: Only tracks agents invoked through the logging system 2. **No real-time metrics**: Reports are generated on-demand, not streamed 3. **Historical data only**: Cannot predict future usage patterns 4. **Manual log analysis**: Does not auto-detect anomalies or alert on issues 5. **Single-project scope**: Metrics are per-project, no cross-project aggregation 6. **Time-based only**: No correlation with code quality or PR outcomes