--- name: skills-eval description: Evaluate and improve Claude skill quality through auditing. Use when reviewing skill quality, preparing skills for production, or auditing existing skills. Do not use when creating new skills (use modular-skills) or writing prose (use writing-clearly-and-concisely). Use this skill before shipping any skill to production. category: skill-management tags: - evaluation - improvement - skills - optimization - quality-assurance - tool-use - performance-metrics dependencies: - modular-skills - performance-optimization tools: - skills-auditor - improvement-suggester - compliance-checker - tool-performance-analyzer - token-usage-tracker provides: infrastructure: - evaluation-framework - quality-assurance - improvement-planning patterns: - skill-analysis - token-optimization - modular-design sdk_features: - agent-sdk-compatibility - advanced-metrics - dynamic-discovery estimated_tokens: 1800 usage_patterns: - skill-audit - quality-assessment - improvement-planning - skills-inventory - tool-performance-evaluation - dynamic-discovery-optimization - advanced-tool-use-analysis - programmatic-calling-efficiency - context-preservation-quality - token-efficiency-optimization - modular-architecture-validation - integration-testing - compliance-reporting - performance-benchmarking complexity: advanced evaluation_criteria: structure_compliance: 25 metadata_quality: 20 token_efficiency: 25 tool_integration: 20 claude_sdk_compliance: 10 --- # Skills Evaluation and Improvement ## Table of Contents 1. [Overview](#overview) 2. [Quick Start](#quick-start) 3. [Evaluation Workflow](#evaluation-workflow) 4. [Evaluation and Optimization](#evaluation-and-optimization) 5. [Resources](#resources) ## Overview This framework audits Claude skills against quality standards to improve performance and reduce token consumption. Automated tools analyze skill structure, measure context usage, and identify specific technical improvements. Run verification commands after each audit to confirm fixes work correctly. The `skills-auditor` provides structural analysis, while the `improvement-suggester` ranks fixes by impact. Compliance is verified through the `compliance-checker`. Runtime efficiency is monitored by `tool-performance-analyzer` and `token-usage-tracker`. ## Quick Start ### Basic Audit Run a full audit of all skills or target a specific file to identify structural issues. ```bash # Audit all skills make audit-all # Audit specific skill make audit-skill TARGET=path/to/skill/SKILL.md ``` ### Analysis and Optimization Use `skill_analyzer.py` for complexity checks and `token_estimator.py` to verify the context budget. ```bash make analyze-skill TARGET=path/to/skill/SKILL.md make estimate-tokens TARGET=path/to/skill/SKILL.md ``` ### Improvements Generate a prioritized plan and verify standards compliance using `improvement_suggester.py` and `compliance_checker.py`. ```bash make improve-skill TARGET=path/to/skill/SKILL.md make check-compliance TARGET=path/to/skill/SKILL.md ``` ## Evaluation Workflow Start with `make audit-all` to inventory skills and identify high-priority targets. For each skill requiring attention, run analysis with `analyze-skill` to map complexity. Generate an improvement plan, apply fixes, and run `check-compliance` to verify the skill meets project standards. Finalize by checking the token budget for efficiency. ## Evaluation and Optimization Quality assessments use the `skills-auditor` and `improvement-suggester` to generate detailed reports. Performance analysis focuses on token efficiency through the `token-usage-tracker` and tool performance via `tool-performance-analyzer`. For standards compliance, the `compliance-checker` automates common fixes for structural issues. ### Scoring and Prioritization We evaluate skills across five dimensions: structure compliance, content quality, token efficiency, activation reliability, and tool integration. Scores above 90 represent production-ready skills, while scores below 50 indicate critical issues requiring immediate attention. Improvements are prioritized by impact. Critical issues include security vulnerabilities or broken functionality. High-priority items cover structural flaws that hinder discoverability. Medium and low priorities focus on best practices and minor optimizations. ### Structural Patterns **Deprecated**: `skills/shared/modules/` directories. Shared modules must be relocated into the consuming skill's own `modules/` directory. The evaluator flags any remaining `skills/shared/` as a structural warning. **Current**: Each skill owns its modules at `skills//modules/`. Cross-skill references use relative paths (e.g., `../skill-authoring/modules/anti-rationalization.md`). ## Resources ### Shared Modules: Cross-Skill Patterns - **Anti-Rationalization Patterns**: See [anti-rationalization.md](../skill-authoring/modules/anti-rationalization.md) - **Enforcement Language**: See [enforcement-language.md](../shared-patterns/modules/workflow-patterns.md) - **Trigger Patterns**: See [trigger-patterns.md](modules/evaluation-criteria.md) ### Skill-Specific Modules - **Trigger Isolation Analysis**: See `modules/trigger-isolation-analysis.md` - **Skill Authoring Best Practices**: See `modules/skill-authoring-best-practices.md` - **Authoring Checklist**: See `modules/authoring-checklist.md` - **Evaluation Workflows**: See `modules/evaluation-workflows.md` - **Quality Metrics**: See `modules/quality-metrics.md` - **Advanced Tool Use Analysis**: See `modules/advanced-tool-use-analysis.md` - **Evaluation Framework**: See `modules/evaluation-framework.md` - **Integration Patterns**: See `modules/integration.md` - **Troubleshooting**: See `modules/troubleshooting.md` - **Pressure Testing**: See `modules/pressure-testing.md` - **Integration Testing**: See `modules/integration-testing.md` - **Multi-Metric Evaluation**: See `modules/multi-metric-evaluation-methodology.md` - **Performance Benchmarking**: See `modules/performance-benchmarking.md` ### Tools and Automation - **Tools**: Executable analysis utilities in `scripts/` directory. - **Automation**: Setup and validation scripts in `scripts/automation/`.