--- name: skill-validator description: | Validates skills against production-level criteria with 9-category scoring. This skill should be used when reviewing, auditing, or improving skills to ensure quality standards. Evaluates structure, content, user interaction, documentation, domain standards, technical robustness, maintainability, zero-shot implementation, and reusability. Returns actionable validation report with scores and improvement recommendations. --- # Skill Validator Validate any skill against production-level quality criteria. ## Before Implementation | Source | Gather | |--------|--------| | **Skill Directory** | SKILL.md, references/, scripts/, assets/ | | **Skill Type** | Builder, Guide, Automation, Analyzer, or Validator | | **Conversation** | Validation purpose (audit, improvement, review) | ## What This Skill Does NOT Do - Test skills in production environments - Automatically fix identified issues - Validate skill runtime behavior (only structure/content) - Replace human judgment on domain accuracy ## Validation Workflow ### Phase 1: Gather Context 1. **Read the skill's SKILL.md** completely 2. **Identify skill type** from frontmatter description: - Builder skill (creates artifacts) - Guide skill (provides instructions) - Automation skill (executes workflows) - Analyzer skill (extracts insights) - Validator skill (enforces quality) - Hybrid skill (combination of above) 3. **Read all reference files** in `references/` directory 4. **Check for assets/scripts** directories 5. **Note frontmatter fields** (`name`, `description`, `allowed-tools`, `model`) ### Phase 2: Apply Criteria Evaluate against **9 criteria categories**. Each criterion scores 0-3: - **0**: Missing/Absent - **1**: Present but inadequate - **2**: Adequate implementation - **3**: Excellent implementation --- ## Criteria Categories ### 1. Structure & Anatomy (Weight: 12%) | Criterion | What to Check | |-----------|---------------| | **SKILL.md exists** | Root file present | | **Line count** | <500 lines (context is precious) | | **Frontmatter complete** | `name` and `description` present in YAML | | **Name constraints** | 1-64 chars; lowercase alphanumeric + hyphens; no consecutive hyphens; can't start/end with hyphen; must match directory name | | **Description format** | [What] + [When] format; ≤1024 chars | | **Description style** | Third-person: "This skill should be used when..." | | **No extraneous files** | No README.md, CHANGELOG.md, LICENSE in skill dir | | **Progressive disclosure** | Details in `references/`, not bloated SKILL.md | | **Asset organization** | Templates in `assets/`, scripts in `scripts/` | | **Large file guidance** | If references >10k words, grep patterns in SKILL.md | **Fail condition**: Missing SKILL.md or >800 lines = automatic fail ### 2. Content Quality (Weight: 15%) | Criterion | What to Check | |-----------|---------------| | **Conciseness** | No verbose explanations, context is public good | | **Imperative form** | Instructions use "Do X" not "You should do X" | | **Appropriate freedom** | Constraints where needed, flexibility where safe | | **Scope clarity** | Clear what skill does AND does not do | | **No hallucination risk** | No instructions that encourage making up info | | **Output specification** | Clear expected outputs defined | ### 3. User Interaction (Weight: 12%) | Criterion | What to Check | |-----------|---------------| | **Clarification triggers** | Asks questions before acting on ambiguity | | **Required vs optional** | Distinguishes must-know from nice-to-know | | **Graceful handling** | What to do when user doesn't answer | | **No over-asking** | Doesn't ask obvious or inferrable questions | | **Question pacing** | Avoids too many questions in single message | | **Context awareness** | Uses available context before asking | **Key pattern to look for**: ```markdown ## Required Clarifications 1. Question about X 2. Question about Y ## Optional Clarifications 3. Question about Z (if relevant) Note: Avoid asking too many questions in a single message. ``` ### 4. Documentation & References (Weight: 10%) | Criterion | What to Check | |-----------|---------------| | **Source URLs** | Official documentation links provided | | **Reference files** | Complex details in `references/` not main file | | **Fetch guidance** | Instructions to fetch docs for unlisted patterns | | **Version awareness** | Notes about checking for latest patterns | | **Example coverage** | Good/bad examples for key patterns | **Key pattern to look for**: ```markdown | Resource | URL | Use For | |----------|-----|---------| | Official Docs | https://... | Complex cases | ``` ### 5. Domain Standards (Weight: 10%) | Criterion | What to Check | |-----------|---------------| | **Best practices** | Follows domain conventions (e.g., WCAG, OWASP) | | **Enforcement mechanism** | Checklists, validation steps, must-verify items | | **Anti-patterns** | Lists what NOT to do | | **Quality gates** | Output checklist before delivery | **Key pattern to look for**: ```markdown ### Must Follow - [ ] Requirement 1 - [ ] Requirement 2 ### Must Avoid - Antipattern 1 - Antipattern 2 ``` ### 6. Technical Robustness (Weight: 8%) | Criterion | What to Check | |-----------|---------------| | **Error handling** | Guidance for failure scenarios | | **Security considerations** | Input validation, secrets handling if relevant | | **Dependencies** | External tools/APIs documented | | **Edge cases** | Common edge cases addressed | | **Testability** | Can outputs be verified? | ### 7. Maintainability (Weight: 8%) | Criterion | What to Check | |-----------|---------------| | **Modularity** | References are self-contained topics | | **Update path** | Easy to update when standards change | | **No hardcoded values** | Uses placeholders/variables where appropriate | | **Clear organization** | Logical section ordering | ### 8. Zero-Shot Implementation (Weight: 12%) Skills should enable single-interaction implementation with embedded expertise. | Criterion | What to Check | |-----------|---------------| | **Before Implementation section** | Context gathering guidance present | | **Codebase context** | Guidance to scan existing structure/patterns | | **Conversation context** | Uses discussed requirements/decisions | | **Embedded expertise** | Domain knowledge in `references/`, not runtime discovery | | **User-only questions** | Only asks for USER requirements, not domain knowledge | **Key pattern to look for**: ```markdown ## Before Implementation Gather context to ensure successful implementation: | Source | Gather | |--------|--------| | **Codebase** | Existing structure, patterns, conventions | | **Conversation** | User's specific requirements | | **Skill References** | Domain patterns from `references/` | | **User Guidelines** | Project-specific conventions | ``` **Red flag**: Skill instructs to "research" or "discover" domain knowledge at runtime instead of embedding it. ### 9. Reusability (Weight: 13%) Skills should handle variations, not single requirements. | Criterion | What to Check | |-----------|---------------| | **Handles variations** | Not hardcoded to single use case | | **Variable elements** | Clarifications capture what VARIES | | **Constant patterns** | Domain best practices encoded as constants | | **Not requirement-specific** | Avoids hardcoded data, tools, configs | | **Abstraction level** | Appropriate generalization for domain | **Good example**: ```markdown "Create visualizations - adaptable to data shape, chart type, library" ``` **Bad example (too specific)**: ```markdown "Create bar chart with sales data using Recharts" ``` **Key check**: Does the skill work for multiple use cases within its domain? --- ## Type-Specific Validation After scoring general criteria, verify type-specific requirements: | Type | Must Have | |------|-----------| | **Builder** | Clarifications, Output Spec, Domain Standards, Output Checklist | | **Guide** | Workflow Steps, Examples (Good/Bad), Official Docs links | | **Automation** | Scripts in `scripts/`, Dependencies, Error Handling, I/O Spec | | **Analyzer** | Analysis Scope, Evaluation Criteria, Output Format, Synthesis | | **Validator** | Quality Criteria, Scoring Rubric, Thresholds, Remediation | **Scoring**: Deduct 10 points if type-specific requirements missing for identified type. --- ## Scoring Guide ### Category Scores Calculate each category score: ``` Category Score = (Sum of criterion scores) / (Max possible) * 100 ``` ### Overall Score ``` Overall = Σ(Category Score × Weight) ``` ### Rating Thresholds | Score | Rating | Meaning | |-------|--------|---------| | 90-100 | **Production** | Ready for wide use | | 75-89 | **Good** | Minor improvements needed | | 60-74 | **Adequate** | Functional but needs work | | 40-59 | **Developing** | Significant gaps | | 0-39 | **Incomplete** | Major rework required | --- ## Output Format Generate validation report: ```markdown # Skill Validation Report: [skill-name] **Rating**: [Production/Good/Adequate/Developing/Incomplete] **Overall Score**: [X]/100 ## Summary [2-3 sentence assessment] ## Category Scores | Category | Score | Weight | Weighted | |----------|-------|--------|----------| | Structure & Anatomy | X/100 | 12% | X | | Content Quality | X/100 | 15% | X | | User Interaction | X/100 | 12% | X | | Documentation | X/100 | 10% | X | | Domain Standards | X/100 | 10% | X | | Technical Robustness | X/100 | 8% | X | | Maintainability | X/100 | 8% | X | | Zero-Shot Implementation | X/100 | 12% | X | | Reusability | X/100 | 13% | X | | **Type-Specific Deduction** | -X | - | -X | ## Critical Issues (if any) - [Issue requiring immediate fix] ## Improvement Recommendations 1. **High Priority**: [Specific action] 2. **Medium Priority**: [Specific action] 3. **Low Priority**: [Specific action] ## Strengths - [What skill does well] ``` --- ## Quick Validation Checklist For rapid assessment, check these critical items: ### Structure & Frontmatter - [ ] SKILL.md <500 lines - [ ] Frontmatter: name (≤64 chars, lowercase, hyphens) + description (≤1024 chars) - [ ] Description uses third-person style ("This skill should be used when...") - [ ] No README.md/CHANGELOG.md in skill directory ### Content & Interaction - [ ] Has clarification questions (Required vs Optional) - [ ] Has output specification - [ ] Has official documentation links ### Zero-Shot & Reusability - [ ] Has "Before Implementation" section (context gathering) - [ ] Domain expertise embedded in `references/` (not runtime discovery) - [ ] Handles variations (not requirement-specific) ### Type-Specific (check based on skill type) - [ ] Builder: Clarifications + Output Spec + Standards + Checklist - [ ] Guide: Workflow + Examples + Docs - [ ] Automation: Scripts + Dependencies + Error Handling - [ ] Analyzer: Scope + Criteria + Output Format - [ ] Validator: Criteria + Scoring + Thresholds + Remediation **If 10+ checked**: Likely Production (90+) **If 7-9 checked**: Likely Good (75-89) **If 5-6 checked**: Likely Adequate (60-74) **If <5 checked**: Needs significant work --- ## Reference Files | File | When to Read | |------|--------------| | `references/detailed-criteria.md` | Deep evaluation of specific criterion | | `references/scoring-examples.md` | Example validations for calibration | | `references/improvement-patterns.md` | Common fixes for common issues | --- ## Usage Examples ### Validate a skill ``` Validate the chatgpt-widget-creator skill against production criteria ``` ### Quick audit ``` Quick validation check on mcp-builder skill ``` ### Focused review ``` Check if skill-creator skill has proper user interaction patterns ```