---
name: review-multi
description: Comprehensive multi-dimensional skill reviews across structure, content, quality, usability, and integration. Task-based operations with automated validation, manual assessment, scoring rubrics, and improvement recommendations. Use when reviewing skills, ensuring quality, validating production readiness, identifying improvements, or conducting quality assurance.
allowed-tools: Read, Write, Edit, Glob, Grep, Bash, WebSearch, WebFetch
---

# Review-Multi

## Overview

review-multi provides a systematic framework for conducting comprehensive, multi-dimensional reviews of Claude Code skills. It evaluates skills across 5 independent dimensions, combining automated validation with manual assessment to deliver objective quality scores and actionable improvement recommendations.

**Purpose**: Systematic skill quality assurance through multi-dimensional assessment

**The 5 Review Dimensions**:
1. **Structure Review** - YAML frontmatter, file organization, naming conventions, progressive disclosure
2. **Content Review** - Section completeness, clarity, examples, documentation quality
3. **Quality Review** - Pattern compliance, best practices, anti-pattern detection, code quality
4. **Usability Review** - Ease of use, learnability, real-world effectiveness, user satisfaction
5. **Integration Review** - Dependency documentation, data flow, component integration, composition

**Automation Levels**:
- Structure: 95% automated (validate-structure.py)
- Content: 40% automated, 60% manual assessment
- Quality: 50% automated, 50% manual assessment
- Usability: 10% automated, 90% manual testing
- Integration: 30% automated, 70% manual review

**Scoring System**:
- **Scale**: 1-5 per dimension (Excellent/Good/Acceptable/Needs Work/Poor)
- **Overall Score**: Weighted average across dimensions
- **Grade**: A/B/C/D/F mapping
- **Production Readiness**: ≥4.5 ready, 4.0-4.4 ready with improvements, 3.5-3.9 needs work, <3.5 not ready

**Value Proposition**:
- **Objective**: Evidence-based scoring using detailed rubrics (not subjective opinion)
- **Comprehensive**: 5 dimensions cover all quality aspects
- **Efficient**: Automation handles 10-95% of checks depending on dimension
- **Actionable**: Specific, prioritized improvement recommendations
- **Consistent**: Standardized checklists ensure repeatable results
- **Flexible**: 3 review modes (Comprehensive, Fast Check, Custom)

**Key Benefits**:
- Catch 70% of issues with fast automated checks
- Reduce common quality issues by 30% using checklists
- Ensure production readiness before deployment
- Identify improvement opportunities systematically
- Track quality improvements over time
- Establish quality standards across skill ecosystem

## When to Use

Use review-multi when:

1. **Pre-Production Validation** - Review new skills before deploying to production to catch issues early and ensure quality standards
2. **Quality Assurance** - Conduct systematic QA on skills to validate they meet ecosystem standards and user needs
3. **Identifying Improvements** - Discover specific, actionable improvements for existing skills through multi-dimensional assessment
4. **Continuous Improvement** - Regular reviews throughout development lifecycle, not just at end, to maintain quality
5. **Production Readiness Assessment** - Determine if skill is ready for production use with objective scoring and grade mapping
6. **Skill Ecosystem Standards** - Ensure consistency and quality across multiple skills using standardized review framework
7. **Post-Update Validation** - Review skills after major updates to ensure changes don't introduce issues or degrade quality
8. **Learning and Improvement** - Use review findings to learn patterns, improve future skills, and refine development practices
9. **Team Calibration** - Standardize quality assessment across multiple reviewers with objective rubrics

**Don't Use When**:
- Quick syntax checks (use validate-structure.py directly)
- In-progress drafts (wait until reasonably complete)
- Experimental prototypes (not production-bound)
## Prerequisites

**Required**:
- Skill to review (in `.claude/skills/[skill-name]/` format)
- Time allocation based on review mode:
  - Fast Check: 5-10 minutes
  - Single Operation: 15-60 minutes (varies by dimension)
  - Comprehensive Review: 1.5-2.5 hours

**Optional**:
- Python 3.7+ (for automation scripts in Structure and Quality reviews)
- PyYAML library (for YAML frontmatter validation)
- Access to skill-under-review documentation
- Familiarity with Claude Code skill patterns (see `development-workflow/references/common-patterns.md`)

**Skills** (no required dependencies, complementary):
- development-workflow: Use review-multi after skill development
- skill-updater: Apply review-multi recommendations
- testing-validator: Combine with review-multi for full QA

## Scoring System

The review-multi scoring system provides objective, consistent quality assessment across all skill dimensions.

### Per-Dimension Scoring (1-5 Scale)

Each dimension is scored independently using a 1-5 integer scale:

**5 - Excellent** (Exceeds Standards)
- All criteria met perfectly
- Goes beyond minimum requirements
- Exemplary quality that sets the bar
- No issues or concerns identified
- Can serve as example for others

**4 - Good** (Meets Standards)
- Meets all critical criteria
- 1-2 minor, non-critical issues
- Production-ready quality
- Standard expected level
- Small improvements possible

**3 - Acceptable** (Minor Improvements Needed)
- Meets most criteria
- 3-4 issues, some may be critical
- Usable but not optimal
- Several improvements recommended
- Can proceed with noted concerns

**2 - Needs Work** (Notable Issues)
- Missing several criteria
- 5-6 issues, multiple critical
- Not production-ready
- Significant improvements required
- Rework needed before deployment

**1 - Poor** (Significant Problems)
- Fails most criteria
- 7+ issues, fundamentally flawed
- Major quality concerns
- Extensive rework required
- Not viable in current state

### Overall Score Calculation

The overall score is a **weighted average** of the 5 dimension scores:

```
Overall = (Structure × 0.20) + (Content × 0.25) + (Quality × 0.25) + (Usability × 0.15) + (Integration × 0.15)
```

**Weight Rationale**:
- **Content & Quality (25% each)**: Core skill value - what it does and how well
- **Structure (20%)**: Important foundation - organization and compliance
- **Usability & Integration (15% each)**: Supporting factors - user experience and composition

**Example Calculations**:
- Scores (5, 4, 4, 3, 4) → Overall = (5×0.20 + 4×0.25 + 4×0.25 + 3×0.15 + 4×0.15) = 4.05 → Grade **B**
- Scores (4, 5, 5, 4, 4) → Overall = (4×0.20 + 5×0.25 + 5×0.25 + 4×0.15 + 4×0.15) = 4.50 → Grade **A**
- Scores (3, 3, 2, 3, 3) → Overall = (3×0.20 + 3×0.25 + 2×0.25 + 3×0.15 + 3×0.15) = 2.75 → Grade **C**

### Grade Mapping

Overall scores map to letter grades:
- **A (4.5-5.0)**: Excellent - Production ready, high quality
- **B (3.5-4.4)**: Good - Ready with minor improvements
- **C (2.5-3.4)**: Acceptable - Needs improvements before production
- **D (1.5-2.4)**: Poor - Requires significant rework
- **F (1.0-1.4)**: Failing - Major issues, not viable
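The formula and grade bands are simple enough to verify mechanically. A minimal Python sketch, using the weights and bands defined above (function names are illustrative, not part of this skill's scripts):

```python
# Minimal sketch of the weighted scoring and grade mapping described above.
# Weights and grade bands come from this document; names are illustrative.

WEIGHTS = {"structure": 0.20, "content": 0.25, "quality": 0.25,
           "usability": 0.15, "integration": 0.15}

GRADE_BANDS = [(4.5, "A"), (3.5, "B"), (2.5, "C"), (1.5, "D"), (1.0, "F")]

def overall_score(scores):
    """Weighted average of the five 1-5 dimension scores."""
    return round(sum(scores[dim] * w for dim, w in WEIGHTS.items()), 2)

def grade(overall):
    """Map an overall score to its letter grade."""
    return next(g for floor, g in GRADE_BANDS if overall >= floor)

# First example above: (5, 4, 4, 3, 4) -> 4.05, Grade B
scores = {"structure": 5, "content": 4, "quality": 4, "usability": 3, "integration": 4}
overall = overall_score(scores)
print(overall, grade(overall))  # 4.05 B
```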
### Production Readiness Assessment

Based on overall score:
- **≥4.5 (Grade A)**: ✅ **Production Ready** - High quality, deploy with confidence
- **4.0-4.4 (Grade B+)**: ✅ **Ready with Minor Improvements** - Can deploy, address improvements in next iteration
- **3.5-3.9 (Grade B-)**: ⚠️ **Needs Improvements** - Address issues before production deployment
- **<3.5 (Grade C-F)**: ❌ **Not Ready** - Significant rework required before deployment

**Decision Framework**:
- **A Grade**: Ship it - exemplary quality
- **B Grade (4.0+)**: Ship it - standard quality, note improvements for future
- **B- Grade (3.5-3.9)**: Hold - fix identified issues first
- **C-F Grade**: Don't ship - substantial work needed

## Operations

### Operation 1: Structure Review

**Purpose**: Validate file organization, naming conventions, YAML frontmatter compliance, and progressive disclosure

**When to Use This Operation**:
- Always run first (fast automated check catches 70% of issues)
- Before comprehensive review (quick validation of basics)
- During development (continuous structure validation)
- Quick quality checks (5-10 minute validation)

**Automation Level**: 95% automated via `scripts/validate-structure.py`

**Process**:
1. **Run Structure Validation Script**

   ```bash
   python3 scripts/validate-structure.py /path/to/skill [--json] [--verbose]
   ```

   Script checks YAML, file structure, naming, progressive disclosure (a simplified sketch of these checks follows this list)
2. **Review YAML Frontmatter**
   - Verify name field in kebab-case format
   - Check description has 5+ trigger keywords naturally embedded
   - Validate YAML syntax is correct
3. **Verify File Structure**
   - Confirm SKILL.md exists
   - Check references/ and scripts/ organization (if present)
   - Verify README.md exists
4. **Check Naming Conventions**
   - SKILL.md and README.md uppercase
   - references/ files: lowercase-hyphen-case
   - scripts/ files: lowercase-hyphen-case with extension
5. **Validate Progressive Disclosure**
   - SKILL.md <1,500 lines (warn if >1,200)
   - references/ files 300-800 lines each
   - No monolithic files
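For a rough illustration of what this automation covers, here is a simplified Python sketch of the same kinds of checks. It is not `validate-structure.py` itself, whose actual checks and output format may differ:

```python
# Simplified illustration of the kinds of checks validate-structure.py automates.
# This is a sketch, not the actual script.
import re
from pathlib import Path

KEBAB_CASE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")

def check_structure(skill_dir):
    skill = Path(skill_dir)
    issues = []

    skill_md = skill / "SKILL.md"
    if not skill_md.exists():
        return ["critical: SKILL.md missing"]
    text = skill_md.read_text(encoding="utf-8")

    # YAML frontmatter: `name` present and kebab-case
    name = re.search(r"^name:\s*(\S+)", text, re.MULTILINE)
    if not name or not KEBAB_CASE.fullmatch(name.group(1)):
        issues.append("critical: `name` missing or not kebab-case")

    # Progressive disclosure: SKILL.md <1,500 lines, warn above 1,200
    lines = text.count("\n") + 1
    if lines > 1500:
        issues.append(f"critical: SKILL.md is {lines} lines (>1,500)")
    elif lines > 1200:
        issues.append(f"warning: SKILL.md is {lines} lines (>1,200)")

    # Naming conventions: references/ files lowercase-hyphen-case
    refs = skill / "references"
    if refs.is_dir():
        for ref in refs.glob("*.md"):
            if not KEBAB_CASE.fullmatch(ref.stem):
                issues.append(f"warning: {ref.name} not lowercase-hyphen-case")

    return issues
```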
**Validation Checklist**:
- [ ] YAML frontmatter present and valid syntax
- [ ] `name` field in kebab-case format (e.g., skill-name)
- [ ] `description` includes 5+ trigger keywords (naturally embedded)
- [ ] SKILL.md file exists
- [ ] File naming follows conventions (SKILL.md uppercase, references lowercase-hyphen)
- [ ] Directory structure correct (references/, scripts/ if present)
- [ ] SKILL.md size appropriate (<1,500 lines, ideally <1,200)
- [ ] References organized by topic (if present)
- [ ] No monolithic files (progressive disclosure maintained)
- [ ] README.md present

**Scoring Criteria**:
- **5 - Excellent**: All 10 checks pass, perfect compliance, exemplary structure
- **4 - Good**: 8-9 checks pass, 1-2 minor non-critical issues (e.g., README missing but optional)
- **3 - Acceptable**: 6-7 checks pass, 3-4 issues including some critical (e.g., YAML invalid but fixable)
- **2 - Needs Work**: 4-5 checks pass, 5-6 issues with multiple critical (e.g., no SKILL.md, bad naming)
- **1 - Poor**: ≤3 checks pass, 7+ issues, fundamentally flawed structure

**Outputs**:
- Structure score (1-5)
- Pass/fail status for each checklist item
- List of issues found with severity (critical/warning/info)
- Specific improvement recommendations with fix guidance
- JSON report (if using script with --json flag)

**Time Estimate**: 5-10 minutes (mostly automated)

**Example**:
```bash
$ python3 scripts/validate-structure.py .claude/skills/todo-management

Structure Validation Report
===========================
Skill: todo-management
Date: 2025-11-06

✅ YAML Frontmatter: PASS
   - Name format: valid (kebab-case)
   - Trigger keywords: 8 found (target: 5+)

✅ File Structure: PASS
   - SKILL.md: exists
   - README.md: exists
   - references/: 3 files found
   - scripts/: 1 file found

✅ Naming Conventions: PASS
   - All files follow conventions

⚠️ Progressive Disclosure: WARNING
   - SKILL.md: 569 lines (good)
   - state-management-guide.md: 501 lines (good)
   - BUT: No Quick Reference section detected

Overall Structure Score: 4/5 (Good)
Issues: 1 warning (missing Quick Reference)
Recommendation: Add Quick Reference section to SKILL.md
```

---

### Operation 2: Content Review

**Purpose**: Assess section completeness, content clarity, example quality, and documentation comprehensiveness

**When to Use This Operation**:
- Evaluate documentation quality
- Assess completeness of skill content
- Review example quality and quantity
- Validate information architecture
- Check clarity and organization

**Automation Level**: 40% automated (section detection, example counting), 60% manual assessment

**Process**:
1. **Check Section Completeness** (automated + manual; see the sketch after this list)
   - Verify 5 core sections present: Overview, When to Use, Main Content (workflow/operations), Best Practices, Quick Reference
   - Check optional sections: Prerequisites, Common Mistakes, Troubleshooting
   - Assess if all necessary sections included
2. **Assess Content Clarity** (manual)
   - Is content understandable?
   - Is organization logical?
   - Are explanations clear without being verbose?
   - Is technical level appropriate for audience?
3. **Evaluate Example Quality** (automated count + manual quality)
   - Count code/command examples (target: 5+)
   - Check if examples are concrete (not abstract placeholders)
   - Verify examples are executable/copy-pasteable
   - Assess if examples help understanding
4. **Review Documentation Completeness** (manual)
   - Is all necessary information present?
   - Are there unexplained gaps?
   - Is sufficient detail provided?
   - Are edge cases covered?
5. **Check Explanation Depth** (manual)
   - Not too brief (insufficient detail)?
   - Not too verbose (unnecessary length)?
   - Balanced depth for complexity?
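The automated 40% of this operation reduces to text checks. A hedged sketch of section detection and example counting (heading names follow this document; the detection logic is illustrative, and Main Content is skipped because its heading varies by skill type):

```python
# Sketch of the automatable part of content review: section detection and
# example counting. Details are illustrative, not a documented algorithm.
import re

# Main Content varies by skill type, so only the four fixed headings are checked
CORE_SECTIONS = ["Overview", "When to Use", "Best Practices", "Quick Reference"]

def content_signals(skill_md_text):
    headings = re.findall(r"^#{2,3}\s+(.+)$", skill_md_text, re.MULTILINE)
    missing = [s for s in CORE_SECTIONS if not any(s in h for h in headings)]

    # Count fenced code/command examples (target: 5+); fences come in pairs
    examples = len(re.findall(r"^```", skill_md_text, re.MULTILINE)) // 2

    # Flag abstract placeholders that should not ship in production examples
    placeholders = re.findall(r"YOUR_[A-Z_]+_HERE", skill_md_text)

    return {
        "missing_sections": missing,
        "example_count": examples,
        "meets_example_target": examples >= 5,
        "placeholders_found": len(placeholders),
    }
```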
**Validation Checklist**:
- [ ] Overview/Introduction section present
- [ ] When to Use section present with 5+ scenarios
- [ ] Main content (workflow steps OR operations OR reference material) complete
- [ ] Best Practices section present
- [ ] Quick Reference section present
- [ ] 5+ code/command examples included
- [ ] Examples are concrete (not abstract placeholders like "YOUR_VALUE_HERE")
- [ ] Content clarity: readable and well-structured
- [ ] Sufficient detail: not too brief
- [ ] Not too verbose: concise without unnecessary length

**Scoring Criteria**:
- **5 - Excellent**: All 10 checks pass, exceptional clarity, great examples, comprehensive documentation
- **4 - Good**: 8-9 checks pass, good content with minor gaps or clarity issues
- **3 - Acceptable**: 6-7 checks pass, some sections weak or missing, acceptable clarity
- **2 - Needs Work**: 4-5 checks pass, multiple sections incomplete/unclear, poor examples
- **1 - Poor**: ≤3 checks pass, major gaps, confusing content, few/no examples

**Outputs**:
- Content score (1-5)
- Section-by-section assessment (present/missing/weak)
- Example quality rating and count
- Specific content improvement recommendations
- Clarity issues identified with examples

**Time Estimate**: 15-30 minutes (requires manual review)

**Example**:
```
Content Review: prompt-builder
==============================

Section Completeness: 9/10 ✅
✅ Overview: Present, clear explanation of purpose
✅ When to Use: 7 scenarios listed
✅ Main Content: 5-step workflow, well-organized
✅ Best Practices: 6 practices documented
✅ Quick Reference: Present
⚠️ Common Mistakes: Not present (optional but valuable)

Example Quality: 8/10 ✅
- Count: 12 examples (exceeds target of 5+)
- Concrete: Yes, all examples executable
- Helpful: Yes, demonstrate key concepts
- Minor: Could use 1-2 edge case examples

Content Clarity: 9/10 ✅
- Well-organized logical flow
- Clear explanations without verbosity
- Technical level appropriate
- Minor: Step 3 could be clearer (add diagram)

Documentation Completeness: 8/10 ✅
- All workflow steps documented
- Validation criteria clear
- Minor gaps: Error handling not covered

Content Score: 4/5 (Good)
Primary Recommendation: Add Common Mistakes section
Secondary: Add error handling guidance to Step 3
```

---

### Operation 3: Quality Review

**Purpose**: Evaluate pattern compliance, best practices adherence, anti-pattern detection, and code/script quality

**When to Use This Operation**:
- Validate standards compliance
- Check pattern implementation
- Detect anti-patterns
- Assess code quality (if scripts present)
- Ensure best practices followed

**Automation Level**: 50% automated (pattern detection, anti-pattern checking), 50% manual assessment

**Process**:
1. **Detect Architecture Pattern** (automated + manual)
   - Identify pattern type: workflow/task/reference/capabilities
   - Verify pattern correctly implemented
   - Check pattern consistency throughout skill
2. **Validate Documentation Patterns** (automated + manual)
   - Verify 5 core sections present
   - Check consistent structure across steps/operations
   - Validate section formatting
3. **Check Best Practices** (manual)
   - Validation checklists present and specific?
   - Examples throughout documentation?
   - Quick Reference available?
   - Error cases considered?
4. **Detect Anti-Patterns** (automated + manual; a sketch of the automatable checks follows this list)
   - Keyword stuffing (trigger keywords unnatural)?
   - Monolithic SKILL.md (>1,500 lines, no progressive disclosure)?
   - Inconsistent structure (each section different format)?
   - Vague validation ("everything works")?
   - Missing examples (too abstract)?
   - Placeholders in production ("YOUR_VALUE_HERE")?
   - Ignoring error cases (only happy path)?
   - Over-engineering simple skills?
   - Unclear dependencies?
   - No Quick Reference?
5. **Assess Code Quality** (manual, if scripts present)
   - Scripts well-documented (docstrings)?
   - Error handling present?
   - CLI interfaces clear?
   - Code style consistent?
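Several of these anti-patterns are mechanically detectable; the rest require judgment. A sketch of the automatable checks, using thresholds from this document (the matching logic is illustrative):

```python
# Sketch of automatable anti-pattern checks. Thresholds follow this document;
# the detection logic is illustrative, and the manual half still needs judgment.
import re

def detect_anti_patterns(skill_md_text):
    findings = []
    lines = skill_md_text.count("\n") + 1

    # Monolithic SKILL.md: more than 1,500 lines
    if lines > 1500:
        findings.append("monolithic SKILL.md (>1,500 lines)")

    # Placeholders left in production examples
    if re.search(r"YOUR_[A-Z_]+_HERE", skill_md_text):
        findings.append("placeholders in examples (YOUR_VALUE_HERE style)")

    # Vague validation criteria
    if re.search(r"everything works", skill_md_text, re.IGNORECASE):
        findings.append('vague validation ("everything works")')

    # Missing Quick Reference section
    if not re.search(r"^#{2,3}\s+.*Quick Reference", skill_md_text, re.MULTILINE):
        findings.append("no Quick Reference section")

    # Happy-path-only documentation: errors and failures never mentioned
    if not re.search(r"\b(error|failure|fallback)\b", skill_md_text, re.IGNORECASE):
        findings.append("error cases not considered (happy path only)")

    return findings
```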
**Validation Checklist**:
- [ ] Architecture pattern correctly implemented (workflow/task/reference/capabilities)
- [ ] Consistent structure across steps/operations (same format throughout)
- [ ] Validation checklists present and specific (measurable, not vague)
- [ ] Best practices section actionable (specific guidance)
- [ ] No keyword stuffing (trigger keywords natural, contextual)
- [ ] No monolithic SKILL.md (progressive disclosure used if >1,000 lines)
- [ ] Examples are complete (no "YOUR_VALUE_HERE" placeholders in production)
- [ ] Error cases considered (not just happy path documented)
- [ ] Dependencies documented (if skill requires other skills)
- [ ] Scripts well-documented (if present: docstrings, error handling, CLI help)

**Scoring Criteria**:
- **5 - Excellent**: All 10 checks pass, exemplary quality, no anti-patterns, exceeds standards
- **4 - Good**: 8-9 checks pass, high quality, meets all standards, minor deviations
- **3 - Acceptable**: 6-7 checks pass, acceptable quality, some standard violations, 2-3 anti-patterns
- **2 - Needs Work**: 4-5 checks pass, quality issues, multiple standard violations, 4-5 anti-patterns
- **1 - Poor**: ≤3 checks pass, poor quality, significant problems, 6+ anti-patterns detected

**Outputs**:
- Quality score (1-5)
- Pattern compliance assessment (pattern detected, compliance level)
- Anti-patterns detected (list with severity)
- Best practices gaps identified
- Code quality assessment (if scripts present)
- Prioritized improvement recommendations

**Time Estimate**: 20-40 minutes (mixed automated + manual)

**Example**:
```
Quality Review: workflow-skill-creator
======================================

Pattern Compliance: ✅
- Pattern Detected: Workflow-based
- Implementation: Correct (5 sequential steps with dependencies)
- Consistency: High (all steps follow same structure)

Documentation Patterns: ✅
- 5 Core Sections: All present
- Structure: Consistent across all 5 steps
- Formatting: Proper heading levels

Best Practices Adherence: 8/10 ✅
✅ Validation checklists: Present and specific
✅ Examples throughout: 6 examples included
✅ Quick Reference: Present
⚠️ Error handling: Limited (only happy path in examples)

Anti-Pattern Detection: 1 detected ⚠️
✅ No keyword stuffing (15 natural keywords)
✅ No monolithic file (1,465 lines but has references/)
✅ Consistent structure
✅ Specific validation criteria
✅ Examples complete (no placeholders)
⚠️ Error cases: Only happy path documented
✅ Dependencies: Clearly documented
✅ Not over-engineered

Code Quality: N/A (no scripts)

Quality Score: 4/5 (Good)
Primary Issue: Limited error handling documentation
Recommendation: Add error case examples and recovery guidance
```

---

### Operation 4: Usability Review

**Purpose**: Evaluate ease of use, learnability, real-world effectiveness, and user satisfaction through scenario testing

**When to Use This Operation**:
- Test real-world usage
- Assess user experience
- Evaluate learnability
- Measure effectiveness
- Validate skill achieves stated purpose

**Automation Level**: 10% automated (basic checks), 90% manual testing

**Process**:
1. **Test in Real-World Scenario**
   - Select appropriate use case from "When to Use" section
   - Actually use the skill to complete task
   - Document experience: smooth or friction? (a record template is sketched after this list)
   - Note any confusion or difficulty
2. **Assess Navigation/Findability**
   - Can you find needed information easily?
   - Is information architecture logical?
   - Are sections well-organized?
   - Is Quick Reference helpful?
3. **Evaluate Clarity**
   - Are instructions clear and actionable?
   - Are steps easy to follow?
   - Do examples help understanding?
   - Is technical terminology explained?
4. **Measure Effectiveness**
   - Does skill achieve stated purpose?
   - Does it deliver promised value?
   - Are outputs useful and complete?
   - Would you use it again?
5. **Assess Learning Curve**
   - How long to understand skill?
   - How long to use effectively?
   - Is learning curve reasonable for complexity?
   - Are first-time users supported well?
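Usability review is 90% manual, but the scenario-test record itself can be standardized so notes stay comparable across reviews. A hypothetical template (field names are illustrative, not a documented schema):

```python
# Hypothetical record template for consistent scenario-test notes.
from dataclasses import dataclass, field

@dataclass
class ScenarioTestRecord:
    scenario: str                # use case taken from "When to Use"
    result: str                  # "success" / "partial" / "failure"
    minutes_spent: int
    friction_points: list = field(default_factory=list)  # confusion, difficulty
    would_use_again: bool = True

record = ScenarioTestRecord(
    scenario="Research GitHub API integration patterns",
    result="success",
    minutes_spent=45,
    friction_points=["credibility scoring concept needed explanation"],
)
```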
**Validation Checklist**:
- [ ] Skill tested in real-world scenario (actual usage, not just reading)
- [ ] Users can find information easily (navigation clear, sections logical)
- [ ] Instructions are clear and actionable (can follow without confusion)
- [ ] Examples help understanding (concrete, demonstrate key concepts)
- [ ] Skill achieves stated purpose (delivers promised value)
- [ ] Learning curve reasonable (appropriate for skill complexity)
- [ ] Error messages helpful (if applicable: clear, actionable guidance)
- [ ] Overall user satisfaction high (would use again, recommend to others)

**Scoring Criteria**:
- **5 - Excellent**: All 8 checks pass, excellent usability, easy to learn, highly effective, very satisfying
- **4 - Good**: 6-7 checks pass, good usability, minor friction points, generally effective
- **3 - Acceptable**: 4-5 checks pass, acceptable usability, some confusion/difficulty, moderately effective
- **2 - Needs Work**: 2-3 checks pass, usability issues, frustrating or confusing, limited effectiveness
- **1 - Poor**: ≤1 check passes, poor usability, hard to use, ineffective, unsatisfying

**Outputs**:
- Usability score (1-5)
- Scenario test results (success/partial/failure)
- User experience assessment (smooth/acceptable/frustrating)
- Specific usability improvements identified
- Learning curve assessment
- Effectiveness rating

**Time Estimate**: 30-60 minutes (requires actual testing)

**Example**:
```
Usability Review: skill-researcher
==================================

Real-World Scenario Test: ✅
- Scenario: Research GitHub API integration patterns
- Result: SUCCESS - Found 5 relevant sources, synthesized findings
- Experience: Smooth, operations clearly explained
- Time: 45 minutes (expected 60 min range)

Navigation/Findability: 9/10 ✅
- Information easy to find
- 5 operations clearly separated
- Quick Reference table very helpful
- Minor: Could use table of contents for long doc

Instruction Clarity: 9/10 ✅
- Steps clear and actionable
- Process well-explained
- Examples demonstrate concepts
- Minor: Web search query formulation could be clearer

Effectiveness: 10/10 ✅
- Achieved purpose: Found patterns and synthesized
- Delivered value: Comprehensive research in 45 min
- Would use again: Yes, very helpful

Learning Curve: 8/10 ✅
- Time to understand: 10 minutes
- Time to use effectively: 15 minutes
- Reasonable for complexity
- First-time user: Some concepts need explanation (credibility scoring)

Error Handling: N/A (no errors encountered)

User Satisfaction: 9/10 ✅
- Would use again: Yes
- Would recommend: Yes
- Overall experience: Very positive

Usability Score: 5/5 (Excellent)
Minor Improvement: Add brief explanation of credibility scoring concept
```
---

### Operation 5: Integration Review

**Purpose**: Assess dependency documentation, data flow clarity, component integration, and composition patterns

**When to Use This Operation**:
- Review workflow skills (that compose other skills)
- Validate dependency documentation
- Check integration clarity
- Assess composition patterns
- Verify cross-references valid

**Automation Level**: 30% automated (dependency checking, cross-reference validation), 70% manual assessment

**Process**:
1. **Review Dependency Documentation** (manual)
   - Are required skills documented?
   - Are optional/complementary skills mentioned?
   - Is YAML `dependencies` field used (if applicable)?
   - Are dependency versions noted (if relevant)?
2. **Assess Data Flow Clarity** (manual, for workflow skills)
   - Is data flow between skills explained?
   - Are inputs/outputs documented for each step?
   - Do users understand how data moves?
   - Are there diagrams or flowcharts (if helpful)?
3. **Evaluate Component Integration** (manual)
   - How do component skills work together?
   - Are integration points clear?
   - Are there integration examples?
   - Is composition pattern documented?
4. **Verify Cross-References** (automated + manual; sketched after the checklist below)
   - Do internal links work (references to references/, scripts/)?
   - Are external skill references correct?
   - Are complementary skills mentioned?
5. **Check Composition Patterns** (manual, for workflow skills)
   - Is composition pattern identified (sequential/parallel/conditional/etc.)?
   - Is pattern correctly implemented?
   - Are orchestration details provided?

**Validation Checklist**:
- [ ] Dependencies documented (if skill requires other skills)
- [ ] YAML `dependencies` field correct (if used)
- [ ] Data flow explained (for workflow skills: inputs/outputs clear)
- [ ] Integration points clear (how component skills connect)
- [ ] Component skills referenced correctly (names accurate, paths valid)
- [ ] Cross-references valid (internal links work, external references correct)
- [ ] Integration examples provided (if applicable: how to use together)
- [ ] Composition pattern documented (if workflow: sequential/parallel/etc.)
- [ ] Complementary skills mentioned (optional but valuable related skills)
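The automatable 30% of this operation is mostly cross-reference checking. A sketch that verifies internal `references/` and `scripts/` paths mentioned in SKILL.md actually exist (the path pattern is an assumption about how typical skills cite these files):

```python
# Sketch of automated cross-reference validation: find internal paths like
# `references/scoring-rubric.md` in SKILL.md and report any that don't exist.
import re
from pathlib import Path

def broken_cross_references(skill_dir):
    skill = Path(skill_dir)
    text = (skill / "SKILL.md").read_text(encoding="utf-8")

    # Collect internal paths under references/ or scripts/
    mentioned = re.findall(r"\b(?:references|scripts)/[\w.-]+", text)

    # Return the unique paths that are missing on disk
    return sorted({p for p in mentioned if not (skill / p).exists()})
```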
**Scoring Criteria**:
- **5 - Excellent**: All 9 checks pass (applicable ones), perfect integration documentation
- **4 - Good**: 7-8 checks pass, good integration, minor gaps in documentation
- **3 - Acceptable**: 5-6 checks pass, some integration unclear, missing details
- **2 - Needs Work**: 3-4 checks pass, integration issues, poorly documented dependencies/flow
- **1 - Poor**: ≤2 checks pass, poor integration, confusing or missing dependency documentation

**Outputs**:
- Integration score (1-5)
- Dependency validation results (required/optional/complementary documented)
- Data flow clarity assessment (for workflow skills)
- Integration clarity rating
- Cross-reference validation results
- Improvement recommendations

**Time Estimate**: 15-25 minutes (mostly manual)

**Example**:
```
Integration Review: development-workflow
========================================

Dependency Documentation: 10/10 ✅
- Required Skills: None (workflow is standalone)
- Component Skills: 5 clearly documented (skill-researcher, planning-architect, task-development, prompt-builder, todo-management)
- Optional Skills: 3 complementary skills mentioned (review-multi, skill-updater, testing-validator)
- YAML Field: Not used (not required, skills referenced in content)

Data Flow Clarity: 10/10 ✅ (Workflow Skill)
- Data flow diagram present (skill → output → next skill)
- Inputs/outputs for each step documented
- Users understand how artifacts flow
- Example:
    skill-researcher → research-synthesis.md → planning-architect
                                                      ↓
                         skill-architecture-plan.md → task-development

Component Integration: 10/10 ✅
- Integration method documented for each step (Guided Execution)
- Integration examples provided
- Clear explanation of how skills work together
- Process for using each component skill detailed

Cross-Reference Validation: ✅
- Internal links valid (references/ files exist and reachable)
- External skill references correct (all 5 component skills exist)
- Complementary skills mentioned appropriately

Composition Pattern: 10/10 ✅ (Workflow Skill)
- Pattern: Sequential Pipeline (with one optional step)
- Correctly implemented (Step 1 → 2 → [3 optional] → 4 → 5)
- Orchestration details provided
- Clear flow diagram

Integration Score: 5/5 (Excellent)
Notes: Exemplary integration documentation for workflow skill
```

---

## Review Modes

### Comprehensive Review Mode

**Purpose**: Complete multi-dimensional assessment across all 5 dimensions with aggregate scoring

**When to Use**:
- Pre-production validation (ensure skill ready for deployment)
- Major skill updates (validate changes don't degrade quality)
- Quality certification (establish baseline quality score)
- Periodic quality audits (track quality over time)

**Process**:
1. **Run All 5 Operations Sequentially**
   - Operation 1: Structure Review (5-10 min, automated)
   - Operation 2: Content Review (15-30 min, manual)
   - Operation 3: Quality Review (20-40 min, mixed)
   - Operation 4: Usability Review (30-60 min, manual)
   - Operation 5: Integration Review (15-25 min, manual)
2. **Aggregate Scores**
   - Record score (1-5) for each dimension
   - Calculate weighted overall score using formula
   - Map overall score to grade (A/B/C/D/F)
3. **Assess Production Readiness**
   - ≥4.5: Production Ready
   - 4.0-4.4: Ready with minor improvements
   - 3.5-3.9: Needs improvements before production
   - <3.5: Not ready, significant rework required
4. **Compile Improvement Recommendations** (see the aggregation sketch after this list)
   - Aggregate issues from all dimensions
   - Prioritize: Critical → High → Medium → Low
   - Provide specific, actionable fixes
5. **Generate Comprehensive Report**
   - Executive summary (overall score, grade, readiness)
   - Per-dimension scores and findings
   - Prioritized improvement list
   - Detailed rationale for scores
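Steps 2-4 are mechanical once per-dimension results are recorded. A sketch of the issue aggregation, which composes with the `overall_score` and `grade` helpers sketched under Scoring System (the issue record shape here is an assumption, not a documented format):

```python
# Sketch of steps 2-4: flatten per-dimension findings into one prioritized
# recommendation list. The issue record shape is assumed for illustration.
PRIORITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def compile_recommendations(dimension_results):
    """dimension_results: {"structure": {"score": 5, "issues": [...]}, ...}
    where each issue looks like {"priority": "medium", "fix": "..."}."""
    issues = [
        {"dimension": dim, **issue}
        for dim, result in dimension_results.items()
        for issue in result["issues"]
    ]
    return sorted(issues, key=lambda i: PRIORITY_ORDER[i["priority"]])
```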
**Output**:
- Overall score (1.0-5.0 with one decimal)
- Grade (A/B/C/D/F)
- Production readiness assessment
- Per-dimension scores (Structure, Content, Quality, Usability, Integration)
- Comprehensive improvement recommendations (prioritized)
- Detailed review report

**Time Estimate**: 1.5-2.5 hours total

**Example Output**:
```
Comprehensive Review Report: skill-researcher
=============================================

OVERALL SCORE: 4.6/5.0 - GRADE A
STATUS: ✅ PRODUCTION READY

Dimension Scores:
- Structure: 5/5 (Excellent) - Perfect file organization
- Content: 5/5 (Excellent) - Comprehensive, clear documentation
- Quality: 4/5 (Good) - High quality, minor error handling gaps
- Usability: 5/5 (Excellent) - Easy to use, highly effective
- Integration: 4/5 (Good) - Well-documented dependencies

Production Readiness: READY - High quality, deploy with confidence

Recommendations (Priority Order):
1. [Medium] Add error handling examples for web search failures
2. [Low] Consider adding table of contents for long SKILL.md

Strengths:
- Excellent structure and organization
- Comprehensive coverage of 5 research operations
- Strong usability with clear instructions
- Good examples throughout

Overall: Exemplary skill, production-ready quality
```

---

### Fast Check Mode

**Purpose**: Quick automated validation for rapid quality feedback during development

**When to Use**:
- During development (continuous validation)
- Quick quality checks (before detailed review)
- Pre-commit validation (catch issues early)
- Rapid iteration (fast feedback loop)

**Process**:
1. **Run Automated Structure Validation**

   ```bash
   python3 scripts/validate-structure.py /path/to/skill
   ```

2. **Check Critical Issues**
   - YAML frontmatter valid?
   - Required files present?
   - Naming conventions followed?
   - File sizes appropriate?
3. **Generate Pass/Fail Report**
   - PASS: Critical checks passed, proceed to development
   - FAIL: Critical issues found, fix before continuing
4. **Provide Quick Fixes** (if available)
   - Specific commands to fix issues
   - Examples of correct format
   - References to documentation

**Output**:
- Pass/Fail status
- Critical issues list (if failed)
- Quick fixes or guidance
- Score estimate (if passed)

**Time Estimate**: 5-10 minutes

**Example Output**:
```bash
$ python3 scripts/validate-structure.py .claude/skills/my-skill

Fast Check Report
=================
Skill: my-skill

❌ FAIL - Critical Issues Found

Critical Issues:
1. YAML frontmatter: Invalid syntax (line 3: unexpected character)
2. Naming convention: File "MyGuide.md" should be "my-guide.md"

Quick Fixes:
1. Fix YAML: Remove trailing comma on line 3
2. Rename file: mv references/MyGuide.md references/my-guide.md

Run full validation after fixes:
python3 scripts/validate-structure.py .claude/skills/my-skill
```

---

### Custom Review

**Purpose**: Flexible review focusing on specific dimensions or concerns

**When to Use**:
- Targeted improvements (focus on specific dimension)
- Time constraints (can't do comprehensive review)
- Specific concerns (e.g., only check usability)
- Iterative improvements (focus on one dimension at a time)

**Options** (a concrete scope example follows this list):
1. **Select Dimensions**: Choose 1-5 operations to run
2. **Adjust Thoroughness**: Quick/Standard/Thorough per dimension
3. **Focus Areas**: Specify particular concerns (e.g., "check examples quality")
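A custom scope is just these three options made concrete. A hypothetical scope definition, useful as a note-taking convention before starting the review (field names are illustrative, not a documented schema):

```python
# Hypothetical custom review scope: the three options above as a record.
custom_scope = {
    "dimensions": ["content"],                 # 1-5 of the five operations
    "thoroughness": {"content": "thorough"},   # quick / standard / thorough
    "focus_areas": ["example quality and completeness"],
    "time_budget_minutes": 30,
}
```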
**Process**:
1. **Define Custom Review Scope**
   - Which dimensions to review?
   - How thorough for each?
   - Any specific focus areas?
2. **Run Selected Operations**
   - Execute chosen operations
   - Apply thoroughness level
3. **Generate Targeted Report**
   - Scores for selected dimensions only
   - Focused findings
   - Specific recommendations

**Example Scenarios**:

**Scenario 1: Content-Focused Review**
```
Custom Review: Content + Examples
- Operations: Content Review only
- Thoroughness: Thorough
- Focus: Example quality and completeness
- Time: 30 minutes
```

**Scenario 2: Quick Quality Check**
```
Custom Review: Structure + Quality (Fast)
- Operations: Structure + Quality
- Thoroughness: Quick
- Focus: Pattern compliance, anti-patterns
- Time: 15-20 minutes
```

**Scenario 3: Workflow Integration Review**
```
Custom Review: Integration Deep Dive
- Operations: Integration Review only
- Thoroughness: Thorough
- Focus: Data flow, composition patterns
- Time: 30 minutes
```

---

## Best Practices

### 1. Self-Review First

**Practice**: Run Fast Check mode before requesting comprehensive review

**Rationale**: Automated checks catch 70% of structural issues in 5-10 minutes, allowing manual review to focus on higher-value assessment

**Application**: Always run `validate-structure.py` before detailed review

### 2. Use Checklists Systematically

**Practice**: Follow validation checklists item-by-item for each operation

**Rationale**: Research shows teams using checklists reduce common issues by 30% and ensure consistent results

**Application**: Print or display checklist, mark each item explicitly

### 3. Test in Real Scenarios

**Practice**: Conduct usability review with actual usage, not just documentation reading

**Rationale**: Real-world testing reveals hidden usability issues that documentation review misses

**Application**: For Usability Review, actually use the skill to complete a realistic task

### 4. Focus on Automation

**Practice**: Let scripts handle routine checks, focus manual effort on judgment-requiring assessment

**Rationale**: Automation provides 70% reduction in manual review time for routine checks

**Application**: Use scripts for Structure and partial Quality checks, manual for Content/Usability

### 5. Provide Actionable Feedback

**Practice**: Make improvement recommendations specific, prioritized, and actionable

**Rationale**: Vague feedback ("improve quality") is less valuable than specific guidance ("add error handling examples to Step 3")

**Application**: For each issue, specify: What, Why, How (to fix), Priority

### 6. Review Regularly

**Practice**: Conduct reviews throughout development lifecycle, not just at end

**Rationale**: Early reviews catch issues before they compound; rapid feedback maintains momentum (37% productivity increase)

**Application**: Fast Check during development, Comprehensive Review before production

### 7. Track Improvements

**Practice**: Document before/after scores to measure improvement over time

**Rationale**: Tracking demonstrates progress, identifies patterns, validates improvements

**Application**: Save review reports, compare scores across iterations

### 8. Iterate Based on Findings

**Practice**: Use review findings to improve future skills, not just current skill

**Rationale**: Learnings compound; patterns identified in reviews improve entire skill ecosystem

**Application**: Document common issues, create guidelines, update templates
---

## Common Mistakes

### Mistake 1: Skipping Structure Review

**Symptom**: Spending time on detailed review only to discover fundamental structural issues
**Cause**: Assumption that structure is correct, eagerness to assess content
**Fix**: Always run Structure Review (Fast Check) first - takes 5-10 minutes, catches 70% of issues
**Prevention**: Make Fast Check mandatory first step in any review process

### Mistake 2: Subjective Scoring

**Symptom**: Inconsistent scores, debate over ratings, difficulty justifying scores
**Cause**: Using personal opinion instead of rubric criteria
**Fix**: Use `references/scoring-rubric.md` - score based on specific criteria, not feeling
**Prevention**: Print rubric, refer to criteria for each score, document evidence

### Mistake 3: Ignoring Usability

**Symptom**: Skill looks good on paper but difficult to use in practice
**Cause**: Skipping Usability Review (90% manual, time-consuming)
**Fix**: Actually test skill in real scenario - reveals hidden issues
**Prevention**: Allocate 30-60 minutes for usability testing, cannot skip for production

### Mistake 4: No Prioritization

**Symptom**: Long list of improvements, unclear what to fix first, overwhelmed
**Cause**: Treating all issues equally without assessing impact
**Fix**: Prioritize issues: Critical (must fix) → High → Medium → Low (nice to have)
**Prevention**: Tag each issue with priority level during review

### Mistake 5: Batch Reviews

**Symptom**: Discovering major issues late in development, costly rework
**Cause**: Waiting until end to review, accumulating issues
**Fix**: Review early and often - Fast Check during development, iterations
**Prevention**: Continuous validation, rapid feedback, catch issues when small

### Mistake 6: Ignoring Patterns

**Symptom**: Repeating same issues across multiple skills
**Cause**: Treating each review in isolation, not learning from patterns
**Fix**: Track common issues, create guidelines, update development process
**Prevention**: Document patterns, share learnings, improve templates

---

## Quick Reference

### The 5 Operations

| Operation | Focus | Automation | Time | Key Output |
|-----------|-------|------------|------|------------|
| **Structure** | YAML, files, naming, organization | 95% | 5-10m | Structure score, compliance report |
| **Content** | Completeness, clarity, examples | 40% | 15-30m | Content score, section assessment |
| **Quality** | Patterns, best practices, anti-patterns | 50% | 20-40m | Quality score, pattern compliance |
| **Usability** | Ease of use, effectiveness | 10% | 30-60m | Usability score, scenario test results |
| **Integration** | Dependencies, data flow, composition | 30% | 15-25m | Integration score, dependency validation |

### Scoring Scale

| Score | Level | Meaning | Action |
|-------|-------|---------|--------|
| **5** | Excellent | Exceeds standards | Exemplary - use as example |
| **4** | Good | Meets standards | Production ready - standard quality |
| **3** | Acceptable | Minor improvements | Usable - note improvements |
| **2** | Needs Work | Notable issues | Not ready - significant improvements |
| **1** | Poor | Significant problems | Not viable - extensive rework |

### Production Readiness

| Overall Score | Grade | Status | Decision |
|---------------|-------|--------|----------|
| **4.5-5.0** | A | ✅ Production Ready | Ship it - high quality |
| **4.0-4.4** | B+ | ✅ Ready (minor improvements) | Ship - note improvements for next iteration |
| **3.5-3.9** | B- | ⚠️ Needs Improvements | Hold - fix issues first |
| **2.5-3.4** | C | ❌ Not Ready | Don't ship - substantial work needed |
| **1.5-2.4** | D | ❌ Not Ready | Don't ship - significant rework |
| **1.0-1.4** | F | ❌ Not Ready | Don't ship - major issues |
### Review Modes

| Mode | Time | Use Case | Coverage |
|------|------|----------|----------|
| **Fast Check** | 5-10m | During development, quick validation | Structure only (automated) |
| **Custom** | Variable | Targeted review, specific concerns | Selected dimensions |
| **Comprehensive** | 1.5-2.5h | Pre-production, full assessment | All 5 dimensions + report |

### Common Commands

```bash
# Fast structure validation
python3 scripts/validate-structure.py /path/to/skill

# Verbose output
python3 scripts/validate-structure.py /path/to/skill --verbose

# JSON output
python3 scripts/validate-structure.py /path/to/skill --json

# Pattern compliance check
python3 scripts/check-patterns.py /path/to/skill

# Generate review report
python3 scripts/generate-review-report.py review_data.json --output report.md

# Run comprehensive review
python3 scripts/review-runner.py /path/to/skill --mode comprehensive
```

### Weighted Average Formula

```
Overall = (Structure × 0.20) + (Content × 0.25) + (Quality × 0.25) + (Usability × 0.15) + (Integration × 0.15)
```

**Weight Rationale**:
- Content & Quality (25% each): Core value
- Structure (20%): Foundation
- Usability & Integration (15% each): Supporting

### For More Information

- **Structure details**: `references/structure-review-guide.md`
- **Content details**: `references/content-review-guide.md`
- **Quality details**: `references/quality-review-guide.md`
- **Usability details**: `references/usability-review-guide.md`
- **Integration details**: `references/integration-review-guide.md`
- **Complete scoring rubrics**: `references/scoring-rubric.md`
- **Report templates**: `references/review-report-template.md`

---

**For detailed guidance on each dimension, see reference files. For automation tools, see scripts/.**