# 📊 Agent Skill Benchmark Report

> Generated: 2026-04-11T12:00:22.838Z
> Token counting: `ceil(characters / 4)` (cl100k_base approximation).
> Baselines: derived from **real, measured example prompts** (see Methodology).
> Quality: structural rubric (0–10), no live LLM calls required.

## ❓ How to Read This Report

This benchmark answers: **"How many tokens and dollars does an agent skill save compared to a developer writing the same guidance inline?"**

**WITHOUT a skill**: A developer writes domain knowledge directly into the prompt every time (Baseline).

**WITH a skill**: The agent loads the SKILL.md file (516 tokens on average in this run), which is structured, reusable, and cached.

**Eval Alignment**: % of eval assertion values that appear in SKILL.md. High alignment means the skill actually teaches what the evals test; it is the static proxy for "with skill > without skill" behavioral improvement.

## 🔢 Executive Summary

| Metric                            | Value                                         |
| --------------------------------- | --------------------------------------------- |
| Total Skills Benchmarked          | **237**                                       |
| Avg. Tokens WITH Skill (SKILL.md) | **516 tokens**                                |
| Baseline: Light prompt (no skill) | **1449 tokens** ↓ see Methodology             |
| Baseline: Heavy prompt (no skill) | **3656 tokens** ↓ see Methodology             |
| Avg. Token Savings vs Light       | **64%** (933 tokens/call)                     |
| Avg. Token Savings vs Heavy       | **86%** (3140 tokens/call)                    |
| Avg. Quality Score                | **9.9/10**                                    |
| Skills with Evals                 | **237 / 237**                                 |
| Avg. Eval Alignment               | **94%** (eval assertions covered by SKILL.md) |

## 📜 History

| Version | Date       | Skills | Avg Tokens | Savings vs Heavy (%) | Quality | Report |
| ------- | ---------- | ------ | ---------- | -------------------- | ------- | ------ |
| v2.1.1  | 2026-04-11 | 237 | 516 | 86% | 9.9/10 | [Full Report](benchmarks/archive/v2.1.1.md) |
| v2.1.0  | 2026-04-04 | 237 | 526 | 86% | 9.9/10 | [Full Report](benchmarks/archive/v2.1.0.md) |
| v2.0.1  | 2026-03-30 | 238 | 527 | 86% | 9.8/10 | [Full Report](benchmarks/archive/v2.0.1.md) |
| v2.0.0  | 2026-03-25 | 235 | 523 | 86% | 9.9/10 | [Full Report](benchmarks/archive/v2.0.0.md) |
| v1.10.3 | 2026-03-21 | 234 | 505 | 86% | 9.8/10 | [Full Report](benchmarks/archive/v1.10.3.md) |
| v1.10.1 | 2026-03-16 | 229 | 428 | 88% | 9.9/10 | [Full Report](benchmarks/archive/v1.10.1.md) |
| v1.10.0 | 2026-03-16 | 229 | 434 | 88% | 7/10 | [Full Report](benchmarks/archive/v1.10.0.md) |
| v1.9.3  | 2026-03-15 | 229 | 460 | 87% | 8.9/10 | [Full Report](benchmarks/archive/v1.9.3.md) |
| v1.9.2  | 2026-03-07 | 228 | 458 | 87% | 8.9/10 | [Full Report](benchmarks/archive/v1.9.2.md) |
| v1.9.1  | 2026-03-07 | 228 | 458 | 87% | 8.9/10 | [Full Report](benchmarks/archive/v1.9.1.md) |
| v1.9.0  | 2026-03-05 | 228 | 457 | 88% | 8.9/10 | [Full Report](benchmarks/archive/v1.9.0.md) |
| v1.8.0  | 2026-03-02 | 228 | 443 | 88% | 8.9/10 | [Full Report](benchmarks/archive/v1.8.0.md) |
| v1.7.3  | 2026-02-25 | 222 | 418 | 89% | 8.9/10 | [Full Report](benchmarks/archive/v1.7.3.md) |
| v1.7.2  | 2026-02-25 | 220 | 413 | 89% | 8.9/10 | [Full Report](benchmarks/archive/v1.7.2.md) |

### 💰 Cost Comparison: Per Single Call (Average Skill)

> Comparison based on **Heavy Baseline** vs. modern and speculative models.

| Model             | Original Cost | Skill Cost | Net Savings    | % Saved |
| ----------------- | ------------- | ---------- | -------------- | ------- |
| Gemini 3 Flash    | $0.0018280    | $0.0002580 | **$0.0015700** | 86%     |
| GPT-5             | $0.0045700    | $0.0006450 | **$0.0039250** | 86%     |
| Gemini 3.1 Pro    | $0.0073120    | $0.0010320 | **$0.0062800** | 86%     |
| Claude Sonnet 4.5 | $0.0109680    | $0.0015480 | **$0.0094200** | 86%     |

### 📈 Monthly Savings at Scale (Avg Skill vs Heavy Prompt)

| Daily Calls | Original Cost/mo | Monthly Savings (1 skill) | Monthly Savings (50 skills) | Model             |
| ----------- | ---------------- | ------------------------- | --------------------------- | ----------------- |
| 1,000       | $137.10/mo       | $117.75/mo                | $5,887.50/mo                | GPT-5             |
| 1,000       | $329.04/mo       | $282.60/mo                | $14,130.00/mo               | Claude Sonnet 4.5 |
| 1,000       | $219.36/mo       | $188.40/mo                | $9,420.00/mo                | Gemini 3.1 Pro    |

## 📦 Per-Category Summary

📦 android (22 skills | avg 344 tokens | quality 10.0/10 | eval alignment 93%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `android-architecture` | 504 | █████████░ 86% | 10/10 | 3 | ✅ 88% |
| `android-background-work` | 302 | █████████░ 92% | 10/10 | 3 | ✅ 100% |
| `android-compose` | 447 | █████████░ 88% | 10/10 | 3 | ✅ 100% |
| `android-concurrency` | 314 | █████████░ 91% | 10/10 | 3 | ✅ 89% |
| `android-deployment` | 328 | █████████░ 91% | 10/10 | 3 | ✅ 100% |
| `android-design-system` | 252 | █████████░ 93% | 10/10 | 3 | ✅ 100% |
| `android-di` | 309 | █████████░ 92% | 10/10 | 3 | ✅ 88% |
| `android-legacy-navigation` | 310 | █████████░ 92% | 10/10 | 3 | ✅ 86% |
| `android-legacy-security` | 444 | █████████░ 88% | 10/10 | 3 | ✅ 100% |
| `android-legacy-state` | 256 | █████████░ 93% | 10/10 | 3 | ✅ 100% |
| `android-navigation` | 261 | █████████░ 93% | 10/10 | 3 | ✅ 100% |
| `android-navigation-type-safe` | 267 | █████████░ 93% | 10/10 | 3 | ✅ 83% |
| `android-networking` | 409 | █████████░ 89% | 10/10 | 3 | ✅ 83% |
| `android-notifications` | 415 | █████████░ 89% | 10/10 | 3 | ✅ 100% |
| `android-performance` | 384 | █████████░ 89% | 10/10 | 3 | ✅ 88% |
| `android-persistence` | 297 | █████████░ 92% | 10/10 | 3 | ✅ 100% |
| `android-resources` | 409 | █████████░ 89% | 10/10 | 3 | ✅ 100% |
| `android-security` | 397 | █████████░ 89% | 10/10 | 3 | ✅ 86% |
| `android-state` | 361 | █████████░ 90% | 10/10 | 3 | ✅ 88% |
| `android-testing` | 318 | █████████░ 91% | 10/10 | 3 | ✅ 88% |
| `android-tooling` | 294 | █████████░ 92% | 10/10 | 3 | ✅ 88% |
| `android-xml-views` | 295 | █████████░ 92% | 10/10 | 3 | ✅ 88% |

📦 angular (15 skills | avg 497 tokens | quality 9.9/10 | eval alignment 87%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `angular-architecture` | 613 | ████████░░ 83% | 10/10 | 6 | ✅ 95% |
| `angular-components` | 613 | ████████░░ 83% | 10/10 | 9 | ✅ 77% |
| `angular-dependency-injection` | 518 | █████████░ 86% | 10/10 | 6 | ✅ 89% |
| `angular-directives-pipes` | 489 | █████████░ 87% | 10/10 | 6 | ✅ 91% |
| `angular-forms` | 344 | █████████░ 91% | 10/10 | 6 | ✅ 100% |
| `angular-http-client` | 557 | █████████░ 85% | 10/10 | 6 | ✅ 92% |
| `angular-performance` | 472 | █████████░ 87% | 10/10 | 6 | ✅ 82% |
| `angular-routing` | 380 | █████████░ 90% | 10/10 | 6 | ✅ 100% |
| `angular-rxjs-interop` | 501 | █████████░ 86% | 10/10 | 6 | ✅ 95% |
| `angular-security` | 494 | █████████░ 86% | 10/10 | 6 | ✅ 78% |
| `angular-ssr` | 472 | █████████░ 87% | 10/10 | 6 | ✅ 90% |
| `angular-state-management` | 402 | █████████░ 89% | 10/10 | 6 | ✅ 81% |
| `angular-style-guide` | 500 | █████████░ 86% | 10/10 | 6 | ✅ 71% |
| `angular-testing` | 422 | █████████░ 88% | 10/10 | 6 | ✅ 70% |
| `angular-tooling` | 672 | ████████░░ 82% | 8/10 | 6 | ✅ 100% |

📦 common (31 skills | avg 613 tokens | quality 9.9/10 | eval alignment 94%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `common-architecture-audit` | 620 | ████████░░ 83% | 10/10 | 3 | ✅ 100% |
| `common-architecture-diagramming` | 452 | █████████░ 88% | 10/10 | 3 | ✅ 100% |
| `common-best-practices` | 384 | █████████░ 89% | 10/10 | 3 | ✅ 100% |
| `common-code-review` | 346 | █████████░ 91% | 10/10 | 3 | ✅ 83% |
| `common-context-optimization` | 555 | █████████░ 85% | 10/10 | 3 | ✅ 100% |
| `common-dast-tooling` | 579 | ████████░░ 84% | 10/10 | 3 | ✅ 100% |
| `common-debugging` | 352 | █████████░ 90% | 10/10 | 3 | ✅ 83% |
| `common-documentation` | 349 | █████████░ 90% | 10/10 | 3 | ✅ 100% |
| `common-error-handling` | 392 | █████████░ 89% | 10/10 | 3 | ✅ 83% |
| `common-feedback-reporter` | 859 | ████████░░ 77% | 10/10 | 4 | ✅ 100% |
| `common-git-collaboration` | 505 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `common-learning-log` | 574 | ████████░░ 84% | 10/10 | 3 | ✅ 83% |
| `common-llm-security` | 681 | ████████░░ 81% | 10/10 | 3 | ✅ 100% |
| `common-mobile-animation` | 522 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `common-mobile-ux-core` | 372 | █████████░ 90% | 10/10 | 3 | ✅ 100% |
| `common-observability` | 378 | █████████░ 90% | 10/10 | 3 | ✅ 100% |
| `common-owasp` | 894 | ████████░░ 76% | 10/10 | 3 | ✅ 100% |
| `common-performance-engineering` | 674 | ████████░░ 82% | 10/10 | 3 | ✅ 100% |
| `common-product-requirements` | 422 | █████████░ 88% | 10/10 | 3 | ✅ 83% |
| `common-protocol-enforcement` | 398 | █████████░ 89% | 10/10 | 3 | ✅ 100% |
| `common-security-audit` | 856 | ████████░░ 77% | 10/10 | 3 | ✅ 83% |
| `common-security-standards` | 705 | ████████░░ 81% | 10/10 | 3 | ✅ 83% |
| `common-session-retrospective` | 655 | ████████░░ 82% | 10/10 | 3 | ✅ 100% |
| `common-skill-creator` | 1336 | ██████░░░░ 63% | 10/10 | 3 | ✅ 100% |
| `common-store-changelog` | 684 | ████████░░ 81% | 10/10 | 4 | ⚠️ 43% |
| `common-system-design` | 711 | ████████░░ 81% | 10/10 | 3 | ✅ 100% |
| `common-tdd` | 634 | ████████░░ 83% | 10/10 | 3 | ✅ 100% |
| `common-ui-design` | 774 | ████████░░ 79% | 10/10 | 3 | ✅ 100% |
| `common-workflow-writing` | 550 | █████████░ 85% | 10/10 | 3 | ✅ 100% |
| `common-accessibility` | 994 | ███████░░░ 73% | 8/10 | 3 | ✅ 83% |
| `common-api-design` | 805 | ████████░░ 78% | 8/10 | 3 | ✅ 100% |

📦 dart (3 skills | avg 530 tokens | quality 10.0/10 | eval alignment 95%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `dart-best-practices` | 487 | █████████░ 87% | 10/10 | 3 | ✅ 100% |
| `dart-language` | 612 | ████████░░ 83% | 10/10 | 3 | ✅ 100% |
| `dart-tooling` | 490 | █████████░ 87% | 10/10 | 3 | ✅ 86% |

📦 database (3 skills | avg 555 tokens | quality 10.0/10 | eval alignment 95%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `database-mongodb` | 617 | ████████░░ 83% | 10/10 | 3 | ✅ 100% |
| `database-postgresql` | 451 | █████████░ 88% | 10/10 | 3 | ✅ 86% |
| `database-redis` | 596 | ████████░░ 84% | 10/10 | 3 | ✅ 100% |

📦 flutter (21 skills | avg 497 tokens | quality 9.7/10 | eval alignment 93%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `flutter-cicd` | 512 | █████████░ 86% | 10/10 | 3 | ✅ 89% |
| `flutter-design-system` | 504 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `flutter-error-handling` | 576 | ████████░░ 84% | 10/10 | 3 | ✅ 100% |
| `flutter-feature-based-clean-architecture` | 576 | ████████░░ 84% | 10/10 | 3 | ✅ 88% |
| `flutter-getx-navigation` | 325 | █████████░ 91% | 10/10 | 3 | ✅ 86% |
| `flutter-getx-state-management` | 435 | █████████░ 88% | 10/10 | 3 | ✅ 86% |
| `flutter-go-router-navigation` | 546 | █████████░ 85% | 10/10 | 3 | ✅ 89% |
| `flutter-idiomatic-flutter` | 337 | █████████░ 91% | 10/10 | 3 | ✅ 89% |
| `flutter-layer-based-clean-architecture` | 643 | ████████░░ 82% | 10/10 | 3 | ✅ 100% |
| `flutter-performance` | 428 | █████████░ 88% | 10/10 | 3 | ✅ 100% |
| `flutter-retrofit-networking` | 543 | █████████░ 85% | 10/10 | 3 | ✅ 100% |
| `flutter-riverpod-state-management` | 532 | █████████░ 85% | 10/10 | 3 | ✅ 100% |
| `flutter-testing` | 746 | ████████░░ 80% | 10/10 | 3 | ✅ 100% |
| `flutter-widgets` | 446 | █████████░ 88% | 10/10 | 3 | ✅ 100% |
| `flutter-auto-route-navigation` | 469 | █████████░ 87% | 9/10 | 3 | ✅ 100% |
| `flutter-bloc-state-management` | 602 | ████████░░ 84% | 9/10 | 3 | ✅ 100% |
| `flutter-dependency-injection` | 497 | █████████░ 86% | 9/10 | 3 | ✅ 80% |
| `flutter-localization` | 486 | █████████░ 87% | 9/10 | 3 | ✅ 100% |
| `flutter-navigation` | 386 | █████████░ 89% | 9/10 | 3 | ✅ 100% |
| `flutter-notifications` | 389 | █████████░ 89% | 9/10 | 3 | ✅ 100% |
| `flutter-security` | 459 | █████████░ 87% | 9/10 | 3 | ⚠️ 50% |

📦 golang (11 skills | avg 447 tokens | quality 10.0/10 | eval alignment 95%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `golang-api-server` | 443 | █████████░ 88% | 10/10 | 3 | ✅ 100% |
| `golang-architecture` | 503 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `golang-concurrency` | 424 | █████████░ 88% | 10/10 | 3 | ✅ 100% |
| `golang-configuration` | 430 | █████████░ 88% | 10/10 | 3 | ✅ 100% |
| `golang-database` | 449 | █████████░ 88% | 10/10 | 3 | ✅ 100% |
| `golang-error-handling` | 341 | █████████░ 91% | 10/10 | 3 | ✅ 83% |
| `golang-language` | 496 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `golang-logging` | 389 | █████████░ 89% | 10/10 | 3 | ✅ 83% |
| `golang-security` | 509 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `golang-testing` | 417 | █████████░ 89% | 10/10 | 3 | ✅ 83% |
| `golang-tooling` | 514 | █████████░ 86% | 10/10 | 3 | ✅ 100% |

📦 ios (15 skills | avg 363 tokens | quality 10.0/10 | eval alignment 96%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `ios-app-lifecycle` | 339 | █████████░ 91% | 10/10 | 3 | ✅ 89% |
| `ios-architecture` | 645 | ████████░░ 82% | 10/10 | 3 | ✅ 100% |
| `ios-dependency-injection` | 311 | █████████░ 91% | 10/10 | 3 | ✅ 89% |
| `ios-deployment` | 339 | █████████░ 91% | 10/10 | 3 | ✅ 89% |
| `ios-design-system` | 239 | █████████░ 93% | 10/10 | 3 | ✅ 100% |
| `ios-localization` | 370 | █████████░ 90% | 10/10 | 3 | ✅ 89% |
| `ios-navigation` | 281 | █████████░ 92% | 10/10 | 3 | ✅ 100% |
| `ios-networking` | 369 | █████████░ 90% | 10/10 | 3 | ✅ 100% |
| `ios-notifications` | 292 | █████████░ 92% | 10/10 | 3 | ✅ 100% |
| `ios-performance` | 362 | █████████░ 90% | 10/10 | 3 | ✅ 100% |
| `ios-persistence` | 345 | █████████░ 91% | 10/10 | 3 | ✅ 100% |
| `ios-security` | 387 | █████████░ 89% | 10/10 | 3 | ✅ 100% |
| `ios-state-management` | 350 | █████████░ 90% | 10/10 | 3 | ✅ 100% |
| `ios-swiftui` | 401 | █████████░ 89% | 10/10 | 3 | ✅ 88% |
| `ios-ui-navigation` | 416 | █████████░ 89% | 10/10 | 3 | ✅ 100% |

📦 java (5 skills | avg 474 tokens | quality 10.0/10 | eval alignment 98%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `java-best-practices` | 461 | █████████░ 87% | 10/10 | 3 | ✅ 89% |
| `java-concurrency` | 427 | █████████░ 88% | 10/10 | 3 | ✅ 100% |
| `java-language` | 515 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `java-testing` | 516 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `java-tooling` | 450 | █████████░ 88% | 10/10 | 3 | ✅ 100% |

📦 javascript (3 skills | avg 348 tokens | quality 10.0/10 | eval alignment 93%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `javascript-best-practices` | 283 | █████████░ 92% | 10/10 | 3 | ✅ 89% |
| `javascript-language` | 432 | █████████░ 88% | 10/10 | 3 | ✅ 90% |
| `javascript-tooling` | 330 | █████████░ 91% | 10/10 | 3 | ✅ 100% |

📦 kotlin (4 skills | avg 397 tokens | quality 10.0/10 | eval alignment 95%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `kotlin-best-practices` | 448 | █████████░ 88% | 10/10 | 3 | ✅ 100% |
| `kotlin-coroutines` | 380 | █████████░ 90% | 10/10 | 3 | ✅ 89% |
| `kotlin-language` | 427 | █████████░ 88% | 10/10 | 3 | ✅ 100% |
| `kotlin-tooling` | 333 | █████████░ 91% | 10/10 | 3 | ✅ 89% |

📦 laravel (10 skills | avg 646 tokens | quality 10.0/10 | eval alignment 90%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `laravel-api` | 702 | ████████░░ 81% | 10/10 | 6 | ✅ 88% |
| `laravel-architecture` | 391 | █████████░ 89% | 10/10 | 6 | ✅ 100% |
| `laravel-background-processing` | 617 | ████████░░ 83% | 10/10 | 6 | ✅ 91% |
| `laravel-clean-architecture` | 658 | ████████░░ 82% | 10/10 | 6 | ✅ 76% |
| `laravel-database-expert` | 699 | ████████░░ 81% | 10/10 | 6 | ✅ 84% |
| `laravel-eloquent` | 621 | ████████░░ 83% | 10/10 | 6 | ✅ 85% |
| `laravel-security` | 724 | ████████░░ 80% | 10/10 | 6 | ✅ 95% |
| `laravel-sessions-middleware` | 674 | ████████░░ 82% | 10/10 | 6 | ✅ 90% |
| `laravel-testing` | 706 | ████████░░ 81% | 10/10 | 6 | ✅ 95% |
| `laravel-tooling` | 671 | ████████░░ 82% | 10/10 | 6 | ✅ 100% |

📦 nestjs (21 skills | avg 610 tokens | quality 9.9/10 | eval alignment 98%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `nestjs-api-standards` | 591 | ████████░░ 84% | 10/10 | 3 | ✅ 100% |
| `nestjs-architecture` | 542 | █████████░ 85% | 10/10 | 3 | ✅ 100% |
| `nestjs-bullmq` | 896 | ████████░░ 75% | 10/10 | 3 | ✅ 100% |
| `nestjs-caching` | 586 | ████████░░ 84% | 10/10 | 3 | ✅ 100% |
| `nestjs-configuration` | 579 | ████████░░ 84% | 10/10 | 3 | ✅ 83% |
| `nestjs-database` | 649 | ████████░░ 82% | 10/10 | 3 | ✅ 100% |
| `nestjs-deployment` | 678 | ████████░░ 81% | 10/10 | 3 | ✅ 100% |
| `nestjs-documentation` | 526 | █████████░ 86% | 10/10 | 3 | ✅ 83% |
| `nestjs-error-handling` | 565 | █████████░ 85% | 10/10 | 3 | ✅ 100% |
| `nestjs-file-uploads` | 396 | █████████░ 89% | 10/10 | 3 | ✅ 100% |
| `nestjs-notification` | 502 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `nestjs-observability` | 440 | █████████░ 88% | 10/10 | 3 | ✅ 100% |
| `nestjs-performance` | 947 | ███████░░░ 74% | 10/10 | 3 | ✅ 100% |
| `nestjs-real-time` | 877 | ████████░░ 76% | 10/10 | 3 | ✅ 100% |
| `nestjs-scheduling` | 549 | █████████░ 85% | 10/10 | 3 | ✅ 100% |
| `nestjs-search` | 500 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `nestjs-security` | 758 | ████████░░ 79% | 10/10 | 3 | ✅ 100% |
| `nestjs-security-isolation` | 526 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `nestjs-testing` | 555 | █████████░ 85% | 10/10 | 3 | ✅ 100% |
| `nestjs-transport` | 429 | █████████░ 88% | 10/10 | 3 | ✅ 100% |
| `nestjs-controllers-services` | 715 | ████████░░ 80% | 8/10 | 3 | ✅ 100% |

📦 nextjs (18 skills | avg 638 tokens | quality 9.9/10 | eval alignment 97%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `nextjs-app-router` | 974 | ███████░░░ 73% | 10/10 | 3 | ✅ 100% |
| `nextjs-architecture` | 1021 | ███████░░░ 72% | 10/10 | 3 | ✅ 100% |
| `nextjs-authentication` | 488 | █████████░ 87% | 10/10 | 3 | ✅ 100% |
| `nextjs-caching` | 783 | ████████░░ 79% | 10/10 | 3 | ✅ 100% |
| `nextjs-data-access-layer` | 519 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `nextjs-data-fetching` | 448 | █████████░ 88% | 10/10 | 6 | ✅ 100% |
| `nextjs-i18n` | 585 | ████████░░ 84% | 10/10 | 3 | ✅ 100% |
| `nextjs-optimization` | 511 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `nextjs-rendering` | 730 | ████████░░ 80% | 10/10 | 3 | ✅ 100% |
| `nextjs-security` | 670 | ████████░░ 82% | 10/10 | 3 | ✅ 100% |
| `nextjs-server-actions` | 718 | ████████░░ 80% | 10/10 | 6 | ✅ 85% |
| `nextjs-server-components` | 625 | ████████░░ 83% | 10/10 | 6 | ✅ 78% |
| `nextjs-state-management` | 584 | ████████░░ 84% | 10/10 | 3 | ✅ 100% |
| `nextjs-styling` | 622 | ████████░░ 83% | 10/10 | 3 | ✅ 83% |
| `nextjs-testing` | 627 | ████████░░ 83% | 10/10 | 3 | ✅ 100% |
| `nextjs-tooling` | 390 | █████████░ 89% | 10/10 | 3 | ✅ 100% |
| `nextjs-upgrade` | 551 | █████████░ 85% | 10/10 | 3 | ✅ 100% |
| `nextjs-pages-router` | 646 | ████████░░ 82% | 9/10 | 6 | ✅ 100% |

📦 php (7 skills | avg 509 tokens | quality 9.6/10 | eval alignment 93%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `php-best-practices` | 518 | █████████░ 86% | 10/10 | 6 | ✅ 92% |
| `php-security` | 537 | █████████░ 85% | 10/10 | 6 | ✅ 100% |
| `php-testing` | 531 | █████████░ 85% | 10/10 | 6 | ✅ 83% |
| `php-tooling` | 530 | █████████░ 86% | 10/10 | 6 | ✅ 86% |
| `php-concurrency` | 520 | █████████░ 86% | 9/10 | 6 | ✅ 100% |
| `php-error-handling` | 471 | █████████░ 87% | 9/10 | 6 | ✅ 100% |
| `php-language` | 454 | █████████░ 88% | 9/10 | 6 | ✅ 91% |

📦 quality-engineering (5 skills | avg 754 tokens | quality 10.0/10 | eval alignment 95%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `quality-engineering-business-analysis` | 1025 | ███████░░░ 72% | 10/10 | 6 | ✅ 95% |
| `quality-engineering-jira-integration` | 560 | █████████░ 85% | 10/10 | 3 | ✅ 100% |
| `quality-engineering-quality-assurance` | 496 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `quality-engineering-zephyr-coverage-analysis` | 493 | █████████░ 87% | 10/10 | 4 | ✅ 80% |
| `quality-engineering-zephyr-test-generation` | 1197 | ███████░░░ 67% | 10/10 | 3 | ✅ 100% |

📦 react (8 skills | avg 521 tokens | quality 10.0/10 | eval alignment 96%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `react-component-patterns` | 458 | █████████░ 87% | 10/10 | 3 | ✅ 100% |
| `react-hooks` | 623 | ████████░░ 83% | 10/10 | 3 | ✅ 100% |
| `react-performance` | 715 | ████████░░ 80% | 10/10 | 3 | ✅ 100% |
| `react-security` | 491 | █████████░ 87% | 10/10 | 3 | ✅ 83% |
| `react-state-management` | 523 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `react-testing` | 517 | █████████░ 86% | 10/10 | 3 | ✅ 83% |
| `react-tooling` | 403 | █████████░ 89% | 10/10 | 3 | ✅ 100% |
| `react-typescript` | 437 | █████████░ 88% | 10/10 | 3 | ✅ 100% |

📦 react-native (13 skills | avg 425 tokens | quality 10.0/10 | eval alignment 97%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `react-native-architecture` | 530 | █████████░ 86% | 10/10 | 3 | ✅ 83% |
| `react-native-components` | 362 | █████████░ 90% | 10/10 | 3 | ✅ 100% |
| `react-native-deployment` | 522 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `react-native-dls` | 256 | █████████░ 93% | 10/10 | 3 | ✅ 100% |
| `react-native-navigation` | 325 | █████████░ 91% | 10/10 | 3 | ✅ 100% |
| `react-native-navigation-v6` | 497 | █████████░ 86% | 10/10 | 3 | ✅ 86% |
| `react-native-notifications` | 338 | █████████░ 91% | 10/10 | 3 | ✅ 100% |
| `react-native-performance` | 563 | █████████░ 85% | 10/10 | 3 | ✅ 89% |
| `react-native-platform-specific` | 395 | █████████░ 89% | 10/10 | 3 | ✅ 100% |
| `react-native-security` | 565 | █████████░ 85% | 10/10 | 3 | ✅ 100% |
| `react-native-state-management` | 425 | █████████░ 88% | 10/10 | 3 | ✅ 100% |
| `react-native-styling` | 317 | █████████░ 91% | 10/10 | 3 | ✅ 100% |
| `react-native-testing` | 426 | █████████░ 88% | 10/10 | 3 | ✅ 100% |

📦 spring-boot (10 skills | avg 464 tokens | quality 10.0/10 | eval alignment 98%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `spring-boot-api-design` | 319 | █████████░ 91% | 10/10 | 3 | ✅ 100% |
| `spring-boot-architecture` | 615 | ████████░░ 83% | 10/10 | 3 | ✅ 100% |
| `spring-boot-best-practices` | 564 | █████████░ 85% | 10/10 | 3 | ✅ 100% |
| `spring-boot-data-access` | 511 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `spring-boot-deployment` | 497 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `spring-boot-microservices` | 482 | █████████░ 87% | 10/10 | 3 | ✅ 100% |
| `spring-boot-observability` | 481 | █████████░ 87% | 10/10 | 3 | ✅ 100% |
| `spring-boot-scheduling` | 340 | █████████░ 91% | 10/10 | 3 | ✅ 100% |
| `spring-boot-security` | 510 | █████████░ 86% | 10/10 | 3 | ✅ 100% |
| `spring-boot-testing` | 317 | █████████░ 91% | 10/10 | 3 | ✅ 83% |

📦 swift (8 skills | avg 472 tokens | quality 10.0/10 | eval alignment 89%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `swift-best-practices` | 649 | ████████░░ 82% | 10/10 | 4 | ✅ 92% |
| `swift-concurrency` | 517 | █████████░ 86% | 10/10 | 5 | ✅ 93% |
| `swift-error-handling` | 500 | █████████░ 86% | 10/10 | 4 | ✅ 100% |
| `swift-language` | 458 | █████████░ 87% | 10/10 | 5 | ✅ 94% |
| `swift-memory-management` | 377 | █████████░ 90% | 10/10 | 4 | ⚠️ 56% |
| `swift-swiftui` | 418 | █████████░ 89% | 10/10 | 4 | ✅ 75% |
| `swift-testing` | 448 | █████████░ 88% | 10/10 | 4 | ✅ 100% |
| `swift-tooling` | 411 | █████████░ 89% | 10/10 | 4 | ✅ 100% |

📦 typescript (4 skills | avg 725 tokens | quality 9.5/10 | eval alignment 100%)

| Skill | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----- | ------ | ------------------ | ------- | ----- | ------- |
| `typescript-best-practices` | 593 | ████████░░ 84% | 10/10 | 3 | ✅ 100% |
| `typescript-language` | 648 | ████████░░ 82% | 10/10 | 3 | ✅ 100% |
| `typescript-security` | 746 | ████████░░ 80% | 10/10 | 5 | ✅ 100% |
| `typescript-tooling` | 912 | ████████░░ 75% | 8/10 | 3 | ✅ 100% |
## ⚠️ Low Eval Alignment: Skills to Review

> These skills have evals, but their SKILL.md content covers less than 70% of the values the evals assert on. The skill may not actually improve agent behavior for its target scenarios.

| Skill | Category | Alignment | Evals | Action |
| ----- | -------- | --------- | ----- | ------ |
| `common-store-changelog` | common | ⚠️ 43% | 4 | Add missing terms from eval assertions to SKILL.md |
| `flutter-security` | flutter | ⚠️ 50% | 3 | Add missing terms from eval assertions to SKILL.md |
| `swift-memory-management` | swift | ⚠️ 56% | 4 | Add missing terms from eval assertions to SKILL.md |

## 🏆 Quality Leaders

| Rank | Skill | Category | Quality | Tokens | Evals | Aligned |
| ---- | ----- | -------- | ------- | ------ | ----- | ------- |
| 1 | `android-architecture` | android | 10/10 | 504 | 3 | ✅ 88% |
| 2 | `android-background-work` | android | 10/10 | 302 | 3 | ✅ 100% |
| 3 | `android-compose` | android | 10/10 | 447 | 3 | ✅ 100% |
| 4 | `android-concurrency` | android | 10/10 | 314 | 3 | ✅ 89% |
| 5 | `android-deployment` | android | 10/10 | 328 | 3 | ✅ 100% |
| 6 | `android-design-system` | android | 10/10 | 252 | 3 | ✅ 100% |
| 7 | `android-di` | android | 10/10 | 309 | 3 | ✅ 88% |
| 8 | `android-legacy-navigation` | android | 10/10 | 310 | 3 | ✅ 86% |
| 9 | `android-legacy-security` | android | 10/10 | 444 | 3 | ✅ 100% |
| 10 | `android-legacy-state` | android | 10/10 | 256 | 3 | ✅ 100% |

## 📐 Methodology & Baseline Justification

### Why These Baselines?

The baselines are derived from **real, token-counted example prompts** that represent what a developer actually writes when there is no structured skill available.

NestJS serves as the **Reference Unit**: because the benchmark measures the volume of instructions that a skill replaces, a single high-density reference prompt keeps the comparison consistent across all tech stacks.
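The token estimate and the savings percentages reported above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the benchmark's actual code: the function names are hypothetical, and rounding to the nearest whole percent is an assumption inferred from the published figures.

```python
import math

# Baseline sizes taken from this report's reference prompts.
LIGHT_BASELINE_TOKENS = 1449
HEAVY_BASELINE_TOKENS = 3656


def estimate_tokens(text: str) -> int:
    """Approximate a token count as ceil(characters / 4)."""
    return math.ceil(len(text) / 4)


def savings_percent(skill_tokens: int, baseline_tokens: int) -> int:
    """Share of baseline tokens avoided by loading the skill instead.

    Rounding to a whole percent is an assumption, not confirmed by the source.
    """
    return round((baseline_tokens - skill_tokens) / baseline_tokens * 100)


if __name__ == "__main__":
    avg_skill_tokens = 516  # average SKILL.md size reported in the summary
    print(savings_percent(avg_skill_tokens, LIGHT_BASELINE_TOKENS))  # 64
    print(savings_percent(avg_skill_tokens, HEAVY_BASELINE_TOKENS))  # 86
```

Because the heuristic divides characters by four rather than running a real tokenizer, absolute token counts are approximate; the relative savings are less sensitive to that error since skill and baseline are measured the same way.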
#### 🟡 Reference Technical Prompt: Light (1449 tokens)

> **Reference Technical Prompt: Light (e.g., NestJS)**
> A compact inline system prompt used as a reference for token count calibration. Representative of focused developer instructions without a structured skill.

#### 🔴 Reference Technical Prompt: Heavy (3656 tokens)

> **Reference Technical Prompt: Heavy (e.g., NestJS Architecture)**
> A comprehensive architect-level inline prompt used as a reference for complex tasks. Includes the deep patterns and rules that developers send when no skill is present.

### 🏆 Detailed Quality Rubric (0–10)

To ensure skills are not just "short" but actually **high quality**, every skill is scored against this structural rubric:

| Score  | Criteria                  | Rationale                                                            |
| ------ | ------------------------- | -------------------------------------------------------------------- |
| **+2** | **Structured Guidelines** | At least 3 specific instructions/bullet points.                      |
| **+2** | **Anti-Patterns**         | `## Anti-Patterns` section or `**No X**` inline lines.               |
| **+2** | **Reference Examples**    | Presence of a verified `references/` folder with code.               |
| **+2** | **Token Optimality**      | Entire `SKILL.md` is ≤100 lines (forces brevity).                    |
| **+2** | **Eval Coverage**         | ≥3 evals with `should_not_trigger`, ≥2 assertions each. +1 partial.  |

> **Eval Alignment** (reported separately, not scored): % of eval `contains` assertion values that appear in SKILL.md content. It measures whether the skill actually teaches what its evals test, which makes it the closest static proxy for **with-skill vs without-skill** behavioral improvement.

### 🛡️ How to Verify This Report

Trust but verify. You can audit the raw data and run the benchmark yourself:

1. **Clone the repo** and install dependencies (`pnpm install`).
2. **Inspect Source**: The benchmark logic is open in [cli/src/scripts/benchmark/](./cli/src/scripts/benchmark/).
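As a complement to reading the benchmark source, the Eval Alignment metric defined above can be approximated independently. This is a minimal sketch under stated assumptions: the function names are hypothetical, matching is done as a case-insensitive substring test (the report does not say how matching works), and a skill with no assertions is scored 0%. The 70% review threshold comes from the Low Eval Alignment section.

```python
def eval_alignment(skill_text: str, contains_values: list[str]) -> float:
    """Percentage of eval `contains` values that occur in the skill text.

    Case-insensitive substring matching is an assumption of this sketch.
    """
    if not contains_values:
        return 0.0  # assumption: no assertions means nothing is covered
    haystack = skill_text.lower()
    hits = sum(1 for value in contains_values if value.lower() in haystack)
    return 100 * hits / len(contains_values)


def needs_review(alignment_percent: float, threshold: float = 70.0) -> bool:
    """Flag skills whose alignment falls below the report's 70% threshold."""
    return alignment_percent < threshold


if __name__ == "__main__":
    # Hypothetical skill text and assertion values, for illustration only.
    skill = "Use constructor injection. No field injection."
    values = ["constructor injection", "field injection", "Hilt"]
    score = eval_alignment(skill, values)
    print(round(score), needs_review(score))  # 67 True
```

A skill flagged this way would be fixed as the report's Action column suggests: by adding the missing assertion terms to SKILL.md.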
### Pricing (per 1M input tokens, Feb 2026)

- **Gemini 3 Flash**: $0.50
- **GPT-5**: $1.25
- **Gemini 3.1 Pro**: $2.00
- **Claude Sonnet 4.5**: $3.00
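The per-call and monthly cost figures in this report follow directly from these prices. The sketch below shows the arithmetic; the function names are hypothetical, and the 30-day month is an assumption inferred from the published monthly totals rather than stated in the source.

```python
# Input-token prices in USD per 1M tokens, as listed in this report.
PRICE_PER_MILLION_INPUT_TOKENS = {
    "Gemini 3 Flash": 0.50,
    "GPT-5": 1.25,
    "Gemini 3.1 Pro": 2.00,
    "Claude Sonnet 4.5": 3.00,
}


def cost_per_call(tokens: int, model: str) -> float:
    """Input cost in USD of sending `tokens` tokens to `model` once."""
    return tokens * PRICE_PER_MILLION_INPUT_TOKENS[model] / 1_000_000


def monthly_savings(
    baseline_tokens: int,
    skill_tokens: int,
    model: str,
    daily_calls: int,
    days_per_month: int = 30,  # assumption: the report appears to use 30 days
) -> float:
    """USD saved per month by sending the skill instead of the baseline prompt."""
    saved_per_call = cost_per_call(baseline_tokens, model) - cost_per_call(
        skill_tokens, model
    )
    return saved_per_call * daily_calls * days_per_month


if __name__ == "__main__":
    heavy, skill = 3656, 516  # Heavy baseline and average skill size
    print(f"{cost_per_call(heavy, 'GPT-5'):.7f}")
    print(f"{monthly_savings(heavy, skill, 'GPT-5', daily_calls=1000):.2f}")
```

Only input tokens are priced here, matching the pricing list; output tokens and any provider-side prompt caching discounts are outside the scope of this calculation.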