# 📊 Agent Skill Benchmark Report

> - Generated: 2026-03-30T04:14:33.036Z
> - Token counting: `ceil(characters / 4)`, an approximation of cl100k_base tokenization.
> - Baselines: derived from **real, measured example prompts** (see Methodology).
> - Quality: structural rubric (0–10), no live LLM calls required.
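
The token-counting rule in the note above is simple enough to sketch. This is an illustration rather than the benchmark's own code: the name `estimate_tokens` and the sample string are hypothetical, and the sketch assumes "characters" means Python string length (Unicode code points), which may differ from how the report's tooling counts.

```python
import math


def estimate_tokens(text: str) -> int:
    """Approximate a token count as ceil(characters / 4).

    Assumption: "characters" is len(text), i.e. Unicode code points.
    """
    return math.ceil(len(text) / 4)


# Hypothetical 26-character snippet of skill guidance.
sample = "Use constructor injection."
print(estimate_tokens(sample))  # 26 / 4 = 6.5, rounded up to 7
```

A real tokenizer will give different counts for code-heavy or non-English text, so figures produced this way are best read as estimates.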

## ❓ How to Read This Report

This benchmark answers: **"How many tokens and dollars does an agent skill save compared to a developer writing the same guidance inline?"**

- **WITHOUT a skill**: the developer writes the domain knowledge directly into every prompt (the Baseline).
- **WITH a skill**: the agent loads the SKILL.md file (527 tokens on average in this run), which is structured, reusable, and cached.

**Eval Alignment**: the percentage of eval assertion values that appear in SKILL.md. High alignment means the skill teaches what the evals test, so it serves as a static proxy for the behavioral improvement of "with skill" over "without skill".
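
One way to picture the alignment metric is as a substring check over the assertion values. The sketch below is an assumption about how such a check could work, not the benchmark's actual implementation: the function name, the case-insensitive matching, and the sample skill text and values are all hypothetical. The 70% cutoff is the one this report uses for its low-alignment review list.

```python
from typing import Optional

REVIEW_THRESHOLD = 70.0  # the report flags skills below 70% alignment


def eval_alignment(skill_text: str, assertion_values: list[str]) -> Optional[float]:
    """Return the percentage of assertion values found in skill_text.

    Assumption: a value "appears" if it occurs as a case-insensitive
    substring. Returns None when there are no assertions (shown as n/a).
    """
    if not assertion_values:
        return None
    haystack = skill_text.lower()
    hits = sum(1 for value in assertion_values if value.lower() in haystack)
    return 100.0 * hits / len(assertion_values)


# Hypothetical skill text and eval assertion values.
skill_md = "Prefer StateFlow over LiveData. Collect with repeatOnLifecycle."
values = ["StateFlow", "repeatOnLifecycle", "SavedStateHandle"]

pct = eval_alignment(skill_md, values)
flag = "⚠️" if pct < REVIEW_THRESHOLD else "✅"
print(f"{flag} {round(pct)}%")  # 2 of 3 values found -> ⚠️ 67%
```

Because the check is purely textual, a skill can score low while still covering a concept under different wording, which is why low scores are a prompt for review rather than a verdict.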

## 🔢 Executive Summary

| Metric                            | Value                                         |
| --------------------------------- | --------------------------------------------- |
| Total Skills Benchmarked          | **238**                                       |
| Avg. Tokens WITH Skill (SKILL.md) | **527 tokens**                                |
| Baseline: Light prompt (no skill) | **1449 tokens** ↓ see Methodology             |
| Baseline: Heavy prompt (no skill) | **3656 tokens** ↓ see Methodology             |
| Avg. Token Savings vs Light       | **64%** (922 tokens/call)                     |
| Avg. Token Savings vs Heavy       | **86%** (3129 tokens/call)                    |
| Avg. Quality Score                | **9.8/10**                                    |
| Skills with Evals                 | **236 / 238**                                 |
| Avg. Eval Alignment               | **91%** (eval assertions covered by SKILL.md) |
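
The savings rows in the table above follow from the three token averages. A minimal sketch of that arithmetic, with a hypothetical helper name and the numbers taken from the table:

```python
def token_savings(baseline_tokens: int, skill_tokens: int) -> tuple[int, int]:
    """Return (tokens saved per call, percent saved rounded to a whole number)."""
    saved = baseline_tokens - skill_tokens
    percent = round(100 * saved / baseline_tokens)
    return saved, percent


AVG_SKILL_TOKENS = 527
LIGHT_BASELINE_TOKENS = 1449
HEAVY_BASELINE_TOKENS = 3656

print(token_savings(LIGHT_BASELINE_TOKENS, AVG_SKILL_TOKENS))  # (922, 64)
print(token_savings(HEAVY_BASELINE_TOKENS, AVG_SKILL_TOKENS))  # (3129, 86)
```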

## 📜 History

| Version | Date       | Skills | Avg Tokens | Savings (%) | Quality | Report                                       |
| ------- | ---------- | ------ | ---------- | ----------- | ------- | -------------------------------------------- |
| v2.0.1  | 2026-03-30 | 238    | 527        | 86%         | 9.8/10  | [Full Report](benchmarks/archive/v2.0.1.md)  |
| v2.0.0  | 2026-03-25 | 235    | 523        | 86%         | 9.9/10  | [Full Report](benchmarks/archive/v2.0.0.md)  |
| v1.10.3 | 2026-03-21 | 234    | 505        | 86%         | 9.8/10  | [Full Report](benchmarks/archive/v1.10.3.md) |
| v1.10.1 | 2026-03-16 | 229    | 428        | 88%         | 9.9/10  | [Full Report](benchmarks/archive/v1.10.1.md) |
| v1.10.0 | 2026-03-16 | 229    | 434        | 88%         | 7/10    | [Full Report](benchmarks/archive/v1.10.0.md) |
| v1.9.3  | 2026-03-15 | 229    | 460        | 87%         | 8.9/10  | [Full Report](benchmarks/archive/v1.9.3.md)  |
| v1.9.2  | 2026-03-07 | 228    | 458        | 87%         | 8.9/10  | [Full Report](benchmarks/archive/v1.9.2.md)  |
| v1.9.1  | 2026-03-07 | 228    | 458        | 87%         | 8.9/10  | [Full Report](benchmarks/archive/v1.9.1.md)  |
| v1.9.0  | 2026-03-05 | 228    | 457        | 88%         | 8.9/10  | [Full Report](benchmarks/archive/v1.9.0.md)  |
| v1.8.0  | 2026-03-02 | 228    | 443        | 88%         | 8.9/10  | [Full Report](benchmarks/archive/v1.8.0.md)  |
| v1.7.3  | 2026-02-25 | 222    | 418        | 89%         | 8.9/10  | [Full Report](benchmarks/archive/v1.7.3.md)  |
| v1.7.2  | 2026-02-25 | 220    | 413        | 89%         | 8.9/10  | [Full Report](benchmarks/archive/v1.7.2.md)  |

## 💰 Cost Comparison: Per Single Call (Average Skill)

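> Costs compare the **Heavy Baseline** prompt (3656 tokens) with the average skill (527 tokens) at each model's per-token price. The model list mixes current and speculative models.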

| Model             | Original Cost | Skill Cost | Net Savings    | % Saved |
| ----------------- | ------------- | ---------- | -------------- | ------- |
| Gemini 3 Flash    | $0.0018280    | $0.0002635 | **$0.0015645** | 86%     |
| GPT-5             | $0.0045700    | $0.0006588 | **$0.0039112** | 86%     |
| Gemini 3.1 Pro    | $0.0073120    | $0.0010540 | **$0.0062580** | 86%     |
| Claude Sonnet 4.5 | $0.0109680    | $0.0015810 | **$0.0093870** | 86%     |
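
The per-call figures above are consistent with multiplying a token count by a flat price per million tokens. The sketch below reproduces that arithmetic. The rates are not stated in this report: they are back-computed from the table (Original Cost divided by 3656 tokens) and should be treated as assumptions, not as official pricing. The helper name is hypothetical.

```python
HEAVY_TOKENS = 3656  # Heavy Baseline prompt
SKILL_TOKENS = 527   # average SKILL.md

# USD per 1M tokens, inferred from the table above (assumption, not official pricing).
INFERRED_RATES = {
    "Gemini 3 Flash": 0.50,
    "GPT-5": 1.25,
    "Gemini 3.1 Pro": 2.00,
    "Claude Sonnet 4.5": 3.00,
}


def cost_per_call(tokens: int, usd_per_million: float) -> float:
    """Cost in USD of sending `tokens` tokens at a flat per-million rate."""
    return tokens * usd_per_million / 1_000_000


for model, rate in INFERRED_RATES.items():
    original = cost_per_call(HEAVY_TOKENS, rate)
    with_skill = cost_per_call(SKILL_TOKENS, rate)
    saved = original - with_skill
    print(f"{model}: ${original:.7f} -> ${with_skill:.7f} (saves ${saved:.7f})")
```

Since both costs scale with the same rate, the percentage saved is the same 86% for every model; only the absolute dollar amounts differ. Printed values may differ from the table in the last digit because of rounding.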

## 📈 Monthly Savings at Scale (Avg Skill vs Heavy Prompt)

| Daily Calls | Original Cost/mo | Monthly Savings (1 skill) | Monthly Savings (50 skills) | Model             |
| ----------- | ---------------- | ------------------------- | --------------------------- | ----------------- |
| 1,000       | $137.1000 /mo    | $117.3375 /mo             | $5,866.8750 /mo             | GPT-5             |
| 1,000       | $219.3600 /mo    | $187.7400 /mo             | $9,387.0000 /mo             | Gemini 3.1 Pro    |
| 1,000       | $329.0400 /mo    | $281.6100 /mo             | $14,080.5000 /mo            | Claude Sonnet 4.5 |
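
The monthly figures above can be reproduced by scaling the per-call savings. The sketch below makes two assumptions that the report does not state outright: a 30-day month (the table's numbers are consistent with 1,000 calls × 30 days) and a GPT-5 rate of $1.25 per 1M tokens back-computed from the per-call costs. It also reads the 50-skill column as 50 times the single-skill figure. The function name is hypothetical.

```python
DAYS_PER_MONTH = 30  # assumption inferred from the table, not stated in the report


def monthly_savings(daily_calls: int, saved_per_call: float, skills: int = 1) -> float:
    """Monthly USD savings for `skills` skills, each used `daily_calls` times a day."""
    return daily_calls * DAYS_PER_MONTH * saved_per_call * skills


# Per-call savings for GPT-5: (heavy baseline - average skill) tokens at an
# inferred $1.25 per 1M tokens (assumption, not official pricing).
gpt5_saved_per_call = (3656 - 527) * 1.25 / 1_000_000

print(round(monthly_savings(1000, gpt5_saved_per_call), 4))
print(round(monthly_savings(1000, gpt5_saved_per_call, skills=50), 4))
```

The 50-skill figure only holds if every one of the 50 skills would otherwise have been replaced by a Heavy Baseline prompt at the same call volume, so it is an upper-end scenario.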

## 📦 Per-Category Summary

<details>
<summary><h3>📦 android (22 skills | avg 350 tokens | quality 10.0/10 | eval alignment 92%)</h3></summary>

| Skill                          | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ------------------------------ | ------ | ------------------ | ------- | ----- | ------- |
| `android-architecture`         | 505    | █████████░ 86%     | 10/10   | 3     | ✅ 88%  |
| `android-background-work`      | 305    | █████████░ 92%     | 10/10   | 3     | ✅ 100% |
| `android-compose`              | 450    | █████████░ 88%     | 10/10   | 3     | ✅ 100% |
| `android-concurrency`          | 315    | █████████░ 91%     | 10/10   | 3     | ✅ 89%  |
| `android-deployment`           | 328    | █████████░ 91%     | 10/10   | 3     | ✅ 100% |
| `android-design-system`        | 300    | █████████░ 92%     | 10/10   | 3     | ✅ 100% |
| `android-di`                   | 311    | █████████░ 91%     | 10/10   | 3     | ✅ 88%  |
| `android-legacy-navigation`    | 311    | █████████░ 91%     | 10/10   | 3     | ✅ 86%  |
| `android-legacy-security`      | 446    | █████████░ 88%     | 10/10   | 3     | ✅ 100% |
| `android-legacy-state`         | 258    | █████████░ 93%     | 10/10   | 3     | ✅ 100% |
| `android-navigation`           | 277    | █████████░ 92%     | 10/10   | 3     | ✅ 100% |
| `android-navigation-type-safe` | 267    | █████████░ 93%     | 10/10   | 3     | ✅ 83%  |
| `android-networking`           | 415    | █████████░ 89%     | 10/10   | 3     | ⚠️ 67%  |
| `android-notifications`        | 436    | █████████░ 88%     | 10/10   | 3     | ✅ 100% |
| `android-performance`          | 385    | █████████░ 89%     | 10/10   | 3     | ✅ 88%  |
| `android-persistence`          | 298    | █████████░ 92%     | 10/10   | 3     | ✅ 100% |
| `android-resources`            | 412    | █████████░ 89%     | 10/10   | 3     | ✅ 100% |
| `android-security`             | 398    | █████████░ 89%     | 10/10   | 3     | ✅ 86%  |
| `android-state`                | 363    | █████████░ 90%     | 10/10   | 3     | ✅ 88%  |
| `android-testing`              | 318    | █████████░ 91%     | 10/10   | 3     | ✅ 88%  |
| `android-tooling`              | 296    | █████████░ 92%     | 10/10   | 3     | ✅ 88%  |
| `android-xml-views`            | 297    | █████████░ 92%     | 10/10   | 3     | ✅ 88%  |

</details>

<details>
<summary><h3>📦 angular (16 skills | avg 502 tokens | quality 9.9/10 | eval alignment 84%)</h3></summary>

| Skill                          | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ------------------------------ | ------ | ------------------ | ------- | ----- | ------- |
| `angular-architecture`         | 620    | ████████░░ 83%     | 10/10   | 6     | ✅ 95%  |
| `angular-component-patterns`   | 553    | █████████░ 85%     | 10/10   | 6     | ✅ 91%  |
| `angular-components`           | 568    | ████████░░ 84%     | 10/10   | 6     | ✅ 87%  |
| `angular-dependency-injection` | 525    | █████████░ 86%     | 10/10   | 6     | ✅ 89%  |
| `angular-directives-pipes`     | 496    | █████████░ 86%     | 10/10   | 6     | ✅ 95%  |
| `angular-forms`                | 346    | █████████░ 91%     | 10/10   | 6     | ⚠️ 58%  |
| `angular-http-client`          | 560    | █████████░ 85%     | 10/10   | 6     | ✅ 96%  |
| `angular-performance`          | 476    | █████████░ 87%     | 10/10   | 6     | ✅ 82%  |
| `angular-routing`              | 381    | █████████░ 90%     | 10/10   | 6     | ⚠️ 43%  |
| `angular-rxjs-interop`         | 507    | █████████░ 86%     | 10/10   | 6     | ✅ 100% |
| `angular-security`             | 500    | █████████░ 86%     | 10/10   | 6     | ✅ 89%  |
| `angular-ssr`                  | 473    | █████████░ 87%     | 10/10   | 6     | ✅ 90%  |
| `angular-state-management`     | 407    | █████████░ 89%     | 10/10   | 6     | ✅ 81%  |
| `angular-style-guide`          | 521    | █████████░ 86%     | 10/10   | 6     | ✅ 81%  |
| `angular-testing`              | 425    | █████████░ 88%     | 10/10   | 6     | ✅ 70%  |
| `angular-tooling`              | 675    | ████████░░ 82%     | 8/10    | 6     | ✅ 100% |

</details>

<details>
<summary><h3>📦 common (31 skills | avg 622 tokens | quality 9.7/10 | eval alignment 91%)</h3></summary>

| Skill                             | Tokens | Savings (vs Heavy) | Quality | Evals   | Aligned |
| --------------------------------- | ------ | ------------------ | ------- | ------- | ------- |
| `common-architecture-audit`       | 623    | ████████░░ 83%     | 10/10   | 3       | ✅ 100% |
| `common-architecture-diagramming` | 453    | █████████░ 88%     | 10/10   | 3       | ✅ 100% |
| `common-best-practices`           | 391    | █████████░ 89%     | 10/10   | 3       | ✅ 91%  |
| `common-code-review`              | 383    | █████████░ 90%     | 10/10   | 3       | ✅ 100% |
| `common-context-optimization`     | 574    | ████████░░ 84%     | 10/10   | 3       | ✅ 100% |
| `common-debugging`                | 396    | █████████░ 89%     | 10/10   | 3       | ✅ 100% |
| `common-documentation`            | 483    | █████████░ 87%     | 10/10   | 3       | ✅ 89%  |
| `common-error-handling`           | 395    | █████████░ 89%     | 10/10   | 3       | ✅ 78%  |
| `common-feedback-reporter`        | 670    | ████████░░ 82%     | 10/10   | 3       | ✅ 100% |
| `common-git-collaboration`        | 507    | █████████░ 86%     | 10/10   | 3       | ✅ 100% |
| `common-mobile-animation`         | 542    | █████████░ 85%     | 10/10   | 3       | ✅ 100% |
| `common-mobile-ux-core`           | 369    | █████████░ 90%     | 10/10   | 3       | ✅ 100% |
| `common-observability`            | 380    | █████████░ 90%     | 10/10   | 3       | ✅ 100% |
| `common-performance-engineering`  | 677    | ████████░░ 81%     | 10/10   | 3       | ✅ 78%  |
| `common-product-requirements`     | 431    | █████████░ 88%     | 10/10   | 3       | ✅ 100% |
| `common-protocol-enforcement`     | 466    | █████████░ 87%     | 10/10   | 3       | ✅ 100% |
| `common-security-audit`           | 883    | ████████░░ 76%     | 10/10   | 3       | ✅ 100% |
| `common-security-standards`       | 709    | ████████░░ 81%     | 10/10   | 3       | ✅ 100% |
| `common-session-retrospective`    | 659    | ████████░░ 82%     | 10/10   | 3       | ✅ 100% |
| `common-skill-creator`            | 1238   | ███████░░░ 66%     | 10/10   | 3       | ✅ 75%  |
| `common-store-changelog`          | 715    | ████████░░ 80%     | 10/10   | 4       | ⚠️ 43%  |
| `common-system-design`            | 713    | ████████░░ 80%     | 10/10   | 3       | ✅ 100% |
| `common-tdd`                      | 647    | ████████░░ 82%     | 10/10   | 3       | ⚠️ 0%   |
| `common-ui-design`                | 784    | ████████░░ 79%     | 10/10   | 6       | ✅ 100% |
| `common-workflow-writing`         | 563    | █████████░ 85%     | 10/10   | 3       | ✅ 100% |
| `common-llm-security`             | 688    | ████████░░ 81%     | 9/10    | 5       | ✅ 100% |
| `common-owasp`                    | 901    | ████████░░ 75%     | 9/10    | 5       | ✅ 91%  |
| `common-accessibility`            | 1009   | ███████░░░ 72%     | 8/10    | 3       | ✅ 100% |
| `common-api-design`               | 844    | ████████░░ 77%     | 8/10    | 3       | ✅ 100% |
| `common-dast-tooling`             | 607    | ████████░░ 83%     | 8/10    | ❌ none | n/a     |
| `common-learning-log`             | 571    | ████████░░ 84%     | 8/10    | ❌ none | n/a     |

</details>

<details>
<summary><h3>📦 dart (3 skills | avg 560 tokens | quality 10.0/10 | eval alignment 100%)</h3></summary>

| Skill                 | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| --------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `dart-best-practices` | 526    | █████████░ 86%     | 10/10   | 3     | ✅ 100% |
| `dart-language`       | 637    | ████████░░ 83%     | 10/10   | 3     | ✅ 100% |
| `dart-tooling`        | 518    | █████████░ 86%     | 10/10   | 3     | ✅ 100% |

</details>

<details>
<summary><h3>📦 database (3 skills | avg 566 tokens | quality 10.0/10 | eval alignment 95%)</h3></summary>

| Skill                 | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| --------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `database-mongodb`    | 624    | ████████░░ 83%     | 10/10   | 3     | ✅ 100% |
| `database-postgresql` | 471    | █████████░ 87%     | 10/10   | 3     | ✅ 86%  |
| `database-redis`      | 602    | ████████░░ 84%     | 10/10   | 3     | ✅ 100% |

</details>

<details>
<summary><h3>📦 flutter (21 skills | avg 542 tokens | quality 9.5/10 | eval alignment 91%)</h3></summary>

| Skill                                      | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ------------------------------------------ | ------ | ------------------ | ------- | ----- | ------- |
| `flutter-cicd`                             | 575    | ████████░░ 84%     | 10/10   | 3     | ✅ 100% |
| `flutter-design-system`                    | 525    | █████████░ 86%     | 10/10   | 3     | ✅ 100% |
| `flutter-error-handling`                   | 616    | ████████░░ 83%     | 10/10   | 3     | ✅ 100% |
| `flutter-feature-based-clean-architecture` | 713    | ████████░░ 80%     | 10/10   | 3     | ⚠️ 30%  |
| `flutter-getx-navigation`                  | 382    | █████████░ 90%     | 10/10   | 3     | ✅ 100% |
| `flutter-idiomatic-flutter`                | 370    | █████████░ 90%     | 10/10   | 3     | ✅ 100% |
| `flutter-layer-based-clean-architecture`   | 677    | ████████░░ 81%     | 10/10   | 3     | ✅ 100% |
| `flutter-performance`                      | 466    | █████████░ 87%     | 10/10   | 3     | ✅ 100% |
| `flutter-retrofit-networking`              | 565    | █████████░ 85%     | 10/10   | 3     | ⚠️ 67%  |
| `flutter-riverpod-state-management`        | 557    | █████████░ 85%     | 10/10   | 3     | ⚠️ 50%  |
| `flutter-testing`                          | 750    | ████████░░ 79%     | 10/10   | 3     | ✅ 100% |
| `flutter-widgets`                          | 499    | █████████░ 86%     | 10/10   | 3     | ✅ 100% |
| `flutter-auto-route-navigation`            | 511    | █████████░ 86%     | 9/10    | 3     | ✅ 100% |
| `flutter-bloc-state-management`            | 666    | ████████░░ 82%     | 9/10    | 3     | ✅ 100% |
| `flutter-dependency-injection`             | 526    | █████████░ 86%     | 9/10    | 3     | ✅ 80%  |
| `flutter-localization`                     | 500    | █████████░ 86%     | 9/10    | 3     | ✅ 100% |
| `flutter-navigation`                       | 400    | █████████░ 89%     | 9/10    | 3     | ✅ 100% |
| `flutter-notifications`                    | 415    | █████████░ 89%     | 9/10    | 3     | ✅ 100% |
| `flutter-security`                         | 502    | █████████░ 86%     | 9/10    | 3     | ✅ 75%  |
| `flutter-getx-state-management`            | 506    | █████████░ 86%     | 8/10    | 3     | ✅ 100% |
| `flutter-go-router-navigation`             | 655    | ████████░░ 82%     | 8/10    | 3     | ✅ 100% |

</details>

<details>
<summary><h3>📦 golang (11 skills | avg 449 tokens | quality 9.9/10 | eval alignment 93%)</h3></summary>

| Skill                   | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `golang-api-server`     | 445    | █████████░ 88%     | 10/10   | 3     | ✅ 80%  |
| `golang-architecture`   | 500    | █████████░ 86%     | 10/10   | 3     | ✅ 100% |
| `golang-concurrency`    | 427    | █████████░ 88%     | 10/10   | 3     | ⚠️ 67%  |
| `golang-configuration`  | 434    | █████████░ 88%     | 10/10   | 3     | ✅ 100% |
| `golang-database`       | 448    | █████████░ 88%     | 10/10   | 3     | ✅ 71%  |
| `golang-error-handling` | 345    | █████████░ 91%     | 10/10   | 3     | ✅ 100% |
| `golang-language`       | 499    | █████████░ 86%     | 10/10   | 3     | ✅ 100% |
| `golang-logging`        | 390    | █████████░ 89%     | 10/10   | 3     | ✅ 100% |
| `golang-security`       | 511    | █████████░ 86%     | 10/10   | 3     | ✅ 100% |
| `golang-testing`        | 418    | █████████░ 89%     | 10/10   | 3     | ✅ 100% |
| `golang-tooling`        | 518    | █████████░ 86%     | 9/10    | 4     | ✅ 100% |

</details>

<details>
<summary><h3>📦 ios (15 skills | avg 368 tokens | quality 10.0/10 | eval alignment 87%)</h3></summary>

| Skill                      | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| -------------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `ios-app-lifecycle`        | 345    | █████████░ 91%     | 10/10   | 3     | ✅ 89%  |
| `ios-architecture`         | 663    | ████████░░ 82%     | 10/10   | 3     | ✅ 100% |
| `ios-dependency-injection` | 307    | █████████░ 92%     | 10/10   | 3     | ✅ 89%  |
| `ios-deployment`           | 337    | █████████░ 91%     | 10/10   | 3     | ✅ 89%  |
| `ios-design-system`        | 240    | █████████░ 93%     | 10/10   | 3     | ✅ 100% |
| `ios-localization`         | 372    | █████████░ 90%     | 10/10   | 3     | ✅ 78%  |
| `ios-navigation`           | 295    | █████████░ 92%     | 10/10   | 3     | ✅ 100% |
| `ios-networking`           | 371    | █████████░ 90%     | 10/10   | 3     | ⚠️ 56%  |
| `ios-notifications`        | 310    | █████████░ 92%     | 10/10   | 3     | ✅ 100% |
| `ios-performance`          | 363    | █████████░ 90%     | 10/10   | 3     | ✅ 100% |
| `ios-persistence`          | 343    | █████████░ 91%     | 10/10   | 3     | ⚠️ 67%  |
| `ios-security`             | 379    | █████████░ 90%     | 10/10   | 3     | ✅ 100% |
| `ios-state-management`     | 350    | █████████░ 90%     | 10/10   | 3     | ⚠️ 56%  |
| `ios-swiftui`              | 429    | █████████░ 88%     | 10/10   | 3     | ✅ 88%  |
| `ios-ui-navigation`        | 417    | █████████░ 89%     | 10/10   | 3     | ✅ 100% |

</details>

<details>
<summary><h3>📦 java (5 skills | avg 494 tokens | quality 10.0/10 | eval alignment 98%)</h3></summary>

| Skill                 | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| --------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `java-best-practices` | 479    | █████████░ 87%     | 10/10   | 3     | ✅ 89%  |
| `java-concurrency`    | 448    | █████████░ 88%     | 10/10   | 3     | ✅ 100% |
| `java-language`       | 535    | █████████░ 85%     | 10/10   | 3     | ✅ 100% |
| `java-testing`        | 534    | █████████░ 85%     | 10/10   | 3     | ✅ 100% |
| `java-tooling`        | 473    | █████████░ 87%     | 10/10   | 3     | ✅ 100% |

</details>

<details>
<summary><h3>📦 javascript (3 skills | avg 419 tokens | quality 10.0/10 | eval alignment 100%)</h3></summary>

| Skill                       | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| --------------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `javascript-best-practices` | 403    | █████████░ 89%     | 10/10   | 3     | ✅ 100% |
| `javascript-language`       | 511    | █████████░ 86%     | 10/10   | 3     | ✅ 100% |
| `javascript-tooling`        | 342    | █████████░ 91%     | 10/10   | 3     | ✅ 100% |

</details>

<details>
<summary><h3>📦 kotlin (4 skills | avg 410 tokens | quality 10.0/10 | eval alignment 95%)</h3></summary>

| Skill                   | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `kotlin-best-practices` | 466    | █████████░ 87%     | 10/10   | 3     | ✅ 100% |
| `kotlin-coroutines`     | 384    | █████████░ 89%     | 10/10   | 3     | ✅ 89%  |
| `kotlin-language`       | 445    | █████████░ 88%     | 10/10   | 3     | ✅ 100% |
| `kotlin-tooling`        | 346    | █████████░ 91%     | 10/10   | 3     | ✅ 89%  |

</details>

<details>
<summary><h3>📦 laravel (10 skills | avg 650 tokens | quality 10.0/10 | eval alignment 82%)</h3></summary>

| Skill                           | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ------------------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `laravel-api`                   | 707    | ████████░░ 81%     | 10/10   | 6     | ✅ 88%  |
| `laravel-architecture`          | 395    | █████████░ 89%     | 10/10   | 6     | ⚠️ 17%  |
| `laravel-background-processing` | 621    | ████████░░ 83%     | 10/10   | 6     | ✅ 91%  |
| `laravel-clean-architecture`    | 662    | ████████░░ 82%     | 10/10   | 6     | ✅ 76%  |
| `laravel-database-expert`       | 703    | ████████░░ 81%     | 10/10   | 6     | ✅ 84%  |
| `laravel-eloquent`              | 623    | ████████░░ 83%     | 10/10   | 6     | ✅ 85%  |
| `laravel-security`              | 727    | ████████░░ 80%     | 10/10   | 6     | ✅ 95%  |
| `laravel-sessions-middleware`   | 675    | ████████░░ 82%     | 10/10   | 6     | ✅ 90%  |
| `laravel-testing`               | 708    | ████████░░ 81%     | 10/10   | 6     | ✅ 95%  |
| `laravel-tooling`               | 675    | ████████░░ 82%     | 10/10   | 6     | ✅ 100% |

</details>

<details>
<summary><h3>📦 nestjs (21 skills | avg 632 tokens | quality 9.9/10 | eval alignment 98%)</h3></summary>

| Skill                         | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ----------------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `nestjs-api-standards`        | 628    | ████████░░ 83%     | 10/10   | 3     | ✅ 100% |
| `nestjs-architecture`         | 551    | █████████░ 85%     | 10/10   | 3     | ✅ 100% |
| `nestjs-bullmq`               | 900    | ████████░░ 75%     | 10/10   | 3     | ✅ 100% |
| `nestjs-caching`              | 616    | ████████░░ 83%     | 10/10   | 3     | ✅ 100% |
| `nestjs-configuration`        | 611    | ████████░░ 83%     | 10/10   | 3     | ✅ 83%  |
| `nestjs-database`             | 681    | ████████░░ 81%     | 10/10   | 3     | ✅ 100% |
| `nestjs-deployment`           | 717    | ████████░░ 80%     | 10/10   | 3     | ✅ 100% |
| `nestjs-documentation`        | 542    | █████████░ 85%     | 10/10   | 3     | ✅ 83%  |
| `nestjs-error-handling`       | 587    | ████████░░ 84%     | 10/10   | 3     | ✅ 100% |
| `nestjs-file-uploads`         | 431    | █████████░ 88%     | 10/10   | 3     | ✅ 100% |
| `nestjs-notification`         | 511    | █████████░ 86%     | 10/10   | 3     | ✅ 100% |
| `nestjs-observability`        | 463    | █████████░ 87%     | 10/10   | 3     | ✅ 100% |
| `nestjs-performance`          | 974    | ███████░░░ 73%     | 10/10   | 3     | ✅ 100% |
| `nestjs-real-time`            | 905    | ████████░░ 75%     | 10/10   | 3     | ✅ 100% |
| `nestjs-scheduling`           | 577    | ████████░░ 84%     | 10/10   | 3     | ✅ 100% |
| `nestjs-search`               | 533    | █████████░ 85%     | 10/10   | 3     | ✅ 100% |
| `nestjs-security`             | 759    | ████████░░ 79%     | 10/10   | 3     | ✅ 100% |
| `nestjs-security-isolation`   | 536    | █████████░ 85%     | 10/10   | 3     | ✅ 100% |
| `nestjs-testing`              | 556    | █████████░ 85%     | 10/10   | 3     | ✅ 100% |
| `nestjs-transport`            | 451    | █████████░ 88%     | 10/10   | 3     | ✅ 100% |
| `nestjs-controllers-services` | 747    | ████████░░ 80%     | 8/10    | 3     | ✅ 100% |

</details>

<details>
<summary><h3>📦 nextjs (18 skills | avg 642 tokens | quality 9.6/10 | eval alignment 85%)</h3></summary>

| Skill                      | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| -------------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `nextjs-app-router`        | 987    | ███████░░░ 73%     | 10/10   | 6     | ✅ 100% |
| `nextjs-architecture`      | 1065   | ███████░░░ 71%     | 10/10   | 6     | ✅ 93%  |
| `nextjs-authentication`    | 492    | █████████░ 87%     | 10/10   | 6     | ⚠️ 61%  |
| `nextjs-caching`           | 813    | ████████░░ 78%     | 10/10   | 6     | ✅ 100% |
| `nextjs-data-access-layer` | 523    | █████████░ 86%     | 10/10   | 6     | ✅ 88%  |
| `nextjs-data-fetching`     | 467    | █████████░ 87%     | 10/10   | 6     | ✅ 100% |
| `nextjs-optimization`      | 512    | █████████░ 86%     | 10/10   | 6     | ✅ 87%  |
| `nextjs-rendering`         | 736    | ████████░░ 80%     | 10/10   | 6     | ✅ 100% |
| `nextjs-server-actions`    | 735    | ████████░░ 80%     | 10/10   | 6     | ✅ 85%  |
| `nextjs-server-components` | 635    | ████████░░ 83%     | 10/10   | 6     | ✅ 78%  |
| `nextjs-upgrade`           | 559    | █████████░ 85%     | 10/10   | 6     | ⚠️ 64%  |
| `nextjs-i18n`              | 594    | ████████░░ 84%     | 9/10    | 6     | ✅ 100% |
| `nextjs-pages-router`      | 654    | ████████░░ 82%     | 9/10    | 6     | ✅ 100% |
| `nextjs-security`          | 678    | ████████░░ 81%     | 9/10    | 6     | ✅ 100% |
| `nextjs-state-management`  | 442    | █████████░ 88%     | 9/10    | 6     | ⚠️ 0%   |
| `nextjs-styling`           | 654    | ████████░░ 82%     | 9/10    | 6     | ✅ 100% |
| `nextjs-testing`           | 622    | ████████░░ 83%     | 9/10    | 6     | ✅ 100% |
| `nextjs-tooling`           | 392    | █████████░ 89%     | 9/10    | 6     | ✅ 73%  |

</details>

<details>
<summary><h3>📦 php (7 skills | avg 513 tokens | quality 9.6/10 | eval alignment 83%)</h3></summary>

| Skill                | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| -------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `php-best-practices` | 522    | █████████░ 86%     | 10/10   | 6     | ✅ 92%  |
| `php-security`       | 538    | █████████░ 85%     | 10/10   | 6     | ✅ 100% |
| `php-testing`        | 533    | █████████░ 85%     | 10/10   | 6     | ⚠️ 13%  |
| `php-tooling`        | 546    | █████████░ 85%     | 10/10   | 6     | ✅ 86%  |
| `php-concurrency`    | 525    | █████████░ 86%     | 9/10    | 6     | ✅ 100% |
| `php-error-handling` | 472    | █████████░ 87%     | 9/10    | 6     | ✅ 100% |
| `php-language`       | 456    | █████████░ 88%     | 9/10    | 6     | ✅ 91%  |

</details>

<details>
<summary><h3>📦 quality-engineering (5 skills | avg 777 tokens | quality 10.0/10 | eval alignment 82%)</h3></summary>

| Skill                                          | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ---------------------------------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `quality-engineering-business-analysis`        | 1042   | ███████░░░ 71%     | 10/10   | 6     | ✅ 95%  |
| `quality-engineering-jira-integration`         | 562    | █████████░ 85%     | 10/10   | 3     | ✅ 100% |
| `quality-engineering-quality-assurance`        | 467    | █████████░ 87%     | 10/10   | 3     | ✅ 100% |
| `quality-engineering-zephyr-coverage-analysis` | 694    | ████████░░ 81%     | 10/10   | 3     | ✅ 83%  |
| `quality-engineering-zephyr-test-generation`   | 1121   | ███████░░░ 69%     | 10/10   | 3     | ⚠️ 33%  |

</details>

<details>
<summary><h3>📦 react (8 skills | avg 535 tokens | quality 10.0/10 | eval alignment 95%)</h3></summary>

| Skill                      | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| -------------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `react-component-patterns` | 475    | █████████░ 87%     | 10/10   | 3     | ✅ 100% |
| `react-hooks`              | 634    | ████████░░ 83%     | 10/10   | 3     | ✅ 100% |
| `react-performance`        | 734    | ████████░░ 80%     | 10/10   | 3     | ✅ 100% |
| `react-security`           | 508    | █████████░ 86%     | 10/10   | 3     | ✅ 100% |
| `react-state-management`   | 532    | █████████░ 85%     | 10/10   | 3     | ✅ 100% |
| `react-testing`            | 530    | █████████░ 86%     | 10/10   | 3     | ✅ 100% |
| `react-tooling`            | 419    | █████████░ 89%     | 10/10   | 3     | ⚠️ 57%  |
| `react-typescript`         | 447    | █████████░ 88%     | 10/10   | 3     | ✅ 100% |

</details>

<details>
<summary><h3>📦 react-native (13 skills | avg 431 tokens | quality 10.0/10 | eval alignment 97%)</h3></summary>

| Skill                            | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| -------------------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `react-native-architecture`      | 553    | █████████░ 85%     | 10/10   | 3     | ✅ 83%  |
| `react-native-components`        | 377    | █████████░ 90%     | 10/10   | 3     | ✅ 100% |
| `react-native-deployment`        | 526    | █████████░ 86%     | 10/10   | 3     | ✅ 100% |
| `react-native-dls`               | 257    | █████████░ 93%     | 10/10   | 3     | ✅ 100% |
| `react-native-navigation`        | 339    | █████████░ 91%     | 10/10   | 3     | ✅ 100% |
| `react-native-navigation-v6`     | 499    | █████████░ 86%     | 10/10   | 3     | ✅ 86%  |
| `react-native-notifications`     | 357    | █████████░ 90%     | 10/10   | 3     | ✅ 100% |
| `react-native-performance`       | 566    | █████████░ 85%     | 10/10   | 3     | ✅ 89%  |
| `react-native-platform-specific` | 397    | █████████░ 89%     | 10/10   | 3     | ✅ 100% |
| `react-native-security`          | 566    | █████████░ 85%     | 10/10   | 3     | ✅ 100% |
| `react-native-state-management`  | 425    | █████████░ 88%     | 10/10   | 3     | ✅ 100% |
| `react-native-styling`           | 317    | █████████░ 91%     | 10/10   | 3     | ✅ 100% |
| `react-native-testing`           | 427    | █████████░ 88%     | 10/10   | 3     | ✅ 100% |

</details>

<details>
<summary><h3>📦 spring-boot (10 skills | avg 466 tokens | quality 10.0/10 | eval alignment 95%)</h3></summary>

| Skill                        | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ---------------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `spring-boot-api-design`     | 320    | █████████░ 91%     | 10/10   | 3     | ✅ 100% |
| `spring-boot-architecture`   | 621    | ████████░░ 83%     | 10/10   | 3     | ✅ 100% |
| `spring-boot-best-practices` | 564    | █████████░ 85%     | 10/10   | 3     | ✅ 100% |
| `spring-boot-data-access`    | 514    | █████████░ 86%     | 10/10   | 3     | ✅ 100% |
| `spring-boot-deployment`     | 498    | █████████░ 86%     | 10/10   | 3     | ✅ 100% |
| `spring-boot-microservices`  | 484    | █████████░ 87%     | 10/10   | 3     | ⚠️ 67%  |
| `spring-boot-observability`  | 483    | █████████░ 87%     | 10/10   | 3     | ✅ 100% |
| `spring-boot-scheduling`     | 343    | █████████░ 91%     | 10/10   | 3     | ✅ 100% |
| `spring-boot-security`       | 513    | █████████░ 86%     | 10/10   | 3     | ✅ 100% |
| `spring-boot-testing`        | 319    | █████████░ 91%     | 10/10   | 3     | ✅ 83%  |

</details>

<details>
<summary><h3>📦 swift (8 skills | avg 479 tokens | quality 10.0/10 | eval alignment 92%)</h3></summary>

| Skill                     | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| ------------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `swift-best-practices`    | 659    | ████████░░ 82%     | 10/10   | 4     | ✅ 92%  |
| `swift-concurrency`       | 521    | █████████░ 86%     | 10/10   | 5     | ✅ 93%  |
| `swift-error-handling`    | 513    | █████████░ 86%     | 10/10   | 4     | ⚠️ 67%  |
| `swift-language`          | 465    | █████████░ 87%     | 10/10   | 5     | ✅ 94%  |
| `swift-memory-management` | 381    | █████████░ 90%     | 10/10   | 4     | ✅ 89%  |
| `swift-swiftui`           | 427    | █████████░ 88%     | 10/10   | 4     | ✅ 100% |
| `swift-testing`           | 451    | █████████░ 88%     | 10/10   | 4     | ✅ 100% |
| `swift-tooling`           | 414    | █████████░ 89%     | 10/10   | 4     | ✅ 100% |

</details>

<details>
<summary><h3>📦 typescript (4 skills | avg 663 tokens | quality 9.5/10 | eval alignment 98%)</h3></summary>

| Skill                       | Tokens | Savings (vs Heavy) | Quality | Evals | Aligned |
| --------------------------- | ------ | ------------------ | ------- | ----- | ------- |
| `typescript-best-practices` | 594    | ████████░░ 84%     | 10/10   | 3     | ✅ 90%  |
| `typescript-language`       | 650    | ████████░░ 82%     | 10/10   | 4     | ✅ 100% |
| `typescript-security`       | 608    | ████████░░ 83%     | 10/10   | 3     | ✅ 100% |
| `typescript-tooling`        | 799    | ████████░░ 78%     | 8/10    | 3     | ✅ 100% |

</details>

## ⚠️ Low Eval Alignment — Skills to Review

> These skills have evals, but their SKILL.md content covers less than 70% of what the evals test. Such a skill may not actually improve agent behavior in its target scenarios. Rows are sorted by ascending alignment.

| Skill                                        | Category            | Alignment | Evals | Action                                             |
| -------------------------------------------- | ------------------- | --------- | ----- | -------------------------------------------------- |
| `common-tdd`                                 | common              | ⚠️ 0%     | 3     | Add missing terms from eval assertions to SKILL.md |
| `nextjs-state-management`                    | nextjs              | ⚠️ 0%     | 6     | Add missing terms from eval assertions to SKILL.md |
| `php-testing`                                | php                 | ⚠️ 13%    | 6     | Add missing terms from eval assertions to SKILL.md |
| `laravel-architecture`                       | laravel             | ⚠️ 17%    | 6     | Add missing terms from eval assertions to SKILL.md |
| `flutter-feature-based-clean-architecture`   | flutter             | ⚠️ 30%    | 3     | Add missing terms from eval assertions to SKILL.md |
| `quality-engineering-zephyr-test-generation` | quality-engineering | ⚠️ 33%    | 3     | Add missing terms from eval assertions to SKILL.md |
| `angular-routing`                            | angular             | ⚠️ 43%    | 6     | Add missing terms from eval assertions to SKILL.md |
| `common-store-changelog`                     | common              | ⚠️ 43%    | 4     | Add missing terms from eval assertions to SKILL.md |
| `flutter-riverpod-state-management`          | flutter             | ⚠️ 50%    | 3     | Add missing terms from eval assertions to SKILL.md |
| `ios-networking`                             | ios                 | ⚠️ 56%    | 3     | Add missing terms from eval assertions to SKILL.md |
| `ios-state-management`                       | ios                 | ⚠️ 56%    | 3     | Add missing terms from eval assertions to SKILL.md |
| `react-tooling`                              | react               | ⚠️ 57%    | 3     | Add missing terms from eval assertions to SKILL.md |
| `angular-forms`                              | angular             | ⚠️ 58%    | 6     | Add missing terms from eval assertions to SKILL.md |
| `nextjs-authentication`                      | nextjs              | ⚠️ 61%    | 6     | Add missing terms from eval assertions to SKILL.md |
| `nextjs-upgrade`                             | nextjs              | ⚠️ 64%    | 6     | Add missing terms from eval assertions to SKILL.md |

## 🏆 Quality Leaders

| Rank | Skill                       | Category | Quality | Tokens | Evals | Aligned |
| ---- | --------------------------- | -------- | ------- | ------ | ----- | ------- |
| 1    | `android-architecture`      | android  | 10/10   | 505    | 3     | ✅ 88%  |
| 2    | `android-background-work`   | android  | 10/10   | 305    | 3     | ✅ 100% |
| 3    | `android-compose`           | android  | 10/10   | 450    | 3     | ✅ 100% |
| 4    | `android-concurrency`       | android  | 10/10   | 315    | 3     | ✅ 89%  |
| 5    | `android-deployment`        | android  | 10/10   | 328    | 3     | ✅ 100% |
| 6    | `android-design-system`     | android  | 10/10   | 300    | 3     | ✅ 100% |
| 7    | `android-di`                | android  | 10/10   | 311    | 3     | ✅ 88%  |
| 8    | `android-legacy-navigation` | android  | 10/10   | 311    | 3     | ✅ 86%  |
| 9    | `android-legacy-security`   | android  | 10/10   | 446    | 3     | ✅ 100% |
| 10   | `android-legacy-state`      | android  | 10/10   | 258    | 3     | ✅ 100% |

## 📐 Methodology & Baseline Justification

### Why These Baselines?

The baselines are derived from **real, token-counted example prompts** that represent what a developer actually writes when there is no structured skill available.

NestJS serves as the **Reference Unit**. Because the benchmark measures the volume of instructions a skill replaces, comparing every skill against the same high-density reference prompts keeps the savings figures consistent across all tech stacks.

#### 🟡 Reference Technical Prompt — Light — 1449 tokens

> **Reference Technical Prompt — Light (e.g., NestJS)**
> A compact inline system prompt used as a reference for token count calibration. Representative of focused developer instructions without a structured skill.

#### 🔴 Reference Technical Prompt — Heavy — 3656 tokens

> **Reference Technical Prompt — Heavy (e.g., NestJS Architecture)**
> A comprehensive architect-level inline prompt used as a reference for complex tasks. Includes deep patterns and rules sent by developers when no skill is present.
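
The savings percentages in the per-skill tables are consistent with a simple ratio against these baselines. The sketch below is illustrative only: the function names are hypothetical, whole-percent rounding is an assumption inferred from the published figures, and `estimateTokens` mirrors the report's stated `ceil(characters / 4)` heuristic rather than the actual benchmark code.

```typescript
// Baseline sizes are the two reference prompts quoted in this report.
const LIGHT_BASELINE_TOKENS = 1449;
const HEAVY_BASELINE_TOKENS = 3656;

// The report's stated approximation: ceil(characters / 4).
// Assumption: "characters" is taken as String.length (UTF-16 code units).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Share of baseline tokens avoided, as a whole percentage (rounding assumed).
function savingsPercent(skillTokens: number, baselineTokens: number): number {
  return Math.round((1 - skillTokens / baselineTokens) * 100);
}

console.log(estimateTokens("Prefer constructor injection.")); // 8 (29 characters)
console.log(savingsPercent(320, HEAVY_BASELINE_TOKENS)); // 91, matches spring-boot-api-design
console.log(savingsPercent(799, HEAVY_BASELINE_TOKENS)); // 78, matches typescript-tooling
console.log(savingsPercent(527, LIGHT_BASELINE_TOKENS)); // 64, matches the Executive Summary
```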

### 🏆 Detailed Quality Rubric (0–10)

To ensure skills are not just "short" but actually **high quality**, every skill is scored against this structural rubric:

| Score  | Criterion                 | Requirement                                                                       |
| ------ | ------------------------- | --------------------------------------------------------------------------------- |
| **+2** | **Structured Guidelines** | At least 3 specific instructions/bullet points.                                   |
| **+2** | **Anti-Patterns**         | `## Anti-Patterns` section or `**No X**` inline lines.                            |
| **+2** | **Reference Examples**    | Presence of a verified `references/` folder with code.                            |
| **+2** | **Token Optimality**      | Entire `SKILL.md` is ≤100 lines (forces brevity).                                 |
| **+2** | **Eval Coverage**         | ≥3 evals with `should_not_trigger`, ≥2 assertions each (+1 for partial coverage). |
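
One way to read the rubric is as five independent checks worth two points each. The following is a minimal sketch under stated assumptions, not the benchmark's actual scoring code: the `SkillFacts` shape and all field names are hypothetical, and the rule for partial eval credit (any evals at all earn +1) is a guess, since the report does not define it.

```typescript
// Hypothetical summary of one skill; field names are illustrative only.
interface SkillFacts {
  guidelineCount: number; // specific instructions / bullet points in SKILL.md
  hasAntiPatterns: boolean; // "## Anti-Patterns" section or "**No X**" lines
  hasReferences: boolean; // references/ folder with code
  lineCount: number; // total lines in SKILL.md
  evalCount: number; // number of evals defined for the skill
  minAssertionsPerEval: number; // smallest assertion count across evals
  hasShouldNotTrigger: boolean; // evals include a should_not_trigger case
}

function qualityScore(s: SkillFacts): number {
  let score = 0;
  if (s.guidelineCount >= 3) score += 2; // Structured Guidelines
  if (s.hasAntiPatterns) score += 2; // Anti-Patterns
  if (s.hasReferences) score += 2; // Reference Examples
  if (s.lineCount <= 100) score += 2; // Token Optimality
  const fullCoverage =
    s.evalCount >= 3 && s.hasShouldNotTrigger && s.minAssertionsPerEval >= 2;
  if (fullCoverage) {
    score += 2; // Eval Coverage, full credit
  } else if (s.evalCount > 0) {
    score += 1; // assumption: any evals at all earn partial credit
  }
  return score;
}

// Hypothetical skill that meets every criterion except the 100-line limit.
const example: SkillFacts = {
  guidelineCount: 5,
  hasAntiPatterns: true,
  hasReferences: true,
  lineCount: 140,
  evalCount: 3,
  minAssertionsPerEval: 2,
  hasShouldNotTrigger: true,
};
console.log(qualityScore(example)); // 8
```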

> **Eval Alignment** (reported separately, not scored): % of eval `contains` assertion values that appear in SKILL.md content. Measures whether the skill actually teaches what its evals test — the closest static proxy for **with-skill vs without-skill** behavioral improvement.
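
The alignment metric described above can be sketched as a substring check over the skill text. This is an illustration rather than the benchmark's implementation: the function names are hypothetical, case-insensitive matching and whole-percent rounding are assumptions, and the sample skill text and assertion values are invented for the example. The 70% cutoff is the one used by the Low Eval Alignment table.

```typescript
// Percentage of `contains` assertion values found in the SKILL.md text.
// Assumption: matching is a case-insensitive substring search.
function evalAlignmentPercent(skillContent: string, containsValues: string[]): number {
  if (containsValues.length === 0) return 0; // assumption: nothing to cover means 0%
  const haystack = skillContent.toLowerCase();
  const covered = containsValues.filter(
    (value) => haystack.indexOf(value.toLowerCase()) !== -1
  ).length;
  return Math.round((covered / containsValues.length) * 100);
}

// Skills below 70% alignment are the ones flagged for review.
function needsReview(alignmentPercent: number): boolean {
  return alignmentPercent < 70;
}

// Invented sample data: two of the three assertion values appear in the text.
const skillText = "Use constructor injection. **No field injection**.";
const assertionValues = ["constructor injection", "field injection", "@Transactional"];
const alignment = evalAlignmentPercent(skillText, assertionValues);
console.log(alignment, needsReview(alignment)); // 67 true
```

Because the check is purely textual, a high score shows that the skill mentions what the evals look for, not that an agent will apply it correctly.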

### 🛡️ How to Verify This Report

Trust but verify. You can audit the raw data and run the benchmark yourself:

1. **Clone the repo** and install dependencies (`pnpm install`).
2. **Inspect the source**: the benchmark logic lives in [cli/src/scripts/benchmark/](./cli/src/scripts/benchmark/).

### Pricing (per 1M input tokens, Feb 2026)

- **Gemini 3 Flash**: $0.50
- **GPT-5**: $1.25
- **Gemini 3.1 Pro**: $2.00
- **Claude Sonnet 4.5**: $3.00
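
To turn token savings into dollars, multiply the tokens saved per call by the per-million price. The sketch below uses the prices listed above and the average saving versus the Heavy baseline from the Executive Summary (3129 tokens/call). The type and function names are hypothetical, and it covers input tokens only, ignoring caching discounts and output costs.

```typescript
interface ModelPrice {
  model: string;
  usdPerMillionInputTokens: number;
}

// Input prices copied from the list above.
const PRICES: ModelPrice[] = [
  { model: "Gemini 3 Flash", usdPerMillionInputTokens: 0.5 },
  { model: "GPT-5", usdPerMillionInputTokens: 1.25 },
  { model: "Gemini 3.1 Pro", usdPerMillionInputTokens: 2.0 },
  { model: "Claude Sonnet 4.5", usdPerMillionInputTokens: 3.0 },
];

// USD saved on one call by sending `tokensSaved` fewer input tokens.
function dollarsSavedPerCall(tokensSaved: number, usdPerMillion: number): number {
  return (tokensSaved * usdPerMillion) / 1e6;
}

const AVG_TOKENS_SAVED_VS_HEAVY = 3129; // from the Executive Summary

for (const p of PRICES) {
  const per1kCalls =
    dollarsSavedPerCall(AVG_TOKENS_SAVED_VS_HEAVY, p.usdPerMillionInputTokens) * 1000;
  console.log(`${p.model}: $${per1kCalls.toFixed(2)} saved per 1,000 calls`);
}
```

At these prices the average saving ranges from about $1.56 per 1,000 calls on Gemini 3 Flash to about $9.39 per 1,000 calls on Claude Sonnet 4.5.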