# 📊 Agent Skill Benchmark Report

> Generated: 2026-03-16T10:19:05.797Z
> Token counting: `ceil(characters / 4)` (cl100k_base approximation).
> Baselines: derived from **real, measured example prompts** (see Methodology).
> Quality: structural rubric (0–10), no live LLM calls required.

## ❓ How to Read This Report

This benchmark answers: **"How many tokens and dollars does an agent skill save compared to a developer writing the same guidance inline?"**

- **WITHOUT a skill**: A developer writes domain knowledge directly into the prompt every time (Baseline).
- **WITH a skill**: The agent loads the SKILL.md file (478 tokens on average), which is structured, reusable, and cached.

## 🔢 Executive Summary

| Metric | Value |
| --------------------------------- | --------------------------------- |
| Total Skills Benchmarked | **229** |
| Avg. Tokens WITH Skill (SKILL.md) | **478 tokens** |
| Baseline: Light prompt (no skill) | **1449 tokens** ↓ see Methodology |
| Baseline: Heavy prompt (no skill) | **3656 tokens** ↓ see Methodology |
| Avg. Token Savings vs Light | **67%** (971 tokens/call) |
| Avg. Token Savings vs Heavy | **87%** (3178 tokens/call) |
| Avg. Quality Score | **9.4/10** |

## 📜 History

| Version | Date | Skills | Avg Tokens | Savings vs Heavy (%) | Quality | Report |
| ------- | ---------- | ------ | ---------- | -------------------- | ------- | -------------------------------------------- |
| v1.10.0 | 2026-03-16 | 229 | 478 | 87% | 9.4/10 | [Full Report](benchmarks/archive/v1.10.0.md) |
| v1.9.3 | 2026-03-15 | 229 | 460 | 87% | 8.9/10 | [Full Report](benchmarks/archive/v1.9.3.md) |
| v1.9.2 | 2026-03-07 | 228 | 458 | 87% | 8.9/10 | [Full Report](benchmarks/archive/v1.9.2.md) |
| v1.9.1 | 2026-03-07 | 228 | 458 | 87% | 8.9/10 | [Full Report](benchmarks/archive/v1.9.1.md) |
| v1.9.0 | 2026-03-05 | 228 | 457 | 88% | 8.9/10 | [Full Report](benchmarks/archive/v1.9.0.md) |
| v1.8.0 | 2026-03-02 | 228 | 443 | 88% | 8.9/10 | [Full Report](benchmarks/archive/v1.8.0.md) |
| v1.7.3 | 2026-02-25 | 222 | 418 | 89% | 8.9/10 | [Full Report](benchmarks/archive/v1.7.3.md) |
| v1.7.2 | 2026-02-25 | 220 | 413 | 89% | 8.9/10 | [Full Report](benchmarks/archive/v1.7.2.md) |

### 💰 Cost Comparison: Per Single Call (Average Skill)

> Comparison based on the **Heavy Baseline** (3656 tokens, "Original Cost") vs. the average skill (478 tokens, "Skill Cost") for modern and speculative models.

| Model | Original Cost | Skill Cost | Net Savings | % Saved |
| ----------------- | ------------- | ---------- | -------------- | ------- |
| Gemini 3 Flash | $0.0018280 | $0.0002390 | **$0.0015890** | 87% |
| GPT-5 | $0.0045700 | $0.0005975 | **$0.0039725** | 87% |
| Gemini 3.1 Pro | $0.0073120 | $0.0009560 | **$0.0063560** | 87% |
| Claude Sonnet 4.5 | $0.0109680 | $0.0014340 | **$0.0095340** | 87% |

### 📈 Monthly Savings at Scale (Avg Skill vs Heavy Prompt)

> The figures correspond to 30 days per month; the 50-skill column is the 1-skill savings multiplied by 50.

| Daily Calls | Original Cost/mo | Monthly Savings (1 skill) | Monthly Savings (50 skills) | Model |
| ----------- | ---------------- | ------------------------- | --------------------------- | ----------------- |
| 1,000 | $137.1000 /mo | $119.1750 /mo | $5958.7500 /mo | GPT-5 |
| 1,000 | $329.0400 /mo | $286.0200 /mo | $14301.0000 /mo | Claude Sonnet 4.5 |
| 1,000 | $219.3600 /mo | $190.6800 /mo | $9534.0000 /mo | Gemini 3.1 Pro |

## 📦 Per-Category Summary

📦 android (22 skills | avg 319 tokens | quality 10.0/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| ------------------------------ | ------ | ------------------ | ------- |
| `android-architecture` | 451 | █████████░ 88% | 10/10 |
| `android-background-work` | 260 | █████████░ 93% | 10/10 |
| `android-compose` | 354 | █████████░ 90% | 10/10 |
| `android-concurrency` | 282 | █████████░ 92% | 10/10 |
| `android-deployment` | 318 | █████████░ 91% | 10/10 |
| `android-design-system` | 464 | █████████░ 87% | 10/10 |
| `android-di` | 308 | █████████░ 92% | 10/10 |
| `android-legacy-navigation` | 271 | █████████░ 93% | 10/10 |
| `android-legacy-security` | 321 | █████████░ 91% | 10/10 |
| `android-legacy-state` | 277 | █████████░ 92% | 10/10 |
| `android-navigation` | 330 | █████████░ 91% | 10/10 |
| `android-navigation-type-safe` | 283 | █████████░ 92% | 10/10 |
| `android-networking` | 334 | █████████░ 91% | 10/10 |
| `android-notifications` | 347 | █████████░ 91% | 10/10 |
| `android-performance` | 320 | █████████░ 91% | 10/10 |
| `android-persistence` | 286 | █████████░ 92% | 10/10 |
| `android-resources` | 266 | █████████░ 93% | 10/10 |
| `android-security` | 371 | █████████░ 90% | 10/10 |
| `android-state` | 316 | █████████░ 91% | 10/10 |
| `android-testing` | 277 | █████████░ 92% | 10/10 |
| `android-tooling` | 268 | █████████░ 93% | 10/10 |
| `android-xml-views` | 320 | █████████░ 91% | 10/10 |

📦 angular (15 skills | avg 331 tokens | quality 8.9/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| ------------------------------ | ------ | ------------------ | ------- |
| `angular-component-patterns` | 343 | █████████░ 91% | 10/10 |
| `angular-components` | 379 | █████████░ 90% | 10/10 |
| `angular-dependency-injection` | 320 | █████████░ 91% | 10/10 |
| `angular-performance` | 329 | █████████░ 91% | 10/10 |
| `angular-routing` | 311 | █████████░ 91% | 10/10 |
| `angular-state-management` | 321 | █████████░ 91% | 10/10 |
| `angular-style-guide` | 367 | █████████░ 90% | 10/10 |
| `angular-architecture` | 517 | █████████░ 86% | 8/10 |
| `angular-directives-pipes` | 284 | █████████░ 92% | 8/10 |
| `angular-forms` | 283 | █████████░ 92% | 8/10 |
| `angular-http-client` | 303 | █████████░ 92% | 8/10 |
| `angular-rxjs-interop` | 316 | █████████░ 91% | 8/10 |
| `angular-security` | 315 | █████████░ 91% | 8/10 |
| `angular-ssr` | 295 | █████████░ 92% | 8/10 |
| `angular-testing` | 288 | █████████░ 92% | 8/10 |

📦 common (25 skills | avg 678 tokens | quality 9.0/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| --------------------------------- | ------ | ------------------ | ------- |
| `common-accessibility` | 1034 | ███████░░░ 72% | 10/10 |
| `common-api-design` | 889 | ████████░░ 76% | 10/10 |
| `common-architecture-diagramming` | 434 | █████████░ 88% | 10/10 |
| `common-code-review` | 404 | █████████░ 89% | 10/10 |
| `common-error-handling` | 1033 | ███████░░░ 72% | 10/10 |
| `common-feedback-reporter` | 479 | █████████░ 87% | 10/10 |
| `common-mobile-animation` | 419 | █████████░ 89% | 10/10 |
| `common-mobile-ux-core` | 396 | █████████░ 89% | 10/10 |
| `common-observability` | 961 | ███████░░░ 74% | 10/10 |
| `common-product-requirements` | 457 | █████████░ 88% | 10/10 |
| `common-protocol-enforcement` | 440 | █████████░ 88% | 10/10 |
| `common-session-retrospective` | 689 | ████████░░ 81% | 10/10 |
| `common-workflow-writing` | 596 | ████████░░ 84% | 10/10 |
| `common-architecture-audit` | 747 | ████████░░ 80% | 8/10 |
| `common-best-practices` | 1026 | ███████░░░ 72% | 8/10 |
| `common-context-optimization` | 504 | █████████░ 86% | 8/10 |
| `common-debugging` | 456 | █████████░ 88% | 8/10 |
| `common-documentation` | 585 | ████████░░ 84% | 8/10 |
| `common-git-collaboration` | 696 | ████████░░ 81% | 8/10 |
| `common-performance-engineering` | 749 | ████████░░ 80% | 8/10 |
| `common-security-audit` | 848 | ████████░░ 77% | 8/10 |
| `common-security-standards` | 668 | ████████░░ 82% | 8/10 |
| `common-skill-creator` | 982 | ███████░░░ 73% | 8/10 |
| `common-system-design` | 553 | █████████░ 85% | 8/10 |
| `common-tdd` | 893 | ████████░░ 76% | 8/10 |

📦 dart (3 skills | avg 492 tokens | quality 8.7/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| --------------------- | ------ | ------------------ | ------- |
| `dart-language` | 626 | ████████░░ 83% | 10/10 |
| `dart-best-practices` | 406 | █████████░ 89% | 8/10 |
| `dart-tooling` | 443 | █████████░ 88% | 8/10 |

📦 database (3 skills | avg 759 tokens | quality 9.3/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| --------------------- | ------ | ------------------ | ------- |
| `database-mongodb` | 617 | ████████░░ 83% | 10/10 |
| `database-redis` | 637 | ████████░░ 83% | 10/10 |
| `database-postgresql` | 1022 | ███████░░░ 72% | 8/10 |

📦 flutter (21 skills | avg 493 tokens | quality 9.0/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| ------------------------------------------ | ------ | ------------------ | ------- |
| `flutter-bloc-state-management` | 459 | █████████░ 87% | 10/10 |
| `flutter-design-system` | 595 | ████████░░ 84% | 10/10 |
| `flutter-getx-navigation` | 422 | █████████░ 88% | 10/10 |
| `flutter-getx-state-management` | 544 | █████████░ 85% | 10/10 |
| `flutter-layer-based-clean-architecture` | 555 | █████████░ 85% | 10/10 |
| `flutter-localization` | 533 | █████████░ 85% | 10/10 |
| `flutter-navigation` | 358 | █████████░ 90% | 10/10 |
| `flutter-notifications` | 382 | █████████░ 90% | 10/10 |
| `flutter-retrofit-networking` | 549 | █████████░ 85% | 10/10 |
| `flutter-riverpod-state-management` | 487 | █████████░ 87% | 10/10 |
| `flutter-testing` | 927 | ████████░░ 75% | 10/10 |
| `flutter-auto-route-navigation` | 461 | █████████░ 87% | 8/10 |
| `flutter-cicd` | 530 | █████████░ 86% | 8/10 |
| `flutter-dependency-injection` | 442 | █████████░ 88% | 8/10 |
| `flutter-error-handling` | 465 | █████████░ 87% | 8/10 |
| `flutter-feature-based-clean-architecture` | 456 | █████████░ 88% | 8/10 |
| `flutter-go-router-navigation` | 476 | █████████░ 87% | 8/10 |
| `flutter-idiomatic-flutter` | 401 | █████████░ 89% | 8/10 |
| `flutter-performance` | 483 | █████████░ 87% | 8/10 |
| `flutter-security` | 392 | █████████░ 89% | 8/10 |
| `flutter-widgets` | 431 | █████████░ 88% | 8/10 |

📦 golang (10 skills | avg 417 tokens | quality 9.2/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `golang-api-server` | 444 | █████████░ 88% | 10/10 |
| `golang-database` | 389 | █████████░ 89% | 10/10 |
| `golang-error-handling` | 343 | █████████░ 91% | 10/10 |
| `golang-language` | 347 | █████████░ 91% | 10/10 |
| `golang-security` | 526 | █████████░ 86% | 10/10 |
| `golang-testing` | 468 | █████████░ 87% | 10/10 |
| `golang-architecture` | 558 | █████████░ 85% | 8/10 |
| `golang-concurrency` | 428 | █████████░ 88% | 8/10 |
| `golang-configuration` | 331 | █████████░ 91% | 8/10 |
| `golang-logging` | 336 | █████████░ 91% | 8/10 |

📦 ios (15 skills | avg 437 tokens | quality 10.0/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| -------------------------- | ------ | ------------------ | ------- |
| `ios-app-lifecycle` | 424 | █████████░ 88% | 10/10 |
| `ios-architecture` | 607 | ████████░░ 83% | 10/10 |
| `ios-dependency-injection` | 420 | █████████░ 89% | 10/10 |
| `ios-deployment` | 393 | █████████░ 89% | 10/10 |
| `ios-design-system` | 426 | █████████░ 88% | 10/10 |
| `ios-localization` | 508 | █████████░ 86% | 10/10 |
| `ios-navigation` | 326 | █████████░ 91% | 10/10 |
| `ios-networking` | 483 | █████████░ 87% | 10/10 |
| `ios-notifications` | 342 | █████████░ 91% | 10/10 |
| `ios-performance` | 434 | █████████░ 88% | 10/10 |
| `ios-persistence` | 493 | █████████░ 87% | 10/10 |
| `ios-security` | 441 | █████████░ 88% | 10/10 |
| `ios-state-management` | 456 | █████████░ 88% | 10/10 |
| `ios-swiftui` | 333 | █████████░ 91% | 10/10 |
| `ios-ui-navigation` | 471 | █████████░ 87% | 10/10 |

📦 java (5 skills | avg 548 tokens | quality 10.0/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| --------------------- | ------ | ------------------ | ------- |
| `java-best-practices` | 580 | ████████░░ 84% | 10/10 |
| `java-concurrency` | 498 | █████████░ 86% | 10/10 |
| `java-language` | 670 | ████████░░ 82% | 10/10 |
| `java-testing` | 549 | █████████░ 85% | 10/10 |
| `java-tooling` | 444 | █████████░ 88% | 10/10 |

📦 javascript (3 skills | avg 416 tokens | quality 10.0/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| --------------------------- | ------ | ------------------ | ------- |
| `javascript-best-practices` | 429 | █████████░ 88% | 10/10 |
| `javascript-language` | 445 | █████████░ 88% | 10/10 |
| `javascript-tooling` | 373 | █████████░ 90% | 10/10 |

📦 kotlin (4 skills | avg 478 tokens | quality 10.0/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `kotlin-best-practices` | 556 | █████████░ 85% | 10/10 |
| `kotlin-coroutines` | 353 | █████████░ 90% | 10/10 |
| `kotlin-language` | 579 | ████████░░ 84% | 10/10 |
| `kotlin-tooling` | 425 | █████████░ 88% | 10/10 |

📦 laravel (10 skills | avg 391 tokens | quality 10.0/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| ------------------------------- | ------ | ------------------ | ------- |
| `laravel-api` | 369 | █████████░ 90% | 10/10 |
| `laravel-architecture` | 409 | █████████░ 89% | 10/10 |
| `laravel-background-processing` | 421 | █████████░ 88% | 10/10 |
| `laravel-clean-architecture` | 460 | █████████░ 87% | 10/10 |
| `laravel-database-expert` | 433 | █████████░ 88% | 10/10 |
| `laravel-eloquent` | 346 | █████████░ 91% | 10/10 |
| `laravel-security` | 363 | █████████░ 90% | 10/10 |
| `laravel-sessions-middleware` | 415 | █████████░ 89% | 10/10 |
| `laravel-testing` | 340 | █████████░ 91% | 10/10 |
| `laravel-tooling` | 356 | █████████░ 90% | 10/10 |

📦 nestjs (21 skills | avg 610 tokens | quality 8.6/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------------- | ------ | ------------------ | ------- |
| `nestjs-architecture` | 434 | █████████░ 88% | 10/10 |
| `nestjs-bullmq` | 941 | ███████░░░ 74% | 10/10 |
| `nestjs-notification` | 472 | █████████░ 87% | 10/10 |
| `nestjs-security` | 566 | █████████░ 85% | 10/10 |
| `nestjs-security-isolation` | 576 | ████████░░ 84% | 10/10 |
| `nestjs-testing` | 585 | ████████░░ 84% | 10/10 |
| `nestjs-api-standards` | 545 | █████████░ 85% | 8/10 |
| `nestjs-caching` | 694 | ████████░░ 81% | 8/10 |
| `nestjs-configuration` | 587 | ████████░░ 84% | 8/10 |
| `nestjs-controllers-services` | 726 | ████████░░ 80% | 8/10 |
| `nestjs-database` | 662 | ████████░░ 82% | 8/10 |
| `nestjs-deployment` | 617 | ████████░░ 83% | 8/10 |
| `nestjs-documentation` | 902 | ████████░░ 75% | 8/10 |
| `nestjs-error-handling` | 557 | █████████░ 85% | 8/10 |
| `nestjs-file-uploads` | 390 | █████████░ 89% | 8/10 |
| `nestjs-observability` | 633 | ████████░░ 83% | 8/10 |
| `nestjs-performance` | 855 | ████████░░ 77% | 8/10 |
| `nestjs-real-time` | 681 | ████████░░ 81% | 8/10 |
| `nestjs-scheduling` | 393 | █████████░ 89% | 8/10 |
| `nestjs-search` | 494 | █████████░ 86% | 8/10 |
| `nestjs-transport` | 506 | █████████░ 86% | 8/10 |

📦 nextjs (18 skills | avg 532 tokens | quality 9.0/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| -------------------------- | ------ | ------------------ | ------- |
| `nextjs-authentication` | 326 | █████████░ 91% | 10/10 |
| `nextjs-data-fetching` | 442 | █████████░ 88% | 10/10 |
| `nextjs-i18n` | 580 | ████████░░ 84% | 10/10 |
| `nextjs-pages-router` | 964 | ███████░░░ 74% | 10/10 |
| `nextjs-rendering` | 431 | █████████░ 88% | 10/10 |
| `nextjs-security` | 362 | █████████░ 90% | 10/10 |
| `nextjs-server-components` | 521 | █████████░ 86% | 10/10 |
| `nextjs-testing` | 377 | █████████░ 90% | 10/10 |
| `nextjs-tooling` | 376 | █████████░ 90% | 10/10 |
| `nextjs-app-router` | 583 | ████████░░ 84% | 8/10 |
| `nextjs-architecture` | 874 | ████████░░ 76% | 8/10 |
| `nextjs-caching` | 458 | █████████░ 87% | 8/10 |
| `nextjs-data-access-layer` | 426 | █████████░ 88% | 8/10 |
| `nextjs-optimization` | 612 | ████████░░ 83% | 8/10 |
| `nextjs-server-actions` | 631 | ████████░░ 83% | 8/10 |
| `nextjs-state-management` | 606 | ████████░░ 83% | 8/10 |
| `nextjs-styling` | 494 | █████████░ 86% | 8/10 |
| `nextjs-upgrade` | 518 | █████████░ 86% | 8/10 |

📦 php (7 skills | avg 343 tokens | quality 10.0/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| -------------------- | ------ | ------------------ | ------- |
| `php-best-practices` | 352 | █████████░ 90% | 10/10 |
| `php-concurrency` | 310 | █████████░ 92% | 10/10 |
| `php-error-handling` | 367 | █████████░ 90% | 10/10 |
| `php-language` | 353 | █████████░ 90% | 10/10 |
| `php-security` | 356 | █████████░ 90% | 10/10 |
| `php-testing` | 326 | █████████░ 91% | 10/10 |
| `php-tooling` | 337 | █████████░ 91% | 10/10 |

📦 quality-engineering (4 skills | avg 522 tokens | quality 8.5/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| -------------------------------------------- | ------ | ------------------ | ------- |
| `quality-engineering-zephyr-test-generation` | 621 | ████████░░ 83% | 10/10 |
| `quality-engineering-business-analysis` | 467 | █████████░ 87% | 8/10 |
| `quality-engineering-jira-integration` | 553 | █████████░ 85% | 8/10 |
| `quality-engineering-quality-assurance` | 446 | █████████░ 88% | 8/10 |

📦 react (8 skills | avg 439 tokens | quality 9.5/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| -------------------------- | ------ | ------------------ | ------- |
| `react-component-patterns` | 438 | █████████░ 88% | 10/10 |
| `react-hooks` | 594 | ████████░░ 84% | 10/10 |
| `react-performance` | 675 | ████████░░ 82% | 10/10 |
| `react-security` | 367 | █████████░ 90% | 10/10 |
| `react-testing` | 352 | █████████░ 90% | 10/10 |
| `react-typescript` | 356 | █████████░ 90% | 10/10 |
| `react-state-management` | 416 | █████████░ 89% | 8/10 |
| `react-tooling` | 315 | █████████░ 91% | 8/10 |

📦 react-native (13 skills | avg 478 tokens | quality 10.0/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| -------------------------------- | ------ | ------------------ | ------- |
| `react-native-architecture` | 774 | ████████░░ 79% | 10/10 |
| `react-native-components` | 534 | █████████░ 85% | 10/10 |
| `react-native-deployment` | 484 | █████████░ 87% | 10/10 |
| `react-native-dls` | 300 | █████████░ 92% | 10/10 |
| `react-native-navigation` | 345 | █████████░ 91% | 10/10 |
| `react-native-navigation-v6` | 490 | █████████░ 87% | 10/10 |
| `react-native-notifications` | 367 | █████████░ 90% | 10/10 |
| `react-native-performance` | 533 | █████████░ 85% | 10/10 |
| `react-native-platform-specific` | 464 | █████████░ 87% | 10/10 |
| `react-native-security` | 555 | █████████░ 85% | 10/10 |
| `react-native-state-management` | 449 | █████████░ 88% | 10/10 |
| `react-native-styling` | 438 | █████████░ 88% | 10/10 |
| `react-native-testing` | 481 | █████████░ 87% | 10/10 |

📦 spring-boot (10 skills | avg 405 tokens | quality 9.8/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| ---------------------------- | ------ | ------------------ | ------- |
| `spring-boot-api-design` | 334 | █████████░ 91% | 10/10 |
| `spring-boot-best-practices` | 425 | █████████░ 88% | 10/10 |
| `spring-boot-data-access` | 349 | █████████░ 90% | 10/10 |
| `spring-boot-deployment` | 359 | █████████░ 90% | 10/10 |
| `spring-boot-microservices` | 349 | █████████░ 90% | 10/10 |
| `spring-boot-observability` | 342 | █████████░ 91% | 10/10 |
| `spring-boot-scheduling` | 292 | █████████░ 92% | 10/10 |
| `spring-boot-security` | 552 | █████████░ 85% | 10/10 |
| `spring-boot-testing` | 376 | █████████░ 90% | 10/10 |
| `spring-boot-architecture` | 669 | ████████░░ 82% | 8/10 |

📦 swift (8 skills | avg 375 tokens | quality 10.0/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| ------------------------- | ------ | ------------------ | ------- |
| `swift-best-practices` | 368 | █████████░ 90% | 10/10 |
| `swift-concurrency` | 363 | █████████░ 90% | 10/10 |
| `swift-error-handling` | 336 | █████████░ 91% | 10/10 |
| `swift-language` | 361 | █████████░ 90% | 10/10 |
| `swift-memory-management` | 358 | █████████░ 90% | 10/10 |
| `swift-swiftui` | 404 | █████████░ 89% | 10/10 |
| `swift-testing` | 414 | █████████░ 89% | 10/10 |
| `swift-tooling` | 397 | █████████░ 89% | 10/10 |

📦 typescript (4 skills | avg 548 tokens | quality 10.0/10)

| Skill | Tokens | Savings (vs Heavy) | Quality |
| --------------------------- | ------ | ------------------ | ------- |
| `typescript-best-practices` | 514 | █████████░ 86% | 10/10 |
| `typescript-language` | 591 | ████████░░ 84% | 10/10 |
| `typescript-security` | 460 | █████████░ 87% | 10/10 |
| `typescript-tooling` | 627 | ████████░░ 83% | 10/10 |

## 🏆 Quality Leaders

| Rank | Skill | Category | Quality | Tokens |
| ---- | --------------------------- | -------- | ------- | ------ |
| 1 | `android-architecture` | android | 10/10 | 451 |
| 2 | `android-background-work` | android | 10/10 | 260 |
| 3 | `android-compose` | android | 10/10 | 354 |
| 4 | `android-concurrency` | android | 10/10 | 282 |
| 5 | `android-deployment` | android | 10/10 | 318 |
| 6 | `android-design-system` | android | 10/10 | 464 |
| 7 | `android-di` | android | 10/10 | 308 |
| 8 | `android-legacy-navigation` | android | 10/10 | 271 |
| 9 | `android-legacy-security` | android | 10/10 | 321 |
| 10 | `android-legacy-state` | android | 10/10 | 277 |

## 📐 Methodology & Baseline Justification

### Why These Baselines?

The baselines are derived from **real, token-counted example prompts** that represent what a developer actually writes when no structured skill is available.

NestJS serves as the **Reference Unit**. Because the benchmark measures the volume of inline instructions that a skill replaces, a single high-density reference keeps the comparison consistent across all tech stacks.

#### 🟡 Reference Technical Prompt (Light): 1449 tokens

> **Reference Technical Prompt, Light (e.g., NestJS)**
> A compact inline system prompt used as a reference for token count calibration. Representative of focused developer instructions without a structured skill.

#### 🔴 Reference Technical Prompt (Heavy): 3656 tokens

> **Reference Technical Prompt, Heavy (e.g., NestJS Architecture)**
> A comprehensive architect-level inline prompt used as a reference for complex tasks. Includes the deep patterns and rules that developers send when no skill is present.
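The token and savings figures in this report can be reproduced from the two baselines with a few lines of arithmetic. The sketch below is illustrative only: it uses the `ceil(characters / 4)` rule stated in the report header, and the function names (`estimate_tokens`, `savings_percent`, `savings_bar`) are hypothetical, not taken from the benchmark source. The bar rule (one filled block per 10%, rounded half up) is an assumption inferred from the rows of the per-category tables.

```python
import math

# Baselines stated in this report (tokens).
LIGHT_BASELINE_TOKENS = 1449
HEAVY_BASELINE_TOKENS = 3656


def estimate_tokens(text: str) -> int:
    """Approximate token count using the report's ceil(characters / 4) rule."""
    return math.ceil(len(text) / 4)


def savings_percent(skill_tokens: int, baseline_tokens: int) -> int:
    """Share of baseline tokens avoided by the skill, rounded half up."""
    raw = (1 - skill_tokens / baseline_tokens) * 100
    return math.floor(raw + 0.5)


def savings_bar(percent: int) -> str:
    """Ten-cell bar; assumed rule: one filled cell per 10%, rounded half up."""
    filled = (percent + 5) // 10
    return "█" * filled + "░" * (10 - filled)


if __name__ == "__main__":
    avg_skill = 478  # average SKILL.md size from the Executive Summary
    pct = savings_percent(avg_skill, HEAVY_BASELINE_TOKENS)
    print(pct, savings_bar(pct))
    print(savings_percent(avg_skill, LIGHT_BASELINE_TOKENS))
```

For the average skill (478 tokens) this gives 87% against the Heavy baseline and 67% against the Light baseline, in line with the Executive Summary.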
### 🏆 Detailed Quality Rubric (0–10)

To ensure skills are not just "short" but actually **high quality**, every skill is scored against this structural rubric. Each of the five criteria is worth 2 points, for a maximum of 10.

| Score | Criteria | Rationale |
| ------ | ------------------------- | ------------------------------------------------------ |
| **+2** | **Structured Guidelines** | At least 3 specific instructions/bullet points. |
| **+2** | **Anti-Patterns** | Specifically listing what the LLM should _avoid_. |
| **+2** | **Reference Examples** | Presence of a verified `references/` folder with code. |
| **+2** | **Token Optimality** | Entire `SKILL.md` is ≤100 lines (forces brevity). |
| **+2** | **Trigger Metadata** | Proper keywords and file-match triggers defined. |

### 🛡️ How to Verify This Report

Trust but verify. You can audit the raw data and run the benchmark yourself:

1. **Clone the repo** and install dependencies (`pnpm install`).
2. **Inspect Source**: The benchmark logic is open in [cli/src/scripts/benchmark/](./cli/src/scripts/benchmark/).

### Pricing (per 1M input tokens, Feb 2026)

- **Gemini 3 Flash**: $0.50
- **GPT-5**: $1.25
- **Gemini 3.1 Pro**: $2.00
- **Claude Sonnet 4.5**: $3.00
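
As a worked example of how the prices above turn into the cost tables, here is a minimal sketch. It assumes cost is simply input tokens times the per-million price, and that a month is 30 days (an assumption that reproduces the monthly figures in this report). The helper names `cost_per_call` and `monthly_savings` are hypothetical and do not come from the benchmark source.

```python
# Prices per 1M input tokens, as listed in the Pricing section (USD).
PRICE_PER_MILLION_USD = {
    "Gemini 3 Flash": 0.50,
    "GPT-5": 1.25,
    "Gemini 3.1 Pro": 2.00,
    "Claude Sonnet 4.5": 3.00,
}

HEAVY_BASELINE_TOKENS = 3656  # Heavy prompt baseline from this report
AVG_SKILL_TOKENS = 478        # average SKILL.md size from this report
DAYS_PER_MONTH = 30           # assumption; matches the monthly table


def cost_per_call(tokens: int, model: str) -> float:
    """Input-token cost in USD for a single call."""
    return tokens * PRICE_PER_MILLION_USD[model] / 1_000_000


def monthly_savings(daily_calls: int, model: str, skills: int = 1) -> float:
    """Monthly USD saved by sending the average skill instead of the heavy prompt."""
    saved_per_call = (cost_per_call(HEAVY_BASELINE_TOKENS, model)
                      - cost_per_call(AVG_SKILL_TOKENS, model))
    return saved_per_call * daily_calls * DAYS_PER_MONTH * skills


if __name__ == "__main__":
    for model in PRICE_PER_MILLION_USD:
        original = cost_per_call(HEAVY_BASELINE_TOKENS, model)
        with_skill = cost_per_call(AVG_SKILL_TOKENS, model)
        print(f"{model}: ${original:.7f} -> ${with_skill:.7f}")
    print(f"GPT-5, 1,000 calls/day: ${monthly_savings(1000, 'GPT-5'):.4f} /mo")
```

With these inputs, GPT-5 comes out at roughly $0.00457 per heavy-prompt call and about $119.18 saved per month at 1,000 daily calls, which agrees with the Cost Comparison and Monthly Savings tables.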