# 📊 Agent Skill Benchmark Report
> Generated: 2026-03-07T17:47:03.158Z
> Token counting: `ceil(characters / 4)` — cl100k_base approximation.
> Baselines: derived from **real, measured example prompts** (see Methodology).
> Quality: structural rubric (0–10), no live LLM calls required.
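
The token-counting rule quoted above can be sketched in a few lines. This is a minimal illustration only: the function name `estimate_tokens` is hypothetical, and the report does not say whether "characters" means code points, UTF-16 units, or bytes, so the sketch assumes Python's `len()` (code points).

```python
import math


def estimate_tokens(text: str) -> int:
    """Approximate a token count as ceil(characters / 4).

    Assumption: "characters" is taken to mean len(text); the report
    does not specify how characters are counted.
    """
    return math.ceil(len(text) / 4)


# Example: estimate the size of a short inline instruction.
sample = "Use constructor injection for all providers."
print(f"{len(sample)} characters -> ~{estimate_tokens(sample)} tokens")
```

Because this is a character-based approximation rather than a real tokenizer run, the absolute counts may differ from what a given model bills, although the relative savings should be less sensitive to that.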
## ❓ How to Read This Report
This benchmark answers: **"How many tokens and dollars does an agent skill save compared to a developer writing the same guidance inline?"**
- **WITHOUT a skill**: A developer writes the domain knowledge directly into the prompt on every call (the Baseline).
- **WITH a skill**: The agent loads the `SKILL.md` file (458 tokens on average), which is structured, reusable, and cached.
## 🔢 Executive Summary
| Metric | Value |
| --------------------------------- | --------------------------------- |
| Total Skills Benchmarked | **228** |
| Avg. Tokens WITH Skill (SKILL.md) | **458 tokens** |
| Baseline: Light prompt (no skill) | **1449 tokens** ↓ see Methodology |
| Baseline: Heavy prompt (no skill) | **3656 tokens** ↓ see Methodology |
| Avg. Token Savings vs Light | **68%** (991 tokens/call) |
| Avg. Token Savings vs Heavy | **87%** (3198 tokens/call) |
| Avg. Quality Score | **8.9/10** |
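
The savings rows in the table above follow directly from the average skill size and the two baselines. The sketch below reproduces that arithmetic; the function name `savings` is hypothetical, and rounding half up to a whole percent is an assumption that happens to match the reported figures.

```python
LIGHT_BASELINE_TOKENS = 1449
HEAVY_BASELINE_TOKENS = 3656
AVG_SKILL_TOKENS = 458


def savings(skill_tokens: int, baseline_tokens: int) -> tuple[int, int]:
    """Return (tokens saved per call, percent saved rounded half up)."""
    saved = baseline_tokens - skill_tokens
    # Integer arithmetic avoids floating-point surprises at exact .5 values.
    percent = (200 * saved + baseline_tokens) // (2 * baseline_tokens)
    return saved, percent


print(savings(AVG_SKILL_TOKENS, LIGHT_BASELINE_TOKENS))  # (991, 68)
print(savings(AVG_SKILL_TOKENS, HEAVY_BASELINE_TOKENS))  # (3198, 87)
```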
## 📜 History
| Version | Date | Skills | Avg Tokens | Savings vs Heavy (%) | Quality | Report |
| ------- | ---------- | ------ | ---------- | ----------- | ------- | ------ |
| v1.9.2 | 2026-03-07 | 228 | 458 | 87% | 8.9/10 | [Full Report](benchmarks/archive/v1.9.2.md) |
| v1.9.1 | 2026-03-07 | 228 | 458 | 87% | 8.9/10 | [Full Report](benchmarks/archive/v1.9.1.md) |
| v1.9.0 | 2026-03-05 | 228 | 457 | 88% | 8.9/10 | [Full Report](benchmarks/archive/v1.9.0.md) |
| v1.8.0 | 2026-03-02 | 228 | 443 | 88% | 8.9/10 | [Full Report](benchmarks/archive/v1.8.0.md) |
| v1.7.3 | 2026-02-25 | 222 | 418 | 89% | 8.9/10 | [Full Report](benchmarks/archive/v1.7.3.md) |
| v1.7.2 | 2026-02-25 | 220 | 413 | 89% | 8.9/10 | [Full Report](benchmarks/archive/v1.7.2.md) |
### 💰 Cost Comparison — Per Single Call (Average Skill)
> Compares the **Heavy Baseline** prompt (3656 tokens) with the average skill (458 tokens), priced for modern and speculative models.
| Model | Original Cost | Skill Cost | Net Savings | % Saved |
| ----------------- | ------------- | ---------- | -------------- | ------- |
| Gemini 3 Flash | $0.0018280 | $0.0002290 | **$0.0015990** | 87% |
| GPT-5 | $0.0045700 | $0.0005725 | **$0.0039975** | 87% |
| Gemini 3.1 Pro | $0.0073120 | $0.0009160 | **$0.0063960** | 87% |
| Claude Sonnet 4.5 | $0.0109680 | $0.0013740 | **$0.0095940** | 87% |
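
The per-call figures above can be reproduced from the token counts and the per-1M-token input prices listed under Pricing. The following is a minimal sketch, not the benchmark's actual code; `call_cost` and the constant names are hypothetical.

```python
# Input-token prices in USD per 1M tokens, copied from the report's Pricing list.
PRICE_PER_MILLION = {
    "Gemini 3 Flash": 0.50,
    "GPT-5": 1.25,
    "Gemini 3.1 Pro": 2.00,
    "Claude Sonnet 4.5": 3.00,
}

HEAVY_BASELINE_TOKENS = 3656
AVG_SKILL_TOKENS = 458


def call_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD of sending `tokens` input tokens once."""
    return tokens * price_per_million / 1_000_000


for model, price in PRICE_PER_MILLION.items():
    original = call_cost(HEAVY_BASELINE_TOKENS, price)
    with_skill = call_cost(AVG_SKILL_TOKENS, price)
    print(f"{model}: ${original:.7f} -> ${with_skill:.7f} "
          f"(saves ${original - with_skill:.7f})")
```

Only input tokens are priced here, which matches the Pricing list; output tokens are the same with or without a skill and are outside this comparison.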
### 📈 Monthly Savings at Scale — (Avg Skill vs Heavy Prompt)
| Daily Calls | Original Cost/mo | Monthly Savings (1 skill) | Monthly Savings (50 skills) | Model |
| ----------- | ---------------- | ------------------------- | --------------------------- | ----- |
| 1,000 | $137.1000/mo | $119.9250/mo | $5996.2500/mo | GPT-5 |
| 1,000 | $329.0400/mo | $287.8200/mo | $14391.0000/mo | Claude Sonnet 4.5 |
| 1,000 | $219.3600/mo | $191.8800/mo | $9594.0000/mo | Gemini 3.1 Pro |
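
The monthly figures above scale the per-call saving linearly. The sketch below shows that arithmetic under two assumptions inferred from the numbers rather than stated in the report: a 30-day month ($137.10 divided by 1,000 calls at $0.00457 gives 30), and a "50 skills" column that is simply 50 times the single-skill saving. The function name `monthly_savings` is hypothetical.

```python
def monthly_savings(
    daily_calls: int,
    baseline_tokens: int,
    skill_tokens: int,
    price_per_million: float,
    skills: int = 1,
    days_per_month: int = 30,  # assumption inferred from the report's figures
) -> float:
    """Monthly USD saved by sending a skill instead of the baseline prompt."""
    saved_per_call = (baseline_tokens - skill_tokens) * price_per_million / 1_000_000
    return daily_calls * days_per_month * saved_per_call * skills


# GPT-5 at $1.25 per 1M input tokens, heavy baseline vs. average skill.
one_skill = monthly_savings(1000, 3656, 458, 1.25)
fifty_skills = monthly_savings(1000, 3656, 458, 1.25, skills=50)
print(f"1 skill: ${one_skill:.4f}/mo, 50 skills: ${fifty_skills:.4f}/mo")
```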
## 📦 Per-Category Summary
### 📦 android (22 skills | avg 315 tokens | quality 9.9/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `android-navigation ` | 318 | █████████░ 91% | 10/10 |
| `android-notifications` | 337 | █████████░ 91% | 10/10 |
| `architecture ` | 445 | █████████░ 88% | 10/10 |
| `background-work ` | 262 | █████████░ 93% | 10/10 |
| `compose ` | 356 | █████████░ 90% | 10/10 |
| `concurrency ` | 284 | █████████░ 92% | 10/10 |
| `deployment ` | 311 | █████████░ 91% | 10/10 |
| `di ` | 306 | █████████░ 92% | 10/10 |
| `legacy-navigation ` | 271 | █████████░ 93% | 10/10 |
| `legacy-security ` | 308 | █████████░ 92% | 10/10 |
| `legacy-state ` | 273 | █████████░ 93% | 10/10 |
| `navigation ` | 276 | █████████░ 92% | 10/10 |
| `networking ` | 328 | █████████░ 91% | 10/10 |
| `performance ` | 318 | █████████░ 91% | 10/10 |
| `persistence ` | 282 | █████████░ 92% | 10/10 |
| `resources ` | 269 | █████████░ 93% | 10/10 |
| `security ` | 364 | █████████░ 90% | 10/10 |
| `state ` | 311 | █████████░ 91% | 10/10 |
| `testing ` | 278 | █████████░ 92% | 10/10 |
| `tooling ` | 270 | █████████░ 93% | 10/10 |
| `xml-views ` | 310 | █████████░ 92% | 10/10 |
| `android-design-system` | 458 | █████████░ 87% | 8/10 |
### 📦 angular (15 skills | avg 307 tokens | quality 8.8/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `components ` | 369 | █████████░ 90% | 10/10 |
| `dependency-injection ` | 313 | █████████░ 91% | 10/10 |
| `performance ` | 317 | █████████░ 91% | 10/10 |
| `routing ` | 306 | █████████░ 92% | 10/10 |
| `state-management ` | 313 | █████████░ 91% | 10/10 |
| `style-guide ` | 365 | █████████░ 90% | 10/10 |
| `architecture ` | 484 | █████████░ 87% | 8/10 |
| `component-patterns ` | 334 | █████████░ 91% | 8/10 |
| `directives-pipes ` | 246 | █████████░ 93% | 8/10 |
| `forms ` | 244 | █████████░ 93% | 8/10 |
| `http-client ` | 258 | █████████░ 93% | 8/10 |
| `rxjs-interop ` | 276 | █████████░ 92% | 8/10 |
| `security ` | 273 | █████████░ 93% | 8/10 |
| `ssr ` | 256 | █████████░ 93% | 8/10 |
| `testing ` | 246 | █████████░ 93% | 8/10 |
### 📦 common (24 skills | avg 666 tokens | quality 7.9/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `architecture-diagramming` | 426 | █████████░ 88% | 10/10 |
| `feedback-reporter ` | 476 | █████████░ 87% | 10/10 |
| `mobile-animation ` | 418 | █████████░ 89% | 10/10 |
| `product-requirements ` | 451 | █████████░ 88% | 10/10 |
| `session-retrospective` | 678 | ████████░░ 81% | 10/10 |
| `accessibility ` | 1020 | ███████░░░ 72% | 8/10 |
| `api-design ` | 887 | ████████░░ 76% | 8/10 |
| `architecture-audit ` | 699 | ████████░░ 81% | 8/10 |
| `best-practices ` | 985 | ███████░░░ 73% | 8/10 |
| `code-review ` | 404 | █████████░ 89% | 8/10 |
| `context-optimization ` | 461 | █████████░ 87% | 8/10 |
| `error-handling ` | 1035 | ███████░░░ 72% | 8/10 |
| `mobile-ux-core ` | 384 | █████████░ 89% | 8/10 |
| `observability ` | 959 | ███████░░░ 74% | 8/10 |
| `security-audit ` | 810 | ████████░░ 78% | 8/10 |
| `security-standards ` | 633 | ████████░░ 83% | 8/10 |
| `skill-creator ` | 938 | ███████░░░ 74% | 8/10 |
| `tdd ` | 858 | ████████░░ 77% | 8/10 |
| `workflow-writing ` | 590 | ████████░░ 84% | 8/10 |
| `debugging ` | 418 | █████████░ 89% | 6/10 |
| `git-collaboration ` | 661 | ████████░░ 82% | 6/10 |
| `performance-engineering` | 721 | ████████░░ 80% | 6/10 |
| `system-design ` | 526 | █████████░ 86% | 6/10 |
| `documentation ` | 551 | █████████░ 85% | 4/10 |
### 📦 dart (3 skills | avg 459 tokens | quality 6.7/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `language ` | 615 | ████████░░ 83% | 8/10 |
| `best-practices ` | 369 | █████████░ 90% | 6/10 |
| `tooling ` | 392 | █████████░ 89% | 6/10 |
### 📦 database (3 skills | avg 720 tokens | quality 9.3/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `mongodb ` | 615 | ████████░░ 83% | 10/10 |
| `redis ` | 633 | ████████░░ 83% | 10/10 |
| `postgresql ` | 912 | ████████░░ 75% | 8/10 |
### 📦 flutter (21 skills | avg 465 tokens | quality 8.5/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `bloc-state-management` | 443 | █████████░ 88% | 10/10 |
| `flutter-design-system` | 571 | ████████░░ 84% | 10/10 |
| `flutter-navigation ` | 345 | █████████░ 91% | 10/10 |
| `flutter-notifications` | 386 | █████████░ 89% | 10/10 |
| `getx-navigation ` | 405 | █████████░ 89% | 10/10 |
| `layer-based-clean-architecture` | 538 | █████████░ 85% | 10/10 |
| `localization ` | 537 | █████████░ 85% | 10/10 |
| `retrofit-networking ` | 535 | █████████░ 85% | 10/10 |
| `testing ` | 918 | ████████░░ 75% | 10/10 |
| `auto-route-navigation` | 415 | █████████░ 89% | 8/10 |
| `cicd ` | 487 | █████████░ 87% | 8/10 |
| `dependency-injection ` | 400 | █████████░ 89% | 8/10 |
| `error-handling ` | 421 | █████████░ 88% | 8/10 |
| `feature-based-clean-architecture` | 422 | █████████░ 88% | 8/10 |
| `getx-state-management` | 531 | █████████░ 85% | 8/10 |
| `riverpod-state-management` | 474 | █████████░ 87% | 8/10 |
| `security ` | 344 | █████████░ 91% | 8/10 |
| `go-router-navigation ` | 428 | █████████░ 88% | 6/10 |
| `idiomatic-flutter ` | 356 | █████████░ 90% | 6/10 |
| `performance ` | 431 | █████████░ 88% | 6/10 |
| `widgets ` | 383 | █████████░ 90% | 6/10 |
### 📦 golang (10 skills | avg 398 tokens | quality 9.2/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `api-server ` | 430 | █████████░ 88% | 10/10 |
| `database ` | 382 | █████████░ 90% | 10/10 |
| `error-handling ` | 339 | █████████░ 91% | 10/10 |
| `language ` | 345 | █████████░ 91% | 10/10 |
| `security ` | 522 | █████████░ 86% | 10/10 |
| `testing ` | 465 | █████████░ 87% | 10/10 |
| `architecture ` | 516 | █████████░ 86% | 8/10 |
| `concurrency ` | 393 | █████████░ 89% | 8/10 |
| `configuration ` | 291 | █████████░ 92% | 8/10 |
| `logging ` | 299 | █████████░ 92% | 8/10 |
### 📦 ios (15 skills | avg 431 tokens | quality 9.7/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `app-lifecycle ` | 421 | █████████░ 88% | 10/10 |
| `architecture ` | 601 | ████████░░ 84% | 10/10 |
| `dependency-injection ` | 417 | █████████░ 89% | 10/10 |
| `deployment ` | 385 | █████████░ 89% | 10/10 |
| `ios-navigation ` | 329 | █████████░ 91% | 10/10 |
| `ios-notifications ` | 331 | █████████░ 91% | 10/10 |
| `localization ` | 496 | █████████░ 86% | 10/10 |
| `networking ` | 467 | █████████░ 87% | 10/10 |
| `performance ` | 430 | █████████░ 88% | 10/10 |
| `persistence ` | 492 | █████████░ 87% | 10/10 |
| `security ` | 435 | █████████░ 88% | 10/10 |
| `state-management ` | 456 | █████████░ 88% | 10/10 |
| `ui-navigation ` | 458 | █████████░ 87% | 10/10 |
| `ios-design-system ` | 416 | █████████░ 89% | 8/10 |
| `swiftui ` | 328 | █████████░ 91% | 8/10 |
### 📦 java (5 skills | avg 543 tokens | quality 8.8/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `concurrency ` | 492 | █████████░ 87% | 10/10 |
| `testing ` | 548 | █████████░ 85% | 10/10 |
| `best-practices ` | 580 | ████████░░ 84% | 8/10 |
| `language ` | 657 | ████████░░ 82% | 8/10 |
| `tooling ` | 438 | █████████░ 88% | 8/10 |
### 📦 javascript (3 skills | avg 408 tokens | quality 10.0/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `best-practices ` | 429 | █████████░ 88% | 10/10 |
| `language ` | 431 | █████████░ 88% | 10/10 |
| `tooling ` | 365 | █████████░ 90% | 10/10 |
### 📦 kotlin (4 skills | avg 475 tokens | quality 9.0/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `coroutines ` | 354 | █████████░ 90% | 10/10 |
| `tooling ` | 420 | █████████░ 89% | 10/10 |
| `best-practices ` | 550 | █████████░ 85% | 8/10 |
| `language ` | 577 | ████████░░ 84% | 8/10 |
### 📦 laravel (10 skills | avg 383 tokens | quality 10.0/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `api ` | 357 | █████████░ 90% | 10/10 |
| `architecture ` | 399 | █████████░ 89% | 10/10 |
| `background-processing` | 406 | █████████░ 89% | 10/10 |
| `clean-architecture ` | 453 | █████████░ 88% | 10/10 |
| `database-expert ` | 423 | █████████░ 88% | 10/10 |
| `eloquent ` | 344 | █████████░ 91% | 10/10 |
| `security ` | 356 | █████████░ 90% | 10/10 |
| `sessions-middleware ` | 409 | █████████░ 89% | 10/10 |
| `testing ` | 335 | █████████░ 91% | 10/10 |
| `tooling ` | 347 | █████████░ 91% | 10/10 |
### 📦 nestjs (21 skills | avg 547 tokens | quality 7.3/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `architecture ` | 385 | █████████░ 89% | 10/10 |
| `nestjs-bullmq ` | 446 | █████████░ 88% | 10/10 |
| `nestjs-notification ` | 465 | █████████░ 87% | 10/10 |
| `security ` | 553 | █████████░ 85% | 10/10 |
| `security-isolation ` | 565 | █████████░ 85% | 10/10 |
| `testing ` | 572 | ████████░░ 84% | 10/10 |
| `api-standards ` | 368 | █████████░ 90% | 8/10 |
| `database ` | 619 | ████████░░ 83% | 8/10 |
| `caching ` | 650 | ████████░░ 82% | 6/10 |
| `configuration ` | 547 | █████████░ 85% | 6/10 |
| `controllers-services ` | 679 | ████████░░ 81% | 6/10 |
| `deployment ` | 576 | ████████░░ 84% | 6/10 |
| `documentation ` | 858 | ████████░░ 77% | 6/10 |
| `error-handling ` | 518 | █████████░ 86% | 6/10 |
| `file-uploads ` | 352 | █████████░ 90% | 6/10 |
| `observability ` | 595 | ████████░░ 84% | 6/10 |
| `performance ` | 817 | ████████░░ 78% | 6/10 |
| `real-time ` | 637 | ████████░░ 83% | 6/10 |
| `scheduling ` | 359 | █████████░ 90% | 6/10 |
| `search ` | 458 | █████████░ 87% | 6/10 |
| `transport ` | 466 | █████████░ 87% | 6/10 |
### 📦 nextjs (18 skills | avg 506 tokens | quality 8.1/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `data-fetching ` | 436 | █████████░ 88% | 10/10 |
| `i18n ` | 571 | ████████░░ 84% | 10/10 |
| `pages-router ` | 961 | ███████░░░ 74% | 10/10 |
| `rendering ` | 422 | █████████░ 88% | 10/10 |
| `security ` | 355 | █████████░ 90% | 10/10 |
| `testing ` | 368 | █████████░ 90% | 10/10 |
| `tooling ` | 367 | █████████░ 90% | 10/10 |
| `app-router ` | 533 | █████████░ 85% | 8/10 |
| `architecture ` | 832 | ████████░░ 77% | 8/10 |
| `authentication ` | 314 | █████████░ 91% | 8/10 |
| `server-components ` | 498 | █████████░ 86% | 8/10 |
| `styling ` | 449 | █████████░ 88% | 8/10 |
| `caching ` | 414 | █████████░ 89% | 6/10 |
| `data-access-layer ` | 383 | █████████░ 90% | 6/10 |
| `optimization ` | 573 | ████████░░ 84% | 6/10 |
| `server-actions ` | 586 | ████████░░ 84% | 6/10 |
| `state-management ` | 562 | █████████░ 85% | 6/10 |
| `upgrade ` | 485 | █████████░ 87% | 6/10 |
### 📦 php (7 skills | avg 340 tokens | quality 9.7/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `best-practices ` | 349 | █████████░ 90% | 10/10 |
| `concurrency ` | 308 | █████████░ 92% | 10/10 |
| `error-handling ` | 363 | █████████░ 90% | 10/10 |
| `language ` | 346 | █████████░ 91% | 10/10 |
| `security ` | 353 | █████████░ 90% | 10/10 |
| `testing ` | 321 | █████████░ 91% | 10/10 |
| `tooling ` | 339 | █████████░ 91% | 8/10 |
### 📦 quality-engineering (4 skills | avg 480 tokens | quality 8.0/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `zephyr-test-generation` | 608 | ████████░░ 83% | 10/10 |
| `business-analysis ` | 425 | █████████░ 88% | 8/10 |
| `quality-assurance ` | 380 | █████████░ 90% | 8/10 |
| `jira-integration ` | 507 | █████████░ 86% | 6/10 |
### 📦 react (8 skills | avg 427 tokens | quality 9.0/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `component-patterns ` | 430 | █████████░ 88% | 10/10 |
| `hooks ` | 595 | ████████░░ 84% | 10/10 |
| `performance ` | 672 | ████████░░ 82% | 10/10 |
| `security ` | 362 | █████████░ 90% | 10/10 |
| `testing ` | 347 | █████████░ 91% | 10/10 |
| `state-management ` | 379 | █████████░ 90% | 8/10 |
| `typescript ` | 352 | █████████░ 90% | 8/10 |
| `tooling ` | 281 | █████████░ 92% | 6/10 |
### 📦 react-native (13 skills | avg 469 tokens | quality 10.0/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `architecture ` | 762 | ████████░░ 79% | 10/10 |
| `components ` | 526 | █████████░ 86% | 10/10 |
| `deployment ` | 469 | █████████░ 87% | 10/10 |
| `navigation ` | 477 | █████████░ 87% | 10/10 |
| `performance ` | 526 | █████████░ 86% | 10/10 |
| `platform-specific ` | 454 | █████████░ 88% | 10/10 |
| `react-native-dls ` | 288 | █████████░ 92% | 10/10 |
| `react-native-navigation` | 343 | █████████░ 91% | 10/10 |
| `react-native-notifications` | 351 | █████████░ 90% | 10/10 |
| `security ` | 549 | █████████░ 85% | 10/10 |
| `state-management ` | 444 | █████████░ 88% | 10/10 |
| `styling ` | 436 | █████████░ 88% | 10/10 |
| `testing ` | 470 | █████████░ 87% | 10/10 |
### 📦 spring-boot (10 skills | avg 393 tokens | quality 9.8/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `api-design ` | 330 | █████████░ 91% | 10/10 |
| `best-practices ` | 414 | █████████░ 89% | 10/10 |
| `data-access ` | 338 | █████████░ 91% | 10/10 |
| `deployment ` | 356 | █████████░ 90% | 10/10 |
| `microservices ` | 339 | █████████░ 91% | 10/10 |
| `observability ` | 334 | █████████░ 91% | 10/10 |
| `scheduling ` | 288 | █████████░ 92% | 10/10 |
| `security ` | 542 | █████████░ 85% | 10/10 |
| `testing ` | 375 | █████████░ 90% | 10/10 |
| `architecture ` | 610 | ████████░░ 83% | 8/10 |
### 📦 swift (8 skills | avg 377 tokens | quality 10.0/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `best-practices ` | 375 | █████████░ 90% | 10/10 |
| `concurrency ` | 366 | █████████░ 90% | 10/10 |
| `error-handling ` | 337 | █████████░ 91% | 10/10 |
| `language ` | 362 | █████████░ 90% | 10/10 |
| `memory-management ` | 359 | █████████░ 90% | 10/10 |
| `swiftui ` | 404 | █████████░ 89% | 10/10 |
| `testing ` | 415 | █████████░ 89% | 10/10 |
| `tooling ` | 395 | █████████░ 89% | 10/10 |
### 📦 typescript (4 skills | avg 541 tokens | quality 10.0/10)
| Skill | Tokens | Savings (vs Heavy) | Quality |
| ----------------------- | ------ | ------------------ | ------- |
| `best-practices ` | 508 | █████████░ 86% | 10/10 |
| `language ` | 594 | ████████░░ 84% | 10/10 |
| `security ` | 452 | █████████░ 88% | 10/10 |
| `tooling ` | 611 | ████████░░ 83% | 10/10 |
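
For readers wondering how to interpret the savings bars in the tables above, the rows appear consistent with a ten-cell bar whose filled cells equal the displayed percentage divided by ten, rounded half up (for example, 85% shows nine filled cells and 84% shows eight). The report does not document this rule, so the sketch below is an inference from the data; `savings_bar` is a hypothetical name.

```python
def savings_bar(percent: int, width: int = 10) -> str:
    """Render a text bar such as '█████████░ 91%'.

    Assumption: filled cells = percent / 10 rounded half up, which is
    inferred from the report's rows rather than taken from its source.
    """
    filled = (percent * width + 50) // 100  # integer round-half-up
    return "█" * filled + "░" * (width - filled) + f" {percent}%"


for pct in (93, 85, 84, 75, 72):
    print(savings_bar(pct))
```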
## 🏆 Quality Leaders
| Rank | Skill | Category | Quality | Tokens |
| ---- | ----------------------- | -------- | ------- | ------ |
| 1 | `android-navigation ` | android | 10/10 | 318 |
| 2 | `android-notifications` | android | 10/10 | 337 |
| 3 | `architecture ` | android | 10/10 | 445 |
| 4 | `background-work ` | android | 10/10 | 262 |
| 5 | `compose ` | android | 10/10 | 356 |
| 6 | `concurrency ` | android | 10/10 | 284 |
| 7 | `deployment ` | android | 10/10 | 311 |
| 8 | `di ` | android | 10/10 | 306 |
| 9 | `legacy-navigation ` | android | 10/10 | 271 |
| 10 | `legacy-security ` | android | 10/10 | 308 |
## 📐 Methodology & Baseline Justification
### Why These Baselines?
The baselines are derived from **real, token-counted example prompts** that represent what a developer actually writes when there is no structured skill available.
NestJS serves as the **Reference Unit**. Because the benchmark measures the volume of inline instructions a skill replaces, every skill is compared against the same high-density NestJS reference prompts, which keeps the results consistent across all tech stacks.
#### 🟡 Reference Technical Prompt — Light — 1449 tokens
> **Reference Technical Prompt — Light (e.g., NestJS)**
> A compact inline system prompt used as a reference for token count calibration. Representative of focused developer instructions without a structured skill.
#### 🔴 Reference Technical Prompt — Heavy — 3656 tokens
> **Reference Technical Prompt — Heavy (e.g., NestJS Architecture)**
> A comprehensive architect-level inline prompt used as a reference for complex tasks. Includes deep patterns and rules sent by developers when no skill is present.
### 🏆 Detailed Quality Rubric (0–10)
To ensure skills are not just "short" but actually **high quality**, every skill is scored against this structural rubric:
| Score | Criteria | Rationale |
| ------ | ------------------------- | ------------------------------------------------------ |
| **+2** | **Structured Guidelines** | At least 3 specific instructions/bullet points. |
| **+2** | **Anti-Patterns** | Specifically listing what the LLM should *avoid*. |
| **+2** | **Reference Examples** | Presence of a verified `references/` folder with code. |
| **+2** | **Token Optimality** | Entire `SKILL.md` is ≤100 lines (forces brevity). |
| **+2** | **Trigger Metadata** | Proper keywords and file-match triggers defined. |
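
The rubric above is additive: each of the five criteria contributes 2 points when met. A minimal sketch of that scoring follows. The `SkillFacts` fields and the `quality_score` function are hypothetical names chosen for illustration, and how the real benchmark detects each criterion (for example, what counts as an anti-pattern section) is not shown here.

```python
from dataclasses import dataclass


@dataclass
class SkillFacts:
    """Structural facts about one skill (hypothetical field names)."""
    guideline_count: int         # specific instructions / bullet points
    lists_anti_patterns: bool    # states what the LLM should avoid
    has_references_folder: bool  # verified references/ folder with code
    skill_md_lines: int          # line count of SKILL.md
    has_trigger_metadata: bool   # keywords and file-match triggers defined


def quality_score(facts: SkillFacts) -> int:
    """Score a skill from 0 to 10, awarding 2 points per satisfied criterion."""
    checks = [
        facts.guideline_count >= 3,
        facts.lists_anti_patterns,
        facts.has_references_folder,
        facts.skill_md_lines <= 100,
        facts.has_trigger_metadata,
    ]
    return 2 * sum(checks)


# A skill that meets every criterion except the references/ folder.
example = SkillFacts(5, True, False, 80, True)
print(f"{quality_score(example)}/10")
```

Since every criterion is worth the same 2 points, scores are always even, which matches the 4/10 through 10/10 values seen in the per-category tables.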
### 🛡️ How to Verify This Report
Trust but verify. You can audit the raw data and run the benchmark yourself:
1. **Clone the repo** and install dependencies (`pnpm install`).
2. **Inspect Source**: The benchmark logic is open in [cli/src/scripts/benchmark/](./cli/src/scripts/benchmark/).
### Pricing (per 1M input tokens, Feb 2026)
- **Gemini 3 Flash**: $0.50
- **GPT-5**: $1.25
- **Gemini 3.1 Pro**: $2.00
- **Claude Sonnet 4.5**: $3.00