# AgentPack Public Benchmark Table

- date: 2026-06-25
- suite: public real-repo commits (pallets-click, pallets-itsdangerous, pallets-markupsafe, vite, gin, spring-petclinic, nestjs)
- agentpack version/commit: 0.3.31
- cases: 107
- command: `agentpack benchmark --release-gate`

| Metric | Value |
|---|---:|
| avg precision | 43.8% |
| avg recall | 65.7% |
| avg F1 | 48.3% |
| avg token precision | 51.4% |
| pack p50 tokens | 315 |
| pack p95 tokens | 1,137 |
| low-budget cases with last-summary diagnostic | 3 |
| avg last-summary waste | 33 tokens |
| avg precision delta if drop last summary | -4.9% |

| Repo / suite | Task | Type | Mode | Budget | Packed tokens | Recall | Cand R@50 | Cand P@3 | Token precision | Rank@K | Time | Misses |
|---|---|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| current repo | Implement a regex-based solution to hide hidden input | python-cli | balanced | 2,000 | 1,110 | 50.0% | 100.0% | 0.0% | 39.2% | 5 | 3.96s | 1 |
| current repo | FuncParamType should use ValueError message in self.fail | python-cli | balanced | 6,000 | 838 | 50.0% | 100.0% | 33.3% | 50.2% | 4 | 4.72s | 1 |
| current repo | Add NoSuchCommand exception with suggestions for misspelled commands | python-cli | balanced | 4,000 | 1,065 | 50.0% | 83.3% | 66.7% | 100.0% | 56 | 5.28s | 3 |
| current repo | Ensure fish completion handles multiline help strings correctly | python-cli | balanced | 2,000 | 1,007 | 100.0% | 100.0% | 66.7% | 69.8% | 2 | 5.23s | 0 |
| current repo | Extract _guess_type from convert_type and add overloads | python-cli | balanced | 1,200 | 940 | 100.0% | 100.0% | 33.3% | 44.8% | 1 | 5.34s | 0 |
| current repo | Fix broken fish completion and multiline help string | python-cli | balanced | 3,000 | 1,133 | 100.0% | 100.0% | 33.3% | 27.7% | 2 | 4.24s | 0 |
| current repo | start version 8.5.0 | python-cli | balanced | 3,000 | 467 | 100.0% | 100.0% | 33.3% | 18.2% | 3 | 4.05s | 0 |
| current repo | Add missing space between option help text and deprecation label (#3423) | python-cli | balanced | 3,000 | 1,303 | 0.0% | 100.0% | 0.0% | 0.0% | 10 | 4.15s | 1 |
| current repo | Fix sentinel typing and its uses in parser (#3396) | python-cli | balanced | 3,000 | 721 | 100.0% | 100.0% | 33.3% | 100.0% | 8 | 4.92s | 0 |
| Add file-like pager | `click.get_pager_file()` (#1572) | python-cli | balanced | 3,000 | 1,174 | 50.0% | 100.0% | 66.7% | 64.1% | 45 | 4.57s | 2 |
| fix | `_termui_impl.open_url()` — 'start' on Windows is a cmd built-in, not an executable (#3186) | python-cli | balanced | 3,000 | 852 | 100.0% | 100.0% | 33.3% | 38.1% | 1 | 4.81s | 0 |
| current repo | Fix readline backspace/line-wrapping on linux (#2969) | python-cli | balanced | 3,000 | 802 | 0.0% | 100.0% | 0.0% | 0.0% | 5 | 4.57s | 1 |
| current repo | `Parameter` typing improvements (#2805) | python-cli | balanced | 3,000 | 1,134 | 66.7% | 100.0% | 33.3% | 62.9% | 7 | 4.57s | 1 |
| current repo | Revert "Use `default=True` as a sentinel for non-boolean flags" | python-cli | balanced | 3,000 | 1,137 | 100.0% | 100.0% | 0.0% | 38.2% | 7 | 4.62s | 0 |
| current repo | Use `default=True` as a sentinel for non-boolean flags | python-cli | balanced | 3,000 | 1,137 | 100.0% | 100.0% | 0.0% | 38.2% | 8 | 4.68s | 0 |
| current repo | Fix completions for quoted/escaped parameters in Fish (#3013) | python-cli | balanced | 3,000 | 1,131 | 100.0% | 100.0% | 33.3% | 27.8% | 2 | 9.96s | 0 |
| current repo | Fix Zsh completions with colons (#2846) | python-cli | balanced | 3,000 | 1,070 | 100.0% | 100.0% | 33.3% | 29.3% | 1 | 10.38s | 0 |
| current repo | update dev dependencies | python-cli | balanced | 3,000 | 325 | 50.0% | 50.0% | 33.3% | 100.0% | 132 | 7.15s | 1 |
| current repo | update dev dependencies | python-cli | balanced | 3,000 | 325 | 25.0% | 75.0% | 33.3% | 100.0% | 58 | 5.14s | 3 |
| current repo | Expand `Choice` token normalization + make generic (#2796) | python-cli | balanced | 3,000 | 1,116 | 66.7% | 100.0% | 0.0% | 72.8% | 29 | 2.80s | 1 |
| current repo | Improve typing on Windows (#2803) | python-cli | balanced | 3,000 | 704 | 66.7% | 100.0% | 66.7% | 94.7% | 9 | 2.79s | 1 |
| current repo | Correct click.edit typing (#2804) | python-cli | balanced | 3,000 | 1,150 | 100.0% | 100.0% | 66.7% | 62.5% | 3 | 2.68s | 0 |
| current repo | Only try to set flag_value if is_flag is true (#2829) | python-cli | balanced | 3,000 | 1,038 | 0.0% | 100.0% | 0.0% | 0.0% | 7 | 2.72s | 1 |
| current repo | Add CliRunner default `catch_exceptions` parameter (#2818) | python-cli | balanced | 3,000 | 937 | 100.0% | 100.0% | 33.3% | 31.5% | 3 | 2.79s | 0 |
| current repo | Introduce Parameter.deprecated + Command.deprecated message customization (#2271) | python-cli | balanced | 3,000 | 882 | 100.0% | 100.0% | 33.3% | 48.5% | 2 | 3.25s | 0 |
| current repo | Drop end of life python versions | python-library | balanced | 2,000 | 148 | 33.3% | 100.0% | 33.3% | 58.1% | 18 | 2.90s | 2 |
| current repo | start version 2.2.0 | python-library | balanced | 2,500 | 43 | 100.0% | 100.0% | 33.3% | 100.0% | 1 | 0.39s | 0 |
| current repo | start version 2.1.0.dev0 | python-library | balanced | 2,500 | 49 | 100.0% | 100.0% | 33.3% | 100.0% | 1 | 0.41s | 0 |
| current repo | start version 2.0.1.dev0 | python-library | balanced | 2,500 | 49 | 100.0% | 100.0% | 33.3% | 100.0% | 1 | 0.46s | 0 |
| current repo | mention itsdangerous in deprecation message | python-library | balanced | 2,500 | 289 | 66.7% | 100.0% | 33.3% | 68.2% | 16 | 0.43s | 1 |
| current repo | prerelease version 2.0.0rc1 | python-library | balanced | 2,500 | 49 | 100.0% | 100.0% | 33.3% | 100.0% | 1 | 0.41s | 0 |
| current repo | add typing with mypy | python-library | balanced | 2,500 | 439 | 100.0% | 100.0% | 0.0% | 15.5% | 5 | 0.55s | 0 |
| current repo | Remove previously deprecated code | python-library | balanced | 1,000 | 86 | 100.0% | 100.0% | 33.3% | 100.0% | 3 | 2.74s | 0 |
| current repo | Change to DeprecationWarning | python-library | balanced | 1,000 | 0 | 0.0% | 100.0% | 0.0% | 100.0% | 10 | 2.77s | 1 |
| current repo | start version 3.1.0 | python-library | balanced | 2,000 | 296 | 100.0% | 100.0% | 33.3% | 29.1% | 2 | 1.95s | 0 |
| current repo | fix version | python-library | balanced | 2,000 | 296 | 100.0% | 100.0% | 33.3% | 29.1% | 1 | 1.91s | 0 |
| current repo | start version 3.0.1 | python-library | balanced | 2,000 | 296 | 100.0% | 100.0% | 33.3% | 29.1% | 2 | 1.79s | 0 |
| current repo | remove unused config | python-library | balanced | 2,000 | 87 | 0.0% | 100.0% | 0.0% | 0.0% | 6 | 1.78s | 1 |
| current repo | add gha-update, remove dependabot | python-library | balanced | 2,000 | 85 | 0.0% | 100.0% | 0.0% | 0.0% | 10 | 2.77s | 1 |
| current repo | use `__class__` instead of type() | python-library | balanced | 2,000 | 455 | 100.0% | 100.0% | 33.3% | 47.7% | 2 | 3.54s | 0 |
| current repo | fix license metadata | python-library | balanced | 2,000 | 315 | 100.0% | 100.0% | 33.3% | 27.3% | 2 | 2.25s | 0 |
| current repo | update dev dependencies | python-library | balanced | 2,000 | 0 | 0.0% | 100.0% | 0.0% | 100.0% | 21 | 1.94s | 1 |
| current repo | remove dynamic version | python-library | balanced | 2,000 | 315 | 100.0% | 100.0% | 33.3% | 27.3% | 1 | 1.96s | 0 |
| current repo | simplify using assignment expressions | python-library | balanced | 2,000 | 0 | 0.0% | 100.0% | 0.0% | 100.0% | 8 | 2.12s | 1 |
| chore | update `create-react-app` links (#22659) | typescript | balanced | 4,000 | 120 | 100.0% | 100.0% | 0.0% | 84.2% | 4 | 13.76s | 0 |
| chore | correct `parseAst`/`parseAstAsync` deprecation hints (#22656) | typescript | balanced | 4,000 | 336 | 100.0% | 100.0% | 33.3% | 38.7% | 1 | 13.09s | 0 |
| fix(html) | insert import map before modulepreload that is not self-close tag (#21409) | typescript | balanced | 4,000 | 760 | 100.0% | 100.0% | 33.3% | 77.4% | 1 | 15.84s | 0 |
| chore | fix tailwind playground comments (#22614) | typescript | balanced | 4,000 | 329 | 100.0% | 100.0% | 66.7% | 3.0% | 2 | 12.56s | 0 |
| fix | apply correct fs restrictions for pnpm gvs (#22415) | typescript | balanced | 4,000 | 784 | 100.0% | 100.0% | 0.0% | 36.2% | 8 | 13.35s | 0 |
| chore | clean up eslint config (#22616) | typescript | balanced | 4,000 | 364 | 0.0% | 100.0% | 0.0% | 0.0% | 26 | 16.83s | 1 |
| feat(types) | add more precise typing for known `query` types to match known `as` types (#21863) | typescript | balanced | 4,000 | 1,039 | 0.0% | 0.0% | 0.0% | 0.0% | 203 | 17.77s | 1 |
| refactor | match import glob common base by path segment correctly (#22558) | typescript | balanced | 4,000 | 592 | 50.0% | 50.0% | 33.3% | 42.4% | 219 | 14.33s | 1 |
| fix(optimizer) | preserve sourcemaps for transformed optimized deps with follow-up transforms (#22428) | typescript | balanced | 4,000 | 852 | 66.7% | 100.0% | 33.3% | 74.4% | 14 | 18.69s | 1 |
| docs | clarify `loadEnv` merges `process.env` (#22561) | typescript | balanced | 4,000 | 656 | 100.0% | 100.0% | 33.3% | 11.4% | 1 | 17.52s | 0 |
| feat | integrate with Vite Task for zero-config build caching (#22453) | typescript | balanced | 4,000 | 837 | 33.3% | 66.7% | 33.3% | 73.6% | 342 | 23.58s | 4 |
| fix | use node_modules/.vite as cacheDir when node_modules exists (#21777) | typescript | balanced | 4,000 | 1,387 | 50.0% | 50.0% | 0.0% | 39.6% | 143 | 13.28s | 1 |
| feat(server) | support multiple hosts in __VITE_ADDITIONAL_SERVER_ALLOWED_HOSTS (#21501) | typescript | balanced | 4,000 | 420 | 100.0% | 100.0% | 33.3% | 90.2% | 13 | 18.10s | 0 |
| feat(css) | support lightningcss plugin dependency (#21748) | typescript | balanced | 4,000 | 718 | 50.0% | 100.0% | 33.3% | 89.7% | 27 | 12.39s | 2 |
| feat | add warning to discourage Vite with yarn pnp (#21906) | typescript | balanced | 4,000 | 1,189 | 0.0% | 100.0% | 0.0% | 0.0% | 13 | 12.72s | 1 |
| feat(html) | add `html.additionalAssetSources` option (#21412) | typescript | balanced | 4,000 | 1,357 | 50.0% | 50.0% | 66.7% | 100.0% | 1180 | 12.96s | 3 |
| fix | reject windows alternate paths (#22572) | typescript | balanced | 4,000 | 553 | 0.0% | 33.3% | 0.0% | 0.0% | 1129 | 12.98s | 3 |
| fix(deps) | update all non-major dependencies (#22511) | typescript | balanced | 4,000 | 406 | 0.0% | 0.0% | 0.0% | 0.0% | 170 | 14.76s | 1 |
| fix(optimizer) | close the rolldown bundle when write() rejects (#22528) | typescript | balanced | 4,000 | 956 | 100.0% | 100.0% | 33.3% | 56.2% | 1 | 16.20s | 0 |
| refactor | correct logic in `collectAllModules` function (#22562) | typescript | balanced | 4,000 | 549 | 100.0% | 100.0% | 33.3% | 100.0% | 1 | 12.04s | 0 |
| fix(response) | panic on Hijack/CloseNotify when wrapper unsupported (#4645) | go-service | balanced | 3,500 | 267 | 100.0% | 100.0% | 66.7% | 65.5% | 3 | 1.46s | 0 |
| docs(context) | align inline comments in GetPostForm example (#4675) | go-service | balanced | 3,500 | 266 | 100.0% | 100.0% | 33.3% | 34.2% | 2 | 1.27s | 0 |
| feat(context) | add Scheme() with proper reverse proxy support (#4655) | go-service | balanced | 3,500 | 266 | 100.0% | 100.0% | 66.7% | 68.8% | 3 | 1.33s | 0 |
| refactor | optimize error message concatenation in default_validator (#4685) | go-service | balanced | 3,500 | 259 | 100.0% | 100.0% | 33.3% | 33.6% | 1 | 1.42s | 0 |
| test(engine) | add regression tests for HandleContext with NoRoute (#4571) | go-service | balanced | 3,500 | 350 | 100.0% | 100.0% | 33.3% | 24.3% | 2 | 1.30s | 0 |
| refactor(test) | use the built-in max/min to simplify the code (#4576) | go-service | balanced | 3,500 | 261 | 0.0% | 100.0% | 0.0% | 0.0% | 15 | 1.40s | 1 |
| feat(render) | add PDF renderer and tests (#4491) | go-service | balanced | 3,500 | 186 | 50.0% | 100.0% | 33.3% | 53.8% | 12 | 1.35s | 2 |
| docs | document and finalize Gin v1.12.0 release (#4551) | go-service | balanced | 3,500 | 253 | 0.0% | 0.0% | 0.0% | 0.0% | 62 | 1.24s | 1 |
| ci | update Go version support to 1.25+ across CI and docs (#4550) | go-service | balanced | 3,500 | 0 | 0.0% | 100.0% | 0.0% | 100.0% | 38 | 1.35s | 2 |
| fix(tree) | panic in findCaseInsensitivePathRec with RedirectFixedPath (#4535) | go-service | balanced | 3,500 | 169 | 100.0% | 100.0% | 66.7% | 100.0% | 2 | 1.59s | 0 |
| test(context) | use http.StatusContinue constant instead of magic number 100 (#4542) | go-service | balanced | 3,500 | 360 | 50.0% | 100.0% | 33.3% | 50.8% | 15 | 1.34s | 2 |
| test(render) | add comprehensive error handling tests (#4541) | go-service | balanced | 3,500 | 275 | 100.0% | 100.0% | 33.3% | 3.3% | 1 | 1.37s | 0 |
| fix(render) | write content length in Data.Render (#4206) | go-service | balanced | 3,500 | 192 | 50.0% | 100.0% | 33.3% | 4.7% | 6 | 1.28s | 1 |
| chore(logger) | allow skipping query string output (#4547) | go-service | balanced | 3,500 | 104 | 100.0% | 100.0% | 33.3% | 91.3% | 5 | 1.39s | 0 |
| chore(binding) | upgrade bson dependency to mongo-driver v2 (#4549) | go-service | balanced | 3,500 | 119 | 60.0% | 100.0% | 100.0% | 92.4% | 5 | 1.42s | 2 |
| test(render) | add comprehensive tests for MsgPack render (#4537) | go-service | balanced | 3,500 | 277 | 100.0% | 100.0% | 33.3% | 31.8% | 2 | 1.33s | 0 |
| refactor | replace magic numbers with named constants in bodyAllowedForStatus (#4529) | go-service | balanced | 3,500 | 91 | 100.0% | 100.0% | 33.3% | 100.0% | 1 | 1.31s | 0 |
| fix | Correct typos, improve documentation clarity, and remove dead code (#4511) | go-service | balanced | 3,500 | 87 | 0.0% | 25.0% | 0.0% | 0.0% | 121 | 1.34s | 4 |
| feat(render) | add bson protocol (#4145) | go-service | balanced | 3,500 | 265 | 33.3% | 50.0% | 33.3% | 66.8% | 123 | 1.29s | 4 |
| refactor | for loop can be modernized using range over int (#4392) | go-service | balanced | 3,500 | 179 | 14.3% | 42.9% | 33.3% | 51.4% | 123 | 1.28s | 6 |
| current repo | Register native resource hints for nested db/{h2,mysql,postgres}/ scripts | java-spring | balanced | 4,000 | 220 | 100.0% | 100.0% | 33.3% | 44.1% | 2 | 1.22s | 0 |
| current repo | Validate future visit dates | java-spring | balanced | 4,000 | 286 | 66.7% | 100.0% | 100.0% | 66.8% | 3 | 1.17s | 1 |
| fix | add @Size validation on Person firstName and lastName | java-spring | balanced | 4,000 | 292 | 100.0% | 100.0% | 33.3% | 32.9% | 1 | 1.29s | 0 |
| current repo | Add missing validation test for Person.lastName | java-spring | balanced | 4,000 | 293 | 100.0% | 100.0% | 0.0% | 33.8% | 6 | 1.28s | 0 |
| current repo | Be more careful with allowed fields in binders | java-spring | balanced | 4,000 | 294 | 100.0% | 100.0% | 100.0% | 100.0% | 3 | 1.18s | 0 |
| current repo | Update MySQL and PostgreSQL Docker images to latest versions | java-spring | balanced | 4,000 | 191 | 50.0% | 100.0% | 33.3% | 50.8% | 9 | 1.39s | 1 |
| current repo | Revert checkstyle to latest v12 | java-spring | balanced | 4,000 | 0 | 0.0% | 100.0% | 0.0% | 100.0% | 4 | 1.27s | 1 |
| current repo | Set fetch size for Hibernate | java-spring | balanced | 4,000 | 199 | 0.0% | 100.0% | 0.0% | 0.0% | 47 | 1.22s | 1 |
| current repo | Fix main method for docker compose | java-spring | balanced | 4,000 | 122 | 0.0% | 0.0% | 0.0% | 0.0% | 61 | 1.29s | 1 |
| current repo | Upgrade to Spring Boot 4.0.1 | java-spring | balanced | 4,000 | 266 | 100.0% | 100.0% | 33.3% | 31.6% | 2 | 1.30s | 0 |
| current repo | Use snake case physical naming strategy | java-spring | balanced | 4,000 | 0 | 0.0% | 100.0% | 0.0% | 100.0% | 33 | 1.22s | 4 |
| current repo | Polish | java-spring | balanced | 4,000 | 0 | 0.0% | 100.0% | 0.0% | 100.0% | 14 | 1.15s | 1 |
| current repo | Use more specific test dependencies | java-spring | balanced | 4,000 | 269 | 100.0% | 100.0% | 33.3% | 31.2% | 2 | 1.32s | 0 |
| current repo | Use canonical starter name for Boot 4.0 | java-spring | balanced | 4,000 | 265 | 100.0% | 100.0% | 33.3% | 31.7% | 2 | 1.28s | 0 |
| current repo | Support building with Java 17 | java-spring | balanced | 4,000 | 280 | 100.0% | 100.0% | 33.3% | 30.0% | 2 | 1.45s | 0 |
| current repo | Revert removal of --release 17 | java-spring | balanced | 4,000 | 84 | 100.0% | 100.0% | 33.3% | 100.0% | 1 | 1.36s | 0 |
| current repo | Upgrade to Spring Boot 4.0.0 | java-spring | balanced | 4,000 | 266 | 16.7% | 100.0% | 33.3% | 31.6% | 21 | 1.32s | 5 |
| current repo | Switch to building the project with Java 25 | java-spring | balanced | 4,000 | 266 | 100.0% | 100.0% | 33.3% | 31.6% | 1 | 1.17s | 0 |
| current repo | Remove unused imports | java-spring | balanced | 4,000 | 395 | 66.7% | 100.0% | 33.3% | 50.1% | 10 | 1.24s | 1 |
| current repo | Update to current versions | java-spring | balanced | 4,000 | 84 | 100.0% | 100.0% | 33.3% | 100.0% | 1 | 1.22s | 0 |
| fix | should skip transient providers for snapshots | typescript-monorepo | balanced | 5,000 | 454 | 66.7% | 100.0% | 66.7% | 80.6% | 5 | 6.93s | 1 |
| test(core) | add deeply nested transient providers in scoped chains | typescript-monorepo | balanced | 5,000 | 438 | 100.0% | 100.0% | 33.3% | 54.3% | 2 | 7.25s | 0 |
| test(core) | add several tests for parallel request-scoped providers | typescript-monorepo | balanced | 5,000 | 703 | 33.3% | 83.3% | 0.0% | 17.2% | 195 | 7.52s | 4 |

## Notes

- Use real historical tasks with `expected_files` set to files actually changed.
- Treat small curated suites as smoke proof; expand case counts before broad external claims.
- Keep synthetic fixture smoke results separate from public repo claims.
- Investigate misses with `agentpack benchmark --misses` and `agentpack explain --omitted`.
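
As a rough illustration of how file-level scores such as recall and avg F1 could be derived from `expected_files`, the sketch below computes precision, recall, and F1 per case and then averages them across cases. It is not AgentPack's implementation: the function names, the sample paths, and the choice to score an empty pack as 0.0 are all assumptions made for this example.

```python
from statistics import mean


def file_metrics(selected, expected):
    """Return (precision, recall, f1) for one benchmark case.

    `selected` holds the files placed in the pack; `expected` holds the
    files the historical commit actually changed (the expected_files).
    """
    selected, expected = set(selected), set(expected)
    hits = len(selected & expected)
    # Assumption: an empty pack or empty expectation scores 0.0.
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def macro_average(cases):
    """Score every case first, then take the mean of each metric."""
    scores = [file_metrics(sel, exp) for sel, exp in cases]
    return tuple(mean(column) for column in zip(*scores))


# Hypothetical cases: (files in the pack, expected_files from the commit).
cases = [
    (["src/a.py", "src/b.py"], ["src/a.py"]),         # one extra file packed
    (["src/c.py"], ["src/c.py", "tests/test_c.py"]),  # one expected file missed
    (["docs/x.md"], ["src/d.py"]),                    # complete miss
]

avg_p, avg_r, avg_f1 = macro_average(cases)
print(f"avg precision {avg_p:.1%}, avg recall {avg_r:.1%}, avg F1 {avg_f1:.1%}")
```

Averaging per-case F1 generally gives a different number than taking the harmonic mean of the averaged precision and recall. In the summary table, the harmonic mean of 43.8% and 65.7% is about 52.6%, while the reported avg F1 is 48.3%, which would be consistent with per-case averaging.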