# Public Benchmark Gallery Status: public-safe evidence surface. This page summarizes only self-checks and sanitized fixture evidence. README claim wording is tracked in [`docs/README-CLAIMS.md`](README-CLAIMS.md). ## Generated Evidence This section is generated from runtime evidence. It separates README-ready claim candidates from evidence that still needs more repo shapes or manual review. Generated: 2026-06-28T16:03:51.303Z Source: `.anamnesis/evidence/events.jsonl; docs/benchmark-evidence/public-shapes.jsonl` (25 valid, 0 invalid) ## Evidence Entries | Project | Kind | Generated | Evidence | Result | Claim candidate | |---|---|---|---|---|---| | anamnesis | dogfood-check | 2026-06-24T15:45:55.260Z | docs/DOGFOOD.md; .anamnesis/evidence/events.jsonl | dogfood 5/5; tools claude-code, codex, cursor | anamnesis dogfood check passes 5/5 continuity criteria across claude-code, codex, cursor. | | sanitized-nextjs-adoption | benchmark-compare | 2026-05-07T07:19:25.533Z | docs/BENCHMARKS.md; docs/benchmark-evidence/public-shapes.jsonl | 3 improved, 0 regressed, 6 unchanged | sanitized-nextjs-adoption before/after benchmark improved 3 scorecard dimension(s) with 0 regressions. | | sanitized-nextjs-frontend | benchmark-report | 2026-05-07T07:15:06.340Z | docs/BENCHMARKS.md; docs/benchmark-evidence/public-shapes.jsonl | ready layers 4/5; continuity 6/6; doctor 0 errors, 1 warnings | sanitized-nextjs-frontend current benchmark has continuity 6/6, ready layers 4/5, and doctor errors 0. | | sanitized-nest-k8s-backend | benchmark-report | 2026-05-07T07:15:06.340Z | docs/BENCHMARKS.md; docs/benchmark-evidence/public-shapes.jsonl | ready layers 4/5; continuity 6/6; doctor 0 errors, 2 warnings | sanitized-nest-k8s-backend current benchmark has continuity 6/6, ready layers 4/5, and doctor errors 0. | | anamnesis | benchmark-report | 2026-05-07T05:25:00.700Z | docs/BENCHMARKS.md; .anamnesis/evidence/events.jsonl | ready layers 3/5; continuity 6/6; doctor 0 errors, 0 warnings | anamnesis current benchmark has continuity 6/6, ready layers 3/5, and doctor errors 0. | ## README Claim Candidates - **dogfood-check-anamnesis**: anamnesis dogfood check passes 5/5 continuity criteria across claude-code, codex, cursor. Evidence: docs/DOGFOOD.md; .anamnesis/evidence/events.jsonl Boundary: Self-check evidence for this managed repo; skipped external smokes must stay disclosed. - **benchmark-compare-sanitized-nextjs-adoption**: sanitized-nextjs-adoption before/after benchmark improved 3 scorecard dimension(s) with 0 regressions. Evidence: docs/BENCHMARKS.md; docs/benchmark-evidence/public-shapes.jsonl Boundary: Same-repo deterministic scorecard delta only; not a model-intelligence benchmark. - **benchmark-report-sanitized-nextjs-frontend**: sanitized-nextjs-frontend current benchmark has continuity 6/6, ready layers 4/5, and doctor errors 0. Evidence: docs/BENCHMARKS.md; docs/benchmark-evidence/public-shapes.jsonl Boundary: Current deterministic context surface only; limitations depend on installed fragments and missing Layer A/B targets. - **benchmark-report-sanitized-nest-k8s-backend**: sanitized-nest-k8s-backend current benchmark has continuity 6/6, ready layers 4/5, and doctor errors 0. Evidence: docs/BENCHMARKS.md; docs/benchmark-evidence/public-shapes.jsonl Boundary: Current deterministic context surface only; limitations depend on installed fragments and missing Layer A/B targets. - **benchmark-report-anamnesis**: anamnesis current benchmark has continuity 6/6, ready layers 3/5, and doctor errors 0. Evidence: docs/BENCHMARKS.md; .anamnesis/evidence/events.jsonl Boundary: Current deterministic context surface only; limitations depend on installed fragments and missing Layer A/B targets. ## Release Warnings _No release warnings._ ## Claim Policy Only make public README claims from benchmark entries that: - expose no proprietary source snippets, credentials, hostnames, local paths, or internal business logic - use sanitized fixture names - include the repo shape and fragment set - report before/after or baseline/current state with the same metric - state limitations instead of implying universal behavior Do not claim yet: - "works for every framework" - "100% native UX across all agents" - "all ontology is fully automatic" - "registry fragments are secure" - "benchmark-proven across the ecosystem" ## Collection Targets Future public benchmark evidence should come from: 1. synthetic fixtures maintained inside this repository 2. intentionally public example repositories 3. sanitized archives reviewed before publication Private validation may guide product priorities, but it must stay out of public docs, npm package artifacts, generated gallery regions, and README claim candidates.