# unsorry — status report (2026-06-12) What the project has actually achieved, stated against verified ground truth rather than ambition. The house style is honesty: each claim is followed by its limit. ## One paragraph In roughly three days (v0.1.0 → v1.6.2, ~195 merged PRs), `unsorry` went from a coordination skeleton to a swarm that has **proved five mathlib-absent results, demonstrated its two load-bearing mechanisms end-to-end** (decomposition and dependency reuse), **survived its own operational failure modes** (three CLI quota outages in one day, now absorbed unattended; its first gate false negative caught, fixed, and canary-guarded within two hours), **passed three adversarial red-team rounds** including the CRITICAL hole an external review found, and **built — and auto-runs — a policy-compliant pipeline to upstream its novel lemmas to mathlib**. The honest frontier is unchanged: every result so far is elementary, the hard-target ceiling is untested, and nothing has landed in the commons yet. ## What is true, with verification ### The architecture works as designed - **The kernel is the only correctness oracle.** Every merge passes Gate A (build `--wfail`, axiom audit against the `{propext, Classical.choice, Quot.sound}` whitelist, leanchecker kernel replay, a regenerated statement-binding obligation) and Gate B (deterministic AISP hygiene). No human is in the correctness path. - **Soundness has been adversarially tested three times** and holds: round 001 (9/9 bypass vectors blocked), round 002 (3/3 statement-vacuity attacks blocked by the binding gate), and [round 003](../metrics/gate-a-redteam-003.md) (the #190 same-PR goal-tampering attack blocked by goal-statement immutability — the *only* layer that catches a fully self-consistent rewrite). - **The gate's one observed defect failed in the sound direction** ([#231](https://github.com/agenticsnz/unsorry/issues/231)): a statement whose named hypothesis follows an implicit binder (`{n : ℕ} (hn : 1 < n)`) made the regenerated binding obligation trip `linter.unusedVariables` under `--wfail`, so Gate A rejected *every correct proof* of that goal — a false negative, never a false positive. Fixed by scoping the lint suppression to the generated bindings, whose force is the type-check, not lints (#225), and pinned by `binder-shape-canary`, a permanent sound goal carrying exactly that binder shape whose binding the gate regenerates and rebuilds every run (#233) — a regression now goes red at the gate, not on a contributor's PR. ### It has produced novel, kernel-verified mathematics Five results verified mathlib-absent **before** the run and proved by the swarm: | Result | What | How | |---|---|---| | `nicomachus_sum_cubes` | ∑k³ = (∑k)² | direct, 24-line multi-step (phase2-run-001) | | `sum_range_pow_four_closed` | Faulhaber k=4 closed form | direct induction | | `platonic_schlafli_pairs` | the five Platonic Schläfli pairs | **forced depth-3 decomposition**, 13 lemmas ([phase3-run-001](../metrics/phase3-run-001.md)) | | `sum_range_cube_eq_triangular_sq` | triangular closed form of Nicomachus | **dependency reuse** of the above, 4m57s ([phase3-run-002](../metrics/phase3-run-002.md)) | | `not_prime_pow_four_add_four` | n⁴ + 4 is never prime for n > 1 (Sophie Germain) | direct, by external machine `binto-labs` (#221) — the proof that surfaced gate bug #231 | ### Both load-bearing mechanisms are demonstrated, not just built - **The chain** (v1.4.0): a target too big for one attempt became a depth-3 tree of 13 lemmas, each kernel-verified, recomposed level by level — the parent's proof is literally the composition of its sub-lemmas. - **Compounding** (v1.5.0): a merged lemma was *imported and invoked* by a later proof, turning a difficulty-2 target into a two-line module — plus the chain's four recompositions are four more instances of the same reuse mechanism. ### It survives its operating environment Born from three real quota outages in one day: the **infrastructure-failure guard** (ADR-016 — a fast-failed call + failed health probe is never read as goal evidence) and the **supervisor** (ADR-017 — exponential backoff, PR hygiene, claim guard). The third outage was absorbed with **zero queue corruption and no human intervention**; the first two had each demoted a whole goal tree and needed manual repair. The **progressive effort ladder** (ADR-015) spends deep reasoning only where a cheaper attempt failed — 11 of 13 run-001 proofs closed at the cheapest rung. ### It is hardened against the external review [Issue #190](https://github.com/agenticsnz/unsorry/issues/190)'s CRITICAL (goal tampering), HIGH (workflow self-protection), MEDIUM (unpinned actions) and LOW (audit corpus) findings are all addressed — goal-statement immutability (ADR-018), SHA-pinned actions + CODEOWNERS + a settings checklist (ADR-019), and an `opaque` audit fixture. The review's own verdict — "serious, well-engineered work… README's framing is accurate" — and its one stale finding ("decomposition never completed", true when written, closed by run-001) are recorded honestly. ### The public-good path is built and self-starting The **upstream pipeline** (ADR-020) scans nightly for packet-eligible lemmas (proved + absence-verified + unpacketed), re-checks absence at mathlib HEAD, and auto-opens a sponsor packet PR — a `git apply`-able patch with the human author's header, gate evidence, a paste-ready factual AI disclosure, and an explicit rewrite-in-own-words boundary (mathlib forbids LLM-written conversation). The first run produced **9 packets**, and the first of them (Nicomachus) is stamped **kernel-verified at mathlib HEAD**. The last mechanical mile is now one command (ADR-021, v1.6.1): `python3 -m tools.upstream.raise_pr` turns a HEAD-verified packet into a **draft** mathlib PR — clone, apply, push, pre-filled factual disclosure — with the policy boundary enforced in code: it refuses without the sponsor's `--understood` attestation, refuses an unverified packet, opens drafts only, and writes no review words. The irreducibly human stages — understand the proof, ask Zulip, write the narrative, face review — stay human ([docs/upstreaming.md](../upstreaming.md)). The model is policy-compliant by construction: machine prepares, the human sponsor (signed up) owns every word the community reads. ### It is no longer a single machine Merged work now comes from the agent swarm, the maintainer, **and three external contributor machines** (`mac-158f`, `binto-labs`, `ohdearquant`) — proving lemmas and fixing tooling through the same gates, with no coordination beyond the claims branch. One of those machines (`binto-labs`) contributed the fifth headline result above and, in doing so, found the gate's first false negative — external traffic is now hardening the gates, not just adding lemmas. The contributor path is documented (`CONTRIBUTING.md`, v1.6.2), and the Apache-2.0 `LICENSE` file that the README and every packet's copyright header already referenced now actually exists in the tree. ## The honest limits (unchanged where they were honest) - **Every result is elementary.** The decomposition in run-001 was *forced* by a strangled budget; whether decomposition beats one-shotting on a genuinely hard target is untested. This is the central open question. - **Reuse is depth-1.** Run-002 was one declared dependency edge; a chosen *result* reached through a deep dependency tree routed bottom-up is not yet done. - **Nothing is upstreamed.** The pipeline is built and the packets are ready; mathlib may decline both candidates — a valid, recorded outcome that still validates the machinery. - **One known operational bug:** a claim race past `PROVE_CLAIM_CAP=1` is real but unreproduced; its cost is bounded by PR dedupe, not eliminated. - **The gate can fail closed.** #231 blocked a correct proof; nothing unsound was admitted. The canary pins the one binder shape that bit — a different statement shape tripping a different lint would be found the same way: by blocking a contributor, not by polluting the library. - **Absence is a grep pre-filter,** not a proof; a result proved today could be upstreamed by mathlib tomorrow — the recorded revision makes that detectable, not impossible. ## Where it goes next The frontier is the difficulty ceiling: a target a working mathematician would call non-trivial, reached by a queue that genuinely compounds at depth — and a first lemma actually merged into the commons. Everything built so far is the apparatus for asking that question honestly. --- *Generated as part of the v1.6.0 release; updated the same day through v1.6.2 (the fifth result #221, the #231 gate fix #225 and canary #233, the ADR-021 sponsor helper #229, CONTRIBUTING/LICENSE #232). Verified against the repository at the time of the update: 40 library modules, 39 proved goals (one of them the permanent binder-shape canary), ~195 merged PRs, 9 upstream-eligible lemmas (the fifth result awaits the nightly packet scan), ADRs through ADR-021, three green red-team rounds.*