# unsorry — status report (2026-06-12)

What the project has actually achieved, stated against verified ground truth rather than ambition.
The house style is honesty: each claim is followed by its limit.

## One paragraph

In roughly three days (v0.1.0 → v1.6.2, ~195 merged PRs), `unsorry` went from a coordination
skeleton to a swarm that has **proved five mathlib-absent results, demonstrated its two
load-bearing mechanisms end-to-end** (decomposition and dependency reuse), **survived its own
operational failure modes** (three CLI quota outages in one day, now absorbed unattended; its
first gate false negative caught, fixed, and canary-guarded within two hours), **passed
three adversarial red-team rounds** including the CRITICAL hole an external review found, and
**built — and auto-runs — a policy-compliant pipeline to upstream its novel lemmas to mathlib**.
The honest frontier is unchanged: every result so far is elementary, the hard-target ceiling is
untested, and nothing has landed in the commons yet.

## What is true, with verification

### The architecture works as designed
- **The kernel is the only correctness oracle.** Every merge passes Gate A (build `--wfail`, axiom
  audit against the `{propext, Classical.choice, Quot.sound}` whitelist, leanchecker kernel replay,
  a regenerated statement-binding obligation) and Gate B (deterministic AISP hygiene). No human is
  in the correctness path.
- **Soundness has been adversarially tested three times** and holds: round 001 (9/9 bypass vectors
  blocked), round 002 (3/3 statement-vacuity attacks blocked by the binding gate), and
  [round 003](../metrics/gate-a-redteam-003.md) (the #190 same-PR goal-tampering attack blocked by
  goal-statement immutability — the *only* layer that catches a fully self-consistent rewrite).
- **The gate's one observed defect failed in the sound direction**
  ([#231](https://github.com/agenticsnz/unsorry/issues/231)): a statement whose named hypothesis
  follows an implicit binder (`{n : ℕ} (hn : 1 < n)`) made the regenerated binding obligation trip
  `linter.unusedVariables` under `--wfail`, so Gate A rejected *every correct proof* of that goal —
  a false negative, never a false positive. Fixed by scoping the lint suppression to the generated
  bindings, whose force is the type-check, not lints (#225), and pinned by `binder-shape-canary`, a
  permanent sound goal carrying exactly that binder shape whose binding the gate regenerates and
  rebuilds every run (#233) — a regression now goes red at the gate, not on a contributor's PR.

### It has produced novel, kernel-verified mathematics
Five results verified mathlib-absent **before** the run and proved by the swarm:

| Result | What | How |
|---|---|---|
| `nicomachus_sum_cubes` | ∑k³ = (∑k)² | direct, 24-line multi-step (phase2-run-001) |
| `sum_range_pow_four_closed` | Faulhaber k=4 closed form | direct induction |
| `platonic_schlafli_pairs` | the five Platonic Schläfli pairs | **forced depth-3 decomposition**, 13 lemmas ([phase3-run-001](../metrics/phase3-run-001.md)) |
| `sum_range_cube_eq_triangular_sq` | triangular closed form of Nicomachus | **dependency reuse** of the above, 4m57s ([phase3-run-002](../metrics/phase3-run-002.md)) |
| `not_prime_pow_four_add_four` | n⁴ + 4 is never prime for n > 1 (Sophie Germain) | direct, by external machine `binto-labs` (#221) — the proof that surfaced gate bug #231 |

### Both load-bearing mechanisms are demonstrated, not just built
- **The chain** (v1.4.0): a target too big for one attempt became a depth-3 tree of 13 lemmas, each
  kernel-verified, recomposed level by level — the parent's proof is literally the composition of
  its sub-lemmas.
- **Compounding** (v1.5.0): a merged lemma was *imported and invoked* by a later proof, turning a
  difficulty-2 target into a two-line module — plus the chain's four recompositions are four more
  instances of the same reuse mechanism.

### It survives its operating environment
Born from three real quota outages in one day: the **infrastructure-failure guard** (ADR-016 — a
fast-failed call + failed health probe is never read as goal evidence) and the **supervisor**
(ADR-017 — exponential backoff, PR hygiene, claim guard). The third outage was absorbed with **zero
queue corruption and no human intervention**; the first two had each demoted a whole goal tree and
needed manual repair. The **progressive effort ladder** (ADR-015) spends deep reasoning only where a
cheaper attempt failed — 11 of 13 run-001 proofs closed at the cheapest rung.

### It is hardened against the external review
[Issue #190](https://github.com/agenticsnz/unsorry/issues/190)'s CRITICAL (goal tampering),
HIGH (workflow self-protection), MEDIUM (unpinned actions) and LOW (audit corpus) findings are all
addressed — goal-statement immutability (ADR-018), SHA-pinned actions + CODEOWNERS + a settings
checklist (ADR-019), and an `opaque` audit fixture. The review's own verdict — "serious,
well-engineered work… README's framing is accurate" — and its one stale finding ("decomposition
never completed", true when written, closed by run-001) are recorded honestly.

### The public-good path is built and self-starting
The **upstream pipeline** (ADR-020) scans nightly for packet-eligible lemmas (proved +
absence-verified + unpacketed), re-checks absence at mathlib HEAD, and auto-opens a sponsor packet
PR — a `git apply`-able patch with the human author's header, gate evidence, a paste-ready factual
AI disclosure, and an explicit rewrite-in-own-words boundary (mathlib forbids LLM-written
conversation). The first run produced **9 packets**, and the first of them (Nicomachus) is stamped
**kernel-verified at mathlib HEAD**. The last mechanical mile is now one command (ADR-021,
v1.6.1): `python3 -m tools.upstream.raise_pr` turns a HEAD-verified packet into a **draft**
mathlib PR — clone, apply, push, pre-filled factual disclosure — with the policy boundary enforced
in code: it refuses without the sponsor's `--understood` attestation, refuses an unverified
packet, opens drafts only, and writes no review words. The irreducibly human stages — understand
the proof, ask Zulip, write the narrative, face review — stay human
([docs/upstreaming.md](../upstreaming.md)). The model is policy-compliant by construction:
machine prepares, the human sponsor (signed up) owns every word the community reads.

### It is no longer a single machine
Merged work now comes from the agent swarm, the maintainer, **and three external contributor
machines** (`mac-158f`, `binto-labs`, `ohdearquant`) — proving lemmas and fixing tooling through
the same gates, with no coordination beyond the claims branch. One of those machines
(`binto-labs`) contributed the fifth headline result above and, in doing so, found the gate's
first false negative — external traffic is now hardening the gates, not just adding lemmas. The
contributor path is documented (`CONTRIBUTING.md`, v1.6.2), and the Apache-2.0 `LICENSE` file
that the README and every packet's copyright header already referenced now actually exists in
the tree.

## The honest limits (unchanged where they were honest)

- **Every result is elementary.** The decomposition in run-001 was *forced* by a strangled budget;
  whether decomposition beats one-shotting on a genuinely hard target is untested. This is the
  central open question.
- **Reuse is depth-1.** Run-002 was one declared dependency edge; a chosen *result* reached through
  a deep dependency tree routed bottom-up is not yet done.
- **Nothing is upstreamed.** The pipeline is built and the packets are ready; mathlib may decline
  both candidates — a valid, recorded outcome that still validates the machinery.
- **One known operational bug:** a claim race past `PROVE_CLAIM_CAP=1` is real but unreproduced;
  its cost is bounded by PR dedupe, not eliminated.
- **The gate can fail closed.** #231 blocked a correct proof; nothing unsound was admitted. The
  canary pins the one binder shape that bit — a different statement shape tripping a different
  lint would be found the same way: by blocking a contributor, not by polluting the library.
- **Absence is a grep pre-filter,** not a proof; a result proved today could be upstreamed by
  mathlib tomorrow — the recorded revision makes that detectable, not impossible.

## Where it goes next

The frontier is the difficulty ceiling: a target a working mathematician would call non-trivial,
reached by a queue that genuinely compounds at depth — and a first lemma actually merged into the
commons. Everything built so far is the apparatus for asking that question honestly.

---
*Generated as part of the v1.6.0 release; updated the same day through v1.6.2 (the fifth result
#221, the #231 gate fix #225 and canary #233, the ADR-021 sponsor helper #229,
CONTRIBUTING/LICENSE #232). Verified against the repository at the time of the update: 40 library
modules, 39 proved goals (one of them the permanent binder-shape canary), ~195 merged PRs, 9
upstream-eligible lemmas (the fifth result awaits the nightly packet scan), ADRs through ADR-021,
three green red-team rounds.*