<!--
SPDX-FileCopyrightText: Copyright (c) 2026 Objectionary.com
SPDX-License-Identifier: MIT
-->

# Design — Machine-Checked Confluence of φ-Calculus Normalization

The durable design record and provenance for this proof. It captures **why** the proof
is built the way it is: the goal, the artifacts it relates to, every load-bearing design
decision and its rationale, the proof strategy and architecture, the faithfulness
("does this really mirror the paper?") evidence, and the deliberate scope boundaries.

The proof is **complete** (the eleven `phino` normalization rules are confluent on
well-formed terms; `#print axioms PhiConfluence.confluence` = `[propext, Quot.sound]`).
This document is therefore a finished-project record, not a roadmap. Two companions own
the other halves: the exact theorem statement, the rule⇄constructor table, and the eight
deviations live in **`M0-spec.md`** (the frozen contract); the module layout and the
"what to run" quickstart live in **`README.md`**.

---

## 1. Goal

A fresh, machine-checked proof in Lean 4 that the normalization (reduction) rules of the
current φ-calculus are **confluent (Church–Rosser)** — the order in which rules fire never
changes the final result. The proof is about the calculus **as implemented by `phino`**
(the reference manipulator), which is the same calculus the paper renders.

It supersedes the earlier minimal/extended development in this repository's history, which
proved confluence only for the minimal calculus and never finished the EO-matching variant.

---

## 2. The artifacts and how they relate

| Artifact | Repo / location | Role |
|---|---|---|
| The paper | `objectionary/calculus-paper` | The informal spec. **Its Fig. 4 rules and Appendix-A example reductions are auto-generated by `phino`** (`\iexec{phino explain --normalize}`, `phino rewrite`), checked in its CI. |
| `phino` | `objectionary/phino` | The **authoritative, executable** rule spec. Rules live in `resources/*.yaml`. |
| Earlier proof | this repo's git history | The retired Minimal/Extended development; reused only as a **technique template** (Takahashi parallel-reduction skeleton, `Record` design). |
| This project | the present repo (`objectionary/proof`) | The new Lean 4 proof. |

**Key structural fact:** because the paper's rules and examples are generated *from
phino*, "match the paper" reduces to "match phino", which is executable and testable.
(Two phino bugs surfaced and were fixed during this work: the `alpha`-ordinal asset-counting,
phino #749, and a duplicate-`ρ` printer bug, phino #748 — our model already matched the
paper-faithful side of both. See `M0-spec.md`.)

---

## 3. The calculus

The reduction relation `⟶` is the compatible (congruence) closure of the **eleven**
`phino` rules: `dot, copy, alpha, phi, stay, over, stop, null, miss, dd, dc`, over `Term`
(formations `⟦B⟧`, applications `e(τ↦e')`, dispatches `e.τ`, the locators `Φ`/`ξ`, and the
terminator `⊥`). The full pattern→result table, side conditions, and the eight deviations
from a naive "paper Fig. 4 verbatim" reading are in **`M0-spec.md`** (not duplicated here).

Two facts from that table shape everything below:

* The system is **non-terminating** (`⟦x↦y,y↦x⟧.x` diverges), so Newman's lemma is
  unavailable — confluence must come from the diamond, which needs no termination.
* `dot`/`copy` carry **`nf` guards** (and `copy` a `ξ`-freeness guard), so each rule is
  single-path. This removes the `Rcopy`↔`Rdot` ordering ambiguity that blocked the old
  proof, at the cost of making `⟶` a *conditional* relation with **non-monotone** guards —
  the central proof subtlety.

---

## 4. Key decisions and rationale

| Decision | Rationale |
|---|---|
| **Source of truth = phino + calculus-paper LaTeX source** (never the arXiv PDF) | The PDF is a stale build; the repo source regenerates rules from current phino. Paper-first, phino as its executable interpretation. |
| **Prove via parallel reduction → diamond → `Relation.church_rosser`** | The system is **non-terminating**, so Newman's lemma is unavailable. The Tait–Martin-Löf/Takahashi method needs no termination. |
| **Build on mathlib's `Prop`-valued `Relation` API** | Idiomatic, least code; `church_rosser` already proves "diamond ⇒ confluent-closure". |
| **Named attributes** (`φ, ρ, αᵢ, label`) | Matches the paper and phino; makes contextualization `C` and the `αᵢ` ordinals natural. (De Bruijn would obscure named-attribute semantics.) |
| **`nf` defined structurally** (like phino's `isNF`), not as "no `Step`" | Avoids an import cycle and the big-step circularity; a *local* property, well-defined regardless of confluence. A single `nf` encodes every rule's redex, including `copy`'s `ξ`-free guard in its void-application case. |
| **Executable `reduceStep` separate from relational `Step`, linked by `reduce_sound`** | Lets the demo *run* and print traces, while a proof certifies the printed steps are genuine `Step`s — a tighter link than phino has (its Haskell engine is not proven against its YAML). |
| **`copy` is `ξ`-free and local** (no `scope`/`contextualize`) | Under `ξ`-freeness, `contextualize(e₁, ctx) = e₁` is **proved** (`contextualize_eq_self`, `Parallel.lean`, `[propext]`), so the paper figure's `C(e₁⊳scope)` equals our slot value `e₁` for any `scope`; dropping `scope`/`contextualize` is exact, not a weakening, and keeps `copy` inside the de-risked fragment. (M0-spec dev. #7; the author is correcting the paper + phino to match.) |
| **Headline is `WF`-scoped** | Two clauses, different weight. **`legalKey`** (no `αᵢ` formation key) is *necessary for confluence*: the counterexample that makes unconditional `Confluent Step` false once `alpha` is present is a `legalKey` violation (a malformed `αᵢ`-keyed void slot lets `alpha`/`over` fork to never-joining object/`⊥`), and the diamond proof consumes exactly this clause. **`Nodup`** (unique keys) is *faithfulness*, not a shown necessity — no duplicate-key non-joinable fork is known and the diamond binds-but-never-uses it; it matches Def. Binding and makes first-match `lookup` agree with phino's matcher. Both re-impose the paper's own grammar. (M0-spec, "Why `WF`-scoped"; dev. #6/#8.) |

---

## 5. Proof strategy — the spine

1. Define single-step `Step` (`⟶`): the eleven rules + the four congruences, phino-faithful.
2. Define **parallel reduction** `Par` (contracts any set of redexes at once; congruence
   built in) and a total **complete development** `devel`.
3. Prove `Step ⊆ Par ⊆ Step∗`, so `ReflTransGen Step = ReflTransGen Par` (`redMany_eq`).
4. Prove the **Takahashi triangle** `WF e → Par e u → Par u (devel e)`, giving the diamond.
5. `Abstract.Diamond.confluent` (via `church_rosser`) turns the diamond into confluence;
   transport to `Step` (the closures coincide).
6. Define `≡` as convertibility; confluence makes it an `Equivalence` on `{e // WF e}`.

The hardest parts, all discharged: the **"`C` commutes with reduction"** lemma
(`par_contextualize_ctx`), the **non-monotone `nf` guards** threaded through parallel
reduction, and the **`ρ`-feedback** of `dot` (the source of non-termination).

---

## 6. Proof architecture (the diamond layer)

Three structural decisions carry the proof; each load-bearing shape was validated
`[propext]`-clean against Lean `v4.30.0` + mathlib.

1. **Well-formedness as a `Prop` predicate.** Mutual `WF`/`WFB` (`WellFormed.lean`,
   importing only `Syntax`) carried as a *hypothesis* on the diamond/headline — **not** an
   indexed `Binding` type (which would force re-deriving `Syntax`/`Step`/`Attributes`).
   It encodes dev. #6 (`(domain bs).Nodup`) and dev. #8 (no positional `αᵢ` key). Its
   preservation engine (`domain_append`/`domain_set`, mirroring `lookup_set_*`) rests on a
   single fact: reducing a binding's *value* never changes the `domain`, so `WF` survives
   reduction (`WF.step`/`WF.par`, `Preservation.lean`).

2. **Parallel reduction `Par`** = mutual `Par`/`ParB` with a **bespoke cons-structured
   `ParB`** (the `List.Forall₂ ParBind` route is kernel-rejected as a nested inductive
   carrying `Par`). The append-index lives only on `Step.congForm`, crossed once by
   `parB_set`. Mutual induction goes through `induction h using Par.rec (motive_2 := …)`
   with `motive_1` inferred. The cons-lift `redMany_form_cons` is proved by
   `induction … generalizing` + `form_step_inv` (*not* `ReflTransGen.lift` — unsound,
   because `stay` turns app→form).

3. **The Takahashi guard-on-developed-subterm variant** (this, *not* Huet single-step on
   `Step`). With `ξ`-freeness dropped from `copy`, `dot`'s result places the developed
   formation `⟦bs'⟧` in *both* the `contextualize` receiver and the `ρ`-argument, so
   reduction is **duplicating** and a `dot`-vs-sibling fork needs >1 step on both sides,
   breaking Huet. One `Par` step contracts all copies at once, so the triangle needs no
   non-duplication argument. The parallel-step constructors check `nf` on the *developed*
   child `e₁'`, not the original `e₁` (e.g. `Par.dot` reads the guard off `develB bs` via
   `lookup`, keeping `devel` structural) — this restores a total `devel` and the triangle.

**WF-relativization (the bridge).** `church_rosser` / `Abstract.Diamond.confluent` demand
an *unconditional* strip `∀ a b c, r a b → r a c → ∃ d, …`. A `WF`-hypothesised strip does
not discharge it for the full calculus (off-`WF`, the `alpha`-vs-`over` diamond genuinely
fails). The fix: relativize to `ParWF a b := WF a ∧ Par a b`, prove `WF.par` (so `ParWF`
stays `WF`-rooted), whence the strip for `ParWF` holds *unconditionally* (it is vacuous
when `¬WF a`); feed THAT to the generic diamond to get `Confluent (ReflTransGen ParWF)`,
then bridge back to `WF`-rooted `ReflTransGen Step` via `redMany_eq`. `Diamond ParWF` needs
`WF` of the *source* only, so no new `Abstract` lemma is required (`Diamond.lean`).

---

## 7. Faithfulness — "does this really mirror the paper?"

Lean's kernel guarantees **soundness** (the proof establishes the statement). It does *not*
guarantee **adequacy** (the statement/definitions capture φ-calculus) — that gap is
irreducible when one side is an informal paper. It is shrunk and cross-checked from several
independent directions:

* **Tiny, human-readable trusted surface.** Only `Syntax`, `Step`, `↝∗`, and the
  `confluence` statement must be read and endorsed (a few dozen lines — the M0 spec).
* **Rule transcription vs phino.** The eleven-rule *display* table (`Rules.lean`) is
  **generated** from phino's `resources/*.yaml` by `gen-rules.py` and pinned by the
  `rules-in-sync` CI job, so the displayed rules cannot drift from phino/the paper. The
  *proof relation* `Step` is hand-written (constructors are needed for case-analysis) and
  pinned to phino **behaviorally** by `difftest`.
* **Differential testing against phino.** `Difftest.lean` + `scripts/difftest.sh` normalize
  each program with both `phino rewrite --normalize` and our reducer and assert equality —
  **20/20**, exercising all eleven rules (incl. `alpha`'s domain-ordinal skip of `Δ`/`λ` assets) on `⊥`-collapse *and* real formation results. This is the same `phino rewrite` mechanism the
  paper's Appendix A is generated from, so it doubles as reproducing the paper's examples
  (the fragment-fit subset; the rest need `λ`/`Δ` dataization — see §10).
* **`#print axioms` CI gate.** The headline results (`confluence`, `conv_equivalence`,
  `reduce_sound`, `par_triangle`, `parWF_diamond`, `nf_iff`) are gated to depend only on
  `propext`/`Quot.sound` — never `sorryAx`, `Classical.choice`, or `native_decide`.

### De-risking verdict (the structural argument behind the kernel proof)

Before investing in the Lean diamond, the central open question — *does the full eleven-rule
calculus even have the diamond, given the non-monotone `nf` guards?* — was stressed both
empirically and by structural analysis. The empirical probe (`scripts/confluence-probe.sh`) is a
**fixed corpus of 7 hand-crafted programs plus the paper's Appendix-A examples**, each run under
rule-order `--shuffle` and an off-strategy single-rule redex-position check; no divergence was
found. (This is a *fixed* corpus, not random fuzzing — there is no term generator.) The analysis
concluded confluence holds, for reasons that the mechanized proof then made rigorous:

* **No genuine root critical pairs.** The two LHS-overlap groups are mutually exclusive by
  guards along root-stable axes — dispatch `⟦B⟧.τ` {dot, null, phi, stop, dd} by slot-state
  (attached/void/absent) × `φ`-membership × subject-head; formation-app `⟦B⟧(τ↦e)`
  {copy, alpha, over, stay, miss, dc} by slot-state × name-kind (positional `αᵢ` with
  `index = |B₁|` → alpha vs named → copy vs `ρ` → stay) × subject-head. This rests on three
  facts: duplicate keys are barred (a slot is in exactly one state), `α`-names cannot be
  formation slots, and `index()` is defined only for `αᵢ` — exactly the `WF` invariants.
* **All non-root critical pairs join.** The *duplication* diamond (`dot` relocates and
  contextualizes a redex-bearing sibling into the `ρ`-context — reduce before/after →
  identical NF) and the *discard* diamond (over/null/stop/miss/dc/dd erase to the absorbing
  `⊥` regardless of inner activity — which is exactly why they need no `nf` guard, and why
  Church–Rosser holds *despite* non-termination).
* **The non-monotone guard is real but benign.** Reducing inside `e₁` can expose a fresh
  outer `dot`/`copy` redex, so the *naive* maximal development and one-step triangle break.
  But the non-monotonicity *serializes* (no competing root redex exists while `¬nf(e₁)`) and
  is monotone in the *destruction* direction (no present guarded redex is destroyed by a
  sibling contraction) — which is precisely what the guard-on-developed-subterm variant (§6)
  exploits.

Routes considered and rejected: Huet single-step strong confluence (broken by `dot`'s
duplication once `ξ`-freeness is dropped), orthogonality (non-left-linear LHS repeat `τ`),
Hindley–Rosen (no mathlib commutation API), decreasing diagrams (no Lean port, unneeded).

### The faithfulness loop (every arrow CI-checked or kernel-proved)

```
   paper ──(phino explain/rewrite, calculus-paper CI)──▶ phino rules + example reductions
     ▲                                                          │
     │                                              (CI diff, this project)
     │                                                          ▼
     └──────────────────────────────────  our printed rules + our reductions (the demo)
                                                                │
                                                (Lean: reduce_sound, reduceStep ⊆ Step)
                                                                ▼
                                                          relation  Step
                                                                │
                                                  (Lean kernel: no sorry/axiom)
                                                                ▼
                                                       confluence theorem
```

### Trusted Computing Base (TCB)

Exactly: (a) Lean's kernel; (b) the definitions + the theorem statement; (c) for the
demo/CI, the term printer/parser bridge and phino. The proof adds nothing to (a)–(c).

---

## 8. The runnable demo and CI

**Demo (`lake exe demo`):** prints the normalization rules in the paper's Unicode notation
and reduction traces of example φ-programs computed by our own `reduceStep`. `reduce_sound`
proves every printed step is a genuine `Step`, so the runnable behavior is tied to the
relation the confluence theorem governs.

**Single-source rules.** The displayed `Rules.lean` is generated by `scripts/gen-rules.py`
from phino's `resources/*.yaml` — the same source the paper's Fig. 4 is rendered from — so
the displayed rules cannot drift from phino/the paper by construction.

**CI** (single-purpose workflows, each on push to `master` + PRs, tracking phino-latest):

* **build** — install elan, `lake exe cache get`, `lake build`; then a `sorry`/`admit`/
  `axiom` source gate **and** a `#print axioms` gate on the headline results.
* **rules-in-sync** — `scripts/regen-rules.sh` clones `objectionary/phino` and regenerates
  `Rules.lean`; `git diff --exit-code` fails if our committed table has drifted from phino.
* **difftest** — install the `phino-latest` binary, build `difftest`, run
  `scripts/difftest.sh`; fails if our reducer disagrees with phino.

The two Lean workflows share a composite action (`.github/actions/setup-lean`) that caches
`~/.elan` and `.lake` so Lean/mathlib aren't re-downloaded each run.

---

## 9. The implicit parent (`ρ`) — modelled (`Canonical.lean`)

The paper (`foundations.tex`, Def. Parent) and phino treat **every formation as carrying a parent
`ρ`, void until set** (a `this`-pointer, void until a method is called). The paper *grammar*
(`syntax.tex`) does **not** mandate `ρ` — a formation may be written `⟦⟧` — so the parent is supplied
**semantically**: phino materialises a `ρ↦∅` **appended at the end** of every formation that lacks one
(verified: `⟦x↦Φ⟧` ⟶ `⟦x↦Φ, ρ↦∅⟧`, recursively; explicit `ρ` is kept in place, never duplicated).
Omitting it would diverge from phino *and* the paper three ways: rule choice (`⟦⟧(ρ↦Φ)` is
`copy→⟦ρ↦Φ⟧` for phino but `miss→⊥` without `ρ`), `alpha` indexing (its positional index counts the
trailing `ρ`: `⟦x↦∅⟧(~1↦Φ) ⟶ ⟦x↦∅, ρ↦Φ⟧`), and normal-form shape (every formation NF carries `ρ↦∅`).

**We model it as a canonicalisation** (`Canonical.lean`), faithfully:

* `canon` / `canonB` — the injection: recursively append `ρ↦∅` to every `ρ`-less formation (explicit
  `ρ` kept in place), exactly phino's behaviour.
* `Canonical` / `CanonicalB` — the invariant "every formation carries a `ρ`".
* `canon_canonical` (`[propext]`) — `canon` always establishes it; `wf_canon` (`[propext, Quot.sound]`)
  — `canon` preserves `WF` (it adds `ρ` only when absent → no duplicate key, and `ρ` is legal);
  **`step_canonical`** (`[propext]`) — **reduction preserves `Canonical`**, i.e. phino's
  parent-everywhere term space is *closed* under our `Step`.

Because canonical terms are `WF`, the headline `confluence : WF e → …` already governs them — so the
result is confluence of **phino's actual calculus**, not a `ρ`-free fragment. With `alpha` now able to
rename onto a void `ρ` (then `copy` fills it), `difftest` matches phino on real formation results,
including the cases that diverged before `canon` (`⟦⟧(ρ↦Φ)`, `⟦x↦∅⟧(α1↦Φ)`, bare value formations).
This was **not** a threat to confluence (the implicit `ρ` is a representational default, not a rule;
routing `ρ`-application to `copy` removes the `miss`-on-`ρ` case and adds no critical pair) — it was a
term-level *fidelity* obligation, now discharged both formally (the three lemmas) and behaviourally
(`difftest` 20/20).

## 10. Scope boundaries

* **`WF`-scoped, by design.** The headline carries `WF e` = `legalKey` ∧ `Nodup`. Only `legalKey`
  (no `αᵢ` formation key) is *necessary for confluence* (the `alpha`-vs-`over` counterexample is a
  `legalKey` violation; the diamond proof uses exactly this clause). `Nodup` (unique keys) is a
  *faithfulness* clause — matches Def. Binding, makes first-match `lookup` agree with phino's matcher
  — carried but unused by the diamond argument (§4, M0-spec "Why `WF`-scoped"). (phino once emitted
  a non-`Nodup` term — `⟦ρ↦⟦⟧⟧ ⟶ ⟦ρ↦⟦ρ↦∅⟧, ρ↦∅⟧` (duplicate `ρ`) — a printer bug contradicting its
  own Def. Binding; fixed in phino **#748**. Our model already gave the WF answer `⟦ρ↦⟦ρ↦∅⟧⟧`, so we
  now agree.)
* **`λ`/`Δ` atoms are outside the normalization relation `⟶`.** They have no rule among the
  eleven. Their reduction is the paper's *separate* **Morphing** (`fig:morphing` — `Mlambda`
  calls a host atom by value) and **Dataization** (`fig:dataization`) partial functions:
  stateful, side-effecting, host-dependent — *functions*, not term rewriting (so the relevant
  property there is determinism, not confluence). Confluence of `⟶` neither needs nor includes
  them; `λ`/`Δ` are represented as inert atoms (`Binding.lambda`/`Binding.delta`) that never
  fire. This is the reason the paper's Appendix-A examples are validated through `difftest`
  (against a merged `runtime.phi`) rather than re-encoded in Lean.
* **Small-step `copy`, not the old big-step `Rcopy`.** The current paper's `copy` is the
  small-step `nf`-guarded form we prove; the superseded big-step form (arXiv v9, which
  presupposed unique normal forms = confluence) is not modelled.
* **Encoding looseness (low, intentional).** `Step.alpha`/`Step.copy` carry no per-rule
  `Nodup`/`legalKey` premise at the constructor, relying on the headline's `WF` to exclude
  malformed redexes globally (matching phino, which checks no such premise). Likewise `WF`'s
  `Nodup` is over `domain` (assets excluded), so it does not forbid duplicate `λ`/`Δ` assets —
  fine while assets are inert, a latent looseness to tighten only if asset reduction is modelled.

---

## Gold-standard future tooling (optional, not required)

Mechanically generating the Lean `Step` *relation* (not just the display table) from phino's
YAML, so the proof object and the paper's figure share a single source. Today `Step` is
hand-written and pinned to phino behaviorally by `difftest`; this would make the pin
structural. Optional — it does not affect the proof's validity.