# Design — Machine-Checked Confluence of φ-Calculus Normalization The durable design record and provenance for this proof. It captures **why** the proof is built the way it is: the goal, the artifacts it relates to, every load-bearing design decision and its rationale, the proof strategy and architecture, the faithfulness ("does this really mirror the paper?") evidence, and the deliberate scope boundaries. The proof is **complete** (the eleven `phino` normalization rules are confluent on well-formed terms; `#print axioms PhiConfluence.confluence` = `[propext, Quot.sound]`). This document is therefore a finished-project record, not a roadmap. Two companions own the other halves: the exact theorem statement, the rule⇄constructor table, and the eight deviations live in **`M0-spec.md`** (the frozen contract); the module layout and the "what to run" quickstart live in **`README.md`**. --- ## 1. Goal A fresh, machine-checked proof in Lean 4 that the normalization (reduction) rules of the current φ-calculus are **confluent (Church–Rosser)** — the order in which rules fire never changes the final result. The proof is about the calculus **as implemented by `phino`** (the reference manipulator), which is the same calculus the paper renders. It supersedes the earlier minimal/extended development in this repository's history, which proved confluence only for the minimal calculus and never finished the EO-matching variant. --- ## 2. The artifacts and how they relate | Artifact | Repo / location | Role | |---|---|---| | The paper | `objectionary/calculus-paper` | The informal spec. **Its Fig. 4 rules and Appendix-A example reductions are auto-generated by `phino`** (`\iexec{phino explain --normalize}`, `phino rewrite`), checked in its CI. | | `phino` | `objectionary/phino` | The **authoritative, executable** rule spec. Rules live in `resources/*.yaml`. | | Earlier proof | this repo's git history | The retired Minimal/Extended development; reused only as a **technique template** (Takahashi parallel-reduction skeleton, `Record` design). | | This project | the present repo (`objectionary/proof`) | The new Lean 4 proof. | **Key structural fact:** because the paper's rules and examples are generated *from phino*, "match the paper" reduces to "match phino", which is executable and testable. (Two phino bugs surfaced and were fixed during this work: the `alpha`-ordinal asset-counting, phino #749, and a duplicate-`ρ` printer bug, phino #748 — our model already matched the paper-faithful side of both. See `M0-spec.md`.) --- ## 3. The calculus The reduction relation `⟶` is the compatible (congruence) closure of the **eleven** `phino` rules: `dot, copy, alpha, phi, stay, over, stop, null, miss, dd, dc`, over `Term` (formations `⟦B⟧`, applications `e(τ↦e')`, dispatches `e.τ`, the locators `Φ`/`ξ`, and the terminator `⊥`). The full pattern→result table, side conditions, and the eight deviations from a naive "paper Fig. 4 verbatim" reading are in **`M0-spec.md`** (not duplicated here). Two facts from that table shape everything below: * The system is **non-terminating** (`⟦x↦y,y↦x⟧.x` diverges), so Newman's lemma is unavailable — confluence must come from the diamond, which needs no termination. * `dot`/`copy` carry **`nf` guards** (and `copy` a `ξ`-freeness guard), so each rule is single-path. This removes the `Rcopy`↔`Rdot` ordering ambiguity that blocked the old proof, at the cost of making `⟶` a *conditional* relation with **non-monotone** guards — the central proof subtlety. --- ## 4. Key decisions and rationale | Decision | Rationale | |---|---| | **Source of truth = phino + calculus-paper LaTeX source** (never the arXiv PDF) | The PDF is a stale build; the repo source regenerates rules from current phino. Paper-first, phino as its executable interpretation. | | **Prove via parallel reduction → diamond → `Relation.church_rosser`** | The system is **non-terminating**, so Newman's lemma is unavailable. The Tait–Martin-Löf/Takahashi method needs no termination. | | **Build on mathlib's `Prop`-valued `Relation` API** | Idiomatic, least code; `church_rosser` already proves "diamond ⇒ confluent-closure". | | **Named attributes** (`φ, ρ, αᵢ, label`) | Matches the paper and phino; makes contextualization `C` and the `αᵢ` ordinals natural. (De Bruijn would obscure named-attribute semantics.) | | **`nf` defined structurally** (like phino's `isNF`), not as "no `Step`" | Avoids an import cycle and the big-step circularity; a *local* property, well-defined regardless of confluence. A single `nf` encodes every rule's redex, including `copy`'s `ξ`-free guard in its void-application case. | | **Executable `reduceStep` separate from relational `Step`, linked by `reduce_sound`** | Lets the demo *run* and print traces, while a proof certifies the printed steps are genuine `Step`s — a tighter link than phino has (its Haskell engine is not proven against its YAML). | | **`copy` is `ξ`-free and local** (no `scope`/`contextualize`) | Under `ξ`-freeness, `contextualize(e₁, ctx) = e₁` is **proved** (`contextualize_eq_self`, `Parallel.lean`, `[propext]`), so the paper figure's `C(e₁⊳scope)` equals our slot value `e₁` for any `scope`; dropping `scope`/`contextualize` is exact, not a weakening, and keeps `copy` inside the de-risked fragment. (M0-spec dev. #7; the author is correcting the paper + phino to match.) | | **Headline is `WF`-scoped** | Two clauses, different weight. **`legalKey`** (no `αᵢ` formation key) is *necessary for confluence*: the counterexample that makes unconditional `Confluent Step` false once `alpha` is present is a `legalKey` violation (a malformed `αᵢ`-keyed void slot lets `alpha`/`over` fork to never-joining object/`⊥`), and the diamond proof consumes exactly this clause. **`Nodup`** (unique keys) is *faithfulness*, not a shown necessity — no duplicate-key non-joinable fork is known and the diamond binds-but-never-uses it; it matches Def. Binding and makes first-match `lookup` agree with phino's matcher. Both re-impose the paper's own grammar. (M0-spec, "Why `WF`-scoped"; dev. #6/#8.) | --- ## 5. Proof strategy — the spine 1. Define single-step `Step` (`⟶`): the eleven rules + the four congruences, phino-faithful. 2. Define **parallel reduction** `Par` (contracts any set of redexes at once; congruence built in) and a total **complete development** `devel`. 3. Prove `Step ⊆ Par ⊆ Step∗`, so `ReflTransGen Step = ReflTransGen Par` (`redMany_eq`). 4. Prove the **Takahashi triangle** `WF e → Par e u → Par u (devel e)`, giving the diamond. 5. `Abstract.Diamond.confluent` (via `church_rosser`) turns the diamond into confluence; transport to `Step` (the closures coincide). 6. Define `≡` as convertibility; confluence makes it an `Equivalence` on `{e // WF e}`. The hardest parts, all discharged: the **"`C` commutes with reduction"** lemma (`par_contextualize_ctx`), the **non-monotone `nf` guards** threaded through parallel reduction, and the **`ρ`-feedback** of `dot` (the source of non-termination). --- ## 6. Proof architecture (the diamond layer) Three structural decisions carry the proof; each load-bearing shape was validated `[propext]`-clean against Lean `v4.30.0` + mathlib. 1. **Well-formedness as a `Prop` predicate.** Mutual `WF`/`WFB` (`WellFormed.lean`, importing only `Syntax`) carried as a *hypothesis* on the diamond/headline — **not** an indexed `Binding` type (which would force re-deriving `Syntax`/`Step`/`Attributes`). It encodes dev. #6 (`(domain bs).Nodup`) and dev. #8 (no positional `αᵢ` key). Its preservation engine (`domain_append`/`domain_set`, mirroring `lookup_set_*`) rests on a single fact: reducing a binding's *value* never changes the `domain`, so `WF` survives reduction (`WF.step`/`WF.par`, `Preservation.lean`). 2. **Parallel reduction `Par`** = mutual `Par`/`ParB` with a **bespoke cons-structured `ParB`** (the `List.Forall₂ ParBind` route is kernel-rejected as a nested inductive carrying `Par`). The append-index lives only on `Step.congForm`, crossed once by `parB_set`. Mutual induction goes through `induction h using Par.rec (motive_2 := …)` with `motive_1` inferred. The cons-lift `redMany_form_cons` is proved by `induction … generalizing` + `form_step_inv` (*not* `ReflTransGen.lift` — unsound, because `stay` turns app→form). 3. **The Takahashi guard-on-developed-subterm variant** (this, *not* Huet single-step on `Step`). With `ξ`-freeness dropped from `copy`, `dot`'s result places the developed formation `⟦bs'⟧` in *both* the `contextualize` receiver and the `ρ`-argument, so reduction is **duplicating** and a `dot`-vs-sibling fork needs >1 step on both sides, breaking Huet. One `Par` step contracts all copies at once, so the triangle needs no non-duplication argument. The parallel-step constructors check `nf` on the *developed* child `e₁'`, not the original `e₁` (e.g. `Par.dot` reads the guard off `develB bs` via `lookup`, keeping `devel` structural) — this restores a total `devel` and the triangle. **WF-relativization (the bridge).** `church_rosser` / `Abstract.Diamond.confluent` demand an *unconditional* strip `∀ a b c, r a b → r a c → ∃ d, …`. A `WF`-hypothesised strip does not discharge it for the full calculus (off-`WF`, the `alpha`-vs-`over` diamond genuinely fails). The fix: relativize to `ParWF a b := WF a ∧ Par a b`, prove `WF.par` (so `ParWF` stays `WF`-rooted), whence the strip for `ParWF` holds *unconditionally* (it is vacuous when `¬WF a`); feed THAT to the generic diamond to get `Confluent (ReflTransGen ParWF)`, then bridge back to `WF`-rooted `ReflTransGen Step` via `redMany_eq`. `Diamond ParWF` needs `WF` of the *source* only, so no new `Abstract` lemma is required (`Diamond.lean`). --- ## 7. Faithfulness — "does this really mirror the paper?" Lean's kernel guarantees **soundness** (the proof establishes the statement). It does *not* guarantee **adequacy** (the statement/definitions capture φ-calculus) — that gap is irreducible when one side is an informal paper. It is shrunk and cross-checked from several independent directions: * **Tiny, human-readable trusted surface.** Only `Syntax`, `Step`, `↝∗`, and the `confluence` statement must be read and endorsed (a few dozen lines — the M0 spec). * **Rule transcription vs phino.** The eleven-rule *display* table (`Rules.lean`) is **generated** from phino's `resources/*.yaml` by `gen-rules.py` and pinned by the `rules-in-sync` CI job, so the displayed rules cannot drift from phino/the paper. The *proof relation* `Step` is hand-written (constructors are needed for case-analysis) and pinned to phino **behaviorally** by `difftest`. * **Differential testing against phino.** `Difftest.lean` + `scripts/difftest.sh` normalize each program with both `phino rewrite --normalize` and our reducer and assert equality — **20/20**, exercising all eleven rules (incl. `alpha`'s domain-ordinal skip of `Δ`/`λ` assets) on `⊥`-collapse *and* real formation results. This is the same `phino rewrite` mechanism the paper's Appendix A is generated from, so it doubles as reproducing the paper's examples (the fragment-fit subset; the rest need `λ`/`Δ` dataization — see §10). * **`#print axioms` CI gate.** The headline results (`confluence`, `conv_equivalence`, `reduce_sound`, `par_triangle`, `parWF_diamond`, `nf_iff`) are gated to depend only on `propext`/`Quot.sound` — never `sorryAx`, `Classical.choice`, or `native_decide`. ### De-risking verdict (the structural argument behind the kernel proof) Before investing in the Lean diamond, the central open question — *does the full eleven-rule calculus even have the diamond, given the non-monotone `nf` guards?* — was stressed both empirically and by structural analysis. The empirical probe (`scripts/confluence-probe.sh`) is a **fixed corpus of 7 hand-crafted programs plus the paper's Appendix-A examples**, each run under rule-order `--shuffle` and an off-strategy single-rule redex-position check; no divergence was found. (This is a *fixed* corpus, not random fuzzing — there is no term generator.) The analysis concluded confluence holds, for reasons that the mechanized proof then made rigorous: * **No genuine root critical pairs.** The two LHS-overlap groups are mutually exclusive by guards along root-stable axes — dispatch `⟦B⟧.τ` {dot, null, phi, stop, dd} by slot-state (attached/void/absent) × `φ`-membership × subject-head; formation-app `⟦B⟧(τ↦e)` {copy, alpha, over, stay, miss, dc} by slot-state × name-kind (positional `αᵢ` with `index = |B₁|` → alpha vs named → copy vs `ρ` → stay) × subject-head. This rests on three facts: duplicate keys are barred (a slot is in exactly one state), `α`-names cannot be formation slots, and `index()` is defined only for `αᵢ` — exactly the `WF` invariants. * **All non-root critical pairs join.** The *duplication* diamond (`dot` relocates and contextualizes a redex-bearing sibling into the `ρ`-context — reduce before/after → identical NF) and the *discard* diamond (over/null/stop/miss/dc/dd erase to the absorbing `⊥` regardless of inner activity — which is exactly why they need no `nf` guard, and why Church–Rosser holds *despite* non-termination). * **The non-monotone guard is real but benign.** Reducing inside `e₁` can expose a fresh outer `dot`/`copy` redex, so the *naive* maximal development and one-step triangle break. But the non-monotonicity *serializes* (no competing root redex exists while `¬nf(e₁)`) and is monotone in the *destruction* direction (no present guarded redex is destroyed by a sibling contraction) — which is precisely what the guard-on-developed-subterm variant (§6) exploits. Routes considered and rejected: Huet single-step strong confluence (broken by `dot`'s duplication once `ξ`-freeness is dropped), orthogonality (non-left-linear LHS repeat `τ`), Hindley–Rosen (no mathlib commutation API), decreasing diagrams (no Lean port, unneeded). ### The faithfulness loop (every arrow CI-checked or kernel-proved) ``` paper ──(phino explain/rewrite, calculus-paper CI)──▶ phino rules + example reductions ▲ │ │ (CI diff, this project) │ ▼ └────────────────────────────────── our printed rules + our reductions (the demo) │ (Lean: reduce_sound, reduceStep ⊆ Step) ▼ relation Step │ (Lean kernel: no sorry/axiom) ▼ confluence theorem ``` ### Trusted Computing Base (TCB) Exactly: (a) Lean's kernel; (b) the definitions + the theorem statement; (c) for the demo/CI, the term printer/parser bridge and phino. The proof adds nothing to (a)–(c). --- ## 8. The runnable demo and CI **Demo (`lake exe demo`):** prints the normalization rules in the paper's Unicode notation and reduction traces of example φ-programs computed by our own `reduceStep`. `reduce_sound` proves every printed step is a genuine `Step`, so the runnable behavior is tied to the relation the confluence theorem governs. **Single-source rules.** The displayed `Rules.lean` is generated by `scripts/gen-rules.py` from phino's `resources/*.yaml` — the same source the paper's Fig. 4 is rendered from — so the displayed rules cannot drift from phino/the paper by construction. **CI** (single-purpose workflows, each on push to `master` + PRs, tracking phino-latest): * **build** — install elan, `lake exe cache get`, `lake build`; then a `sorry`/`admit`/ `axiom` source gate **and** a `#print axioms` gate on the headline results. * **rules-in-sync** — `scripts/regen-rules.sh` clones `objectionary/phino` and regenerates `Rules.lean`; `git diff --exit-code` fails if our committed table has drifted from phino. * **difftest** — install the `phino-latest` binary, build `difftest`, run `scripts/difftest.sh`; fails if our reducer disagrees with phino. The two Lean workflows share a composite action (`.github/actions/setup-lean`) that caches `~/.elan` and `.lake` so Lean/mathlib aren't re-downloaded each run. --- ## 9. The implicit parent (`ρ`) — modelled (`Canonical.lean`) The paper (`foundations.tex`, Def. Parent) and phino treat **every formation as carrying a parent `ρ`, void until set** (a `this`-pointer, void until a method is called). The paper *grammar* (`syntax.tex`) does **not** mandate `ρ` — a formation may be written `⟦⟧` — so the parent is supplied **semantically**: phino materialises a `ρ↦∅` **appended at the end** of every formation that lacks one (verified: `⟦x↦Φ⟧` ⟶ `⟦x↦Φ, ρ↦∅⟧`, recursively; explicit `ρ` is kept in place, never duplicated). Omitting it would diverge from phino *and* the paper three ways: rule choice (`⟦⟧(ρ↦Φ)` is `copy→⟦ρ↦Φ⟧` for phino but `miss→⊥` without `ρ`), `alpha` indexing (its positional index counts the trailing `ρ`: `⟦x↦∅⟧(~1↦Φ) ⟶ ⟦x↦∅, ρ↦Φ⟧`), and normal-form shape (every formation NF carries `ρ↦∅`). **We model it as a canonicalisation** (`Canonical.lean`), faithfully: * `canon` / `canonB` — the injection: recursively append `ρ↦∅` to every `ρ`-less formation (explicit `ρ` kept in place), exactly phino's behaviour. * `Canonical` / `CanonicalB` — the invariant "every formation carries a `ρ`". * `canon_canonical` (`[propext]`) — `canon` always establishes it; `wf_canon` (`[propext, Quot.sound]`) — `canon` preserves `WF` (it adds `ρ` only when absent → no duplicate key, and `ρ` is legal); **`step_canonical`** (`[propext]`) — **reduction preserves `Canonical`**, i.e. phino's parent-everywhere term space is *closed* under our `Step`. Because canonical terms are `WF`, the headline `confluence : WF e → …` already governs them — so the result is confluence of **phino's actual calculus**, not a `ρ`-free fragment. With `alpha` now able to rename onto a void `ρ` (then `copy` fills it), `difftest` matches phino on real formation results, including the cases that diverged before `canon` (`⟦⟧(ρ↦Φ)`, `⟦x↦∅⟧(α1↦Φ)`, bare value formations). This was **not** a threat to confluence (the implicit `ρ` is a representational default, not a rule; routing `ρ`-application to `copy` removes the `miss`-on-`ρ` case and adds no critical pair) — it was a term-level *fidelity* obligation, now discharged both formally (the three lemmas) and behaviourally (`difftest` 20/20). ## 10. Scope boundaries * **`WF`-scoped, by design.** The headline carries `WF e` = `legalKey` ∧ `Nodup`. Only `legalKey` (no `αᵢ` formation key) is *necessary for confluence* (the `alpha`-vs-`over` counterexample is a `legalKey` violation; the diamond proof uses exactly this clause). `Nodup` (unique keys) is a *faithfulness* clause — matches Def. Binding, makes first-match `lookup` agree with phino's matcher — carried but unused by the diamond argument (§4, M0-spec "Why `WF`-scoped"). (phino once emitted a non-`Nodup` term — `⟦ρ↦⟦⟧⟧ ⟶ ⟦ρ↦⟦ρ↦∅⟧, ρ↦∅⟧` (duplicate `ρ`) — a printer bug contradicting its own Def. Binding; fixed in phino **#748**. Our model already gave the WF answer `⟦ρ↦⟦ρ↦∅⟧⟧`, so we now agree.) * **`λ`/`Δ` atoms are outside the normalization relation `⟶`.** They have no rule among the eleven. Their reduction is the paper's *separate* **Morphing** (`fig:morphing` — `Mlambda` calls a host atom by value) and **Dataization** (`fig:dataization`) partial functions: stateful, side-effecting, host-dependent — *functions*, not term rewriting (so the relevant property there is determinism, not confluence). Confluence of `⟶` neither needs nor includes them; `λ`/`Δ` are represented as inert atoms (`Binding.lambda`/`Binding.delta`) that never fire. This is the reason the paper's Appendix-A examples are validated through `difftest` (against a merged `runtime.phi`) rather than re-encoded in Lean. * **Small-step `copy`, not the old big-step `Rcopy`.** The current paper's `copy` is the small-step `nf`-guarded form we prove; the superseded big-step form (arXiv v9, which presupposed unique normal forms = confluence) is not modelled. * **Encoding looseness (low, intentional).** `Step.alpha`/`Step.copy` carry no per-rule `Nodup`/`legalKey` premise at the constructor, relying on the headline's `WF` to exclude malformed redexes globally (matching phino, which checks no such premise). Likewise `WF`'s `Nodup` is over `domain` (assets excluded), so it does not forbid duplicate `λ`/`Δ` assets — fine while assets are inert, a latent looseness to tighten only if asset reduction is modelled. --- ## Gold-standard future tooling (optional, not required) Mechanically generating the Lean `Step` *relation* (not just the display table) from phino's YAML, so the proof object and the paper's figure share a single source. Today `Step` is hand-written and pinned to phino behaviorally by `difftest`; this would make the pin structural. Optional — it does not affect the proof's validity.