# Roadmap — the W-graph to the machine-level VM theorem The open work, organized as a dependency graph of work items (`W*`) whose **end state is the target theorem** `Target.sp1_target_execution` (`SP1Clean/Soundness/TargetVm.lean`): *from a verifying Clean ensemble over the committed boundary, the official LeanRV64D Sail interpreter, run from any state that loads the guest program, reaches the halting `ECALL` with the committed exit code.* The theorem is **stated and its walk induction proved today**; every open gap is a named hypothesis in its `TargetObligations` bundle (see `targetSeams` in that file). **Done when:** every `TargetObligations` field is discharged for a concrete `OperandsBound`, and W1 closes the capstone premise — at that point `sp1_target_soundness` is axiom-clean end-to-end (Sail model + one logUp axiom only). For what is *already proven* and the full trust boundary, see [`release-audit.md`](release-audit.md). For the machine-checked axiom inventory, run `scripts/run_audit.sh` (snapshot: [`snapshots/axiom-ledger.md`](snapshots/axiom-ledger.md)). Legend: `[ ]` open · `[~]` partial · `[x]` done. Effort: S < M < L < XL. **Progress snapshot (2026-06-17).** A sweep down the critical path landed (build green throughout, every new headline theorem axiom-clean modulo the Sail model's own decoder axioms; the 4-`sorry` debt unchanged): **W3** closed end-to-end on the concrete-program path — the *real* `noncomputable` Sail decoder is reduced (`Model/SailDecode.lean`), the decode bound is discharged from Program-bus balance, and `decodedInROM` is proved for a concrete instruction. **W6b** non-vacuity witness done (`FormalModel/Trace/Witness.lean`). **W4a** MemoryGlobalInit provider constructed (`Soundness/MemoryGlobal.lean`). **W2+W7** exact-replay keystone done — `RefinesAt`/`RowEffect` strengthened to exact replay / strict write, `chain_to_refines` re-proved — and the **W2 value-half assembled** (`ValueBound.lean`: `ValueOperandsBound`, the concrete `OperandsBound = decode ∧ value`, and `targetObligations_full` with `bound` discharged). `SailConfigured` strengthened to `isInitialized ∧ machine mode`. Remaining on the critical path: discharge the W2 cross-bus residual `TraceValueBinding`, W5 HALT, W7's `try_step` reduction, glue. --- ## The project debt at a glance — one `sorry` The chip completeness debts (ShiftLeft/ShiftRight/DivRem) are now **all closed** (axiom-clean); the **sole** remaining `sorry` is the capstone-packaging premise (item 4 below). Soundness is `sorry`-free. (Line numbers drift; the declaration names are the stable handles — `scripts/run_audit.sh` gates on exactly this set.) `MulChip.completeness` was closed 2026-06-10 via `MulOperation.spec_populate` (the witnessed `populate` columns satisfy the structural `Spec`); items 1–3 below record the now-closed ShiftLeft/ShiftRight/DivRem completeness work. 1. `ShiftLeftChip.completeness` — `Proofs/Chips/ShiftLeftChip/Formal.lean` — **CLOSED 2026-06-12** (axiom-clean), honest `populate`-style witness + `ProverHint` opcode threading (the `BranchChip.completeness` recipe). 2. `ShiftRightChip.completeness` — `Proofs/Chips/ShiftRightChip/Formal.lean` — **CLOSED 2026-06-12** (axiom-clean), same hint-`populate` recipe as ShiftLeft. 3. `DivRemChip.completeness` — `Proofs/Chips/DivRemChip/Formal.lean` — **CLOSED 2026-06-18** (axiom-clean), via `completeness_driver` in `Proofs/Chips/DivRemChip/Completeness/Driver.lean`. The 13 nested `IsEqualWord`/`IsZero` cols pins were unblocked by *selective non-decomposition* — `attribute [local circuit_norm ↓ 100000] ProvableType.eval_fromElements` keeps those sub-op cols folded so `circuit_proof_start` never explodes them into the intractable nested record (see `docs/agents/proof-patterns.md`). Heartbeats 256M → 64M. 4. `sp1_witness_decode` — `Soundness/SP1GatedVm.lean` — the single isolated capstone premise: the witness → `ChipRow` decode seam (`SP1WitnessDecode`, = **W1b/W1c** below); a packaging premise, not a chip debt. (`sp1_gatedExecution_prereqs` itself is now a *proven* assembly of this seam with the W1a balance translation `sp1_state_balance_of_balancedInteractions`.) Note (census fact): because each chip `circuit` *embeds* its completeness proof as a structure field, items 1–3 also surface as `sorryAx` on `sp1Tables`/`sp1GatedVm`/`sp1_machine_soundness` under `#print axioms`, even though the soundness proofs never consume those fields. Closing items 1–3 is therefore also what makes the *capstone chain's* census clean (together with item 4). --- ## The W-graph ``` W10 TargetVm skeleton (DONE) / | \ W6a GuestProgram (DONE, \ [independent tracks] in skeleton) \ / \ \ W1 close sp1_gatedExecution_prereqs (XL) W6b ELF tool W3 Decode/Fetch W7 try_step ├─ W1a Clean→native balance translation [x] (M, off-path) chips (L) step-lift (XL) ├─ W1b 25-table witness ↔ ChipRow decode (L) \ | └─ W1c isU64 recovery from memory bus (M) W4a MemoryGlobalInit/ | W8 logUp axiom packaging (M–L, after W1) Final single-shard | W9 Clean PR #398 migration [x] (2026-06-10) slice (L) | W11 re-base GatedVm on upstream VmTables (M–L) \ | B1 the 3 completeness sorries (M each) \ | W2 operand binding from memory-bus balance (XL) [+W2b load/store data addresses into RowView (L)] \ | W5 ECALL/HALT chip (M–L) [clk_inc prereq x 2026-06-10] \ | glue: discharge TargetObligations | sp1_target_execution UNCONDITIONAL | W4b multi-shard ShardComposition (XL) [post-target] ``` **Critical path:** W3 → W4a → W2 → W5 → glue (W3 + W4a substantially done, W2's exact-replay keystone done — see the progress snapshot; the live front is the **W2 value-half ↔ W5 ↔ W7 `try_step` reduction**). **W7 runs fully in parallel** and becomes the critical path if the Sail `translateAddr`/fetch reduction in machine mode is heavier than expected — note its decode stage is already reduced (`Model/SailDecode.lean`), and its `RowEffect` target shape (strict write) is now fixed by the W2+W7 keystone. **W1 is on the path of the axiom-clean end-to-end claim but not of the named-hypothesis target theorem** — it merges at the very end (`sp1_target_soundness` inherits its closure automatically). **Clean PR #398 exposure: resolved (2026-06-10).** W9 landed by pinning Clean to the open PR's head SHA (`292b9cc3`, 13 commits ahead / 0 behind the old pin) rather than waiting for the merge — the custom gating is gone and W1a now works directly against the upstream primitives (and gains `InteractionsWellFormed` + the new `Air/Balance` gated lemmas for free). Residual: a small re-pin to the merge commit when the PR lands on `main`. --- ## Work items ### W10 — the target-theorem skeleton `[x]` (2026-06-10) `Soundness/TargetVm.lean`: `GuestProgram`, `IsInitialState` (a load *relation*, with the `SailConfigured` residue seam), `SailStep`/`SailChain` over the official `try_step`, `SP1Halted` (observed one step before the halting ECALL), `SP1TargetPublicIO` (+`exit_code`, `toLegacy`), `RefinesAt`/`RowEffect`, the `TargetObligations` gap bundle, and the **proved** walk induction `sp1_target_execution` + the `sp1_machine_soundness`-routed corollary `sp1_target_soundness`. ### W6 — the guest program - [x] **W6a** `GuestProgram` (encoded ROM primary, decode is a theorem target) — in the skeleton. - [~] **W6b** **non-vacuity witness done (2026-06-17, `FormalModel/Trace/Witness.lean`, axiom-clean):** `isInitialState_nonvacuous : ∃ s0, IsInitialState emptyProgram s0`, so the target theorem's hypothesis is not vacuous. Reusable machinery: `Fintype Register` (derived), `configuredState pc` + `cfgState_init`/`pc`/`priv` (a state with every register present, PC pinned, machine mode). **Remaining (deferred, lower value):** ELF → `GuestProgram` byte ingestion (mirror SP1's `Program::from_elf`, `../sp1 crates/core/executor/src/program.rs`) + a richer (non-empty) witness ROM (reuses `configuredState` + `mem` content + `romLoaded` byte proofs; `BitVec 64` `rom_*` proofs need `bv_decide`). ### W3 — InstructionDecode/InstructionFetch → the decode half of `OperandsBound` `[~]` (closed end-to-end on the concrete-program path, 2026-06-17) Build the project `decode` on LeanRV64D's own decoder (so fetch-decode coherence with `try_step` is by construction); the decode component of `OperandsBound` is each real row's operand indices/immediates being the decode of `prog.fetchWord` at its pc. **Landed:** - `Soundness/Decode.lean` — `instrToProgramRow` (all opcode families), `DecodeOperandsBound`/`decodedInROM`, and `decode_bound_of_balance`/`decode_targetBound_of_balance` (the decode half of `bound` from a **constructed** `ProgramProvider (decodedInROM prog)` + Program-bus balance, no threaded `h_link`; the generalization `programProvider_of_valid` is in `ProgramProviderSpike.lean`). - `Model/SailDecode.lean` — the **real `noncomputable` Sail decoder reduced**: `decode_ADD_example` proves `(ext_decode 0x003100B3).run s = .ok (RTYPE …) s` via a lazy branch-skip walk (`run_bind_ok_none`/`_some` + a clean-stop `refine`-walk), under `SailConfigured` (= `isInitialized ∧ machine mode`). - `Decode.lean` `decodedInROM_addRow` — composes the two into the W3 obligation for a concrete ADD. **Remaining:** a symbolic-register `ext_decode_RTYPE` is **out of scope** (would need `bv_decide` per cascade branch; decode is only ever applied to *concrete* ROM words, so the per-opcode recipe suffices for the witness program). The general `∀ prog` case keeps `decodedInROM` a trusted decode-chip assumption (cf. `ProgramRowSpec`). ### W4 — the memory-infrastructure chips - [~] **W4a (single-shard slice, L):** **MemoryGlobalInit provider constructed (2026-06-17, `Soundness/MemoryGlobal.lean`, axiom-clean)** — the Memory-bus analog of `ProgramProviderSpike`. `memGenesisContributions` (one entry per address at value 0, genesis timestamp `t0`) + `memProviderGenesis_of_contributions` discharges the threaded `MemProviderGenesis`; `traceMemoryValid_of_genesis_and_balance` derives `TraceMemoryValid` from the constructed provider + ordering side conditions + balance (residual: `t0` below all real clocks). `memBalanceHyps_of_genesis` (`MemoryIsU64.lean`) lifts the operand isU64/value facts onto the same provider. **Remaining:** `MemoryGlobalFinal` + binding the genesis value / final image to a concrete `prog.memImage` / boundary (the per-address *value* — W2's replay precision). - [ ] **W4b (multi-shard, XL, post-target):** `ShardBoundary` (pc/clk chaining, init/final memory boundary, cumulative-sum carry), `MemoryLocal`/`MemoryBump`/`StateBump`, `machineValid_of_shards`. SP1's full memory argument is fundamentally multi-shard; the target theorem is single-shard first. ### W2 — operand/register binding from the memory-bus balance (XL; the long pole) `[~]` exact-replay keystone landed (2026-06-17) Derive, from the memory-bus balance + the register adapters' `prev_value` columns, that each row's committed operand *value* columns equal the live register/memory values at its walk position — i.e. prove `TargetObligations.bound` for the concrete `OperandsBound` and strengthen `RefinesAt`'s register frame to exact replay. **Keystone DONE (commit, `Soundness/TargetVm.lean`):** the exact-replay surgery on the *proved* capstone — `replayVal` (most-recent `op_a` write over the path prefix); `RefinesAt.frame` strengthened from a frame-disjunction to **exact replay**; `RowEffect.regs` strengthened to the **strict write** form (`s'=rdWrite` at `op_a`, `s'=s` elsewhere — what W7's `wX_bits rd` produces); `chain_to_refines` / `sp1_target_execution` re-proved green, no new axioms. So W2's exact-replay and W7's `RowEffect` shape land together as designed. **Value-half assembled (`Soundness/ValueBound.lean`):** `ValueOperandsBound` (live registers = committed `op_b`/`op_c` `prev_value` columns); `value_targetBound` proves the value half of `bound` by composing the exact-replay invariant (`RefinesAt.frame`) with the cross-bus link (axiom-clean); `OperandsBound_full = decode ∧ value`, `operandsBound_full_targetBound` (full `bound`, both halves), and `targetObligations_full` (the full `TargetObligations` at the concrete `OperandsBound`, `bound` discharged, `lift`/`halt` the W7/W5 seams — the Phase-7 glue entry point). **Walk-clk bridge landed (`ValueBound.lean`):** `walk_clk_monotone` — consecutive `WalkOf` rows advance the state-bus clock (`sndClkOf path[i] = rcvClkOf path[i+1]`), i.e. the walk visits rows in increasing clk order = the order the Memory-bus value chain reads them (`sndClk_eq_rcvClk` is the clk twin of `sndPc_eq_rcvPc`; `isWalk_chain` was exposed for it). **Remaining `TraceValueBinding` discharge:** compose `walk_clk_monotone` with (i) the memory event timestamps = row clocks (`rowClkLow`), (ii) the Memory-bus value chain (`memEvent_prevValue_eq_writer` / `traceMemoryValid_of_genesis_and_balance`: a read returns the most-recent earlier write, read-backs preserving it), and (iii) the genesis alignment (`s0`'s initial registers = 0 = the init chip's genesis) — the induction relating `replayVal`'s walk recursion to the memory event chain. Sub-item **W2b (L):** thread real load/store data addresses into `Trace.RowView` (the §8.4 gap) and strengthen `RowEffect`'s ROM clause to full store-replay memory. ### W5 — the ECALL/HALT chip (M–L) Model the HALT slice of `SyscallInstrs` (`../sp1 crates/core/machine/src/syscall/instructions/air.rs`: `is_halt` ⟹ `next_pc = [HALT_PC,0,0]`, syscall id in `t0`/x5, exit code in `a0`/x10). **Hidden prerequisite — done (2026-06-10):** syscall rows advance the clock by **256**, not 8 — `StateAccess` now carries a per-row `clk_inc` (`Soundness/StateConsistency.lean`, projected at 8 by `stateAccess` for all 25 current chips), `stateLookups`/`sndKey` (`Soundness/GatedVm/StateBridge.lean`) key on `clk_low + clk_inc`, and the PC-chain layer (`pcChainProp`/`clkStep`/`TraceClkAdvance`/ `state_successor_of_balance`/`balanced_state_bus`) is per-access, so a mixed-increment trace type-checks. The chip itself remains open. Deliverables: `TargetObligations.halt`/`halt_nonempty`, `exit_code` bound into the public values (replace `SP1PublicIO` with `SP1TargetPublicIO` in `SP1GatedVm.lean`), ECALL routing in `Coverage.lean` (today ECALL/EBREAK/UNIMP are the 3 uncovered opcodes of 53), and pointing `stateAccess`'s `clk_inc` projection at a `RowView`-level increment. ### W7 — the `try_step` step-lift (XL; parallel track) `[~]` decode stage + RowEffect shape landed Per chip kind: in a state satisfying `RefinesAt` + the concrete `OperandsBound`, reduce `(try_step 0 false).run s` — interrupt check, fetch (vs `RomLoaded`), decode (vs W3's decoder), execute (vs the existing `correct_*_native` bridges), PC commit — to `.ok _ s'` with `RowEffect r s s'`. **Already in place (2026-06-17):** the **decode stage** is reduced (`Model/SailDecode.lean`: `run_bind_ok_none`/`_some` + the branch-skip walk reduce the real `ext_decode`); `SailConfigured` is populated to `isInitialized ∧ machine mode` (the two pins the decode reduction needs; more added as fetch/execute discover them); and the **`RowEffect` target shape is now fixed** (the strict write form from the W2+W7 keystone — `wX_bits rd` produces exactly it). Remaining: fetch + execute (vs the per-chip `sailEquiv`/`correct_*_native` bridges, no `Bridge.lean` changes expected) + PC commit, per chip kind. Risk: the address-translation reduction; if heavy, this becomes the critical path. ### W1 — close `sp1_gatedExecution_prereqs` (§B5 residue; XL, independent track) `sp1_gatedExecution_prereqs` is no longer a monolithic `sorry` (2026-06-10): it is a **proven** assembly of the W1b decode seam `sp1_witness_decode : … → SP1WitnessDecode witness` (the sole remaining `sorry` in `Soundness/SP1GatedVm.lean`) with the proven W1a balance translation. - [x] **W1a (2026-06-10):** Clean `Statement.BalancedChannels` → native `isConsistentBalanced` State-bus translation, **proven and clean-3** at ensemble scale: `sp1_state_balance_of_balancedInteractions` (`Soundness/SP1GatedVm.lean`), riding the generic adapter `isConsistentBalanced_of_balancedInteractions` + the per-key cast-sum kernel `intCast_multiplicitySum_map_toAccess` (`GatedVm/BalanceMod.lean`; same-channel `toAccess` keys separate exactly on the message by `ZMod.val`/`Array.toList` injectivity, so each `LookupKey` ℤ-sum casts to one Clean `balanceOf`), `Interaction.toAccess`/`intCast_signedVal` (`Model/InteractionProjection.lean`), the native `{-1, 0, 1}` bound `stateLookups_mult_binary` (`Soundness/StateConsistency.lean`), and the landed `isConsistentBalanced_of_intCast_zero`. *Notes: upstream's gated counting lemmas (`balanceOf_eq_mult_countP_of_mult_or_zero`, `exists_push_of_pull`, `activeInteractions`) turned out unnecessary — the per-key cast argument replaces counting; `InteractionsWellFormed` is carried by `BalancedChannel` but unconsumed (the multiplicity bound is native, from binary `is_real`). Residual (deliberately moved to the seam): the witness ↔ access-list correspondence itself — the `state_accesses_perm` field of `SP1WitnessDecode` (`stateLookups_eq_emitted` lifted over the 25-table flatMap + the verifier boundary) — needs the row ↔ table binding and so rides W1b.* - [ ] **W1b (L, the biggest piece):** the 25-table `witness.tables ↔ List (ChipRow p)` decode (`same_circuits` + `valueFromOffset`) + per-table `Component.weakSoundness`, now with a concrete target shape: produce the `SP1WitnessDecode` bundle (rows/data + `spec_holds` + `is_real_binary` + `state_accesses_perm`) demanded by `sp1_witness_decode`. - [~] **W1c (M):** each chip's `isU64` operand `Assumptions` recovered from the memory-bus balance. Lemma family landed (`operand_{a,b,c}_isU64_of_memBalance` in `Soundness/MemoryIsU64.lean`, riding the limb-level bus-key extraction + per-address chain induction `eventsAt_values_isU64`, under the `MemBalanceHyps` bundle); wiring into the capstone rides W1b. ### W8 — logUp/GKR packaging (M–L, after W1) Replace the assumed balance with one named `axiom logupGkrSound` ("a verifying GKR+PCS transcript ⟹ fingerprinted cumulative sum = 0") in `Model/InteractionBus.lean`, and prove the non-crypto half (fingerprinted-sum-zero ⟹ send/receive multiset equality, LogUp/Schwartz–Zippel). **Done when** the TCB cites one crypto axiom instead of "balance assumed." ### W9 — Clean PR #398 migration `[x]` (2026-06-10) Landed by pinning Clean to the **open PR's head SHA** ([Verified-zkEVM/clean#398](https://github.com/Verified-zkEVM/clean/pull/398) = `292b9cc369be11baf816926a4bd5a697c01b1dcc`, 13 commits ahead / 0 behind the old `main` pin, same 4.28 toolchain) rather than waiting for the merge. Upstream `Channel.toRaw` is now gated on zero multiplicity and receives owe no `Requirements` at all, so the whole custom layer in `Model/Channels.lean` (`toRawGated`/`gatedReceive`/`emitGated`/`receivedGated`/`emittedGated` + projections + `binary_gate_req_vacuous`; the asymmetric family turned out to be dead code) is deleted in favor of `Channel.pullIf`/`Channel.emit`/gated `toRaw`. Side effect worth knowing: pre-W9, `sp1GatedVm.busChannels` listed `toRawGated` records while the readers emitted on `toRaw` — records that differed at `mult = 0` — so the ensemble balance plausibly did not bind the program/memory emissions; the unification makes the Statement's bus balance genuinely bind them. Extraction reproducibility (release-audit TB-9) was **not** fixed by the pin alone — the compiler at the SP1 pin emits a `name`/`main`-**field** `ElaboratedCircuit` from a transient window of Clean main (`60665ed0`, later reworked), which no pinned Clean accepts. `update_extracted.py` now normalizes the emitter output (`_normalize_circuit_api`: parameterized instance + `pullIf`/`toRaw` names); a full regen at the pin reproduces `Extracted/` + `WitnessTests/` byte-identical and the 14 circuit-form files up to the (accepted, re-committed) emitter formatting of the two `Lt` files. **Residual:** re-pin to the merge commit when the PR lands on `main`; upstream `d25bba8d` (post-pin, not in the PR branch) likely retires the Batteries import-narrowing workaround (`docs/agents/lean-sail-notes.md`). ### W11 — re-base `GatedVm` on upstream `VmTables` (M–L; new, post-W9) `GatedVm/` exists only because pre-#398 upstream `VmTables` hardwired `±1` multiplicities. Post-#398 it natively supports gated VMs (`VmStep`, `tables_channel` over `pullIf`/`pushIf` with enabledness derived from constraints, `stepOfAllTables`, the gated `verifier_guarantees_of_requirements_of_requirements_of_guarantees`, `addVm_soundVmChannel_of_soundChannels`). Re-basing would inherit the upstream VM-channel soundness engine for the State bus. Cost: every chip exposes `[pullIf is_real cur, pushIf is_real next]` on the State channel (the state *receive* switches from `emit (-is_real)` to a true `pullIf` — harmless, `StateMsg.Spec = True`) plus enabledness booleanity from the existing `is_real` gate. Adjacent to W1a; not coupled to it. ### B1 — the three completeness `sorry`s (M each, independent) Debt items 1–3 above (`MulChip.completeness` closed 2026-06-10 via `MulOperation.spec_populate` + the `LtChip` witnessed-columns `convert`/`getElem_toElements_eval_varFromOffset` recipe); the `BranchChip.completeness` recipe (honest `ProverHint` flag witnesses + shared dispatch) is the template for the rest. Closing them also cleans the capstone chain's axiom census (see the debt note) and retires the K6 sampled-conformance reliance for those chips. --- ## Coverage-claim hygiene (ongoing) - Keep `allChipKinds_length` (25), `sp1Tables_length` (25), and the `Coverage.lean` guards (`coverage_kinds_eq_registry`, the covered/uncovered partition — 50 of 53 opcodes; ECALL/EBREAK/UNIMP open until W5) in sync as chips are added. - In any external claim, cite the machine-derived surface figure — the 25 modeled chips cover the **Supervisor-mode halves of 25 of SP1's 122 `RiscvAir` variants** (v6.2.2-20-g9d249b8d4) — and the explicit exclusion list (decode/fetch, memory-infra, PageProt, syscalls/traps, Global, Range, precompiles, and the User-mode duplicates). `Supervisor/User`: decide whether single-variant coverage extends to the User duplicates (same AIR, different bus tags?) or stays a documented gap. - The witness-vector battery and `Extracted/` currency are re-checked by `scripts/run_audit.sh` §A4 + CI `lake build`; the `SP1_PINNED_COMMIT` assertion in `update_extracted.py` keeps extraction provenance explicit. Remaining: a CI job that re-extracts and diffs per-PR. --- ## Cleanup / polish backlog (non-blocking) Deferred quality/perf TODOs — none gate the VM theorem; pick up opportunistically. The *how-to-golf-safely* lessons (heavy-core caution, kernel-safe dedup, the `maxHeartbeats`-is-the-wrong-lever finding, the available `/cleanup` skills) live in `docs/agents/proof-patterns.md` § "Compile-time / performance landmines" + "Golf & cleanup discipline". - **`linter.style.longLine`** — the one remaining syntactic linter not yet enabled (it's the last candidate noted in AGENTS.md § Linters). ~1080 lines exceed 100 chars (`Native/` ~817, `FormalModel/` ~263). Enable it alone on the core pillar lake libraries, then reflow or per-file-suppress back to 0 warnings. Heavy, mechanical. - **Shift soundness tail-dedup** — the real build-time prize. Extract the byte-identical `cpuA/msb*/aluA` requirements tail shared across the 6 Shift soundness conjuncts into a `requirements_holds`/`SpecObligation` helper, mirroring `Proofs/Chips/DivRemChip/Soundness/Tail.lean` (recipe: proof-patterns § "Shared-tail dedup"). Structural, multi-hour, overlaps the recently-golfed Shift soundness files — do it as a dedicated pass, not a drive-by. - **`/decompose-proof` candidates** — long proof bodies worth splitting into named sub-lemmas: `ShiftLeftChip`/`ShiftRightChip` `Formal.lean` `completeness` (~123/~180 lines), `LoadHalfChip`'s 4-way `h_sel_lt` offset-selection case-bash (near-verbatim across soundness + completeness), `BranchChip` `soundness`/`completeness` (~156/~290 lines of per-column `env.get` plumbing). Several are perf-tuned — decompose with care and watch elaboration time. - **SailState-staging bridge preamble** — the `hpc_get`/`key`/`hsp_config` preamble recurs across ~10 store/jal/load `Bridge.lean` files → a shared lemma. **Re-examine the shape first** — upstream #101/#102 rewrote several bridges in the 2026-06-23 merge, so the pre-merge duplication may have shifted. - **Namespace-isolate the auto-gen (linter hardening, Option B)** — the `sp1Lint` exclusion is a *soft* module-path filter. A *hard* boundary would relocate all auto-gen to a separate root namespace `SP1Extracted.*` so the stock `runLinter` excludes it by construction (no custom filter). Cost: ~87 module renames + import-line edits + `update_extracted.py` writer paths + lakefile globs. Not worth it for linting alone; reconsider only if a hard auto-gen/hand-written namespace split is wanted for other reasons.