# PLAN: Verified RISC-V EVM Implementation

> **Agent instruction**: Keep this file up to date as you work. When you finish
> implementing an opcode or task, move it to the "Done" list under
> "Current Status", update any counts or details that changed, and note any
> new sub-tasks you discovered. Check this file at the start of each session
> to pick up where the last agent left off.

Goal: implement and verify the EVM state transition function (STF) as RISC-V
macro assembly programs, for use as a zkEVM. Each EVM opcode becomes a verified
RISC-V subroutine operating on 256-bit stack words in memory. The STF is the
single most important piece in the execution layer — it processes blocks of
transactions against the world state.

**Target: RV64IM (64-bit)**, per the zkvm-standards spec
(`EvmAsm/Evm64/zkvm-standards/`). RV32IM was removed (not relevant).

Reference spec: `execution-specs/src/ethereum/forks/shanghai/vm/` (Python).
zkVM standards: `EvmAsm/Evm64/zkvm-standards/` (submodule).

> **Parallel codegen track.** Emitting verified `Program`s as executable
> RV64 ELFs that run on the Zisk emulator is tracked separately in
> [`CODEGEN.md`](CODEGEN.md). **M0–M10 are done**: text emitter, total
> `Instr` coverage, `evm_add` round-trip on `ziskemu` from both `.data`
> and `ziskemu -i`, tiny EVM interpreter with runtime fetch/decode/
> dispatch, and **91 wired opcodes** through `tinyInterpRegistry` —
> PUSH0–32, DUP1–16, SWAP1–16, 17 clean-shape singletons, MLOAD/MSTORE/
> MSTORE8 (M7), DIV/MOD (M8), runtime-bytecode dispatcher (M8.5),
> SDIV/SMOD via trampoline (M9), and ADDMOD via inline-callable (M10).
> EXP via inline-callable `_fixed_fixed` body (M27; per-limb counter
> moved x6→x22 so it survives `mul_callable`). Codegen-
> proofs **Phase 1 (registry invariants)** and the first 13/91
> instances of **Phase 4 (handler-level `cpsTripleWithin` specs via
> `cleanRetHandlerSpec`)** have landed under
> `EvmAsm/Codegen/Proofs/`. Codegen remains purely additive — it does
> not modify the verified core.

---

## Architecture Overview

### RISC-V Backend

| | RV64IM (Evm64) |
|---|---|
| **Target** | `riscv64im_zicclsm-unknown-none-elf` |
| **Word size** | 64-bit (`BitVec 64`) |
| **Limbs per EvmWord** | 4 × 64-bit (LE) |
| **Memory ops** | LD/SD (8-byte aligned) |
| **Files** | `EvmAsm/Evm64/` |
| **Infrastructure** | `EvmAsm/Rv64/` |

### zkVM Standards (submodule: `EvmAsm/Evm64/zkvm-standards/`)

The standards define the target environment for Ethereum zkVMs:
- **RISC-V target**: RV64IM + Zicclsm (misaligned load/store support)
- **IO interface**: `read_input` / `write_output` for private input and public output
- **Cryptographic accelerators**: C interface for keccak256, ecrecover, SHA-256,
  RIPEMD-160, modexp, BN254, BLS12-381, BLAKE2f, KZG, secp256r1 (via
  `zkvm_accelerators.h`)
- These accelerators map directly to Ethereum precompiles and KECCAK256

### Machine State (Rv64)

```
MachineState:
  regs : Reg → BitVec 64       -- Registers: x0(zero), x1(ra), x2(sp),
                                --   x5-x7(t0-t2), x10-x12(a0-a2)
  mem  : Addr → BitVec 64      -- 64-bit addressable memory
  code : Addr → Option Instr   -- Instruction memory (immutable)
  pc   : BitVec 64             -- Program counter
  committed : List (Word × Word)  -- legacy SP1 COMMIT word-pair outputs
  publicValues : List (BitVec 8)  -- public output bytes for write_output/WRITE
  privateInput : List (BitVec 8)  -- legacy SP1 HINT_READ input stream
```

EVM stack: x12 is EVM stack pointer, stack grows upward, 32 bytes per element.

### Proof Framework

- **Separation logic**: `r ↦ᵣ v` (register), `a ↦ₘ v` (memory), `**` (sep conj)
- **CPS Hoare triples**: `cpsTriple base end P Q` — from `base` to `end`, if P
  holds then Q holds, with automatic frame rule for untouched resources
- **Per-limb composition**: Each 256-bit op decomposes into 4 per-limb specs,
  then composed via `runBlock` tactic
- **Key tactics**: `xperm`, `xsimp`, `xcancel`, `seqFrame`, `runBlock`,
  `validMem`, `liftSpec`, `pcFree`

---

## Current Status

### Evm64 (PRIMARY) — 52 opcodes

| Category | Opcodes | Instructions (per op) | Status |
|----------|---------|----------------------|--------|
| Arithmetic | ADD, SUB, MUL, SIGNEXTEND | 30 / 30 / 63 / 48 | ✅ Fully proved |
| Bitwise | AND, OR, XOR, NOT | 17 / 17 / 17 / 12 | ✅ Fully proved |
| Shift | SHR, SHL, SAR | 90 / 90 / 95 | ✅ Fully proved |
| Comparison | ISZERO, LT, GT, EQ, SLT, SGT | 12 / 26 / 26 / 21 / 25 / 25 | ✅ Fully proved |
| Byte/SignExt | BYTE, SIGNEXTEND | 45 / 48 | ✅ Fully proved |
| Stack | POP, PUSH0, DUP1-16, SWAP1-16 | 1 / 5 / 9 / 16 | ✅ Fully proved |

**Deleted spec files** (incomplete CodeReq migration, easier to recreate):
- ~~`ShiftSpec.lean`~~ — ✅ Recreated as `LimbSpec.lean` (SHR) + `ShlSpec.lean` (SHL) + `Compose.lean` + `ShlCompose.lean` + `Semantic.lean` + `ShlSemantic.lean`
- ~~`ShlSpec.lean`~~ — ✅ Recreated (per-limb + body + composition + stack-level spec)
- ~~`SarSpec.lean`~~ — ✅ Recreated (per-limb + body + sign-fill + composition + stack-level spec)
- ~~`ByteSpec.lean`~~ — ✅ Recreated as `Byte/Spec.lean` (stack-level `evm_byte_stack_spec`) + `Byte/LimbSpec.lean` (per-body + cascade dispatch)
- ~~`StackOps.lean`~~ — ✅ Recreated as modular `Pop.lean`, `Push0.lean`, `Dup.lean`, `Swap.lean`

All deleted spec files have been recreated. See **Pending: Recreate Deleted Spec Files** below for details.

**Removed targets** (not relevant to primary goal):
- Evm32 (secondary RV32IM target) — removed entirely
- Rv32 infrastructure — removed entirely
- Examples (Swap, HelloWorld, Echo, etc.) — removed (all depended on Rv32)

### Infrastructure — RV64 only, no sorry

- RV64: Basic, Instructions, Program, Execution, CPSSpec,
  ControlFlow, SepLogic, GenericSpecs, InstructionSpecs, SyscallSpecs,
  HalfwordOps, WordOps
- Tactics: XPerm, XSimp, XCancel, SeqFrame, RunBlock, LiftSpec, ValidMem,
  PcFree, SpecDb
- **CodeReq infrastructure** (Issue #35): `CodeReq` type + `cpsTriple` 5-arg
  form + composition rules + tactic support in Rv64.
  CodeReq monotonicity helpers in SepLogic.lean
  (`union_singleton_apply`, `beq_base_offset`, `union_mono_tail`).
- **`CodeReq.ofProg`** (recent): Replaces chains of `singleton.union` with
  program-based CodeReq construction. Key infrastructure in SepLogic.lean:
  - `ofProg base prog` — builds CodeReq from a `List Instr`
  - `ofProg_append` — splits `ofProg base (p1 ++ p2)` into two `ofProg` unions
  - `ofProg_none_range` — proves out-of-range addresses return `none`
  - `unionAll` — structural subsumption for lists of CodeReqs
  - Range-based `ofProg` disjointness (O(1) vs O(n) singleton expansion)
  - MultiplySpec col0–col3 migrated to `ofProg` pattern
- **runTacticSilent**: Suppresses bv_omega diagnostic leaks from speculative
  tactic calls (Lean 4.29 regression fix in SeqFrame.lean/RunBlock.lean).
- **`bv_decide` purge — COMPLETE** (fully kernel-checkable trust base):
  Following the `native_decide` elimination (206 → 0), `bv_decide` was driven
  from **290 → 0** call sites. The full library builds green (3027 jobs) and
  the witnessed trust base (`check-axioms.sh --report`) is **0 non-classical
  axioms** beyond `propext`/`Classical.choice`/`Quot.sound`. Conversion
  techniques: `BitVec.eq_of_getLsbD_eq` + `simp`/`omega` (getLsbD block-splits
  for `fromLimbs`/`getLimb`/8-byte-concat identities), `congr 1`/`decide` +
  `bv_omega` for address/`signExtend` arithmetic, a shared
  `ushr63_bool : x >>> 63 ∈ {0,1}` helper for sign disjunctions, the bit-0
  parity helper for `&&& ~~~1` JALR masks, `decide` for concrete 256-bit goals
  (kernel `Nat` is GMP-fast), and — for the multi-limb two's-complement
  lemmas in `SDiv`/`SMod` `Compose/Words.lean` (`…_eq_neg_word_of_sign_one`,
  `= -quotient`, `= -modulus`, plus the dependent `…_sign_split`/`…_eq_zero_iff`
  /`…_limb_sign` lemmas) — `EvmWordArith.add_carry_chain_correct (~~~v) 1`
  (the ripple-carry negation `sdivAbsWord = ~~~v + 1 = -v`) composed with
  `fromLimbs_getLimb`/`BitVec.neg_eq_not_add`. The `divmod_addr`/`rv64_addr`
  grind-set definition files use a local `addrclose` macro. The 6 now-unused
  `import Std.Tactic.BVDecide` lines were removed. **CI keeps it out** via two
  complementary gates (`.github/workflows/build.yml`):
  `scripts/check-forbidden-tactics.sh` (fast source scan — fails on any
  `bv_decide`/`native_decide` tactic invocation in `EvmAsm/**.lean`) and
  `scripts/check-axioms.sh` (kernel-truth `#print axioms` backstop — now
  forbids `bv_decide` trust axioms too, not just `native_decide`). See
  `CLAUDE.md` / `CONTRIBUTING.md`.
- **Progress-registry semantics — agent-steering rollout** (`EvmAsm/Progress.lean`,
  `docs/agent-progress-steering-review.md`):
  - **Phase 0 (merged, PR #7994):** `check-axioms.sh`, `check-statement-tamper.sh`,
    `check-conformance-floor.sh`, `check-forbidden-tactics.sh` wired into
    `build.yml`.
  - **Phase 1 (this work):** added a `conditional` `ProofTier` (complete
    top-level Hoare triple gated by a nonvacuous input-domain precondition —
    distinct from `partly`, which has no complete triple). **DIV, MOD, SDIV**
    reclassified `.partly → .conditional` (each gated by `b.getLimbN 3 = 0` /
    `hStack`). New kernel-checked counts `conditionalCount_eq = 3` /
    `conditionalBytes_eq = 3`; `partialCount 9→6`, `partialBytes 39→36`; totals
    unchanged (85 entries / 149 bytes). SMOD/MULMOD/EXP/ADDMOD stay `.partly`
    (no full triple; ADDMOD's triple is single-point `b=0` only). Added typed
    `cycleBound : Option Nat` to `OpcodeEntry` (the `N=…` cycle bounds migrated
    out of free-text `notes` into this field, rendered in the `Cycles (N)`
    column — R-C4) plus optional `milestones`/`coverRef` scaffold fields
    (R-A4/R-A3). Registry rows now use an `entry` smart constructor so the
    optional fields stay defaulted. Per-tier rubric documented in `AGENTS.md` +
    `progress-template.md`. **Follow-up (deferred):** kernel-checked *binding*
    of `cycleBound` to the witness theorem's literal `cpsTripleWithin N` (the
    `Progress.lean`→Spec circular-import problem; option ii/iii in the bootstrap)
    — landed field+renderer (option i) for now. Cover lemmas for the three
    `conditional` entries (`coverRef`) also not yet written.
  - **Phase 2 (this work) — direction tracking:** net-new kernel-checked
    obligation tracker `EvmAsm/Progress/Obligations.lean` (`ObligationStatus`
    `done|blocked|notStarted`, a `Blocker` sum type `.opcode|.infra`, the 9
    guest-program obligations, `by decide` counts `doneCount_eq = 2` /
    `blockedCount_eq = 6` / `notStartedCount_eq = 1` / `totalObligations_eq = 9`,
    and `blocker_opcodes_in_registry` — a `by decide` cross-check that every
    `.opcode` blocker names a real `Progress.registry` entry, so a renamed
    opcode fails the build). `MainProgress.lean` renders an "obligation ×
    blocker" matrix into `PROGRESS.md` (the obligations table moved out of
    `progress-template.md` into the generated section). New `DRIFT.md` TCB /
    "what is NOT proven" ledger, generated by `lake exe progress-report drift`
    (`scripts/drift-report.sh`) and drift-gated by `scripts/check-drift.sh`
    (wired into `build.yml`) — lists the conditional uncovered domains, the
    `partly`/`execSpec`/`notStarted` opcodes, and the trust boundaries (codegen
    unverified by design, RV64/EVM/gas modeling). Velocity time-series:
    `scripts/progress-snapshot.sh` (one JSONL row per commit, parsed from the
    drift-gated `PROGRESS.md` — no build needed) + `scripts/progress-velocity.sh`
    (deltas + monotonic-regression alarm for the DIV-style silent downgrade),
    persisted to a `progress-history` orphan branch by
    `.github/workflows/progress-history.yml` (mirrors `benchmark-history`), which
    also surfaces an advisory velocity read in the job summary on each main push.
    Obligations carry a kernel invariant `done_obligations_well_formed`
    (`done → witness ∧ no blockers`) so a false-green status flip can't be a
    one-token edit. **Follow-up (deferred to Phase 4 / R-B1 scorecard):** wire
    `progress-velocity.sh --check` as a *PR-time* gate (the post-merge workflow
    can't block a merge); it is advisory-only today.
  - **Phase 3 (this work) — DIV-class defense (differential / fuzz testing):**
    closes the *codegen* blind spot (the kernel proves the in-Lean semantics,
    not the RISC-V lowering / ziskemu — the home of the v4 DIV bug).
    - **D1 (R-E1) — boundary-biased arithmetic differential fuzzer.**
      `EvmAsm/Tests/ArithDiffCheck.lean` + `lake exe arith-diff-check` fuzz
      `EvmWord.{div,mod,sdiv,smod,mulmod,addmod}` against an INDEPENDENT
      Nat/Int reference over operands biased toward the Knuth-D rare paths
      (sign edges ±2^255, limb boundaries, 4-limb divisors with a large top
      word — the `b.getLimbN 3 ≠ 0` regime the v4 proof excluded, add-back).
      Permanent regression corpus `tests/fuzz-corpus/arith/corpus.jsonl`
      (29 curated DIV-class edges) carries the execution-specs *amsterdam*
      oracle verdict, re-checked by the PR fast-path without Python. Nightly
      tier `scripts/fuzz-arith-diff.sh --python` (`.github/workflows/
      fuzz-arith.yml`) drives 20k+ operands through both evm-asm and the
      UNMODIFIED execution-specs oracle (`scripts/fuzz_arith_oracle.py` under
      `uv run`), appending any divergence to the corpus. PR fast-path wired
      into `build.yml`. The oracle is a TEST ORACLE ONLY (never in the TCB).
    - **D2 (R-E3) — EEST budget-vs-wrong distinction.**
      `scripts/codegen-eest-stateless-check.sh` now buckets a `--steps`
      exhaustion as `BUDGET` (separate from `ERROR`, never folded into fail
      or the `--min-*` gates), detected via the overridable
      `EEST_STEP_LIMIT_RE`; non-match falls through to `ERROR` (regression-
      free). Per-opcode/per-step localization deferred (needs ziskemu trace
      instrumentation).
    - **D4 (R-E4) — round-trip coverage fence.**
      `scripts/check-roundtrip-coverage.sh` (wired into `build.yml`) asserts
      every `Rv64/Basic.lean` `Instr` constructor (55) has a `#guard` in
      `Codegen/RoundTripTests.lean`; fails when a new shape lands unguarded.
    - **D3 (R-E2) — dual-path differential: DEFERRED.** In-Lean execution
      model vs codegen→ziskemu round-trip; the ziskemu-equality assertion
      can't be validated without a working ziskemu (unavailable this
      session). Concrete build plan captured in
      `docs/dual-path-differential-design.md` for a session with ziskemu.
  - **Phase 4 (this work) — review throughput & merge safety (P3):** rations the
    single maintainer's attention with objective triage and enforces the trust
    boundary at the merge gate. Stacked on the Phase 3 branch (Phases 2–4 still
    unmerged at hand-off).
    - **D1 (R-C5) — enforced trust boundary.** Net-new `.github/CODEOWNERS`
      maps the verified core (`Progress.lean`, `Progress/`, `Rv64/`,
      `**/Spec.lean`, `EL/Conformance/`, verifier-config scripts, `lakefile.toml`/
      `lean-toolchain`, `.github/workflows/`) to `@pirapira`; codegen is
      deliberately left unowned (lighter bar — unverified by design, fenced by
      conformance/fuzz/dual-path). CODEOWNERS is advisory until paired with a
      branch ruleset (a repo *setting*); the proposed ruleset + the approval-
      count question are in the PR body (open question #1).
    - **D2 (step 14) — merge-queue design doc FIRST.**
      `docs/merge-queue-design.md` documents the GitHub native merge queue with
      *batching* + TAP test-and-bisect eviction (on batch failure, bisect to
      eject the offending PR — `O(log N)` runs — rather than re-running the
      ~516-script ziskemu suite per PR). `build.yml` already triggers on
      `merge_group`; the doc gates *activation* (a repo setting), it does not
      flip the queue on. Heavy-vs-light check placement table included.
    - **D3 (R-B1) — deterministic risk scorecard.** Extended
      `scripts/progress-delta.sh` (still pure git+awk, no LLM, the existing
      `summary.yml` payload path — not new plumbing) with a 5-column scorecard
      `{new top-level triples, Δtier counts, net Δsorries+axioms, Δconformance,
      touches-trusted-core, statement-edit, changed-lines}` + a HIGH/MEDIUM/LOW
      risk label. HIGH = touches `Progress.lean`/`Rv64/Basic.lean`/`*Spec.lean`,
      XL diff (>1000 lines excl. `codegen-*.sh`), or codegen changed with no
      round-trip/conformance script change; MEDIUM = other trusted-core /
      statement / sorry-axiom-increase signals. Trusted-core path set mirrors
      `check-statement-tamper.sh`. Risk is **triage ordering only — not
      auto-merge authority for the verified core**. Also fixed a latent
      `conformance_count` regex bug (required a space before the backtick-wrapped
      `allConformanceVectors_length`, so conformance delta always read `?`).
    - **D4 (R-B2) — path + size auto-labeling.** `.github/labeler.yml` +
      `.github/workflows/labeler.yml` (`actions/labeler@v5`) tag
      `area:verified-core` / `area:codegen` / `area:docs`; a size labeler buckets
      `size/XS..XL` from per-file API line counts **excluding `scripts/codegen-*.sh`**
      so a bulk script regen isn't mislabeled XL (XL boundary 1000 = scorecard
      threshold).
    - **D5 (R-B3) — statement-strength rubric.** `docs/pr-summary-progress-prompt.md`
      gains a spec-quality sign-off (`Statement-strength: OK|REVIEW — …`) layered
      **only** on the un-kernel-checkable question (vacuous/over-restricted
      precondition? postcondition covers stack+memory+gas+halting?) — explicitly
      **never on proof correctness** (the kernel is the perfect oracle there).
    - **D6 (G2.6) — long-running-branch visibility.** `BEADS.md` claim ledger
      (coordination overlay on the `bd` tracker — collision detection for
      double-claimed beads) + `.github/workflows/stale-pr-nudge.yml`
      (`actions/stale@v9`, 21-day nudge, **never auto-closes**, `long-running`
      label exempts deliberately long-lived stacks).
    - **D7 (Phase-2 deferral / R-A5) — PR-time velocity gate.**
      `.github/workflows/progress-velocity-check.yml` snapshots PR **base** (via
      new `progress-snapshot.sh --ref <commit>`, pure `git show`, no checkout)
      vs PR **head** and runs `progress-velocity.sh --check` on the 2-record log
      — catching a monotonic proven/conformance/obligation downgrade *before* it
      lands. Base→head (not head-vs-latest-main) avoids false positives on
      stale branches. Cheap (no `lake build`). **Blocking by default** — a
      monotonic downgrade fails the PR; set the `VELOCITY_GATE` repo var to
      `warn` to relax to advisory for a known generalization, then restore to
      `block` (open question #4).
    - **Open questions surfaced in the PR:** (1) ruleset approval counts for a
      single-maintainer repo; (2) merge-queue batch size + eviction; (3) risk
      XL threshold + exact trusted-core path set; (4) velocity gate warn-vs-block.
    - **Opportunistic carry-overs still deferred:** Phase-1 `cycleBound`-binding
      (R-C4) + `coverRef` cover lemmas (R-A3); Phase-3 D3 dual-path (needs
      ziskemu) + per-opcode EEST localization.
  - **Phase 5 (this work) — conventions-as-gates + cost trend (P1/P3): the
    FINAL phase. ROLLOUT COMPLETE (Phases 0–5 done).** Grows the `check-*.sh`
    suite so prose conventions become executable architecture fitness functions
    (Ford/Parsons), retires the one dead gate, and stands up advisory cost/churn
    trend infra — without tripping the build-budget / noisy-gate non-goals. No
    Lean source touched (pure scripts/config/workflows/docs), so `lake build` is
    unaffected. Branched off fresh `upstream/main` (Phases 0–4 all merged).
    - **D1 (R-D1) — opcode-structure gate.** `scripts/check-opcode-structure.sh`
      hard-fails (block) on a broken `AddrNorm.lean`↔`AddrNormAttr.lean` pairing
      (Lean forbids `register_simp_attr` in its declaring file — a real
      structural bug, holds tree-wide today). The four `OPCODE_TEMPLATE.md`
      essentials can't be blanket-required (only `DivMod` has all four; FullPath
      is the end-state composition), so the rest is an **advisory checklist**
      diff-scoped to *newly-added complex* opcode dirs (those adding `Compose/`
      or `LimbSpec/`): nudges for FullPath, `@[irreducible]` Post, `Offsets.lean`.
    - **D2 (R-D1) — naming scan (advisory).** `scripts/check-naming.sh` flags
      camelCase hypotheses (`hLt`) *newly added in the PR diff*, the PR #1497
      regression class. Diff-scoped + advisory because the tree already has
      thousands of legacy camelCase hypotheses — a full-tree gate would be pure
      noise. Never blocks.
    - **D3 (R-D1) — heartbeat allowlist (block).** `check-heartbeats-approved.sh`
      + `scripts/approved-heartbeat-overrides.txt` make heartbeats off-limits:
      a **dumb case-insensitive `grep heartbeats`** over ALL repo `.lean`
      (excluding `.lake/` dependency sources) + the lakefiles flags EVERY
      mention (overrides AND prose), each adjacent value of which must be
      sanctioned in the allowlist at its exact value (`-` for a non-numeric
      docstring mention). No Lean lexer — nothing to bypass (three review rounds
      each found a fresh hole in an earlier comment/string-aware scanner; the
      policy is simply "nobody touches heartbeats," so the gate matches it).
      Seeded green with the two grandfathered numeric overrides
      (`Swap/Spec.lean` 800000, `ALUProofs.lean` 4000000; each carries a
      relitigation bead, open question #3) + three sanctioned prose mentions.
    - **D4 (R-D1, REFRAMED) — layering gate (block).** `scripts/check-layering.sh`.
      The bootstrap's "EL must not import Rv64" is **false against the tree**
      (RLP spans both directions: `EL/RLP/*`→`Rv64` *and* `Rv64/RLP/Phase*`→`EL`),
      so it was reframed to the true, currently-clean invariants: **L1** the
      verified core never imports the unverified `Codegen` layer (fences the
      report §6 "codegen unverified by design" boundary) — scoped
      *core-by-default*, i.e. every `EvmAsm/` dir except `Codegen`/`Tests`/
      `Examples`, so a future new dir can't become an unscanned laundering hop —
      **L2** nothing imports the `Progress` registry, **L3** core never imports
      the `Tests`/`Examples` escape hatches (which may import Codegen), closing
      the last `core→Tests→Codegen` hop. Advisory
      rider: `Rv64`↛`Evm64` with the one grandfathered tactic-bridge edge
      (`Rv64/Tactics/LiftSpec.lean`) allowlisted.
    - **D5 (R-D1) — fitness-function reframing + dead-gate retirement.** AGENTS.md
      now documents the `check-*.sh` suite as architecture fitness functions
      (blocking vs advisory tiers tabulated). Retired the orphaned
      `scripts/check-unbounded-cps.sh` (unwired since `6e7ce6dec`; it matched
      only prose `cpsTriple` mentions and would red-line if wired — it policed
      nothing).
    - **D6 (R-C6) — churn report (advisory).** `scripts/churn-report.sh` (pure
      `git log`, no deps): top-churn `.lean`/`scripts` per window + short-lived
      churn. Runs on the weekly `quality-trends.yml` cadence (needs full history,
      off the per-PR path). Never gates.
    - **D7 (R-C6) — duplication watch (advisory).** `scripts/check-duplication.sh`
      + `scripts/jscpd.json` + `scripts/duplication-baseline.txt` (budget seeded
      2.93% over the default `scripts/` scope). Rule of Three: `codegen-*.sh` and
      concrete `Program(s).lean` excluded. Per-PR scope is `scripts/` only (a full
      ~2000-file `.lean` sweep exceeds 300s); the full sweep runs weekly in
      `quality-trends.yml`. Advisory now; `--gate` promotes after calibration
      (open question #2).
    - **D8 (R-F1) — ziskemu cycle instrumentation (schema + appender; live parse
      DEFERRED).** `scripts/cycles-append.sh` + the `cycles-history.jsonl` schema
      (git-derived commit/date, pinned EEST tag, nullable `steps`/`cycles`). The
      only path to validating the founding "better zkVM performance" claim. No
      working ziskemu this session → the log live-parse is guarded/deferred; the
      schema + appender land now so codegen scripts can start emitting (open
      question #4).
    - **D9 (R-F2) — proof-size + build-time trend.** `benchmark.yml`'s `lakeprof`
      job already records per-module build *time*; added the deterministic proof-
      *size* dimension (`.github/workflows/scripts/oleansize_collect.sh` merges
      top-N `.olean` byte sizes into the appended record). Size is noise-free
      where CI time is noisy, so a ballooning n=4 division proof surfaces as a
      monotone trend. Deliberate no-hard-threshold stance preserved — **never a
      heartbeat bump to force a proof green**.
    - **D10 (R-F3) — external export scaffold.** `.github/workflows/export-metrics.yml`
      (monthly) bundles `benchmark-history` + `cycles-history.jsonl` and uploads
      an artifact; pushes toward `eth-act/zkevm-benchmark-workload` only when
      `vars.METRICS_EXPORT_REPO` + `secrets.METRICS_EXPORT_TOKEN` are configured
      (no-op otherwise). Performance ≠ conformance — "proves X% of blocks" is
      never a verification signal.
    - **Block vs advisory (open question #1, decided):** BLOCK = opcode-structure
      pairing + heartbeat allowlist + layering. ADVISORY = naming + opcode
      checklist + churn + duplication + all cost/trend infra (D8–D10).
    - **Verifier-config hardening:** the new allowlists/configs
      (`approved-heartbeat-overrides.txt`, `duplication-baseline.txt`,
      `jscpd.json`) + `churn-report.sh`/`cycles-append.sh` + `.github/labeler.yml`
      were added to all four trusted-core path sets (CODEOWNERS,
      `is_trusted_core`, `is_verifier_config`, `labeler.yml`) so they can't be
      self-weakened without maintainer review.
    - **Open question (R-C2 enforcement strength, surfaced by review):** a
      gate-neutering edit — adding `continue-on-error`, deleting a blocking step,
      or editing a `check-*.sh` to `exit 0` — currently fails NO CI check:
      `check-statement-tamper.sh` runs **advisory** in `build.yml` (no `--strict`),
      so verifier-config edits are only *printed*, and enforcement rests on
      CODEOWNERS + the (repo-setting) Code-Owner ruleset + human review — the
      report's intended approve-by-exception model. Defense-in-depth option for
      the maintainer: run the tamper scan `--strict` over the
      `scripts/check-*.sh` + `.github/workflows/**` subset (the
      `[allow-verifier-change]` commit token is the escape hatch) so a
      self-neutering edit at least *fails a check*. Deliberately NOT changed here
      (it reverses a Phase-0 design choice and would gate every verifier-config
      PR); flagged for a maintainer decision.
    - **Rollout-wide still-deferred beads:** R-C4 `cycleBound`-binding, R-A3
      `coverRef` cover lemmas, R-E2 dual-path (needs ziskemu), D8 ziskemu cycle
      live-parse **+ its persistence/caller wiring** (`cycles-history.jsonl` is
      gitignored and not yet persisted to an orphan branch, so the cycles half
      of the export bundle is empty until ziskemu lands — mirror
      `benchmark-history` then), and the two grandfathered heartbeat-override
      relitigations.
- **Proof-ergonomics infra distilled from the purge** (see `GRIND.md` §7):
  - **`signExtend` simprocs + `signext` tactic** (`Rv64/SignExtendSimproc.lean`):
    `reduceSignExtend12/13/21` are `dsimproc_decl`s (definitional, kernel-checkable,
    *not* in default simp) that evaluate `signExtend1? <literal>` → its `Word`
    constant, including negative offsets (`-32#12`) the core `BitVec.reduceSignExtend`
    declines. The `signext` tactic then closes `addr + signExtend c = addr + K`
    / pure-address / standalone-eval goals via `bv_omega`. Replaces the bespoke
    `congr 1 <;> decide` / `rw [show … by decide]` idiom; `Exp/AddrNorm`'s
    `addrclose` now delegates to it (84 sites).
  - **`evmword_limb` grind/simp set** (`Evm64/Basic.lean` + `EvmWordLimbAttr.lean`):
    the `getLimb`/`fromLimbs`/`getLimbN` round-trips + bitwise-distribution lemmas
    are dual-tagged `@[evmword_limb, grind =]`, so `grind` discovers limb
    normalizations and `simp [evmword_limb]` normalizes limb expressions.
  - **`Rv64/BitAux.lean`** — shared `Word` bit-level helpers (`ushr63_bool`,
    `ult_zero_false`, `word_xor_zero`, `bv6_63_toNat`, …) that were previously
    copied as `private` decls into `SDiv`/`SMod` `Compose/Words.lean` (now
    imported from one place), plus 2-byte-alignment lemmas
    (`word_andn_one_of_even`, `word_add_even_and_one`/`_andn_one`, the latter
    two `@[grind →]`) that dedup the inline JALR-mask (`&&& ~~~1` / `&&& 1 = 0`)
    parity proofs; `Exp/AddrNorm.addrAligned` and the two `SDiv/Compose/Base`
    mask lemmas now delegate to them.
  - **Deferred:** factoring the 4 near-identical multi-limb two's-complement
    `…AbsWord …= -v` proofs (`SDiv`/`SMod` `Compose/Words.lean`) into one shared
    lemma is blocked by their 4 separate underlying definitions — a clean DRY
    needs unifying those defs into a shared `rippleNegWord` function (a larger
    refactor; the 4 proofs are correct/kernel-checkable as-is).
- **Execution Layer specs** (`EvmAsm/EL/`): Pure Lean specs for Ethereum
  data structures, independent of RISC-V. Currently:
  - `EL/RLP/` — RLP encoding/decoding with round-trip proofs (`decide`)
- **Byte-level infrastructure** (`ByteOps.lean`): `extractByte`/`replaceByte`
  algebra, `generic_lbu_spec` and `generic_sb_spec` CPS specs bridging
  byte-addressable operations to word-level separation logic assertions.
- **`divmod_addr` simp/grind set** (`Evm64/DivMod/AddrNorm.lean`, `AddrNormAttr.lean`,
  issue #263): Registers atomic `signExtend12`/`<<<`/`BitVec.toNat` evaluations
  on every concrete offset used in DivMod (33 signExtend12 offsets, 5 word
  shifts, 11 BitVec 6 toNat values) as `@[divmod_addr, grind =]` facts. The
  `divmod_addr` tactic macro closes address-arithmetic equalities grind-first,
  simp+bv_omega-fallback. First migration: 4 lemmas in `LoopComposeN3.lean`.
  Conventions, layout patterns, empirical justification, rules of thumb, and
  rollout roadmap are documented in `GRIND.md` (single source of truth for
  simp/grind-set conventions; `CLAUDE.md` and `AGENTS.md` link to it).
- **`rv64_addr` simp/grind set** (`Rv64/AddrNorm.lean`, `AddrNormAttr.lean`,
  GRIND.md Phase 3): Rv64-wide counterpart to `divmod_addr`. Registers ~47
  atomic facts (29 `signExtend13` evaluations + 19 `signExtend21` evaluations
  + `word_zero_add` / `word_add_zero` identities) as `@[rv64_addr, grind =]`.
  The `rv64_addr` tactic macro tries `grind` first and falls back to
  `simp only [rv64_addr, BitVec.add_assoc]; rfl` — subsumes the legacy
  `bv_addr`. Inline `rw [show signExtend1? N = <const> from by decide]`
  migration complete across DivMod / SignExtend / Shift / Byte (82 sites
  across 12 files under PRs #385 / #388 / #390 / #392 / #395).
- **`reg_ops` simp/grind set** (`Rv64/RegOps.lean`, `RegOpsAttr.lean`,
  GRIND.md Phase 5): Registers ~40 projection lemmas (`pc_set<F>`,
  `code_set<F>`, `getReg_setPC`, `getMem_set<F>`, `committed_*`,
  `publicValues_*`, `privateInput_*` + `_append{Commit,PublicValues}`)
  from `Basic.lean` with `@[reg_ops, grind =]` via `attribute` commands.
  The `reg_ops` tactic closes projection chains in one line. Inductive
  `*_writeWords` / `*_writeBytesAsWords` variants deliberately excluded
  to avoid grind-loop risk on open-ended lists.
- **Opcode-subroutine template** (`Evm64/OPCODE_TEMPLATE.md`, issue #313):
  Day-one conventions for the next opcode subtree — parallel
  `LimbSpec/` / `LoopDefs/` / `Compose/` layout, unified Bool/Fin dispatch
  from day one (no `<Opcode>Skip.lean` + `<Opcode>Addback.lean`
  intermediates), sibling-opcode (SMOD/ADDMOD) factoring, `@[irreducible]`
  bundling for ≥3 `let` bindings or >20-atom frames, named
  `Compose/Offsets.lean` with drift checks, per-opcode `AddrNorm` +
  `AddrNormAttr` files, `structure <Opcode>Valid` validity bundle,
  pre-opcode audit checklist, reviewer checklist. Linked from `AGENTS.md`.
- **`LoopDefs/{Iter,Post,Bundle}.lean` split** (`Evm64/DivMod/LoopDefs/`,
  issue #312): Monolithic 1,359-line `LoopDefs.lean` split into three
  focused sub-files — `Iter.lean` (pure `Word`/`Prop` computations),
  `Post.lean` (Assertion-valued postconditions), `Bundle.lean`
  (Assertion-valued preconditions). `LoopDefs.lean` reduced to a 16-line
  hub that re-exports the three sub-files, so every downstream
  `import EvmAsm.Evm64.DivMod.LoopDefs` works unchanged. Follow-on work
  on `LimbSpec.lean` (still 2,992 lines) pending.
- **File-size guardrail** (`scripts/check-file-size.sh`, issue #314): CI step
  enforcing per-file line caps (1200 for `Compose/**`, 1500 elsewhere; `Program.lean`
  exempt). Files may opt out with a `-- file-size-exception: <reason>` comment in
  the first 20 lines. 6 oversize files grandfathered with exception comments
  pointing to their tracking issues (#312, #283, #266). Documented in `AGENTS.md`
  ("File-size guardrail") and `CONTRIBUTING.md`.
- **LP64 Calling Convention** (`Evm64/CallingConvention.lean`): LP64-aligned
  calling convention for the x0–x12 register subset, per zkvm-standards.
  - x1 (ra) = return address, x2 (sp) = call stack (grows down, callee-saved)
  - x10-x11 (a0-a1) = args/return values, x12 (a2) = EVM stack pointer
  - Program snippets: `cc_ret`, `cc_prologue` (16-byte frame), `cc_epilogue`
  - Proved specs: `callNear_spec`, `callFar_spec`, `ret_spec`, `ret_spec'`,
    `cc_prologue_spec`, `cc_epilogue_spec`,
    `callNear_function_spec` (call+return round-trip),
    `nonleaf_function_spec` (prologue+body+epilogue composition)
  - All new subroutines (handlers, RLP, interpreter) should use this convention.
    The older DivMod ad-hoc convention (x2 as return address) is legacy.

---

## Pending: Recreate Deleted Spec Files

Five Evm64 spec files were deleted because their CodeReq migration was
incomplete (manual `cpsTriple_seq_perm_same_cr` calls lacked the `hd :
cr1.Disjoint cr2` argument added during the migration, and CR tree shapes
didn't match goals). The program definitions and tests remain in the
corresponding non-Spec files.

### Files to recreate (by priority)

#### ~~1. StackOps.lean — POP, PUSH0, DUP1-16, SWAP1-16~~ ✅ DONE

- **Files**: `Evm64/Pop.lean`, `Evm64/Push0.lean`, `Evm64/Dup.lean`, `Evm64/Swap.lean`
  (modular split; shared infra in `Stack.lean`)
- **Programs**: `evm_pop` (1 instr), `evm_push0` (5), `evm_dup(n)` (9), `evm_swap(n)` (16)
- **Specs**: All fully proved (0 sorry). Three-level hierarchy per opcode:
  low-level (explicit limbs) → EvmWord → stack (evmStackIs).
- **Pattern**: POP/PUSH0 use `CodeReq.ofProg` + `runBlock`. DUP/SWAP use
  explicit `CodeReq` union chains (symbolic `n` prevents `ofProg` whnf) with
  `runBlock` manual mode handling monotonicity via `buildMonoProof`'s
  union-split support. Per-limb helpers (`dup_pair_spec`, `swap_limb_spec`)
  use `runBlock` auto mode.
- **Shared infra** added to `Stack.lean`: `signExtend12_ofNat_small`,
  `evmStackIs_split_at`, `EvmWord.getLimb_zero`, `signExtend12_neg32`.

#### ~~2. ShiftSpec.lean — SHR per-limb, phase, body specs~~ ✅ DONE

- **Files**: `Evm64/Shift/LimbSpec.lean` (SHR per-limb + phase + body specs),
  `Evm64/Shift/Compose.lean` (`shrCode` + subsumption + composition),
  `Evm64/Shift/Semantic.lean` (stack-level `evm_shr_stack_spec`).
- **Status**: Fully proved (0 sorry). Per-limb helpers (`shr_merge_limb_spec`,
  `shr_last_limb_spec`, `shr_ld_or_acc_spec`, `shr_last_limb_inplace_spec`),
  phase specs (`shr_cascade_step_spec`, `shr_phase_c_spec`,
  `shr_phase_a_code_spec`), body specs (`shr_body_{0,1,2,3}_spec`), and
  zero path (`shr_zero_path_spec`) all recreated under the new
  `CodeReq` + `runBlock` conventions. Mirrors items #3 (ShlSpec) and
  #4 (SarSpec) below.

#### ~~3. ShlSpec.lean — SHL per-limb + body specs~~ ✅ DONE

- **Files**: `Evm64/Shift/ShlSpec.lean` (per-limb + body),
  `Evm64/Shift/ShlCompose.lean` (composition + bridge lemmas),
  `Evm64/Shift/ShlSemantic.lean` (stack-level `evm_shl_stack_spec`)
- **Bridge lemmas** in `Evm64/Basic.lean`: `getLimb_shiftLeft`,
  `getLimb_shiftLeft_eq_div`, `getLimb_shiftLeft_low` — connect per-limb body
  outputs to `getLimb (value <<< n)`, using `extractLsb'_split_64`
- **Composition**: mirrors SHR `Compose.lean` with `shlCode`, subsumption lemmas,
  zero-path specs (`evm_shl_zero_high_spec`, `evm_shl_zero_large_spec`),
  and body-path composition (`evm_shl_body_evmWord_spec`)
- **Stack-level spec**: `evm_shl_stack_spec` — zero axioms, zero sorry

#### ~~4. SarSpec.lean — SAR per-limb + body + sign-fill + composition + stack-level specs~~ ✅ DONE

- **Files**: `Evm64/Shift/SarSpec.lean` (per-limb + body + sign-fill),
  `Evm64/Shift/SarCompose.lean` (composition + bridge lemmas),
  `Evm64/Shift/SarSemantic.lean` (stack-level `evm_sar_stack_spec`)
- **Bridge lemmas** in `Evm64/Basic.lean`: `getLimb_sshiftRight_eq_ushiftRight`,
  `getLimb_sshiftRight_last`, `getLimb_sshiftRight_sign'`,
  `getLimb_sshiftRight_geq_256`, `getLimb_fromLimbs_const` — connect per-limb
  body outputs to `getLimb (sshiftRight value n)`
- **Composition**: mirrors SHR `Compose.lean` with `sarCode`, subsumption lemmas,
  sign-fill specs (`evm_sar_sign_fill_high_spec`, `evm_sar_sign_fill_large_spec`),
  SAR Phase C dispatch (`sar_phase_c_spec_pure`), and body-path composition
  (`evm_sar_body_evmWord_spec`)
- **Stack-level spec**: `evm_sar_stack_spec` — zero axioms, zero sorry
- **Key difference from SHR/SHL**: Sign-fill path (all limbs = `sshiftRight(v[3], 63)`)
  replaces zero-path; SRA instruction for MSB limb; sign extension for vacated limbs

#### ~~5. ByteSpec.lean — BYTE per-body + store + phase B specs~~ ✅ DONE

- **Files**: `Evm64/Byte/Spec.lean` (stack-level `evm_byte_stack_spec`, 3-way case split),
  `Evm64/Byte/LimbSpec.lean` (per-body + cascade dispatch specs),
  `Evm64/Byte/Program.lean` (45-instruction program + tests)
- **Specs**: `evm_byte_zero_high_spec`, `evm_byte_zero_geq32_spec`,
  `evm_byte_body_evmWord_spec`, `evm_byte_stack_spec` — all proved, 0 sorry
- **Pattern**: Uses `CodeReq.ofProg_mono_sub` for subsumption, cascade dispatch
  with frame and consequence rules, evmWordIs abstraction for stack-level spec

### General recreation guidelines

- Use `runBlock` auto mode wherever possible (handles CR extension, address
  normalization, and composition automatically).
- For manual compositions with different CRs, use `cpsTriple_seq_perm`
  with `(by crDisjoint)` for the `hd` argument, or extend to a common CR
  first and use `_same_cr` variants (`cpsTriple_seq_perm_same_cr`).
- All `_code` abbrevs should be `CodeReq` — prefer `CodeReq.ofProg base prog`
  over chains of `singleton.union`. See MultiplySpec.lean for the current pattern.
- Theorem statements use 5-arg `cpsTriple base exit cr P Q` with no
  `instrAt` atoms in P or Q.
- Reference the existing working specs (And.lean, Add.lean, MultiplySpec.lean,
  DivModSpec.lean) for the correct patterns.

---

## Roadmap: Phases 1-6 (Opcode Implementation)

All phases below target **Evm64** primarily. Files are under `EvmAsm/Evm64/`.

### ~~Phase 1: Complete Comparisons~~ — DONE

#### ~~1.1 SLT (Signed Less Than)~~ ✅
- **Files**: `Evm64/Comparison.lean` (helpers: `beq_eq_spec`, `beq_ne_spec`, `slt_msb_load_spec`) + `Evm64/Slt.lean`
- **Approach**: Compare MSB limbs (limb 3) with signed SLT instruction.
  If equal, fall through to unsigned borrow chain on lower 3 limbs.
  Uses `by_cases` on MSB equality for deterministic paths + `runBlock`.
- 25 instructions = 100 bytes. Also added `slt_spec_gen` to `SyscallSpecs.lean`.

#### ~~1.2 SGT (Signed Greater Than)~~ ✅
- **Files**: `Evm64/Comparison.lean` + `Evm64/Sgt.lean`
- **Approach**: SGT(a,b) = SLT(b,a). Swap operand load order (b-limbs into x7, a-limbs into x6).
- 25 instructions = 100 bytes. Mirrors SLT proof structure exactly.

### ~~Phase 2: Remaining Shifts & Bitwise~~ — DONE

> **Note**: Phases 2.1–2.3 were originally proved, then deleted in commit
> `1197924` due to incomplete CodeReq migration, then fully recreated.
> All specs are now proved with 0 sorry.

#### ~~2.1 SHL (Shift Left)~~ ✅
- **Files**: `Evm64/Shift/ShlSpec.lean` (per-limb + body), `Evm64/Shift/ShlCompose.lean`
  (composition + bridge lemmas), `Evm64/Shift/ShlSemantic.lean` (stack-level `evm_shl_stack_spec`)
- 90 instructions = 360 bytes. All specs proved, 0 sorry.

#### ~~2.2 SAR (Shift Right Arithmetic)~~ ✅
- **Files**: `Evm64/Shift/SarSpec.lean` (per-limb + body + sign-fill),
  `Evm64/Shift/SarCompose.lean` (composition + bridge lemmas),
  `Evm64/Shift/SarSemantic.lean` (stack-level `evm_sar_stack_spec`)
- 95 instructions = 380 bytes. All specs proved, 0 sorry.

#### ~~2.3 BYTE (Extract byte from word)~~ ✅
- **Files**: `Evm64/Byte/Spec.lean` (stack-level `evm_byte_stack_spec`),
  `Evm64/Byte/LimbSpec.lean` (per-body + cascade dispatch), `Evm64/Byte/Program.lean`
- 45 instructions = 180 bytes. All specs proved, 0 sorry.

### Phase 3: Stack Extensions

#### ~~3.1 DUP1-16 and SWAP1-16 (Generic)~~ ✅
- **Files**: `Evm64/Pop.lean`, `Evm64/Push0.lean`, `Evm64/Dup.lean`, `Evm64/Swap.lean`
- **Approach**: `evm_dup (n : Nat)` and `evm_swap (n : Nat)` as generic
  Lean functions producing `Program`. 9 instructions for DUP, 16 for SWAP.
  Full spec hierarchy: low-level (explicit limbs) → evmWordIs → evmStackIs.
  Added `signExtend12_ofNat_small` and `evmStackIs_split_at` to `Stack.lean`.
- Covers 34 opcodes (POP, PUSH0, DUP1-16, SWAP1-16) with one proof each. Fully proved.

#### 3.2 PUSH1-32
- **File**: `Evm64/StackOps.lean`
- **Approach**: Requires EVM bytecode parsing. Push immediate from EVM code
  region. Read 1-32 bytes from code[pc+1..pc+n], zero-extend to 256 bits,
  push onto stack.
- **Depends on**: EVM code region model (Phase 5.1)

### Phase 4: Remaining Arithmetic

#### ~~4.1 MUL (256-bit Multiply)~~ ✅
- **Files**: `Evm64/Multiply.lean` (program + 16 tests)
- **Approach**: Schoolbook 4×4 limb column-wise multiplication using RV64 MUL/MULHU.
  Column j processes b[j] × a[0..3-j]. After column j, result[j] is finalized.
  Carry detection via SLTU after ADD. Intermediate r[3] accumulator spilled to memory
  (reusing freed a-limb slots). Added `sltu_spec_gen_rd_eq_rs2` to SyscallSpecs.lean.
  Fixed operator precedence bug in rv64_mulhu/rv64_mulh/rv64_mulhsu (`>>>` binds tighter than `*`).
- 63 instructions = 252 bytes. All specs proved, 0 sorry.
  Manual-mode `runBlock` with column decomposition (col0: 21, col1: 23, col2: 13, col3: 5, epilogue: 1).
  Added `mul_spec_gen_rd_eq_rs1`, `mulhu_spec_gen_rd_eq_rs1`, `sltu_spec_gen_rd_eq_rs2` to SyscallSpecs.

#### 4.2 DIV and MOD — in progress (program + specs + composition in progress)
- **Files**: `Evm64/DivMod.lean` (program + tests), `Evm64/DivModSpec.lean` (CPS specs),
  `Evm64/DivModCompose.lean` (hierarchical composition)
- **Approach**: Knuth Algorithm D in base 2^64. 316 instructions total (21 phases
  + 49-instr div128 subroutine + NOP separator). DIV and MOD share 95% of code,
  differ only in epilogue (load quotient vs remainder).
- **Status**: 69 CPS specs proved in LimbSpec.lean (0 sorry). All building
  blocks for every phase. div128 subroutine fully specified in composable blocks
  including phase1, step1 (init+clamp_q1+prodcheck1), compute_un21, step2
  (init+clamp_q0+prodcheck2), end. Branch merge specs for BEQ/BLTU patterns.
  Composed per-limb specs: mulsub_limb (11 instrs), addback_limb (8 instrs),
  trial_load (12 instrs), store_qj (4 instrs).
  Hierarchical composition using progAt to avoid WHNF scaling limit:
  - `divCode`/`modCode` split `progAt base evm_div/evm_mod` into 14 per-phase progAt blocks
  - `divCode_mid` (12 blocks excl phaseA+zeroPath), `divCode_noAB` (12 blocks excl phaseA+phaseB)
  - `progAt_divK_phaseB_at32`: pre-normalized phaseB expansion (21 instrAt atoms at base+K offsets)

  **Completed compositions (0 sorry):**
  - `evm_div_bzero_spec` (b=0 path): phaseA → BEQ taken → zeroPath ✅
  - `evm_div_phaseA_ntaken_spec` (b≠0): phaseA body → BEQ ntaken → base+32 ✅
  - `evm_div_phaseB_n4_spec` (b[3]≠0): init1→init2→ADDI→BNE(taken)→tail, 16 instrs ✅
  - `evm_div_phaseAB_n4_spec` (b≠0, b[3]≠0): phaseA+phaseB composed, 24 instrs, base→base+116 ✅
  - `evm_mod_bzero_spec` (b=0 path): same as div but with modCode ✅
  - `evm_mod_phaseA_ntaken_spec` (b≠0): same as div but with modCode ✅

  - Phase B cascade variants ✅: n=3, n=2, n=1 all composed (0 sorry)
    - `evm_div_phaseB_n3_spec` (b[3]=0, b[2]≠0): 18 instrs, x5=b[2], n=3
    - `evm_div_phaseB_n2_spec` (b[3]=b[2]=0, b[1]≠0): 20 instrs, x5=b[1], n=2
    - `evm_div_phaseB_n1_spec` (b[3]=b[2]=b[1]=0): 21 instrs (full phaseB), x5=b[0], n=1
    - 5 singleton subsumption lemmas for cascade step instructions (indices 11-15 of phaseB)
  - CLZ (Count Leading Zeros) ✅: 24 instructions, 6-stage binary search
    - `divK_clz_spec` at base+116→base+212, `clzResult` function for postcondition
    - Combined stage specs avoid exponential branching via conditional postconditions

  - Phase C2 ✅: shift check cpsBranch (4 instrs, base+212)
    - `divK_phaseC2_ntaken_spec` (shift≠0 → normB), `divK_phaseC2_taken_spec` (shift=0 → copyAU)
  - NormB ✅: normalize divisor (21 instrs, base+228), `divK_normB_full_spec`
  - NormA ✅: normalize dividend (21 instrs + JAL, base+312→base+432), `divK_normA_full_spec`
  - CopyAU ✅: copy a[]→u[] (9 instrs, base+396), `divK_copyAU_full_spec`

  - LoopSetup ✅: cpsBranch (4 instrs, base+432)
    - `divK_loopSetup_ntaken_spec` (m≥0 → loop body), `divK_loopSetup_taken_spec` (m<0 → denorm)
  - DIV Epilogue ✅: load q[0..3] + store to output (10 instrs, base+1004), `divK_div_epilogue_spec`

  **Full path compositions:**
  - LoopBody (main Knuth D loop): 114 instructions at base+448
    - 20 sorry-free theorems in `LoopBody.lean` + N-specific variants in `LoopBodyN{1,2,3,4}.lean`
    - `intro_lets` tactic added for selective let-binding expansion (xperm scaling fix)
    - Per-case concrete specs were in `LoopBodyN{X}Concrete.lean` (removed, see semantic path below)
  - Per-n full specs: removed (existentially quantified computation results → not useful)
  - Stack-level b≠0 specs: TODO (needs semantic correctness bridge first)

  **Remaining work (semantic correctness):**
  - Multi-limb arithmetic foundations: `MultiLimb.lean` — half-word decomposition, rv64_divu/mulhu
    Nat-level correctness, val128/val256 representation, partial product decomposition (done)
  - Div128 mathematical foundations: `Div128Lemmas.lean` — half-word OR-combine, 128-bit Euclidean
    uniqueness, trial quotient bounds (q_true ≤ q̂ ≤ q_true + 2 when normalized) (done)
  - Multiply-subtract chain: `MulSubChain.lean` — carry/borrow propagation, 4-limb telescoping
    chain (`mulsub_chain_nat`), correction step (`mulsub_correction_eq`) (done)
  - Normalization: `Normalization.lean` — `norm_div_eq` (shifting preserves quotient),
    `norm_euclidean_bridge` (recover original q,r from normalized), `div_mod_no_overflow` (done)
  - Division bridge: `DivBridge.lean` — `bv_eq_of_nat_eq` (Nat eq → BitVec eq, auto no-overflow),
    `div_of_nat_euclidean` / `mod_of_nat_euclidean` (Nat Euclidean → EvmWord.div/mod), `div_from_mulsub` (done)
  - N=4 case lemmas: `DivN4Lemmas.lean` — quotient bound (≤1 when MSB set), q=0/q=1 subcases,
    MSB → hi32 normalization condition, val256 positivity (done)
  - CLZ correctness: `CLZLemmas.lean` — `clz_zero_imp_msb` (shift=0 → val ≥ 2^63),
    `msb_imp_clz_zero` (converse), `clzResult_fst_eq_zero_iff` (biconditional),
    algebraic proof via `clzStep` abstraction with stage bound chain (done)
  - Limb bridge: `DivLimbBridge.lean` — OR-reduce nonzero → val256 > 0 / fromLimbs ≠ 0,
    per-limb val256 lower bounds (n=1: ≥1, n=2: ≥2^64, n=3: ≥2^128, n=4: ≥2^192) (done)
  - Per-limb mulsub: `DivMulSubLimb.lean` — `mulhu_toNat_le` (MULHU ≤ 2^64-2),
    `mulsub_limb_nat_eq` (per-limb carry equation from register ops),
    `mulsub_carry_word_eq` (Word carry = Nat carry when < 2^64),
    `mulsub_4limb_euclidean_div` (4-limb chain → EvmWord.div/mod for single-digit quotient) (done)
  - Per-limb addback: `DivAddbackLimb.lean` — `addback_limb_nat_eq` (per-limb carry equation),
    `addback_4limb_val256` (4-limb addition chain), `addback_correction_euclidean`
    (mulsub underflow + addback → corrected Euclidean with q-1) (done)
  - Remainder bound: `DivRemainderBound.lean` — `remainder_lt_of_ge_floor` (key: Euclidean eq +
    overestimate → exact quotient + remainder < divisor), `mulsub_no_underflow_correct` (happy path),
    `mulsub_addback_correct` (addback path), `val256_euclidean_to_div_mod` (val256 → EvmWord bridge),
    `norm_euclidean_correct` (normalization round-trip) (done)
  - Quotient accumulation: `DivAccumulate.lean` — multi-iteration telescoping (`iter_accumulate_{2,3,4}`),
    val256 trailing-zero simplifications, per-n end-to-end `div_correct_n{1,2,3,4}_no_shift`,
    `div_quotient_of_normalized` / `mod_remainder_of_normalized` (shift bridge),
    `div_of_val256_eq_div` / `mod_of_val256_eq_mod` (val256 → EvmWord),
    `div_correct_normalized` / `mod_correct_normalized` (combined normalization bridge) (done)
  - Mulsub carry strict bound: `DivMulSubCarry.lean` — `mulsub_limb_carry_strict_lt` (per-limb carry
    always < 2^64, proven via case analysis on MULHU maximum), `mulsub_limb_word_carry_eq` (Word carry
    = Nat carry, unconditional), `mulsub_limb_nat_word_eq` (per-limb equation with Word carry_out),
    `mulsub_register_4limb_val256` (4-limb register ops → val256 Euclidean equation) (done)
  - Addback carry bridge: `DivAddbackCarry.lean` — `or_toNat_eq_add_of_le_one` (OR = ADD for {0,1}
    Words), `addback_carries_exclusive` (two overflow flags can't both fire), `addback_limb_nat_word_eq`
    (per-limb addback with OR carry), `addback_register_4limb_val256` (4-limb addback → val256) (done)
  - Stack spec bridge: `DivLimbBridge.lean` — `ne_zero_iff_getLimbN_or` (EvmWord nonzero ↔ limbs OR
    nonzero), `getLimbN_fromLimbs_match` / `getLimbN_fromLimbs_{0,1,2,3}` (fromLimbs round-trip for
    reconstructing evmWordIs from individual memory cells) (done)
  - **Semantic correctness path:**
    - Step 1: Make `loopBodyPostN{1,2,3,4}` parametric — move output values to definition
      parameters so per-case concrete specs can fill them in concretely.
      Status: ✅ Done (PRs #197 + #202)
    - Step 2: Per-n loop iteration cpsTriple specs using `divK_store_loop_j0_spec` (j=0)
      and `divK_store_loop_jgt0_spec` (j>0). Four raw specs per (n, j) pair
      (max_skip, max_addback, call_skip, call_addback), then unified skip/addback
      into `divK_loop_body_nX_{max,call}_unified_jY_spec`.
      Status:
        - ✅ n=4: j=0 all 4 paths done (`LoopIterN4.lean`)
        - ✅ n=3: j=0 all 4 paths + j=1 all 4 paths + unified specs (`LoopIterN3.lean`, `LoopComposeN3.lean`)
        - ✅ n=2: j=0,j=1,j=2 all 4 paths + unified specs (`LoopIterN2.lean`, `LoopComposeN2.lean`)
        - ✅ n=1: j=0,j=1,j=2,j=3 all 4 paths + unified specs (`LoopIterN1.lean`, `LoopComposeN1.lean`)
    - Step 2b: **Bool-parameterized loop composition** (Issue #262, PRs #267–#272).
      Unifies max/call branch paths via `(bltu : Bool)` parameter so that
      2^k path combinations collapse to 1 theorem.
      Status:
        - ✅ Unified defs: `iterN3`, `iterN2`, `loopIterPostN3`, `loopN3UnifiedPost`, `loopN2UnifiedPost` (`LoopDefs.lean`)
        - ✅ n=3 unified 2-iteration composition: `divK_loop_n3_unified_spec (bltu_1 bltu_0 : Bool)` (`LoopUnifiedN3.lean`)
        - ✅ n=3 unified preloop+loop: `evm_div_n3_preloop_loop_unified_spec` (`Compose/FullPathN3LoopUnified.lean`)
        - ✅ n=2 unified 3-iteration composition: `divK_loop_n2_unified_spec (bltu_2 bltu_1 bltu_0 : Bool)` (`LoopUnifiedN2.lean`)
          Layered: iter10 (4 cases) → max/call+iter10 (2 lemmas) → unified dispatch
        - ✅ n=1 unified 4-iteration composition: `divK_loop_n1_unified_spec (bltu_3 bltu_2 bltu_1 bltu_0 : Bool)` (`LoopUnifiedN1.lean`)
          5-layer composition: iter10 → max/call_iter10 → iter210 → max/call_iter210 → unified
        - `iterN2Max`/`iterN2Call` marked `@[irreducible]` to prevent stuck if-reduction in projections
        - Unified condition predicates: `isTrialN3_j1/j0`, `isTrialN1_j3/j2/j1/j0`
      Issue #262 is **complete** — Bool unification achieved for all n-values (n=1,2,3; n=4 trivial).
    - Step 3: Per-n full-path composition theorems (base→base+1064) with bundled postconditions.
      Composes pre-loop (normalization) + loop body + post-loop (denorm/epilogue).
      Status:
        - ✅ n=4 shift≠0: `evm_div_n4_full_{max,call}_{skip,addback}_spec` (`FullPathN4.lean`)
        - ✅ n=4 shift=0: `evm_div_n4_full_shift0_call_{skip,addback}_spec` (`FullPathN4Shift0.lean`)
        - ✅ n=3 shift≠0: 4 full-path theorems (`FullPathN3Loop.lean`) — can be replaced by unified version
        - ✅ n=3 shift=0: 2 full-path theorems (`FullPathN3Shift0.lean`)
        - ✅ n=2 shift≠0: unified full-path `evm_div_n2_full_unified_spec` (`FullPathN2Full.lean`, `FullPathN2Cases.lean`)
          8 per-case lemmas + unified dispatch via `delta + rfl` postcondition bridge
        - ✅ n=2 shift=0: unified full-path `evm_div_n2_full_shift0_unified_spec` (`FullPathN2Shift0.lean`)
          j=2 always call (u4=0 < b1), unified over (bltu_1 bltu_0 : Bool) for 4 combinations
        - ✅ n=1 shift≠0: unified full-path `evm_div_n1_full_unified_spec` (`FullPathN1Full.lean`)
          16-case denorm_comp with parametric denorm' helper, all 16 Bool combinations
        - ✅ n=1 shift=0: unified full-path `evm_div_n1_full_shift0_unified_spec` (`FullPathN1Shift0.lean`)
          j=3 always call (u_top=0 < b0), unified over (bltu_2 bltu_1 bltu_0 : Bool) for 8 combinations
      All n-values complete. Next:
        - MOD variants: factor shared DIV/MOD loop to avoid duplication (Issue #266)
    - Step 4: Semantic correctness bridge — connect algorithm computations to `EvmWord.div`.
      Infrastructure exists: `div_correct_n4_no_shift`, `remainder_lt_of_ge_floor`,
      `mulsub_no_underflow_correct`, `mulsub_addback_correct`, `mulsubN4_val256_eq`.
      Partial progress:
        - ✅ Max trial overestimate: `val256_div_lt_pow64` — when b3≠0, val256(a)/val256(b) ≤ 2^64-1
        - ✅ Skip path correctness: `n4_max_skip_correct` — c3=0 + max trial → EvmWord.div correct
        - **Missing math theorem (Knuth's Theorem B)**: for the addback and call paths, need:
          1. **Mulsub borrow bound**: prove that `mulsubN4` borrow c3 has `c3.toNat ≤ 1`
             when the trial quotient overestimates by ≤ 1 (i.e., q_hat ≤ ⌊u/v⌋ + 1).
             This ensures the 2^256 terms cancel in the mulsub+addback combined equation.
          2. **Call path trial quotient overestimate**: prove that `div128Quot u_top u3 v3`
             produces a quotient q̂ satisfying `⌊u/v⌋ ≤ q̂ ≤ ⌊u/v⌋ + 1` when the divisor's
             leading limb has its MSB set (normalized). This is the formal version of
             Knuth TAOCP Vol 2 §4.3.1 Theorem B.
          3. **Addback combined equation**: given c3=1 (borrow) and carry=1 (addback carry),
             derive `val256(a) = (q_hat-1) * val256(b) + val256(aun)` from `mulsubN4_val256_eq`
             + `addbackN4_val256_eq`.
      Status: In progress (`DivN4Overestimate.lean`). This is independent of Steps 2-3 and can
      proceed in parallel. Once done for n=4, the bridge generalizes to n=1,2,3 via the same
      `div_correct_normalized` framework.
    - Step 5: Stack-level spec using `evmWordIs`. Case-split on b=0/≠0, then on n,
      apply full-path spec + semantic bridge to prove `evmWordIs (sp+32) (EvmWord.div a b)`.
      Status: Not started (blocked on Steps 3+4 for all n values)

  **Path to EVM-level DIV/MOD specs (summary):**
  1. ✅ Complete n=2 loop composition with Bool unification (PRs #270–#272)
  2. ✅ Complete n=2 full-path composition (PRs #274–#277)
  3. ✅ Complete n=1 loop iteration specs + Bool-unified composition (PRs #282–#286)
  4. ✅ Complete n=1 + n=2 shift=0 and shift≠0 full-path compositions (PRs #280, #288, #289)
  5. Complete Knuth's Theorem B (Step 4) — can proceed in parallel
  5. Per-n semantic bridge: connect full-path postconditions to `EvmWord.div`/`EvmWord.mod`
  6. Stack-level spec: case-split b=0/≠0, then on n, compose full-path + semantic bridge
  7. Factor shared DIV/MOD loop (Issue #266) to derive MOD specs from DIV proofs

  **V5 track — `divK_div128_v5` migration (in progress, bead `evm-asm-wbc4i.6`):**
  The executable `evm_div`/`evm_mod` were switched to `divK_div128_v4` on
  2026-05-19 (PR #4992). The **v5** track replaces the inner div128 subroutine
  with `divK_div128_v5` — the *capped* Knuth Algorithm D variant that repairs
  v4's two buggy ULTs by clamping the trial quotients `q1c`/`q0c` at `2^32 − 1`
  — and rebuilds the spec layer against it for full-domain (n=4) correctness.
  Files under `Evm64/DivMod/`:
  - **Shared code surfaces** (`Compose/V5Code.lean`): `sharedDivModCode_v5`
    and `sharedDivModCodeNoNop_v5` mirror the v4 surfaces (`Compose/Base.lean`,
    `Compose/V4NoNop.lean`), swapping in `divK_div128_v5` at block 12. These are
    the `CodeReq`s for `div128_v5_spec_shared` / `div128_v5_spec_shared_noNop`.
    Kept in a dedicated file because `Base.lean` is at its size cap.
  - **Block specs** (`LimbSpec/Div128*V5.lean`): `Div128CapV5` (cap clamp),
    `Div128Phase1bV5` (prodcheck1;;prodcheck1b body), `Div128Phase2bV5`
    (phase2b body + the two D3 spill blocks), `Div128Step1V5` / `Div128Step1FullV5`
    (step1 prefix+guard+full block spec), `Div128Step2V5` / `Div128Step2FullV5`
    (step2 prefix init+cap_q0+guard+full block spec). Built up over the
    V5.6.5–V5.6.12 commit series.
  - **n=4 dispatcher predicates + runtime chain** (`Spec/CallAddbackV5.lean`,
    `Spec/CallAddbackRuntimeV5.lean`, `LoopBody/TrialCallV5.lean`,
    `LoopDefs/IterV5.lean`): mirror the v4 `CallAddback` dispatcher predicates
    and the `hq_over`-assuming n4 runtime chain to v5.
  - **Proof-scaling discipline**: V5 follows the fold-before-`xperm` guidance in
    `AGENTS.md` — `@[irreducible]` `…PostCore` / `…PostFromBody` helpers behind
    the postcondition spine, refold via `change` in branch bridges, named lemmas
    for fresh heartbeat budgets. See `GRIND.md` for the simp/grind conventions.

Before starting **any** of the remaining arithmetic opcodes below (SDIV,
SMOD, ADDMOD, MULMOD, EXP), read
[`EvmAsm/Evm64/OPCODE_TEMPLATE.md`](EvmAsm/Evm64/OPCODE_TEMPLATE.md) —
it codifies the day-one conventions distilled from the DivMod retrofit
experience (parallel `LimbSpec/` / `LoopDefs/` / `Compose/` layout,
unified Bool/Fin dispatch from day one, sibling-opcode factoring,
`@[irreducible]` bundling thresholds, named `Compose/Offsets.lean`,
per-opcode `AddrNorm` grindset, `structure <Opcode>Valid` validity
bundle). Tracked by issue #313.

#### 4.3 SDIV and SMOD (Signed)
- **Approach**: Check signs, compute unsigned div/mod, apply sign correction.
- **Per OPCODE_TEMPLATE.md**: SMOD is a sign-sibling of SDIV; layout the
  files with a shared body + per-sibling epilogue split from the first PR
  (do not copy DIV's retrofit-style parallel MOD clone).

#### 4.4 ADDMOD and MULMOD
- **Approach**: ADDMOD needs 257-bit intermediate (carry). MULMOD needs
  512-bit intermediate. Both reuse DIV/MOD.

#### 4.5 EXP (Exponentiation)
- **Approach**: Square-and-multiply using MUL. Loop over exponent bits.

#### ~~4.6 SIGNEXTEND~~ ✅
- **Files**: `Evm64/SignExtend/` — `Program.lean` (program + 16 tests), `LimbSpec.lean` (per-body + phase A/B/C specs),
  `Compose.lean` (subsumption + no-change + body path composition), `Spec.lean` (stack-level `evm_signextend_stack_spec`)
- **Approach**: If b >= 31, result = x. Else compute limb_idx = b/8, shift_amount = 56 - (b%8)*8.
  Cascade dispatch to body_N: SLL+SRA sign-extends target limb in-place, SRAI fills higher limbs.
  Shares Phase B computation with BYTE opcode. `EvmWord.signextend` definition + per-limb bridge lemmas in `EvmWordArith.lean`.
- 48 instructions = 192 bytes. All specs proved, 0 sorry. Axiom-clean.

### Phase 5: Memory & Code Region

#### 5.1 EVM Code Region Model
- **File**: `Evm64/CodeRegion.lean` (new)
- **Approach**: Define EVM bytecode as a byte array in RISC-V memory.
  Use LBU for byte access. Define `evmCodeIs(base, bytes)` assertion.
  Needed for PUSH1-32, and for the interpreter loop (Phase 7).

#### 5.2 EVM Memory Model
- **File**: `Evm64/Memory.lean` (new)
- **Approach**: EVM memory as a byte-addressable region in RISC-V memory.
  Use LB/SB/LBU for byte access. Define `evmMemIs` assertion.
  Zero-initialized, auto-expanding (model fixed max size initially).

#### 5.3 MLOAD, MSTORE, MSTORE8, MSIZE
- **File**: `Evm64/Memory.lean`
- **Approach**: MLOAD pops offset, loads 32 bytes, pushes word.
  MSTORE pops offset+value, stores 32 bytes. MSTORE8 stores 1 byte.
  MSIZE pushes current memory size (track in register or memory).

### Phase 6: Environment & Block Context

#### 6.1 Environment Context Layout
- **File**: `Evm64/Environment.lean` (new)
- **Approach**: Memory layout for EVM execution context:
  - msg.caller, msg.value, msg.data (calldata)
  - block.number, block.timestamp, block.basefee, etc.
  - tx.origin, tx.gasprice, chainid
  Store at known base address. Define `envIs` separation logic assertion.

#### 6.2 Simple Environment Opcodes
- ADDRESS, CALLER, CALLVALUE, ORIGIN, GASPRICE, COINBASE, TIMESTAMP,
  NUMBER, CHAINID, BASEFEE, SELFBALANCE, CODESIZE, RETURNDATASIZE
- Each is LD × 4 from environment region + push to stack.

#### 6.3 CALLDATALOAD, CALLDATASIZE, CALLDATACOPY
- Load from calldata region in environment.

---

## Execution Layer Prerequisites

The STF (Phase 11) reads RLP-encoded blocks via `read_input`. These
prerequisites provide the pure spec and RISC-V infrastructure for that.

### EL.1 RLP Specification ✅
- **Files**: `EvmAsm/EL/RLP/Basic.lean`, `Decode.lean`, `Properties.lean`
- `RLPItem` type (bytes | list), `encode`, `decode` with canonical enforcement
- 17 kernel-verified properties via `decide` (round-trip, spec conformance)
- 0 sorry, 0 axioms
- ✅ **General byte-string round-trip (parametric, not `decide`)** (`Properties.lean`).
  `decode_encode_bytes (data) (hlen : data.length < 256^8) :
  decode (encode (.bytes data)) = some (.bytes data, [])` — every leaf `RLPItem`
  re-decodes to itself, for all lengths the decoder's 8-byte length field
  supports. Short form (`≤ 55`) is unconditional. Supporting new foundations:
  `Nat.fromBytesBE_toBytesBE` (big-endian round-trip), `Nat.fromBytesBE_snoc`,
  `Nat.toBytesBE_succ`, `Nat.toBytesBE_length_le` (`len < 256^k → length ≤ k`),
  `Nat.toBytesBE_eq_cons_of_pos` (nonzero leading byte / no-leading-zero),
  `readLength_toBytesBE_append`, `encodeBytes_long_of_length`. All Lean-core
  (no Mathlib): `Nat.toBytesBE.induct`, `omega`, `List.take_left`/`drop_left`.
  Axiom-clean (propext/Classical.choice/Quot.sound), 0 sorry. This is the leaf
  half of the round-trip keystone.
- ✅ **Full round-trip keystone (`decode (encode item) = some (item, [])`)**
  (`Properties.lean`). `decode_encode (item) (h : (encode item).length < 256^8)`
  re-decodes *every* `RLPItem` (lists + nesting) to itself; `decodeFully_encode`
  discharges `FullDecode.decodeFully_encode_of_decode_encode`'s `h_roundtrip`
  hypothesis, making full decode of any (length-bounded) encoded item
  unconditional. Key idea: both `decodeAux`/`decodeItems` recurse on the fuel
  `nDepth`, so `decode_encode_mutual` is proved by a single **step induction on
  `nDepth`** (an `decodeAux`-on-`encode` statement ∧ a `decodeItems`-on-
  `encodeItems` statement) — the IH at `nDepth-1` covers all sub-items, so no
  induction on `RLPItem` is needed. Reuses the leaf round-trip via the new
  fuel-parametric `decodeAux_succ_encodeBytes_append`, plus `encode_list_short`/
  `_long`, `decodeItems_succ_of_ne_nil`, `takeBytes_append_length`,
  `le_encodeBytes_length`. A single top-level `< 256^8` bound implies every
  nested bound. Axiom-clean, 0 sorry. **The RLP encode→decode round-trip is now
  complete end-to-end.**
- ✅ **Decodability part 1 — injectivity + canonical bijection + quasi-encoding
  rejection** (`Properties.lean`). Informed by Coglio's ACL2 RLP work
  (arXiv:2009.13769), whose headline results are the encode/decode *mutual
  inverses*, injectivity, and rejection of non-canonical "quasi-encodings".
  Added: `encode_injective` (`encode i₁ = encode i₂ → i₁ = i₂`, a corollary of
  the round-trip); `Nat.toBytesBE_fromBytesBE_of_canonical` (the canonical
  inverse of the big-endian bijection — `toBytesBE (fromBytesBE bs) = bs` for a
  no-leading-zero `bs`, the foundation the right-inverse needs) +
  `Nat.fromBytesBE_pos_of_head_ne_zero`; and a documented **quasi-encoding
  rejection** section (the five ACL2 forms, each pointed at its rejecting lemma
  with `decide` cross-checks). **Now imports Mathlib** (`List.reverseRecOn`,
  `positivity`) rather than re-deriving list/arithmetic facts. Axiom-clean
  (classical), 0 sorry.
- ✅ **Decodability part 2 — the full right inverse** (`Properties.lean`).
  `decode_eq_some_imp_encode : decode bs = some (item, rest) → bs = encode item ++ rest`
  — whatever `decode` accepts re-encodes to exactly the bytes consumed, so the
  decoder accepts *only* canonical encodings (the ACL2
  `rlp-encode-tree-of-rlp-parse-tree` analogue, the property whose failure hid a
  real decoder bug in Coglio's work). Proved by `decode_right_inverse_mutual`
  (step induction on the fuel `nDepth`, same shape as `decode_encode_mutual`):
  byte classes are standalone `decodeAux_{singleByte,shortBytes,longBytes}_right_inv`
  lemmas; list classes recurse through the IH. Reconstruction helpers
  `takeBytes_eq_some_imp`, `readLength_eq_some_imp` (the latter exposing the
  canonical length field via `toBytesBE_fromBytesBE_of_canonical`). Capstone
  `decodeFully_eq_encode : decodeFully bs = some item ↔ bs = encode item`
  (within the 8-byte bound) — full decode is *exactly* the inverse of `encode`.
  Axiom-clean (classical), 0 sorry. **RLP decodability is now complete: both
  inverses + injectivity + quasi-encoding rejection, to parity with the ACL2
  formalization.**
- ✅ **Self-delimiting encoding / prefix-unambiguity** (`Properties.lean`).
  `decode_encode_append` (left inverse with arbitrary trailer), `encode_append_cancel`
  (`encode i₁ ++ r₁ = encode i₂ ++ r₂ → i₁ = i₂ ∧ r₁ = r₂`), and
  `encode_prefix_unambiguous` (`encode i₁ <+: encode i₂ → i₁ = i₂`, the ACL2
  `rlp-encode-tree-unamb-prefix` analogue — no valid encoding is a proper prefix
  of another). All fall out of the two inverses. Axiom-clean, 0 sorry.
- ✅ **RLP scalar layer** (`EvmAsm/EL/RLP/Scalar.lean`, ACL2 `rlp-encode-scalar`).
  `encodeScalar n := encodeBytes (Nat.toBytesBE n)` (nonneg int → minimal
  big-endian bytes → byte string; nonces/balances/gas/lengths) and `decodeScalar`.
  `decodeScalar_encodeScalar` (round-trip), `encodeScalar_injective`, and
  `decodeScalar_eq_some_imp` (right inverse) — all reuse `decode_encode_bytes` +
  the `Nat.fromBytesBE/toBytesBE` bijection. Axiom-clean, 0 sorry. **Pure-spec RLP
  is now at full ACL2 parity (tree + scalar, both inverse directions).**

### EL.2 Byte-Level Infrastructure ✅
- **File**: `EvmAsm/Rv64/ByteOps.lean`
- `extractByte`/`replaceByte` algebra (round-trip, independence, overwrite)
- `generic_lbu_spec`: CPS spec for LBU in terms of `extractByte` on containing dword
- `generic_sb_spec`: CPS spec for SB in terms of `replaceByte` on containing dword

### EL.3 RLP RISC-V Decoder (in progress)
- **Files**: `EvmAsm/Rv64/RLP/`
- Phase 1: Prefix classifier (cascade BLTUs, 5 exits) — ✅ all three variants landed
  - `rlp_phase1_step_spec` (per-step with pure ult fact),
    `rlp_phase1_step_spec_plain` (strips pure facts),
    `rlp_phase1_step_spec_acc` (frames with accumulator, merges into single `⌜Acc ∧ …⌝`).
  - `rlp_phase1_classifier_spec` — plain 5-exit `cpsNBranch` at boundaries
    0x80, 0xB8, 0xC0, 0xF8 (no dispatch facts).
  - `rlp_phase1_classifier_spec_pure` — per-step dispatch facts at each
    exit (`⌜ult v5 k_i⌝` for taken, `⌜¬ ult v5 k4⌝` for fall-through).
  - `rlp_phase1_classifier_spec_acc` — full accumulated-chain variant:
    each exit carries the complete conjunction of prior `¬ult` facts plus
    (for taken exits) the current `ult` fact. Enables downstream range
    proofs like `0x80 ≤ p < 0xB8` at exit `e2`.
- Phase 2: Length extraction — ⏳ short form + long-form accumulation step
  - `rlp_phase2_short_length_spec` (`EvmAsm/Rv64/RLP/Phase2Short.lean`):
    one-instruction `ADDI x11, x5, -k` extractor for short byte strings
    (k = 0x80) and short lists (k = 0xC0). Concrete tests verify
    0x85 → 5, 0xB7 → 55, 0xC3 → 3, 0x80 → 0 via `decide`.
  - `rlp_phase2_long_acc_spec` (`EvmAsm/Rv64/RLP/Phase2LongAcc.lean`):
    two-instruction `SLLI x11, x11, 8 ; ADD x11, x11, x12` big-endian
    accumulation core of the long-form length-of-length loop. Post:
    `x11 ← (len <<< 8) + byte`.
  - `rlp_phase2_long_load_acc_spec` (`EvmAsm/Rv64/RLP/Phase2LongLoad.lean`):
    three-instruction `LBU x12, x13, 0` prefix over the accumulation
    step. Reads one byte from `mem[x13]` and folds it into `x11`.
  - `rlp_phase2_long_iter_spec` (`EvmAsm/Rv64/RLP/Phase2LongIter.lean`):
    five-instruction full loop body (no back-branch) adding pointer
    advance (`ADDI x13, x13, 1`) and counter decrement
    (`ADDI x14, x14, -1`) on top of load-accumulate.
  - `rlp_phase2_long_loop_body_spec`
    (`EvmAsm/Rv64/RLP/Phase2LongLoopBody.lean`): six-instruction loop
    body as a `cpsBranch` — iteration body + `BNE x14, x0, back`.
    Taken at `(base+20) + signExtend13 back` with `⌜cnt' ≠ 0⌝`; fall-
    through at `base + 24` with `⌜cnt' = 0⌝`.
  - `rlp_phase2_long_loop_one_byte_spec`
    (`EvmAsm/Rv64/RLP/Phase2LongLoopOne.lean`): single-iteration
    closure (lenLen = 1). When `x14 = 1` at entry, the taken branch is
    unreachable (`cnt' = 0`), so the `cpsBranch` collapses to a plain
    `cpsTriple` exiting at `base + 24`.
  - `rlp_phase2_long_loop_two_byte_spec`
    (`EvmAsm/Rv64/RLP/Phase2LongLoopTwo.lean`): two-iteration closure
    (lenLen = 2). Composes the body spec (iter 1, BNE taken) with the
    one-byte closure (iter 2, fall-through) via
    `cpsTriple_seq_perm_same_cr`. Assumes both bytes live in the
    same doubleword.
  - `rlp_phase2_long_loop_three_byte_spec`
    (`EvmAsm/Rv64/RLP/Phase2LongLoopThree.lean`): three-iteration
    closure (lenLen = 3). Composes body spec (iter 1) with two-byte
    closure (iters 2–3). All three bytes assumed in same doubleword.
  - `rlp_phase2_long_loop_four_byte_spec`
    (`EvmAsm/Rv64/RLP/Phase2LongLoopFour.lean`): four-iteration
    closure (lenLen = 4). Composes body spec (iter 1) with three-byte
    closure (iters 2–4). All four bytes assumed in same doubleword.
  - `rlp_phase2_long_loop_five_byte_spec`
    (`EvmAsm/Rv64/RLP/Phase2LongLoopFive.lean`): five-iteration
    closure (lenLen = 5). Composes body spec (iter 1) with four-byte
    closure (iters 2–5). All five bytes assumed in same doubleword.
  - `rlp_phase2_long_loop_six_byte_spec`
    (`EvmAsm/Rv64/RLP/Phase2LongLoopSix.lean`): six-iteration
    closure (lenLen = 6, prefixes `0xBD` / `0xFD`). Composes body
    spec (iter 1) with five-byte closure (iters 2–6). All six bytes
    assumed in same doubleword (`byteOffset ptr ≤ 2`).
  - `rlp_phase2_long_loop_seven_byte_spec`
    (`EvmAsm/Rv64/RLP/Phase2LongLoopSeven.lean`): seven-iteration
    closure (lenLen = 7, prefixes `0xBE` / `0xFE`). Composes body
    spec (iter 1) with six-byte closure (iters 2–7). All seven bytes
    assumed in same doubleword (`byteOffset ptr ≤ 1`).
  - `rlp_phase2_long_loop_eight_byte_spec`
    (`EvmAsm/Rv64/RLP/Phase2LongLoopEight.lean`): eight-iteration
    closure (lenLen = 8, prefixes `0xBF` / `0xFF` — the maximum
    permitted by RLP). Composes body spec (iter 1) with seven-byte
    closure (iters 2–8). All eight bytes assumed in same doubleword
    (`byteOffset ptr = 0`, i.e., `ptr` is doubleword-aligned).
    Single-doubleword unrolling track now complete.
  - General `n`-iteration closure (induction over `cnt`) still pending
    (initial attempt hit Lean-level issues around
    `BitVec.ofNat 64 n` arithmetic and associativity normalization;
    unrolling is catching up in the meantime).
- Phase 3: Single-item flat decode (byte strings only) — ⏳ scaffolding
  - `rlp_phase3_single_byte_spec` (`EvmAsm/Rv64/RLP/Phase3SingleByte.lean`):
    one-instruction `ADDI x11, x0, 1` that materializes `length = 1` for
    Phase 1's `e1` exit (prefix byte `< 0x80`, single-byte string —
    the prefix IS the data). The data pointer in `x13` rides through
    as a frame atom; no pointer advance is needed.
  - `rlp_phase3_long_string_spec` (`EvmAsm/Rv64/RLP/Phase3LongString.lean`):
    three-instruction entry block for Phase 1's `e3` exit
    (`p ∈ [0xB8, 0xC0)`). Sets `x14 = p − 0xB7` (length-of-length;
    range [1, 8]), clears the length accumulator `x11 := 0`, and
    advances the data pointer `x13 += 1` to the first length byte —
    leaving the machine in the canonical pre-loop state expected by
    the `rlp_phase2_long_loop_*_byte_spec` family.
  - `rlp_phase1_e3_0xB8_one_byte_length_spec`
    (`EvmAsm/Rv64/RLP/Phase1E3LongStringOne.lean`): concrete full path
    for the smallest long-string prefix (`0xB8`). Composes Phase 1 e3
    classification, Phase 3 long-string entry, and the one-byte Phase 2
    length loop. Postcondition gives the zero-copy output pair:
    `x11 = payload_length_byte`, `x13 = payload_start`.
  - ✅ **Long-form single-doubleword full paths complete (lenLen 1–8).**
    Mirroring the `0xB8` one-byte path, every long-form prefix now has an
    end-to-end Phase 1 → Phase 3 → Phase 2 composition:
    - Long byte-strings (e3, prefixes `0xB8`–`0xBF`):
      `rlp_phase1_e3_0x{B8..BF}_{one..eight}_byte_length_spec_within`
      (`Phase1E3LongString{One..Eight}.lean`).
    - Long lists (e5, prefixes `0xF8`–`0xFF`):
      `rlp_phase1_e5_0x{F8..FF}_{one..eight}_byte_length_spec_within`
      (`Phase1E5LongList{One..Eight}.lean`).
    Each composes the proven `rlp_phase1_e{3,5}_full_path_spec'_within`
    entry with the matching `rlp_phase2_long_loop_{N}_byte_spec_within`
    closure via `cpsTripleWithin_seq`; the postcondition restates the
    big-endian length by reference to that closure's `…_post`. All
    16 theorems are axiom-clean (only `propext`/`Classical.choice`/
    `Quot.sound`), 0 sorry. Wired into the `EvmAsm.Rv64.RLP` umbrella.
  - ✅ **Long-form length bridged to the pure spec (`Nat.fromBytesBE`).**
    The full-path outputs above leave `x11` as a raw big-endian
    accumulation; `Phase2LongLengthBridge.lean` proves it equals the value
    the pure RLP spec decodes. Core: `rlp_be_len_{1..8}_eq_fromBytesBE`
    (and `rlp_be_byte_eq_fromBytesBE`) show
    `(((0 <<< 8) + e0) <<< 8 …) + e_{N-1} = BitVec.ofNat 64 (Nat.fromBytesBE
    [e0,…,e_{N-1}])` for `N ≤ 8` (no 64-bit overflow since the value is
    `< 256^8 = 2^64`), backed by the pure bound `Nat.fromBytesBE_lt`
    (`EL/RLP/Properties.lean`). `Phase1E{3,5}…FromBytesBE.lean` thread this
    through to 16 end-to-end `…_fromBytesBE_spec_within` restatements whose
    postcondition states `x11 = BitVec.ofNat 64 (Nat.fromBytesBE [extractByte
    …, …])` — i.e. the decoder computes the *spec-correct* length, not just a
    bitvector. All axiom-clean, 0 sorry.
  - ✅ **Per-exit `classifyPrefix`-level bridges complete (all five exits).**
    Every classification exit now has a wrapper restating its full path
    against the pure RLP classifier `classifyPrefix` (input side) and the
    matching pure length (output side): e1 `rlp_phase1_e1_single_byte_of_class_spec_within`
    (`classifyPrefix = .singleByte` → `x11 = 1`), e2
    `rlp_phase1_e2_full_path_payload_len_of_class_spec_within`
    (`x11 = ofNat (rlpPrefixShortBytesPayloadLen pfx)`), e4
    `rlp_phase1_e4_full_path_payload_len_of_class_spec_within`
    (`rlpPrefixShortListPayloadLen`), and e3/e5's `…_lenOfLen_of_class…`
    (the long-form `lenLen` counter). The e1/e2 wrappers (this slice) mirror
    e4 and reuse `rlpPrefixShortBytesPayloadLen_toWord_of_class`
    (`EL/RLP/ProgramSpec.lean`). Axiom-clean, 0 sorry.
  - ✅ **General n-iteration long-form loop closure** (`Phase2LongLoopGeneral.lean`).
    The decoder runs a single back-branching loop whose iteration count is the
    runtime counter; the eight unrolled closures only covered fixed counts 1–8.
    `rlp_phase2_long_loop_n_byte_spec_within (n) (1 ≤ n ≤ 8)` now proves the
    loop for an **arbitrary** count in one theorem (by induction on the count),
    with the decoded length bridged to the pure spec
    `BitVec.ofNat 64 (Nat.fromBytesBE (rlpLoopByteList …))`. This is the
    keystone for a unified single-item decoder (apply at the symbolic
    length-of-length). The prior blocker — parametric counter arithmetic
    (`BitVec.ofNat 64 (n+1) + signExtend12 (-1) = BitVec.ofNat 64 n`) — is
    resolved by `word_ofNat_succ_dec`/`_ne_zero`/`_add_one` (toNat + omega).
    Supporting: `rlpLoopAcc` (the loop's big-endian fold), `rlpLoopAcc_toNat`
    (mod-form invariant), and `rlpLoopAcc_zero_eq_fromBytesBE` (the fold ↔
    `fromBytesBE` bridge). Axiom-clean, 0 sorry.
  - ✅ **General long-form full paths** (`Phase1E3LongBytesFull.lean` /
    `Phase1E5LongListFull.lean`). For an **arbitrary** in-class prefix,
    `rlp_phase1_e3_longBytes_full_spec_within` /
    `rlp_phase1_e5_longList_full_spec_within` compose the Phase-1 classify +
    Phase-3 entry with the n-iteration closure at the symbolic
    `n = rlpPrefixLong{Bytes,List}LenOfLen pfx ∈ [1,8]`, yielding the decoded
    `x11 = ofNat (Nat.fromBytesBE (rlpLoopByteList …))` and payload pointer
    `x13`. Collapses the 16 concrete-prefix `…_fromBytesBE_spec_within` paths
    into 2 parametric theorems (e5 reuses the existing
    `…_lenOfLen_of_class…` prefix wrapper; e3 rewrites `x14 = pfx − 0xB7` to
    `ofNat n` via `rlpPrefixLongBytesLenOfLen_toWord_of_class`). Axiom-clean,
    0 sorry.
  - ✅ **Unified single-item decode (capstone)** (`UnifiedDecodeItem.lean`).
    `rlp_decode_single_item_spec_within`: for an arbitrary prefix byte, one
    theorem whose conclusion is a `match classifyPrefix pfx` dispatching to the
    five per-class full-path handlers — the RV64 analogue of the pure
    `decodeAux_cons_eq_classifyPrefix_match`. Each branch reaches the
    class-appropriate exit with the spec-correct decoded length
    (`1` / `rlpPrefixShort{Bytes,List}PayloadLen` / `Nat.fromBytesBE …`).
    Long-form-only proof obligations (window `hwin`, loop `hback`) ride in a
    `match`-typed hypothesis `rlpDecodeLongHyps` so flat callers needn't supply
    them. Proof: `cases classifyPrefix pfx` + `exact <handler>`. Axiom-clean,
    0 sorry. **The single-item RLP decode arc is complete end-to-end**
    (classify → per-class length extraction → unified dispatch).
  - ✅ **List-payload single-byte-run decode (pure spec)** (`EL/RLP/ListDecode.lean`).
    The merged `ListDecodeBridge` *characterizes* list decode in terms of
    `decodeItems`/`decodeListPayload` but never *computes* `decodeItems` for any
    concrete payload class. `decodeItems_singleByte_run` closes that gap: a run of
    bytes each `< 0x80` (with depth budget `2 * bs.length ≤ nDepth`) decodes to
    one one-byte `RLPItem.bytes` per byte, consuming the whole run (induction on
    `bs`, reusing `decodeAux_cons_singleByte_of_classifyPrefix` +
    `classifyPrefix_singleByte_iff`). `decodeAux_shortList_of_singleByte_items`
    lifts it through the merged short-list characterization
    (`ListDecodeBridge.decodeAux_cons_shortList_eq_some_iff`) to a full
    `RLPItem.list` of single-byte items, with a concrete cross-check
    (`0xC3 [0x01,0x7F,0x05]`). Axiom-clean (propext/Quot.sound), 0 sorry. This is
    the pure-spec foundation for the first RV64 list-payload loop (single-byte
    items are the one fixed-stride, zero-copy case); the RV64 loop + `decodeItems`
    bridge is the follow-up.
  - Remaining (future): a full semantic bridge to the pure `decodeAux` (needs
    canonical-encoding `>55`/leading-zero validation the RV64 path does not
    perform, plus recursive list decode `decodeItems` for list payloads);
    cross-doubleword length spans (`n ≤ 8` single-doubleword only); Phase-4
    `read_input` pipeline integration.
- Phase 4: `read_input` integration (obtain RLP input pointer + length)
- ✅ **Multi-dword byte-region memory abstraction** (`EvmAsm/Rv64/MemRegion.lean`,
  Phase-5 foundation). `bytesRegion (base) (bs : List (BitVec 8))` — a contiguous
  byte region stored little-endian across consecutive 8-byte dwords (each named
  `↦ₘ` via `packBytes`), mirroring `Evm64.CodeRegion.evmCodeIs` but at the `Rv64`
  layer so the RLP decoder can use it. Closes the gap that the existing decoder
  reads only **within one dword** (`Phase2LongLoopGeneral.hwin` forces a single
  `dwordAddr`). Lemmas: `bytesRegion_nil`, `bytesRegion_eq_cons` (peel one dword,
  address `+8` — the workhorse), `bytesRegion_pcFree` (frames). The pure
  byte-packing primitives (`packBytes`, `extractByte_packBytes`, `packDword`)
  were **relocated** from `Evm64/CodeRegion.lean` down to `Rv64/ByteOps.lean`
  (their natural home; one source of truth for both layers). Axiom-clean, 0 sorry.
- ✅ **`bytesRegion_lbu_within` — read byte `i` across dwords** (`MemRegion.lean`).
  An `LBU` at `regionBase + i` (`i < bs.length`, base dword-aligned, no overflow)
  reads `bs[i]` — the multi-dword read the single-`dwordAddr` `hwin` model could
  not express (cross-check: reads byte 9 of a 10-byte / 2-dword region). New
  supporting lemmas: `alignToDword_add_ofNat_of_aligned`
  (`alignToDword (base + i) = base + 8*(i/8)`) and `byteOffset_add_ofNat_of_aligned`
  (`= i % 8`) — the BitVec masking arithmetic the existing loop *assumes*; plus
  `bytesRegion_dword_at` (extract the `dw`-th dword cell, partial last chunk OK).
  Composition mirrors `Evm64/Push`: `generic_lbu_spec_within` +
  `extractByte_packBytes`, framed via `cpsTripleWithin_frameR` + `xperm_hyp`.
  Axiom-clean, 0 sorry.
- ✅ **Single-byte-item list-decode loop** (`SingleByteListLoop.lean`). The first
  RV64 RLP *list*-decode loop: a 4-instruction back-branch body
  (`LBU x12,x13,0; ADDI x13,x13,1; ADDI x14,x14,-1; BNE x14,x0,back`) traversing a
  single-byte-item payload, reading each byte from `bytesRegion` via
  `bytesRegion_lbu_within` (assume-precondition scope — every byte `< 0x80` assumed,
  no in-line validation/fail-exit yet). Layers: `sbll_iter_spec_within` (3-instr
  triple), `sbll_body_spec_within` (2-exit `cpsBranchWithin`), `sbll_loop_succ/n_spec_within`
  (n-iteration closure by **remaining-count induction with a generalized start
  index** — keeping `regionBase`/`bs` fixed and re-indexing the per-iteration
  validity hypotheses, since `bytesRegion_lbu_within` is keyed by the *absolute*
  byte index, unlike Phase-2's `dwordAddr`-shift), and `sbll_loop_bridge` tying the
  operational spec to the pure `decodeItems_singleByte_run` (zero-copy: shared
  `< 0x80` precondition + matching consumed length `bs.length`). Cross-dword
  cross-check: 10-byte / 2-dword payload → 10 items in `4*10` steps. Reuses
  Phase-2's `word_ofNat_*`/`cnt_dec_1` counter lemmas. Axiom-clean, 0 sorry.
- ✅ **In-line `< 0x80` validation — drop the `hsingle` assumption**
  (`SingleByteListLoopValidated.lean`). The validated loop *proves* every payload
  byte `< 0x80` in-machine instead of assuming it: it ORs each byte into an
  accumulator (`x11`) during the scan (a `cpsTripleWithin` closure mirroring the
  Phase-2 accumulator — `sbll_val_iter/body/loop_succ/n`), then a single post-loop
  `ANDI x15, x11, 0x80; BNE x15, x0, fail` (accumulate-then-check, avoiding a
  branching loop closure). `sbll_val_loop_checked_spec_within` is the 2-exit
  `cpsBranchWithin` (success = `acc &&& 0x80 = 0`, fail = some byte `≥ 0x80`);
  `sbll_val_loop_bridge` derives `decodeItems_singleByte_run` from the
  *machine-checked* success condition — **no `hsingle` precondition**. Key new
  BitVec lemma `orAccList_and_0x80_eq_zero_imp_all_lt` (accumulator bit-7 clear ⇒
  all bytes `< 0x80`) via `BitVec.msb_eq_false_iff_two_mul_lt`. `OR` cannot
  overflow ⇒ no `n ≤ 8` bound. Cross-dword cross-check at 10 bytes. Axiom-clean,
  0 sorry.
- ✅ **Unified decoder 5-exit reconvergence — flat classes**
  (`UnifiedDecodeItemReconverge.lean`). The unified single-item decoder
  (`rlp_decode_single_item_spec_within`) reaches 5 different exit PCs with 5
  different posts; a per-item list loop needs one common exit + a uniform post.
  This reconverges the **3 flat classes** (`singleByte`/`shortBytes`/`shortList` —
  no memory, no `x12`/`x14`): each class handler runs to its exit, then an
  unconditional `JAL x0` (`jal_x0_spec_gen_within`, 1 step, `emp` pre/post)
  jumps to a common `joinPC`. `rlp_decode_single_item_reconverged_flat` is one
  `cpsTripleWithin 11 base joinPC cr` whose uniform post exposes the cascade
  residue / payload length / payload pointer via `classifyPrefix`-dispatched
  helpers (`itemCascadeResidue`/`itemPayloadLen`/`itemPayloadPtr`), so a loop can
  advance uniformly by `x13 += x11`. Generic `reconverge_arm` (handler ∘ JAL ∘
  `extend_code` to the shared `cr` ∘ `mono_nSteps`) + a 3-way `classifyPrefix`
  case split; `singleByte`'s `x13`-free pre is unified by framing `x13`. The
  shared decoder code is a parameter `cr` with per-class sub-CR hypotheses
  (`handlerCR ∪ JAL ⊆ cr`), discharged by the eventual loop caller. Axiom-clean,
  0 sorry.
- ✅ **Unified decoder 5-exit reconvergence — ALL classes**
  (`UnifiedDecodeItemReconvergeAll.lean`). Completes the reconvergence by adding
  the 2 long classes (`longBytes`/`longList` — `(dwordAddr ↦ₘ wordVal)` + `x12`/`x14`
  + variable steps). `rlp_decode_single_item_reconverged_all` is a single
  `cpsTripleWithin 60 base joinPC cr` (60 = `max(3,6,10,9+6·8,11+6·8)+1`, longList
  binds it) over ALL of `classifyPrefix pfx`, with a uniform post exposing
  `x10`/`x11`/`x12`/`x13`/`x14`/memory via 5-class match helpers
  (`itemResidue`/`itemLen`/`itemPtr`/`itemX12`/`itemX14`). Bound-parameterized
  `reconverge_arm_n` (the flat arm with `11`→`N`); 5-way `classifyPrefix` case
  split invoking each handler directly (flat arms frame `x12`/`x14`/memory; long
  arms carry them); variable long-step bound discharged by
  `rlpPrefixLong{Bytes,List}LenOfLen_le_8_of_class`. Axiom-clean, 0 sorry.
- ✅ **Flat per-item list-loop BODY** (`FlatListLoopBody.lean`). One iteration of
  the flat-item list-decode loop as a 2-exit `cpsBranchWithin`: `LBU x5,x13,0`
  (read the prefix from `bytesRegion` via `bytesRegion_lbu_within`) → the flat
  reconverged decoder (passed as an OPAQUE `cpsTripleWithin` hypothesis,
  `decoder_base = lbase+4`) → `ADD x13,x13,x11` (advance to the next item) →
  `ADDI x14,x14,-1` (item counter) → `BNE x14,x0,back` (taken → `lbase`).
  `fll_body_spec_within` (step bound 15) with bundled post `fll_body_post`;
  new stride helpers `itemTotalLen`/`itemNextPtr` + `itemNextPtr_eq`
  (`itemPayloadPtr + itemPayloadLen = v13 + itemTotalLen`, the factored form the
  closure re-indexes on). The decoder is opaque so its disjointness from the
  loop's LBU/ADD/ADDI/BNE singletons is taken as hypotheses (the closure
  discharges them from the concrete decoder layout). `bytesRegion`+`x14` framed
  through the decoder. Flat-only (long items need the decoder's length-read
  upgraded to `bytesRegion`). Axiom-clean, 0 sorry.
- ✅ **Flat-item stride-equivalence** (`FlatListLoop.lean`). The mathematical
  core the closure needs: `isFlatItem` (`.bytes data` with `data.length ≤ 55`;
  `.list` with payload `≤ 55`), `classifyPrefix_encode_head_flat` (a flat item's
  encoding starts with a flat prefix), and **`encode_head_eq_itemTotalLen`**:
  `itemTotalLen ((encode item)[0]) = ofNat (encode item).length` — the
  operational per-item stride equals the pure encoding length (per-class
  byte-shape algebra via `encodeBytes`/`encode_list_short`/`classifyPrefix_*_iff`,
  no `bv_decide`). Bridges the machine loop's pointer advance to the pure
  `encode`/`decodeItems` round-trip. Axiom-clean, 0 sorry.
- ✅ **Concrete flat decoder program** (`FlatDecoderConcrete.lean`). Discharges the
  list-loop closure's abstract `decoderH` with REAL code: `flat_decoder_prog`
  (one linear 16-instruction program — the 4-step phase-1 cascade + the three
  flat-class phase-3 handlers + their reconvergence `JAL`s) and
  **`flat_decoder_spec`** (`∀` flat prefix, `cpsTripleWithin 11 base (base+64)
  (CodeReq.ofProg base flat_decoder_prog) …` — exactly the `decoderH` shape).
  Proved by instantiating `rlp_decode_single_item_reconverged_flat` at the forced
  offsets (`off1=off2=28, off4=24, joff1=28, joff2=16, joff4=4`) and discharging
  every side-condition: `htarget*`/`hjoin*` via `rv64_addr`; the six `hd_*`
  disjointness via `crDisjoint`; the three `hsub_*` subsets via `union_sub` +
  `ofProg_mono_sub`/`singleton_mono` (`flat_piece`/`flat_jal_piece` helpers). No
  `bv_decide`. Axiom-clean, 0 sorry.
- ✅ **Flat list-loop n-closure + bridge** (`FlatListLoop.lean`). The operational
  loop over a list of flat items: `fll_loop_spec_within` (structural induction
  over `items : List RLPItem`, threading the byte offset via
  `bs.drop O = encode.encodeItems items`, applying `fll_body_spec_within` per
  item and re-indexing the pointer with `encode_head_eq_itemTotalLen`; the
  decoder is a ∀-hypothesis `decoderH`, discharged per iteration; uniform
  `15 * items.length` steps with `regOwn`-abstracted scratch in `fll_loop_post`),
  `fll_loop_n_spec_within` (offset-0 entry), and `fll_loop_bridge` (conjoins the
  loop with the pure `decodeItems (2*len+1) … = some (items, [])` via
  `decode_encode_mutual.2` — strict fuel `2*len < nDepth`). Axiom-clean, 0 sorry.
- ✅ **Fully concrete end-to-end flat list decoder** (`FlatListLoopConcrete.lean`).
  Wires `flat_decoder_spec` into `fll_loop_bridge`: **`flat_list_loop_concrete_bridge`**
  has NO abstract hypotheses — a real RV64 program (`[LBU] ++ flat_decoder_prog ++
  [ADD, ADDI, BNE]` at `base`, scaffold bracketing the 16-instr decoder so
  `joinPC=base+68`, loop exit `base+80`, back-edge `-76`) decodes a non-empty flat
  `items` list from `bytesRegion` in `15*items.length` steps AND coincides with the
  pure `decodeItems` round-trip. The loop's `decoderH` is `flat_decoder_spec (base+4)`
  (exit rewritten `(base+4)+64 → base+68`); scaffold disjointness via
  `singleton_ofProg`/`ofProg_singleton` + `ofProg_none_range_len` + `bv_omega` (the
  `crDisjoint` tactic times out on the opaque 16-instr `ofProg`); `hback` via
  `signExtend13 (-76) = -76` (`decide`) + `bv_omega`. Concrete 2-item `example`.
  Axiom-clean, 0 sorry. This completes the **flat** RLP list-decoder arc.
- 🚧 **Long-item-capable list loop** (arc, step 2 of 6). Goal: a list loop that
  handles `longBytes`/`longList` (real Ethereum RLP exceeds 55 bytes). Design:
  reuse the 5-class single-item decoder (`rlp_decode_single_item_reconverged_all`,
  already proven), with the list-loop item counter on **x15** (the decoder
  clobbers x14 as its length-read counter), the uniform advance `x13 += x11`
  (decoder leaves x13=payloadPtr, x11=payloadLen), and all reads over `bytesRegion`.
  - ✅ **Step 1 — `bytesRegion` length-read loop** (`Phase2LongLoopRegion.lean`).
    Region analog of the single-dword `Phase2Long*` stack: `rlpLoopAccRegion` (+
    `_zero_eq_fromBytesBE`), `rlp_phase2_long_region_iter_spec_within` (one iteration,
    proved by extracting the containing dword from `bytesRegion` and reusing the
    single-dword iter — the cross-dword read the `alignToDword`-constrained loop
    couldn't express), `rlp_phase2_long_region_body_spec_within` (`cpsBranchWithin 6`),
    `rlp_phase2_long_loop_region_succ_spec_within` (induction over `k+1∈[1,8]`
    iterations), and `rlp_phase2_long_loop_region_n_spec_within` (from `0`, decoded
    length `= ofNat (Nat.fromBytesBE ((bs.drop off).take n))`). Same body program as
    the single-dword version; only the memory model differs. Frame `x0` on the
    **left** (`frameL`) so `xperm` never commutes a `regIs` rightward past the opaque
    `bytesRegion` atom. Axiom-clean, 0 sorry, no `bv_decide`.
  - ✅ **Step 2 — long-item stride foundation** (`LongItemStride.lean`). Pure
    long analog of `FlatListLoop.lean` §1: `isLongItem`/`itemPayloadCount`/`itemLenOfLen`,
    `classifyPrefix_encode_head_long`, `encode_long_lenOfLen_eq_{bytes,list}`,
    `encode_long_length_eq`, `encode_long_lenBytes_read`, and **`encode_long_stride`**
    (`payloadPtr + payloadCount = v13 + (encode item).length`). NB: the long stride is
    runtime-read-dependent, so it is NOT a prefix-only `itemTotalLen` (that flat helper's
    long branch correctly stays `_ => 0`); the unified loop re-indexes via this identity.
  - ✅ **Step 3a — region long decoder arms** (`Phase1LongFullRegion.lean`).
    `rlp_phase1_e3_longBytes_full_region_spec_within` / `…_e5_longList_…`: the e3/e5 long
    arms re-derived over `bytesRegion` (analogs of `Phase1E{3,5}Long*Full.lean`). Phase
    1/Phase 3 are register-only (reused verbatim); only the phase-2 loop is swapped to the
    step-1 region length-read, `bytesRegion` framed through, and `x13` re-expressed from
    `v13 + 1` to `regionBase + ofNat (off+1)` (item at byte offset `off`). Axiom-clean, 0 sorry.
  - ✅ **Step 3b — region all-class decoder** (`UnifiedDecodeItemReconvergeAllRegion.lean`).
    **`rlp_decode_single_item_reconverged_all_region`**: the 5-class single-item decoder over
    `bytesRegion regionBase bs` (analog of `rlp_decode_single_item_reconverged_all`). Region
    post-helpers `itemLenRegion`/`itemX12Region` (read `bs` at `off+1`) + `itemPtrRegion`;
    `itemResidue`/`itemX14` reused; `rlpDecodeLongHypsRegion` (region window, no `alignToDword`);
    `reconverge_arm_n` reused unchanged. Flat arms (e1/e2/e4) reuse the existing register-only
    handlers framing `bytesRegion` (`x13` via `hv13`/`region_succ_ptr`); long arms (e3/e5) use the
    step-3a region arms (post matches the helpers exactly). The `decoderH` the long loop body
    consumes. Axiom-clean, 0 sorry.
  - ✅ **Step 4 — unified loop body** (`UnifiedListLoopBody.lean`). **`unified_body_spec_within`**:
    one iteration decoding ANY item (all 5 classes) — `LBU x5,x13,0` + region decoder (opaque
    `cpsTripleWithin 60`) + `ADD x13,x13,x11` + `ADDI x15,x15,-1` + `BNE x15,x0,back`, as a
    `cpsBranchWithin 64` (taken → `lbase`, fall → `joinPC+12`). Item counter on **x15** (the decoder
    clobbers x14/x12); x10/x12/x14 are decoder scratch. `unified_body_post` (+`_unfold`/`_pure`),
    `itemNextPtrRegion := itemPtrRegion + itemLenRegion` (the `ADD` result). Faithful mirror of
    `fll_body_spec_within`; the decoder stays opaque (the concrete region decoder discharges it in
    the end-to-end step). Axiom-clean, 0 sorry.
  - ✅ **Step 5a — unified stride-equivalence** (`UnifiedItemStride.lean`).
    **`encode_head_eq_itemNextPtrRegion`**: for ANY item (all 5 classes), the body's per-item advance
    `itemNextPtrRegion ((encode head)[0]) regionBase off bs` (= `itemPtrRegion + itemLenRegion`, the
    `ADD x13` result) re-indexes to `regionBase + ofNat (off + (encode head).length)` — the all-class
    analog of the flat `encode_head_eq_itemTotalLen`. Flat items bridge through `itemTotalLen`
    (reusing `encode_head_eq_itemTotalLen`); long items tie the runtime-read length bytes back to the
    encoding via private `long_lenBytes_in_region` (`(bs.drop (off+1)).take lenOfLen =
    toBytesBE payloadCount`) + step-2 lemmas (`encode_long_lenOfLen_eq_*`, `encode_long_lenBytes_read`,
    `encode_long_length_eq`). Also `flat_or_long` dichotomy. Axiom-clean, 0 sorry.
  - ✅ **Step 5b — unified loop closure + bridge** (`UnifiedListLoop.lean`). All-class analog of the
    flat `fll_loop_*`. **`unified_loop_spec_within`**: structural induction over arbitrary `items`,
    threading the byte offset via `bs.drop O = encode.encodeItems items`, applying the step-4 body per
    item (the 60-step region decoder supplied as the ∀-hypothesis `UnifiedDecoderH`), re-indexing `x13`
    via `encode_head_eq_itemNextPtrRegion` (5a) — uniform `64 * items.length` steps, counter on x15,
    scratch x10/x12/x14 abstracted to `regOwn` in `unified_loop_post`. **`unified_loop_n_spec_within`**
    (offset-0 entry) and **`unified_loop_bridge`** (conjoins the pure `decodeItems` round-trip via
    `decode_encode_mutual.2`). Per-head `itemPayloadCount < 256^8` discharged from `hover` (flat ≤55 /
    long via `encode_long_length_eq`). Axiom-clean, 0 sorry.
  - ✅ **Step 6a — dischargeable decoder hypothesis** (`UnifiedListLoop.lean`). Added a per-index
    `regionLongWindow` guard to `UnifiedDecoderH` (the window-only half of `rlpDecodeLongHypsRegion`: a
    long prefix's `lenOfLen` length bytes lie within the region) and discharge it in
    `unified_loop_spec_within` at every item-start offset (private `regionLongWindow_of_split`: flat →
    `True`; long → the length bytes sit inside `encode head`, in-bounds via `encode_long_length_eq`,
    readable via `hwin`). Plus local per-structure classification `classify_bytes_long`/
    `classify_list_long`. Fixes the latent gap where the old guardless `UnifiedDecoderH` was
    unsatisfiable by any concrete program (a mid-item byte can classify as a long prefix running off
    the region end). Axiom-clean, 0 sorry.
  - ✅ **Step 6b — concrete all-class decoder** (`UnifiedDecoderConcrete.lean`, analog of #9007).
    `unified_decoder_prog : List Instr` — a single 36-instruction program (phase-1 cascade @base..+32,
    e5 long-list handler+loop+JAL @+32/+44/+68, e1–e4 handlers @+72/+80/+92/+104 with the e3 long-string
    loop @+116, reconvergence JALs; `joinPC = base+144`; loop back-edge `-20`; cascade offsets
    `68/68/84/64`, JAL offsets `68/56/4/44/76`). `unified_decoder_spec` proves `cpsTripleWithin 60 base
    (base+144) (ofProg base unified_decoder_prog) …` for `bs[off]` — exactly the `UnifiedDecoderH` shape
    — discharging `rlp_decode_single_item_reconverged_all_region`'s ~30 hypotheses: targets/joins via
    `rv64_addr`, 12 disjointness via `simp [rlp_phase1_step_code]; crDisjoint`, 5 subset embeddings via
    `union_sub` + `unified_piece`/`unified_jal_piece` (`ofProg_mono_sub`/`ofProg_lookup_addr`), and
    `rlpDecodeLongHypsRegion` from the passed `regionLongWindow` window + a concrete `-20` back-edge
    (`signExtend13 (-20) = -20` by `decide`, then `bv_omega`). Key perf lesson: write piece subBases in
    the region decoder's `e_target + k` form (syntactic unification, not BitVec defeq) and skip
    `simp only [rlp_phase1_step_code]` in the subset proofs (defeq handles the abbrev); needs
    `set_option maxHeartbeats 800000`/`maxRecDepth 8000`. Axiom-clean, 0 sorry, 0 warnings.
  - ✅ **Step 6c — fully concrete end-to-end** (`UnifiedListLoopConcrete.lean`, analog of #9008).
    **`unified_list_loop_concrete_bridge`** has NO abstract hypotheses: the real RV64 program `[LBU
    x5,x13,0] ++ unified_decoder_prog ++ [ADD x13 x13 x11, ADDI x15 x15 -1, BNE x15 x0 -156]` at `lbase`
    (scaffold bracketing the 36-instr decoder so `joinPC = lbase+148`, exit `lbase+160`, back-edge
    `-156`) **decodes a non-empty list of ARBITRARY RLP items (all 5 classes, long strings/lists
    included) from `bytesRegion` in `64*items.length` steps AND coincides with the pure `decodeItems`
    round-trip**. Discharges `unified_loop_bridge`'s abstract `UnifiedDecoderH` via `unified_decoder_spec
    (lbase+4)` (`rwa [(lbase+4)+144 = lbase+148]`); back-edge `signExtend13 (-156) = -156` by `decide`;
    scaffold/decoder disjointness via `dcr_none` (`ofProg_none_range_len`, n=36) + `Disjoint.singleton_
    ofProg`/`ofProg_singleton`. Concrete 2-item `example`. Built first try, axiom-clean, 0 sorry. **This
    completes the long-item RLP list-decoder arc — single-level lists of all 5 classes now decode
    end-to-end in real RISC-V with a kernel-checkable round-trip proof.**
- **Phase 4 done.** RLP single-level list decoding (flat #9004–#9008 + long-item #9020–#9031) is
  complete. Remaining for feature-complete RLP:
- Phase 5: Recursive list decode (iterative with explicit stack)
  - ✅ **Step 1 — descend-one-level kernel** (`NestedDescendOne.lean`). **`list_item_payload_window`**:
    for a `.list items` item at offset `off`, the single-item decoder's window (`x13 = itemPtrRegion`
    payload pointer, `x11 = itemLenRegion` payload length) points at EXACTLY the `encode.encodeItems
    items` payload — `∃ payloadOff, itemPtrRegion = regionBase + ofNat payloadOff ∧ itemLenRegion =
    ofNat (encodeItems items).length ∧ bs.drop payloadOff = encode.encodeItems items ++ tail` (the
    third conjunct is precisely the list loop's `hdrop` precondition, so a caller can descend). The
    all-class analog, for the payload window, of the 5a stride lemma; short-list via
    `classifyPrefix_shortList_iff` + `encode_list_short`, long-list via `classifyPrefix_encode_head_long`
    + `encode_list_long` + `take_left'`/`drop_left'` + `fromBytesBE_toBytesBE`. Plus `decode_list_descend`
    (top-level round-trip) and concrete nested `example`s. Axiom-clean, 0 sorry.
  - ✅ **Step 2 — length-driven loop body** (`UnifiedLenLoopBody.lean`). Exploration confirmed the
    count-driven descent is a dead end (the decoder yields a payload byte-length in `x11`, not an item
    count; the count is symbolic, unloadable) — so the foundation is a LENGTH-driven loop.
    **`unified_lenloop_body_spec_within`**: one iteration as `cpsBranchWithin 63` over `LBU x5,x13,0` +
    opaque region decoder (`cpsTripleWithin 60`) + `ADD x13,x13,x11` + **`BNE x13,x15,back`** — guard
    compares the advanced pointer to the invariant end pointer `x15 = endPtr` (no counter, one fewer
    instr than the count-driven body); taken (`itemNextPtrRegion ≠ endPtr`)→`lbase`, fall (`= endPtr`)→
    `joinPC+8`. With no counter, `x15` holds `endPtr` (the decoder never touches it, framed through the
    60-step decode). `unified_lenloop_body_post` (+`_unfold`/`_pure`). Faithful mirror of the
    count-driven `unified_body_spec_within` with the guard swapped; built first try, axiom-clean, 0 sorry.
  - ✅ **Step 3 — length-driven closure + bridge** (`UnifiedLenLoop.lean`). **`unified_lenloop_spec_
    within`**: induct over `items`, `x13` from `O`, `x15 = endPtr := regionBase + ofNat (O +
    (encodeItems items).length)` fixed (invariant, threaded unchanged); applies the step-2 body per item,
    terminating when `x13` reaches `endPtr` exactly after the last item — `63 * items.length` steps, no
    item count. NEW lemma **`region_offset_ne`** (`regionBase + ofNat a ≠ regionBase + ofNat b` for
    distinct in-bounds offsets, by `bv_omega`) discharges `itemNextPtrRegion ≠ endPtr` for intermediate
    items (so the loop continues). **`unified_lenloop_n_spec_within`** (offset-0 entry, `x15 = regionBase
    + ofNat (encodeItems items).length`) and **`unified_lenloop_bridge`** (conjoins `decodeItems
    (2*len+1) … = some (items, [])` via `decode_encode_mutual.2`). Also exposed
    `regionLongWindow_of_split` (was private in `UnifiedListLoop.lean`) — both closures discharge the same
    per-item window obligation. Faithful mirror of the count-driven #9028 with termination swapped;
    axiom-clean, 0 sorry.
  - ✅ **Step 4a — concrete length-driven loop** (`UnifiedLenLoopConcrete.lean`, length-driven analog of
    #9032). **`unified_lenloop_concrete_bridge`** has NO abstract hypotheses: the real RV64 program `[LBU
    x5,x13,0] ++ unified_decoder_prog ++ [ADD x13 x13 x11, BNE x13 x15 -152]` at `lbase` (joinPC=lbase+148,
    exit lbase+156) decodes a list of arbitrary RLP items from `bytesRegion`, running until `x13` reaches
    the end pointer `x15 = regionBase + ofNat len`, in `63 * items.length` steps, AND coincides with the
    pure `decodeItems`. Discharges `UnifiedDecoderH` via `unified_decoder_spec`; `back=-152`
    (`signExtend13 (-152) = -152` by `decide`); scaffold/decoder disjointness via `dcr_none`
    (`ofProg_none_range_len`, n=36). **Directly the top-level list decoder** (the end pointer is the known
    input length). Built first try, axiom-clean, 0 sorry.
  - **Next (step 4b — genuine top-level descent):** compose the single-item decoder (decode a `.list`
    header → `x13 = payload ptr`, `x11 = payload len`) + `ADD x15 x13 x11` (set `x15 := endPtr`) + the
    concrete length-driven loop, bridged by `list_item_payload_window` (#9033), proving a top-level
    `decode (encode (.list items)) = some (.list items, [])` with the operational program validating the
    sub-list structure. (Top-level ⇒ payload tail empty ⇒ step-3 closure applies.) **Later** — a
    tail-general closure (`bs.drop O = encodeItems items ++ tail`; stride/window lemmas only read within
    each item, so the byte tail is inert) for descent into interior sub-lists, then arbitrary depth
    (bounded composition for the STF block/tx structure, or the explicit-stack generalization).
- Phase 6: Top-level pipeline (`read_input` -> decode -> `write_output`)
- **Host I/O ABI**: See `docs/zkvm-host-io-interface.md`; SP1
  `HINT_LEN`/`HINT_READ`/`COMMIT` are legacy handler shapes, not the target
  C ABI.
- **Output format**: Pointer + length (zero-copy into input buffer)
- **Depends on**: EL.1 (spec to verify against), EL.2 (byte-level specs)

---

## Roadmap: Phases 7-11 (STF — State Transition Function)

The STF is the end goal. It takes a block (header + transactions) and the
pre-state, executes all transactions, and produces the post-state. The STF
is what gets proved inside the zkVM.

### STF Architecture

The STF decomposes into layers (from the execution-specs):

```
state_transition(block, pre_state) → post_state
  └── apply_body(block_header, transactions, ommers)
        └── for each tx: process_transaction(env, tx)
              └── process_message_call(message)
                    └── execute(env) — the interpreter loop
                          └── for each step: dispatch opcode → handler
```

In our RV64 implementation, this maps to:

```
main():  read_input → Block + pre_state
         call state_transition
         write_output → post_state_root
```

### Phase 7: Interpreter Loop (EVM execution core)

This is the heart of the STF — the inner loop that executes EVM bytecode.

#### 7.1 EVM Machine State
- **File**: `Evm64/EvmState.lean` (new)
- **Approach**: Define the EVM-level execution state in RISC-V memory:
  ```
  struct EvmState {
    pc      : u64       // EVM program counter (byte offset into code)
    gas     : u64       // Remaining gas
    sp      : u64       // Stack pointer (already x12)
    memory  : *u8       // EVM memory base pointer
    memsize : u64       // Current memory size
    code    : *u8       // EVM bytecode pointer
    codelen : u64       // Code length
    env     : *u8       // Environment context pointer
    status  : u64       // Running / Stopped / Reverted / Error
  }
  ```
  Define `evmStateIs` assertion combining all sub-assertions.

#### 7.2 Opcode Dispatch
- **File**: `Evm64/Dispatch.lean` (new)
- **Approach**: Read `code[evm_pc]` byte, dispatch to handler.
  **Option A**: Jump table — load handler address from table[opcode], JAL.
  **Option B**: Binary search tree of BEQ comparisons.
  Jump table is faster (O(1)) but needs 256-entry table in memory.
  Binary search is smaller but O(log n).
  **Recommendation**: Jump table. 256 × 8 = 2048 bytes, small for zkVM.
- **Spec**: `dispatch_spec` relates opcode byte to correct handler entry point.

#### 7.3 Opcode Handlers (subroutine wrappers)
- **File**: `Evm64/Handlers.lean` (new)
- **Calling convention**: Use LP64 convention from `CallingConvention.lean`.
  Each handler is a non-leaf function using `cc_prologue` / `cc_epilogue`.
  Compose with `callNear_function_spec` / `nonleaf_function_spec`.
- **Approach**: Each handler is a thin wrapper:
  1. Deduct gas cost
  2. Call the opcode subroutine (e.g., `evm_add`) via `JAL x1, offset`
  3. Advance EVM PC by appropriate amount (1 for most, 1+n for PUSHn)
  4. Return to dispatch loop via `cc_ret`
- **Spec**: Each handler spec composes gas deduction + opcode spec + PC advance.

#### 7.4 Interpreter Main Loop
- **File**: `Evm64/Interpreter.lean` (new)
- **Approach**: RISC-V loop:
  ```
  loop:
    LBU opcode, code_base[evm_pc]    // read current opcode
    // dispatch to handler via jump table
    LD  handler, table[opcode * 8]
    JALR ra, handler
    // handler returns here
    // check status: if still running, loop
    BEQ status, RUNNING, loop
    // else: halt (STOP/RETURN/REVERT/ERROR)
  ```
- **Spec**: Inductive spec relating N EVM steps to N iterations:
  `interpreter_step_spec`: one iteration preserves EVM state invariant.
  `interpreter_N_spec`: N iterations = N EVM instruction executions.
- **Key invariant**: At each loop entry, the RISC-V state correctly
  represents the EVM state (stack, memory, PC, gas, status).
- **Proof strategy**: Define simulation relation between EVM abstract state
  and RISC-V concrete state. Prove each opcode handler preserves the
  simulation. Then the loop preserves it inductively.

### Phase 8: Storage & System Calls

#### 8.1 Storage Model (via host syscalls)
- SLOAD/SSTORE use ECALL to communicate with the zkVM host.
- The host provides storage read/write as part of the witness.
- **Spec**: Abstract storage as `Map U256 U256`. SLOAD returns `storage[key]`,
  SSTORE updates `storage[key] := value`.

#### 8.2 Precompiles (via zkvm_accelerators)
- The canonical C ABI is the vendored header
  `EvmAsm/Evm64/zkvm-standards/standards/c-interface-accelerators/zkvm_accelerators.h`
  (eth-act zkvm-standards). See
  [`docs/zkvm-accelerators-interface.md`](docs/zkvm-accelerators-interface.md)
  for the ADR; per-function bridge progress (input/output Lean payload types,
  syscall ID, Hoare-triple bridge spec) is tracked in beads parent
  `evm-asm-nr2sk`.
- Map EVM precompile addresses (0x01-0x11, 0x100) to `zkvm_accelerators.h` calls.
- ECRECOVER (0x01) → `zkvm_secp256k1_ecrecover`
- SHA256 (0x02) → `zkvm_sha256`
- RIPEMD160 (0x03) → `zkvm_ripemd160` — **DONE** (pure-software RV64 kernel,
  `Ripemd160.lean` — ZisK has no RIPEMD-160 accelerator; table-driven
  two-line compression, ~5.3k steps/64-byte block; 600+120/word gas,
  32-byte left-padded returndata; validated against the standard vectors
  via `scripts/codegen-zisk-ripemd160-check.sh`; bead fhsxz.2.4.2.62.11)
- IDENTITY (0x04) → no accelerator (pure memory copy)
- MODEXP (0x05) → `zkvm_modexp`
- BN254_ADD (0x06) → `zkvm_bn254_g1_add` — **DONE** (real kernel,
  `Bn254Field.lean`/`Bn254Curve.lean`, ZisK Bn254CurveAdd/Dbl `csrs
  0x806/0x807` + Arith256Mod `csrs 0x802` accelerators; EIP-196 validation +
  EIP-150 child-allotment failure gas; bead fhsxz.2.4.2.62.10)
- BN254_MUL (0x07) → `zkvm_bn254_g1_mul` — **DONE** (same backend; raw
  256-bit double-and-add, ~0.93M ziskemu steps worst case)
- BN254_PAIRING (0x08) → `zkvm_bn254_pairing` — **DONE** (py_ecc-mirroring
  FQ12 Miller loop + final exponentiation: `Bn254Fp2.lean` via
  Bn254ComplexAdd/Sub/Mul `csrs 0x808/0x809/0x80a`, `Bn254Fq12.lean` poly
  machine with one fused Arith256Mod per coefficient op,
  `Bn254Fq12Point.lean`/`Bn254PairingCore.lean`/`Bn254Pairing.lean` with the
  real EIP-197 G2 subgroup check; ~40M steps + ~22M/pair; bead
  fhsxz.2.4.2.62.10.1)
- BLAKE2f (0x09) → `zkvm_blake2f`
- KZG_POINT_EVAL (0x0a) → `zkvm_kzg_point_eval`
- BLS12_G1_ADD (0x0b) → `zkvm_bls12_g1_add` — **DONE** (real kernel,
  `Bls12G1.lean`, ZisK Bls12_381CurveAdd/Dbl + Arith384Mod; PR #8739)
- BLS12_G1_MSM (0x0c) → `zkvm_bls12_g1_msm` — **DONE** (LE-internal
  double-and-add + REAL order-n subgroup check; 128-pair MSM ~14.8M
  steps after PR #8741)
- BLS12_G2_ADD (0x0d) → `zkvm_bls12_g2_add` — **DONE** (software Fp2
  chord/tangent over the complex accelerators + Fermat inverse,
  `Bls12G2.lean`; PR #8740)
- BLS12_G2_MSM (0x0e) → `zkvm_bls12_g2_msm` — **DONE** (same backend)
- BLS12_PAIRING (0x0f) → `zkvm_bls12_pairing` — **DONE** (py_ecc-
  mirroring FQ12 = Fp[w]/(w^12-2w^6+2) Miller loop on fused Arith384Mod,
  `Bls12Fq12.lean`/`Bls12Pairing.lean`; real subgroup checks both sides;
  EEST 88/88; PR #8745)
- BLS12_MAP_FP_TO_G1 (0x10) → `zkvm_bls12_map_fp_to_g1` — **DONE**
  (SSWU + 11-isogeny + accelerated cofactor clear, `Bls12Map.lean`;
  EEST 77/77 with 0x11; PR #8745)
- BLS12_MAP_FP2_TO_G2 (0x11) → `zkvm_bls12_map_fp2_to_g2` — **DONE** (same file, 3-isogeny)
- secp256r1_verify (0x100) → `zkvm_secp256r1_verify`
- Non-precompile accelerators reused by EVM opcode handlers: `zkvm_keccak256`
  (KECCAK256 opcode, §8.3), `zkvm_secp256k1_verify` (transaction signature
  verification).

#### 8.3 KECCAK256 (via accelerator)
- Pop offset+size, hash EVM memory region.
- Delegates to `zkvm_keccak256` accelerator.
- Spec: result = keccak256(memory[offset..offset+size]).

#### 8.4 LOG0-LOG4
- Pop offset+size (+topics), emit log event via ECALL.

#### 8.5 CALL, STATICCALL, DELEGATECALL, CREATE, CREATE2
- Create child EVM frames. Model as recursive interpreter calls or
  host-delegated syscalls.
- RETURN and REVERT halt the current frame with output data.

### Phase 9: Gas Metering

#### 9.1 Static Gas
- Each opcode deducts a fixed gas cost before execution.
- Out-of-gas → halt with error, revert state.

#### 9.2 Dynamic Gas
- Memory expansion: quadratic cost based on memory high-water mark.
- Storage: cold/warm access costs (EIP-2929).
- CALL gas: 63/64 rule, stipend for value transfers.

### Phase 10: Transaction Processing

#### 10.1 Message Call
- **File**: `Evm64/MessageCall.lean` (new)
- **Approach**: Set up EVM execution frame:
  1. Initialize EVM state (code, calldata, gas, value, caller)
  2. Run interpreter loop to completion
  3. Handle output (RETURN data, REVERT, error)
  4. Apply state changes (storage writes, balance transfers)
- **Reference**: `execution-specs/.../vm/interpreter.py:process_message_call`

#### 10.2 Transaction Validation & Execution
- **File**: `Evm64/Transaction.lean` (new)
- **Approach**:
  1. Validate transaction (nonce, gas limit, balance)
  2. Deduct upfront cost
  3. Execute message call
  4. Refund remaining gas
  5. Pay priority fee to coinbase
- **Reference**: `execution-specs/.../fork.py:process_transaction`

### Phase 11: Block-Level STF

#### 11.1 Block State Transition
- **File**: `Evm64/StateTransition.lean` (new)
- **Approach**: The top-level STF function:
  1. Read block (header + transactions) from `read_input`
  2. Validate block header
  3. Process each transaction sequentially, updating world state
  4. Apply block rewards
  5. Compute post-state root
  6. Write post-state root via `write_output`
- **Reference**: `execution-specs/.../fork.py:state_transition`
- **Spec**: `state_transition_spec` proves that the RISC-V program computes
  the same post-state as the Python reference spec.

#### 11.2 World State Model
- Account state: nonce, balance, storage root, code hash
- State trie: delegated to host via ECALL (trie operations are zkVM-accelerated
  or proven separately)
- MPT proof verification: either inline or via host

#### 11.3 IO Integration
- `read_input`: Reads block data + pre-state witness (per zkvm IO standard)
- `write_output`: Writes post-state root (32 bytes) as public output
- The zkVM proves: "given this block and pre-state, the post-state root is X"

#### 11.4 Conformance Testing
- Run against Ethereum test vectors (ethereum/tests).
- Compare RISC-V execution results to reference Python execution.
- Use `decide` or extraction for executable tests.

---

## Stateless Guest (parallel STF track)

Full plan: `~/.claude/plans/please-cut-a-branch-warm-wand.md`.
Branch: `feat/run-stateless-guest-scaffold`.

Stakeholders asked for `run_stateless_guest`
(`execution-specs/src/ethereum/forks/amsterdam/stateless_guest.py:33`)
to be implemented in RV64IM macro-assembly **early**, so testing can
start before the proof effort catches up. Lands in a multi-PR sequence
under `EvmAsm/Stateless/`. Each module ships with a `Program.lean` and
a `Spec.lean` placeholder so CPS-triple proofs slot in later without
restructuring. Precompiles raise a distinct `Unimplemented` exit code
(0xFE marker at `OUTPUT_ADDR`); KECCAK256, ECRECOVER, SHA256, etc. go
through ECALL bridges (extending `EvmAsm/EL/Keccak*EcallBridge.lean`).

### PR sequence

| PR | Scope | First fixture that passes |
|---|---|---|
| PR1 | Scaffold + `Unimplemented` exit + `Entry` stub | `empty_witness` (Unimplemented marker round-trip) |
| PR2 | SSZ decode/encode + roundtrip script | `empty_witness` (false-validation roundtrip) |
| PR3 | Headers RLP decode + validate + Witness DBs | `single_header`, `chain_3_headers` |
| PR4 | MPT walk + read-side WitnessState | `mpt_one_account` |
| PR5 | SSZ `hash_tree_root` + SHA256 bridge | `compute_new_payload_request_root` |
| PR6 | Block + Transaction + ECRECOVER bridge | `tx_transfer` |
| PR7 | EVM interpreter dispatch + opcode wiring | `bytecode_push_add` |
| PR8+ | Remaining opcodes, MPT mutation, state-root | tracked under Phase 5–11 above |

### Status

- ✅ PR1 scaffold committed: `EvmAsm/Stateless/` with
  `MemoryLayout.lean`, `Unimplemented.lean`, `Entry.lean`,
  `EntrySpec.lean`, and the `Stateless.lean` umbrella.
- ✅ #5164 resolved: `isValidMemAddr` is a 3-region predicate
  (legacy + INPUT + RAM); existing proofs unaffected.
- ✅ PR2 SSZ output encoder + roundtrip: `Stateless/SSZ/Encode/`
  emits the 41-byte SSZ encoding of `StatelessValidationResult(root=0,
  valid=false, chain_id=1)` at `OUTPUT_ADDR`; bytes verified identical
  to Python's `serialize_stateless_output` reference encoder via
  `scripts/codegen-stateless-roundtrip-check.sh` on ziskemu.
- ✅ PR3 SSZ chain_id decoder: `Stateless/SSZ/Decode/` reads
  `chain_id` from `INPUT_ADDR + 24` (SSZ container header byte 8).
  Encoder parameterised on `x10`; the decoded `chain_id` flows
  through to `OUTPUT_ADDR`. Roundtrip test feeds Python-generated
  SSZ blobs with `chain_id ∈ {1, 0x1234567890ABCDEF}`; both pass on
  ziskemu.
- ✅ PR4 witness-length validation bit: `decode_validation_bit`
  reads `offset_1` and `offset_3` from the outer container header
  and sets `x11 = 1` iff the witness body is empty (length 12).
  Encoder ORs `x11` into the packed `bool || chain_id` word.
  Third fixture (`--with-empty-header`) flips the bool to 0;
  output round-trips through Python's SSZ decoder.
- ✅ PR5 headers-emptiness bit: `decode_validation_bit` now chases
  the inner offset chain (outer offset_1 → witness_addr → inner
  offset_headers → headers_addr; outer offset_3 → headers_end) and
  sets `x11 = 1` iff `witness.headers` is empty regardless of
  state/codes. Fourth fixture (`--with-empty-state-node`) keeps
  bool=1 under PR5 vs. 0 under PR4 -- confirms the deeper walk.
- ✅ PR6 header_count surfacing: `decode_header_count` reads the
  first u32 of the headers list (with a BEQ guard for the empty
  case) and divides by 4, leaving the count in `x16`. Encoder
  writes it as a u64 at `OUTPUT_ADDR + 48` (diagnostic field past
  the 41-byte SSZ result). Fifth fixture (`--with-two-empty-headers`)
  verifies count=2.
- ✅ PR-K1 ziskemu keccak intrinsic pinned: `CSRS 0x800, a0`
  (32-bit encoding `0x80052073`) triggers `_opcode_keccak` in
  ziskemu, which permutes the 200-byte state buffer pointed to by
  `a0` via `zisk_keccakf1600`. New `zisk_keccak_probe` BuildUnit
  emits the raw `.4byte` and copies the post-permutation state to
  OUTPUT_ADDR; matches the Keccak team's reference vector for the
  zero-state permutation. Source:
  `ziskos/entrypoint/src/syscalls/keccakf.rs` + `syscall.rs`
  (`SYSCALL_KECCAKF_ID = 0x800`) + `ziskos_syscall!` macro
  expanding to `csrs <csr>, <reg>`.
- ✅ PR-K2…PR-K286 (rolling, in active development): per-helper
  RV64IM macro-asm pieces of `run_stateless_guest` shipped under
  `EvmAsm/Codegen/Programs/` and `EvmAsm/Stateless/`. The catalogue
  currently covers RLP primitives, transaction decoders (legacy +
  EIP-1559/2930/4844/7702), account and MPT walkers, block-body
  helpers, header field extractors, withdrawal RLP/hash, address
  derivation (CREATE / CREATE2), and (recent slice, 2026-05-)
  the chain-level helpers `chain_extract_basefee_range`,
  `chain_validate_basefee_{non-decreasing,non-increasing}`,
  `chain_validate_gas_limit_{constant,non-decreasing,non-increasing}`,
  `chain_extract_gas_limit_first_last`, `chain_compute_total_gas_limit`,
  `chain_extract_excess_blob_gas_first_last`,
  `chain_compute_{max,min}_excess_blob_gas`,
  `chain_compute_{max,min}_blob_count`,
  `chain_extract_first_last_parent_beacon_block_root`,
  `chain_extract_first_last_requests_hash`,
  `header_extract_requests_hash`. Each helper ships with a
  ziskemu fixture cross-checked against
  [`execution-specs`](execution-specs/) Python; see
  `scripts/codegen-stateless-*-check.sh` for the per-helper round-
  trip scripts.
- ✅ **State-trie / MPT read family (2026-05-)**: a new tranche of
  `*_at_header_state_root` helpers walks the witness state trie from a
  header's `state_root` to read account and storage data, mirroring EVM
  account-accessing opcodes. Storage-proof scaffolding first
  (`zisk_validate_state_root_against_witness_node`,
  `zisk_validate_witness_state_contains_root`,
  `zisk_validate_storage_root_in_witness_storage`), then the account walk
  (`zisk_account_at_header_state_root`, `zisk_account_exists_at_header_state_root`)
  and the opcode-level getters built on it:
  `zisk_balance` (BALANCE), `zisk_nonce`, `zisk_code` / `zisk_code_size`
  (EXTCODESIZE) / `zisk_extcodehash` (EIP-1052) / `zisk_extcodecopy`
  (EXTCODECOPY), `zisk_slot` / `zisk_sload` (SLOAD storage-slot lookup),
  `zisk_account_is_empty` (EIP-161 emptiness), and
  `zisk_has_code_or_nonce` (EIP-684 CREATE-collision check) — all
  `_at_header_state_root`. Plus `zisk_blockhash_from_witness_headers`
  (BLOCKHASH) and `zisk_witness_headers_chain_validate` (chain continuity).
  Same fixture/round-trip discipline against `execution-specs` Python.
- ✅ **EEST conformance harness (2026-05-31)**: the `stateless_guest` ELF
  now runs against the real Ethereum Execution Spec Tests instead of only
  synthetic inputs. Target = `ethereum/execution-spec-tests` @ `zkevm@v0.4.0`
  (the dedicated stateless "zkevm" fixture line for the **Amsterdam /
  Glamsterdam** fork; adds the 2-byte schema-id prefix + filled `public_keys`
  the guest reads), vendored as the `execution-spec-tests` submodule
  (`shallow=true`). Pipeline: `scripts/eest-fetch-fixtures.sh` downloads the
  `fixtures_zkevm.tar.gz` release artifact;
  `scripts/eest-stateless-to-input.py` turns each block's
  `statelessInputBytes` into a ziskemu `-i` input (+ manifest);
  `scripts/codegen-eest-stateless-check.sh` builds the ELF, runs each input
  on ziskemu, and compares the output against the block's recorded
  `statelessOutputBytes`. Baseline (zkevm@v0.4.0): 23,219 stateless blocks
  (22,325 valid / 894 invalid); the still-partial guest matches the
  `successful_validation` bit on the 894 invalid-expecting blocks (it rejects
  every non-empty-header witness) and 0 full-output matches (it emits the
  pre-v0.4.0 empty-`active_fork` encoding). PR7+ guest completeness moves
  this baseline up; see PROGRESS.md Axis F.
- ✅ **BAL execution-vs-witness consistency + per-tx tuple soundness (beads `bmvmx.1.6.*`,
  `i3djw`; 2026-06)**: the stateless verdict re-derives the EIP-7928 `block_access_list`
  FROM execution and rejects a witness BAL that disagrees, instead of trusting it (the guest
  otherwise only pins `keccak(prover_BAL) == header.block_access_list_hash`, a consistency
  check, so every execution-derived equality is a non-redundant soundness guard).
  - **Storage finals**: per-account forward/reverse `bal_storage_matches_exec_log` /
    `bal_storage_covers_exec_log` / `bal_storage_reads_in_exec_log` over the append-per-write
    storage exec-log, wrapped all-accounts by `bal_all_accounts_storage_consistent` (callee key
    via `bal_addr_to_exec_log_key`, recipient checked in `block_verdict`).
  - **Non-storage** (balance/nonce/code, `i3djw`): per-account `bal_account_nonstorage_finals`
    → `bal_account_nonstorage_consistent` (balance+nonce forward+reverse) +
    `bal_account_code_consistent` (deployed-code bytes); wrapped all-accounts forward
    `bal_all_accounts_nonstorage_consistent` + reverse `…_covers`, and code forward
    `bal_all_accounts_code_consistent` + reverse `…_code_covers`. These scope to CALLEES via a
    `{sender, recipient, coinbase}` skip-list — the gas/value accounts are checked on the gas
    path (`sender_debit_from_gas`, `tx_gas_bal_post_verify_runtime`, sender-nonce).
  - **Per-tx tuple sequence** (`bmvmx.1.6.6`, the consensus-binding layer the spec hashes into
    `header.block_access_list_hash`; closes the finals-only false-accept gap): the exec-log
    carries a parallel `exec_log_txindex` array; `bal_slot_tuple_sequence` (BAL side) +
    `exec_log_slot_tuples` (exec side: group-by-txindex, last-write-per-tx, net-zero filtered) +
    `slot_tuple_sequences_match`, composed per-account (`account_tuple_sequences_consistent`)
    and all-accounts (`bal_all_accounts_tuple_sequences_consistent`).
  - Each ships a ziskemu probe cross-checked against `execution-specs` amsterdam
    `block_access_lists.py` / `fork.py`. **Wiring** of the non-storage/code/tuple checks into
    `block_verdict` lands as execution emits the per-account effect records + the per-tx
    `block_access_index` (multi-tx loop, `fhsxz.2.4.2.57.11.6.3`) + CREATE/SELFDESTRUCT code
    effects; the storage tuple check can wire immediately on the existing exec-log + txindex.
  - **Wired (2026-06-09)**: CREATE/CREATE2 activated in the self-contained gate (#8649,
    `.61.8.3.3`) so created accounts execute through the real init-code dispatch descent and
    emit code + non-storage effect records (CALL-value producer `i3djw.1` #8559-active, CREATE
    producer `i3djw.2` #8643). Non-storage all-accounts forward `bal_all_accounts_nonstorage_consistent`
    (#8646) + reverse `…_covers` (#8652) wired + merged + operator-EEST-sweep-confirmed (`i3djw.3`
    closed). Forward all-accounts CODE `bal_all_accounts_code_consistent` wired with an **EIP-7702
    skip** — a BAL `code_change` that is exactly a 23-byte `0xef0100||address` delegation indicator
    (installed from the authorization list, no CREATE deposit → no exec code-effect) is skipped
    instead of false-rejected (`i3djw.4`, #8653 merged + sweep-confirmed). Reverse code `…_code_covers`
    already merged (#8626).
  - **BLOCKHASH activated (2026-06-10, `3vc2p`)**: the M29 recent-blockhash table is reconstructed
    from the witness headers (`stage_blockhash_m29` #8655) and staged into the contract-recipient
    payload between the storage preload and the env trailer — written by `stage_runtime_payload_code`
    (#8662) with `env_base` shifted by `count*32`, computed in `dispatch_tx_runtime_code` (#8663) — so
    the input-driven dispatcher setup loads `env+552/+560` + `evm_block_hashes` and BLOCKHASH (0x40) is
    removed from the self-contained reject set (#8664). Only BALANCE (0x31) + SELFDESTRUCT (0xff) remain
    rejected. (3-PR chain #8662→#8663→#8664, REQUIRES EEST SWEEP.)
  - **Contract-dispatch soundness audit (2026-06-10)**: the recently-activated contract/CREATE dispatch
    path had a class of latent bugs — fixed `.data` buffers sized for the small tested contracts,
    overflowed by realistic (EIP-170/3860-sized) inputs, all passing the sweep only because tested rows
    are small. Found + fixed **4 overflows**: env_base offset mismatch for non-empty calldata/storage
    (`3vc2p.5`/srpc_env_base #8658, merged); `bv_runtime_payload` 4096→65536 + size guard (`bmvmx.1.7.2`
    #8659, merged); `bal_recipient_storage_keys`/`bvcd_keys` 16-cap → 128 (`bmvmx.1.7.3` #8660);
    CREATE init-code > EIP-3860 49152 → `.exit_outofgas` guarding `create_child_initcode` 65536
    (`.61.8.3.6` #8665). Plus a semantic gap (`5em02`, OPEN): SELFBALANCE (activated yisv8.1) reads the
    PRE-state balance, diverging for value-moving contracts — the value-transfer descent is inert; the
    fix is the in-exec balance spine (`bmvmx.1.6.x`). See `contract-dispatch-fixed-buffer-overflows`
    auto-memory for the audit method.
  - **Requests-hash derivation (EIP-7685, `8uld3`)**: `parse_deposit_requests` (#8657, merged) +
    `assemble_execution_requests` (#8666) — derive the post-execution `requests_hash` from
    execution-produced bodies, not the trusted SSZ input. Standalone + probe-verified; the execution-gated
    piece (real receipts + EIP-7002/7251 system-call return data) is the follow-up.

- 🔶 **Contract-recipient gas-measurement accuracy (beads `nxio8`, `tpdo1`; 2026-06)**: the runtime
  dispatcher meters CONTRACT execution with STATIC base opcode gas only (`Dispatch.lean:338-345`),
  dropping the dynamic components — so a contract recipient measured by `dispatch_tx_runtime_code`
  gets wrong gas, which feeds `block_regular` → the EIP-7778/8037 block-gas gate can mis-judge
  contract blocks (false-accept/reject). This is the root blocker for contract-row EEST parity.
  - **Fixed**: `bytecode_is_self_contained` now rejects SELFDESTRUCT (`0xff`) (`tpdo1`, #8628) — its
    gas needs the un-staged beneficiary, so it can never be measured self-contained (a real
    false-accept closure; only adds conservative rejects).
  - **Primitive foundation built** (`nxio8.1`–`.3`, probe-verified vs Amsterdam `vm/gas.py`,
    unwired → verdict byte-identical): `sstore_regular_gas` (#8630), `memory_expansion_gas` (#8631),
    `keccak256`/`copy`/`log`/`exp` per-unit leaves (#8633). EIP-2929 cold/warm is already charged via
    `evm_storage_access_gas`.
  - **Dynamic-gas wiring LANDED** (2026-06-09, c1): warmth table live in `h_SLOAD`/`h_SSTORE`,
    SSTORE value-transition gas + refund accumulation, M31 memory-expansion / copy / keccak /
    log / exp dynamic gas in all memory-touching handlers, account warmth in
    BALANCE/EXTCODE*/CALL*/SELFDESTRUCT. eip7778 EEST filter = 32/32 full-match.
  - **Amsterdam schedule + EIP-8037 state gas** (2026-06-12, branch
    `feat/dispatcher-eip8037-state-gas`, bead `nxio8`): SSTORE moved off the legacy Cancun
    schedule (20000 SET / 19900 zero-restore refund) to Amsterdam (2900 clean-changing +
    97,920 EIP-8037 state gas for zero-origin creation via `evm_state_gas_left`/`evm_state_gas_used`
    + the reservoir split at TX_MAX_GAS_LIMIT, restore refund 2800 + state credit); SSTORE
    stipend `check_gas(2301)`; per-dispatch-call resets for `evm_refund_acc` /
    `evm_storage_access_count` / state cells / halt_kind (all leaked across multi-tx
    dispatch calls); `dispatcher_tx_gas_settle` folds the spec's tx-level settlement
    (error → state restore + refund discard, exceptional halt → regular gas burnt) into
    the `dispatch_tx_runtime_code` / block-verdict gas captures; the standalone
    `runtime_dispatcher`/callable-probe ELFs link again (frame-helper closure bundled —
    they had been unlinkable since the CREATE-descent handlers landed).
  - **Remaining**: EIP-2930 access-list storage-key seeding (`evm_storage_access_seed_key`
    exists, unwired) + the access-list intrinsic in the runtime tx-gas; child-frame
    `incorporate_child_on_error` state-gas/refund/warmth rollback (global cells, no
    per-frame snapshot); creation-tx intrinsic state gas + code-deposit state gas.

- 🔶 **EIP-8025 witness code-preimage rejects (beads `ok3nl`/#8638, `mkwwf`; 2026-06)**: the spec rejects a
  block when execution reads a `code_hash` absent from the witness (`WitnessState.get_code` raises). (a)
  **current-frame code** (`ok3nl`, #8638 MERGED) — a contract recipient with non-empty `code_hash` but no
  `witness.codes` preimage now rejects at `.Lbv_contract_dispatch` via `witness_lookup_by_hash` (was a
  false-accept). (b) **implicitly-executed system-contract code** (`mkwwf`, OPEN) — EIP-2935 history code
  absent from the witness should reject, but a blanket witness.codes-presence requirement FALSE-REJECTS the
  valid `system_contract_reaches_gas_limit` rows (36141b1cb: tolerated/gas-limit system calls legitimately
  omit the predeploy code). Correct fix must distinguish code-required (normal execution) from code-omittable
  (system-call errors/gas-limit) — system-call-outcome modeling, not a presence check. Both only ADD rejects;
  the unsound mkwwf attempt was caught by a clean-main regression sweep. (c) **precheck-failed CALL targets**
  (`kpvxz`, PR #8656) — `bal_call_target_delegated_code_valid` hard-rejected a `witness_lookup_by_hash` miss
  on the CALL-target's own code, false-rejecting `static_create_contract_suicide_during_init` d3 (CALL with
  value inside a STATICCALL frame raises at precheck, so the target's code is never read and its preimage is
  legitimately absent). Target-own-code miss now means "no delegated-code obligation"; the marker-visible
  delegated-code reject and the extcodesize status-5 chain are unchanged (validation_codes 01427–01437 sweep
  kept all expected verdicts).

- 🔶 **Receipts → stateless verdict (bead `fhsxz.2.4.2.63.1.6.2`; 2026-06-12 overnight PR stack)**:
  the verdict never compared a guest-computed `receipts_root`/`logs_bloom` against the header (an
  audited reachable no-tx false-accept, hermes-c3). Landed as a stacked chain on the dynamic-gas PR:
  - **#8721** (base, bead `nxio8`): Amsterdam SSTORE schedule + EIP-8037 state gas + per-tx
    dispatch-state resets + `dispatcher_tx_gas_settle` settlement fold (eip8037 200/200 + 271/271,
    eip7778 32/32, random-20 — all full-match; details in the gas section above).
  - **#8722**: NO-TX receipts consensus check (empty-trie root + zero bloom vs header, fail codes
    50/51 — closes the audited false-accept) + per-tx receipt STATUS capture (settle success bit →
    `bv_tx_status_arr` → materializer).
  - **#8725**: `log_records_encode_rlp` leaf — captured LOG descriptors + `evm_log_data` → the spec
    `rlp([log..])` (probe-verified byte-exact vs a python RLP reference).
  - **PR 6 (this branch)**: per-tx log-window capture (`block_log_window_snapshot` — each dispatch
    call resets/overwrites the capture buffers, so windows are copied into a block-level arena
    between dispatches) + per-record logs RLP + bloom (`block_receipt_logs_materialize`, filling the
    `{bloom, logs_rlp, len}` descriptors `receipt_records_encode_no_logs` consumes). INERT: debug
    status only.
  - **BLOCKER for tx-bearing enforcement** (new receipts-blocker bead under `.63.1.6.2`): the
    dispatcher emits the EIP-7708 synthetic transfer log ONLY on the SELFDESTRUCT path — top-level
    tx value transfers and CALL value transfers never land in `evm_event_logs`, so guest receipts
    for ANY value-bearing tx are missing the transfer log and enforcement would false-reject. Fix =
    emit the synthetic descriptor at the top-level value move (needs a log-preseed surviving the
    per-call reset, like the callee storage seeds) + in the CALL value-transfer producer; THEN wire
    `receipt_records_encode_no_logs` → `block_validate_receipts_root_indexed` + per-record-bloom OR
    on the exact-measured path (build trie descriptors from records @40/@48, NOT by re-parsing the
    carrier list — typed receipts are `type_byte||rlp` and break `rlp_list_nth_item`).

- 🔶 **ECRECOVER recovery output (bead `fhsxz.2.4.2.62.2.5`; 2026-06-12, #8723 + #8724)**: the
  precompile returned success with empty returndata for every call. `secp256k1_recover_pubkey_staged`
  (extracted from `tx_pubkey_recover_raw`, accelerator-backed ~2e6 steps) now backs the handler tail
  (recover → keccak → 32-byte left-padded address; invalid signature keeps empty-returndata success).
  Reached via the `ecrecover_backend_ptr` cell so non-secp closures keep linking (the guest arms it in
  `dispatch_tx_runtime_code`). #8724 adds the end-to-end dispatcher probe (CALL 0x01 recovers
  `valid_signature_1` → `a94f...bf0b`; invalid v / zero r fail closed) — it caught an a0/x10 alias bug
  in the status branches (fixed on #8723). EEST ecrecover family 80/80 before AND after (the family's
  verdicts don't yet distinguish empty vs real returndata; a returndata-consuming EEST pin is noted on
  the bead). RIPEMD160 backend and MODEXP success output remain blocked on ziskemu routes
  (`docs/eest-precompile-frontier.md` row updated).

- 🔧 **FileSizeGuard cap discipline (#8645)**: `EvmAsm/Codegen/Programs/FileSizeGuard.lean`'s `#eval`
  enforces a 1500-line hard cap on every file under `Programs/`, but reads siblings via `IO.FS.readFile`
  (untracked by lake), so the guard's olean caches — incremental builds miss a too-large file while the
  clean docker `lake build codegen` fails. `BlockVerdict.lean` (1786) was split into
  `BlockVerdictStateRoot.lean` (block_state_root + stateless_verdict_v2), byte-identical assembly. Keep
  `block_verdict` itself out of splits to avoid churn with the call-frame descent.

### Cross-references

- Memory layout: `EvmAsm/Stateless/MemoryLayout.lean`.
- Reason codes (precompile, EIP-7702, EIP-4844, etc.):
  `EvmAsm/Stateless/Unimplemented.lean`.
- SDIV blocker (`evm-asm-9iqmw`) does **not** block this track —
  fixtures avoiding SDIV are picked first.

---

## Priority Order

**Immediate (recreate deleted specs) — ✅ ALL DONE:**
1. ~~Recreate `StackOps.lean`~~ — ✅ Done (Pop.lean, Push0.lean, Dup.lean, Swap.lean)
2. ~~Recreate `ShiftSpec.lean`~~ — ✅ Done (SHR per-limb + phase + body specs, 961 lines, 0 sorry)
3. ~~Recreate `ShlSpec.lean`, `SarSpec.lean`~~ — ✅ Done (SHL/SAR full hierarchy: per-limb + compose + semantic)
4. ~~Recreate `ByteSpec.lean`~~ — ✅ Done (Byte/Spec.lean + Byte/LimbSpec.lean, stack-level spec)

**Short-term (enables simple contracts):**
5. Phase 4.2: DIV, MOD — partial. Per-branch stack specs landed for
   the partial domain `b.getLimbN 3 = 0` (bzero / n=1/2/3) under the
   unified `evm_div_stack_spec` / `evm_mod_stack_spec` dispatchers
   (`Spec/Unified.lean`). Executable `evm_div` / `evm_mod` were switched
   to `divK_div128_v4` (full Knuth Algorithm D, 2-correction in both
   Phase 1b and Phase 2b) by [PR #4992](https://github.com/Verified-zkEVM/evm-asm/pull/4992)
   on 2026-05-19, so the runtime path is correct over the full
   4-limb domain. The remaining spec-layer work — extending
   `divCode` / `modCode` to v4 and proving full-domain
   `evm_div_stack_spec_unconditional` / `evm_mod_stack_spec_unconditional`
   (the n=4 path) — is tracked under bead `evm-asm-9iqmw` and reopened
   [GitHub issue #61](https://github.com/Verified-zkEVM/evm-asm/issues/61).
   In flight as of 2026-05-29: the **v5 div128 migration** (bead
   `evm-asm-wbc4i.6`) — `divK_div128_v5` caps the trial quotients to repair
   v4's two buggy ULTs, and the spec layer is being rebuilt against it
   (shared code surfaces `Compose/V5Code.lean`; step1/step2/phase1b/phase2b
   block specs in `LimbSpec/Div128*V5.lean`; n=4 dispatcher predicates and
   the `hq_over`-assuming runtime chain in `Spec/CallAddback{,Runtime}V5.lean`).
   See Phase 4.2 "V5 track" above for the file map. The earlier v4 series
   (`feat/div-n1-*`, `feat/div-n2-n3-selected-*`, `feat/div-double-addback-*`)
   landed the partial-domain conditional-carry and overestimate bridges.
   SDIV (`evm_sdiv_stack_spec_within`) and
   the SDIV/SMOD epilog wrappers are conditional on the same v4
   migration and unblock automatically when `evm-asm-9iqmw.5` lands.
6. Phase 5: MLOAD, MSTORE, EVM memory model
7. Phase 5.1: EVM code region (needed for PUSHn and interpreter)

**Execution layer (RLP decoder — STF prerequisite):**
- ~~EL.1: RLP specification~~ — ✅ Done
- ~~EL.2: Byte-level infrastructure~~ — ✅ Done
- EL.3: RLP RISC-V decoder phases 1-6

**Medium-term (interpreter loop — STF core):**
8. Phase 7.1-7.2: EVM machine state + opcode dispatch
9. Phase 7.3: Opcode handler wrappers (gas + dispatch)
10. Phase 7.4: Interpreter main loop with simulation relation proof
11. Phase 6: Environment opcodes (CALLER, CALLVALUE, etc.)

**Towards STF (full EVM execution):**
12. Phase 8.1-8.3: SLOAD/SSTORE, KECCAK256 (via syscalls/accelerators)
13. Phase 8.4-8.5: LOG, CALL/CREATE, RETURN/REVERT
14. Phase 9: Gas metering (static then dynamic)
15. Phase 10: Transaction processing (message call + validation)
16. Phase 11: Block-level STF + IO integration + conformance testing

---

## Design Decisions

1. **RV64IM target**: Per zkvm-standards, `riscv64im_zicclsm` is
   the standardized target for Ethereum zkVMs. 64-bit words mean 4 limbs
   per 256-bit word.

2. **Stack-in-memory**: EVM stack elements are 256-bit words stored in
   RISC-V memory (4 consecutive 64-bit words in RV64). SP register (x12)
   points to top of stack. Stack grows upward, 32 bytes per element.

3. **Syscall bridge (ECALL)**: Complex operations (KECCAK, SLOAD/SSTORE, CALL,
   precompiles) use ECALL to delegate to the zkVM host. This aligns with the
   `zkvm_accelerators.h` C interface standard. The host provides:
   - Cryptographic accelerators (keccak, EC ops, pairings)
   - Storage read/write
   - State trie operations

4. **Per-limb modularity**: Each 256-bit operation decomposes into 4 per-limb
   operations (RV64) with individual specs, then composed via `runBlock`.

5. **Simulation relation for STF**: The interpreter loop proof uses a
   simulation relation between abstract EVM state and concrete RISC-V state.
   Each opcode handler preserves the simulation; the loop proof is inductive.

6. **Reference spec**: All opcodes must match the semantics in
   `execution-specs/src/ethereum/forks/shanghai/vm/`.

7. **Proof automation**: `xperm`/`xsimp` for assertion permutation,
   `runBlock` for multi-limb composition, `validMem`/`liftSpec`/`pcFree`
   for boilerplate elimination. Recent refactorings (let-code, runBlock)
   have eliminated thousands of lines of manual proof.

8. **IO standard**: The STF program uses `read_input`/`write_output` per
   the zkvm IO standard. Input = block + pre-state witness. Output =
   post-state root hash.