# Read-side projections: when and how

This is the methodology doc for the read-side layer in Sal. The
per-RDT crosswalks (`peritext-vs-paper.md`, `or-set-vs-paper.md`,
etc.) cite the rules; this file states them once.

## The "vacuous convergence" problem

Sal's 24 RA-linearizability VCs prove **state convergence**: two
replicas with the same set of observed ops reach the same `concrete_st`
under merge. For RDTs whose state is a flat lattice with semantically
direct read (e.g. counters, simple registers), state-convergence is
the entire correctness story.

For many interesting RDTs it is not. Consider OR-Set: its 24 VCs
prove that the underlying tag-set + tombstone-set converges under
union — but they say nothing about whether `e ∈ set` agrees on
two replicas under concurrent Add/Remove on `e`. The **headline
property** of OR-Set lives in the read-side projection, not in the
state-equivalence under merge.

The Sal paper proves convergence on 13 RDTs and stops. Read-side
projections are the methodological extension that lifts the
*semantic* claims of an RDT (lookup, query, peek-max, antichain,
visible-string, …) into Lean — orthogonal to the convergence
layer, but composing with it to give a complete correctness story.

## Why the 24 VCs already do more than CCI

The textbook state-based CRDT story — commutative, associative,
idempotent merge ("CCI") — is strictly weaker than what Sal's 24 VCs
prove, and it's worth being precise about the gap.

CCI isn't even sufficient for a counter. Take `Int` as state and
`max` as merge: CCI holds, but the structure isn't a counter. Two
concurrent `inc`s on `0` produce `max(1, 1) = 1` instead of `2`. The
lattice laws hold; the operation semantics is not modelled. To
recover a counter you need a per-replica vector, pointwise-max as
merge, and the invariant that `inc` bumps only the acting replica's
slot. CCI has no machinery to express any of that.

Sal's 24 VCs prove RA-linearizability: the merged state matches
*some* sequential application of the same ops under `rc`. This is
strictly stronger than CCI. The int+max "counter" *fails* the 24 VCs
— there is no rc-respecting linear order of `inc`s whose result
equals `max`. So the 24 VCs already rule out the kind of
pseudo-counter that CCI accepts, and that's why Tier A feels free:
the operation semantics is baked into the converged state, and the
read is just a direct projection.

The real Tier C gap, then, is not "CCI vs intent." It is that the
user-visible query is a *non-injective projection* of the state.
The 24 VCs prove `(adds, tombstones)` converges in an op-respecting
way; they say nothing about `e ∈ set`, because that predicate isn't
a state component — it is a function of one. Read-side theorems
are what re-attach the user's mental model to the projection.

| | Catches int+max-as-counter? | Captures op semantics? | Captures projected queries? |
|---|---|---|---|
| CCI lattice merge | no | no | no |
| Sal's 24 VCs (RA-lin) | yes | yes | only when read = state itself |
| 24 VCs + read-side | yes | yes | yes |

This trichotomy is the load-bearing reason the tier framework below
exists. Tier A/B sit in the middle row; Tier C needs the bottom row.

## The three tiers

The Sal suite separates RDTs by how much of their semantics the
24 VCs already cover.

### Tier A — read-side is mechanical (still written for documentation)

The 24 VCs cover everything; the read is essentially the identity
projection or a per-key direct lookup whose convergence follows
componentwise from state-convergence. The read-side and SPOT files
are still written — *absence of a read-side file would communicate
nothing about whether the read is trivial*; presence with a short
mechanical proof tells a reader "yes, this query exists, this is
its name and type, here's the convergence theorem (`rfl` or one
`rw`), and here are 2–3 SPOT scenarios pinning the headline op-vs-
read claim."

| RDT | Read |
|---|---|
| `PN_Counter` | per-replica `incs[r] − decs[r]` |
| `Increment_Only_Counter` | per-replica `state[r]` (CRDT) / scalar (MRDT) |
| `LWW_Register` | `value` field of the lex-max pair |
| `MIN_Register` / `MAX_Register` | the scalar state |
| `MAX_Map` | per-key `lookup` (zero-default) |
| `Grow_Only_Set` | `lookup e = mem e` |
| `Grow_Only_Multiset` | per-replica per-element `count_at` |
| `Grow_Only_Map` | per-key per-value `lookup k v = v ∈ state[k]` |
| `LWW_Map` | per-key `lookup k = state[k].value` |
| `Bounded_Counter` | per-replica `inc_count`, `dec_count`, `transfer_count` |
| `Shopping_Cart` | per-replica per-product `add_count`, `remove_count`, `per_replica_qty` |

**Rule of thumb**: if the read is a single arithmetic/lookup over
already-converged components, it's Tier A. Write it anyway — it
takes ~30–60 lines per RDT and pays back the next reader who has
to confirm the read-side really is "obvious."

### Tier B — read-side adds clarification, not structural verification

The 24 VCs imply state-convergence, and the read converges via a
short composition. The read-side adds documentation-quality
clarification (what is "the value" of an MVR? what is "the visible
set" of a flag?) but the underlying state is simple enough that the
clarification is essentially a definition.

| RDT | Read | Why Tier B |
|---|---|---|
| `Multi_Valued_Register` (paper / Neem F* simplified variant) | "all values ever written" | Grow-only set; sequential writes don't actually overwrite, so the read-side adds little beyond `lookup`. |
| `LWW_Element_Set` | `addTs[e] > remTs[e]` | Cross-component comparison adds a small fact not in the 24 VCs, but mechanically is just `>` on per-element max. |

Tier B is the boundary: you *can* write a read-side, the theorems
*are* honest, but the payoff is small. We do it anyway when the
RDT is well-known and the read-side helps a downstream consumer
make sense of the state shape.

### Tier C — read-side is essential

The 24 VCs prove union-commutativity on grow-only state, but the
*user-visible* semantics live in the read. Without a read-side,
you have proved that the data structure converges and *nothing
else*. These are the RDTs where the read-side is structurally
load-bearing.

| RDT | What the 24 VCs prove | What the read-side adds |
|---|---|---|
| `OR_Set` (CRDT, MRDT, Efficient MRDT) | tag-set + tombstone-set converge | "is `e` in the set?" via add-wins resolution; `add_wins_over_concurrent_remove`, `add_then_remove_extinguishes` |
| `Peritext` (CRDT, MRDT) | RGA chars + flat mark-attachment set converge | Visible-order DFS, span-membership via `in_span_visible`, paper Ex 1/2/3/5/7/8 intent theorems |
| `RGA` (CRDT, MRDT) | per-OpId chars/afters/deleted maps converge | Visible-order traversal `visible_lt`, causal-order preservation, deterministic concurrent-insert tiebreak |
| `Add_Win_Priority_Queue` (CRDT, MRDT) | (A, I, R) components converge | `lookup`, LWW innate, MCW-collapsed-to-Σ acquired, `get_max`, `inc_increases_acquired` |
| `Multi_Valued_Register` (classical, post-rewrite) | (writes, removed) converge | "concurrent writes both survive" + "sequential writes supersede" via snapshot |

For Tier C, the convergence proof is necessary but **not sufficient**.
The read-side is the proof that the converged state means what the
paper says it means.

## When is a new RDT Tier C?

Three signals, any of which strongly suggests Tier C:

1. **The user query is structurally non-trivial.** Visible string,
   peek-max, "is x in the set under add-wins", maximal antichain in
   a version DAG — anything that's not a single lookup or arithmetic
   composition.
2. **The state is "richer" than the read.** OR-Set carries every
   add-tag in history; users want to know membership. AW-CRPQ
   carries every add/inc record; users want a single priority. The
   gap between state and read is where the read-side does work.
3. **The op semantics involves observation.** `Rem` snapshots
   currently-visible tags. `inc` updates per-record counters. The
   prepare/effect split that produces grow-only state often hides
   the actual semantics in the snapshot mechanism — read-side
   theorems make it explicit.

Conversely: if the query is "extract the field" or "compute a
sum," it's Tier A. Don't add a read-side for completeness alone.

## The methodology, in three steps

For each Tier-C RDT (`docs/peritext-vs-paper.md` is the canonical
template; `or-set-vs-paper.md` is the closest minimal example):

### 1. Define the query

A `def` (often `noncomputable` because of `Classical.choose` over
the Bool-valued `set`) on `concrete_st`. Examples:

- `def lookup (s) (e) : Prop := ∃ ts, mem (e, ts) (Prod.fst s) ∧ ¬ mem (e, ts) (Prod.snd s)` (OR-Set)
- `def is_visible_value (s) (v) : Prop := ∃ ts, mem (ts, v) (Prod.fst s) ∧ ¬ mem ts (Prod.snd s)` (MVR classical)
- `def is_acquired (s) (e) (v) : Prop := ∃ l : List ..., l.Nodup ∧ ... ∧ v = (l.map _).sum` (AW-CRPQ acquired, list-witness style to sidestep `Finset`)

### 2. Prove convergence-at-the-read

**`eq s₁ s₂ → (query s₁ ↔ query s₂)`** (or `=` if the query is
function-valued and the underlying `eq` is Lean-equal).

This is the read-side analogue of merge convergence. Almost always
trivial: structural induction on the propositional `eq`, or `subst`
if the underlying `eq` is `=`. It is the *foundational* read-side
theorem that justifies talking about "the value" rather than "this
replica's value."

### 3. Prove 2–4 intent-preservation theorems

The headline claims of the paper, lifted as state-based predicates.
Recurring patterns:

- **`add_wins_over_concurrent_*`** — under premises modelling
  concurrency (the new tag is fresh in the common pre-state, OR the
  Rmv's snapshot doesn't include it), the Add survives the merge.
- **`*_then_*_extinguishes`** — under sequential premises, the
  later op overrides the earlier.
- **`*_after_*`** — applying a constructor op makes the
  characteristic predicate hold.
- **A few RDT-specific ones**: `causal_order_visible_lt` (RGA),
  `bold_expand_in_span_visible` (Peritext Ex 7), `innate_record_unique`
  (AW-CRPQ LWW), `latest_write_wins` (LWW-Element-Set).

Three to four intent theorems is usually enough; the paper's
headline claims are typically that few. The Peritext suite has more
because the paper has eight examples in §A.2.

## Step 4 (recommended) — concrete SPOTs

For each Tier-C RDT, alongside the universal read-side theorems we
keep a small **`*_SPOT.lean`** file (Small Proof-Oriented Tests):
concrete `do_`/`merge` traces over literal arguments, with the
expected read-side outcome pinned by proof. Each SPOT is named
after the headline read-side claim it exercises.

Two roles:

1. **Regression tests.** A predicate that is right on every input we
   could close universally can still go subtly wrong under a
   refactor — say, an `init_st` change or a payload-shape rework.
   SPOTs catch the breakage where it bites: on a specific 3-replica,
   4-op trace.
2. **Verified API documentation.** The same scenarios that a demo
   page or paper §X.Y example narrates, here pinned by Lean. A
   reader can trace `Add e at ts=1` then `concurrent Rem e` then
   `merge` and read off the verdict from the example, with the
   guarantee that Lean agrees.

Discharge style is one of:

- Apply the existing read-side theorem (`lookup_after_add init_st …`)
  with literal arguments — fastest, and shows the universal
  theorem firing on a specific instance.
- `refine ⟨witness, ?_, ?_⟩ <;> decide` — when the membership
  predicates `mem`/`contains` reduce on literals.
- `simp [do_, merge]; grind` — when the goal needs unfolding plus
  light arithmetic / case-splitting.

### Pitfalls (audit every SPOT for these)

These are the bugs we hit when first writing the Tier-C SPOT suite;
catch them at write-time, not at review.

**1. Match `rid`s to the paper's framing.** A read-side theorem
with a universal premise (`mark_present s m' → …`) admits witness
states with one or zero such ops — the universal becomes vacuous
and the theorem applies trivially. That's a *valid invocation* but
doesn't exercise the paper scenario.

For paper claims phrased as *"concurrent X from different
replicas"*, use distinct `rid`s on the relevant ops:
`(ts₁, 0, op₁); (ts₂, 1, op₂)`. Same `rid` is only for ops a
single replica issues sequentially. With `rc := Either` on these
RDTs, sequential `do_; do_` on different rids converges to the
same state as a literal merge of two divergent branches — that's
the standard state-based stand-in for "concurrent" used elsewhere
in Sal (see e.g. `OR_Set` CRDT's `add_wins_over_concurrent_remove`).

**2. Don't degenerate the scenario.** "Two overlapping same-type
Adds" needs *two* AddMarks of the same type with truly overlapping
spans — exhibit the overlap point as the witness `c`. "Add wins
over concurrent Remove" needs both an AddMark and a RemoveMark in
the state. If your SPOT compiles with one Add and zero Removes,
re-read the theorem name: you're probably missing structure.

**3. Inline structured-type literals.** `let m : MarkOp := ⟨…⟩`
in a tactic block blocks projection reduction (`m.startSide`,
`m.markType`) inside `simp`. Pass the literal anonymous-constructor
or tuple directly to the theorem and to every reduction call.
This bit us on Peritext SPOTs 7–9 — once the literal is inlined,
`simp` unfolds the projections and the proof goes through.

**4. Use the read-side query in the conclusion, not raw membership.**
A read-side theorem's job is to lift a paper claim into the
projected vocabulary (`is_visible_value`, `lookup`, `formatted_visible`,
…). If your theorem concludes `mem ts (Prod.snd (do_ s …)) = true`
and the docstring says "v₁ is no longer visible," it's actually
proving a *building-block* fact, not the headline. The two are
different: `ts ∈ removed` invalidates *the specific* `(ts, v₁)`
witness; `¬ is_visible_value (do_ s …) v₁` says no witness for
`v₁` survives. The latter requires extra premises (typically a
coverage premise like *"every visible witness for `v₁` in `s` has
ts ∈ O₂"* and an inequality like `v₂ ≠ v₁`) but it is the actual
paper claim.

Pattern: keep both. Name the witness-level lemma
`<theorem>_witness` (the building block), and the value-level
theorem `<theorem>_value` (the headline, in projection
vocabulary). The MVR pair `sequential_write_supersedes_witness` /
`sequential_write_supersedes_value` is the canonical example —
the witness version drops out of `simp [do_]` in two lines, the
value version takes a coverage premise and a value-inequality
premise and concludes `¬ is_visible_value …`.

If a SPOT proof has to do raw `rintro/simp/grind` to close the
negative half of a "v is no longer visible" claim, that's a sign
the read-side theorem is missing — write the value-level one.

**5. The `mark_present`-uniqueness pattern.** Read-side premises
of the form `∀ m', mark_present σ m' = true → P m'` reduce, on a
concrete state with 1–2 AddMarks, to "any present `m'` equals one
of the constructed marks". Discharge:

```lean
have h_uniq : ∀ m' : MarkOp, mark_present σ m' = true →
    m' = literal₁ ∨ m' = literal₂ ∨ … := by
  intro m' h_pres
  simp [hσ, ..., mark_present, marks_of, do_, add,
        _root_.singleton, union, empty] at h_pres
  rcases m' with ⟨_, _, _, _, _, _, _⟩
  grind
```

The `rcases m'` *before* `grind` is the trick — destructuring the
structured type into fields lets `grind` collapse the
AnchorAttachment-injectivity disjunction componentwise. Without
the `rcases`, `grind` stalls on a 6-way Prop disjunction with
projection-blocked equalities.

### Concurrency encoding — two equivalent conventions

Both the readside theorems and the SPOTs use one of two conventions
for representing "concurrent":

- **Literal merge** (used by OR-Set, RGA, MVR, AW-CRPQ readsides):
  the theorem's statement contains `merge (do_ s o₁) (do_ s o₂)` (or
  `merge l (do_ l o₁) (do_ l o₂)` for MRDT) with `rid₁`, `rid₂`
  universal. The "concurrent" shape is in the theorem's form.
- **Abstract state** (used by Peritext readside): the theorem takes
  any state `s` with the relevant ops' effects already present (e.g.
  `mark_present s addOp = true ∧ mark_present s remOp = true`).
  Concurrency is implicit — such a state arises by concurrent
  application from different replicas.

Both styles are paper-faithful at the readside level. The SPOT layer
is where the concurrent shape becomes concrete: in the literal
convention by literal `merge`, and in the abstract convention by
`do_; do_` with distinct rids. Either way, the rid distinction is
how the *reader* recognises the SPOT as exercising the concurrent
case.

The Sal SPOT files live next to their read-sides:

| RDT | SPOT file |
|---|---|
| OR-Set CRDT | `Sal/CRDTs/OR_Set/OR_Set_SPOT.lean` |
| OR-Set MRDT | `Sal/MRDTs/OR_Set/OR_Set_SPOT.lean` |
| OR-Set Efficient MRDT | `Sal/MRDTs/OR_Set_Efficient/OR_Set_Efficient_SPOT.lean` |
| MVR CRDT / MRDT | `Sal/{C,M}RDTs/Multi_Valued_Register/Multi_Valued_Register_SPOT.lean` |
| RGA CRDT | `Sal/CRDTs/RGA/RGA_SPOT.lean` |
| RGA MRDT | `Sal/MRDTs/Replicated_Growable_Array/Replicated_Growable_Array_SPOT.lean` |
| AW-CRPQ CRDT / MRDT | `Sal/{C,M}RDTs/Add_Win_Priority_Queue/Add_Win_Priority_Queue_SPOT.lean` |
| Peritext CRDT / MRDT | `Sal/{C,M}RDTs/Peritext/Peritext_SPOT.lean` |

## The snapshot-in-op-payload pattern

A recurring trick when porting an op-based CRDT whose `effect`
function reads state. The classical examples:

- **OR-Set's `Rem(e)`** — at prepare time, snapshots
  `{(e, ts) ∈ adds}`; at effect time, `tombstones := tombstones ∪ snapshot`.
- **AW-CRPQ's `Rmv(e)`** — at prepare time, snapshots
  `{(e, ts) ∈ A ∧ (e, ts) ∉ R}`; at effect time, `R := R ∪ snapshot`.
- **MVR (classical) `Write(v)`** — at prepare time, snapshots all
  currently-visible ts values; at effect time, `removed := removed ∪ snapshot, writes := writes ∪ {(ts, v)}`.

The pattern keeps `rc := Either` everywhere by making `do_` a pure
pointwise-OR on grow-only components, regardless of the op's
payload. Concurrent ops have disjoint snapshots; sequential ops
each include the prior in their snapshot. Convergence is trivial
(union of grow-only sets); the headline semantics emerges in the
read-side via the visible / not-tombstoned check.

When you see "this op needs to read state" in the paper's algorithm,
this pattern is almost always how to keep it state-based.

## Spec validation — the `in_span_boundary` cautionary tale

The first read-side predicate Claude wrote for Peritext was
`in_span_boundary`. It looked plausible from a surface reading of
§3.3. About 400 lines of theorems went through against it before KC
caught the bug: the predicate's `after_of c endId → endSide` clause
encoded the *opposite* of what §3.3 specifies.

The proofs were correct. The predicate was wrong.

This is the verification analogue of "your tests pass but the code
is wrong." More proofs don't help; **spec validation** is a
separate activity. Two complementary tools:

- **Statelet model checking (SMC)**: enumerate short executions
  (3 replicas, 4 ops). For each resulting state, check whether the
  read-side predicate agrees with a reference implementation
  (e.g. for Peritext, the TypeScript reference from the paper's
  artefact). Fast, no universal claim, catches predicate-vs-paper
  drift in minutes.
- **Reference implementation cross-check**: even without SMC, run
  the predicate by hand on the paper's small examples (Peritext §A.2,
  OR-Set §3.3.5 Spec 14, etc.) and check the verdicts match.

The keynote-notes (`docs/papoc-2026-keynote-notes.md`) discuss the
SMC angle in more depth; this doc just records the rule:

> **Validate the spec before committing proof effort to it.**

## Cross-references

| Per-RDT crosswalk | Tier | Headline read-side claim |
|---|---|---|
| [`peritext-vs-paper.md`](peritext-vs-paper.md) | C | Paper Ex 1/2/3/5/7/8 intent-preservation |
| [`rga-vs-paper.md`](rga-vs-paper.md) | C | Causal-order preservation, tombstone monotonicity, deterministic tiebreak |
| [`or-set-vs-paper.md`](or-set-vs-paper.md) | C | Add-Wins on lookup |
| [`aw-crpq-vs-paper.md`](aw-crpq-vs-paper.md) | C | Add-wins, LWW innate, `get_max` |
| [`mvr-vs-paper.md`](mvr-vs-paper.md) | C | Concurrent-writes-both-survive + sequential-supersede |
| [`lww-element-set-vs-paper.md`](lww-element-set-vs-paper.md) | B/C boundary | LWW comparison lookup |
| [`papoc-2026-keynote-notes.md`](papoc-2026-keynote-notes.md) | (methodology) | Tier framework, SMC angle, agent-human collaboration |
| [`porting-op-based-crdts.md`](porting-op-based-crdts.md) | (methodology) | Five-step recipe for porting op-based CRDTs into Sal |