# SirixDB vs PostgreSQL: Versioned-Document Benchmark

An honest, same-machine comparison of SirixDB's native versioning against the standard
PostgreSQL pattern for versioned JSON documents (jsonb column + trigger-maintained history
table). Both systems run the **identical logical workload**; results are cross-checked
(identical field-history checksums) to prove both ended up with the same 5,001-version
history. **PostgreSQL wins most raw numbers in this small-document regime — that is the
finding, and the analysis below explains where each system's advantages actually are.**

Date: 2026-06-11.

---

## 1. Setup

| | |
|---|---|
| Machine | Intel i7-12700H (20 threads), 32 GB RAM, WD SN810 1 TB NVMe, ext4, Linux 6.8 |
| SirixDB | dev build `1.0.0-alpha22` + uncommitted working-tree changes (`/tmp/sirix-fix`, prebuilt classes), **embedded in-process**, GraalVM JDK 25.0.3, `-Xms1g -Xmx4g`, `StorageType.FILE_CHANNEL`, `VersioningType.SLIDING_SNAPSHOT` |
| PostgreSQL | 17.10 (official `postgres:17` Docker image), data on a named Docker volume on the **same ext4 NVMe**, `shared_buffers=1GB`, `synchronous_commit=on`, `fsync=on`, everything else default |
| Workload driver | SirixDB: single Java process (`/tmp/wave5-b/SirixVersionedDocBench.java`). PostgreSQL: `psql` inside the container via unix socket; hot loops run **server-side** (plpgsql procedure/functions) so PostgreSQL pays no client round trips during measurement (see caveat #1) |
| Document | deterministic ~2.4 KB JSON: 50 top-level fields (`counter` first, 36 strings, 8 ints, 4 bools) + one nested array of 20 item objects. Identical bytes fed to both systems |
| Execution order | strictly sequential (sirix full config → sirix lean config → PostgreSQL); no disk/CPU contention between systems |

Two SirixDB resource configurations were measured, because several sirix features have
per-commit cost that PostgreSQL's pattern simply doesn't have an equivalent for:

- **full** (= the defaults): path summary, child counts, rolling hashes, per-node history
  index (`RECORD_TO_REVISIONS`), per-commit stored diff files.
- **lean**: all of the above disabled (`hashKind(NONE)`, `storeDiffs(false)`,
  `storeNodeHistory(false)`, `buildPathSummary(false)`, `storeChildCount(false)`).

### Durability parity (verified, not assumed)

- PostgreSQL: `synchronous_commit=on`, `fsync=on` → one `fdatasync` of the WAL per commit.
  `pg_test_fsync` on the same volume: **fdatasync 4,778 ops/s (209 µs/op)** — PostgreSQL's
  measured 4,015 commits/s is at 84 % of that hardware floor, i.e. its W1 is fsync-bound
  and honestly tuned.
- SirixDB: each commit ends with a `dataFileChannel.force(true)` write-ahead barrier plus
  three O_DSYNC (FUA) write-through writes (revision record + dual uber-page beacons) —
  verified in `bundles/sirix-core/.../io/filechannel/FileChannelWriter.java`. That is, if
  anything, a *stronger* per-commit durability protocol than PostgreSQL's single
  fdatasync (~0.85 ms/commit durability floor on this disk vs ~0.21 ms).

Both data directories live on the same physical filesystem; nothing runs on tmpfs.

### The workloads

| | Semantics | SirixDB implementation | PostgreSQL implementation |
|---|---|---|---|
| W1 | insert doc, then 5,000 single-field updates, **each one its own durable transaction**, full history retained → 5,001 versions | `wtx.insertSubtreeAsFirstChild(JsonShredder.createStringReader(doc))` then loop `wtx.moveTo(counterNodeKey); wtx.setNumberValue(i); wtx.commit()` | `doc(id, doc jsonb)` + `doc_history(id, rev, valid_from timestamptz, doc jsonb)` maintained by an `AFTER INSERT OR UPDATE` trigger in the same transaction; (a) server-side `CALL bench_w1(5000)` — plpgsql loop `UPDATE … jsonb_set(doc,'{counter}',to_jsonb(i)); COMMIT;`; (b) client-driven: 5,000 autocommit `UPDATE` statements via `psql -f` |
| W2 | 1,000 random point-in-time reads, each fetching + serializing the whole document | random revision ∈ [1, 5001]; `session.beginNodeReadOnlyTrx(rev)` + `JsonSerializer` to a `StringWriter` | random timestamp ∈ [min, max]; `SELECT doc::text FROM doc_history WHERE valid_from <= t ORDER BY valid_from DESC LIMIT 1` (plpgsql loop; index `(valid_from)`; verified plan: Index Scan Backward, 3 buffer hits) |
| W3 | list all 5,001 version timestamps | `session.getHistory()` and iterate the `RevisionInfo` list | plpgsql loop over `SELECT rev, valid_from FROM doc_history ORDER BY valid_from` |
| W4 | one field's value across all 5,001 versions | (a) native `AllTimeAxis` from the counter node; (b) manual loop `beginNodeReadOnlyTrx(rev) + moveTo(nodeKey)` | `SELECT count(c), sum(c) FROM (SELECT (doc->>'counter')::bigint AS c FROM doc_history ORDER BY valid_from) s` |
| W5 | bytes on disk for the full history | bytes of the database directory (apparent = Σ file sizes, and allocated = `du`) | `pg_total_relation_size('doc') + pg_total_relation_size('doc_history')` after `CHECKPOINT` (**WAL excluded** — see caveat #6) |
| W6 | diff between version N/2 and N/2+1 | native `new BasicJsonDiff(db).generateDiff(session, 2500, 2501)` | **no native diff** — representative top-level compare: `jsonb_each` of both revisions, `FULL OUTER JOIN … WHERE a.v IS DISTINCT FROM b.v` (not semantically equivalent, see caveat #5) |

Timing: reads = 1 untimed warm-up pass + 3 timed runs, **median** reported (all runs in the
raw logs). W1 = single timed pass (5,000 commits warms the JVM as it goes; per-1,000-commit
window rates reported). PostgreSQL statement times via `\timing` (psql in-container, unix
socket, one round trip per measured statement).

Correctness cross-checks (passed): both systems report 5,001 versions; W4 sum of the counter
across all versions = **12,502,500 on both**; final counter = 5,000 on both; the W6 diff on
both identifies exactly the `counter` field (2499 → 2500).

---

## 2. Results

5,000 single-field updates on one ~2.4 KB document, 5,001 retained versions, same NVMe,
both warm, durability verified on both sides. Medians of 3 for reads.

| Workload | SirixDB (full) | SirixDB (lean) | PostgreSQL 17 | Winner |
|---|---|---|---|---|
| **W1 ingest**: 5,000 durable single-field commits | 13.32 s = **375 commits/s** (2.66 ms/commit; steady-state windows 444–477/s) | 11.65 s = **429 commits/s** (2.33 ms/commit; peak window 555/s) | server-side: 1.245 s = **4,015 commits/s** (0.249 ms/commit) · client-driven: 2.10 s = **2,376 commits/s** | **PostgreSQL, 5.5–10.7×** |
| W1 initial insert (version 1) | 71.3 ms (first-ever commit, cold JIT) | 75.0 ms | 3.0 ms | PostgreSQL |
| **W2**: 1,000 random point-in-time full-doc reads | 75.7 ms (**75.7 µs/read**) | 74.5 ms | batched: 17.5 ms (**17.5 µs/read**) · client-driven, per-statement: **~104 µs/read** | **PostgreSQL 4.3× (batched)** — but **SirixDB wins per-statement** (75.7 vs ~104 µs) |
| W2 fixed mid-history (1,000 reads @ rev 2500) | 63.1 ms | 60.6 ms | 18.6 ms (batched) | PostgreSQL 3.3× |
| **W3**: history listing (5,001 timestamps) | 4.57 ms | 3.36 ms | 1.99 ms | PostgreSQL ~2× |
| **W4**: one field across all 5,001 versions | 55.4 ms (AllTimeAxis) / 49.4 ms (manual loop) | 48.8 / 47.5 ms | 6.91 ms (**1.4 µs/version**) | **PostgreSQL ~7×** |
| **W6**: diff of adjacent versions | **0.30 ms** — node-level semantic diff (exact node keys, 163-char JSON patch) | 0.40 ms | 0.15 ms — top-level field compare only (would need app-side recursion for parity) | sub-ms tie on speed; **SirixDB on capability** |
| **W5**: storage for full history | 16.43 MiB apparent / 37.2 MB allocated (5,000 diff files × 4 KB blocks) | 11.81 MiB apparent / 12.4 MB allocated | **4.66 MiB** (history table 4.6 MB incl. 256 KB index; WAL excluded) | **PostgreSQL 2.5–3.5×** |
| W5 per version | 3,444 B | 2,476 B | **978 B** (2.4 KB doc pglz-compressed to 836 B/row) | PostgreSQL |

Raw logs: `/tmp/wave5-b/sirix-full.log`, `/tmp/wave5-b/sirix-lean.log`,
`/tmp/wave5-b/pg-results.log`, `/tmp/wave5-b/pg-reads.log`.

---

## 3. Honest caveats

1. **Process boundary.** SirixDB ran embedded (zero IPC). PostgreSQL's headline numbers
   use server-side plpgsql loops / a single `CALL`, i.e. they deliberately remove client
   round trips — the most favorable honest setup for PostgreSQL. The client-driven
   variants quantify the boundary: W1 drops 4,015 → 2,376 commits/s, and W2 drops
   17.5 µs → ~104 µs/read (1,000 individual `SELECT`s through psql over the **unix
   socket**, measured as wall time minus a separately measured ~65 ms `docker exec`+psql
   startup overhead; TCP/JDBC from another host would be slower still). An application
   that reads single documents one statement at a time sits on the client-driven line,
   where embedded SirixDB (75.7 µs) is actually *faster* than PostgreSQL.
2. **No JDBC driver available** in the local caches, so PostgreSQL was driven by psql
   inside the container. All quoted PG numbers are psql `\timing` of single statements
   (one round trip each) or wall-clocked `psql -f` runs as labeled — methodologically
   this is *better* for PostgreSQL than JDBC would have been.
3. **Narrow regime.** One ~2.4 KB document, one writer, 5,001 versions, fully cached.
   Nothing here measures large documents, millions of versions, many documents, or
   concurrent writers. The regime choice favors PostgreSQL (see §5).
4. **W2 randomness mapping.** SirixDB picks revisions uniformly; PostgreSQL picks
   timestamps uniformly over the history span. Equivalent only if commit rate is roughly
   constant within each run (it was, after the first JIT-warm window).
5. **W6 is not apples-to-apples.** SirixDB's diff is a recursive node-level semantic diff
   with stable node keys (directly usable as a patch); the PostgreSQL query only compares
   top-level fields — a change inside `items[7].qty` would report "items changed" without
   localization. Equivalent functionality in PostgreSQL means fetching both versions and
   diffing application-side. (With `storeDiffs(true)` — the default — sirix additionally
   persists per-commit diff files at write time; the REST layer serves those without any
   tree traversal.)
6. **W5 excludes PostgreSQL's WAL** (`pg_total_relation_size` only). SirixDB has no
   separate WAL — its data files *are* the entire on-disk story, so the asymmetry favors
   PostgreSQL slightly. PostgreSQL's number is after `CHECKPOINT`; the history table is
   append-only (no bloat); the 5,000 dead tuples in `doc` round to 32 KB. SirixDB's
   *allocated* full-config number (37 MB) is inflated by 5,000 tiny per-commit diff files
   each occupying a 4 KB block — a filesystem packing artifact; the apparent sizes are the
   fair comparison, and the lean config shows the no-diff-files footprint.
7. **Compression asymmetry, not a tuning trick.** PostgreSQL stores a *full copy* per
   version and gets pglz for free (2.4 KB doc → 836 B/row). SirixDB stores *changed page
   fragments* plus fixed per-revision metadata (revision root, indirect pages, …) — at
   2.4 KB documents that fixed floor (~2.4–3.4 KB/version) exceeds a compressed full copy,
   so full-copy-with-compression wins. This inverts as documents grow (see §5).
8. **JIT.** Sirix W1 includes JVM warm-up (first 1,000 commits at 223/s, steady state
   444–555/s); read workloads had an untimed warm-up pass. PostgreSQL has no JIT-equivalent
   cold tax here.
9. **Dev build.** SirixDB is `1.0.0-alpha22` plus uncommitted in-progress changes, not a
   tagged release. PostgreSQL 17.10 is a GA release with two decades of tuning.
10. Serialized text differs cosmetically (`jsonb::text` reorders keys and adds spaces:
    2,665 vs 2,404 chars for the same content) — read-volume checksums differ accordingly;
    the W4 numeric checksum is identical, which is the cross-system correctness proof.

---

## 4. Where each system wins

**PostgreSQL wins, in this benchmark's regime (small docs, modest history):**

- **Durable commit throughput** — 4,015/s vs 375–429/s (5.5–10.7×). PG's per-commit work is
  one compact WAL record + one fdatasync; sirix writes a CoW page-tree (several pages),
  an fsync barrier *plus* three FUA writes, hashes, and (full config) a diff file.
- **Scan-shaped history reads** (W4: 7×, W3: 2×) — a heap scan over 5,001 compressed rows
  beats opening 5,001 revision contexts (~10 µs each).
- **Batched point-in-time reads** (4.3×) — one B-tree probe + detoast vs page-fragment
  reconstruction.
- **Storage at small doc sizes** (2.5–3.5×) — compressed full copies beat structural
  sharing when the document is barely larger than sirix's per-revision metadata floor.
- **Concurrency model** (not measured here): MVCC supports many concurrent writers across
  documents; a sirix resource has a single writer (concurrent readers).

**SirixDB wins:**

- **Per-statement read latency from an application** — 75.7 µs embedded vs ~104 µs for
  client-server PostgreSQL even over a local unix socket. There is no batching trick for
  an app that needs one document now.
- **Versioning as a first-class capability, not a pattern you build.** The PostgreSQL side
  required a trigger, a manually maintained revision sequence, an index choice, and gets:
  timestamp-only addressing, no node identity across versions, no native diff. SirixDB
  gives numbered revisions *and* timestamps (`getHistory()`), stable node keys, time-travel
  axes (`AllTimeAxis` et al.), per-node history indexes, and audit-grade rolling hashes.
- **Semantic diffs** — 0.3 ms for an exact node-level patch between any two revisions;
  PostgreSQL needs application code (and shipping both full documents to it) for the same
  answer.
- **Write/storage cost shape**: per-version cost is O(changed nodes) + ~fixed metadata,
  independent of document size — PostgreSQL's is O(document) per version (full jsonb copy
  in heap *and* WAL). At 2.4 KB this is PG's win; it cannot stay that way as documents grow.

---

## 5. What this means for SirixDB positioning

1. **Don't pitch SirixDB as "faster than PostgreSQL for keeping history of small
   documents." It is not.** For ≤ a-few-KB documents with full history, a jsonb column,
   a trigger, and an index is faster on every server-side metric and smaller on disk.
   This benchmark is exactly PostgreSQL's home turf, and it shows.
2. **The honest pitch is the cost *shape*, the capability set, and embedding.**
   - PostgreSQL's per-version cost is a full document copy (~0.35× raw after pglz here):
     a 1 MB document updated 5,000 times costs it on the order of GBs of history and MBs
     of WAL per update. SirixDB's measured per-version cost was ~2.5–3.4 KB *for a 2.4 KB
     document* and is dominated by fixed metadata, not document size — the storage and
     write-amplification crossover plausibly sits in the tens-of-KB document range.
     **That claim needs its own benchmark (100 KB / 1 MB / 10 MB docs) before quoting
     numbers — measure it, don't extrapolate in public.**
   - Sub-document time travel (field history, node-level diffs, per-node revision index)
     has no native PostgreSQL equivalent at any document size.
   - Embedded, SirixDB answers single point-in-time reads faster than client-server
     PostgreSQL can be reached at all.
3. **Commit throughput is SirixDB's weakest measured axis** (375–429/s vs ~4,800/s fsync
   floor): ~0.85 ms of the ~2.3 ms/commit is the (deliberately strong) durability
   protocol; the rest is CoW page serialization, hashing, and bookkeeping. If
   high-frequency tiny commits matter, batching updates per commit is the documented
   answer; an optional group-commit / relaxed-durability mode would be the engineering
   answer.
4. The lean-vs-full spread (429 vs 375 commits/s, 11.8 vs 16.4 MiB) quantifies the price
   of hashes + per-node history + stored diffs + path summary: ~13 % commit rate and
   ~39 % storage at this scale — worth surfacing as a tuning knob in docs, since these
   features are exactly what PostgreSQL doesn't offer.

---

## 6. Reproduction

```bash
# PostgreSQL (cleaned up after the run)
docker run -d --name sirix-bench-pg -e POSTGRES_PASSWORD=bench -p 15432:5432 \
  -v sirix-bench-pgdata:/var/lib/postgresql/data postgres:17 \
  -c shared_buffers=1GB -c synchronous_commit=on
# scripts: /tmp/wave5-b/pg/{01-schema,02-w1,03-reads}.sql, w1-client.sql, run-pg.sh

# SirixDB (embedded, prebuilt classes; classpath captured in /tmp/sirix-test-cp.txt)
javac --enable-preview --release 25 --add-modules jdk.incubator.vector \
  -cp "$(cat /tmp/sirix-test-cp.txt)" -d /tmp/wave5-b/classes \
  /tmp/wave5-b/SirixVersionedDocBench.java
java --enable-preview --add-modules jdk.incubator.vector --enable-native-access=ALL-UNNAMED \
  --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED \
  -Xms1g -Xmx4g -cp "/tmp/wave5-b/classes:$(cat /tmp/sirix-test-cp.txt)" \
  SirixVersionedDocBench /tmp/wave5-b/run 5000 full   # and: ... 5000 lean

# durability floor on the same volume
docker exec sirix-bench-pg pg_test_fsync -s 2
```

---

## 7. Re-run after the W3/W4 read-path changes (2026-06-14)

The history-scan read paths were reworked to close W3/W4 (branch
`claude/versioning-gaps-postgresql-hpsbcq`). This section re-measures **W3 and W4 only**, on a
**different machine** than §1, so the numbers here are **not comparable to the table in §2** —
only to each other and to the PostgreSQL baseline re-measured alongside them.

| | |
|---|---|
| Machine | cloud VM (shared), OpenJDK 25, ext4 NVMe-backed volume — slower than §1's i7-12700H |
| SirixDB | this branch, **embedded**, default ("full") config, `StorageType.FILE_CHANNEL`, `VersioningType.SLIDING_SNAPSHOT` |
| PostgreSQL | **16.13** (local apt install, not Docker — the sandbox's docker daemon was unavailable), `shared_buffers=1GB`, `synchronous_commit=on`, `fsync=on` |
| Workload | identical to §1: one ~2.4 KB JSON doc (`counter` first field), 5,000 single-field durable commits → 5,001 versions. Cross-checks pass on both: 5,001 versions, counter sum **12,502,500**, final counter 5,000 |

Reads: 1 warm-up + 3 timed, **median** reported.

### W3 — list all 5,001 version timestamps

| Path | Time | vs PostgreSQL |
|---|---|---|
| PostgreSQL 16 (`ORDER BY valid_from`, server-side) | ~2.2 ms | — |
| SirixDB `getHistory()` (full `RevisionInfo`, optimized warm path) | ~2.4–2.8 ms | ~on par |
| **SirixDB `getHistoryTimestamps()` (new bulk API)** | **~0.05 ms** | **~40× faster** |

The new timestamp-only API serves the whole history from the resident in-memory `RevisionIndex`
(`long[]` + one `arraycopy`), with no page reads, no per-revision transactions, and no async
fan-out — so it beats a PostgreSQL heap scan by ~40× and the previous SirixDB history path
outright. **W3 gap: closed (and reversed) for the timestamp-only case; at parity for full
`RevisionInfo`.**

### W4 — one field's value across all 5,001 versions

Two regimes, because the cost shape differs by how often the field changes:

| Field | Path | Time | Record reads |
|---|---|---|---|
| `counter` (changes **every** revision) | PostgreSQL 16 | ~10.4 ms | 5,001 rows |
| | SirixDB OLD: per-revision `beginNodeReadOnlyTrx` + `moveTo` | ~82 ms | 5,001 |
| | SirixDB NEW: `scanRecordHistory` (lightweight reader) | ~76–78 ms | 5,001 |
| | SirixDB NEW: `scanValueRuns` | ~75–80 ms | 5,001 |
| `s01` (set once, **never** changes) | PostgreSQL 16 | ~9.8 ms | 5,001 rows |
| | SirixDB OLD: per-revision loop | ~80 ms | 5,001 |
| | **SirixDB NEW: `scanValueRuns` / change-set** | **~0.05 ms** | **1** |

Honest reading:

- **Dense case (`counter`):** the change set is every revision, so there is nothing to prune —
  all three SirixDB paths still read 5,001 records. The new lightweight-reader path shaves only
  ~7% off the old per-revision-transaction loop, and **PostgreSQL still wins this scan (~8×)**: a
  heap scan over 5,001 compressed rows beats opening 5,001 revision contexts. Closing the dense
  case fully needs the deferred fragment-chain scan (below).
- **Sparse case (`s01`, the common real-world shape):** `getRecordChangeRevisions` returns a
  single revision, so `scanValueRuns` reads **one** record and reconstructs the value across all
  5,001 versions — **~0.05 ms vs PostgreSQL's ~9.8 ms (~200×) and the old SirixDB loop's ~80 ms
  (~1600×)**. This is the O(changes) cost shape: PostgreSQL must scan one full-copy row per
  version regardless of whether the field changed; SirixDB reads only where it changed.

**W4 gap: closed decisively for fields that change rarely (the cost-shape win); for a field that
changes on every commit, modestly improved but PostgreSQL still leads.**

### Still deferred

A physical **fragment-chain single-pass scan** (read each distinct value once directly from the
leaf-page fragment chain, avoiding even per-revision root-page navigation) would also close the
dense case; it rewrites the storage fragment layer under SLIDING_SNAPSHOT and is left as future
work. Numbers above are medians from a shared cloud VM and PostgreSQL 16 (not 17); treat them as
indicative of the *shape* of the change, not as hardware-grade absolutes.

### Reproduction (this run)

```bash
# PostgreSQL 16, local cluster (docker daemon unavailable in the sandbox)
initdb -D $PGDATA -A trust
pg_ctl -D $PGDATA -o "-c shared_buffers=1GB -c synchronous_commit=on -c fsync=on" start
# schema: doc(id,doc jsonb) + doc_history(id,rev,valid_from,doc jsonb) via AFTER INSERT/UPDATE trigger
# ingest: CALL bench_w1(5000)   (plpgsql loop: UPDATE jsonb_set(...,'{counter}',i); COMMIT)
# W3: SELECT count(*) FROM (SELECT rev,valid_from FROM doc_history ORDER BY valid_from) s;
# W4: SELECT count(c),sum(c) FROM (SELECT (doc->>'counter')::bigint c FROM doc_history ORDER BY valid_from) s;

# SirixDB (embedded): build 5,001 versions, then time
#   session.getHistory() / session.getHistoryTimestamps()                        (W3)
#   per-revision loop  vs  session.scanRecordHistory(k,..) / scanValueRuns(k,..)  (W4)
```