# SirixDB vs DuckDB: Analytical Queries at 100M Records The honest head-to-head for the analytical fast paths: DuckDB is a best-in-class columnar OLAP engine — the bar. SirixDB is a versioned, bitemporal document store whose analytical layer (an in-memory columnar projection over the versioned tree) aims for the same physics on the shapes it covers. ## Setup | | | |---|---| | Machine | 20-core Linux (32 GB RAM, NVMe), same box for both engines | | Dataset | 100,000,000 records `{id, age: 18..65, dept: 8 values, city: 8 values, active: bool}` — identical uniform distributions in both systems | | DuckDB | 1.5.2 (official binary), native table generated in-engine, `threads=20`, 1 untimed warmup + 3 timed runs per query, `fetchall()` materialization | | SirixDB | dev build (1.0.0-beta1 branch), JVM (Oracle GraalVM 25.0.3, `-Xmx8g`), wildcard projection over (age, active, dept, city, amount, score), **fresh `SirixVectorizedExecutor` per timed iteration** so executor-level result caches never serve a timed run; store, page caches and projection shared across iterations — the moral equivalent of DuckDB's loaded table. 1 untimed warmup + 3 timed runs; results serialized and verified byte-identical to SirixDB's interpreted pipeline | | Harness | `io.sirix.query.bench.SirixVsDuckBenchMain` (in-tree) + `duck_bench.py` | Durable difference to keep in mind: DuckDB's columns are its primary, persisted storage format. SirixDB's projection is a secondary structure over the versioned store — every revision of every document remains queryable underneath it. ## Results (min of 3, milliseconds) SirixDB measured twice: on the JVM (Oracle GraalVM 25.0.3 JIT) and as a **profile-guided-optimized native binary** (GraalVM Native Image, `-O3 --pgo` with instrumented profiles from both projection modes, `-H:+VectorAPISupport`). | query | SirixDB PGO native | SirixDB JVM | DuckDB | best vs DuckDB | |---|---:|---:|---:|---| | `count(... where age > 40 and active)` | **33** | 53 | 40 | **SirixDB ahead** | | `group by dept → count` | 71 | 80 | 28 | 2.5× | | `sum(age)` | 16 | 22 | 10 | 1.6× | | `avg(age)` | 18 | 22 | 16 | 1.1× | | `min(age), max(age)` | 19 | 21 | 18 | 1.1× | | `group by dept, city → count` (two keys) | 240 | 251 | 115 | 2.1× | | `where active group by dept → count` | **43** | **41** | 59 | **SirixDB ahead** | | `count(distinct dept)` | 81 | 76 | 18 | 4.2× | | `count(... where age > 30 and age < 50 and active)` | **42** | 52 | 44 | **SirixDB ahead** | The PGO native binary comes out **ahead of DuckDB on three of the nine shapes** and within 1.1–2.5× on all but count-distinct. Two native-image findings worth knowing: `-H:+VectorAPISupport` is OFF by default (without it every SIMD kernel runs as fallback objects — the same suite measured 10–600× slower), and the projection *build* runs ~7.6× slower under native than the JVM (cursor-walk code; build on JVM or use a persisted projection). Ingest: DuckDB generated its 100M-row table in 27.3 s; SirixDB shredded the same logical data into its versioned store at 182k records/s (548 s, including path-summary construction) — different jobs (a column table vs a fully versioned tree), stated for completeness. ## The scan paths (no projection) For completeness, the same suite without any projection — the engine's in-page region/morsel scan paths over the full 100M-record store (PGO native binary, single timed run after warmup; JVM datapoints in parentheses where measured): | query | scan path (native) | scan path (JVM) | |---|---:|---:| | filterCount | 347 s | — | | group by dept | 307 s | — | | sum / avg / min+max | 310–311 s | ~59 s | | group by dept, city | 348 s | ~44 s | | filterGroupBy / compoundAnd | 352 s | — | | count(distinct dept) | 309 s | — | Two findings come with this column: cursor/page-walk-bound code runs ~5–8× slower under Native Image than the JVM (same penalty as the projection build), and the scan paths' runtime predicate code generation falls back to the interpreted evaluator under Native Image (classes cannot be defined at runtime) — correct, with a warning per scan. Conclusion unchanged either way: analytical workloads belong on the projection; the scan paths exist as always-correct fallbacks, not as the analytics engine. ## What this took (the gap was 3 orders of magnitude two days ago) Measured on the same machine and dataset during development: | query | before | after | mechanism | |---|---:|---:|---| | `sum/avg/min+max(age)` | ~59,000 ms | 21–22 ms | aggregates previously page-walked the per-page number regions; now a parallel `NUMERIC_LONG` column fold over projection leaves (full-word fast path per 64-row mask block) | | `group by dept, city` | ~43,700 ms | 251 ms | multi-key grouping previously took the typed record-at-a-time slot walk; now a composite dict-id sweep over projection leaves with per-leaf lazy key compose | | `group by age` (numeric key) | **empty result** | correct, ~1.4 s scan-path | the string-only kernels silently dropped non-string group keys; typed kernels + verification counters fixed the wrong-results family | ## Correctness gates (how we keep "fast" honest) - Every fast path is **fail-closed**: the optimizer claims a pipeline only when the query shape provably matches the kernel's output, and kernels verify their own coverage (e.g. a projection column that ever saw a non-integral number is flagged by the builder, and value-exact consumers — aggregates, numeric comparisons — decline it rather than serve truncated values). - A 27-case differential suite runs every shape through the vectorized AND interpreted pipelines and requires **byte-identical serialized results**, including typed group keys, multi-key, renamed outputs, predicated variants, and the must-decline cases. ## Honest caveats - **Cold projection build**: constructing the 6-column projection at 100M costs ~319 s (one full-DB walk). Re-encoding an already-persisted projection takes ~1.2 s. Persisting wider projections currently trips a known HOT-storage chunk-split limitation (documented in `KNOWN_LIMITATIONS.md`) — persisted-projection lifecycle is the top roadmap item this comparison motivates. - **Sparse/absent fields**: projection columns have no NULL representation (absent values read as defaults); anchor-based scans don't visit records lacking the anchor field. Dense, uniformly-shaped record sets — the target analytical workload — are exact; sparse data falls back or is documented. - **count(distinct)** and single-key group-by still trail 3–4×: the dict-id paths do per-leaf dictionary decode + hash merge that DuckDB amortizes in its hash tables; known optimization headroom. - DuckDB numbers are its general-purpose engine doing what it was built for; SirixDB reaches this neighborhood only on the projected shapes, and falls back to scan-class latencies without a projection (measured above: ~44–60 s on the JVM, ~310–350 s native, at 100M). - Native Image specifics: `Arena.ofShared` is unsupported at the session close path (benign for results, fails the process exit), runtime predicate codegen falls back to interpretation, and cursor-bound code (projection build, scan paths) pays ~5–8× vs the JVM. Queries — the hot part — are where PGO native leads. ## Reproduction ```bash # DuckDB python3 duck_bench.py 100000000 3 /tmp/duck-100m.db # SirixDB: shred with a path summary, then run the matrix java --add-modules jdk.incubator.vector -Xmx8g \ -DbuildPathSummary=true -Dsirix.shredDbPath=/tmp/sirix-100m \ -cp io.sirix.query.bench.ScaleBenchMain 100000000 true 0 java --add-modules jdk.incubator.vector -Xmx8g \ -Dsirix.projection.forceRebuild=true -Dsirix.projection.persist=false \ -cp io.sirix.query.bench.SirixVsDuckBenchMain /tmp/sirix-100m 3 20 true ```