# Benchmarks

`forge:bench` is the internal regression harness: it measures forge against itself across releases and against Prisma / Drizzle when run in compare mode. This page documents the methodology, every shipped scenario, how to read the output, and how to extend it for your own workloads.

* [What `forge:bench` is for](#what-forgebench-is-for)
* [The shipped scenarios](#the-shipped-scenarios)
* [Per-dialect commands](#per-dialect-commands)
* [Compare mode](#compare-mode)
* [Methodology](#methodology)
* [Reading the output](#reading-the-output)
* [What forge optimises](#what-forge-optimises)
* [Known regressions and honest notes](#known-regressions-and-honest-notes)
* [Adding your own scenarios](#adding-your-own-scenarios)
* [CI integration](#ci-integration)
* [Profiling a bench run](#profiling-a-bench-run)
* [Driver-level vs ORM-level benchmarks](#driver-level-vs-orm-level-benchmarks)
* [Microbench traps](#microbench-traps)
* [Cost of observability layers](#cost-of-observability-layers)
* [Worked examples](#worked-examples)
* [Cross-references](#cross-references)

---

## What `forge:bench` is for

The harness exists for two reasons, in this order:

1. **Internal regression tracking.** Every release runs `forge:bench` against itself. If `findFirst` on Postgres jumps from a +6% overhead-vs-raw to a +40% overhead between commits, something in the IR or the executor regressed and the diff that caused it has to justify itself.
2. **Apples-to-apples comparison.** Compare mode (`forge:bench:compare`) runs the same four scenarios against forge, Prisma, and Drizzle on the same database, against the same seeded table, with the raw driver baseline next to all three. The result is a like-for-like overhead number per engine per scenario per dialect.

It is deliberately small. The harness measures four scenarios across four dialects. It is not a TPC benchmark and does not pretend to be one.
The point is signal, not glamour: a tight loop you can re-run on any laptop in under a minute that catches per-call regressions before they ship.

A useful corollary: a bench harness that takes ten minutes to run will get run once a release, find nothing, and rot. The forge bench finishes in seconds against `:memory:` SQLite and tens of seconds against a local Postgres. It runs on every PR (see [CI integration](#ci-integration)).

---

## The shipped scenarios

Every dialect runs the same four scenarios, defined inline in `bench/db-bench.ts`.

| Scenario    | What it does                                                              | Why it's in the harness                                  |
|-------------|---------------------------------------------------------------------------|----------------------------------------------------------|
| `findMany`  | `WHERE role = 'EDITOR' ORDER BY email ASC LIMIT 20`                       | Indexed range scan with sort and limit                   |
| `findFirst` | `WHERE email = ? LIMIT 1` (indexed unique column)                         | Indexed point lookup; exercises prepared-stmt reuse      |
| `count`     | `SELECT COUNT(*) WHERE role = 'USER'`                                     | Aggregate without rows in the result set                 |
| `update`    | `UPDATE users SET active = false WHERE id = ?` (indexed primary key)      | DML round-trip; exercises the write path                 |

The seed is `BENCH_SEED` rows (default `500`) inserted with `createMany` before the loop starts. `BENCH_ITER` iterations (default `200`) execute each scenario back to back. The `i % BENCH_SEED` index lets `findFirst` and `update` rotate through the seeded rows instead of hitting one row repeatedly. With the defaults, 200 iterations touch the first 200 of the 500 rows; the index only wraps once `BENCH_ITER` exceeds `BENCH_SEED`.

The model used is the project's own `User` schema (`src/schema/user.ts`): id, email, name, role enum, active boolean, created/updated timestamps. The same schema is what powers the integration suite, so the bench exercises the actual production code paths, not a stripped-down toy model.

What's deliberately **not** in the shipped scenarios:

* **Batch insert.** `createMany` runs once as part of seed, but its timing is not reported per-iteration.
Bulk insert speed is dominated by network and driver buffering, not by the ORM layer; including it produces noisy numbers that mask the per-call signal.
* **Joins / `include`.** Forge's relation loader uses a single round-trip with `IN (...)` batching; that path is exercised end-to-end by the integration suite. Joins are not in the bench because their cost is dominated by query planning on the database, not by the ORM.
* **Transactions.** `db.$transaction` is a thin BEGIN / COMMIT wrapper; per-op timings are already covered by the four scenarios.

If you need any of these for your workload, see [Adding your own scenarios](#adding-your-own-scenarios).

---

## Per-dialect commands

Each script sets `SKIP_*=1` for the other dialects so the harness only spins up the database you care about.

```sh
npm run forge:bench           # all installed dialects
npm run forge:bench:sqlite    # in-memory SQLite only
npm run forge:bench:pg        # Postgres only
npm run forge:bench:mysql     # MySQL only
npm run forge:bench:mongo     # Mongo only
```

The skip flags can be combined manually for ad-hoc runs:

```sh
SKIP_MONGO=1 npm run forge:bench       # everything except Mongo
BENCH_ITER=1000 BENCH_SEED=5000 npm run forge:bench:pg
```

Connection URLs default to localhost on the standard port for each engine and can be overridden:

```sh
BENCH_PG_URL=postgres://bench@db.local:5432/postgres npm run forge:bench:pg
BENCH_MYSQL_URL=mysql://root@db.local:3306 npm run forge:bench:mysql
BENCH_MONGO_URL=mongodb://db.local:27017 npm run forge:bench:mongo
```

SQLite always uses `:memory:` for the default bench (an isolated handle per run); the compare bench drops to an on-disk file under `os.tmpdir()` so the three engines can open the same database. Both modes create the schema, seed, run, and clean up, so there is no leftover state between runs.

If the driver for a given dialect isn't installed (`require('pg')` throws), the harness logs `[bench:pg] skipped: …` and moves on. The same applies if the service isn't reachable.
You don't have to set `SKIP_*` for missing drivers; that's only for explicitly opting out when the driver is installed but you don't want to bench it.

---

## Compare mode

`forge:bench:compare` runs a 3-way comparison: forge vs Prisma vs Drizzle, each reported against the raw driver baseline.

```sh
npm run forge:bench:compare:gen     # prisma generate against bench/compare/*.prisma
npm run forge:bench:compare         # 3-way bench across all dialects
npm run forge:bench:compare:pg      # 3-way, Postgres only
npm run forge:bench:compare:mysql   # 3-way, MySQL only
npm run forge:bench:compare:sqlite
npm run forge:bench:compare:mongo
```

The `:gen` step is required before the first compare run on a fresh checkout. It runs `prisma generate` against `bench/compare/pg.prisma`, `bench/compare/mysql.prisma`, and `bench/compare/sqlite.prisma`, producing clients under `bench/compare/generated/{pg,mysql,sqlite}/`. Without those clients, the Prisma column in the report shows `n/a — driver-adapter or generated client not installed` and the bench keeps going with forge, Drizzle, and raw.

How the three-engine bench stays apples-to-apples:

* **forge owns the schema.** forge runs DDL and seeds the `users` table; Prisma and Drizzle never run their own migrate. Their schemas (`*.prisma`, `drizzle-schema.ts`) describe the existing table so their clients can query it. There is exactly one physical table.
* **Same connection pool where possible.** On Postgres and MySQL, Drizzle is constructed against forge's already-open driver pool. That removes pool size and TCP handshake noise from the comparison: both engines hit the same TCP connections in the same state.
* **Same iteration loop.** `runOps()` calls each engine's scenario thunks in the same order with the same `idx` per iteration, so every engine touches the same seeded rows in the same sequence.
* **Same baseline.** The raw driver scenarios use the same hand-written SQL for every engine column.
Overhead is computed as `(engine_median - raw_median) / raw_median`.

Engine availability across dialects:

| Dialect  | forge | raw              | Prisma 7                         | Drizzle |
|----------|-------|------------------|----------------------------------|---------|
| Postgres | yes   | `pg`             | `@prisma/adapter-pg`             | yes     |
| MySQL    | yes   | `mysql2`         | `@prisma/adapter-mariadb`        | yes     |
| SQLite   | yes   | `better-sqlite3` | `@prisma/adapter-better-sqlite3` | yes     |
| Mongo    | yes   | `mongodb`        | not installed by default         | n/a     |

Drizzle does not ship a Mongo driver, so the Mongo report simply shows `n/a` in the Drizzle slot. Prisma 7 requires a driver-adapter package per dialect; if the matching `@prisma/adapter-*` is not installed when the bench runs, the Prisma column reports `n/a` with the missing package name as the reason. Neither case fails the run.

---

## Methodology

The measurement loop is intentionally simple. Here is the exact shape, copied out of `bench/db-bench.ts`:

```ts
// Wall-clock time of one awaited call, in milliseconds.
async function timed(fn: () => Promise<unknown>): Promise<number> {
  const t0 = performance.now();
  await fn();
  return performance.now() - t0;
}

// Reduce one scenario's samples to the statistics shown in the report.
function timeit(label: string, runs: number[]): Sample {
  const sorted = [...runs].sort((a, b) => a - b);
  const median = sorted[Math.floor(sorted.length / 2)];
  const p95 = sorted[Math.floor(sorted.length * 0.95)];
  const opsPerSec = 1000 / median;
  return { name: label, median, p95, opsPerSec };
}
```

* **Per-iteration timing** is wall-clock via `performance.now()`. The clock starts before the call and stops after the returned promise resolves.
* **Reported statistics** are the median and the 95th percentile of the sorted sample. Mean is not reported; medians shrug off the GC pause at iteration 47 in a way that means don't.
* **`opsPerSec`** is computed from the median, not the mean. It's a derived number for skimming the table, not for comparing engines that are within a few percent of each other; use the median column for that.

What's measured: round-trip latency from forge's API call to the resolved promise.
That includes the driver's serialisation, the network or IPC round-trip, the database's actual execution, the driver's deserialisation, and forge's row decoder. There is no server-time-only mode; separating server time from client time is what `EXPLAIN ANALYZE` is for (see [Profiling a bench run](#profiling-a-bench-run)).

What's **not** measured separately: warm-up. The bench does not run an explicit warm-up loop. Instead, the seed phase (`createMany` of 500 rows) primes the pool, opens TCP connections, and forces JIT compilation of the hot paths before the timed loop begins. The first 1–2 iterations of `findMany` still include some V8 inlining cost; the median absorbs that. If you want a stricter warm-up, bump `BENCH_ITER`: at 1000 iterations, the first dozen samples are statistical noise against the rest.

The seed and iteration counts are tunable via env:

```sh
BENCH_SEED=500   # rows inserted before the timed loop (default 500)
BENCH_ITER=200   # iterations per scenario (default 200)
```

For day-to-day regression catching, the defaults are tight enough: `200 × 4 scenarios × 4 dialects = 3,200` measured forge calls per run, each paired with a raw-driver call. For finer-grained comparison work, push iter to 1000+ and seed to 5000+ so the table actually fills the page-cache.

---

## Reading the output

A default `forge:bench` run produces one block per dialect that ran. Sample output (Postgres, 200 iterations, 500 seed rows):

```
postgres — 200 iter, 500 seed rows

  op                               median        p95      ops/s   overhead
  ──────────────────────────── ────────── ────────── ────────── ──────────
  findMany                         1.42ms     2.18ms        704     +12.7%
  findMany [raw pg]                1.26ms     1.95ms        794
  findFirst                        0.61ms     1.04ms       1639      +8.9%
  findFirst [raw pg]               0.56ms     0.92ms       1786
  count                            0.43ms     0.77ms       2326      +4.9%
  count [raw pg]                   0.41ms     0.69ms       2439
  update                           0.81ms     1.42ms       1234      +6.6%
  update [raw pg]                  0.76ms     1.31ms       1316
```

Column meanings:

* **`op`**: scenario name. The forge row sits above its raw-driver pair so you can read each scenario top-to-bottom.
* **`median`**: median wall-clock per call, in milliseconds.
This is the primary signal.
* **`p95`**: 95th percentile. A large p95 / median ratio means the dialect or scenario is jittery (Mongo and remote Postgres often are). A regression that shows up in p95 but not in median usually points at GC or autovacuum.
* **`ops/s`**: derived from median (`1000 / median`). Useful for the "this reads roughly N qps" gut-check.
* **`overhead`**: `(forge_median - raw_median) / raw_median × 100`, the per-call cost of going through the ORM.

For numbers like the +12.7% above: forge wraps each call with the row decoder, the IR compile cache lookup, the event emitter, and a couple of property walks. On a 1.26ms raw call, that is about 0.16ms per call, which is small in absolute terms but a number you can track release to release.

The **compare** report has a slightly different shape: one block per scenario, one row per engine, with the raw driver row marked `baseline`:

```
postgres - 200 iter, 500 seed rows (overhead = vs raw driver)

  findMany
    engine         median        p95      ops/s   overhead
    raw pg         1.26ms     1.95ms        794   baseline
    forge          1.42ms     2.18ms        704     +12.7%
    prisma         2.31ms     3.84ms        433     +83.3%
    drizzle        1.39ms     2.21ms        719     +10.3%
  ...
```

The `n/a` row is what shows up when a Prisma adapter is missing or Drizzle is asked about Mongo:

```
    prisma     n/a   Prisma 7 needs @prisma/adapter-pg + generated client (not installed)
    drizzle    n/a   Drizzle has no MongoDB driver
```

Run-to-run variance on a typical laptop is roughly ±5% on medians for the SQL dialects against localhost, and ±10–20% for Mongo (which has a chattier wire protocol). Variance between machines is much larger: never compare numbers from two machines, only deltas between releases on the same machine.

---

## What forge optimises

The bench number is the proof. The mechanisms behind it are documented for context:

* **IR compile cache.** Every query is compiled once from the public API shape (`findMany({ where: { role: 'EDITOR' } })`) into an IR tree, then into dialect-specific SQL plus a params array.
The cache key is the IR shape, not the parameter values, so 1,000 calls to `findFirst({ email })` with 1,000 different emails compile once and reuse the prepared SQL 1,000 times. The bench loop hits this cache cold on iteration 1 and hot for the next 199, which is exactly the production pattern.
* **No codegen.** forge has no separate generator step. There is no `prisma generate` to run, no client to ship in `node_modules`, no engine binary to start. Cold-start is one `import` and one `createDb` call. This shows up in CLI tools and Lambda cold-paths far more than in the bench itself, where the cost is already amortised.
* **Prepared-statement reuse.** Where the driver supports it (better-sqlite3, `pg` extended protocol, `mysql2` `.execute()`), forge reuses prepared statements keyed by the compiled SQL string. The bench's `findFirst` and `update` rows are where this shows.
* **Row decoder is a single walk.** `decodeRow` walks `Object.keys()` of the driver's row once and applies per-column transforms (booleans from 0/1 on SQLite, JSON parses on MySQL, etc.). It does not iterate the schema and look up columns; that lookup happens at compile time.
* **The event emitter is a no-op when there are no subscribers.** If nothing is subscribed, the emit path is one boolean check. See [Cost of observability layers](#cost-of-observability-layers).

What forge does **not** do, and why these are not bench wins:

* No connection pool of its own; it uses the driver's. Pool tuning is in [POOLING.md](POOLING.md).
* No client-side query cache. The cache is the IR compile cache, not a result cache. Result caching belongs above forge, not inside it.
* No bytecode VM, no separate process. The engine *is* node + driver.

---

## Known regressions and honest notes

Some things forge is measurably slower at than the closest competitor:

* **Drizzle on `count`.** Drizzle's hand-built SQL for `count(*)` skips a row decoder pass forge can't (forge unifies count and find through the same IR shape).
The gap is small (low single-digit percent) and consistent.
* **Prisma on bulk batch inserts.** Prisma's protocol pipelines the insert rows through its engine, which can edge out forge's straightforward parameterised `INSERT ... VALUES (...), (...)` for very large batches (1000+ rows in one call). At the row counts you'd actually run in a request handler, the two are within a percent.
* **MySQL `update` with `pool.execute`.** The raw baseline uses `mysql2`'s prepared `execute()` path; the forge path uses `query()` for SQL it has already compile-cached. On small parameter counts the difference is in the noise; on bigger parameter sets (`IN (...)` of 100 ids) prepared execution wins by enough to show up in the overhead column.

For the broader honest list (feature gaps, dialect quirks, what's still on roadmap) see the [Limitations and honest notes](../README.md#limitations-and-honest-notes) section of the README.

If a regression shows up that you can reproduce, open an issue with the bench output, the dialect, the BENCH_ITER / BENCH_SEED used, and the forge versions on either side of it.

---

## Adding your own scenarios

The harness is a single file. To add a scenario, edit `bench/db-bench.ts` and follow the three-step pattern the existing scenarios use:

```ts
// 1. Declare per-scenario arrays alongside the existing r1..r4 / w1..w4.
const r5: number[] = [];
const w5: number[] = [];

// 2. Inside the iteration loop, push a forge-call sample and a raw-call sample.
for (let i = 0; i < BENCH_ITER; i++) {
  const idx = i % BENCH_SEED;
  // ... existing scenarios ...

  r5.push(await timed(() => db.user.findMany({
    where: { role: 'EDITOR', active: true },
    orderBy: { created_at: 'desc' },
    take: 50,
  })));

  w5.push(await timed(async () => {
    await pool.query(
      `SELECT * FROM "users" WHERE "role" = $1 AND "active" = $2 ORDER BY "created_at" DESC LIMIT 50`,
      ['EDITOR', true],
    );
  }));
}

// 3. Add the pair to the returned Result[] tuple.
return [
  // ... existing rows ...
  [timeit('findMany active', r5), timeit('findMany active [raw pg]', w5)],
];
```

The same shape is repeated in each `benchPg / benchMysql / benchSqlite / benchMongo` function. Add the scenario to each dialect you want to cover and keep the raw SQL hand-tuned per dialect; that's the whole point of the raw column. Don't try to build a cross-dialect raw query; you'll measure the common-denominator path instead of the natural one.

For compare mode, the scenario also has to be added to `runOps()` in `bench/compare/compare-bench.ts` and to the `Ops` interface. The harness will then call it across forge, Prisma, Drizzle, and raw with the same `idx`. The forge thunk goes in `forgeOps`, the raw / Drizzle / Prisma thunks go in their respective per-dialect helpers.

If a scenario should be a one-off (you're investigating a specific commit, not extending the suite), copy `bench/db-bench.ts` to `bench/your-bench.ts` and edit there. The compile / seed / cleanup helpers are exported from the source files so a side-bench can reuse the same setup.

---

## CI integration

The bench fits into CI in two patterns:

**1. Smoke run on every PR.** The default bench (`BENCH_ITER=200`, `BENCH_SEED=500`) runs in under a minute end-to-end for the SQL dialects. Wire it to PR jobs so a regression in `findFirst` overhead from +5% to +50% blocks the merge.

```yaml
# .github/workflows/bench.yml (sketch)
- run: docker compose up -d postgres mysql mongo
- run: npm ci
- run: npm run forge:bench:sqlite > bench.txt
- run: npm run forge:bench:pg >> bench.txt
- run: npm run forge:bench:mysql >> bench.txt
- uses: actions/upload-artifact@v4
  with: { name: bench, path: bench.txt }
```

For overhead regression gating, parse the `overhead` column out of the bench output and compare against a baseline file in the repo.
A simple awk one-liner covers it; for fancier regression detection, the percentile output is stable enough that you can run a small Python script to compare medians with a two-sample Mann–Whitney check.

A workable threshold for the SQL dialects: fail the job if any scenario's overhead moves by more than +10 percentage points (e.g. +6% → +17%). Mongo is jittery enough that +25 is a more honest threshold.

**2. Nightly compare run.** `forge:bench:compare` takes longer and is more useful as a nightly than a per-PR job. Save the output as an artifact and post a comment on the relevant tracking issue when forge regresses below Drizzle on a scenario where it was previously ahead.

The compare run requires `npm run forge:bench:compare:gen` before the first invocation in a fresh checkout; wire that as a setup step.

---

## Profiling a bench run

The bench is a hot loop with no setup overhead during the measured phase, which makes it the ideal target for a profiler.

**Node `--inspect`.** The lowest-friction option:

```sh
node --inspect-brk -r ts-node/register bench/db-bench.ts
```

Open `chrome://inspect`, attach to the process, and grab a CPU profile across the iteration loop. The seed and cleanup phases will dominate the file unless you filter; in DevTools, narrow the timeline window to just the bench loop after `[bench:pg] database: …` has logged.

**clinic.js.** Higher-level, gives flame graphs and a doctor report:

```sh
npx clinic doctor -- node -r ts-node/register bench/db-bench.ts
npx clinic flame  -- node -r ts-node/register bench/db-bench.ts
```

The flame graph is the artifact to read first: the IR compile path and the row decoder will be the two tallest forge frames, and changes between releases will show up as their relative width shifting.

**0x.** Fast flame-graph generation without the doctor layer:

```sh
npx 0x -- node -r ts-node/register bench/db-bench.ts
```

Lengthen the loop before profiling: set `BENCH_ITER=2000` so the measured phase dominates the seed and cleanup phases in the profile. Otherwise the profiler will mostly be staring at `createMany` and `applyMigration`.

**SQL-side profiling.** If the overhead column is fine but a scenario is slower than you expected in absolute terms, the bottleneck is downstream of forge. `EXPLAIN ANALYZE` on the compiled SQL is the right next step; you can log it with the event subscriber pattern from [EVENTS.md](EVENTS.md).

---

## Driver-level vs ORM-level benchmarks

The two columns in the bench output (`forge`, `raw`) split the problem space:

* **The raw column tracks the driver / database / hardware.** If raw `findFirst` slows from 0.5ms to 5ms, it's not forge: your Postgres is cold, the disk is full, a noisy neighbour is running, or the driver shipped a regression. Run the driver smoke harness (`npm run smoke:drivers`) to confirm the driver itself is healthy; see the Testing section of the [README](../README.md#driver-smoke-harness).
* **The overhead column tracks forge.** If raw is steady but overhead jumps, it's the IR, the executor, the row decoder, or the event path. Use the profiler pointers above.

When you're picking between drivers (`pg` vs `postgres`, `mysql2` vs `mariadb`, `better-sqlite3` vs `@libsql/client`), bench the *raw column*. The forge overhead is essentially the same across drivers of the same kind (adapters and dialects are stable; only the bottom 50 lines differ, see [DRIVERS.md](DRIVERS.md#why-bring-your-own-driver-exists)). So the question is which driver is faster underneath, and that's the raw row.

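The split between the raw row and the overhead column can be sketched as a small triage helper. This is an illustration only, not part of forge: `ScenarioRun`, `overheadPct`, and `triage` are hypothetical names, the default tolerances are assumptions borrowed from the variance and CI-threshold figures on this page, and both runs are assumed to come from the same machine.

```typescript
// Hypothetical helper (not shipped with forge): decide whether a slowdown
// between two bench runs points at the driver/database or at forge.
interface ScenarioRun {
  rawMedianMs: number;   // median of the raw-driver row
  forgeMedianMs: number; // median of the forge row
}

type Verdict = 'driver-or-database' | 'forge' | 'both' | 'steady';

// Overhead as the bench reports it: (forge - raw) / raw, in percent.
function overheadPct(run: ScenarioRun): number {
  return ((run.forgeMedianMs - run.rawMedianMs) / run.rawMedianMs) * 100;
}

function triage(
  before: ScenarioRun,
  after: ScenarioRun,
  rawTolerance = 0.05,      // assumed: about 5% run-to-run variance on raw medians
  overheadTolerancePp = 10, // assumed: 10 percentage points, as in the CI gate
): Verdict {
  // Relative slowdown of the raw driver call between the two runs.
  const rawShift = (after.rawMedianMs - before.rawMedianMs) / before.rawMedianMs;
  // Change of the overhead column, in percentage points.
  const overheadShift = overheadPct(after) - overheadPct(before);

  const rawMoved = rawShift > rawTolerance;
  const overheadMoved = overheadShift > overheadTolerancePp;

  if (rawMoved && overheadMoved) return 'both';
  if (rawMoved) return 'driver-or-database';
  if (overheadMoved) return 'forge';
  return 'steady';
}
```

For example, a raw `findFirst` that goes from 0.5ms to 5ms with an unchanged overhead percentage would come back as `'driver-or-database'`, while a steady raw median with the forge median climbing from 0.61ms to 0.84ms would come back as `'forge'`.
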
When you're picking between dialects (Postgres vs MySQL for a workload), bench end-to-end with whatever workload you actually run. The four shipped scenarios are useful as a sanity check, but a real workload mix will reorder the dialects. Use [Adding your own scenarios](#adding-your-own-scenarios) to shape the bench against your queries.

---

## Microbench traps

Microbenchmarks are easy to read wrong. The traps the forge harness has been tuned to avoid, and the ones you should still be aware of:

* **JIT warmup.** V8 inlines and re-optimises hot functions across the first few hundred invocations. The bench's 500-row seed plus the first ~20 iterations is enough warmup for the dialects we ship; on a colder workload you may need `BENCH_ITER=1000` before the median stabilises. If the p95 / median ratio is large *and* shrinking as iterations grow, you're still warming up.
* **GC pauses.** A 30ms p95 on a 1ms median is almost certainly a GC pause. The median absorbs it; don't read into individual spikes. If GC is driving variance, run with `--max-old-space-size=2048` and `--expose-gc` and inject a manual `global.gc()` between scenarios in a fork of the bench.
* **IO buffering and OS page cache.** The first run after a reboot will be slower than subsequent runs because the database files aren't in page cache. Warm with one throwaway run, then start measuring.
* **CPU frequency scaling.** Laptops on battery throttle to save power and produce wildly different numbers from the same laptop on AC. Plug in before benching.
* **Network jitter.** Any "Postgres at 127.0.0.1" still goes through the loopback stack and `localhost` resolution. `BENCH_PG_URL` pointing at Unix-socket Postgres (`postgres:///postgres`) eliminates the TCP overhead; the overhead column won't change but the absolute numbers will.
* **Don't compare across machines.** A +6% overhead on one laptop and a +14% on another doesn't mean forge regressed. The two machines have different CPU-to-I/O ratios, and the same constant-cost work shows up as a bigger percentage when the raw call is a smaller absolute number.

---

## Cost of observability layers

forge's `QueryEvent` subscribers are a no-op when nothing is subscribed. Subscribe one, and you've added a per-call cost.

A rough sense of the cost, measured by adding a subscriber to the bench loop and re-running:

```ts
// Bench loop with a single no-op subscriber attached:
db.$events.on('query', () => { /* noop */ });
// Result: overhead on findFirst climbs from +8.9% to roughly +11–13%
// on Postgres (on the order of 10–25 microseconds on a 0.56ms raw call).

// Bench loop with a logging subscriber that JSON.stringify's the event:
db.$events.on('query', (e) => { logger.debug(JSON.stringify(e)); });
// Result: overhead on findFirst climbs to roughly +25–35%; most of the
// cost is JSON.stringify of the params array, not the emit itself.
```

The takeaways:

* The emit itself is cheap. It's the work you do **in the subscriber** that costs.
* If you want full query logging in production, sample it. See the [Sampling strategies](EVENTS.md#sampling-strategies) section of EVENTS.md.
* If you want metrics, accumulate in memory and flush periodically; don't do per-event aggregation that does string work.
* OpenTelemetry spans cost more than logging because the span machinery is itself more expensive than a JSON write. The OTel integration in EVENTS.md uses the parent-context optimisation for that reason.

The bench does not enable any subscribers by default. If you're benching a workload that includes your production observability, attach the subscribers before the timed loop in your fork of `bench/db-bench.ts`.

---

## Worked examples

### A. Bench a custom driver

Suppose you've written a custom Postgres driver wrapper (a Neon adapter, an RDS Data API shim, a `postgres` driver in place of `pg`). The bench can prove the wrapper is faithful and measure how it compares to the in-tree driver.

```ts
// bench/my-driver.ts
// Note: top-level await needs an ESM context; wrap the body in an async
// main() if you run this through a CommonJS loader.
import { createDb } from '../src';
import { schema } from '../src/schema';
import { buildSchemaDDL as buildPgDDL } from '../src/adapters/postgres/ddl';
import { applyMigration } from '../src/adapters/postgres/migrate';
import { myCustomDriver } from '../src/my-driver'; // your wrapper

// Create the schema through the wrapper's own pool.
const db = await createDb({ driver: myCustomDriver({ url: process.env.URL! }) });
const pool = (db.adapter as any).pool;
await applyMigration(pool, buildPgDDL(schema as any));

// Seed the same 500-row shape the default bench uses.
await db.user.createMany({
  data: Array.from({ length: 500 }, (_, i) => ({
    id: `u_${i}`,
    email: `b${i}@x.co`,
    name: `U${i}`,
    role: i % 3 === 0 ? 'EDITOR' : 'USER',
  })),
});

// then copy the iteration loop from bench/db-bench.ts and report.
```

Diff the resulting medians against the `forge:bench:pg` numbers from the same machine. If your wrapper is within a few percent on every scenario, ship it; that's the same shape forge's `pg` and `postgres` drivers have against each other.

### B. Reproduce vs Drizzle

A user posts a benchmark claiming Drizzle is 3× faster than forge on `findMany`. Reproduce in compare mode:

```sh
npm run forge:bench:compare:gen           # one-time per checkout
BENCH_ITER=1000 BENCH_SEED=5000 npm run forge:bench:compare:pg
```

Read the compare block for the `findMany` scenario. If the result on your machine shows forge and Drizzle within 5% (which is the typical outcome on the shipped scenarios), the disagreement is about a different scenario or a different setup. Ask for the exact query shape and add it via [Adding your own scenarios](#adding-your-own-scenarios). If forge is actually 3× slower, open an issue with the compare output and the BENCH params.

### C. Regression-gate in CI

A minimal gate that fails CI if forge's `findFirst` overhead on Postgres regresses by more than 10 percentage points:

```sh
#!/usr/bin/env bash
# scripts/bench-gate.sh
set -euo pipefail

BASELINE_OVERHEAD=8.9   # measured on main; commit this number to the repo

npm run forge:bench:pg | tee bench.txt

# Read the overhead from the forge findFirst row itself. The raw row ends in
# an ops/s figure rather than a percentage, so it never matches. This assumes
# the default report layout shown under "Reading the output".
CURRENT=$(awk '
  $1 == "findFirst" && $NF ~ /^[+-][0-9.]+%$/ { gsub(/[+%]/, "", $NF); print $NF; exit }
' bench.txt)

if [ -z "$CURRENT" ]; then
  echo "could not find the findFirst overhead in bench.txt"
  exit 1
fi

DELTA=$(echo "$CURRENT - $BASELINE_OVERHEAD" | bc -l)
echo "findFirst overhead: ${CURRENT}% (baseline ${BASELINE_OVERHEAD}%, delta ${DELTA})"

if (( $(echo "$DELTA > 10" | bc -l) )); then
  echo "regression: findFirst overhead regressed by more than 10pp"
  exit 1
fi
```

Wire that step into the PR job. The baseline number is updated on main with a separate commit whenever a deliberate change shifts the overhead.

---

## Cross-references

* [DRIVERS](DRIVERS.md): what the driver port looks like, why raw-vs-forge is a meaningful baseline, and the wire-compatible swaps available.
* [EVENTS](EVENTS.md): full cost model for `QueryEvent` subscribers, sampling strategies, and the worked sinks for pino / Sentry / OpenTelemetry / Prometheus referenced above.
* [POOLING](POOLING.md): how the pool size and pool kind affect the absolute numbers in the bench, and why the compare bench reuses forge's pool for Drizzle.
* [Performance](../README.md#performance): the README's short, honest framing of what the bench numbers do and don't say.
* [Limitations and honest notes](../README.md#limitations-and-honest-notes): the broader honest list referenced in [Known regressions and honest notes](#known-regressions-and-honest-notes).
* [Driver smoke harness](../README.md#driver-smoke-harness): the install-and-connect harness referenced in [Driver-level vs ORM-level benchmarks](#driver-level-vs-orm-level-benchmarks).