# Batch operations `createMany`, `updateMany`, `deleteMany`, and per-dialect bulk paths — what forge-orm emits, the per-dialect bind-parameter limits, the chunking patterns that scale to millions of rows, and the failure-semantics gotchas (ordered vs unordered, skip-duplicates, RETURNING availability). [MUTATIONS.md](./MUTATIONS.md) covers the verbs in isolation. This page is the throughput-and-correctness lens on the same surface: what changes when `data` is 100,000 rows instead of one, what changes when the network RTT is 30 ms instead of 0.1 ms, and what changes when half the rows collide with a unique constraint. ## Contents * [The batch verbs](#the-batch-verbs) * [Per-dialect emit](#per-dialect-emit) * [Bind-parameter caps and chunking](#bind-parameter-caps-and-chunking) * [`skipDuplicates` per dialect](#skipduplicates-per-dialect) * [`RETURNING` after a batch](#returning-after-a-batch) * [Throughput vs per-row latency](#throughput-vs-per-row-latency) * [Failure semantics](#failure-semantics) * [Transactions wrapping a batch](#transactions-wrapping-a-batch) * [Memory profile and streaming](#memory-profile-and-streaming) * [`updateMany` is uniform — per-row diff is raw SQL](#updatemany-is-uniform--per-row-diff-is-raw-sql) * [Per-row diff with `CASE WHEN`](#per-row-diff-with-case-when) * [Mongo `bulkWrite`](#mongo-bulkwrite) * [Beyond `createMany` — native bulk loaders](#beyond-createmany--native-bulk-loaders) * [Worked examples](#worked-examples) * [See also](#see-also) --- ## The batch verbs Forge exposes three plural-form write verbs on every model, two of which emit a single statement on every SQL dialect. | Verb | Compiles to | Returns | Notes | |--------------|----------------------------|-----------------------|--------------------------------------| | `createMany` | one multi-row INSERT | `{ count: number }` | rows back via `RETURNING` on PG/SQLite/DuckDB; count-only on MySQL/MSSQL | | `updateMany` | one UPDATE with WHERE | `{ count: number }` | same `data` shape applies to every matched row | | `deleteMany` | one DELETE with WHERE | `{ count: number }` | hard delete; for soft delete see [SOFT-DELETE](./SOFT-DELETE.md) | There is no `upsertMany`. The closest shape is `createMany({ skipDuplicates: true })` when you can tolerate "skip the colliding rows" semantics, or a per-row loop of `upsert` calls inside a `$transaction` when you need update-on-conflict per row. The reasons are dialect-asymmetric: * Postgres / SQLite / DuckDB can in principle do `INSERT … VALUES (…),(…) ON CONFLICT … DO UPDATE`, but a single `SET` clause cannot reference the conflicting row's per-row values without `EXCLUDED.col` and a uniform update target. * MySQL `INSERT … ON DUPLICATE KEY UPDATE col = VALUES(col)` does work in bulk, but with the same uniform-update constraint. * MSSQL `MERGE` (the upsert path since 2.5.0) accepts a `VALUES` source but the `WHEN MATCHED` branch fires per matched row only. * Mongo `bulkWrite` does compose a batch of `updateOne … upsert: true` operations — see [Mongo `bulkWrite`](#mongo-bulkwrite) for the shape. If you need true per-row upsert across N rows in one call, drop to [`bulkWrite`](#mongo-bulkwrite) on Mongo, or `$executeRaw` with a CTE fed from `unnest` on Postgres (see [RAW-SQL](./RAW-SQL.md)). ```ts const { count } = await db.user.createMany({ data: incoming, // array of N rows skipDuplicates: true, }); const { count: archived } = await db.session.updateMany({ where: { last_seen: { lt: cutoff } }, data: { archived: true }, }); const { count: removed } = await db.tempUpload.deleteMany({ where: { created_at: { lt: ttl } }, }); ``` --- ## Per-dialect emit What forge sends to the driver, for each verb, for each adapter. ### `createMany` — multi-row INSERT | Adapter | SQL / op | |-----------|----------------------------------------------------------------------| | postgres | `INSERT INTO "t" (cols) VALUES (?,?,…),(?,?,…),… RETURNING *` | | mysql | `INSERT INTO \`t\` (cols) VALUES (?,?,…),(?,?,…),…` (no RETURNING) | | sqlite | `INSERT INTO "t" (cols) VALUES (?,?,…),(?,?,…),… RETURNING *` (3.35+)| | duckdb | `INSERT INTO "t" (cols) VALUES (?,?,…),(?,?,…),… RETURNING *` | | mssql | `INSERT INTO [t] (cols) VALUES (?,?,…),(?,?,…),…` | | mongo | `db.t.insertMany([…], { ordered: !skipDuplicates })` | The IR carries one `InsertNode` with a `rows: Row[]` field; every adapter walks `node.rows` once and emits a single statement. The column list is the union of keys across all rows in schema-declared order — rows omitting a column get `NULL` (or the field's default, applied client-side before the IR is built). Source: `src/adapters/postgres/compile-from-ir.ts` (the canonical emitter, reused by SQLite / DuckDB / MSSQL / MySQL with dialect swaps). ### `updateMany` — single UPDATE | Adapter | SQL / op | |-----------|----------------------------------------------------------------------| | postgres | `UPDATE "t" SET col1 = ?, col2 = ? WHERE …` (rowcount) | | mysql | `UPDATE \`t\` SET col1 = ?, col2 = ? WHERE …` (affectedRows) | | sqlite | `UPDATE "t" SET col1 = ?, col2 = ? WHERE …` (changes()) | | duckdb | `UPDATE "t" SET col1 = ?, col2 = ? WHERE …` (rowcount) | | mssql | `UPDATE [t] SET col1 = ?, col2 = ? WHERE …` (rowcount) | | mongo | `db.t.updateMany(filter, update)` | The same `SET` applies to every matched row. Per-row diff is not possible inside a single `updateMany` — see [Per-row diff with `CASE WHEN`](#per-row-diff-with-case-when). ### `deleteMany` — single DELETE | Adapter | SQL / op | |-----------|----------------------------------------------------------------------| | postgres | `DELETE FROM "t" WHERE …` (rowcount) | | mysql | `DELETE FROM \`t\` WHERE …` (affectedRows) | | sqlite | `DELETE FROM "t" WHERE …` (changes()) | | duckdb | `DELETE FROM "t" WHERE …` (rowcount) | | mssql | `DELETE FROM [t] WHERE …` (rowcount) | | mongo | `db.t.deleteMany(filter)` | On Mongo, `deleteMany` does not cascade — cascading deletes are walked in TypeScript by the wrapper, issuing per-relation `deleteMany` calls. See [MUTATIONS.md — Cascading](./MUTATIONS.md#cascading). --- ## Bind-parameter caps and chunking Every driver has a hard limit on how many bind parameters fit in one statement. The cap is the number you multiply by row count to get the practical batch size. | Dialect | Cap | Source | |----------------------|--------------------------------------|-----------------------------------------------| | Postgres | 65,535 placeholders per statement | wire-protocol short for parameter count | | MySQL | `max_allowed_packet` (default 64 MB) | server config — bytes, not parameters | | SQLite | 32,766 (default since 3.32) | `SQLITE_MAX_VARIABLE_NUMBER` compile-time | | DuckDB | ~65,535 | driver parameter array | | MSSQL | 2,100 parameters | T-SQL parser hard limit | | Mongo | 16 MB BSON per command | wire-protocol document size cap | So for a row with 20 columns: | Dialect | Theoretical max rows / `createMany` | Practical guidance | |----------------------|-------------------------------------|--------------------------| | Postgres | 65,535 / 20 = 3,276 | 1,000 – 3,000 | | MySQL | bytes-bound — 5,000-ish | 1,000 – 5,000 | | SQLite | 32,766 / 20 = 1,638 | 1,000 | | DuckDB | ~3,200 | 1,000 – 3,000 | | MSSQL | 2,100 / 20 = 105 | 100 | | Mongo | bytes-bound — thousands | pass any size; driver splits | These are the numbers that fit in **one statement**. Above them, you chunk on the application side. Forge does not chunk for you — every `createMany` becomes exactly one `INSERT INTO … VALUES (…),(…),…`. The deliberate non-chunking is so the caller controls transaction boundaries: a 100k-row import split into 100 chunks of 1,000 can sit inside one `$transaction` (atomic) or 100 separate transactions (resumable), and forge does not decide for you. The canonical chunk loop: ```ts async function bulkInsert(rows: Row[]) { const CHUNK = 1000; await db.$transaction(async (tx) => { for (let i = 0; i < rows.length; i += CHUNK) { await tx.event.createMany({ data: rows.slice(i, i + CHUNK), skipDuplicates: true, }); } }); } ``` On MSSQL drop `CHUNK` to 100 (the 2,100-parameter ceiling bites at ~20 columns). On Mongo, pass the whole array — the driver pipelines. For sizing in the other direction — how big a chunk gets you most of the throughput without paying for a wider parser — see [Throughput vs per-row latency](#throughput-vs-per-row-latency). --- ## `skipDuplicates` per dialect Forge's `skipDuplicates: true` flag turns "conflict → throw" into "conflict → skip and count what landed". The compiled output is uniform on the SQL side — `ON CONFLICT DO NOTHING` — because forge reuses the Postgres compiler for SQLite / DuckDB / MySQL / MSSQL and all four target dialects accept the syntax: | Adapter | Emitted clause | Notes | |-----------|-----------------------------------------------|----------------------------------------| | postgres | `INSERT … VALUES … ON CONFLICT DO NOTHING` | native since 9.5 | | mysql | `INSERT … VALUES … ON CONFLICT DO NOTHING` | MySQL 8.0.19+ / MariaDB 10.5.6+; falls back to a parser error on older servers | | sqlite | `INSERT … VALUES … ON CONFLICT DO NOTHING` | requires 3.24+ | | duckdb | `INSERT … VALUES … ON CONFLICT DO NOTHING` | DuckDB 0.8+ (PG-compatible) | | mssql | (flag ignored — `INSERT INTO … VALUES …`) | T-SQL has no equivalent; collisions throw | | mongo | `insertMany([…], { ordered: false })` + filter `E11000 duplicate key` from `writeErrors` | partial success returns the count of inserted docs | Source: `src/adapters/postgres/compile-from-ir.ts` line 529 (`node.skipDuplicates ? ' ON CONFLICT DO NOTHING' : ''`), and `src/adapters/mongo/compile-from-ir.ts` line 263 (`ordered: !node.skipDuplicates`), with the duplicate-error filter at `src/adapters/mongo/execute.ts` line 301. If you are on a MySQL older than 8.0.19, the `ON CONFLICT DO NOTHING` clause will fail to parse. Two options: upgrade, or wrap the batch in a try/catch on duplicate-key errors. Forge does not emit the older `INSERT IGNORE` syntax, by design — `IGNORE` swallows every error class (truncation, conversion, missing default) not just duplicate-key, and the resulting `count` becomes hard to reason about. MSSQL does not have a single-statement counterpart. If you need collision-tolerant bulk on MSSQL, two paths: * Loop `upsert` per row inside a `$transaction` (now lands as `MERGE` per row since 2.5.0 — see [MUTATIONS.md — upsert](./MUTATIONS.md#upsert)). * Stage the rows into a temp table with `BULK INSERT`, then `MERGE` the temp table into the real one with `WHEN NOT MATCHED THEN INSERT`. The Mongo path deserves a closer look. `ordered: false` tells the driver to continue past write errors instead of stopping at the first one; `writeErrors` then contains an entry per failed doc, and forge filters out the `code: 11000` (duplicate key) errors when returning the count. Any other error class — schema validation, BSON size overflow — rethrows. ```ts // 100 docs, 10 of which collide on a unique index const { count } = await db.event.createMany({ data: rows, skipDuplicates: true, }); // count === 90; the 10 collisions silently dropped ``` --- ## `RETURNING` after a batch What you can read back differs by dialect, and the difference shapes how you plan the round-trips. | Verb / dialect | postgres | mysql | sqlite (3.35+) | duckdb | mssql | mongo | |----------------|----------|-------|----------------|--------|-------|----------------------| | `createMany` | rows | count | rows | rows | count | inserted ids + count | | `updateMany` | count | count | count | count | count | count | | `deleteMany` | count | count | count | count | count | count | On Postgres / SQLite 3.35+ / DuckDB the `createMany` emit includes `RETURNING *`, so the rows come back inline. On MySQL / MSSQL there is no batch `RETURNING` — the wrapper returns only `{ count }`. If you need the rows after a MySQL / MSSQL batch insert and you cannot re-`SELECT` by some natural key, your options shrink to: 1. **Pre-generate the IDs client-side.** Forge already does this for `f.id()` columns — every row that omits the id key gets a ULID stamped before the insert leaves the process. The IDs are visible on the input array. See [MUTATIONS.md — ID generation timing](./MUTATIONS.md#id-generation-timing). 2. **Insert in a transaction, then re-SELECT.** Inside one `$transaction`, do the `createMany` and a `findMany` keyed by the generated IDs — atomic relative to other writers. 3. **Loop `create` per row.** N round-trips, N rows back. Only sensible at very small N. The `updateMany` / `deleteMany` rows-back gap is universal — no adapter returns the affected rows. Workarounds: `$transaction([ findMany, updateMany ])`, or a `RETURNING *` raw query on Postgres specifically (`db.$queryRaw\`UPDATE … RETURNING *\``). The `select` and `include` keys are not honoured on `createMany` — the verb returns `{ count }` even on dialects that support `RETURNING`. To get a shaped row back from a batch, use single-row `create` calls in a transaction or drop to `$queryRaw`. --- ## Throughput vs per-row latency The three shapes for writing N rows, in order of throughput, with the trade-offs. | Shape | Round-trips | Statements parsed | Commits | Memory peak | |--------------------------------------|-------------|--------------------|---------|-------------| | `createMany({ data: N rows })` | 1 | 1 | 1 | N rows | | `$transaction([N creates])` | 1 | N | 1 | N rows | | N separate `create` calls | N | N | N | 1 row | Rough numbers, single client on a local Postgres, 200-byte rows (reproduced from MUTATIONS.md): | Shape | Rows / second | |------------------------------------|---------------| | `createMany({ data: 10_000 rows })`| ~150,000 | | `$transaction([10_000 creates])` | ~25,000 | | 10,000 separate `create` calls | ~3,000 | The shape of the difference is the same on every adapter. Exact numbers shift with network latency (RTT pads the per-round-trip costs), row width (parser parses each tuple), and index count on the target table (commits include WAL fsync per index update). What this means for chunk sizing: * **Local Postgres / SQLite / DuckDB** — the wire is fast. Chunks of 500-1,000 already saturate the parser; going to 10,000 buys diminishing throughput at the cost of memory and a longer failure-rollback window. * **Remote Postgres / MySQL across the public internet** — RTT dominates. Larger chunks (5,000+) amortise the round-trip; the bind-parameter cap becomes the real ceiling. * **MSSQL** — the 2,100-parameter cap forces chunks of ~100 at typical row widths, so the round-trip overhead is always paid. The win from batching is smaller; consider the native bulk-copy path (`bcp` / Tedious `bulkLoad`) once volume exceeds ~50k rows. Below ~100 rows, the difference between `createMany` and a loop of `create` is invisible. The per-row latency of `create` is one round-trip plus per-statement parser cost; that's already in the low millisecond range. The batch win only starts to matter at hundreds to thousands of rows. --- ## Failure semantics The atomicity guarantee on `createMany` differs across SQL and NoSQL, and changes again when `skipDuplicates` is in play. ### SQL — single statement, atomic `INSERT INTO … VALUES (…),(…),(…)` is one statement. Either all rows commit, or none do — the database treats it as one logical write. A constraint violation on row 47 of a 1,000-row batch rolls back rows 1-46 along with everything from row 48 onward. The atomicity is statement-level, not transaction-level. If the `createMany` is the only statement in its session, a successful return means all N rows are visible to other readers. If it is inside a wider `$transaction`, visibility waits for `COMMIT` like any other write. With `skipDuplicates: true`, the duplicate-key errors are not errors — `ON CONFLICT DO NOTHING` succeeds and the conflicting rows are silently dropped. The returned `count` is the actual inserted rowcount, not the input array length. Non-duplicate errors (CHECK violations, foreign-key failures, type-coercion problems) still throw and roll back the whole statement. ### Postgres — `count` reflects skipped rows Postgres reports `command-status` as `INSERT 0 N` where `N` is the inserted count. Forge reads `result.rowCount` — which after a `DO NOTHING` is exactly the inserted-not-skipped count. So: ```ts const { count } = await db.event.createMany({ data: rows, // length 1000, 200 duplicates skipDuplicates: true, }); // count === 800 ``` ### MySQL — `affectedRows` vs duplicates If you are on MySQL 8.0.19+ and the `ON CONFLICT DO NOTHING` clause parses, `affectedRows` returns the inserted count. On older MySQL where the clause errors out, the entire batch fails with a parse error before any row inserts. ### SQLite — `changes()` After `ON CONFLICT DO NOTHING`, SQLite's `changes()` returns the number of rows actually written. Same model as Postgres. ### MSSQL — collisions throw T-SQL has no native `DO NOTHING`. Forge's `createMany` on MSSQL ignores `skipDuplicates: true` (the flag is accepted for API parity but does nothing). A duplicate-key violation rolls back the entire `INSERT` and rethrows. If you need conflict tolerance on MSSQL, loop `upsert` per row inside a `$transaction`, or stage-and-MERGE through a temp table. ### Mongo — ordered vs unordered Mongo's `insertMany` takes an `ordered` option that controls how the driver responds to mid-batch errors. | `ordered` | Behaviour | |------------------|------------------------------------------------------------------------------| | `true` (default) | Stop at the first failure. Rows before it inserted; rows after it skipped. | | `false` | Continue past failures. Collect errors in `writeErrors`. Insert as many as possible. | Forge sets `ordered: !node.skipDuplicates` — so the default `createMany` runs ordered (and throws on the first failure), and `skipDuplicates: true` runs unordered (and silently drops duplicates). For mixed batches where you want unordered insert but still want to surface non-duplicate errors, drop to the driver: ```ts const coll = db.$client.db().collection('events'); try { await coll.insertMany(docs, { ordered: false }); } catch (err) { const real = err.writeErrors.filter((e) => e.code !== 11000); if (real.length) throw new Error(`unexpected: ${real.length} failures`); } ``` The `ordered: false` path also runs the inserts in parallel batches on the wire, so it is faster than ordered even when no errors hit. --- ## Transactions wrapping a batch A `createMany` is one statement; wrapping it in a transaction adds zero new isolation but does change two things: * **Commit boundary** — without an explicit transaction, the statement auto-commits when it returns. Inside a `$transaction`, the commit waits for the outer block to resolve, and any earlier failure rolls the batch back. * **Visibility to other writers** — rows from an auto-committed batch are visible immediately. Rows from a batch inside a transaction are only visible after the outer commit. The pattern that matters most: **one transaction wrapping many chunks**. ```ts async function bulkIngest(rows: Row[]) { const CHUNK = 1000; await db.$transaction(async (tx) => { for (let i = 0; i < rows.length; i += CHUNK) { await tx.event.createMany({ data: rows.slice(i, i + CHUNK), skipDuplicates: true, }); } }); } ``` This buys all-or-nothing semantics across a 1M-row import — if chunk 743 fails, the previous 742 chunks roll back too. The cost is that the transaction stays open for the entire ingest, which on Postgres delays `VACUUM`, on MySQL grows the InnoDB history list, and on SQLite holds the writer lock against any concurrent writer. See [TRANSACTIONS.md — Long-running transactions](./TRANSACTIONS.md#long-running-transactions). The opposite pattern — **one transaction per chunk** — accepts partial progress on failure but releases each chunk's locks immediately: ```ts async function resumableIngest(rows: Row[]) { const CHUNK = 1000; for (let i = 0; i < rows.length; i += CHUNK) { await db.$transaction(async (tx) => { await tx.event.createMany({ data: rows.slice(i, i + CHUNK), skipDuplicates: true, }); }); } } ``` If chunk 743 fails, chunks 0-742 are committed. The caller can re-run with `skipDuplicates: true` and the conflict-skip dedupes the already-inserted chunks on the retry. The trade-off is N COMMITs instead of one — more WAL flush, more network round-trips on remote databases. Pick based on resumability: long imports across millions of rows generally want per-chunk transactions plus idempotent input. --- ## Memory profile and streaming A `createMany({ data: N rows })` holds all N rows resident in process memory plus the driver's prepared-statement buffer. At 200 bytes per row that is 2 MB for 10k rows, 200 MB for 1M rows. The 1M case starts to thrash the V8 heap. Two patterns avoid that: ### Chunk from an array (eager source) The standard chunked-import loop above already bounds resident memory at `CHUNK * row_size` — the input array is held by the caller, but each `slice(i, i + CHUNK)` is a view that the GC reclaims after the chunk inserts. If the input array itself is the problem (1M rows pre-loaded into JS), null out the consumed prefix after each chunk to let the GC shrink the array: ```ts for (let i = 0; i < rows.length; i += CHUNK) { await tx.event.createMany({ data: rows.slice(i, i + CHUNK) }); for (let k = i; k < i + CHUNK && k < rows.length; k++) rows[k] = null as any; } ``` This is rarely needed below a few hundred MB. ### Stream from a producer (lazy source) When the rows arrive from a file, an HTTP body, a database cursor, or any other producer, push them into a fixed-size buffer and flush at every `CHUNK`: ```ts async function* parseNDJSON(path: string): AsyncIterable { const stream = fs.createReadStream(path).pipe(split2()); for await (const line of stream) { if (line) yield JSON.parse(line); } } async function streamingIngest(source: AsyncIterable) { const CHUNK = 1000; const buffer: Row[] = []; await db.$transaction(async (tx) => { for await (const row of source) { buffer.push(row); if (buffer.length >= CHUNK) { await tx.event.createMany({ data: buffer, skipDuplicates: true }); buffer.length = 0; } } if (buffer.length) { await tx.event.createMany({ data: buffer, skipDuplicates: true }); } }); } ``` Resident memory peaks at `CHUNK * row_size` regardless of how many rows the source produces — a 100M-row file ingests with the same heap footprint as a 100k-row file. This pairs naturally with `findManyStream` on the read side — see [STREAMING.md](./STREAMING.md) — when the producer is itself a forge query. --- ## `updateMany` is uniform — per-row diff is raw SQL `updateMany`'s `data` shape applies to every row matched by `where`. There is no per-row variant inside one call. This is a deliberate constraint: the SQL emit is one `UPDATE … SET col = ? WHERE …` with one row of bound parameters, so the same value goes into every matched row. ```ts // Every active session gets last_seen = now() await db.session.updateMany({ where: { active: true }, data: { last_seen: new Date() }, }); // Every product in 'electronics' gets price *= 1.1 await db.product.updateMany({ where: { category: 'electronics' }, data: { price: { multiply: 1.1 } }, }); ``` The atomic-op variants (`increment`, `decrement`, `multiply`, `divide`) compose into the same uniform shape — every row gets the same delta. The per-row computed update — "set price = price * tax_rate where tax_rate is per-row" — needs a raw `UPDATE … FROM` join. See [Per-row diff with `CASE WHEN`](#per-row-diff-with-case-when) for the SQL pattern. For the **same-target-many-rows** case (one row's value applied to N rows by filter), `updateMany` is one statement and runs at the database's row-update speed — comfortably tens of thousands of rows per second on a hot index. --- ## Per-row diff with `CASE WHEN` When N rows need N different updates in one round-trip, the SQL shape is `UPDATE … SET col = CASE id WHEN ? THEN ? WHEN ? THEN ? … END WHERE id IN (…)`. Forge does not emit this from `updateMany` — the data shape would have to encode per-row values, breaking the uniform-SET contract. Drop to `$executeRaw` instead: ```ts const updates = [ { id: 'p1', price: 9.99 }, { id: 'p2', price: 14.50 }, { id: 'p3', price: 7.25 }, ]; const ids = updates.map((u) => u.id); const cases = updates.map((u) => sql`WHEN ${u.id} THEN ${u.price}`); await db.$executeRaw` UPDATE products SET price = CASE id ${sql.join(cases, sql` `)} END WHERE id IN (${sql.join(ids.map((i) => sql`${i}`), sql`, `)}) `; ``` The trade-offs vs N separate `update` calls: * One round-trip vs N. The `CASE WHEN` form wins when N exceeds a few dozen and the network has any latency. * Bind-parameter count is `2N + N = 3N`. The same per-dialect caps apply — at 200 rows that is 600 placeholders, fine on Postgres, fine on SQLite, breaks MSSQL. * Statement parse cost grows linearly in N. Above a few thousand rows the parser becomes the bottleneck. The Postgres-specific shape that scales further uses `UPDATE … FROM` with a `VALUES` source — fewer placeholders per row, no `CASE`: ```ts await db.$executeRaw` UPDATE products SET price = v.price FROM (VALUES ${sql.join( updates.map((u) => sql`(${u.id}, ${u.price}::numeric)`), sql`, `, )}) AS v(id, price) WHERE products.id = v.id `; ``` Postgres and DuckDB accept this. MySQL needs `UPDATE … JOIN (SELECT … UNION ALL SELECT …)` instead. MSSQL accepts `UPDATE … FROM` similarly but stays bound by the 2,100-parameter cap. See [RAW-SQL.md](./RAW-SQL.md) for the full `$executeRaw` surface, the `sql` tagged-template helpers, and the parameter safety contract. --- ## Mongo `bulkWrite` When the operations in a batch are not all the same verb — a mix of inserts, updates, deletes, and upserts — Mongo's `bulkWrite` is the single-round-trip path. Forge does not wrap it; reach into the driver via `db.$client`: ```ts const coll = db.$client.db().collection('events'); await coll.bulkWrite([ { insertOne: { document: { _id: 'e1', kind: 'click', at: new Date() } } }, { updateOne: { filter: { _id: 'e2' }, update: { $set: { kind: 'view' } }, upsert: true, } }, { deleteOne: { filter: { _id: 'e3' } } }, ], { ordered: false }); ``` The `ordered` semantics match `insertMany`: * `ordered: true` — stop at the first failure. Earlier ops committed, later ops skipped. * `ordered: false` — continue past failures. Run independent ops in parallel batches; surface all errors in `writeErrors`. For pure-upsert batches, `bulkWrite` with all-`updateOne` and `upsert: true` is the Mongo equivalent of an `upsertMany` — one round-trip, atomic per op (a replica-set session can wrap the whole batch in a single multi-doc transaction). The forge `_id` generation rule still applies: `f.id()` columns get the ULID stamped client-side before the call leaves. For raw driver calls, you fill `_id` yourself. For the higher-level Mongo surface — sessions, change streams, adapter-specific gotchas — see [MONGO.md](./MONGO.md). --- ## Beyond `createMany` — native bulk loaders `createMany` saturates around ~150,000 rows/s on local Postgres with a moderate row width and a small number of indexes. Sustained ingest above that calls for the dialect's native bulk path, which bypasses the prepared-statement protocol and the typed model wrapper: * **Postgres `COPY FROM STDIN`** — the canonical bulk path. Use `pg-copy-streams` against the underlying `pg.Pool` exposed by `db.$raw().connection`. Streaming-friendly: pipe a CSV / NDJSON stream straight into the COPY socket. Typical throughput: 500k-1M rows/s on a single client. * **MySQL `LOAD DATA LOCAL INFILE`** — stream a tempfile through the wire. Requires `local_infile=ON` on the server. The mysql2 driver supports it via the `infileStreamFactory` option. * **MSSQL `BULK INSERT` / `bcp`** — `BULK INSERT` reads from a server-side file path; for client-side ingest use the Tedious driver's `bulkLoad` API, which speaks the TDS bulk-copy protocol directly. Typical throughput on `bulkLoad`: ~200k rows/s. * **DuckDB `COPY FROM`** — the canonical analytical-load path. Reads CSV, Parquet, or Arrow directly. Drops `INSERT` round-trips entirely for analytical ingest. * **SQLite — single `$transaction` already close to ceiling** — a tight loop of `createMany` inside one transaction is already near disk-write speed. The remaining tuning is PRAGMA: `journal_mode = WAL` and `synchronous = NORMAL`. See [SQLITE.md](./SQLITE.md). * **Mongo — `insertMany({ ordered: false })`** — already the bulk path. The driver pipelines op-msg writes over the same connection. These paths give up the type-check on the way in (forge's coercion layer is bypassed) in exchange for 5-10x throughput. Use them when the ingest pipeline is well-validated upstream and the throughput matters more than the row-by-row safety net. For the bridge from forge to these native paths, see [RAW-SQL.md](./RAW-SQL.md) (for `$executeRaw` and the underlying driver connection) and [SEED.md](./SEED.md) (which walks through when to switch). --- ## Worked examples Three problems that come up regularly. ### (a) 100,000-row CSV import Stream the CSV, chunk into batches of 1,000, one transaction per chunk for resumability, `skipDuplicates` so retries converge: ```ts import fs from 'node:fs'; import { parse } from 'csv-parse'; async function importCSV(path: string) { const CHUNK = 1000; const stream = fs.createReadStream(path).pipe(parse({ columns: true })); let buffer: Row[] = []; let inserted = 0; let chunks = 0; const flush = async () => { if (!buffer.length) return; await db.$transaction(async (tx) => { const { count } = await tx.event.createMany({ data: buffer, skipDuplicates: true, }); inserted += count; }); chunks++; if (chunks % 100 === 0) console.log(`import: ${inserted} rows`); buffer = []; }; for await (const row of stream) { buffer.push(row as Row); if (buffer.length >= CHUNK) await flush(); } await flush(); return { inserted }; } ``` Resident memory: ~1,000 rows × row size. Time on a local Postgres for 100k rows: ~1.5 seconds. The per-chunk transaction means a mid- import crash leaves the inserted-so-far rows committed; a retry with `skipDuplicates: true` resumes from where it stopped because the duplicates skip. If atomicity matters more than resumability, move the `$transaction` outside the loop — one transaction, the whole import, all-or-nothing. ### (b) Bulk soft-delete Mark every archived order older than the retention window as soft-deleted, in one statement: ```ts const cutoff = new Date(Date.now() - 365 * 24 * 60 * 60 * 1000); const { count } = await db.order.softDeleteMany({ where: { archived: true, created_at: { lt: cutoff } }, }); log.info('archived', { count }); ``` `softDeleteMany` compiles to an `UPDATE … SET deleted_at = now() WHERE …` — one statement, rowcount-only return. Reads in the rest of the app filter `deleted_at IS NULL` by default. To hard-delete the same set later (after the soft-delete-retention window has also passed), use `deleteMany` with the same filter plus `deleted_at: { lt: hardCutoff }`. See [SOFT-DELETE.md](./SOFT-DELETE.md) for the full lifecycle. ### (c) Bulk price update across products Raise every electronics-category price by 10%, server-side: ```ts await db.product.updateMany({ where: { category: 'electronics' }, data: { price: { multiply: 1.1 } }, }); ``` One statement, no application round-trip per row, no stale-read window. The arithmetic happens inside the row lock the database takes for the update — concurrent reads see either the old price or the new price, never an in-between state. When the update is **per-row** (each product has its own tax rate), `updateMany` cannot express it. Drop to `$executeRaw` with the `UPDATE … FROM (VALUES …)` pattern from [Per-row diff with `CASE WHEN`](#per-row-diff-with-case-when), or loop `update` per row inside a `$transaction` if N is small. --- ## See also * [MUTATIONS.md](./MUTATIONS.md) — the verb reference: per-row semantics, atomic ops, nested writes, returning shape. * [RAW-SQL.md](./RAW-SQL.md) — `$executeRaw`, `$queryRaw`, the `sql` tagged-template helpers, the per-row diff pattern. * [TRANSACTIONS.md](./TRANSACTIONS.md) — `$transaction`, isolation levels, long-running-transaction warnings. * [STREAMING.md](./STREAMING.md) — `findManyStream`, AsyncIterable reads, backpressure, the producer side of streaming ingest. * [SEED.md](./SEED.md) — repeatable seeds, idempotency, chunk guidance per dialect, switching to native bulk loaders. * [MONGO.md](./MONGO.md) — `bulkWrite`, `insertMany` ordered vs unordered, sessions and transactions, change streams. * [SOFT-DELETE.md](./SOFT-DELETE.md) — `softDeleteMany`, `restoreMany`, the `deleted_at` lifecycle.