# MySQL and MariaDB forge-orm targets MySQL 5.7+, MySQL 8.x, MariaDB 10.6+, and HeatWave (for vector). This page documents the dialect-specific behavior: driver picks, the InnoDB transaction model, MySQL's idiosyncratic JSON and spatial APIs, FULLTEXT engines, and the operational concerns (charset, online DDL, replication) that touch the forge layer. ## Contents * [Driver matrix](#driver-matrix) * [Engine matrix — MySQL 5.7 vs 8.x vs MariaDB 10.6+](#engine-matrix--mysql-57-vs-8x-vs-mariadb-106) * [Type emit](#type-emit) * [Charset and collation](#charset-and-collation) * [JSON columns](#json-columns) * [FULLTEXT](#fulltext) * [Spatial](#spatial) * [HeatWave vector](#heatwave-vector) * [Transactions and isolation](#transactions-and-isolation) * [Implicit DDL commit](#implicit-ddl-commit) * [Online DDL](#online-ddl) * [`AUTO_INCREMENT` vs UUID](#auto_increment-vs-uuid) * [Replication](#replication) * [Connection pool sizing](#connection-pool-sizing) * [Common errors and fixes](#common-errors-and-fixes) * [Cross-links](#cross-links) --- ## Driver matrix Three wrappers ship for the MySQL adapter; all three implement the same `MysqlDriver` port (`src/adapters/mysql/driver.ts`). The executor talks to them through mysql2's tuple shape — `[rows, fields]` for SELECT, `[result, fields]` for DML where `result` carries `affectedRows` and `insertId`. Non-mysql2 wrappers normalise to that shape internally; outside `driver.ts`, no executor code branches on which client is underneath. | Driver | Best for | Install | |---|---|---| | `mysql2Driver` | Long-lived Node, default pool, MySQL or MariaDB over a real socket | `npm i mysql2` | | `mariadbDriver` | MariaDB Connector/Node — measurable decoder edge on rowsets >10k | `npm i mariadb` | | `planetscaleDriver` | PlanetScale over fetch — Workers, Edge, Vercel Edge Runtime | `npm i @planetscale/database` | The URL prefixes routed to the MySQL adapter are `mysql://` and `mariadb://`. Other Vitess-compatible engines (TiDB, Vitess in the wild) all speak the MySQL wire protocol — point `mysql2Driver` at them and the same SQL emit works. PlanetScale is the exception only because the transport is HTTP, not TCP, so connection pooling and transaction semantics differ. ```ts // (a) URL only — forge picks mysql2 by default. const db = await createDb({ url: process.env.MYSQL_URL!, schema }); // (b) MariaDB over a custom pool — own the lifecycle for TLS callbacks // or socket-path connections. import mariadb from 'mariadb'; const pool = mariadb.createPool({ host: 'db.internal', user: 'app', password: process.env.DB_PASS!, connectionLimit: 50, bigIntAsNumber: true, insertIdAsNumber: true, }); const db = await createDb({ schema, driver: mariadbDriver(pool) }); // (c) PlanetScale on Vercel Edge — HTTP transport, stateless conn. import { connect } from '@planetscale/database'; const conn = connect({ url: process.env.DATABASE_URL }); const db = await createDb({ schema, driver: planetscaleDriver(conn) }); ``` ### Connection-string parity `mysql://user:pass@host:3306/dbname?ssl=true&timezone=Z` is the shared shape. mysql2 and mariadb both honour `ssl`, `timezone`, `connectionLimit`, `dateStrings`, `decimalNumbers`, and (mariadb-only) `bigIntAsNumber` query params. The forge factory passes the URL straight through to `createPool({ uri: url })` on mysql2; for mariadb you build the pool yourself. ### Capability differences | Capability | `mysql2Driver` | `mariadbDriver` | `planetscaleDriver` | |---|---|---|---| | Promise pool | Yes | Yes | n/a — single conn | | Server-side stream | Yes (real cursor) | No — buffers | No — HTTP buffers | | `insertId` as JS number | Native | `bigIntAsNumber` opt-in | Coerced via `Number()` | | Multi-statement | Opt-in | Off by default | Not supported | | Transactions | `START TRANSACTION` | `beginTransaction()` | `conn.transaction(fn)` | | Distributed XA | Yes (rarely safe) | Yes | No | mysql2 is the only wrapper that runs a real server-side cursor for `findManyStream` — the others fall back to OFFSET/LIMIT chunking. --- ## Engine matrix — MySQL 5.7 vs 8.x vs MariaDB 10.6+ Three engine families share the wire protocol, but the SQL surface they accept diverges. forge picks the most portable form available across all three; features that only exist on one are gated behind a schema flag. | Feature | MySQL 5.7 | MySQL 8.x | MariaDB 10.6+ | HeatWave | |---|---|---|---|---| | `WITH` / CTE | No | Yes | Yes (10.2+) | Yes | | Recursive CTE | No | Yes | Yes | Yes | | Window functions | No | Yes | Yes (10.2+) | Yes | | `JSON` column type | Yes | Yes | Yes (as alias for LONGTEXT) | Yes | | `JSON_TABLE` | No | Yes | No | Yes | | `JSON_VALUE` | No | Yes | Yes (10.6+) | Yes | | Expression / functional indexes | No | Yes | Yes (10.5+) | Yes | | `INVISIBLE` indexes | No | Yes | Yes (10.6+, called `IGNORED`) | Yes | | Check constraints (enforced) | Parsed, ignored | Yes | Yes | Yes | | Native partial indexes | No | No | No | No | | `CHECK` on `IN(...)` (enum) | Parsed, ignored | Yes | Yes | Yes | | FULLTEXT on InnoDB | Yes | Yes | Yes | Yes | | `ngram` parser | Yes | Yes | No — use `mecab` | Yes | | `SPATIAL` indexes on InnoDB | Yes (8.0.4 backport) | Yes | Yes | Yes | | `SRID` on spatial columns | No | Yes | Yes | Yes | | `ST_Distance_Sphere` | Yes | Yes | Yes (10.2+) | Yes | | `VECTOR(N)` column | No | 9.0+ | No | Yes (HNSW/IVF) | | `STRING_TO_VECTOR` | No | 9.0+ | No | Yes | | `GENERATED ALWAYS AS` | Yes | Yes | Yes | Yes | | `INSTANT` algorithm for `ADD COLUMN` | 5.7 limited | 8.0.12+ wide | 10.3+ | 8.0.12+ | | `ON DUPLICATE KEY UPDATE` | Yes | Yes | Yes | Yes | | `INSERT … RETURNING` | No | No | Yes (10.5+) | No | | `ANALYZE TABLE` durable stats | Yes | Yes (persistent) | Yes (persistent) | Yes | The forge implications: * **No CTE on 5.7** — the IR avoids `WITH` for read paths. Cursor pagination is always emitted as an outer SELECT with a tuple predicate, not as a `WITH … SELECT FROM …` reshape. If you write `$queryRaw` against MySQL 5.7 you have to do the same. * **`JSON_TABLE` is 8.x only** — the JSON-path operator emits `JSON_EXTRACT` + `JSON_UNQUOTE` (covered below). It works on every supported version. Only reach for `JSON_TABLE` in raw SQL on 8.x. * **Expression indexes are 8.0+ on MySQL, 10.5+ on MariaDB** — `index({ expression: 'LOWER(name)' })` is the schema form (see [INDEXES](./INDEXES.md#expression-indexes)). On 5.7 the push fails with `ER_PARSE_ERROR`; the doctor probe surfaces this during pre-flight. * **Check constraints are advisory on 5.7** — `forge push` still emits the `ADD CONSTRAINT … CHECK` statement (it parses) but the server does not enforce it. Enum fields fall back to app-side validation on the inbound coerce path. * **`INSERT … RETURNING` on MariaDB 10.5+** is not yet used by the adapter — every `insert` / `update` round-trips a follow-up SELECT when the caller needs the inserted row, on every MySQL flavour including MariaDB. The doctor note mentions it; the adapter's follow-up SELECT path is the only one in the executor. ### Detecting the version at runtime ```ts const report = await db.$doctor(); console.log(report.notes); // → "Driver: mysql2 (3.10.0). Server: MySQL 8.4.0 (utf8mb4_0900_ai_ci)." ``` The doctor probe queries `@@version`, `@@version_comment`, and the default collation. The collation matters — see below. --- ## Type emit The MySQL dialect's column-type mapping lives in `src/adapters/mysql/dialect.ts` (`columnType` on `MysqlDialect`). The defaults err on the safe side for InnoDB row-format limits. | Schema field | Emit | |---|---| | `f.id()` (default) | `VARCHAR(64) NOT NULL` (app-side gen) | | `f.id({ idType: 'bigserial' })` | `BIGINT NOT NULL AUTO_INCREMENT` | | `f.id({ idType: 'uuid' })` | `CHAR(36) NOT NULL DEFAULT (UUID())` | | `f.objectId()` | `VARCHAR(64)` | | `f.string()` | `VARCHAR(255)` | | `f.text()` | `TEXT` | | `f.int()` | `INT` | | `f.float()` | `DOUBLE PRECISION` | | `f.decimal(precision, scale)` | `DECIMAL(p,s)` | | `f.uuid()` | `CHAR(36)` | | `f.bigint()` | `BIGINT` | | `f.bool()` | `TINYINT(1)` (`0`/`1`, never `TRUE`/`FALSE`) | | `f.dateTime()` | `DATETIME(3)` — millisecond precision | | `f.json()` | `JSON` | | `f.enum(['A','B'])` | `VARCHAR(64)` + `CHECK (col IN ('A','B'))` | | `f.embed(M)` | `JSON` | | `f.embedMany(M)` | `JSON DEFAULT (JSON_ARRAY())` | | `f.stringArray()` / `f.intArray()` | `JSON` | | `f.geoPoint()` | `POINT NOT NULL SRID 4326` | | `f.vector(N)` | `VECTOR(N)` (MySQL 9+ / HeatWave) | ### Why VARCHAR(255), not 191 or 256 The default `f.string()` width is 255 because that is the maximum that fits a single-column unique index under utf8mb4 on InnoDB without running into the historical 767-byte index prefix limit. On MySQL 5.7 that limit only relaxes with `innodb_large_prefix=ON` and `row_format=DYNAMIC` (both defaults on 8.x and on 5.7.7+). 255 chars × 3 bytes = 765 — under the limit by two bytes; that's the historical reason the cap is 255 and not 256. If you need longer keys, override the column with a `.raw('VARCHAR(1024)')` emit annotation — or move the value to `TEXT` and accept that you cannot declare a plain `UNIQUE` on it without a key-length prefix (`UNIQUE (col(255))`). ### TEXT, MEDIUMTEXT, LONGTEXT The default `f.text()` emits `TEXT` (64 KB). Override the emit when you need the bigger forms: | Storage class | Bytes | Schema | |---|---|---| | `TINYTEXT` | 255 | rare; use `VARCHAR` instead | | `TEXT` | 64 KB | `f.text()` | | `MEDIUMTEXT` | 16 MB | `f.text({ raw: 'MEDIUMTEXT' })` | | `LONGTEXT` | 4 GB | `f.text({ raw: 'LONGTEXT' })` | Picking the right size matters for `ROW_FORMAT=DYNAMIC` overflow behaviour: TEXT / BLOB columns get their first 768 bytes stored inline and the rest on overflow pages. Reads of large columns incur extra page fetches. If 95% of your rows fit in 64 KB and the tail spills to multi-MB, model the long tail as a separate table keyed by `id` and join only when needed. ### `DATETIME(3)` and the timezone trap forge emits `DATETIME(3)` — millisecond precision. It does **not** emit `TIMESTAMP`. Three reasons: 1. `TIMESTAMP` is 4 bytes and overflows in 2038 (Y2K38). 2. `TIMESTAMP` is implicitly converted to / from the session timezone — meaning the value you store depends on `@@session.time_zone`. 3. `DATETIME` is a naive wall-clock time stored in UTC by convention. Always set the session timezone to `'+00:00'` or `'UTC'` on the connection options: ```ts mysql.createPool({ uri: url, timezone: 'Z', dateStrings: false }); ``` mysql2 with `timezone: 'Z'` and `dateStrings: false` returns `Date` objects assumed-UTC. mariadb behaves the same with `timezone: 'Z'`. PlanetScale always returns UTC strings; the executor wraps them in `new Date(str)` on the outbound decode path. ### `DECIMAL` returns strings on mysql2 mysql2's default preserves decimal precision by returning strings instead of `Number`. `f.decimal()` is typed as `string` to match. If you want JS numbers, pass `decimalNumbers: true` on the pool — the executor doesn't double-cast, so the wire shape is what you get. ```ts mysql.createPool({ uri: url, decimalNumbers: true }); ``` ### `BIGINT` and JS-number safety mysql2 returns `BIGINT` as JS `number` by default — values above `Number.MAX_SAFE_INTEGER` (2^53 - 1) silently lose precision. Either declare `f.bigint()` and accept the string return shape, or pass `supportBigNumbers: true, bigNumberStrings: true` on the pool. mariadb returns `BigInt` natively unless you opt out via `bigIntAsNumber: true, insertIdAsNumber: true` — match mysql2's shape with that combo, or accept the `BigInt` everywhere. ### Generated columns `field.dbGenerated` compiles to `GENERATED ALWAYS AS () STORED`. `STORED` is mandatory because `VIRTUAL` columns cannot be indexed on InnoDB until 8.0.13 and even then only with extra ceremony. The MySQL adapter emits STORED to keep the index path universally available — the storage cost of a stored column is a real column write per row, but the indexable surface is worth it for most schemas. Override with a `.raw()` annotation if you specifically want VIRTUAL. --- ## Charset and collation Every table forge creates emits `ENGINE=InnoDB DEFAULT CHARSET=utf8mb4` (see `buildCreateTable`). utf8mb4 is the only safe choice — `utf8` is the historical 3-byte truncation that drops most emoji and supplementary-plane characters. The collation is **not** named explicitly in the emit. That means it falls back to the server's default, which differs by version: | Server | Default collation | |---|---| | MySQL 5.7 | `utf8mb4_general_ci` | | MySQL 8.0+ | `utf8mb4_0900_ai_ci` | | MariaDB 10.6+ | `utf8mb4_general_ci` (still) | The two are not equivalent. `_0900_ai_ci` is accent-insensitive (matches `é` to `e`) and uses the Unicode 9 sort order; `_general_ci` is a flat byte comparison with ad-hoc folding rules. A query that matches "Élise" by name with `LIKE 'el%'` works on 8.x out of the box, and fails on 5.7 / MariaDB unless you call `LOWER()` first or upgrade the collation. ### The `mode: 'insensitive'` flag forge's text-operator flag (see [QUERIES](./QUERIES.md#mode-insensitive)) is a no-op on MySQL, deliberately. The reasoning: * On `_0900_ai_ci` (the 8.x default) every `LIKE` is already case-insensitive; flipping a flag would be redundant. * On `_bin` or `_cs` collations the only honest translation is `LOWER(col) LIKE LOWER(?)`, which defeats any index on the column. * The adapter picks the conservative path: the flag is dropped and the query runs with whatever the column's collation says. If you need predictable case-insensitive search regardless of collation, store a `name_lc` generated column and filter on that: ```ts const User = model('users', { id: f.id(), name: f.string(), name_lc: f.string().dbGenerated('LOWER(name)'), }); // CREATE INDEX … ON users (name_lc); await db.user.findMany({ where: { name_lc: { contains: 'él' } } }); ``` The generated column is stored, indexed, and never written by the app — the trigger fires on every INSERT/UPDATE. ### Switching collations mid-life Changing the column collation on a non-empty table re-sorts every index that touches the column. On a 50M-row table that's hours. Plan for it: ```sql ALTER TABLE posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci, ALGORITHM=COPY, LOCK=SHARED; ``` The copy is unavoidable — the index keys themselves change. Run it behind `pt-online-schema-change` or `gh-ost` on production; see [Online DDL](#online-ddl). --- ## JSON columns `f.json()` emits a native `JSON` column on every supported MySQL flavour. MariaDB stores `JSON` as `LONGTEXT` with the `CHECK (JSON_VALID(col))` constraint applied internally — same surface, slightly different storage. ### Read path — `JSON_EXTRACT` + `JSON_UNQUOTE` The IR emits `JSON_UNQUOTE(JSON_EXTRACT(col, '$.path'))`. The `JSON_UNQUOTE` is required: `JSON_EXTRACT` returns JSON-encoded values, meaning a string is wrapped in literal quotes (`"foo"` not `foo`). Comparing that against a plain `?` placeholder always returns false unless you unwrap it first. ```ts // User code: await db.user.findMany({ where: { meta: { path: 'profile.age', gte: 18 } }, }); ``` ```sql -- Emit: SELECT * FROM `users` WHERE CAST(JSON_UNQUOTE(JSON_EXTRACT(`meta`, '$.profile.age')) AS UNSIGNED) >= ? ``` The `CAST` is added by the IR when the path operator is a number comparator (`lt`, `lte`, `gt`, `gte`). String comparisons skip the cast. ### Indexing JSON via generated columns Native expression indexes on JSON paths work on 8.0+, but the ergonomic pattern is the generated-column trick — promote the JSON path to a real column, index that column: ```ts const User = model('users', { id: f.id(), meta: f.json(), // age is auto-extracted and indexed for cheap range scans age: f.int().dbGenerated("CAST(JSON_UNQUOTE(JSON_EXTRACT(meta, '$.profile.age')) AS UNSIGNED)"), }, { indexes: [{ keys: { age: 1 }, name: 'idx_users_age' }], }); ``` The generated column is updated on every write — no app code needs to know it exists. Filtering on `age` uses the index; filtering on `meta.profile.age` via the path operator still re-extracts. ### `JSON_TABLE` (8.x only) For un-nesting an embedded array into rows, `JSON_TABLE` is the right tool. It is 8.x only — MariaDB does not have it. Drop to `$queryRaw` for that: ```ts const sql = forge.sql` SELECT u.id, jt.tag FROM users u, JSON_TABLE(u.meta, '$.tags[*]' COLUMNS (tag VARCHAR(64) PATH '$')) AS jt WHERE u.id = ${id} `; const rows = await db.$queryRaw(sql); ``` See [RAW-SQL](./RAW-SQL.md#interpolation-rules) for the safe interpolation rules. ### `null` in a JSON column `null` on a JSON column is ambiguous — column NULL vs JSON literal `null`. forge exposes `ForgeDbNull`, `ForgeJsonNull`, and `ForgeAnyNull` markers (see [QUERIES](./QUERIES.md#null-markers)). On MySQL: | Marker | Emit | |---|---| | `ForgeDbNull` | `` `col` IS NULL `` | | `ForgeJsonNull` | `` JSON_TYPE(`col`) = 'NULL' `` | | `ForgeAnyNull` | `` (`col` IS NULL OR JSON_TYPE(`col`) = 'NULL') `` | `JSON_TYPE` is the canonical MySQL discriminator. PostgreSQL uses `(col::jsonb = 'null'::jsonb)`; same idea, different emit. --- ## FULLTEXT `f.text().searchable()` (or `f.string().searchable()`) auto-emits a FULLTEXT index per searchable column: ```sql ALTER TABLE `posts` ADD FULLTEXT `forge_posts_fts_title` (`title`); ``` The runtime operator is `MATCH() AGAINST()` in `NATURAL LANGUAGE MODE`: ```sql WHERE MATCH(`body`) AGAINST (? IN NATURAL LANGUAGE MODE) ``` ### MyISAM vs InnoDB FULLTEXT Historical lore says FULLTEXT lives on MyISAM. That stopped being true with MySQL 5.6. Both engines now support it. Forge always uses InnoDB (the only engine emitted by `buildCreateTable`); the FULLTEXT index lives on the same table. The difference for InnoDB FULLTEXT: * Inverted-index updates are batched in the `FTS_DOC_ID_INDEX` auxiliary table; `OPTIMIZE TABLE` flushes them. Heavy write loads build up unflushed entries until the next optimise. * The minimum word length defaults to 3 chars (`innodb_ft_min_token_size`); set to 1 for CJK or partial-match needs. Requires a server restart. * Stopwords live in `INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD` (read-only) or a user table named via `innodb_ft_server_stopword_table='db/tbl'`. ### ngram parser for CJK Default tokenisation splits on whitespace and punctuation — terrible for Chinese, Japanese, Korean. The `ngram` parser breaks the text into n-char windows instead: ```ts const Post = model('posts', { id: f.id(), body: f.text().searchable(), }, { indexes: [{ keys: { body: 'text' }, method: 'fulltext', parser: 'ngram', name: 'forge_posts_fts_body_ngram', }], }); ``` The dialect renders `WITH PARSER ngram` after the column list. Window size is server-level (`ngram_token_size`, default 2). On MariaDB the `mecab` parser is the equivalent for Japanese morphological tokenisation — same `parser:` field. ### BOOLEAN mode and the syntax leak `NATURAL LANGUAGE MODE` is the default. Switching to `BOOLEAN MODE` unlocks `+required -excluded "exact phrase"` syntax — but the adapter does not expose a flag. Drop to `$queryRaw`: ```ts const sql = forge.sql` SELECT * FROM posts WHERE MATCH(body) AGAINST (${query} IN BOOLEAN MODE) ORDER BY MATCH(body) AGAINST (${query} IN BOOLEAN MODE) DESC LIMIT 20 `; ``` The reason it's a `$queryRaw` is that the same operator surface needs to mean the same thing across SQLite FTS5 and Postgres tsquery — BOOLEAN-mode syntax is MySQL-specific. The cross-portable surface is `search: 'apple banana'` (which on MySQL becomes natural language mode, on PG becomes `plainto_tsquery`). See [FTS](./FTS.md) for the deep dive on engines, scoring, and hybrid BM25+vector. ### Rebuilding FULLTEXT Mass deletes leave tombstones in the FULLTEXT cache. Schedule `OPTIMIZE TABLE posts;` weekly on hot search columns. It rebuilds the auxiliary tables and reclaims space. --- ## Spatial `f.geoPoint()` emits a typed `POINT` column with an explicit SRID: ```sql `location` POINT NOT NULL SRID 4326 ``` The `SRID 4326` clause is **mandatory on MySQL 8.x for spatial indexes**. Without it, the index is built against a Cartesian plane and great-circle queries return wrong distances. forge always emits the SRID — default 4326 (WGS84). Override per-field: ```ts const Pickup = model('pickups', { id: f.id(), // Web Mercator — for tile-aligned visualisations location: f.geoPoint({ srid: 3857 }), }); ``` ### The lat-first axis-order quirk MySQL SRID 4326 stores points as `POINT(lat lng)` — latitude first. PostGIS stores them as `ST_MakePoint(lng, lat)` — longitude first. The adapter handles this in `valueExpr`: a `{ lng, lat }` input is written as `POINT(lat lng)` for MySQL. If you go around the adapter with raw SQL, mirror it — getting the order wrong puts every point in the wrong hemisphere. ### `ST_*` function reference The IR emits these functions for the geo operators: | Operator | Emit | |---|---| | `near: { lng, lat, withinMeters }` | `ST_Distance_Sphere(col, ST_GeomFromText(?, 4326)) < ?` | | `orderBy nearTo` | Same `ST_Distance_Sphere` expression in the ORDER BY | | `withinPolygon: poly` | `ST_Within(col, ST_GeomFromText(?, 4326))` | | `geoDistanceExpr` (select-side projection) | `ST_Distance_Sphere(col, ref)` | `ST_Distance_Sphere` returns metres on SRID 4326. The function is faster than `ST_Distance` (which considers full ellipsoidal geodesy) and accurate to within ~0.1% for typical app distances — good enough for "stores within 5 km". ### `SPATIAL` indexes Spatial indexes are R-tree-backed on InnoDB 5.7.5+. They are required for `ST_Within` and friends to use index lookup rather than a full table scan. forge declares them via the schema's `indexes` array with `method: 'spatial'`: ```ts const Store = model('stores', { id: f.id(), location: f.geoPoint(), }, { indexes: [{ keys: { location: 1 }, method: 'spatial', name: 'idx_stores_loc' }], }); ``` That emits `CREATE SPATIAL INDEX idx_stores_loc ON stores(location)`. SPATIAL is a statement-prefix keyword, not a `USING SPATIAL` clause — the dialect handles the distinction. See [GEO](./GEO.md) for the per-dialect catalog. ### Non-WGS84 SRIDs The `f.geoPoint({ srid: N })` syntax emits the SRID at DDL time so the index aligns with the projection. forge does **not** call `ST_Transform` between SRIDs at query time — the input point and the column must share the SRID, or the query throws `ER_GIS_DIFFERENT_SRIDS`. Plan one SRID per column. ### 3D points `f.geoPoint({ dims: 3 })` is a forge convenience for "include the `alt` scalar in TypeScript". MySQL 8.x does not natively support `POINT Z` — only the 2D ground point is indexed and stored. The adapter drops `alt` at the storage layer; the JSON-shape round-trip happens via a sibling scalar column. See [GEO](./GEO.md#3d-points-and-altitude) for the round-trip pattern. --- ## HeatWave vector MySQL 9.0 introduced a native `VECTOR(N)` type and `STRING_TO_VECTOR` function for parsing JSON-array literals. forge emits both: ```ts const Doc = model('docs', { id: f.id(), embedding: f.vector(1536, { metric: 'cosine' }), }); ``` ```sql `embedding` VECTOR(1536) -- Insert: INSERT INTO `docs` (`embedding`) VALUES (STRING_TO_VECTOR(?)) -- Query (orderBy nearTo): ORDER BY DISTANCE(`embedding`, STRING_TO_VECTOR(?), 'COSINE') ASC ``` The `DISTANCE(col, vec, metric)` function returns the chosen distance — `'COSINE'`, `'DOT'`, or `'EUCLIDEAN'`. forge's `metric` field on `f.vector()` is uppercased and passed through. ### Community vs HeatWave * **Community 9.x** — brute-force scan. No HNSW, no IVF. Acceptable up to ~100k rows; above that the unindexed scan dominates query latency. * **HeatWave Vector Store** — the index forms (`HNSW`, `IVF_FLAT`) are HeatWave-only. Sub-millisecond at 10M+ vectors. The schema surface to declare the index is `method: 'vector'` with an `index_kind` option — same `indexes` array, just on the HeatWave-aware version. forge gates the vector path behind a runtime probe: `db.$doctor()` checks `SHOW VARIABLES LIKE 'innodb_vector_search'` and reports whether the feature is on. On unsupported versions the field's DDL fails with `ER_PARSE_ERROR` at push time. See [VECTOR](./VECTOR.md) for the per-dialect distance reference (`<=>` on pgvector, `vec_distance_cosine` on sqlite-vec, the same `DISTANCE` on MySQL/HeatWave). --- ## Transactions and isolation InnoDB is the only engine forge emits. Transactions are MVCC with record + gap locks; isolation defaults to `REPEATABLE READ`. ### `$transaction` shape ```ts await db.$transaction(async (tx) => { await tx.user.create({ data: { email } }); await tx.org.update({ where: { id: orgId }, data: { count_inc: 1 } }); }); ``` The adapter opens a real `START TRANSACTION` on a pinned connection and threads the connection through `ExecOpts.session`. On commit the connection is released back to the pool; on throw it's rolled back first. The wrapper code lives in `mysql2Driver.transaction` (and mirrored shapes in `mariadbDriver` and `planetscaleDriver`). ### Isolation levels Override per transaction via the session settings: ```ts await db.$transaction(async (tx) => { await tx.$queryRaw(forge.sql`SET TRANSACTION ISOLATION LEVEL READ COMMITTED`); // … reads inside this txn now see committed snapshots only }, { isolation: 'READ COMMITTED' }); // pseudo — set via $queryRaw on MySQL ``` The four MySQL levels and what they cost you: | Level | Reads | Phantom risk | Gap locks | |---|---|---|---| | `READ UNCOMMITTED` | Dirty reads allowed | Yes | Yes | | `READ COMMITTED` | No dirty reads | Yes — re-runs see new rows | **No** — only record locks | | `REPEATABLE READ` (default) | Snapshot-stable | No — gap locks block inserts | Yes | | `SERIALIZABLE` | As REPEATABLE READ + range S-locks | No | Yes, more aggressive | The big practical difference is `READ COMMITTED` vs the default: under `READ COMMITTED` InnoDB does not take gap locks, which kills most deadlock cascades on hot write paths. The tradeoff is that `SELECT … FOR UPDATE` inside a transaction can race with concurrent inserts. Pick `READ COMMITTED` for high-concurrency OLTP, leave the default for batch jobs that need a stable snapshot. ### Gap locks and the phantom row problem The `REPEATABLE READ` default is unusual — most databases use `READ COMMITTED`. The MySQL choice means a transaction that does `SELECT … FOR UPDATE WHERE order_id BETWEEN 100 AND 200` locks not just the rows in that range but also the *gaps* between them. Concurrent inserts of `order_id = 150` block until the transaction commits. This is how InnoDB prevents phantom reads under `REPEATABLE READ` without doing serial scheduling. It also means a hot range filter under load becomes a deadlock factory. The fix is one of three: 1. Drop to `READ COMMITTED` for transactions that don't need snapshot stability. 2. Switch the index access pattern so the predicate hits a unique key (gap locks degenerate to record locks on uniques). 3. Use `SELECT … LOCK IN SHARE MODE` only when really needed, and outside transactions where possible. ### Deadlock retry `ER_LOCK_DEADLOCK` (1213) is mapped to `P2034 — Transaction deadlock — please retry`. The adapter does not auto-retry; that's the caller's job because retrying the same transaction has side-effect implications the adapter cannot reason about. The pattern: ```ts async function withRetry(fn: () => Promise, max = 3): Promise { for (let i = 0; i < max; i++) { try { return await fn(); } catch (e: any) { if (e?.code !== 'P2034' || i === max - 1) throw e; await new Promise((r) => setTimeout(r, 50 * 2 ** i)); } } throw new Error('unreachable'); } await withRetry(() => db.$transaction(async (tx) => { /* … */ })); ``` Exponential backoff with three attempts is enough for the common case. Beyond that the contention is structural — find the hot write and reshape it. `ER_LOCK_WAIT_TIMEOUT` (1205) is a different beast — it means a single lock wait exceeded `innodb_lock_wait_timeout` (default 50s). Same `P2034` code; same retry approach, but if you see it often the transaction is just too long. Move read-only work outside the txn. ### Transactions and DDL — read the next section DDL implicitly commits. This is critical and rarely what callers expect: --- ## Implicit DDL commit Every DDL statement on MySQL implicitly commits the current transaction. There are no savepoints around DDL the way there are on PostgreSQL. Concretely: ```ts await db.$transaction(async (tx) => { await tx.user.create({ data: { email } }); await tx.$queryRaw(forge.sql`CREATE TABLE temp_stage (id INT)`); // The user.create just got committed. The transaction is now // over. A subsequent throw will not roll back the create. await tx.user.update({ /* … */ }); }); ``` This affects the migration runner (`src/adapters/mysql/migrate.ts`). Unlike Postgres, forge cannot wrap the entire `forge push` batch in a single transaction — DDL statements implicitly close any open txn. Instead, the runner: 1. Acquires a session-level lock via `SELECT GET_LOCK('forge_migrate', 60)`. This serialises concurrent pushes (CI + local + the human running it ad-hoc). 2. Reads `INFORMATION_SCHEMA.TABLES`, `TABLE_CONSTRAINTS`, `STATISTICS` once to know what already exists. 3. Runs each statement in order, outside a transaction. A failed statement leaves prior successes applied — the same semantic as `prisma db push`. The `ApplyReport` carries `{ applied, skipped, failures }` so the caller knows where the break happened. 4. Releases the lock via `SELECT RELEASE_LOCK(?)`. This means recovering from a half-applied push requires manual inspection. The doctor probe's `diff` command tells you what's still missing, and the migration runner is idempotent — re-running it picks up where the failure left off. DDL inside `$transaction` is **never safe** on MySQL. If you need atomic schema changes, take application downtime and run them via `forge push` outside any application transaction. See [MIGRATIONS](./MIGRATIONS.md) for the safe-DDL playbook. --- ## Online DDL MySQL's online DDL framework lets some `ALTER TABLE` operations run without blocking concurrent reads or writes. The framework picks between `ALGORITHM=INSTANT`, `INPLACE`, and `COPY`, and between `LOCK=NONE`, `SHARED`, and `EXCLUSIVE`. forge does not pin these algorithms — every `ALTER` falls back to whatever MySQL picks by default. If you need predictability, run the ALTER yourself with explicit clauses and let forge's introspection pick up the change on the next push. ### Algorithm matrix | Operation | INSTANT? | INPLACE? | LOCK=NONE? | |---|---|---|---| | `ADD COLUMN` at end with default (8.0.12+) | Yes | Yes | Yes | | `ADD COLUMN` in the middle | No | Yes | Yes | | `DROP COLUMN` | 8.0.29+ | Yes | Yes | | `ADD INDEX` (non-FULLTEXT) | No | Yes | Yes | | `ADD FULLTEXT INDEX` | No | No (rebuilds table) | No | | `ADD FOREIGN KEY` | No | Yes | Yes | | `MODIFY COLUMN` (widen) | 8.0.29+ some | Yes | Yes | | `CONVERT TO CHARACTER SET` | No | No | No | Routine schema evolution (`ADD COLUMN`, `ADD INDEX`) is online on 8.0.12+. Anything that touches FULLTEXT, charset, or row format is offline and copies the whole table. ### `pt-online-schema-change` for the offline cases On older versions or for offline operations, the standard tool is `pt-online-schema-change` from Percona Toolkit. It: 1. Creates a shadow table with the new schema. 2. Installs triggers that propagate writes from the old table to the shadow. 3. Backfills the old table's rows into the shadow in chunks. 4. `RENAME TABLE` atomically swaps the two. ```sh pt-online-schema-change \ --alter "ADD COLUMN dropoff_window TINYINT(1) NOT NULL DEFAULT 0" \ --execute D=app,t=orders,h=db.internal ``` The forge schema must be updated to match the post-rename state, or the next `forge push` will try to re-add the column. The introspect path reads from `INFORMATION_SCHEMA`, so once the rename completes the next `push` is a no-op. `gh-ost` (from GitHub) is the same idea without triggers — it replays from the binlog instead. Pick `gh-ost` when triggers are forbidden by the app architecture (e.g., binlog replication that filters trigger output), otherwise `pt-osc` is simpler. --- ## `AUTO_INCREMENT` vs UUID Three id strategies. The default is app-side string UUID v4. ```ts const User = model('users', { id: f.id() }); // → VARCHAR(64) NOT NULL, value generated app-side by `cuid()` / `nanoid()` / similar ``` The override matrix: ```ts // bigserial — BIGINT NOT NULL AUTO_INCREMENT. lastInsertId comes back via insertId. const Audit = model('audit', { id: f.id({ idType: 'bigserial' }) }); // uuid — CHAR(36) NOT NULL DEFAULT (UUID()). DB-side gen via the UUID() function. const Doc = model('docs', { id: f.id({ idType: 'uuid' }) }); ``` ### When to pick each | Strategy | Picks for | Avoid when | |---|---|---| | `auto` (app-side string) | Distributed inserts, ID known at client, cross-DB joins by string key | High write rate to a single hot table — strings are 2× the index size of bigints | | `bigserial` | Heavy single-master writes, audit logs, analytics fact tables | Sharded writes, offline-first clients that need a local id pre-sync | | `uuid` (CHAR(36)) | Cross-shard inserts, predictable string format | Hot append-only tables — random UUIDs are terrible for B-tree locality | ### The UUID-index locality problem `UUID()` returns random v1-like UUIDs that scatter writes across the entire B-tree. On a 100M-row table this turns every insert into a random page write — your buffer pool churns and write throughput collapses. The fix is `UUID_TO_BIN(UUID(), 1)` — the `1` flag rearranges the timestamp components so consecutive UUIDs land on adjacent pages. forge does not emit this by default because not every workload benefits; if you need it, override: ```ts const Order = model('orders', { // BINARY(16) instead of CHAR(36) — half the size, ordered inserts. id: f.id({ idType: 'uuid' }).raw('BINARY(16) NOT NULL DEFAULT (UUID_TO_BIN(UUID(), 1))'), }); ``` The tradeoff is that the column is now binary — every read needs `BIN_TO_UUID(id, 1)` to render as a string. Hide that behind a view or accept that the app sees `Buffer`. ### `bigserial` and the auto-increment cache InnoDB allocates `AUTO_INCREMENT` values in batches controlled by `innodb_autoinc_lock_mode`: | Mode | Behaviour | |---|---| | 0 (legacy) | Table-level lock per insert. Serial; safe; slow. | | 1 (consecutive, default on 5.7) | Bulk inserts pre-allocate a contiguous range; gaps possible on rollback. | | 2 (interleaved, default on 8.x) | Per-row allocation, fully parallel; gaps and non-monotonic order. | Mode 2 is fast and the new default but breaks the assumption that "higher id == later in time". If your code reads sort order from the id (audit logs, cursor pagination on id), set `innodb_autoinc_lock_mode=1` or add a real `created_at`. --- ## Replication MySQL has two replication formats: STATEMENT and ROW. The choice ripples into how the forge schema is interpreted on replicas. | Format | Replicated payload | Determinism risk | |---|---|---| | STATEMENT (`SBR`) | The SQL statement | `NOW()`, `RAND()`, non-deterministic UDFs drift | | ROW (`RBR`) | The actual changed rows | None — replicas apply byte-for-byte | | MIXED | STATEMENT, ROW when needed | Less drift than pure SBR, smaller binlog than pure RBR | Forge's emits are deterministic enough for either format, but two gotchas: * **`UUID()` defaults** — under SBR, the replica re-evaluates `UUID()` and gets a different value. If you rely on `id = UUID()` on the master, switch the binlog format to `ROW` or pre-compute the UUID app-side. * **`CURRENT_TIMESTAMP(3)` defaults** — `now()` differs by microseconds between master and replica under SBR, which can break `WHERE created_at = ?` lookups on replicas. Same fix — use `ROW` format, or always pass the timestamp from the app. ### Read replicas The forge driver does not natively split reads to replicas. The pattern is two adapters: ```ts const writeDb = await createDb({ url: process.env.MYSQL_PRIMARY!, schema }); const readDb = await createDb({ url: process.env.MYSQL_REPLICA!, schema }); await writeDb.user.create({ data }); // writes go to primary await readDb.user.findMany({ where: { … } }); // reads go to replica ``` Replication lag is the perennial bug — a read right after a write might miss the write. The mitigation patterns: 1. **Sticky session** — route reads after a recent write to the primary for a short window (e.g., 1 second). 2. **`MASTER_POS_WAIT`** — block on the replica until it catches up to a known binlog position before reading. 3. **GTID sync** — wait for the master's GTID set to apply. forge does not implement any of these — they live above the adapter. The transactional read inside a `$transaction` always uses the primary because the session is pinned to one connection. --- ## Connection pool sizing The pool size question is: how many concurrent queries can the server handle, divided by how many app processes share it. | Side | Knob | Typical | |---|---|---| | Server | `max_connections` | 151 (default) — bump to 500–1000 for OLTP | | Server | `wait_timeout` | 28800s (8 hrs) — drop to 600s for idle reaping | | App | mysql2 `connectionLimit` | 50 (forge default) | | App | mariadb `connectionLimit` | 50 | | App | PlanetScale | n/a — HTTP, per-request | The forge default of 50 is calibrated for a single Node process serving moderate OLTP. If you run 4 app processes against a single MySQL server, that's 200 total connections — comfortable under the default 151 cap once you bump it. ### Two pool patterns A pattern that scales well: ```ts // Hot pool — short transactions, small queries const writePool = mysql.createPool({ uri: url, connectionLimit: 50 }); // Cold pool — long analytics queries const readPool = mysql.createPool({ uri: url, connectionLimit: 5 }); const writeDb = await createDb({ schema, driver: mysql2Driver(writePool.promise()) }); const readDb = await createDb({ schema, driver: mysql2Driver(readPool.promise()) }); // Long report queries never starve the OLTP path await readDb.$queryRaw(forge.sql`SELECT … with window funcs and joins`); ``` A long analytics query that holds a connection for 60s would otherwise eat into the 50-connection pool that serves user requests. Separating them keeps p99 stable. ### `wait_timeout` and stale connections Pools that hold idle connections beyond `wait_timeout` see them get silently killed by the server. The next query returns `EHOSTUNREACH` or `ECONNRESET`. mysql2 reconnects on the next query; mariadb does the same. PlanetScale is HTTP — no idle connections to lose. If you see "MySQL server has gone away" (errno 2006), the pool's idle timeout exceeds the server's `wait_timeout`. Drop the pool's `idleTimeout` below it: ```ts mysql.createPool({ uri: url, connectionLimit: 50, idleTimeout: 60_000 }); ``` The forge emitter surfaces server-gone errors as `P1001` and includes the errno in the detail — see below. ### Shutdown handlers ```ts process.on('SIGTERM', async () => { await db.$disconnect(); process.exit(0); }); ``` `$disconnect` calls `pool.end()` on the underlying client. Without it, in-flight queries finish but the pool sockets close on FIN, which can confuse load balancers expecting graceful drains. --- ## Common errors and fixes The full mapping lives in `src/adapters/mysql/errors.ts` and surfaces through `withMysqlErrors`. Selected entries: | errno | code | Symptom | Fix | |---|---|---|---| | 1062 | `P2002` | `ER_DUP_ENTRY` — unique constraint failed | Check the index name in the message; on upsert paths reach for `db.X.upsert({ where, create, update })` | | 1452 | `P2003` | `ER_NO_REFERENCED_ROW_2` — fk insert fails | Parent row missing. Order writes parent-first inside `$transaction` | | 1451 | `P2003` | `ER_ROW_IS_REFERENCED_2` — fk delete blocked | Declare `relation({ onDelete: 'Cascade' })` or delete children explicitly | | 1048 | `P2011` | `ER_BAD_NULL_ERROR` — null on NOT NULL | Field is `.optional()`? if not, app forgot to pass it | | 3819 | `P2004` | `ER_CHECK_CONSTRAINT_VIOLATED` | Enum value not in list. On 5.7 the check is ignored — validation happens app-side | | 1146 | `P2021` | `ER_NO_SUCH_TABLE` | Forgot `forge push` on the env, or you pointed at the wrong schema | | 1054 | `P2022` | `ER_BAD_FIELD_ERROR` | Pre-push code referencing a not-yet-applied column. Run `db.$diff()` | | 1213 | `P2034` | `ER_LOCK_DEADLOCK` | Retry the transaction with backoff (see Transactions) | | 1205 | `P2034` | `ER_LOCK_WAIT_TIMEOUT` | Long transactions on hot rows. Shorten the txn or drop to `READ COMMITTED` | | 1317 | `P2024` | Query interrupted (timeout) | The client cancelled — usually upstream HTTP timeout | | 2002 | `P1001` | Connection refused | Server down or wrong host/port | | 2003 | `P1001` | Can't connect to MySQL server | Firewall, security group, or TLS handshake fail | | 2006 | `P1001` | MySQL server has gone away | Pool idle > `wait_timeout`. Drop `idleTimeout` (see above) | | 1045 | `P1010` | `ER_ACCESS_DENIED_ERROR` | Wrong credentials. Try `mysql -u … -p` to confirm | ### "Specified key was too long; max key length is 3072 bytes" InnoDB's per-index key limit is 3072 bytes on `ROW_FORMAT=DYNAMIC` (default on 8.x), 767 on legacy `COMPACT`. A composite unique on three `VARCHAR(255)` utf8mb4 columns is 3 × 255 × 4 = 3060 bytes — under the limit but cutting it close. Either drop a prefix length (`UNIQUE (col1(100), col2(100), col3(100))`), hash the column (`SHA256(col)` as `BINARY(32)`), or confirm the row format is `dynamic` with `SELECT @@innodb_default_row_format`. ### "Lost connection to MySQL server during query" Hit `net_read_timeout` or `net_write_timeout` (default 60s + 60s). Bump them server-side or chunk via cursor pagination (see [QUERIES](./QUERIES.md#pagination--offset-vs-cursor)). ### "Too many connections" `max_connections` exceeded. Bump the server cap, lower the pool size, or front the server with `ProxySQL` / `mysql-router`. PlanetScale's HTTP transport sidesteps it entirely. ### "Incorrect string value: '\\xF0\\x9F…' for column" Column charset is `utf8`, not `utf8mb4`. forge emits utf8mb4 on every CREATE TABLE; the column predates the upgrade. Convert per- column with `ALTER TABLE t MODIFY col … CHARACTER SET utf8mb4` or per-table with `ALTER TABLE t CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci`. Both rebuild the table — run them under `pt-online-schema-change` on production. --- ## Cross-links * [DRIVERS](./DRIVERS.md) — `MysqlDriver` port shape, transaction semantics across wrappers * [QUERIES](./QUERIES.md) — operator emit tables, `MATCH() AGAINST()` for `search`, the `mode: 'insensitive'` notes * [INDEXES](./INDEXES.md) — `method: 'fulltext'`, `method: 'spatial'`, expression indexes, `INVISIBLE`, partial-unique-via-CASE rewrite * [FTS](./FTS.md) — FULLTEXT engine deep dive, `ngram` / `mecab` parsers, BOOLEAN mode, hybrid BM25+vector * [GEO](./GEO.md) — `ST_*` reference, lat-first axis order, 3D point handling, SRID alignment, MultiPolygon emit * [VECTOR](./VECTOR.md) — `VECTOR(N)`, HeatWave HNSW, `DISTANCE(col, vec, metric)` vs pgvector `<=>` * [MIGRATIONS](./MIGRATIONS.md) — implicit-DDL-commit constraint, `forge_migrate` advisory lock, `pt-online-schema-change` * [TYPES](./TYPES.md) — `decimalNumbers`, `bigIntAsNumber`, `dateStrings` and their TypeScript shape effects * [TRANSACTIONS](./TRANSACTIONS.md) — `$transaction`, isolation overrides, deadlock retry, gap locks under `REPEATABLE READ` * [JSON-PATH](./JSON-PATH.md) — `JSON_UNQUOTE(JSON_EXTRACT(…))`, `JSON_TYPE` null disambiguation, generated-column indexes