# Graduation: the pg_dump exit is built into the design

Status: **LOCALLY VERIFIED** (2026-07-02). A standard `pg_dump` run against a
zeropg database over the real Postgres wire produced a plain-SQL dump that
restored byte-identically into a stock PostgreSQL 18.4 server. Measured
commands, timings, sizes, and a row-level checksum match are in the
[Measured evidence](#measured-evidence) section below.

zeropg is not a new database you get locked into. It **is** Postgres - PGlite is
the Postgres source tree compiled to WebAssembly, served over the byte-identical
Postgres wire protocol by pglite-socket. So the exit is the same exit every
Postgres has: `pg_dump` out, restore into whatever managed Postgres you like,
change one connection string. There is no proprietary format to escape, no
export tool to trust, no data model to translate. You graduate with the same
tool a DBA would use to move between two RDS instances.

This document is the runbook for that move: when to make it, the exact verified
commands, and the caveats worth knowing before you run them.

## When to graduate

You do not outgrow zeropg by storing too many rows. You outgrow it by being
**awake too much of the day**, or by needing a bigger machine than a
single-writer WASM engine can be. Three signals, in rough order of how often
they actually bind (see [BREAK-EVEN.md](../BREAK-EVEN.md) for the full cost
math):

1. **Sustained awake-hours.** zeropg's whole economic argument is scale-to-zero:
   you pay for compute only while serving traffic. Once real traffic keeps the
   instance awake more than roughly **4-5 hours/day** (GCP Cloud Run, ~500 MB
   DB) the bill approaches a small always-on managed Postgres, and past
   ~18-20 awake-hr/day it exceeds it. At that point a managed instance is both
   cheaper and simpler. (The exact crossover is platform-specific: on AWS
   Fargate and IBM Code Engine the cost crossover is so high it effectively
   never binds below ~1 GB, and you migrate for size or throughput instead. See
   the per-platform tables in BREAK-EVEN.md.)

2. **Data size and cold-start latency.** Restore from the bucket is
   bandwidth-bound and scales roughly linearly with database size: **~11s cold
   start at 500 MB** (measured, 20 forced cold starts on Cloud Run - see the
   README "See it live" table and [docs/COLDSTART-MODEL.md](COLDSTART-MODEL.md)),
   and about ~22s/GB beyond that. Somewhere around **0.5-4 GB** the cold-start
   latency and the container memory tier make a warm, always-on Postgres the
   better tradeoff regardless of cost.

3. **Sustained concurrent-write throughput.** zeropg is single-writer by design:
   one PGlite engine, writes serialized, durability shipped to the bucket. On
   GCS the manifest object name is rate-capped at ~1 write/second, which
   group-commit hides for bursty traffic but which becomes a hard ceiling under
   **sustained >1 commit/second**. If you need many concurrent writers hitting
   the same database around the clock, that is a real Postgres cluster's job.

If none of these bind, you do not need to graduate - the design is built to sit
at zero cost while idle indefinitely. When one does bind, the move below is
mechanical and low-risk.

## The runbook

The whole migration is: quiesce, dump, restore, repoint, archive. Steps 2 and 3
are plain `pg_dump` / `psql` - nothing zeropg-specific.

### 1. Quiesce and flush

Stop writers and make sure the last commit is durably in the bucket before you
dump. You have two equivalent paths:

- **Embedded / library use:** call `await db.close()`. That performs the
  sleep-mode flush (ships the final WAL delta) and releases the single-writer
  lease. (`packages/client` Client, `close()`; the server's shutdown path does
  the same via `flushWireWrites('shutdown')` then `db.close()` in
  `packages/server/src/server.ts`.)
- **The running demo / service:** the **put to sleep** control streams the flush
  and lease-release steps and exits the instance (README, "See it live"). After
  it returns, the bucket holds a crash-consistent snapshot plus manifest pointer.

You can dump from a live instance too - `pg_dump` runs in a single read
transaction and sees a consistent MVCC snapshot - but quiescing first means the
dump and the bucket archive agree exactly, and it avoids the multiplexed-session
caveat noted below.

### 2. `pg_dump` over the wire

Point `pg_dump` at the pglite-socket endpoint (the same `postgres://host:port`
URL your app uses). Plain-SQL format, no TLS, ownership and grants stripped:

```bash
pg_dump "postgresql://postgres@HOST:PORT/postgres?sslmode=disable" \
  --no-owner --no-acl \
  -f dump.sql
```

Flags and why:

- `sslmode=disable` - the pglite-socket wire is a localhost/loopback endpoint
  with no TLS layer; without this libpq tries to negotiate SSL and fails.
- `--no-owner --no-acl` - PGlite runs as a single role, so ownership and GRANT
  statements are noise that only produces "role does not exist" errors on a
  target with different roles. Strip them; re-grant on the target as needed.
- **Use a `pg_dump` whose major version is >= the server's.** PGlite reports its
  server version as the Postgres version it was built from (PostgreSQL 18.3 for
  PGlite 0.5.2). `pg_dump` refuses a server newer than itself
  (`aborting because of server version mismatch`), so a stock older libpq will
  not work - install client tools at or above the PGlite Postgres major (18+
  today). This is the single most likely snag.

The default (and recommended) format uses `COPY ... FROM stdin` for table data.
Reading that direction - `COPY TO stdout` out of zeropg - works fine and is the
fast path; use it when the restore target is a real Postgres. (There is one
exception, restoring back *into* another pglite-socket instance, covered in
[Caveats](#caveats).)

### 3. Restore into managed Postgres

Against RDS, Cloud SQL, Neon, a self-hosted cluster, anything:

```bash
psql "postgresql://USER:PASS@NEW-HOST:5432/DBNAME?sslmode=require" \
  -v ON_ERROR_STOP=1 \
  -f dump.sql
```

`ON_ERROR_STOP=1` makes the restore fail loudly on the first error instead of
plowing on. For a custom/directory-format dump use `pg_restore` instead of
`psql -f`; the plain-SQL dump above restores with `psql`. If your schema uses
extensions (pgcrypto, pg_trgm, and so on), the dump emits the matching
`CREATE EXTENSION` lines - the managed target must have those extensions
available (standard on RDS/Cloud SQL/Neon).

### 4. Repoint `DATABASE_URL`

This is the only application change. `@zeropg/client`'s `connect()` takes a URL
whose scheme selects the backend: `memory://`, `file://./dev.db`, or a real
`postgres://` / `postgresql://` host (`packages/client/src/index.ts` - "changes
from laptop to bucket to a graduated postgres:// host"). Graduating is literally
swapping the value of one environment variable:

```diff
- DATABASE_URL=postgres://127.0.0.1:5432/postgres   # zeropg over the wire
+ DATABASE_URL=postgresql://user:pass@your-managed-host:5432/dbname
```

`connect()` sees the `postgres://` host and routes through node-postgres to the
managed cluster. No code path change, no query rewrite - it was Postgres on both
sides. (This env-var switch is the migration story the SDK was designed around;
DESIGN.md.)

### 5. Keep or GC the bucket

The bucket is now a frozen, self-contained archive: an immutable snapshot plus
WAL segments plus a manifest pointer, crash-consistent as of your last flush.
Two choices:

- **Keep it** as a cold, cheap point-in-time archive of the pre-migration state.
  Object storage is far cheaper than a running database.
- **GC it** with `scripts/gc.ts` (deletes unreferenced objects) once you are
  confident the managed database is authoritative.

Nothing in the bucket is load-bearing after you repoint - it is just data at
rest.

## Measured evidence

Run locally on 2026-07-02. All numbers are measured, not estimated.

**Setup.** Two independent PGlite datadirs, each served over pglite-socket via
`@zeropg/client` `serveWire()` (the same wire path `packages/server` uses):
a **source** on `127.0.0.1:5439` and a second zeropg **target** on
`127.0.0.1:5440`. Separately, a stock **PostgreSQL 18.4 (Homebrew)** server on
`127.0.0.1:5441` stood in for "managed Postgres". Client tools: libpq /
`pg_dump` / `psql` **18.4**.

**Source schema** (a small but realistic app shape): 3 tables with foreign keys
(`authors` 50 rows, `posts` 300 rows, `comments` 600 rows - 950 rows total),
two secondary indexes, a `UNIQUE` constraint, 3 sequences, and a view
(`post_stats`) that joins and aggregates across all three tables.

**The dump** (`pg_dump --no-owner --no-acl`, plain SQL, default COPY format):

| metric | value |
|---|---|
| wall time | **0.18 s** |
| dump size | **125,812 bytes** (123 KB), 1,231 lines |
| contents | 3 `CREATE TABLE`, 3 `CREATE SEQUENCE`, 2 `CREATE INDEX`, 3 FK constraints, 1 `UNIQUE`, 1 `CREATE VIEW`, and all 950 rows as inline `COPY` data |

**Restore into real PostgreSQL 18.4** (the graduation direction):

```
psql "postgresql://postgres@127.0.0.1:5441/postgres" -v ON_ERROR_STOP=1 -f dump.sql
```

- Exit 0, **zero errors**, **0.05 s**.
- Row counts identical to source: `authors=50 posts=300 comments=600
  post_stats(view)=300 published=100`.
- Row-level checksum identical:
  `md5(string_agg(id||name||email order by id))` over `authors` returned
  **`2194e93ca5b115c1dc23963ab4496d01`** on both the zeropg source and the
  restored real Postgres.
- Foreign key enforced on the restored copy: an orphan insert raised
  `violates foreign key constraint "posts_author_id_fkey"`.
- Sequence restored: `authors_id_seq` `last_value = 50` on both sides.
- View restored and correct: `post_stats` joins and aggregates returned matching
  rows.

That is the proof that matters: a standard `pg_dump` off zeropg over the wire is
valid, complete, restorable SQL, and the restored database in a real Postgres is
byte-for-byte the same data with the same constraints, indexes, sequences, and
views.

**Round-trip back into a second zeropg** (as an extra check that the dump is
also a valid zeropg-to-zeropg transfer): restoring an `--inserts`-format dump
into the second pglite-socket instance on `127.0.0.1:5440` succeeded, exit 0,
0.14 s, with the same `authors` checksum. See the first caveat for why
`--inserts` is used specifically for this direction.

## Caveats

Everything that bit during verification, recorded faithfully.

1. **Client version gate (the likely snag).** PGlite advertises the Postgres
   version it was compiled from (18.3 for PGlite 0.5.2). `pg_dump` / `psql`
   refuse a server newer than themselves: a stock libpq 17 aborted with
   `server version mismatch: server 18.3, pg_dump 17.0`. Fix: install client
   tools >= the PGlite Postgres major (e.g. `brew upgrade libpq` to 18.4, or
   install `postgresql@18`).

2. **No TLS on the wire - use `sslmode=disable`.** pglite-socket is a plain
   loopback TCP endpoint; without `sslmode=disable` libpq attempts SSL
   negotiation and the connection fails.

3. **Do not restore the default COPY dump back *into* a pglite-socket wire.**
   This only affects a zeropg-to-zeropg restore, **not** graduation into real
   Postgres. `psql -f` of a COPY-format dump into a second pglite-socket instance
   failed on the first `COPY ... FROM stdin` with
   `unexpected EOF on client connection with an open transaction` /
   `terminating connection because protocol synchronization was lost`, and left
   that socket server's session wedged. pglite-socket's connection multiplexer
   does not carry the streaming `COPY FROM STDIN` sub-protocol. Workaround when
   the destination is another zeropg: dump with `--inserts` (or
   `--column-inserts`) so table data restores as ordinary `INSERT` statements -
   verified clean and identical. Restoring into a **real** Postgres has no such
   limitation, so for actual graduation keep the fast default COPY format.

4. **search_path GUC bleed on the multiplexed session.** `pg_dump` begins its
   session with `SELECT pg_catalog.set_config('search_path', '', false)`. Because
   pglite-socket multiplexes many connections onto one shared PGlite session, a
   session-level GUC set by the dump connection can persist and be observed by
   other connections - after a dump, an unqualified query elsewhere saw an empty
   `search_path` until it was reset (`SET search_path=public` restores it). The
   dump itself is unaffected because pg_dump emits fully schema-qualified names.
   Practical guidance: quiesce other writers during the dump (step 1), which the
   graduation flow already does. (A related intermittent PGlite quirk,
   `access to non-system view "..." is restricted`, appeared on a view query
   while the session was in this odd state; it did not affect the dump or
   restore, and the view itself restored and queried correctly.)

5. **Extensions must exist on both ends.** If the source schema uses contrib
   extensions, two things follow. On the zeropg **source**, the extension must
   have been preloaded into PGlite for the objects to exist at all (PGlite's
   `CREATE EXTENSION` needs the JS contrib module passed to `PGlite.create`; see
   the project memory on extension preloading). On the **target**, the dump's
   emitted `CREATE EXTENSION pgcrypto` / `pg_trgm` / etc. lines require those
   extensions to be available in the managed Postgres (they are on RDS, Cloud
   SQL, and Neon). The verification schema above used no extensions, so this path
   was not exercised here - it is called out because it is the most likely
   real-app difference from the clean test case.

## Graduating from a cold backup instead of the live instance

You do not have to touch the running instance to graduate. The secondary
cold-backup system ([docs/D-COLD-BACKUP.md](D-COLD-BACKUP.md), implemented and
disaster-tested) keeps periodic copies in a second, colder bucket. You can
restore one of those backups into a throwaway PGlite datadir, serve it over the
wire with `serveWire()`, `pg_dump` it exactly as above, and load it into managed
Postgres - all without quiescing or disturbing production. That makes graduation
a fully out-of-band operation: the live database keeps serving while you stage
and validate the migration from a backup, and you cut over by repointing
`DATABASE_URL` only once you are satisfied.