# RTO and RPO — what RedDB actually promises

This page tells operators what to expect when a RedDB database fails
or has to be rolled back: how long recovery takes (**RTO**, recovery
time objective) and how much data the recovery may lose (**RPO**,
recovery point objective). The numbers reflect the engine's current
durability story (WAL + snapshot archiving + PITR replay) and assume
the deployment shape from
[`docs/security/vault.md`](../security/vault.md) and
[`docs/operations/secrets.md`](secrets.md).

If you came here looking for backup configuration, see the
[backup section in the README](../../README.md#backup--recovery)
first.

---

## TL;DR

| Failure mode | RTO target | RPO target |
|---|---|---|
| Process crash, disk intact | seconds | zero (committed-up-to-WAL-fsync) |
| Disk loss, recover from local snapshot + WAL | seconds–minutes | zero (committed-up-to-WAL-fsync) |
| Disk loss, recover from remote snapshot + WAL | minutes (depends on snapshot size) | seconds (last archived WAL segment) |
| Application-level rollback to a target time | minutes | bound by archive cadence |
| Replica promoted after primary loss | seconds | bounded by replication lag |

Numbers further down are concrete.

---

## What RedDB actually persists

RedDB's durability model has three layers, in order of how much they
contribute to RPO:

1. **WAL**. Every write goes to the write-ahead log before the
   in-memory store reports success. `wal::sync()` fires on every
   commit. RPO inside this layer = whatever the OS holds in
   page-cache between `fsync` calls. With default settings, **a
   process crash on a live disk loses zero committed transactions**.
   See `src/storage/wal/transaction.rs::commit`.

2. **Snapshots**. Periodic checkpoints of the in-memory store, written
   as a `.rdb-snapshot` blob. Combined with the WAL since the
   snapshot's base LSN, a snapshot is a fast-rewind point for PITR.
   See `src/storage/wal/checkpoint.rs`.

3. **Remote archive**. WAL segments and snapshots are uploaded to a
   remote backend (S3, R2, GCS, Turso, D1, or local fs). A disk loss
   on the primary recovers from the archive — RPO is bounded by the
   archive cadence. See `src/storage/wal/archiver.rs`.

When sizing RTO/RPO budgets, decide first which of these layers
survives the failure scenario you care about, then read off the row
in the table below.

---

## Numbers

The table below is calibrated for a single-node primary. Adjust by
size class; the engine's durability path is `O(WAL replay)` once the
snapshot is open, so RTO scales linearly with the unflushed WAL volume,
not with the total database size.

### Crash without disk loss

The local disk still has WAL + snapshot. Recovery is:

1. Open the `.rdb` file → header + meta-shadow recovery
2. Replay WAL records since the last checkpoint
3. Resume serving

Order-of-magnitude for typical workloads:

| Database size | Unflushed WAL window | Cold-start RTO |
|---|---|---|
| 100 MB | 10 MB | < 1 s |
| 1 GB   | 50 MB | 1–2 s |
| 10 GB  | 100 MB | 3–5 s |
| 100 GB | 200 MB | 10–20 s |
| 1 TB   | 200 MB | 30–60 s |

RPO under crash-without-disk-loss is **zero** for committed
transactions: WAL `fsync` is the durability boundary on commit, not a
cache flush.

### Disk loss, recover from remote archive

The local disk is gone (volume failure, container destroyed, etc.).
RedDB pulls the latest snapshot + WAL segments from the configured
backend. Recovery is:

1. Download latest snapshot manifest
2. Download snapshot blob
3. Download WAL segments since the snapshot's base LSN
4. Replay WAL records, validate hash chain
5. Open recovered `.rdb`

RTO is dominated by the snapshot download. WAL replay is fast (each
segment is sized so replay completes in under a second).

| Database size | Snapshot blob | RTO @ 100 Mbit/s | RTO @ 1 Gbit/s |
|---|---|---|---|
| 100 MB | ~80 MB | ~7 s | < 1 s |
| 1 GB | ~700 MB | ~60 s | ~6 s |
| 10 GB | ~7 GB | ~10 min | ~1 min |
| 100 GB | ~70 GB | ~100 min | ~10 min |
| 1 TB | ~700 GB | ~16 h | ~100 min |

For deployments that need sub-minute RTO at large sizes, run a
replica fleet — promotion is bounded by replication lag, not snapshot
download. See the [replication section of the README](../../README.md).

RPO is bounded by the archive cadence:

| Archive cadence | Worst-case RPO |
|---|---|
| `wal_archive_interval = 1s` (default) | 1 s |
| 10 s | 10 s |
| 60 s | 60 s |

### Application-level rollback to a target time

Use case: an operator command corrupted data at `t_bad`. Roll back
to `t_bad - epsilon`. Recovery is:

1. Pick the latest snapshot whose `snapshot_time <= t_bad - epsilon`
2. Download it
3. Replay WAL up to but not past the target time
4. Open recovered `.rdb`

RTO is the same as a remote-restore (same downloads + replay). RPO is
the time gap between the target and the last applied WAL record —
because RedDB stops replay at the first record past the target, there
is no data loss within the target window.

The drill that proves this works:
[`tests/drill_pitr_target_time.rs`](../../tests/drill_pitr_target_time.rs).

### Replica promoted after primary loss

This is the lowest-RTO option. The replica was already replaying the
primary's WAL stream; promotion is mostly a handshake with the lease
service.

| Replication lag (typical) | Promotion RTO | RPO |
|---|---|---|
| ms | seconds | replication lag |
| seconds | seconds | replication lag |
| minutes (slow replica) | seconds | replication lag |

The drill that proves promotion fails-closed when the replica's chain
is broken:
[`tests/chaos_promote_refused_when_lease_held.rs`](../../tests/chaos_promote_refused_when_lease_held.rs).

---

## Verifying the contract on your build

Three drills that run in CI pin the recovery contract:

- [`tests/drill_backup_restore_round_trip.rs`](../../tests/drill_backup_restore_round_trip.rs)
  — full archive → simulated primary loss → restore round trip.
- [`tests/drill_pitr_target_time.rs`](../../tests/drill_pitr_target_time.rs)
  — PITR target-time semantics (records after target are not applied).
- [`tests/drill_pitr_byte_identical.rs`](../../tests/drill_pitr_byte_identical.rs)
  — restored DB collection inventory matches the snapshot's.

If you are about to ship a release that touches the WAL, snapshot
serialization, or archive layout, rerun all three drills locally and
verify CI green before tagging.

---

## What RedDB does NOT promise

- **Disk corruption beyond CRC32.** RedDB checksums every page and
  WAL record with CRC32 and refuses to load a corrupted page. It does
  not silently repair. Recovery from disk corruption goes through the
  remote archive path; if the remote is also corrupt, you lose data.
- **Distributed two-phase commit.** Replication is single-primary
  with eventual fan-out to replicas. There is no consensus layer; a
  network split that elects two primaries is prevented by the lease
  service, not by the storage engine.
- **Sub-second RTO at multi-TB scale without a replica fleet.** A
  cold restore at TB scale is dominated by network throughput; a hot
  replica is the right tool.