# Cost model: provider-aware write/snapshot policy

Design principle: **zero writes unless something changed, and no full snapshot until restore performance degrades past a budget.** Every upload must justify itself against a per-provider cost table. Eventually each `BlobStore` driver carries a `CostModel` so the policy engine (flush cadence, snapshot threshold, GC aggressiveness, storage class) is computed per cloud, not hardcoded.

Numbers below are from provider pricing pages as last reviewed (2026-06); they drift - each driver should pin them in code with a date and we re-verify per release. Marked per 1,000 operations.

## Provider cost tables

| | GCS Standard | S3 Standard | Cloudflare R2 | IBM COS Standard | Tigris |
|---|---|---|---|---|---|
| Storage /GB-month | ~$0.020 | $0.023 | $0.015 | ~$0.021 | $0.020 |
| Write ops (PUT/list, Class A) | $0.005 | $0.005 | $0.0045 | $0.005 | $0.005 |
| Read ops (GET, Class B) | $0.0004 | $0.0004 | $0.00036 | $0.0004 | $0.0005 |
| DELETE | free | free | free | free | free |
| Egress to own-cloud compute (same region) | free | free | free | free | free (all egress) |
| Egress to internet | ~$0.12/GB | ~$0.09/GB | **$0 (free)** | ~$0.09/GB | **$0 (free)** |
| Free tier (monthly) | 5GB + small ops (always-free) | none durable | **10GB + 1M writes + 10M reads** | **25GB always-free (Lite)** | **5GB + 10k writes + 100k reads** |
| Conditional write | `ifGenerationMatch`, no surcharge | `If-Match`/`If-None-Match`, no surcharge | etag precondition, no surcharge | `If-Match`/`If-None-Match`, no surcharge (live) | `If-Match`/`If-None-Match`, no surcharge (live) |

GCS, IBM COS, Tigris, and Cloudflare R2 are all live behind the **same** S3/conditional-PUT transport — the four See-it-live demos in the README (S3 proper is the same driver, untested only because it has no free tier). Two facts steer backend choice independent of the per-op noise: **IBM COS has the largest always-free storage (25 GB)**, and **R2 + Tigris are the only zero-egress options** — which is exactly what makes bucket-served read replicas / CDN hydration free rather than ~$0.09-0.12/GB. Tigris additionally prices the same in every geography (a EUR multi-region bucket costs what a US one does).

### Limits that shape the design (not just the bill)

- **GCS: ~1 write/second sustained per object name.** The manifest is one object name - strict-mode commit rate is capped at ~1/s before 429s. Above that, commits MUST batch (group N transactions per manifest CAS). This is a correctness-adjacent limit, not a cost knob.
- S3: 3,500 PUT/s per prefix - effectively unreachable for us; no per-object write cap documented, but the manifest CAS serializes us anyway.
- R2: no published hard per-object rate; Workers-side limits dominate.
- Minimum storage durations on cold tiers: S3 IA 30d, GCS Nearline 30d / Coldline 90d - moving short-lived generation garbage to a cold class costs MORE (early-delete fees). Cold classes are only for retained old generations kept ≥ the minimum.

## What things actually cost (worked, at target scale)

Per strict commit (v1 WAL shipping): 1 segment PUT + 1 manifest CAS = 2 write ops ≈ **$0.00001**. At 1,000 writes/day: ~60k ops/month ≈ **$0.30/month** on any provider. Ops are noise.

Storage: a 500MB DB with 2 retained generations ≈ 1GB ≈ **$0.02/month** on R2. This is the pitch number, and it's real.

The dominant *avoidable* costs, in order:
1. **Pointless snapshots** (v0 ships the full DB per commit; a 50MB DB at 1,000 writes/day uploads 50GB/day - bandwidth is free intra-cloud but it burns instance CPU-seconds, which on Cloud Run cost more than the bucket does).
2. **Stale generations not GC'd** (storage is the only line that grows unbounded).
3. **Internet egress on GCS/S3** if clients hydrate from the bucket directly (the CDN-seeded read-replica idea) - free on R2, expensive elsewhere; on GCS/S3 front it with a CDN or accept the per-GB fee.

## Snapshot policy: restore-budget driven, not write driven

Since ops are noise and intra-cloud bandwidth is free, full snapshots are NOT a cost problem - they're a *compute-time* problem (instance CPU for tar/gzip/upload) and a *restore-latency* problem when too rare. So:

- **Trigger compaction on a restore-time budget, not a byte count.** Track `estimatedRestoreMs = snapshotBytes/throughput + segmentCount × perGetOverhead + replayEstimate`. When it exceeds the budget (default: 2× the bare snapshot restore time, i.e. "segments may at most double cold start"), snapshot. The byte/count thresholds in V1-WAL-SHIPPING.md are the v1 approximation of this.
- **Never snapshot on wake** (see V1-WAL-SHIPPING.md) and never on a timer that fires with zero accumulated WAL. Idle databases must converge to literally zero ops: no heartbeat writes while not serving (lease renewal only while actively committing; an expired lease at rest is fine - the next writer takes over by CAS).
- **GC promptly**: deletes are free everywhere. Keep `retainGenerations: 1` beyond current as default.

## Driver interface sketch

```ts
interface CostModel {
  writeOpUsd: number          // per op
  readOpUsd: number
  storageGbMonthUsd: number
  internetEgressGbUsd: number // 0 on R2
  maxWritesPerObjectPerSec?: number  // 1 on GCS — forces commit batching
  freeTier?: { storageGb: number; writeOps: number; readOps: number }
}
interface BlobStore {
  // ...existing get/put/list/delete
  readonly cost: CostModel
}
```

Policy decisions computed from it:
- `minCommitBatchMs = 1000 / maxWritesPerObjectPerSec` (GCS: forces ≥1s batching under sustained writes; S3/R2: 0)
- snapshot cadence from the restore-time budget (above), with storage price scaling the retention knob
- whether the client-hydration/CDN story is on by default (R2: yes; GCS/S3: behind a "this costs egress" flag)
- free-tier awareness for the README cost calculator ("your app: $0.00/month on R2")

## Google Cloud Storage - the first driver we optimize

All figures Standard storage class, single region, as last reviewed 2026-06; pin and re-verify in the driver.

### Price sheet

| item | price | notes |
|---|---|---|
| Storage, Standard | ~$0.020/GB-month | region-dependent ($0.020 us-east1/europe-west1 ballpark) |
| Class A ops (PUT, LIST, **compose**, rewrite, patch) | $0.05 / 10k = $0.005/1k | every segment PUT, manifest CAS, list = Class A |
| Class B ops (GET, getMetadata) | $0.004 / 10k = $0.0004/1k | restore reads, manifest polls |
| DELETE | free | GC costs nothing in ops |
| Egress to same-region Google compute (Cloud Run/GCE/Functions) | free | the whole replication loop is op-cost only |
| Egress to internet | ~$0.12/GB (premium tier) | matters only for client/CDN hydration |
| Always-free tier | 5GB Standard + 5k Class A + 50k Class B /month | **US regions only** (us-east1/us-west1/us-central1) - europe-west1 gets nothing |

Colder classes (Nearline $0.010, Coldline $0.004, Archive $0.0012 /GB-month) carry 30/90/365-day minimums, per-GB retrieval fees, and ~2x Class A prices. Verdict for us: **never** for live generations or short-lived garbage; only for a long-term backup branch (e.g. "keep a monthly snapshot for a year" feature, later).

### Bucket configuration the driver should demand (or set itself)

1. **Disable soft delete.** GCS buckets default to a 7-day soft-delete retention, billed at the storage rate for deleted bytes. Our workload deletes superseded snapshots constantly - with v0 shipping a 50MB snapshot per commit, soft delete would bill ~7 days x every snapshot ever GC'd. This may be the single biggest hidden cost on GCS; turn it off at bucket creation (`softDeletePolicy.retentionDurationSeconds: 0`).
2. **No object versioning** (same reasoning - the manifest IS our versioning).
3. **No Autoclass** (per-object monitoring fee buys nothing; our objects are either hot or garbage).
4. **Single region, same region as the compute.** Dual-region doubles storage cost for an HA property the single-writer design can't use.
5. Region choice: if the user has no latency constraint, default recommendation `us-east1`/`us-central1` to capture the always-free tier; otherwise same-region-as-compute wins (egress-free and lowest latency dominate).

### Limits that bind

- **~1 sustained write/second per object name.** The manifest is one object name → strict-mode sustained commit rate caps at ~1/s before 429/`rateLimitExceeded`. Driver rule: `minCommitBatchMs ≈ 1000`; under burst load, group commits into one manifest CAS (group commit). Retries with backoff on 429 are mandatory anyway (GCS documents this as a soft, burstable limit).
- Per-bucket write ramp: ~1000 writes/s initial, auto-scales. Irrelevant at our scale.
- Conditional writes (`ifGenerationMatch`) carry no surcharge and no special quota - the CAS loop is free beyond the op itself.

### GCS-specific opportunities

- **Server-side `compose`**: GCS can concatenate up to 32 objects into one without downloading them (Class A op, supports `ifGenerationMatch`). This is segment compaction WITHOUT instance CPU or bandwidth: periodically compose N small WAL segment objects into one larger object and swap the manifest's segment list. Restore GET-count drops 32x for one op's price, and no Cloud Run instance needs to be awake to do it. (Compose concatenates raw bytes - works for our segments if compression is per-segment-framed, e.g. concatenated gzip members, which gunzip handles natively.) This softens the snapshot-cadence pressure: compose is the cheap middle tier between "many segments" and "full snapshot," GCS edition of Litestream's compaction levels.
- **Appendable objects / Rapid Storage (zonal)**: Google has been rolling out appendable objects on zonal buckets. If/when generally available in our regions, WAL shipping could append to one open segment object instead of minting many - fewer objects, fewer ops, simpler manifests. Different durability scope (zonal) - investigate before relying on it; flagged as a v2 driver variant, not v1.
- **Manifest polling for read replicas is Class B** ($0.0004/1k): clients polling the manifest every 5s cost ~$0.21/month each on ops - fine; internet egress for the segments they then fetch is the real cost (front with a CDN or Cloudflare in front of GCS, or steer the client-hydration story to R2).

### Worked example: target app on GCS (europe-west1)

100MB DB, 500 writes/day batched into ~200 commits, snapshot/compaction 4x/day, 2 generations retained:
- Ops: 200x2 + 4x2 ≈ 13k Class A/month ≈ $0.065
- Storage: ~0.3GB average ≈ $0.006
- Egress: $0 (Cloud Run same region)
- **Total ≈ $0.07/month** (or ≈ $0.00 in a US free-tier region). Soft delete left on could multiply this several-fold - hence rule 1.

## Other providers (briefer; each gets this treatment when its driver lands)

- **R2**: the cost-optimal home overall. Free egress makes bucket-served read replicas / CDN seeding free; free tier (10GB, 1M writes/mo ≈ 23 commits/min sustained) covers the entire target use case. Pair with the Durable Object tier from DESIGN.md 4.7.
- **S3**: no free tier, highest internet egress; fine intra-AWS (Lambda). No per-object write cap documented (manifest CAS serializes us anyway). S3 also supports multi-object concatenation only via multipart-copy (clunkier than GCS compose). S3 Express One Zone is a separate latency-oriented driver decision later - different pricing (per-GB request fees), single-AZ durability tradeoff.
- **IBM COS**: the **largest free tier here — 25 GB always-free (Lite plan)** — plus full `If-Match`/`If-None-Match` CAS, confirmed live (Track C). Storage ~$0.021/GB-mo, ops priced like S3 ($0.005/1k Class A, $0.0004/1k Class B), internet egress ~$0.09/GB but **free to same-region IBM compute**. Pairs with IBM Code Engine (Cloud Run's analog) for an all-IBM scale-to-zero stack — the demo that runs at the IBM URL in the README. Reuses the S3/SigV4 driver verbatim; the only knobs are endpoint + region (`eu-de`).
- **Tigris**: globally-distributed, S3-compatible, **zero egress fees in every geography** (like R2) at one worldwide price — $0.02/GB-mo storage, $0.005/1k Class A, $0.0005/1k Class B; 5 GB + 10k writes + 100k reads free. Same conditional-write CAS, same driver. The free-egress home when bucket-served read replicas / CDN hydration matter but committing to Cloudflare's compute ecosystem doesn't; the demo pairs it with Cloud Run (europe-west1).

## Experiment hooks

- E5 (soak + cost) must validate this table against the actual bill: predicted vs billed, line by line.
- Add E5b: GCS manifest CAS at >1/s sustained - confirm the 429 behavior and that batching holds the rate under the cap.

## Status (2026-06-12): measured + what is implemented

**E5b ran.** Sequential CAS against one object name: **2.43/s achieved, 52% of requests answered 429** (`results/e5b.jsonl`). The ~1/s documented cap is real and soft - burstable, but anything sustained above it grinds through rejections. The first v1-WAL-shipping E2c run then hit this organically: incremental commits are ~150ms, so 60 sequential strict writes exceeded the manifest cap and failed with the exact `gcs429` error. v0 could never reach this limit (9s snapshot commits); fixing it became mandatory the moment writes got fast.

Implemented now (the "tweak settings + driver behavior" tier):
- `CostModel` on the `BlobStore` interface; `GcsBlobStore.cost` pins the GCS table with a review date. `maxWritesPerObjectPerSec: 1`.
- **Group-commit pacing in ZeroPG**, derived from the cost model: commits are spaced >=1s apart on GCS; writes arriving inside the window coalesce into the next commit's WAL range (E2c probe 5: 10 concurrent writes -> few CASes). Idle databases pay zero latency - pacing only bites under sustained write load.
- **429/5xx retry with jittered backoff** in the GCS driver. Only clean rejections retry; ambiguous network failures do NOT (a retried-but-already-landed conditional PUT would read as a false FencedError).
- Bucket hygiene verified on our bucket: soft delete OFF, no versioning, no Autoclass, single region. (Soft delete is the headline trap: 7-day retention billed on every deleted byte, and we delete superseded snapshots constantly.)
- Idle = zero ops already holds: no heartbeat at rest, interval-mode flush no-ops when clean, sleep mode uploads only on SIGTERM/idle-backstop.

The v2 worth building (GCS-structural, in order):
1. **`compose`-based segment compaction.** Our WAL segments are stored raw, so GCS server-side compose (32:1, one Class A op, `ifGenerationMatch` support) can fold many small segments into one object with **zero instance CPU/bandwidth, and no instance even awake**. Restore GET-count drops 32x; full-snapshot compaction then triggers on replay-time budget only. This is the cheap middle tier between "many segments" and "full snapshot" - Litestream's compaction levels, GCS edition.
2. **Restore-budget-driven compaction** replacing the fixed 16MB/64-segment thresholds: `estimatedRestoreMs = snapshotBytes/throughput + segments*perGetMs + walBytes/replayRate`, snapshot when it exceeds ~1.5x the bare-snapshot restore. All three constants are now measured (E0, E2c, E3).
3. **Appendable objects (zonal Rapid Storage)**: would collapse segment-per-commit into appends to one open object. Zonal durability caveat; watch, don't build.

Not worth building: manifest sharding to dodge the 1/s cap (group commit is sufficient - single-writer caps useful commit rate anyway), cold storage classes for live generations (minimum-duration + retrieval fees exceed savings at our sizes), op-count micro-optimization (ops are cents/month at target scale; the binding constraints are the per-object write cap, restore latency, and instance CPU-seconds).