# docs/13: Hub multi-user production hardening (Track E)

The convergence engine (`reduce`) is already production-grade: any node holding the same op SET reduces to the same `treeHash`, independent of arrival order (reducer.ts canonical sort + Kahn topo-sort), with content-addressing as the integrity backbone. Track E does **not** touch that engine. It hardens the *replication + trust boundary* around it: the layer that (a) delivers the correct op set to every replica, (b) stops unauthorized mutation of the inputs that feed `reduce`, and (c) prevents data destruction.

The leverage: AVCS already has cryptographic actor identity (ed25519-signed operations and memberships, `core/identity.ts`). So authorization can be enforced **cryptographically at the application layer**, with no transport-level identity provider required for the in-repo stages. Transport security (TLS/OIDC/replicated storage) is deployment infrastructure and is documented-only (see "Infra" below).

## Termination condition

Hub multi-user production is "possible" when, for independently-developed work pushed by multiple users and pulled by others:

1. **Convergent**: everyone who has the same ops ends at the same final code, and sync always *can* deliver the same op set (completeness + scale).
2. **Authorized**: no unauthenticated party can change a replica's materialized result (no decision/membership/redaction injection) or read what they shouldn't.
3. **Non-destructive**: no party can irrecoverably destroy or corrupt history; side-effects (redaction) are isolated and admin-gated.
4. **No silent divergence**: an object the hub accepts verifies identically on every replica; a partial/causally-incomplete push never materializes wrong code.

## Stages (dependency-ordered)

The security chain **E1 → E2 → E3** is the critical path (highest severity). E4/E5/E6 are independent and can land in parallel. E7 is operability.

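To make the first link of that chain concrete, the sketch below illustrates the E1 idea listed in the stage table: verify the signature over the recomputed content oid, not the client-claimed one, and store under that same oid. It is a minimal illustration under stated assumptions, not the hub's code. The `PushedOp` shape, the SHA-256 stand-in for `computeOid`, and the `acceptOp` helper are hypothetical; only the ed25519 signing and the `computeOid` name come from this document.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical wire shape; the real AVCS operation type is not shown in this document.
interface PushedOp {
  oid: string;       // client-claimed oid (untrusted input)
  body: string;      // canonical serialized content
  signature: Buffer; // ed25519 signature produced by the actor
}

// Stand-in for the real `computeOid`: plain SHA-256 of the body (an assumption).
function computeOid(body: string): string {
  return createHash("sha256").update(body).digest("hex");
}

// Accept only when the signature covers the oid the store will actually use.
// Returns the content oid to store under, or null when the push is rejected.
function acceptOp(op: PushedOp, actorKey: KeyObject): string | null {
  const contentOid = computeOid(op.body);
  const ok = verify(null, Buffer.from(contentOid), actorKey, op.signature);
  return ok ? contentOid : null; // never keyed by op.oid
}

const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const body = JSON.stringify({ kind: "edit", path: "a.ts" });

// Honest client: claimed oid equals the content oid, and the signature covers it.
const honest: PushedOp = {
  oid: computeOid(body),
  body,
  signature: sign(null, Buffer.from(computeOid(body)), privateKey),
};

// Mismatched client: the signature is valid, but only over a claimed oid
// that does not match the content.
const forged: PushedOp = {
  oid: "claimed-oid",
  body,
  signature: sign(null, Buffer.from("claimed-oid"), privateKey),
};

const honestResult = acceptOp(honest, publicKey);
const forgedResult = acceptOp(forged, publicKey);
console.log({ honestAccepted: honestResult !== null, forgedAccepted: forgedResult !== null });
```

Because the check and the storage key both derive from the same recomputed value, a replica that repeats the check on pull reaches the same verdict as the hub, which is the "hub-accept ⟹ replica-accept" property E1 asks for.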
| Stage | Blocker (from the audit) | Fix | Severity | Depends |
|---|---|---|---|---|
| **E1** | gated hub verifies a signature over the client-*claimed* `op.oid`, but `put()` stores under the *recomputed* content oid → an op the hub accepts can be rejected by pulling replicas → divergence (hubServer.ts:44 vs 212) | verify the signature over the **recomputed content oid** (`computeOid`), so hub-accept ⟹ replica-accept | High | none |
| **E2** | only `type==="operation"` is gated; `decision`/`membership`/`redaction` are waved through unauthenticated, yet a pushed `decision` changes `verdictMap` on every replica (hubServer.ts:204) | authenticate & authorize **all governance object types** by signature + membership role; reject unsigned/under-privileged governance pushes | High | E1 |
| **E3** | open hub trusts ALL redactions and runs `applyRedactions` inline, unlocked → any client can irrecoverably evict any blob (DoS), and concurrent redactions race (applyRedactions.ts:37-39, hubServer.ts:215) | require an admin-signed redaction even on an ungated hub; serialize the side-effect under `store.withLock` | High | E2 |
| **E4** | a push is N independent POSTs (non-atomic); an op whose `causalDeps` haven't arrived materializes without its ancestor → transient wrong tree (hubClient.ts:37-53) | accept ops but **hold causally-incomplete ones** (quarantine) until their deps arrive; never project an op missing a dep | Med | none |
| **E5** | `GET /have` serializes every oid every sync → O(total history) per sync, no cursor (hubServer.ts:153-158) | **since-cursor incremental sync**: a general append-only `objlog` + `GET /sync?since=N` + a persisted per-hub pull cursor (full `/have` fallback). Incremental **pull** done; incremental **push** (idempotent re-POST of the local objlog delta) is a documented follow-up | Med (scale) | none |
| **E6** | protected-head CAS runs only on the central repo; `setRef` is a plain write with no compare-and-swap (objectStore.ts setRef) | server-side **CAS finalize endpoint** + ref lock so authority never overwrites fresher history | Med | none |
| **E7** | no provenance/audit of who pushed what; no app-layer rate-limit/quota | append-only **hub audit log** of accepted mutations + per-actor **push quota** (429). Hub `fsck` needs no new endpoint: the hub IS an ObjectStore, so D3's `avcs fsck` runs directly against the hub's repo dir | Low–Med | E2 |

## Infra-dependent: documented-only (out of sandbox, Track C kin)

These require a deployment environment, not application code:

- **TLS / mTLS** termination (transport encryption + mutual auth): reverse proxy.
- **OIDC / token IdP** binding transport identity to the app-layer authz of E2.
- **Durable / replicated storage** backend (today: single-process local files): object storage / a replicated log.
- **Edge rate-limiting / WAF**, **HSM / threshold keys**, **OTel collector**.

## Track F: robustness hardening (decode-path fuzzing)

Beyond the hub trust boundary, a production VCS must survive *corrupt* input on the read path, not just reject *unauthorized* input. D1 (atomic writes) and D3 (`avcs fsck`) keep bytes honest and detect rot, but the decoder itself had to degrade safely.

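A decoder that degrades safely can be pictured roughly as follows. This sketch borrows the `decodeObject` and `CorruptObjectError` names from the F1 stage, but the signatures, the JSON stand-in for the real CBOR/JSON codec, the `mulberry32` PRNG, and the `fuzzDecode` harness are all assumptions made for illustration, not the project's actual code.

```typescript
// Hypothetical typed error; the real class is not shown in this document.
class CorruptObjectError extends Error {
  readonly oid: string;
  constructor(oid: string, cause: unknown) {
    const detail = cause instanceof Error ? cause.message : String(cause);
    super(`corrupt object ${oid}: ${detail}`);
    Object.setPrototypeOf(this, CorruptObjectError.prototype);
    this.name = "CorruptObjectError";
    this.oid = oid;
  }
}

// Single decode chokepoint. JSON stands in for the real codec (an assumption).
function decodeObject(oid: string, bytes: Uint8Array): unknown {
  try {
    if (bytes.length === 0) throw new Error("empty object");
    const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return JSON.parse(text);
  } catch (err) {
    // Every failure is normalized to one typed error that names the oid.
    throw new CorruptObjectError(oid, err);
  }
}

// Small deterministic PRNG (mulberry32) so a failing case can be replayed from its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Feeds empty, truncated, bit-flipped and arbitrary bytes through the decoder and
// checks the dichotomy: a decoded value or a CorruptObjectError, nothing else.
function fuzzDecode(seed: number, iterations: number): { values: number; corrupt: number } {
  const rand = mulberry32(seed);
  const valid = new TextEncoder().encode(JSON.stringify({ type: "operation", n: 1 }));
  let values = 0;
  let corrupt = 0;
  for (let i = 0; i < iterations; i++) {
    const mode = Math.floor(rand() * 4);
    let bytes: Uint8Array;
    if (mode === 0) {
      bytes = new Uint8Array(0); // empty file
    } else if (mode === 1) {
      bytes = valid.slice(0, Math.floor(rand() * valid.length)); // truncation
    } else if (mode === 2) {
      bytes = valid.slice(); // copy, then flip one bit
      const idx = Math.floor(rand() * bytes.length);
      bytes[idx] ^= 1 << Math.floor(rand() * 8);
    } else {
      bytes = new Uint8Array(Math.floor(rand() * 64)).map(() => Math.floor(rand() * 256));
    }
    try {
      decodeObject(`fuzz-${i}`, bytes);
      values++;
    } catch (err) {
      if (!(err instanceof CorruptObjectError)) throw err; // opaque throw = harness failure
      corrupt++;
    }
  }
  return { values, corrupt };
}

const fuzzResult = fuzzDecode(13, 400);
console.log(fuzzResult);
```

Keeping the `try`/`catch` in one chokepoint means callers such as `get` or `pull` only ever need to handle a single error type, and the seed makes any counterexample reproducible.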
| Stage | Blocker | Fix | Severity |
|---|---|---|---|
| **F1** | a single torn/bit-rotted object (truncated CBOR, broken JSON, empty file) made `get`/`list`/`materialize`/`pull` throw an opaque `SyntaxError`/`CBOR: …` with no indication of *which* object: un-actionable, and a partial path could surface as a crash deep in reduce | normalize every decode failure at the single `decodeObject` chokepoint to a typed **`CorruptObjectError`** that names the offending `oid`; a seeded **fuzz harness** (arbitrary/truncated/bit-flipped/empty bytes, 400 iterations) asserts the decode dichotomy `{value} ∪ {CorruptObjectError}`: never an opaque throw, non-Error throw, or hang | Med |

This closes the docs/10 verification gate "fuzzing: object parser". Remaining docs/10 fuzz targets (sync-negotiation, reduce) are already covered by the determinism property harness (reduce, split-independence) and the hub's malformed-input 4xx handling.

## Invariants Track E must not break

- **Determinism**: no stage changes `reduce`'s output for a given op set; all E work is replication/trust-boundary, never reduction logic.
- **Reuse cryptographic identity**: enforce authz with the existing signature / membership machinery rather than inventing a new auth system.
- **Default-safe**: unsigned, under-privileged, or causally-incomplete input is rejected or quarantined, never silently applied.
- **Backward-compatible sync**: incremental sync falls back to the full `have` set; signature enforcement rolls out warn-then-reject.
- **Each stage = a merged PR** with the multi-node convergence harness re-run.
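As an illustration of the Default-safe invariant applied to E4, the sketch below holds ops whose `causalDeps` have not all arrived and releases them once the missing ancestors land. Only the `oid` and `causalDeps` field names come from this document; the `Op` interface, the `QuarantineBuffer` class, and its method names are hypothetical, and a real implementation would persist the held set and verify signatures before admitting anything.

```typescript
// Hypothetical minimal op shape; only `oid` and `causalDeps` appear in the source.
interface Op {
  oid: string;
  causalDeps: string[];
}

class QuarantineBuffer {
  private readonly admitted = new Map<string, Op>(); // causally complete, safe to project
  private readonly held = new Map<string, Op>();     // waiting for at least one dep

  // Accept an op. Returns the oids newly admitted as a result, in admission order.
  receive(op: Op): string[] {
    if (this.admitted.has(op.oid) || this.held.has(op.oid)) return []; // idempotent re-POST
    this.held.set(op.oid, op);
    return this.release();
  }

  // Repeatedly admit held ops whose deps are all admitted, until nothing changes.
  private release(): string[] {
    const released: string[] = [];
    let progressed = true;
    while (progressed) {
      progressed = false;
      for (const op of Array.from(this.held.values())) {
        if (op.causalDeps.every((dep) => this.admitted.has(dep))) {
          this.held.delete(op.oid);
          this.admitted.set(op.oid, op);
          released.push(op.oid);
          progressed = true;
        }
      }
    }
    return released;
  }

  // Only these oids may be handed to projection / `reduce`.
  projectable(): string[] {
    return Array.from(this.admitted.keys());
  }

  quarantined(): string[] {
    return Array.from(this.held.keys());
  }
}

// A child arrives before its ancestor, as can happen when a push is N independent POSTs.
const buffer = new QuarantineBuffer();
const afterChild = buffer.receive({ oid: "b", causalDeps: ["a"] });
const heldAfterChild = buffer.quarantined();
const afterParent = buffer.receive({ oid: "a", causalDeps: [] });
console.log({ afterChild, heldAfterChild, afterParent, projectable: buffer.projectable() });
```

The op set handed to `reduce` is always closed under `causalDeps`, so the reduction logic itself stays untouched, in line with the Determinism invariant.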