# Eval capture — NDJSON schema reference

**Status:** stable since v0.21.0. Every row carries a `schema_version` field; additive changes increment the gbrain minor version, while renames and removals are breaking and bump `schema_version` to 2.

**Audience:** downstream consumers (primarily the sibling [gbrain-evals](https://github.com/garrytan/gbrain-evals) repo) that replay captured real-world queries as a BrainBench-Real fixture.

## The pipeline

```
MCP / CLI / subagent tool-bridge caller
        │
        ▼
src/core/operations.ts — query + search op handlers
        │
        │  (hybridSearch or searchKeyword)
        ▼
{results, meta: HybridSearchMeta}
        │
        ├──────────── captureEvalCandidate  (fire-and-forget)
        │                     │
        ▼                     ▼
return to caller       scrubPii(query) ←── src/core/eval-capture-scrub.ts
                              │
                              ▼
                   buildEvalCandidateInput
                              │
                              ▼
                   engine.logEvalCandidate
                              │
               ┌──────────────┴──────────────┐
               │ success                     │ fail
               ▼                             ▼
  INSERT into eval_candidates      engine.logEvalCaptureFailure
                                   (reason: db_down | rls_reject |
                                    check_violation | scrubber_exception |
                                    other)
```

## `gbrain eval export` — the consumer contract

```sh
gbrain eval export [--since DUR] [--limit N] [--tool query|search]
```

Emits NDJSON to **stdout**: one JSON object per `\n`-terminated line. Progress heartbeats go to **stderr**. Every line carries `"schema_version": 1`, so a forward-compat parser can fail loudly on schema v2 instead of silently misparsing.

Typical usage from gbrain-evals:

```sh
# Snapshot the last week of real traffic for replay
gbrain eval export --since 7d > brainbench-real.ndjson
```

```sh
# Stream through jq for ad-hoc analysis
gbrain eval export --tool query | jq -c 'select(.latency_ms > 500)'
```

## Row schema (v1)

Every exported row has this shape. Field order in JSON output is not guaranteed; consumers MUST key by name, not position.

| Field | Type | Notes |
|---|---|---|
| `schema_version` | number | Always `1` on v1 rows. Forward-compat gate. |
| `id` | number | Autoincrement primary key. Stable across exports. |
| `tool_name` | `"query"` \| `"search"` | Which MCP operation captured this row. |
| `query` | string | **Already PII-scrubbed** by `scrubPii` unless `eval.scrub_pii: false`. Emails / phones / SSNs / Luhn-verified credit cards / JWTs / bearer tokens are replaced with `[REDACTED]`. Max length 50KB (CHECK-enforced). |
| `retrieved_slugs` | string[] | Deduplicated slugs that came back in `SearchResult[]`. |
| `retrieved_chunk_ids` | number[] | Every chunk id in result order (duplicates preserved: one per hit). |
| `source_ids` | string[] | Distinct `sources.id` values across the result set (v0.18 multi-source). Empty for pre-v0.18 rows that lacked the column. |
| `expand_enabled` | boolean \| null | Whether the caller **requested** Haiku expansion. `null` for `search`, which has no expansion concept. |
| `detail` | `"low"` \| `"medium"` \| `"high"` \| null | Detail level the caller **requested**. `null` when omitted. |
| `detail_resolved` | `"low"` \| `"medium"` \| `"high"` \| null | What `hybridSearch` **actually used** after auto-detect. `null` when neither the caller nor the heuristic classified it. |
| `vector_enabled` | boolean | True iff vector search actually ran; `false` when `OPENAI_API_KEY` was missing or the embed call failed. **Replay MUST respect this**: rows with `false` only exercised the keyword path. |
| `expansion_applied` | boolean | True iff Haiku expansion actually produced variants (not merely "was requested"). |
| `latency_ms` | number | Wall-clock duration of the op handler. Includes capture itself, which is negligible since capture is fire-and-forget. |
| `remote` | boolean | `true` for MCP callers (untrusted), `false` for local CLI. Partitions "real agent traffic" from "operator probing." |
| `job_id` | number \| null | `OperationContext.jobId` when the caller was a subagent tool-bridge. `null` for MCP and CLI callers. |
| `subagent_id` | number \| null | `OperationContext.subagentId` for subagent-owned runs. |
| `created_at` | string (ISO 8601) | UTC timestamp of the insert. |
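For consumers, the contract above boils down to two rules: key by name, and gate on `schema_version`. The following is a minimal reader sketch in TypeScript under those rules. It assumes Node's stdin as the transport; `EvalRowV1` and `parseEvalRow` are illustrative names, not part of gbrain or gbrain-evals.

```ts
// consumer-sketch.ts — illustrative gbrain-evals-style reader, not shipped code.
import * as readline from "node:readline";

// Mirrors the v1 row schema tabled above.
interface EvalRowV1 {
  schema_version: 1;
  id: number;
  tool_name: "query" | "search";
  query: string;
  retrieved_slugs: string[];
  retrieved_chunk_ids: number[];
  source_ids: string[];
  expand_enabled: boolean | null;
  detail: "low" | "medium" | "high" | null;
  detail_resolved: "low" | "medium" | "high" | null;
  vector_enabled: boolean;
  expansion_applied: boolean;
  latency_ms: number;
  remote: boolean;
  job_id: number | null;
  subagent_id: number | null;
  created_at: string;
}

function parseEvalRow(line: string): EvalRowV1 {
  const row = JSON.parse(line); // key by name, never by position
  // Forward-compat gate: fail loudly on schema v2 instead of misparsing.
  if (row.schema_version !== 1) {
    throw new Error(`unsupported schema_version: ${row.schema_version}`);
  }
  return row as EvalRowV1;
}

async function main(): Promise<void> {
  const rl = readline.createInterface({ input: process.stdin });
  for await (const line of rl) {
    if (line.trim() === "") continue; // tolerate a trailing blank line
    const row = parseEvalRow(line);
    // Replay MUST respect vector_enabled: rows with `false` only
    // exercised the keyword path, so skip them for vector-path scoring.
    if (!row.vector_enabled) continue;
    console.log(row.id, row.tool_name, row.latency_ms);
  }
}

main();
```

Piping `gbrain eval export --since 7d` into a reader like this keeps the forward-compat gate at the edge: a v2 row aborts the run rather than silently polluting the fixture.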
## Ordering + determinism

`listEvalCandidates` orders by `created_at DESC, id DESC`. Same-millisecond inserts tie on `created_at`; `id DESC` is the stable tiebreaker. Replay tools can consume rows in order and assume:

- no duplicate rows across calls with non-overlapping `--since` windows
- no missed rows across calls that chain `--since` windows (the window end of run 1 is a strict upper bound, not a soft cursor)

## Schema versioning promise

- **v1 (shipped v0.21.0)** — this document. All fields listed above.
- **Additive changes** ship as new optional fields and increment the gbrain minor version. Consumers keyed on known fields ignore unknown keys and keep working.
- **Breaking changes** (rename, type change, removal) bump `schema_version` to 2. Consumers MUST branch on `schema_version` to stay compatible.

## `eval_capture_failures` — companion audit table

Not exported by `gbrain eval export`. Surfaced via `gbrain doctor`:

```sh
gbrain doctor   # warns when the failure count over the last 24h is > 0
```

Reason enum (stable): `db_down` | `rls_reject` | `check_violation` | `scrubber_exception` | `other`.

Cross-process visibility is the whole point: `gbrain doctor` runs in its own process and reads the table directly, so in-process counters wouldn't work.

## Config + CONTRIBUTOR_MODE

Capture is **off by default** as of v0.25.0 (it was on for everyone in earlier drafts). There are two paths to turn it on.

**Path A — env var (contributor opt-in, the common case):**

```bash
export GBRAIN_CONTRIBUTOR_MODE=1   # in ~/.zshrc or ~/.bashrc
```

**Path B — explicit config (`~/.gbrain/config.json`, file-plane only):**

```json
{
  "engine": "postgres",
  "database_url": "...",
  "eval": {
    "capture": true,
    "scrub_pii": true
  }
}
```

Resolution order (most explicit wins):

1. `eval.capture: true` in config → on
2. `eval.capture: false` in config → off (overrides `GBRAIN_CONTRIBUTOR_MODE=1`)
3. `GBRAIN_CONTRIBUTOR_MODE === '1'` → on
4. otherwise → off

The sketch at the end of this section expresses the same precedence as code.

`scrub_pii` defaults to `true` independently of capture. Set `eval.scrub_pii: false` to preserve raw query text, and only do that if you control the brain's distribution.

`gbrain config set eval.capture false` does **not** work: that command writes the DB-plane config, while the MCP server reads the file-plane config. Edit the JSON directly or use the env var.
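The four-rule precedence is small enough to pin down as code. A minimal sketch, assuming a parsed `eval` block shaped like the JSON above; `EvalConfig` and `resolveCaptureEnabled` are illustrative names, not the actual gbrain internals:

```ts
// Sketch of the capture-enable precedence; illustrative, not gbrain source.
// `EvalConfig` mirrors the `eval` block of ~/.gbrain/config.json.
interface EvalConfig {
  capture?: boolean;
  scrub_pii?: boolean;
}

function resolveCaptureEnabled(
  evalConfig: EvalConfig | undefined,
  env: Record<string, string | undefined>,
): boolean {
  // Rules 1 + 2: an explicit eval.capture in config wins outright,
  // so `capture: false` overrides GBRAIN_CONTRIBUTOR_MODE=1.
  if (evalConfig?.capture !== undefined) return evalConfig.capture;
  // Rule 3: contributor opt-in via the env var.
  if (env.GBRAIN_CONTRIBUTOR_MODE === "1") return true;
  // Rule 4: off by default (as of v0.25.0).
  return false;
}

// Config silent, contributor mode set → on.
console.log(resolveCaptureEnabled(undefined, { GBRAIN_CONTRIBUTOR_MODE: "1" })); // true
// Explicit `capture: false` beats the env var → off.
console.log(resolveCaptureEnabled({ capture: false }, { GBRAIN_CONTRIBUTOR_MODE: "1" })); // false
```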