# GoodMemory Current Status and Evidence

This is the compact current-truth entrypoint. Historical narrative has been removed from this file; use `docs/archive/quality-gates/README.md`, generated reports, and git history for phase-by-phase provenance. Product scope remains in `docs/GoodMemory-PRD.md`, and execution order remains in `task-board/00-README.txt`.

## Stable OSS Surface

- Current stable package line: v0.3.x.
- Public API remains centered on `createGoodMemory`, `remember`, `recall`, `buildContext`, `feedback`, `forget`, `exportMemory`, and `deleteAllMemory`.
- Package subpaths `goodmemory`, `goodmemory/ai-sdk`, `goodmemory/host`, `goodmemory/http`, and `goodmemory/runtime-kit` resolve through compiled `dist/` artifacts and emitted type declarations.
- Storage resolution is automatic: explicit config wins; a configured Postgres store is used when its bootstrap succeeds; Bun gets local SQLite; and on Node, where zero-config local SQLite is unsupported, storage falls back to in-memory, which is observable through runtime inspection.
- The official CLI uses the package bin. The global CLI invocation path is `goodmemory ...` after `npm install -g goodmemory`; project-local installs use `npx goodmemory`, `npm exec -- goodmemory`, or `./node_modules/.bin/goodmemory ...`. Non-version command execution remains Bun-backed today.
- Generic live-memory eval semantics are auto-storage aligned: `eval:live-memory`, `eval:live-auto-memory`, `runLiveMemoryEval()`, `eval:live-provider-memory`, and historical `reports/eval/live-memory/phase-*` paths keep their existing meanings.

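The storage precedence above can be sketched as a small pure function. This is a minimal sketch under stated assumptions: `resolveStorage`, `StorageInputs`, and `StorageDecision` are hypothetical names, not GoodMemory's configuration types, and it assumes that a configured Postgres store whose bootstrap fails falls through to the runtime default.

```typescript
// Hypothetical inputs: these names illustrate the documented precedence
// and are not GoodMemory's actual configuration types.
type StorageKind = "explicit" | "postgres" | "sqlite" | "memory";

interface StorageInputs {
  explicitStorage?: string; // an adapter the caller configured directly
  postgresUrl?: string; // a configured Postgres connection string
  postgresBootstrapOk: boolean; // whether bootstrapping that Postgres store succeeded
  runtime: "bun" | "node";
}

interface StorageDecision {
  kind: StorageKind;
  reason: string; // kept so runtime inspection can explain the choice
}

function resolveStorage(input: StorageInputs): StorageDecision {
  // 1. Explicit config always wins.
  if (input.explicitStorage !== undefined) {
    return { kind: "explicit", reason: `explicit config: ${input.explicitStorage}` };
  }
  // 2. Configured Postgres is used only when its bootstrap succeeded (assumed fall-through otherwise).
  if (input.postgresUrl !== undefined && input.postgresBootstrapOk) {
    return { kind: "postgres", reason: "configured Postgres bootstrapped" };
  }
  // 3. Bun gets zero-config local SQLite.
  if (input.runtime === "bun") {
    return { kind: "sqlite", reason: "Bun zero-config local SQLite" };
  }
  // 4. Node without supported zero-config SQLite falls back to in-memory.
  return { kind: "memory", reason: "Node zero-config SQLite unsupported; in-memory fallback" };
}
```

Carrying a `reason` on the decision is one way to keep the in-memory fallback observable rather than silent.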
## Installed Host And Runtime Surface

- Phase 35 is now closed as the installed host-memory middleware and hooks slice and is part of the accepted stable host surface. Accepted commands and hooks include `goodmemory setup`, `goodmemory status`, `SessionStart` / `UserPromptSubmit` hooks, and managed Codex/Claude install/enable/disable flows.
- Phase 37 is now closed as the installed host selective writeback slice: `goodmemory codex writeback`; the runtime default stays off unless explicitly changed, and new interactive installs recommend `observe`.
- Phase 37.1 is now closed as installed-host writeback productization polish: `goodmemory codex writeback inspect`, audit/undo support, and observe/selective boundaries.
- Phase 38 is now closed as the governed runtime surface slice: `GoodMemoryConfig.observability.traceSink`, targeted `reviseMemory()`, `memory.runtime.*`, `memory.jobs.enqueueRemember()`, `GoodMemoryConfig.providers.embedding`, and `examples/express-chat-server.ts`.
- Phase 39 is now closed as the Python HTTP integration bridge slice. See `docs/GoodMemory-Python-HTTP-Integration-Bridge.md` and `examples/python-fastapi-memory-consumer.py`.
- Phase 41 is now closed as installed-host pre-action unification: `goodmemory codex hook pre-tool-use`, `goodmemory codex action`, and the installed action bridge share the installed memory backend.
- Phase 42 is now closed as the Progressive Recall Protocol slice.
- Phase 43 is now closed as the Runtime Kit slice: `goodmemory/runtime-kit`.
- Phase 43.5 is now closed as the Optional Runtime Worker slice: `goodmemory runtime worker drain-once`.
- Phase 50 is now closed as the Installer CLI Runtime-Shell Hardening slice: `goodmemory doctor [codex|claude|both]` and `goodmemory repair [codex|claude|both]`.
- Phase 51 is now closed as the Typed Behavioral Memory And Enactment slice; typed behavior is stored on compiled `validated_pattern` feedback.
- Phase 52 is now closed as the Structured Text-Response Enactment And Guarded Policy slice; `guarded_policy` remains internal.

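The writeback defaults in Phases 37 and 37.1 reduce to a small policy. This is a minimal sketch under stated assumptions: the mode names `off`, `observe`, and `selective` are taken from the wording above, the rule that only `selective` persists memories is an assumed reading of the observe/selective boundary, and all three functions are hypothetical rather than the `goodmemory codex writeback` implementation.

```typescript
// Hypothetical: mode names follow this status doc; real config keys may differ.
type WritebackMode = "off" | "observe" | "selective";

// Runtime default stays "off" unless a mode was explicitly configured.
function effectiveWritebackMode(explicit?: WritebackMode): WritebackMode {
  return explicit ?? "off";
}

// New interactive installs *recommend* "observe"; nothing is enabled implicitly.
function installRecommendation(interactive: boolean): WritebackMode | undefined {
  return interactive ? "observe" : undefined;
}

// Assumed boundary: "observe" only records candidates for inspection,
// while "selective" may persist accepted memories (auditable and undoable).
function mayPersist(mode: WritebackMode): boolean {
  return mode === "selective";
}
```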
## Public Boundary Notes

- The root `goodmemory` entrypoint no longer re-exports internal evolution contracts.
- The automatic adapter/event `user_correction` path is proposal-first: it records selective evidence plus proposal/promotion receipts instead of writing an intermediate active feedback memory. Public `feedback()` remains the explicit durable procedural feedback entrypoint.
- Provider-backed retrieval is explicit; rules-only remains the default accepted mode, and provider failures surface as `provider_error`.
- Dashboard, cloud sync, and team workspace remain a Phase 48 no-go decision.
- Full ImplicitMemBench and BEAM reports are internal research evidence until explicitly promoted. Two benchmarks are promoted public claims backed by `benchmark-claims/*.json` and `gate:public-benchmark-claim`: MemoryAgentBench (CR 0.959 / TTL 0.767, judge-free) and LongMemEval (judge-free deterministic-subset 0.720 vs no-memory 0.068).

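The public-claim rule implied above (judge-free scoring, zero execution failures, and a real gain over a no-memory baseline) can be sketched as a small check. This is a hypothetical illustration: the `BenchmarkClaim` shape and `checkPublicClaim` are not the real `benchmark-claims/*.json` schema, and `gate:public-benchmark-claim` may enforce different or additional conditions.

```typescript
// Hypothetical claim shape; the real benchmark-claims/*.json schema and the
// exact checks in gate:public-benchmark-claim may differ.
interface BenchmarkClaim {
  benchmark: string;
  scoring: "deterministic" | "llm_judge";
  goodmemoryAccuracy: number;
  noMemoryAccuracy: number;
  executionFailures: number;
}

interface GateResult {
  claimable: boolean;
  reasons: string[]; // why a claim was rejected (empty when claimable)
}

function checkPublicClaim(claim: BenchmarkClaim): GateResult {
  const reasons: string[] = [];
  if (claim.scoring !== "deterministic") {
    reasons.push("scored with an LLM judge");
  }
  if (claim.executionFailures !== 0) {
    reasons.push(`${claim.executionFailures} execution failures`);
  }
  if (claim.goodmemoryAccuracy <= claim.noMemoryAccuracy) {
    reasons.push("no gain over the no-memory baseline");
  }
  return { claimable: reasons.length === 0, reasons };
}
```

Fed the reported numbers, the LongMemEval deterministic subset (0.720 vs 0.068) passes, while MemoryAgentBench Accurate Retrieval (0.890 vs 0.926 no-memory) fails the baseline check, which matches why AR is excluded from the claim.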
## Active Research Slice

- Phase 62 LongMemEval is accepted as the first Sequential Benchmark Hardening slice.
- Phase 63 BEAM has an accepted rules-only measured full-run checkpoint, not performance closure: answer-pack hardening (`--evidence-pack`, `src/answer/evidencePack.ts`) raised answer accuracy from the no-pack 0.56 baseline (224/400) and the prior evidence-pack 0.6525 checkpoint (261/400) to 0.695 (278/400) at identical recall (0.9621), `executionFailures: 0`, gate accepted; 122/400 wrong answers remain, so answer-gap hardening continues before any public claim.
- Accepted LongMemEval internal close (historical and with-judge; superseded for public-claim purposes by the judge-free deterministic-subset claim below): `run-phase62-longmemeval-full500-current-after-remaining-personal-hybrid-retry-r1-merged-20260517T161058Z` with 454/500 answer accuracy (0.908), evidence-session recall 0.9590, missed recall 35, wrong recall 6, wrong answers 46, and `executionFailures: 0`.
- LongMemEval public claim (P67-B, 2026-07-02): judge-free deterministic-subset answer accuracy **0.720** (360/500) with `goodmemory-rules-only` vs no-memory baseline 0.068 (+65.2pt), `executionFailures: 0` over 500x2 cases, v0.3.5 @ `6f9152c` (`run-phase67b-longmemeval-rules-deterministic-current`). A case counts only when scored by a deterministic method (abstention/exact/contains/expected_alternative/numeric_count); the same-model semantic judge is excluded by construction (with-judge diagnostic 0.896 not claimed; evidence-session recall 0.9543; abstention only 28 of the 360 correct; the baseline's correct set is 30/34 bare abstentions). Dataset MIT (`xiaowu0162/longmemeval-cleaned`, the exact split run). Promoted to the README row 2026-07-02 with user sign-off, replacing the superseded 0.908 with-judge number; the declaration `benchmark-claims/longmemeval.json` passes `gate:public-benchmark-claim` (publicly claimable: 2, with MemoryAgentBench).
- Accepted BEAM smoke: `run-phase63-beam-smoke-current` and gate `run-20260518003000`.
- Latest accepted BEAM retained diagnostic: `run-phase63-beam-100k-recall-diagnostic-rules-project-card-total-count-current-20260615T200000Z`, evidence-chat recall 0.9620612564274538, missed 20/355, wrong-recall/noise 167/400, zero-recall 0, and hit/missing/noise ids 1022/72/810 -> 1023/71/807. The delta is case `3:multi_session_reasoning:1`, whose recall rose from 0.5 to 1. It is the fourth multi_session_reasoning recovery via the multi-facet contradiction route and exhausts the confirmed-reachable msr cases. The case has a ground-truth-misaligned [16,116] pair; the narrowed gate avoids the sibling knowledge_update case `3:ku:2` and returns exactly [16,116], recovering the contact-form turn 16 and shedding noise turns 60/88/36. It is the cleanest single delta: a recall gain plus a noise reduction, with zero ripples. The multi-facet contradiction route now serves contradiction, knowledge_update, and multi_session_reasoning (both "how much" comparisons and "how many" aggregates). Remaining partial-recall families: instruction_following (6, handled via the instructionRules companionPattern recipe, NOT multiFacetGroups) and multi_session_reasoning (the remaining cases have a genuine upstream candidate-pool gap and need candidate-generation work).
- Accepted BEAM measured full-run checkpoint (rules-only, evidence pack): `run-phase63-beam-100k-live-closure-gpt55-evidence-pack-answer-hardening-current` — 278/400 answer accuracy (0.695), evidence-chat recall 0.9620612564274538, missed-recall 20/355, wrong-answer 122/400, `executionFailures: 0`; prerequisite zero-failure diagnostic `run-phase63-beam-100k-recall-diagnostic-rules-postmerge-current`, gate `run-phase63-beam-closure-gate-gpt55-evidence-pack-answer-hardening-current` accepted. Models: `gpt-5.5` answer + `gpt-5.5` judge (`ai.gurkiai.com`). The answer context is built by the general `src/answer/evidencePack.ts` (operation inferred from the question, source-ordered + timestamped, current-value/timeline/count/instruction/synthesis framing): +54 cases over the prior no-pack baseline (`run-phase63-beam-100k-live-closure-gpt55-current`, 224/400 = 0.56, same recall) and +17 cases over the prior evidence-pack checkpoint (`run-phase63-beam-100k-live-closure-gpt55-evidence-pack-current`, 261/400 = 0.6525). This is pipeline/gate evidence only: fitted recall 0.9621 vs generalization floor 0.6822, same-model judge bias, no numeric accuracy pass threshold, and still below the 0.70+ next improvement target. The measured full-run path supports `goodmemory-rules-only` and `goodmemory-hybrid`.
- BEAM answer-gap tooling is already present: `scripts/analyze-phase-63-live-answer-gap.ts`, `scripts/run-phase-63-beam-live-ablation.ts`, and the internal `src/answer/evidencePack.ts`. Baseline no-pack analysis (`run-phase63-beam-live-answer-gap-baseline-current`) had 176 wrong: 103 full-recall-clean, 41 full-recall-noisy, 18 missing-evidence. The prior evidence-pack analysis (`run-phase63-beam-live-answer-gap-evidence-pack-current`) had 139 wrong: 76 full-recall-clean, 34 full-recall-noisy, 16 missing-evidence, 8 abstention, 5 unknown. The latest answer-hardening analysis (`run-phase63-beam-live-answer-gap-answer-hardening-current`) has 122 wrong: 58 full-recall-clean, 37 full-recall-noisy, 15 missing-evidence, 7 abstention, 5 unknown. The top repair families after the preference bucket split are conflict_update 29 (dominant full-recall-clean), instruction_following 27 (dominant full-recall-noisy), temporal_order 23 (dominant full-recall-clean), aggregate_count 15, summarization 9, preference_following 8, abstention 7, and the remaining judge_or_expected_answer 3. Answer-pack hardening now has live proof, not only unit proof; local pre-live hardening has also added latest-candidate target date/time/quantity cues, contradiction minimal-pair extraction that suppresses adjacent implementation noise, instruction answer-content cues for versioned dependencies and named tools, temporal question-target timeline anchors that keep source order while reducing adjacent project noise, value-bearing summary anchors, required summary source-coverage audits for late topic shifts, and analyzer-level source-coverage warnings for expected cues found outside declared source chats. 
The focused summarization value-anchor live slice `run-phase63-beam-live-slice-summary-value-anchors-prelive-current` stayed at 1/9 correct with `executionFailures: 0`; its analyzer rerun `run-phase63-beam-live-answer-gap-summary-value-anchors-current` found 5/8 wrong summarization cases with source-coverage warnings (21 cues), so the next summarization repair is source-coverage / expected-answer compatibility rather than answer-pack-only prompt work. The full source-coverage audit `run-phase63-beam-live-answer-gap-answer-hardening-source-coverage-current` preserves the 122-wrong shape and adds status counts of 95 covered-or-no-warning, 15 expected-cues-outside-source, and 12 no-declared-source-ids; warning buckets are temporal_order 9/16, summarization 5/21, aggregate_count 2/4, conflict_update 1/2. Manual summarization audit classifies `9:summarization:1`, `14:summarization:1`, and `19:summarization:1` as declared-source mismatches, while `20:summarization:1` and `20:summarization:2` have no declared source ids. The corrected source-covered summarization slice `run-phase63-beam-live-slice-summary-source-covered-v2-prelive-current` filtered to `12:summarization:1` and `14:summarization:2`, measured 1/2 with `executionFailures: 0` and evidence recall 1.0, and leaves `14:summarization:2` as a source-covered expected-answer/source-content mismatch rather than a proven generic answer-pack repair. The live-slice runner can now filter `--answer-gap-source-coverage-status covered-or-no-warning`; use that filter for answer-pack-only slice validation so source metadata mismatches do not dilute the measurement. The next repair should target conflict/update resolution, instruction noise budgeting, temporal timeline granularity, count aggregation, summary/source coverage, preference constraints, abstention calibration, and judge/expected-answer compatibility without BEAM expected-answer-specific rules.
- Latest local BEAM pre-live infrastructure check: `/private/tmp/BEAM/100K.json` rebuilt from GitHub raw via `prepare:phase-63-beam` (20 rows), `eval:phase-63` and `gate:phase-63` passed with `executionFailures: 0`, and `run-phase63-beam-100k-recall-diagnostic-rules-prelive-current` compared with `run-phase63-beam-100k-recall-diagnostic-rules-postmerge-current` at `caseDeltaCount: 0` (evidence-chat recall 0.9620612564274538, missed 20/355, wrong-recall/noise 167/400, hit/missing/noise 1023/71/807). This is refreshed root/recall readiness, not answer-performance evidence.
- MemoryAgentBench (Phase 64) is active with a reproducible external root and an accepted zero-failure live answer closure. `prepare:phase-64-mab` (`scripts/prepare-phase-64-memory-agent-bench-data.ts`) fetches an upstream Hugging Face row and writes a normalized `cases.json` with no vendoring; the AR/event_qa normalizer uses structural next-event evidence and the CR/factconsolidation normalizer uses a recurrence-filter gold-evidence definition (rules-only retrieval recall AR 0.260, CR 0.573, `executionFailures: 0`). TTL and LRU are ruled out as retrieval-recall targets (TTL/ICL ~76 demos per gold label; LRU is whole-story detective_qa/summarization) and route to the live-answer path instead, where measured slices (TTL banking77 0/30, LRU detective_qa 1/6) expose a boundary: the general answer prompt plus deterministic match modes evaluate AR/CR cleanly because their gold answers are natural extractive spans, but TTL/LRU gold answers are strict format tokens (a label id, a multiple-choice option), so TTL's 0/30 is a prompt-format mismatch rather than model inability, and both need answer-format-specific harnesses. The live answer path (`eval:phase-64-smoke -- --live --evidence-pack`) scores deterministically via upstream match modes (no LLM judge): the accepted zero-failure closure `run-phase64-mab-ar-cr-live-closure-rerun-current` measured CR answer accuracy 0.959 (70/73) and AR 0.67 (67/100), `executionFailures: 0`. CR 0.959 reproduces (and slightly exceeds) the prior hand-made CR A/B (0.94) on the reproducible root, so the general `src/answer/evidencePack.ts` current-value resolution generalizes from BEAM to MemoryAgentBench CR; it is not a BEAM-only trick.
The AR retrieval gap also drove a general recall-robustness fix: a zero-retrieval lexical fallback (`src/recall/factSelection/draft.ts`) that surfaces the single best-lexical fact when fact selection is otherwise empty, lifting AR smoke recall to 0.260 while preserving abstention, with `caseDeltaCount 0` on the 100K rules-only recall diagnostic (behavior-preserving). The P67-C task-specific answer harness then completed the four-competency picture (CR 0.959, TTL 0.767, AR 0.890, LRU 0.518; `executionFailures: 0` over 259 questions via resumable retry), and a no-memory ablation scoped the public claim. CR (0.959 vs 0.000 no-memory) and TTL (0.767 vs 0.000) are GoodMemory's FIRST public benchmark claim — deterministic, judge-free, and genuine memory contributions (the questions are unanswerable without GoodMemory's retrieved fact/demos; CR also beats the published single-hop ceiling ~0.60). Accurate Retrieval (0.890 vs 0.926 no-memory) and Long-Range Understanding (0.518 vs 0.632) are EXCLUDED as multiple-choice leaks where memory does not help. Promoted to the README `MemoryAgentBench (CR, TTL)` row 2026-06-25; the declaration `benchmark-claims/memoryagentbench.json` passes the public benchmark claim gate (`gate:public-benchmark-claim`).
- LoCoMo (Phase 65) has its live answer path WIRED but performance NOT closed. The live answer generator + evidence-pack path (commit `3f9cf5e`, `--live --evidence-pack`, gpt-5.5, deterministic LoCoMo match-mode scoring, no judge) and a reproducible non-vendored external-root prep (commit `ed08dd9`, `prepare:phase-65-locomo`) are on main. A representative single-conversation pressure run (`locomo-live-pack-full`, full conversation 1, 199 questions, `executionFailures: 0`) measured overall answer accuracy 0.020 (4/199), purely recall-bound. Follow-up eval-only retrieval research ruled out positional dialog windows, rules-light query expansion, and LLM turn-caption enrichment (commit `2b2ec71`), and P65-R003 later found a real neural endpoint ties BM25 exactly; the refined bottleneck is candidate-pool admission because additive semantic/BM25 scoring only re-ranks already-admitted lexical candidates.
  Current local code has opt-in semantic candidate-generation union plumbing (`retrieval.semanticCandidates`, `--semantic-candidates`, runner flag `--provider-embedding`) with external-root provider evidence. The bounded-noise topK16/maxAdditions4 point lifted conv-1 live accuracy from 4/199 (0.020) to 97/199 (0.487), held-out conv-30 to 67/105 (0.638), held-out conv-41 to 95/193 (0.492), held-out conv-44 to 85/158 (0.538), and held-out conv-42 to 106/260 (0.408), all with `executionFailures: 0`. The adversarial abstention scorer repair is deterministic: `No information available` adversarial gold accepts explicit abstention aliases like `I do not know`, while still rejecting the tempting answer; `--resume` re-scores checkpointed live answers with the current scorer.
  Full-root category slices are now assembled by `summarize:phase-65-locomo-categories` into `locomo-category-matrix-top16-add4-live-pack-current/category-summary.json`: 1986 questions, 983/1986 live accuracy (0.4949647533), weighted evidence recall 0.5884050761, 1081/1986 fully retrieved, noise 15235, and `executionFailures: 0`. The category-gap diagnostic `locomo-category-gap-analysis-top16-add4-live-pack-current/category-gap-analysis.json` splits the 1003 wrong answers into 647 missing-evidence wrong rows and 356 full-recall-but-noisy wrong rows, with 0 clean full-recall wrong rows. Follow-up retrieval-only budget probes showed decomposition/iteration is not the open_domain lever (recall 0.2763991013 -> 0.2822990939, noise 696 -> 858), while wider topK32/maxAdditions8 semantic admission improves missing-evidence categories at a noise cost: open_domain recall 0.3536560458 / 26/96 fully retrieved / noise 1050, multi_hop recall 0.3972616166 / 48/282 / noise 3237. The new relative-score floor (`minRelativeScore`) trims that wider-admission noise while preserving part of the gain: topK32/maxAdditions8/rel0.8 measured open_domain 0.3432393791 / 25/96 / noise 936 and multi_hop 0.3767491208 / 42/282 / noise 2939; the budget-delta analyzer shows open_domain has the better retrieval/noise tradeoff at +0.0278501157 recall per 100 added noise turns versus multi_hop +0.0068534205. The first wider-admission live validation, `locomo-open-domain-semantic-provider-top32-add8-rel08-live-pack-current`, did not convert that retrieval gain into answer gain: topK16/maxAdditions4 and topK32/maxAdditions8/rel0.8 both score 22/96 (0.2291666667) open_domain answer accuracy with `executionFailures: 0`, while the gap analysis moves wrong rows from 66 missing-evidence / 8 full-recall-noisy to 59 missing-evidence / 15 full-recall-noisy. 
The live-delta diagnostic `locomo-open-domain-top16-add4-vs-top32-add8-rel08-live-delta-current/live-delta.json` shows why this is not defaultable: 3 answers improved, 3 regressed, +6 fully retrieved, +240 noise turns, 9 unconverted retrieval gains, and 3 noisy full-recall regressions. Category blockers remain: single_hop 457/841, open_domain 22/96, multi_hop 86/282, temporal 178/321, and adversarial 240/446. A conv-44 minSimilarity sweep did not find a defaultable noise fix, and BEAM rules-only safety run `run-phase63-beam-100k-recall-diagnostic-rules-semantic-union-safety-current` compared against the prelive baseline at `caseDeltaCount: 0`. This is still NOT LoCoMo performance closure, default promotion, full live rerun evidence, or a public claim. No upstream data is vendored (MemoryAgentBench MIT; LoCoMo CC BY-NC 4.0, non-commercial).

- Latest LoCoMo full-root multi_hop blocker proof (Phase 65): `locomo-multihop-semantic-provider-top16-add4-current` covers all 282 multi-hop questions across 10 conversations with evidence-turn recall 0.326, 37/282 fully retrieved, noise 2204, `executionFailures: 0`, and `questionCategories=["multi_hop"]`; the live evidence-pack validation `locomo-multihop-semantic-provider-top16-add4-live-pack-current` scores 86/282 answer accuracy (0.305), `executionFailures: 0`. This upgrades the multi-hop blocker from held-out-shard evidence to measured full-root category evidence; LoCoMo remains not closed.
- Latest LoCoMo full-root temporal proof (Phase 65): `locomo-temporal-semantic-provider-top16-add4-current` covers all 321 temporal questions across 10 conversations with evidence-turn recall 0.678, 207/321 fully retrieved, noise 2571, `executionFailures: 0`, and `questionCategories=["temporal"]`; the resumed live evidence-pack validation `locomo-temporal-semantic-provider-top16-add4-live-pack-current` scores 178/321 answer accuracy (0.555), `executionFailures: 0`. Temporal is less retrieval-starved than open_domain or multi_hop, but remains answer-side weak, especially conv-42 8/40 and conv-44 11/24.
- Latest LoCoMo full-root adversarial/no-answer proof (Phase 65): `locomo-adversarial-semantic-provider-top16-add4-current` covers all 446 adversarial questions across 10 conversations with evidence-turn recall 0.592, 260/446 fully retrieved, noise 3421, `executionFailures: 0`, and `questionCategories=["adversarial"]`; live validation `locomo-adversarial-semantic-provider-top16-add4-live-pack-current` scores 240/446 answer accuracy (0.538), `executionFailures: 0`. This measures the no-answer policy blocker at full-root scope; it does not close adversarial behavior.
- Latest LoCoMo full-root single_hop proof (Phase 65): `locomo-single-hop-semantic-provider-top16-add4-current` covers all 841 single-hop questions across 10 conversations with evidence-turn recall 0.676, 558/841 fully retrieved, noise 6343, `executionFailures: 0`, and `questionCategories=["single_hop"]`; resumed live validation `locomo-single-hop-semantic-provider-top16-add4-live-pack-current` scores 457/841 answer accuracy (0.543), `executionFailures: 0`. This feeds the assembled full-root category evidence matrix, not LoCoMo performance.
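
The candidate-pool admission bottleneck above is why semantic hits are unioned into the pool rather than only added to scores. A minimal sketch of topK/maxAdditions admission with a relative-score floor follows, assuming `minRelativeScore` is a fraction of the best semantic score and that scores are non-negative; `unionSemanticCandidates` and its types are hypothetical, not GoodMemory's `retrieval.semanticCandidates` implementation.

```typescript
// Hypothetical shapes; not GoodMemory's retrieval internals.
interface Candidate {
  id: string; // e.g. a dialog-turn id
  score: number; // assumed non-negative similarity
}

interface SemanticUnionOptions {
  topK: number; // semantic neighbours considered
  maxAdditions: number; // cap on new turns admitted to the pool
  minRelativeScore?: number; // assumed: floor as a fraction of the best semantic score
}

// Admit a bounded number of semantic hits into the lexical candidate pool,
// so later scoring can rank turns that lexical matching never surfaced.
function unionSemanticCandidates(
  lexical: Candidate[],
  semantic: Candidate[],
  opts: SemanticUnionOptions,
): Candidate[] {
  const admitted = new Set(lexical.map((c) => c.id));
  const ranked = [...semantic].sort((a, b) => b.score - a.score).slice(0, opts.topK);
  const best = ranked[0]?.score ?? 0;
  const floor = opts.minRelativeScore === undefined ? -Infinity : best * opts.minRelativeScore;

  const additions: Candidate[] = [];
  for (const hit of ranked) {
    if (additions.length >= opts.maxAdditions) break; // noise budget spent
    if (hit.score < floor) break; // sorted descending, so the rest are lower
    if (admitted.has(hit.id)) continue; // already in the lexical pool
    admitted.add(hit.id);
    additions.push(hit);
  }
  return [...lexical, ...additions];
}
```

Raising `topK`/`maxAdditions` widens admission (more evidence, more noise); the floor then trims the weakest additions, which is the tradeoff the topK32/maxAdditions8/rel0.8 measurements describe.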

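The budget-delta tradeoff cited for LoCoMo (recall gained per 100 added noise turns) is a simple ratio. A minimal worked sketch, taking open_domain recall 0.2763991013 at noise 696 as the topK16/maxAdditions4 baseline (consistent with the +240 noise turns in the live-delta diagnostic) and 0.3432393791 at noise 936 for topK32/maxAdditions8/rel0.8; `recallPer100Noise` is a hypothetical helper, not the project's analyzer.

```typescript
// One retrieval configuration's aggregate result for a question category.
interface BudgetPoint {
  recall: number; // weighted evidence-turn recall
  noise: number; // retrieved non-evidence turns
}

// Recall gained per 100 extra noise turns when moving from a narrower
// admission budget (`base`) to a wider one (`wider`).
function recallPer100Noise(base: BudgetPoint, wider: BudgetPoint): number {
  const addedNoise = wider.noise - base.noise;
  if (addedNoise <= 0) {
    // No added noise: the ratio is undefined, so report it instead of dividing.
    throw new Error("wider budget did not add noise");
  }
  return (wider.recall - base.recall) / (addedNoise / 100);
}

const openDomain = recallPer100Noise(
  { recall: 0.2763991013, noise: 696 }, // topK16/maxAdditions4
  { recall: 0.3432393791, noise: 936 }, // topK32/maxAdditions8/rel0.8
);
// openDomain ≈ 0.02785: +0.0668 recall over +240 noise turns
```

The same ratio is what separates open_domain (+0.0278501157) from multi_hop (+0.0068534205) as the better candidate for wider admission.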
## Phase 40 Release Evidence

- Phase 40 is now closed as the v0.2 release proof and product eval slice.
- Cross-consumer adoption smoke covers direct TypeScript, Express, Fastify, Python/FastAPI bridge, and installed-host package paths: `reports/eval/adoption/phase-40/run-20260425163012-cross-consumer/report.json`
- Product eval rollup compares with-GoodMemory against a no-memory baseline: `reports/eval/product/phase-40/run-20260425165544-product-eval/report.json`
- Quality gate: `reports/quality-gates/phase-40/run-20260425172323/phase-40-quality-gate.json`

## Historical Evidence Index

This index keeps one-line evidence pointers instead of old narrative.

- Phase 20: `docs/archive/quality-gates/GoodMemory-Phase-20-Quality-Gate.md`, `reports/quality-gates/phase-20/run-20260420023503/phase-20-quality-gate.json`.
- Phase 22: `docs/archive/quality-gates/GoodMemory-Phase-22-Quality-Gate.md`, `reports/eval/live-memory/phase-22/run-1776650772564-assist/report.json`.
- Phase 23: `docs/archive/quality-gates/GoodMemory-Phase-23-Quality-Gate.md`, `reports/eval/live-memory/phase-23/run-1776658376536-promote/report.json`.
- Phase 29: `docs/archive/quality-gates/GoodMemory-Phase-29-Quality-Gate.md`, `reports/quality-gates/phase-29/run-20260421213000/phase-29-quality-gate.json`, `reports/quality-gates/phase-29/run-20260421214500/phase-29-rc-dry-run.json`.
- Phase 30: `docs/archive/quality-gates/GoodMemory-Phase-30-Quality-Gate.md`, `reports/quality-gates/phase-30/run-20260421153410/phase-30-quality-gate.json`, `reports/eval/live-memory/phase-30/run-phase30-live-current/report.json`.
- Phase 31: `docs/archive/quality-gates/GoodMemory-Phase-31-Quality-Gate.md`, `reports/quality-gates/phase-31/run-20260422041616/phase-31-quality-gate.json`, `reports/eval/live-memory/phase-31/run-phase31-live-current/report.json`.
- Phase 32: `docs/archive/quality-gates/GoodMemory-Phase-32-Quality-Gate.md`, `reports/quality-gates/phase-32/run-20260422085720/phase-32-quality-gate.json`, `reports/eval/fallback/phase-32/run-20260422173045/report.json`, `reports/eval/live-memory/phase-32/run-phase32-live-current/report.json`.
- Phase 33: `docs/archive/quality-gates/GoodMemory-Phase-33-Quality-Gate.md`, `reports/quality-gates/phase-33/run-20260422212752/phase-33-quality-gate.json`.
- Phase 34: `docs/archive/quality-gates/GoodMemory-Phase-34-Quality-Gate.md`, `reports/eval/fallback/phase-34/run-20260422213045/report.json`, `reports/eval/live-memory/phase-34/run-phase34-live-current/report.json`, `reports/quality-gates/phase-34/run-20260423102636/phase-34-quality-gate.json`.
- Phase 35: `docs/archive/quality-gates/GoodMemory-Phase-35-Quality-Gate.md`, `reports/eval/fallback/phase-35/run-20260423173045/report.json`, `reports/eval/live-memory/phase-35/run-phase35-live-current/report.json`, `reports/quality-gates/phase-35/run-20260423213045/phase-35-quality-gate.json`.
- Phase 36: `docs/archive/quality-gates/GoodMemory-Phase-36-Quality-Gate.md`, `reports/quality-gates/phase-36/run-20260423223045/phase-36-quality-gate.json`.
- Phase 37: `docs/archive/quality-gates/GoodMemory-Phase-37-Quality-Gate.md`, `reports/eval/fallback/phase-37/run-20260424101045/report.json`, `reports/eval/live-memory/phase-37/run-phase37-live-current/report.json`, `reports/eval/live-memory/phase-37/run-phase37-external-consumer/report.json`, `reports/quality-gates/phase-37/run-20260424104045/phase-37-quality-gate.json`.
- Phase 37.1: `docs/archive/quality-gates/GoodMemory-Phase-37.1-Quality-Gate.md`, `reports/eval/dogfood/phase-37-1/run-phase37-1-dogfood-current/report.json`, `reports/quality-gates/phase-37-1/run-20260424100757/phase-37-1-quality-gate.json`.
- Phase 38: `docs/archive/quality-gates/GoodMemory-Phase-38-Quality-Gate.md`, `reports/quality-gates/phase-38/run-20260425084045/phase-38-quality-gate.json`.
- Phase 39: `reports/quality-gates/phase-39/run-20260425041112/phase-39-quality-gate.json`.
- Phase 41: `reports/eval/fallback/phase-41/run-20260425213045/report.json`, `reports/eval/live-memory/phase-41/run-phase41-live-current/report.json`, `reports/quality-gates/phase-41/run-20260425223045/phase-41-quality-gate.json`.
- Phase 42: `reports/quality-gates/phase-42/run-20260426100000/phase-42-quality-gate.json`.
- Phase 43: `reports/eval/fallback/phase-43/run-20260426113000/report.json`, `reports/quality-gates/phase-43/run-20260426120000/phase-43-quality-gate.json`.
- Phase 43.5: `reports/eval/fallback/phase-43-5/run-20260426133000/report.json`, `reports/quality-gates/phase-43-5/run-20260426140000/phase-43-5-quality-gate.json`.
- Phase 45: Phase 45 is now closed as the First Reference Product and Adoption Evidence slice; `examples/reference-chat-product`, `bun run eval:phase-45`, `bun run gate:phase-45`, `reports/eval/adoption/phase-45/run-20260427104530-adoption-eval/report.json`, `reports/quality-gates/phase-45/run-20260427110000/phase-45-quality-gate.json`, `docs/archive/quality-gates/GoodMemory-Phase-45-Quality-Gate.md`.
- Phase 46: Phase 46 is now closed as the Memory Quality and Maintenance 2.0 slice; `bun run eval:phase-46`, qualityRepair, `reports/eval/fallback/phase-46/run-20260427123000-quality-eval/report.json`, `reports/quality-gates/phase-46/run-20260428110000/phase-46-quality-gate.json`, `docs/archive/quality-gates/GoodMemory-Phase-46-Quality-Gate.md`.
- Phase 47: Phase 47 is now closed as the Provider-Backed Retrieval Rollout and Quality Promotion slice; `bun run eval:phase-47`, `reports/eval/fallback/phase-47/run-20260428120000-provider-rollout-eval/report.json`, `reports/quality-gates/phase-47/run-20260428123000/phase-47-quality-gate.json`, `docs/archive/quality-gates/GoodMemory-Phase-47-Quality-Gate.md`.
- Phase 48: Phase 48 is now closed as the Dashboard, Cloud Sync, and Team Workspace Decision slice; `bun run eval:phase-48`, no-go decision, `reports/eval/fallback/phase-48/run-20260428170000-dashboard-cloud-decision/report.json`, `reports/quality-gates/phase-48/run-20260428173000/phase-48-quality-gate.json`, `docs/archive/quality-gates/GoodMemory-Phase-48-Quality-Gate.md`.
- Phase 49: Phase 49 is now closed as the Full ImplicitMemBench GoodMemory Research Eval; `baseline-upstream-chat`, `goodmemory-raw-experience`, `goodmemory-distilled-feedback`, `reports/eval/research/phase-49/baseline/run-phase49-smoke-current/report.json`, `reports/eval/research/phase-49/goodmemory/run-phase49-smoke-current/report.json`, `reports/eval/research/phase-49/comparison/run-phase49-smoke-current/report.json`, `reports/quality-gates/phase-49/run-20260428210000/phase-49-quality-gate.json`, `docs/archive/quality-gates/GoodMemory-Phase-49-Quality-Gate.md`.
- Phase 50: `reports/eval/fallback/phase-50/run-20260428223000-installer-eval/report.json`, `reports/quality-gates/phase-50/run-20260428224500/phase-50-quality-gate.json`, `docs/archive/quality-gates/GoodMemory-Phase-50-Quality-Gate.md`.
- Phase 52: `reports/eval/fallback/phase-52/run-phase52-fallback-current/report.json`, `reports/eval/live-memory/phase-52/run-phase52-live-current/report.json`, `reports/quality-gates/phase-52/run-20260502183000/phase-52-quality-gate.json`, `docs/archive/quality-gates/GoodMemory-Phase-52-Quality-Gate.md`.
- Phase 59: Phase 59 is the Generalized Raw Executor Cleanup slice; failed/preferred operations, `reports/eval/fallback/phase-59/run-phase59-fallback-current/report.json`, `reports/eval/fallback/phase-59/run-phase59-fallback-current/raw-diagnostics.json`, `reports/quality-gates/phase-59/run-20260504193000/phase-59-quality-gate.json`, `docs/archive/quality-gates/GoodMemory-Phase-59-Quality-Gate.md`, `goodmemory-raw-experience` at `58 / 60`, raw `90 / 200`, raw `88 / 200`, raw at least `115 / 200`, distilled `151 / 200`.

## Documentation Policy

Root current-status docs should stay compact. If a future change needs detailed phase provenance, add it to generated reports or an archive summary, not this file.