# Roadmap
Version-by-version plan. A brief summary lives in [README.md](../README.md);
this file is the canonical source.
v1.0 and later follow semver. Before v1.0, minor version bumps could include
breaking changes. Feature timing is best-effort; items can move between
releases as user feedback and verified product evidence arrive.
## Product north star
anamnesis exists to make AI coding agents remember a project without the
user repeating setup instructions every session.
Two promises drive the roadmap:
1. **Always inject the right context and ontology** — project memory,
ontology slices, handoff state, operating rules, hooks, skills, and
command intent should be installed, refreshed, and discoverable by the
active agent.
2. **Let users switch agents without re-briefing** — moving between Claude
Code, Codex, Cursor, or another adapter should preserve enough context
for the next agent to continue from the same project state with no
bespoke "read these files first" prompt from the user.
This means user-facing parity matters more than identical native UI.
Adapters may render to different surfaces because the tools expose
different primitives, but the resulting agent experience should preserve
project recall, ontology access, handoff continuity, and operational
guardrails.
The same boundary applies to ontology automation. Layer A introspectors
should establish a reliable factual baseline from files the CLI can parse:
routes, resources, models, package signals, and other high-confidence facts.
They are not meant to become exhaustive framework-specific knowledge engines.
Layer B should use the active agent to read those facts plus project docs and
code, then generate the semantic context that makes future agent sessions
effective: relationships, flows, intent, invariants, and open questions.
---
## v0.1 — *shipped 2026-04-26*
> First daily-use release. Single tool (Claude Code). Local installs only.
| Area | Done |
|---|---|
| Core primitives | Agentfile schema, manifest hash tracking, region anchors, fragment loader, applier with 5 statuses |
| Capabilities | `project_memory`, `ontology`, `executable_hook`, `skill`, `slash_command` (Claude Code adapter only) |
| Commands | `init`, `update`, `promote` |
| Idempotency | dry-run by default, backups before apply, user-modified detection |
| Fragments | `base`, `prisma`, `k8s`, `nestjs`, `python-uv`, `fastapi` |
| Coverage | 229 tests |
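
The manifest hash tracking and user-modified detection listed above can be pictured as a three-way hash comparison. The sketch below is illustrative only: the `ManagedFileState` shape and the four status names are assumptions made for this example, not the applier's actual five statuses.

```typescript
// Hypothetical status names for this sketch; the real applier's statuses may differ.
type SketchStatus = "create" | "unchanged" | "update" | "user-modified";

// Hypothetical shape: hashes are passed in as strings so the sketch stays pure.
interface ManagedFileState {
  /** Hash recorded in the manifest at the last apply (absent if never applied). */
  manifestHash?: string;
  /** Hash of the file currently on disk (absent if the file does not exist). */
  diskHash?: string;
  /** Hash of the content the current fragments would render. */
  renderedHash: string;
}

function classify(state: ManagedFileState): SketchStatus {
  // Nothing on disk yet: safe to create.
  if (state.diskHash === undefined) return "create";
  // Disk already matches what would be rendered: nothing to do.
  if (state.diskHash === state.renderedHash) return "unchanged";
  // Disk differs from what was last written (or was never tracked): assume a user edit.
  if (state.manifestHash === undefined || state.diskHash !== state.manifestHash) {
    return "user-modified";
  }
  // Disk is exactly what was last written, but the rendered content has moved on.
  return "update";
}

// A file edited by the user after the last apply is flagged instead of overwritten.
const editedFileStatus = classify({ manifestHash: "aaa", diskHash: "bbb", renderedHash: "ccc" });
```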
---
## v0.2 — *shipped 2026-04-27*
> Multi-tool, multi-scope. npm publish. Test suite grew from 229 to 299 tests.
| Area | Done |
|---|---|
| New command | `status` (drift + suggested + declined report) |
| New adapter | Codex (`project_memory` + `ontology` only) |
| New layout | Monorepo `scopes` with `extends` + `overrides.{tools, fragments_add, fragments_remove}` |
| New fragments | `nextjs`, `docker-compose` (rulebook 100% mapped) |
| Settings | Auto-register hooks in `.claude/settings.json` (idempotent JSON merge, indent preserved) |
| `promote` | Now supports `project_memory` (region extraction from AGENTS.md) |
| Distribution | Published as `@mcprotein/anamnesis` on npmjs.org |
| Coverage | 299 tests |
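
As a rough illustration of the monorepo `scopes` layout above, the sketch below resolves one scope against the root it `extends`. Only the override key names (`tools`, `fragments_add`, `fragments_remove`) come from this roadmap; the surrounding types and the merge semantics (a `tools` override replaces the root list, removals run before additions) are assumptions for the example.

```typescript
// Hypothetical shapes; the real Agentfile schema may differ.
interface ResolvedConfig {
  tools: string[];
  fragments: string[];
}

interface ScopeOverrides {
  tools?: string[];
  fragments_add?: string[];
  fragments_remove?: string[];
}

function resolveScope(root: ResolvedConfig, overrides: ScopeOverrides): ResolvedConfig {
  const removed = overrides.fragments_remove ?? [];
  // Start from the root fragments, dropping anything the scope removes.
  const fragments = root.fragments.filter((id) => removed.indexOf(id) === -1);
  // Append scope-specific fragments without duplicating existing ones.
  for (const id of overrides.fragments_add ?? []) {
    if (fragments.indexOf(id) === -1) fragments.push(id);
  }
  // Assumption: a tools override replaces the inherited list rather than merging into it.
  const tools = (overrides.tools ?? root.tools).slice();
  return { tools, fragments };
}

const webScope = resolveScope(
  { tools: ["claude-code"], fragments: ["base", "prisma"] },
  { tools: ["codex"], fragments_add: ["nextjs"], fragments_remove: ["prisma"] },
);
// webScope.fragments is ["base", "nextjs"] and webScope.tools is ["codex"].
```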
---
## v0.3 — *shipped 2026-04-28*
> **Theme: complete the multi-tool promise + monorepo UX polish**
| # | Item | Status | Description |
|---|---|---|---|
| 1 | **Cursor adapter** | shipped | `.cursor/rules/*.mdc` output with `agentRequested: true`. Covers all 5 capabilities. `scoped_rule` (Cursor-native glob scoping) deferred. |
| 2 | **Codex adapter completion** | shipped (AGENTS.md path) | `executable_hook` / `skill` / `slash_command` emit AGENTS.md region fallbacks (script body / skill body / command body). Git pre-commit auto-wiring deferred to v0.4 polish. |
| 3 | **Init multi-scope detect** | partial | `init --monorepo` detects `package.json` `workspaces`, expands `/*`, runs rulebook per sub-project, generates multi-scope Agentfile. pnpm-workspace.yaml / lerna / nx / interactive prompt deferred. |
| 4 | **`status` per-scope** | shipped | Multi-scope projects group fragments and drift entries under each scope. Single-scope output unchanged. |
| 5 | **`/handoff-prepare` slash command** | shipped | Departing agent writes structured markdown to `.anamnesis/handoff/.md` capturing goal/done/in-flight/decisions/open questions/next steps. |
| 6 | **SessionStart handoff injection** | shipped | CC uses native SessionStart hook (`inject-handoff.sh`, settings.json auto-registered). Codex/Cursor parity via AGENTS.md "session start: handoff 자동 확인" ("session start: check handoff automatically") instruction (base v4). |
| 7 | **Cross-adapter handoff parity** | shipped | Same handoff file format consumed by all three adapters via tool-agnostic AGENTS.md instruction. |
**Moved to v0.4** (low value while user base is small):
- ~~Full version pinning~~ — fragment version cache + `.versions/` storage. Without external user pressure, current "library-current always" is fine.
- ~~`update --bump-pinned`~~ — companion to full pinning. Moves with it.
---
## v0.4 — *shipped 2026-04-29; patches through 0.4.4 on 2026-04-30*
> **Theme: agent continuity at scale + operational polish + project introspection**
Design: [`docs/ONTOLOGY-BOOTSTRAP.md`](ONTOLOGY-BOOTSTRAP.md)
| # | Item | Status | Description |
|---|---|---|---|
| 1 | **Hybrid ontology bootstrap** | shipped in 0.4.0; expanded in 0.4.1 | **Layer A** (deterministic CLI introspectors): `anamnesis ontology bootstrap` writes `.anamnesis/ontology/.bootstrap.yaml`. ✓ k8s (namespaces/services/ingresses/workloads). ✓ prisma (datasources/generators/models/enums). 0.4.1 adds ✓ nextjs, ✓ nestjs, ✓ fastapi, plus multi-scope scope-local output and `--scope`. **Layer B** (agent-driven `/ontology-enrich` skill, base v5): shipped via the existing skill pipeline for Claude Code, Codex, and Cursor. **`init` auto-bootstrap**: shipped; `init` runs bootstrap after fragment install (`--no-bootstrap` opt-out). |
| 2 | **Handoff auto-trigger** | shipped in 0.4.2 | Claude Code `Stop` hook reminds agents to run `/handoff-prepare` when uncommitted work is newer than the latest handoff. |
| 3 | **Multi-task handoff tracking** | shipped in 0.4.2 | `/handoff-prepare` writes `.anamnesis/handoff/active.md` plus timestamped archives. Session start injection reads the active index first, then the latest archive. |
| 4 | **`anamnesis doctor`** | shipped in 0.4.2 | Read-only installation integrity check: manifest errors, tracked file/region drift, missing library fragments, update warnings, adapter coverage gaps, and `.claude/settings.json` hook registration drift. |
| 5 | **Full version pinning** | shipped in 0.4.2 | Fragment version cache so `pinned: true` renders the pinned version, not library-current. Library stores past versions under `base/.versions//` or `fragments//.versions//`. |
| 6 | **`anamnesis update --bump-pinned`** | shipped in 0.4.2 | Explicitly bump pinned fragments after manual review while keeping them pinned. Companion to #5. |
| 7 | **Trusted Publishing setup** | shipped; OIDC verified in 1.4.4 | GitHub Actions workflow + documented npm Trusted Publisher config shipped. Early 0.4.x tags exposed an npm OIDC mismatch, so manual npmjs.org publish stayed documented as a fallback. The later `v1.4.4` tag workflow completed and published through Trusted Publishing, so OIDC is now the primary release path. |
| 8 | **Fragment catalog expansion** | shipped in 0.4.2 | Ruby on Rails, Django, Go services, Rust, plus more JS frameworks (SvelteKit, Remix, Nuxt). |
| 9 | **Codex hook auto-wiring** | shipped in 0.4.2 | Git pre-commit bridge for `executable_hook` in the Codex adapter. Codex still gets AGENTS.md fallback instructions; Git repos also get `.anamnesis/codex-hooks/` plus `.git/hooks/pre-commit` when exec adapters are allowed. |
| 10 | **Aider/Windsurf adapters (optional)** | optional | If community demand justifies. Same content+capabilities IR, different render targets. |
| 11 | **`anamnesis status --json`** | shipped in 0.4.2 | Structured output for CI integration. |
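
The read order described in item 3 (active index first, then the latest archive) could be sketched as below. This is a minimal sketch under stated assumptions: the function name is hypothetical, and archive file names are assumed to sort lexicographically by timestamp, which this roadmap does not specify.

```typescript
// Given the file names found in `.anamnesis/handoff/`, return the order in which
// a session-start injection would read them: `active.md`, then the newest archive.
function handoffReadOrder(files: string[]): string[] {
  const order: string[] = [];
  if (files.indexOf("active.md") !== -1) order.push("active.md");
  // Assumption: archive names are timestamped so that a plain sort is chronological.
  const archives = files
    .filter((name) => /\.md$/.test(name) && name !== "active.md")
    .sort();
  const latest = archives[archives.length - 1];
  if (latest !== undefined) order.push(latest);
  return order;
}

const readOrder = handoffReadOrder([
  "2026-04-29T10-00.md",
  "active.md",
  "2026-04-30T09-00.md",
]);
// readOrder is ["active.md", "2026-04-30T09-00.md"].
```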
**Shipped in 0.4.1 patch:**
- nextjs introspector (App Router + Pages Router routes)
- nestjs introspector (`@Controller` / route method decorators)
- fastapi introspector (`@app.*` + `@router.*`)
- multi-scope bootstrap (per-scope ontology output + `--scope`)
**Shipped in 0.4.2 patch:**
- base v6 handoff continuity (`active.md` + Stop reminder)
- `anamnesis doctor`
- `anamnesis status --json`
- full version pinning + `update --bump-pinned`
- Trusted Publishing workflow + release docs
- Fragment catalog expansion (Rails, Django, Go, Rust, SvelteKit, Remix, Nuxt)
- Codex hook auto-wiring
**Shipped in 0.4.3 patch:**
- npm publish recovery to npmjs.org using local package-owner credentials
- normalized CLI `bin` metadata so npm 11 does not auto-correct the package at publish time
- publish workflow skip guard for versions that already exist on npmjs.org
**Shipped in 0.4.4 patch:**
- tag-triggered Trusted Publishing verification release
- GitHub Actions reached `npm publish`, but npm OIDC exchange/publish still failed with E404 at that time
**Later release automation update:**
- 2026-05-19: `v1.4.4` completed the tag-triggered GitHub Actions
`Publish` workflow and npmjs.org returned `1.4.4`. Trusted Publishing is
the primary path again; manual npmjs.org publish remains only an incident
recovery fallback.
---
## v0.5 — *shipped 2026-04-30*
> **Theme: prove automatic context continuity across real agent switches**
v0.5 is not primarily an introspector expansion release. The next risk is
whether the tool actually fulfills its main promise in day-to-day use:
install once, keep context/ontology current, and switch agents without
manual re-briefing.
| # | Item | Status | Description |
|---|---|---|---|
| 1 | **Dogfood lifecycle matrix** | shipped | Ran current anamnesis against sanitized managed fixtures and recorded `init/update/status/doctor/ontology bootstrap/handoff` behavior per repo and adapter. Candidate repos stayed dogfood-driven, not framework-completion driven. |
| 2 | **Agent-switch acceptance fixtures** | shipped | Added tests/fixtures for the same Agentfile rendered to Claude Code, Codex, and Cursor, then asserted that project memory, ontology instructions, handoff startup instructions, and operational guardrails are present in each output. |
| 3 | **Session-start continuity contract** | shipped | Made the "new agent starts here" contract explicit and testable: read managed context, read ontology, read latest/active handoff, detect stale handoff, then continue without the user giving extra instructions. |
| 4 | **Actionable `status`/`doctor` output** | shipped | Improved diagnostics so a user can tell whether context, ontology, handoff, fragments, pinned versions, and adapter render targets are installed and current. |
| 5 | **README/guide alignment** | shipped | Updated user-facing docs around the two product promises: context/ontology injection and agent switching continuity. Avoided presenting framework introspection as the main product. |
| 6 | **Release fallback normalization** | shipped | Kept the npmjs.org manual publish fallback documented while OIDC remained unresolved, so release operations did not block lifecycle work. |
| 7 | **Introspector API review, not expansion** | shipped (review-only) | Reviewed the current k8s/prisma/nextjs/nestjs/fastapi introspector interface for accidental coupling. The current contract remains a small registry keyed by fragment id with deterministic `appliesTo` / `introspect` methods; deeper output schema stabilization stays in v0.6. |
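
The introspector contract in item 7 (a small registry keyed by fragment id with deterministic `appliesTo` / `introspect` methods) might look roughly like the sketch below. The method names come from this roadmap; the signatures, the in-memory `ProjectFiles` map, and the toy Prisma model extraction are assumptions for illustration and are not the shipped introspector.

```typescript
// Hypothetical input: relative path -> file content, so the sketch needs no filesystem.
type ProjectFiles = Record<string, string>;

interface Introspector {
  appliesTo(files: ProjectFiles): boolean;
  introspect(files: ProjectFiles): Record<string, unknown>;
}

const registry: Record<string, Introspector> = {
  prisma: {
    appliesTo: (files) => "prisma/schema.prisma" in files,
    introspect: (files) => {
      const schema = files["prisma/schema.prisma"] ?? "";
      const models: string[] = [];
      const pattern = /^model\s+(\w+)/gm;
      let match: RegExpExecArray | null;
      while ((match = pattern.exec(schema)) !== null) {
        const name = match[1];
        if (name) models.push(name);
      }
      // Sorting keeps the output deterministic regardless of declaration order.
      return { models: models.sort() };
    },
  },
};

function bootstrapFacts(fragmentIds: string[], files: ProjectFiles): Record<string, unknown> {
  const facts: Record<string, unknown> = {};
  for (const id of fragmentIds.slice().sort()) {
    const introspector = registry[id];
    // Fragments without a registered introspector are skipped, not treated as errors.
    if (introspector && introspector.appliesTo(files)) {
      facts[id] = introspector.introspect(files);
    }
  }
  return facts;
}

const sampleFacts = bootstrapFacts(["prisma", "k8s"], {
  "prisma/schema.prisma": "model User {\n  id Int @id\n}\nmodel Post {\n  id Int @id\n}\n",
});
// sampleFacts is { prisma: { models: ["Post", "User"] } }.
```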
Progress:
- 2026-04-30: Added the initial cross-agent continuity acceptance fixture
for the base fragment.
- 2026-04-30: Enabled Claude Code, Codex, and Cursor outputs on this repo
itself and recorded the first dogfood self-check in
[`docs/DOGFOOD.md`](DOGFOOD.md).
- 2026-04-30: Added `anamnesis dogfood check --append` so future version
bumps can record continuity score/trend automatically.
- 2026-04-30: Added first-class `status` continuity readiness and `doctor`
continuity warnings for project memory, ontology, handoff startup, adapter
surfaces, and managed drift.
- 2026-04-30: Added dogfood active-handoff simulation: temporary all-adapter
project, `active.md` plus archive, Claude Code injection hook output, and
Codex/Cursor fallback instructions.
- 2026-04-30: Added stale active-handoff diagnostics to `status` / `doctor`
for missing archive references, active entries that do not point at the
newest archive, and completed/superseded entries left in open sections.
- 2026-04-30: Ran the first sanitized dogfood matrix across frontend,
backend, and backend/infra fixture shapes. Fresh frontend and backend/infra
installs reached continuity `6/6`; an existing managed fixture exposed a
repair/review gap around user-modified native surfaces.
- 2026-04-30: Added `doctor` repair guidance for user-modified managed files,
adapter-surface continuity failures, invalid settings, missing hook
registrations, and stale active handoff state.
- 2026-04-30: Reviewed the current introspector API and kept the v0.5
decision at "no expansion"; v0.6 owns deeper ontology schema and refresh
lifecycle work.
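
The stale active-handoff diagnostics in the progress notes above name three conditions. One possible reading of them is sketched below; the `ActiveEntry` shape, the warning strings, and the interpretation of "newest archive" as the lexicographically last archive name are all assumptions, not the actual `status` / `doctor` implementation.

```typescript
// Hypothetical parsed form of one entry in `active.md`.
interface ActiveEntry {
  task: string;
  archive: string;
  status: "open" | "completed" | "superseded";
  section: "open" | "closed";
}

function diagnoseActiveHandoff(entries: ActiveEntry[], archives: string[]): string[] {
  const warnings: string[] = [];
  for (const entry of entries) {
    // Condition 1: the entry references an archive file that does not exist.
    if (archives.indexOf(entry.archive) === -1) {
      warnings.push(`missing-archive:${entry.task}`);
    }
    // Condition 3: a finished entry is still listed in an open section.
    if (entry.section === "open" && entry.status !== "open") {
      warnings.push(`closed-entry-in-open-section:${entry.task}`);
    }
  }
  // Condition 2 (assumed reading): no open entry points at the newest archive.
  const sorted = archives.slice().sort();
  const newest = sorted[sorted.length - 1];
  const openEntries = entries.filter((entry) => entry.section === "open");
  if (
    newest !== undefined &&
    openEntries.length > 0 &&
    !openEntries.some((entry) => entry.archive === newest)
  ) {
    warnings.push("newest-archive-unreferenced");
  }
  return warnings;
}

const staleWarnings = diagnoseActiveHandoff(
  [
    { task: "a", archive: "2026-04-29.md", status: "open", section: "open" },
    { task: "b", archive: "gone.md", status: "completed", section: "open" },
  ],
  ["2026-04-29.md", "2026-04-30.md"],
);
```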
Exit criteria met:
- A fresh agent can enter a managed project through each supported adapter
and find the same current context, ontology, handoff state, and guardrails
without a bespoke user prompt.
- `status`/`doctor` can identify missing or stale context-continuity pieces.
- The next implementation task is chosen from dogfood evidence, not from
a framework catalog wishlist.
---
## v0.6 — *shipped 2026-05-03*
> **Theme: make ontology generation repeatable, bounded, and agent-assisted**
v0.6 is not a framework-introspection expansion release. The product risk is
whether anamnesis can keep project ontology current without making the user
hand-write context every time. The CLI should produce the factual base it can
prove, then guide the active agent to enrich that base into durable project
memory that every supported adapter can load.
| # | Item | Description |
|---|---|---|
| 1 | **Generation boundary guidance** | Make CLI output and docs clearly show what anamnesis generated deterministically (`AGENTS.md`, static ontology slices, `.bootstrap.yaml`) and what still needs an agent (`/ontology-enrich`, `/handoff-prepare`, semantic notes). This should appear before deeper ontology work so users do not mistake Layer A facts for complete project understanding. |
| 2 | **Ontology gap reports** | Use dogfood runs to identify which missing context pieces actually make agents less effective. Prioritize missing static slices, missing/stale bootstrap facts, missing enrichment, and adapter-visible guidance before adding broad framework coverage. |
| 3 | **Layer B enrichment lifecycle** | Define how `/ontology-enrich` re-runs should merge, replace, or diff semantic notes so agent-curated ontology can evolve safely. |
| 4 | **Ontology drift in `status`** | Report when project files imply bootstrap facts have changed and `.bootstrap.yaml` should be regenerated. |
| 5 | **Output schema stabilization** | Stabilize enough bootstrap/enriched YAML conventions for agents and docs to rely on them. |
| 6 | **Layer A baseline discipline** | Keep introspectors focused on shallow, deterministic, high-confidence facts. Improve or add one only when dogfood evidence shows the factual base itself is blocking agent continuity; semantic intent and operational meaning stay in Layer B. |
| 7 | **Agent-assisted enrichment UX** | Make the path from `status` / `doctor` / `ontology bootstrap` to `/ontology-enrich` obvious enough that users can get useful enriched ontology without manually authoring YAML. |
| 8 | **Dogfood proof of generated ontology value** | Run the full bootstrap + enrichment lifecycle against at least one sanitized managed fixture and record whether the next agent receives better context than static fragments alone. |
Progress:
- 2026-05-02: Added generation-boundary CLI guidance for `init`,
`ontology bootstrap`, `status`, and `doctor`, plus README documentation
explaining CLI-generated vs agent-required outputs.
- 2026-05-02: Added managed `CLAUDE.md` entrypoint generation for
Claude Code so its native memory surface points at canonical `AGENTS.md`,
ontology, and handoff state without replacing user prose.
- 2026-05-03: Added ontology gap reporting to `status` / `doctor` so
installed fragments show whether static ontology, deterministic bootstrap
facts, semantic enrichment, or Layer A introspector support is missing.
- 2026-05-03: Added base v7 Layer B enrichment lifecycle rules so
`/ontology-enrich` re-runs merge by stable IDs, append new facts, use
`supersedes` for replaced designs, and record weak evidence as
`open_questions`.
- 2026-05-03: Added bootstrap ontology drift detection so `status` compares
existing `.bootstrap.yaml` files with current deterministic introspector
output and `doctor` reports stale Layer A facts as repairable warnings.
- 2026-05-03: Stabilized ontology output conventions: `.bootstrap.yaml` now
renders `schema_version: anamnesis.bootstrap.v1`, deterministic
`generator`, and wrapped `facts`; `.enriched.yaml` guidance now requires
`schema_version: anamnesis.enriched.v1`.
- 2026-05-03: Re-centered the remaining v0.6 plan on bounded Layer A
baselines plus agent-assisted Layer B enrichment. Introspector work remains
allowed only when a real dogfood gap shows that deterministic facts, not
semantic enrichment, are the blocker.
- 2026-05-03: Added agent-assisted enrichment UX to diagnostics: missing or
stale bootstrap guidance now points to the follow-up `/ontology-enrich`
step, and `ontology bootstrap` prints the `.enriched.yaml` targets an agent
should create or refresh after Layer A facts are current.
- 2026-05-03: Ran the first v0.6 sanitized ontology before/after dogfood on a
NestJS/Prisma fixture. Static-only ontology had 2 ontology warnings and no
bootstrap/enriched files; after bootstrap plus agent enrichment, ontology
warnings dropped to 0 with deterministic model/controller/route facts and
semantic Layer B entries captured.
- 2026-05-03: Resolved the first dogfood-proven deterministic Layer A gap by
adding NestJS `@Sse()` route extraction. A follow-up sanitized fixture
bootstrap recorded the SSE route fact and increased the deterministic route
count.
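
The base v7 enrichment rules noted above (merge by stable IDs, append new facts, use `supersedes` for replaced designs, record weak evidence as `open_questions`) can be sketched as a small merge function. The field names `supersedes` and `open_questions` come from this roadmap; the `Note` and `Enriched` shapes, the `evidence` flag, and the choice to keep superseded notes in place are assumptions for this example.

```typescript
// Hypothetical shapes for an `.enriched.yaml` document after parsing.
interface Note {
  id: string;
  text: string;
  supersedes?: string;
  evidence?: "strong" | "weak";
}

interface Enriched {
  notes: Note[];
  open_questions: string[];
}

function mergeEnrichment(existing: Enriched, incoming: Note[]): Enriched {
  const notes = existing.notes.map((note) => ({ ...note }));
  const openQuestions = existing.open_questions.slice();
  for (const note of incoming) {
    // Weak evidence is recorded as a question instead of being asserted as a fact.
    if (note.evidence === "weak") {
      if (openQuestions.indexOf(note.text) === -1) openQuestions.push(note.text);
      continue;
    }
    let index = -1;
    for (let i = 0; i < notes.length; i++) {
      const candidate = notes[i];
      if (candidate && candidate.id === note.id) index = i;
    }
    if (index === -1) {
      // New stable ID: append. A replaced design arrives as a new note whose
      // `supersedes` points at the old ID, and the old note is kept for history.
      notes.push({ ...note });
    } else {
      // Same stable ID: update in place so re-runs do not duplicate entries.
      notes[index] = { ...note };
    }
  }
  return { notes, open_questions: openQuestions };
}

const firstRun = mergeEnrichment(
  { notes: [{ id: "flow.checkout", text: "old description" }], open_questions: [] },
  [
    { id: "flow.checkout", text: "new description" },
    { id: "flow.refund.v2", text: "refunds go through the queue", supersedes: "flow.refund.v1" },
    { id: "guess.billing", text: "Is billing asynchronous?", evidence: "weak" },
  ],
);
```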
Exit criteria met:
- Users can tell from command output whether the current ontology/context
state is CLI-generated, agent-enriched, or still missing.
- Agents get materially better project understanding from generated and
enriched ontology in at least one sanitized managed fixture.
- Layer A output stays deterministic and shallow enough to be trusted as
facts; Layer B carries relationships, flows, intent, invariants, and weak
inferences.
- Ontology refresh and enrichment are safe enough to run repeatedly during
normal project lifecycle work.
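
The bootstrap drift detection described in this release compares an existing `.bootstrap.yaml` with current introspector output. A minimal sketch of that comparison follows, assuming both sides are already parsed into plain objects. The `schema_version`, `generator`, and `facts` keys come from this roadmap; the function names and the three result labels are hypothetical.

```typescript
// Hypothetical parsed form of `.bootstrap.yaml`.
interface BootstrapDoc {
  schema_version: string;
  generator: unknown;
  facts: Record<string, unknown>;
}

// Serialize with sorted object keys so key order never counts as drift.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonical(item)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + canonical(obj[key]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

function bootstrapDrift(
  existing: BootstrapDoc | undefined,
  currentFacts: Record<string, unknown>,
): "missing" | "stale" | "current" {
  if (existing === undefined) return "missing";
  return canonical(existing.facts) === canonical(currentFacts) ? "current" : "stale";
}

const onDisk: BootstrapDoc = {
  schema_version: "anamnesis.bootstrap.v1",
  generator: "example-generator", // placeholder value for the sketch
  facts: { models: ["Post", "User"], routes: 2 },
};
const driftAfterNewModel = bootstrapDrift(onDisk, { routes: 2, models: ["Post", "Tag", "User"] });
// driftAfterNewModel is "stale" because a model was added since the last bootstrap.
```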
---
## v0.7 — *shipped 2026-05-03*
> **Theme: harden multi-agent UX and lifecycle scale**
| # | Item | Description |
|---|---|---|
| 1 | **Adapter parity matrix** | Publish and test a matrix for each capability (`project_memory`, `ontology`, `executable_hook`, `skill`, `slash_command`) across Claude Code, Codex, Cursor, and any new supported adapter. |
| 2 | **Switching-agent scenarios** | Exercise the full ordered 3x3 handoff matrix across Claude Code, Codex, and Cursor, including same-agent restarts, with active handoff files and stale-handoff detection. |
| 3 | **Native-surface improvements** | Where a tool offers a better native surface, use it; where it does not, keep fallback instructions explicit and testable. |
| 4 | **Lifecycle hardening** | Reduce surprises around pinned fragments, user-modified regions, backups, declined suggestions, and multi-scope updates as projects evolve. |
| 5 | **Public UX docs** | Document the expected user journey for "install once, switch agents, continue work" with limitations per adapter. |
| 6 | **Ontology refresh workflow hardening** | Turn the v0.6 bootstrap/enrichment path into a reliable lifecycle workflow: detect stale facts, prompt or route agent enrichment, preserve reviewed semantics, and keep all adapter entrypoints pointing at the same context. |
| 7 | **Benchmark/report command** | Add a repeatable benchmark surface that measures static-only vs bootstrap vs enriched context on sanitized snapshots. Candidate metrics: context recall score, question reduction, time-to-first-correct-action, handoff continuity, ontology coverage, and diagnostic quality. Output should be suitable for `docs/BENCHMARKS.md` and a compact README evidence section. |
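
The "full ordered 3x3 handoff matrix" in item 2 means every (source, target) pair, including same-agent restarts, for nine scenarios in total. A small sketch of enumerating those pairs is shown below; the adapter id strings are placeholder labels chosen for this example and may not match the ids the CLI uses.

```typescript
// Placeholder adapter labels for the sketch.
const AGENTS = ["claude-code", "codex", "cursor"] as const;
type Agent = (typeof AGENTS)[number];

interface SwitchScenario {
  source: Agent;
  target: Agent;
  sameAgentRestart: boolean;
}

function switchingMatrix(): SwitchScenario[] {
  const scenarios: SwitchScenario[] = [];
  AGENTS.forEach((source) => {
    AGENTS.forEach((target) => {
      // Ordered pairs: (codex, cursor) and (cursor, codex) are distinct scenarios.
      scenarios.push({ source, target, sameAgentRestart: source === target });
    });
  });
  return scenarios;
}

const scenarios = switchingMatrix();
// scenarios.length is 9, three of which are same-agent restarts.
```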
Progress:
- 2026-05-03: Started the v0.7 adapter parity work with a canonical
test-backed matrix in `cli/src/adapters/parity.ts` and
`docs/ADAPTER-PARITY.md`. The matrix documents native vs fallback surfaces
for all current capabilities across Claude Code, Codex, and Cursor.
- 2026-05-03: Expanded switching-agent scenarios to the full ordered 3x3
matrix: Claude Code, Codex, and Cursor as both source and target agents,
including same-agent restarts. `cli/src/adapters/switching.test.ts` now
verifies prepare surfaces, resume surfaces, current active handoff state,
and stale active handoff diagnostics for every pair.
- 2026-05-03: Added first-install adapter selection with
`anamnesis init --tools `, so projects can create Claude Code,
Codex, and Cursor surfaces during initial setup instead of manually editing
`Agentfile.tools` before the first `update`.
- 2026-05-03: Added the first `anamnesis benchmark report` surface for
deterministic context-quality reporting across static ontology, Layer A
bootstrap facts, Layer B enrichment, continuity readiness, and adapter
surfaces. Reports append to `docs/BENCHMARKS.md`.
- 2026-05-03: Hardened backup lifecycle behavior by enforcing
`settings.backup_retention` during `update --apply`; old
`.anamnesis/backups/*` directories are pruned only after a new backup is
created, and `0` keeps backups unlimited.
- 2026-05-03: Hardened declined-suggestion lifecycle reporting. `status`
now labels declined entries as active or stale, and `doctor` warns when an
Agentfile declined entry no longer corresponds to a current rulebook match.
- 2026-05-03: Added `docs/AGENT-SWITCHING-GUIDE.md` as the public UX guide
for the "install once, switch agents, continue work" flow. The guide links
install-time adapter selection, ontology refresh, `/handoff-prepare`, target
agent resume behavior, verification commands, and known native-vs-fallback
limitations.
- 2026-05-03: Recorded the first v0.7 sanitized benchmark comparison in
`docs/BENCHMARKS.md`. The existing Claude Code-only managed baseline scored
ready layers `1/5`; the same sanitized fixture after all-adapter install,
Layer A bootstrap, and Layer B enrichment scored `5/5` with continuity
`6/6` and zero ontology warnings.
- 2026-05-03: Polished cross-repo benchmark collection UX so
`benchmark report --append --output ` prints the absolute
output path when the report is written outside the benchmarked project.
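
The backup retention rule in the progress notes above (prune only after a new backup exists, and `0` means unlimited) reduces to a small selection function. The sketch below assumes backup directory names sort chronologically and that the list passed in already includes the backup just created; the function name is hypothetical.

```typescript
// Return the backup directories that should be deleted, oldest first excluded
// from retention. Call this only after the new backup has been written.
function backupsToPrune(backupDirs: string[], retention: number): string[] {
  // A retention of 0 keeps every backup.
  if (retention <= 0) return [];
  // Assumption: names are timestamp-like, so a plain sort is chronological.
  const newestFirst = backupDirs.slice().sort().reverse();
  // Keep the newest `retention` entries and prune the rest.
  return newestFirst.slice(retention);
}

const pruned = backupsToPrune(["2026-05-01", "2026-05-03", "2026-05-02"], 2);
// pruned is ["2026-05-01"].
```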
Exit criteria:
- Switching agents preserves project memory, ontology access, handoff
continuity, and operational reminders in normal workflows.
- Known adapter gaps are documented as tool-surface limitations, not hidden behavior.
- At least one benchmark report compares before/after context quality on a
sanitized fixture without requiring proprietary or credential-bearing source
snippets in public docs.
---
## v0.8 — *shipped 2026-05-04*
> **Theme: stabilize schema, API, and migration contracts**
v0.8 should reduce the risk of freezing the wrong surface in v1.0. The
priority is not new adapter breadth; it is making the existing lifecycle safe
to depend on.
| # | Item | Description |
|---|---|---|
| 1 | **Agentfile schema audit** | Review `Agentfile` v1 fields, defaults, scope inheritance, `settings`, `declined`, and pinned fragment semantics. Decide what can be frozen as-is and what needs a pre-1.0 adjustment. |
| 2 | **Schema fixture suite** | Add explicit compatibility fixtures for real single-scope, multi-scope, pinned, declined, and all-adapter Agentfiles so future changes can prove backward compatibility. |
| 3 | **Migration command design** | Designed in `docs/AGENTFILE-MIGRATIONS.md`; CLI skeleton shipped with dry-run/apply/backup/idempotency behavior and no built-in schema transforms yet. |
| 4 | **Stable TypeScript API boundary** | Public import boundary added at `@mcprotein/anamnesis` for Agentfile utilities only; unsupported deep imports are blocked by package `exports`. |
| 5 | **Existing-project repair workflow** | `docs/REPAIR.md` now covers user-modified managed files, missing hook registrations, partial adapter installs, stale Agentfile versions, stale handoff state, and ontology gaps. |
| 6 | **Published package smoke gate** | Recurring post-publish gate documented in `docs/RELEASING.md`: force npmjs.org, verify package version/CLI, run a fresh fixture through `npm exec @mcprotein/anamnesis@`, and record sanitized smoke when release claims depend on it. |
Exit criteria met:
- We can say which parts of `Agentfile` are v1-stable candidates.
- Backward-compatibility fixtures exist for the project shapes we already dogfood.
- Release validation includes source checks and published-package smoke checks.
- Any remaining schema/API uncertainty is explicitly assigned to v0.9 or v1.0.
Progress:
- 2026-05-04: Started the Agentfile schema audit in
`docs/AGENTFILE-SCHEMA-AUDIT.md` and added compatibility fixtures for
historical Claude Code-only, current all-adapter single-scope, and
multi-scope pinned Agentfiles in `cli/src/core/agentfile.compat.test.ts`.
- 2026-05-04: Updated `specs/agentfile.md` to distinguish parser-level hard
errors from library/project-aware diagnostics owned by `status`, `doctor`,
`init`, and `update`.
- 2026-05-04: Implemented `fragment.adapters` as a render gate for existing
projects. `update` skips disabled adapters for root fragments and scope
`fragments_add`; `doctor` uses the same gate for renderer and hook-setting
diagnostics. Existing managed-file cleanup remains assigned to the v0.8
repair/migration workflow.
- 2026-05-04: Added `docs/AGENTFILE-MIGRATIONS.md` with the dry-run-first
command contract, backup/idempotency rules, preservation rules, and test
requirements for future `anamnesis migrate agentfile` implementation.
- 2026-05-04: Added the `anamnesis migrate agentfile` skeleton with dry-run
default, `--apply`, `--json`, backup-on-write, and idempotency tests via an
injected fixture migration. Built-in schema transforms remain pending until
the remaining v0.8 field decisions are made.
- 2026-05-04: Clarified remaining Agentfile field semantics:
`overrides.*.locked` are ownership hints, not hard update locks;
`settings.commit_on_apply` is future-reserved / a deprecated candidate; and
`declined_at` remains a string for historical compatibility.
- 2026-05-04: Added `docs/API.md`, `cli/src/api.ts`, and package `exports` so
the only supported TypeScript import surface is `@mcprotein/anamnesis`
Agentfile utilities. Command internals remain CLI-only.
- 2026-05-04: Added `docs/REPAIR.md` as the existing-project repair playbook
for user-modified managed surfaces, hook registration drift, partial adapter
installs, pinned updates, stale handoff state, and ontology gaps.
- 2026-05-04: Added the recurring post-publish smoke gate to
`docs/RELEASING.md`, covering forced npmjs.org checks and fresh-fixture
`npm exec @mcprotein/anamnesis@` validation.
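
The migration skeleton described above (dry-run by default, `--apply`, backup-on-write, idempotency checked through an injected fixture migration) could be modelled in memory as follows. Everything here is a sketch: the `Migration` interface, the result shape, and the `version:` field used by the fixture are assumptions, not the `anamnesis migrate agentfile` internals.

```typescript
// Hypothetical migration contract: decide whether to run, then transform the text.
interface Migration {
  id: string;
  applies(doc: string): boolean;
  transform(doc: string): string;
}

interface MigrateResult {
  applied: string[];
  changed: boolean;
  output: string;
  written: boolean;
  backup?: string;
}

function migrateAgentfile(doc: string, migrations: Migration[], apply = false): MigrateResult {
  let output = doc;
  const applied: string[] = [];
  for (const migration of migrations) {
    if (migration.applies(output)) {
      output = migration.transform(output);
      applied.push(migration.id);
    }
  }
  const changed = output !== doc;
  const result: MigrateResult = { applied, changed, output, written: false };
  // Only an explicit apply with real changes writes, and it keeps the original as a backup.
  if (apply && changed) {
    result.backup = doc;
    result.written = true;
  }
  return result;
}

// Injected fixture migration, standing in for a future built-in transform.
const fixtureMigration: Migration = {
  id: "fixture-bump-version",
  applies: (doc) => /^version: 0$/m.test(doc),
  transform: (doc) => doc.replace(/^version: 0$/m, "version: 1"),
};

const dryRun = migrateAgentfile("version: 0\n", [fixtureMigration]);
const applyRun = migrateAgentfile("version: 0\n", [fixtureMigration], true);
const rerun = migrateAgentfile(applyRun.output, [fixtureMigration], true);
// dryRun reports the change without writing; rerun finds nothing left to do.
```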
---
## v0.9 — *shipped 2026-05-04*
> **Theme: public ecosystem readiness**
v0.9 should prepare the project for users and fragment authors beyond the
current local-library workflow.
| # | Item | Description |
|---|---|---|
| 1 | **Fragment registry design** | Specify registry metadata, discovery, version selection, and trust boundaries before building a hosted registry. |
| 2 | **Fragment signing & checksums design** | Define how fragment archives are signed, verified, cached, and rejected. Include migration behavior for unsigned local fragments. |
| 3 | **Fragment authoring docs** | Turn current internal fragment conventions into public author guidance with examples, review checklist, and compatibility rules. |
| 4 | **Official docs site plan** | Decide whether docs remain GitHub-first or move to a docs site. Include installation, adapter parity, ontology lifecycle, handoff, monorepo, release, and fragment authoring pages. |
| 5 | **Public benchmark gallery** | Collect sanitized before/after reports across multiple public repo shapes and surface headline evidence in README/docs. |
| 6 | **Remote sync strategy** | Decide whether `anamnesis sync` belongs before v1.0 or should wait until a registry exists. |
Exit criteria met:
- Registry and signing are specified deeply enough to implement without
changing the frozen Agentfile surface.
- Public docs cover both users and fragment authors.
- Benchmark evidence includes more than one repo shape.
Progress:
- 2026-05-04: Added `docs/FRAGMENT-REGISTRY.md` as the v0.9 registry design
draft. The design keeps current local-library and Agentfile flows intact,
treats registry discovery as passive, requires archive checksums before use,
defers signature policy to the next v0.9 item, and calls out Agentfile
source metadata as an explicit pre-v1.0 decision.
- 2026-05-04: Added `docs/FRAGMENT-SIGNING.md` as the v0.9 signing/checksum
design draft. Remote archives require checksum verification and signed
release manifests for default install/update, unsigned local and bundled
fragments stay valid, unsigned remote executable adapters are rejected, and
optional Agentfile source metadata remains migration-owned before v1.0.
- 2026-05-04: Added `docs/FRAGMENT-AUTHORING.md` as the public fragment
authoring guide. It documents capability schemas, rulebook ownership,
executable-hook safety, Layer A vs Layer B boundaries, versioning,
verification, review checklist, and compatibility rules for future public
fragments.
- 2026-05-04: Added `docs/DOCS-SITE-PLAN.md` as the v0.9 docs-site
decision. Documentation stays GitHub-first through v1.0; the plan defines
user/audience entry points, future site navigation, site trigger criteria,
and maintenance rules so a generated site can mirror repo markdown later
without creating a second source of truth.
- 2026-05-04: Added `docs/BENCHMARK-GALLERY.md` as the public-safe benchmark
evidence surface. It separates allowed README claims from unsupported
claims, summarizes the current sanitized backend and self-dogfood evidence,
and records the additional frontend, infra/backend, and Python API shapes
needed before broad public benchmark claims.
- 2026-05-04: Added `docs/REMOTE-SYNC-STRATEGY.md` as the v0.9 remote sync
decision. A top-level `anamnesis sync` command is deferred until after
v1.0-safe registry primitives exist; registry refresh, fragment discovery,
and project update/apply remain explicit operations, and remote upload of
handoff or ontology state is out of scope.
---
## v1.0 — *shipped 2026-05-04*
> **Theme: lock the surface, open to community**
| # | Item | Description |
|---|---|---|
| 1 | **Frozen Agentfile schema** | No more breaking changes after this. Strict semver from v1.0 forward. |
| 2 | **Migration tooling available** | `anamnesis migrate` supports any pre-1.0 schema adjustments that must survive the freeze. |
| 3 | **Stable public TypeScript API** | Documented import targets are semver-stable; internal modules remain private. |
| 4 | **Registry/signing MVP decision** | Either ship a minimal registry/signing path or explicitly keep registry support post-1.0 without weakening local-library safety. |
| 5 | **Public documentation complete** | Install, lifecycle, adapter parity, ontology generation, handoff, monorepo, release, fragment authoring, and troubleshooting docs are coherent. |
| 6 | **Evidence-backed README claims** | Public claims about continuity and ontology quality point to dogfood, switching fixtures, and benchmark reports. |
Exit criteria:
- `npm install -g @mcprotein/anamnesis` plus the documented quickstart works
from the published package.
- Existing v0.7/v0.8/v0.9 managed projects can upgrade without losing user edits.
- The schema/API surfaces marked stable have explicit tests and docs.
- Known limitations are documented as limitations, not hidden behavior.
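
The "explicit tests" criterion for the public import surface could be checked roughly as follows. This is a minimal sketch, assuming the package's `exports` map is keyed by subpath and that the two public imports correspond to the `.` and `./package.json` subpaths. `unexpectedExports` and the sample map values are hypothetical, not the project's actual test.

```typescript
// Hypothetical helper: lists export subpaths that are not on the allow list.
type ExportsMap = Record<string, unknown>;

// Assumption: "." maps to `@mcprotein/anamnesis` and "./package.json" to
// `@mcprotein/anamnesis/package.json`.
const ALLOWED_SUBPATHS = [".", "./package.json"];

function unexpectedExports(exportsMap: ExportsMap): string[] {
  return Object.keys(exportsMap)
    .filter((subpath) => !ALLOWED_SUBPATHS.includes(subpath))
    .sort();
}

// Illustrative input shaped like a package.json "exports" field.
const sampleExports: ExportsMap = {
  ".": { import: "./dist/index.js", types: "./dist/index.d.ts" },
  "./package.json": "./package.json",
};

// An empty result means no internal module leaked into the public surface.
const leakedSubpaths = unexpectedExports(sampleExports);
```

A test built on this idea fails as soon as someone adds a new subpath export without deliberately widening the allow list.
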
Progress:
- 2026-05-04: Froze the Agentfile v1 schema in
`docs/AGENTFILE-V1-FREEZE.md` and tightened the parser so unknown fields are
rejected instead of silently stripped. `settings.commit_on_apply` remains a
reserved no-op, `overrides.*.locked` remains an ownership hint,
`fragments[].source` and generic `sync` stay out of v1, and no built-in
Agentfile migration is required for the freeze.
- 2026-05-04: Closed the v1.0 migration-tooling availability surface. The
existing `anamnesis migrate agentfile` pipeline remains dry-run first,
backs up before writes, has no built-in transforms because the v1 freeze
requires none, preserves current no-op formatting/comment content, and now
reports the next recommended command in both human and JSON output.
- 2026-05-04: Closed the v1.0 public TypeScript API boundary by documenting
the semver-governed stability contract in `docs/API.md`, keeping command
result shapes internal, and adding an exports-map test so only
`@mcprotein/anamnesis` plus `@mcprotein/anamnesis/package.json` are public
package imports.
- 2026-05-04: Closed the registry/signing MVP decision in
`docs/REGISTRY-V1-DECISION.md`: remote registry installation, cache,
checksum, signature verification, trust store, Agentfile source metadata,
and unsigned remote escape hatches are post-v1.0; v1.0 keeps built-in and
local-library fragments as the only installable sources.
- 2026-05-04: Closed the public documentation completeness item with
`docs/DOCS-V1-AUDIT.md`, mapping install, lifecycle, adapter parity,
ontology generation, handoff, monorepo, release, fragment authoring,
troubleshooting, schema/API/migration, registry/sync scope, and evidence
docs to canonical repo entry points plus known v1.0 limitations.
- 2026-05-04: Closed evidence-backed README claims with
`docs/README-CLAIMS.md`, mapping current README claims to dogfood records,
switching fixtures, tests, benchmark reports, and explicit disallowed
wording for unsupported ecosystem, native-UX, automatic-ontology, registry,
signing, and no-review claims.
- 2026-05-04: Verified the pre-v1 upgrade exit criterion and recorded it in
`docs/DOGFOOD.md`: fresh fixtures initialized with published `0.7.0`,
`0.8.0`, and `0.9.0` all updated with the current candidate while preserving
user-authored `AGENTS.md` sentinel prose, retaining continuity `ready (6/6)`,
and reporting doctor `0 error(s)`.
- 2026-05-04: Published `@mcprotein/anamnesis@1.0.0` from the tag-triggered
workflow and completed npmjs.org post-publish smoke: version lookup returned
`1.0.0`, published CLI execution returned `1.0.0`, and a fresh Prisma
fixture reached continuity `ready (6/6)` with doctor `0 error(s)`.
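
The schema-freeze entry above describes rejecting unknown fields instead of silently stripping them. A minimal sketch of that behavior follows. The top-level field names in `KNOWN_TOP_LEVEL_FIELDS` are assumptions for illustration, and `parseStrict` is a hypothetical name. The real v1 schema is defined in `docs/AGENTFILE-V1-FREEZE.md`.

```typescript
type ParseResult =
  | { ok: true; value: Record<string, unknown> }
  | { ok: false; unknownFields: string[] };

// Assumed allow list for illustration only; not the actual Agentfile v1 schema.
const KNOWN_TOP_LEVEL_FIELDS = new Set(["version", "fragments", "settings", "overrides"]);

// Hypothetical strict parser: any unrecognized key fails the parse
// instead of being dropped from the result.
function parseStrict(input: Record<string, unknown>): ParseResult {
  const unknownFields = Object.keys(input).filter(
    (key) => !KNOWN_TOP_LEVEL_FIELDS.has(key),
  );
  if (unknownFields.length > 0) {
    return { ok: false, unknownFields };
  }
  return { ok: true, value: input };
}

// A generic `sync` block stays out of v1, so it is reported rather than ignored.
const rejected = parseStrict({ fragments: [], sync: { remote: "example" } });
```

Rejecting unknown keys means a typo or a post-v1 field surfaces as an error at parse time rather than as configuration that quietly has no effect.
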
---
## v1.1 — *shipped 2026-05-07*
> **Theme: remove avoidable fallback-only gaps after the v1 surface freeze**
External review input, 2026-05-04:
- [`openai/codex`](https://github.com/openai/codex) now exposes a broader
native lifecycle surface than the SessionStart-only path anamnesis first
targeted. The official Codex docs describe config-layer hook discovery,
project/user `.codex/hooks.json`, inline `[hooks]`, plugin lifecycle
config, and current hook events including `SessionStart`,
`UserPromptSubmit`, `PreToolUse`, `PermissionRequest`, `PostToolUse`, and
`Stop`.
- [`Yeachan-Heo/oh-my-codex`](https://github.com/Yeachan-Heo/oh-my-codex)
is useful prior art for separating native Codex hooks, runtime/plugin hook
dispatch, derived fallback signals, persistent state, logs, and team-safety
behavior. anamnesis should learn from those boundaries without becoming
dependent on OMX or turning into a runtime orchestrator.
| # | Item | Status | Description |
|---|---|---|---|
| 1 | **Codex native SessionStart continuity** | shipped | Add a Codex native SessionStart wrapper for the base ontology + handoff continuity path. `--allow-exec-adapters` installs `.anamnesis/codex-native-hooks/session-start.mjs`, enables `.codex/config.toml` `[features].hooks = true`, and merges `.codex/hooks.json` while preserving user hook entries. AGENTS.md fallback instructions remain for environments without native hook installation. |
| 2 | **Codex native hook surface refresh** | shipped | Refreshed the Codex adapter against the current official hook vocabulary instead of treating Codex as SessionStart-only. `SessionStart`, `UserPromptSubmit`, `PreToolUse`, `PermissionRequest`, `PostToolUse`, and `Stop` are modeled as event-aware render targets with explicit fallback notes for unsupported or version-gated behavior. |
| 3 | **Prompt-time and stop-time continuity** | stop-time implemented; prompt-time deferred | Codex `Stop` now handles the same dirty-work / handoff reminder role Claude Code already gets. `UserPromptSubmit` transport is supported and smoke-proven, but compact prompt-time context delta injection is deferred until a real dogfood gap justifies budget policy, dedupe rules, and noise controls. |
| 4 | **Native executable-hook bridge for Codex** | shipped | Where Codex `PreToolUse`, `PermissionRequest`, and `PostToolUse` support useful matchers (`Bash`, `apply_patch`/`Edit`/`Write`, MCP tool names), safe fragment hooks render natively before falling back to AGENTS.md instructions or the Git pre-commit bridge. `Stop` and `UserPromptSubmit` also use matcherless native wrappers when installed. Supply-chain gating stays under `--allow-exec-adapters`. |
| 5 | **Shared Codex hook ownership diagnostics** | shipped | Teach `status` / `doctor` to explain active Codex hook sources and ownership: user config, project config, anamnesis-managed entries, OMX-managed entries, plugin-provided lifecycle config, duplicate handlers, relative-path fragility, and project-trust gating. Preserve unrelated hook entries during every update. |
| 6 | **Real native-hook smoke tests** | shipped | Add reproducible smoke tests that prove native Codex hook behavior, not just rendered files. Dogfood now separates synthetic Codex JSON dispatch from opt-in real Codex CLI execution, proves both isolated `CODEX_HOME/hooks.json` and trusted project-local `.codex/hooks.json` SessionStart discovery, proves real `UserPromptSubmit` additional-context output before model transport completes, and proves authenticated Bash tool-turn `PreToolUse`/`PostToolUse` execution through the CLI. |
| 7 | **Codex plugin packaging research** | researched; implementation deferred | [`docs/CODEX-PLUGIN-PACKAGING.md`](CODEX-PLUGIN-PACKAGING.md) records the v1.1 decision: do not emit a Codex plugin by default yet. Keep required runtime hooks in config-layer `.codex/hooks.json`; treat future plugin output as optional packaging for skills, examples, or MCP/app metadata until plugin-local hook execution and trust semantics are verified in real Codex. |
| 8 | **Runtime inspiration from OMX, not dependency** | v1.1 slice implemented; expansion deferred | Added a small anamnesis-owned runtime evidence layer inspired by OMX `.omx/` state/log patterns; see [`docs/RUNTIME-EVIDENCE.md`](RUNTIME-EVIDENCE.md). `dogfood check --append` and `benchmark report --append` now write machine-readable records to `.anamnesis/evidence/events.jsonl`, and `status` reports the latest record. Later scope: hook-log events, install/update/doctor evidence, benchmark trace rollups, and public README evidence surfacing. Do not add task orchestration, HUD, team runtime, or OMX as a dependency. |
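
Item 1 describes merging `.codex/hooks.json` while preserving user hook entries. The sketch below shows one way such a merge could work. The `HooksFile` shape is a deliberately simplified, hypothetical stand-in and not the actual Codex hooks schema. Identifying managed entries by the `.anamnesis/codex-native-hooks/` path prefix is also an assumption.

```typescript
// Simplified, hypothetical shape: event name -> list of hook entries.
interface HookEntry {
  command: string;
  matcher?: string;
}
type HooksFile = Record<string, HookEntry[]>;

// Assumption: managed entries are recognizable by the wrapper directory path.
const MANAGED_MARKER = ".anamnesis/codex-native-hooks/";

function isManaged(entry: HookEntry): boolean {
  return entry.command.includes(MANAGED_MARKER);
}

// Keeps every user entry, drops stale managed entries, then appends the
// freshly rendered managed entries for each event.
function mergeHooks(existing: HooksFile, managed: HooksFile): HooksFile {
  const merged: HooksFile = {};
  for (const event of Object.keys({ ...existing, ...managed })) {
    const userEntries = (existing[event] ?? []).filter((e) => !isManaged(e));
    const entries = [...userEntries, ...(managed[event] ?? [])];
    if (entries.length > 0) {
      merged[event] = entries;
    }
  }
  return merged;
}

const mergedHooks = mergeHooks(
  {
    SessionStart: [
      { command: "node ./my-hook.mjs" },
      { command: "node .anamnesis/codex-native-hooks/session-start.mjs --old" },
    ],
    Stop: [{ command: "echo bye" }],
  },
  {
    SessionStart: [{ command: "node .anamnesis/codex-native-hooks/session-start.mjs" }],
  },
);
```

Replacing managed entries wholesale, rather than patching them in place, keeps repeated updates idempotent and leaves unrelated entries untouched.
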
Progress:
- 2026-05-05: Started the Codex hook surface refresh by adding native
lifecycle shell-hook wrappers for Codex-supported events. The renderer now
treats `PostToolUse:Edit` as `PostToolUse` with
`Edit|Write|apply_patch`, supports matcherless `Stop` wrappers, and keeps
AGENTS.md plus git pre-commit fallbacks. Base v10 uses this path for Codex
dirty-work reminders and stop-time handoff reminders.
- 2026-05-07: Added shared Codex hook ownership diagnostics. `status` now
reports `.codex/hooks.json` ownership counts for anamnesis, OMX, plugin,
user, and invalid entries; `doctor` warns on duplicate commands, malformed
hook entries, and stale relative anamnesis-managed hook commands.
- 2026-05-07: Added Codex native-hook dogfood evidence. The default
self-check runs synthetic Codex JSON dispatch against generated
SessionStart, PostToolUse, and Stop wrappers; the opt-in
`ANAMNESIS_REAL_CODEX_SMOKE=1` path proved the Codex CLI invokes a
SessionStart hook from isolated `CODEX_HOME/hooks.json` before the expected
isolated auth failure.
- 2026-05-07: Extended the opt-in real Codex smoke to a trusted project-local
`.codex/hooks.json` fixture. `ANAMNESIS_REAL_CODEX_SMOKE=1 npm run dogfood`
now records both real SessionStart paths: isolated `CODEX_HOME/hooks.json`
and project-local `.codex/hooks.json` discovered through `codex exec -C`.
- 2026-05-19: Tightened stop-time continuity UX. The Stop handoff reminder is
now deduped by dirty git fingerprint, so repeated agent Stop invocations no
longer block on an unchanged worktree state but still warn again once the
dirty state changes.
- 2026-05-19: Published `@mcprotein/anamnesis@1.4.4` from the tag-triggered
GitHub Actions `Publish` workflow. npmjs.org returned `1.4.4`, and a
published-package smoke verified fresh init/status/doctor plus Stop hook
first-run/second-run dedupe behavior.
- 2026-05-07: Added real `UserPromptSubmit` smoke coverage. The opt-in real
dogfood path now verifies Codex invokes `UserPromptSubmit` before model
transport completes and accepts the `hookSpecificOutput.additionalContext`
output shape.
- 2026-05-07: Added authenticated Codex tool-turn smoke coverage. When
`ANAMNESIS_REAL_CODEX_TOOL_SMOKE=1` is set, dogfood asks Codex to run a
safe Bash `printf` command inside an isolated temp project and verifies both
`PreToolUse` and `PostToolUse` hook payloads are emitted for `tool: Bash`.
- 2026-05-07: Started the anamnesis-owned runtime evidence layer. Dogfood
and benchmark append runs now write versioned JSONL records under
`.anamnesis/evidence/events.jsonl`, and `status` reports total/invalid
evidence counts plus the latest record kind and timestamp.
- 2026-05-07: Closed Codex plugin packaging research for v1.1 with
[`docs/CODEX-PLUGIN-PACKAGING.md`](CODEX-PLUGIN-PACKAGING.md). The current
decision is to keep required continuity hooks in config-layer
`.codex/hooks.json` and reserve optional plugin packaging for skills,
examples, and integration metadata until plugin-local lifecycle hooks have
real Codex CLI smoke evidence.
- 2026-05-07: Locked the v1.1 Codex hook surface as a release candidate.
Renderer tests now cover event-aware native wrapper registration for
`PreToolUse`, `PermissionRequest`, and `UserPromptSubmit`, in addition to
the existing `SessionStart`, `PostToolUse`, and `Stop` coverage. Prompt-time
delta injection and broader runtime evidence collection are explicitly
deferred beyond the v1.1 release cut.
- 2026-05-07: Post-publish smoke for `@mcprotein/anamnesis@1.1.0` caught
hard-coded CLI / ontology-generator version metadata. The v1.1 patch line
now reads package metadata from `package.json`, adds regression coverage,
and treats published CLI version mismatches as release blockers.
- 2026-05-07: Published `@mcprotein/anamnesis@1.1.1` as the v1.1 patch.
npmjs.org version lookup returned `1.1.1`, published CLI execution from
a fresh temp directory returned `1.1.1`, and a fresh Prisma fixture initialized
with `--tools all --allow-exec-adapters` reached continuity `ready (6/6)`,
Codex hook warnings `0`, and doctor `0 error(s)` with only the expected
agent-required `.enriched.yaml` warning.
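
The Stop reminder dedupe recorded in the 2026-05-19 entry above can be illustrated with a small sketch. It assumes the fingerprint is derived from `git status --porcelain` output and that the last warned fingerprint is persisted between Stop invocations. Both are assumptions, as are the names `fingerprint` and `decideStopReminder`. The non-cryptographic FNV-1a hash is used only to keep the example dependency-free.

```typescript
// FNV-1a over UTF-16 code units; chosen only to keep the sketch dependency-free.
function fingerprint(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

interface StopDecision {
  warn: boolean;
  fingerprint: string | null; // value to persist for the next Stop invocation
}

// Hypothetical decision function: warn only when the dirty state is new.
function decideStopReminder(
  porcelainStatus: string,
  lastWarned: string | null,
): StopDecision {
  if (porcelainStatus.trim() === "") {
    return { warn: false, fingerprint: null }; // clean worktree, nothing to remind
  }
  const current = fingerprint(porcelainStatus);
  return { warn: current !== lastWarned, fingerprint: current };
}

const firstStop = decideStopReminder(" M src/a.ts\n", null);
const repeatStop = decideStopReminder(" M src/a.ts\n", firstStop.fingerprint);
const changedStop = decideStopReminder(" M src/a.ts\n M src/b.ts\n", firstStop.fingerprint);
```

Hashing only the status listing would miss further edits to a file that is already modified, so a fuller fingerprint might also need to cover diff content.
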
Exit criteria:
- Fresh `--tools codex --allow-exec-adapters` install gets automatic
ontology and handoff context at Codex SessionStart.
- `status` / `doctor` report the Codex native hook wrapper, feature flag,
hook registrations, and shared hook ownership as part of adapter continuity.
- Existing user `.codex/hooks.json` entries are preserved and stale
anamnesis-managed wrapper entries are deduped on update.
- `docs/ADAPTER-PARITY.md` and switching fixtures distinguish Codex native
hook parity, fallback parity, and version-gated gaps for every supported
capability.
- Real Codex native-hook smokes prove each newly claimed native path before
README or benchmark claims mention it. Synthetic dispatch evidence is
recorded separately from real CLI execution evidence.
- Markdown dogfood and benchmark reports have a machine-readable evidence
counterpart that future status, benchmark gallery, and README-claims
surfaces can consume without scraping prose.
- Codex plugin packaging has a documented boundary: optional UX packaging may
follow later, but no core continuity promise depends on plugin install state
or unverified plugin-local hooks.
- OMX remains compatible as a co-installed runtime, but anamnesis does not
require OMX to provide its context/ontology/handoff continuity promise.
---
## v1.2 — *shipped 2026-05-08*
> **Theme: numeric evidence for context quality and agent-switch continuity**
Benchmarking is useful here only when the metric is honest about what it
measures. anamnesis can measure deterministic context surfaces numerically:
ready layers, continuity checks, ontology gap counts, adapter/hook diagnostics,
doctor errors, evidence freshness, and before/after deltas on the same repo
snapshot. It should not pretend those are raw model-intelligence scores.
Model-dependent outcomes such as "time to first correct action" need a
separate controlled task harness with repeated runs, fixed prompts, and clear
limitations.
| # | Item | Status | Description |
|---|---|---|---|
| 1 | **Benchmark scorecard v2** | shipped | Extend `anamnesis benchmark report` with a stable numeric scorecard that keeps raw dimensions visible: ready layers `/5`, continuity `/6`, ontology warning/error counts, doctor error/warning counts, Codex hook warning counts, adapter parity, and evidence freshness. Composite scores are allowed only as a convenience summary, not as the source of truth. |
| 2 | **Before/after adoption harness** | shipped | Add a repeatable workflow for sanitized snapshots: baseline report -> install/update/bootstrap/enrich -> follow-up report -> delta summary. Report the numeric movement for ready layers, continuity, ontology gaps, doctor issues, adapter surfaces, generated files, and evidence records. |
| 3 | **Agent-effectiveness task benchmark** | shipped | Introduce an optional, explicitly model-dependent harness for controlled tasks. Candidate metrics: prompts/questions needed before work starts, tool turns to locate key context, first-correct-action success, handoff recovery success, and elapsed time. Store this separately from deterministic `benchmark-report` evidence so README claims do not confuse product surfaces with model capability. |
| 4 | **Evidence gallery automation** | shipped | Generate or validate `docs/BENCHMARK-GALLERY.md` and README claim candidates from `.anamnesis/evidence/events.jsonl` plus sanitized benchmark artifacts. Claims without matching evidence should be flagged before release. |
| 5 | **Public-safe multi-shape collection** | shipped | Collect at least three public-safe benchmark shapes: a frontend app, a backend plus infra repo, and a Python/API repo. Each entry must include fragment set, raw score dimensions, before/after or fresh-install state, and limitations. |
| 6 | **Prompt-time context delta decision gate** | shipped | Revisit Codex `UserPromptSubmit` context delta injection only through `anamnesis benchmark prompt-gate`. The gate reads benchmark/task evidence, estimates duplicate ontology/handoff prompt overhead, reports duplicate-context risk, and keeps injection disabled unless repeated continuity failures justify a bounded non-default prototype. |
| 7 | **Runtime evidence expansion** | shipped | Expand runtime evidence beyond dogfood and benchmark append runs. `anamnesis doctor --append` records install integrity diagnostics as `doctor-check`, `anamnesis hooks summary --append` records hook runtime summaries as `hook-log-summary`, `anamnesis init` records first-install evidence as `init-install`, `anamnesis update --apply` records write-path evidence as `update-apply`, `anamnesis benchmark trace --append` records trace rollups as `benchmark-trace-rollup`, and `status` reports per-kind evidence counts/freshness. |
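
The before/after delta summary in item 2 amounts to a per-dimension subtraction over two scorecards. A minimal sketch follows, assuming a flat numeric scorecard. The dimension names and `scorecardDelta` are hypothetical and do not reflect the actual `benchmark report --json` schema.

```typescript
// Hypothetical flat scorecard: dimension name -> raw numeric value.
type Scorecard = Record<string, number>;

// Reports after-minus-before for every dimension present in both snapshots.
// Dimensions missing from either side are skipped rather than treated as zero.
function scorecardDelta(before: Scorecard, after: Scorecard): Scorecard {
  const delta: Scorecard = {};
  for (const [dimension, previous] of Object.entries(before)) {
    const next = after[dimension];
    if (typeof next === "number") {
      delta[dimension] = next - previous;
    }
  }
  return delta;
}

const adoptionDelta = scorecardDelta(
  { readyLayers: 1, continuity: 2, doctorErrors: 3 },
  { readyLayers: 3, continuity: 6, doctorErrors: 0 },
);
```

Keeping the delta per dimension, instead of collapsing it into one number, matches the roadmap's rule that composite scores are a convenience summary and not the source of truth.
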
Progress:
- 2026-05-07: Implemented benchmark scorecard v2 for
`anamnesis benchmark report`. The report now exposes raw numeric dimensions
through `scorecard` in JSON/evidence output, a markdown scorecard table, and
concise CLI lines for continuity, doctor health, Codex hook warnings, and
evidence record counts.
- 2026-05-07: Implemented `anamnesis benchmark compare` for before/after
adoption evidence. It reads two `benchmark report --json` snapshots, reports
raw scorecard deltas, and can append markdown plus `benchmark-compare`
runtime evidence.
- 2026-05-07: Implemented `anamnesis benchmark gallery --write|--validate`.
The command refreshes a generated evidence region in
`docs/BENCHMARK-GALLERY.md`, derives README claim candidates from runtime
evidence, and fails validation when the generated region is stale.
- 2026-05-07: Added public-safe multi-shape benchmark evidence for a fresh
Next.js frontend, a fresh NestJS/Kubernetes backend, an existing Python/uv
repo, and two before/after comparisons. The generated gallery now reports
12 valid evidence records, 7 entries, 5 claim candidates, and no release
warnings while still marking weak/regressed shapes as non-claim evidence.
- 2026-05-07: Implemented `anamnesis benchmark task --template|--input`.
The command validates controlled task-run JSON, reports model-dependent
metrics such as questions before action and tool turns to context, appends
to `docs/AGENT-TASK-BENCHMARKS.md`, and writes separate
`agent-task-benchmark` evidence that the deterministic gallery ignores.
- 2026-05-07: Implemented `anamnesis benchmark prompt-gate`. The command
turns prompt-time context delta into an evidence gate instead of a default
hook behavior: it consumes deterministic and model-dependent evidence,
estimates duplicated ontology/handoff token overhead, reports duplicate
context risk, and records `prompt-delta-gate` evidence when appended.
- 2026-05-07: Added `anamnesis doctor --append` as the first v1.2 runtime
evidence expansion beyond dogfood/benchmark checks. Doctor append writes
`docs/DOCTOR.md` snapshots plus `doctor-check` JSONL evidence with
install-integrity issue summaries.
- 2026-05-07: Expanded `anamnesis status` runtime evidence output from a
single latest record to a kind-level freshness rollup with per-kind counts,
latest timestamps, age, and stale flags for both CLI and JSON consumers.
- 2026-05-07: Added automatic `update-apply` runtime evidence for
`anamnesis update --apply`. Dry-runs remain side-effect free, while apply
records summarize planned change counts, backup/prune counts,
Claude/Codex hook registration outcomes, suggested fragments, and apply
flags.
- 2026-05-08: Added automatic `init-install` runtime evidence for
`anamnesis init`. `init --dry-run` remains side-effect free, while first
install records summarize selected fragments, installed tools, planned
change counts, monorepo detection, post-install bootstrap outcomes,
Claude/Codex hook registration outcomes, and install flags.
- 2026-05-08: Added `anamnesis hooks summary --append` for hook-log
summaries. It reads `.anamnesis/logs/hooks.jsonl`, reports valid/invalid
hook runtime records by event and status, appends `docs/HOOKS.md`, and
records `hook-log-summary` runtime evidence.
- 2026-05-08: Added `anamnesis benchmark trace --append` for benchmark trace
rollups. It reads `.anamnesis/logs/benchmark-traces.jsonl`, aggregates
trace records by phase/status plus numeric metrics, appends
`docs/BENCHMARK-TRACES.md`, and records `benchmark-trace-rollup` runtime
evidence.
- 2026-05-08: Published `@mcprotein/anamnesis@1.2.0` from the tag-triggered
workflow. npmjs.org `latest` returned `1.2.0`, published CLI execution
from a fresh temp directory returned `1.2.0`, and a fresh Prisma fixture initialized
with continuity `ready (6/6)`, `init-install` evidence, and the expected
Layer B enrichment follow-up.
- 2026-05-08: Published `@mcprotein/anamnesis@1.2.1` as a package-facing
README patch after the `1.2.0` tarball still showed the old status badges.
npmjs.org `latest` returned `1.2.1`, the package README showed
`500 passing` and `v1.2 stable`, published CLI execution returned `1.2.1`,
and the fresh Prisma fixture smoke remained continuity `ready (6/6)`.
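
The kind-level freshness rollup described in the `status` entry above can be sketched as a single pass over the JSONL evidence file. The record fields `kind` and `timestamp`, the staleness threshold, and the name `summarizeEvidence` are assumptions for illustration. The actual evidence schema is documented in `docs/RUNTIME-EVIDENCE.md`.

```typescript
interface KindRollup {
  count: number;
  latest: string; // ISO timestamp of the newest record of this kind
  ageMs: number;
  stale: boolean;
}

interface EvidenceSummary {
  total: number;
  invalid: number;
  kinds: Record<string, KindRollup>;
}

// Hypothetical rollup: counts valid and invalid lines and tracks freshness per kind.
function summarizeEvidence(jsonl: string, now: Date, staleAfterMs: number): EvidenceSummary {
  const summary: EvidenceSummary = { total: 0, invalid: 0, kinds: {} };
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    summary.total += 1;
    let record: unknown;
    try {
      record = JSON.parse(line);
    } catch {
      summary.invalid += 1;
      continue;
    }
    if (typeof record !== "object" || record === null) {
      summary.invalid += 1;
      continue;
    }
    const { kind, timestamp } = record as { kind?: unknown; timestamp?: unknown };
    if (typeof kind !== "string" || typeof timestamp !== "string" || Number.isNaN(Date.parse(timestamp))) {
      summary.invalid += 1;
      continue;
    }
    const previous = summary.kinds[kind];
    const latest =
      previous && Date.parse(previous.latest) >= Date.parse(timestamp) ? previous.latest : timestamp;
    const ageMs = now.getTime() - Date.parse(latest);
    summary.kinds[kind] = {
      count: (previous?.count ?? 0) + 1,
      latest,
      ageMs,
      stale: ageMs > staleAfterMs,
    };
  }
  return summary;
}

const sampleLog = [
  JSON.stringify({ kind: "doctor-check", timestamp: "2026-05-07T00:00:00Z" }),
  "not json",
  JSON.stringify({ kind: "init-install", timestamp: "2026-05-08T00:00:00Z" }),
].join("\n");

// Treat records older than 36 hours as stale (threshold is illustrative).
const evidenceSummary = summarizeEvidence(sampleLog, new Date("2026-05-09T00:00:00Z"), 36 * 60 * 60 * 1000);
```
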
Exit criteria:
- `anamnesis benchmark report` exposes stable numeric raw dimensions and a
clear scorecard schema that can be compared over time.
- At least one before/after adoption benchmark and at least three public-safe
repo shapes are represented in the benchmark gallery.
- Any README benchmark claim points to raw evidence and states limitations.
- Model-dependent task metrics are separated from deterministic context-quality
metrics in both schema and documentation.
- Prompt-time context injection is either justified by benchmark evidence and
bounded by token/noise rules, or explicitly kept deferred.
- Runtime evidence records are usable by `status`, benchmark gallery, and
README-claims workflows without markdown scraping.
---
## v1.3 — *shipped 2026-05-08*
> **Theme: fragment lifecycle intelligence**
v1.3 should make installed fragments behave less like manually curated
snippets and more like a small dependency graph with observable update
signals. The scope stays narrow: resolve fragment dependencies and version
constraints before rendering, then expose local update events that future
automation can consume. This is still configuration lifecycle management, not
project scaffolding or a hosted control plane.
| # | Item | Status | Description |
|---|---|---|---|
| 1 | **Fragment dependency resolution** | done | Replace the current `requires` behavior, which is simple topological ordering, with explicit dependency resolution. A selected fragment should be able to require another fragment id plus a minimum integer version. `init`, `update`, `status`, and `doctor` should report missing dependencies, unsatisfied minimum versions, pinned fragments that block a requirement, and dependency cycles before rendering managed files. |
| 2 | **Fragment update event hooks** | done | Add a local update notification surface for fragment lifecycle changes. Start with deterministic event records for installed, updated, pinned-blocked, yanked/invalid, and dependency-blocked fragments; keep external webhook delivery optional and disabled until the local event schema and trust boundary are stable. |
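
The dependency checks in item 1 can be sketched as a lookup pass followed by a depth-first cycle search. This is an illustrative sketch under assumed types, not the project's resolver. `Fragment`, `Requirement`, `DependencyIssue`, and `checkDependencies` are hypothetical names, and pinned-fragment blocking is left out for brevity.

```typescript
interface Requirement {
  id: string;
  minVersion?: number; // minimum integer version, if any
}

interface Fragment {
  id: string;
  version: number;
  requires: Requirement[];
}

interface DependencyIssue {
  kind: "missing" | "min-version" | "cycle";
  fragment: string;
  detail: string;
}

function checkDependencies(fragments: Fragment[]): DependencyIssue[] {
  const byId = new Map(fragments.map((f) => [f.id, f] as [string, Fragment]));
  const issues: DependencyIssue[] = [];

  // Pass 1: missing dependencies and unsatisfied minimum versions.
  for (const fragment of fragments) {
    for (const req of fragment.requires) {
      const target = byId.get(req.id);
      if (!target) {
        issues.push({ kind: "missing", fragment: fragment.id, detail: req.id });
      } else if (req.minVersion !== undefined && target.version < req.minVersion) {
        issues.push({
          kind: "min-version",
          fragment: fragment.id,
          detail: `${req.id} >= ${req.minVersion} (found ${target.version})`,
        });
      }
    }
  }

  // Pass 2: depth-first search; revisiting a fragment still on the stack is a cycle.
  const state = new Map<string, "visiting" | "done">();
  const visit = (id: string, path: string[]): void => {
    if (state.get(id) === "done") return;
    if (state.get(id) === "visiting") {
      const loop = [...path.slice(path.indexOf(id)), id].join(" -> ");
      issues.push({ kind: "cycle", fragment: id, detail: loop });
      return;
    }
    state.set(id, "visiting");
    for (const req of byId.get(id)?.requires ?? []) {
      if (byId.has(req.id)) visit(req.id, [...path, id]);
    }
    state.set(id, "done");
  };
  for (const fragment of fragments) visit(fragment.id, []);

  return issues;
}

const dependencyIssues = checkDependencies([
  { id: "base", version: 10, requires: [] },
  { id: "nestjs", version: 2, requires: [{ id: "base", minVersion: 11 }, { id: "prisma" }] },
]);
```

Collecting every issue before rendering, instead of stopping at the first one, lets `status` and `doctor` report the full set of problems in a single run.
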
Progress:
- 2026-05-08: Promoted `Fragment dependency resolution` and fragment update
notifications from cross-cutting backlog to the v1.3 planned scope. Deferred
project templates and WebUI work so v1.3 stays focused on fragment lifecycle
correctness and observable update signals.
- 2026-05-08: Implemented dependency parsing for `requires: [id]` and
`requires: [{ id, min_version }]`, auto-inclusion in `init`/`update`, and
dependency diagnostics in `status`/`doctor`.
- 2026-05-08: Added local `fragment-lifecycle` evidence records for
first-install and update/apply fragment events. External webhook delivery
remains intentionally absent until there is an opt-in delivery smoke.
- 2026-05-08: Published `@mcprotein/anamnesis@1.3.0` from the tag-triggered
workflow. npmjs.org version lookup for `@mcprotein/anamnesis@1.3.0` returned `1.3.0`,
published CLI execution from a fresh temp directory returned `1.3.0`, and a fresh
Prisma fixture reached continuity `ready (6/6)` with both `init-install`
and `fragment-lifecycle` evidence records plus the expected Layer B
enrichment follow-up.
Exit criteria:
- Fragment dependency requirements are parsed from fragment metadata without
changing existing Agentfile v1 installed-fragment entries.
- `init` and `update` can auto-include or clearly report required dependency
fragments before rendering.
- `status` and `doctor` explain missing, incompatible, pinned-blocked, and
cyclic fragment dependencies with actionable next steps.
- Update/apply flows write local machine-readable fragment lifecycle events
without sending data to any external service by default.
- README and release claims do not mention external webhook delivery unless a
real opt-in delivery smoke exists.
---
## v1.4 — *shipped 2026-05-11*
> **Theme: adoption automation and project context bootstrap**
v1.4 should reduce the manual work needed when anamnesis is first applied to
an existing project. The product target is not more public proof; it is better
first-run UX: install the cross-agent surfaces, preserve pre-existing local
agent affordances safely, and create a useful project context draft even when
no framework-specific fragment exists.
| # | Item | Status | Description |
|---|---|---|---|
| 1 | **Generic project context bootstrap** | shipped | During `init`, create a conservative `system_graph.yaml` draft when one does not already exist. Use safe local signals such as `package.json`, README/CLAUDE/docs headings, common source directories, and dependency names when available. If no safe signals exist yet, still create a zero-context draft with safety invariants and open questions rather than inventing facts or leaving the next agent with no project-level ontology file. Do not read or emit secret values from env files, Terraform state, tfvars, PEM keys, logs, or credentials. |
| 2 | **Existing surface conflict handling** | shipped | When a pre-existing project-specific `.claude/skills/load-context` blocks the managed base surface, preserve it under a project-specific name and install the standard anamnesis `load-context` surface so first-run continuity can reach `6/6` without manual rename work. Keep the behavior conservative and visible in CLI output/evidence. |
| 3 | **Adoption UX report** | shipped | Make `init` output explain which context was generated, which local surfaces were preserved, and which follow-ups remain agent-required. The report should answer "what did this just do?" without forcing users to inspect manifest internals. |
| 4 | **Opt-in project docs scaffold/enhance** | shipped | Add gated first-run documentation support: `--scaffold-docs` creates missing `README.md` and `docs/PROJECT-CONTEXT.md` starter docs, while `--enhance-docs` adds managed context-review regions to existing docs without replacing user prose. Add the `anamnesis-init` agent skill so agents ask a multiple-choice README/docs question before selecting those flags for the user. Keep default init conservative so user-owned docs are not rewritten unexpectedly. |
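
Item 1 rules out reading secret-bearing files during context bootstrap. One way to enforce that boundary is a deny list applied to candidate paths before any file is opened. The patterns below are assumptions that illustrate the categories named in the table (env files, Terraform state, tfvars, PEM keys, logs, credentials). `isSafeSignalPath` and `safeSignals` are hypothetical names.

```typescript
// Assumed deny patterns; a real list would likely be broader.
const DENIED_PATTERNS: RegExp[] = [
  /(^|\/)\.env(\..*)?$/, // .env, .env.local, .env.production
  /\.tfstate(\.backup)?$/, // Terraform state
  /\.tfvars(\.json)?$/, // Terraform variable files
  /\.pem$/, // PEM keys
  /\.log$/, // logs
  /credentials/i, // credential files
];

// Returns false for any path that matches a deny pattern.
function isSafeSignalPath(path: string): boolean {
  const normalized = path.replace(/\\/g, "/");
  return !DENIED_PATTERNS.some((pattern) => pattern.test(normalized));
}

function safeSignals(paths: string[]): string[] {
  return paths.filter(isSafeSignalPath);
}

const candidateSignals = safeSignals([
  "package.json",
  "README.md",
  ".env",
  "config/.env.production",
  "infra/terraform.tfstate",
  "infra/prod.tfvars",
  "keys/server.pem",
  "logs/app.log",
  ".aws/credentials",
  "src/index.ts",
]);
```

Filtering by path before reading keeps secret values out of memory entirely, which is a stronger guarantee than redacting them after the fact.
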
Progress:
- 2026-05-11: Implemented the v1.4 adoption helpers in the CLI. `init`
now writes or plans `system_graph.yaml`, `init`/`update` preserve
conflicting project-specific `load-context` skills before installing the
managed surface, and runtime evidence records both outcomes.
- 2026-05-11: Ran a sanitized TypeScript service-shaped CLI smoke from
a fresh temp directory. The smoke reached continuity `6/6`, doctor `0/0`, and
benchmark ready layers `3/5` without publishing any private-project
evidence.
- 2026-05-11: Cut the v1.4.0 release prep after `npm run release:check`
passed locally.
- 2026-05-11: Published `@mcprotein/anamnesis@1.4.0` from the tag-triggered
workflow. npmjs.org returned `1.4.0`, published CLI execution from
a fresh temp directory returned `1.4.0`, and a fresh sanitized TypeScript service
fixture verified context bootstrap plus load-context preservation.
- 2026-05-11: Prepared `1.4.1` to follow Codex CLI `0.130.0`'s renamed hook
feature flag, replacing `[features].codex_hooks` with `[features].hooks`
and removing the deprecated key during updates.
- 2026-05-11: Published `@mcprotein/anamnesis@1.4.1`. npmjs.org returned
`1.4.1`, published CLI execution returned `1.4.1`, and a published-package
migration smoke verified a v1.4.0 install upgrades from `codex_hooks = true`
to `hooks = true` with doctor `0/0`.
- 2026-05-19: Tightened the v1.4 bootstrap plan for completely blank
projects: `init` should still write `system_graph.yaml`, built from the
pre-install project state. When no safe signals exist, the draft contains
only open questions and invariants, so downstream `/ontology-enrich` or
human review can add semantics without the CLI pretending to know project
intent.
- 2026-05-19: Added opt-in `init` docs support. `--scaffold-docs` creates
missing starter docs, and `--enhance-docs` adds managed review regions to
existing README/docs so users can explicitly decide when anamnesis touches
user-facing documentation.
- 2026-05-19: Added the `anamnesis-init` base skill. When an agent performs
setup for the user, it asks one multiple-choice README/docs question and maps
the answer to no docs flag, `--scaffold-docs`, or `--enhance-docs`.
- 2026-05-19: Published `@mcprotein/anamnesis@1.4.4` from the tag-triggered
GitHub Actions `Publish` workflow. The published-package smoke verified
the deduped Stop handoff reminder: the first run with a given dirty
fingerprint warns, and a second run with the same dirty state is silent.
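
The `anamnesis-init` skill entry above maps one multiple-choice answer to at most one docs flag. A minimal sketch of that mapping follows. The choice labels and `docsFlagsFor` are hypothetical; only the `--scaffold-docs` and `--enhance-docs` flags come from this roadmap.

```typescript
// Hypothetical labels for the three answers the skill can receive.
type DocsChoice = "leave-docs-alone" | "scaffold-missing" | "enhance-existing";

// Maps the user's answer to the init flags; the default path adds no flag.
function docsFlagsFor(choice: DocsChoice): string[] {
  switch (choice) {
    case "leave-docs-alone":
      return [];
    case "scaffold-missing":
      return ["--scaffold-docs"];
    case "enhance-existing":
      return ["--enhance-docs"];
  }
}

const initArgs = ["init", ...docsFlagsFor("scaffold-missing")];
```

Because the conservative choice produces an empty flag list, an agent that skips the question falls back to the default behavior of leaving user-owned docs untouched.
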
Private validation notes:
- Use private validation only as internal development evidence.
Do not add project-specific private evidence to README,
`docs/BENCHMARK-GALLERY.md`, public claim candidates, or public benchmark
fixtures unless it has been explicitly sanitized and approved later.
Exit criteria:
- Fresh adoption on a TypeScript service-style repo can produce
cross-agent surfaces plus a useful `system_graph.yaml` draft without an
agent manually writing it.
- Optional docs scaffolding can create or enhance README/context docs only
when explicitly requested by flag.
- Existing project-specific `load-context` content is preserved instead of
overwritten, while the standard anamnesis `load-context` surface becomes
cleanly managed.
- `status` / `doctor` / `benchmark report` can reach continuity `6/6` and
doctor `0/0` on the target dogfood shape after init/apply.
- Public docs describe the feature generically without exposing private repo
names, secrets, tokens, infra identifiers, or internal benchmark records.
---
## v1.5 — *shipped 2026-06-19; follow-ups planned*
> **Theme: compact session context with numeric proof**
The next product risk was context over-injection. SessionStart continuity is
valuable only when it gives agents the minimum current state they need and
clear pointers to retrieve the rest. v1.5 moved ontology and handoff startup
behavior from "print everything we found" toward a compact, retrieval-first
contract, then proved the change with numeric reports and graphs before
making broad claims.
This version is informed by recent agent/LLM ecosystem signals around
context-budget discipline, large-context failure modes, and local-model cost
pressure, but the roadmap below is the canonical plan.
| # | Item | Status | Description |
|---|---|---|---|
| 1 | **Compact SessionStart default** | shipped | Change ontology and handoff startup injection to emit a short invariant digest, active-task summary, source pointers, and retrieval instructions by default. Full file injection remains an explicit compatibility/debug mode via `ANAMNESIS_SESSION_CONTEXT_MODE=full`, not the default. |
| 2 | **Session context budget policy** | partial | Add a documented budget contract for startup payloads: estimated tokens, chars, lines, source-pointer count, required-rule presence, and cap-exceeded status. `benchmark session-context` now reports those dimensions and hard-cap outcomes; status/doctor surfacing remains open. |
| 3 | **Deterministic `benchmark session-context`** | shipped | Add a model-free benchmark comparing `full` and `compact` session context across sanitized fixtures. Metrics include startup chars, lines, estimated tokens, included file bytes, source pointers, required rules present, and hard-cap outcomes. |
| 4 | **Numeric graph artifacts** | shipped | Generate dependency-free SVG charts from the same benchmark JSON so context tradeoffs are visible without reading raw data or adding a chart runtime. Required graphs are generated: mode-by-mode token bar chart, stacked payload composition, fixture-size growth line, and cap/success summary. Store public-safe generated artifacts under docs or benchmark output paths. |
| 5 | **Model-dependent retrieval benchmark** | partial | `benchmark task` now accepts optional compact/full retrieval metrics, `benchmark task-compare` compares paired full/compact runs, and `benchmark task-series` rolls repeated compare evidence into averages, ranges, standard deviations, and SVG charts. The remaining follow-up is repeated public-safe full-vs-compact task runs before any success-rate claim. |
| 6 | **Session-context fixture suite** | shipped | Add fixtures for tiny, normal, large ontology, stale handoff, conflicting ontology, missing handoff, and multi-scope projects so compact mode is tested against the failure modes that caused full injection to look attractive. |
| 7 | **Prompt-gate integration** | shipped | `benchmark prompt-gate` now reads deterministic session-context JSON and retrieval-aware task evidence so prompt-time context deltas stay disabled unless repeated measured failures justify bounded extra injection. |
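
The budget dimensions in item 2 can be sketched as a small measurement function. This is an illustrative sketch, not the anamnesis implementation: the `BudgetReport` type, the `measure_payload` helper, the 4-characters-per-token estimate, the `source:` pointer prefix, and the cap value are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BudgetReport:
    chars: int
    lines: int
    estimated_tokens: int
    source_pointers: int
    required_rules_present: bool
    cap_exceeded: bool


def measure_payload(payload: str, required_rules: list[str], token_cap: int) -> BudgetReport:
    """Measure a startup payload against a hypothetical token cap."""
    # Assumption: roughly 4 characters per token; a real tokenizer will differ.
    estimated_tokens = (len(payload) + 3) // 4
    lines = payload.splitlines()
    # Assumption: source pointers are lines that start with "source:".
    pointers = sum(1 for line in lines if line.strip().startswith("source:"))
    return BudgetReport(
        chars=len(payload),
        lines=len(lines),
        estimated_tokens=estimated_tokens,
        source_pointers=pointers,
        required_rules_present=all(rule in payload for rule in required_rules),
        cap_exceeded=estimated_tokens > token_cap,
    )


# Placeholder payload for illustration only, not real SessionStart output.
compact = (
    "rule: never edit .bootstrap.yaml by hand\n"
    "source: AGENTS.md\n"
    "source: system_graph.yaml\n"
)
report = measure_payload(compact, ["never edit .bootstrap.yaml"], token_cap=200)
print(report)
```

Keeping every dimension in one record means the same data can feed a JSON report, a markdown summary, and a chart without re-measuring the payload.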
Progress notes:
- 2026-06-19: Shipped compact SessionStart defaults for Claude Code and
Codex native wrappers. Default startup context now emits invariant digest,
active handoff summary, source pointers, and retrieval instructions; full
file-body injection remains available through
`ANAMNESIS_SESSION_CONTEXT_MODE=full`.
- 2026-06-19: Added deterministic `anamnesis benchmark session-context`.
Current public-safe fixture run covers 7 fixture shapes, reports compact
required rules `7/7`, compact source pointer fixtures `7/7`, large-fixture
token reduction `94%`, and cap exceeded counts `compact=0`, `full=2`.
Generated artifacts live under
`docs/benchmark-evidence/session-context/`.
- 2026-06-19: Extended `anamnesis benchmark task` with optional
`session_context_mode` and retrieval metrics, and taught
`anamnesis benchmark prompt-gate` to consume both
`docs/benchmark-evidence/session-context/session-context.json` and
retrieval-aware `agent-task-benchmark` records. This enables the
compact-vs-full model-dependent comparison, but repeated public-safe runs
are still required before claiming compact task success parity.
- 2026-06-19: Added `anamnesis benchmark task-compare` for paired full vs
compact retrieval runs. It validates that the two run inputs share the same
project/task/prompt/agent/model/context state, records compact/full deltas,
and emits `agent-task-benchmark-compare` evidence that `prompt-gate` can use
as a retrieval friction/failure signal.
- 2026-06-19: Added `anamnesis benchmark task-compare --template` so repeated
full/compact retrieval runs can start from matched public-safe input pairs
before observed model metrics are filled in.
- 2026-06-19: Recorded the first public-safe Codex full-vs-compact retrieval
diagnostic pair under `docs/benchmark-evidence/agent-task/`. Both modes
completed the fixed task with `3/3` required source reads, `0` missed
invariants, and `0` hallucinated facts. The compact run was slower and used
more total tokens in this single pair, so it is evidence for retrieval
instrumentation and prompt-gate friction tracking, not success parity.
- 2026-06-19: Added `anamnesis benchmark task-series --write` to roll repeated
compare evidence into average/stddev/min/max metrics and SVG charts. The
current committed series has only one pair, so it is a pipeline check, not a
parity claim.
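
The series rollup described in the last note can be sketched with the standard library. This is a hedged illustration rather than the `task-series` implementation: the `summarize` helper and its output keys are hypothetical, and the input numbers are placeholders, not measured results. It also shows why a one-pair series is only a pipeline check: a single sample has no spread to report.

```python
from statistics import mean, stdev


def summarize(values: list[float]) -> dict[str, float]:
    """Roll repeated compare deltas into average/stddev/min/max."""
    if not values:
        raise ValueError("at least one compare pair is required")
    # statistics.stdev needs two or more samples; a single pair has no spread,
    # so report 0.0 instead of raising.
    spread = stdev(values) if len(values) > 1 else 0.0
    return {
        "pairs": len(values),
        "average": mean(values),
        "stddev": spread,
        "min": min(values),
        "max": max(values),
    }


# Placeholder token deltas (percent), not real benchmark data.
single_pair = summarize([12.5])
three_pairs = summarize([10.0, 20.0, 30.0])
print(single_pair)
print(three_pairs)
```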
Exit criteria:
- Compact SessionStart includes required invariants and source pointers in
100% of fixture runs.
- Compact mode reduces startup estimated tokens by at least 60% on the
large-ontology fixture.
- Model-dependent compact task success is no more than 5 percentage points
below full mode on the controlled task suite.
- Compact mode increases required-source-read rate versus full mode, showing
agents retrieve exact context instead of relying on startup payload memory.
- Numeric chart artifacts are generated from the same benchmark data used for
JSON/markdown reports.
- `status`, `doctor`, or benchmark output can explain when a project is over
the session-context budget and which source category dominates the payload.
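
The two numeric thresholds above (at least 60% token reduction, and no more than a 5 percentage point success gap) reduce to simple arithmetic. A minimal sketch, assuming success rates are expressed in percent; the function names are hypothetical and the inputs are placeholders, not benchmark results.

```python
def token_reduction_pct(full_tokens: int, compact_tokens: int) -> float:
    """Percentage of startup tokens saved by compact mode relative to full mode."""
    if full_tokens <= 0:
        raise ValueError("full_tokens must be positive")
    return 100.0 * (full_tokens - compact_tokens) / full_tokens


def meets_reduction(full_tokens: int, compact_tokens: int, minimum_pct: float = 60.0) -> bool:
    """True when compact mode saves at least minimum_pct of the full payload."""
    return token_reduction_pct(full_tokens, compact_tokens) >= minimum_pct


def within_success_margin(full_rate: float, compact_rate: float, max_gap_pp: float = 5.0) -> bool:
    """True when compact success is at most max_gap_pp percentage points below full."""
    return (full_rate - compact_rate) <= max_gap_pp


# Placeholder inputs for illustration only.
print(token_reduction_pct(1000, 60))
print(meets_reduction(1000, 500))
print(within_success_margin(90.0, 86.0))
```

The success margin is measured in percentage points, not relative percent, so a drop from 90% to 86% passes while a drop from 90% to 84% does not.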
---
## v1.6 — *shipped 2026-06-25*
> **Theme: repo-local executable context and contradiction diagnostics**
With startup payloads compact, the missing piece was retrieval quality.
v1.6 makes repo-local context easier for agents to query without
turning anamnesis into a cloud memory service or an agent runtime.
| # | Item | Status | Description |
|---|---|---|---|
| 1 | **Local context index design** | done | Designed a read-only JSONL index over `AGENTS.md`, `system_graph.yaml`, `.anamnesis/ontology/*.yaml`, `.bootstrap.yaml`, `.enriched.yaml`, handoff files, manifest data, runtime evidence, and selected docs. Draft/decision record: [`docs/CONTEXT-INDEX-DESIGN.md`](CONTEXT-INDEX-DESIGN.md). |
| 2 | **Context index prototype** | done | Added and hardened `anamnesis context index` and `anamnesis context query` prototype commands that build/query a disposable JSONL index with source paths, stable refs, freshness, kinds, tags, snippets, malformed-index tolerance, and diagnostic source-pointer coverage. |
| 3 | **Ontology and handoff contradiction report** | done | Added `anamnesis context diagnose` for stale/missing handoff archive pointers, duplicate ontology entity IDs, conflicting relationship IDs, explicit docs-vs-bootstrap fact conflicts, superseded semantic entries still treated as current, malformed evidence lines, and evidence records with missing artifacts. `status` exposes a short context diagnostic summary and `doctor` exposes detailed advisory issues. |
| 4 | **Compact resume bundle** | done | Added `anamnesis context resume` and `--write` to produce a repo-native compact bundle with active task lines, active/latest handoff pointers, touched files, latest evidence, diagnostic warnings, retrieval rules, and line/char/token estimates. |
| 5 | **Export interface decision** | done | Deferred MCP/API export for v1.6. Core continuity stays on local CLI commands and regenerable `.anamnesis/context/` files; revisit MCP only if dogfood shows file/CLI access is materially blocking cross-session use. |
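
The malformed-index tolerance mentioned in item 2 can be sketched as a reader that skips bad JSONL rows instead of failing. This is an assumption-laden illustration: the field names (`path`, `ref`, `kind`, `tags`, `snippet`) and the `load_index` and `query` helpers are hypothetical, not the actual index schema or the `anamnesis context` command internals.

```python
import json


def load_index(jsonl_text: str) -> tuple[list[dict], int]:
    """Parse index rows, skipping malformed lines instead of failing."""
    rows: list[dict] = []
    skipped = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1
            continue
        # Assumption: a usable row is an object with a source path and a stable ref.
        if isinstance(row, dict) and "path" in row and "ref" in row:
            rows.append(row)
        else:
            skipped += 1
    return rows, skipped


def query(rows: list[dict], term: str) -> list[dict]:
    """Return rows whose snippet or tags mention the term (case-insensitive)."""
    needle = term.lower()
    return [
        row
        for row in rows
        if needle in str(row.get("snippet", "")).lower()
        or needle in [str(tag).lower() for tag in row.get("tags", [])]
    ]


# Placeholder index content: two valid rows, one truncated row, one non-object row.
sample = "\n".join([
    '{"path": "AGENTS.md", "ref": "agents#rules", "kind": "doc", "tags": ["rules"], "snippet": "Operating rules"}',
    '{"path": "system_graph.yaml", "ref": "graph#api", "kind": "ontology", "tags": ["routes"], "snippet": "API routes"}',
    '{"path": "broken',
    '"just a string"',
])
rows, skipped = load_index(sample)
hits = query(rows, "routes")
print([(hit["path"], hit["ref"]) for hit in hits], "skipped:", skipped)
```

Because every hit carries a path and a ref, query output can cite its source rather than returning an anonymous snippet.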
Exit criteria:
- The index can be regenerated from tracked and local anamnesis files without
requiring network access or credentials.
- Query output cites source file paths and stable IDs rather than anonymous
memory blobs.
- Doctor/status diagnostics identify at least stale handoff pointers and
contradictory ontology claims in fixtures.
- Resume output stays compact enough to fit within the v1.5 session-context
budget.
- MCP/API export work is explicitly deferred based on current product scope;
core cross-session use remains file/CLI based.
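
Two of the diagnostics named in the exit criteria, stale handoff pointers and duplicate ontology entity IDs, can be sketched over already-parsed inputs. This is a simplified illustration, not the `anamnesis context diagnose` implementation: the helper names, file names, and input shapes are hypothetical, and YAML parsing is assumed to have happened elsewhere.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def duplicate_entity_ids(entities_by_file: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return entity IDs that are declared more than once, with the files involved."""
    seen: dict[str, list[str]] = defaultdict(list)
    for path, ids in entities_by_file.items():
        for entity_id in ids:
            seen[entity_id].append(path)
    return {eid: sorted(paths) for eid, paths in seen.items() if len(paths) > 1}


def missing_pointers(pointers: list[str], root: Path) -> list[str]:
    """Return handoff archive pointers that do not resolve to a file under root."""
    return [pointer for pointer in pointers if not (root / pointer).exists()]


# Placeholder fixture: one archive exists, one pointer is stale.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "handoff-001.md").write_text("archived handoff\n")
    stale = missing_pointers(["handoff-001.md", "handoff-002.md"], root)

duplicates = duplicate_entity_ids({"a.yaml": ["user", "order"], "b.yaml": ["order"]})
print(stale, duplicates)
```

Both checks only report findings; neither edits files, which matches the advisory role described for `status` and `doctor`.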
Progress notes:
- 2026-06-19: Started the JSONL prototype with local indexing for `AGENTS.md`,
`CLAUDE.md`, ontology YAML, active handoff plus referenced archives,
manifest entries, runtime evidence summaries, and selected docs.
- 2026-06-22: Added `anamnesis context diagnose` as an advisory context
consistency report over handoffs, ontology YAML, and runtime evidence.
- 2026-06-22: Surfaced context diagnostics through `status` summary output
and detailed `doctor` advisory issues without adding prompt-time injection.
- 2026-06-22: Added an explicit docs-vs-bootstrap contradiction fixture using
`anamnesis-fact: facts... = ...` markers, closing the v1.6 contradiction
report item without free-form prose inference.
- 2026-06-22: Added `anamnesis context resume` for compact handoff/evidence
resumption; targeted tests assert the generated bundle stays below 300
estimated tokens on the fixture.
- 2026-06-25: Deferred MCP/API export from v1.6; local CLI commands and
regenerable `.anamnesis/context/` files remain the continuity interface.
- 2026-06-25: Hardened context index/query fixtures around `system_graph.yaml`,
bootstrap facts, docs fact markers, runtime evidence, stale handoff pointers,
malformed JSONL rows, repo-relative JSON output, and diagnostic follow-up
source pointers.
---
## v1.7 — *planned*
> **Theme: task harnesses, behavior verification, and adapter security**
Once agents receive compact context and can retrieve exact project memory, the
next step is to make agent work verifiable. v1.7 should promote the strongest
Hada-radar signals around harnesses, rubrics, agentic review, and executable
adapter safety into repo-native capabilities and diagnostics.
Task harness storage must be lifecycle-bounded. The default design should
separate one-task `current` harnesses from reusable task templates, inject at
most one matched harness at session start, and leave the rest as indexed
retrieval targets. Completed `current` harnesses should be removed from active
startup context and either deleted or archived under bounded retention.
Reusable harnesses should carry lifecycle metadata such as `last_used`,
`use_count`, `deprecated`, and `superseded_by`, so old or replaced templates can
be reported by `anamnesis gc --dry-run` before any deletion. The goal is not to
grow an unbounded task-memory store; it is to keep a small, useful set of
retrievable contracts with explicit disk and injection budgets.
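
The lifecycle rules above can be sketched as a record plus a preview-only candidate report. This is a hedged sketch, not the shipped `anamnesis gc --dry-run` logic: the `Harness` type, the `completed` flag, the harness names, and the 90-day staleness threshold are assumptions made for the example; only `last_used`, `use_count`, `deprecated`, and `superseded_by` come from the plan.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Harness:
    name: str
    kind: str  # "current" or "reusable"
    last_used: date
    use_count: int = 0
    deprecated: bool = False
    superseded_by: Optional[str] = None
    completed: bool = False  # hypothetical flag for finished "current" harnesses


def gc_dry_run(harnesses: list[Harness], today: date, stale_after_days: int = 90) -> list[tuple[str, str]]:
    """Report cleanup candidates with a reason; never deletes anything."""
    candidates = []
    for harness in harnesses:
        if harness.kind == "current" and harness.completed:
            reason = "completed current harness"
        elif harness.deprecated:
            reason = "deprecated"
        elif harness.superseded_by:
            reason = f"superseded by {harness.superseded_by}"
        elif harness.kind == "reusable" and today - harness.last_used > timedelta(days=stale_after_days):
            reason = "stale"
        else:
            continue
        candidates.append((harness.name, reason))
    return candidates


# Placeholder harness set for illustration only.
today = date(2026, 7, 1)
harnesses = [
    Harness("context-continuity", "reusable", date(2026, 6, 27), use_count=3),
    Harness("old-template", "reusable", date(2026, 1, 1)),
    Harness("fix-login", "current", date(2026, 6, 30), completed=True),
    Harness("v1-review", "reusable", date(2026, 6, 20), superseded_by="v2-review"),
]
for name, reason in gc_dry_run(harnesses, today):
    print(f"{name}: {reason}")
```

Returning reasons alongside names keeps the report reviewable before any deletion mode exists.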
| # | Item | Status | Description |
|---|---|---|---|
| 1 | **`task_harness` capability design** | done | Specified a tool-agnostic capability for task goal, stop condition, read/write scope, required evidence, test commands, role/subagent hints, rubric, lifecycle kind (`current` or `reusable`), and lifecycle metadata. Preserves adapter parity semantics across Claude Code, Codex, and Cursor through a shared repo-local retrieval file. Design: [`TASK-HARNESS-DESIGN.md`](TASK-HARNESS-DESIGN.md). |
| 2 | **Task harness retention and GC policy** | preview shipped; deletion planned | Added preview-only cleanup reporting for active `current` harnesses, reusable templates, disk/count budgets, stale age, `last_used`/`use_count`, deprecation/supersession behavior, and managed vs user-authored cleanup recommendations. Deletion/apply mode remains planned. |
| 3 | **Base task harness fixture** | done | Added one base-fragment harness fixture and adapter-rendering tests before expanding to stack-specific harnesses. The first fixture targets context/ontology/handoff continuity behavior and stays retrievable through `context index` without adding all harness bodies to startup context. |
| 4 | **Behavior benchmark expansion** | partial | Extended `benchmark task` and `task-compare` with numeric behavior metrics for source citations, managed-region edit attempts, `.bootstrap.yaml` edit attempts, handoff refresh success, matched harness reads, and non-matched harness reads. `task-series --write` now emits a source-citation delta SVG alongside token and quality charts. Repeated public-safe runs remain planned before claiming compact/full behavior parity. |
| 5 | **Executable capability side-effect metadata** | planned | Add metadata for read-only, local-write, git-hook, network, credential-touching, and external-production behavior on executable capabilities and rendered wrappers. |
| 6 | **Executable adapter security diagnostics** | planned | Add `doctor` warnings for generated or managed hooks that write outside the project, access network unexpectedly, touch likely secrets, omit shell safety settings, or drift from managed wrapper content. |
| 7 | **Malicious and unsafe-fragment fixtures** | planned | Add fixtures for unsafe executable adapters, suspicious native wrappers, network egress, repo-external writes, and stale hook registrations so security diagnostics are test-backed. |
| 8 | **Review diagnostics for AI-agent config damage** | planned | Add advisory checks for copied handoff archives in startup context, generated docs that overclaim adapter parity, managed regions changed outside anchors, and bootstrap ontology files edited by hand. |
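
Items 5 and 6 are still planned, so the following is only one possible shape for side-effect metadata and an advisory check built on it. The six categories come from item 5; the `SideEffect` enum, the `ADVISORY` policy set, and the `doctor_warnings` helper are hypothetical and do not describe an existing anamnesis API.

```python
from enum import Enum


class SideEffect(Enum):
    READ_ONLY = "read-only"
    LOCAL_WRITE = "local-write"
    GIT_HOOK = "git-hook"
    NETWORK = "network"
    CREDENTIAL_TOUCHING = "credential-touching"
    EXTERNAL_PRODUCTION = "external-production"


# Assumption: which declared categories still deserve a warning is a policy choice.
ADVISORY = {
    SideEffect.NETWORK,
    SideEffect.CREDENTIAL_TOUCHING,
    SideEffect.EXTERNAL_PRODUCTION,
}


def doctor_warnings(declared: set[SideEffect], observed: set[SideEffect]) -> list[str]:
    """Compare declared metadata with observed behavior and return advisory messages."""
    warnings = []
    # Behavior that was observed but never declared is the strongest signal.
    for effect in sorted(observed - declared, key=lambda e: e.value):
        warnings.append(f"undeclared side effect: {effect.value}")
    # Declared but sensitive behavior is still surfaced for review.
    for effect in sorted(observed & declared & ADVISORY, key=lambda e: e.value):
        warnings.append(f"advisory: declared {effect.value} behavior")
    return warnings


print(doctor_warnings({SideEffect.LOCAL_WRITE}, {SideEffect.LOCAL_WRITE, SideEffect.NETWORK}))
```

The function only returns messages, in keeping with the exit criterion that security checks stay advisory and user-authored files are not auto-reverted.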
Exit criteria:
- `task_harness` has a documented schema or design decision and at least one
adapter-parity fixture.
- Harness lifecycle rules distinguish `current` and `reusable` artifacts,
update usage/deprecation metadata, bound disk growth, and keep non-matched
harnesses out of startup injection.
- Cleanup remains preview-first: retention and stale-template candidates are
reported before deletion, and user-authored files are not silently removed.
- Behavior benchmarks report numeric pass/fail dimensions separately from
deterministic context-quality scorecards.
- Executable adapter security diagnostics are visible in `doctor` or `status`
and backed by unsafe fixture tests.
- Security checks remain advisory unless a command would generate unsafe
managed executable output; user-authored files are not auto-reverted.
- README or public claims mention task harnesses or security diagnostics only
after fixture and dogfood evidence exist.
Progress notes:
- 2026-06-27: Added the initial `task_harness` capability, base
`context-continuity` harness fixture, adapter parity row, renderer tests, and
context-index retrieval support. Runtime GC deletion remains planned; the
current implementation only renders and indexes bounded harness files.
- 2026-06-27: Added preview-only `anamnesis gc --dry-run` reporting for
task-harness lifecycle candidates. The dogfood repo currently reports one
managed reusable harness, 2026 bytes, and zero cleanup candidates.
- 2026-06-28: Added v1.7 behavior metrics to the model-dependent task
benchmark path. The intended contract is now explicit: `AGENTS.md` and
`CLAUDE.md` should act as compact control planes with source pointers, while
project facts live in ontology/docs and behavior benchmarks verify that
agents retrieve, cite, and protect those sources.
- 2026-06-29: Recorded the first public-safe v1.7 behavior benchmark pair for
`context-continuity`. Full and compact modes both read and cited `4/4`
required sources, had zero missed invariants, zero hallucinated facts, zero
managed-region or bootstrap edit attempts, and read the matched harness.
Compact reduced total tokens by `46.833%` in this pair, but still scored
lower on convenience due to elapsed time. This is diagnostic evidence only;
repeated pairs remain required before compact/full behavior parity claims.
---
## Parked ideas (outside the accepted roadmap)
These have been discussed, but they are not active roadmap work. Bring them
back only if repeated dogfood evidence shows they directly improve the core
goal: automatic context/ontology continuity across agent tools.
- **Project type templates** — `init --template react-app` style scaffolding for first-time users
- **WebUI for Agentfile editing** — visual editor for non-CLI users
---
## Changing the plan
Versions move based on verified signal. If a planned item turns out to
be hard or low-value, it moves to a later release. If a v0.4 item becomes urgent (e.g.,
heavy daily use of agent-handoff), it can move into v0.3.
When the plan changes, update this file in the same commit.