# Architecture & Roadmap An architect's view of where ContextDevKit is, what it learned from the production system it was distilled from (the "Ruiva" project's `devAItools/`), and where it goes next. > **Status key.** Every roadmap item carries a marker, kept current as work moves: > โœ… **done** (shipped/implemented) ยท โณ **in progress** (being executed in a session > right now) ยท ๐ŸŸก **partial / awaiting input** (started, or blocked on external data โ€” > not actively in a session) ยท ๐Ÿ“‹ **planned** (not started) ยท โž– **dropped** > (intentionally not ported). **Process:** when a session picks up an item, mark it > โณ; when it ships, switch it to โœ…. ## Lineage ContextDevKit is the **generalized, stack-agnostic distillation** of a real single-project AI dev platform. That source system was deeply coupled to its stack (Cloudflare Workers + Hono + Expo + Drizzle) and domain (LGPD). The kit keeps the *engine and the discipline*, drops the *stack-specific content*, and adds a **level system** so adoption is gradual. ## What was ported (and how it was generalized) | Source capability | In the kit | Generalization | | --- | --- | --- | | Session ledger + drift hooks | โœ… L2 | path classification is **config-driven** (any stack) | | Boot context injection | โœ… L1 | reads kit-canonical paths; project-name auto-detected | | Multi-session (claims, worktrees, indices) | โœ… L3 | unchanged in spirit | | Squad of agents | โœ… L4 | shipped as stack-agnostic archetypes + `_TEMPLATE` | | L5 simulate-impact gate | โœ… L5 | `highRiskPaths` configurable | | Deterministic tech-debt detectors | โœ… L5 | regex detectors generalized (JS/TS/py + React) | | Contract drift gate | โœ… L5/L6 | `contractGlobs` declares the surface; export-diff | | Session telemetry | โœ… L6 | `stats.mjs` (drift rate, cadence, ADRs) | | Auto-distill loop | โœ… L5/L6 | `/distill-sessions` + `/distill-apply` + `/retro` | | Config + Zod schema | โœ… | **zero-dep loader** on the hot path; zod optional | | Zod-coupled hooks | โž– removed | hooks must run on a fresh project with no installs | ## The new layer โ€” L6: Autonomy & Insight L1โ€“L3 buy **context fidelity** (the platform never forgets). L4โ€“L5 buy **quality governance** (review, tests, gates). The missing frontier is making the platform *act and learn*, not just remember and enforce. That's **L6**: - **Insight** โ€” `stats.mjs` / `/context-stats`: drift rate, cadence, ADR/agent counts. You can't improve a practice you don't measure. - **Autonomy** โ€” `/ship`: an orchestrated pipeline that drives the whole squad (architect โ†’ implement โ†’ code-review โ†’ QA โ†’ record) with human checkpoints. This is "a full team of capable agents", coordinated, not ad hoc. - **Learning loop** โ€” `/retro`: turns recurring drift/debt/patterns into concrete governance (new CLAUDE.md rules, ADRs, config tweaks). The platform gets smarter *about this project* over time. L6 adds no new Claude hook (same wiring as L5) โ€” it's a **capability tier**: commands + metrics + orchestration on top of the L5 gates. ## Honest gaps / not yet ported - **Contract drift: regex by default, AST optional.** The regex extractor covers the common JS/TS forms (named/declaration exports incl. `declare`/`abstract`/generators, `export default`, namespace re-exports `export * [as N] from`, type-only `export type { โ€ฆ }`). For AST precision, install `acorn` (or set `CONTEXT_CONTRACT_PARSER`): `contract-scan` uses it **only if importable**, so the zero-dep default holds. Residual: still "signal, not proof" for exotic TS without a TS-aware parser. [โ†’ ADR-0003] _The earlier gaps have since shipped: **two-tier briefings** (v1.1.0, `/squad brief`), **workflow docs/playbooks** (`contextkit/workflows/`), and the **predictions review cadence** (`/predictions-review`)._ ## 1.0 โ€” harden & prove โœ… SHIPPED (2026-05-22 ยท npm `contextdevkit@1.0.0`) L6 was reached in a single quarter; **1.0 earned it by hardening, not adding levels**: 1. โœ… **Froze the surface.** Thin wrappers (`/state`, `/context-doctor`, `/context-refresh`) deprecated toward `/audit`; `/release` paired with `/claim`. 2. ๐ŸŸก **Prove the value of each level.** Tooling shipped (`/context-stats`, analysis โ†’ backlog); still needs **real-world data** to confirm L4โ€“L6 earn their keep โ€” the one item that needs *usage*, not code. *Ongoing.* 3. โœ… **Ate our own dog food.** `install.mjs` refactored 487 โ†’ 234 (out of the RED zone); a `tech-debt-scan --ci` gate keeps it green in CI. 4. โœ… **Locked the public contracts.** Documented in `CONTRIBUTING.md`; changes need an ADR + `/contract-check`. 5. โœ… **Deepened the thin spots.** `qa-unit` / `qa-perf` / `qa-e2e` got anti-pattern tables; `architect`โ†”`security` and `test-engineer`โ†”`qa-orchestrator` clarified. 6. โœ… **Dependency & supply-chain control.** `/deps-audit` + the **security-team** (`security` AppSec ยท `infra-security` IaC/cloud ยท `devops` delivery). **Also delivered in 1.0:** standardized **WSJF (SAFe) prioritization + bug severity (S1โ€“S4) + SLA** with a **known-bugs map** in the DevPipeline; **`/deep-analysis`** (global sweep โ†’ report โ†’ ADRs โ†’ backlog); an **active security-mode** boot trigger (runs every N sessions, on by default); and the **`business-rules/`** memory folder. ## Next โ€” post-1.0 focus: ancestor parity Complete the distillation from the source platform (`app-ruivo/devAItools`) โ€” the three pieces deliberately flattened pre-1.0 (see *Honest gaps*). **All three are now shipped** (โœ… below): - โœ… **`memory/predictions/`** โ€” `/simulate-impact` writes a prediction file per run; `/predictions-review` (auto-run by `/log-session`) closes the loop, filling each file's *Actual* section from the ledger (changed vs predicted paths, both deltas). *Shipped: write half in v1.1.0; predicted-vs-actual review closed here.* - โœ… **Two-tier squad briefings** โ€” `contextkit/squads//.md` rich briefings behind the lean `.claude/agents/` agents. *Shipped v1.1.0: `/squad brief ` scaffolds a briefing, `/squad list` shows coverage.* - โœ… **`workflows/playbooks/`** โ€” per-level workflow docs (L1โ€“L5) + reusable playbooks (tech-debt sweep, simulate-impact, distillation, security batch). *Shipped: installed under `contextkit/workflows/`, seeded write-if-missing. The foundation for **playbook management** (Future directions #8).* ## Then โ€” supply-chain & code security (deepen the security-team) 1.0 shipped the *foundation*: the **security-team** (`security` AppSec ยท `infra-security` IaC/cloud ยท `devops` delivery) and `/deps-audit` (lockfile/pinning + native CVE audit โ†’ backlog). Two things are still missing โ€” a **code-facing** lane (today's agents own auth/secrets and the *platform*, not the code's exposure *through* its dependencies and third-party integrations) and any **GitHub-native** automation (the kit ships no `.github/` scaffolding). Three moves, all on the existing rails: - โœ… **`code-security` agent** โ€” a security-team **sub-specialist** (mirrors `infra-security`, no overlap with the `security` AppSec lead). Lane: the code's *external* attack surface โ€” third-party integration code (API clients / SDK usage, webhook & callback handling, (de)serialization of external responses), dependency **provenance / SBOM**, and SAST / CodeQL findings. Lean agent under `.claude/agents/` + a two-tier briefing in `contextkit/squads/security-team/`. - โœ… **Dependency control of the system** โ€” grow `/deps-audit` from "CVEs + loose ranges" into a real **dependency policy**: license allow/deny + SBOM generation, lockfile-drift detection, unmaintained / abandoned-package flags, and a scheduled (not just on-demand) sweep. Policy lives in `contextkit/config.json` (allowed licenses, max package age, pinning rules); findings still flow into the DevPipeline backlog like every other finding. - โœ… **GitHub / Dependabot integration** โ€” the kit scaffolds **`.github/dependabot.yml`** + a **security workflow** (CodeQL + `dependency-review` on PRs + the `/deps-audit` gate), ecosystem auto-detected, via `/security-setup` (or folded into `/setupcontextdevkit`). The *loop-closer* (the on-brand half): a sync pulls **Dependabot / GitHub security alerts** (`gh api`) into the **same backlog**, where the `code-security` agent triages reachability โ€” so GitHub's alerts become prioritized, owned tasks instead of an ignored tab. **Stays inside the invariants:** the `.github/` files, SBOM and CodeQL run in the *project's* CI, never on the kit's zero-dep hot path; the PR security workflow is **advisory by default** (opt into blocking); everything is plain files (`dependabot.yml`, workflow YAML, findings JSON) and **config-driven** (ecosystems + license policy in `config.json`). โœ… **Shipped** โ€” the `code-security` agent (security-team sub-specialist), `/deps-audit` grown with license policy + CycloneDX SBOM (`--sbom`) + lockfile-drift and a `deps` config block, `.github/` scaffolding (Dependabot + an advisory `security.yml`), and `gh-alerts.mjs` (GitHub alerts โ†’ DevPipeline backlog) behind a new `/security-setup`. *Deferred (โ†’ ADR-0047):* registry-backed staleness, scheduled alert-sync, required-check enforcement. ## L7 โ€” Ecosystem & Scale (the former "Future directions", now shipped) These were the *candidate L7+* items. With **v1.3.0** they ship as the **L7 capability tier** (`/context-level 7`) โ€” cross-cutting capabilities layered on top of L6, **no new hook** (same pattern as L6; see ADR-0008). Items #2โ€“#8 are the L7 set; #1 shipped earlier. 1. โœ… **Design / Product / Ops squads** โ€” **Shipped in v0.5.2:** `compliance-team` (LGPD), `design-team` (UX/UI/a11y), plus `product-owner` / `devops` starters, organized by a `contextkit/squads/` manifest with a sovereignty rule. The squad pattern is proven; further families (docs/data/growth/support) follow it. 2. โœ… **Fleet mode (MVP).** One control plane over many repos via `/fleet` + `fleet.mjs` โ€” registry at `~/.contextdevkit/fleet.json`; aggregate `stats` / `audit` across a portfolio; detect CLAUDE.md rule drift (`propagate --check`, detect-only). *Deferred (โ†’ ADR-0047, trigger-gated): auto-applying rule edits across repos; remote repos.* 3. โœ… **Outcome-driven agent tuning (MVP).** `/tune-agents` + `agent-tuning.mjs` aggregate per-agent signals (briefing coverage, usage) and **propose** briefing refinements (mirrors `/distill-sessions`; applies nothing). *Deferred (โ†’ ADR-0047, re-evaluate after ADR-0044/F3 ships attribution): a closed auto-loop + real per-agent outcome capture (PR-review / test attribution).* 4. โœ… **Editor/CI surfaces (MVP).** Status-line widget (`statusline.mjs`, wired as `settings.statusLine`, preserves a user's own) + a **quality CI workflow** (`contract-scan --ci` + `tech-debt --ci`, shipped to `.github/workflows/`). *Deferred (โ†’ ADR-0047, trigger-gated): the Claude-driven PR-review bot (needs Claude in CI); making the checks **required** is a branch-protection setting, not code (refused as kit code in ADR-0047).* 5. โœ… **Pluggable detectors & language packs (MVP).** Drop-in detectors from `contextkit/detectors/*.mjs` (loaded by `tech-debt-scan`) + stack **presets** (`install.mjs --preset next|go|python`, merged into config). *Deferred (โ†’ ADR-0047, demand-driven): a larger preset library.* 6. โœ… **Diverse & visual testing harness (MVP).** `/visual-test` + `visual-test.mjs` **scaffold** a browser-driven, visual layer (screenshot / visual-regression) for the detected stack โ€” **Playwright JS** (`@playwright/test`) + **Python** (pytest-playwright); `status` detects an existing harness. Owned by `qa-e2e` (+ `design-team` for baselines), wired into `/scaffold-tests`, `/qa-signoff`, and the `/ship` gate. The runner is a **project** dependency (the kit scaffolds, never bundles/runs browsers) โ€” true to the zero-dep hot-path invariant. *Deferred (โ†’ ADR-0047): real baselines/diffing (trigger-gated); browsers in the kit's own CI and a hosted diff service are **refused** there (invariant violations).* **v2.6 follow-up shipped:** `scaffold-tests.mjs` now gives the whole QA squad a deterministic stack map for Node/JavaScript, Python, Go, Rust, and PHP, plus happy/edge/failure cases and explicit-`--write` starter harnesses. 7. โœ… **Token economy & usage insight.** *Shipped (first cut):* `/token-report` + `token-report.mjs` read Claude Code's local session transcripts and aggregate **per-session token usage** (input / output / cache) and **per ISO week**, with a configurable **budget** (`tokens.budgetPerSession`) that flags hot sessions โ€” the cost extension of L6 **Insight**. Read-only, local, zero-dep, aggregated counts only. Next refinements: the per-agent/command breakdown is now owned by **ADR-0044** (autonomy F3, task 113); automated optimization hints remain open. 8. โœ… **Playbook management.** *Shipped:* the `workflows/playbooks/` foundation is now a **managed, runnable** layer โ€” `playbook.mjs` + **`/playbook`** to **list** the registry (discover what exists), **show** a procedure, and **run** one (records a tracked entry in `contextkit/memory/playbook-runs.md`, then prints the steps to execute). `/ship` and the squads can `run` a playbook instead of restating it. Turns repeatable procedures into first-class, auditable assets โ€” same "plain files, advisory, inspectable" posture as the rest of the kit. ## DevPipeline `working/` stage + declarative squad pipelines (ADR-0015) โ€” โœ… SHIPPED (v1.9.0โ€“v1.13.0) A single ADR opens two adjacent moves, sharing one substrate (`state.json` per in-flight item). Inspired by a read of [opensquad](https://github.com/renatoasse/opensquad)'s declarative pipeline, *not* a copy โ€” the kit's zero-dep + model-router + simulate-impact invariants reshape the grammar (no full expression eval, no vendor model names, opt-in per squad, dry-run as a first-class mode). - โœ… **DevPipeline gains a `working/` stage** (task **037**, ADR-0015 ยงB). Today `testing/` carries two meanings โ€” "actively being worked on" and "code written, awaiting QA". That conflation hides cross-session conflicts: session A can be hammering on task `031` while session B has no idea unless A manually `/claim`ed the right paths. The new stage holds **only WIP**; `testing/` reclaims its sign-off meaning. `/pipeline start ` and `/pipeline stop ` move tasks in/out; the workspace record (`.claude/.workspace/.json`) gains a `tasks[]` array so the dashboard surfaces *which session owns which task, right now*. Stale auto-eviction (default 90m without a heartbeat) keeps the lane honest. *Shipped: `pipeline-session.mjs` (start/stop + ADR-0032/072 gates), `claim.mjs` attach/detach, stale sweep in `workspace-sync.mjs`; lifecycle automation (auto-start / auto-conclude) followed in ADR-0034 (v1.12.0). Residual: the pre-push hook still refuses on paths only โ€” the task-id extension wasn't done.* - โœ… **Declarative `pipeline.yaml` per squad + engine** (task **036**, ADR-0015 ยงA). Optional file per squad declaring steps, `condition`, `on_reject`, `max_review_cycles`, `model_tier`, `execution`, and `type: checkpoint`. Parsed via `lib/yaml.mjs` (ADR-0013 optional dynamic import); engine refuses *with a clear message* when `yaml` is absent โ€” pipelines are opt-in, not a hot-path feature. First consumer is `agent-forge` (Fase 6). Dry-run is a first-class mode โ€” `squad-pipeline.mjs --dry-run` prints the would-be execution order so `/simulate-impact` can map pipeline-edit blast radius before changes ship. *Shipped: `squad-pipeline.mjs` + `squad-pipeline-condition.mjs` + `agent-forge/pipeline.yaml` + `docs/SQUAD-PIPELINE-FORMAT.md` (session 30). Residual: `/ship` still runs its own staged flow (`ship-state.mjs`), it hasn't adopted the DSL.* - โœ… **Canonical `state.json` substrate** (task **038**, ADR-0015 ยงC). One schema for both "task in flight" and "pipeline run", recording owner session/user/branch, current step, heartbeat, retry cycles. *Shipped: `runtime/state/state-io.mjs`, stamped by `pipeline start/stop` (`kind: task`) and `/ship` resume (`kind: pipeline-run`, ticket 074); `/runs` (task **039**, `runs.mjs`) lists recent transitions + runs. Residual: Forge Stats v2 doesn't read it yet (success rate + retry distribution).* **Stays inside the invariants:** the DSL is opt-in per squad, hot-path stays zero-dep (yaml only behind the sanctioned optional import), and `condition` accepts a whitelisted grammar โ€” no arbitrary expression evaluation. ## GitHub sync awareness in the dev flow (ADR-0026) โ€” โœ… SHIPPED (v1.9.0, session 35) The boot banner already shows **branch/commit** divergence (`checkGitDivergence`) and the 20 most-recent remote branches (`activeBranches`), and `pre-push` blocks real conflicts. The missing layer is **PR awareness** โ€” nothing asks GitHub "what PRs are open, which are awaiting CI/review, is there already a PR for this branch?". Two moments are PR-blind: starting work (`/dev-start`) and opening a PR (`/git pr`). - โœ… **`sync-check.mjs` โ€” `preflight` + `prepr` modes** (ADR-0026). One zero-dep script. `preflight` (run by `/dev-start`, before scope-lock): ahead/behind, recent branches, and **open PRs with CI/review status**, flagging PRs *awaiting status* that may overlap the objective. `prepr` (run by `/git pr`, before push): re-check divergence vs `main` and **detect a duplicate open PR** for the current branch. `gh` is optional โ€” absent/unauthed degrades to the git-only half and reports the PR check as **skipped, never a pass** (Rule 8); offline โ‡’ silent exit 0. *Refined in v1.13.0 (ticket 065): no implicit `git fetch` on read-only checks โ€” network is opt-in via `--fetch`.* - โœ… **Wiring** โ€” `/dev-start` gained a step 0 (preflight) and `/git pr` a pre-step (prepr). PR queries stay **off** the `SessionStart` hot path (network + `gh` auth would violate the never-block invariant, Rule 2). **Stays inside the invariants:** zero-dep script, `gh` optional with a graceful skip, PR discovery only in explicit opt-in commands (never the hot path), and the script is read-only โ€” it never creates, edits, or merges a PR. *Deferred (โ†’ ADR-0047):* `glab` (GitLab) parity, a PR line in `/git status` (bucket A โ€” implement), a `--watch` checks poll, and a latency cache. ## Token economy: the digest layer (ADR-0027) โ€” โœ… SHIPPED (PR #41, 2026-06-03) The kit already **measures** token usage (`/token-report` + `token-report.mjs`, roadmap #7) but has no **reducer**. A measurement pass over the 65 command files + the boot hook found the lever: in this kit, tokens are spent when *the AI reads files and reasons*, and a cluster of high-value commands still makes the AI ingest **full raw markdown** instead of a pre-digested view โ€” the biggest single-run cost being the periodic L5/L6 commands that read the **last ~10 session logs raw** (~1,100โ€“1,300 lines โ‰ˆ 13โ€“16K tokens *before reasoning*), and the highest-frequency cost being the **boot banner injecting 60 raw lines of the last session every session**. ADR-0027 adds a deterministic, zero-dep **digest layer** that pre-digests the two artifacts the AI re-reads most (session logs, ADRs) so it reasons over compact output. Estimated **~120โ€“200K input tokens/week** saved on an active project (full per-command estimate + assumptions in [docs/token-economy-plan.md](token-economy-plan.md)). - โœ… **Shared single-source extractor** (ADR-0027 ยง1). Shipped as `runtime/hooks/md-extract.mjs` (generic markdown primitives) + `runtime/hooks/session-digest-core.mjs` (session parser) โ€” flat-module convention, not a `lib/digest/` dir; reused by the boot hook and the script wrappers, no duplicated parsing (Rule 4). - โœ… **`session-digest.mjs`** (ADR-0027 ยง2). Session logs โ†’ ~6-line structured digest (`--last N` / `--id` / `--json`). Rewired `/distill-sessions` + `/retro` to read digests, not raw logs โ€” the **biggest single-run wins**. - โœ… **`adr-digest.mjs`** (ADR-0027 ยง3). An ADR catalog (status ยท title ยท one-line decision) + `--search`. Replaces "read 3โ€“5 ADRs to find the relevant one" with "read the catalog, open at most one". Wired into `/ship`, `/new-adr` (dup-decision check), `/deep-analysis`. - โœ… **`context-pack.mjs`** (ADR-0027 ยง4). One bounded "start of work" bundle (latest-session digest + `[Unreleased]` + immutable rules + open backlog + recent ADRs) that collapses the 3โ€“5 sequential reads in `/dev-start`, `/state`, `/ship` into **one script call** โ€” fewer tokens *and* fewer round-trips. - โœ… **Boot hook digest** (ADR-0027 ยง5). `session-start.mjs` emits a ~6-line digest for "Last registered session" instead of 60 raw lines, **falling back to the raw-truncated output on any parse miss** (Rule 2/8 โ€” degrade, never break). The highest-frequency saving (every session). **Stays inside the invariants:** no new hook and no new dependency (plain scripts + one shared pure module); digests are **deterministic extraction**, not AI-written summaries (reproducible, free); a digest miss is a **visible raw-fallback**, never a silent drop; each slice ships with a `selfcheck`/`integration-test` assertion and is re-measured against `/token-report`. *Deferred:* per-command/agent token attribution in `/token-report` โ€” now owned by **ADR-0044** (autonomy F3, task 113); the DevPipeline board digest for `/pipeline` and an mtime-keyed digest cache โ†’ **ADR-0047**. ## Proactive Advisor: a six-lane improvement engine (ADR-0028) โ€” โœ… SHIPPED The kit ships strong *individual* analysis surfaces but no single capability that reads the project and โ€” **proactively, not reactively** โ€” surfaces improvement suggestions **before and after each change**, classified into six lanes: **architecture ยท features ยท deepen existing ยท security ยท UX ยท growth/retention**. A sweep mapped each: architecture and security are strong and (for security) already proactive; features and UX exist but are reactive; **two lanes have no owner at all** โ€” *deepen existing features* and *growth/retention* (only acquisition exists, via `seo-specialist`). And nothing **aggregates** the lanes or fires automatically at the end of a change. - โœ… **Core shipped** โ€” `/advise`, the aggregator. Two modes (`--before ` = opportunities + risks; `--after` = improvements, scoped to the changed surface), an optional `--lane ` filter. It **delegates** to the owning agent per lane (`/analyze-code-ia-practices`, `/deep-analysis`, `/roadmap`, the design-team) โ€” it never re-implements them โ€” and feeds every surviving finding into the DevPipeline backlog tagged `advise:`. It does not edit code. - โœ… **Single-sourced taxonomy** โ€” `advisor.lanes` in config maps each lane to an `owner`; `deepen` and `growth` ship as declared `owner: null` **seams** that `/advise` surfaces as *skipped โ€” no owner*, never faked (rule 8/9). - โœ… **Proactive trigger** โ€” the Stop hook (`check-registration.mjs`) nudges `/advise` after a productive session (โ‰ฅ 2 important paths touched, debounced 24h, config-gated) โ€” the "after each implementation" moment. Mirrors the `securityMode` posture: active by default, silent otherwise, never blocks (Rule 2). **Stays inside the invariants:** the expensive fan-out lives in the command, never the hot path; the Stop nudge is cheap, zero-dep, debounced, exit-0-on-error; the lane โ†’ owner map is config-driven; unowned lanes degrade to a visible skip. - โœ… **Both seams filled โ€” 6/6 lanes owned.** The **`growth-team`** squad shipped (`growth` lead + `retention`, with the shared `seo-specialist` for acquisition) wired to `growth.owner`; and **`deepen`** got an owner โ€” the `product-owner` **depth lens** (maturing existing features, distinct from greenfield `features`). *Deferred (โ†’ ADR-0047):* a `--since ` diff scope for `--after`; optional two-tier briefings for `growth` / `retention`. *Since shipped:* recurring advisor findings now feed `/retro` + `/tune-agents` โ€” `advise-review.mjs` reports per-lane hit-rate (ticket 070, PR #50). ## Behavioral discipline layer (ADR-0029) โ€” โœ… SHIPPED A review of the MIT-licensed [`andrej-karpathy-skills`](https://github.com/multica-ai/andrej-karpathy-skills) repo surfaced a clean asymmetry: the kit is strong on the **structural** layer (*what good code looks like* โ€” `best-practices.md`, the constitution) and on governance, but **thin on the *behavioral* layer** (*how the agent acts while producing the diff*). Karpathy's four principles mapped to: โœ… *Simplicity first* (had it, ยง9), ๐ŸŸก *Surgical changes* (only inside `/dev-start`), ๐ŸŸก *Goal-driven* (tools but no rule), โŒ *Think before coding* (**no rule told the agent to surface assumptions and ask when ambiguous before coding** โ€” the biggest gap). - โœ… **`behaviors.md` + `behaviors-examples.md`** โ€” the behavioral sibling of `best-practices.md`: the four guidelines (think-before-coding ยท simplicity ยท surgical ยท goal-driven) with Do/Don't + a "Fits the kit" map, plus before/after diffs of each anti-pattern. Credits the MIT source. - โœ… **Constitution ยง8** (`CLAUDE.md.tpl`) โ€” a concise behavioral-discipline section pointing to `behaviors.md`; **reconciles the one tension** (refactor by responsibility is *deliberate* via `/dev-start` / `/analyze-code-ia-practices`, never an opportunistic side effect). - โœ… **`behaviors.active` (default ON)** + a ~4-line boot reminder mirroring the best-practices block; installer seeds both docs; `selfcheck` asserts each piece. **Stays inside the invariants:** the boot reminder never blocks (Rule 2), the guidance is single-sourced in one doc referenced by the constitution + boot (Rule 4), and we adopted the proven four rather than inventing a fifth (Rule 9). *Noticed, since fixed:* `best-practices.md` links to `review-protocol.md`, which the installer's seed list didn't copy โ€” `review-protocol.md` now ships in `templates/contextkit/` and is in the engine installer's seed list (selfcheck asserts it). ## Next โ€” Autonomy dial: a consent axis orthogonal to levels (ADR-0041) Six same-day deliberations (4 Gemini rounds + 2 full-staff Claude rounds, 8 voices each โ€” see `contextkit/memory/deliberations/2026-06-10-06-โ€ฆ`) converged: the kit gains a user-configurable **autonomy degree** โ€” `autonomy.grade` (1 manual ยท 2 suggest+supervise *default* ยท 3 auto-except-ADR ยท 4 full-auto, experimental) โ€” as a **consent axis orthogonal to L1โ€“L7** (levels = what the kit *can* do; grade = what it *may* do without asking). One pure `resolveAutonomy(area)` resolver is the only read path; **hooks stay grade-blind** (only commands + `/ship` checkpoints consult it); a **non-negotiable floor in code** (secrets path-class, no force-push, no gate self-edit, ADR-class always human, grade escalation always human) that config can never lower. Five phases, hard gates between them: The implementation sequence โ€” ADR-0041 is the umbrella; each phase's *new* technical contracts live in a delta-only continuation ADR (nothing repeated): | # | Etapa | ADR | Motivo (why now) | Impacto | Tasks | Status | | --- | --- | --- | --- | --- | --- | --- | | 1 | **F0 โ€” Trust floor & hygiene** | 0041 (direct โ€” fully specified there) | No autonomy can be built on a bypassable gate; the accidental level-4 bypass proved the failure *pattern* | Kills the "silently-disabled gate" regression class (wiring-drift selfcheck = highest-leverage control of the package); secrets path-class exists before any grade can act; selfcheck regains constitutional room | 100โ€“105 (P0/P1) | ๐Ÿ“‹ | | 2 | **F1 โ€” Dial core (grades 1โ€“3)** | **0042** โ€” resolver contract, precedence (flag > session > config, floor clamps last), floor API, derived surfaces | The user's single trust decision is today shattered across โ‰ฅ5 surfaces; displayed grade must โ‰ก enforced grade by construction | One stable function for every consumer; `--session` makes trying a grade cheap and reversible; repeated re-consent disappears | 106โ€“110 (P2) | ๐Ÿ“‹ | | 3 | **F2 โ€” Observable substrate** | **0043** โ€” state.json schema v1 + transition-legality matrix (accepts ADR-0015 ยงC) | Grade โ‰ฅ3 without an append-only trail = "the LLM narrating its own success"; auto-moves were rejected precisely for lacking it | Every autonomous action gets a record, an inverse (undo) and a metric; the board becomes an observable state machine; feeds the grade-4 bar | 110 (legality), 111โ€“112 (P3) | ๐Ÿ“‹ | | 4 | **F3 โ€” Token fan-out economy** | **0044** โ€” subagent pack contract, per-command attribution (first), budget-gate semantics, deterministic memory retriever | Fan-outs are the dominant cost post-ADR-0027 (~25โ€“80K/deliberation); the grade-4 budget gate cannot govern spend nobody attributes | ~25โ€“80K saved per multi-agent run + ~10โ€“20K/wk at boot; more context headroom in every session; the distill-memory seam is re-owned without inflation risk | 113โ€“115 (P2/P3) | ๐Ÿ“‹ | | 5 | **F4 โ€” Grade 4, experimental** | **0045** โ€” eligibility bar (numbers), hardened quorum (deterministic voice + security veto), session-scoping, kill-switch | Inter-AI review only becomes a *control* with deterministic anchors and a veto; eligibility must be measured, not vibed | Delivers "full-auto with rigorous inter-AI control" without inverting advisory-by-default; one-line kill-switch; branch-only blast radius | 116 (P3) | ๐Ÿ“‹ | **Refused, on record:** ambient idle-escalation (absence โ‰  consent), AI-written self-registering plugins, Wasm sandboxing / genetic swarms / local LoRA (deliberation 03 โ€” violate the zero-dep and filesystem invariants). **Stays inside the invariants:** the dial never changes hook behavior (rule 2); the floor is code, not config (rule 8); every phase ships merge-blocking tests (rule 3); grades 1โ€“3 are plumbing of existing flags through one single-sourced resolver (rule 4); grade 4 is opt-in, experimental, and reversible. ## Next โ€” Deferred-items consolidation (ADR-0047) ๐Ÿ“‹ A roadmap verification pass (2026-06-10) confirmed every section above as shipped and gathered the surviving *Deferred:* footnotes (from ADR-0002 / 0005 / 0006 / 0007 / 0026 / 0027 / 0028) into one decision โ€” **ADR-0047** โ€” explicitly **excluding** everything the autonomy package (ADR-0041โ€ฆ0045, tasks 100โ€“116) already owns. Three buckets: - ๐Ÿ“‹ **Bucket A โ€” implement now** (each has a real consumer today): a PR line in `/git status` (facts already computed by `sync-check.mjs`) ยท `/advise --after --since ` diff scope ยท a DevPipeline **board digest** (extends the ADR-0027 layer for `/pipeline` + `/plan-week`) ยท **scheduled alert-sync** (opt-in cron in the scaffolded `security.yml`, runs in the *project's* CI) ยท **registry-backed staleness** in `/deps-audit` (opt-in `--registry`; no network โ‡’ skipped, never a pass). - **Bucket B โ€” stays deferred, trigger named:** `glab` parity ยท `--watch` poll + latency cache ยท fleet `--apply`/remote repos ยท tuning closed-loop (after ADR-0044/F3 attribution) ยท PR-review bot (Claude in CI) ยท preset library (demand-driven) ยท visual baselines (first committed baseline) ยท growth/retention briefings. - **Bucket C โ€” refused, on record:** hosted visual-diff service, browsers in the kit's own CI, required-check enforcement as kit code (it's branch protection). ## Conversion squad + deterministic LP scaffold (ADR-0050) โ€” โœ… SHIPPED (PR #69, 2026-06-11) The landing surface (ADR-0023/0025) gained its missing four fronts in one package โ€” strategy, legal, instrumentation and token economy: - โœ… **`lp-scaffold.mjs` + `lp-build.mjs` + `starters/landing/`** โ€” componentized source (one fold per file; `content/copy.json` is the AI's only editing surface) assembled into one atomic indexable `dist/`; `--check` refuses leftover tokens/sentinels and runs `seo-audit` + `aiso-audit` on the output (born green, asserted in CI). Resolves ADR-0023's deferred starter without inventing domain content. *Estimated ~30โ€“60K โ†’ ~5โ€“10K tokens per LP; baseline to be measured via `/token-report` on the first real page.* - โœ… **LGPD by default** โ€” consent banner ON (Consent Mode default-denied), GTM direct but ID-less (inert, consent-gated), pixels as commented consent-wrapped models only, privacy policy + terms generated as drafts (non-removable lawyer-review disclaimer), webhook-decoupled lead forms. - โœ… **`conversion-strategist` + `tracking-integrator`** (design-team) โ€” interview- first neurodesign (refuses invented proof) and consent-first instrumentation (pairs with `privacy-lgpd`); lean agents + tier-2 briefings. - โœ… **Playbook + `/landing-page` v2** โ€” fold-anatomy menu, neurodesign verify-don't-vibe table, legal & consent defaults, deterministic path. *Refused, on record: fixed Next/React stack mandate, parallel 150-line cap, 7-fold minimum, example social proof, auto-wired pixels.* *Deferred:* EN-language legal templates (pt-BR/LGPD market first, rule 9); framework-variant (Astro/Next) scaffold parity; measuring the real token delta. ## Next โ€” Agent swarm + cost-tiered model routing (ADR-0051 + ADR-0052) Two linked decisions (session 52, 2026-06-11) ride on the completed autonomy substrate (F0โ€“F4, v2.0.0). Full analyses: `docs/explanation/swarm-feasibility-study.md` and `docs/explanation/model-tier-routing-study.md`. **ADR-0052 โ€” cost-tiered model routing** (Accepted; Phase 1 โœ… shipped, session 52): expensive models think, cheap models execute. All 34 agents declare a `model:` tier alias (7 opus ยท 21 sonnet ยท 4 haiku ยท 2 inherit, selfcheck-pinned); `/ship`/`/advise`/`/debate`/`/scaffold-tests` classify think-vs-execute at dispatch with floors (security โ‰ฅ sonnet), one-step QA escalation and budget de-escalation; capability-matrix refreshed to 2026-06 prices (+ Fable 5). Deterministic-first โ€” no LLM-judge per dispatch; agy host gap on record. | # | Etapa | ADR | Motivo (why now) | Impacto | Tasks | Status | | --- | --- | --- | --- | --- | --- | --- | | 1 | **Routing Phase 1 โ€” frontmatter tiers + dispatch rules + matrix refresh** | 0052 | Every subagent burned session-model (premium) tokens; Claude Code enforces `model:` natively, zero engine code | Execution fan-outs ~5โ€“10ร— cheaper per call (Haiku $0.275 vs Fable $2.75 on a typical /ship fan-out); cache-safe by construction | โ€” | โœ… session 52 | | 2 | **Routing Phase 2 โ€” deterministic resolver + per-model attribution** | 0052 (follow-up) | The โ‰ฅ60%-savings acceptance needs `byModel` data on top of D3; floors/escalation deserve code, not only skill text | `routing-policy.json` + `model-policy.mjs` (reuses router `loadMatrix`) + `selfcheck-model-policy` + `byModel` in token attribution | 139 | ๐Ÿ“‹ | | 3 | **Swarm P0 โ€” zero-code validation run** | 0051 ยง9 | One afternoon answers the two killer questions (cost/task, conflict rate) before any engine code | Baseline measured: 52โ€“61K tok/task, 0/2 conflicts, sonnet+haiku tiers; the test-file-spillover finding now drives the planner | 123 (precondition) | โœ… session 52/56 | | 4 | **Swarm v1 โ€” grade-3 coordinator** | **0051** | The substrate is complete; what's missing is the coordinator: nothing pulls N backlog tasks into parallel governed workstreams | `/swarm` + pure `swarm-plan.mjs` + `swarm-state.mjs` manifest; `swarm-dispatch` area; `by` field on events; worktree-per-workstream; **finishes at `testing`, never `done`** โ€” `/swarm review` batches human approval; 23-check integration test | 123 | โœ… session 52/56 | | 5 | **Swarm v2 โ€” grade-4** | 0051 ยง8 | Only after the ADR-0045 bar **plus โ‰ฅ3 clean grade-3 swarm runs** (eligibility data accumulates passively โ€” the real bottleneck) | Per-workstream hardened quorum, branch-only auto-push; permanently-grade-3 is an acceptable outcome | post-123 | ๐Ÿ“‹ | **Refused / deferred, on record (ADR-0052 ยง7):** LLM-judge per dispatch (costs what it saves + frozen quality opinions, ADR-0012 ยง5) ยท cross-CLI delegation (spend invisible to `/token-report` โ‡’ unmeasurable savings) ยท invented agy Gemini mapping (host exposes no per-dispatch model API โ€” rule 8). ## Design invariants (don't regress these) - **Zero runtime deps on the hot path.** Levels 1โ€“3 run with nothing installed. - **Hooks never break real work.** Exit 0 on error; silent unless they must speak. - **Everything is plain files in the repo.** Inspectable, reversible, no lock-in. - **Config-driven, not hardcoded.** Stack specifics live in `contextkit/config.json`. - **Advisory by default; enforce by choice.** Gates inform; you opt into blocking.