# Why ContextDevKit: the engineering case

_The rationale behind treating AI-assisted development as engineering — who this is
for, what problem it actually solves, and what you give up to get it._

## The audience-conflation problem

Most AI coding tools are designed for one of two audiences:

- **Explorers** who want fast results and don't mind throwaway context. A blank chat
  window and a capable model are enough.
- **Engineers** who need results that compound — decisions that are traceable,
  sessions that hand off cleanly, work that a future agent (or colleague) can resume
  without a briefing.

The problem is that the same tool is sold to both. An "explorer" feature (no session
state, no persistent memory, no governance) is a liability for an engineer. An
"engineer" feature (enforced workflows, drift detection, decision records) feels like
overhead to an explorer. Most tools pick a lane by pretending the other audience
doesn't exist.

ContextDevKit picks the engineering lane explicitly. It is designed for developers
who are using AI for work that matters — work that ships, work that is maintained,
work where a bad decision made in session 3 still has consequences in session 47.
If you want fast throwaway results, this kit is genuine overhead and you should not
install it. If you want AI-assisted development to behave like engineering, read on.

## The core thesis: enforcement beats instruction

The canonical way to control AI behavior is to add instructions to `CLAUDE.md` (or
the equivalent). "Always write tests." "Always record decisions in ADRs." "Never
commit without running the quality gates." These instructions work until they don't,
which is whenever the model is under time pressure, the context is long, or a
simpler path is available. The more capable the model, the more persuasively it
argues for the shortcut.

ContextDevKit's thesis is that **the right unit of enforcement is a hook, not an
instruction**. A hook is code that runs deterministically. It does not reason, it
does not weigh trade-offs, it does not get tired. The Stop hook either sees a
registered session or it doesn't. The pre-push gate either sees the quality checks
pass or it exits 1. The `advance` engine either finds the required deliverable in the right
place or it names the gap and refuses.

The implication is significant: the kit's governance does not depend on the quality
of any particular model. A capable model and a weak model are held to the same bar
by the same code. Governance that lives in a prompt is a guideline; governance that
lives in a hook is a constraint. ContextDevKit builds constraints.
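
As a minimal sketch of what a hook-style constraint looks like, the function below mimics the `advance` check described above: it either finds the required deliverable or names the gap and returns a non-zero exit code. The names `GateResult` and `checkDeliverable` and the example paths are hypothetical and are not taken from the kit's source.

```typescript
// Hypothetical sketch: not ContextDevKit's actual engine code.
interface GateResult {
  exitCode: 0 | 1;
  message: string;
}

// Deterministic check: the same inputs always produce the same verdict.
function checkDeliverable(
  requiredPath: string,
  existingPaths: readonly string[],
): GateResult {
  const found = existingPaths.some((path) => path === requiredPath);
  if (found) {
    return { exitCode: 0, message: `ok: found ${requiredPath}` };
  }
  // Name the gap and refuse, rather than weigh whether it matters.
  return { exitCode: 1, message: `refused: missing ${requiredPath}` };
}

// Illustrative paths only.
const present = ["docs/spec.md"];
const passed = checkDeliverable("docs/spec.md", present);
const refused = checkDeliverable("docs/qa-signoff.md", present);
```

Nothing in `checkDeliverable` can be persuaded: there is no branch that reads a justification, so there is no justification to write.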

## What "durable memory" actually means

The phrase "project memory" is used loosely in AI tooling to mean anything from
conversation history to vector database embeddings. ContextDevKit's memory is
deliberately narrow and deliberately plain-text:

- **ADRs** record *why* a decision was made — the forces considered, the
  alternatives rejected, the trade-offs accepted. Not what was built; why it was
  built that way. An ADR that says "we chose PostgreSQL" is useless. An ADR that
  says "we chose PostgreSQL over SQLite because the project has three concurrent
  writers and we need row-level locking" is recoverable context six months later.
- **Session logs** record *what* happened in each working session — files changed,
  decisions made, tasks advanced. Not a transcript; a structured record that the
  next session can diff against the current state.
- **The glossary** records the mapping between UI language and code identifiers.
  When a product term and a code term diverge, bugs follow. The glossary is the
  single source of truth for that mapping.

All of this lives in your repository, under version control, in Markdown. No
external service, no API key, no database. It is readable by a human, diffable by
git, and loadable by any AI session that can read files. The durability is not a
feature of the storage system — it is a consequence of choosing the right format.
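
To illustrate what "a structured record that the next session can diff against the current state" might look like, here is a small sketch. The `SessionLog` field names, the `missingSince` helper, and the file paths are assumptions made for illustration; the kit's actual log format is Markdown and its schema is not shown in this document.

```typescript
// Hypothetical shape: field names are illustrative, not the kit's schema.
interface SessionLog {
  session: number;
  filesChanged: string[];
  decisions: string[];
  tasksAdvanced: string[];
}

// Report files the logged session recorded as changed that are absent now,
// a cheap signal that the repository moved since the log was written.
function missingSince(
  log: SessionLog,
  currentFiles: readonly string[],
): string[] {
  return log.filesChanged.filter(
    (file) => !currentFiles.some((current) => current === file),
  );
}

const lastSession: SessionLog = {
  session: 3,
  filesChanged: ["src/db/pool.ts", "docs/adr/0004-postgresql.md"],
  decisions: ["Use PostgreSQL for row-level locking"],
  tasksAdvanced: ["storage-layer"],
};

const drift = missingSince(lastSession, ["src/db/pool.ts", "README.md"]);
// drift contains only "docs/adr/0004-postgresql.md"
```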

## Why the level system

The seven levels exist because governance has a cost, and that cost should match the
stage of the project.

A greenfield experiment in week one does not need an L5 mutation guard or an
enforced workflow journey. Adding those constraints too early kills momentum without
yielding the return they are designed for. The kit installs at L3 for an empty
folder — enough to have memory, track sessions, and prevent the common failure mode
of "we started but nothing is recorded."

An existing production codebase with multiple contributors working in parallel needs
the full stack: branch-scoped workflow guards, pre-commit compliance auditing, the
deliberation council before architectural decisions, and cost-tiered model routing
to keep the token economy sustainable. That is L6/L7.

The level system lets you start where you are and climb as the project's needs
mature. Climbing adds capability; descending removes now-unnecessary constraints
without losing the memory that was already accumulated.

## The autonomy dial and its floor

The autonomy dial (`autonomy.grade` 1–4) answers a different question from the
level system. Levels control *what capabilities are active*. The autonomy grade
controls *how much the AI may do without asking* at whatever level is active.

Grade 2 (the default) is the engineering-conservative posture: the AI suggests,
explains its reasoning, and waits for confirmation before mutating state. Grade 3
is appropriate for mature projects where the AI has a track record — it auto-executes
most actions but defers the irreversible ones (ADR writes, force-push, grade changes)
to a human quorum. Grade 4 is full-auto with a deliberation quorum at each gate,
designed for supervised batch work.

The floor is non-negotiable and encoded in the engine, not in a prompt. At every
grade:

- Secrets are never auto-committed.
- Force-push to the default branch is always blocked.
- ADR writes always require a human signature.
- Gate and hook self-edits (the governance machinery editing itself) are always blocked.
- The grade itself cannot be changed by the AI.

These are not guidelines. They are conditions the `resolveAutonomy()` function checks
before returning a permitted state. The AI cannot argue past them because they are
not in the argument layer.
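
The idea can be sketched as a lookup that runs before the grade is consulted. This is an assumption-laden illustration, not the kit's code: the real `resolveAutonomy()` signature is not shown in this document, and the `Action` and `Verdict` names are hypothetical. Grade 1 is not described above, so the sketch assumes it behaves like grade 2, and grade 4's deliberation quorum is not modeled.

```typescript
// Hypothetical sketch: the real resolveAutonomy() signature is not documented here.
type Grade = 1 | 2 | 3 | 4;
type Action =
  | "read"
  | "edit-file"
  | "commit-secret"
  | "force-push-default"
  | "write-adr"
  | "edit-hook"
  | "change-grade";
type Verdict = "auto" | "confirm" | "human-only" | "blocked";

// The floor: checked before the grade is looked at.
const FLOOR: Partial<Record<Action, Verdict>> = {
  "commit-secret": "human-only", // never auto-committed
  "force-push-default": "blocked",
  "write-adr": "human-only", // needs a human signature
  "edit-hook": "blocked", // governance may not edit itself
  "change-grade": "human-only", // the AI cannot move its own dial
};

function resolveAutonomy(grade: Grade, action: Action): Verdict {
  const floor = FLOOR[action];
  if (floor !== undefined) return floor; // the grade is irrelevant here
  if (action === "read") return "auto";
  // Assumption: grades 1 and 2 both wait for confirmation before mutating.
  return grade >= 3 ? "auto" : "confirm";
}
```

With this shape, `resolveAutonomy(4, "edit-hook")` returns `"blocked"` just as `resolveAutonomy(2, "edit-hook")` does, because the floor entry is returned before the grade is read.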

## The token economy rationale

Token cost in AI-assisted development compounds in ways that are not obvious until
the total is large. A session that loads all eight squad playbooks at boot because the
developer might need any of them is spending tokens on context that is almost never
used. A session that spawns every subagent at the premium model tier because that is
the default is paying a 5–10× premium on work that does not require premium capacity.

ContextDevKit approaches this as an engineering problem, not a product feature. The
squad director computes at boot which squads the current diff actually implicates,
and loads only those playbooks — subtraction, not addition. Cost-tiered model routing
assigns the reasoning tier to work that genuinely requires it (architecture,
security, privacy) and the fast tier to work that does not (scaffolding, packaging,
read-only exploration). The economy runtime measures spend per command and per agent
and makes it visible on the Execution Contract after each run.
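
A rough sketch of both mechanisms follows. The work kinds and their tiers come from the paragraph above; everything else is an assumption for illustration, including the squad names, the idea that squads are matched by path prefix, and the `implicatedSquads` helper itself.

```typescript
// Hypothetical sketch: squad names and path prefixes are illustrative.
type Tier = "reasoning" | "fast";
type WorkKind =
  | "architecture"
  | "security"
  | "privacy"
  | "scaffolding"
  | "packaging"
  | "read-only-exploration";

// Tier assignment as described in the text above.
const TIER_BY_KIND: Record<WorkKind, Tier> = {
  architecture: "reasoning",
  security: "reasoning",
  privacy: "reasoning",
  scaffolding: "fast",
  packaging: "fast",
  "read-only-exploration": "fast",
};

interface Squad {
  name: string;
  pathPrefixes: string[];
}

// Keep only the squads whose paths the current diff touches.
// Assumption: a squad is implicated when a changed path starts with one of its prefixes.
function implicatedSquads(
  changedPaths: readonly string[],
  squads: readonly Squad[],
): string[] {
  return squads
    .filter((squad) =>
      squad.pathPrefixes.some((prefix) =>
        changedPaths.some((path) => path.indexOf(prefix) === 0),
      ),
    )
    .map((squad) => squad.name);
}

const squads: Squad[] = [
  { name: "security", pathPrefixes: ["src/auth/"] },
  { name: "frontend", pathPrefixes: ["src/ui/"] },
  { name: "data", pathPrefixes: ["src/db/", "migrations/"] },
];

const toLoad = implicatedSquads(["src/db/pool.ts", "README.md"], squads);
// toLoad is ["data"]; the other playbooks stay unloaded.
```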

The goal is not to make AI-assisted development cheap. It is to make the cost
*proportional to the value* — to close the gap between what you pay and what you
get by spending expensive capacity only where it changes the outcome.

## What you give up

Honest accounting requires naming the costs:

- **Setup time.** The kit does not configure itself. `/setupcontextdevkit` does the
  heavy lifting, but you still need to review and tune `config.json`, populate your
  `CLAUDE.md` coding constitution, and mark your high-risk paths. Expect about thirty
  minutes the first time and five minutes on each subsequent project.
- **Governance latency.** The deliberation council adds time to opening a feature
  or recording a decision. The workflow engine adds time to each phase advance when
  deliverables are missing. These delays are the point — they exist because the
  work they gate deserves the delay — but they are delays.
- **Ceremony on small changes.** The kit is optimized for projects with meaningful
  architectural surface area. A one-file script does not need a workflow spec, an ADR,
  and a QA sign-off. The level system mitigates this (stay at L1 or L2 for small
  work), but the kit is built for the high end of that range, and small work uses
  little of what it offers.
- **Learning curve.** The command set is large. The governance model has concepts
  (autonomy grades, deliberation phases, workflow journey gates) that require a
  mental model before they are fluent. The docs exist to build that model, but
  they are docs you need to read.

For projects where AI-assisted development is doing real engineering work — complex
decisions, multiple sessions, multiple contributors, maintained over time — these
costs are small relative to the alternative: context that evaporates between sessions,
decisions that can't be traced, and governance that exists only in a prompt the model
is free to rationalize past.

## See also

- [docs/explanation/workflow-governance.md](workflow-governance.md) — how the
  workflow journey is enforced in the engine, not in a prompt.
- [docs/explanation/deliberation-council.md](deliberation-council.md) — why the
  deliberation council fires automatically at the two moments it matters.
- [docs/explanation/active-squads.md](active-squads.md) — how squads went from
  declared-but-passive to actively routed and governed.
- [docs/LEVELS.md](../LEVELS.md) — what each level adds and when to climb.