# Plan: The Dark Factory

## Vision

TKO should reach a state where the repository is essentially maintained by AI
agents. Not as a novelty, but because the infrastructure — tests, CI, docs,
verified behaviors — is robust enough that agents can work autonomously and
humans can trust the output.

The term "dark factory" comes from manufacturing: a facility that runs without
lights because the robots don't need to see. In software, it means agents
handle the implementation — writing code, fixing bugs, updating dependencies,
writing docs — while humans focus on direction, design, and validation.

Simon Willison [describes five levels](https://www.alldevblogs.com/article/simon-willison/the-five-levels-from-spicy-autocomplete-to-the-dark-factory)
of AI-assisted programming, from "spicy autocomplete" (Level 0) to the fully
autonomous "dark software factory" (Level 5). StrongDM's AI team operates at
Level 5: ["Code must not be written by humans. Code must not be reviewed by
humans."](https://simonwillison.net/2026/Feb/7/software-factory/) Engineers
design specs, curate test scenarios, and watch scores. Agents do everything
else.

Willison's key observation: engineers shift from **building code** to
**building the systems that build the code**. The critical unsolved question
is how agents prove their code works without human review. The answer, for
StrongDM and for TKO, is tests — not as a checkbox, but as the primary
artifact that defines correctness.

TKO is currently between Level 3 and Level 4. Agents (Claude Code, Copilot)
do most of the implementation. Humans review PRs, set priorities, and make
architectural decisions. The goal is to push toward Level 5 where practical,
while being honest about where human judgment is still required.

## References

- [The Five Levels: from Spicy Autocomplete to the Dark Factory](https://www.alldevblogs.com/article/simon-willison/the-five-levels-from-spicy-autocomplete-to-the-dark-factory) — Simon Willison
- [StrongDM's Software Factory](https://simonwillison.net/2026/Feb/7/software-factory/) — Willison's writeup
- [An AI State of the Union](https://www.lennysnewsletter.com/p/an-ai-state-of-the-union) — Willison on the inflection point
- [Built by Agents, Tested by Agents, Trusted by Whom?](https://law.stanford.edu/2026/02/08/built-by-agents-tested-by-agents-trusted-by-whom/) — Stanford CodeX

## What makes this possible

The tooling modernization (Phases 1–6) was the foundation:

- **Verified behaviors** — `verified-behaviors.json` files are test-backed
  contracts that AI agents can check their work against
- **SOUL.md** — the philosophical foundation of Knockout, so agents
  understand *why* the framework works the way it does
- **AGENTS.md** — instructions that any AI coding tool can follow
- **llms.txt** — concise project context for LLM consumption
- **`bun run verify`** — single command to confirm nothing is broken
  (biome + tsc + build + verify:esm + vitest)
- **`bun run knip`** — detect dead code, unused deps
- **Changesets** — structured release management
- **CI safety net** — lint, typecheck, test, ESM verification on every PR
- **Branch protection** — all changes go through PRs
- **plans/** — documented intent so agents understand context

## What's not there yet

| Gap | What's needed |
|-----|---------------|
| **Dependency updates** | Renovate/Dependabot with 48h minimumReleaseAge |
| **Bundle size tracking** | CI check comparing browser.min.js against main |
| **Benchmarks** | vitest bench for observable/computed hot paths |
| **Coverage confidence** | Know which code paths are covered, which aren't |
| **Autonomous PR review** | AI reviewer that checks against verified behaviors |
| **Release automation** | Tag + release triggered by changeset merge, not manual |
| **Copilot/Cursor support** | `.github/copilot-instructions.md` extending AGENTS.md |
| **Issue triage** | AI can read an issue, reproduce it, propose a fix |
| **Scenario testing** | StrongDM-style holdout scenarios for end-to-end validation |

## Principles

1. **Tests are the source of truth.** If it's not tested, it doesn't exist.
   AI agents should never make changes they can't verify. Tests are not a
   checkbox — they are the primary artifact that defines correctness.

2. **Safety by default.** The CI pipeline should catch any regression an AI
   introduces. `bun run verify` must pass before any commit.

3. **Intent over implementation.** Plans and SOUL.md describe *why* things
   work the way they do. Code describes *what*. AI can change the what if
   it understands the why.

4. **Small, reviewable changes.** One concern per PR. A human should be able
   to review any AI-generated PR in under 5 minutes.

5. **No magic.** Every tool, script, and CI step should be understandable
   by reading the code. No hidden state, no implicit dependencies.

6. **Earn trust incrementally.** Start with low-risk automation (deps, docs,
   formatting). Expand to bug fixes and features as confidence grows. The
   level of autonomy an agent gets should match the level of safety net
   around it.