# Architecture

The repository is organised around one task-agnostic core and several task packages that plug into it via a single contract.

## agent_core

`agent_core/` provides everything that does not depend on a specific training recipe.

- `harness/` runs the closed loop. Blackboard, event log, result tracker, baseline audit, and trial dispatch all live here.
- `agents/` wraps the Claude Agent SDK into a specialist session. It owns the per-iteration system prompt assembly, the tool-bind protocol, and the lifecycle hooks.
- `tools/` defines task-shared MCP tools such as `syntax_check`, `param_count`, `diff_snapshots`, `read_snapshot`, and `rebase_to`.
- `supervisor/` is the top-level controller. It iterates trial slots, dispatches them to specialist sessions, persists state, and shuts down cleanly on signals.

The core does not know that a task is Parameter Golf or CIFAR. It reads everything task-specific through the `TaskAdapter` interface declared in `agent_core/task_adapter.py`.

## Task packages

Each task package implements one `TaskAdapter` subclass plus the task-side artefacts the harness needs:

- `task_config.py` defines the adapter and registers it on import.
- `train_gpt.py` (PG) or `airbench96.py` (CIFAR) or `experiment.py` (NC) is the editable recipe.
- `run_trial.sh` is the shell entry the harness invokes for each trial.
- `swarm_config.json` carries per-specialist model assignment.
- `knowledge/` carries static markdown documents pinned at the top of every system prompt.
- `agents/` declares per-domain specialist preambles plus the `prompts.py` assembler.
- `tools/` carries task-specific tools such as `pack_submission` for PG.

The variant packages `single_agent_pg/` and `multi_agent_generic_pg/` are peers of `multi_agent_pg/`. They reuse PG's editable recipe, run script, and knowledge tree by symlink, and override only the adapter to change the specialist roster.

## Closed-loop flow

A submitted trial goes through five steps.

1. The supervisor selects a domain that needs work and starts a Claude Agent SDK session with the corresponding specialist preamble.
2. The agent reads the rendered lineage view, picks a hypothesis, edits its workdir copy of the editable recipe, and calls `submit_trial`.
3. `submit_trial` runs local checks (syntax, projected packed size on PG, recipe shape on CIFAR), then dispatches the trial by invoking `bash run_trial.sh` as a subprocess on the local GPU node.
4. The harness collects the per-trial log, parses the score and status with the task's `run_classify`, and appends a row to `results.tsv` plus an event to `events.jsonl`.
5. The next session reads a freshly rendered lineage view that includes this row and refines its proposal.

External evaluators own the score, the legality checks, and the timing source. The recipe cannot rewrite them.

## Where to look next

- `docs/task_adapter.md` is the property-by-property contract every task package must implement.