# UAP Architecture Overview

`v1.93.1` · 223 TypeScript modules across 18 `src/` subsystems · 170+ test suites

> **🏭 Where this fits:** Whole pipeline. This is the factory-floor map. A bare
> agent walks work from an idea to a shipped change with no stations in between,
> so it forgets context, edits the wrong branch, writes plausible-but-broken
> code, declares "done" on a red build, and repeats the mistake next session.
>
> **What it delivers:** a station at every stage where an agentic workflow
> normally breaks, so your agent's output comes off the line understood,
> isolated, built, verified, coordinated, shipped, and remembered.

Think of UAP as the **software-delivery pipeline** your agent runs on: a
sausage-factory floor where each station guards one stage of turning an
instruction into a merged, working change. The stations are UAP's subsystems.
This document is the floor plan: what each station is, and which break it
prevents. For the end-to-end walkthrough, see the
[delivery pipeline guide](../guides/DELIVERY_PIPELINE.md).

The Universal Agent Protocol (UAP) is a layer that sits **underneath** an AI
coding agent's harness (Claude Code, Factory, Cursor, OpenCode, Codex, and
others). It does not replace the model or the harness. Instead it installs
**hooks** that intercept the harness's tool calls, then mediates each call
through three services (memory injection, policy enforcement, and tool-output
compression) before handing control back. On top of that mediation layer it
ships a rich CLI for memory, delivery, worktrees, tasks, deployment, and
multi-model routing.

This document describes the system architecture. For the normative harness↔UAP
contract, see [PROTOCOL.md](PROTOCOL.md).

---

## The pipeline at a glance

Every station below maps to a stage where a normal agentic workflow breaks, and
to the subsystem that guards it. Read this as the conveyor belt your work rides:

| Stage | What breaks without a station | Station (subsystem) |
|-------|-------------------------------|---------------------|
| **1. Intake**: understand the work | agent forgets past sessions, hallucinates scope | 4-tier **memory**, **reactor** per-prompt injection, DESIGN.md |
| **2. Prep / routing**: pick the approach | wrong model, wrong tactic | pattern router, query-complexity, **multi-model routing**, recipe selection |
| **3. Isolation**: protect the tree | agent edits `master`, clobbers files, collides | **worktree**-per-feature, **file coordination**, delivery gate |
| **4. Build**: write the code | plausible-but-wrong code, stubs, empty/looping local-model output | **deliver** loop, serving-layer recipes, proxy guardrails, local-model handling |
| **5. QC / verify**: prove it works | *the big one:* agent says "done" on broken code and self-grades wrong | **completion gates**, execution/runtime verify, **acceptance judge**, generator≠evaluator, tiered gates |
| **6. Line coordination**: many hands | parallel agents collide, duplicate, deadlock | **coordination DB**, collaboration board + challenge, model-slot concurrency, deploy batching |
| **7. Shipping**: merge & release | regressions, broken CI, skipped version bumps | worktree→PR flow, version/completion gates, CI feedback watcher, never-regress, git-safety |
| **8. Feedback**: learn from it | same mistake every session | **memory promotion**, pattern RL, session analysis |

Cross-cutting through every station: **policy gates** are the executable rules
posted at each station, the **MCP Router** keeps the context lean so the belt
never jams, and UAP runs across **9 harnesses** so the same floor plan applies
wherever your agent works.

---

## The hook-mediation model

A bare agent harness calls a tool (Edit, Write, Bash, a spawned sub-agent, an
MCP tool) and the model sees the raw result. UAP inserts itself between the
harness and the tool by registering hooks at the harness's interception points.
This is how every station gets a chance to inspect the work before it moves
down the line:

- **Claude Code / VSCode / Factory / Cursor**: `PreToolUse` hooks
- **OpenCode**: the `tool.execute.before` plugin hook
- **Codex**: gating via the UAP MCP server (`execute_tool`)
- **Hermes**: the `pre_tool_call` event

The same logical lifecycle runs on every harness:

```
                      ┌──────────────────────────────────────┐
                      │            AGENT HARNESS             │
                      │  (Claude Code / Factory / OpenCode)  │
                      └───────────────┬──────────────────────┘
                                      │ tool call
                                      ▼
           ┌────────────────────── UAP HOOK LAYER ──────────────────────┐
           │                                                            │
session    │  SessionStart hook                                         │
start ─────┼─▶ • inject last-24h memory ( … )                           │
           │   • clean stale agents / work claims                       │
           │                                                            │
per tool   │  PreToolUse / tool.execute.before hook                     │
call ──────┼─▶ ┌───────────┐   ┌────────────┐   ┌────────────────────┐  │
           │   │  MEMORY   │──▶│   POLICY   │──▶│     MCP ROUTER     │  │
           │   │ injection │   │   gates    │   │ token compression  │  │
           │   └───────────┘   └─────┬──────┘   └─────────┬──────────┘  │
           │                         │ deny (exit 2)      │             │
           │                         ▼                    ▼             │
           │                    BLOCK call        compressed result     │
           └──────────────────────────────────────────────┼─────────────┘
                                        │ allow           │
                                        ▼                 ▼
                                  ┌──────────────────────────────┐
                                  │       THE ACTUAL TOOL        │
                                  │  (fs / shell / MCP server)   │
                                  └──────────────────────────────┘
```

Hooks are **fail-open for context** (memory injection never blocks) and
**fail-closed for safety** (a required policy violation returns exit code 2 and
the harness aborts the call). Hook scripts are generated and installed by
`uap hooks install` (`src/cli/hooks.ts`) from templates in `templates/hooks/`.

---

## Component map

```
src/
├── memory/         4-tier memory: working, session, semantic (Qdrant), graph
├── mcp-router/     hierarchical MCP router: tool hiding + FTS5 output compression
├── policies/       hook-based policy gates + 20 Python enforcers
├── delivery/       `uap deliver` convergence loop (15 modules)
├── coordination/   multi-agent registry, overlap detection, deploy batching
├── models/         multi-model routing, planning, execution profiles
├── tasks/          dependency-aware task tracker (SQLite, DAG)
├── dashboard/      live task / agent / memory / policy visualization
├── observability/  HALO / OpenInference span export for harness analysis
├── analyzers/      project structure analysis + metadata generation
├── generators/     CLAUDE.md / config generation
├── benchmarks/     Terminal-Bench harness + scoring
├── browser/        cloaked browser automation for agents
├── telemetry/      run telemetry
├── models/…        (see above)
├── bin/            CLI entry (cli.ts), policy bin, llama-server-optimize
├── cli/            ~35 command modules wired into bin/cli.ts
├── types/          shared types
└── utils/          logging and shared helpers
```

---

## Subsystems (station by station)

### Memory (`src/memory/`): Intake & Feedback stations

**The break it prevents:** at intake, a fresh agent has amnesia: it re-litigates
solved problems and hallucinates scope. At feedback, it drops the lesson it just
learned and repeats the mistake tomorrow. Memory is the station that carries
context onto the belt and carries lessons back off it.
A four-tier memory system that gives the agent persistent context across
sessions. The tiers (`src/memory/README.md`):

| Tier | Backend | Purpose |
|------|---------|---------|
| **L1 Working** | SQLite `memories` table (~50 cap, FTS5) | recent actions |
| **L2 Session** | SQLite `session_memories` table (FTS5) | current-session context, "open loops" |
| **L3 Semantic** | Qdrant, 768-dim vectors | long-term learnings, semantic recall |
| **L4 Knowledge** | SQLite entity/relationship graph | entity relationships, N-hop traversal |

Embeddings (`src/memory/embeddings.ts`) use **`nomic-embed-text-v2-moe`
(768-dim)** via a llama-server `--embeddings` endpoint, with fallbacks down a
chain (Ollama `nomic-embed-text` → OpenAI `text-embedding-3-small` → local
`all-MiniLM-L6-v2` → TF-IDF). The provider is pluggable and cached
(SHA-256-keyed LRU).

Key modules:

- `hierarchical-memory.ts`: in-memory hot/warm/cold tier manager with auto
  promote/demote, time-decay importance, and token-budget enforcement, persisted
  to its own SQLite DB.
- `dynamic-retrieval.ts`: the per-task orchestrator. It classifies the task, sets
  adaptive retrieval depth + token budget, queries all sources, dedups,
  compresses, and formats the final context block.
- `memory-consolidator.ts`: summarizes working entries into session memory,
  extracts lessons, and dedups by content hash + embedding similarity.
- `write-gate.ts`: a quality filter that scores candidate memories and only
  persists those above threshold (prevents memory pollution). This is the
  feedback station's own gate: only lessons that change future behavior get
  promoted.
- `knowledge-graph.ts`: the L4 graph, with entity upserts, relationship
  strengthening, and recursive-CTE traversal.
- `context-compressor.ts` / `semantic-compression.ts`: token budgeting and
  distillation of context into atomic typed facts.
- `predictive-memory.ts` / `speculative-cache.ts`: prefetch likely-needed
  memories before they are queried.
- `task-classifier.ts`: classifies an instruction to drive retrieval hints.
- `model-router.ts`: benchmark-fingerprint LLM routing with feedback learning
  (consumed by `src/models/unified-router.ts`).

### MCP Router (`src/mcp-router/`): cross-cutting, keep the belt from jamming

**The break it prevents:** a context window flooded with tool schemas and raw
tool output stalls every station downstream. The router keeps context lean so
work keeps moving.

A hierarchical Model Context Protocol router that achieves large token savings
by two independent mechanisms (`src/mcp-router/server.ts`,
`output-compressor.ts`):

1. **Tool hiding.** Instead of exposing 150+ downstream MCP tool schemas (~500
   tokens each) to the model, the router exposes just three meta-tools:
   `discover_tools`, `execute_tool`, `deliver`. The model issues a
   natural-language `discover_tools` query, gets back matching tool paths, then
   calls `execute_tool({path, args})`. Downstream tools live in an in-memory
   fuzzy search index and are never surfaced as definitions.
2. **Output compression (FTS5).** `execute_tool` accepts an `intent`. Large tool
   output is chunked, indexed into an in-memory SQLite **FTS5** virtual table,
   queried with the intent using **BM25 ranking**, and only the top matching
   snippets (plus a searchable-vocabulary footer) are returned. Small outputs
   pass through unchanged; very large outputs without an intent are head+tail
   truncated.

The design target documented in source is ~75,000 tokens of tool definitions
collapsed to ~700 (98%+). Per-output FTS5 savings are computed live per call.
See [../integrations/MCP_ROUTER.md](../integrations/MCP_ROUTER.md) for setup.

### Policies (`src/policies/`): cross-cutting, the rules posted at every station

**The break it prevents:** rules that live in a README are advisory and get
skipped under pressure. Policies are the executable rules bolted to each station
(worktree, task, delivery, tests, review) so a violation stops the belt instead
of sliding through.

Project guidelines expressed as **executable hook gates** rather than prose.
Two layers:

- **TypeScript middleware**: `policy-gate.ts` (`PolicyGate.executeWithGates`) is
  the in-process gate used by the MCP server's `execute_tool`. It loads REQUIRED
  policies from a SQLite store (`policies.db`), evaluates keyword / anti-pattern
  rules against the operation, and throws `PolicyViolationError` on a REQUIRED
  violation. Stages: `pre-exec | post-exec | review | always`;
  completion/merge/deploy operations auto-force a `review` stage.
- **Shell gate + Python enforcers**: `templates/hooks/uap-policy-gate.sh` binds
  to harness hook events and invokes the ~20 enforcers in
  `src/policies/enforcers/`. A blocked verdict is **exit code 2** (hard block).
  Enforcers cover the worktree gate (`worktree_required.py`), task discipline
  (`task_required.py`), delivery routing (`delivery_enforcement.py`), test
  deltas (`test_gate.py`), expert review (`expert_review_required.py`), schema
  diffs (`schema_diff_gate.py`), memory-before-plan, MCP-router-first, RTK
  wrapping, and more.

Levels: **REQUIRED** blocks, **RECOMMENDED** logs, **OPTIONAL** informs.

### Delivery, `uap deliver` (`src/delivery/`): Build & QC stations

**The break it prevents:** this is the heart of the floor. Left alone, an agent
emits plausible-but-wrong code and declares victory on a red build. The deliver
loop won't let the work leave the Build/QC stations until the project's *real*
gates actually pass.

A 15-module convergence loop that drives an underlying model against the
project's **real** completion gates until the work actually passes. It is the
mechanism behind UAP's "agents stop declaring victory on broken code." See the
[deliver flow](#how-uap-deliver-orchestrates) below.
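
As a rough illustration of the convergence idea in the Delivery section, the sketch below loops a stand-in "model attempt" against a list of gates until every required gate passes or a turn budget runs out. It is a minimal sketch under stated assumptions, not the code in `src/delivery/`: `Gate`, `failingRequired`, `deliverSketch`, and the toy gates are hypothetical names invented for this example, and real gates would run project commands rather than in-memory callbacks.

```typescript
// Hypothetical sketch only: none of these names are UAP's real API.
type Gate = { name: string; required: boolean; run: () => boolean };

interface DeliverResult {
  delivered: boolean;
  turns: number;
  failing: string[];
}

// Run the gates in order and report the first required gate that fails.
// Optional gates may fail without blocking delivery.
function failingRequired(gates: Gate[]): string[] {
  const failing: string[] = [];
  for (const gate of gates) {
    if (!gate.run() && gate.required) {
      failing.push(gate.name);
      break; // stop at the first required failure
    }
  }
  return failing;
}

// Let the stand-in model attempt a change, re-run the gates, and repeat
// until the required gates pass or the turn budget is spent.
function deliverSketch(
  gates: Gate[],
  attempt: (feedback: string[]) => void,
  maxTurns: number,
): DeliverResult {
  // Baseline check: if the gates are already green, no attempt is made.
  let failing = failingRequired(gates);
  let turns = 0;
  while (failing.length > 0 && turns < maxTurns) {
    attempt(failing); // the failing gate names act as repair feedback
    turns += 1;
    failing = failingRequired(gates);
  }
  return { delivered: failing.length === 0, turns, failing };
}

// Toy usage: the "build" only goes green after two attempts; lint is optional.
let fixes = 0;
const gates: Gate[] = [
  { name: "build", required: true, run: () => fixes >= 2 },
  { name: "lint", required: false, run: () => false },
];
const result = deliverSketch(gates, () => { fixes += 1; }, 5);
console.log(result.delivered, result.turns);
```

The point of the shape is that "delivered" is decided by the gates, never by the thing producing the change: the attempt callback has no way to report success on its own.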
### Coordination (`src/coordination/`): Isolation & Line-coordination stations

**The break it prevents:** put two agents on one repo and they edit the same
file, duplicate work, or deadlock. This station gives every agent its own lane,
announces who's touching what, and blocks live same-file collisions.

Lets multiple agents work the same repo without colliding. A singleton SQLite DB
(`database.ts`) backs an agent registry, work announcements, work claims,
inter-agent messages, and a deploy queue. The DB is **shared across all
worktrees** (resolved via `git --git-common-dir` to the main worktree), so
agents in different `.worktrees/` see each other.

Coordination is **always-on, not advisory**: `session-start.sh` auto-registers
every agent (+ heartbeat), and the pre-edit hook (`coordinate-file.sh`)
announces each file edit and **blocks** when another *live* agent (heartbeat
< 120s) is editing the same repo-relative path, warning + self-healing stale
announcements otherwise; `session-end.sh` reaps only stale state so one agent
ending never wipes live peers.

`service.ts` detects **overlap** when agents announce work on the same files and
suggests merge order; `deploy-batcher.ts` queues git/CI actions with per-type
batch windows (commit 30s, push 5s, merge 10s, deploy 60s), folds/squashes
similar pending actions, and executes batches sequentially or in parallel.
`expert-orchestrator.ts` builds an ordered expert-droid chain across the
`plan → design → implement → review → release` lifecycle, drawing implement
droids from `capability-router.ts`. `pattern-router.ts` matches tasks to
Terminal-Bench patterns (always enforcing **P12** Output Existence and **P35**
Decoder-First).

### Models (`src/models/`): Prep / routing station

**The break it prevents:** an agent that reaches for the same model on every
task either overpays for trivial edits or under-powers a hard migration. This
station picks the right approach and model before the work hits the belt.

Multi-model routing. `router.ts` classifies a task (complexity + type from
keyword scoring) and `selectModel()` picks a model per the routing strategy:
`performance-first`, `cost-optimized`, `adaptive`, or `balanced` (default, which
walks priority-ordered routing rules). `unified-router.ts` layers a benchmark
signal on top: it returns a consensus when the rule-based and benchmark routers
agree, otherwise trusts the benchmark router only when it has enough data.
`planner.ts` decomposes a task into a subtask DAG and assigns a model per
subtask; `executor.ts` runs the plan level-by-level with retries and fallback;
`execution-profiles.ts` tunes *how* the chosen model runs (temperature,
budgets); `analytics.ts` records outcomes so routing improves.

### Tasks (`src/tasks/`): Prep / routing & Line-coordination stations

**The break it prevents:** work without a tracked task drifts scope, races other
agents on the same files, and starts before its prerequisites are ready. This
station gates each task through a decoder-first check and hands claims to the
coordination lane.

A dependency-aware task tracker (a Beads alternative) backed by SQLite (`tasks`,
`task_dependencies`, `task_history`, `task_activity`, `task_summaries`). Tasks
form a DAG; closing a task transitions its dependents from `blocked` to `open`
and emits events on an in-process bus (`event-bus.ts`). `coordination.ts`
bridges tasks to the multi-agent coordination layer (claim / release with
overlap detection); `decoder-gate.ts` implements the P35 Decoder-First
pre-execution validator (droid schema, tool availability, claim conflicts,
worktree requirement, ambiguity).

### Dashboard (`src/dashboard/`): floor visibility

**The break it prevents:** a floor you can't see is a floor you can't trust.
This gives you a live view of every station.

A live visualization layer (`uap dashboard`) over tasks, agents, memory,
policies, models, and benchmark/session history, with an event stream for
real-time updates.
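
The task tracker's unblock-on-close behaviour described in the Tasks section (closing a task moves its dependents from `blocked` to `open` and emits events on an in-process bus) can be pictured with a small in-memory model. This is a sketch under assumptions, not the code in `src/tasks/`: `TaskGraphSketch`, `TaskEvent`, and the event names are hypothetical, the real tracker is SQLite-backed, and the rule that a dependent opens only once *all* of its dependencies are closed is an assumption of this example.

```typescript
// Hypothetical in-memory model; names and event shapes are illustrative only.
type Status = "open" | "blocked" | "closed";

interface Task {
  id: string;
  status: Status;
  dependsOn: string[];
}

type TaskEvent = { type: "closed" | "unblocked"; taskId: string };

class TaskGraphSketch {
  private tasks: Record<string, Task> = {};
  private listeners: Array<(e: TaskEvent) => void> = [];

  private isClosed(id: string): boolean {
    const task = this.tasks[id];
    return task !== undefined && task.status === "closed";
  }

  // A task starts blocked if any dependency is not yet closed.
  add(id: string, dependsOn: string[] = []): void {
    const blocked = dependsOn.some((dep) => !this.isClosed(dep));
    this.tasks[id] = { id, status: blocked ? "blocked" : "open", dependsOn };
  }

  subscribe(listener: (e: TaskEvent) => void): void {
    this.listeners.push(listener);
  }

  statusOf(id: string): Status | undefined {
    const task = this.tasks[id];
    return task ? task.status : undefined;
  }

  // Close a task, then open every blocked dependent whose dependencies
  // are now all closed (assumption: partial completion does not unblock).
  close(id: string): void {
    const task = this.tasks[id];
    if (!task || task.status === "closed") return;
    task.status = "closed";
    this.emit({ type: "closed", taskId: id });
    for (const otherId of Object.keys(this.tasks)) {
      const other = this.tasks[otherId];
      if (!other || other.status !== "blocked") continue;
      if (other.dependsOn.indexOf(id) === -1) continue;
      if (other.dependsOn.every((dep) => this.isClosed(dep))) {
        other.status = "open";
        this.emit({ type: "unblocked", taskId: other.id });
      }
    }
  }

  private emit(event: TaskEvent): void {
    for (const listener of this.listeners) listener(event);
  }
}

// Toy usage: "ui" depends on both "schema" and "api".
const graph = new TaskGraphSketch();
const events: string[] = [];
graph.subscribe((e) => events.push(`${e.type}:${e.taskId}`));
graph.add("schema");
graph.add("api");
graph.add("ui", ["schema", "api"]);
graph.close("schema"); // "ui" stays blocked while "api" is still open
graph.close("api"); // now "ui" transitions to open
console.log(graph.statusOf("ui"), events.join(", "));
```

Any subscriber, such as a live view, only needs the emitted events to stay current; it never has to poll the task store.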
### Observability (`src/observability/`): Feedback station

**The break it prevents:** without traces you can't tell *why* a run failed, so
the floor never improves. This station records what really happened so the
pipeline can be tuned.

Emits HALO / OpenInference spans for delivery runs and tool calls, consumed by
`uap harness analyze` to optimize agent execution from real traces.

### Benchmark harness (`src/benchmarks/paired/`): measuring the floor

**The break it prevents:** claims that a station helps are worthless without a
controlled measurement. This harness toggles only the UAP scaffold and reports
the honest delta.

A controlled paired-A/B harness (`uap bench paired`) for measuring UAP's impact
without confounds. It holds the base model + agent constant and toggles **only**
the UAP scaffold over the same real-gate task suite and seeds, reporting a
vector of paired deltas (correctness + tokens/turns/latency) with bootstrap
confidence intervals, a McNemar gate-value 2×2, and per-component leave-one-out
ablation. Pluggable `AgentAdapter`s drive the agent under test:
`opencode`/`claude` subprocess adapters, a deterministic `mock`, and a
non-agentic `raw` single-shot-vs-gate-loop adapter that isolates gate value.
Ground truth is a deterministic per-task `verifyCmd` (no LLM judge). The
headline result lives in
[benchmarks/PAIRED_FINDINGS.md](../benchmarks/PAIRED_FINDINGS.md): UAP gate
value is **+20pp** over a non-agentic baseline and **~0pp** over an agentic one.

---

## How a tool call flows: memory → policy → MCP Router

This is one item riding the belt through the mediation stations. For a
representative `execute_tool` call routed through the UAP MCP server:

```
1. Tool call arrives at the PreToolUse hook.

2. MEMORY   - On session start, recent memory is injected. Per task,
              dynamic-retrieval has already surfaced relevant long-term
              learnings into the prompt. (Read/Query stage.)

3. POLICY   - PolicyGate.executeWithGates loads REQUIRED policies and checks
              the operation + args. A REQUIRED violation → PolicyViolationError
              (or exit 2 from the shell enforcer) → the harness ABORTS the
              call. Otherwise the call proceeds.

4. ROUTER   - execute_tool resolves the tool path, dispatches to the downstream
              MCP client (or an expert droid), and captures the raw result.

5. COMPRESS - output-compressor indexes large output into FTS5 and returns only
              the top BM25 snippets for the call's `intent`, plus a vocabulary
              footer. The model sees a compact result, not the full payload.

6. RECORD   - The agent records the observation to short-term memory; the
              consolidator later promotes significant lessons to long-term
              memory through the write-gate. (Record/Promote stage.)
```

The five-stage **read → query → act → record → promote** loop is the agent
decision loop defined normatively in
[PROTOCOL.md](PROTOCOL.md#agent-decision-loop).

---

## How `uap deliver` orchestrates

This is the Build/QC station's inner mechanism: the part that refuses to pass
work down the line until it actually runs green.

`uap deliver ""` runs `ConvergenceLoop.deliver()`
(`src/delivery/convergence-loop.ts`). The staged flow:

```
detect gates ──▶ baseline check ──▶ protect files ──▶   ╔═════ turn loop ═════╗
(verifier-       (already green?    (snapshot tests/    ║ build prompt        ║
 ladder reads     skip, no model     oracle + integrity ║ EXECUTE model       ║
 package.json)    call)              guard)             ║ APPLY file blocks   ║
                                                        ║ VERIFY (ladder)     ║
                                                        ║   passed? ─▶ done   ║
                                                        ║   else: CRITIC +    ║
                                                        ║         ESCALATE    ║
                                                        ╚═════════╤═══════════╝
                                                                  │ repeat until
                                                                  ▼ gates pass or budget spent
```

- **Verifier ladder** (`verifier-ladder.ts`): derives gate rungs from
  `package.json` scripts (build → typecheck via `tsc --noEmit` → test → lint),
  runs each as a real command in a secret-stripped env, and fails fast on
  required rungs. "Delivered" means all *required* rungs pass; lint is optional.
- **Explorer + ideation** (`explorer.ts`, `ideation.ts`): best-of-N. Generates N
  candidates with distinct strategy seeds, applies/verifies/rolls back each on
  the same baseline, and commits only the winner. A model **judge** (`judge.ts`)
  tie-breaks candidates with equal gate scores.
- **Critic** (`critic.ts`): turns a failed turn's gate output into a file-scoped
  repair plan fed into the next turn (not a raw compiler dump).
- **Escalation** (`escalation.ts`): on score stagnation, climbs a cost ladder:
  widen exploration → enable critic → switch to a stronger model + raise budget.
- **Protection** (`spec-imports.ts`, `integrity.ts`, `applier.ts`): protects
  pre-existing test/spec files and their transitive oracle imports, and
  byte-verifies protected files after each gate run, so the model can't satisfy
  a spec by rewriting what it asserts against.
- **Auto / optimize** (`auto-optimizer.ts`): by default, the run classifies task
  complexity and enables the matching aids automatically; `--optimize` enables
  every aid at once.
- **Coordination + observability** (`run-coordinator.ts`, `halo-trace.ts`):
  optionally registers the run as a coordination agent, heartbeats, queues
  applied files into the deploy batcher, and emits HALO spans.

The default model preset is `qwen35-a3b`; `--until-delivered` (on by default)
keeps extending the turn budget while the best score is improving, up to a
ceiling (default 30, hard cap 50), and stops once progress stalls.

---

## See also

- [PROTOCOL.md](PROTOCOL.md): the harness↔UAP contract and agent loop
- [../guides/DELIVERY_PIPELINE.md](../guides/DELIVERY_PIPELINE.md): the end-to-end delivery-pipeline walkthrough
- [../integrations/MCP_ROUTER.md](../integrations/MCP_ROUTER.md): MCP Router setup
- [../integrations/RTK.md](../integrations/RTK.md): RTK (Rust Token Killer)
- [../../CONTRIBUTING.md](../../CONTRIBUTING.md): development workflow