--- name: multi-agent-architecture description: Patterns for designing multi-agent systems with Claude Code - job description method, shared folder communication, handbook consolidation, context management. Use when building complex agent orchestrations. --- # Multi-Agent Architecture Design principles for orchestrating many sub-agents without context overflow. ## Core Philosophy Treat agent design like human hiring: write a job description first, then translate to architecture. The framing shapes every decision. ## The Job Description Method Before writing any agent code: 1. **Write a human JD** — What would you want this person to do? What qualities? What indicates success? 2. **Identify handoff points** — Where would a human need to check in or escalate? 3. **Define the onboarding** — What handbook would you give a new hire? 4. **Translate to agents** — Each JD section becomes architecture | JD Section | Architecture Element | |------------|---------------------| | Responsibilities | Agent workflows | | Required skills | Tool permissions | | Success indicators | Output schemas | | Escalation criteria | Error handling | | Onboarding materials | Skills/handbook | ## The Shared Folder Pattern **Problem:** Orchestrator context overwhelmed when 10+ sub-agents return detailed reports simultaneously. **Solution:** Sub-agents write to temp folder → downstream agents read directly. ``` .claude/workspace/ ├── phase-1/ │ ├── gmail-analysis.md │ ├── calendar-analysis.md │ └── drive-inventory.md ├── phase-2/ │ ├── client-summary.md │ └── action-items.md └── manifest.yml ``` **Workflow:** ``` 1. Orchestrator spawns sub-agents 2. Each sub-agent: - Does work - Writes report to .claude/workspace/{phase}/{name}.md - Returns only: { status: "complete", path: "..." } 3. Downstream agents read prior phase outputs directly 4. Orchestrator reads manifest, not full reports ``` **Benefits:** - Orchestrator context stays minimal - Sub-agents get full upstream context - No signal loss from summarization relay ## The Handbook Pattern **Problem:** Many narrow skills create fragility and maintenance burden. **Solution:** One handbook organized by chapters, read foundation + relevant sections. ``` skills/project-manager/ ├── SKILL.md # Entry point, routes to chapters └── references/ ├── 01-foundation.md # Who we are, tools, escalation, standards ├── 02-daily-ops.md # Data gathering procedures ├── 03-dashboards.md # Structure, quality checks └── 04-onboarding.md # New client setup ``` **Chapter Structure:** | Chapter | Contents | |---------|----------| | Foundation | Team, tools, data sources, escalation rules, quality standards | | Domain chapters | Specific procedures for each responsibility area | **Reading Pattern:** ``` Sub-agent reads: 1. Foundation chapter (always) 2. Relevant domain chapter(s) (based on task) ``` ## Context Budget Strategies | Strategy | When to Use | |----------|-------------| | Shared folder | 5+ sub-agents, inter-agent dependencies | | Context proxy | Research agents returning verbose results | | Manifest files | Orchestrator needs status, not details | | Chunked execution | Serial phases when parallel overwhelms | ## Architecture Decision Tree ``` How many sub-agents? ├── 1-3 → Direct orchestration (return reports to main) ├── 4-10 → Shared folder pattern └── 10+ → Phased execution with manifest Do teammates need direct communication? ├── No → Sub-agents (Task tool) — report results back only └── Yes → Agent Teams — shared task list, inter-agent messaging ├── See agent-teams skill for full guide └── Best for: parallel review, competing hypotheses, multi-module features Do sub-agents need each other's output? ├── No → Parallel execution, merge results └── Yes → Shared folder, dependency ordering Is orchestrator context a concern? ├── No → Return full reports └── Yes → Status-only returns + file paths ``` ## Shared-State Mutation Safety When multiple agents can transition the same work item (issue, task, label) through states, silent race conditions cause duplicate work or skipped transitions. **Three-step guard (apply before every state write):** ``` 1. READ — Re-fetch current state immediately before mutation (never use cached state) 2. VERIFY — Confirm state matches expected pre-condition 3. WRITE — Apply transition in a single atomic operation ``` **Implementation pattern:** ``` CURRENT_STATE = fetch_state(item_id) # live read, not cached If CURRENT_STATE != EXPECTED_PRE_STATE: ABORT — log stale-race, do not mutate Else: apply_transition(item_id, NEW_STATE) # single atomic call ``` **For GitHub labels specifically:** ``` CURRENT_LABELS = gh issue view {id} --json labels If EXPECTED_LABEL not in CURRENT_LABELS: ABORT — another agent already transitioned this item Else: gh issue edit {id} --remove-label {old} --add-label {new} # one call ``` **Decision table:** | Scenario | Action | |----------|--------| | State matches expectation | Apply transition atomically | | State already at target | Skip silently (idempotent) | | State at unexpected value | Abort, log conflict, do not retry | | Item not found | Abort, log missing item | **Why single atomic call matters:** Two separate `--remove-label` and `--add-label` calls create a window where another agent reads the item in an intermediate state. Combine into one `gh issue edit` invocation. ## Anti-Patterns | Pattern | Problem | Fix | |---------|---------|-----| | Slash commands as orchestration | Context exhaustion before work starts | Move to sub-agents | | Orchestrator relays all context | Bottleneck, signal loss | Shared folder | | One skill per micro-task | Fragile, hard to maintain | Handbook chapters | | Sub-agents return full reports | Context overflow at 10+ agents | Path-only returns | ## Iteration Path Most multi-agent systems evolve through: 1. **Slash commands** — Quick start, context limits emerge 2. **Orchestrator + sub-agents** — Solves context, creates relay bottleneck 3. **Shared folder** — Solves relay, reveals skill fragmentation 4. **Handbook consolidation** — Unified knowledge, maintainable Skip earlier stages when building new systems. ## Agent Teams When workers need to communicate directly with each other — not just report back to an orchestrator — use Agent Teams instead of sub-agents. Agent Teams provide shared task lists, inter-agent messaging, and independent context windows. **Key differences from sub-agent patterns above:** - Teammates message each other directly (not just back to caller) - Shared task list with self-claiming and dependency auto-unblock - Each teammate is a full Claude Code session with own context **When to upgrade from sub-agents to Agent Teams:** - Sub-agents need to share findings mid-task - You need adversarial debate or competing hypotheses - 3+ workers need self-organizing coordination For full setup, operations reference, and orchestration patterns, apply the `agent-teams` skill.