---
name: building-multiagent-systems
description: This skill should be used when designing or implementing systems with multiple AI agents that coordinate to accomplish tasks. Triggers on "multi-agent", "orchestrator", "sub-agent", "coordination", "delegation", "parallel agents", "sequential pipeline", "fan-out", "map-reduce", "spawn agents", "agent hierarchy".
---

# Building Multi-Agent, Tool-Using Agentic Systems

## Overview

Comprehensive architecture patterns for multi-agent systems where AI agents coordinate to accomplish complex tasks using tools. Language-agnostic and applicable across TypeScript, Python, Go, Rust, and other environments.

## Discovery Questions (Required)

Before architecting any system, ask these six mandatory questions:

1. **Starting Point** - Greenfield, adding to existing system, or fixing current implementation?
2. **Primary Use Case** - Parallel work, sequential pipeline, recursive delegation, peer collaboration, work queues, or other?
3. **Scale Expectations** - Small (2-5 agents), medium (10-50), or large (100+)?
4. **State Requirements** - Stateless runs, session-based, or persistent across crashes?
5. **Tool Coordination** - Independent agents, shared read-only resources, write coordination, or rate-limited APIs?
6. **Existing Constraints** - Language, framework, performance needs, compliance requirements?

## Foundational Architecture

### Four-Layer Stack

Every agent follows the four-layer architecture for testability, safety, and modularity:

| Layer | Name | Responsibility |
|-------|------|----------------|
| 1 | Reasoning (LLM) | Plans, critiques, decides which tools to call |
| 2 | Orchestration | Validates, routes, enforces policy, spawns sub-agents |
| 3 | Tool Bus | Schema validation, tool execution coordination |
| 4 | Deterministic Adapters | File I/O, APIs, shell commands, database access |

**Critical Rule**: Everything below Layer 1 must be deterministic. No LLM calls in tools.

See `references/four-layer-architecture.md` for detailed implementation with code examples.

### Foundational Patterns

| Pattern | Purpose |
|---------|---------|
| **Event-Sourcing** | All state changes as events for audit trails and replay |
| **Hierarchical IDs** | Encode delegation hierarchy (e.g., `session.1.2`) for cost aggregation |
| **Agent State Machines** | Explicit states (idle → thinking → tool_execution → stopped) with invalid transition errors |
| **Communication** | EventEmitter for state changes, promises for result collection |

## Seven Coordination Patterns

Choose based on discovery question answers:

| Pattern | Use Case | Trade-offs |
|---------|----------|------------|
| **Fan-Out/Fan-In** | Parallel independent work | Fast but costly; watch for orphans |
| **Sequential Pipeline** | Multi-stage transformations | Bottleneck at slowest stage |
| **Recursive Delegation** | Hierarchical task breakdown | Must add depth limits |
| **Work-Stealing Queue** | 1000+ tasks with load balancing | No built-in priority |
| **Map-Reduce** | Cost optimization | Cheap map ($0.01), smart reduce ($0.15) |
| **Peer Collaboration** | LLM council for bias reduction | Expensive (3N+1 calls), slow |
| **MAKER** | Zero-error tasks (100K+ steps) | 5× cost but ~0% error rate |

See `references/coordination-patterns.md` for detailed implementations.

### Pattern Selection Guide

| Requirement | Recommended Pattern |
|-------------|---------------------|
| Parallel independent tasks | Fan-Out/Fan-In |
| Each stage depends on previous | Sequential Pipeline |
| Complex task decomposition | Recursive Delegation |
| Large batch processing | Work-Stealing Queue |
| Cost-sensitive analysis | Map-Reduce |
| Need diverse perspectives | Peer Collaboration |
| Zero error tolerance | MAKER |

### MAKER Pattern (Zero Errors)

For tasks requiring 100K+ steps with zero error tolerance (medical, financial, legal domains):

1. **Extreme Decomposition** - Recursive breakdown until each subtask <100 steps
2. **Microagents** - Single tool, focused expertise, cheap models
3. **Multi-Agent Voting** - N parallel attempts per subtask, majority consensus
4. **Error Correction** - Deterministic validation + retry with failure context

**Cost comparison**: Same cost as traditional approach, zero errors vs. 10+ errors.

See `references/maker-pattern.md` for full implementation with medical diagnosis example.

## Tool Coordination

| Mechanism | Purpose |
|-----------|---------|
| **Permission Inheritance** | Children inherit subset of parent permissions (cannot escalate) |
| **Resource Locking** | Acquire/release patterns for shared resources |
| **Rate Limiting** | Token bucket algorithm across all agents |
| **Result Caching** | Cache read-only, idempotent, expensive operations |

**Sub-Agent as Tool Pattern**: Wrap specialized agents as tools the parent can call, providing composable abstractions and natural lifecycle management.

See `references/tool-coordination.md` for implementations.

## Critical Lifecycle: Cascading Stop

"Always stop children before stopping self." This prevents orphaned agents.

```
1. Get all child agents
2. Stop all children in parallel
3. Stop self
4. Cancel ongoing work
5. Flush events
```

If pause/resume unavailable, implement manual checkpointing: save agent state (messages, context, tool results), then restore later.

## Production Hardening

| Concern | Solution |
|---------|----------|
| **Orphan Detection** | Heartbeat monitoring every 30 seconds |
| **Cost Tracking** | Hierarchical aggregation across agent tree |
| **Session Persistence** | Project-level task store for cross-session work |
| **Checkpointing** | Save after 10+ tools, $1.00 cost, or 5 minutes elapsed |
| **Self-Modification Safety** | Blast radius assessment, branch isolation, test-first |

See `references/production-hardening.md` for detailed implementations.

## Real-World Example: Code Review System

A pull request orchestrator using Fan-Out/Fan-In:

1. Spawns four specialist reviewers in parallel (security, performance, style, tests)
2. Security and tests use smart models (Sonnet); style and performance use fast models (Haiku)
3. Each reviewer has 2-minute timeout
4. Results aggregate regardless of partial failures
5. Costs track per reviewer
6. All agents stop cleanly via cascading stop after completion

## Execution Checklist

When guiding implementation of multi-agent systems:

1. **Ask discovery questions** - Understand requirements before architecting
2. **Assess error tolerance** - Zero errors → MAKER; some acceptable → simpler patterns
3. **Establish four-layer architecture** - Reasoning, orchestration, tool bus, adapters
4. **Design schema-first tools** - Typed contracts before implementation
5. **Define deterministic boundary** - No LLM in Layers 3-4
6. **Choose orchestration model** - YOLO, Safety-First, or Hybrid
7. **Select coordination pattern** - Fan-out, pipeline, delegation, queue, map-reduce, peer, or MAKER
8. **Design tool coordination** - Permission inheritance, locking, rate limiting
9. **Implement cascading cleanup** - Always stop children before parent
10. **Add monitoring and cost tracking** - Hierarchical aggregation across agent tree
11. **Consider self-modification safety** - If agents can modify code, add safety protocol

## Common Pitfalls

| Pitfall | Impact |
|---------|--------|
| Missing four-layer architecture | Untestable, unsafe, hard to debug |
| LLM calls in tools (Layer 3-4) | Non-deterministic, can't unit test |
| No schema-first tool design | Sub-agents can't discover tools |
| Missing cascading stop | Orphaned agents consuming resources |
| No permission inheritance | Sub-agents can escalate privileges |
| No timeouts | Indefinite hangs waiting for sub-agents |
| Unbounded concurrency | Resource exhaustion from too many agents |
| Ignoring cost tracking | Budget surprises |
| No partial-failure handling | One failure cascades to all agents |
| Unpersisted state | Unrecoverable workflows on crash |
| Uncoordinated tool access | Race conditions on shared resources |
| Wrong model selection | Cost inefficiency (Sonnet for simple tasks) |
| Self-modification without safety | Sub-agents break themselves |
| No heartbeat monitoring | Can't detect orphans after parent crash |

## Reference Files

Detailed implementations with code examples:

| File | Contents |
|------|----------|
| `references/four-layer-architecture.md` | Four-layer stack, deterministic boundary, schema-first tools |
| `references/coordination-patterns.md` | Seven coordination patterns with code |
| `references/maker-pattern.md` | MAKER implementation, voting, medical diagnosis example |
| `references/tool-coordination.md` | Permission inheritance, locking, rate limiting, caching |
| `references/production-hardening.md` | Cascading stop, orphan detection, cost tracking, checkpointing |