# lauren-ai — Full API Reference

> First-party AI/LLM companion for the Lauren web framework.

Version: see `lauren_ai.__version__`
Requires: Python ≥ 3.11, lauren ≥ 0.1.0

---

## Installation

```bash
pip install "lauren-ai[anthropic]"   # Anthropic Claude (default)
pip install "lauren-ai[openai]"      # OpenAI / Ollama
pip install "lauren-ai[all]"         # All providers + all extras
```

Extras available: `anthropic`, `openai`, `ollama`, `knowledge`, `eval`, `dev`.

---

## Package layout

```
lauren_ai/
├── __init__.py          # Public API surface
├── _agents/             # @agent(), @use_tools(), AgentRunner, AgentContext, AgentResponse
├── _config.py           # LLMConfig, AgentConfig
├── _eval/               # AccuracyEval, TrajectoryEval, PerformanceEval
├── _exceptions.py       # Exception hierarchy
├── _extractors.py       # Lauren extractor integration (Agent[T], Embed[T], StreamCompletion[T])
├── _guards.py           # Guard factories (token_budget_guard, requires_capability, safety_guard)
├── _interceptors.py     # ai_metrics_interceptor, token_usage_response_interceptor
├── _chains/             # Chain, Runnable, RunnableLambda, chain()
├── _cost/               # CostTracker, PricingTable, TokenBudget, RateLimiter
├── _guardrails/         # @use_guardrails() (agent), @guardrail() (injectable), TopicFilter, PIIRedactor, LengthFilter, PromptInjectionFilter
├── _knowledge/          # KnowledgeBase, TextLoader, FixedSizeChunker, SentenceChunker
├── _memory/             # ShortTermMemory + UserMemoryStore + @remember()
├── _middleware.py       # Lauren middleware integration (conversation_middleware, ai_rate_limit)
├── _module.py           # LLMModule, AgentModule, LLMService, EmbedService
├── _output_parsers/     # StrOutputParser, JSONOutputParser, PydanticOutputParser, RetryOutputParser
├── _prompts/            # PromptTemplate, ChatPromptTemplate, FewShotPromptTemplate
├── _routing/            # SemanticRouter, Route, RouteMatch
├── _signals.py          # SignalBus + signal dataclasses
├── _skills/             # WebSearchTool, HttpFetchTool, CodeExecutionTool
├── _teams/              # @team(), TeamRunner, TeamResult, TeamMemory
├── _tools/              # @tool(), ToolMeta, ToolRegistry, ToolExecutor, ToolContext, ToolResult
├── _tracing/            # @traced(), Span, Trace, TraceStore, TraceExporter
├── _transport/          # Transport protocol, AnthropicTransport, StructuredLLM, multimodal types
├── _workflows/          # Workflow, Step, Parallel, Condition, Loop
└── testing.py           # AgentTestClient
```

---

## Configuration

### `LLMConfig`

```python
from lauren_ai import LLMConfig

# Direct construction — provider is required
config = LLMConfig(
    provider="anthropic",        # "anthropic" | "openai" | "ollama" | "litellm" (required)
    model="claude-opus-4-6",     # Model ID (required)
    api_key=None,                # API key; falls back to provider env var when None
    base_url=None,               # Override provider base URL (proxies, self-hosted, Ollama)
    max_tokens=4096,             # Max output tokens (default: 4096)
    temperature=1.0,             # Sampling temperature 0.0–2.0 (default: 1.0)
    timeout=60.0,                # Request timeout in seconds (default: 60.0)
    max_retries=3,               # Retries on transient errors (default: 3)
    cache_system_prompt=False,   # Anthropic prompt caching for system prompt
    cache_tools=False,           # Anthropic prompt caching for tool definitions
    embed_model=None,            # Embedding model (defaults to model when None)
    embed_dimensions=None,       # Desired embedding dimensionality
)

# Recommended factory methods
config = LLMConfig.for_anthropic(model="claude-opus-4-6")  # reads ANTHROPIC_API_KEY
config = LLMConfig.for_openai(model="gpt-4o")              # reads OPENAI_API_KEY
config = LLMConfig.for_ollama(model="llama3.2")            # base_url="http://localhost:11434"
cfg, mock = LLMConfig.for_testing()                        # returns (LLMConfig, MockTransport)
```

### `AgentConfig`

```python
from lauren_ai import AgentConfig

config = AgentConfig(
    system_prompt="You are a helpful assistant.",  # Default system prompt (default shown)
    max_turns=10,                                  # Max agentic loop iterations (default: 10)
    max_tokens_per_turn=4096,                      # Output token cap per turn (default: 4096)
    temperature=1.0,                               # Per-agent temperature override (default: 1.0)
    memory_window_tokens=40_000,                   # Sliding history window in tokens (default: 40_000)
    max_cost_usd=None,                             # Hard USD budget cap per run (None = unlimited)
    parallel_tool_calls=False,                     # Execute tool calls concurrently (default: False)
    tool_error_policy="return_error",              # "return_error" | "raise" | "skip"
    # Extended thinking (Anthropic only — silently ignored elsewhere)
    thinking=False,
    thinking_budget_tokens=8_000,
    # OpenAI reasoning models
    reasoning_effort=None,                         # "low" | "medium" | "high" | None
    include_reasoning_in_response=False,
)
```

---

## Transport

### `Transport` protocol

```python
from lauren_ai._transport import Transport

class MyTransport(Transport):
    async def complete(self, messages, *, model, **kwargs) -> Completion: ...
    async def complete_stream(self, messages, *, model, **kwargs) -> AsyncIterator[CompletionChunk]: ...
    async def embed(self, texts, *, model, **kwargs) -> list[Embedding]: ...
    async def count_tokens(self, messages, *, model) -> int: ...
```

### `AnthropicTransport`

```python
from lauren_ai._transport._anthropic import AnthropicTransport

transport = AnthropicTransport(config)  # Takes a LLMConfig
```

### `MockTransport`

```python
from lauren_ai._transport._mock import MockTransport

mock = MockTransport()
mock.queue_response(completion)             # Queue a Completion object
mock.queue_tool_use("tool_name", {"k": v}) # Queue a tool_use response + follow-up completion
mock.queue_error(RuntimeError("oops"))     # Queue an error to be raised
mock.reset()                               # Clear queue and call history

# After run:
print(mock.calls)        # list[dict] — each complete() call recorded
print(mock.call_count)   # int
```

### Core transport types

```python
from lauren_ai import Message, Completion, TokenUsage, ToolCall, ContentBlock

# Messages
msg = Message(role="user", content="Hello!")
msg = Message.user("Hello!")                        # convenience factory
msg = Message.assistant("Hi!")                      # convenience factory
msg = Message.from_multimodal("user", [parts...])   # multimodal message

# ContentBlock factories
block = ContentBlock.text_block("Hello!")
block = ContentBlock.tool_use_block(name="get_weather", tool_input={"city": "Paris"})
block = ContentBlock.tool_result_block(tool_use_id="toolu_abc", content="22°C sunny")

# Completion
result = Completion(
    id="comp_1",
    model="claude-opus-4-6",
    content="Hello!",
    tool_calls=[],
    stop_reason="end_turn",
    usage=TokenUsage(input_tokens=10, output_tokens=5),
    thinking_blocks=[],        # list[ThinkingBlock | RedactedThinkingBlock] — Anthropic only
)

# ThinkingBlock / RedactedThinkingBlock (Anthropic extended thinking)
from lauren_ai._transport import ThinkingBlock, RedactedThinkingBlock

block = ThinkingBlock(thinking="Let me reason through this...", signature="ant-sig-abc")
# block.thinking  — str — the model's reasoning text
# block.signature — str — Anthropic cryptographic signature

redacted = RedactedThinkingBlock(data="<opaque-base64-blob>")
# redacted.data   — str — opaque base64 blob (safety-redacted content)

# TokenUsage
usage = TokenUsage(input_tokens=100, output_tokens=50)
print(usage.total_tokens)    # 150 (property)
print(usage.cost_usd("claude-opus-4-6"))  # float

# ToolCall
call = ToolCall(id="toolu_abc", name="get_weather", input={"city": "Paris"})
```

---

## Tools

### `@tool()` decorator

```python
from lauren_ai import tool, ToolContext, ToolResult

@tool()
async def get_weather(city: str, ctx: ToolContext | None = None) -> dict:
    """Return current weather for a city.

    Args:
        city: The city name to look up.

    Returns current temperature and conditions.
    """
    return {"city": city, "temperature": 22, "condition": "sunny"}
```

Rules:
- **Must use parentheses**: `@tool()` not `@tool` (raises `DecoratorUsageError`)
- Schema generated from PEP-3107 annotations + Google-style `Args:` docstring section
- Any parameter annotated `ToolContext` or `ToolContext | None` is injected at runtime
  and **excluded from the JSON schema** — the parameter may be **named anything**
- `from __future__ import annotations` is supported, but tool annotations must still
  resolve when `@tool()` builds the schema. Avoid unresolved forward refs or circular
  imports in function-form tool files.
- Optional params (with defaults) are excluded from the `required` array
- Class-form tools receive `@injectable(scope=SINGLETON)` automatically; override with
  `@tool() @injectable(scope=Scope.REQUEST)` for request-scoped tools
- Subclasses of `@tool()`-decorated classes must re-apply `@tool()` — the
  `ToolRegistry` raises `MetadataInheritanceError` for inherited-but-not-redeclared tools

### `@tool()` with class form (DI-injectable)

```python
@tool()
class DelegateToResearcher:
    """Delegate a research task to the ResearchAgent.

    Args:
        task: The research task description.
    """

    def __init__(self, research: ResearchAgent) -> None:
        self._research = research

    async def run(self, task: str) -> dict:
        """Run the delegation."""
        ...
```

Class-form tools:
- Are registered as DI providers — dependencies are injected by the container
- Their `run()` method is the entry point (must be defined)
- May use `from __future__ import annotations`, provided the referenced types resolve
  when schema generation runs

### `@tool()` with options

```python
@tool(
    name="weather",                    # Override tool name (default: function/class name)
    description="Get current weather", # Override description (default: from docstring)
    requires_confirmation=True,        # Pause for human-in-the-loop approval before execution
    cache_ttl=300,                     # Cache results for 300 seconds
    pre_hook=my_pre_hook,              # Called before execution
    post_hook=my_post_hook,            # Called after successful execution
    error_hook=my_error_hook,          # Called on exception
)
async def get_weather(city: str) -> dict: ...
```

### `ToolResult`

```python
from lauren_ai import ToolResult

# Success
result = ToolResult.ok("The weather is sunny.", tool_use_id="toolu_abc")
result = ToolResult.ok({"temp": 22}, tool_use_id="toolu_abc")  # dict → JSON string

# Error
result = ToolResult.error("City not found.", tool_use_id="toolu_abc")

print(result.content)   # str
print(result.is_error)  # bool
```

### `ToolContext`

```python
from lauren_ai import ToolContext

# Injected by runner into any parameter annotated as ToolContext — name doesn't matter
@tool()
async def my_tool(query: str, ctx: ToolContext | None = None) -> dict:
    """..."""
    # ctx.agent_context       — AgentContext for the running agent
    # ctx.tool_use_id         — Provider-assigned tool call identifier for this invocation
    # ctx.turn                — Agentic loop iteration (0-based) that triggered this call
    # ctx.request             — Originating HTTP Request, or None
    # ctx.execution_context   — lauren ExecutionContext (route/handler metadata), or None
    # ctx.state               — Mutable dict[str, Any] for per-call local storage
    # ctx.get_metadata(key, default=None) — Read from agent_context.metadata
    return {"result": query}
```

---

## Agents

### `@agent()` decorator

```python
from lauren_ai import agent, AgentConfig

@agent(
    model="claude-opus-4-6",                     # LLM model ID (overrides LLMConfig)
    system="You are a helpful assistant.",        # System prompt for this agent
    config=AgentConfig(max_turns=20),             # Runtime behaviour overrides
    description="Answers questions",             # Human-readable description
)
class AssistantAgent: ...
```

### `@use_tools()` decorator

```python
from lauren_ai import use_tools

@agent(model="claude-opus-4-6")
@use_tools(get_weather, search_web, calculate)
class PlanningAgent: ...
```

**Critical ordering**: `@agent()` is the **outermost** (topmost) decorator,
`@use_tools()` is below it. Python applies decorators bottom-up so `@use_tools()`
runs first and sets `USE_TOOLS_META`; `@agent()` reads it.

### Lifecycle hooks

```python
from lauren_ai import agent, AgentContext, AgentResponse

@agent(model="claude-opus-4-6")
class MyAgent:
    async def on_start(self, ctx: AgentContext) -> None:
        """Called before the first turn."""
        ...

    async def on_finish(self, response: AgentResponse, ctx: AgentContext) -> None:
        """Called after the final turn."""
        ...
```

### `AgentContext`

```python
# Passed to lifecycle hooks; also accessible from tools via ctx.agent_context
# AgentContext fields:
# .agent_id              — Unique identifier for this agent instance (random hex)
# .agent_run_id          — Unique identifier for this specific run (random hex)
# .agent_class           — The @agent()-decorated class
# .config                — Effective AgentConfig for this run
# .memory                — ShortTermMemory for this conversation
# .turn                  — Current loop iteration (0-based)
# .metadata              — dict[str, Any] passed via runner.run(metadata=...)
# .request               — Originating HTTP Request, or None
# .execution_context     — lauren ExecutionContext (route/handler metadata), or None
# .signals               — SignalBus, or None
# .get_metadata(key, default=None) — convenience accessor for .metadata
```

### `AgentResponse`

```python
# Returned by AgentRunner.run()
# .content           — str — final text output from the agent
# .turns             — int — agentic loop iterations executed
# .total_usage       — TokenUsage — cumulative tokens across all turns
# .tool_calls_made   — list[ToolCall] — all tool executions during the run
# .stop_reason       — "end_turn" | "max_turns" | "budget_exceeded" | "error"
# .metadata          — dict[str, Any]
# .reasoning_traces  — list[str] — extended thinking traces (Anthropic only)
# await .as_stream() — AsyncIterator[str] yielding content as a single item
```

### `AgentRunner` (Protocol) / `AgentRunnerBase` (concrete)

`AgentRunner` is a `@runtime_checkable Protocol` — the structural interface for all
runner implementations. `AgentRunnerBase` is the concrete class that implements it.

In production, runners are created and injected by `AgentModule.for_root()`.
For testing, instantiate `AgentRunnerBase` directly:

```python
from lauren_ai import AgentRunner, AgentRunnerBase

# Testing: instantiate the concrete class
runner = AgentRunnerBase(
    transport=transport,         # Transport instance
    signals=signal_bus,          # Optional SignalBus
    cache_backend=None,          # Optional cache backend for tool results
)

isinstance(runner, AgentRunner)     # True — Protocol is @runtime_checkable
isinstance(runner, AgentRunnerBase) # True

# Run an agent
response = await runner.run(
    agent_instance,              # @agent()-decorated class instance
    "What's the weather?",       # User message string
    conversation_id="sess-1",    # Optional session ID
    metadata={"user_id": "u1"},  # Optional metadata injected into AgentContext
    request=None,                # Optional HTTP Request
    execution_context=None,      # Optional lauren ExecutionContext (forwarded to ToolContext)
    run_id=None,                 # Optional explicit run ID (random hex if omitted)
)
```

### Streaming

```python
async for chunk in runner.run_stream(agent_instance, "Hello"):
    if chunk.thinking_delta is not None:
        print("[thinking]", chunk.thinking_delta, end="", flush=True)
    elif chunk.delta:
        print(chunk.delta, end="", flush=True)
# chunk.delta          — str | None — incremental response text
# chunk.thinking_delta — str | None — incremental thinking text (Anthropic only)
```

---

## Extended Thinking

### Anthropic extended thinking

```python
from lauren_ai import agent, AgentConfig

@agent(
    model="claude-opus-4-6",
    system="You are a careful analyst.",
    thinking=True,
    thinking_budget_tokens=10_000,   # tokens the model may spend on internal reasoning
)
class AnalystAgent: ...
```

> **Temperature is suppressed when `thinking=True`.**  Anthropic's API does not
> accept a temperature parameter when extended thinking is enabled.  `AgentRunner`
> detects `thinking=True` and omits temperature from the call entirely, regardless
> of what `AgentConfig.temperature` is set to.

Supported models: `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-haiku-4-5`.
Older pinned-version model IDs (e.g. `claude-3-opus-20240229`) do not support it.

`AgentResponse.reasoning_traces` — `list[str]` — flat list of thinking text collected
across all agentic-loop turns.  `Completion.thinking_blocks` — `list[ThinkingBlock |
RedactedThinkingBlock]` — raw per-turn blocks before flattening.

```python
response = await runner.run(agent_inst, "Analyse the pros and cons of microservices.")
print(response.content)          # final answer
print(response.reasoning_traces) # list[str] — one entry per thinking block
```

`thinking_budget_tokens` guidance:

| Budget | Use case |
|---|---|
| `2_000 – 5_000` | Simple reasoning, quick checks |
| `8_000` | Default — balanced |
| `16_000` | Complex multi-step analysis |
| `32_000` | Very hard problems (significant cost increase) |

The budget is a ceiling, not a target.

### OpenAI reasoning models (o1 / o3)

```python
@agent(
    model="o3",                           # or "o1", "o1-mini", "o3-mini"
    reasoning_effort="high",              # "low" | "medium" | "high" | None
    include_reasoning_in_response=True,   # expose reasoning in AgentResponse
)
class ReasoningAgent: ...
```

`reasoning_effort` is silently ignored by non-o-series OpenAI models and by
Anthropic / Ollama transports.

---

## Memory

### `ShortTermMemory`

```python
from lauren_ai import ShortTermMemory

mem = ShortTermMemory(max_tokens=40_000)
mem.add_user("Hello!")
mem.add_assistant(completion)        # Completion object
mem.add_tool_result(tool_result)     # ToolResult object

msgs = mem.messages()  # list[dict] — ready to pass to transport

snapshot = mem.snapshot()  # Save state
mem.restore(snapshot)      # Restore state
mem.clear()                # Clear all messages
```

### `InMemoryConversationStore`

```python
from lauren_ai import InMemoryConversationStore

store = InMemoryConversationStore()
await store.save("conv-1", messages)
loaded = await store.load("conv-1")  # [] if missing
await store.delete("conv-1")
```

When `AgentRunner` is configured with a `conversation_store` and `runner.run()`
is called with a `conversation_id`, the runner automatically loads the prior
history before adding the new user message and saves the updated history after
the run completes.  Wire it through `AgentModule.for_root()`:

```python
AIModule = AgentModule.for_root(
    agents=[MyAgent],
    conversation_store=InMemoryConversationStore(),
    imports=LLMProvider,
)
# Then in your controller:
resp = await runner.run(agent, "Hello", conversation_id="sess-1")
```

Without a `conversation_store` the `conversation_id` parameter is accepted but
unused — each call starts with an empty `ShortTermMemory`.

### `InMemoryVectorStore`

```python
from lauren_ai import InMemoryVectorStore

store = InMemoryVectorStore()
doc_id = await store.upsert("Some text content.", metadata={"tag": "docs"})
results = await store.search("query text", k=5)
result = await store.get(doc_id)   # MemoryResult | None
await store.delete([doc_id])       # Takes a list
await store.clear()

# MemoryResult fields:
# result.id       — str
# result.content  — str
# result.score    — float (0..1, cosine similarity)
# result.metadata — dict
```

---

## Knowledge Base

```python
from lauren_ai import KnowledgeBase, TextLoader, FixedSizeChunker, SentenceChunker
from lauren_ai import InMemoryVectorStore

# Build a knowledge base
kb = KnowledgeBase(
    store=InMemoryVectorStore(),
    chunker=FixedSizeChunker(chunk_size=512, overlap=64),
)

# Load documents
n = await kb.load(TextLoader("docs/faq.txt"))              # from file path
n = await kb.load(TextLoader("raw text here", is_file=False))  # from string

# Search
results = await kb.search("How do I reset my password?", top_k=5)

# Use as agent tool — manual pattern
tool_fn = kb.as_tool(name="search_knowledge_base", top_k=5)

@agent(model="claude-opus-4-6")
@use_tools(kb.as_tool())
class SupportAgent: ...

# Module-level pattern — auto-attach KB to every agent in the module.
# No @use_tools needed; the framework calls .as_tool() internally.
# When loaders= is supplied, the framework also generates a singleton
# @post_construct hook that loads them at app startup — no asyncio.run
# at module-import time, safe inside any async context.
from lauren_ai._knowledge import KnowledgeSource

AIModule = AgentModule.for_root(
    agents=[SupportAgent],
    imports=LLMProvider,
    knowledge=[
        KnowledgeSource(
            kb=KnowledgeBase(store=InMemoryVectorStore(), chunker=SentenceChunker()),
            tool_name="search_manual",
            top_k=3,
            loaders=[TextLoader("docs/manual.txt")],     # loaded at app startup
        ),
    ],
)

# Pre-populated KB — pass the bare instance, no loaders=:
# (caller has already done: await kb.load(TextLoader(...)))
AIModule = AgentModule.for_root(
    agents=[SupportAgent],
    imports=LLMProvider,
    knowledge=[kb],                                       # default tool name
)
```

### Chunkers

| Class | Description |
|---|---|
| `FixedSizeChunker(chunk_size=512, overlap=64)` | Split at fixed character count |
| `SentenceChunker(max_chunk_size=512)` | Split at sentence boundaries |

---

## Signals

```python
from lauren_ai import (
    SignalBus,
    ModelCallStarted, ModelCallComplete,
    ToolCallStarted, ToolCallComplete,
    AgentRunComplete,
)

bus = SignalBus()

@bus.on(ModelCallComplete)
async def on_model_call(event: ModelCallComplete) -> None:
    print(f"model={event.model} cost=${event.cost_usd:.6f}")
    print(f"tokens={event.usage.total_tokens} stop={event.stop_reason}")

@bus.on(ToolCallStarted)
async def on_tool_start(event: ToolCallStarted) -> None:
    print(f"calling tool: {event.tool_name}({event.input})")

@bus.on(ToolCallComplete)
async def on_tool_end(event: ToolCallComplete) -> None:
    print(f"tool done: success={event.success}")

# Remove handler
bus.off(ModelCallComplete, on_model_call)

# Emit manually (rarely needed)
await bus.emit(ModelCallStarted(model="mock", messages_count=1))
```

### Signal fields

| Signal | Fields |
|---|---|
| `ModelCallStarted` | `timestamp, model, agent_id, agent_class, messages_count, input_tokens_estimate` |
| `ModelCallComplete` | `timestamp, model, agent_id, agent_class, usage, duration_ms, stop_reason, cost_usd` |
| `ToolCallStarted` | `timestamp, tool_name, tool_use_id, input, agent_class` |
| `ToolCallComplete` | `timestamp, tool_name, tool_use_id, success, error_message, duration_ms` |
| `AgentRunComplete` | `timestamp, turns, agent_class, total_usage, total_cost_usd, stop_reason` |

---

## Modules

### `LLMModule`

```python
from lauren_ai import LLMModule, LLMConfig

LLMProvider = LLMModule.for_root(
    LLMConfig.for_anthropic(),
    transport_override=None,   # Optional custom Transport (useful in tests)
)

# In your Lauren app:
@module(imports=[LLMProvider])
class AppModule: ...

# Access the built service before startup (class attributes):
# LLMModule.for_root() sets:
#   .transport_instance    — the Transport object
#   .llm_service_instance  — the LLMService object
```

### `AgentModule`

**Rule: every `AgentModule.for_root()` call MUST have its own dedicated runner.**
Each call auto-generates a unique `AgentRunnerBase` subclass as the module's runner
token. Inject it with `runner: AgentRunner` in any provider that belongs to the same
module — the DI container resolves it via structural Protocol scan.

```python
from lauren_ai import AgentModule, AgentRunner

AgentProvider = AgentModule.for_root(
    agents=[WeatherAgent, SupportAgent],
    tools=[get_weather, search_docs],     # function-form and class-form tools
    imports=LLMProvider,                   # LLMModule (or list of modules)
    signals=signal_bus,                    # Optional SignalBus
)

@module(imports=[LLMProvider, AgentProvider])
class AppModule: ...

# Providers inside AgentProvider's scope inject the runner with no ceremony:
@injectable()
class ChatService:
    runner: AgentRunner   # → the auto-generated runner for this module
```

`AgentModule.for_root()` registers:
- All `@agent()` classes as `Scope.SINGLETON` DI providers
- Function-form tools directly in the runner's tool map
- Class-form tools as DI providers (own-module only — not re-exported by default)
- A unique `AgentRunnerBase` subclass via `use_factory` (injecting Transport + LLMConfig)

#### Sharing a tool across multiple AgentModules (`shared_tools`)

When the same `@tool()` class is used by agents in more than one `AgentModule`,
declaring it as a provider in each module raises `ModuleExportViolation` at startup.

The solution is a dedicated ownership module that provides and exports the tool,
combined with `shared_tools=` in every `AgentModule` that imports it:

```python
from lauren import module
from app.ai.check_auth_tool import CheckAuthenticationTool

# Ownership module — declares and exports the tool once.
@module(providers=[CheckAuthenticationTool], exports=[CheckAuthenticationTool])
class CheckAuthModule: ...

# Each AgentModule imports CheckAuthModule and lists the tool in shared_tools=
# so the framework skips re-registering it as a provider here.
UnauthMod = AgentModule.for_root(
    agents=[UnauthenticatedCRMAgent],
    imports=[LLMProvider, CheckAuthModule],
    shared_tools=[CheckAuthenticationTool],   # owned by CheckAuthModule
)

AuthMod = AgentModule.for_root(
    agents=[AuthenticatedCRMAgent],
    imports=[LLMProvider, CheckAuthModule],
    shared_tools=[CheckAuthenticationTool],   # owned by CheckAuthModule
)
```

`shared_tools` only suppresses the DI *declaration* — agents in both modules can
still call `CheckAuthenticationTool` normally.  The singleton is provided once by
`CheckAuthModule` and resolved via the import chain.

#### Multi-module runner disambiguation (advanced)

When a controller or service sits in a scope that **imports two or more
AgentModules**, `runner: AgentRunner` is ambiguous — both runners are visible and
the DI container raises `ProtocolAmbiguityError`. Each module MUST define a
dedicated named runner class and pass it via `runner=MyRunner`:

```python
from lauren import injectable, Scope
from lauren_ai import AgentRunnerBase

@injectable(scope=Scope.SINGLETON)
class TransferAgentRunner(AgentRunnerBase):
    """Distinct DI token for the Transfer module's runner."""

@injectable(scope=Scope.SINGLETON)
class CRMAgentRunner(AgentRunnerBase):
    """Distinct DI token for the CRM module's runner."""

TransferMod = AgentModule.for_root(
    agents=[TransferAgent], tools=[...],
    runner=TransferAgentRunner,   # explicit runner token
    imports=[LLMProvider],
)
CRMMod = AgentModule.for_root(
    agents=[CRMAgent], tools=[DelegateToBankingTransfer],
    runner=CRMAgentRunner,        # explicit runner token
    imports=[LLMProvider, TransferMod],
)

# Controller that needs both runners uses concrete types — no ambiguity:
class ChatController:
    def __init__(self, runner: CRMAgentRunner, transfer_runner: TransferAgentRunner): ...
```

`isinstance(crm_runner, AgentRunner)` and `isinstance(transfer_runner, AgentRunner)`
both remain `True` — the named classes are concrete DI tokens only; `AgentRunner`
is still the shared Protocol interface. You can always define your own runner
subclass to get a stable, named token for multi-module wiring.

### `LLMService`

Injected automatically when `LLMModule` is imported.

```python
from lauren_ai import LLMService, Message

class MyController:
    def __init__(self, llm: LLMService) -> None:
        self._llm = llm

    async def chat(self, text: str) -> str:
        result = await self._llm.complete(
            [Message.user(text)]
        )
        return result.content

    async def stream(self, text: str):
        async for chunk in await self._llm.complete(
            [Message.user(text)], stream=True
        ):
            yield chunk.delta

    async def embed_text(self, text: str) -> list[float]:
        embeddings = await self._llm.embed([text])
        return embeddings[0].vector

    def with_structured_output(self, schema_class) -> StructuredLLM:
        return self._llm.with_structured_output(schema_class)
```

---

## Testing

### `AgentTestClient`

```python
from lauren_ai.testing import AgentTestClient
from lauren_ai import Completion, TokenUsage
from lauren_ai._transport._mock import MockTransport

mock = MockTransport()
mock.queue_tool_use("get_weather", {"city": "Paris"})
mock.queue_response(Completion(
    id="c2", model="mock",
    content="It's sunny in Paris!",
    tool_calls=[], stop_reason="end_turn",
    usage=TokenUsage(input_tokens=50, output_tokens=10),
))

client = AgentTestClient(WeatherAgent, mock)

# Sync run
response = client.run("What's the weather in Paris?")
assert "sunny" in response.content
assert len(client.calls) == 2  # Two model calls: tool-use turn + final answer

# Async run
response = await client.run_async("Weather?")

# Reset state between tests
client.reset()
assert client.calls == []
```

### `LLMConfig.for_testing()` pattern

```python
from lauren_ai import LLMConfig, Completion, TokenUsage

cfg, mock = LLMConfig.for_testing()

mock.queue_response(Completion(
    id="t1", model="mock-model",
    content="Hello from mock!",
    tool_calls=[], stop_reason="end_turn",
    usage=TokenUsage(input_tokens=10, output_tokens=5),
))

LLMProvider = LLMModule.for_root(cfg, transport_override=mock)
```

---

## Interceptors

```python
from lauren_ai import ai_metrics_interceptor, token_usage_response_interceptor
from lauren import use_interceptors, module

@module(imports=[LLMProvider])
@use_interceptors(ai_metrics_interceptor(), token_usage_response_interceptor())
class AppModule: ...
```

- `ai_metrics_interceptor()` — reads `request.state.ai_token_usage` and emits metrics
- `token_usage_response_interceptor()` — adds `x-token-usage` and `x-ai-cost-usd` response headers

---

## Guards

```python
from lauren_ai import token_budget_guard, requires_capability, safety_guard, SafetyPolicy
from lauren import use_guards, controller

@use_guards(
    token_budget_guard(max_tokens_per_window=100_000, window_seconds=3600),
)
@controller("/api")
class ChatController: ...
```

| Guard factory | Description |
|---|---|
| `token_budget_guard(max_tokens_per_window, window_seconds)` | Enforce per-window token budget |
| `requires_capability(capability)` | Require a named model capability in request state |
| `safety_guard(policy)` | Block requests that violate a `SafetyPolicy` |

---

## Workflows

```python
from lauren_ai._workflows import Workflow, Step, Parallel, Condition, Loop

async def summarise(ctx):
    text = ctx["text"]
    result = await llm_service.complete([Message.user(f"Summarise: {text}")])
    return {"summary": result.content}

async def translate(ctx):
    lang = ctx.get("lang", "Spanish")
    result = await llm_service.complete([Message.user(f"Translate to {lang}: {ctx['summary']}")])
    return {"translation": result.content}

pipeline = (
    Workflow()
    .then(Step("summarise", summarise))
    .then(Parallel(
        Step("translate_es", translate),
        Step("translate_fr", translate),
    ))
)

result = await pipeline.run({"text": long_document})
print(result.outputs["translate_es"])
```

---

## Evaluation

```python
from lauren_ai._eval import AccuracyEval, EvalDataset, EvalExample

dataset = EvalDataset(examples=[
    EvalExample(input="What is 2+2?", expected_output="4"),
    EvalExample(input="Capital of France?", expected_output="Paris"),
])

eval_suite = AccuracyEval(agent=MyAgent, dataset=dataset)
report = await eval_suite.run()

print(f"Pass rate: {report.pass_rate:.1%}")  # e.g. "100.0%"
report.assert_pass_rate(0.9)  # Raises AssertionError if below 90%
```

---

## Built-in Skills

```python
from lauren_ai._skills import WebSearchTool, HttpFetchTool, CodeExecutionTool

@agent(model="claude-opus-4-6")
@use_tools(WebSearchTool, HttpFetchTool)
class ResearchAgent: ...
```

| Tool class | Description |
|---|---|
| `WebSearchTool` | Search the web via DuckDuckGo or Google |
| `HttpFetchTool` | Make HTTP requests and return response body |
| `CodeExecutionTool` | Execute Python code in a sandboxed subprocess |

---

## Exception hierarchy

```
LaurenAIError
├── TransportError
│   ├── TransientTransportError  (retriable: 429, 503, etc.)
│   └── AuthTransportError       (401, 403)
├── AgentMaxTurnsError
├── AgentBudgetExceededError
├── AgentConfigError
├── DecoratorUsageError
├── ToolExecutionError
├── ToolConfigError
├── ToolSchemaError
├── KnowledgeLoadError
├── WorkflowError
├── OutputParserError
│   └── MaxRetryError
├── EmptyQueueError
└── TracingError
```

---

## Section 32 — Prompt Templates & Chains

```python
from lauren_ai import PromptTemplate, ChatPromptTemplate, FewShotPromptTemplate
from lauren_ai import FewShotExample, Chain, StrOutputParser

# String template
tmpl = PromptTemplate("Translate the following to French: {text}")

# Chat template
chat = ChatPromptTemplate([
    ("system", "You are a professional translator."),
    ("user", "{text}"),
])

# Few-shot template
few_shot = FewShotPromptTemplate(
    examples=[
        FewShotExample(input="Good morning", output="Bonjour"),
        FewShotExample(input="Thank you", output="Merci"),
    ],
    template="Translate: {text}",
)

# Pipe composition — each element must implement Runnable
chain: Chain = tmpl | llm_service | StrOutputParser()
result: str = await chain.invoke({"text": "Hello world"})
```

**Key invariant**: `|` returns a new `Chain`; the operands are never mutated.

---

## Section 33 — Output Parsers

```python
from lauren_ai import (
    StrOutputParser,
    JSONOutputParser,
    RegexParser,
    CommaSeparatedListParser,
    MarkdownCodeBlockParser,
    PydanticOutputParser,
    RetryOutputParser,
)
from pydantic import BaseModel

class City(BaseModel):
    name: str
    country: str

# Parse LLM output as a Pydantic model
parser = PydanticOutputParser(City)
chain = ChatPromptTemplate([("user", "Name a capital city as JSON.")]) | llm_service | parser
city: City = await chain.invoke({})

# Retry up to 3 times if parsing fails
safe_parser = RetryOutputParser(parser, max_retries=3, llm=llm_service)

# Comma-separated list
list_parser = CommaSeparatedListParser()
items: list[str] = await list_parser.parse("apples, bananas, cherries")

# Extract first fenced code block
code_parser = MarkdownCodeBlockParser(language="python")
code: str = await code_parser.parse(llm_output)
```

**Key invariant**: Parsers raise `OutputParserError` on failure.
`RetryOutputParser` wraps with automatic re-prompting up to `max_retries`.

---

## Section 34 — Agent Teams

```python
from lauren_ai import team, TeamRunner, TeamResult, TeamMemory
from lauren_ai import (
    TeamWorkerStarted, TeamWorkerFinished,
    TeamCoordinatorDecision, TeamFinalAnswer,
)

@agent(model="claude-opus-4-6", system="You research topics.")
class ResearchAgent: ...

@agent(model="claude-opus-4-6", system="You write clear summaries.")
class WriterAgent: ...

@team(
    name="content-pipeline",
    mode="coordinator",          # "coordinator" | "collaborate"
    max_rounds=5,
    model="claude-opus-4-6",
)
class ContentTeam:
    def __init__(self, researcher: ResearchAgent, writer: WriterAgent) -> None:
        self.researcher = researcher
        self.writer = writer

# Run the team
runner = TeamRunner(
    team_cls=ContentTeam,
    llm=llm_service,            # LLMService (from DI or LLMModule)
    agent_runner=agent_runner,  # shared AgentRunner
)
result: TeamResult = await runner.run("Write a post about async Python.")

print(result.final_answer)     # str — the team's final output
print(result.rounds)           # int
print(result.worker_outputs)   # dict[str, str] — per-agent contributions

# Stream team events
async for event in runner.run_stream("Write a post about async Python."):
    if isinstance(event, TeamWorkerStarted):
        print(f"[R{event.round}] Starting: {event.worker_name}")
    elif isinstance(event, TeamWorkerFinished):
        print(f"[R{event.round}] Done: {event.worker_name} — {event.result_content[:100]}")
    elif isinstance(event, TeamCoordinatorDecision):
        print(f"[Coordinator] {event.decision}")
    elif isinstance(event, TeamFinalAnswer):
        print(f"[Final] {event.content}")
```

**Key invariants**:
- `@team()` must use parentheses; bare `@team` raises `DecoratorUsageError`.
- Workers are declared as typed `__init__` parameters on the team class.
- `TeamRunner(team_cls, llm, agent_runner)` — `llm` is the `LLMService`.
- `run_stream()` is an async generator; iterate with `async for`.

---

## Section 35 — Tracing & Observability

```python
from lauren_ai import (
    traced, SpanKind, Span, Trace, TraceStore, TracingConfig,
    InMemoryTraceExporter, ConsoleTraceExporter, FileTraceExporter,
    set_trace_store, get_trace_store,
)

# Configure tracing
exporter = InMemoryTraceExporter()
config = TracingConfig(service_name="my-app", sample_rate=1.0)
store = TraceStore(config=config, exporters=[exporter])
set_trace_store(store)

# Decorate an async function
@traced(name="summarise", kind=SpanKind.AGENT)
async def summarise(text: str) -> str:
    result = await llm_service.complete([Message.user(f"Summarise: {text}")])
    return result.content

# After execution:
traces: list[Trace] = exporter.traces
span: Span = traces[0].root_span
print(span.name, span.duration_ms, span.status)

# Implement a custom exporter
from lauren_ai import TraceExporter

class MyExporter(TraceExporter):
    async def export(self, trace: Trace) -> None: ...
```

**Key invariant**: `@traced()` must use parentheses. Spans are exported
asynchronously; the decorated function's return value is unchanged.

---

## Section 36 — Persistent User Memory

```python
from lauren_ai import remember, MemoryFact, InMemoryUserMemoryStore

store = InMemoryUserMemoryStore()

@agent(model="claude-opus-4-6", system="You are a personal assistant.")
@remember(store=store, extract=True, inject=True, top_k=5)
class PersonalAgent: ...

# MemoryFact — stored facts
fact = MemoryFact(
    user_id="user-42",
    content="The user prefers concise answers.",
    confidence=0.9,
)

# Implement custom storage
from lauren_ai import UserMemoryStore

class RedisUserMemoryStore(UserMemoryStore):
    async def save(self, user_id: str, fact: MemoryFact) -> None: ...
    async def search(self, user_id: str, query: str, top_k: int) -> list[MemoryFact]: ...
    async def delete(self, user_id: str, fact_id: str) -> None: ...
```

**Key invariant**: `@remember()` must be placed **below** `@agent()` (applied first).
`extract=True` adds a post-run hook; `inject=True` prepends facts before each run.

---

## Section 37 — Structured LLM Outputs

```python
from lauren_ai import StructuredLLM, LLMService, Message
from pydantic import BaseModel

class SentimentResult(BaseModel):
    sentiment: str       # "positive" | "negative" | "neutral"
    confidence: float    # 0.0 – 1.0
    summary: str

# Obtain a typed wrapper from LLMService
structured: StructuredLLM[SentimentResult] = llm_service.with_structured_output(SentimentResult)

result: SentimentResult = await structured.complete(
    [Message.user("This product is amazing!")]
)
print(result.sentiment, result.confidence)

# Also works in a Chain
chain = prompt | structured
result = await chain.invoke({"review": "Great quality."})
```

**Key invariant**: Uses tool-calling under the hood; the model must support
function/tool calling. Validation errors raise `OutputParserError`.

---

## Section 38 — Multimodal Inputs

```python
from lauren_ai import ImageContent, AudioContent, DocumentContent, ContentPart, Message

# Image from URL
img = ImageContent(url="https://example.com/chart.png")

# Image from bytes
img_bytes = ImageContent(data=raw_bytes, mime_type="image/png")

# Audio
audio = AudioContent(data=audio_bytes, mime_type="audio/mp3")

# PDF / document
doc = DocumentContent(data=pdf_bytes, mime_type="application/pdf")

# Build a mixed-content user message
msg = Message.from_multimodal("user", [
    "Please describe the chart and summarise the document:",
    img,
    doc,
])

# Pass directly to LLMService or as part of agent.run()
result = await llm_service.complete([msg])
```

**Key invariant**: Only transports that support multimodal will accept
`ImageContent`/`AudioContent`/`DocumentContent`; unsupported transports raise
`TransportError` at call time, not at startup.

---

## Section 39 — Semantic Router

```python
from lauren_ai import SemanticRouter, Route, RouteMatch

async def embed_fn(texts: list[str]) -> list[list[float]]:
    embeddings = await embed_service.embed(texts)
    return [e.vector for e in embeddings]

router = SemanticRouter(
    routes=[
        Route(name="weather", examples=["What's the weather?", "Is it raining today?"]),
        Route(name="booking", examples=["Book a flight", "Reserve a hotel room"]),
    ],
    embed_fn=embed_fn,
    min_confidence=0.7,
)

await router.compile()  # Must be called before route()

match: RouteMatch = await router.route("Will it snow tomorrow?")
print(match.route)       # "weather"
print(match.confidence)  # float, e.g. 0.91
print(match.matched)     # bool — False if below min_confidence
```

**Key invariant**: `compile()` must be awaited before any `route()` call;
calling `route()` before `compile()` raises `RouterNotCompiledError`.

---

## Section 40 — Cost & Rate Tracking

```python
from lauren_ai import (
    CostTracker, TokenBudget, RateLimiter,
    PricingTable, ModelPricing, CostEstimate,
    default_pricing_table,
    BudgetExceededError, RateLimitExhaustedError,
)

# Pricing table
table: PricingTable = default_pricing_table()
custom_table = PricingTable(models={
    "claude-opus-4-6": ModelPricing(input_per_1k=0.015, output_per_1k=0.075),
})

# Cost tracker (injectable singleton)
tracker = CostTracker(pricing=table)

async with tracker.session() as session:
    result = await agent_runner.run(agent_instance, "Hello!")
    estimate: CostEstimate = session.estimate   # cost so far

report = await tracker.report()
print(report.total_usd, report.total_tokens)

# Token budget per run
budget = TokenBudget(max_tokens_per_conversation=100_000)
# Raises BudgetExceededError when limit exceeded

# Rate limiter
limiter = RateLimiter(requests_per_minute=60)
await limiter.acquire()
# Raises RateLimitExhaustedError when ceiling breached
```

**Key invariant**: `CostTracker` is safe to inject as `Scope.SINGLETON`; its
internal counters are protected by an asyncio lock.

---

## Section 41 — Guardrails & Content Safety

Two decorators are provided:

| Decorator | Purpose |
|---|---|
| `@use_guardrails()` | **Agent decorator** — attaches pre-built guardrail instances to an `@agent()` class. |
| `@guardrail()` | **Class decorator** — marks a class as a DI-injectable guardrail provider. Applies `@injectable(scope=Scope.SINGLETON)` automatically. |

```python
from lauren_ai import (
    use_guardrails, guardrail,
    TopicFilter, PIIRedactor, LengthFilter,
    PromptInjectionFilter, LLMGuardrail,
    GuardrailDecision, GuardrailContext, GuardrailViolated,
    InputGuardrail, OutputGuardrail,
    GUARDRAIL_CLASS_META, GuardrailClassMeta,
    USE_GUARDRAILS_META, UseGuardrailsMeta,
)

# --- @use_guardrails() — attach instances to an agent ---

@agent(model="claude-opus-4-6", system="Customer support assistant.")
@use_guardrails(
    input=[
        TopicFilter(allowed_topics=["billing", "support", "account"]),
        PIIRedactor(entities=["EMAIL", "PHONE", "SSN"]),
        PromptInjectionFilter(),
    ],
    output=[
        LengthFilter(max_chars=500),
    ],
)
class SupportAgent: ...

# None entries are silently dropped (conditional selection):
@agent(model="claude-opus-4-6")
@use_guardrails(
    input=[
        PromptInjectionFilter(),
        TopicFilter(allowed_topics=allowed) if allowed else None,
    ],
)
class DynamicAgent: ...

# --- @guardrail() — DI-injectable guardrail class ---

@guardrail(kind="input")   # also marks as @injectable(scope=Scope.SINGLETON)
class ProfanityFilter:
    async def check(self, message: str, ctx: GuardrailContext) -> GuardrailDecision:
        if "badword" in message.lower():
            return GuardrailDecision(
                action="block",
                violation="Profanity detected.",
                guardrail_name=type(self).__name__,
            )
        return GuardrailDecision(action="pass", guardrail_name=type(self).__name__)

# kind may be "input", "output", or "any" (default)

# --- LLMGuardrail — use another LLM to evaluate safety ---
# Basic usage (action="block" by default):
llm_guard = LLMGuardrail(
    llm=llm_service,
    prompt="Does the following text contain harmful content? Reply YES or NO.\n\n{content}",
    block_if="YES",
)

# Extended parameters — graceful redirect + cost-efficient judge call:
llm_guard = LLMGuardrail(
    llm=llm_service,
    prompt="Is this response outside the agent's scope?\n\n{content}\n\nYES or NO.",
    block_if="YES",
    action="modify",                          # replaces response instead of raising
    violation_message="I can't help with that. Redirecting you.",
    system="Answer with YES or NO only.",     # judge system prompt
    max_tokens=5,                             # YES/NO needs ≤1 token
    temperature=0.0,                          # deterministic
    guardrail_name="ScopeGuard",              # shown in activity feeds
)

# --- GuardrailDecision ---
decision = GuardrailDecision(
    action="modify",                          # "pass" | "block" | "modify"
    modified_content="[REDACTED]",
    violation="PII detected: email address",
    guardrail_name="PIIRedactor",
)
```

**Key invariants**:
- `@use_guardrails()` must use parentheses; bare `@use_guardrails` raises `DecoratorUsageError`.
- `@guardrail()` must use parentheses; bare `@guardrail` raises `DecoratorUsageError`.
- Input guardrails run before the model call; output guardrails run after.
- A `"block"` decision raises `GuardrailViolated` (a `LaurenAIError`).
- `@guardrail()` checks for an existing `__lauren_injectable__` sentinel and skips re-applying `@injectable()` if already present (idempotent).

---

## Section 42 — Delegation Pattern (cross-module tool routing)

The recommended way to give an orchestrator agent access to a specialist sub-agent
via a class-form `@tool()`.

### How the wiring works

- The delegation tool lives in the **calling module's `tools=`**.
- The calling module imports the target module (`imports=[..., SpecialistMod]`), which
  makes the target runner token visible to DI when resolving the delegation tool.
- The delegation tool's `__init__` uses the **named concrete runner subclass** (not
  the `AgentRunner` Protocol) — this is unambiguous even when two runners are in scope.

No `DelegationWiring` singleton, no `runner: AgentRunner | None = None` workaround.

### Step 1 — define the named runner tokens and the delegation tool

```python
# delegation.py — do NOT add `from __future__ import annotations`
# (@tool() uses inspect.signature() at decoration time)
from lauren import injectable, Scope
from lauren_ai import AgentRunnerBase, ToolContext, tool
from .specialist_agent import SpecialistAgent

@injectable(scope=Scope.SINGLETON)
class SpecialistAgentRunner(AgentRunnerBase):
    """Named DI token for the Specialist module's runner.

    Using the named subclass (not ``AgentRunner`` Protocol) avoids
    ``ProtocolAmbiguityError`` in any scope that sees both runners.
    """

@injectable(scope=Scope.SINGLETON)
class OrchestratorAgentRunner(AgentRunnerBase):
    """Named DI token for the Orchestrator module's runner."""

@tool()
class DelegateToSpecialist:
    """Delegate a task to the SpecialistAgent.

    Args:
        task: Full description of what the Specialist Agent should do.
    """

    def __init__(self, agent: SpecialistAgent, runner: SpecialistAgentRunner) -> None:
        self._agent = agent
        self._runner = runner  # named subclass — no ambiguity

    async def run(self, ctx: ToolContext, task: str) -> dict:
        response = await self._runner.run(
            self._agent, task,
            execution_context=ctx.execution_context,
        )
        return {"result": response.content, "stop_reason": response.stop_reason}
```

### Step 2 — target module registers its named runner token

```python
# specialist_module.py
from lauren_ai import AgentModule, LLMModule, LLMConfig
from .specialist_agent import SpecialistAgent
from .delegation import SpecialistAgentRunner

LLMProvider = LLMModule.for_root(LLMConfig.for_anthropic())

SpecialistMod = AgentModule.for_root(
    agents=[SpecialistAgent],
    tools=[SpecialistTool1, SpecialistTool2],
    imports=[LLMProvider],
    runner=SpecialistAgentRunner,   # registers SpecialistAgentRunner as the runner token
)
```

### Step 3 — calling module owns the delegation tool and imports the target module

```python
# orchestrator_module.py
from lauren_ai import AgentModule
from .orchestrator_agent import OrchestratorAgent
from .delegation import DelegateToSpecialist, OrchestratorAgentRunner
from .specialist_module import SpecialistMod

OrchestratorMod = AgentModule.for_root(
    agents=[OrchestratorAgent],
    tools=[DelegateToSpecialist],          # ← delegation tool lives in the CALLING module
    imports=[LLMProvider, SpecialistMod],  # ← import makes SpecialistAgentRunner visible to DI
    runner=OrchestratorAgentRunner,
)

# A controller in the outer module can inject both runners unambiguously:
class AppController:
    def __init__(
        self,
        runner: OrchestratorAgentRunner,
        specialist_runner: SpecialistAgentRunner,
    ) -> None: ...
```

### Architectural rule

> The delegation tool belongs in the **calling module's `tools=`**. The calling
> module imports the target module so the target runner token is visible. The tool's
> `__init__` must use the named concrete runner subclass — not `AgentRunner`
> Protocol — because two runners are in scope and Protocol scan would be ambiguous.
>
> Every `AgentModule.for_root()` call MUST have its own dedicated runner token.
> Use `runner=MyRunner` with an explicit `AgentRunnerBase` subclass whenever
> a controller, service, or tool needs to inject a specific module's runner.

### `AgentRunner.run()` accurate signature

**Critical invariant — pass an instance, not a class.**
`@agent()` applies `@injectable(scope=Scope.SINGLETON)`.  `AgentModule.for_root()`
registers and **exports** each agent so DI injects an instance into controllers.
Passing the class itself bypasses DI and raises `TypeError` on lifecycle hooks:

```python
# WRONG — passes the class, not an instance; breaks on_start / on_finish hooks
response = await runner.run(MyAgent, "message")

# CORRECT — inject the agent via constructor DI, then pass the instance.
# runner: AgentRunner resolves to this module's runner via structural Protocol scan
# (unambiguous when only one AgentModule is in scope; use a named runner subclass
# when two or more AgentModules are visible — see Multi-module runner disambiguation).
class MyController:
    def __init__(self, runner: AgentRunner, agent: MyAgent) -> None:
        self._runner = runner
        self._agent = agent  # DI-resolved singleton

    async def handle(self, message: str) -> str:
        response = await self._runner.run(self._agent, message)
        return response.content
```

```python
# CORRECT
response = await runner.run(agent_instance, "User message here")

# CORRECT with optional kwargs
response = await runner.run(
    agent_instance,
    "User message",
    conversation_id="sess-123",
    metadata={"user_id": "u1"},
    request=http_request,          # optional Lauren Request
    execution_context=exec_ctx,    # optional lauren ExecutionContext (forwarded to ToolContext)
    run_id="run-abc",              # optional — random hex if omitted
)
```

---

## Section 43 — @team() Accurate API

```python
from lauren_ai import team, TeamRunner

@team(
    name="research-team",
    mode="coordinator",          # "coordinator" | "collaborate"
    model="claude-opus-4-6",     # model for coordinator decisions
    max_rounds=4,
    coordinator_prompt=None,     # Optional: override default routing prompt
)
class ResearchTeam:
    """Multi-agent team for research tasks."""

    def __init__(
        self,
        researcher: ResearchAgent,
        code_assistant: CodeAssistantAgent,
    ) -> None:
        self.researcher = researcher
        self.code_assistant = code_assistant


# TeamRunner constructor
runner = TeamRunner(
    team_cls=ResearchTeam,
    llm=llm_service,            # LLMService
    agent_runner=agent_runner,  # shared AgentRunner
)

# Blocking run
result = await runner.run("Research quantum computing breakthroughs.")
print(result.final_answer)
print(result.worker_outputs)  # {"researcher": "...", "code_assistant": "..."}
print(result.rounds)

# Streaming run
async for event in runner.run_stream("Research topic"):
    if isinstance(event, TeamWorkerStarted):
        print(f"[R{event.round}] Starting: {event.worker_name}")
    elif isinstance(event, TeamWorkerFinished):
        print(f"[R{event.round}] Done: {event.worker_name} — {event.result_content[:80]}")
    elif isinstance(event, TeamCoordinatorDecision):
        print(f"[Coordinator] {event.decision}")
    elif isinstance(event, TeamFinalAnswer):
        print(f"[Final] {event.content}")
```

**Default coordinator prompt format**:
```
ROUTE: <worker_name>   — route to a worker
DONE: <final answer>   — declare task complete
```

**Key invariants**:
- `@team()` must use parentheses.
- Worker parameters in `__init__` must have agent-class type annotations.
- `run_stream()` is an async generator; iterate with `async for`.

---

## Decorator ordering — mandatory summary

```
@agent()            ← outermost decorator (sets AGENT_META, reads USE_TOOLS_META)
@remember()         ← optional (sets REMEMBER_META)
@use_guardrails()   ← optional (sets USE_GUARDRAILS_META)
@use_tools()        ← innermost (sets USE_TOOLS_META)
class MyAgent: ...
```

Python applies decorators bottom-up, so `@use_tools` runs first and sets
`USE_TOOLS_META`; `@agent` runs last and reads it. Swapping any two decorators
produces silently broken behaviour (missing tools, missing guardrails, etc.).

---

## Critical invariant — tool annotations must resolve

```python
# WRONG — unresolved forward refs still break schema generation:
from __future__ import annotations

@tool()
async def my_tool(param: LaterType) -> dict: ...   # LaterType is not defined yet

# CORRECT — future annotations are fine when every type resolves:
from __future__ import annotations

from my_types import ToolInput

@tool()
async def my_tool(param: ToolInput) -> dict: ...

# ALSO CORRECT — class-form tools can use the same pattern:
from __future__ import annotations

@tool()
class MyClassTool:
    def __init__(self, dep: SomeDependency, runner: AgentRunner | None = None) -> None:
        ...
    async def run(self, param: str) -> dict: ...
```

---

## Links

- Documentation: https://docs.lauren-framework.dev/lauren-ai/
- GitHub: https://github.com/lauren-framework/lauren-ai
- PyPI: https://pypi.org/project/lauren-ai/
- Changelog: https://github.com/lauren-framework/lauren-ai/blob/main/CHANGELOG.md