# AI SDK Agents - Comprehensive Notes ## 1. Directory Structure The agents documentation contains 6 core markdown files: - `overview.md` - High-level agent concepts and ToolLoopAgent class introduction - `building-agents.md` - Creating agents, configuration, and usage patterns - `loop-control.md` - Controlling execution flow, stopping conditions, prepareStep - `workflows.md` - Structured patterns for complex workflows - `configuring-call-options.md` - Runtime configuration and dynamic behavior - `subagents.md` - Delegating to specialized subagents ## 2. Agent Architecture **Core Concept**: Agents are LLMs that use tools in a loop to accomplish tasks. **Three Components**: 1. **LLMs** - Process input and decide next action 2. **Tools** - Extend capabilities beyond text (files, APIs, databases) 3. **Loop** - Orchestrates execution with context management and stopping conditions **ToolLoopAgent Class**: - Main abstraction for building agents - Handles loop iteration, message history, and tool execution - Default behavior: stops after 20 steps (`stepCountIs(20)`) - Configurable with system instructions, tools, and output schemas ## 3. Agent Protocols & Execution Model **Loop Execution Flow**: ``` Input -> Model Generation -> Check Result | Tool Call? -> Execute Tool -> Add to Context -> Continue | Text Generated? -> Return Result -> Done | Stop Condition Met? -> Done ``` **Each Step Represents**: - One model generation (results in either text or tool call) - Either a tool call to execute, or text response to return **Loop Continues Until**: - Model generates text (finish reasoning other than tool-calls) - Tool invoked has no execute function - Tool call needs approval - Stop condition is met ## 4. Tool Execution Model **Tool Definition** (using `tool()` helper): ```typescript tool({ description: '...', // What the tool does inputSchema: z.object({...}), // Zod schema for inputs execute: async ({...}) => {}, // Function that runs }) ``` **Tool Execution Mechanics**: - Model decides which tool to call and what inputs to provide - SDK executes the tool automatically - Result is added to message history as "tool" role message - Model sees result and decides next action **Tool Calling Control** (`toolChoice`): - `'auto'` (default) - Model decides to use tools or generate text - `'required'` - Force model to always call a tool - `'none'` - Disable tools entirely - `{ type: 'tool', toolName: 'specific' }` - Force specific tool **Preliminary Tool Results** (for streaming): - `execute` can be async generator function - Each `yield` sends partial result to UI - Allows showing progress while tool executes - Used for subagent streaming ## 5. Multi-Step/Agentic Loops **Default Loop**: - Maximum 20 steps by default - Each step: generate -> evaluate -> execute tool or stop - Full context window available for each generation **Loop Control Mechanisms**: ### A) Stop Conditions (`stopWhen`): - `stepCountIs(N)` - Stop after N steps - `hasToolCall('toolName')` - Stop after specific tool called - Custom conditions - Define logic based on steps array - Can combine multiple conditions (stops if ANY met) ### B) Dynamic Control (`prepareStep`): - Runs before each step - Can modify: - Model selection (switch based on complexity) - Available tools (`activeTools`) - Tool choice enforcement - Messages (e.g., truncate for context limits) - Any agent setting - Receives: stepNumber, steps array, messages, model config - Enables phased workflows (search -> analyze -> summarize) ### C) Message Management: - System message + user message + alternating assistant/tool messages - `prepareStep` can modify message history (e.g., keep recent only) - All previous steps visible to stop conditions ## 6. Memory & Context Management **Message History Structure**: ``` [system, user, assistant (or tool calls), tool, assistant (or tool calls), ...] ``` **Context Issues**: - Long conversations can exceed model context limits - `prepareStep` can implement sliding window (keep only recent N messages) - Summarization possible: replace old messages with summaries **Subagent Context Isolation**: - Each subagent starts with fresh context window - Critical benefit: heavy exploration doesn't bloat main agent - Can pass main agent's message history to subagent if needed **State Persistence**: - All steps available in `steps` array - Tool calls and results preserved - Token usage tracked per step via `onStepFinish` callback ## 7. Human-in-the-Loop Support **Tool Approvals**: - Tools can require human approval before execution - When `needsApproval` set, execution pauses waiting for confirmation - Stops the agent loop (counts as loop termination condition) **Callback Hooks**: - `onStepFinish`: Called after each step completes - Receives: usage, finishReason, toolCalls - Can be in constructor (agent-wide) or method (per-call) - Constructor callback runs first, then method callback **User Interaction Points**: - Tool approvals for sensitive operations - Custom UI via `createAgentUIStreamResponse` - Type-safe message types via `InferAgentUIMessage` ## 8. Streaming in Agents **Streaming Text**: ```typescript const result = await agent.stream({ prompt: '...' }) for await (const chunk of result.textStream) { console.log(chunk) } ``` **Streaming UI Messages**: - `createAgentUIStreamResponse()` - Create API response for client apps - Works with Next.js routes, server actions - Streams both text and tool calls with their results **Streaming Tool Results** (Preliminary Results): - Execute as async generator: `async function* ({ /* inputs */ })` - Each `yield` sends partial result - `readUIMessageStream()` accumulates chunks into complete UIMessage - Frontend displays growing message as it arrives **Subagent Progress**: - Subagent invoked via tool's `execute` function - Tool can stream subagent's progress to UI - `toModelOutput` controls what main agent sees (full UI vs summary) ## 9. Error Handling in Agents **Loop Termination on Error**: - Tool execution failure stops the loop - Incomplete tool calls handled via `convertToModelMessages` with `ignoreIncompleteToolCalls` - Error can be logged and gracefully handled **Cancellation Support**: - `abortSignal` parameter in tool execute - Pass through to async operations - Triggers `AbortError` on cancellation - Subagent cancellation propagates from parent **Custom Error Recovery**: - `prepareStep` can implement retry logic - Modify model/tools on error (switch to stronger model) - Early termination via stop conditions ## 10. Best Practices & Patterns ### A) 5 Core Workflow Patterns: 1. **Sequential Processing (Chains)** - Steps execute in order, each output -> next input - Use for pipelines with clear sequence - Example: generate copy -> evaluate quality -> improve if needed 2. **Routing** - Model classifies input -> determines processing path - First generation determines model/system prompt for next - Example: route query by complexity to different models 3. **Parallel Processing** - Independent tasks run simultaneously (Promise.all) - Aggregate results afterward - Example: parallel code review (security, performance, maintainability) 4. **Orchestrator-Worker** - Primary agent plans -> specialized workers execute - Each worker optimized for subtask type - Example: architect plans feature -> specialized agents implement files 5. **Evaluator-Optimizer** - Dedicated evaluation step assesses results - Based on evaluation: proceed, retry, or take corrective action - Iterative improvement loop - Example: translate -> evaluate -> improve if quality < threshold ### B) Agent Design Principles: - Start with simplest approach that works - Add complexity only when needed - Balance flexibility (LLM freedom) vs control (constraints) - Consider error tolerance and cost implications ### C) System Instructions: - Define agent role and expertise - Specify behavioral guidelines and rules - Explain tool usage patterns - Guide response format/style - Set boundaries and constraints ### D) Tool Design: - Clear descriptions for model understanding - Structured Zod schemas for type safety - Explicit execute functions - No-execute tools for termination signals (with `toolChoice: 'required'`) ### E) Cost Optimization: - Monitor token usage via `onStepFinish` - Use budget-based stop conditions - Switch to smaller models for simple tasks - Summarize tool results in `prepareStep` ### F) Context Management: - Track message count in `prepareStep` - Implement sliding window for long conversations - Use subagents for context-heavy work - Extract summaries instead of full exploration ### G) Structured Workflows: - Use core functions (`generateText`, `generateObject`) for predictable flows - Combine with agents where flexibility needed - Agents excel at exploration/multi-step reasoning - Core functions excel at deterministic workflows ### H) Subagent Patterns: - Use only when benefits exceed latency cost - Context isolation is primary value proposition - Streaming progress with `readUIMessageStream` - `toModelOutput` keeps main agent focused (full UI != model input) - Subagent instructions must explicitly produce summaries ### I) Call Options for Dynamic Behavior: - Define `callOptionsSchema` (Zod) - Implement `prepareCall` to modify settings - Enables RAG (fetch docs -> inject into instructions) - Dynamic model/tool selection per request - Provider-specific options (e.g., reasoning effort) - Async `prepareCall` for data fetching ### J) End-to-End Type Safety: - `InferAgentUIMessage` for UI types - Tool parts have states: input-streaming, input-available, output-available, output-error - Detect streaming vs complete: `part.preliminary` flag - Subagent output accessible in tool part ### K) When NOT to Use Agents: - Simple, one-shot tasks - Workflows with fixed steps - Deterministic processes with known paths - Tasks requiring guaranteed specific behavior - Use core functions (`generateText`) instead ### L) Manual Loop Control: - Direct use of `generateText` with custom loop - Complete control over messages, stopping, tools - When `stopWhen`/`prepareStep` insufficient - Lower-level but more flexible ## 11. Key Configuration Options **ToolLoopAgent Constructor**: - `model` - LLM to use - `instructions` - System prompt - `tools` - Object of available tools - `stopWhen` - Stop condition(s) - `prepareStep` - Callback before each step - `toolChoice` - How model uses tools - `output` - Structured output schema - `onStepFinish` - Per-step callback - `callOptionsSchema` - Runtime configuration schema - `prepareCall` - Transform call options to settings **Generation Methods**: - `generate(options)` - Single generation - `stream(options)` - Streaming response - Both return: `text`, `steps` array, `staticToolCalls`, `usage` ## 12. Integration Points **With UI** (React): - `createAgentUIStreamResponse()` in API route - `useChat()` hook - Full streaming support - Type-safe messages **With Next.js**: - App Router: server actions with agent delegation - API routes: `POST /api/chat` with `createAgentUIStreamResponse` - Server components for data fetching into `callOptions` **With External APIs**: - Tools wrap API calls - Errors caught and handled - Context can be fetched and injected via `prepareCall`