# Tool Use and Agents

Sources: Huyen (*AI Engineering*, ch. 6 & 10), Lanham (*AI Agents in Action*), Arsanjani & Bustos (*Agentic Architectural Patterns*), 2025–2026 production patterns.

Covers: tool design, function calling mechanics, parallel execution, error recovery, the agent architecture spectrum, multi-agent orchestration patterns, planning strategies, failure modes.

## Tool Design Principles

A tool is any capability the model can invoke: API calls, database queries, calculations, file reads, web searches. Well-designed tools are the difference between a useful agent and an unreliable one.

### Design Rules

| Rule | Rationale |
|------|-----------|
| One tool does one thing | Compound tools obscure failure attribution and make partial steps hard to retry |
| Use verb_noun naming | `search_documents`, `get_user`, `send_email` — unambiguous to the model |
| Type all parameters explicitly | A JSON schema prevents the model from guessing argument types |
| Write descriptions as instructions | "Returns the top 5 documents matching the query" — tell the model what to expect |
| Return structured data, not prose | The model processes JSON better than free-text tool results |
| Include error states in the return type | `{success: bool, data: ..., error: str \| null}` — never raise exceptions that halt the loop |
| Make tools idempotent when possible | Safe to retry on failure without side effects |

### Tool Definition Pattern

```
Tool definition:
  name: "search_documents"
  description: "Search the knowledge base for documents relevant to a query.
                Returns up to k document chunks with their source metadata."
  parameters:
    query:   {type: string, description: "Search terms or natural language question"}
    k:       {type: integer, description: "Number of results to return (1–20)", default: 5}
    filters: {type: object, description: "Optional metadata filters (e.g., {doc_type: 'policy'})"}
  returns:
    results: [{content: string, source: string, score: float}]
    error: string | null
```

The description field is critical — models select tools based on descriptions, not names. Write descriptions as if explaining to a smart colleague who has never seen your codebase.

## Function Calling Mechanics

### Request → Execute → Continue Loop

```
1. Send messages + tool definitions to the model
2. Model responds with one of:
   a. tool_call(s): model wants to execute tools
   b. text response: model has enough information to answer
3. If tool_call(s):
   a. Execute each tool (see parallel execution below)
   b. Append tool_result(s) to messages
   c. Send messages back to the model (go to step 2)
4. If text response: return it to the user
```

This loop continues until the model produces a text response or a max_steps limit is hit.

### Message Thread Structure

```
messages = [
  {role: "system", content: "You are a helpful assistant with access to tools."},
  {role: "user", content: "What are our refund policies?"},
  {role: "assistant", tool_calls: [{id: "call_1", name: "search_documents", input: {query: "refund policy"}}]},
  {role: "tool", tool_call_id: "call_1", content: {results: [...], error: null}},
  {role: "assistant", content: "Based on the documents, our refund policy is..."},
]
```

Always maintain the full message thread. The model uses the tool results to generate the final response.

## Parallel Tool Calls

When the model returns multiple tool_calls in a single response, execute them concurrently unless there is an explicit dependency between them.

### When to Parallelize

| Scenario | Parallelize? |
|----------|--------------|
| Independent lookups (get_user + get_order) | Yes |
| Sequential dependency (search → then filter results) | No |
| Same tool with different arguments | Yes |
| Tool B uses the output of Tool A | No |

```
parallel execution pattern:
  tool_calls = [call_1, call_2, call_3]
  results = await Promise.all([
    execute(call_1),
    execute(call_2),
    execute(call_3),
  ])
  # All three finish in max(latency_1, latency_2, latency_3) instead of the sum
```

Three independent tools at 300 ms each: 900 ms sequential → 300 ms parallel. Always parallelize independent calls.

## Tool Error Recovery

Errors are inevitable. Design the recovery strategy at the tool level, not in the LLM loop.

### Recovery Decision Table

| Error Type | Return to Model | Retry | Escalate |
|------------|-----------------|-------|----------|
| Validation error (bad arguments) | Yes — with a schema hint | After the model corrects the args | Never |
| Not found (empty result) | Yes — return null with context | No | If critical |
| Permission denied | Yes — explain the limitation | No | To user or admin |
| Rate limit (429) | No — retry silently | Yes, with backoff | After 3 failures |
| Timeout | Yes — return partial result or error | Once | After 2 failures |
| Service unavailable | No — retry silently | Yes | After 3 failures |
| Tool logic error (bug) | Yes — return the error message | No | Alert on-call |

### Max Steps Guard

Always set a maximum number of tool-call iterations. Without it, a confused model can loop indefinitely.

```
max_steps = 10
steps = 0
while steps < max_steps:
    response = model.generate(messages, tools=tools)
    if response.type == "text":
        return response.content
    tool_results = execute_tools(response.tool_calls)
    messages.append(tool_results)
    steps += 1
return fallback_response("Maximum steps reached. Could not complete the task.")
```

## Agent Architecture Spectrum

Not every task needs a fully autonomous agent. Match architecture to task complexity.
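The request → execute → continue loop, in-band tool errors, and the max-steps guard described above can be combined into one minimal runnable sketch. This is a hedged illustration, not a real client API: `FakeModel`, `Response`, and `execute_tool` are stand-ins you would replace with your provider's SDK and your own tool dispatcher.

```python
from dataclasses import dataclass, field

@dataclass
class Response:
    type: str                      # "text" or "tool_calls"
    content: str = ""
    tool_calls: list = field(default_factory=list)

class FakeModel:
    """Illustrative stand-in: requests one tool call, then answers."""
    def __init__(self):
        self.turn = 0

    def generate(self, messages, tools):
        self.turn += 1
        if self.turn == 1:
            return Response("tool_calls", tool_calls=[
                {"id": "call_1", "name": "search_documents",
                 "input": {"query": "refund policy"}}])
        return Response("text", content="Refunds are accepted within 30 days.")

def execute_tool(call):
    # Never raise: return errors in-band so the loop can continue.
    return {"data": ["doc_1"], "error": None}

def run_agent(model, messages, tools, max_steps=10):
    for _ in range(max_steps):
        response = model.generate(messages, tools)
        if response.type == "text":
            return response.content
        for call in response.tool_calls:
            messages.append({"role": "tool",
                             "tool_call_id": call["id"],
                             "content": execute_tool(call)})
    return "Maximum steps reached. Could not complete the task."

answer = run_agent(FakeModel(),
                   [{"role": "user", "content": "What are our refund policies?"}],
                   tools=[])
```

Real clients (OpenAI, Anthropic, etc.) differ in message and tool-call shapes; the transferable parts are the loop structure, the in-band error convention, and the step cap.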
### Architecture Options

| Architecture | Control Flow | Predictability | Use When |
|--------------|--------------|----------------|----------|
| Single LLM call | Fixed | Highest | Simple Q&A, classification |
| LLM + tools (1 loop) | Semi-structured | High | Lookup + generate |
| ReAct agent | LLM-directed | Medium | Open-ended, multi-step |
| Multi-agent | LLM-orchestrated | Lower | Complex, parallelizable |

Prefer workflows (predefined code paths) over agents for production. Workflows are auditable, debuggable, and predictable. Agents excel at tasks where the path is genuinely unknown at design time.

### ReAct Loop (Reason + Act)

The fundamental single-agent pattern. The model alternates between reasoning about the current state and taking an action (a tool call).

```
Thought: I need to find the user's order history to answer this question.
Action: get_order_history(user_id="u_123", limit=10)
Observation: [order_1, order_2, order_3]
Thought: The user's most recent order is order_1. Now I need the tracking status.
Action: get_shipment_status(order_id="order_1")
Observation: {status: "shipped", eta: "2026-03-02"}
Thought: I have all the information needed.
Response: Your most recent order has shipped and is expected to arrive on March 2nd.
```

## Multi-Agent Orchestration Patterns

### 1. Orchestrator-Worker

```
Orchestrator (central planner)
├─ Worker A (retrieval specialist)
├─ Worker B (code executor)
└─ Worker C (summarizer)
```

**Topology**: Hub and spoke. The orchestrator receives the task, decomposes it, delegates to specialized workers, and aggregates the results.

**Use when**: The task has distinct phases requiring different expertise. The orchestrator enforces sequencing and handles failures.

**Failure mode**: The orchestrator becomes a bottleneck and a single point of failure. Fix: make the orchestrator stateless; retry at the orchestration level.

### 2. Sequential Pipeline

```
Agent A → output → Agent B → output → Agent C → final result
```

**Topology**: Linear chain. Each agent receives the previous agent's output.
**Use when**: Task is a defined sequence of transformations (extract → classify → enrich → format).

**Failure mode**: Cascading errors. A bad output from Agent A corrupts all downstream agents. Fix: validate the output schema at each stage before passing it forward.

### 3. Fan-Out / Gather (Parallel)

```
Orchestrator
├─ Worker 1 (subtask 1) ─┐
├─ Worker 2 (subtask 2) ─┼─ Aggregator → final result
└─ Worker 3 (subtask 3) ─┘
```

**Use when**: Task decomposes into independent subtasks (research 5 competitors simultaneously, process 100 documents in parallel).

**Failure mode**: Partial failure — some workers succeed, some fail. Fix: define a quorum (e.g., 3/5 required); use partial results if acceptable.

### 4. Generator-Critic (Reflection)

```
Generator agent → draft → Critic agent → critique → Generator agent → revised draft → ...
```

**Use when**: Output quality matters more than speed. Code review, document editing, plan validation.

**Failure mode**: Infinite refinement loop. Fix: hard limit on iterations (3–5 max); accept or escalate after the limit.

### 5. Human-in-the-Loop

```
Agent operates autonomously
→ Reaches decision gate (irreversible action, high stakes)
→ Pauses, surfaces to human
→ Human approves/rejects/modifies
→ Agent continues
```

**Use when**: Actions are irreversible (send email, make purchase, delete records) or high-stakes (financial, legal, medical).

**Implementation**: Define approval gates explicitly. Log every human decision with context for audit.

## Planning Strategies

How agents decompose complex tasks before acting.
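To make "decompose before acting" concrete, here is a small sketch of the plan-then-execute strategy: generate the whole plan first, validate every step against the tools the agent actually has, and only then run it sequentially. The plan format, tool names, and helper functions here are illustrative assumptions, not a standard API.

```python
# Hypothetical tool inventory for this sketch.
AVAILABLE_TOOLS = {"search_documents", "get_user", "send_email"}

def validate_plan(plan):
    """Reject plans that reference tools the agent does not have."""
    unknown = [step["tool"] for step in plan if step["tool"] not in AVAILABLE_TOOLS]
    return (not unknown, unknown)

def execute_plan(plan, dispatch):
    # Sequential execution; stop at the first in-band error.
    results = []
    for step in plan:
        result = dispatch(step["tool"], step["args"])
        results.append(result)
        if result.get("error"):
            break
    return results

plan = [
    {"tool": "search_documents", "args": {"query": "Q3 refund volume"}},
    {"tool": "send_email", "args": {"to": "ops@example.com"}},
]
ok, unknown = validate_plan(plan)
if ok:
    # Stub dispatcher in place of a real tool executor.
    results = execute_plan(plan, lambda tool, args: {"data": tool, "error": None})
```

Validating the plan up front is what makes this strategy suitable for high-stakes tasks: a step referencing a hallucinated tool is caught before anything runs.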
| Strategy | Approach | When to Use |
|----------|----------|-------------|
| Zero-shot | Model selects tools directly based on tool descriptions | Simple, well-defined tasks |
| Chain-of-thought | Model reasons step-by-step before each tool call | Complex, multi-step tasks |
| Plan-then-execute | Generate full plan upfront, execute sequentially | Tasks with known structure |
| Adaptive planning | Revise plan based on intermediate tool results | Tasks with uncertain paths |

For high-stakes tasks, use plan-then-execute and validate the plan before execution. For exploratory tasks, use adaptive planning.

## Agent Failure Modes

### Common Failures and Fixes

| Failure Mode | Detection | Fix |
|--------------|-----------|-----|
| Wrong tool selected | Tool results irrelevant to the task | Improve tool descriptions; reduce tool count |
| Bad tool arguments | Tool returns a validation error | Stricter parameter schemas; add argument examples |
| Hallucinated tool name | tool_call references a non-existent tool | Validate the tool name before execution; return an error to the model |
| Context overflow | Generation quality drops in long sessions | Summarize conversation history at regular intervals |
| Infinite loop | Same tool called repeatedly with the same args | Track call history; break if a (tool, args) pair repeats |
| Unnecessary tool calls | Retrieval for questions the model already knows | Teach the model when NOT to retrieve (self-RAG prompt) |
| Cascading error | Early tool failure corrupts later steps | Validate and sanitize each tool result before appending |

### Loop Detection

```
call_history = {}

before executing tool_call(name, args):
    key = hash(name + JSON(args))
    if call_history.get(key, 0) >= 2:
        abort("Loop detected: same tool call repeated 2+ times")
    call_history[key] = call_history.get(key, 0) + 1
```

## Tool Count Guidelines

| Tool Count | Model Behavior | Strategy |
|------------|----------------|----------|
| 1–10 | Reliable selection | Include all tools in every request |
| 10–30 | Occasional confusion | Group tools by task; prefilter by intent |
| 30+ | Frequent tool selection errors | Dynamic tool loading: select the 5–10 tools relevant to the current task |

For large tool libraries, add a tool-routing step before the main agent loop: classify the user intent, then load only the relevant subset of tools.
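The routing step can start as simply as a keyword classifier in front of the agent loop. The sketch below uses illustrative group and tool names; a production system would typically swap the keyword match for an LLM or embedding-based intent classifier.

```python
# Dynamic tool loading: classify intent first, then expose only a small
# subset of a large tool library to the agent loop.
TOOL_GROUPS = {
    "billing": ["get_invoice", "refund_order", "get_payment_methods"],
    "search": ["search_documents", "get_user", "get_order_history"],
}

def route_tools(user_message: str) -> list[str]:
    """Keyword routing as a stand-in for a real intent classifier."""
    text = user_message.lower()
    if any(word in text for word in ("refund", "invoice", "charge")):
        return TOOL_GROUPS["billing"]
    return TOOL_GROUPS["search"]

tools = route_tools("I was charged twice, can I get a refund?")
```

Whatever classifier sits in front, the goal is the same: keep the tool set the model sees in the reliable 1–10 range on every request.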