---
name: agent-orchestration-planner
description: Designs multi-step agent workflows with tool usage, retry logic, state management, and budget controls. Provides orchestration diagrams, tool execution order, fallback strategies, and cost limits. Use for "AI agents", "agentic workflows", "multi-step AI", or "autonomous systems".
---

# Agent Orchestration Planner

Design robust multi-step agent systems with tools and error handling.

## Agent Architecture

```
User Query → Planning → Tool Selection → Tool Execution → Result Synthesis → Response
              ↓            ↓                ↓                    ↓
           Memory      Retry Logic      Validation         Cost Tracking
```

## Agent Loop Pattern

```python
from typing import List, Dict, Any

class Agent:
    def __init__(self, tools: List[Tool], max_iterations: int = 5):
        self.tools = tools
        self.max_iterations = max_iterations
        self.memory = []
        self.cost_tracker = CostTracker()

    def run(self, query: str) -> str:
        self.memory.append({"role": "user", "content": query})

        for iteration in range(self.max_iterations):
            # Decide next action
            action = self.plan_next_action()

            if action["type"] == "final_answer":
                return action["content"]

            # Execute tool
            result = self.execute_tool(action["tool"], action["params"])

            # Track cost
            self.cost_tracker.add(result["cost"])

            # Check budget
            if self.cost_tracker.exceeds_limit():
                return self.budget_exceeded_response()

            # Add to memory
            self.memory.append({
                "role": "tool",
                "tool": action["tool"],
                "result": result["data"]
            })

        return "Max iterations reached"

    def plan_next_action(self) -> Dict:
        prompt = self.build_planning_prompt()
        response = llm(prompt)
        return parse_action(response)
```

## Tool Orchestration

```python
TOOL_ORDER = {
    "search_web": 1,        # Always try search first
    "query_database": 2,    # Then database
    "call_api": 3,          # Then external APIs
    "generate_content": 4,  # Finally generate
}

def select_tools(query: str, available_tools: List[Tool]) -> List[Tool]:
    """Select and order tools based on query"""
    # Use LLM to select relevant tools
    tool_selection_prompt = f"""
    Given this query: "{query}"

    Which of these tools are needed? {[t.name for t in available_tools]}

    Return JSON array of tool names in execution order.
    """

    selected_names = json.loads(llm(tool_selection_prompt))
    selected_tools = [t for t in available_tools if t.name in selected_names]

    # Sort by predefined order
    selected_tools.sort(key=lambda t: TOOL_ORDER.get(t.name, 999))

    return selected_tools
```

## Retry & Fallback Logic

```python
def execute_with_retry(tool: Tool, params: Dict, max_retries: int = 3):
    """Execute tool with exponential backoff retry"""
    for attempt in range(max_retries):
        try:
            result = tool.execute(params)
            return {"success": True, "data": result}
        except ToolError as e:
            if attempt == max_retries - 1:
                # Try fallback tool
                fallback = get_fallback_tool(tool.name)
                if fallback:
                    return execute_with_retry(fallback, params, 1)

                return {"success": False, "error": str(e)}

            # Wait before retry
            time.sleep(2 ** attempt)

FALLBACK_TOOLS = {
    "search_web": "query_database",
    "call_api": "use_cached_data",
}
```

## State Management

```python
class AgentState:
    def __init__(self):
        self.memory = []
        self.tool_results = {}
        self.costs = 0.0
        self.iteration = 0

    def add_message(self, role: str, content: str):
        self.memory.append({"role": role, "content": content})

    def add_tool_result(self, tool_name: str, result: Any):
        self.tool_results[tool_name] = result

    def get_context(self) -> str:
        """Build context from memory for next LLM call"""
        return "\n".join([
            f"{msg['role']}: {msg['content']}"
            for msg in self.memory[-5:]  # Last 5 messages
        ])
```

## Budget & Cost Controls

```python
class CostTracker:
    def __init__(self, max_cost: float = 1.0):
        self.max_cost = max_cost
        self.total_cost = 0.0
        self.breakdown = {}

    def add(self, cost: float, category: str = "llm"):
        self.total_cost += cost
        self.breakdown[category] = self.breakdown.get(category, 0) + cost

    def exceeds_limit(self) -> bool:
        return self.total_cost >= self.max_cost

    def remaining(self) -> float:
        return self.max_cost - self.total_cost

# Use in agent
if cost_tracker.exceeds_limit():
    return f"Budget limit reached. Used ${cost_tracker.total_cost:.4f}"
```

## Orchestration Diagram

```mermaid
graph TD
    A[User Query] --> B[Plan Action]
    B --> C{Action Type?}
    C -->|Tool Call| D[Execute Tool]
    C -->|Final Answer| E[Return Response]
    D --> F[Validate Result]
    F -->|Success| G[Update Memory]
    F -->|Failure| H[Retry/Fallback]
    H --> D
    G --> I{Budget OK?}
    I -->|Yes| B
    I -->|No| J[Budget Exceeded]
    J --> E
```

## Planning Prompt

```python
def build_planning_prompt(state: AgentState) -> str:
    return f"""
You are an agent that can use tools to answer questions.

Available tools:
{json.dumps([t.schema for t in tools], indent=2)}

Conversation history:
{state.get_context()}

Based on the conversation, decide your next action:
1. Call a tool (specify tool name and parameters)
2. Provide final answer

If calling a tool, respond with:
{{"action": "tool_call", "tool": "tool_name", "params": {{...}}}}

If providing final answer, respond with:
{{"action": "final_answer", "content": "your answer"}}

Think step by step about what information you need.
"""
```

## Multi-Agent Coordination

```python
class MultiAgentSystem:
    def __init__(self):
        self.agents = {
            "researcher": ResearchAgent(),
            "coder": CodeAgent(),
            "reviewer": ReviewAgent(),
        }

    def run(self, task: str):
        # Researcher gathers information
        context = self.agents["researcher"].run(task)

        # Coder generates solution
        code = self.agents["coder"].run(f"{task}\nContext: {context}")

        # Reviewer validates
        review = self.agents["reviewer"].run(f"Review this code:\n{code}")

        if review["approved"]:
            return code
        else:
            # Iterate with feedback
            return self.agents["coder"].run(
                f"Fix this code based on feedback:\n{review['feedback']}"
            )
```

## Best Practices

1. **Limit iterations**: Prevent infinite loops
2. **Budget controls**: Track and limit costs
3. **Tool validation**: Verify tool outputs
4. **Error handling**: Graceful fallbacks
5. **State persistence**: Save progress
6. **Observability**: Log all actions
7. **Human-in-loop**: Critical decisions

## Output Checklist

- [ ] Agent loop implementation
- [ ] Tool selection logic
- [ ] Retry & fallback strategies
- [ ] State management
- [ ] Cost tracking
- [ ] Budget limits
- [ ] Orchestration diagram
- [ ] Planning prompts
- [ ] Error handling
- [ ] Observability/logging