---
name: agentic-design
description: "Use when building AI agent systems. Covers agent loops, tool calling, planning patterns, memory systems, multi-agent coordination, and safety guardrails. Apply when creating autonomous AI workflows, coding assistants, or task automation systems."
---

# Agentic Design

## Core Principle

Agents are LLMs in a loop with tools. Design for reliability, observability, and graceful failure, not just capability.

## When to Use This Skill

- Building AI assistants with tool use
- Creating autonomous task executors
- Designing multi-agent systems
- Implementing coding agents
- Building workflow automation
- Adding planning capabilities to AI

## The Iron Law

**AGENTS FAIL SILENTLY. BUILD FOR OBSERVABILITY.**

If you can't see what your agent is doing, you can't fix it when it breaks.

## Why Agentic Systems?

**Benefits:**
- Automate complex multi-step tasks
- Handle dynamic, open-ended problems
- Scale human expertise
- 24/7 autonomous operation
- Adaptable to new situations

**Challenges:**
- Unpredictable behavior
- Error propagation
- Cost accumulation
- Safety concerns
- Debugging complexity

## Agent Architecture Overview

```
┌─────────────────────────────────────────────────────┐
│                     AGENT LOOP                      │
│   ┌──────────┐     ┌──────────┐     ┌──────────┐    │
│   │ Observe  │────▶│  Think   │────▶│   Act    │    │
│   └──────────┘     └──────────┘     └──────────┘    │
│        ▲                                  │         │
│        │           ┌──────────┐           │         │
│        └───────────│  Memory  │◀──────────┘         │
│                    └──────────┘                     │
│                         │                           │
│                 ┌───────┴───────┐                   │
│                 ▼               ▼                   │
│            ┌──────────┐    ┌──────────┐             │
│            │  Tools   │    │Knowledge │             │
│            └──────────┘    └──────────┘             │
└─────────────────────────────────────────────────────┘
```

## Pattern 1: ReAct (Reasoning + Acting)

The foundational agent pattern: Think → Act → Observe → Repeat.

### Basic ReAct Loop

```python
def react_agent(task, max_iterations=10):
    """ReAct pattern: Reason, Act, Observe."""
    messages = [
        {"role": "system", "content": REACT_SYSTEM_PROMPT},
        {"role": "user", "content": task}
    ]

    for _ in range(max_iterations):
        # Think: get the model's reasoning and proposed action
        response = llm.chat(messages, tools=TOOLS)

        # Check for completion
        if response.stop_reason == "end_turn":
            return extract_final_answer(response)

        # Record the assistant turn (including its tool calls) BEFORE
        # appending tool results, so the transcript stays valid
        messages.append({
            "role": "assistant",
            "content": response.content,
            "tool_calls": response.tool_calls,
        })

        # Act: execute tool calls
        if response.tool_calls:
            for tool_call in response.tool_calls:
                result = execute_tool(
                    tool_call.name,
                    tool_call.arguments
                )

                # Observe: add result to context
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": str(result)
                })

    return "Max iterations reached"
```

### ReAct System Prompt

```python
REACT_SYSTEM_PROMPT = """You are an AI assistant that solves tasks step by step.

For each step:
1. THINK: Analyze what you know and what you need
2. ACT: Use a tool to gather information or take action
3. OBSERVE: Review the result
4. REPEAT or ANSWER: Continue until you can answer

Always explain your reasoning before acting.
When you have enough information, provide a final answer.

Available tools:
{tool_descriptions}
"""
```
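The loop above assumes an `execute_tool` helper and a module-level `llm` client. A minimal sketch of the dispatch side, assuming tools are registered as plain Python callables (the registry contents here are illustrative):

```python
# Hypothetical registry: tool name -> plain Python callable.
TOOL_REGISTRY = {
    "read_file": lambda path: open(path).read(),
    "write_file": lambda path, content: open(path, "w").write(content),
}

def execute_tool(name, arguments):
    """Dispatch one tool call; report failures instead of raising."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return {"error": f"Unknown tool: {name}"}
    try:
        return handler(**arguments)
    except Exception as e:
        # Return the error as an observation so the model can react to it,
        # rather than letting the exception kill the agent loop
        return {"error": str(e)}
```

Returning errors as observations matters: the model can often recover from a failed call if it sees why the call failed.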
## Pattern 2: Plan-and-Execute

Separate planning from execution for complex tasks.

### Planner-Executor Architecture

```python
class PlanAndExecuteAgent:
    def __init__(self, planner_llm, executor_llm, tools):
        self.planner = planner_llm
        self.executor = executor_llm
        self.tools = tools

    def run(self, task):
        # Phase 1: Create plan
        plan = self.create_plan(task)

        # Phase 2: Execute steps (index-based so a revised plan takes effect)
        results = []
        i = 0
        while i < len(plan.steps):
            result = self.execute_step(plan.steps[i], results)
            results.append(result)
            i += 1

            # Replan if needed; the revised plan covers only the remaining
            # work, so execution restarts at its first step
            if result.needs_replanning:
                plan = self.replan(task, plan, results)
                i = 0

        # Phase 3: Synthesize
        return self.synthesize(task, results)

    def create_plan(self, task):
        """Generate step-by-step plan."""
        response = self.planner.chat([{
            "role": "user",
            "content": f"""Create a plan to accomplish this task:

{task}

Return a numbered list of steps.
Each step should be specific and actionable.
Consider dependencies between steps."""
        }])
        return parse_plan(response.content)

    def execute_step(self, step, previous_results):
        """Execute a single step with access to tools."""
        context = format_previous_results(previous_results)
        # Assumes a react_agent variant that accepts a tools argument
        return react_agent(
            f"Previous context:\n{context}\n\nExecute this step:\n{step}",
            tools=self.tools
        )

    def replan(self, task, current_plan, results):
        """Adapt plan based on execution results."""
        response = self.planner.chat([{
            "role": "user",
            "content": f"""Original task: {task}

Original plan: {current_plan}

Results so far: {results}

The plan needs adjustment. Create a revised plan for the remaining work."""
        }])
        return parse_plan(response.content)
```
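`parse_plan` (like `format_previous_results`) is assumed above. A minimal sketch of the parser, with a hypothetical `Plan` container, that extracts numbered-list steps from the planner's reply:

```python
import re
from dataclasses import dataclass, field

@dataclass
class Plan:
    steps: list[str] = field(default_factory=list)

def parse_plan(text):
    """Extract numbered steps ('1. ...' or '1) ...') from planner output."""
    steps = [
        re.sub(r"^\s*\d+[.)]\s*", "", line).strip()
        for line in text.splitlines()
        if re.match(r"^\s*\d+[.)]", line)
    ]
    return Plan(steps=steps)
```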
## Pattern 3: Tool Use

### Tool Definition (OpenAI Format)

```python
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    },
                    "num_results": {
                        "type": "integer",
                        "description": "Number of results (default 5)",
                        "default": 5
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read contents of a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "File path to read"
                    }
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write content to a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"}
                },
                "required": ["path", "content"]
            }
        }
    }
]
```

### Tool Execution with Safety

```python
import os
from datetime import datetime


class ToolExecutor:
    def __init__(self, tools, sandbox=True):
        self.tools = {t["function"]["name"]: t for t in tools}
        self.sandbox = sandbox
        self.execution_log = []

    def execute(self, tool_name, arguments):
        """Execute tool with safety checks."""
        # Validate tool exists
        if tool_name not in self.tools:
            return {"error": f"Unknown tool: {tool_name}"}

        # Log execution
        self.execution_log.append({
            "tool": tool_name,
            "arguments": arguments,
            "timestamp": datetime.now()
        })

        # Safety checks
        if self.sandbox:
            if tool_name == "write_file":
                if not self._is_safe_path(arguments.get("path")):
                    return {"error": "Path outside sandbox"}
            if tool_name == "execute_code":
                arguments = self._sandbox_code(arguments)

        # Execute
        try:
            handler = getattr(self, f"_tool_{tool_name}")
            result = handler(**arguments)
            return {"success": True, "result": result}
        except Exception as e:
            return {"error": str(e)}

    def _is_safe_path(self, path):
        """Check the resolved path is inside an allowed directory.

        Resolving with realpath blocks ../ traversal and prefix tricks
        (e.g., /tmp_evil slipping past a naive startswith("/tmp") check).
        """
        allowed_dirs = ["/tmp", "./workspace"]
        real = os.path.realpath(path)
        return any(
            real == os.path.realpath(d)
            or real.startswith(os.path.realpath(d) + os.sep)
            for d in allowed_dirs
        )

    def _tool_search_web(self, query, num_results=5):
        """Web search implementation."""
        # Implementation here
        pass

    def _tool_read_file(self, path):
        """File read implementation."""
        with open(path) as f:
            return f.read()

    def _tool_write_file(self, path, content):
        """File write implementation."""
        with open(path, 'w') as f:
            f.write(content)
        return f"Wrote {len(content)} characters to {path}"
```

## Pattern 4: Memory Systems

### Short-Term Memory (Conversation)

```python
class ConversationMemory:
    def __init__(self, max_tokens=4000):
        self.messages = []
        self.max_tokens = max_tokens

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        self._trim_if_needed()

    def _trim_if_needed(self):
        """Keep memory within token limits."""
        # Keep the system message at index 0; drop the oldest turn after it.
        # The length guard prevents an infinite loop when nothing is left to drop.
        while self._count_tokens() > self.max_tokens and len(self.messages) > 1:
            self.messages.pop(1)

    def _count_tokens(self):
        # Rough heuristic: ~4 characters per token.
        # Swap in a real tokenizer (e.g., tiktoken) for production use.
        return sum(len(m["content"]) for m in self.messages) // 4

    def get_context(self):
        return self.messages.copy()
```

### Long-Term Memory (Vector Store)

```python
from datetime import datetime


class LongTermMemory:
    def __init__(self, vector_store):
        self.store = vector_store

    def remember(self, content, metadata=None):
        """Store information for later retrieval."""
        self.store.add(
            content=content,
            metadata={
                "timestamp": datetime.now().isoformat(),
                **(metadata or {})
            }
        )

    def recall(self, query, k=5):
        """Retrieve relevant memories."""
        return self.store.search(query, k=k)

    def summarize_and_store(self, conversation):
        """Compress conversation into memorable facts."""
        summary = llm.chat([{
            "role": "user",
            "content": f"""Extract key facts and decisions from this conversation:

{conversation}

Return as bullet points."""
        }])
        self.remember(
            content=summary.content,
            metadata={"type": "conversation_summary"}
        )
```

### Working Memory (Scratchpad)

```python
import json


class WorkingMemory:
    def __init__(self):
        self.scratchpad = {}
        self.current_task = None
        self.subtasks = []

    def set_task(self, task):
        self.current_task = task
        self.subtasks = []

    def note(self, key, value):
        """Store intermediate result."""
        self.scratchpad[key] = value

    def get(self, key):
        return self.scratchpad.get(key)

    def get_context(self):
        """Format working memory for prompt."""
        return f"""Current Task: {self.current_task}

Subtasks: {self.subtasks}

Notes:
{json.dumps(self.scratchpad, indent=2)}
"""
```
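How the three tiers fit together in a single turn is worth making concrete. A sketch, assuming the classes above plus the same module-level `llm` client used throughout:

```python
class MemoryAwareAgent:
    def __init__(self, vector_store):
        self.conversation = ConversationMemory(max_tokens=4000)
        self.long_term = LongTermMemory(vector_store)
        self.working = WorkingMemory()

    def turn(self, user_input):
        # Recall long-term memories relevant to this input
        memories = self.long_term.recall(user_input, k=3)

        # Prepend working-memory state and recalled facts to the user turn
        self.conversation.add("user", (
            f"{self.working.get_context()}\n"
            f"Relevant memories: {memories}\n\n"
            f"{user_input}"
        ))

        response = llm.chat(self.conversation.get_context())
        self.conversation.add("assistant", response.content)
        return response.content
```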
agents.""" # Decompose task subtasks = self.decompose(task) # Parallel execution with ThreadPoolExecutor() as executor: futures = { executor.submit(agent.run, subtask): agent for agent, subtask in zip(self.agents, subtasks) } results = {} for future in as_completed(futures): agent = futures[future] results[agent.name] = future.result() return self.merge_results(results) ``` ## Pattern 6: Guardrails and Safety ### Input Validation ```python class InputGuardrail: def __init__(self): self.blocked_patterns = [ r"ignore.*instructions", r"pretend.*you.*are", r"jailbreak", ] def validate(self, user_input): """Check input for prompt injection.""" # Pattern matching for pattern in self.blocked_patterns: if re.search(pattern, user_input, re.IGNORECASE): return False, "Blocked pattern detected" # Length check if len(user_input) > 10000: return False, "Input too long" # LLM-based check for sophisticated attacks is_safe = self.llm_safety_check(user_input) if not is_safe: return False, "Potential prompt injection" return True, None ``` ### Output Validation ```python class OutputGuardrail: def __init__(self): self.validators = [ self.check_pii, self.check_harmful_content, self.check_code_safety, ] def validate(self, output, context): """Validate agent output before returning.""" for validator in self.validators: is_valid, reason = validator(output, context) if not is_valid: return self.sanitize(output, reason) return output def check_pii(self, output, context): """Check for leaked PII.""" pii_patterns = { 'ssn': r'\d{3}-\d{2}-\d{4}', 'credit_card': r'\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}', 'email': r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', } for pii_type, pattern in pii_patterns.items(): if re.search(pattern, output): return False, f"Contains {pii_type}" return True, None def check_code_safety(self, output, context): """Check generated code for dangerous patterns.""" dangerous = [ 'rm -rf', 'DROP TABLE', 'exec(', 'eval(', 'os.system', ] for pattern in dangerous: if pattern in output: return False, f"Dangerous code pattern: {pattern}" return True, None ``` ### Action Limits ```python class ActionLimiter: def __init__(self): self.limits = { 'api_calls': 100, 'file_writes': 10, 'cost_usd': 5.0, } self.counts = defaultdict(int) self.cost = 0.0 def check(self, action_type, cost=0): """Check if action is within limits.""" self.counts[action_type] += 1 self.cost += cost if self.counts[action_type] > self.limits.get(action_type, float('inf')): raise LimitExceeded(f"{action_type} limit reached") if self.cost > self.limits['cost_usd']: raise LimitExceeded(f"Cost limit reached: ${self.cost:.2f}") def require_approval(self, action): """Flag dangerous actions for human approval.""" dangerous_actions = ['delete_file', 'send_email', 'deploy'] if action in dangerous_actions: return self.request_human_approval(action) return True ``` ## Pattern 7: Error Handling and Recovery ```python class ResilientAgent: def __init__(self, max_retries=3): self.max_retries = max_retries def run_with_recovery(self, task): """Execute with automatic error recovery.""" errors = [] for attempt in range(self.max_retries): try: result = self.execute(task) # Validate result if self.validate_result(result): return result errors.append(f"Invalid result: {result}") except ToolExecutionError as e: errors.append(f"Tool error: {e}") # Try alternative approach task = self.reframe_task(task, e) except RateLimitError as e: # Exponential backoff wait_time = 2 ** attempt time.sleep(wait_time) except Exception as e: errors.append(f"Unexpected 
error: {e}") # All retries failed return self.graceful_failure(task, errors) def reframe_task(self, task, error): """Adjust task based on error.""" return f"""Previous attempt failed with: {error} Try a different approach for: {task}""" def graceful_failure(self, task, errors): """Return partial results or helpful error.""" return { "status": "failed", "task": task, "errors": errors, "partial_results": self.get_partial_results(), "suggestions": self.suggest_manual_steps(task) } ``` ## Observability ### Structured Logging ```python import structlog logger = structlog.get_logger() class ObservableAgent: def __init__(self, name): self.name = name self.trace_id = None def run(self, task): self.trace_id = generate_trace_id() with logger.bind( agent=self.name, trace_id=self.trace_id, task=task[:100] ): logger.info("agent_started") try: result = self._execute(task) logger.info("agent_completed", result_length=len(str(result))) return result except Exception as e: logger.error("agent_failed", error=str(e)) raise def _log_tool_call(self, tool_name, args, result): logger.info( "tool_called", tool=tool_name, args=args, result_preview=str(result)[:200] ) def _log_llm_call(self, messages, response, tokens, cost): logger.info( "llm_called", message_count=len(messages), response_length=len(response), tokens=tokens, cost_usd=cost ) ``` ### Tracing with LangSmith/Phoenix ```python from langsmith import traceable @traceable(name="agent_run") def run_agent(task): """Full agent run with tracing.""" plan = create_plan(task) for step in plan: result = execute_step(step) return synthesize_results() @traceable(name="tool_execution") def execute_tool(name, args): """Traced tool execution.""" return tool_registry[name](**args) ``` ## Testing Agents ### Unit Testing Tools ```python def test_search_tool(): """Test individual tool.""" result = tools.search_web("test query") assert "results" in result assert len(result["results"]) > 0 def test_tool_error_handling(): """Test tool handles errors gracefully.""" result = tools.read_file("/nonexistent/path") assert "error" in result assert "not found" in result["error"].lower() ``` ### Integration Testing ```python def test_agent_simple_task(): """Test agent on known task.""" agent = create_agent() result = agent.run("What is 2 + 2?") assert "4" in result def test_agent_tool_use(): """Test agent uses tools correctly.""" agent = create_agent() result = agent.run("Search for the current weather in NYC") assert agent.tool_calls_made > 0 assert "weather" in result.lower() ``` ### Evaluation Datasets ```python EVAL_DATASET = [ { "task": "Find the capital of France", "expected_tools": ["search_web"], "expected_answer_contains": ["Paris"], }, { "task": "Create a file called test.txt with 'hello'", "expected_tools": ["write_file"], "expected_side_effects": [("file_exists", "test.txt")], }, ] def evaluate_agent(agent, dataset): """Evaluate agent on test dataset.""" results = [] for case in dataset: result = agent.run(case["task"]) score = { "task": case["task"], "tools_correct": check_tools(agent.tool_calls, case["expected_tools"]), "answer_correct": check_answer(result, case.get("expected_answer_contains", [])), "side_effects_correct": check_side_effects(case.get("expected_side_effects", [])), } results.append(score) return calculate_metrics(results) ``` ## Common Mistakes | Mistake | Impact | Fix | |---------|--------|-----| | No iteration limit | Infinite loops, cost explosion | Set max_iterations | | Missing error handling | Silent failures | Wrap tool calls in try/except | | No 
## Common Mistakes

| Mistake | Impact | Fix |
|---------|--------|-----|
| No iteration limit | Infinite loops, cost explosion | Set max_iterations |
| Missing error handling | Silent failures | Wrap tool calls in try/except |
| No observability | Can't debug issues | Add structured logging |
| Overly complex tools | LLM confusion | Keep tools focused |
| No output validation | Harmful/wrong outputs | Add guardrails |
| Single LLM for all | Suboptimal cost/quality | Use appropriate model per task |

## Integration with Skills

**Use with:**
- `rag-architecture` - RAG as a retrieval tool
- `llm-integration` - API patterns and error handling
- `subagent-driven-development` - Coordinating agent tasks
- `test-driven-development` - Testing agent behavior

## Checklist

Before deploying an agent:

- [ ] Clear task boundaries defined
- [ ] Tools are well-documented
- [ ] Input validation in place
- [ ] Output guardrails active
- [ ] Action limits configured
- [ ] Error recovery implemented
- [ ] Logging and tracing enabled
- [ ] Cost monitoring active
- [ ] Human escalation path defined
- [ ] Evaluation suite passing

## Authority

**Based on:**
- Anthropic agent design patterns
- OpenAI function calling best practices
- LangChain agent architectures
- Academic research on LLM agents
- Production systems at scale

---

**Bottom Line**: Agents = LLM + Tools + Loop. Design for failure, log everything, limit actions, validate outputs. Capability without safety is a liability.