# Agent Failure Mode Catalog

The 8 most common fatal scenarios when building AI agents, and how agent-fender defends against each.

---

## 1. "Why is it spinning forever?" — No Timeout Hang

**Without guardrails**: `ollama.chat()` and `execute_tool()` have no timeout. When a call gets stuck, the request hangs indefinitely and the coroutine waiting on it leaks.

```python
# Before: bare calls
response = ollama.chat(model="qwen", messages=[...])
result = execute_tool("delete_file", {...})
```

**With agent-fender**:

```python
# After: safe_llm + safe_tool
result = await fender.safe_llm(ollama.chat, model="qwen", messages=[...])
if not result.success:
    return {"final_reply": result.user_message}

tr = await fender.safe_tool(execute_tool, "delete_file", {...})
if not tr.success:
    tool_failures += 1
```

**Defense**: `safe_llm()` and `safe_tool()` enforce timeouts with `asyncio.wait_for`. The LLM timeout defaults to 60s and the tool timeout to 30s. On timeout, they return a structured error result instead of raising an exception.

---

## 2. "Why is my bill so high?" — Infinite Loops

**Without guardrails**: The LLM keeps selecting tools and never stops. A single conversation can burn hundreds of LLM calls.

```python
# Before: no loop limit
async def action_node(state):
    response = ollama.chat(messages=..., tools=...)
    # The LLM may keep picking tools indefinitely
    tool_names = [tc["function"]["name"] for tc in response["message"]["tool_calls"]]
    ...
```

**With agent-fender**:

```python
# After: preflight circuit breaker
breaker = fender.preflight(loop_count=state.loop_counter, tool_failures=tool_failures)
if breaker.should_break:
    return {"final_reply": breaker.fallback_reply}
# Passed → continue with the LLM call
```

**Defense**: `preflight()` trips the circuit breaker when `loop_count >= max_loop_count` and returns a fallback reply. No further LLM calls are wasted, and the user still gets an answer.

---

## 3. "Why did a simple request cost $47?" — No Token Budget

**Without guardrails**: The agent works correctly: no loops, no timeouts, no errors. But a single conversation makes 15 LLM calls with large prompts, burning 200k tokens.
The developer discovers this when the API bill arrives. The other guards in this catalog cover execution safety; none of them tracks resource consumption.

```python
# Before: no token tracking
async def action_node(state):
    response = await llm.chat(messages=state.messages, tools=[...])
    # Each call consumes context-window tokens, but nothing counts them
```

**With agent-fender**:

```python
# After: token budget in config + preflight
config = FenderConfig(token_budget=100_000)  # 100K token limit
fender = AgentFender(config)

# Initialize once per conversation, outside the agent loop
tokens_used = 0

# Inside the agent loop: accumulate prompt and response tokens
result = await fender.safe_llm(llm.chat, messages=state.messages)
if result.success:
    tokens_used += fender.count_tokens(str(state.messages))  # prompt
    tokens_used += fender.count_tokens(str(result.data))     # response

# Check at preflight before the next LLM call
breaker = fender.preflight(
    loop_count=state.loop_counter,
    tool_failures=tool_failures,
    tokens_used=tokens_used,
)
if breaker.should_break:
    return {"final_reply": breaker.fallback_reply}
```

**Defense**: `AgentFender.preflight()` checks cumulative token usage against a configurable limit, so the agent stops before the bill does. The default `count_tokens()` uses a `len(text) // 4` approximation (roughly four characters per token for English text); pass a custom `token_counter` (e.g., tiktoken) via `FenderConfig` for precise counting.

---

## 4. "Why was that file deleted?" — Silent Dangerous Execution

**Without guardrails**: The LLM selects `delete_file`, and the agent executes it immediately without any human confirmation.

```python
# Before: execute whatever the LLM picks
for tc in response["message"]["tool_calls"]:
    result = execute_tool(tc["function"]["name"], tc["function"]["arguments"])
```

**With agent-fender**:

```python
from langgraph.types import interrupt

# After: check_dangerous pre-execution check
raw_calls = response["message"]["tool_calls"]
tool_names = [tc["function"]["name"] for tc in raw_calls]
approval = fender.check_tools(tool_names)
if approval.requires_approval:
    # Trigger LangGraph interrupt() and wait for human approval
    decision = interrupt(approval.message)
```

**Defense**: `check_dangerous()` intercepts dangerous tools before they execute. The list of dangerous tools is configurable.

---

## 5. "Why was it deleted when I just said 'yes'?" — Accidental Approval

**Without guardrails**: The handler matches approval keywords before checking whether an interrupt is actually pending. When the user says "yes" in normal conversation, it is misinterpreted as approving a stale request from an earlier round.

```python
# Before: match keyword first, check pending later
if msg in ("yes", "approve"):
    return _resume_graph(approved=True)  # ← previous round's approval triggered by mistake
if _has_pending_interrupt(thread_id):
    ...
```

**With agent-fender + main.py fix**:

```python
# After: check pending first, then match keyword
if _has_pending_interrupt(thread_id):
    if msg in ("yes", "approve"):
        return _resume_graph(approved=True)
    return "You have a pending approval"
# No pending interrupt → treat "yes" as a normal message
```

**Defense**: `check_dangerous()` only decides which tools need approval; it does not interpret the user's reply. The main.py fix ensures keywords are treated as approval signals only when an interrupt is actually pending.

---

## 6. "Why does it keep retrying after failure?" — Tool Cascade Failure

**Without guardrails**: One tool fails, so the LLM picks another tool and keeps going. Errors accumulate until the conversation is unrecoverable.

```python
# Before: no failure counting
for tc in raw_calls:
    result = execute_tool(tc["function"]["name"], tc["function"]["arguments"])
    reply = polish(result)  # continues even after failure
```

**With agent-fender**:

```python
# After: preflight checks tool_failures
breaker = fender.preflight(loop_count=state.loop_counter, tool_failures=tool_failures)
if breaker.should_break:
    return {"final_reply": breaker.fallback_reply}

for tc in raw_calls:
    tr = await fender.safe_tool(execute_tool, tc["function"]["name"], tc["function"]["arguments"])
    if not tr.success:
        tool_failures += 1  # Accumulate; the next preflight may trip the tool_failures breaker
    else:
        tool_failures = 0  # Reset on success so only consecutive failures count
```

**Defense**: After 3 consecutive failures (configurable), `preflight()` trips the circuit breaker.

---

## 7. "Why does it forget everything after restart?" — In-Memory Amnesia

**Without guardrails**: `MemorySaver()` stores all state in process memory.
A `uvicorn --reload` or `docker restart` wipes everything.

```python
# Before: in-memory storage
from langgraph.checkpoint.memory import MemorySaver

graph.compile(checkpointer=MemorySaver())
```

**Fix**:

```python
# After: SqliteSaver for persistence (requires the langgraph-checkpoint-sqlite package)
import sqlite3

from langgraph.checkpoint.sqlite import SqliteSaver

# from_conn_string() returns a context manager, so a long-lived app
# should pass an explicit connection instead
conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
graph.compile(checkpointer=SqliteSaver(conn))
```

If the graph runs asynchronously (`ainvoke`/`astream`), use `AsyncSqliteSaver` from `langgraph.checkpoint.sqlite.aio` instead.

**Note**: This is a LangGraph-level fix, not part of the agent-fender library itself. It's included in `failure-modes.md` because it's one of the most common pitfalls in real-world development.

---

## 8. "Why does it work sometimes but not others?" — Errors Silently Swallowed

**Without guardrails**: A bare `try/except` returns only "Service unavailable" without distinguishing timeout, connection, and response-format errors. Debugging means guessing from logs.

```python
# Before: errors swallowed
try:
    response = ollama.chat(...)
except Exception:
    reply = "Service unavailable"  # Timeout? Connection error? Unknown
```

**With agent-fender**:

```python
# After: LLMResult.error_type classification
import logging

log = logging.getLogger(__name__)

result = await fender.safe_llm(ollama.chat, ...)
if not result.success:
    log.error("LLM fail: %s - %s", result.error_type, result.error_message)
    return {"final_reply": result.user_message}
# error_type: "timeout" | "connection" | "response"
```

**Defense**: `LLMResult.error_type` and `SafeToolResult.error_type` classify each failure precisely, so it is immediately clear whether it is retryable (timeout) or not (connection).
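The timeout and error-classification behavior described in sections 1 and 8 can be illustrated without agent-fender. The following is a minimal, library-independent sketch, not agent-fender's actual implementation: `CallResult` and `guarded_call` are hypothetical names, and mapping Python's built-in `ConnectionError` to `"connection"` and every other exception to `"response"` is an assumption. Real clients may raise their own exception types, which would need their own mapping.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class CallResult:
    """Hypothetical structured result, loosely mirroring LLMResult / SafeToolResult."""

    success: bool
    data: Any = None
    error_type: Optional[str] = None  # "timeout" | "connection" | "response"
    error_message: str = ""
    user_message: str = ""


async def guarded_call(
    fn: Callable[..., Any], *args: Any, timeout: float = 30.0, **kwargs: Any
) -> CallResult:
    """Run fn under a timeout and return a CallResult instead of raising."""
    try:
        if inspect.iscoroutinefunction(fn):
            coro = fn(*args, **kwargs)
        else:
            # Sync callables run in a worker thread so the event loop stays free.
            coro = asyncio.to_thread(fn, *args, **kwargs)
        data = await asyncio.wait_for(coro, timeout)
    except asyncio.TimeoutError:
        # wait_for raises this when the deadline passes.
        return CallResult(
            False,
            error_type="timeout",
            error_message=f"no result within {timeout}s",
            user_message="The request took too long. Please try again.",
        )
    except ConnectionError as exc:
        # Assumption: the callable signals network problems with ConnectionError.
        return CallResult(
            False,
            error_type="connection",
            error_message=repr(exc),
            user_message="The service is unreachable right now.",
        )
    except Exception as exc:
        # Everything else is lumped into "response" (bad output, parse errors, ...).
        return CallResult(
            False,
            error_type="response",
            error_message=repr(exc),
            user_message="Something went wrong while processing the reply.",
        )
    return CallResult(True, data=data)


# --- Usage: one toy callable per outcome ---
async def slow_llm() -> str:
    await asyncio.sleep(1)  # outlives the 0.05s timeout below
    return "late"


def refused_tool() -> None:
    raise ConnectionRefusedError("service not reachable")  # simulated outage


def double(x: int) -> int:
    return x * 2


async def demo() -> list:
    return [
        await guarded_call(slow_llm, timeout=0.05),
        await guarded_call(refused_tool),
        await guarded_call(double, 21),
    ]


if __name__ == "__main__":
    for r in asyncio.run(demo()):
        print(r.success, r.error_type, r.data)
    # prints:
    # False timeout None
    # False connection None
    # True None 42
```

One design caveat: for a synchronous callable, `asyncio.wait_for` can only stop *waiting* for the worker thread. The thread itself cannot be cancelled and keeps running, so a truly stuck sync client still occupies a thread until it returns.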