---
name: agent-debugger
description: Systematic debugging toolkit for AI agentic workflows in customer support. Use when diagnosing issues with AI agents including wrong responses, tool/function calling problems, conversation loops, stuck states, or performance/latency issues. Works with any framework (LangChain, custom agents, Claude API) and accepts conversation logs, API logs, tool execution logs, and agent configurations.
---

# Agent Debugger

## Overview

Debug AI agent issues systematically using analysis scripts and proven debugging patterns. This skill helps identify root causes of common agent failures: incorrect responses, tool calling errors, conversation loops, performance problems, and more.

## When to Use This Skill

Trigger this skill when:

- Agent gives wrong or irrelevant responses
- Tools are not being called or are called incorrectly
- Conversation gets stuck in loops or repeated patterns
- Agent performance is slow or inconsistent
- Tool executions are failing or returning errors
- You need to analyze conversation logs or API traces

## Debugging Workflow

### Step 1: Gather Diagnostic Data

Collect these artifacts from the user:

- **Conversation logs** - Full transcript or chat history
- **API request/response logs** - Raw LLM API calls, if available
- **Tool execution logs** - Records of tool calls and outputs
- **Agent configuration** - System prompts, tool schemas, settings
- **Description of the issue** - What's wrong and when it occurs

### Step 2: Run Automated Analysis

Use the appropriate analysis script for the symptoms:

**For general conversation issues:**

```bash
python scripts/analyze_conversation.py
```

Analyzes role distribution and message patterns, flags potential issues, and reports summary metrics.

**For suspected loops or stuck states:**

```bash
python scripts/detect_loops.py [--threshold 2] [--window 5]
```

Detects exact loops, fuzzy repetition patterns, stuck states, and ping-pong exchanges (a sketch of the core idea follows this step's note).

**For tool/function calling problems:**

```bash
python scripts/analyze_tool_calls.py [--schema tool_schema.json]
```

Analyzes tool usage patterns, validates calls against the schema, and detects errors and retry loops.

**For performance/latency issues:**

```bash
python scripts/analyze_performance.py
```

Calculates latency statistics, identifies slow responses, and breaks down performance by role.

**Note:** Scripts accept JSON-formatted logs. For text logs, `analyze_conversation.py` can auto-detect and parse common formats.
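For intuition about what the loop detector flags, here is a minimal sketch of an exact-repeat check over a sliding window of assistant turns. This is illustrative only; the function name and message shape are assumptions, and `detect_loops.py` is the actual implementation, with fuzzy-pattern and stuck-state checks that go further.

```python
from collections import Counter

def find_exact_loops(messages, threshold=2, window=5):
    """Flag assistant turns whose content repeats verbatim within a
    sliding window. Mirrors the --threshold and --window options."""
    # Keep only non-empty assistant messages; loops show up in the
    # agent's own output, not in user turns.
    turns = [m["content"] for m in messages
             if m.get("role") == "assistant" and m.get("content")]
    loops = []
    for i, content in enumerate(turns):
        recent = turns[max(0, i - window):i + 1]
        # Flag this turn if it appears `threshold` or more times
        # within the recent window (including itself).
        if Counter(recent)[content] >= threshold:
            loops.append({"turn": i, "preview": content[:80]})
    return loops
```

Exact comparison misses near-duplicates such as reworded apologies or re-asked questions, which is why the script also applies fuzzy matching.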
### Step 3: Interpret Results

Review the script outputs and identify patterns:

- Check for **warnings and issues** flagged by the scripts
- Look at **metrics** (latency, token usage, tool call counts)
- Examine **repeated patterns** or anomalies
- Cross-reference with common failure modes

### Step 4: Match to Known Patterns

Consult the debugging patterns reference:

```
Read references/debugging-patterns.md
```

This guide covers:

1. **Conversation Loops** - Symptoms, causes, solutions
2. **Tool Calling Failures** - Detection and fixes
3. **Context Window Exhaustion** - Management strategies
4. **Incorrect Responses** - Prompt engineering fixes
5. **Performance Issues** - Optimization techniques
6. **Tool Execution Errors** - Error handling approaches
7. **State Management Issues** - Tracking strategies

Each pattern includes:

- Observable symptoms
- Root causes
- Concrete solutions
- Detection methods

### Step 5: Recommend Solutions

Based on the analysis and pattern matching:

1. **Identify the root cause** - What's actually broken?
2. **Propose specific fixes** - Concrete changes to prompts, tools, or config
3. **Explain the reasoning** - Why this will solve the problem
4. **Suggest testing** - How to verify the fix works
5. **Recommend preventive measures** - How to avoid similar issues

### Step 6: Provide Best Practices

For broader improvements, reference:

```
Read references/agent-best-practices.md
```

Covers:

- System prompt design principles
- Tool design and implementation
- Conversation management strategies
- Error handling approaches
- Quality assurance and monitoring
- Optimization techniques

## Log Format Requirements

Scripts work best with structured JSON logs.

**Minimal format:**

```json
[
  {"role": "user", "content": "Hello"},
  {"role": "assistant", "content": "Hi there!"}
]
```

**With tool calls (OpenAI-style function-calling format):**

```json
[
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "call_123",
        "type": "function",
        "function": {
          "name": "search_kb",
          "arguments": "{\"query\": \"password reset\"}"
        }
      }
    ]
  },
  {
    "role": "tool",
    "tool_call_id": "call_123",
    "content": "Article: How to reset your password..."
  }
]
```

**With timestamps and metadata:**

```json
[
  {
    "role": "user",
    "content": "Hello",
    "timestamp": "2024-01-15T10:30:00Z",
    "message_id": "msg_1"
  },
  {
    "role": "assistant",
    "content": "Hi there!",
    "timestamp": "2024-01-15T10:30:02Z",
    "usage": {
      "prompt_tokens": 50,
      "completion_tokens": 10,
      "total_tokens": 60
    }
  }
]
```

Scripts auto-detect the format and extract whatever information is available.
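Before running the deeper analyses, one cheap sanity check is to confirm that every tool call's `arguments` string parses as valid JSON, since malformed arguments are a frequent cause of tool calling failures. A minimal sketch against the OpenAI-style format shown above (the function name is illustrative):

```python
import json

def check_tool_call_arguments(log_path):
    """Report tool calls whose arguments string is not valid JSON."""
    with open(log_path) as f:
        messages = json.load(f)
    ok = True
    for i, msg in enumerate(messages):
        # `tool_calls` may be absent or null on non-tool-calling turns.
        for call in msg.get("tool_calls") or []:
            args = call.get("function", {}).get("arguments", "")
            try:
                json.loads(args)
            except json.JSONDecodeError as err:
                ok = False
                print(f"message {i}, call {call.get('id')}: {err}")
    return ok
```

For schema-level validation (required parameters, types), pass the tool schema to `analyze_tool_calls.py --schema` as shown in Step 2.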
## Quick Diagnostic Checklist

**Agent not responding:**

- [ ] Check API connectivity and auth
- [ ] Review error logs
- [ ] Verify the configuration is valid
- [ ] Check rate limits

**Wrong/irrelevant responses:**

- [ ] Review system prompt clarity
- [ ] Check whether the appropriate tools are called
- [ ] Verify necessary context is present
- [ ] Test with clearer user input

**Conversation stuck/looping:**

- [ ] Run `detect_loops.py`
- [ ] Check for repeated tool errors
- [ ] Review the last few agent responses
- [ ] Add explicit loop break conditions

**Tool calling issues:**

- [ ] Run `analyze_tool_calls.py` with the schema
- [ ] Validate that tool descriptions are clear
- [ ] Check tool implementations for bugs
- [ ] Test tools independently

**Performance problems:**

- [ ] Run `analyze_performance.py`
- [ ] Check token usage and context length
- [ ] Review tool execution times
- [ ] Consider model or infrastructure changes

## Example Debugging Session

**User reports:** "Agent keeps asking for the same information repeatedly"

**Analysis approach:**

1. Collect the conversation log
2. Run `detect_loops.py` → confirms a ping-pong pattern
3. Run `analyze_conversation.py` → shows a high rate of repeated content
4. Review the conversation → the agent is not retaining context from earlier messages
5. Consult `debugging-patterns.md` → matches "State Management Issues"
6. **Solution:** Add explicit state tracking to the system prompt and include a conversation summary
7. **Test:** Verify the agent now references earlier information
8. **Document:** Record the fix and add it to monitoring

## Resources

### scripts/

Analysis utilities that can be run directly on log files (a sample invocation sequence appears at the end of this document):

- `analyze_conversation.py` - General conversation analysis
- `detect_loops.py` - Loop and pattern detection
- `analyze_tool_calls.py` - Tool usage analysis and validation
- `analyze_performance.py` - Performance and latency analysis

### references/

In-depth debugging knowledge:

- `debugging-patterns.md` - Common failure modes and solutions (read when interpreting analysis results)
- `agent-best-practices.md` - Design and implementation best practices (read when providing recommendations)
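As a quick reference, an end-to-end pass over a single exported log might look like the following. The positional log-file argument and the `conversation.json` path are assumptions here; check each script's `--help` for its exact interface.

```bash
# Illustrative invocation sequence; conversation.json and the
# positional argument are assumptions -- verify each script's CLI.
python scripts/analyze_conversation.py conversation.json
python scripts/detect_loops.py conversation.json --threshold 2 --window 5
python scripts/analyze_tool_calls.py conversation.json --schema tool_schema.json
python scripts/analyze_performance.py conversation.json
```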