{ "cells": [ { "cell_type": "raw", "metadata": { "vscode": { "languageId": "raw" } }, "source": [ "# 🛡️ Advanced Error Handling in MCP\n", "\n", "Learn how to implement robust error handling strategies in MCP tools and applications. This notebook covers error types, handling patterns, recovery strategies, and best practices.\n", "\n", "## 🎯 Learning Objectives\n", "\n", "By the end of this notebook, you will:\n", "- Design custom error hierarchies\n", "- Implement error recovery strategies\n", "- Create fault-tolerant tools\n", "- Handle distributed errors\n", "- Monitor and log errors effectively\n", "\n", "## 📋 Prerequisites\n", "\n", "- Completed notebooks 01-11\n", "- Understanding of Python exceptions\n", "- Knowledge of async/await\n", "- Familiarity with logging\n", "\n", "## 🔑 Key Concepts\n", "\n", "1. **Error Types**\n", " - Tool errors\n", " - Resource errors\n", " - Protocol errors\n", " - System errors\n", "\n", "2. **Error Handling**\n", " - Error hierarchies\n", " - Recovery patterns\n", " - Retry strategies\n", " - Circuit breakers\n", "\n", "3. **Error Reporting**\n", " - Structured logging\n", " - Error metrics\n", " - Alert systems\n", " - Debug information\n", "\n", "## 📚 Table of Contents\n", "\n", "1. [Error Hierarchies](#hierarchies)\n", "2. [Recovery Patterns](#recovery)\n", "3. [Monitoring & Logging](#monitoring)\n", "4. [Best Practices](#practices)\n", "5. 
[Exercises](#exercises)\n", "\n", "---\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from typing import Optional, Dict, Any, Type, Callable\n", "import logging\n", "import asyncio\n", "from datetime import datetime\n", "from dataclasses import dataclass, field\n", "from enum import Enum\n", "import traceback\n", "\n", "# Set up logging\n", "logging.basicConfig(level=logging.INFO)\n", "logger = logging.getLogger(__name__)\n", "\n", "# Error hierarchy\n", "class MCPError(Exception):\n", "    \"\"\"Base class for all MCP errors.\"\"\"\n", "    def __init__(self, message: str, details: Optional[Dict[str, Any]] = None):\n", "        super().__init__(message)\n", "        self.message = message\n", "        self.details = details or {}\n", "        self.timestamp = datetime.now()\n", "\n", "    def __str__(self) -> str:\n", "        return f\"{self.__class__.__name__}: {self.message}\"\n", "\n", "class ToolError(MCPError):\n", "    \"\"\"Errors related to tool execution.\"\"\"\n", "    pass\n", "\n", "class ResourceError(MCPError):\n", "    \"\"\"Errors related to resource management.\"\"\"\n", "    pass\n", "\n", "class ProtocolError(MCPError):\n", "    \"\"\"Errors related to the MCP protocol.\"\"\"\n", "    pass\n", "\n", "class ValidationError(MCPError):\n", "    \"\"\"Errors related to input validation.\"\"\"\n", "    pass\n", "\n", "# Error severity\n", "class ErrorSeverity(Enum):\n", "    INFO = \"info\"\n", "    WARNING = \"warning\"\n", "    ERROR = \"error\"\n", "    CRITICAL = \"critical\"\n", "\n", "# Error context\n", "@dataclass\n", "class ErrorContext:\n", "    error: Exception\n", "    severity: ErrorSeverity\n", "    tool_name: Optional[str] = None\n", "    operation: Optional[str] = None\n", "    # default_factory gives each instance its own creation time\n", "    timestamp: datetime = field(default_factory=datetime.now)\n", "    stack_trace: str = \"\"\n", "\n", "    def __post_init__(self):\n", "        # Capture the traceback attached to the error, if any\n", "        self.stack_trace = \"\".join(traceback.format_tb(self.error.__traceback__))\n", "\n", "# Error handler\n", "class ErrorHandler:\n", "    def __init__(self):\n", 
" self.handlers: Dict[Type[Exception], Callable] = {}\n", " \n", " def register(self, error_type: Type[Exception], handler: Callable):\n", " \"\"\"Register an error handler for a specific error type.\"\"\"\n", " self.handlers[error_type] = handler\n", " \n", " async def handle(self, context: ErrorContext) -> None:\n", " \"\"\"Handle an error using registered handlers.\"\"\"\n", " error_type = type(context.error)\n", " \n", " # Find the most specific handler\n", " handler = None\n", " for err_type, h in self.handlers.items():\n", " if isinstance(context.error, err_type):\n", " handler = h\n", " break\n", " \n", " if handler:\n", " try:\n", " await handler(context)\n", " except Exception as e:\n", " logger.error(f\"Error in error handler: {e}\")\n", " else:\n", " # Default handling\n", " logger.error(f\"Unhandled error: {context}\")\n", " \n", " def wrap(self, error_type: Type[Exception], severity: ErrorSeverity):\n", " \"\"\"Decorator to wrap a function with error handling.\"\"\"\n", " def decorator(func):\n", " async def wrapper(*args, **kwargs):\n", " try:\n", " return await func(*args, **kwargs)\n", " except Exception as e:\n", " if not isinstance(e, error_type):\n", " e = error_type(str(e))\n", " context = ErrorContext(\n", " error=e,\n", " severity=severity,\n", " operation=func.__name__\n", " )\n", " await self.handle(context)\n", " raise\n", " return wrapper\n", " return decorator\n", "\n", "# Create global error handler\n", "error_handler = ErrorHandler()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Example error handlers\n", "async def log_error(context: ErrorContext):\n", " \"\"\"Log error details.\"\"\"\n", " logger.error(\n", " f\"Error in {context.operation}: {context.error}\\n\"\n", " f\"Severity: {context.severity.value}\\n\"\n", " f\"Stack trace:\\n{context.stack_trace}\"\n", " )\n", "\n", "async def notify_critical(context: ErrorContext):\n", " \"\"\"Simulate sending notifications for critical 
errors.\"\"\"\n", "    if context.severity == ErrorSeverity.CRITICAL:\n", "        logger.critical(f\"🚨 CRITICAL ERROR: {context.error}\")\n", "        # In a real implementation: send email, Slack message, etc.\n", "\n", "# Register handlers\n", "error_handler.register(MCPError, log_error)\n", "error_handler.register(MCPError, notify_critical)\n", "\n", "# Example tool with error handling\n", "class Calculator:\n", "    \"\"\"A simple calculator tool with error handling.\"\"\"\n", "\n", "    class CalculationError(ToolError):\n", "        \"\"\"Error during calculation.\"\"\"\n", "        pass\n", "\n", "    class ValidationError(ValidationError):\n", "        \"\"\"Invalid input error.\"\"\"\n", "        pass\n", "\n", "    @error_handler.wrap(ValidationError, ErrorSeverity.WARNING)\n", "    async def validate_input(self, x: float, y: float) -> None:\n", "        \"\"\"Validate input values.\"\"\"\n", "        if not isinstance(x, (int, float)) or not isinstance(y, (int, float)):\n", "            raise self.ValidationError(\"Inputs must be numbers\")\n", "\n", "    @error_handler.wrap(CalculationError, ErrorSeverity.ERROR)\n", "    async def divide(self, x: float, y: float) -> float:\n", "        \"\"\"Divide x by y with error handling.\"\"\"\n", "        await self.validate_input(x, y)\n", "\n", "        if y == 0:\n", "            raise self.CalculationError(\n", "                \"Division by zero\",\n", "                details={\"x\": x, \"y\": y}\n", "            )\n", "\n", "        return x / y\n", "\n", "    @error_handler.wrap(CalculationError, ErrorSeverity.CRITICAL)\n", "    async def complex_calculation(self, x: float, y: float) -> float:\n", "        \"\"\"Perform a complex calculation that might fail.\"\"\"\n", "        await self.validate_input(x, y)\n", "\n", "        try:\n", "            result = x ** y\n", "            # Integer powers never overflow, so guard explicitly near the float limit\n", "            if result > 1e308:\n", "                raise OverflowError(\"Result too large\")\n", "            return result\n", "        except OverflowError as e:\n", "            raise self.CalculationError(\n", "                str(e),\n", "                details={\"x\": x, \"y\": y}\n", "            ) from e\n", "\n", "# MCP models (assumes pydantic is installed)\n", "from pydantic import BaseModel, Field\n", "\n", "class CalculationRequest(BaseModel):\n", "    operation: str = Field(..., 
description=\"Operation to perform (divide, power)\")\n", "    x: float = Field(..., description=\"First number\")\n", "    y: float = Field(..., description=\"Second number\")\n", "\n", "class CalculationResult(BaseModel):\n", "    result: float = Field(..., description=\"Calculation result\")\n", "\n", "# Create calculator tool\n", "calculator = Calculator()\n", "\n", "# Test error handling\n", "async def test_error_handling():\n", "    try:\n", "        # Test division by zero\n", "        print(\"Testing division by zero...\")\n", "        await calculator.divide(10, 0)\n", "    except Calculator.CalculationError as e:\n", "        print(f\"Caught expected error: {e}\\n\")\n", "\n", "    try:\n", "        # Test invalid input\n", "        print(\"Testing invalid input...\")\n", "        await calculator.divide(\"10\", 2)\n", "    except Calculator.ValidationError as e:\n", "        print(f\"Caught expected error: {e}\\n\")\n", "\n", "    try:\n", "        # Test overflow\n", "        print(\"Testing overflow...\")\n", "        await calculator.complex_calculation(10, 1000)\n", "    except Calculator.CalculationError as e:\n", "        print(f\"Caught expected error: {e}\\n\")\n", "\n", "    # Test successful calculation\n", "    print(\"Testing successful calculation...\")\n", "    result = await calculator.divide(10, 2)\n", "    print(f\"10 / 2 = {result}\")\n", "\n", "# Run tests\n", "await test_error_handling()\n" ] }, { "cell_type": "raw", "metadata": { "vscode": { "languageId": "raw" } }, "source": [ "# Best Practices for Error Handling in MCP\n", "\n", "## 1. Error Hierarchy\n", "\n", "Create a well-organized error hierarchy:\n", "```text\n", "MCPError\n", "├── ToolError\n", "│   ├── ValidationError\n", "│   ├── ExecutionError\n", "│   └── TimeoutError\n", "├── ResourceError\n", "│   ├── ResourceNotFoundError\n", "│   ├── ResourceExhaustedError\n", "│   └── ResourceStateError\n", "└── ProtocolError\n", "    ├── SerializationError\n", "    ├── CommunicationError\n", "    └── AuthenticationError\n", "```\n", "\n", "## 2. 
Error Context\n", "\n", "Always include relevant context with errors:\n", "- Error type and message\n", "- Timestamp\n", "- Operation details\n", "- Input parameters\n", "- System state\n", "- Stack trace\n", "\n", "## 3. Error Recovery\n", "\n", "Implement appropriate recovery strategies:\n", "1. **Retry Logic**\n", "   - Use exponential backoff\n", "   - Set maximum retry attempts\n", "   - Handle permanent failures\n", "\n", "2. **Circuit Breaker**\n", "   - Track failure rates\n", "   - Prevent cascading failures\n", "   - Allow system recovery\n", "\n", "3. **Fallback Mechanisms**\n", "   - Provide default values\n", "   - Use cached results\n", "   - Degrade functionality gracefully\n", "\n", "## 4. Error Monitoring\n", "\n", "Set up comprehensive error monitoring:\n", "1. **Logging**\n", "   - Use structured logging\n", "   - Include context details\n", "   - Set appropriate log levels\n", "\n", "2. **Metrics**\n", "   - Track error rates\n", "   - Monitor recovery success\n", "   - Measure system health\n", "\n", "3. **Alerts**\n", "   - Define severity levels\n", "   - Set up notifications\n", "   - Create escalation paths\n", "\n", "## Exercises\n", "\n", "1. **Enhanced Calculator**\n", "   - Add retry logic for transient errors\n", "   - Implement a circuit breaker\n", "   - Add metric collection\n", "   - Create custom error types\n", "\n", "2. **Resource Manager**\n", "   - Handle resource cleanup errors\n", "   - Implement recovery strategies\n", "   - Add health monitoring\n", "   - Create error reporting\n", "\n", "3. **API Client**\n", "   - Handle network errors\n", "   - Implement request retries\n", "   - Add rate limiting\n", "   - Create error responses\n", "\n", "## Tips\n", "\n", "1. **Error Design**\n", "   - Make errors descriptive\n", "   - Include actionable information\n", "   - Use appropriate severity levels\n", "   - Consider error recovery\n", "\n", "2. 
**Testing**\n", "   - Test error conditions\n", "   - Verify recovery mechanisms\n", "   - Check monitoring systems\n", "   - Validate error handling\n", "\n", "3. **Documentation**\n", "   - Document error types\n", "   - Describe recovery strategies\n", "   - Explain monitoring setup\n", "   - Include troubleshooting guides\n" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }