{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [
"# Week 7: Agentic RAG with LangGraph\n",
"\n",
"**What We're Testing This Week:**\n",
"\n",
"Week 7 extends our RAG system with **intelligent, adaptive retrieval** using LangGraph's agentic architecture with guardrail validation and iterative query refinement.\n",
"\n",
"## Agentic RAG Features\n",
"\n",
"### Traditional RAG vs. Agentic RAG\n",
"\n",
"**Traditional RAG (Week 5-6)**:\n",
"```\n",
"Query → Always Retrieve → Generate Answer\n",
"```\n",
"\n",
"**Agentic RAG (Week 7)**:\n",
"```\n",
"Query → Guardrail Validation (Score 0-100)\n",
"          ├─ Score < 60 → Out of Scope (reject with helpful message)\n",
"          └─ Score >= 60 → Retrieve Documents\n",
"                             ↓\n",
"                           Grade Documents\n",
"                             ├─ Relevant → Generate Answer\n",
"                             └─ Not Relevant → Rewrite Query → Retry (max 2 attempts)\n",
"```\n",
"\n",
"### Key Capabilities\n",
"\n",
"1. **Guardrail Validation** - LLM validates query scope (0-100 score) before retrieval\n",
"   - Score < 60: Query is out-of-scope (e.g., \"What is a dog?\")\n",
"   - Score >= 60: Query is relevant to ML/NLP research papers\n",
"2. **Out-of-Scope Handling** - Automatically rejects queries outside ML/NLP domain\n",
"3. **Document Grading** - Validates that retrieved papers are relevant\n",
"4. **Query Refinement** - Rewrites vague queries for better results\n",
"5. **Reasoning Transparency** - Shows the agent's decision-making steps\n",
"6. 
**Iterative Improvement** - Can retry with better queries if needed (max 2 attempts)\n", "\n", "### Architecture: LangGraph Workflow\n", "\n", "![LangGraph Agentic RAG Workflow](../../static/langgraph-mermaid.png)\n", "\n", "**Workflow Nodes:**\n", "- **start** → **guardrail** (LLM scoring 0-100)\n", "- **retrieve** → **tool_retrieve** (executes search)\n", "- **grade_documents** (LLM relevance check)\n", "- **rewrite_query** (query refinement if documents not relevant)\n", "- **end** (terminates with answer or rejection)\n", "\n", "### New Response Fields\n", "\n", "- `reasoning_steps`: Detailed decision-making trace\n", "- `retrieval_attempts`: Number of search attempts (0-2)\n", "- `rewritten_query`: Query after refinement (if rewritten)\n", "\n", "### Configuration (GraphConfig)\n", "\n", "- `max_retrieval_attempts`: 2\n", "- `guardrail_threshold`: 60/100\n", "- `model`: \"llama3.2:1b\"\n", "- `temperature`: 0.0\n", "- `top_k`: 3\n", "\n", "---\n", "\n", "## 1. Prerequisites\n", "\n", "### 1. Environment Variables Setup\n", "\n", "**Copy the example file and add your API keys:**\n", "\n", "```bash\n", "cp .env.example .env\n", "```\n", "\n", "Then edit `.env` and add your:\n", "- `JINA_API_KEY` - Get from [Jina AI](https://jina.ai/) for hybrid search\n", "- `LANGFUSE_PUBLIC_KEY` - Get from Langfuse UI after setup (see step 2 below)\n", "- `LANGFUSE_SECRET_KEY` - Get from Langfuse UI after setup (see step 2 below)\n", "\n", "The other values in `.env.example` can be kept as-is for now.\n", "\n", "### 2. Langfuse v3 Self-Hosted Setup\n", "\n", "This project uses **Langfuse v3** (self-hosted) which includes:\n", "- **langfuse-web**: Web UI at http://localhost:3001\n", "- **langfuse-worker**: Background job processor\n", "- **langfuse-postgres**: Database for traces\n", "- **langfuse-redis**: Cache and queue management\n", "- **langfuse-minio**: S3-compatible object storage\n", "- **clickhouse**: Analytics database\n", "\n", "**First-time setup:**\n", "1. 
Make sure `.env` has all the auto-generated secrets from `.env.example`\n", "2. Start services: `docker compose up langfuse-web langfuse-worker langfuse-postgres langfuse-redis langfuse-minio clickhouse -d`\n", "3. Visit http://localhost:3001 and create your first user\n", "4. Go to Settings → API Keys to get your `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY`\n", "5. Copy these keys to your `.env` file\n", "\n", "**Note:** If Langfuse keys are missing, tracing will be disabled but the API will still work.\n", "\n", "### 3. Ollama Model Setup\n", "\n", "**The `llama3.2:1b` model is automatically pulled when you start the Docker services.**\n", "\n", "If you need to manually pull it:\n", "```bash\n", "# Pull model in the Ollama container\n", "docker exec rag-ollama ollama pull llama3.2:1b\n", "\n", "# Or if running Ollama locally\n", "ollama pull llama3.2:1b\n", "```\n", "\n", "**Verify model is available:**\n", "```bash\n", "docker exec rag-ollama ollama list\n", "```\n", "\n", "### 4. Start All Services\n", "\n", "**Ensure all services are running:**\n", "```bash\n", "docker compose up --build -d\n", "```\n", "\n", "**Service Access Points:**\n", "- **FastAPI**: http://localhost:8000/docs\n", "- **OpenSearch**: http://localhost:9200\n", "- **Ollama**: http://localhost:11434\n", "- **Langfuse UI**: http://localhost:3001\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. 
Service Health Check" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import sys\n",
"import os\n",
"from pathlib import Path\n",
"import requests\n",
"import time\n",
"\n",
"print(f\"Python Version: {sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}\")\n",
"\n",
"# Find project root\n",
"current_dir = Path.cwd()\n",
"if current_dir.name == \"week7\" and current_dir.parent.name == \"notebooks\":\n",
"    project_root = current_dir.parent.parent\n",
"elif (current_dir / \"compose.yml\").exists():\n",
"    project_root = current_dir\n",
"else:\n",
"    project_root = current_dir.parent.parent\n",
"\n",
"if project_root.exists():\n",
"    print(f\"Project root: {project_root}\")\n",
"    sys.path.insert(0, str(project_root))\n",
"else:\n",
"    print(\"⚠ Project root not found - check directory structure\")\n",
"\n",
"# Load .env file if it exists\n",
"env_file = project_root / \".env\"\n",
"if env_file.exists():\n",
"    print(f\"\\n✓ Loading environment from: {env_file}\")\n",
"    with open(env_file) as f:\n",
"        for line in f:\n",
"            line = line.strip()\n",
"            if line and not line.startswith('#') and '=' in line:\n",
"                key, value = line.split('=', 1)\n",
"                if key not in os.environ:\n",
"                    os.environ[key] = value\n",
"    print(\"✓ Environment variables loaded\")\n",
"else:\n",
"    print(f\"\\n⚠ No .env file found at: {env_file}\")\n",
"    print(\" Run: cp .env.example .env\")\n",
"    print(\" Then add your JINA_API_KEY, LANGFUSE_PUBLIC_KEY, and LANGFUSE_SECRET_KEY\")\n",
"\n",
"# Configuration for notebook tests\n",
"REQUEST_TIMEOUT = 300\n",
"TRUNCATE_ANSWERS = True\n",
"TRUNCATE_LENGTH = 200\n",
"\n",
"print(\"\\n✓ Setup complete\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"WEEK 7 SERVICE HEALTH CHECK\")\n",
"print(\"=\" * 40)\n",
"\n",
"services = {\n",
"    \"FastAPI\": \"http://localhost:8000/api/v1/health\",\n",
"    \"Ollama\": \"http://localhost:11434/api/version\"\n",
"}\n",
"\n",
"all_healthy = True\n",
"for service_name, url in services.items():\n",
"    try:\n",
"        response = requests.get(url, timeout=5)\n",
"        if response.status_code == 200:\n",
"            print(f\"✓ {service_name}: Healthy\")\n",
"        else:\n",
"            print(f\"✗ {service_name}: HTTP {response.status_code}\")\n",
"            all_healthy = False\n",
"    except requests.RequestException:\n",
"        # Connection errors and timeouts only; other exceptions should surface\n",
"        print(f\"✗ {service_name}: Not accessible\")\n",
"        all_healthy = False\n",
"\n",
"# Check if Ollama model is available\n",
"print(\"\\nChecking Ollama model availability...\")\n",
"try:\n",
"    response = requests.get(\"http://localhost:11434/api/tags\", timeout=5)\n",
"    if response.status_code == 200:\n",
"        models = [m['name'] for m in response.json().get('models', [])]\n",
"        if 'llama3.2:1b' in models:\n",
"            print(\"✓ llama3.2:1b model is available\")\n",
"        else:\n",
"            print(\"⚠ llama3.2:1b not found. Run: docker exec rag-ollama ollama pull llama3.2:1b\")\n",
"            all_healthy = False\n",
"except (requests.RequestException, ValueError):\n",
"    # ValueError covers a response body that is not valid JSON\n",
"    print(\"⚠ Could not check Ollama models\")\n",
"\n",
"if all_healthy:\n",
"    print(\"\\n✓ All services ready for Week 7!\")\n",
"else:\n",
"    print(\"\\n⚠ Some services need attention. Run: docker compose up --build -d\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## 3. Test Traditional RAG (Baseline)\n",
"\n",
"First, let's test the traditional RAG endpoint to establish a baseline."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"TRADITIONAL RAG TEST (Baseline)\")\n",
"print(\"=\" * 40)\n",
"\n",
"question = \"What are attention mechanisms?\"\n",
"print(f\"Question: {question}\\n\")\n",
"\n",
"start_time = time.time()\n",
"\n",
"try:\n",
"    response = requests.post(\n",
"        \"http://localhost:8000/api/v1/ask\",\n",
"        json={\n",
"            \"query\": question,\n",
"            \"top_k\": 3,\n",
"            \"use_hybrid\": True,\n",
"            # Use the model pulled in the prerequisites and verified by the health check\n",
"            \"model\": \"llama3.2:1b\"\n",
"        },\n",
"        timeout=REQUEST_TIMEOUT\n",
"    )\n",
"\n",
"    elapsed = time.time() - start_time\n",
"\n",
"    if response.status_code == 200:\n",
"        data = response.json()\n",
"        print(f\"✓ Traditional RAG ({elapsed:.1f}s)\")\n",
"\n",
"        # Display answer with configurable truncation\n",
"        answer = data['answer']\n",
"        if TRUNCATE_ANSWERS and len(answer) > TRUNCATE_LENGTH:\n",
"            print(f\"\\nAnswer: {answer[:TRUNCATE_LENGTH]}...\")\n",
"            print(f\"(truncated, full length: {len(answer)} chars)\")\n",
"        else:\n",
"            print(f\"\\nAnswer: {answer}\")\n",
"\n",
"        # Display sources with validation\n",
"        sources = data.get('sources', [])\n",
"        print(f\"\\nSources: {len(sources)} papers\")\n",
"        if sources:\n",
"            for i, source in enumerate(sources[:3], 1):  # Show first 3\n",
"                if isinstance(source, dict):\n",
"                    print(f\" {i}. {source.get('title', 'Unknown')}\")\n",
"                else:\n",
"                    print(f\" {i}. {source}\")\n",
"\n",
"        print(f\"Search mode: {data.get('search_mode', 'unknown')}\")\n",
"    else:\n",
"        print(f\"✗ Request failed: {response.status_code}\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"✗ Error: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## 4. Test Agentic RAG - Scenario 1: Out-of-Scope Rejection\n",
"\n",
"Test if the guardrail correctly rejects queries outside the ML/NLP domain."
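, "\n", "\n",
"The routing decision this scenario exercises can be sketched in a few lines. This is a minimal illustration, not the project's implementation: `route_after_guardrail` and `GUARDRAIL_THRESHOLD` are hypothetical names, the threshold of 60 is taken from the configuration listed earlier in this notebook, and the return values reuse the node names `retrieve` and `out_of_scope` from the workflow diagram in the summary.\n",
"\n",
"```python\n",
"GUARDRAIL_THRESHOLD = 60  # assumed to mirror GraphConfig.guardrail_threshold\n",
"\n",
"\n",
"def route_after_guardrail(score: int, threshold: int = GUARDRAIL_THRESHOLD) -> str:\n",
"    # Hypothetical helper: pick the next node from a 0-100 guardrail score.\n",
"    if not 0 <= score <= 100:\n",
"        raise ValueError(f'score must be between 0 and 100, got {score}')\n",
"    return 'retrieve' if score >= threshold else 'out_of_scope'\n",
"\n",
"\n",
"for score in (15, 59, 60, 92):\n",
"    print(score, '->', route_after_guardrail(score))\n",
"```\n",
"\n",
"A score of exactly 60 passes, because the documented rule is `score >= 60`. Only scores below the threshold are rejected without retrieval, which is why the test in the next cell expects `retrieval_attempts` to be 0."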
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"AGENTIC RAG - SCENARIO 1: Out-of-Scope Rejection\")\n",
"print(\"=\" * 50)\n",
"\n",
"question = \"What is a dog?\"\n",
"print(f\"Question: {question}\")\n",
"print(\"Expected: Guardrail should reject (score < 60) and explain scope\\n\")\n",
"\n",
"start_time = time.time()\n",
"\n",
"try:\n",
"    response = requests.post(\n",
"        \"http://localhost:8000/api/v1/ask-agentic\",\n",
"        json={\n",
"            \"query\": question,\n",
"            \"top_k\": 3,\n",
"            \"use_hybrid\": True,\n",
"        },\n",
"        timeout=REQUEST_TIMEOUT\n",
"    )\n",
"\n",
"    elapsed = time.time() - start_time\n",
"\n",
"    if response.status_code == 200:\n",
"        data = response.json()\n",
"        print(f\"✓ Agentic RAG ({elapsed:.1f}s)\")\n",
"        print(f\"\\nAnswer: {data['answer']}\")\n",
"        print(f\"\\nRetrieval attempts: {data.get('retrieval_attempts', 0)}\")\n",
"        print(f\"\\nReasoning steps:\")\n",
"        for i, step in enumerate(data.get('reasoning_steps', []), 1):\n",
"            print(f\" {i}. {step}\")\n",
"\n",
"        # Check if guardrail score is in reasoning steps\n",
"        guardrail_step = next(\n",
"            (s for s in data.get('reasoning_steps', []) if 'validated' in s.lower() and 'score' in s.lower()),\n",
"            None\n",
"        )\n",
"        if guardrail_step:\n",
"            print(f\"\\nGuardrail validation: {guardrail_step}\")\n",
"\n",
"        if data.get('retrieval_attempts', 0) == 0:\n",
"            print(\"\\n✓ SUCCESS: Query correctly rejected by guardrail (no retrieval)!\")\n",
"        else:\n",
"            print(\"\\n⚠ UNEXPECTED: Query should have been rejected without retrieval\")\n",
"    else:\n",
"        print(f\"✗ Request failed: {response.status_code}\")\n",
"        print(f\"Response: {response.text}\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"✗ Error: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## 5. Test Agentic RAG - Scenario 2: Successful Retrieval\n",
"\n",
"Test if the agent correctly retrieves and grades documents for research questions."
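, "\n", "\n",
"The retrieve, grade, and rewrite cycle behind this scenario can be approximated as a bounded loop. The sketch below is an illustration under stated assumptions rather than the actual graph code: `retrieve_with_rewrites` and the three callables passed to it are hypothetical stand-ins for the search tool, the LLM grader, and the LLM rewriter. Returning an empty list when every attempt fails is a simplification, since this notebook does not show what the real graph does in that case.\n",
"\n",
"```python\n",
"from typing import Callable, List, Optional, Tuple\n",
"\n",
"MAX_RETRIEVAL_ATTEMPTS = 2  # assumed to mirror GraphConfig.max_retrieval_attempts\n",
"\n",
"\n",
"def retrieve_with_rewrites(\n",
"    query: str,\n",
"    search: Callable[[str], List[str]],\n",
"    is_relevant: Callable[[str, str], bool],\n",
"    rewrite: Callable[[str], str],\n",
"    max_attempts: int = MAX_RETRIEVAL_ATTEMPTS,\n",
") -> Tuple[List[str], int, Optional[str]]:\n",
"    # Returns (relevant_docs, retrieval_attempts, rewritten_query or None).\n",
"    current = query\n",
"    attempts = 0\n",
"    relevant: List[str] = []\n",
"    while attempts < max_attempts:\n",
"        docs = search(current)\n",
"        attempts += 1\n",
"        relevant = [doc for doc in docs if is_relevant(query, doc)]\n",
"        if relevant:\n",
"            break\n",
"        if attempts < max_attempts:\n",
"            current = rewrite(current)  # refine the query before retrying\n",
"    rewritten = current if current != query else None\n",
"    return relevant, attempts, rewritten\n",
"\n",
"\n",
"# Toy stand-ins so the sketch runs without any services.\n",
"def toy_search(q: str) -> List[str]:\n",
"    return ['doc about transformers'] if 'transformer' in q.lower() else []\n",
"\n",
"\n",
"def toy_rewrite(q: str) -> str:\n",
"    return q + ' transformer architectures'\n",
"\n",
"\n",
"docs, attempts, rewritten = retrieve_with_rewrites(\n",
"    'Tell me about ML stuff', toy_search, lambda question, doc: True, toy_rewrite\n",
")\n",
"print(docs, attempts, rewritten)\n",
"```\n",
"\n",
"The three return values correspond loosely to the `sources`, `retrieval_attempts`, and `rewritten_query` fields inspected in the next cell. In this sketch `rewritten_query` stays `None` when the first attempt already yields relevant documents."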
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"AGENTIC RAG - SCENARIO 2: Successful Retrieval\")\n",
"print(\"=\" * 50)\n",
"\n",
"question = \"What are transformers in machine learning?\"\n",
"print(f\"Question: {question}\")\n",
"print(\"Expected: Agent should pass guardrail, retrieve documents and generate answer\\n\")\n",
"\n",
"start_time = time.time()\n",
"\n",
"try:\n",
"    response = requests.post(\n",
"        \"http://localhost:8000/api/v1/ask-agentic\",\n",
"        json={\n",
"            \"query\": question,\n",
"            \"top_k\": 3,\n",
"            \"use_hybrid\": True,\n",
"            # Use the model pulled in the prerequisites and verified by the health check\n",
"            \"model\": \"llama3.2:1b\"\n",
"        },\n",
"        timeout=REQUEST_TIMEOUT\n",
"    )\n",
"\n",
"    elapsed = time.time() - start_time\n",
"\n",
"    if response.status_code == 200:\n",
"        data = response.json()\n",
"        print(f\"✓ Agentic RAG ({elapsed:.1f}s)\")\n",
"\n",
"        # Display answer with better formatting\n",
"        answer = data.get('answer', '')\n",
"        print(f\"\\nAnswer:\\n{'-'*50}\")\n",
"        if TRUNCATE_ANSWERS and len(answer) > 500:  # Use longer limit for detailed answers\n",
"            print(answer[:500] + \"...\")\n",
"            print(f\"(truncated, full length: {len(answer)} chars)\")\n",
"        else:\n",
"            print(answer)\n",
"        print('-'*50)\n",
"\n",
"        # Display sources with validation\n",
"        sources = data.get('sources', [])\n",
"        print(f\"\\nSources: {len(sources)} papers\")\n",
"        if sources:\n",
"            for i, source in enumerate(sources, 1):\n",
"                if isinstance(source, dict):\n",
"                    print(f\" {i}. {source.get('title', source.get('id', 'Unknown'))}\")\n",
"                elif isinstance(source, str):\n",
"                    print(f\" {i}. {source}\")\n",
"                else:\n",
"                    print(f\" {i}. {str(source)}\")\n",
"\n",
"        print(f\"\\nRetrieval attempts: {data.get('retrieval_attempts', 0)}\")\n",
"        print(f\"\\nReasoning steps:\")\n",
"        for i, step in enumerate(data.get('reasoning_steps', []), 1):\n",
"            print(f\" {i}. {step}\")\n",
"\n",
"        # Check rewritten_query field\n",
"        if data.get('rewritten_query') is None:\n",
"            print(\"\\n✓ Query was not rewritten (worked on first attempt)\")\n",
"        else:\n",
"            print(f\"\\n→ Query was rewritten to: {data['rewritten_query']}\")\n",
"\n",
"        if data.get('retrieval_attempts', 0) >= 1:\n",
"            print(\"\\n✓ SUCCESS: Agent retrieved and used documents!\")\n",
"        else:\n",
"            print(\"\\n⚠ UNEXPECTED: Agent didn't retrieve for research question\")\n",
"    else:\n",
"        print(f\"✗ Request failed: {response.status_code}\")\n",
"        print(f\"Response: {response.text}\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"✗ Error: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## 6. Test Agentic RAG - Scenario 3: Query Rewriting\n",
"\n",
"Test if the agent rewrites vague queries for better results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"AGENTIC RAG - SCENARIO 3: Query Rewriting\")\n",
"print(\"=\" * 50)\n",
"\n",
"question = \"Tell me about ML stuff\"\n",
"print(f\"Question: {question}\")\n",
"print(\"Expected: Agent may rewrite query if documents aren't relevant\\n\")\n",
"\n",
"start_time = time.time()\n",
"\n",
"try:\n",
"    response = requests.post(\n",
"        \"http://localhost:8000/api/v1/ask-agentic\",\n",
"        json={\n",
"            \"query\": question,\n",
"            \"top_k\": 3,\n",
"            \"use_hybrid\": True,\n",
"            # Use the model pulled in the prerequisites and verified by the health check\n",
"            \"model\": \"llama3.2:1b\"\n",
"        },\n",
"        timeout=REQUEST_TIMEOUT\n",
"    )\n",
"\n",
"    elapsed = time.time() - start_time\n",
"\n",
"    if response.status_code == 200:\n",
"        data = response.json()\n",
"        print(f\"✓ Agentic RAG ({elapsed:.1f}s)\")\n",
"\n",
"        # Display answer with better formatting\n",
"        answer = data.get('answer', '')\n",
"        print(f\"\\nAnswer:\\n{'-'*50}\")\n",
"        if TRUNCATE_ANSWERS and len(answer) > 500:\n",
"            print(answer[:500] + \"...\")\n",
"            print(f\"(truncated, full length: {len(answer)} chars)\")\n",
"        else:\n",
"            print(answer)\n",
"        print('-'*50)\n",
"\n",
"        print(f\"\\nRetrieval attempts: {data.get('retrieval_attempts', 0)}\")\n",
"        print(f\"\\nReasoning steps:\")\n",
"        for i, step in enumerate(data.get('reasoning_steps', []), 1):\n",
"            print(f\" {i}. {step}\")\n",
"\n",
"        # Check for guardrail validation step\n",
"        print(\"\\nValidating guardrail and rewrite steps:\")\n",
"        reasoning_steps = data.get('reasoning_steps', [])\n",
"        if any(\"validated\" in step.lower() for step in reasoning_steps):\n",
"            guardrail_step = next(s for s in reasoning_steps if \"validated\" in s.lower())\n",
"            print(f\" ✓ Guardrail validation: {guardrail_step}\")\n",
"        else:\n",
"            print(\" ⚠ Guardrail validation step missing\")\n",
"\n",
"        # Check for query rewriting\n",
"        if data.get('rewritten_query'):\n",
"            print(f\"\\n✓ Query was rewritten!\")\n",
"            print(f\" Original: {question}\")\n",
"            print(f\" Rewritten: {data['rewritten_query']}\")\n",
"        elif data.get('retrieval_attempts', 0) > 1:\n",
"            print(\"\\n→ Multiple retrieval attempts detected\")\n",
"            if any(\"rewritten\" in step.lower() for step in reasoning_steps):\n",
"                print(\" ✓ Rewrite step found in reasoning\")\n",
"            else:\n",
"                print(\" ⚠ Multiple attempts but no rewrite info\")\n",
"        else:\n",
"            print(\"\\n→ Query worked on first attempt (no rewrite needed)\")\n",
"\n",
"        if data.get('retrieval_attempts', 0) > 1:\n",
"            print(f\"\\n✓ Agent performed {data['retrieval_attempts']} retrieval attempts\")\n",
"    else:\n",
"        print(f\"✗ Request failed: {response.status_code}\")\n",
"        print(f\"Response: {response.text}\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"✗ Error: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## 7. Test Agentic RAG - Scenario 4: Multiple Out-of-Scope Queries\n",
"\n",
"Test if the guardrail consistently rejects several kinds of non-ML/NLP queries." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"AGENTIC RAG - SCENARIO 4: Multiple Out-of-Scope Queries\")\n",
"print(\"=\" * 50)\n",
"\n",
"test_queries = [\n",
"    (\"What is a dog?\", \"Biology question\"),\n",
"    (\"What's the weather today?\", \"Weather question\"),\n",
"    (\"Hello, how are you?\", \"Greeting\"),\n",
"]\n",
"\n",
"print(\"Testing guardrail rejection with various non-ML/NLP queries:\\n\")\n",
"\n",
"for query, description in test_queries:\n",
"    print(f\"Query: {query}\")\n",
"    print(f\"Type: {description}\")\n",
"\n",
"    try:\n",
"        response = requests.post(\n",
"            \"http://localhost:8000/api/v1/ask-agentic\",\n",
"            json={\"query\": query, \"top_k\": 3, \"use_hybrid\": True},\n",
"            # Same timeout as the other scenarios, in case a query is accepted and answered in full\n",
"            timeout=REQUEST_TIMEOUT\n",
"        )\n",
"\n",
"        if response.status_code == 200:\n",
"            data = response.json()\n",
"\n",
"            # Check if rejected (no retrieval)\n",
"            is_rejected = data['retrieval_attempts'] == 0\n",
"\n",
"            # Get guardrail score from reasoning if available\n",
"            guardrail_step = next(\n",
"                (s for s in data['reasoning_steps'] if 'validated' in s.lower() and 'score' in s.lower()),\n",
"                None\n",
"            )\n",
"\n",
"            print(f\"Result: {'✓ REJECTED' if is_rejected else '✗ ACCEPTED'} (attempts: {data['retrieval_attempts']})\")\n",
"            if guardrail_step:\n",
"                print(f\"Guardrail: {guardrail_step}\")\n",
"        else:\n",
"            print(f\"✗ Request failed: {response.status_code}\")\n",
"    except Exception as e:\n",
"        print(f\"✗ Error: {e}\")\n",
"\n",
"    print(\"-\" * 50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## 8. Interactive Testing\n",
"\n",
"Try your own questions!"
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"def ask_agentic(question: str, show_full_answer: bool = False):\n",
"    \"\"\"Helper function to test agentic RAG.\n",
"\n",
"    Args:\n",
"        question: The question to ask\n",
"        show_full_answer: If True, show full answer regardless of TRUNCATE_ANSWERS setting\n",
"    \"\"\"\n",
"    print(f\"Question: {question}\\n\")\n",
"\n",
"    start = time.time()\n",
"\n",
"    try:\n",
"        response = requests.post(\n",
"            \"http://localhost:8000/api/v1/ask-agentic\",\n",
"            json={\"query\": question, \"top_k\": 3, \"use_hybrid\": True},\n",
"            timeout=REQUEST_TIMEOUT\n",
"        )\n",
"\n",
"        elapsed = time.time() - start\n",
"\n",
"        if response.status_code == 200:\n",
"            data = response.json()\n",
"            print(f\"✓ Response in {elapsed:.1f}s\\n\")\n",
"\n",
"            # Display answer\n",
"            answer = data.get('answer', '')\n",
"            print(f\"Answer:\\n{'-'*50}\")\n",
"            if not show_full_answer and TRUNCATE_ANSWERS and len(answer) > 500:\n",
"                print(answer[:500] + \"...\")\n",
"                print(f\"(truncated, full length: {len(answer)} chars)\")\n",
"            else:\n",
"                print(answer)\n",
"            print('-'*50)\n",
"\n",
"            # Display metadata\n",
"            print(f\"\\nRetrieval attempts: {data.get('retrieval_attempts', 0)}\")\n",
"\n",
"            # Display sources with validation\n",
"            sources = data.get('sources', [])\n",
"            print(f\"Sources: {len(sources)}\")\n",
"            if sources:\n",
"                for i, source in enumerate(sources[:3], 1):  # Show first 3\n",
"                    if isinstance(source, dict):\n",
"                        print(f\" {i}. {source.get('title', source.get('id', 'Unknown'))}\")\n",
"                    elif isinstance(source, str):\n",
"                        print(f\" {i}. {source}\")\n",
"\n",
"            # Display reasoning\n",
"            print(f\"\\nReasoning:\")\n",
"            for step in data.get('reasoning_steps', []):\n",
"                print(f\" • {step}\")\n",
"        else:\n",
"            print(f\"✗ Error: {response.status_code}\")\n",
"            print(response.text)\n",
"    except Exception as e:\n",
"        print(f\"✗ Exception: {e}\")\n",
"\n",
"# Try it!\n",
"ask_agentic(\"How does BERT differ from GPT?\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Try more questions\n",
"ask_agentic(\"What is the capital of France?\")  # Should reject as out-of-scope" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"ask_agentic(\"Explain self-attention mechanisms\")  # Should retrieve papers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Summary\n",
"\n",
"### What We Tested in Week 7:\n",
"\n",
"**Agentic RAG Capabilities**:\n",
"1. ✅ **Guardrail Validation** - LLM validates query scope (0-100 score) before retrieval\n",
"2. ✅ **Out-of-Scope Handling** - Automatically rejects queries outside ML/NLP domain\n",
"3. ✅ **Document Grading** - Validates retrieved papers for relevance\n",
"4. ✅ **Query Rewriting** - Improves queries if needed\n",
"5. ✅ **Reasoning Transparency** - Shows decision-making steps\n",
"6. 
✅ **Iterative Improvement** - Can retry with better queries (max 2 attempts)\n",
"\n",
"### Key Improvements Over Traditional RAG:\n",
"\n",
"| Feature | Traditional RAG | Agentic RAG |\n",
"|---------|----------------|-------------|\n",
"| **Query Validation** | None | Guardrail scoring (0-100) |\n",
"| **Out-of-Scope Handling** | None | Automatic rejection with helpful message |\n",
"| **Retrieval Decision** | Always retrieves | Only if guardrail passes (score >= 60) |\n",
"| **Relevance Check** | None | LLM-based document grading |\n",
"| **Query Refinement** | None | LLM-based rewriting |\n",
"| **Iterations** | Single pass | Up to 2 retrieval attempts |\n",
"| **Transparency** | Black box | Detailed reasoning steps |\n",
"| **Configuration** | Hardcoded | GraphConfig with thresholds |\n",
"\n",
"### Architecture: 7-Node LangGraph Workflow\n",
"\n",
"```\n",
"LangGraph Workflow:\n",
"  START\n",
"    ↓\n",
"  guardrail (LLM scoring 0-100)\n",
"    ├─ score < 60 → out_of_scope → END (rejection message)\n",
"    └─ score >= 60 → retrieve\n",
"                       ↓\n",
"                     tool_retrieve (ToolNode - executes search)\n",
"                       ↓\n",
"                     grade_documents (LLM relevance check)\n",
"                       ├─ Relevant → generate_answer → END\n",
"                       └─ Not relevant → rewrite_query → retrieve (retry, max 2 attempts)\n",
"```\n",
"\n",
"### Reasoning Step Format:\n",
"\n",
"The new agentic RAG returns structured reasoning steps:\n",
"\n",
"1. **\"Validated query scope (score: X/100)\"** - Guardrail validation result\n",
"2. **\"Retrieved documents (N attempt(s))\"** - Number of retrieval attempts\n",
"3. **\"Graded documents (N relevant)\"** - Document relevance check\n",
"4. **\"Rewritten query for better results\"** - Query refinement (if needed)\n",
"5. 
**\"Generated answer from context\"** - Final answer generation\n", "\n", "### Configuration Parameters (GraphConfig):\n", "\n", "- `max_retrieval_attempts`: 2 - Maximum retry attempts\n", "- `guardrail_threshold`: 60/100 - Minimum score to proceed\n", "- `model`: \"llama3.2:1b\" - Default LLM model\n", "- `temperature`: 0.0 - Deterministic generation\n", "- `top_k`: 3 - Documents to retrieve\n", "\n", "### Next Steps:\n", "\n", "- **Experiment** with different question types and query complexity\n", "- **Monitor** reasoning steps to understand agent decision-making\n", "- **Compare** performance and accuracy with traditional RAG\n", "- **Adjust** guardrail threshold based on your domain requirements\n", "- **Extend** with additional tools (web search, calculations, code execution)\n", "\n", "**Week 7 Complete! You now have an intelligent, adaptive RAG system with guardrail validation! 🎉**" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.0" } }, "nbformat": 4, "nbformat_minor": 4 }