{ "cells": [ { "cell_type": "markdown", "id": "header", "metadata": { "papermill": { "duration": 0.003981, "end_time": "2026-01-21T11:19:20.881077", "exception": false, "start_time": "2026-01-21T11:19:20.877096", "status": "completed" }, "tags": [] }, "source": [ "# Agentic RAG with langchain-oracledb\n", "\n", "This notebook demonstrates the **langchain-oracledb** integration in the **Agentic RAG** application, showcasing how Oracle Database 26ai powers an intelligent multi-agent RAG system.\n", "\n", "## What You'll Learn\n", "\n", "1. **langchain-oracledb Components**\n", " - `OracleVS` - Vector store for similarity search\n", " - `OracleEmbeddings` - In-database embedding generation\n", " - `OracleTextSplitter` - Server-side text chunking\n", "\n", "2. **Multi-Collection Vector Store**\n", " - PDF, Web, Repository, and General Knowledge collections\n", " - Metadata management and sanitization\n", " - Similarity search across collections\n", "\n", "3. **Multi-Agent Chain of Thought (CoT)**\n", " - Planner, Researcher, Reasoner, Synthesizer agents\n", " - A2A protocol for agent communication\n", " - Distributed agent architecture" ] }, { "cell_type": "markdown", "id": "toc", "metadata": { "papermill": { "duration": 0.003264, "end_time": "2026-01-21T11:19:20.887956", "exception": false, "start_time": "2026-01-21T11:19:20.884692", "status": "completed" }, "tags": [] }, "source": [ "## Table of Contents\n", "\n", "1. [Setup and Configuration](#1-setup-and-configuration)\n", "2. [Oracle Database Connection](#2-oracle-database-connection)\n", "3. [langchain-oracledb Core Components](#3-langchain-oracledb-core-components)\n", " - 3.1 OracleEmbeddings\n", " - 3.2 OracleVS (Vector Store)\n", " - 3.3 OracleTextSplitter\n", "4. [Multi-Collection Vector Store](#4-multi-collection-vector-store)\n", "5. [Document Processing Pipeline](#5-document-processing-pipeline)\n", "6. [End-to-End langchain-oracledb Workflow](#6-end-to-end-langchain-oracledb-workflow)\n", "7. 
[RAG Query Workflow](#7-rag-query-workflow)\n", "8. [Multi-Agent Architecture](#8-multi-agent-architecture)\n", "9. [A2A Protocol Integration](#9-a2a-protocol-integration)" ] }, { "cell_type": "markdown", "id": "section-1", "metadata": { "papermill": { "duration": 0.003207, "end_time": "2026-01-21T11:19:20.894408", "exception": false, "start_time": "2026-01-21T11:19:20.891201", "status": "completed" }, "tags": [] }, "source": [ "---\n", "## 1. Setup and Configuration" ] }, { "cell_type": "code", "execution_count": 21, "id": "imports", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:20.901709Z", "iopub.status.busy": "2026-01-21T11:19:20.901559Z", "iopub.status.idle": "2026-01-21T11:19:20.918552Z", "shell.execute_reply": "2026-01-21T11:19:20.918105Z" }, "papermill": { "duration": 0.021618, "end_time": "2026-01-21T11:19:20.919230", "exception": false, "start_time": "2026-01-21T11:19:20.897612", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Working directory: /home/ubuntu/git/oracle-ai-developer-hub/apps/agentic_rag\n", "Agentic RAG path: /home/ubuntu/git/oracle-ai-developer-hub/apps/agentic_rag\n" ] } ], "source": [ "# Core imports\n", "import sys\n", "import os\n", "from pathlib import Path\n", "import json\n", "import yaml\n", "import time\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "# Set up path to agentic_rag - handle both execution contexts:\n", "# 1. Running from notebooks/ directory (manual Jupyter)\n", "# 2. 
Running from agentic_rag/ directory (papermill with --cwd)\n", "\n", "cwd = Path.cwd()\n", "\n", "# Detect if we're already in agentic_rag directory\n", "if cwd.name == \"agentic_rag\" or (cwd / \"src\" / \"OraDBVectorStore.py\").exists():\n", " AGENTIC_RAG_PATH = cwd\n", "elif (cwd.parent / \"apps\" / \"agentic_rag\").exists():\n", " AGENTIC_RAG_PATH = (cwd.parent / \"apps\" / \"agentic_rag\").resolve()\n", "elif (cwd / \"apps\" / \"agentic_rag\").exists():\n", " AGENTIC_RAG_PATH = (cwd / \"apps\" / \"agentic_rag\").resolve()\n", "else:\n", " # Fallback - search upward for project root\n", " search_path = cwd\n", " while search_path != search_path.parent:\n", " if (search_path / \"apps\" / \"agentic_rag\").exists():\n", " AGENTIC_RAG_PATH = (search_path / \"apps\" / \"agentic_rag\").resolve()\n", " break\n", " search_path = search_path.parent\n", " else:\n", " raise FileNotFoundError(\"Could not locate agentic_rag directory\")\n", "\n", "sys.path.insert(0, str(AGENTIC_RAG_PATH))\n", "os.chdir(AGENTIC_RAG_PATH)\n", "\n", "print(f\"Working directory: {os.getcwd()}\")\n", "print(f\"Agentic RAG path: {AGENTIC_RAG_PATH}\")" ] }, { "cell_type": "code", "execution_count": 22, "id": "check-deps", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:20.926594Z", "iopub.status.busy": "2026-01-21T11:19:20.926473Z", "iopub.status.idle": "2026-01-21T11:19:26.668084Z", "shell.execute_reply": "2026-01-21T11:19:26.667485Z" }, "papermill": { "duration": 5.745943, "end_time": "2026-01-21T11:19:26.668518", "exception": false, "start_time": "2026-01-21T11:19:20.922575", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✅ langchain-oracledb installed\n", " Available components:\n", " • OracleVS - Vector store\n", " • OracleEmbeddings - Embedding generation\n", " • OracleTextSplitter - Text chunking\n" ] } ], "source": [ "# Check 
langchain-oracledb availability\n", "try:\n", " from langchain_oracledb import OracleVS, OracleEmbeddings\n", " from langchain_oracledb.document_loaders.oracleai import OracleTextSplitter\n", " print(\"✅ langchain-oracledb installed\")\n", " print(\" Available components:\")\n", " print(\" • OracleVS - Vector store\")\n", " print(\" • OracleEmbeddings - Embedding generation\")\n", " print(\" • OracleTextSplitter - Text chunking\")\n", "except ImportError as e:\n", " print(f\"❌ langchain-oracledb not available: {e}\")\n", " print(\" Install with: pip install langchain-oracledb\")" ] }, { "cell_type": "code", "execution_count": 23, "id": "load-config", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:26.676679Z", "iopub.status.busy": "2026-01-21T11:19:26.676383Z", "iopub.status.idle": "2026-01-21T11:19:26.681241Z", "shell.execute_reply": "2026-01-21T11:19:26.680835Z" }, "papermill": { "duration": 0.009612, "end_time": "2026-01-21T11:19:26.681845", "exception": false, "start_time": "2026-01-21T11:19:26.672233", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📋 Configuration Loaded\n", "==================================================\n", "Oracle Username: ADMIN\n", "Oracle DSN: (description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)...\n", "Embedding Params: {'provider': 'database', 'model': 'ALL_MINILM_L12_V2'}\n" ] } ], "source": [ "# Load configuration\n", "from src.db_utils import load_config\n", "\n", "config = load_config()\n", "\n", "print(\"📋 Configuration Loaded\")\n", "print(\"=\"*50)\n", "print(f\"Oracle Username: {config.get('ORACLE_DB_USERNAME', 'N/A')}\")\n", "print(f\"Oracle DSN: {config.get('ORACLE_DB_DSN', 'N/A')[:80]}...\")\n", "print(f\"Embedding Params: {config.get('ORACLE_EMBEDDINGS_PARAMS', {'provider': 'database', 'model': 'ALL_MINILM_L12_V2'})}\")" ] }, { "cell_type": "markdown", "id": "section-2", "metadata": { "papermill": { "duration": 
0.003888, "end_time": "2026-01-21T11:19:26.689170", "exception": false, "start_time": "2026-01-21T11:19:26.685282", "status": "completed" }, "tags": [] }, "source": [ "---\n", "## 2. Oracle Database Connection" ] }, { "cell_type": "code", "execution_count": 24, "id": "db-connection", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:26.696700Z", "iopub.status.busy": "2026-01-21T11:19:26.696582Z", "iopub.status.idle": "2026-01-21T11:19:27.525651Z", "shell.execute_reply": "2026-01-21T11:19:27.525115Z" }, "papermill": { "duration": 0.833519, "end_time": "2026-01-21T11:19:27.526144", "exception": false, "start_time": "2026-01-21T11:19:26.692625", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connecting (no wallet) to dsn (description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.us-phoenix-1.oraclecloud.com))(connect_data=(service_name=g2f4dc3e5463897_mern_tpurgent.adb.oraclecloud.com))(security=(ssl_server_dn_match=yes))) and user ADMIN\n", "✅ Oracle Database connected!\n", " Version: Oracle AI Database 26ai Enterprise Edition Release 23.26.0.1.0 - for Oracle Cloud and Engineered Systems\n" ] } ], "source": [ "import oracledb\n", "from src.db_utils import get_db_connection\n", "\n", "# Establish connection\n", "try:\n", " connection = get_db_connection(config)\n", " print(\"✅ Oracle Database connected!\")\n", " \n", " # Get version info\n", " cursor = connection.cursor()\n", " cursor.execute(\"SELECT BANNER FROM V$VERSION WHERE ROWNUM = 1\")\n", " version = cursor.fetchone()[0]\n", " print(f\" Version: {version}\")\n", " cursor.close()\n", " \n", "except Exception as e:\n", " print(f\"❌ Connection failed: {e}\")\n", " connection = None" ] }, { "cell_type": "code", "execution_count": 25, "id": "check-onnx-models", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:27.534545Z", "iopub.status.busy": "2026-01-21T11:19:27.534407Z", 
"iopub.status.idle": "2026-01-21T11:19:27.613849Z", "shell.execute_reply": "2026-01-21T11:19:27.613288Z" }, "papermill": { "duration": 0.084168, "end_time": "2026-01-21T11:19:27.614243", "exception": false, "start_time": "2026-01-21T11:19:27.530075", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔍 Checking ONNX Models in Database\n", "==================================================\n", "Available models:\n", " • ALL_MINILM_L12_V2 (EMBEDDING)\n" ] } ], "source": [ "# Check available ONNX models in the database\n", "if connection:\n", " print(\"🔍 Checking ONNX Models in Database\")\n", " print(\"=\"*50)\n", " \n", " cursor = connection.cursor()\n", " try:\n", " cursor.execute(\"SELECT MODEL_NAME, MINING_FUNCTION FROM USER_MINING_MODELS\")\n", " models = cursor.fetchall()\n", " \n", " if models:\n", " print(\"Available models:\")\n", " for name, func in models:\n", " print(f\" • {name} ({func})\")\n", " else:\n", " print(\" No ONNX models found in USER_MINING_MODELS\")\n", " print(\" Note: Models may be in a different schema\")\n", " except Exception as e:\n", " print(f\" Could not query models: {e}\")\n", " finally:\n", " cursor.close()" ] }, { "cell_type": "markdown", "id": "section-3", "metadata": { "papermill": { "duration": 0.003654, "end_time": "2026-01-21T11:19:27.621773", "exception": false, "start_time": "2026-01-21T11:19:27.618119", "status": "completed" }, "tags": [] }, "source": [ "---\n", "## 3. 
langchain-oracledb Core Components\n", "\n", "The agentic_rag application uses three key components from langchain-oracledb:\n", "\n", "| Component | Purpose | Usage in Agentic RAG |\n", "|-----------|---------|----------------------|\n", "| **OracleEmbeddings** | Generate vector embeddings | Used by OraDBVectorStore for all collections |\n", "| **OracleVS** | Store and search vectors | Multiple instances for PDF, Web, Repo, General |\n", "| **OracleTextSplitter** | Chunk documents | Used by PDFProcessor for server-side splitting |" ] }, { "cell_type": "markdown", "id": "section-3-1", "metadata": { "papermill": { "duration": 0.003624, "end_time": "2026-01-21T11:19:27.629001", "exception": false, "start_time": "2026-01-21T11:19:27.625377", "status": "completed" }, "tags": [] }, "source": [ "### 3.1 OracleEmbeddings\n", "\n", "OracleEmbeddings generates vector embeddings using models loaded in Oracle Database. This eliminates external API calls and keeps data within the database." ] }, { "cell_type": "code", "execution_count": 26, "id": "oracle-embeddings", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:27.637176Z", "iopub.status.busy": "2026-01-21T11:19:27.636958Z", "iopub.status.idle": "2026-01-21T11:19:28.693229Z", "shell.execute_reply": "2026-01-21T11:19:28.692689Z" }, "papermill": { "duration": 1.060939, "end_time": "2026-01-21T11:19:28.693625", "exception": false, "start_time": "2026-01-21T11:19:27.632686", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔢 OracleEmbeddings Demo\n", "==================================================\n", "Embedding params: {'provider': 'database', 'model': 'ALL_MINILM_L12_V2'}\n", "\n", "✅ OracleEmbeddings initialized\n", "\n", "Generating embeddings for test texts...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Results:\n", " Texts embedded: 3\n", " Vector dimension: 384\n", " Time: 995.67 ms\n", " Sample (first 5 
values): [0.0019562128, -0.0239340458, -0.0302166697, -0.0731686875, -0.0598886609]\n" ] } ], "source": [ "from langchain_oracledb import OracleEmbeddings\n", "\n", "if connection:\n", " print(\"🔢 OracleEmbeddings Demo\")\n", " print(\"=\"*50)\n", " \n", " # Initialize with database provider (ONNX model)\n", " embed_params = config.get(\"ORACLE_EMBEDDINGS_PARAMS\", \n", " {\"provider\": \"database\", \"model\": \"ALL_MINILM_L12_V2\"})\n", " \n", " if isinstance(embed_params, str):\n", " embed_params = json.loads(embed_params)\n", " \n", " print(f\"Embedding params: {embed_params}\")\n", " print(\"\")\n", " \n", " try:\n", " embeddings = OracleEmbeddings(conn=connection, params=embed_params)\n", " print(\"✅ OracleEmbeddings initialized\")\n", " \n", " # Generate embeddings for test texts\n", " test_texts = [\n", " \"Oracle Database provides enterprise-grade AI capabilities.\",\n", " \"Vector search enables semantic similarity matching.\",\n", " \"Multi-agent systems coordinate to solve complex problems.\"\n", " ]\n", " \n", " print(\"\")\n", " print(\"Generating embeddings for test texts...\")\n", " start = time.time()\n", " vectors = embeddings.embed_documents(test_texts)\n", " elapsed = time.time() - start\n", " \n", " print(f\"\")\n", " print(f\"Results:\")\n", " print(f\" Texts embedded: {len(vectors)}\")\n", " print(f\" Vector dimension: {len(vectors[0])}\")\n", " print(f\" Time: {elapsed*1000:.2f} ms\")\n", " print(f\" Sample (first 5 values): {vectors[0][:5]}\")\n", " \n", " except Exception as e:\n", " print(f\"❌ Error: {e}\")\n", " print(\" Note: Ensure ONNX model is loaded in Oracle Database\")" ] }, { "cell_type": "markdown", "id": "section-3-2", "metadata": { "papermill": { "duration": 0.00378, "end_time": "2026-01-21T11:19:28.701445", "exception": false, "start_time": "2026-01-21T11:19:28.697665", "status": "completed" }, "tags": [] }, "source": [ "### 3.2 OracleVS (Vector Store)\n", "\n", "OracleVS provides vector storage and similarity search. 
In agentic_rag, it's used to create multiple collections for different document types." ] }, { "cell_type": "code", "execution_count": 27, "id": "oracle-vs", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:28.710143Z", "iopub.status.busy": "2026-01-21T11:19:28.710011Z", "iopub.status.idle": "2026-01-21T11:19:29.576343Z", "shell.execute_reply": "2026-01-21T11:19:29.575774Z" }, "papermill": { "duration": 0.871271, "end_time": "2026-01-21T11:19:29.576764", "exception": false, "start_time": "2026-01-21T11:19:28.705493", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📦 OracleVS Demo\n", "==================================================\n", "✅ OracleVS instance created\n", " Table: NOTEBOOK_TEST_COLLECTION\n", " Distance: EUCLIDEAN_DISTANCE\n", "\n", "Adding 5 test documents...\n", "✅ Documents added\n" ] } ], "source": [ "from langchain_oracledb import OracleVS\n", "from langchain_core.documents import Document\n", "\n", "if connection and 'embeddings' in dir():\n", " print(\"📦 OracleVS Demo\")\n", " print(\"=\"*50)\n", " \n", " # Create a test vector store\n", " test_store = OracleVS(\n", " client=connection,\n", " embedding_function=embeddings,\n", " table_name=\"NOTEBOOK_TEST_COLLECTION\",\n", " distance_strategy=\"EUCLIDEAN_DISTANCE\"\n", " )\n", " print(\"✅ OracleVS instance created\")\n", " print(f\" Table: NOTEBOOK_TEST_COLLECTION\")\n", " print(f\" Distance: EUCLIDEAN_DISTANCE\")\n", " \n", " # Add test documents\n", " test_docs = [\n", " \"Agentic RAG combines multi-agent reasoning with vector retrieval.\",\n", " \"The Planner agent breaks down complex queries into steps.\",\n", " \"The Researcher agent gathers information from vector stores.\",\n", " \"The Reasoner agent applies logical analysis to findings.\",\n", " \"The Synthesizer agent combines reasoning into final answers.\"\n", " ]\n", " \n", " test_metadatas = [\n", " {\"source\": \"overview\", \"type\": 
\"concept\"},\n", " {\"source\": \"planner\", \"type\": \"agent\"},\n", " {\"source\": \"researcher\", \"type\": \"agent\"},\n", " {\"source\": \"reasoner\", \"type\": \"agent\"},\n", " {\"source\": \"synthesizer\", \"type\": \"agent\"}\n", " ]\n", " \n", " print(\"\")\n", " print(f\"Adding {len(test_docs)} test documents...\")\n", " test_store.add_texts(texts=test_docs, metadatas=test_metadatas)\n", " connection.commit()\n", " print(\"✅ Documents added\")" ] }, { "cell_type": "code", "execution_count": 28, "id": "similarity-search", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:29.585739Z", "iopub.status.busy": "2026-01-21T11:19:29.585536Z", "iopub.status.idle": "2026-01-21T11:19:29.934288Z", "shell.execute_reply": "2026-01-21T11:19:29.933773Z" }, "papermill": { "duration": 0.353833, "end_time": "2026-01-21T11:19:29.934812", "exception": false, "start_time": "2026-01-21T11:19:29.580979", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔍 Similarity Search Demo\n", "==================================================\n", "Query: What agents are involved in the reasoning process?\n", "\n", "Top 3 results:\n", "----------------------------------------\n", "[1] The Reasoner agent applies logical analysis to findings.\n", " Metadata: {'source': 'reasoner', 'type': 'agent'}\n", "\n", "[2] The Reasoner agent applies logical analysis to findings.\n", " Metadata: {'source': 'reasoner', 'type': 'agent'}\n", "\n", "[3] The Synthesizer agent combines reasoning into final answers.\n", " Metadata: {'source': 'synthesizer', 'type': 'agent'}\n", "\n" ] } ], "source": [ "# Test similarity search\n", "if 'test_store' in dir():\n", " print(\"🔍 Similarity Search Demo\")\n", " print(\"=\"*50)\n", " \n", " query = \"What agents are involved in the reasoning process?\"\n", " print(f\"Query: {query}\")\n", " print(\"\")\n", " \n", " results = test_store.similarity_search(query, k=3)\n", " \n", " 
print(f\"Top {len(results)} results:\")\n", " print(\"-\"*40)\n", " for i, doc in enumerate(results):\n", " print(f\"[{i+1}] {doc.page_content}\")\n", " print(f\" Metadata: {doc.metadata}\")\n", " print()" ] }, { "cell_type": "markdown", "id": "section-3-3", "metadata": { "papermill": { "duration": 0.003916, "end_time": "2026-01-21T11:19:29.943048", "exception": false, "start_time": "2026-01-21T11:19:29.939132", "status": "completed" }, "tags": [] }, "source": [ "### 3.3 OracleTextSplitter\n", "\n", "OracleTextSplitter performs server-side text chunking with normalization. This is used by PDFProcessor to split documents before embedding." ] }, { "cell_type": "code", "execution_count": 29, "id": "oracle-splitter", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:29.951724Z", "iopub.status.busy": "2026-01-21T11:19:29.951595Z", "iopub.status.idle": "2026-01-21T11:19:30.223692Z", "shell.execute_reply": "2026-01-21T11:19:30.223171Z" }, "papermill": { "duration": 0.277232, "end_time": "2026-01-21T11:19:30.224205", "exception": false, "start_time": "2026-01-21T11:19:29.946973", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✂️ OracleTextSplitter Demo\n", "==================================================\n", "✅ OracleTextSplitter initialized\n", " Params: {'normalize': 'all'}\n", "\n", "Input text: 919 characters\n", "\n", "Results:\n", " Chunks created: 2\n", "\n", "Chunks preview:\n", "----------------------------------------\n", "[1] # Multi-Agent RAG Architecture The agentic RAG system uses a Chain of Thought (CoT) approach with f...\n", "[2] ## Reasoner Agent The Reasoner agent applies logical analysis to the gathered information. 
It draws ...\n" ] } ], "source": [ "from langchain_oracledb.document_loaders.oracleai import OracleTextSplitter\n", "\n", "if connection:\n", " print(\"✂️ OracleTextSplitter Demo\")\n", " print(\"=\"*50)\n", " \n", " # Initialize with normalize=\"all\" (as used in agentic_rag)\n", " splitter_params = {\"normalize\": \"all\"}\n", " \n", " try:\n", " splitter = OracleTextSplitter(conn=connection, params=splitter_params)\n", " print(\"✅ OracleTextSplitter initialized\")\n", " print(f\" Params: {splitter_params}\")\n", " print(\"\")\n", " \n", " # Test text (simulating a document)\n", " test_text = \"\"\"\n", " # Multi-Agent RAG Architecture\n", " \n", " The agentic RAG system uses a Chain of Thought (CoT) approach with four \n", " specialized agents working together to answer complex queries.\n", " \n", " ## Planner Agent\n", " The Planner agent receives the initial query and breaks it down into \n", " manageable steps. It creates a strategic plan for addressing the question.\n", " \n", " ## Researcher Agent\n", " The Researcher agent searches the vector stores for relevant information.\n", " It gathers context from PDFs, web content, and repository code.\n", " \n", " ## Reasoner Agent\n", " The Reasoner agent applies logical analysis to the gathered information.\n", " It draws conclusions and identifies patterns in the data.\n", " \n", " ## Synthesizer Agent\n", " The Synthesizer agent combines all the reasoning steps into a coherent\n", " final answer that directly addresses the original query.\n", " \"\"\"\n", " \n", " print(f\"Input text: {len(test_text)} characters\")\n", " \n", " # Split the text\n", " chunks = splitter.split_text(test_text)\n", " \n", " print(f\"\")\n", " print(f\"Results:\")\n", " print(f\" Chunks created: {len(chunks)}\")\n", " print(\"\")\n", " print(\"Chunks preview:\")\n", " print(\"-\"*40)\n", " for i, chunk in enumerate(chunks[:3]):\n", " preview = chunk[:100].replace('\\n', ' ')\n", " print(f\"[{i+1}] {preview}...\")\n", " if 
len(chunks) > 3:\n", " print(f\"... and {len(chunks)-3} more chunks\")\n", " \n", " except Exception as e:\n", " print(f\"❌ Error: {e}\")" ] }, { "cell_type": "markdown", "id": "section-4", "metadata": { "papermill": { "duration": 0.003999, "end_time": "2026-01-21T11:19:30.232427", "exception": false, "start_time": "2026-01-21T11:19:30.228428", "status": "completed" }, "tags": [] }, "source": [ "---\n", "## 4. Multi-Collection Vector Store\n", "\n", "The `OraDBVectorStore` class in agentic_rag manages multiple collections:\n", "\n", "| Collection | Purpose | Table Name |\n", "|------------|---------|------------|\n", "| PDF | PDF documents | PDFCOLLECTION |\n", "| Web | Web content | WEBCOLLECTION |\n", "| Repository | Code/repos | REPOCOLLECTION |\n", "| General | General knowledge | GENERALCOLLECTION |" ] }, { "cell_type": "code", "execution_count": 30, "id": "ora-vector-store", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:30.241267Z", "iopub.status.busy": "2026-01-21T11:19:30.241065Z", "iopub.status.idle": "2026-01-21T11:19:32.496122Z", "shell.execute_reply": "2026-01-21T11:19:32.495619Z" }, "papermill": { "duration": 2.260295, "end_time": "2026-01-21T11:19:32.496788", "exception": false, "start_time": "2026-01-21T11:19:30.236493", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📦 OraDBVectorStore - Multi-Collection Demo\n", "==================================================\n", "Connecting (no wallet) to dsn (description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.us-phoenix-1.oraclecloud.com))(connect_data=(service_name=g2f4dc3e5463897_mern_tpurgent.adb.oraclecloud.com))(security=(ssl_server_dn_match=yes))) and user ADMIN\n", "Oracle DB Connection successful!\n", "✅ OraDBVectorStore initialized\n", "\n", "Available collections:\n", " • PDFCOLLECTION: 92 documents (table: PDFCOLLECTION)\n", " • WEBCOLLECTION: 234 documents (table: 
WEBCOLLECTION)\n", " • REPOCOLLECTION: 2755 documents (table: REPOCOLLECTION)\n", " • GENERALCOLLECTION: 1 documents (table: GENERALCOLLECTION)\n" ] } ], "source": [ "from src.OraDBVectorStore import OraDBVectorStore\n", "\n", "print(\"📦 OraDBVectorStore - Multi-Collection Demo\")\n", "print(\"=\"*50)\n", "\n", "try:\n", " # Initialize the store (this creates OracleVS instances for each collection)\n", " ora_store = OraDBVectorStore()\n", " print(\"✅ OraDBVectorStore initialized\")\n", " print(\"\")\n", " \n", " # Show available collections\n", " print(\"Available collections:\")\n", " for name, table in ora_store.collections.items():\n", " count = ora_store.get_collection_count(name)\n", " print(f\" • {name}: {count} documents (table: {table})\")\n", " \n", "except Exception as e:\n", " print(f\"❌ Error: {e}\")\n", " ora_store = None" ] }, { "cell_type": "code", "execution_count": 31, "id": "collection-stats", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:32.506422Z", "iopub.status.busy": "2026-01-21T11:19:32.506280Z", "iopub.status.idle": "2026-01-21T11:19:33.273475Z", "shell.execute_reply": "2026-01-21T11:19:33.272957Z" }, "papermill": { "duration": 0.772679, "end_time": "2026-01-21T11:19:33.273994", "exception": false, "start_time": "2026-01-21T11:19:32.501315", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📊 Collection Statistics\n", "==================================================\n", "\n", "PDFCOLLECTION:\n", "------------------------------\n", " Documents: 92\n", " Embedding dimension: 384\n", " Sample: Methods for physics-based character animation that use forward dynamic simulatio...\n", "\n", "WEBCOLLECTION:\n", "------------------------------\n", " Documents: 234\n", " Embedding dimension: 384\n", " Sample: In the seven seasons of giving out the Avco World Trophy, one of (or both) the W...\n", "\n", "REPOCOLLECTION:\n", "------------------------------\n", " 
Documents: 2755\n", " Embedding dimension: 384\n", " Sample: IN NO EVENT SHALL THE COPYRIGHT\n", "// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIREC...\n", "\n", "GENERALCOLLECTION:\n", "------------------------------\n", " Documents: 1\n", " Embedding dimension: 384\n", " Sample: # Test Document\n", "\n", "This is a test document about machine learning.\n", "\n", "Machine learni...\n" ] } ], "source": [ "# Get detailed collection statistics\n", "if ora_store:\n", " print(\"📊 Collection Statistics\")\n", " print(\"=\"*50)\n", " \n", " for name in ora_store.collections:\n", " print(f\"\\n{name}:\")\n", " print(\"-\"*30)\n", " \n", " count = ora_store.get_collection_count(name)\n", " print(f\" Documents: {count}\")\n", " \n", " if count > 0:\n", " dim = ora_store.get_embedding_dimension(name)\n", " print(f\" Embedding dimension: {dim}\")\n", " \n", " # Get sample chunk\n", " sample = ora_store.get_latest_chunk(name)\n", " if sample:\n", " preview = str(sample.get('content', ''))[:80]\n", " print(f\" Sample: {preview}...\")" ] }, { "cell_type": "code", "execution_count": 32, "id": "query-collections", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:33.283998Z", "iopub.status.busy": "2026-01-21T11:19:33.283871Z", "iopub.status.idle": "2026-01-21T11:19:34.102524Z", "shell.execute_reply": "2026-01-21T11:19:34.101956Z" }, "papermill": { "duration": 0.824195, "end_time": "2026-01-21T11:19:34.102952", "exception": false, "start_time": "2026-01-21T11:19:33.278757", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔍 Cross-Collection Query Demo\n", "==================================================\n", "Query: How does the system process documents?\n", "\n", "\n", "PDFCOLLECTION:\n", "------------------------------\n", "🔍 [OracleVS] Querying PDFCOLLECTION\n", "🔍 [OracleVS] Retrieved 2 chunks from PDFCOLLECTION\n", " [1] Our creature model consists of a hierarchy of rigid bodies, which are actuated 
using an established ...\n", " Source: /tmp/gradio/20393179ac438e2cfb49d9c71edef977aeb0831599bc5ad856ca1b5cf3284654/2013-TOG-MuscleBasedBipeds.pdf\n", " [2] Our creature model consists of a hierarchy of rigid bodies, which are actuated using an established ...\n", " Source: /tmp/gradio/20393179ac438e2cfb49d9c71edef977aeb0831599bc5ad856ca1b5cf3284654/2013-TOG-MuscleBasedBipeds.pdf\n", "\n", "WEBCOLLECTION:\n", "------------------------------\n", "🔍 [OracleVS] Querying WEBCOLLECTION\n", "🔍 [OracleVS] Retrieved 2 chunks from WEBCOLLECTION\n", " [1] Although a constitution is a written document, there is also an unwritten constitution. The unwritte...\n", " Source: https://en.wikipedia.org/wiki/Politics\n", " [2] Political science, as one of the social sciences, uses methods and techniques that relate to the kin...\n", " Source: https://en.wikipedia.org/wiki/Politics\n", "\n", "GENERALCOLLECTION:\n", "------------------------------\n", "🔍 [OracleVS] Querying GENERALCOLLECTION\n", "🔍 [OracleVS] Retrieved 1 chunks from GENERALCOLLECTION\n", " [1] # Test Document\n", "\n", "This is a test document about machine learning.\n", "\n", "Machine learning is a subset of ar...\n", " Source: documents/test_ml.md\n" ] } ], "source": [ "# Query different collections\n", "if ora_store:\n", " print(\"🔍 Cross-Collection Query Demo\")\n", " print(\"=\"*50)\n", " \n", " query = \"How does the system process documents?\"\n", " print(f\"Query: {query}\")\n", " print(\"\")\n", " \n", " # Query three of the four collections (REPOCOLLECTION is skipped for brevity)\n", " for collection in [\"PDFCOLLECTION\", \"WEBCOLLECTION\", \"GENERALCOLLECTION\"]:\n", " print(f\"\\n{collection}:\")\n", " print(\"-\"*30)\n", " \n", " try:\n", " if collection == \"PDFCOLLECTION\":\n", " results = ora_store.query_pdf_collection(query, n_results=2)\n", " elif collection == \"WEBCOLLECTION\":\n", " results = ora_store.query_web_collection(query, n_results=2)\n", " else:\n", " results = ora_store.query_general_collection(query, n_results=2)\n", " \n", " if 
results:\n", " for i, r in enumerate(results):\n", " content = r.get('content', '')[:100]\n", " source = r.get('metadata', {}).get('source', 'unknown')\n", " print(f\" [{i+1}] {content}...\")\n", " print(f\" Source: {source}\")\n", " else:\n", " print(\" No results\")\n", " except Exception as e:\n", " print(f\" Error: {e}\")" ] }, { "cell_type": "markdown", "id": "section-5", "metadata": { "papermill": { "duration": 0.004675, "end_time": "2026-01-21T11:19:34.112635", "exception": false, "start_time": "2026-01-21T11:19:34.107960", "status": "completed" }, "tags": [] }, "source": [ "---\n", "## 5. Document Processing Pipeline\n", "\n", "The `PDFProcessor` class demonstrates the full document processing pipeline:\n", "1. **Load** - Docling extracts text from PDFs\n", "2. **Split** - OracleTextSplitter chunks the text\n", "3. **Store** - OraDBVectorStore adds chunks with embeddings" ] }, { "cell_type": "code", "execution_count": 33, "id": "pdf-processor", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:34.122588Z", "iopub.status.busy": "2026-01-21T11:19:34.122456Z", "iopub.status.idle": "2026-01-21T11:19:34.125450Z", "shell.execute_reply": "2026-01-21T11:19:34.125008Z" }, "papermill": { "duration": 0.008614, "end_time": "2026-01-21T11:19:34.125803", "exception": false, "start_time": "2026-01-21T11:19:34.117189", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📄 Document Processing Pipeline\n", "==================================================\n", "\n", "┌─────────────────────────────────────────────────────────────┐\n", "│ DOCUMENT PROCESSING PIPELINE │\n", "├─────────────────────────────────────────────────────────────┤\n", "│ │\n", "│ 1. LOAD (Docling) │\n", "│ ├── PDF → Markdown conversion │\n", "│ └── Preserve document structure │\n", "│ ↓ │\n", "│ 2. 
SPLIT (OracleTextSplitter) │\n", "│ ├── Server-side chunking │\n", "│ ├── Text normalization │\n", "│ └── Configurable parameters │\n", "│ ↓ │\n", "│ 3. EMBED (OracleEmbeddings) │\n", "│ ├── In-database ONNX model │\n", "│ └── ALL_MINILM_L12_V2 (384 dims) │\n", "│ ↓ │\n", "│ 4. STORE (OracleVS) │\n", "│ ├── Vector + metadata storage │\n", "│ └── Collection-based organization │\n", "│ │\n", "└─────────────────────────────────────────────────────────────┘\n", "\n" ] } ], "source": [ "# Demonstrate the PDFProcessor pattern\n", "print(\"📄 Document Processing Pipeline\")\n", "print(\"=\"*50)\n", "\n", "# Show the processing flow\n", "pipeline_steps = \"\"\"\n", "┌─────────────────────────────────────────────────────────────┐\n", "│ DOCUMENT PROCESSING PIPELINE │\n", "├─────────────────────────────────────────────────────────────┤\n", "│ │\n", "│ 1. LOAD (Docling) │\n", "│ ├── PDF → Markdown conversion │\n", "│ └── Preserve document structure │\n", "│ ↓ │\n", "│ 2. SPLIT (OracleTextSplitter) │\n", "│ ├── Server-side chunking │\n", "│ ├── Text normalization │\n", "│ └── Configurable parameters │\n", "│ ↓ │\n", "│ 3. EMBED (OracleEmbeddings) │\n", "│ ├── In-database ONNX model │\n", "│ └── ALL_MINILM_L12_V2 (384 dims) │\n", "│ ↓ │\n", "│ 4. 
STORE (OracleVS) │\n", "│ ├── Vector + metadata storage │\n", "│ └── Collection-based organization │\n", "│ │\n", "└─────────────────────────────────────────────────────────────┘\n", "\"\"\"\n", "print(pipeline_steps)" ] }, { "cell_type": "code", "execution_count": 34, "id": "simulate-processing", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:34.135775Z", "iopub.status.busy": "2026-01-21T11:19:34.135659Z", "iopub.status.idle": "2026-01-21T11:19:34.299530Z", "shell.execute_reply": "2026-01-21T11:19:34.299042Z" }, "papermill": { "duration": 0.169469, "end_time": "2026-01-21T11:19:34.299923", "exception": false, "start_time": "2026-01-21T11:19:34.130454", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔄 Simulating Document Processing\n", "==================================================\n", "Input: 768 characters\n", "\n", "Step 1: Splitting with OracleTextSplitter...\n", " ✅ Created 2 chunks\n", "\n", "Step 2: Creating chunk metadata...\n", " ✅ Prepared 2 chunks with metadata\n", "\n", "Step 3: Ready for vector store insertion\n", " Document ID: f0321bb3-cfce-49c7-ab1a-19744e7d38d4\n", " Collection: PDFCOLLECTION\n", " (Skipping actual insertion for demo)\n" ] } ], "source": [ "# Simulate the processing pipeline\n", "if connection and ora_store:\n", " print(\"🔄 Simulating Document Processing\")\n", " print(\"=\"*50)\n", " \n", " # Sample document content (simulating extracted PDF text)\n", " sample_document = \"\"\"\n", " # Oracle AI Database 26ai Features\n", " \n", " Oracle Database 26ai introduces powerful AI capabilities directly within \n", " the database engine, enabling developers to build intelligent applications\n", " without moving data to external systems.\n", " \n", " ## Vector Search\n", " Native support for vector similarity search with HNSW and IVF indexing.\n", " Enables semantic search across documents and unstructured data.\n", " \n", " ## In-Database ML\n", " Run 
machine learning models directly in the database using ONNX format.\n", " Supports embedding generation and inference without data movement.\n", " \n", " ## langchain-oracledb Integration\n", " Seamless integration with LangChain for building RAG applications.\n", " Components include OracleVS, OracleEmbeddings, and OracleTextSplitter.\n", " \"\"\"\n", " \n", " print(f\"Input: {len(sample_document)} characters\")\n", " print(\"\")\n", " \n", " # Step 1: Split using OracleTextSplitter\n", " print(\"Step 1: Splitting with OracleTextSplitter...\")\n", " try:\n", " splitter = OracleTextSplitter(conn=connection, params={\"normalize\": \"all\"})\n", " chunks = splitter.split_text(sample_document)\n", " print(f\" ✅ Created {len(chunks)} chunks\")\n", " except Exception as e:\n", " print(f\" ❌ Splitter error: {e}\")\n", " chunks = [sample_document] # Fallback\n", " \n", " # Step 2: Create chunk objects with metadata\n", " print(\"\")\n", " print(\"Step 2: Creating chunk metadata...\")\n", " import uuid\n", " document_id = str(uuid.uuid4())\n", " \n", " processed_chunks = []\n", " for i, chunk_text in enumerate(chunks):\n", " processed_chunks.append({\n", " \"text\": chunk_text,\n", " \"metadata\": {\n", " \"source\": \"notebook_demo.pdf\",\n", " \"document_id\": document_id,\n", " \"chunk_index\": i\n", " }\n", " })\n", " print(f\" ✅ Prepared {len(processed_chunks)} chunks with metadata\")\n", " \n", " # Step 3: Add to vector store (optional - uncomment to actually store)\n", " print(\"\")\n", " print(\"Step 3: Ready for vector store insertion\")\n", " print(f\" Document ID: {document_id}\")\n", " print(f\" Collection: PDFCOLLECTION\")\n", " print(\" (Skipping actual insertion for demo)\")\n", " \n", " # To actually insert:\n", " # ora_store.add_pdf_chunks(processed_chunks, document_id)" ] }, { "cell_type": "markdown", "id": "ed613e63", "metadata": {}, "source": [ "---\n", "## 6. 
End-to-End langchain-oracledb Workflow\n", "\n", "This section demonstrates the complete workflow used in the Agentic RAG application:\n", "1. **Add a document** - Create a sample document\n", "2. **Process with langchain-oracledb** - Split and embed using `OracleTextSplitter` and `OracleEmbeddings`\n", "3. **Store in OracleVS** - Add to vector store\n", "4. **Query with similarity search** - Use `OraDBVectorStore` to retrieve relevant content\n", "\n", "This mirrors exactly how the production system processes and queries documents." ] }, { "cell_type": "code", "execution_count": 35, "id": "f08c9078", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📄 Step 1: Create Sample Document\n", "============================================================\n", "Document title: Oracle AI Vector Search Guide\n", "Document size: 1978 characters\n", "\n", "Preview (first 300 chars):\n", "------------------------------------------------------------\n", "\n", "# Oracle AI Vector Search Guide\n", "\n", "Oracle AI Vector Search enables semantic similarity search within Oracle Database.\n", "It allows you to store, index, and query vector embeddings alongside your \n", "relational data without requiring external vector databases.\n", "\n", "## Key Features\n", "\n", "### In-Database Embeddings\n", "Ge...\n" ] } ], "source": [ "# Step 1: Create a sample document\n", "print(\"📄 Step 1: Create Sample Document\")\n", "print(\"=\"*60)\n", "\n", "# This simulates a document that would be uploaded to the system\n", "sample_document_content = \"\"\"\n", "# Oracle AI Vector Search Guide\n", "\n", "Oracle AI Vector Search enables semantic similarity search within Oracle Database.\n", "It allows you to store, index, and query vector embeddings alongside your \n", "relational data without requiring external vector databases.\n", "\n", "## Key Features\n", "\n", "### In-Database Embeddings\n", "Generate embeddings directly in the database using ONNX models. 
This eliminates\n", "the need for external API calls and keeps data processing within your security\n", "perimeter. Supported models include ALL_MINILM_L12_V2 for general-purpose embeddings.\n", "\n", "### Vector Indexing\n", "Oracle supports multiple indexing strategies:\n", "- HNSW (Hierarchical Navigable Small World): Best for approximate nearest neighbor\n", "- IVF (Inverted File): Good balance of speed and accuracy\n", "- Flat: Exact search, best for small datasets\n", "\n", "### Distance Metrics\n", "Choose from multiple distance strategies:\n", "- EUCLIDEAN_DISTANCE: Standard L2 distance\n", "- COSINE_DISTANCE: Angle-based similarity\n", "- DOT_PRODUCT: Inner product similarity\n", "\n", "## Integration with LangChain\n", "\n", "The langchain-oracledb package provides seamless integration:\n", "\n", "```python\n", "from langchain_oracledb import OracleVS, OracleEmbeddings\n", "from langchain_oracledb.document_loaders.oracleai import OracleTextSplitter\n", "\n", "# Initialize embeddings with in-database ONNX model\n", "embeddings = OracleEmbeddings(conn=connection, params={\"provider\": \"database\", \"model\": \"ALL_MINILM_L12_V2\"})\n", "\n", "# Create vector store\n", "vector_store = OracleVS(client=connection, embedding_function=embeddings, table_name=\"MY_COLLECTION\")\n", "\n", "# Add documents and search\n", "vector_store.add_texts(texts, metadatas)\n", "results = vector_store.similarity_search(\"my query\", k=5)\n", "```\n", "\n", "## Best Practices\n", "\n", "1. **Chunking Strategy**: Use OracleTextSplitter for consistent server-side chunking\n", "2. **Metadata Management**: Store source information, timestamps, and document IDs\n", "3. **Index Selection**: Use HNSW for large datasets, Flat for small ones\n", "4. 
**Batch Processing**: Add documents in batches for better performance\n", "\"\"\"\n", "\n", "print(f\"Document title: Oracle AI Vector Search Guide\")\n", "print(f\"Document size: {len(sample_document_content)} characters\")\n", "print(f\"\")\n", "print(\"Preview (first 300 chars):\")\n", "print(\"-\"*60)\n", "print(sample_document_content[:300] + \"...\")" ] }, { "cell_type": "code", "execution_count": 36, "id": "97b5ba61", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✂️ Step 2: Process with langchain-oracledb\n", "============================================================\n", "Connecting (no wallet) to dsn (description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.us-phoenix-1.oraclecloud.com))(connect_data=(service_name=g2f4dc3e5463897_mern_tpurgent.adb.oraclecloud.com))(security=(ssl_server_dn_match=yes))) and user ADMIN\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "✅ Database connection established\n", "✅ OracleTextSplitter initialized with params: {'normalize': 'all'}\n", "\n", "📊 Chunking Results:\n", " Input: 1978 characters\n", " Output: 5 chunks\n", "\n", "Chunk details:\n", "------------------------------------------------------------\n", " Chunk 1: 540 chars\n", " Preview: # Oracle AI Vector Search Guide Oracle AI Vector Search enables semantic simila...\n", " Chunk 2: 523 chars\n", " Preview: ### Vector Indexing Oracle supports multiple indexing strategies:-HNSW (Hierarch...\n", " Chunk 3: 431 chars\n", " Preview: ```python from langchain_oracledb import OracleVS, OracleEmbeddings from langcha...\n", " Chunk 4: 148 chars\n", " Preview: # Add documents and search vector_store.add_texts(texts, metadatas) results = ve...\n", " Chunk 5: 313 chars\n", " Preview: 1. 
**Chunking Strategy**: Use OracleTextSplitter for consistent server-side chun...\n" ] } ], "source": [ "# Step 2: Process document with langchain-oracledb\n", "print(\"✂️ Step 2: Process with langchain-oracledb\")\n", "print(\"=\"*60)\n", "\n", "from langchain_oracledb import OracleVS, OracleEmbeddings\n", "from langchain_oracledb.document_loaders.oracleai import OracleTextSplitter\n", "from src.db_utils import get_db_connection, load_config\n", "import uuid\n", "\n", "# Get fresh connection for this demo\n", "demo_config = load_config()\n", "demo_connection = get_db_connection(demo_config)\n", "print(\"✅ Database connection established\")\n", "\n", "# Initialize OracleTextSplitter (same as used in PDFProcessor)\n", "splitter_params = {\"normalize\": \"all\"}\n", "demo_splitter = OracleTextSplitter(conn=demo_connection, params=splitter_params)\n", "print(f\"✅ OracleTextSplitter initialized with params: {splitter_params}\")\n", "\n", "# Split the document into chunks\n", "chunks = demo_splitter.split_text(sample_document_content)\n", "print(f\"\")\n", "print(f\"📊 Chunking Results:\")\n", "print(f\" Input: {len(sample_document_content)} characters\")\n", "print(f\" Output: {len(chunks)} chunks\")\n", "print(f\"\")\n", "print(\"Chunk details:\")\n", "print(\"-\"*60)\n", "for i, chunk in enumerate(chunks):\n", " print(f\" Chunk {i+1}: {len(chunk)} chars\")\n", " preview = chunk[:80].replace('\\n', ' ').strip()\n", " print(f\" Preview: {preview}...\")" ] }, { "cell_type": "code", "execution_count": 37, "id": "cf0e6301", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📦 Step 3: Store in OracleVS\n", "============================================================\n", "✅ OracleEmbeddings initialized\n", " Provider: database\n", " Model: ALL_MINILM_L12_V2\n", "\n", "✅ OracleVS vector store created\n", " Table: DEMO_WORKFLOW_COLLECTION\n", " Distance strategy: EUCLIDEAN_DISTANCE\n", "\n", "📝 Adding 5 chunks to vector store...\n", "✅ Chunks 
added successfully!\n", " Document ID: 33ea4156-f087-4800-8885-f90d473afffc\n", " Time elapsed: 481.48 ms\n" ] } ], "source": [ "# Step 3: Store in OracleVS vector store\n", "print(\"📦 Step 3: Store in OracleVS\")\n", "print(\"=\"*60)\n", "\n", "# Initialize OracleEmbeddings (same pattern as OraDBVectorStore)\n", "embed_params = demo_config.get(\"ORACLE_EMBEDDINGS_PARAMS\", \n", " {\"provider\": \"database\", \"model\": \"ALL_MINILM_L12_V2\"})\n", "if isinstance(embed_params, str):\n", " import json\n", " embed_params = json.loads(embed_params)\n", "\n", "demo_embeddings = OracleEmbeddings(conn=demo_connection, params=embed_params)\n", "print(f\"✅ OracleEmbeddings initialized\")\n", "print(f\" Provider: {embed_params.get('provider')}\")\n", "print(f\" Model: {embed_params.get('model')}\")\n", "\n", "# Create a dedicated vector store for this demo\n", "DEMO_TABLE = \"DEMO_WORKFLOW_COLLECTION\"\n", "\n", "demo_vector_store = OracleVS(\n", " client=demo_connection,\n", " embedding_function=demo_embeddings,\n", " table_name=DEMO_TABLE,\n", " distance_strategy=\"EUCLIDEAN_DISTANCE\"\n", ")\n", "print(f\"\")\n", "print(f\"✅ OracleVS vector store created\")\n", "print(f\" Table: {DEMO_TABLE}\")\n", "print(f\" Distance strategy: EUCLIDEAN_DISTANCE\")\n", "\n", "# Prepare chunks with metadata (same pattern as OraDBVectorStore.add_pdf_chunks)\n", "document_id = str(uuid.uuid4())\n", "document_source = \"oracle_vector_search_guide.pdf\"\n", "\n", "texts = chunks\n", "metadatas = [\n", " {\n", " \"source\": document_source,\n", " \"document_id\": document_id,\n", " \"chunk_index\": i,\n", " \"total_chunks\": len(chunks),\n", " \"processed_by\": \"langchain-oracledb\"\n", " }\n", " for i in range(len(chunks))\n", "]\n", "\n", "print(f\"\")\n", "print(f\"📝 Adding {len(texts)} chunks to vector store...\")\n", "start_time = time.time()\n", "\n", "# Add texts with metadata (this generates embeddings and stores them)\n", "demo_vector_store.add_texts(texts=texts, 
metadatas=metadatas)\n", "demo_connection.commit()\n", "\n", "elapsed = time.time() - start_time\n", "print(f\"✅ Chunks added successfully!\")\n", "print(f\" Document ID: {document_id}\")\n", "print(f\" Time elapsed: {elapsed*1000:.2f} ms\")" ] }, { "cell_type": "code", "execution_count": 38, "id": "9c14c6a7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔍 Step 4: Query with Similarity Search\n", "============================================================\n", "Method 1: Direct OracleVS.similarity_search()\n", "------------------------------------------------------------\n", "Query: What embedding models are supported for vector search?\n", "\n", "Retrieved 2 results:\n", "\n", "Result [1]:\n", " Content: # Oracle AI Vector Search Guide\n", "\n", "Oracle AI Vector Search enables semantic similarity search within Oracle Database.\n", "It allows you to store, index, and...\n", " Metadata: {'source': 'oracle_vector_search_guide.pdf', 'document_id': '33ea4156-f087-4800-8885-f90d473afffc', 'chunk_index': Decimal('0'), 'total_chunks': Decimal('5'), 'processed_by': 'langchain-oracledb'}\n", "\n", "Result [2]:\n", " Content: # Oracle AI Vector Search Guide\n", "\n", "Oracle AI Vector Search enables semantic similarity search within Oracle Database.\n", "It allows you to store, index, and...\n", " Metadata: {'source': 'oracle_vector_search_guide.pdf', 'document_id': '321ff555-13f3-42bc-a3ed-db207e7c409d', 'chunk_index': Decimal('0'), 'total_chunks': Decimal('5'), 'processed_by': 'langchain-oracledb'}\n", "\n" ] } ], "source": [ "# Step 4: Query with similarity search (using OraDBVectorStore pattern)\n", "print(\"🔍 Step 4: Query with Similarity Search\")\n", "print(\"=\"*60)\n", "\n", "# Method 1: Direct OracleVS similarity search\n", "print(\"Method 1: Direct OracleVS.similarity_search()\")\n", "print(\"-\"*60)\n", "\n", "query = \"What embedding models are supported for vector search?\"\n", "print(f\"Query: {query}\")\n", 
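"# What happens under the hood (a sketch of standard LangChain VectorStore\n", "# behavior, not code from this repo): similarity_search embeds the query\n", "# text with the store's embedding_function, runs a vector-distance query\n", "# inside the database, and returns LangChain Document objects\n", "# (page_content + metadata) for the k nearest chunks under the configured\n", "# distance strategy (EUCLIDEAN_DISTANCE for this demo table).\n",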
"print(\"\")\n", "\n", "# Perform similarity search\n", "results = demo_vector_store.similarity_search(query, k=2)\n", "\n", "print(f\"Retrieved {len(results)} results:\")\n", "print(\"\")\n", "for i, doc in enumerate(results):\n", " print(f\"Result [{i+1}]:\")\n", " print(f\" Content: {doc.page_content[:150]}...\")\n", " print(f\" Metadata: {doc.metadata}\")\n", " print()" ] }, { "cell_type": "code", "execution_count": 39, "id": "21999906", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Method 2: OracleVS.similarity_search_with_score()\n", "------------------------------------------------------------\n", "Query: How do I integrate Oracle vector search with LangChain?\n", "\n", "Retrieved 2 results with scores:\n", "\n", "Result [1]:\n", " Distance: 0.7999\n", " Similarity: 0.5556\n", " Content: # Oracle AI Vector Search Guide\n", "\n", "Oracle AI Vector Search enables semantic similarity search within Oracle Database.\n", "It a...\n", "\n", "Result [2]:\n", " Distance: 0.7999\n", " Similarity: 0.5556\n", " Content: # Oracle AI Vector Search Guide\n", "\n", "Oracle AI Vector Search enables semantic similarity search within Oracle Database.\n", "It a...\n", "\n" ] } ], "source": [ "# Method 2: Query with similarity scores\n", "print(\"Method 2: OracleVS.similarity_search_with_score()\")\n", "print(\"-\"*60)\n", "\n", "query2 = \"How do I integrate Oracle vector search with LangChain?\"\n", "print(f\"Query: {query2}\")\n", "print(\"\")\n", "\n", "# Similarity search with scores (used by OraDBVectorStore._query_collection)\n", "try:\n", " results_with_scores = demo_vector_store.similarity_search_with_score(query2, k=2)\n", " \n", " print(f\"Retrieved {len(results_with_scores)} results with scores:\")\n", " print(\"\")\n", " for i, (doc, distance) in enumerate(results_with_scores):\n", " # Convert distance to similarity (same formula as OraDBVectorStore)\n", " similarity = 1 / (1 + distance) if distance >= 0 else 0\n", " \n", " 
print(f\"Result [{i+1}]:\")\n", " print(f\" Distance: {distance:.4f}\")\n", " print(f\" Similarity: {similarity:.4f}\")\n", " print(f\" Content: {doc.page_content[:120]}...\")\n", " print()\n", "except Exception as e:\n", " print(f\"Note: similarity_search_with_score error: {e}\")\n", " print(\"Falling back to similarity_search (same as OraDBVectorStore)\")\n", " results = demo_vector_store.similarity_search(query2, k=2)\n", " for i, doc in enumerate(results):\n", " print(f\"Result [{i+1}]: {doc.page_content[:120]}...\")" ] }, { "cell_type": "code", "execution_count": 40, "id": "7399114e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Method 3: Using OraDBVectorStore (Production Pattern)\n", "------------------------------------------------------------\n", "\n", "This is how the Agentic RAG application performs queries.\n", "OraDBVectorStore wraps OracleVS with collection management and\n", "metadata handling.\n", "\n", "Connecting (no wallet) to dsn (description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.us-phoenix-1.oraclecloud.com))(connect_data=(service_name=g2f4dc3e5463897_mern_tpurgent.adb.oraclecloud.com))(security=(ssl_server_dn_match=yes))) and user ADMIN\n", "Oracle DB Connection successful!\n", "Adding demo document to GENERALCOLLECTION...\n", "🔄 [OraDB] Inserting 5 chunks into GENERALCOLLECTION...\n", "✅ [OraDB] Successfully inserted 5 chunks.\n", "✅ Added 5 chunks to GENERALCOLLECTION\n", "\n", "Query: What distance metrics can I use for vector similarity?\n", "\n", "🔍 [OracleVS] Querying GENERALCOLLECTION\n", "🔍 [OracleVS] Retrieved 3 chunks from GENERALCOLLECTION\n", "Retrieved 3 results:\n", "\n", "Result [1]:\n", " Content: ### Vector Indexing\n", "Oracle supports multiple indexing strategies:-HNSW (Hierarchical Navigable Small World): Best for ap...\n", " Source: oracle_vector_search_guide.pdf\n", " Similarity: 0.4927\n", "\n", "Result [2]:\n", " Content: # Add documents and 
search\n", "vector_store.add_texts(texts, metadatas)\n", "results = vector_store.similarity_search(\"my query\",...\n", " Source: oracle_vector_search_guide.pdf\n", " Similarity: 0.4708\n", "\n", "Result [3]:\n", " Content: # Oracle AI Vector Search Guide\n", "\n", "Oracle AI Vector Search enables semantic similarity search within Oracle Database.\n", "It a...\n", " Source: oracle_vector_search_guide.pdf\n", " Similarity: 0.4622\n" ] } ], "source": [ "# Method 3: Using OraDBVectorStore (production pattern)\n", "print(\"Method 3: Using OraDBVectorStore (Production Pattern)\")\n", "print(\"-\"*60)\n", "print(\"\")\n", "print(\"This is how the Agentic RAG application performs queries.\")\n", "print(\"OraDBVectorStore wraps OracleVS with collection management and\")\n", "print(\"metadata handling.\")\n", "print(\"\")\n", "\n", "from src.OraDBVectorStore import OraDBVectorStore\n", "\n", "# Create OraDBVectorStore instance (same as LocalRAGAgent uses)\n", "production_store = OraDBVectorStore()\n", "\n", "# Add our demo chunks to the GENERAL collection (for testing)\n", "print(\"Adding demo document to GENERALCOLLECTION...\")\n", "demo_chunks = [\n", " {\n", " \"text\": chunk,\n", " \"metadata\": {\n", " \"source\": document_source,\n", " \"document_id\": document_id,\n", " \"chunk_index\": i\n", " }\n", " }\n", " for i, chunk in enumerate(chunks)\n", "]\n", "\n", "# Use the production method to add chunks\n", "production_store.add_general_knowledge(demo_chunks, source_id=document_id)\n", "production_store.connection.commit()\n", "print(f\"✅ Added {len(demo_chunks)} chunks to GENERALCOLLECTION\")\n", "print(\"\")\n", "\n", "# Query using the production method\n", "query3 = \"What distance metrics can I use for vector similarity?\"\n", "print(f\"Query: {query3}\")\n", "print(\"\")\n", "\n", "results = production_store.query_general_collection(query3, n_results=3)\n", "\n", "print(f\"Retrieved {len(results)} results:\")\n", "for i, r in enumerate(results):\n", " 
print(f\"\")\n", " print(f\"Result [{i+1}]:\")\n", " print(f\" Content: {r['content'][:120]}...\")\n", " print(f\" Source: {r['metadata'].get('source', 'unknown')}\")\n", " if 'score' in r:\n", " print(f\" Similarity: {r['score']:.4f}\")" ] }, { "cell_type": "code", "execution_count": 41, "id": "5ed15404", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Method 4: Using LocalRAGAgent (Full Agentic RAG)\n", "------------------------------------------------------------\n", "\n", "LocalRAGAgent combines vector retrieval with LLM generation.\n", "This is the same agent used in the production Gradio UI.\n", "\n", "LocalRAGAgent init - model_name: None\n", "Using default model: gemma3:270m\n", "Model Name after assignment: gemma3:270m\n", "\n", "Loading Ollama model...\n", "Model: gemma3:270m\n", "Note: Make sure Ollama is running on your system.\n", "Available Ollama models: mistral-small3.2:latest, qwen3:latest, glm-4.7-flash:latest, glm4:9b-chat-q4_K_M, codegemma:latest, codellama:latest, deepseek-coder:6.7b, gpt-oss:20b, gemma3:270m, gemma3:latest, gemma3:1b-it-qat, gemma3:4b-it-qat, mistral:latest, smollm2:135m, qwen3:0.6b, deepseek-r1:1.5b, llama3.2:3b, mistral:7b, phi3:3.8b, nomic-embed-text:latest, phi3:latest, mario:latest, llama3-backup:latest, mattw/pygmalion:latest, llama3.2:latest, qwq:latest, phi4:latest, llama2:latest, llama2:7b, qwen2:latest, deepseek-r1:latest, llama3:latest\n", "Using Ollama model: gemma3:270m\n", "Using Ollama model: gemma3:270m\n", "✅ LocalRAGAgent initialized\n", " Collection: General Knowledge\n", " Vector Store: OraDBVectorStore\n", "\n", "Query: Explain the vector indexing options available in Oracle\n", "\n", "⚠️ Note: Full RAG response requires Ollama LLM\n", " Demonstrating context retrieval step only...\n", "\n", "🔍 [OracleVS] Querying GENERALCOLLECTION\n", "🔍 [OracleVS] Retrieved 3 chunks from GENERALCOLLECTION\n", "📚 Retrieved Context (3 chunks):\n", 
"------------------------------------------------------------\n", "\n", "[1] ### Vector Indexing\n", "Oracle supports multiple indexing strategies:-HNSW (Hierarchical Navigable Small World): Best for approximate nearest neighbor-IVF (Inverted File): Good balance of speed and accura...\n", "\n", "[2] # Oracle AI Vector Search Guide\n", "\n", "Oracle AI Vector Search enables semantic similarity search within Oracle Database.\n", "It allows you to store, index, and query vector embeddings alongside your\n", "relational...\n", "\n", "[3] 1. **Chunking Strategy**: Use OracleTextSplitter for consistent server-side chunking\n", "2. **Metadata Management**: Store source information, timestamps, and document IDs\n", "3. **Index Selection**: Use HNSW...\n", "\n", "💡 In production, LocalRAGAgent.generate_response() would:\n", " 1. Retrieve this context using OraDBVectorStore\n", " 2. Format a prompt with the context + query\n", " 3. Send to Ollama LLM for response generation\n", " 4. Return the final answer to the user\n" ] } ], "source": [ "# Method 4: Using LocalRAGAgent (Full Agentic RAG Pattern)\n", "print(\"Method 4: Using LocalRAGAgent (Full Agentic RAG)\")\n", "print(\"-\"*60)\n", "print(\"\")\n", "print(\"LocalRAGAgent combines vector retrieval with LLM generation.\")\n", "print(\"This is the same agent used in the production Gradio UI.\")\n", "print(\"\")\n", "\n", "from src.local_rag_agent import LocalRAGAgent\n", "\n", "# Initialize LocalRAGAgent with General Knowledge collection\n", "# (where we just added our demo document)\n", "rag_agent = LocalRAGAgent(\n", " vector_store=production_store,\n", " collection=\"General Knowledge\"\n", ")\n", "print(f\"✅ LocalRAGAgent initialized\")\n", "print(f\" Collection: General Knowledge\")\n", "print(f\" Vector Store: OraDBVectorStore\")\n", "print(\"\")\n", "\n", "# Perform a RAG query (retrieval + generation)\n", "query4 = \"Explain the vector indexing options available in Oracle\"\n", "print(f\"Query: {query4}\")\n", 
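"# In production the agent is invoked directly, e.g.:\n", "#     response = rag_agent.generate_response(query4)\n", "# (illustrative call; the exact signature may vary). This demo stops at\n", "# the retrieval step so the recorded outputs do not depend on a running\n", "# Ollama model.\n",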
"print(\"\")\n", "\n", "# Note: This requires Ollama to be running with a model\n", "print(\"⚠️ Note: Full RAG response requires Ollama LLM\")\n", "print(\" Demonstrating context retrieval step only...\")\n", "print(\"\")\n", "\n", "# Directly get context (what the agent does internally)\n", "context_results = production_store.query_general_collection(query4, n_results=3)\n", "\n", "print(f\"📚 Retrieved Context ({len(context_results)} chunks):\")\n", "print(\"-\"*60)\n", "for i, r in enumerate(context_results):\n", " print(f\"\")\n", " print(f\"[{i+1}] {r['content'][:200]}...\")\n", " \n", "print(\"\")\n", "print(\"💡 In production, LocalRAGAgent.generate_response() would:\")\n", "print(\" 1. Retrieve this context using OraDBVectorStore\")\n", "print(\" 2. Format a prompt with the context + query\")\n", "print(\" 3. Send to Ollama LLM for response generation\")\n", "print(\" 4. Return the final answer to the user\")" ] }, { "cell_type": "code", "execution_count": 42, "id": "b831d389", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🧹 Cleaning up demo data...\n", "============================================================\n", "✅ Dropped demo table: DEMO_WORKFLOW_COLLECTION\n", "✅ Removed 5 demo chunks from GENERALCOLLECTION\n", "✅ Demo connection closed\n", "\n", "📋 Workflow Summary:\n", "------------------------------------------------------------\n", " 1. Created sample document (Oracle Vector Search Guide)\n", " 2. Split with OracleTextSplitter → chunks\n", " 3. Stored with OracleVS → embeddings + metadata\n", " 4. Queried with similarity_search → relevant context\n", " 5. Used OraDBVectorStore → production pattern\n", " 6. 
Used LocalRAGAgent → full agentic RAG workflow\n" ] } ], "source": [ "# Cleanup: Remove demo data\n", "print(\"🧹 Cleaning up demo data...\")\n", "print(\"=\"*60)\n", "\n", "# Drop the demo table\n", "try:\n", " cursor = demo_connection.cursor()\n", " cursor.execute(f\"DROP TABLE {DEMO_TABLE} PURGE\")\n", " demo_connection.commit()\n", " cursor.close()\n", " print(f\"✅ Dropped demo table: {DEMO_TABLE}\")\n", "except Exception as e:\n", " print(f\" Note: Could not drop {DEMO_TABLE}: {e}\")\n", "\n", "# Remove demo chunks from GENERALCOLLECTION\n", "try:\n", " cursor = production_store.connection.cursor()\n", " cursor.execute(\"\"\"\n", " DELETE FROM GENERALCOLLECTION \n", " WHERE JSON_VALUE(metadata, '$.document_id') = :doc_id\n", " \"\"\", {\"doc_id\": document_id})\n", " deleted_count = cursor.rowcount\n", " production_store.connection.commit()\n", " cursor.close()\n", " print(f\"✅ Removed {deleted_count} demo chunks from GENERALCOLLECTION\")\n", "except Exception as e:\n", " print(f\" Note: Could not clean GENERALCOLLECTION: {e}\")\n", "\n", "# Close demo connection\n", "demo_connection.close()\n", "print(f\"✅ Demo connection closed\")\n", "print(\"\")\n", "print(\"📋 Workflow Summary:\")\n", "print(\"-\"*60)\n", "print(\" 1. Created sample document (Oracle Vector Search Guide)\")\n", "print(\" 2. Split with OracleTextSplitter → chunks\")\n", "print(\" 3. Stored with OracleVS → embeddings + metadata\")\n", "print(\" 4. Queried with similarity_search → relevant context\")\n", "print(\" 5. Used OraDBVectorStore → production pattern\")\n", "print(\" 6. Used LocalRAGAgent → full agentic RAG workflow\")" ] }, { "cell_type": "markdown", "id": "section-6", "metadata": { "papermill": { "duration": 0.027297, "end_time": "2026-01-21T11:19:34.332161", "exception": false, "start_time": "2026-01-21T11:19:34.304864", "status": "completed" }, "tags": [] }, "source": [ "---\n", "## 7. RAG Query Workflow\n", "\n", "The LocalRAGAgent handles the query workflow:\n", "1. 
Analyze query type\n", "2. Retrieve relevant context from vector stores\n", "3. Generate response using Ollama" ] }, { "cell_type": "code", "execution_count": 43, "id": "rag-workflow", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:34.342593Z", "iopub.status.busy": "2026-01-21T11:19:34.342448Z", "iopub.status.idle": "2026-01-21T11:19:34.345697Z", "shell.execute_reply": "2026-01-21T11:19:34.345263Z" }, "papermill": { "duration": 0.009221, "end_time": "2026-01-21T11:19:34.346195", "exception": false, "start_time": "2026-01-21T11:19:34.336974", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔍 RAG Query Workflow\n", "==================================================\n", "\n", "┌─────────────────────────────────────────────────────────────┐\n", "│ RAG QUERY WORKFLOW │\n", "├─────────────────────────────────────────────────────────────┤\n", "│ │\n", "│ User Query │\n", "│ ↓ │\n", "│ ┌─────────────────────────────────────┐ │\n", "│ │ 1. Query Analysis (LocalRAGAgent) │ │\n", "│ │ • Determine query type │ │\n", "│ │ • Select target collection │ │\n", "│ └─────────────────────────────────────┘ │\n", "│ ↓ │\n", "│ ┌─────────────────────────────────────┐ │\n", "│ │ 2. Context Retrieval (OracleVS) │ │\n", "│ │ • Embed query │ │\n", "│ │ • Similarity search │ │\n", "│ │ • Return top-k chunks │ │\n", "│ └─────────────────────────────────────┘ │\n", "│ ↓ │\n", "│ ┌─────────────────────────────────────┐ │\n", "│ │ 3. 
Response Generation (Ollama) │ │\n", "│ │ • Assemble context + query │ │\n", "│ │ • Generate with LLM │ │\n", "│ │ • Return answer │ │\n", "│ └─────────────────────────────────────┘ │\n", "│ ↓ │\n", "│ Response to User │\n", "│ │\n", "└─────────────────────────────────────────────────────────────┘\n", "\n" ] } ], "source": [ "# Demonstrate RAG query workflow\n", "print(\"🔍 RAG Query Workflow\")\n", "print(\"=\"*50)\n", "\n", "workflow_diagram = \"\"\"\n", "┌─────────────────────────────────────────────────────────────┐\n", "│ RAG QUERY WORKFLOW │\n", "├─────────────────────────────────────────────────────────────┤\n", "│ │\n", "│ User Query │\n", "│ ↓ │\n", "│ ┌─────────────────────────────────────┐ │\n", "│ │ 1. Query Analysis (LocalRAGAgent) │ │\n", "│ │ • Determine query type │ │\n", "│ │ • Select target collection │ │\n", "│ └─────────────────────────────────────┘ │\n", "│ ↓ │\n", "│ ┌─────────────────────────────────────┐ │\n", "│ │ 2. Context Retrieval (OracleVS) │ │\n", "│ │ • Embed query │ │\n", "│ │ • Similarity search │ │\n", "│ │ • Return top-k chunks │ │\n", "│ └─────────────────────────────────────┘ │\n", "│ ↓ │\n", "│ ┌─────────────────────────────────────┐ │\n", "│ │ 3. 
Response Generation (Ollama) │ │\n", "│ │ • Assemble context + query │ │\n", "│ │ • Generate with LLM │ │\n", "│ │ • Return answer │ │\n", "│ └─────────────────────────────────────┘ │\n", "│ ↓ │\n", "│ Response to User │\n", "│ │\n", "└─────────────────────────────────────────────────────────────┘\n", "\"\"\"\n", "print(workflow_diagram)" ] }, { "cell_type": "code", "execution_count": 44, "id": "context-retrieval", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:34.356562Z", "iopub.status.busy": "2026-01-21T11:19:34.356414Z", "iopub.status.idle": "2026-01-21T11:19:35.196484Z", "shell.execute_reply": "2026-01-21T11:19:35.195959Z" }, "papermill": { "duration": 0.845835, "end_time": "2026-01-21T11:19:35.196922", "exception": false, "start_time": "2026-01-21T11:19:34.351087", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📚 Context Retrieval Demo\n", "==================================================\n", "\n", "Query: What is vector search?\n", "----------------------------------------\n", "🔍 [OracleVS] Querying PDFCOLLECTION\n", "🔍 [OracleVS] Retrieved 2 chunks from PDFCOLLECTION\n", "Retrieved 2 chunks:\n", " [1] In our approach, we attempt to find efficient muscle routings through optimizati...\n", " [2] In our approach, we attempt to find efficient muscle routings through optimizati...\n", "\n", "Query: How do agents communicate?\n", "----------------------------------------\n", "🔍 [OracleVS] Querying PDFCOLLECTION\n", "🔍 [OracleVS] Retrieved 2 chunks from PDFCOLLECTION\n", "Retrieved 2 chunks:\n", " [1] Our creature model consists of a hierarchy of rigid bodies, which are actuated u...\n", " [2] Our creature model consists of a hierarchy of rigid bodies, which are actuated u...\n", "\n", "Query: Explain the embedding process\n", "----------------------------------------\n", "🔍 [OracleVS] Querying PDFCOLLECTION\n", "🔍 [OracleVS] Retrieved 2 chunks from PDFCOLLECTION\n", "Retrieved 2 
chunks:\n", " [1] parameters. The first is the result of activation dynamics ( § 3.2), the second ...\n", " [2] parameters. The first is the result of activation dynamics ( § 3.2), the second ...\n" ] } ], "source": [ "# Demonstrate context retrieval (Step 2)\n", "if ora_store:\n", " print(\"📚 Context Retrieval Demo\")\n", " print(\"=\"*50)\n", " \n", " test_queries = [\n", " \"What is vector search?\",\n", " \"How do agents communicate?\",\n", " \"Explain the embedding process\"\n", " ]\n", " \n", " for query in test_queries:\n", " print(f\"\\nQuery: {query}\")\n", " print(\"-\"*40)\n", " \n", " # Try PDF collection first (mimicking LocalRAGAgent behavior)\n", " results = ora_store.query_pdf_collection(query, n_results=2)\n", " \n", " if results:\n", " print(f\"Retrieved {len(results)} chunks:\")\n", " for i, r in enumerate(results):\n", " content = r.get('content', '')[:80]\n", " print(f\" [{i+1}] {content}...\")\n", " else:\n", " print(\" No relevant context found\")\n", " print(\" (Would fall back to LLM knowledge)\")" ] }, { "cell_type": "markdown", "id": "section-7", "metadata": { "papermill": { "duration": 0.005073, "end_time": "2026-01-21T11:19:35.207419", "exception": false, "start_time": "2026-01-21T11:19:35.202346", "status": "completed" }, "tags": [] }, "source": [ "---\n", "## 8. 
Multi-Agent Architecture\n", "\n", "Agentic RAG uses a Chain of Thought (CoT) approach with four specialized agents:\n", "\n", "| Agent | Role | Task |\n", "|-------|------|------|\n", "| **Planner** | Strategic planning | Break queries into 3-4 steps |\n", "| **Researcher** | Information gathering | Search vector stores |\n", "| **Reasoner** | Logical analysis | Draw conclusions |\n", "| **Synthesizer** | Response generation | Combine into final answer |" ] }, { "cell_type": "code", "execution_count": 45, "id": "agent-cards", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:35.218144Z", "iopub.status.busy": "2026-01-21T11:19:35.218016Z", "iopub.status.idle": "2026-01-21T11:19:35.233903Z", "shell.execute_reply": "2026-01-21T11:19:35.233410Z" }, "papermill": { "duration": 0.021887, "end_time": "2026-01-21T11:19:35.234310", "exception": false, "start_time": "2026-01-21T11:19:35.212423", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🤖 Multi-Agent Architecture\n", "==================================================\n", "Total agents: 8\n", "\n", "📌 Strategic Planner Agent\n", " ID: planner_agent_v1\n", " Role: Strategic Planner\n", " Expertise: problem_decomposition, strategic_planning, task_breakdown\n", "\n", "📌 Information Researcher Agent\n", " ID: researcher_agent_v1\n", " Role: Information Gatherer\n", " Expertise: information_retrieval, knowledge_extraction, data_analysis\n", "\n", "📌 Logic and Reasoning Agent\n", " ID: reasoner_agent_v1\n", " Role: Logic and Analysis\n", " Expertise: logical_reasoning, critical_thinking, pattern_recognition\n", "\n", "📌 Information Synthesis Agent\n", " ID: synthesizer_agent_v1\n", " Role: Information Synthesizer\n", " Expertise: information_synthesis, summarization, coherent_writing\n", "\n" ] } ], "source": [ "from src.specialized_agent_cards import get_all_specialized_agent_cards\n", "\n", "print(\"🤖 Multi-Agent Architecture\")\n", 
"print(\"=\"*50)\n", "\n", "# Get all agent cards\n", "agent_cards = get_all_specialized_agent_cards()\n", "\n", "print(f\"Total agents: {len(agent_cards)}\")\n", "print(\"\")\n", "\n", "# Show primary agents (v1)\n", "for agent_id in [\"planner_agent_v1\", \"researcher_agent_v1\", \"reasoner_agent_v1\", \"synthesizer_agent_v1\"]:\n", " card = agent_cards.get(agent_id, {})\n", " print(f\"📌 {card.get('name', agent_id)}\")\n", " print(f\" ID: {card.get('agent_id')}\")\n", " print(f\" Role: {card.get('metadata', {}).get('role', 'N/A')}\")\n", " print(f\" Expertise: {', '.join(card.get('metadata', {}).get('expertise', []))}\")\n", " print()" ] }, { "cell_type": "code", "execution_count": 46, "id": "cot-workflow", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:35.245263Z", "iopub.status.busy": "2026-01-21T11:19:35.245104Z", "iopub.status.idle": "2026-01-21T11:19:35.248709Z", "shell.execute_reply": "2026-01-21T11:19:35.248268Z" }, "papermill": { "duration": 0.009692, "end_time": "2026-01-21T11:19:35.249096", "exception": false, "start_time": "2026-01-21T11:19:35.239404", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔗 Chain of Thought (CoT) Workflow\n", "==================================================\n", "\n", "┌─────────────────────────────────────────────────────────────────┐\n", "│ CHAIN OF THOUGHT (CoT) WORKFLOW │\n", "├─────────────────────────────────────────────────────────────────┤\n", "│ │\n", "│ User Query: \"Explain how the RAG system processes PDFs\" │\n", "│ ↓ │\n", "│ ┌───────────────────────────────────────────────────────┐ │\n", "│ │ PLANNER AGENT │ │\n", "│ │ \"Let me break this down into steps:\" │ │\n", "│ │ Step 1: Understand PDF ingestion │ │\n", "│ │ Step 2: Explain text extraction │ │\n", "│ │ Step 3: Describe chunking process │ │\n", "│ │ Step 4: Detail embedding storage │ │\n", "│ └───────────────────────────────────────────────────────┘ │\n", "│ ↓ │\n", "│ 
┌───────────────────────────────────────────────────────┐ │\n", "│ │ RESEARCHER AGENT (for each step) │ │\n", "│ │ • Searches PDFCOLLECTION │ │\n", "│ │ • Gathers relevant context │ │\n", "│ │ • Extracts key findings │ │\n", "│ └───────────────────────────────────────────────────────┘ │\n", "│ ↓ │\n", "│ ┌───────────────────────────────────────────────────────┐ │\n", "│ │ REASONER AGENT (for each step) │ │\n", "│ │ • Analyzes findings │ │\n", "│ │ • Draws logical conclusions │ │\n", "│ │ • Identifies patterns │ │\n", "│ └───────────────────────────────────────────────────────┘ │\n", "│ ↓ │\n", "│ ┌───────────────────────────────────────────────────────┐ │\n", "│ │ SYNTHESIZER AGENT │ │\n", "│ │ • Combines all reasoning steps │ │\n", "│ │ • Produces coherent final answer │ │\n", "│ │ • Addresses original query │ │\n", "│ └───────────────────────────────────────────────────────┘ │\n", "│ ↓ │\n", "│ Final Answer: \"The RAG system processes PDFs through...\" │\n", "│ │\n", "└─────────────────────────────────────────────────────────────────┘\n", "\n" ] } ], "source": [ "# Visualize Chain of Thought workflow\n", "print(\"🔗 Chain of Thought (CoT) Workflow\")\n", "print(\"=\"*50)\n", "\n", "cot_diagram = \"\"\"\n", "┌─────────────────────────────────────────────────────────────────┐\n", "│ CHAIN OF THOUGHT (CoT) WORKFLOW │\n", "├─────────────────────────────────────────────────────────────────┤\n", "│ │\n", "│ User Query: \"Explain how the RAG system processes PDFs\" │\n", "│ ↓ │\n", "│ ┌───────────────────────────────────────────────────────┐ │\n", "│ │ PLANNER AGENT │ │\n", "│ │ \"Let me break this down into steps:\" │ │\n", "│ │ Step 1: Understand PDF ingestion │ │\n", "│ │ Step 2: Explain text extraction │ │\n", "│ │ Step 3: Describe chunking process │ │\n", "│ │ Step 4: Detail embedding storage │ │\n", "│ └───────────────────────────────────────────────────────┘ │\n", "│ ↓ │\n", "│ ┌───────────────────────────────────────────────────────┐ │\n", "│ │ RESEARCHER AGENT 
(for each step) │ │\n", "│ │ • Searches PDFCOLLECTION │ │\n", "│ │ • Gathers relevant context │ │\n", "│ │ • Extracts key findings │ │\n", "│ └───────────────────────────────────────────────────────┘ │\n", "│ ↓ │\n", "│ ┌───────────────────────────────────────────────────────┐ │\n", "│ │ REASONER AGENT (for each step) │ │\n", "│ │ • Analyzes findings │ │\n", "│ │ • Draws logical conclusions │ │\n", "│ │ • Identifies patterns │ │\n", "│ └───────────────────────────────────────────────────────┘ │\n", "│ ↓ │\n", "│ ┌───────────────────────────────────────────────────────┐ │\n", "│ │ SYNTHESIZER AGENT │ │\n", "│ │ • Combines all reasoning steps │ │\n", "│ │ • Produces coherent final answer │ │\n", "│ │ • Addresses original query │ │\n", "│ └───────────────────────────────────────────────────────┘ │\n", "│ ↓ │\n", "│ Final Answer: \"The RAG system processes PDFs through...\" │\n", "│ │\n", "└─────────────────────────────────────────────────────────────────┘\n", "\"\"\"\n", "print(cot_diagram)" ] }, { "cell_type": "markdown", "id": "section-8", "metadata": { "papermill": { "duration": 0.005072, "end_time": "2026-01-21T11:19:35.259360", "exception": false, "start_time": "2026-01-21T11:19:35.254288", "status": "completed" }, "tags": [] }, "source": [ "---\n", "## 9. A2A Protocol Integration\n", "\n", "The Agent-to-Agent (A2A) protocol enables distributed agent communication via JSON-RPC 2.0." 
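,
 "\n",
 "Per the JSON-RPC 2.0 spec, every reply echoes the request `id` and carries either a `result` or an `error` object. A successful `document.query` reply might look like the sketch below (the exact `result` payload is defined by the server, not by the protocol):\n",
 "\n",
 "```json\n",
 "{\n",
 "  \"jsonrpc\": \"2.0\",\n",
 "  \"result\": { \"answer\": \"...\" },\n",
 "  \"id\": \"1\"\n",
 "}\n",
 "```"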
] }, { "cell_type": "code", "execution_count": 47, "id": "a2a-demo", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:35.270292Z", "iopub.status.busy": "2026-01-21T11:19:35.270174Z", "iopub.status.idle": "2026-01-21T11:19:35.273926Z", "shell.execute_reply": "2026-01-21T11:19:35.273507Z" }, "papermill": { "duration": 0.009966, "end_time": "2026-01-21T11:19:35.274488", "exception": false, "start_time": "2026-01-21T11:19:35.264522", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔌 A2A Protocol Demo\n", "==================================================\n", "A2A Request Format (JSON-RPC 2.0):\n", "----------------------------------------\n", "{\n", " \"jsonrpc\": \"2.0\",\n", " \"method\": \"document.query\",\n", " \"params\": {\n", " \"query\": \"How does vector search work?\",\n", " \"collection\": \"PDF\",\n", " \"use_cot\": true\n", " },\n", " \"id\": \"1\"\n", "}\n", "\n", "Available A2A Methods:\n", "----------------------------------------\n", " • document.query: Query with context retrieval\n", " • document.upload: Process and store documents\n", " • agent.query: Query specialized CoT agents\n", " • agent.discover: Find agents with capabilities\n", " • agent.register: Register new agents\n", " • task.create: Create long-running tasks\n", " • task.status: Check task status\n", " • health.check: System health\n" ] } ], "source": [ "from src.a2a_models import AgentCard, AgentCapability, AgentEndpoint\n", "\n", "print(\"🔌 A2A Protocol Demo\")\n", "print(\"=\"*50)\n", "\n", "# Show A2A request format\n", "a2a_request_example = {\n", " \"jsonrpc\": \"2.0\",\n", " \"method\": \"document.query\",\n", " \"params\": {\n", " \"query\": \"How does vector search work?\",\n", " \"collection\": \"PDF\",\n", " \"use_cot\": True\n", " },\n", " \"id\": \"1\"\n", "}\n", "\n", "print(\"A2A Request Format (JSON-RPC 2.0):\")\n", "print(\"-\"*40)\n", "print(json.dumps(a2a_request_example, 
indent=2))\n", "\n", "print(\"\")\n", "print(\"Available A2A Methods:\")\n", "print(\"-\"*40)\n", "methods = [\n", " (\"document.query\", \"Query with context retrieval\"),\n", " (\"document.upload\", \"Process and store documents\"),\n", " (\"agent.query\", \"Query specialized CoT agents\"),\n", " (\"agent.discover\", \"Find agents with capabilities\"),\n", " (\"agent.register\", \"Register new agents\"),\n", " (\"task.create\", \"Create long-running tasks\"),\n", " (\"task.status\", \"Check task status\"),\n", " (\"health.check\", \"System health\")\n", "]\n", "\n", "for method, desc in methods:\n", " print(f\" • {method}: {desc}\")" ] }, { "cell_type": "code", "execution_count": 48, "id": "distributed-arch", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:35.285582Z", "iopub.status.busy": "2026-01-21T11:19:35.285466Z", "iopub.status.idle": "2026-01-21T11:19:35.288666Z", "shell.execute_reply": "2026-01-21T11:19:35.288234Z" }, "papermill": { "duration": 0.009326, "end_time": "2026-01-21T11:19:35.289075", "exception": false, "start_time": "2026-01-21T11:19:35.279749", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🌐 Distributed Agent Architecture\n", "==================================================\n", "\n", "┌─────────────────────────────────────────────────────────────────┐\n", "│ DISTRIBUTED AGENT DEPLOYMENT │\n", "├─────────────────────────────────────────────────────────────────┤\n", "│ │\n", "│ ┌─────────────────┐ ┌─────────────────┐ │\n", "│ │ Client App │ A2A │ Main Server │ │\n", "│ │ (Gradio/API) │ ◄──────► │ (Port 8000) │ │\n", "│ └─────────────────┘ └─────────────────┘ │\n", "│ │ │\n", "│ ┌───────────────────┼───────────────────┐ │\n", "│ │ │ │ │\n", "│ ▼ ▼ ▼ │\n", "│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐\n", "│ │ Planner Agent │ │ Researcher Agent│ │ Reasoner Agent │\n", "│ │ (Server 1) │ │ (Server 2) │ │ (Server 3) │\n", "│ └─────────────────┘ 
└─────────────────┘ └─────────────────┘\n", "│ │ │ │ │\n", "│ └────────────────────┼────────────────────┘ │\n", "│ ▼ │\n", "│ ┌─────────────────────┐ │\n", "│ │ Oracle Database │ │\n", "│ │ (Vector Store) │ │\n", "│ └─────────────────────┘ │\n", "│ │\n", "│ Configuration (config.yaml): │\n", "│ AGENT_ENDPOINTS: │\n", "│ planner_url: http://server1:8000 │\n", "│ researcher_url: http://server2:8000 │\n", "│ reasoner_url: http://server3:8000 │\n", "│ synthesizer_url: http://server4:8000 │\n", "│ │\n", "└─────────────────────────────────────────────────────────────────┘\n", "\n" ] } ], "source": [ "# Show distributed architecture\n", "print(\"🌐 Distributed Agent Architecture\")\n", "print(\"=\"*50)\n", "\n", "distributed_diagram = \"\"\"\n", "┌─────────────────────────────────────────────────────────────────┐\n", "│ DISTRIBUTED AGENT DEPLOYMENT │\n", "├─────────────────────────────────────────────────────────────────┤\n", "│ │\n", "│ ┌─────────────────┐ ┌─────────────────┐ │\n", "│ │ Client App │ A2A │ Main Server │ │\n", "│ │ (Gradio/API) │ ◄──────► │ (Port 8000) │ │\n", "│ └─────────────────┘ └─────────────────┘ │\n", "│ │ │\n", "│ ┌───────────────────┼───────────────────┐ │\n", "│ │ │ │ │\n", "│ ▼ ▼ ▼ │\n", "│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐\n", "│ │ Planner Agent │ │ Researcher Agent│ │ Reasoner Agent │\n", "│ │ (Server 1) │ │ (Server 2) │ │ (Server 3) │\n", "│ └─────────────────┘ └─────────────────┘ └─────────────────┘\n", "│ │ │ │ │\n", "│ └────────────────────┼────────────────────┘ │\n", "│ ▼ │\n", "│ ┌─────────────────────┐ │\n", "│ │ Oracle Database │ │\n", "│ │ (Vector Store) │ │\n", "│ └─────────────────────┘ │\n", "│ │\n", "│ Configuration (config.yaml): │\n", "│ AGENT_ENDPOINTS: │\n", "│ planner_url: http://server1:8000 │\n", "│ researcher_url: http://server2:8000 │\n", "│ reasoner_url: http://server3:8000 │\n", "│ synthesizer_url: http://server4:8000 │\n", "│ │\n", 
"└─────────────────────────────────────────────────────────────────┘\n", "\"\"\"\n", "print(distributed_diagram)" ] }, { "cell_type": "markdown", "id": "cleanup", "metadata": { "papermill": { "duration": 0.005241, "end_time": "2026-01-21T11:19:35.299666", "exception": false, "start_time": "2026-01-21T11:19:35.294425", "status": "completed" }, "tags": [] }, "source": [ "---\n", "## Cleanup" ] }, { "cell_type": "code", "execution_count": 49, "id": "cleanup-code", "metadata": { "execution": { "iopub.execute_input": "2026-01-21T11:19:35.310959Z", "iopub.status.busy": "2026-01-21T11:19:35.310811Z", "iopub.status.idle": "2026-01-21T11:19:36.364517Z", "shell.execute_reply": "2026-01-21T11:19:36.363653Z" }, "papermill": { "duration": 1.059873, "end_time": "2026-01-21T11:19:36.364945", "exception": false, "start_time": "2026-01-21T11:19:35.305072", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🧹 Cleanup\n", "==================================================\n", "✅ Dropped test table NOTEBOOK_TEST_COLLECTION\n", "✅ Database connection closed\n" ] } ], "source": [ "# Clean up test data\n", "if 'test_store' in dir():\n", " print(\"🧹 Cleanup\")\n", " print(\"=\"*50)\n", " \n", " try:\n", " cursor = connection.cursor()\n", " cursor.execute(\"DROP TABLE NOTEBOOK_TEST_COLLECTION PURGE\")\n", " connection.commit()\n", " print(\"✅ Dropped test table NOTEBOOK_TEST_COLLECTION\")\n", " except Exception as e:\n", " print(f\" Note: {e}\")\n", "\n", "# Close connections\n", "if 'connection' in dir() and connection:\n", " connection.close()\n", " print(\"✅ Database connection closed\")\n", " \n", "if 'ora_store' in dir() and ora_store:\n", " try:\n", " ora_store.connection.close()\n", " except:\n", " pass" ] }, { "cell_type": "markdown", "id": "summary", "metadata": { "papermill": { "duration": 0.005443, "end_time": "2026-01-21T11:19:36.376131", "exception": false, "start_time": "2026-01-21T11:19:36.370688", "status": 
"completed" }, "tags": [] }, "source": [ "---\n", "## Summary\n", "\n", "This notebook demonstrated how **Agentic RAG** leverages **langchain-oracledb** for enterprise-grade RAG:\n", "\n", "### langchain-oracledb Components\n", "- **OracleEmbeddings**: In-database embedding generation with ONNX models\n", "- **OracleVS**: Vector storage and similarity search\n", "- **OracleTextSplitter**: Server-side text chunking with normalization\n", "\n", "### Multi-Collection Architecture\n", "- PDF, Web, Repository, and General Knowledge collections\n", "- Unified query interface across collections\n", "- Metadata management and sanitization\n", "\n", "### Multi-Agent CoT System\n", "- Four specialized agents (Planner, Researcher, Reasoner, Synthesizer)\n", "- A2A protocol for distributed communication\n", "- Configurable deployment across multiple servers\n", "\n", "### Next Steps\n", "1. Start the API: `python -m src.main`\n", "2. Launch Gradio UI: `python gradio_app.py`\n", "3. Use the CLI: `python agent_cli.py`\n", "4. Test A2A: `curl -X POST http://localhost:8000/a2a -H 'Content-Type: application/json' -d '{...}'`" ] } ], "metadata": { "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" }, "papermill": { "default_parameters": {}, "duration": 17.606931, "end_time": "2026-01-21T11:19:37.498095", "environment_variables": {}, "exception": null, "input_path": "notebooks/agentic_rag_langchain_oracledb_demo.ipynb", "output_path": "notebooks/agentic_rag_langchain_oracledb_demo.ipynb", "parameters": {}, "start_time": "2026-01-21T11:19:19.891164", "version": "2.6.0" } }, "nbformat": 4, "nbformat_minor": 5 }