{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Agentic RAG with Oracle Database 26AI: Vector, Keyword, and Hybrid Search\n", "\n", "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/oracle_agentic_rag_hybrid_search.ipynb)\n", "\n", "---\n", "\n", "This notebook demonstrates how to build a production-ready agentic RAG system using Oracle Database 26AI as the sole data backend — no separate vector database required. You will store document embeddings directly alongside relational data in Oracle, run three distinct retrieval modes (vector similarity, keyword, and hybrid), and wire everything together with a LangGraph ReAct agent powered by GPT-4.1-mini. By the end you will have a working AI research analyst that searches a synthetic tech-news knowledge base and cites its sources.\n", "\n", "## Tech Stack\n", "\n", "| Component | Technology |\n", "|---|---|\n", "| Database | Oracle Database 26AI Free (Docker) |\n", "| Embeddings | HuggingFace all-MiniLM-L12-v2 (384-dim, local) |\n", "| Vector Store | langchain-oracledb (OracleVS) |\n", "| Agent Framework | LangGraph create_react_agent |\n", "| LLM | OpenAI gpt-4.1-mini |\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What You'll Learn\n", "\n", "- Store vectors alongside relational data in Oracle 26AI - no separate vector DB needed\n", "- Run vector similarity search using HNSW indexes and `VECTOR_DISTANCE()`\n", "- Run keyword search using Oracle Text\n", "- Combine both in a single hybrid SQL query (the real power move)\n", "- Build a LangGraph ReAct agent that searches the knowledge base with full tool call visibility\n" ] }, { "cell_type": "markdown", "id": "acdd7553", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "- Docker Desktop running\n", "- An OpenAI API key\n", "\n", "### Set Up Environment Variables\n", "\n", "Create a `.env` file in the same directory as this notebook (or export the variables in your shell):\n", "\n", "```bash\n", "ORACLE_USER=vector\n", "ORACLE_PWD=vector\n", "ORACLE_DSN=localhost:1521/FREEPDB1\n", "OPENAI_API_KEY=sk-your-key-here\n", "```\n", "\n", "### Start Oracle Database 26AI Free\n", "\n", "```bash\n", "docker run -d --name oracle-26ai-free \\\n", " -p 1521:1521 \\\n", " -e ORACLE_PWD=YourPassword123 \\\n", " container-registry.oracle.com/database/free:latest-lite\n", "```\n", "\n", "Wait for the container to be healthy (takes 60-90 seconds):\n", "\n", "```bash\n", "docker exec oracle-26ai-free bash -c \\\n", " \"echo 'SELECT 1 FROM dual;' | sqlplus -s sys/YourPassword123@localhost:1521/FREEPDB1 AS SYSDBA\"\n", "```\n", "\n", "### Create the Vector User\n", "\n", "Connect with sqlplus and run:\n", "\n", "```sql\n", "ALTER SESSION SET CONTAINER = FREEPDB1;\n", "\n", "CREATE USER vector IDENTIFIED BY vector\n", " DEFAULT TABLESPACE users\n", " QUOTA UNLIMITED ON users;\n", "\n", "GRANT CONNECT, RESOURCE TO vector;\n", "GRANT DB_DEVELOPER_ROLE TO vector;\n", "GRANT CREATE MINING MODEL TO vector;\n", "\n", "BEGIN\n", " DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(\n", " host => '*',\n", " ace => xs$ace_type(\n", " privilege_list => xs$name_list('connect', 'resolve'),\n", " principal_name => 'vector',\n", " principal_type => xs_acl.ptype_db\n", " )\n", " );\n", "END;\n", "/\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "f023fdc7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"Note: you may need to restart the kernel to use updated packages.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "The process cannot access the file because it is being used by another process.\n" ] } ], "source": [ "%pip install langchain-oracledb>=0.2.0 langchain-openai>=0.3.0 langchain-huggingface>=0.1.0 sentence-transformers>=3.0.0 langgraph>=0.3.0 python-dotenv>=1.0.0 oracledb>=2.0.0 \"packaging<26\" -q" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Configuration\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import array\n", "import os\n", "import warnings\n", "from pathlib import Path\n", "\n", "warnings.filterwarnings(\"ignore\", message=\"create_react_agent has been moved\")\n", "\n", "import oracledb\n", "from dotenv import load_dotenv\n", "from langchain_core.documents import Document\n", "from langchain_core.tools import tool\n", "from langchain_huggingface import HuggingFaceEmbeddings\n", "from langchain_openai import ChatOpenAI\n", "from langgraph.prebuilt import create_react_agent\n", "\n", "from langchain_oracledb import OracleVS\n", "from langchain_oracledb.vectorstores.oraclevs import DistanceStrategy\n", "\n", "load_dotenv()\n", "\n", "ORACLE_USER = os.getenv(\"ORACLE_USER\", \"vector\")\n", "ORACLE_PWD = os.getenv(\"ORACLE_PWD\", \"vector\")\n", "ORACLE_DSN = os.getenv(\"ORACLE_DSN\", \"localhost:1521/FREEPDB1\")\n", "OPENAI_API_KEY = os.getenv(\"OPENAI_API_KEY\")\n", "\n", "EMBEDDING_MODEL = \"sentence-transformers/all-MiniLM-L12-v2\"\n", "TABLE_NAME = \"TECH_NEWS_COLLECTION\"\n", "\n", "print(f\"Oracle DSN: {ORACLE_DSN}\")\n", "print(f\"Embedding model: {EMBEDDING_MODEL}\")\n", "print(f\"OpenAI API key: {'set' if OPENAI_API_KEY else 'NOT SET - set OPENAI_API_KEY env var'}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Connect to Oracle Database 26AI\n", "\n", "python-oracledb runs in thin mode by default, which means no Oracle Client libraries are required. The driver connects directly to Oracle Database over the network using a pure-Python implementation.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "conn = oracledb.connect(user=ORACLE_USER, password=ORACLE_PWD, dsn=ORACLE_DSN)\n", "\n", "cursor = conn.cursor()\n", "cursor.execute(\"SELECT banner FROM v$version WHERE ROWNUM = 1\")\n", "version = cursor.fetchone()\n", "print(f\"Connected! Database: {version[0]}\")\n", "cursor.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Initialize Embeddings\n", "\n", "We use HuggingFace's `all-MiniLM-L12-v2` model to generate 384-dimensional embeddings. This model runs entirely locally — no API key needed and no data leaves your machine. The model is downloaded on first use and cached in `~/.cache/huggingface/`.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL)\n", "\n", "test_vec = embeddings.embed_query(\"test\")\n", "print(f\"Embedding model loaded: {EMBEDDING_MODEL}\")\n", "print(f\"Embedding dimension: {len(test_vec)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Load Sample Documents\n", "\n", "These are 10 synthetic tech news articles designed to demonstrate both retrieval modes. 
The articles contain specific proper nouns (company names, product names) to test keyword search, and conceptual themes (AI infrastructure, enterprise adoption, open source) to test vector similarity search. All content is inline — no external files required.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "documents = [\n", " Document(\n", " page_content=\"\"\"# NVIDIA Expands AI Infrastructure Alliances With Microsoft, Google, and Oracle\n", "\n", "**Santa Clara, CA — March 10, 2026**\n", "\n", "NVIDIA today announced a sweeping set of infrastructure partnerships with Microsoft Azure, Google Cloud, and Oracle Cloud Infrastructure (OCI) to accelerate the deployment of its latest H200 Tensor Core GPUs across hyperscale data centers worldwide. The agreements represent a combined infrastructure commitment exceeding $10 billion over the next three years.\n", "\n", "At the center of the announcement is NVIDIA's DGX SuperPOD platform, which packages H200 GPUs into rack-scale clusters purpose-built for large language model training and inference. Oracle will offer DGX SuperPOD configurations directly through OCI, giving enterprise customers a turnkey path to petaflop-scale AI without managing bare-metal provisioning themselves.\n", "\n", "\"The H200 delivers nearly double the memory bandwidth of its predecessor, and that matters enormously when you're serving 100-billion-parameter models at low latency,\" said Jensen Huang, NVIDIA's CEO, during a keynote at the company's GTC developer conference. \"Our cloud partners are the distribution layer that brings this capability to every enterprise on the planet.\"\n", "\n", "The partnership also deepens integration between NVIDIA's CUDA software stack and each cloud provider's managed AI services. Microsoft Azure will offer CUDA-accelerated virtual machines in its NC-series lineup with native support for NVIDIA's NIM microservices, allowing developers to deploy optimized inference endpoints with a single API call. Google Cloud will integrate NVIDIA's Triton Inference Server directly into Vertex AI, streamlining the path from model training to production serving.\n", "\n", "For enterprises running on-premises workloads, NVIDIA announced an updated DGX OS 6.2 release that tightens integration with Kubernetes orchestration and includes pre-validated configurations for running NVIDIA NeMo training pipelines alongside existing VMware infrastructure. Early access partners including SAP, Accenture, and Deloitte have been testing the combined stack in production since January.\n", "\n", "Analysts noted that the breadth of the announcement signals NVIDIA's intent to become not just a chip vendor but a full-stack AI infrastructure provider. \"CUDA's moat is real, but these cloud deals are how NVIDIA locks in the software ecosystem for the next decade,\" said Stacy Rasgon, senior analyst at Bernstein Research.\n", "\n", "NVIDIA shares rose 4.2% on the news, closing at a record high.\"\"\",\n", " metadata={\"source\": \"01_nvidia_ai_infrastructure.md\"},\n", " ),\n", " Document(\n", " page_content=\"\"\"# Databricks Brings Compound AI Systems to the Lakehouse With Mosaic AI Overhaul\n", "\n", "**San Francisco, CA — March 5, 2026**\n", "\n", "Databricks has announced a major expansion of its Mosaic AI platform, deepening the integration between model development, data governance, and production serving directly inside the lakehouse. 
The release, unveiled at the company's Data + AI Summit preview event, brings together Unity Catalog, MLflow 3.0, and Delta Lake into a unified AI development loop that the company is calling \"compound AI systems.\"\n", "\n", "The headline feature is a new Mosaic AI Agent Framework that lets data teams define multi-step AI pipelines entirely within Databricks notebooks, with each step's inputs, outputs, and metadata tracked automatically in MLflow. Evaluation harnesses built into the framework allow teams to benchmark agent performance against ground-truth datasets stored in Delta Lake tables, closing a feedback loop that previously required stitching together separate tools.\n", "\n", "Unity Catalog plays a central governance role in the new architecture. Models, vector indexes, and external data sources are all registered as governed assets in Unity Catalog, giving data platform teams a single place to audit who is using which model and what data it touched. \"Governance can't be an afterthought when AI is reading your most sensitive enterprise data,\" said Ali Ghodsi, Databricks CEO, at the preview event.\n", "\n", "MLflow 3.0, shipping alongside the announcement, introduces a redesigned Experiment Tracking UI with native support for logging multimodal artifacts — including image and audio outputs from generative models — and a new LiveView feature that streams real-time metrics during long training runs. The release also adds first-class support for fine-tuning Llama and Mistral models on Databricks compute without leaving the platform.\n", "\n", "On the data side, Delta Lake 4.0 introduces liquid clustering as a generally available feature, replacing manual OPTIMIZE and ZORDER operations with an automated background process that continuously reorganizes table files based on actual query patterns. Early adopters report 40-60% improvements in scan performance on large feature store tables.\n", "\n", "Databricks, which was last valued at $62 billion in a 2024 secondary funding round, declined to comment on IPO timing but said it expects annualized revenue to exceed $3 billion by end of 2026.\"\"\",\n", " metadata={\"source\": \"02_databricks_lakehouse_ai.md\"},\n", " ),\n", " Document(\n", " page_content=\"\"\"# Snowflake Unleashes Cortex AI Agents to Automate Enterprise Data Workflows\n", "\n", "**San Mateo, CA — February 28, 2026**\n", "\n", "Snowflake today announced the general availability of Cortex AI Agents, a new capability inside the Snowflake Data Cloud that enables enterprises to build autonomous, multi-step AI workflows directly on top of their governed data — without moving it outside the platform. The launch marks the most significant expansion of Snowflake's AI strategy since the company introduced the Arctic foundation model last year.\n", "\n", "Cortex AI Agents can be configured to answer natural-language questions, execute data transformations, and trigger downstream actions by chaining together Cortex Search, Cortex Analyst, and Snowflake's built-in Python runtime. A retail customer in the private preview (which Snowflake declined to name) automated its weekly inventory discrepancy report, cutting a process that previously took a data analyst two hours down to a four-minute agent run.\n", "\n", "\"The insight is that enterprises don't want to move their data to an AI platform. They want AI to come to the data,\" said Sridhar Ramaswamy, Snowflake's CEO. 
\"Cortex Agents are how we make that real.\"\n", "\n", "A key enabling technology is Cortex Search, which now supports hybrid retrieval combining dense vector embeddings with BM25 keyword scoring. This allows agents to locate relevant rows, documents, and semi-structured JSON fields across Snowflake tables without requiring customers to maintain a separate vector database. Embeddings are computed and stored inside Snowflake itself using the Arctic Embed model family, keeping all data in-region and subject to existing Snowflake security policies.\n", "\n", "Developer experience is handled through Streamlit in Snowflake, which has been updated to support an agent canvas — a drag-and-drop interface for wiring together Cortex functions, SQL queries, and Python code blocks into a deployable agent. Agents can be published as internal apps accessible to business users via Streamlit's web UI, or exposed as REST endpoints for integration with external systems like Salesforce and ServiceNow.\n", "\n", "Snowflake also announced a partnership with Accenture to offer Cortex Agent implementation services, targeting financial services and healthcare verticals where data residency requirements make cloud-neutral AI deployments essential.\n", "\n", "Pricing for Cortex AI Agents is consumption-based, billed in Snowflake credits.\"\"\",\n", " metadata={\"source\": \"03_snowflake_cortex_agents.md\"},\n", " ),\n", " Document(\n", " page_content=\"\"\"# Google Cloud Supercharges Vertex AI With Expanded Model Garden and Gemini 2.0 Integrations\n", "\n", "**Sunnyvale, CA — March 3, 2026**\n", "\n", "Google Cloud announced a broad set of updates to Vertex AI this week, centering on a dramatically expanded Model Garden, tighter BigQuery ML integration, and new agentic capabilities powered by Gemini 2.0. The announcements were made at Google Cloud Next's pre-conference press day and represent the platform's biggest feature drop since Vertex AI was rebranded from AI Platform in 2021.\n", "\n", "The Model Garden now hosts over 200 foundation models, including the full Gemini family, Google's Med-PaLM 2 for healthcare, and a curated selection of open-source models including Llama 4, Mistral Large 2, and Code Llama variants. New one-click fine-tuning workflows allow developers to adapt any hosted model to proprietary data in Vertex AI Workbench, with the resulting fine-tuned checkpoints automatically registered in Vertex AI Model Registry for versioning and governance.\n", "\n", "The biggest workflow change is the introduction of Vertex AI Agent Engine, which brings Gemini's function-calling and code-execution capabilities together with Google Search grounding into a managed agentic runtime. Developers define tools — BigQuery queries, Cloud Functions, third-party APIs — and Agent Engine handles the planning loop, tool dispatch, and state management. Google says Agent Engine can handle up to 1,000 concurrent agent sessions per project out of the box, with auto-scaling beyond that.\n", "\n", "BigQuery ML gets a significant upgrade with native support for unstructured data analysis. Teams can now run multimodal Gemini models directly against BigQuery tables containing image or PDF columns using standard SQL syntax, enabling document extraction and visual QA pipelines without leaving the data warehouse. 
\"We want SQL to be a first-class language for AI, not just analytics,\" said Thomas Kurian, Google Cloud's CEO.\n", "\n", "Vertex AI Workbench, the managed notebook environment, ships new collaborative features including real-time co-editing and a persistent compute option that keeps Python kernels alive between sessions — addressing a longstanding complaint from data scientists who lost state when notebooks timed out.\n", "\n", "Google Cloud has also announced a $200 million credit program for AI startups building on Vertex AI, administered through its Google for Startups accelerator.\"\"\",\n", " metadata={\"source\": \"04_google_vertex_ai_update.md\"},\n", " ),\n", " Document(\n", " page_content=\"\"\"# Microsoft Rebrands Azure AI Studio as AI Foundry, Launches Phi-4 and Expanded Responsible AI Tooling\n", "\n", "**Redmond, WA — February 20, 2026**\n", "\n", "Microsoft has officially rebranded Azure AI Studio as Azure AI Foundry, signaling a broader strategic vision for the platform as an end-to-end factory for enterprise AI application development. Alongside the rebrand, the company announced the general availability of Phi-4, its most capable small language model to date, and a new suite of responsible AI tools that Microsoft says are now required components for any production deployment on the platform.\n", "\n", "Phi-4 is a 14-billion-parameter model trained on a curated synthetic dataset and achieves performance competitive with models three times its size on reasoning and coding benchmarks. Unlike its cloud-only predecessors, Phi-4 is available as a download for on-premises and edge deployments, making it attractive for regulated industries that cannot send data to external APIs. Microsoft has also released a Phi-4 ONNX variant optimized for CPU inference, targeting deployment on Windows Copilot+ PCs without requiring a discrete GPU.\n", "\n", "Azure AI Foundry introduces a project-based organization model that groups models, datasets, deployments, and evaluation runs under a single governance boundary. Each project is tied to an Azure subscription and inherits its role-based access control, making it straightforward for platform teams to enforce separation between development and production environments. Integration with Azure DevOps and GitHub Actions allows AI deployments to be versioned and promoted through CI/CD pipelines, a capability enterprise architects have been requesting since the platform's launch.\n", "\n", "The responsible AI layer, now mandatory rather than optional, includes content safety filters backed by Azure AI Content Safety, groundedness detection for RAG pipelines powered by the Azure OpenAI Service, and a new Prompt Shields feature that detects jailbreak and indirect injection attacks at the API gateway level. 
\"Responsible AI has to be infrastructure, not a checkbox,\" said Satya Nadella during the announcement livestream.\n", "\n", "Microsoft Copilot Studio, which allows non-developers to build Copilot extensions using a low-code interface, gains deep integration with AI Foundry, so that custom agents built by professional developers in Foundry can be surfaced directly inside Microsoft 365 Copilot experiences.\n", "\n", "The Azure OpenAI Service, which continues to offer exclusive early access to OpenAI's latest models under a commercial agreement, now supports o3 and GPT-5-preview for customers in the US East and West Europe regions.\"\"\",\n", " metadata={\"source\": \"05_microsoft_azure_ai_foundry.md\"},\n", " ),\n", " Document(\n", " page_content=\"\"\"# AWS Announces Major Bedrock Agent Upgrades, Deeper Claude Integration, and Enterprise Guardrails\n", "\n", "**Seattle, WA — March 1, 2026**\n", "\n", "Amazon Web Services today announced a sweeping set of enhancements to Amazon Bedrock, its managed foundation model service, with a particular focus on agentic AI capabilities, expanded knowledge base connectors, and a new enterprise-grade guardrails framework. The announcements were made at AWS's inaugural AI Innovation Day event in Seattle and come as competition in the managed AI platform space intensifies.\n", "\n", "The flagship addition is Bedrock Agents 2.0, a reimagined orchestration layer that supports multi-agent collaboration — allowing specialized agents to hand off tasks to one another and share intermediate results through a shared working memory. AWS demonstrated a financial analysis workflow in which a research agent retrieved documents from an S3-backed knowledge base, passed structured summaries to a calculation agent, and routed the final output to a formatting agent before delivering a polished report to the end user. The entire pipeline ran in under 30 seconds.\n", "\n", "Claude 3.7 Sonnet from Anthropic is now the default recommended model for Bedrock Agents, following benchmark results that show it outperforms competing models on multi-step tool use and instruction following. AWS and Anthropic, which has received $4 billion in Amazon investment, have deepened their co-engineering relationship to optimize Claude's function-calling implementation for Bedrock's tool schema format.\n", "\n", "Knowledge bases — Bedrock's managed RAG layer — now support 15 data source connectors including Confluence, SharePoint, Salesforce, and web crawlers, up from 6 at last year's re:Invent. A new hybrid retrieval mode combines semantic vector search with metadata filters, letting agents narrow results by document date, author, or custom tags before performing embedding-based similarity ranking. AWS says this reduces irrelevant context in agent prompts by an average of 34% based on internal benchmarks.\n", "\n", "Bedrock Guardrails, the platform's content safety layer, adds support for sensitive information redaction — automatically masking PII such as social security numbers, credit card numbers, and health record identifiers before they appear in model outputs. 
A new topic denial feature lets administrators define off-limits subject areas using natural-language descriptions, with the guardrail layer blocking responses that stray into those topics regardless of which underlying model is used.\n", "\n", "AWS also announced a preview of Bedrock Model Distillation, which allows customers to generate synthetic training data from a large frontier model and use it to fine-tune a smaller, cheaper model for specific tasks — effectively letting enterprises create custom SLMs without sourcing labeled data themselves.\n", "\n", "Pricing for Bedrock Agents 2.0 remains consumption-based, with no additional charge for multi-agent orchestration during the preview period.\"\"\",\n", " metadata={\"source\": \"06_aws_bedrock_agents.md\"},\n", " ),\n", " Document(\n", " page_content=\"\"\"# VectorForge Raises $85M Series B to Tackle Hybrid Search at Enterprise Scale\n", "\n", "**New York, NY — February 14, 2026**\n", "\n", "VectorForge, a two-year-old database startup founded by former engineers from MongoDB and Elastic, announced today that it has closed an $85 million Series B funding round led by Andreessen Horowitz, with participation from Databricks Ventures and Salesforce Ventures. The company is building what it describes as a \"query-first\" vector database designed from scratch for hybrid search workloads — combining dense vector retrieval with sparse keyword scoring in a single query engine.\n", "\n", "The funding comes as demand for production-grade vector storage explodes, driven largely by enterprise RAG deployments. VectorForge enters a crowded field that includes Pinecone, Weaviate, Qdrant, Chroma, and Milvus, but CEO Priya Nair argues the competition has largely inherited legacy architectural assumptions. \"Pinecone is a great managed service for pure ANN search, and Weaviate has a rich object model, but neither was designed around the hybrid query path being the primary workload,\" Nair said. \"We built the storage engine to treat vector and keyword indexes as equal citizens from day one.\"\n", "\n", "VectorForge's headline benchmark numbers have drawn scrutiny and attention in equal measure. The company published results on the BEIR dataset showing p99 query latency of 12 milliseconds at one million vectors with hybrid search enabled — compared to 38ms for Qdrant and 47ms for Weaviate under the same conditions on equivalent hardware. Independent replication by the DBTech benchmarking group confirmed the numbers within a 15% margin, attributing VectorForge's advantage to a custom columnar storage format that co-locates vector and inverted index entries for the same document in adjacent memory pages.\n", "\n", "The product offers a fully managed cloud service with multi-region replication, as well as a self-hosted Docker image that can run on a single machine with 16GB of RAM. 
An Oracle Cloud Infrastructure deployment option, certified under OCI's ISV partnership program, is expected in Q2 2026.\n", "\n", "Enterprise features include role-based access control at the collection and namespace level, audit logging, and a query explain tool that shows which documents were retrieved by the vector path versus the keyword path and their respective scores — a feature that RAG developers have found invaluable for debugging retrieval failures.\n", "\n", "VectorForge plans to use the Series B proceeds to grow its engineering team from 40 to 90 people and to expand its go-to-market effort in the financial services and life sciences verticals, where semantic search over proprietary document corpora is a common workload.\"\"\",\n", " metadata={\"source\": \"07_startup_vector_database.md\"},\n", " ),\n", " Document(\n", " page_content=\"\"\"# LangChain Crosses 100,000 GitHub Stars as LangGraph and LangSmith Drive Enterprise Adoption\n", "\n", "**San Francisco, CA — February 25, 2026**\n", "\n", "LangChain, the open-source framework that became synonymous with the early wave of LLM application development, announced this week that its core repository has surpassed 100,000 GitHub stars — a milestone the company calls a reflection of the broader AI developer ecosystem's maturation. But the company's real momentum story in 2026 is less about the original chain-and-agent abstractions and more about two newer products: LangGraph for stateful agent orchestration and LangSmith for production observability.\n", "\n", "LangGraph, which models agent workflows as directed graphs with persistent state, has become the orchestration layer of choice for teams building complex multi-step AI systems. The framework hit 1 million monthly downloads in January and is the backbone of agent implementations at companies including Elastic, Klarna, and Replit. A new LangGraph Platform offering, announced this week, adds managed deployment, built-in persistence via PostgreSQL, and human-in-the-loop interrupt capabilities — allowing agents to pause and request human approval before taking irreversible actions like sending emails or executing database writes.\n", "\n", "LangSmith, LangChain's observability and evaluation product, now has over 80,000 monthly active users across its free and paid tiers. The platform traces every LLM call, tool invocation, and retrieval step in a LangChain or LangGraph application, making it possible to diagnose exactly why an agent produced a wrong answer. A new Evaluation Datasets feature lets teams build ground-truth test suites and run automated regression tests against new model versions before promoting them to production.\n", "\n", "Partnership announcements accompanied the milestone. Oracle has certified LangChain and LangGraph for use with Oracle Database 23ai's vector search capabilities, enabling developers to use Oracle's HNSW vector indexes as a retrieval backend through a new OracleVectorStore integration. Anthropic has contributed an updated Claude integration that surfaces extended thinking traces in LangSmith, giving developers visibility into model reasoning chains. OpenAI has partnered on a reference architecture for building customer service agents using GPT-5 and LangGraph that is now available in the LangChain documentation.\n", "\n", "Harrison Chase, LangChain's CEO, said the company is on track to be cash-flow positive by Q3 2026, driven by LangSmith enterprise subscriptions and LangGraph Platform usage fees. 
The company last raised a $25 million Series A in 2023.\n", "\n", "Community statistics: 3,000 contributors across LangChain, LangGraph, and LangSmith repositories; 400+ third-party integration packages in the LangChain ecosystem; 12 million monthly downloads across all packages.\"\"\",\n", " metadata={\"source\": \"08_langchain_ecosystem_growth.md\"},\n", " ),\n", " Document(\n", " page_content=\"\"\"# McKinsey Global Survey: Enterprise AI Adoption Accelerates, But Governance Gaps Threaten ROI\n", "\n", "**New York, NY — March 7, 2026**\n", "\n", "Enterprise adoption of artificial intelligence has reached an inflection point, with 72% of large organizations now running at least one AI application in production — up from 55% just 18 months ago — according to McKinsey & Company's 2026 State of AI in the Enterprise report, released today. The survey, which polled 1,400 executives across 22 countries, paints a picture of rapid deployment alongside persistent challenges in data governance, ROI measurement, and workforce readiness.\n", "\n", "Retrieval-augmented generation (RAG) has emerged as the dominant pattern for enterprise AI deployment, cited by 68% of respondents as the architectural approach underlying their most-used AI applications. RAG's appeal lies in its ability to ground model outputs in proprietary data without the cost and complexity of full fine-tuning. However, McKinsey analysts note that poorly implemented RAG pipelines — particularly those with weak chunking strategies or missing hybrid retrieval — are a leading cause of the hallucination complaints that executives cite as their top barrier to expanding AI usage.\n", "\n", "AI agents are the next frontier. Forty-one percent of surveyed organizations are piloting autonomous agents that can take multi-step actions on behalf of users, and the report forecasts that this number will reach 70% by end of 2027. Healthcare and financial services lead in agent deployment, with clinical documentation automation and trade surveillance representing the two most mature agent use cases. In retail, inventory optimization agents that integrate with ERP systems are showing 15-20% reductions in stockout rates at early-adopter companies.\n", "\n", "Despite the enthusiasm, ROI tracking remains inconsistent. Only 38% of respondents said they have a formal methodology for measuring the financial return on AI investments, and among those that do, the median reported return is $1.80 for every dollar spent — a number analysts say understates true impact because most organizations are not capturing productivity improvements in knowledge worker workflows.\n", "\n", "Data governance is the single most frequently cited obstacle to scaling AI, ahead of cost, talent, and regulatory uncertainty. Organizations with mature data governance frameworks — defined as having unified data catalogs, enforced data lineage, and clear data ownership policies — report AI project success rates 2.3 times higher than those without such foundations.\n", "\n", "\"The companies winning with AI in 2026 are not the ones with the biggest model budgets,\" said Lareina Yee, a senior partner at McKinsey who led the research. 
\"They're the ones that treated data infrastructure as a prerequisite rather than an afterthought.\"\n", "\n", "The report highlights a skills gap as a secondary constraint: 61% of organizations say they lack sufficient prompt engineering and ML engineering talent to staff their AI initiatives, and the median time-to-hire for an AI engineer with RAG experience has grown to 4.2 months.\n", "\n", "McKinsey's report is available at mckinsey.com/ai-enterprise-2026.\"\"\",\n", " metadata={\"source\": \"09_enterprise_ai_adoption.md\"},\n", " ),\n", " Document(\n", " page_content=\"\"\"# Meta and Mistral Drop Landmark Open-Source Models, Reshaping the On-Premises AI Landscape\n", "\n", "**San Francisco, CA — March 11, 2026**\n", "\n", "Two of the most significant releases in open-source AI history landed within 48 hours of each other this week, as Meta unveiled Llama 4 and Mistral AI released Mistral Large 2 — both models that, by multiple benchmarks, rival or surpass proprietary offerings that cost orders of magnitude more to access via API.\n", "\n", "Meta's Llama 4 comes in three sizes: an 8-billion-parameter Scout variant, a 70-billion-parameter Maverick variant, and a 400-billion-parameter Behemoth variant. Maverick, the most practically deployable of the three, achieves scores on the MMLU, HumanEval, and MATH benchmarks that exceed GPT-4o and Claude 3.5 Sonnet, according to Meta's internal evaluations. Independent testing by Hugging Face's Open LLM Leaderboard team broadly confirms these results, placing Llama 4 Maverick at the top of the open-source rankings. Llama 4 is released under Meta's custom community license, which permits commercial use for deployments with fewer than 700 million monthly active users — a threshold that covers the vast majority of enterprise deployments.\n", "\n", "Mistral Large 2, released simultaneously via Mistral's La Plateforme API and as open weights on Hugging Face, is a 123-billion-parameter model that Mistral describes as its \"reasoning-first\" flagship. The model was trained with an extended chain-of-thought pre-training stage and shows particularly strong performance on legal document analysis, scientific literature summarization, and multilingual code generation. Mistral Large 2 is released under the MRL 2.0 license, which permits commercial deployment with revenue sharing above 10 million annual revenue — a first for the open-source AI space.\n", "\n", "Both models are available in ONNX format, enabling CPU-based inference on commodity hardware. Enterprises running air-gapped environments or facing data residency regulations that prohibit cloud API calls have been among the most eager early adopters. Oracle has announced that both Llama 4 Maverick and Mistral Large 2 will be available as managed model endpoints in Oracle Cloud Infrastructure Generative AI by end of Q2 2026, with on-premises deployment via Oracle Alloy for customers requiring sovereign AI infrastructure.\n", "\n", "Fine-tuning tooling is maturing rapidly to match the model releases. Hugging Face's TRL library has been updated with quantization-aware fine-tuning support for both models, allowing teams to adapt them to domain-specific tasks using as few as 500 labeled examples on a single A100 GPU. MLflow's model registry now natively tracks Llama and Mistral fine-tuning runs with full lineage back to base model weights.\n", "\n", "The simultaneous releases intensify pressure on proprietary model providers. 
\"Open weights are not a toy anymore,\" said Yann LeCun, Meta's chief AI scientist, in a post on X. \"The performance gap is closed. The only question now is whether enterprises choose the convenience of managed APIs or the control of self-hosted models.\"\n", "\n", "Community response has been extraordinary: Llama 4 Maverick recorded 500,000 downloads on Hugging Face within 24 hours of release, breaking the previous record set by Llama 3.\"\"\",\n", " metadata={\"source\": \"10_open_source_ai_models.md\"},\n", " ),\n", "]\n", "\n", "print(f\"Loaded {len(documents)} sample documents\\n\")\n", "for doc in documents:\n", " title = doc.page_content.split(\"\\n\")[0].replace(\"# \", \"\")\n", " print(f\" - {doc.metadata['source']}: {title[:80]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Ingest Documents into Oracle Vector Store\n", "\n", "`OracleVS.from_documents()` creates a table with four columns: `id` (primary key), `text` (CLOB for document content), `metadata` (JSON), and `embedding` (VECTOR for the 384-dimensional embedding). Vectors live in the same table as the source text — there is no separate store to keep in sync.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Drop existing table for a clean demo run\n", "cursor = conn.cursor()\n", "cursor.execute(\n", " \"SELECT COUNT(*) FROM user_tables WHERE table_name = :1\",\n", " [TABLE_NAME.upper()],\n", ")\n", "if cursor.fetchone()[0] > 0:\n", " print(f\"Table {TABLE_NAME} already exists. Dropping for clean run...\")\n", " cursor.execute(f\"DROP TABLE {TABLE_NAME} PURGE\")\n", " conn.commit()\n", "cursor.close()\n", "\n", "# Ingest: embed + store in one call\n", "print(f\"Ingesting {len(documents)} documents into Oracle Vector Store...\")\n", "vector_store = OracleVS.from_documents(\n", " documents,\n", " embeddings,\n", " client=conn,\n", " table_name=TABLE_NAME,\n", " distance_strategy=DistanceStrategy.COSINE,\n", ")\n", "conn.commit()\n", "\n", "# Verify\n", "cursor = conn.cursor()\n", "cursor.execute(f\"SELECT COUNT(*) FROM {TABLE_NAME}\")\n", "count = cursor.fetchone()[0]\n", "cursor.close()\n", "print(f\"Done! {count} document chunks stored in {TABLE_NAME}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Create HNSW Vector Index\n", "\n", "Oracle 26AI supports Hierarchical Navigable Small World (HNSW) indexes — the same approximate nearest-neighbor algorithm used by dedicated vector databases like Pinecone and Weaviate, but built directly into the SQL engine. The `accuracy` parameter controls the recall-speed trade-off: 95 means the index will find at least 95% of the true nearest neighbors.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from langchain_oracledb.vectorstores.oraclevs import create_index\n", "\n", "create_index(\n", " conn,\n", " vector_store,\n", " params={\n", " \"idx_name\": \"TECH_NEWS_HNSW_IDX\",\n", " \"idx_type\": \"HNSW\",\n", " \"accuracy\": 95,\n", " \"parallel\": 4,\n", " },\n", ")\n", "print(\"HNSW vector index created successfully.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Demo: Vector Similarity Search\n", "\n", "Pure vector search finds semantically similar documents even when they share no words with the query. 
The query below uses conceptual language with no specific product names — the model maps both the query and the documents into the same embedding space and ranks by cosine distance.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "query = \"companies building AI into their data infrastructure\"\n", "print(f'Query: \"{query}\"')\n", "print(\"(Conceptual language - no exact product names)\\n\")\n", "\n", "results = vector_store.similarity_search_with_score(query, k=3)\n", "for i, (doc, score) in enumerate(results, 1):\n", " # Convert cosine distance to a bounded pseudo-similarity (higher means closer)\n", " similarity = 1 / (1 + score) if score >= 0 else 0\n", " source = doc.metadata.get(\"source\", \"unknown\")\n", " snippet = doc.page_content[:150].replace(\"\\n\", \" \")\n", " print(f\"[{i}] Score: {similarity:.4f} | Source: {source}\")\n", " print(f\" {snippet}...\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. Demo: Keyword Search (Oracle Text)\n", "\n", "Keyword search uses Oracle Text's `CONTAINS` operator for full-text matching. This finds documents containing the exact term regardless of semantic context — useful for product names, company names, and other proper nouns that need precise matching. Oracle Text is built into the same database as the vector store, so there is no separate Elasticsearch cluster to maintain.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create Oracle Text index for keyword search\n", "cursor = conn.cursor()\n", "try:\n", " cursor.execute(f\"\"\"\n", " CREATE INDEX tech_news_text_idx\n", " ON {TABLE_NAME}(text)\n", " INDEXTYPE IS CTXSYS.CONTEXT\n", " PARAMETERS ('SYNC (ON COMMIT)')\n", " \"\"\")\n", " conn.commit()\n", " print(\"Created Oracle Text index for keyword search.\\n\")\n", "except oracledb.DatabaseError as e:\n", " # ORA-00955: name is already used by an existing object\n", " if \"ORA-00955\" in str(e) or \"already used\" in str(e).lower():\n", " print(\"Oracle Text index already exists.\\n\")\n", " else:\n", " print(f\"Note: {e}\\n\")\n", "\n", "# Search for a specific term\n", "keyword = \"LangChain\"\n", "print(f'Keyword: \"{keyword}\"')\n", "print(\"(Exact term match - finds documents mentioning this specific name)\\n\")\n", "\n", "try:\n", " cursor.execute(f\"\"\"\n", " SELECT text, metadata\n", " FROM {TABLE_NAME}\n", " WHERE CONTAINS(text, :kw, 1) > 0\n", " FETCH FIRST 3 ROWS ONLY\n", " \"\"\", {\"kw\": keyword})\n", "\n", " rows = cursor.fetchall()\n", " for i, row in enumerate(rows, 1):\n", " raw = row[0]\n", " text = (raw.read() if hasattr(raw, 'read') else str(raw))[:150] if raw else \"\"\n", " print(f\"[{i}] {text.replace(chr(10), ' ')}...\")\n", "except oracledb.DatabaseError as e:\n", " print(f\"Keyword search note: {e}\")\n", " print(\"(Oracle Text may need CTXSYS schema - falling back to a DBMS_LOB.INSTR substring scan)\")\n", " cursor.execute(f\"\"\"\n", " SELECT text, metadata\n", " FROM {TABLE_NAME}\n", " WHERE DBMS_LOB.INSTR(UPPER(text), :kw) > 0\n", " FETCH FIRST 3 ROWS ONLY\n", " \"\"\", {\"kw\": keyword.upper()})\n", " rows = cursor.fetchall()\n", " for i, row in enumerate(rows, 1):\n", " raw = row[0]\n", " text = (raw.read() if hasattr(raw, 'read') else str(raw))[:150] if raw else \"\"\n", " print(f\"[{i}] {text.replace(chr(10), ' ')}...\")\n", "\n", "cursor.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. Demo: Hybrid Search (Vector + Keyword in One Query)\n", "\n", "This is the real power move. 
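Vector search catches paraphrases, keyword search catches exact identifiers, and hybrid search captures both.\n", "\n", "The demo below filters vector candidates by keyword; a related variant blends both signals into one ranking with Oracle Text's `SCORE()`. A minimal sketch, with illustrative (untuned) 0.7/0.3 weights, assuming the Text index from Section 8 already exists:\n", "\n", "```python\n", "# Blend normalized vector similarity with Oracle Text relevance (SCORE is 0-100)\n", "query_array = array.array(\"f\", embeddings.embed_query(\"LangChain database integrations\"))\n", "with conn.cursor() as cur:\n", "    cur.execute(\n", "        f\"\"\"SELECT text,\n", "                   0.7 * (1 - VECTOR_DISTANCE(embedding, :q, COSINE))\n", "                 + 0.3 * (SCORE(1) / 100) AS blended_score\n", "            FROM {TABLE_NAME}\n", "            WHERE CONTAINS(text, :kw, 1) > 0\n", "            ORDER BY blended_score DESC\n", "            FETCH FIRST 3 ROWS ONLY\"\"\",\n", "        {\"q\": query_array, \"kw\": \"LangChain\"},\n", "    )\n", "    blended = cur.fetchall()\n", "```\n", "\n", "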
A single SQL query uses a CTE to first retrieve the top vector candidates with `VECTOR_DISTANCE()`, then filters those results by keyword match — all in one round trip to the database.\n", "\n", "With a standalone vector database you would need two systems (a vector DB and a search engine), two queries, and application-layer result merging. Here it is one SQL statement.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "query = \"LangChain integrations with enterprise databases\"\n", "print(f'Query: \"{query}\"')\n", "print(\"(Combines semantic understanding with exact term matching)\\n\")\n", "\n", "# Generate query embedding locally\n", "query_embedding = embeddings.embed_query(query)\n", "query_array = array.array(\"f\", query_embedding)\n", "\n", "cursor = conn.cursor()\n", "try:\n", " cursor.execute(f\"\"\"\n", " WITH vec_candidates AS (\n", " SELECT id, text, metadata,\n", " 1 - VECTOR_DISTANCE(embedding, :q, COSINE) AS vec_score\n", " FROM {TABLE_NAME}\n", " ORDER BY vec_score DESC\n", " FETCH APPROX FIRST 20 ROWS ONLY\n", " )\n", " SELECT text, metadata, ROUND(vec_score, 4) AS score\n", " FROM vec_candidates\n", " WHERE DBMS_LOB.INSTR(UPPER(text), :kw) > 0\n", " ORDER BY vec_score DESC\n", " FETCH FIRST 3 ROWS ONLY\n", " \"\"\", {\"q\": query_array, \"kw\": query.split()[0].upper()})\n", "\n", " rows = cursor.fetchall()\n", " if rows:\n", " for i, row in enumerate(rows, 1):\n", " raw = row[0]\n", " text = (raw.read() if hasattr(raw, 'read') else str(raw))[:150] if raw else \"\"\n", " score = row[2]\n", " print(f\"[{i}] Hybrid Score: {score}\")\n", " print(f\" {text.replace(chr(10), ' ')}...\\n\")\n", " else:\n", " print(\"No hybrid results found.\")\n", "except oracledb.DatabaseError as e:\n", " print(f\"Hybrid search note: {e}\")\n", "\n", "cursor.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 10. Build a LangGraph ReAct Agent\n", "\n", "Now we tie it all together. The ReAct (Reasoning + Acting) agent uses GPT-4.1-mini as its reasoning engine and has access to a `search_tech_news` tool backed by Oracle Vector Store. The agent can make multiple tool calls per question, reasoning about what to search and how to synthesize the results.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@tool\n", "def search_tech_news(query: str) -> str:\n", " \"\"\"Search the tech news knowledge base for information about AI companies,\n", " products, partnerships, and technology trends. Use this tool whenever\n", " you need to find specific facts or details about technology topics.\"\"\"\n", " results = vector_store.similarity_search_with_score(query, k=4)\n", " if not results:\n", " return \"No relevant documents found.\"\n", "\n", " output_parts = []\n", " for doc, score in results:\n", " similarity = 1 / (1 + score) if score >= 0 else 0\n", " source = doc.metadata.get(\"source\", \"unknown\")\n", " output_parts.append(\n", " f\"[Source: {source} | Relevance: {similarity:.2f}]\\n{doc.page_content}\"\n", " )\n", " return \"\\n\\n---\\n\\n\".join(output_parts)\n", "\n", "\n", "llm = ChatOpenAI(model=\"gpt-4.1-mini\", temperature=0)\n", "\n", "agent = create_react_agent(\n", " llm,\n", " tools=[search_tech_news],\n", " prompt=(\n", " \"You are a helpful AI research analyst. Use the search_tech_news tool \"\n", " \"to find information and answer questions about technology companies, \"\n", " \"AI products, partnerships, and industry trends. Always cite your \"\n", " \"sources. 
Be concise and factual.\"\n", " ),\n", ")\n", "print(\"Agent built with gpt-4.1-mini + search_tech_news tool.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 11. Demo: Run the Agent\n", "\n", "Ask questions and watch the full reasoning chain — tool calls, search queries, retrieved documents, and synthesized answers. The agent decides when to search, what to search for, and how to combine multiple search results into a coherent response.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "queries = [\n", " \"What are the major cloud providers doing with AI infrastructure?\",\n", " \"Tell me about LangChain's partnerships and ecosystem growth.\",\n", " \"Which companies are making AI agents a first-class feature?\",\n", "]\n", "\n", "for query in queries:\n", " print(f'\\n{\"=\" * 70}')\n", " print(f\"User: {query}\")\n", " print(f'{\"=\" * 70}')\n", "\n", " response = agent.invoke({\"messages\": [{\"role\": \"user\", \"content\": query}]})\n", "\n", " for msg in response[\"messages\"]:\n", " if msg.type == \"human\":\n", " continue # already printed above\n", " elif msg.type == \"ai\" and msg.tool_calls:\n", " for tc in msg.tool_calls:\n", " print(f\"\\nTool Call: {tc['name']}\")\n", " print(f\" Query: \\\"{tc['args'].get('query', '')}\\\"\")\n", " elif msg.type == \"tool\":\n", " # Show a compact preview of what came back\n", " lines = msg.content.split(\"\\n\")\n", " sources = [l for l in lines if l.startswith(\"[Source:\")]\n", " print(f\" -> {len(sources)} documents returned:\")\n", " for s in sources:\n", " print(f\" {s}\")\n", " elif msg.type == \"ai\":\n", " print(f\"\\nAgent Response:\\n{msg.content}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 12. Cleanup\n", "\n", "Close the database connection. The table and indexes are retained so you can continue exploring.\n", "\n", "To fully clean up, run:\n", "\n", "```sql\n", "DROP TABLE TECH_NEWS_COLLECTION PURGE;\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "conn.close()\n", "print(\"Connection closed. 
Table retained for further exploration.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What We Built\n", "\n", "A complete agentic RAG system where Oracle Database 26AI replaces an entire stack of specialized infrastructure:\n", "\n", "| Traditional RAG Stack | Oracle 26AI RAG Stack |\n", "|---|---|\n", "| PostgreSQL + pgvector (or Pinecone/Weaviate) for vectors | Oracle 26AI (vectors built in) |\n", "| Elasticsearch for keyword search | Oracle Text (built in) |\n", "| Sync pipeline to keep data consistent | Single table — always consistent |\n", "| Application-layer hybrid search logic | One SQL query with CTE |\n", "\n", "The key insight is that storing vectors in the same database as your source data eliminates an entire category of infrastructure complexity: no sync pipeline, no dual-write, no consistency lag between what is in your database and what is in your vector store.\n", "\n", "### Next Steps\n", "\n", "- Swap in your own documents by replacing the `documents` list in Cell 12\n", "- Try the `IVF` index type for larger datasets (better insert throughput than HNSW)\n", "- Add metadata filters to the vector search to narrow results by date, author, or category\n", "- Replace GPT-4.1-mini with a model hosted on OCI Generative AI for a fully Oracle-native stack\n", "- Extend the hybrid search CTE with BM25-style scoring using Oracle Text's `SCORE()` function\n" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.11" } }, "nbformat": 4, "nbformat_minor": 5 }