{ "cells": [ { "cell_type": "markdown", "id": "159c65b0", "metadata": {}, "source": [ "# Memory and Context Engineering for AI Agents with Oracle AI Database, Langchain and Tavily\n", "\n", "[![Open in Colab](https://img.shields.io/badge/Open%20in-Colab-F9AB00?style=flat-square&logo=googlecolab)](https://colab.research.google.com/github/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/memory_context_engineering_agents.ipynb)\n", "\n", "--------" ] }, { "cell_type": "markdown", "id": "5340125d", "metadata": {}, "source": [ "\n", "\n", "In this notebook, you'll learn how to engineer memory systems that give AI agents the ability to remember, learn, and adapt across conversations. \n", "Moving beyond simple RAG, we implement a complete **Memory Manager** with six distinct memory typesβ€”each serving a specific cognitive function." ] }, { "cell_type": "markdown", "id": "e56d6ba7", "metadata": {}, "source": [ "\n", "\n", "## What You'll Build\n", "\n", "| Memory Type | Purpose | Storage |\n", "|-------------|---------|---------|\n", "| **Conversational** | Chat history per thread | SQL Table |\n", "| **Knowledge Base** | Searchable documents & facts | Vector-Enabled SQL Table |\n", "| **Workflow** | Learned action patterns | Vector-Enabled SQL Table |\n", "| **Toolbox** | Dynamic tool definitions | Vector-Enabled SQL Table |\n", "| **Entity** | People, places, systems extracted from context | Vector-Enabled SQL Table |\n", "| **Summary** | Compressed context for long conversations | Vector-Enabled SQL Table |\n", "| **Tool Log** | Offloaded tool call outputs (experimental memory) | SQL Table |" ] }, { "cell_type": "markdown", "id": "e65027f9", "metadata": {}, "source": [ "\n", "## Key Concepts Covered\n", "\n", "- **Memory Engineering**: Design patterns for agent memory systems\n", "- **Context Engineering**: Techniques for optimizing what goes into the LLM context\n", "- **Context Window Management**: Monitor usage and compact context when needed\n", "- **Just-in-Time Retrieval**: Compact summaries with on-demand expansion\n", "- **Dynamic Tool Calling**: Semantic tool discovery and execution\n", "- **Entity Extraction**: LLM-powered entity recognition and storage\n" ] }, { "cell_type": "markdown", "id": "f8775610", "metadata": {}, "source": [ "\n", "## Prerequisites\n", "\n", "- Python 3.10+\n", "- Oracle AI Database (local Docker or cloud)\n", "- OpenAI API key\n", "- Tavily API key\n", "\n", "## By the End\n", "You'll have a reusable `MemoryManager` and a turn-level agent harness (`call_agent`) that demonstrates how modern AI agents maintain context, learn from interactions, and manage information across sessions." ] }, { "cell_type": "code", "execution_count": null, "id": "b243d181", "metadata": {}, "outputs": [], "source": [ "! pip install -qU langchain-oracledb sentence-transformers langchain-openai langchain tavily-python datasets" ] }, { "cell_type": "markdown", "id": "393bf345", "metadata": {}, "source": [ "# Part 1: Local Installation of Oracle AI Database via Docker [Memory Core]\n", "\n", "--------" ] }, { "cell_type": "markdown", "id": "792a6485", "metadata": {}, "source": [ "This section walks you through setting up **Oracle AI Database 26ai** locally using Docker. Oracle AI Database is a converged database that combines relational, document, graph, and vector data in a single engineβ€”making it ideal for AI applications that need semantic search, embeddings storage, and vector similarity queries.\n", "\n", "**What you'll do:**\n", "1. 
Pull and run the Oracle Database Docker container\n", "2. Establish a connection from Python using `oracledb`\n", "3. Create a dedicated user for vector operations\n", "\n", "This local setup gives you a fully functional Oracle database for development and testing without needing cloud infrastructure." ] }, { "cell_type": "markdown", "id": "58676655", "metadata": {}, "source": [ "### Step 1: Installing Oracle AI Database via Docker" ] }, { "cell_type": "markdown", "id": "80c745a8", "metadata": {}, "source": [ "For this notebook we will be using a local installation of [Oracle AI Database](https://www.oracle.com/database/free/get-started/)" ] }, { "cell_type": "markdown", "id": "c92b14d9", "metadata": {}, "source": [ "1. Install & start Docker. Docker Desktop (Mac/Windows) or Docker Engine (Linux). Make sure it’s running.\n", " - If you installed Docker Desktop on macOS, you can start it from the terminal with ```open /Applications/Docker.app```\n", "2. We are going to pull the [docker image](https://container-registry.oracle.com/ords/f?p=113:4:13936724845291:::4:P4_REPOSITORY,AI_REPOSITORY,AI_REPOSITORY_NAME,P4_REPOSITORY_NAME,P4_EULA_ID,P4_BUSINESS_AREA_ID:1863,1863,Oracle%20Database%20Free,Oracle%20Database%20Free,1,0&cs=3cVNH02fFYhB723ODpNnr0JZI1S7Z64nRyL_zC1Ls5BSVLafGsOLMFvFoPhn8JeeB8tXPhkfFKH8-dkrL_z3_0g)\n", "3. Run a container with the Oracle image\n", "\n", " ```\n", " docker run -d \\\n", " --name oracle-free \\\n", " -p 1521:1521 -p 5500:5500 \\\n", " -e ORACLE_PWD=OraclePwd_2025 \\\n", " -v $HOME/oracle/full_data:/opt/oracle/oradata \\\n", " container-registry.oracle.com/database/free:latest\n", "\n", " ```" ] }, { "cell_type": "markdown", "id": "d95d25a7", "metadata": {}, "source": [ "> 🚫 **Troubleshoot** \n", "> If you see the error: \n", "> *`docker: Error response from daemon: Conflict. The container name \"/oracle-free\" is already in use by container ... You have to remove (or rename) that container to be able to reuse that name.`* \n", ">\n", "> 🧩 **Fix:** \n", "> - Remove the existing container: \n", "> ```bash\n", "> docker rm oracle-free\n", "> ``` \n", "> - Then re-run your Docker command from **Step 3** to start a new container.\n" ] }, { "cell_type": "markdown", "id": "88463dc3", "metadata": {}, "source": [ "### Step 2: One-Click Database Setup\n", "\n", "The cell below handles **everything automatically**:\n", "- βœ… Checks if Docker is running\n", "- βœ… Checks if Oracle container exists and is healthy\n", "- βœ… Waits for database to be ready (with progress indicator)\n", "- βœ… Fixes the listener for ARM Macs (Apple Silicon)\n", "- βœ… Creates the VECTOR user with proper privileges\n", "- βœ… Tests the connection\n", "\n", "**Just run the cell below and wait for the βœ… success message!**\n" ] }, { "cell_type": "code", "execution_count": null, "id": "fb7ae8e6", "metadata": {}, "outputs": [], "source": [ "import subprocess\n", "import time\n", "import sys\n", "\n", "def setup_oracle_database(container_name=\"oracle-free\", vector_password=\"VectorPwd_2025\"):\n", " \"\"\"\n", " Complete Oracle Database setup - handles everything in one call.\n", " \n", " This function:\n", " 1. Checks Docker is running\n", " 2. Verifies container exists and is healthy\n", " 3. Waits for database to be ready\n", " 4. Fixes listener for ARM Macs\n", " 5. Creates VECTOR user\n", " 6. 
Tests connection\n", " \"\"\"\n", " print(\"=\" * 60)\n", " print(\"πŸš€ ORACLE DATABASE SETUP\")\n", " print(\"=\" * 60)\n", " \n", " # Step 1: Check Docker\n", " print(\"\\n[1/6] Checking Docker...\")\n", " try:\n", " result = subprocess.run(['docker', 'info'], capture_output=True, text=True, timeout=10)\n", " if result.returncode != 0:\n", " print(\" ❌ Docker is not running!\")\n", " print(\" πŸ’‘ Start Docker Desktop and try again.\")\n", " return False\n", " print(\" βœ… Docker is running\")\n", " except FileNotFoundError:\n", " print(\" ❌ Docker not found! Please install Docker.\")\n", " return False\n", " except subprocess.TimeoutExpired:\n", " print(\" ❌ Docker is not responding. Please restart Docker.\")\n", " return False\n", " \n", " # Step 2: Check container\n", " print(f\"\\n[2/6] Checking container '{container_name}'...\")\n", " result = subprocess.run(\n", " ['docker', 'ps', '-a', '--filter', f'name={container_name}', '--format', '{{.Status}}'],\n", " capture_output=True, text=True\n", " )\n", " status = result.stdout.strip()\n", " \n", " if not status:\n", " print(f\" ❌ Container '{container_name}' not found!\")\n", " print(\" πŸ’‘ Run the docker run command from the previous cell first.\")\n", " return False\n", " elif \"Up\" not in status:\n", " print(f\" ⚠️ Container exists but not running. Starting...\")\n", " subprocess.run(['docker', 'start', container_name], capture_output=True)\n", " time.sleep(5)\n", " \n", " print(f\" βœ… Container is running\")\n", " \n", " def probe_database_ready():\n", " \"\"\"True readiness check via SQL: CDB OPEN and FREEPDB1 READ WRITE.\"\"\"\n", " probe_sql = \"\"\"\n", "SET HEADING OFF FEEDBACK OFF PAGESIZE 0 VERIFY OFF ECHO OFF\n", "WHENEVER SQLERROR EXIT SQL.SQLCODE\n", "SELECT status || ':' || open_mode\n", "FROM v$instance\n", "CROSS JOIN (SELECT open_mode FROM v$pdbs WHERE name = 'FREEPDB1');\n", "EXIT;\n", "\"\"\"\n", " probe = subprocess.run(\n", " ['docker', 'exec', '-i', container_name, 'bash', '-c',\n", " 'export ORACLE_SID=FREE && sqlplus -s / as sysdba'],\n", " input=probe_sql,\n", " capture_output=True,\n", " text=True\n", " )\n", "\n", " stdout_lines = [line.strip() for line in probe.stdout.splitlines() if line.strip()]\n", " normalized = \" \".join(stdout_lines).upper()\n", " is_ready = probe.returncode == 0 and \"OPEN:READ WRITE\" in normalized\n", "\n", " if stdout_lines:\n", " details = \" | \".join(stdout_lines)\n", " else:\n", " details = probe.stderr.strip() or f\"sqlplus exited with code {probe.returncode}\"\n", "\n", " return is_ready, details\n", "\n", " # Step 3: Wait for database ready\n", " print(\"\\n[3/6] Waiting for database to be ready...\")\n", " print(\" (True check: probing instance state and FREEPDB1 open mode)\")\n", "\n", " max_wait = 300 # 5 minutes\n", " check_interval = 5\n", " elapsed = 0\n", " last_details = None\n", "\n", " while elapsed < max_wait:\n", " # Ensure container is still up while waiting\n", " status_result = subprocess.run(\n", " ['docker', 'ps', '--filter', f'name={container_name}', '--format', '{{.Status}}'],\n", " capture_output=True, text=True\n", " )\n", " running_status = status_result.stdout.strip()\n", " if \"up\" not in running_status.lower():\n", " print(f\"\\n ❌ Container stopped while waiting: {running_status or 'unknown status'}\")\n", " return False\n", "\n", " ready, details = probe_database_ready()\n", " if ready:\n", " print(\"\\n βœ… Database is ready (OPEN:READ WRITE)\")\n", " break\n", "\n", " # Show probe state when it changes\n", " if details != 
last_details:\n", " print(f\"\\n πŸ”Ž Probe status: {details}\")\n", " last_details = details\n", "\n", " dots = \".\" * ((elapsed // check_interval) % 4 + 1)\n", " print(f\"\\r ⏳ Waiting{dots.ljust(5)} ({elapsed}s elapsed)\", end=\"\", flush=True)\n", " time.sleep(check_interval)\n", " elapsed += check_interval\n", " else:\n", " print(f\"\\n ❌ Timeout waiting for database. Check 'docker exec -it {container_name} bash'\")\n", " return False\n", " \n", " # Step 4: Fix listener (for ARM Macs)\n", " print(\"\\n[4/6] Configuring listener...\")\n", " \n", " # Fix listener.ora\n", " subprocess.run(\n", " ['docker', 'exec', container_name, 'bash', '-c',\n", " \"sed -i 's/HOST = [^)]*)/HOST = 0.0.0.0)/g' /opt/oracle/product/26ai/dbhomeFree/network/admin/listener.ora\"],\n", " capture_output=True\n", " )\n", " \n", " # Restart listener\n", " subprocess.run(['docker', 'exec', container_name, 'lsnrctl', 'stop'], capture_output=True)\n", " start_result = subprocess.run(\n", " ['docker', 'exec', container_name, 'lsnrctl', 'start'],\n", " capture_output=True, text=True\n", " )\n", " \n", " if \"Listening on\" not in start_result.stdout:\n", " print(\" ❌ Failed to start listener\")\n", " return False\n", " \n", " # Register services\n", " subprocess.run(\n", " ['docker', 'exec', container_name, 'bash', '-c',\n", " \"export ORACLE_SID=FREE && sqlplus -s / as sysdba <<< 'ALTER SYSTEM REGISTER;'\"],\n", " capture_output=True\n", " )\n", " print(\" βœ… Listener configured and running\")\n", " \n", " # Step 5: Create VECTOR user\n", " print(\"\\n[5/6] Creating VECTOR user...\")\n", " \n", " create_user_sql = f'''\n", " DECLARE\n", " user_count NUMBER;\n", " BEGIN\n", " SELECT COUNT(*) INTO user_count FROM all_users WHERE username = 'VECTOR';\n", " IF user_count = 0 THEN\n", " EXECUTE IMMEDIATE 'CREATE USER VECTOR IDENTIFIED BY {vector_password}';\n", " EXECUTE IMMEDIATE 'GRANT CONNECT, RESOURCE, CREATE SESSION TO VECTOR';\n", " EXECUTE IMMEDIATE 'GRANT UNLIMITED TABLESPACE TO VECTOR';\n", " EXECUTE IMMEDIATE 'GRANT CREATE TABLE, CREATE SEQUENCE, CREATE VIEW TO VECTOR';\n", " DBMS_OUTPUT.PUT_LINE('CREATED');\n", " ELSE\n", " DBMS_OUTPUT.PUT_LINE('EXISTS');\n", " END IF;\n", " END;\n", " /\n", " '''\n", " \n", " result = subprocess.run(\n", " ['docker', 'exec', container_name, 'bash', '-c',\n", " f\"export ORACLE_SID=FREE && sqlplus -s / as sysdba <<< \\\"ALTER SESSION SET CONTAINER = FREEPDB1; {create_user_sql}\\\"\"],\n", " capture_output=True, text=True\n", " )\n", " \n", " if \"ORA-\" in result.stdout:\n", " print(f\" ⚠️ Warning: {result.stdout}\")\n", " else:\n", " print(\" βœ… VECTOR user ready\")\n", " \n", " # Step 6: Test connection\n", " print(\"\\n[6/6] Testing connection...\")\n", " try:\n", " import oracledb\n", " conn = oracledb.connect(\n", " user=\"VECTOR\",\n", " password=vector_password,\n", " dsn=\"127.0.0.1:1521/FREEPDB1\"\n", " )\n", " with conn.cursor() as cur:\n", " cur.execute(\"SELECT 1 FROM dual\")\n", " cur.fetchone()\n", " conn.close()\n", " print(\" βœ… Connection successful!\")\n", " except Exception as e:\n", " print(f\" ❌ Connection failed: {e}\")\n", " return False\n", " \n", " # Success!\n", " print(\"\\n\" + \"=\" * 60)\n", " print(\"πŸŽ‰ SETUP COMPLETE!\")\n", " print(\"=\" * 60)\n", " print(f\"\"\"\n", "You can now connect to Oracle:\n", " User: VECTOR\n", " Password: {vector_password}\n", " DSN: 127.0.0.1:1521/FREEPDB1\n", "\"\"\")\n", " return True\n" ] }, { "cell_type": "code", "execution_count": null, "id": "bacd1640", "metadata": {}, "outputs": [], "source": [ "# Run 
this cell after starting your Docker container\n", "# It handles everything: waits for ready, fixes listener, creates user, tests connection\n", "setup_oracle_database()" ] }, { "cell_type": "markdown", "id": "63b1afd2", "metadata": {}, "source": [ "### Step 3: Connection Helper Function\n", "\n", "The code below defines a reusable function that connects to Oracle Database with automatic retry logic and helpful error messages.\n", "\n", "**What it does:**\n", "1. Attempts to connect using the `oracledb` Python driver\n", "2. Retries up to 3 times if the connection fails (useful when the database is still starting)\n", "3. Prints the Oracle version banner on successful connection, which shows the exact release you are running\n", "4. Provides troubleshooting hints for common connection errors\n" ] }, { "cell_type": "code", "execution_count": null, "id": "322bb9ef", "metadata": {}, "outputs": [], "source": [ "import oracledb\n", "import time\n", "\n", "def connect_to_oracle(max_retries=3, retry_delay=5, user=\"sys\", password=\"OraclePwd_2025\", dsn=\"127.0.0.1:1521/FREEPDB1\", program=\"langchain_oracledb_deep_research_demo\"):\n", " \"\"\"\n", " Connect to Oracle database with retry logic and better error handling.\n", " \n", " Args:\n", " max_retries: Maximum number of connection attempts\n", " retry_delay: Seconds to wait between retries\n", " user: Database user to connect as\n", " password: Password for the database user\n", " dsn: host:port/service_name connect string\n", " program: Client program name reported to the database\n", " \"\"\"\n", " \n", " for attempt in range(1, max_retries + 1):\n", " try:\n", " print(f\"Connection attempt {attempt}/{max_retries}...\")\n", " conn = oracledb.connect(\n", " user=user,\n", " password=password,\n", " dsn=dsn,\n", " program=program\n", " )\n", " print(\"βœ“ Connected successfully!\")\n", " \n", " # Test the connection (no trailing semicolon -- python-oracledb rejects it)\n", " with conn.cursor() as cur:\n", " cur.execute(\"SELECT banner FROM v$version WHERE banner LIKE 'Oracle%'\")\n", " banner = cur.fetchone()[0]\n", " # Banner includes the exact release you are running\n", " print(f\"\\n{banner}\")\n", " \n", " return conn\n", " \n", " except oracledb.OperationalError as e:\n", " error_msg = str(e)\n", " print(f\"βœ— Connection failed (attempt {attempt}/{max_retries})\")\n", " \n", " if \"DPY-4011\" in error_msg or \"Connection reset by peer\" in error_msg:\n", " print(\" β†’ This usually means:\")\n", " print(\" 1. Database is still starting up (wait 2-3 minutes)\")\n", " print(\" 2. Listener configuration issue\")\n", " print(\" 3. 
Container is not running\")\n", " \n", " if attempt < max_retries:\n", " print(f\"\\n Waiting {retry_delay} seconds before retry...\")\n", " time.sleep(retry_delay)\n", " else:\n", " print(\"\\n πŸ’‘ Try running: setup_oracle_database()\")\n", " print(\" This will fix the listener and verify the connection.\")\n", " raise\n", " else:\n", " raise\n", " except Exception as e:\n", " print(f\"βœ— Unexpected error: {e}\")\n", " raise\n", " \n", " raise ConnectionError(\"Failed to connect after all retries\")" ] }, { "cell_type": "markdown", "id": "1f8bacbe", "metadata": {}, "source": [ "Ensure your Docker Engine is running before going through the next steps." ] }, { "cell_type": "markdown", "id": "c03562c6", "metadata": {}, "source": [ "Connect as the `VECTOR` user, the dedicated schema for storing embeddings and vector data.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f3aa1368", "metadata": {}, "outputs": [], "source": [ "vector_conn = connect_to_oracle(\n", " user=\"VECTOR\",\n", " password=\"VectorPwd_2025\",\n", " dsn=\"127.0.0.1:1521/FREEPDB1\",\n", " program=\"devrel.hub.memory_engineering\",\n", ")\n", "\n", "print(\"Using user:\", vector_conn.username)\n", "\n", "# One-time cleanup: remove prior user-created indexes on the demo vector table.\n", "# This prevents ORA-01408 when rerunning index creation with a new name.\n", "def one_time_cleanup_vector_demo_indexes(conn):\n", " dropped = []\n", " with conn.cursor() as cur:\n", " cur.execute(\"\"\"\n", " SELECT index_name\n", " FROM user_indexes\n", " WHERE table_name = 'VECTOR_SEARCH_DEMO'\n", " AND generated = 'N'\n", " \"\"\")\n", " indexes = [row[0] for row in cur.fetchall()]\n", "\n", " for idx in indexes:\n", " try:\n", " cur.execute(f'DROP INDEX \"{idx}\"')\n", " dropped.append(idx)\n", " except Exception as e:\n", " print(f\" ⚠️ Could not drop index {idx}: {e}\")\n", "\n", " conn.commit()\n", " if dropped:\n", " print(f\"🧹 One-time cleanup: dropped {len(dropped)} old index(es): {', '.join(dropped)}\")\n", " else:\n", " print(\"🧹 One-time cleanup: no existing user-created indexes on VECTOR_SEARCH_DEMO\")\n", "\n", "one_time_cleanup_vector_demo_indexes(vector_conn)\n" ] }, { "cell_type": "markdown", "id": "365bcafb", "metadata": {}, "source": [ "βœ… **Setup complete!** You now have Oracle AI Database running locally with an active connection.\n", "\n", "Next, we'll create vector-enabled SQL tables using **LangChain's OracleVS integration** to store embeddings and metadata for semantic search." ] }, { "cell_type": "markdown", "id": "47f8ccd5", "metadata": {}, "source": [ "# Part 2: Vector Search With LangChain and Oracle AI Database\n", "\n", "--------" ] }, { "cell_type": "markdown", "id": "69fa782a", "metadata": {}, "source": [ "This section demonstrates how to use **LangChain's OracleVS abstraction** over vector-enabled SQL tables to store and search documents using semantic similarity. \n", "\n", "Vector search enables finding documents based on meaning rather than exact keyword matches.\n", "\n", "## What You'll Learn\n", "\n", "| Step | Description |\n", "|------|-------------|\n", "| **1. Initialize Embeddings** | Load a HuggingFace embedding model to convert text into vectors |\n", "| **2. Create Vector-Enabled Table (OracleVS)** | Set up an Oracle-backed vector-enabled table with cosine distance |\n", "| **3. Create Index** | Build an HNSW (Hierarchical Navigable Small World) index for fast similarity search |\n", "| **4. Add Documents** | Store text with metadata in the vector database |\n", "| **5. 
Query** | Search for similar documents using natural language |\n", "| **6. Filter Results** | Use metadata filters to narrow down search results |\n", "\n", "## Key Components\n", "\n", "- **`OracleVS`**: LangChain abstraction over Oracle vector-enabled SQL tables\n", "- **`HuggingFaceEmbeddings`**: Converts text to 768-dimensional vectors\n", "- **`DistanceStrategy.COSINE`**: Measures vector similarity using cosine distance\n", "- **HNSW Index**: Graph-based ANN index for fast and accurate nearest-neighbor retrieval\n" ] }, { "cell_type": "markdown", "id": "c9e5cbeb", "metadata": {}, "source": [ "## Step 1: Creating Vector-Enabled Tables with LangChain OracleVS" ] }, { "cell_type": "code", "execution_count": null, "id": "340bffc4", "metadata": {}, "outputs": [], "source": [ "from langchain_oracledb.vectorstores import OracleVS\n", "from langchain_community.embeddings import HuggingFaceEmbeddings\n", "from langchain_oracledb.vectorstores.oraclevs import create_index\n", "from langchain_community.vectorstores.utils import DistanceStrategy\n", "\n", "# Initialize the embedding model\n", "embedding_model = HuggingFaceEmbeddings(\n", " model_name=\"sentence-transformers/paraphrase-mpnet-base-v2\"\n", ")\n", "\n", "# Initialize the OracleVS handle over a vector-enabled SQL table\n", "vector_store = OracleVS(\n", " client=vector_conn,\n", " embedding_function=embedding_model,\n", " table_name=\"VECTOR_SEARCH_DEMO\",\n", " distance_strategy=DistanceStrategy.COSINE,\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "64e24aa3", "metadata": {}, "outputs": [], "source": [ "# Helper to safely create index (skips if already exists)\n", "def safe_create_index(conn, vs, idx_name):\n", " \"\"\"Create index, skipping if it already exists.\"\"\"\n", " try:\n", " create_index(\n", " client=conn,\n", " vector_store=vs,\n", " params={\"idx_name\": idx_name, \"idx_type\": \"HNSW\"}\n", " )\n", " print(f\" βœ… Created index: {idx_name}\")\n", " except Exception as e:\n", " if \"ORA-00955\" in str(e):\n", " print(f\" ⏭️ Index already exists: {idx_name} (skipped)\")\n", " else:\n", " raise\n" ] }, { "cell_type": "code", "execution_count": null, "id": "146cb8bc", "metadata": {}, "outputs": [], "source": [ "import logging\n", "\n", "# Suppress langchain_oracledb logging, remove this if you want to see the debug logs\n", "logging.getLogger(\"langchain_oracledb\").setLevel(logging.CRITICAL)\n", "\n", "# Create an HNSW index for fast similarity search\n", "safe_create_index(vector_conn, vector_store, \"oravs_hnsw\")\n" ] }, { "cell_type": "markdown", "id": "8cc1b4b9", "metadata": {}, "source": [ "## Step 2: Ingesting Research Paper Data\n" ] }, { "cell_type": "code", "execution_count": null, "id": "4926e267", "metadata": {}, "outputs": [], "source": [ "from datasets import load_dataset\n", "\n", "MAX_PAPERS = 1000\n", "ds_stream = load_dataset(\"nick007x/arxiv-papers\", split=\"train\", streaming=True)\n", "\n", "sampled_papers = []\n", "texts = []\n", "metadata = []\n", "\n", "for i, item in enumerate(ds_stream):\n", " if i >= MAX_PAPERS:\n", " break\n", "\n", " arxiv_id = item.get(\"arxiv_id\", f\"unknown_{i}\")\n", " title = (item.get(\"title\") or \"\").strip()\n", " abstract = (item.get(\"abstract\") or \"\").strip()\n", " primary_subject = (item.get(\"primary_subject\") or \"\").strip()\n", " authors = item.get(\"authors\") or []\n", "\n", " if isinstance(authors, str):\n", " authors_text = authors\n", " elif isinstance(authors, list):\n", " authors_text = \", \".join(str(a).strip() for a in 
authors if str(a).strip())\n", " else:\n", " authors_text = \"\"\n", "\n", " text = f\"Title: {title}\\nAbstract: {abstract}\"\n", "\n", " sampled_papers.append({\n", " \"arxiv_id\": arxiv_id,\n", " \"title\": title,\n", " \"abstract\": abstract,\n", " \"primary_subject\": primary_subject,\n", " \"authors\": authors_text,\n", " })\n", " texts.append(text)\n", " metadata.append({\n", " \"id\": arxiv_id,\n", " \"arxiv_id\": arxiv_id,\n", " \"title\": title,\n", " \"primary_subject\": primary_subject,\n", " \"authors\": authors_text,\n", " })\n", "\n", "vector_store.add_texts(\n", " texts=texts,\n", " metadatas=metadata,\n", ")\n", "\n", "print(f\"βœ… Ingested {len(texts)} research papers into VECTOR_SEARCH_DEMO\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "3365f2db", "metadata": {}, "outputs": [], "source": [ "sample_primary_subject = sampled_papers[0][\"primary_subject\"] if sampled_papers else \"\"\n", "sample_arxiv_id = sampled_papers[0][\"arxiv_id\"] if sampled_papers else \"\"\n", "print(\"Sample primary subject:\", sample_primary_subject)\n", "print(\"Sample arxiv_id:\", sample_arxiv_id)\n" ] }, { "cell_type": "markdown", "id": "1061c994", "metadata": {}, "source": [ "## Step 3: Querying Vector-Enabled SQL Tables\n", "\n", "Search for documents similar to a natural language query. \n", "\n", "The OracleVS layer converts queries to embeddings and finds the closest matches.\n" ] }, { "cell_type": "markdown", "id": "78909ea2", "metadata": {}, "source": [ "Basic Search" ] }, { "cell_type": "code", "execution_count": null, "id": "3196205a", "metadata": {}, "outputs": [], "source": [ "query = \"Find research papers about planetary exploration mission planning.\"\n", "\n", "results = vector_store.similarity_search(query, k=3)\n", "\n", "for i, doc in enumerate(results, start=1):\n", " print(f\"--- Result {i} ---\")\n", " print(\"Text:\", doc.page_content)\n", " print(\"Metadata:\", doc.metadata)\n" ] }, { "cell_type": "markdown", "id": "99ece922", "metadata": {}, "source": [ "Search With Scores" ] }, { "cell_type": "code", "execution_count": null, "id": "56e523e0", "metadata": {}, "outputs": [], "source": [ "results = vector_store.similarity_search_with_score(query, k=3)\n", "\n", "for doc, score in results:\n", " print(\"Score:\", score)\n", " print(\"Text :\", doc.page_content)\n", " print(\"Meta :\", doc.metadata)\n", " print(\"------\")" ] }, { "cell_type": "markdown", "id": "b10609d0", "metadata": {}, "source": [ "Filter by exact match on a metadata field" ] }, { "cell_type": "code", "execution_count": null, "id": "081c7f74", "metadata": {}, "outputs": [], "source": [ "query = \"Find papers related to mission planning and observational astronomy.\"\n", "\n", "# This returns docs where metadata.primary_subject matches the sampled subject.\n", "docs = vector_store.similarity_search(\n", " query,\n", " k=3,\n", " filter={\"primary_subject\": {\"$eq\": sample_primary_subject}},\n", ")\n", "\n", "for doc in docs:\n", " print(\"Text:\", doc.page_content[:120], \"...\")\n", " print(\"Meta:\", doc.metadata)\n", " print(\"------\")\n" ] }, { "cell_type": "markdown", "id": "90290170", "metadata": {}, "source": [ "Filter by id list ($in)" ] }, { "cell_type": "code", "execution_count": null, "id": "e0d8de15", "metadata": {}, "outputs": [], "source": [ "docs = vector_store.similarity_search(\n", " query=\"Explain key themes in this research paper\",\n", " k=5,\n", " filter={\"id\": {\"$in\": [sample_arxiv_id]}},\n", ")\n", "\n", "print(docs)\n" ] }, { "cell_type": "markdown", 
"id": "82623c94", "metadata": {}, "source": [ "# Part 3: Memory Engineering and Agent Memory\n", "--------\n" ] }, { "cell_type": "markdown", "id": "7b3ff4cf", "metadata": {}, "source": [ "\n", "**`Agent Memory`** is the exocortex that augments an LLMβ€”capturing, encoding, storing, linking, and retrieving information beyond the model's parametric and contextual limits. \n", "It provides the persistence and structure required for long-horizon reasoning and reliable behaviour.\n", "\n", "**`Memory Engineering`** is the scaffolding and control harness that we design to move information optimally and efficiently into, through, and across all components of an AI system(databases, LLMs, applications etc). It ensures that data is captured, transformed, organized, and retrieved in the right way at the right timeβ€”so agents can behave reliably, believably, and capabaly.\n", "\n", "This is the core section of the notebook where we build a complete **`Memory Manager`** for AI agents. \n", "\n", "Just like humans have different types of memory (short-term, long-term, procedural), AI agents benefit from specialized memory systems.\n", "\n", "## Why Memory Engineering Matters\n", "\n", "Without memory, agents:\n", "- Forget previous conversations\n", "- Can't learn from past interactions\n", "- Repeat the same mistakes\n", "- Lack context for complex tasks\n", "\n", "With proper memory engineering, agents can:\n", "- Maintain context across sessions\n", "- Learn and improve over time\n", "- Access relevant knowledge when needed\n", "- Execute complex multi-step workflows\n", "\n", "## Memory Types We'll Implement\n", "\n", "| Memory Type | Human Analogy | Purpose | Storage |\n", "|-------------|---------------|---------|---------|\n", "| **Conversational** | Short-term memory | Chat history per thread | SQL Table |\n", "| **Knowledge Base** | Long-term semantic memory | Facts, documents, search results | Vector-Enabled SQL Table |\n", "| **Workflow** | Procedural memory | Learned action patterns | Vector-Enabled SQL Table |\n", "| **Toolbox** | Skill memory | Available tools & capabilities | Vector-Enabled SQL Table |\n", "| **Entity** | Episodic memory | People, places, systems mentioned | Vector-Enabled SQL Table |\n", "| **Summary** | Compressed memory | Condensed context for long conversations | Vector-Enabled SQL Table |\n", "| **Tool Log** | Episodic memory | Raw tool call outputs offloaded from context | SQL Table |\n", "\n", "> **Note on Tool Log:** Tool Log is a form of episodic memory β€” it records *what happened* during each tool execution. Beyond keeping the context window lean, tool logs can serve as a source from which **procedural memories** (workflow patterns) and **semantic memories** (knowledge base entries) can be distilled over time.\n", "\n", "## Steps in This Section\n", "\n", "1. **Define table names** for each memory type\n", "2. **Create SQL table** for conversational history\n", "3. **Create SQL table** for tool call logs (experimental memory)\n", "4. **Create vector-enabled SQL tables** for semantic memories\n", "5. **Build indexes** for fast similarity search\n", "6. **Implement MemoryLayer class** with read/write methods for each memory type\n", "7. **Initialize the memory manager** with all storage backends" ] }, { "cell_type": "markdown", "id": "461b4fa8", "metadata": {}, "source": [ "## Step 1: Define Memory Tables and Stores\n", "First, we define table names for each memory type. \n", "\n", "These tables will be created in Oracle Database to persist agent memory." 
] }, { "cell_type": "code", "execution_count": null, "id": "8e0ee2ee", "metadata": {}, "outputs": [], "source": [ "# Table names for each memory type\n", "CONVERSATIONAL_TABLE = \"CONVERSATIONAL_MEMORY\" # Episodic memory\n", "KNOWLEDGE_BASE_TABLE = \"SEMANTIC_MEMORY\" # Semantic memory\n", "WORKFLOW_TABLE = \"WORKFLOW_MEMORY\" # Procedural memory\n", "TOOLBOX_TABLE = \"TOOLBOX_MEMORY\" # Procedural memory\n", "ENTITY_TABLE = \"ENTITY_MEMORY\" # Semantic memory\n", "SUMMARY_TABLE = \"SUMMARY_MEMORY\" # Semanatic memory\n", "TOOL_LOG_TABLE = \"TOOL_LOG\" # Episodic\n", "\n", "ALL_TABLES = [CONVERSATIONAL_TABLE, KNOWLEDGE_BASE_TABLE, WORKFLOW_TABLE, TOOLBOX_TABLE, ENTITY_TABLE, SUMMARY_TABLE, TOOL_LOG_TABLE]\n", "\n", "# Drop existing tables to start fresh\n", "for table in ALL_TABLES:\n", " try:\n", " with vector_conn.cursor() as cur:\n", " cur.execute(f\"DROP TABLE {table} PURGE\")\n", " except Exception as e:\n", " if \"ORA-00942\" in str(e):\n", " print(f\" - {table} (not exists)\")\n", " else:\n", " print(f\" βœ— {table}: {e}\")\n", " \n", "vector_conn.commit()" ] }, { "cell_type": "code", "execution_count": null, "id": "29c9b27a", "metadata": {}, "outputs": [], "source": [ "# Model token limits (for context management)\n", "MODEL_TOKEN_LIMITS = {\n", " \"gpt-5\": 256000,\n", " \"gpt-5-mini\": 128000,\n", " \"gpt-4o\": 128000,\n", " \"gpt-5\": 128000,\n", " \"gpt-4-turbo\": 128000,\n", " \"gpt-4\": 8192,\n", " \"gpt-3.5-turbo\": 16385,\n", "}" ] }, { "cell_type": "markdown", "id": "ded028dc", "metadata": {}, "source": [ "### Step 1a: Create Conversational Memory Table\n", "\n", "This function below creates a SQL table to store chat history. \n", "\n", "Unlike semantic memories backed by vector-enabled SQL tables, conversational memory uses a traditional table because we need exact retrieval by thread ID (not similarity search).\n", "\n", "**What it does:**\n", "- Creates a table with columns: `id`, `thread_id`, `role`, `content`, `timestamp`, `metadata`\n", "- Adds an index on `thread_id` for fast conversation lookups\n", "- Adds an index on `timestamp` for chronological ordering\n" ] }, { "cell_type": "code", "execution_count": null, "id": "54032e53", "metadata": {}, "outputs": [], "source": [ "def create_conversational_history_table(conn, table_name: str = \"CONVERSATIONAL_MEMORY\"):\n", " \"\"\"\n", " Create a table to store conversational history.\n", "\n", " Args:\n", " conn: Oracle database connection\n", " table_name: Name of the table to create\n", " \"\"\"\n", " with conn.cursor() as cur:\n", " # Drop table if exists\n", " try:\n", " cur.execute(f\"DROP TABLE {table_name}\")\n", " except:\n", " pass # Table doesn't exist\n", " \n", " # Create table with proper schema\n", " cur.execute(f\"\"\"\n", " CREATE TABLE {table_name} (\n", " id VARCHAR2(100) DEFAULT SYS_GUID() PRIMARY KEY,\n", " thread_id VARCHAR2(100) NOT NULL,\n", " role VARCHAR2(50) NOT NULL,\n", " content CLOB NOT NULL,\n", " timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,\n", " metadata CLOB,\n", " created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,\n", " summary_id VARCHAR2(100) DEFAULT NULL\n", " )\n", " \"\"\")\n", " \n", " # Create index on thread_id for faster lookups\n", " cur.execute(f\"\"\"\n", " CREATE INDEX idx_{table_name.lower()}_thread_id ON {table_name}(thread_id)\n", " \"\"\")\n", " \n", " # Create index on timestamp for ordering\n", " cur.execute(f\"\"\"\n", " CREATE INDEX idx_{table_name.lower()}_timestamp ON {table_name}(timestamp)\n", " \"\"\")\n", " \n", " conn.commit()\n", " print(f\"Table 
{table_name} created successfully with indexes\")\n", " return table_name\n" ] }, { "cell_type": "code", "execution_count": null, "id": "24745bff", "metadata": {}, "outputs": [], "source": [ "# Create the table\n", "CONVERSATION_HISTORY_TABLE = create_conversational_history_table(vector_conn, CONVERSATIONAL_TABLE)" ] }, { "cell_type": "markdown", "id": "3nfqqg1la3m", "metadata": {}, "source": [ "### Step 1b: Create Tool Log Table (Experimental Memory)\n", "\n", "Tool call outputs during agent execution can **bloat the context window** quickly β€” a single web search might return thousands of tokens that are only needed once. \n", "\n", "The `TOOL_LOG` table acts as an **experimental memory**: full tool outputs are persisted to the database and replaced in the context window with a compact one-line reference. The agent can retrieve full outputs later if needed via `read_tool_log`.\n", "\n", "This is a form of **context offloading** β€” keeping the working memory lean while preserving full fidelity in durable storage." ] }, { "cell_type": "code", "execution_count": null, "id": "p6sgtbmd6jl", "metadata": {}, "outputs": [], "source": [ "def create_tool_log_table(conn, table_name: str = \"TOOL_LOG\"):\n", " \"\"\"Create a table to log tool call outputs (experimental memory).\"\"\"\n", " with conn.cursor() as cur:\n", " try:\n", " cur.execute(f\"DROP TABLE {table_name}\")\n", " except:\n", " pass\n", "\n", " cur.execute(f\"\"\"\n", " CREATE TABLE {table_name} (\n", " id VARCHAR2(100) DEFAULT SYS_GUID() PRIMARY KEY,\n", " thread_id VARCHAR2(100) NOT NULL,\n", " tool_call_id VARCHAR2(200) NOT NULL,\n", " tool_name VARCHAR2(200) NOT NULL,\n", " tool_args CLOB,\n", " tool_output CLOB,\n", " timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP\n", " )\n", " \"\"\")\n", " cur.execute(f\"CREATE INDEX idx_{table_name.lower()}_thread ON {table_name}(thread_id)\")\n", " conn.commit()\n", " print(f\"Table {table_name} created successfully\")\n", " return table_name\n", "\n", "TOOL_LOG_TABLE_NAME = create_tool_log_table(vector_conn, TOOL_LOG_TABLE)" ] }, { "cell_type": "markdown", "id": "4539b694", "metadata": {}, "source": [ "### Step 1c: Create Vector-Enabled Tables for Each Memory Type\n", "\n", "Here we create 5 separate OracleVS-backed vector-enabled SQL tablesβ€”one for each memory type. 
\n", "\n", "Each semantic memory is backed by its own Oracle table with a VECTOR column and uses the same embedding model for consistency.\n", "\n", "| Vector-Enabled Table Handle | Purpose |\n", "|--------------|---------|\n", "| `knowledge_base_vs` | Store documents, facts, and search results |\n", "| `workflow_vs` | Store learned action patterns and tool sequences |\n", "| `toolbox_vs` | Store tool definitions for semantic tool discovery |\n", "| `entity_vs` | Store extracted entities (people, places, systems) |\n", "| `summary_vs` | Store compressed summaries for long conversations |\n" ] }, { "cell_type": "code", "execution_count": null, "id": "01fffb00", "metadata": {}, "outputs": [], "source": [ "knowledge_base_vs = OracleVS(\n", " client=vector_conn,\n", " embedding_function=embedding_model,\n", " table_name=KNOWLEDGE_BASE_TABLE,\n", " distance_strategy=DistanceStrategy.COSINE,\n", ")\n", "\n", "workflow_vs = OracleVS(\n", " client=vector_conn,\n", " embedding_function=embedding_model,\n", " table_name=WORKFLOW_TABLE,\n", " distance_strategy=DistanceStrategy.COSINE,\n", ")\n", "\n", "toolbox_vs = OracleVS(\n", " client=vector_conn,\n", " embedding_function=embedding_model,\n", " table_name=TOOLBOX_TABLE,\n", " distance_strategy=DistanceStrategy.COSINE,\n", ")\n", "\n", "entity_vs = OracleVS(\n", " client=vector_conn,\n", " embedding_function=embedding_model,\n", " table_name=ENTITY_TABLE,\n", " distance_strategy=DistanceStrategy.COSINE,\n", ")\n", "\n", "summary_vs = OracleVS(\n", " client=vector_conn,\n", " embedding_function=embedding_model,\n", " table_name=SUMMARY_TABLE,\n", " distance_strategy=DistanceStrategy.COSINE,\n", ")\n" ] }, { "cell_type": "markdown", "id": "75548e90", "metadata": {}, "source": [ "Then we create indexes for each vector-enabled table" ] }, { "cell_type": "code", "execution_count": null, "id": "fe0100e3", "metadata": {}, "outputs": [], "source": [ "print(\"Creating vector indexes...\")\n", "safe_create_index(vector_conn, knowledge_base_vs, \"knowledge_base_vs_hnsw\")\n", "safe_create_index(vector_conn, workflow_vs, \"workflow_vs_hnsw\")\n", "safe_create_index(vector_conn, toolbox_vs, \"toolbox_vs_hnsw\")\n", "safe_create_index(vector_conn, entity_vs, \"entity_vs_hnsw\")\n", "safe_create_index(vector_conn, summary_vs, \"summary_vs_hnsw\")\n", "print(\"All indexes created!\")\n", "\n", "if \"sampled_papers\" in globals() and sampled_papers:\n", " kb_texts = [f\"Title: {p['title']}\\nAbstract: {p['abstract']}\" for p in sampled_papers]\n", " kb_meta = [\n", " {\n", " \"id\": p[\"arxiv_id\"],\n", " \"arxiv_id\": p[\"arxiv_id\"],\n", " \"title\": p[\"title\"],\n", " \"primary_subject\": p[\"primary_subject\"],\n", " \"authors\": p[\"authors\"],\n", " \"source_type\": \"arxiv_papers\",\n", " }\n", " for p in sampled_papers\n", " ]\n", " knowledge_base_vs.add_texts(kb_texts, kb_meta)\n", " print(f\"βœ… Seeded knowledge base memory with {len(kb_texts)} arXiv papers\")\n" ] }, { "cell_type": "markdown", "id": "929d5ff9", "metadata": {}, "source": [ "## Step 2: Programmatic vs Agent-Triggered Operations\n" ] }, { "cell_type": "markdown", "id": "fc5678a3", "metadata": {}, "source": [ "A key design decision in memory engineering is deciding which operations run **programmatically** (always executed by the harness) versus **agent-triggered** (the LLM chooses to invoke them during the loop).\n", "\n", "In this notebook, the harness is intentionally opinionated: memory loading and persistence are automatic, while external/expansion actions are chosen by the agent.\n", 
"\n", "| Operation | Programmatic | Agent-Triggered | Notes |\n", "|-----------|:------------:|:---------------:|-------|\n", "| `read_conversational_memory()` | βœ… | ❌ | Always loaded at loop start (unsummarized units only) |\n", "| `read_knowledge_base()` | βœ… | ❌ | Always loaded at loop start |\n", "| `read_workflow()` | βœ… | ❌ | Always loaded at loop start |\n", "| `read_entity()` | βœ… | ❌ | Always loaded at loop start |\n", "| `read_summary_context()` | βœ… | ❌ | Always loaded at loop start (IDs + descriptions) |\n", "| `read_toolbox()` | βœ… | ❌ | Tool schemas are retrieved before model reasoning |\n", "| `write_conversational_memory()` | βœ… | ❌ | User message (pre-loop) + assistant answer (post-loop) |\n", "| `write_workflow()` | βœ… | ❌ | Persisted after loop when tool steps exist |\n", "| `write_entity()` | βœ… | ❌ | Best-effort extraction around user/final assistant text |\n", "| `write_tool_log()` | βœ… | ❌ | Full tool output offloaded to DB after every tool execution |\n", "| Tool-call decision (`tool_choice=auto`) | ❌ | βœ… | Model decides whether to call tools |\n", "| `search_tavily()` | ❌ | βœ… | Agent-triggered external retrieval |\n", "| `expand_summary()` | ❌ | βœ… | Agent-triggered just-in-time summary expansion |\n", "| `summarize_and_store()` | ❌ | βœ… | Agent-triggered context compaction primitive |\n", "| `summarize_conversation()` | ❌ | βœ… | Agent-triggered conversation compaction for active thread |\n", "\n", "### What Is Programmatic in This Harness\n", "\n", "These operations are always executed by code, not delegated to the model:\n", "\n", "1. **Context assembly** at the start of `call_agent()`.\n", "2. **Tool schema retrieval** before each model call.\n", "3. **Memory persistence** around the loop (store user turn, store assistant turn, persist workflow/entity updates).\n", "4. **Tool execution dispatch** after a tool call is chosen (once selected by the model, execution is deterministic in code).\n", "5. **Tool output offloading** via `write_tool_log()` β€” full outputs are persisted to the database and replaced with compact references in the context window.\n", "\n", "### What Is Agent-Triggered in This Harness\n", "\n", "These operations are chosen by the model during the loop:\n", "\n", "1. **Whether** to call a tool at all.\n", "2. **Which** tool to call.\n", "3. **When** to trigger web search, summary expansion, or conversation compaction.\n", "4. **How** to sequence multiple tool calls before finalizing an answer.\n", "\n", "### Why This Split Works for Memory-Centric Agents\n", "\n", "1. **Reliability from programmatic memory** β€” critical memory load/save behavior never depends on the model remembering to do it.\n", "2. **Adaptivity from agent-triggered tools** β€” the model can selectively fetch/expand/compact only when needed.\n", "3. **Clear control boundaries** β€” the harness owns state integrity; the model owns strategy inside those boundaries." ] }, { "cell_type": "markdown", "id": "76f3f57d", "metadata": {}, "source": [ "## Step 3: Memory Manager Implementation" ] }, { "cell_type": "markdown", "id": "a85d87a8", "metadata": {}, "source": [ "The `MemoryManager` class is the central abstraction that unifies all memory operations. 
It provides a clean interface for reading and writing to different memory types, hiding the complexity of SQL queries and vector-enabled table operations.\n", "\n", "### What We're Building\n", "\n", "A single class that manages six core memory types (plus the experimental Tool Log) with consistent read/write patterns:\n", "\n", "| Memory Type | Storage | Write Method | Read Method |\n", "|-------------|---------|--------------|-------------|\n", "| **Conversational** | SQL Table | `write_conversational_memory()` | `read_conversational_memory()` |\n", "| **Knowledge Base** | Vector-Enabled SQL Table | `write_knowledge_base()` | `read_knowledge_base()` |\n", "| **Workflow** | Vector-Enabled SQL Table | `write_workflow()` | `read_workflow()` |\n", "| **Toolbox** | Vector-Enabled SQL Table | `write_toolbox()` | `read_toolbox()` |\n", "| **Entity** | Vector-Enabled SQL Table | `write_entity()` | `read_entity()` |\n", "| **Summary** | Vector-Enabled SQL Table | `write_summary()` | `read_summary_memory()`, `read_summary_context()` |\n", "| **Tool Log** | SQL Table | `write_tool_log()` | `read_tool_log()` |\n", "\n", "### Key Features\n", "\n", "- **Thread-based conversations** β€” Messages are organized by `thread_id` for multi-conversation support\n", "- **Semantic search** β€” Vector-enabled SQL tables enable finding relevant content by meaning, not just keywords\n", "- **Metadata filtering** β€” Workflows filter by `num_steps > 0`, summaries filter by `id`\n", "- **LLM-powered entity extraction** β€” Automatically extracts people, places, and systems from text\n", "- **Formatted context output** β€” Each read method returns formatted text ready for the LLM context\n", "\n", "### Alternative: Memory Manager Frameworks\n", "\n", "There are existing frameworks that abstract memory management for AI agents:\n", "\n", "| Framework | Description |\n", "|-----------|-------------|\n", "| **LangChain Memory** | Built-in memory classes (ConversationBufferMemory, VectorStoreRetrieverMemory) |\n", "| **Mem0** | Dedicated memory layer for AI agents with automatic memory management |\n", "| **LlamaIndex** | Document-based memory with various storage backends |\n", "| **Zep** | Long-term memory service for AI assistants |\n", "\n", "### Pros and Cons of Building Your Own\n", "\n", "| Approach | Pros | Cons |\n", "|----------|------|------|\n", "| **Custom (what we're doing)** | Full control, tailored to your needs, deeper understanding, no external dependencies | More code to maintain, need to handle edge cases yourself |\n", "| **Using a framework** | Faster to implement, battle-tested, community support, handles edge cases | Less control, may not fit your exact use case, additional dependency |" ] }, { "cell_type": "markdown", "id": "69bb7a2f", "metadata": {}, "source": [ "> **For learning purposes**, building your own memory manager (as we do here) gives you a deep understanding of how memory engineering works. \n", "> \n", "> **For production**, you might consider using or extending an existing framework. \n", ">\n", "> For example, this simple notebook only illustrates reads and writes, not deletion or updates." 
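, "\n", "To make the interface concrete before the implementation, here is a hypothetical round-trip through the two simplest memory types (runnable once the `memory_manager` instance is initialized at the end of this step; the example text and IDs are made up):\n", "\n", "```python\n", "thread = \"demo-thread-1\"\n", "\n", "# Write one conversational turn and one knowledge-base fact...\n", "memory_manager.write_conversational_memory(\"What is an HNSW index?\", role=\"user\", thread_id=thread)\n", "memory_manager.write_knowledge_base(\n", "    \"HNSW is a graph-based approximate nearest neighbor index.\",\n", "    {\"id\": \"note-hnsw\", \"source_type\": \"note\"},\n", ")\n", "\n", "# ...then read both back as formatted, LLM-ready context blocks.\n", "print(memory_manager.read_conversational_memory(thread))\n", "print(memory_manager.read_knowledge_base(\"vector index for similarity search\", k=1))\n", "```"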
] }, { "cell_type": "code", "execution_count": null, "id": "27aacfa9", "metadata": {}, "outputs": [], "source": [ "import json as json_lib\n", "from datetime import datetime\n", "\n", "class MemoryManager:\n", " \"\"\"\n", " A simplified memory manager for AI agents using Oracle AI Database.\n", " \n", " Manages 6 types of memory:\n", " - Conversational: Chat history per thread (SQL table)\n", " - Knowledge Base: Searchable documents (vector-enabled SQL table)\n", " - Workflow: Execution patterns (vector-enabled SQL table)\n", " - Toolbox: Available tools (vector-enabled SQL table)\n", " - Entity: People, places, systems (vector-enabled SQL table)\n", " - Summary: Storing compressed context window\n", " - Tool Log: Experimental memory that offloads tool outputs from context window (SQL table)\n", " \"\"\"\n", " \n", " def __init__(self, conn, conversation_table: str, knowledge_base_vs, workflow_vs, toolbox_vs, entity_vs, summary_vs, tool_log_table: str = None):\n", " self.conn = conn\n", " self.conversation_table = conversation_table\n", " self.knowledge_base_vs = knowledge_base_vs\n", " self.workflow_vs = workflow_vs\n", " self.toolbox_vs = toolbox_vs\n", " self.entity_vs = entity_vs\n", " self.summary_vs = summary_vs\n", " self.tool_log_table = tool_log_table\n", " \n", " # ==================== CONVERSATIONAL MEMORY (SQL) ====================\n", " \n", " def write_conversational_memory(self, content: str, role: str, thread_id: str) -> str:\n", " \"\"\"Store a message in conversation history.\"\"\"\n", " thread_id = str(thread_id)\n", " with self.conn.cursor() as cur:\n", " id_var = cur.var(str)\n", " cur.execute(f\"\"\"\n", " INSERT INTO {self.conversation_table} (thread_id, role, content, metadata, timestamp)\n", " VALUES (:thread_id, :role, :content, :metadata, CURRENT_TIMESTAMP)\n", " RETURNING id INTO :id\n", " \"\"\", {\"thread_id\": thread_id, \"role\": role, \"content\": content, \"metadata\": \"{}\", \"id\": id_var})\n", " record_id = id_var.getvalue()[0] if id_var.getvalue() else None\n", " self.conn.commit()\n", " return record_id\n", " \n", " def get_unsummarized_messages(self, thread_id: str, limit: int = 100) -> list[dict]:\n", " \"\"\"Return unsummarized conversation units for a thread.\"\"\"\n", " thread_id = str(thread_id)\n", " with self.conn.cursor() as cur:\n", " cur.execute(f\"\"\"\n", " SELECT id, role, content, timestamp\n", " FROM {self.conversation_table}\n", " WHERE thread_id = :thread_id AND summary_id IS NULL\n", " ORDER BY timestamp ASC\n", " FETCH FIRST :limit ROWS ONLY\n", " \"\"\", {\"thread_id\": thread_id, \"limit\": limit})\n", " rows = cur.fetchall()\n", "\n", " return [\n", " {\"id\": rid, \"role\": role, \"content\": content, \"timestamp\": ts}\n", " for rid, role, content, ts in rows\n", " ]\n", "\n", " def read_conversational_memory(self, thread_id: str, limit: int = 10) -> str:\n", " \"\"\"Read unsummarized conversation history for a thread.\n", " \n", " NOTE: Only returns messages where summary_id IS NULL. 
Once messages are\n", " summarized via summarize_conversation(), they are excluded here and\n", " replaced by a compact summary reference in Summary Memory.\n", " \"\"\"\n", " messages = self.get_unsummarized_messages(thread_id, limit=limit)\n", " lines = [f\"[{m['timestamp'].strftime('%H:%M:%S')}] [{m['role']}] {m['content']}\" for m in messages]\n", " messages_formatted = '\\n'.join(lines)\n", " return f\"\"\"## Conversation Memory (Thread: {thread_id})\n", "### Purpose: Recent dialogue turns that have NOT yet been summarized.\n", "### When to use: Refer to this for the user's latest questions, your prior answers, and any\n", "### commitments or follow-ups from the current conversation. If context grows too long,\n", "### call summarize_conversation(thread_id) to compact older turns into Summary Memory.\n", "\n", "{messages_formatted}\"\"\"\n", "\n", " def mark_as_summarized(self, thread_id: str, summary_id: str, message_ids: list[str] | None = None):\n", " \"\"\"Mark conversation units as summarized.\"\"\"\n", " thread_id = str(thread_id)\n", " with self.conn.cursor() as cur:\n", " if message_ids:\n", " cur.executemany(\n", " f\"\"\"\n", " UPDATE {self.conversation_table}\n", " SET summary_id = :summary_id\n", " WHERE thread_id = :thread_id AND id = :id AND summary_id IS NULL\n", " \"\"\",\n", " [{\"summary_id\": summary_id, \"thread_id\": thread_id, \"id\": mid} for mid in message_ids],\n", " )\n", " count = len(message_ids)\n", " else:\n", " cur.execute(f\"\"\"\n", " UPDATE {self.conversation_table}\n", " SET summary_id = :summary_id\n", " WHERE thread_id = :thread_id AND summary_id IS NULL\n", " \"\"\", {\"summary_id\": summary_id, \"thread_id\": thread_id})\n", " count = cur.rowcount\n", " self.conn.commit()\n", " print(f\" πŸ“¦ Marked {count} messages as summarized (summary_id: {summary_id})\")\n", "\n", " # ==================== KNOWLEDGE BASE (Vector-Enabled SQL Table) ====================\n", " \n", " def write_knowledge_base(self, text: str, metadata: dict):\n", " \"\"\"Store text in knowledge base with metadata.\"\"\"\n", " self.knowledge_base_vs.add_texts([text], [metadata])\n", " \n", " def read_knowledge_base(self, query: str, k: int = 3) -> str:\n", " \"\"\"Search knowledge base for relevant content.\"\"\"\n", " results = self.knowledge_base_vs.similarity_search(query, k=k)\n", " content = \"\\n\".join([doc.page_content for doc in results])\n", " return f\"\"\"## Knowledge Base Memory\n", "### Purpose: Factual documents, research papers, and web search results stored for long-term reference.\n", "### When to use: Treat this as your primary source of evidence. Cite specific facts, titles, or\n", "### findings from here before resorting to external search. 
If the knowledge base lacks what you\n", "### need, use search_tavily() to fetch new information (which will be stored here automatically).\n", "\n", "{content}\"\"\"\n", " \n", " \n", " # ==================== WORKFLOW (Vector-Enabled SQL Table) ====================\n", " \n", " def write_workflow(self, query: str, steps: list, final_answer: str, success: bool = True):\n", " \"\"\"Store a completed workflow pattern for future reference.\"\"\"\n", " # Format steps as text\n", " steps_text = \"\\n\".join([f\"Step {i+1}: {s}\" for i, s in enumerate(steps)])\n", " text = f\"Query: {query}\\nSteps:\\n{steps_text}\\nAnswer: {final_answer[:200]}\"\n", " \n", " metadata = {\n", " \"query\": query,\n", " \"success\": success,\n", " \"num_steps\": len(steps),\n", " \"timestamp\": datetime.now().isoformat()\n", " }\n", " self.workflow_vs.add_texts([text], [metadata])\n", " \n", " def read_workflow(self, query: str, k: int = 3) -> str:\n", " \"\"\"Search for similar past workflows with at least 1 step.\"\"\"\n", " # Filter to only include workflows that have steps (num_steps > 0)\n", " results = self.workflow_vs.similarity_search(\n", " query, \n", " k=k, \n", " filter={\"num_steps\": {\"$gt\": 0}}\n", " )\n", " if not results:\n", " return \"## Workflow Memory\\nNo relevant workflows found.\"\n", " content = \"\\n---\\n\".join([doc.page_content for doc in results])\n", " return f\"\"\"## Workflow Memory\n", "### Purpose: Step-by-step records of how similar past queries were resolved (tool calls and outcomes).\n", "### When to use: Before planning a multi-step action, check if a similar workflow already succeeded.\n", "### Reuse proven tool sequences instead of re-discovering them. Skip workflows marked as failed.\n", "\n", "{content}\"\"\"\n", " \n", " # ==================== TOOLBOX (Vector-Enabled SQL Table) ====================\n", " \n", " def write_toolbox(self, text: str, metadata: dict):\n", " \"\"\"Store a tool definition in the toolbox.\"\"\"\n", " self.toolbox_vs.add_texts([text], [metadata])\n", " \n", " def read_toolbox(self, query: str, k: int = 3) -> list[dict]:\n", " \"\"\"Find relevant tools and return OpenAI-compatible schemas.\"\"\"\n", " results = self.toolbox_vs.similarity_search(query, k=k)\n", " tools = []\n", " for doc in results:\n", " meta = doc.metadata\n", " # Extract parameters from metadata and convert to OpenAI format\n", " stored_params = meta.get(\"parameters\", {})\n", " properties = {}\n", " required = []\n", " \n", " for param_name, param_info in stored_params.items():\n", " # Convert stored param info to OpenAI schema format\n", " param_type = param_info.get(\"type\", \"string\")\n", " # Map Python types to JSON schema types (keys cover both\n", " # str(type(...)) reprs and plain type-name strings)\n", " type_mapping = {\n", " \"<class 'str'>\": \"string\",\n", " \"<class 'int'>\": \"integer\",\n", " \"<class 'float'>\": \"number\",\n", " \"<class 'bool'>\": \"boolean\",\n", " \"str\": \"string\",\n", " \"int\": \"integer\",\n", " \"float\": \"number\",\n", " \"bool\": \"boolean\"\n", " }\n", " json_type = type_mapping.get(param_type, \"string\")\n", " properties[param_name] = {\"type\": json_type}\n", " \n", " # If no default, it's required\n", " if \"default\" not in param_info:\n", " required.append(param_name)\n", " \n", " tools.append({\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": meta.get(\"name\", \"tool\"),\n", " \"description\": meta.get(\"description\", \"\"),\n", " \"parameters\": {\"type\": \"object\", \"properties\": properties, \"required\": required}\n", " }\n", " })\n", " return tools\n", "\n", " # ==================== ENTITY (Vector-Enabled SQL Table) 
====================\n", " \n", " def extract_entities(self, text: str, llm_client) -> list[dict]:\n", " \"\"\"Use LLM to extract entities (people, places, systems) from text.\"\"\"\n", " if not text or len(text.strip()) < 5:\n", " return []\n", " \n", " prompt = f'''Extract entities from: \"{text[:500]}\"\n", "Return JSON: [{{\"name\": \"X\", \"type\": \"PERSON|PLACE|SYSTEM\", \"description\": \"brief\"}}]\n", "If none: []'''\n", "\n", " try:\n", " response = llm_client.chat.completions.create(\n", " model=\"gpt-5\",\n", " messages=[{\"role\": \"user\", \"content\": prompt}],\n", " max_completion_tokens=300\n", " )\n", " result = response.choices[0].message.content.strip()\n", " \n", " # Extract JSON array from response\n", " start, end = result.find(\"[\"), result.rfind(\"]\")\n", " if start == -1 or end == -1:\n", " return []\n", " \n", " parsed = json_lib.loads(result[start:end+1])\n", " return [{\"name\": e[\"name\"], \"type\": e.get(\"type\", \"UNKNOWN\"), \"description\": e.get(\"description\", \"\")} \n", " for e in parsed if isinstance(e, dict) and e.get(\"name\")]\n", " except:\n", " return []\n", " \n", " def write_entity(self, name: str, entity_type: str, description: str, llm_client=None, text: str = None):\n", " \"\"\"Store an entity OR extract and store entities from text.\"\"\"\n", " if text and llm_client:\n", " # Extract and store entities from text\n", " entities = self.extract_entities(text, llm_client)\n", " for e in entities:\n", " self.entity_vs.add_texts(\n", " [f\"{e['name']} ({e['type']}): {e['description']}\"],\n", " [{\"name\": e['name'], \"type\": e['type'], \"description\": e['description']}]\n", " )\n", " return entities\n", " else:\n", " # Store single entity directly\n", " self.entity_vs.add_texts(\n", " [f\"{name} ({entity_type}): {description}\"],\n", " [{\"name\": name, \"type\": entity_type, \"description\": description}]\n", " )\n", " \n", " def read_entity(self, query: str, k: int = 5) -> str:\n", " \"\"\"Search for relevant entities.\"\"\"\n", " results = self.entity_vs.similarity_search(query, k=k)\n", " if not results:\n", " return \"## Entity Memory\\nNo entities found.\"\n", " \n", " entities = [f\"β€’ {doc.metadata.get('name', '?')}: {doc.metadata.get('description', '')}\" \n", " for doc in results if hasattr(doc, 'metadata')]\n", " entities_formatted = '\\n'.join(entities)\n", " return f\"\"\"## Entity Memory\n", "### Purpose: Named entities (people, places, systems, paper titles) extracted from conversations.\n", "### When to use: Use these to resolve references like \"that author\" or \"the system we discussed\".\n", "### Entity memory provides continuity across turns β€” ground your answers in known entities\n", "### rather than guessing or re-asking the user for names and details already mentioned.\n", "\n", "{entities_formatted}\"\"\"\n", " \n", " # ==================== SUMMARY (Vector-Enabled SQL Table) ====================\n", " \n", " def write_summary(self, summary_id: str, full_content: str, summary: str, description: str):\n", " \"\"\"Store a summary with its original content.\"\"\"\n", " self.summary_vs.add_texts(\n", " [f\"{summary_id}: {description}\"],\n", " [{\"id\": summary_id, \"full_content\": full_content, \"summary\": summary, \"description\": description}]\n", " )\n", " return summary_id\n", " \n", " def read_summary_memory(self, summary_id: str) -> str:\n", " \"\"\"Retrieve a specific summary by ID (just-in-time retrieval).\"\"\"\n", " results = self.summary_vs.similarity_search(\n", " summary_id, \n", " k=5, \n", " 
filter={\"id\": summary_id}\n", " )\n", " if not results:\n", " return f\"Summary {summary_id} not found.\"\n", " doc = results[0]\n", " return doc.metadata.get('summary', 'No summary content.')\n", " \n", " def read_summary_context(self, query: str = \"\", k: int = 10) -> str:\n", " \"\"\"Get available summaries for context window (IDs + descriptions only).\"\"\"\n", " results = self.summary_vs.similarity_search(query or \"summary\", k=k)\n", " if not results:\n", " return \"## Summary Memory\\nNo summaries available.\"\n", " \n", " lines = [\n", " \"## Summary Memory\",\n", " \"### Purpose: Compressed snapshots of older conversations and context windows.\",\n", " \"### When to use: These are lightweight pointers. If a summary looks relevant,\",\n", " \"### call expand_summary(summary_id) to retrieve the full content just-in-time.\",\n", " \"### Do NOT expand all summaries β€” only expand when you need specific details.\",\n", " \"\"\n", " ]\n", " for doc in results:\n", " sid = doc.metadata.get('id', '?')\n", " desc = doc.metadata.get('description', 'No description')\n", " lines.append(f\" β€’ [ID: {sid}] {desc}\")\n", " return \"\\n\".join(lines)\n", " \n", " # ==================== TOOL LOG (SQL - Experimental Memory) ====================\n", " \n", " def write_tool_log(self, thread_id: str, tool_call_id: str, tool_name: str, tool_args: str, tool_output: str) -> str:\n", " \"\"\"Log a tool call output to the database and return a compact reference.\"\"\"\n", " if not self.tool_log_table:\n", " return tool_output # Fallback: return full output if no table configured\n", " \n", " with self.conn.cursor() as cur:\n", " id_var = cur.var(str)\n", " cur.execute(f\"\"\"\n", " INSERT INTO {self.tool_log_table} (thread_id, tool_call_id, tool_name, tool_args, tool_output)\n", " VALUES (:thread_id, :tool_call_id, :tool_name, :tool_args, :tool_output)\n", " RETURNING id INTO :id\n", " \"\"\", {\n", " \"thread_id\": str(thread_id), \"tool_call_id\": tool_call_id,\n", " \"tool_name\": tool_name, \"tool_args\": tool_args,\n", " \"tool_output\": tool_output, \"id\": id_var\n", " })\n", " log_id = id_var.getvalue()[0] if id_var.getvalue() else None\n", " self.conn.commit()\n", " \n", " # Return a compact reference instead of the full output\n", " preview = tool_output[:150].replace(\"\\n\", \" \")\n", " return f\"[Tool Log {log_id}] {tool_name} executed. 
Preview: {preview}...\"\n",
"    \n",
"    def read_tool_log(self, thread_id: str, limit: int = 20) -> list[dict]:\n",
"        \"\"\"Read tool call logs for a thread.\"\"\"\n",
"        if not self.tool_log_table:\n",
"            return []\n",
"        with self.conn.cursor() as cur:\n",
"            cur.execute(f\"\"\"\n",
"                SELECT id, tool_call_id, tool_name, tool_args, tool_output, timestamp\n",
"                FROM {self.tool_log_table}\n",
"                WHERE thread_id = :thread_id\n",
"                ORDER BY timestamp DESC\n",
"                FETCH FIRST :limit ROWS ONLY\n",
"            \"\"\", {\"thread_id\": str(thread_id), \"limit\": limit})\n",
"            rows = cur.fetchall()\n",
"        return [\n",
"            {\"id\": r[0], \"tool_call_id\": r[1], \"tool_name\": r[2],\n",
"             \"tool_args\": r[3], \"tool_output\": r[4], \"timestamp\": r[5]}\n",
"            for r in rows\n",
"        ]\n",
"\n",
"    # ==================== SUMMARY EXPANSION HELPERS ====================\n",
"\n",
"    def get_messages_by_summary_id(self, summary_id: str) -> list[dict]:\n",
"        \"\"\"Retrieve original conversation messages that were compacted into a given summary.\"\"\"\n",
"        with self.conn.cursor() as cur:\n",
"            cur.execute(f\"\"\"\n",
"                SELECT id, role, content, timestamp\n",
"                FROM {self.conversation_table}\n",
"                WHERE summary_id = :summary_id\n",
"                ORDER BY timestamp ASC\n",
"            \"\"\", {\"summary_id\": summary_id})\n",
"            rows = cur.fetchall()\n",
"        return [\n",
"            {\"id\": rid, \"role\": role, \"content\": content, \"timestamp\": ts}\n",
"            for rid, role, content, ts in rows\n",
"        ]" ] }, { "cell_type": "code", "execution_count": null, "id": "b775acc2", "metadata": {}, "outputs": [], "source": [ "# Initialize the MemoryManager instance\n",
"# Note: Uses SQL table for conversational memory, vector-enabled SQL tables for others\n",
"memory_manager = MemoryManager(\n",
"    conn=vector_conn,\n",
"    conversation_table=CONVERSATION_HISTORY_TABLE, \n",
"    knowledge_base_vs=knowledge_base_vs,\n",
"    workflow_vs=workflow_vs,\n",
"    toolbox_vs=toolbox_vs,\n",
"    entity_vs=entity_vs,\n",
"    summary_vs=summary_vs,\n",
"    tool_log_table=TOOL_LOG_TABLE_NAME  # Experimental memory: offloads tool outputs from context\n",
")" ] }, { "cell_type": "markdown", "id": "a1879e58", "metadata": {}, "source": [ "## Step 4: Creating the Agent's Toolbox" ] }, { "cell_type": "markdown", "id": "8c0b03d9", "metadata": {}, "source": [ "### The Scalability Problem with Tools\n",
"\n",
"As your AI system grows, you might have **hundreds of tools** availableβ€”APIs, database queries, calculators, search engines, and more. However, passing all tools to the LLM at inference time creates serious problems:\n",
"\n",
"| Problem | Impact |\n",
"|---------|--------|\n",
"| **Context bloat** | Tool definitions consume tokens, leaving less room for actual content |\n",
"| **Tool selection failure** | LLMs struggle to choose the right tool when presented with too many options |\n",
"| **Increased latency** | More tokens = slower inference |\n",
"| **Higher costs** | More tokens = higher API costs |\n",
"\n",
"Model providers like OpenAI and Anthropic typically recommend limiting the number of tools exposed to an LLM (often 10-20 max for reliable selection).\n",
"\n",
"### The Solution: Semantic Tool Retrieval\n",
"\n",
"The `Toolbox` class solves this by treating tools as a **searchable memory**:\n",
"\n",
"1. **Register hundreds of tools** β€” Store all available tools with their descriptions and embeddings\n",
"2. **Retrieve only relevant tools** β€” At inference time, use vector search to find tools semantically relevant to the current query\n",
"3. 
**Pass a focused toolset** β€” Only the retrieved tools (typically 3-5) are passed to the LLM\n", "\n", "This approach means your system can **scale to hundreds of tools** while the LLM only sees the most relevant ones for each query.\n", "\n", "### How the Code Works\n", "\n", "The `Toolbox` class uses **docstrings as the retrieval key**:\n", "\n", "```\n", "User Query β†’ Embed Query β†’ Vector Search β†’ Find tools with similar docstrings β†’ Return relevant tools\n", "```\n", "\n", "| Component | Purpose |\n", "|-----------|---------|\n", "| `get_embedding()` | Converts tool description to a vector |\n", "| `ToolMetadata` | Pydantic model storing tool name, description, signature, parameters |\n", "| `_augment_docstring()` | Uses LLM to improve the docstring for better retrieval |\n", "| `_generate_queries()` | Creates synthetic queries that would trigger this tool |\n", "| `register_tool()` | Decorator that stores tool with its embedding in the toolbox |\n", "\n", "When you call `memory_manager.read_toolbox(query)`, it performs a similarity search to find tools whose docstrings are semantically similar to the query.\n", "\n", "### The Intersection of Three Engineering Disciplines\n", "\n", "This implementation combines techniques from **memory engineering**, **context engineering**, and **prompt engineering**:\n", "\n", "| Discipline | Technique Used | How It Helps |\n", "|------------|----------------|--------------|\n", "| **Memory Engineering** | Toolbox as procedural memory | Tools are stored and retrieved like learned skills |\n", "| **Memory Engineering** | Docstring augmentation | LLM improves docstrings for better semantic retrieval |\n", "| **Memory Engineering** | Synthetic query generation | Creates example queries to improve tool discoverability |\n", "| **Context Engineering** | Selective tool retrieval | Only relevant tools enter the context, reducing bloat |\n", "| **Context Engineering** | Context offloading | Tool results can be summarized to save context space |\n", "| **Prompt Engineering** | Role setting | \"You are a technical writer\" improves docstring quality |\n", "\n", "### Key Insight\n", "\n", "The `augment=True` flag in `@toolbox.register_tool(augment=True)` triggers:\n", "1. **Docstring augmentation** β€” LLM rewrites the docstring to be clearer and more searchable\n", "2. **Synthetic query generation** β€” LLM generates example queries that would need this tool\n", "3. 
**Rich embedding** β€” Combines name + augmented docstring + signature + queries for better retrieval\n",
"\n",
"This means a simple one-line docstring like `\"Search the web\"` becomes a rich, detailed description that's much more likely to be retrieved when the user asks something like `\"What's the latest news about AI?\"`" ] }, { "cell_type": "code", "execution_count": null, "id": "8d418dd1", "metadata": {}, "outputs": [], "source": [ "import inspect\n",
"import uuid\n",
"from typing import Callable, Optional, Union\n",
"from pydantic import BaseModel\n",
"\n",
"def get_embedding(text: str) -> list[float]:\n",
"    \"\"\"\n",
"    Get the embedding for a text using the configured embedding model.\n",
"    \"\"\"\n",
"    return embedding_model.embed_query(text)\n",
"\n",
"\n",
"class ToolMetadata(BaseModel):\n",
"    \"\"\"Metadata for a registered tool.\"\"\"\n",
"    name: str\n",
"    description: str\n",
"    signature: str\n",
"    parameters: dict\n",
"    return_type: str\n",
"\n",
"\n",
"class Toolbox:\n",
"    \"\"\"\n",
"    A toolbox for registering, storing, and retrieving tools with LLM-powered augmentation.\n",
"    \n",
"    Tools are stored with embeddings for semantic retrieval, allowing the agent to\n",
"    find relevant tools based on natural language queries.\n",
"    \"\"\"\n",
"    \n",
"    def __init__(self, memory_manager, llm_client, model: str = \"gpt-5\"):\n",
"        \"\"\"\n",
"        Initialize the Toolbox.\n",
"        \n",
"        Args:\n",
"            memory_manager: MemoryManager instance for storing tools\n",
"            llm_client: OpenAI client for LLM augmentation\n",
"            model: Model to use for augmentation (default: gpt-5)\n",
"        \"\"\"\n",
"        self.memory_manager = memory_manager\n",
"        self.llm_client = llm_client\n",
"        self.model = model\n",
"        self._tools: dict[str, Callable] = {}  # Maps tool_id -> callable\n",
"        self._tools_by_name: dict[str, Callable] = {}  # Maps function_name -> callable for execution\n",
"    \n",
"    def _augment_docstring(self, docstring: str) -> str:\n",
"        \"\"\"\n",
"        Use LLM to improve and expand a tool's docstring.\n",
"        \n",
"        Takes a basic docstring and returns an enhanced version with:\n",
"        - Clearer description of what the tool does\n",
"        - Better formatted parameters and return values\n",
"        - Usage examples and edge cases\n",
"        \n",
"        Args:\n",
"            docstring: The original docstring to augment\n",
"        \n",
"        Returns:\n",
"            An improved, more detailed docstring\n",
"        \"\"\"\n",
"        if not docstring.strip():\n",
"            return \"No description provided.\"\n",
"\n",
"\n",
"        # NOTE: The \"technical writer\" role description below is a prompt engineering technique used to improve the quality of the docstring.\n",
"        # Although some research suggests that role descriptions don't really affect the quality of the LLM's output,\n",
"        # it is still a useful [prompt engineering] technique to know.\n",
"        prompt = f\"\"\"You are a technical writer. Improve the following function docstring to be more clear, \n",
"    comprehensive, and useful. Include:\n",
"    1. A clear, concise summary\n",
"    2. Detailed description of what the function does\n",
"    3. When to use this function\n",
"    4. 
Any important notes or caveats\n", "\n", " Original docstring:\n", " {docstring}\n", "\n", " Return ONLY the improved docstring, no other text.\n", " \"\"\"\n", "\n", " response = self.llm_client.chat.completions.create(\n", " model=self.model,\n", " messages=[{\"role\": \"user\", \"content\": prompt}],\n", " max_completion_tokens=500\n", " )\n", " \n", " return response.choices[0].message.content.strip()\n", " \n", " def _generate_queries(self, docstring: str, num_queries: int = 5) -> list[str]:\n", " \"\"\"\n", " Generate synthetic example queries that would lead to using this tool.\n", " \n", " These queries are used to improve retrieval - by embedding both the tool\n", " description AND example queries, we increase the chances of finding the\n", " right tool when the user asks a related question.\n", " \n", " Args:\n", " docstring: The tool's docstring (ideally augmented)\n", " num_queries: Number of example queries to generate\n", " \n", " Returns:\n", " List of example natural language queries\n", " \"\"\"\n", " prompt = f\"\"\"Based on the following tool description, generate {num_queries} diverse example queries \n", " that a user might ask when they need this tool. Make them natural and varied.\n", "\n", " Tool description:\n", " {docstring}\n", "\n", " Return ONLY a JSON array of strings, like: [\"query1\", \"query2\", ...]\n", " \"\"\"\n", "\n", " response = self.llm_client.chat.completions.create(\n", " model=self.model,\n", " messages=[{\"role\": \"user\", \"content\": prompt}],\n", " max_completion_tokens=300\n", " )\n", " \n", " try:\n", " import json\n", " queries = json.loads(response.choices[0].message.content.strip())\n", " return queries if isinstance(queries, list) else []\n", " except json.JSONDecodeError:\n", " # Fallback: extract queries from text\n", " return [response.choices[0].message.content.strip()]\n", " \n", " def _get_tool_metadata(self, func: Callable) -> ToolMetadata:\n", " \"\"\"\n", " Extract metadata from a function for storage and retrieval.\n", " \n", " Args:\n", " func: The function to extract metadata from\n", " \n", " Returns:\n", " ToolMetadata object with function details\n", " \"\"\"\n", " sig = inspect.signature(func)\n", " \n", " # Extract parameter info\n", " parameters = {}\n", " for name, param in sig.parameters.items():\n", " param_info = {\"name\": name}\n", " if param.annotation != inspect.Parameter.empty:\n", " param_info[\"type\"] = str(param.annotation)\n", " if param.default != inspect.Parameter.empty:\n", " param_info[\"default\"] = str(param.default)\n", " parameters[name] = param_info\n", " \n", " # Extract return type\n", " return_type = \"Any\"\n", " if sig.return_annotation != inspect.Signature.empty:\n", " return_type = str(sig.return_annotation)\n", " \n", " return ToolMetadata(\n", " name=func.__name__,\n", " description=func.__doc__ or \"No description\",\n", " signature=str(sig),\n", " parameters=parameters,\n", " return_type=return_type\n", " )\n", " \n", " def register_tool(\n", " self, func: Optional[Callable] = None, augment: bool = False\n", " ) -> Union[str, Callable]:\n", " \"\"\"\n", " Register a function as a tool in the toolbox.\n", "\n", " Can be used as a decorator or called directly:\n", " \n", " @toolbox.register_tool\n", " def my_tool(): ...\n", " \n", " @toolbox.register_tool(augment=True)\n", " def my_enhanced_tool(): ...\n", " \n", " tool_id = toolbox.register_tool(some_function)\n", "\n", " Parameters:\n", " -----------\n", " func : Callable, optional\n", " The function to register as a tool. 
If None, returns a decorator.\n",
"        augment : bool, optional\n",
"            Whether to augment the tool docstring and generate synthetic queries\n",
"            using the configured LLM provider.\n",
"        \n",
"        Returns:\n",
"        --------\n",
"        Union[str, Callable]\n",
"            If func is provided, returns the tool ID. Otherwise returns a decorator.\n",
"        \"\"\"\n",
"\n",
"        def decorator(f: Callable) -> str:\n",
"            docstring = f.__doc__ or \"\"\n",
"            signature = str(inspect.signature(f))\n",
"            object_id = uuid.uuid4()\n",
"            object_id_str = str(object_id)\n",
"\n",
"            # NOTE: Augmentation is a [memory engineering] technique: the LLM improves the quality of the\n",
"            # tool's docstring to enhance the tool's discoverability and retrieval.\n",
"            if augment:\n",
"                # Use LLM to enhance the tool's discoverability\n",
"                augmented_docstring = self._augment_docstring(docstring)\n",
"                queries = self._generate_queries(augmented_docstring)\n",
"                \n",
"                # Create rich embedding text combining all information\n",
"                embedding_text = f\"{f.__name__} {augmented_docstring} {signature} {' '.join(queries)}\"\n",
"                embedding = get_embedding(embedding_text)\n",
"                \n",
"                tool_data = self._get_tool_metadata(f)\n",
"                tool_data.description = augmented_docstring  # Use augmented description\n",
"\n",
"                tool_dict = {\n",
"                    \"_id\": object_id_str,  # Use string, not UUID object\n",
"                    \"embedding\": embedding,\n",
"                    \"queries\": queries,\n",
"                    \"augmented\": True,\n",
"                    **tool_data.model_dump(),\n",
"                }\n",
"            else:\n",
"                # Basic registration without augmentation\n",
"                embedding = get_embedding(f\"{f.__name__} {docstring} {signature}\")\n",
"                tool_data = self._get_tool_metadata(f)\n",
"\n",
"                tool_dict = {\n",
"                    \"_id\": object_id_str,  # Use string, not UUID object\n",
"                    \"embedding\": embedding,\n",
"                    \"augmented\": False,\n",
"                    **tool_data.model_dump(),\n",
"                }\n",
"\n",
"            # Store the tool in the toolbox memory for retrieval\n",
"            # The embedding enables semantic search to find relevant tools\n",
"            self.memory_manager.write_toolbox(\n",
"                f\"{f.__name__} {docstring} {signature}\", \n",
"                tool_dict\n",
"            )\n",
"            \n",
"            # Keep reference to the callable for execution\n",
"            self._tools[object_id_str] = f\n",
"            self._tools_by_name[f.__name__] = f  # Also store by name for easy lookup\n",
"            return object_id_str\n",
"\n",
"        if func is None:\n",
"            return decorator\n",
"        return decorator(func)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b6de020f", "metadata": {}, "outputs": [], "source": [ "import os\n",
"import getpass\n",
"\n",
"# Function to securely get and set environment variables\n",
"def set_env_securely(var_name, prompt):\n",
"    value = getpass.getpass(prompt)\n",
"    os.environ[var_name] = value\n" ] }, { "cell_type": "code", "execution_count": null, "id": "7eb6bacd", "metadata": {}, "outputs": [], "source": [ "set_env_securely(\"OPENAI_API_KEY\", \"OpenAI API Key: \")" ] }, { "cell_type": "code", "execution_count": null, "id": "fb27c840", "metadata": {}, "outputs": [], "source": [ "from openai import OpenAI\n",
"\n",
"client = OpenAI()\n",
"\n",
"# Initialize the Toolbox\n",
"toolbox = Toolbox(memory_manager=memory_manager, llm_client=client)" ] }, { "cell_type": "markdown", "id": "305bc1bd", "metadata": {}, "source": [ "# Part 4: Context Engineering Techniques\n",
"\n",
"--------\n" ] }, { "cell_type": "markdown", "id": "2e4ac84b", "metadata": {}, "source": [ "> **Context engineering** refers to the set of strategies for curating and maintaining the optimal set of tokens (information) during LLM 
inference, including all the other information that may land there outside of the prompts.\n",
"> \n",
"> β€” *Anthropic*\n",
"\n",
"While memory engineering focuses on *what to store and retrieve*, context engineering focuses on *how to manage what's in the context window right now*. This includes monitoring usage, compressing information, and providing just-in-time access to details.\n",
"\n",
"## What This Section Covers\n",
"\n",
"| Step | Function | Purpose |\n",
"|------|----------|---------|\n",
"| **1. Calculate Usage** | `calculate_context_usage()` | Monitor what % of the context window is used |\n",
"| **2. Summarize** | `summarise_context_window()` | Compress long content into summaries using LLM |\n",
"| **3. Compact** | `summarize_conversation()` / `summarize_and_store()` | Agent-triggered compaction when context gets long |\n",
"| **4. Just-in-Time Retrieval** | `expand_summary()` tool | Let agent expand summaries on demand |\n",
"\n",
"**`Just-In-Time (JIT)`** retrieval fetches only the information the agent needs at the exact moment it needs it, based on the current task, query, or reasoning step. Rather than preloading large histories or the full knowledge base upfront, the system dynamically filters, ranks, and injects only the information that materially influences the next step. This reduces context saturation, improves attention allocation, and increases reasoning fidelity.\n",
"\n",
"## The Context Management Flow\n",
"\n",
"```\n",
"Context built β†’ Check usage % β†’ Agent may compact (summarize) β†’ Store summary with ID\n",
"                     ↓\n",
"Agent sees: [Summary ID: abc123] Brief description  ← Agent can call expand_summary(\"abc123\") if needed\n",
"```\n",
"\n",
"This approach keeps the context lean while giving the agent access to full details when required."
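] }, { "cell_type": "markdown", "id": "jit0sketch", "metadata": {}, "source": [ "To make the flow concrete, here is a minimal, self-contained sketch of the compact-then-expand round trip. It uses a plain dictionary as a stand-in for the Oracle-backed summary store built below; the names `compact_to_pointer` and `expand` are illustrative, not part of the notebook's API:\n",
"\n",
"```python\n",
"# Stand-in store: summary_id -> full content (the real store is summary_vs)\n",
"summaries = {}\n",
"\n",
"def compact_to_pointer(full_context: str, sid: str) -> str:\n",
"    \"\"\"Offload full content; keep only a compact pointer in the context.\"\"\"\n",
"    summaries[sid] = full_context\n",
"    return f\"[Summary ID: {sid}] Earlier discussion (expand on demand)\"\n",
"\n",
"def expand(sid: str) -> str:\n",
"    \"\"\"Just-in-time retrieval: fetch the details only when actually needed.\"\"\"\n",
"    return summaries.get(sid, f\"Summary {sid} not found.\")\n",
"\n",
"pointer = compact_to_pointer(\"...thousands of tokens of older conversation...\", \"abc123\")\n",
"# The agent's context now holds only the pointer; it calls expand(\"abc123\")\n",
"# just-in-time when a question requires the original details.\n",
"print(pointer)\n",
"print(expand(\"abc123\"))\n",
"```"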
] }, { "cell_type": "code", "execution_count": null, "id": "d39643d7", "metadata": {}, "outputs": [], "source": [ "# Context window calculator - returns percentage used\n", "def calculate_context_usage(context: str, model: str = \"gpt-5\") -> dict:\n", " \"\"\"Calculate context window usage as percentage.\"\"\"\n", " estimated_tokens = len(context) // 4 # ~4 chars per token\n", " max_tokens = MODEL_TOKEN_LIMITS.get(model, 128000)\n", " percentage = (estimated_tokens / max_tokens) * 100\n", " return {\"tokens\": estimated_tokens, \"max\": max_tokens, \"percent\": round(percentage, 1)}\n" ] }, { "cell_type": "code", "execution_count": null, "id": "ea6db760", "metadata": {}, "outputs": [], "source": [ "# Context summariser - calls LLM and stores summary\n", "import uuid\n", "\n", "def summarise_context_window(content: str, memory_manager, llm_client, model: str = \"gpt-5\") -> dict:\n", " \"\"\"Summarise context window using LLM and store in summary memory.\"\"\"\n", " summary_prompt = f\"\"\"\n", "You are compressing an AI agent context window for later retrieval.\n", "The content may include conversation memory, retrieved papers, entities, workflows, and prior summaries.\n", "\n", "Produce a compact summary that preserves:\n", "- user goal and constraints\n", "- key facts/findings already established\n", "- important entities (paper titles, arXiv IDs, authors)\n", "- unresolved questions and next actions\n", "\n", "Output 4-7 short bullet points.\n", "Be faithful to the source, and do not add new facts.\n", "\n", "Context window content:\n", "{content[:3000]}\n", "\"\"\".strip()\n", "\n", " response = llm_client.chat.completions.create(\n", " model=model,\n", " messages=[{\"role\": \"user\", \"content\": summary_prompt}],\n", " max_completion_tokens=220\n", " )\n", " summary = response.choices[0].message.content\n", "\n", " desc_response = llm_client.chat.completions.create(\n", " model=model,\n", " messages=[{\"role\": \"user\", \"content\": f\"Write a short label (max 12 words) for this summary:\\n{summary}\"}],\n", " max_completion_tokens=40\n", " )\n", " description = desc_response.choices[0].message.content.strip()\n", "\n", " summary_id = str(uuid.uuid4())[:8]\n", " memory_manager.write_summary(summary_id, content, summary, description)\n", "\n", " return {\"id\": summary_id, \"description\": description, \"summary\": summary}\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b1a7538a", "metadata": {}, "outputs": [], "source": [ "# Context offloader - replaces content with summary reference\n", "def offload_to_summary(context: str, memory_manager, llm_client, threshold_percent: float = 80.0) -> tuple:\n", " \"\"\"If context exceeds threshold, summarise and return compacted version.\"\"\"\n", " usage = calculate_context_usage(context)\n", " \n", " if usage['percent'] < threshold_percent:\n", " return context, [] # No offload needed\n", " \n", " # Summarise the context\n", " result = summarise_context_window(context, memory_manager, llm_client)\n", " \n", " # Return compact reference instead of full content\n", " compact = f\"[Summary ID: {result['id']}] {result['description']}\"\n", " return compact, [result]\n" ] }, { "cell_type": "markdown", "id": "77c1efd3", "metadata": {}, "source": [ "### Step 1: Summary Tools & Conversation Compaction\n", "\n", "Below we register the `expand_summary` and `summarize_and_store` functions as tools the agent can call.\n", "\n", "#### Design Logic: Why Mark Instead of Delete?\n", "\n", "When conversation history grows large, we need to 
reduce context window usage. We had two choices:\n",
"\n",
"| Approach | Pros | Cons |\n",
"|----------|------|------|\n",
"| **Delete summarized messages** | Simple, immediate space savings | Permanent data loss, can't audit or recover |\n",
"| **Mark as summarized (our choice)** | Preserves history, reversible, auditable | Slightly more complex queries |\n",
"\n",
"**Our intuition:** Memory should be *compressed* or *forgotten*, not *erased*. By marking messages with a `summary_id` instead of deleting them:\n",
"\n",
"1. **Full history is preserved** β€” Original messages remain in the database for auditing, debugging, or reprocessing\n",
"2. **Linkage is maintained** β€” Each summary knows which messages it represents (via `summary_id`)\n",
"3. **Reversible** β€” If a summary is deleted, you could \"unsummarize\" by clearing the `summary_id`\n",
"\n",
"#### The Flow\n",
"\n",
"```\n",
"Thread has 50 messages β†’ Context too large β†’ summarize_conversation(thread_id)\n",
"                                  ↓\n",
"            1. Read unsummarized messages\n",
"            2. LLM summarizes them\n",
"            3. Store summary with unique ID\n",
"            4. UPDATE messages SET summary_id = 'abc123'\n",
"                                  ↓\n",
"            Next read: Only new messages appear + Summary ID reference\n",
"```\n",
"\n",
"This is a form of **log compaction** β€” a pattern borrowed from databases and message queues where old entries are compressed but not lost." ] }, { "cell_type": "code", "execution_count": null, "id": "54eb2bb6", "metadata": {}, "outputs": [], "source": [ "# Summary tools for the agent\n",
"@toolbox.register_tool(augment=True)\n",
"def expand_summary(summary_id: str) -> str:\n",
"    \"\"\"Expand a summary reference to full content, including the original conversation\n",
"    messages that were compacted into it. Use when you need more details from a [Summary ID: xxx] reference.\"\"\"\n",
"    summary_text = memory_manager.read_summary_memory(summary_id)\n",
"    \n",
"    # Also retrieve the original conversation messages linked to this summary\n",
"    original_msgs = memory_manager.get_messages_by_summary_id(summary_id)\n",
"    if original_msgs:\n",
"        lines = [f\"[{m['role']}] {m['content']}\" for m in original_msgs]\n",
"        return f\"Summary:\\n{summary_text}\\n\\nOriginal messages ({len(original_msgs)}):\\n\" + \"\\n\".join(lines)\n",
"    return summary_text\n",
"\n",
"@toolbox.register_tool(augment=True)\n",
"def summarize_and_store(text: str) -> str:\n",
"    \"\"\"Summarize a long text block and store it. Returns [Summary ID: ...] 
for later expansion.\"\"\"\n",
"    result = summarise_context_window(text, memory_manager, client)\n",
"    return f\"Stored as [Summary ID: {result['id']}] {result['description']}\"\n",
"\n",
"@toolbox.register_tool(augment=True)\n",
"def summarize_conversation(thread_id: str) -> str:\n",
"    \"\"\"\n",
"    Summarize unsummarized conversation units for a thread and mark those units with summary_id.\n",
"    Use this when conversation memory becomes long and you need context compaction.\n",
"    \"\"\"\n",
"    unsummarized = memory_manager.get_unsummarized_messages(thread_id, limit=200)\n",
"    if not unsummarized:\n",
"        return \"No unsummarized conversation units found.\"\n",
"\n",
"    full_text = \"\\n\".join([f\"[{m['role']}] {m['content']}\" for m in unsummarized])\n",
"    result = summarise_context_window(full_text, memory_manager, client)\n",
"\n",
"    message_ids = [m[\"id\"] for m in unsummarized]\n",
"    memory_manager.mark_as_summarized(thread_id, result['id'], message_ids=message_ids)\n",
"\n",
"    return f\"Conversation summarized as [Summary ID: {result['id']}] {result['description']}\"" ] }, { "cell_type": "markdown", "id": "d05404f5", "metadata": {}, "source": [ "# Part 5: Web Access with Tavily\n",
"\n",
"--------" ] }, { "cell_type": "markdown", "id": "a0cce642", "metadata": {}, "source": [ "This section demonstrates how to create an **agentic tool** that the LLM can call to search the web. \n",
"\n",
"We use [Tavily](https://tavily.com/), an AI-optimized search API designed for LLM applications.\n",
"\n",
"## What This Section Does\n",
"\n",
"1. **Initialize the Tavily client** β€” Set up the search API with an API key\n",
"2. **Register `search_tavily` as a tool** β€” Use `@toolbox.register_tool(augment=True)` to make it discoverable\n",
"3. **Implement the search-and-store pattern** β€” Results are automatically written to knowledge base memory\n",
"4. **Test tool retrieval** β€” Verify the tool can be found via semantic search\n",
"\n",
"## The Search-and-Store Pattern\n",
"\n",
"When the agent calls `search_tavily()`, it doesn't just return resultsβ€”it **persists them to the knowledge base**. This gives the agent external context that wasn't available at execution time, and it stores that context so it can be reused in subsequent iterations:\n",
"\n",
"```\n",
"Agent calls search_tavily(\"latest AI news\")\n",
"    ↓\n",
"Tavily API returns results\n",
"    ↓\n",
"Each result is written to knowledge_base_vs with metadata (title, URL, timestamp)\n",
"    ↓\n",
"Future queries can retrieve this information without searching again\n",
"```\n",
"\n",
"This pattern means the agent **learns** from its searches. Information discovered once becomes part of the agent's long-term memory, available for future conversations without additional API calls."
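] }, { "cell_type": "markdown", "id": "searchstore0sketch", "metadata": {}, "source": [ "As a minimal sketch of this reuse cycle, with a plain list standing in for the vector store and a stubbed search standing in for Tavily (note: in the notebook itself, the memory check happens in the agent's context-building step rather than inside the tool):\n",
"\n",
"```python\n",
"knowledge_base = []  # stand-in for the vector-backed knowledge base\n",
"\n",
"def search_and_store(query: str) -> list[str]:\n",
"    # 1. Check memory first: information discovered earlier is reused for free\n",
"    hits = [doc for doc in knowledge_base if query.lower() in doc.lower()]\n",
"    if hits:\n",
"        return hits  # no external API call needed\n",
"    # 2. Fall back to the external search (stubbed) and persist the results\n",
"    results = [f\"Stub result about {query}\"]\n",
"    knowledge_base.extend(results)\n",
"    return results\n",
"\n",
"search_and_store(\"AI agent memory\")  # external call + store\n",
"search_and_store(\"AI agent memory\")  # served from memory this time\n",
"```"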
] }, { "cell_type": "code", "execution_count": null, "id": "500d7836", "metadata": {}, "outputs": [], "source": [ "set_env_securely(\"TAVILY_API_KEY\", \"Tavily API Key: \")" ] }, { "cell_type": "code", "execution_count": null, "id": "6772460e", "metadata": {}, "outputs": [], "source": [ "from tavily import TavilyClient\n", "from datetime import datetime\n", "\n", "# Don't forget to set your API key!\n", "tavily_client = TavilyClient(api_key=os.environ[\"TAVILY_API_KEY\"])\n", "\n", "@toolbox.register_tool(augment=True)\n", "def search_tavily(query: str, max_results: int = 5):\n", " \"\"\"\n", " Use this function to search the web and store the results in the knowledge base.\n", " \"\"\"\n", " response = tavily_client.search(query=query, max_results=max_results)\n", " results = response.get(\"results\", [])\n", "\n", " # Write each result to the knowledge base\n", " for result in results:\n", " # Create the text content to embed\n", " text = f\"Title: {result.get('title', '')}\\nContent: {result.get('content', '')}\\nURL: {result.get('url', '')}\"\n", " \n", " # Create metadata\n", " metadata = {\n", " \"title\": result.get(\"title\", \"\"),\n", " \"url\": result.get(\"url\", \"\"),\n", " \"score\": result.get(\"score\", 0),\n", " \"source_type\": \"tavily_search\",\n", " \"query\": query,\n", " \"timestamp\": datetime.now().isoformat()\n", " }\n", " \n", " # Write to knowledge base\n", " memory_manager.write_knowledge_base(text, metadata)\n", "\n", " return results" ] }, { "cell_type": "code", "execution_count": null, "id": "7c7f8535", "metadata": {}, "outputs": [], "source": [ "import pprint\n", "retreived_tools = memory_manager.read_toolbox(\"Search the internet\")\n", "pprint.pprint(retreived_tools)" ] }, { "cell_type": "markdown", "id": "9e993b26", "metadata": {}, "source": [ "# Part 6: Agent Execution\n", "\n", "--------\n" ] }, { "cell_type": "markdown", "id": "7e94e836", "metadata": {}, "source": [ "This is where everything comes together. We build a complete **turn-level agent harness** that integrates all the memory types, context engineering, and tool calling we've implemented.\n", "\n", "## What This Section Contains\n", "\n", "| Component | Purpose |\n", "|-----------|---------|\n", "| `AGENT_SYSTEM_PROMPT` | Instructions telling the LLM how to use memory and tools |\n", "| `execute_tool()` | Looks up and executes tools from the toolbox by name |\n", "| `call_openai_chat()` | Wrapper for OpenAI Chat Completions API with tool support |\n", "| `call_agent()` | Turn-level harness for one agent run (build context, run tool-call loop, persist results) |\n" ] }, { "cell_type": "code", "execution_count": null, "id": "bc1e5e6e", "metadata": {}, "outputs": [], "source": [ "import json as json_lib\n", "\n", "client = OpenAI()\n", "\n", "# Persistent context-window tracker β€” survives across call_agent() invocations\n", "context_size_history = [] # list of (run_label, iteration, estimated_tokens)\n", "\n", "# ==================== SYSTEM PROMPT ====================\n", "AGENT_SYSTEM_PROMPT = \"\"\"\n", "# System Instructions\n", "You are a Research Paper Assistant with access to memory and tools.\n", "\n", "IMPORTANT: The user's input contains CONTEXT retrieved from multiple memory systems.\n", "Each memory section has a Purpose and When-to-use guide β€” follow them.\n", "\n", "## Memory Priority Order\n", "1. **Conversation Memory** β€” check what the user already asked and what you already answered.\n", "2. 
**Knowledge Base Memory** β€” cite facts from stored papers/documents before searching externally.\n", "3. **Entity Memory** β€” resolve named references (\"that author\", \"the system\") from here.\n", "4. **Workflow Memory** β€” reuse proven tool sequences for similar past queries.\n", "5. **Summary Memory** β€” expand a summary ID only when you need specific details from older context.\n", "\n", "## Tool Output Handling\n", "Tool call outputs are logged to a Tool Log table and replaced with compact references in context.\n", "The preview in each [Tool Log ...] reference contains enough to reason about the result.\n", "If you need the full output, it can be retrieved from the database β€” but prefer working with\n", "the preview and the knowledge base (where search results are also stored).\n", "\n", "## Context Management\n", "If conversation memory is getting long or repetitive, call summarize_conversation(thread_id) to compact it.\n", "Use summarization tools at your discretion when they improve context quality.\n", "\n", "When answering:\n", "1. FIRST, use the context provided in the input\n", "2. Expand summary IDs just-in-time when needed\n", "3. Use external search tools only if memory context is insufficient\n", "4. Keep responses evidence-based and aligned with retrieved research context\n", "\"\"\"\n", "\n", "def execute_tool(tool_name: str, tool_args: dict) -> str:\n", " \"\"\"Execute a tool by looking it up in the toolbox.\"\"\"\n", "\n", " if tool_name not in toolbox._tools_by_name:\n", " return f\"Error: Tool '{tool_name}' not found\"\n", "\n", " return str(toolbox._tools_by_name[tool_name](**tool_args) or \"Done\")\n", "\n", "# ==================== OPENAI CHAT FUNCTION ====================\n", "def call_openai_chat(messages: list, tools: list = None, model: str = \"gpt-5\"):\n", " \"\"\"Call OpenAI Chat Completions API with tools.\"\"\"\n", " kwargs = {\"model\": model, \"messages\": messages}\n", " if tools:\n", " kwargs[\"tools\"] = tools\n", " kwargs[\"tool_choice\"] = \"auto\"\n", " return client.chat.completions.create(**kwargs)" ] }, { "cell_type": "markdown", "id": "9f5c797b", "metadata": {}, "source": [ "## Step 1: The Turn-Level Agent Run Flow\n", "\n", "```\n", "1. BUILD CONTEXT (programmatic)\n", " β”œβ”€β”€ Read conversational memory (unsummarized chat units)\n", " β”œβ”€β”€ Read knowledge base (relevant documents)\n", " β”œβ”€β”€ Read workflow memory (past action patterns)\n", " β”œβ”€β”€ Read entity memory (people, places, systems)\n", " └── Read summary context (available summary IDs + descriptions)\n", "\n", "2. GET TOOLS (programmatic)\n", " └── Retrieve semantically relevant tools from toolbox\n", "\n", "3. STORE USER MESSAGE (programmatic)\n", " └── Persist the user message + best-effort entity extraction\n", "\n", "4. WITHIN-RUN TOOL-CALL LOOP (up to max_iterations and within max_execution_time_s)\n", " β”œβ”€β”€ Call LLM with context + tool schemas\n", " β”œβ”€β”€ If tool calls β†’ execute tools and append tool outputs\n", " β”œβ”€β”€ If tools changed memory (search/compaction) β†’ rebuild context for the next iteration\n", " └── If no tool calls β†’ finalize answer\n", "\n", "5. GUARDED STOP\n", " └── If iteration/time budget is hit β†’ force a final best-effort answer (no tools)\n", "\n", "6. 
SAVE RESULTS (programmatic)\n", " β”œβ”€β”€ Write workflow (if tools were used)\n", " β”œβ”€β”€ Best-effort entity extraction on final answer\n", " └── Store assistant response in conversational memory\n", "```\n", "\n", "## Key Design Decisions\n", "\n", "- **Memory is loaded programmatically** so the model always starts from a consistent state.\n", "- **Tool use is agent-triggered** but tool execution is deterministic (the harness executes exactly what was requested).\n", "- **Context and memory engineering are first-class**: compaction/search can refresh context within the same run.\n", "- **Guardrails matter**: iteration and time budgets prevent runaway loops; the harness can still produce a final answer.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "5ddc7b7f", "metadata": {}, "outputs": [], "source": [ "# ==================== TURN-LEVEL AGENT HARNESS (ONE RUN) ====================\n", "def call_agent(query: str, thread_id: str = \"1\", max_iterations: int = 10, max_execution_time_s: float = 60.0) -> str:\n", " \"\"\"Turn-level agent harness: build context, run tool-call loop, persist results.\n", " \n", " Appends (run_label, iteration, tokens) to the global context_size_history list\n", " so context growth can be visualised across multiple runs.\n", " \"\"\"\n", " thread_id = str(thread_id)\n", " steps = []\n", " run_label = f\"Run {len(set(r for r, _, _ in context_size_history)) + 1}\"\n", "\n", " import time\n", "\n", " start_time = time.time()\n", " timed_out = False\n", "\n", " # 1. Build context from memory\n", " print(\"\\n\" + \"=\"*50)\n", " print(\"🧠 BUILDING CONTEXT...\")\n", "\n", " def build_context() -> str:\n", " \"\"\"Rebuild the full context from the current memory state.\"\"\"\n", " ctx = f\"# Question\\n{query}\\n\\n\"\n", " ctx += memory_manager.read_conversational_memory(thread_id) + \"\\n\\n\"\n", " ctx += memory_manager.read_knowledge_base(query) + \"\\n\\n\"\n", " ctx += memory_manager.read_workflow(query) + \"\\n\\n\"\n", " ctx += memory_manager.read_entity(query) + \"\\n\\n\"\n", " ctx += memory_manager.read_summary_context(query) + \"\\n\\n\" # IDs + descriptions only\n", " return ctx\n", "\n", " context = build_context()\n", "\n", " print(\"====CONTEXT WINDOW=====\\n\")\n", " print(context)\n", "\n", " # 2. Check context usage (agent decides whether to summarize via tools)\n", " usage = calculate_context_usage(context)\n", " print(f\"πŸ“Š Context: {usage['percent']}% ({usage['tokens']}/{usage['max']} tokens)\")\n", " if usage['percent'] > 80:\n", " print(\"⚠️ Context >80% - agent may call summarize_conversation(thread_id) for compaction.\")\n", "\n", " # 3. Get tools\n", " dynamic_tools = memory_manager.read_toolbox(query, k=5)\n", "\n", " # Ensure summary tools are available for discretionary compaction/JIT expansion\n", " summary_tool_candidates = memory_manager.read_toolbox(\n", " \"summarize conversation compact context expand summary memory\", k=5\n", " )\n", " must_have = {\"expand_summary\", \"summarize_conversation\", \"summarize_and_store\"}\n", " existing = {t.get(\"function\", {}).get(\"name\") for t in dynamic_tools}\n", "\n", " for tool in summary_tool_candidates:\n", " name = tool.get(\"function\", {}).get(\"name\")\n", " if name in must_have and name not in existing:\n", " dynamic_tools.append(tool)\n", " existing.add(name)\n", "\n", " print(f\"πŸ”§ Tools: {[t['function']['name'] for t in dynamic_tools]}\")\n", "\n", " # 4. 
Store user message & extract entities\n", " memory_manager.write_conversational_memory(query, \"user\", thread_id)\n", " try:\n", " memory_manager.write_entity(\"\", \"\", \"\", llm_client=client, text=query)\n", " except:\n", " pass\n", "\n", " # 5. Within-run tool-call loop\n", " messages = [{\"role\": \"system\", \"content\": AGENT_SYSTEM_PROMPT}, {\"role\": \"user\", \"content\": context}]\n", " final_answer = \"\"\n", "\n", " # Estimate tool schema tokens (sent with every API call)\n", " tool_schema_tokens = len(json_lib.dumps(dynamic_tools)) // 4 if dynamic_tools else 0\n", "\n", " print(\"\\nπŸ€– TOOL-CALL LOOP\")\n", " for iteration in range(max_iterations):\n", " print(f\"\\n--- Iteration {iteration + 1} ---\")\n", "\n", " # Record context window size to the global tracker (messages + tool schemas)\n", " total_chars = sum(len(m.get(\"content\", \"\") or \"\") for m in messages)\n", " est_tokens = (total_chars // 4) + tool_schema_tokens\n", " context_size_history.append((run_label, iteration + 1, est_tokens))\n", "\n", " if max_execution_time_s is not None:\n", " elapsed = time.time() - start_time\n", " if elapsed > max_execution_time_s:\n", " timed_out = True\n", " print(f\"\\n⏱️ Time limit reached ({elapsed:.1f}s > {max_execution_time_s:.1f}s). Finalizing...\")\n", " break\n", "\n", " response = call_openai_chat(messages, tools=dynamic_tools)\n", " msg = response.choices[0].message\n", "\n", " if msg.tool_calls:\n", " messages.append({\"role\": \"assistant\", \"content\": msg.content, \"tool_calls\": [\n", " {\"id\": tc.id, \"type\": \"function\", \"function\": {\"name\": tc.function.name, \"arguments\": tc.function.arguments}}\n", " for tc in msg.tool_calls\n", " ]})\n", "\n", " for tc in msg.tool_calls:\n", " tool_name = tc.function.name\n", " raw_args = tc.function.arguments or \"{}\"\n", " try:\n", " tool_args = json_lib.loads(raw_args)\n", " except Exception as e:\n", " result = f\"Error: invalid JSON tool arguments for {tool_name}: {e}. Raw: {raw_args}\"\n", " print(f\"πŸ› οΈ {tool_name}()\")\n", " steps.append(f\"{tool_name}() β†’ failed\")\n", " messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": result})\n", " continue\n", "\n", " if not isinstance(tool_args, dict):\n", " result = f\"Error: tool arguments for {tool_name} must be a JSON object. Got {type(tool_args).__name__}.\"\n", " print(f\"πŸ› οΈ {tool_name}()\")\n", " steps.append(f\"{tool_name}() β†’ failed\")\n", " messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": result})\n", " continue\n", "\n", " # Ensure conversation compaction always targets the active thread.\n", " if tool_name == \"summarize_conversation\":\n", " tool_args[\"thread_id\"] = thread_id\n", "\n", " args_display = {k: (v[:50] + '...' 
if isinstance(v, str) and len(v) > 50 else v)\n", " for k, v in tool_args.items()}\n", " print(f\"πŸ› οΈ {tool_name}({args_display})\")\n", "\n", " if max_execution_time_s is not None:\n", " elapsed = time.time() - start_time\n", " if elapsed > max_execution_time_s:\n", " timed_out = True\n", " result = f\"Error: time limit reached before executing tool {tool_name}.\"\n", " steps.append(f\"{tool_name}({args_display}) β†’ failed\")\n", " print(f\" β†’ {result}\")\n", " messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": result})\n", " break\n", "\n", " try:\n", " result = execute_tool(tool_name, tool_args)\n", " steps.append(f\"{tool_name}({args_display}) β†’ success\")\n", " except Exception as e:\n", " result = f\"Error: {e}\"\n", " steps.append(f\"{tool_name}({args_display}) β†’ failed\")\n", "\n", " print(f\" β†’ {result[:200]}...\")\n", "\n", " # Offload tool output to TOOL_LOG table (experimental memory).\n", " # Full output is persisted in the DB; only a compact reference\n", " # stays in the messages list to keep the context window lean.\n", " compact_result = memory_manager.write_tool_log(\n", " thread_id, tc.id, tool_name, raw_args, str(result)\n", " )\n", " messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": compact_result})\n", "\n", " # If tools changed memory state, refresh context for the next iteration.\n", " if tool_name in {\"search_tavily\", \"summarize_conversation\", \"summarize_and_store\"}:\n", " context = build_context()\n", " if len(messages) >= 2 and messages[1].get(\"role\") == \"user\":\n", " messages[1][\"content\"] = context\n", " usage = calculate_context_usage(context)\n", " print(f\" Refreshed context: {usage['percent']}% ({usage['tokens']}/{usage['max']} tokens)\")\n", "\n", " if timed_out:\n", " break\n", " else:\n", " final_answer = msg.content or \"\"\n", " print(f\"\\nβœ… DONE ({len(steps)} tool calls)\")\n", " break\n", "\n", " if not final_answer:\n", " reason = \"time limit\" if timed_out else \"iteration limit\"\n", " print(f\"\\n⚠️ Stopped due to {reason}. Generating best-effort final answer (no tools)...\")\n", " try:\n", " final_messages = messages + [{\"role\": \"user\", \"content\": \"Finalize your answer using the context and tool outputs so far. Do not call tools.\"}]\n", " final_resp = call_openai_chat(final_messages, tools=None)\n", " final_answer = final_resp.choices[0].message.content or \"\"\n", " except Exception as e:\n", " final_answer = f\"Error: unable to finalize answer: {e}\"\n", "\n", " # 6. 
Save workflow & entities\n", " if steps:\n", " memory_manager.write_workflow(query, steps, final_answer)\n", " try:\n", " memory_manager.write_entity(\"\", \"\", \"\", llm_client=client, text=final_answer)\n", " except:\n", " pass\n", " memory_manager.write_conversational_memory(final_answer, \"assistant\", thread_id)\n", "\n", " print(\"\\n\" + \"=\"*50 + f\"\\nπŸ’¬ ANSWER:\\n{final_answer}\\n\" + \"=\"*50)\n", " return final_answer" ] }, { "cell_type": "code", "execution_count": null, "id": "9dd88f84", "metadata": {}, "outputs": [], "source": [ "call_agent(\"What was my first question to you\", thread_id=\"0022\")" ] }, { "cell_type": "code", "execution_count": null, "id": "1iqefdfwkbz", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "if context_size_history:\n", " tokens = [t for _, _, t in context_size_history]\n", "\n", " plt.figure(figsize=(8, 3))\n", " plt.plot(range(1, len(tokens) + 1), tokens, marker=\"o\")\n", " plt.xlabel(\"Global Iteration (across all runs)\")\n", " plt.ylabel(\"Estimated Tokens\")\n", " plt.title(\"Context Window Size Over Agent Iterations\")\n", " plt.tight_layout()\n", " plt.show()\n", "else:\n", " print(\"No iterations recorded β€” run call_agent() first.\")" ] }, { "cell_type": "markdown", "id": "1cdasgb4qzj", "metadata": {}, "source": [ "## Step 2: Baseline β€” Agent Without Context Engineering\n", "\n", "To appreciate the impact of the memory and context engineering techniques we've built, it helps to see what happens **without them**.\n", "\n", "`call_agent_naive` is a stripped-down agent harness that deliberately removes three key optimisations:\n", "\n", "| Technique Removed | What Happens Instead | Effect on Context Window |\n", "|---|---|---|\n", "| **Tool output offloading** (`write_tool_log`) | Full raw tool outputs stay in the `messages` list | Each tool call adds thousands of tokens (e.g. a web search returns ~2-4k tokens of results) |\n", "| **Summarisation tools** (`summarize_conversation`, `summarize_and_store`) | Excluded from the tool list β€” the agent has no way to compact context | Context only grows, never shrinks |\n", "| **Context refresh after search** | No rebuild from memory after tool calls | Stale + bloated context persists across iterations |\n", "| **Memory-backed context rebuild** | Messages persist as one flat list across calls | No separation of concerns β€” everything accumulates |\n", "\n", "### Why This Matters\n", "\n", "In a real agent loop, the LLM is called **once per iteration** with the full `messages` list. Without offloading, every tool output ever produced sits in that list. After just 3 web searches, the context could grow by 10,000+ tokens β€” consuming budget that could be used for reasoning.\n", "\n", "The comparison chart below plots both approaches on the same axis so you can see the divergence." ] }, { "cell_type": "code", "execution_count": null, "id": "vgycbj6ypih", "metadata": {}, "outputs": [], "source": [ "# Separate tracker for the naive agent\n", "naive_context_size_history = []\n", "# Persistent messages per thread β€” simulates no context management across runs\n", "_naive_messages_by_thread = {}\n", "\n", "def call_agent_naive(query: str, thread_id: str = \"naive_1\", dynamic_tools_override: list = None, max_iterations: int = 10, max_execution_time_s: float = 60.0) -> str:\n", " \"\"\"Naive agent harness β€” NO context engineering.\n", " \n", " Differences from call_agent:\n", " 1. 
Full raw tool outputs stay in messages (no write_tool_log offloading)\n", " 2. No summarisation tools available (agent cannot compact context)\n", " 3. No context refresh after memory-mutating tools\n", " 4. Messages persist across calls β€” context only grows, never shrinks\n", " 5. No memory reads β€” conversation history IS the raw messages list\n", " \"\"\"\n", " thread_id = str(thread_id)\n", " steps = []\n", " import time\n", " start_time = time.time()\n", " timed_out = False\n", "\n", " # Get tools β€” but exclude summarisation tools\n", " if dynamic_tools_override is not None:\n", " dynamic_tools = dynamic_tools_override\n", " else:\n", " dynamic_tools = memory_manager.read_toolbox(query, k=5)\n", " dynamic_tools = [t for t in dynamic_tools\n", " if t.get(\"function\", {}).get(\"name\") not in\n", " {\"summarize_conversation\", \"summarize_and_store\", \"expand_summary\"}]\n", "\n", " # Initialize or reuse persistent messages for this thread.\n", " # No memory reads β€” the raw messages list IS the only context.\n", " # This is the naive approach: everything accumulates in one flat list.\n", " if thread_id not in _naive_messages_by_thread:\n", " _naive_messages_by_thread[thread_id] = [\n", " {\"role\": \"system\", \"content\": \"You are a Research Paper Assistant with access to tools.\"}\n", " ]\n", " messages = _naive_messages_by_thread[thread_id]\n", "\n", " # Just append the raw query β€” no build_context(), no memory reads.\n", " # Prior turns, tool outputs, and assistant responses are already in messages.\n", " messages.append({\"role\": \"user\", \"content\": query})\n", " final_answer = \"\"\n", "\n", " # Estimate tool schema tokens (included in every API call)\n", " tool_schema_chars = len(json_lib.dumps(dynamic_tools)) if dynamic_tools else 0\n", " tool_schema_tokens = tool_schema_chars // 4\n", "\n", " for iteration in range(max_iterations):\n", " # Track context size: messages + tool schemas\n", " msg_chars = sum(len(m.get(\"content\", \"\") or \"\") for m in messages)\n", " naive_context_size_history.append((msg_chars // 4) + tool_schema_tokens)\n", "\n", " if max_execution_time_s and (time.time() - start_time) > max_execution_time_s:\n", " timed_out = True\n", " break\n", "\n", " response = call_openai_chat(messages, tools=dynamic_tools)\n", " msg = response.choices[0].message\n", "\n", " if msg.tool_calls:\n", " messages.append({\"role\": \"assistant\", \"content\": msg.content, \"tool_calls\": [\n", " {\"id\": tc.id, \"type\": \"function\", \"function\": {\"name\": tc.function.name, \"arguments\": tc.function.arguments}}\n", " for tc in msg.tool_calls\n", " ]})\n", " for tc in msg.tool_calls:\n", " tool_args = json_lib.loads(tc.function.arguments or \"{}\")\n", " try:\n", " result = execute_tool(tc.function.name, tool_args)\n", " steps.append(f\"{tc.function.name} β†’ success\")\n", " except Exception as e:\n", " result = f\"Error: {e}\"\n", " steps.append(f\"{tc.function.name} β†’ failed\")\n", "\n", " # KEY DIFFERENCE: raw tool output goes straight into messages (no offloading)\n", " messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": str(result)})\n", " # KEY DIFFERENCE: no context refresh\n", " else:\n", " final_answer = msg.content or \"\"\n", " break\n", "\n", " if not final_answer:\n", " try:\n", " messages.append({\"role\": \"user\", \"content\": \"Finalize your answer. 
Do not call tools.\"})\n", " final_answer = call_openai_chat(messages, tools=None).choices[0].message.content or \"\"\n", " except Exception as e:\n", " final_answer = f\"Error: {e}\"\n", "\n", " # Append assistant answer to persistent messages (it stays for the next call)\n", " messages.append({\"role\": \"assistant\", \"content\": final_answer})\n", " print(f\"βœ… Naive agent done ({len(steps)} tool calls, {len(messages)} messages in context)\")\n", " return final_answer" ] }, { "cell_type": "code", "execution_count": null, "id": "93lhzlgzars", "metadata": {}, "outputs": [], "source": [ "import uuid\n", "\n", "# Reset all trackers for a clean comparison\n", "context_size_history.clear()\n", "naive_context_size_history.clear()\n", "_naive_messages_by_thread.clear()\n", "\n", "# Generate unique thread IDs for isolation\n", "eng_thread = str(uuid.uuid4())[:8]\n", "naive_thread = str(uuid.uuid4())[:8]\n", "\n", "# Five progressive queries that build on each other β€” tests memory continuity\n", "queries = [\n", " \"Search for recent papers on AI agent memory published in 2026\",\n", " \"Pick the 3rd paper from the list and give me the key takeaways\",\n", " \"What other viewpoints or approaches might that paper have missed?\",\n", " \"Summarize everything we've discussed so far\",\n", " \"What was the first question I asked in this conversation?\",\n", "]\n", "\n", "for i, q in enumerate(queries, 1):\n", " print(\"=\" * 60)\n", " print(f\"QUERY {i}/5 β€” WITH CONTEXT ENGINEERING (thread: {eng_thread})\")\n", " print(f\" >> {q}\")\n", " print(\"=\" * 60)\n", " call_agent(q, thread_id=eng_thread)\n", " print(\"\\n\")\n", "\n", "for i, q in enumerate(queries, 1):\n", " print(\"=\" * 60)\n", " print(f\"QUERY {i}/5 β€” NAIVE / NO CONTEXT ENGINEERING (thread: {naive_thread})\")\n", " print(f\" >> {q}\")\n", " print(\"=\" * 60)\n", " call_agent_naive(q, thread_id=naive_thread)\n", " print(\"\\n\")" ] }, { "cell_type": "code", "execution_count": null, "id": "oowm2ifh4t", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "eng_tokens = [t for _, _, t in context_size_history]\n", "naive_tokens = naive_context_size_history\n", "\n", "plt.figure(figsize=(9, 4))\n", "if eng_tokens:\n", " plt.plot(range(1, len(eng_tokens) + 1), eng_tokens, marker=\"o\", label=\"With Context/Memory Engineering\")\n", "if naive_tokens:\n", " plt.plot(range(1, len(naive_tokens) + 1), naive_tokens, marker=\"s\", label=\"Naive (no offloading/summarisation)\")\n", "plt.xlabel(\"Iteration\")\n", "plt.ylabel(\"Estimated Tokens\")\n", "plt.title(\"Context Window Growth: Engineered vs Naive Agent\")\n", "plt.legend()\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "d29dd1b2", "metadata": {}, "source": [ "## Key Learning for AI Developers: Agent Run, Tool-Call Loop, and Memory-Based Harness\n", "\n", "In OpenAI-style framing:\n", "- An **agent run** (one user turn handled) is what `call_agent(...)` executes.\n", "- Within a run, the **tool-call loop** repeats: model reasoning β†’ optional tool calls β†’ harness executes tools β†’ model observes results β†’ repeat until a final answer.\n", "\n", "An **agent harness** is the runtime scaffolding around that loop. 
In this notebook, it is a **memory-based agent harness** where:\n", "- context is assembled from multiple memory types each run\n", "- tools are discovered and executed within the run\n", "- outputs are written back into memory for future runs\n", "- summaries compact context while preserving continuity\n", "\n", "The key discipline is **context and memory engineering**:\n", "- decide what should be stored, retrieved, summarized, and refreshed\n", "- keep context windows relevant, not just large\n", "- treat memory as an evolving system that improves agent reliability over time\n", "\n", "The practical takeaway: strong agents are not just model prompts. They are run + harness systems, and memory engineering is the control layer that makes them reliable, stateful, and scalable.\n" ] }, { "cell_type": "markdown", "id": "7sp42fx6618", "metadata": {}, "source": [ "## Step 3: LLM-as-a-Judge β€” Response Quality Evaluation\n", "\n", "We've seen the **context window efficiency** difference between the two agents. But does better context engineering actually produce **better answers**?\n", "\n", "To find out, we use the **LLM-as-a-Judge** pattern: a separate LLM call evaluates both agent responses against the original query and picks a winner. This is a widely used technique for automated evaluation when ground-truth labels aren't available.\n", "\n", "| What the Judge Sees | What It Decides |\n", "|---|---|\n", "| The user query | Which response is more **accurate, complete, and relevant** |\n", "| Response A (memory-engineered agent) | A preference: **A**, **B**, or **Tie** |\n", "| Response B (naive agent) | A short explanation of its reasoning |\n", "\n", "> **Why a warmup phase?** The memory agent's advantage is **cumulative** β€” it stores conversational memory, entities, and workflows across turns while managing context size. On a brand-new conversation, both agents perform similarly. We first run 5 warmup queries to build up conversation state, then evaluate on queries that specifically test **recall, continuity, and synthesis** β€” the capabilities that memory engineering enables." 
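] }, { "cell_type": "markdown", "id": "judge0bias", "metadata": {}, "source": [ "One known weakness of LLM judges is **position bias**: the model may favour whichever response appears first. A common mitigation, not implemented in the cells below but easy to bolt on, is to judge each pair twice with the order swapped and count only verdicts that survive the swap. A minimal sketch, assuming a `judge_fn(query, first, second)` that returns a verdict dict like the one produced below:\n",
"\n",
"```python\n",
"def judge_order_robust(query, mem_resp, naive_resp, judge_fn):\n",
"    # Pass 1: memory agent shown as \"A\"; pass 2: shown as \"B\"\n",
"    v1 = judge_fn(query, mem_resp, naive_resp)[\"winner\"]\n",
"    v2 = judge_fn(query, naive_resp, mem_resp)[\"winner\"]\n",
"    # Map both verdicts onto agent labels before comparing\n",
"    first = {\"A\": \"memory\", \"B\": \"naive\", \"Tie\": \"tie\"}[v1]\n",
"    second = {\"A\": \"naive\", \"B\": \"memory\", \"Tie\": \"tie\"}[v2]\n",
"    # Only an order-stable preference counts as a win\n",
"    return first if first == second else \"tie\"\n",
"```"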
] }, { "cell_type": "code", "execution_count": null, "id": "xg8hcki218", "metadata": {}, "outputs": [], "source": [ "# ── Warmup phase: build up conversation history so the memory agent has state to leverage ──\n", "eval_thread_eng = str(uuid.uuid4())[:8]\n", "eval_thread_naive = str(uuid.uuid4())[:8]\n", "\n", "warmup_queries = [\n", " \"Search for recent papers on AI agent memory published in 2026\",\n", " \"Pick the 2nd paper from the list and give me the key takeaways\",\n", " \"What other approaches might that paper have missed?\",\n", " \"Search for papers on context window management in LLM agents\",\n", " \"Compare the findings from the two searches we did\",\n", "]\n", "\n", "print(\"πŸ”„ WARMUP β€” building conversation history on both agents...\\n\")\n", "for i, q in enumerate(warmup_queries, 1):\n", " print(f\" Warmup {i}/{len(warmup_queries)}: {q[:60]}...\")\n", " call_agent(q, thread_id=eval_thread_eng)\n", " call_agent_naive(q, thread_id=eval_thread_naive)\n", "\n", "# ── Evaluation phase: these queries test memory recall and continuity ──\n", "eval_queries = [\n", " \"What was the very first paper we discussed and what were its key points?\",\n", " \"Summarize the full arc of our conversation so far\",\n", " \"Based on everything we've discussed, what research gap would you recommend exploring next?\",\n", "]\n", "\n", "eval_results = []\n", "\n", "print(f\"\\n{'='*60}\\nπŸ“‹ EVALUATION β€” collecting response pairs for judging\\n{'='*60}\")\n", "for q in eval_queries:\n", " print(f\"\\nEVAL: {q}\")\n", " print(\" β–Ά Memory-engineered agent...\")\n", " eng_resp = call_agent(q, thread_id=eval_thread_eng)\n", " print(\" β–Ά Naive agent...\")\n", " naive_resp = call_agent_naive(q, thread_id=eval_thread_naive)\n", " eval_results.append((q, eng_resp, naive_resp))\n", "\n", "print(f\"\\nβœ… Collected {len(eval_results)} response pairs for judging.\")" ] }, { "cell_type": "code", "execution_count": null, "id": "fieklrjjfvg", "metadata": {}, "outputs": [], "source": [ "JUDGE_PROMPT = \"\"\"You are an impartial judge evaluating two AI assistant responses to a user query.\n", "\n", "**User Query:** {query}\n", "\n", "**Response A (Agent A):**\n", "{response_a}\n", "\n", "**Response B (Agent B):**\n", "{response_b}\n", "\n", "Evaluate both responses on:\n", "1. **Accuracy** β€” Are the facts correct and claims well-supported?\n", "2. **Completeness** β€” Does the response fully address the query?\n", "3. **Relevance** β€” Does it stay on-topic and use context appropriately?\n", "4. 
    "4. **Coherence** — Is it well-structured and easy to follow?\n",
    "\n",
    "Reply with EXACTLY this JSON format (no other text):\n",
    "{{\"winner\": \"A\" or \"B\" or \"Tie\", \"reason\": \"one sentence explanation\"}}\"\"\"\n",
    "\n",
    "\n",
    "def judge_responses(query, response_a, response_b):\n",
    "    \"\"\"Use the LLM to judge which response is better.\"\"\"\n",
    "    resp = client.chat.completions.create(\n",
    "        model=\"gpt-5\",\n",
    "        messages=[{\"role\": \"user\", \"content\": JUDGE_PROMPT.format(\n",
    "            query=query, response_a=response_a, response_b=response_b\n",
    "        )}],\n",
    "    )\n",
    "    # Assumes the judge honors the strict JSON-only instruction in JUDGE_PROMPT;\n",
    "    # json_lib.loads will raise if the model wraps the verdict in extra text.\n",
    "    return json_lib.loads(resp.choices[0].message.content)\n",
    "\n",
    "\n",
    "# Run the judge on each response pair\n",
    "judgments = []\n",
    "for query, eng_resp, naive_resp in eval_results:\n",
    "    verdict = judge_responses(query, eng_resp, naive_resp)\n",
    "    verdict[\"query\"] = query\n",
    "    judgments.append(verdict)\n",
    "    label = {\"A\": \"Memory Agent ✅\", \"B\": \"Naive Agent\", \"Tie\": \"Tie 🤝\"}\n",
    "    print(f\"Query: {query[:60]}...\")\n",
    "    print(f\"  Winner: {label.get(verdict['winner'], verdict['winner'])}\")\n",
    "    print(f\"  Reason: {verdict['reason']}\\n\")"
] }, { "cell_type": "code", "execution_count": null, "id": "rlwrrbz0xtf", "metadata": {}, "outputs": [], "source": [
    "# Visualize judge results\n",
    "wins = {\"Memory Agent\": 0, \"Naive Agent\": 0, \"Tie\": 0}\n",
    "for j in judgments:\n",
    "    if j[\"winner\"] == \"A\":\n",
    "        wins[\"Memory Agent\"] += 1\n",
    "    elif j[\"winner\"] == \"B\":\n",
    "        wins[\"Naive Agent\"] += 1\n",
    "    else:\n",
    "        wins[\"Tie\"] += 1\n",
    "\n",
    "colors = {\"Memory Agent\": \"#4CAF50\", \"Naive Agent\": \"#F44336\", \"Tie\": \"#9E9E9E\"}\n",
    "\n",
    "plt.figure(figsize=(6, 3))\n",
    "bars = plt.bar(wins.keys(), wins.values(), color=[colors[k] for k in wins])\n",
    "plt.ylabel(\"Queries Won\")\n",
    "plt.title(\"LLM Judge Preference: Memory Agent vs Naive Agent\")\n",
    "for bar, count in zip(bars, wins.values()):\n",
    "    if count > 0:\n",
    "        plt.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.05,\n",
    "                 str(count), ha=\"center\", va=\"bottom\", fontweight=\"bold\")\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(f\"\\nMemory Agent wins {wins['Memory Agent']}/{len(judgments)} queries.\")"
] }, { "cell_type": "code", "execution_count": null, "id": "52196f6d", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "playground", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.14" } }, "nbformat": 4, "nbformat_minor": 5 }