# Memory MCP Server Architecture

## High-Level Overview
Memory MCP Server is an autonomous memory system for AI agents, written in Rust. It combines semantic (vector) search, a knowledge graph, and code indexing in a single binary with no external dependencies.

### Key Components
1. **MCP Server**: Handles requests from clients (IDE, Agents).
2. **Embedding Subsystem**: Generates vectors locally using `candle` / `ort`.
3. **Storage Layer**: Embedded SurrealDB for storing vectors, graphs, and metadata.
4. **Codebase Engine**: Indexes code using Tree-sitter (in development).

## Component Diagram (C4 Container)

```mermaid
graph TD
    User[AI Agent / IDE]
    
    subgraph "Memory MCP Server"
        Handler[MCP Handler]
        
        subgraph "Logic Layer"
            L_Mem[Memory Logic]
            L_Search[Search Logic]
            L_Graph[Graph Logic]
            L_Code[Code Logic]
        end
        
        subgraph "Embedding Subsystem"
            E_Service[Embedding Service]
            E_Queue[Adaptive Queue]
            E_Worker[Embedding Worker]
            E_Engine["Inference Engine (Candle)"]
            E_Cache["Embedding Store (L1/L2)"]
        end
        
        subgraph "Storage Layer"
            S_Surreal[SurrealDB Access]
        end

        User -- "MCP Tools" --> Handler

        Handler -- "store_memory" --> L_Mem
        Handler -- "recall/search" --> L_Search
        Handler -- "create_relation" --> L_Graph
        Handler -- "index_project" --> L_Code
        
        L_Mem -- "Embed Content" --> E_Service
        L_Search -- "Embed Query" --> E_Service
        
        E_Service --> E_Queue
        E_Queue -- "Batched Requests" --> E_Worker
        E_Worker -- "Check Cache" --> E_Cache
        E_Worker -- "Run Model" --> E_Engine
        
        L_Mem & L_Search & L_Graph & L_Code --> S_Surreal
        
        S_Surreal -.-> DB[(Embedded Files)]
        E_Cache -.-> DB
    end
```

## Component Details & Algorithms

### 1. Logic Layer
Responsible for request handling, routing, and business logic implementation.

*   **Reciprocal Rank Fusion (RRF)**: Algorithm for merging search results from different sources (Vector Search, BM25, Knowledge Graph).
    *   *Why*: Vector search is good for semantics ("meaning"), BM25 for exact keyword matches, and Graph for relationships. RRF allows taking the best of all three worlds without complex weight tuning.
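    *   *Formula*: each source contributes `1.0 / (k + rank)` for a result at position `rank` in its list, where `k` is a smoothing constant; a result's final score is the sum of its contributions across all sources.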
*   **BM25**: Text search algorithm (Okapi BM25). Implemented on top of SurrealDB indexes.
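
The fusion step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `rrf_fuse`, the string IDs, and the choice of `k = 60.0` in the usage note are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Fuses several ranked lists of result IDs with Reciprocal Rank Fusion.
/// Every list adds `1.0 / (k + rank)` to the score of each ID it contains,
/// where `rank` is the 1-based position of the ID in that list.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties are broken by ID so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Calling `rrf_fuse(&[vector_hits, bm25_hits, graph_hits], 60.0)` returns IDs ordered by fused score. A result that ranks well in several lists rises to the top even if it is not first in any single one.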

### 2. Embedding Subsystem
Critical component for semantic search. Operates autonomously.

*   **Adaptive Queue**: Queue that applies backpressure by regulating the rate of vectorization requests.
    *   *Algorithm*: Monitors queue depth and delays new requests by `THROTTLE_DELAY_MS` once the queue is more than 80% full (`HIGH_WATERMARK`).
    *   *Why*: Prevents OOM (Out of Memory) during massive file indexing.
*   **Inference Engine (Candle)**: Uses the `candle` library (Hugging Face) to run BERT-like models (nomic-embed, e5) on CPU. Does not require Python.
*   **L1/L2 Cache**:
    *   L1: LRU Cache in RAM for most frequent requests.
    *   L2: Disk cache (Sled/SurrealDB) to avoid re-vectorizing unchanged content.
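
The backpressure rule of the Adaptive Queue might look roughly like the sketch below. Only the constant names and the 80% threshold come from the description above; the `AdaptiveQueue` type, its methods, and the 50 ms delay value are hypothetical.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Fill level, in percent, above which producers are slowed down.
const HIGH_WATERMARK: usize = 80;
/// Assumed delay value; the real setting is not given in this document.
const THROTTLE_DELAY_MS: u64 = 50;

/// Hypothetical bounded queue of pending embedding requests.
pub struct AdaptiveQueue<T> {
    items: VecDeque<T>,
    capacity: usize,
}

impl<T> AdaptiveQueue<T> {
    pub fn new(capacity: usize) -> Self {
        Self { items: VecDeque::with_capacity(capacity), capacity }
    }

    /// Delay a producer should apply before submitting its next request.
    pub fn throttle_delay(&self) -> Duration {
        // Integer comparison of len/capacity against the watermark percentage.
        if self.items.len() * 100 > self.capacity * HIGH_WATERMARK {
            Duration::from_millis(THROTTLE_DELAY_MS)
        } else {
            Duration::ZERO
        }
    }

    /// Enqueues a request, handing it back to the caller if the queue is full.
    pub fn push(&mut self, item: T) -> Result<(), T> {
        if self.items.len() >= self.capacity {
            return Err(item);
        }
        self.items.push_back(item);
        Ok(())
    }

    /// Removes up to `max` requests from the front, forming one worker batch.
    pub fn pop_batch(&mut self, max: usize) -> Vec<T> {
        let n = max.min(self.items.len());
        self.items.drain(..n).collect()
    }
}
```

A producer would sleep for `throttle_delay()` before each `push`, so indexing slows down gradually as the queue fills instead of growing memory without bound.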

### 3. Graph Algorithms
Used for analyzing relationships between entities (files, functions, notes).

*   **Personalized PageRank (PPR)**: Algorithm for ranking graph nodes relative to "seed" nodes.
    *   *Application*: When a user searches for "Authorization", we find the "Authorization" node, and PPR finds all related concepts (e.g., "Login", "JWT", "OAuth"), even if the text doesn't contain the word "Authorization".
    *   *Hub Dampening*: Modification to reduce the weight of "super-nodes" (linked to everything) to avoid noise.
*   **Leiden Algorithm**: Community Detection algorithm.
    *   *Why*: Groups closely related files or concepts into clusters. Helps understand the modular structure of the project.
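
To make the ranking step concrete, here is a minimal Personalized PageRank sketch using power iteration. The adjacency-list representation, the function name, and the parameter names are assumptions for illustration. Hub Dampening is not modeled, since its exact form is not specified here.

```rust
/// Personalized PageRank by power iteration.
/// `adjacency[i]` lists the nodes that node `i` links to.
/// `alpha` is the restart probability: the chance of jumping back to a seed.
/// Assumes every seed and neighbor index is smaller than `adjacency.len()`.
pub fn personalized_pagerank(
    adjacency: &[Vec<usize>],
    seeds: &[usize],
    alpha: f64,
    iterations: usize,
) -> Vec<f64> {
    let n = adjacency.len();
    if n == 0 || seeds.is_empty() {
        return vec![0.0; n];
    }
    // Restart distribution: uniform over the seed nodes.
    let mut restart = vec![0.0; n];
    for &s in seeds {
        restart[s] += 1.0 / seeds.len() as f64;
    }
    let mut rank = restart.clone();
    for _ in 0..iterations {
        // Every step starts with the restart mass flowing back to the seeds.
        let mut next: Vec<f64> = restart.iter().map(|r| alpha * r).collect();
        for (node, neighbors) in adjacency.iter().enumerate() {
            let mass = (1.0 - alpha) * rank[node];
            if neighbors.is_empty() {
                // Dangling node: return its mass to the seeds.
                for (i, r) in restart.iter().enumerate() {
                    next[i] += mass * r;
                }
            } else {
                // Split the remaining mass evenly among outgoing links.
                let share = mass / neighbors.len() as f64;
                for &nb in neighbors {
                    next[nb] += share;
                }
            }
        }
        rank = next;
    }
    rank
}
```

Seeding the walk at the "Authorization" node gives non-zero scores only to nodes reachable from it, which is how related concepts such as "Login" or "JWT" surface without matching the query text.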

### 4. Codebase Engine
Responsible for understanding code.

*   **Tree-Sitter Chunking**: Splits code into fragments (chunks) along Abstract Syntax Tree (AST) boundaries rather than by line count.
    *   *Logic*: Respects function and class boundaries. Large functions are broken down into smaller logical blocks, preserving context.
    *   *Why*: Vector search works better with logically complete code pieces than with arbitrary text slices.
*   **Content Hashing (Blake3)**: Fast hashing for deduplication. If a file hasn't changed, it's not re-indexed.
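
The skip-if-unchanged check can be sketched as below. To keep the example free of external crates, it uses the standard library's `DefaultHasher` as a stand-in for Blake3. That is a substitution made for illustration only, and the `IndexState` type is hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in digest. The engine described above uses Blake3; `DefaultHasher`
/// is used here only so the sketch compiles without dependencies.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical record of the last indexed hash per file path.
#[derive(Default)]
pub struct IndexState {
    hashes: HashMap<String, u64>,
}

impl IndexState {
    /// Returns true when the file is new or its content changed,
    /// and remembers the new hash for the next run.
    pub fn needs_reindex(&mut self, path: &str, content: &[u8]) -> bool {
        let hash = content_hash(content);
        match self.hashes.insert(path.to_string(), hash) {
            Some(previous) => previous != hash,
            None => true,
        }
    }
}
```

An indexer would call `needs_reindex` for each file and run chunking and embedding only when it returns `true`.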

## Data Flow: Store Memory

```mermaid
sequenceDiagram
    participant Agent
    participant MCP as MCP Server
    participant Embed as Embedding Service
    participant DB as SurrealDB

    Agent->>MCP: store_memory(content: "...")
    MCP->>Embed: embed(content)
    Embed-->>MCP: [0.12, -0.45, ...] (Vector)
    MCP->>DB: CREATE memory SET content=..., embedding=...
    DB-->>MCP: Memory ID
    MCP-->>Agent: Memory ID
```
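
In code, this flow amounts to two calls in sequence. The sketch below uses hypothetical `Embedder` and `MemoryStore` traits with toy in-memory implementations. None of these names come from the project, and the real service and storage interfaces may differ.

```rust
/// Hypothetical abstraction over the Embedding Service.
pub trait Embedder {
    fn embed(&self, content: &str) -> Vec<f32>;
}

/// Hypothetical abstraction over the storage layer.
pub trait MemoryStore {
    /// Persists the content with its vector and returns the new memory ID.
    fn create_memory(&mut self, content: &str, embedding: Vec<f32>) -> String;
}

/// Mirrors the sequence diagram: embed the content, persist it, return the ID.
pub fn store_memory<E: Embedder, S: MemoryStore>(
    embedder: &E,
    store: &mut S,
    content: &str,
) -> String {
    let embedding = embedder.embed(content);
    store.create_memory(content, embedding)
}

/// Toy embedder for the example: encodes byte length and word count.
pub struct ToyEmbedder;

impl Embedder for ToyEmbedder {
    fn embed(&self, content: &str) -> Vec<f32> {
        vec![content.len() as f32, content.split_whitespace().count() as f32]
    }
}

/// Toy store that keeps rows in memory instead of a database.
#[derive(Default)]
pub struct InMemoryStore {
    pub rows: Vec<(String, Vec<f32>)>,
}

impl MemoryStore for InMemoryStore {
    fn create_memory(&mut self, content: &str, embedding: Vec<f32>) -> String {
        self.rows.push((content.to_string(), embedding));
        format!("memory:{}", self.rows.len())
    }
}
```

Keeping the embedder and the store behind traits lets the handler logic be exercised without loading a model or opening a database.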

## Data Flow: Search (Recall / Hybrid Search)

```mermaid
sequenceDiagram
    participant Agent
    participant MCP as MCP Server
    participant Embed as Embedding Service
    participant DB as SurrealDB

    Agent->>MCP: recall(query: "...")
    par Vector Search
        MCP->>Embed: embed(query)
        Embed-->>MCP: Vector
        MCP->>DB: SELECT * FROM memory WHERE embedding <|5|> vector
    and Keyword / Graph Search
        MCP->>DB: SELECT * FROM memory WHERE content CONTAINS query
    end
    DB-->>MCP: Results
    MCP->>MCP: Re-rank (RRF)
    MCP-->>Agent: Top Results
```
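
The vector branch asks the database for the nearest stored embeddings to the query vector. As a conceptual illustration only, the brute-force sketch below does the same thing in memory. Cosine similarity is an assumed metric, and the function names are hypothetical; the database performs this step itself.

```rust
/// Cosine similarity of two vectors.
/// Returns 0.0 for mismatched lengths or zero-length vectors.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the `k` stored memories most similar to `query`, best first.
pub fn top_k<'a>(
    query: &[f32],
    memories: &'a [(String, Vec<f32>)],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&'a str, f32)> = memories
        .iter()
        .map(|(id, embedding)| (id.as_str(), cosine_similarity(query, embedding)))
        .collect();
    // Sort by similarity, highest first, then keep only the first k entries.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

The ordered IDs from this branch and from the keyword branch are the ranked lists that the RRF re-ranking step then merges.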

## Module Structure (Crate Structure)

* `src/main.rs`: Entry point; initializes the CLI and services.
* `src/server/`: MCP protocol implementation and tool routing.
* `src/embedding/`: Wrapper around `candle` for local model inference.
* `src/storage/`: Abstraction over SurrealDB.
* `src/graph/`: Graph algorithms (PageRank, Community Detection).
* `src/codebase/`: Code indexing and chunking logic.