---
title: "Memory Pipeline"
description: "LLM-based memory extraction and processing pipeline."
order: 4
section: "Core Concepts"
---

Memory Pipeline v2
==================

Overview and Philosophy
---

Pipeline v2 exists because the original [[memory]] system was purely reactive: callers wrote whatever they wanted, the database accepted it, and recall quality depended entirely on how well the caller chose what to store. That model worked for bootstrapping but does not scale. Memories accumulate noise, contradict each other, and fragment across overlapping phrasings of the same fact.

The pipeline introduces a background extraction layer. When a memory arrives, it is persisted immediately (raw-first safety), and a job is enqueued to analyze it asynchronously. The job runs extraction and decision passes using a local LLM, then optionally writes derived facts back into the memory store. The caller's raw content is therefore never lost, because it is durably committed before any LLM call runs, and derived facts are layered on top of the original rather than replacing it.

This is substrate work. The pipeline's job is to turn raw interaction data into cleaner, more structured material that the rest of the system can use for retrieval, repair, and, eventually, learned context selection.

The central constraint governing every design decision here is: **no LLM calls inside write-locked transactions.** SQLite write locks are exclusive, and a blocking HTTP call to Ollama inside one would stall the entire [[daemon]]. The pipeline enforces a strict two-phase discipline: fetch and embed outside the lock, then commit atomically inside `withWriteTx`. Any violation of this rule introduces unbounded latency for every other writer.

Pipeline Modes
---

Three operational modes are composed from five boolean flags.

**Shadow mode** is active when `enabled` and `shadowMode` are both true, or when `mutationsFrozen` is true.
In this mode the pipeline runs the full extraction and decision sequence and records all proposals to `memory_history` for audit, but makes no writes to the memories table. Shadow mode is useful for validating extraction quality without affecting production data.

**Controlled-write mode** is active when `enabled` is true, `shadowMode` is false, and `mutationsFrozen` is false. In this mode, ADD and NONE decisions are applied: ADD creates new memory rows and embeddings, while NONE is recorded for audit only. UPDATE and DELETE proposals are blocked unless `autonomous.allowUpdateDelete` is true.

**Full mode** is controlled-write mode with `allowUpdateDelete` set to true. In this mode, UPDATE proposals modify the referenced memory through the mutation API path, and DELETE proposals soft-delete the referenced memory through the forget path. The target's previous state is archived to the cold tier first, and pinned memories are skipped rather than deleted.

The five config flags in detail:

- `enabled`: Master switch. When false, no extraction jobs are processed.
- `shadowMode`: Run extraction and decisions without writing any facts.
- `allowUpdateDelete`: Permit UPDATE/DELETE decisions to mutate existing memories through the guarded modify/forget paths.
- `mutationsFrozen`: Emergency brake. Disables all writes even if `shadowMode` is false.
- `autonomous.frozen`: Disables the maintenance worker's scheduled interval even if `autonomous.enabled` is true.

Extraction Stage
---

Extraction is the first LLM pass. Its job is to decompose a raw memory string into a list of discrete, reusable facts and a list of entity relationship triples.

The extraction prompt instructs the model to return a JSON object with two arrays. Each fact carries a `content` string, a `type` discriminant (`fact`, `preference`, `decision`, `procedural`, or `semantic`), and a floating-point `confidence` in [0, 1]. Each entity triple carries `source`, `relationship`, `target`, and `confidence`.
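
That output contract can be sketched in TypeScript together with the validation limits this stage applies: at most 20 facts, facts shorter than 10 characters rejected and longer ones truncated to 2,000, unknown types coerced to `fact`, at most 50 entities with non-empty `source`, `relationship`, and `target`, and input truncated at 12,000 characters. This is an illustrative sketch, not the daemon's code: `validateExtraction`, `truncateInput`, and `clampConfidence` are hypothetical names, and clamping out-of-range confidence values (rather than rejecting them) is an assumption.

```typescript
// Illustrative sketch of the extraction output contract and its validation limits.
// Names are hypothetical; they are not the daemon's actual exports.

type FactType = "fact" | "preference" | "decision" | "procedural" | "semantic";
const FACT_TYPES: readonly string[] = ["fact", "preference", "decision", "procedural", "semantic"];

interface ExtractedFact { content: string; type: FactType; confidence: number }
interface ExtractedEntity { source: string; relationship: string; target: string; confidence: number }
interface ExtractionResult { facts: ExtractedFact[]; entities: ExtractedEntity[]; warnings: string[] }

const MAX_INPUT_CHARS = 12000;
const MAX_FACTS = 20;
const MIN_FACT_CHARS = 10;
const MAX_FACT_CHARS = 2000;
const MAX_ENTITIES = 50;

// Applied to the raw memory text before the prompt is built.
function truncateInput(input: string): string {
  return input.slice(0, MAX_INPUT_CHARS);
}

// Assumption: missing or out-of-range confidence is clamped into [0, 1] rather than rejected.
function clampConfidence(value: unknown): number {
  return typeof value === "number" && Number.isFinite(value) ? Math.min(1, Math.max(0, value)) : 0;
}

function isNonEmpty(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

// Never throws: invalid items become warnings and valid items are still returned.
function validateExtraction(parsed: unknown): ExtractionResult {
  const result: ExtractionResult = { facts: [], entities: [], warnings: [] };
  const obj = (typeof parsed === "object" && parsed !== null ? parsed : {}) as Record<string, unknown>;
  const rawFacts: unknown[] = Array.isArray(obj.facts) ? obj.facts : [];
  const rawEntities: unknown[] = Array.isArray(obj.entities) ? obj.entities : [];

  // Assumption: the 20-fact cap applies to raw items, before short facts are rejected.
  rawFacts.slice(0, MAX_FACTS).forEach((item, i) => {
    const f = (item ?? {}) as Record<string, unknown>;
    const content = f.content;
    if (typeof content !== "string" || content.length < MIN_FACT_CHARS) {
      result.warnings.push(`fact ${i}: rejected, shorter than ${MIN_FACT_CHARS} characters`);
      return;
    }
    let type: FactType = "fact";
    if (typeof f.type === "string" && FACT_TYPES.indexOf(f.type) >= 0) {
      type = f.type as FactType;
    } else {
      result.warnings.push(`fact ${i}: unknown type ${String(f.type)}, coerced to "fact"`);
    }
    // Over-long facts are truncated, not rejected.
    result.facts.push({ content: content.slice(0, MAX_FACT_CHARS), type, confidence: clampConfidence(f.confidence) });
  });

  rawEntities.slice(0, MAX_ENTITIES).forEach((item, i) => {
    const e = (item ?? {}) as Record<string, unknown>;
    const { source, relationship, target } = e;
    if (!isNonEmpty(source) || !isNonEmpty(relationship) || !isNonEmpty(target)) {
      result.warnings.push(`entity ${i}: rejected, empty source, relationship, or target`);
      return;
    }
    result.entities.push({ source, relationship, target, confidence: clampConfidence(e.confidence) });
  });

  return result;
}
```

Because problems are reported as warnings instead of exceptions, one malformed fact never discards the rest of the model's output.
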
The prompt includes worked examples and explicitly tells the model to skip ephemeral details and return only the JSON object, with no surrounding text.

The model's output is post-processed before validation. Reasoning blocks emitted by chain-of-thought models such as qwen3 are stripped first, then Markdown code fences are removed if present, and the resulting string is parsed as JSON.

Validation is strict and safe under partial failure. Facts are capped at 20 per input. Any fact shorter than 10 characters is rejected, and any fact longer than 2,000 characters is truncated. An unknown type string is coerced to `fact`, and a warning is recorded. Entities are capped at 50 per input; each must have non-empty `source`, `target`, and `relationship` strings. Input longer than 12,000 characters is truncated before the prompt is built.

Validation failures produce warnings that are accumulated in the `ExtractionResult` and surfaced in the job's result payload. They never throw; partial results are always returned.

Decision Stage
---

The decision stage evaluates each extracted fact independently against the existing memory store. For each fact, the engine retrieves the top 5 candidate memories via hybrid search, then asks the LLM which of four actions to take: ADD, UPDATE, DELETE, or NONE.

This stage is intentionally conservative. It is better understood as a proposal and curation layer than as autonomous semantic rewriting: its output improves memory quality and auditability, but it does not eliminate the need for downstream relevance learning.

Candidate retrieval uses the same BM25 + vector hybrid search that powers recall. The BM25 leg queries `memories_fts` with the fact's content as the full-text query, and its scores are normalized to [0, 1] via `1 / (1 + |score|)`. The vector leg embeds the fact content and calls `vectorSearch` against the embeddings table.
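
The normalization and the merge that combines the two legs can be sketched together: results are merged by ID, scored as `alpha × vector + (1 - alpha) × bm25` when both legs scored a candidate (or the single available score otherwise), filtered by `min_score`, and cut to the top 5. This is a hedged illustration, not the daemon's implementation; `Scored`, `mergeHybrid`, and the parameter names are hypothetical, and the sketch assumes vector-leg scores already lie in [0, 1].

```typescript
// Hypothetical sketch of the decision stage's candidate scoring.
// Assumes vector-leg scores are already similarity values in [0, 1].

interface Scored { id: string; score: number }

// BM25 leg: map a raw FTS score of either sign into (0, 1].
function normalizeBm25(rawScore: number): number {
  return 1 / (1 + Math.abs(rawScore));
}

function mergeHybrid(
  bm25Hits: Scored[],
  vectorHits: Scored[],
  alpha: number,
  minScore: number,
  limit = 5,
): Scored[] {
  // Merge both legs by memory ID, keeping each leg's score separately.
  const byId = new Map<string, { bm25?: number; vector?: number }>();
  for (const hit of bm25Hits) {
    byId.set(hit.id, { ...(byId.get(hit.id) ?? {}), bm25: normalizeBm25(hit.score) });
  }
  for (const hit of vectorHits) {
    byId.set(hit.id, { ...(byId.get(hit.id) ?? {}), vector: hit.score });
  }

  const combined: Scored[] = [];
  byId.forEach((legs, id) => {
    // Weighted sum when both legs scored the candidate, otherwise the single available score.
    const score =
      legs.bm25 !== undefined && legs.vector !== undefined
        ? alpha * legs.vector + (1 - alpha) * legs.bm25
        : legs.vector ?? legs.bm25 ?? 0;
    if (score >= minScore) combined.push({ id, score });
  });

  // Highest combined score first; keep the top `limit` candidates.
  return combined.sort((a, b) => b.score - a.score).slice(0, limit);
}
```

For example, with `alpha = 0.7`, a memory scoring 0.9 on the vector leg and a raw BM25 score of -2 (normalized to 1/3) combines to 0.73, while a memory found only by the vector leg keeps its vector score unchanged.
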
Results from both legs are merged by ID and then combined with a weighted sum: `alpha × vector + (1 - alpha) × bm25` when both legs returned a score, or the single available score otherwise. Candidates below `min_score` are dropped, and the top 5 are fetched from the memories table.

When no candidates are found, the engine immediately proposes ADD without an LLM call, using the fact's own confidence as the proposal confidence and a fixed reason string.

When candidates exist, the decision prompt presents the fact and a numbered list of candidates with their IDs, types, and content. The model is asked to return a JSON object with `action`, `targetId` (required for UPDATE and DELETE), `confidence`, and `reason`. The response goes through the same reasoning-block stripping and fence removal as extraction output.

Validation on the decision output ensures that UPDATE and DELETE decisions reference an ID that actually appears in the candidate set. Proposals with missing or hallucinated IDs are dropped with a warning. An empty `reason` string is also rejected.

The function is named `runShadowDecisions` regardless of mode: "shadow" here means that the function itself makes no writes. Whether the proposals are applied or merely recorded is the concern of the worker that calls it.

Controlled Writes
---

When controlled-write mode is active, the worker applies ADD decisions inside a single `withWriteTx` call after all LLM and embedding work has completed. The write path is implemented in `applyPhaseCWrites`.

Before entering the transaction, the worker pre-fetches embeddings for all ADD proposals in parallel. Each fact's content is passed through `normalizeAndHashContent` to compute a `contentHash`, and the vector is cached under the storage content (original casing) and that hash. The embedding fetch intentionally happens outside the transaction lock.

Inside the transaction, each ADD proposal passes through a sequence of safety gates.
First, the fact's confidence is compared to `minFactConfidenceForWrite` (default 0.7); facts below this threshold are skipped with reason `low_fact_confidence`. Second, the normalized content is checked for zero length; empty facts are skipped with reason `empty_fact_content`. Third, the `content_hash` is checked against the memories table to detect exact duplicates, both in a pre-insert check and, defensively, on a UNIQUE constraint collision. Duplicates are recorded with the existing memory's ID and counted as `deduped`.

For facts that clear all gates, `txIngestEnvelope` creates the memory row in a single insert, with `who` set to `pipeline-v2`, `why` set to `extracted-fact`, and the pipeline's extraction model name in `extractionModel`. If a pre-fetched embedding vector is available for the fact's content hash, it is upserted into the embeddings table in the same transaction.

Audit records are written for every proposal in every outcome: ADD (created), ADD (deduped), ADD (skipped), NONE (recorded), and destructive (blocked). Each record lands in `memory_history` with enough metadata to reconstruct the decision context: the proposal action, fact content, confidence, source memory ID, extraction model, and fact and entity counts.

The contradiction detector runs on UPDATE and DELETE proposals before they are blocked. It tokenizes both the fact content and the target memory's content, requires a lexical overlap of at least two tokens, and then looks for either a negation-polarity difference (one text has a negation token and the other does not) or an antonym-pair conflict (enabled/disabled, allow/deny, and so on). Proposals that trigger the detector are flagged `reviewNeeded: true` in their audit record.

Content Normalization
---

All content passes through `normalizeAndHashContent` before storage or hashing. The function is deterministic and produces three derived values.

`storageContent` is the text after trimming and whitespace collapsing (`/\s+/g → " "`).
This is what gets written to the database, with original casing preserved.

`normalizedContent` takes `storageContent`, lowercases it, and strips trailing punctuation (`[.,!?;:]+$`). It is used for FTS indexing and as the hash basis when non-empty.

`contentHash` is a SHA-256 digest of the hash basis (the normalized content if non-empty, otherwise the lowercased storage content). This 64-character hex string is the deduplication key: upserts on the embeddings table use it as the unique key, and memory inserts check it to avoid exact-content duplicates.

Inline Entity Linker
---

Before any async pipeline job runs, the inline entity linker (`platform/daemon/src/inline-entity-linker.ts`) performs a fast, synchronous mention-linking pass at memory write time. It is a mechanical helper, not a semantic author.

The linker runs without an LLM call. It scans the memory's content for candidate proper nouns and links only entities that already exist for the same `agent_id`. It writes `memory_entity_mentions` rows so that a new memory can be discovered from known entity pages immediately, but it does not create entities, aspects, attributes, or dependencies. Structured graph writes come from `POST /api/memory/remember` with a `structured` payload, from explicit user or agent actions, or from reviewed normalization passes. This keeps the default background path cheap, predictable, and hard to poison: incidental capitalization can attach a memory to an existing known entity, but it cannot invent graph structure.

Because the linker runs inside the write transaction, it must stay fast and deterministic: no network calls, no LLM inference, and no blocking I/O, only candidate matching and SQLite writes against existing graph rows.

Structural Classification
---

When explicitly enabled, the structural classification worker (`structural-classify.ts`) runs a second LLM pass after extraction writes facts to the database, assigning each extracted fact to its entity's aspect hierarchy.
Jobs are enqueued as `structural_classify` entries in `memory_jobs` and processed by a separate polling worker that batches by `entity_id`, sending all facts for the same entity in one LLM call.

The prompt presents the entity name, type, existing aspects, and suggested aspect names (from `ASPECT_SUGGESTIONS`, keyed by entity type). The LLM returns a JSON array of `{i, aspect, kind, new}` objects. Each fact is assigned to a named aspect and classified as either an `attribute` or a `constraint`. Aspects are upserted into `entity_aspects`, resolving conflicts on `(entity_id, canonical_name)`. The `entity_attributes` row written during extraction then has its `aspect_id` and `kind` filled in.

When an entity's type could not be determined during extraction (stored as `"extracted"`), the classify prompt also asks the LLM to infer the type. If a valid canonical type is returned (`person`, `project`, `system`, `tool`, `concept`, `skill`, `task`, or `unknown`), the `entities` row is updated in the same transaction.

The worker configuration lives under `structural` in the pipeline config: `enabled` (default `false`), `pollIntervalMs` (how often to check for pending jobs), and `classifyBatchSize` (maximum facts per entity per LLM call).

The default pipeline does not use a background LLM to author graph structure; structured remember is the normal semantic write path. For details on the knowledge graph persistence stage, see [KNOWLEDGE-GRAPH.md](./KNOWLEDGE-GRAPH.md).

Knowledge Graph
---

When `graph.enabled` is true, graph reads, traversal, and recall boosting are available. Background extraction persists extracted entity triples only when `graph.extractionWritesEnabled` is also true. That write gate defaults to `true`, so new installs populate the graph from extraction. Set it to `false` to keep graph navigation on without letting the async extractor author semantic graph structure.

The daemon logs a startup warning when graph reads are enabled while extraction writes are disabled.
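
How the two graph flags combine can be summarized as a small decision function. This is a hypothetical sketch: only the flag names `graph.enabled` and `graph.extractionWritesEnabled` come from this page, the `GraphFlags` and `GraphBehavior` shapes are illustrative, and it reads the extraction write gate as requiring `graph.enabled` too.

```typescript
// Hypothetical sketch of how the two graph flags combine; only the flag names
// `graph.enabled` and `graph.extractionWritesEnabled` come from the docs.

interface GraphFlags {
  enabled: boolean; // graph.enabled
  extractionWritesEnabled: boolean; // graph.extractionWritesEnabled, defaults to true
}

interface GraphBehavior {
  readsEnabled: boolean; // graph reads, traversal, and recall boosting
  extractionPersistsTriples: boolean; // background extraction writes entity triples
  warnAtStartup: boolean; // reads on while extraction writes are off
}

function resolveGraphBehavior(flags: GraphFlags): GraphBehavior {
  return {
    readsEnabled: flags.enabled,
    // Assumption: extraction writes require graph.enabled as well as the write gate.
    extractionPersistsTriples: flags.enabled && flags.extractionWritesEnabled,
    warnAtStartup: flags.enabled && !flags.extractionWritesEnabled,
  };
}
```

With both flags at their defaults, extraction populates the graph; turning off only the write gate keeps navigation and boosting while producing the startup warning.
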
`/api/diagnostics` also reports `graph.extractionWritesEnabled` and degrades graph health once enough active memories exist but the graph still has no entities.

When extraction graph writes are enabled, they happen in a **separate** transaction immediately after the main write transaction commits. Graph persistence failure is non-fatal: it logs a warning but never reverts the fact extraction results.

Entities are stored in the `entities` table with `name` (original casing), `canonical_name` (lowercase, whitespace-normalized), `entity_type`, and `mentions` (an integer count). New entities are inserted; existing entities (matched by `canonical_name`) have their `mentions` counter incremented. UNIQUE constraint collisions on the `name` column are handled gracefully by falling back to the existing row and incrementing its mentions.

Relations are stored in the `relations` table, linking two entity rows by `source_entity_id`, `target_entity_id`, and `relation_type`. The `strength` field is fixed at 1.0 for all pipeline-extracted relations. When a relation already exists (same source, target, and type), `mentions` is incremented and `confidence` is updated as a running average: `(old_avg × n + new_confidence) / (n + 1)`.

Every source and target entity is linked back to the originating memory row via `memory_entity_mentions`. The link stores `mention_text` (the raw string before canonicalization) and `confidence`. Inserts use `INSERT OR IGNORE`, so re-processing the same memory is idempotent.

Aspect Feedback
---

After recall, `aspect-feedback.ts` feeds behavioral signals back to the knowledge graph by measuring FTS overlap between retrieved content and entity aspects. The function `applyFtsOverlapFeedback` is called at session end with the session key and agent ID.

The feedback loop works as follows. Memories that received at least one FTS hit during the session (tracked in `session_memories.fts_hit_count`) are looked up.
For each confirmed memory, the `entity_attributes` table is queried to find its parent `aspect_id`. Confirmation counts are summed per aspect, and each aspect's `weight` column is incremented by `delta × confirmations`, clamped to `[minWeight, maxWeight]`. This records which aspects proved structurally "correct" for the session: aspects whose memories were actively searched for gain weight, and aspects whose memories were ignored do not.

A separate `decayAspectWeights` function handles time-based decay. Aspects that have not been updated in more than `staleDays` days have their weight reduced by `decayRate`, floored at `minWeight`. Decay is gated by a session counter, so it runs every N sessions rather than on every call.

Telemetry is accumulated in an in-process snapshot (`getFeedbackTelemetry`) and exposed on the pipeline status endpoint: `feedbackAspectsUpdated`, `feedbackFtsConfirmations`, `feedbackDecayedAspects`, and `feedbackPropagatedAttributes`.

Graph-Augmented Search
---

At query time, when `graph.enabled` is true and the caller requests a graph boost, `getGraphBoostIds` is called synchronously against the read database. It returns a set of memory IDs that should receive a score boost in the final recall ranking.

The lookup proceeds in three steps. First, query tokens (runs of two or more alphanumeric characters, lowercased) are each matched against `canonical_name LIKE ?`, with results ordered by `mentions` descending and capped at 20 entity hits. Second, the matched entity IDs are expanded one hop through the `relations` table in both directions (source and target), collecting up to 50 additional neighbor entity IDs. Third, the expanded entity ID set is joined through `memory_entity_mentions` to collect up to 200 distinct non-deleted memory IDs.

The entire function is deadline-bounded. A `Date.now()` cutoff is checked after each step; if the deadline has passed, the function returns whatever it has accumulated so far with `timedOut: true`.
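
The three-step lookup and its deadline can be sketched over in-memory arrays. The real function issues SQL against `entities`, `relations`, and `memory_entity_mentions`; here the row shapes, the names `getGraphBoostIdsSketch` and `applyGraphBoost`, the ASCII-only tokenizer, the `%token%` reading of `LIKE`, and the injectable clock are all assumptions made so the sketch runs on its own. It also shows the empty result on exceptions and the additive boost.

```typescript
// In-memory sketch of the graph boost lookup. Row shapes and names are hypothetical;
// the real function queries SQLite tables instead of arrays.

interface EntityRow { id: number; canonicalName: string; mentions: number }
interface RelationRow { sourceEntityId: number; targetEntityId: number }
interface MentionRow { entityId: number; memoryId: string; deleted: boolean }
interface GraphRows { entities: EntityRow[]; relations: RelationRow[]; mentions: MentionRow[] }
interface GraphBoostResult { memoryIds: Set<string>; timedOut: boolean }

const MAX_ENTITY_HITS = 20;
const MAX_NEIGHBORS = 50;
const MAX_MEMORY_IDS = 200;

function getGraphBoostIdsSketch(
  rows: GraphRows,
  query: string,
  budgetMs: number,
  now: () => number = Date.now,
): GraphBoostResult {
  const deadline = now() + budgetMs;
  const memoryIds = new Set<string>();
  try {
    // Step 1: lowercase 2+ char alphanumeric tokens (ASCII only here), matched as substrings
    // of canonical names (assumed LIKE '%token%'), most-mentioned first, max 20.
    const tokens: string[] = query.toLowerCase().match(/[a-z0-9]{2,}/g) ?? [];
    const seeds = new Set(
      rows.entities
        .filter((e) => tokens.some((t) => e.canonicalName.includes(t)))
        .sort((a, b) => b.mentions - a.mentions)
        .slice(0, MAX_ENTITY_HITS)
        .map((e) => e.id),
    );
    if (now() > deadline) return { memoryIds, timedOut: true }; // nothing collected yet

    // Step 2: one hop through relations in both directions, max 50 new neighbors.
    const neighbors = new Set<number>();
    for (const r of rows.relations) {
      if (neighbors.size >= MAX_NEIGHBORS) break;
      if (seeds.has(r.sourceEntityId) && !seeds.has(r.targetEntityId)) neighbors.add(r.targetEntityId);
      else if (seeds.has(r.targetEntityId) && !seeds.has(r.sourceEntityId)) neighbors.add(r.sourceEntityId);
    }
    if (now() > deadline) return { memoryIds, timedOut: true };

    // Step 3: distinct non-deleted memories mentioning any seed or neighbor, max 200.
    for (const m of rows.mentions) {
      if (memoryIds.size >= MAX_MEMORY_IDS) break;
      if (!m.deleted && (seeds.has(m.entityId) || neighbors.has(m.entityId))) memoryIds.add(m.memoryId);
    }
    return { memoryIds, timedOut: now() > deadline };
  } catch {
    // Any failure degrades to "no boost" rather than an error.
    return { memoryIds: new Set<string>(), timedOut: false };
  }
}

// The boost is purely additive: linked IDs gain graphBoostWeight, nothing is penalized.
function applyGraphBoost(
  scores: Map<string, number>,
  boost: GraphBoostResult,
  graphBoostWeight = 0.15,
): Map<string, number> {
  const boosted = new Map<string, number>();
  scores.forEach((score, id) => boosted.set(id, boost.memoryIds.has(id) ? score + graphBoostWeight : score));
  return boosted;
}
```

Injecting the clock keeps the deadline logic testable without real timing; in production the default `Date.now` is used.
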
On any exception, `getGraphBoostIds` returns an empty result. Timeouts and exceptions do not degrade recall correctness, because graph boosting is always additive.

The boost weight (default 0.15) is applied by the search layer on top of the hybrid BM25 + vector score: IDs in the graph-linked set receive a score increment of `graphBoostWeight`.

Worker Model
---

The extraction pipeline runs as a polling worker loop. A single `startWorker` call starts a `setTimeout`-chain tick loop that leases one job per tick from the `memory_jobs` table, processes it, and reschedules itself. Chaining `setTimeout` calls rather than using `setInterval` lets the delay be adjusted dynamically, via exponential backoff, after failures.

Job leasing is atomic. The tick calls `accessor.withWriteTx` to select and update the job row in one transaction: a `SELECT ... LIMIT 1` on pending extract jobs ordered by `created_at`, immediately followed by an `UPDATE` that sets `status = 'leased'` and `leased_at` and increments `attempts`. This ensures that no two workers can lease the same job, even if multiple processes are running.

On failure, a job's `attempts` counter has already been incremented (this happens during the lease). If `attempts >= max_attempts` (default 3), the job is moved to status `dead`; otherwise it returns to `pending` for retry on the next tick. A dead job stays in the table for audit and cleanup purposes.

Job deduplication is enforced at enqueue time: `enqueueExtractionJob` checks for any existing job for the same `memory_id` with status `pending` or `leased` before inserting a new one.

A stale lease reaper runs on a fixed 60-second `setInterval`. Any job with `status = 'leased'` and a `leased_at` older than `leaseTimeoutMs` (default 300,000 ms, or 5 minutes) is reset to `pending`. This handles worker crashes that would otherwise leave jobs leased indefinitely.

Backoff state tracks consecutive failures. With zero failures, the tick interval is `workerPollMs` (default 2,000 ms).
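
The retry schedule can be sketched as a pure function of the consecutive-failure count, plus a `setTimeout` chain that recomputes the delay on every tick, which a fixed `setInterval` cannot do. The constants follow this section (2,000 ms when healthy; on failures, a delay doubling from a 1,000 ms base, capped at 30,000 ms, plus up to 500 ms of jitter). The names `nextTickDelay` and `startPollingLoop` are hypothetical, and three details are assumptions: the first failure waits the 1,000 ms base, jitter is added after the cap, and a tick that throws counts as one failure.

```typescript
// Hypothetical sketch of the worker's tick scheduling; constants mirror the docs.
const WORKER_POLL_MS = 2000; // healthy tick interval (workerPollMs default)
const BACKOFF_BASE_MS = 1000;
const BACKOFF_CAP_MS = 30000;
const MAX_JITTER_MS = 500;

// Assumption: the first failure waits the base delay, each further failure doubles it,
// and jitter is added after the cap is applied.
function nextTickDelay(consecutiveFailures: number, random: () => number = Math.random): number {
  if (consecutiveFailures <= 0) return WORKER_POLL_MS;
  const backoff = Math.min(BACKOFF_BASE_MS * Math.pow(2, consecutiveFailures - 1), BACKOFF_CAP_MS);
  return backoff + Math.floor(random() * MAX_JITTER_MS);
}

// setTimeout chain: each tick schedules the next one with a freshly computed delay.
// Assumption: a tick that throws counts as one failure; a successful tick resets the streak.
function startPollingLoop(tick: () => Promise<void>): () => void {
  let failures = 0;
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const run = async (): Promise<void> => {
    try {
      await tick();
      failures = 0;
    } catch {
      failures += 1;
    }
    if (!stopped) timer = setTimeout(run, nextTickDelay(failures));
  };

  timer = setTimeout(run, WORKER_POLL_MS);
  // Returns a stop function that cancels any pending tick.
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Because the next tick is scheduled only after the current one settles, ticks never overlap, even when a job takes longer than the poll interval.
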
Each failure doubles the delay (starting from a 1,000 ms base) up to a 30,000 ms cap, with up to 500 ms of random jitter added.

Document Ingest
---

The document worker processes `document_ingest` jobs from the same `memory_jobs` table. It runs as a fixed-interval polling loop, separate from the extraction worker, with a default of 10,000 ms between ticks.

A document ingest job carries a `document_id` rather than a `memory_id`. The referenced row in the `documents` table holds the source content and type. Source types fall into two groups: `url`, whose content is fetched over HTTP, and every other type, whose content is read from `raw_content`.

URL fetch is bounded by `documentMaxContentBytes` (default 10 MB). The URL fetcher accepts responses with content types `text/html`, `text/*`, `application/json`, and `application/xml`. For HTML, it extracts the page title and strips `