# Disk Flushing & OOM Prevention

> Architecture documentation for sweet-search's indexing persistence layer.
> Covers memory management, crash safety, and incremental persistence
> across all indexing phases.
>
> **Status**: Implemented (`c775434`)
> **Last updated**: 2026-04-07
> **HCGS Status (2026-05)**: HCGS is disabled by default (`HCGS_CONFIG.enabled = false`); references below describe the original design. Flip the flag to re-enable.

---

## Table of Contents

1. [Problem Statement](#1-problem-statement)
2. [Architecture Overview](#2-architecture-overview)
3. [HNSW Persistence (Phase A)](#3-hnsw-persistence)
4. [Streaming Embeddings from SQLite (Phase B)](#4-streaming-embeddings-from-sqlite)
5. [Late-Interaction Segmented Flush (Phase C)](#5-late-interaction-segmented-flush)
6. [Artifact Builder Streaming (Phase D)](#6-artifact-builder-streaming)
7. [HNSW Periodic Checkpoints (Phase E)](#7-hnsw-periodic-checkpoints)
8. [SQLite WAL Tuning (Phase F)](#8-sqlite-wal-tuning)
9. [Code Graph Batched Insert (Phase G)](#9-code-graph-batched-insert)
10. [Crash-Resume Tracker (Phase H)](#10-crash-resume-tracker)
11. [Memory Budget](#11-memory-budget)
12. [Design Decisions & Constraints](#12-design-decisions--constraints)
13. [Industry SOTA Reference (April 2026)](#13-industry-sota-reference)
14. [Research Sources](#14-research-sources)

---

## 1. Problem Statement

Before this work, indexing a 100K-chunk codebase could peak at **~1.0–1.5 GB** of V8 heap due to several large in-memory arrays held simultaneously. At 500K chunks this exceeded V8's default ~4 GB heap. A process crash at any point during indexing lost the entire run: HNSW, late-interaction, and artifact indexes were each written in a single atomic save at the end.
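
As a rough sanity check on these sizes, the sketch below computes the footprint of a dense embedding matrix at 100K chunks and 512 dimensions. The 4-byte and 8-byte element widths are assumptions (float32 storage versus plain JS numbers); under them the results appear consistent with the 200 MB and 409 MB rows in the table that follows. The helper name `embeddingBytes` is hypothetical.

```javascript
// Back-of-envelope sizing for a dense embedding matrix.
// Assumption: storage cost is chunks * dim * bytesPerElement, with no per-object overhead.
function embeddingBytes(chunks, dim, bytesPerElement) {
  return chunks * dim * bytesPerElement;
}

const toMB = (bytes) => bytes / 1e6; // decimal megabytes

// 100K chunks at 512 dimensions, stored as float32 vs. as 8-byte doubles.
const asFloat32 = toMB(embeddingBytes(100_000, 512, Float32Array.BYTES_PER_ELEMENT));
const asDoubles = toMB(embeddingBytes(100_000, 512, Float64Array.BYTES_PER_ELEMENT));

console.log(asFloat32.toFixed(1)); // 204.8
console.log(asDoubles.toFixed(1)); // 409.6
```

The same formula scales linearly with chunk count, which is why arrays of this kind dominate the heap as the corpus grows.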
### Memory accumulation points (ranked by peak size)

| # | Component | Location | Peak (100K chunks, 512d) |
|---|-----------|----------|--------------------------|
| 1 | Late-interaction docs Map | `late-interaction-index.js` | **335+ MB** (int8) |
| 2 | Artifact parsed embeddings | `artifact-builder.js` | **409 MB** |
| 3 | Embeddings array | `indexer-build.js` | **200 MB** |
| 4 | HNSW native index | USearch C++ heap | **~200 MB** |
| 5 | All chunks array | `indexer-build.js` | **100+ MB** |
| 6 | ONNX hidden states | ORT tensor per batch | **100 MB** (transient) |
| 7 | Artifact DB row load | `artifact-builder.js` | **50–100 MB** |
| 8 | Code graph entities | `indexer-build.js` | **2–5 MB** |

### Crash safety gaps (before)

| Component | Crash at minute 59/60 | Data lost |
|-----------|----------------------|-----------|
| HNSW index | 100% lost (single `save()` at end) | Entire HNSW |
| Late-interaction index | 100% lost (single `save()` at end) | Entire LI index |
| Binary HNSW + int8 | 100% lost | Entire artifact set |

---

## 2. Architecture Overview

The persistence layer is organized around a pipeline of indexing phases, each with its own flush/crash-safety strategy:

```
Phase 1: Code Graph        → Batched insert (100 files/batch), atomic swap
Phase 2: Vector Embeddings → Pipelined embed+insert to SQLite, WAL mode
Phase 3: HNSW Index        → Reads from SQLite cursor, time-based checkpoints
Phase 4: Late Interaction  → Segmented flush (10K docs/segment), staged builds
Phase 5: Artifacts         → Streams from SQLite cursor, per-row truncation
```

### Phase concurrency model

```
Timeline:
├─ HCGS Summaries ──────────────────────┐  (hcgsPromise)
├─ Vector Embeddings ───────────────────┤  (vectorPromise)
│   (if shouldParallelLI)               │
├─ Late-Interaction Encoding ───────────┤  (liPromise)
│                                       │
└─── Promise.all ───────────────────────┘
         │
         ▼
Sequential:
├─ HNSW Build ────────────── Streams from SQLite (Phase B)
├─ LI Sequential Fallback ── Only if !shouldParallelLI
└─ Artifact Build ─────────── Streams from SQLite (Phase D)
```

---

## 3. HNSW Persistence

### Search-time: mmap via `view()` (Phase A.1)

The search path uses USearch's `view()` (mmap, zero-copy) instead of `load()` (full copy into process memory).

```
Build phase:  index = new Index({...}); add vectors; index.save(path)
Serve phase:  index = new Index({...}); index.view(path)  // mmap, zero-copy
```

**Files**: `core/vector-store/hnsw-index.js` (`load()` accepts `{ mmap: true }`), `core/search/sweet-search.js` (passes `{ mmap: true }` in search init).

| Metric | Before (load) | After (view) |
|--------|---------------|--------------|
| HNSW at search time | ~200 MB in process | 0 MB (mmap, OS-managed) |
| HNSW during build | ~200 MB in process | ~200 MB (**unchanged**) |

### Build-time: periodic checkpoints (Phase E)

See [Section 7](#7-hnsw-periodic-checkpoints).

---

## 4. Streaming Embeddings from SQLite

### What changed (Phase B)

Before: `pipelinedEmbedAndInsert` accumulated an `embeddings[]` array (200 MB) and `buildVectorIndex` returned `{ allChunks, allEmbeddings }` (300+ MB total). HNSW construction consumed these arrays.

After: `pipelinedEmbedAndInsert` returns only a count. `buildVectorIndex` returns stats only. HNSW reads directly from SQLite via a streaming cursor.

### `streamVectorsFromDb` generator

```
core/indexing/indexer-ann.js
function* streamVectorsFromDb(db, _dim, order = 'sequential')
```

For **sequential** order: `SELECT ... FROM vectors ORDER BY rowid` with `.iterate()`, which uses O(1) memory per row.

For **shuffle** or **diversity** order: pre-computes a permutation in a temp table (`hnsw_order`), then streams via `JOIN ... ORDER BY o.pos`. The temp-table approach replaces the old in-memory index permutation (`applyInsertionOrder`). The `indices` array is O(n) integers (~800 KB for 100K rows), negligible compared to the O(n*d) arrays it replaces.

### Insertion order with temp tables

```sql
CREATE TEMP TABLE hnsw_order (pos INTEGER PRIMARY KEY, vector_rowid INTEGER);
-- populate with permuted rowids
SELECT v.* FROM hnsw_order o JOIN vectors v ON v.rowid = o.vector_rowid ORDER BY o.pos;
```

**Constraint**: non-sequential orders require a writable DB connection (temp tables). The code opens `readonly: true` only for sequential order.

### Files changed

- `core/indexing/indexer-build.js`: `pipelinedEmbedAndInsert` returns count; `buildVectorIndex` returns stats only
- `core/indexing/indexer-ann.js`: `buildHNSWIndex(dbPath)` and `incrementalUpdateHNSW(dbPath, changedFiles)` read from SQLite
- `core/indexing/indexer-phases.js`: always pre-chunks files; passes `DB_PATHS.codebase` to HNSW build

| Metric | Before | After |
|--------|--------|-------|
| embeddings[] peak | 200 MB | 0 MB |
| allChunks[] peak | 100+ MB | 0 MB |
| HNSW build input | In-memory arrays | SQLite cursor (O(1) per row) |

---

## 5. Late-Interaction Segmented Flush

### Segment format (Phase C)

Documents are split into segments of `LI_SEGMENT_SIZE` (default 10,000 docs). Each segment is a standalone binary file:

```
.sweet-search/
  late-interaction-tokens.db            # Stub → points to segment dir
  late-interaction-tokens.db.segments/
    segment-0000.bin                    # docs 0-9999
    segment-0001.bin                    # docs 10000-19999
    manifest.json                       # segment metadata
```

Binary segment format:

```
Header (64 bytes):
  magic:    u32 = 0x4C495345 ("LISE")
  version:  u16 = 1
  docCount: u32
  tokenDim: u16
  useInt8:  u8
  padding:  51 bytes
Body:
  JSON array of documents (tokens, metadata, dequant params)
```

### Search-time behavior (Option 3)

All segments are flattened into `this.documents` at search init, giving identical search semantics to the legacy single-file format. Memory savings come from indexing time only. Upgrade to mmap-per-segment (Option 1) when search-time memory becomes a constraint; the segment format is mmap-ready.

### Staged rebuild isolation

When `saveToPath !== loadFromPath` (staged builds), `resetForSave(saveToPath)` is called **immediately after `init()`**, before any `add()` calls. This resets `_segmentDir` and `_segments` so that intermediate segment flushes during encoding go to the staged directory, never the live one.

On `save()`, all segments are rewritten from scratch using the authoritative `this.documents` Map. Stale segment files from a previous load are never reused, so document removals are always reflected on disk.

### Files changed

- `core/ranking/late-interaction-index.js`: segment write/read/flush logic, `resetForSave()`, rewrite-on-save
- `core/indexing/indexer-ann.js`: calls `resetForSave(saveToPath)` before encoding loop

| Metric | Before | After |
|--------|--------|-------|
| Peak LI memory during indexing (100K docs) | 335 MB | ~33 MB (1 segment) |
| LI memory at search time | 335 MB | 335 MB (unchanged, Option 3) |
| Crash data loss | 100% | Max 10K docs (1 segment) |

---

## 6. Artifact Builder Streaming

### Phase D(b): Kill `Array.from` copy

`artifact-builder.js` used `Array.from(new Float32Array(...))` to parse embeddings from SQLite BLOBs. `Array.from` creates a plain JS array, which takes roughly 2x the memory of the typed array because each element is held as an 8-byte double (or a boxed value) instead of a 4-byte float. The code now uses `Float32Array` directly.

### Phase D(a): Stream from SQLite cursor

`buildHnswIndexFromDb(db)` streams rows from SQLite with per-row truncation and quantization. It pre-computes insertion order in a temp table (`artifact_order`) for non-sequential modes.

The `buildAndSaveFloatStoreFromDb` function similarly streams from a cursor.

**Constraint**: `FloatVectorStore.build()` requires the full entries array, so float store construction still materializes O(n) entries. This is bounded to truncated vectors (not full-dimension), so peak is lower.

### Files changed

- `core/indexing/artifact-builder.js`: `buildHnswIndexFromDb(db)`, `buildAndSaveFloatStoreFromDb(db)`, `Float32Array` direct use

| Metric | Before | After |
|--------|--------|-------|
| items[] with Array.from | 409 MB | 0 MB (streamed per-row) |
| truncated[] array | ~200 MB | 0 MB (per-row) |
| Peak during artifact build | ~550 MB | ~60 MB |

---

## 7. HNSW Periodic Checkpoints

### Design (Phase E)

USearch `save()` serializes the **entire** index, not a delta. Save cost grows with index size. **Time-based** checkpoints guarantee bounded data loss (~30s) regardless of index size or insertion speed.

### Checkpoint protocol

```
During build:
  1. After each vector add, check elapsed time since last checkpoint
  2. If elapsed >= 30s AND vectorsSinceCheckpoint >= 1000 → save checkpoint
  3. Write sidecar JSON: { vectorsAdded, lastRowId, version, timestamp }
  4. fsync(checkpoint) → fsync(sidecar) → fsync(parent directory)
  5. On completion → final save(), delete checkpoint + sidecar

On restart:
  1. Check for checkpoint file + sidecar
  2. If found → init() fresh, load USearch graph from checkpoint,
     rebuild JS metadata (idMap, reverseMap, nextKey) from SQLite
  3. Resume adding from rowid > sidecar.lastRowId
  4. If config changed → discard checkpoint, full rebuild
```

### Sequential-only constraint

Checkpoint resume uses `rowid <= lastRowId` to skip already-processed vectors. This is only valid when the stream is ordered by rowid (sequential mode). For shuffle/diversity orders, checkpoints are disabled and stale checkpoint files are cleaned up.

### fsync ordering

The checkpoint write follows POSIX durability requirements: `write file → fsync(file) → write sidecar → fsync(sidecar) → fsync(directory)`. Without the directory fsync, the checkpoint file may not be visible after a power loss even though its contents are durable.

### Files changed

- `core/indexing/indexer-ann.js`: checkpoint save/load/resume/cleanup, fsync helpers, metadata rebuild from SQLite

| Parameter | Default |
|-----------|---------|
| `CHECKPOINT_INTERVAL_SEC` | 30 |
| `MIN_VECTORS_BETWEEN_SAVES` | 1,000 |
| Max data loss on crash | ~30s of work |

---

## 8. SQLite WAL Tuning

### WAL on macOS (Phase F.1)

`isWalSafe()` was previously restricted to Linux only. WAL mode works correctly on macOS (APFS/HFS+). The guard now only excludes WSL/NTFS mounts (unreliable file locking).

### Indexing-optimized pragmas (Phase F.2)

```sql
PRAGMA wal_autocheckpoint = 4000;       -- ~16 MB WAL before auto-checkpoint
PRAGMA mmap_size = 1073741824;          -- 1 GB mmap for reads during build
PRAGMA cache_size = -64000;             -- 64 MB page cache
PRAGMA journal_size_limit = 67108864;   -- 64 MB WAL size limit
```

### Explicit checkpoint after inserts

`checkpointWal(db)` calls `PRAGMA wal_checkpoint(TRUNCATE)` after all inserts complete and before HNSW construction reads from the DB. This is critical because the streaming cursor holds a read transaction open for the entire HNSW build, which prevents WAL checkpointing.

### Files changed

- `core/indexing/indexer-utils.js`: `isWalSafe()`, `configureJournalMode()`, `checkpointWal()`
- `core/indexing/indexer-build.js`: calls `checkpointWal()` after inserts

---

## 9. Code Graph Batched Insert

### Design (Phase G)

Entity/relationship extraction and insertion are batched every 100 files instead of accumulating everything in memory before a single `insertGraph()` call. All batches go to the temp DB before the atomic swap.

Memory savings are small (2–5 MB → <0.5 MB per batch), but crash granularity improves: only the last 100 files' entities are lost on crash.

### Files changed

- `core/indexing/indexer-build.js`: `buildCodeGraph()` batch loop

---

## 10. Crash-Resume Tracker

### Design (Phase H)

The incremental tracker (`core/indexing/incremental-tracker.js`) is extended with per-phase progress markers written durably (fsync). On restart, the pipeline can skip completed phases.

### Phase progress file

```json
{
  "phase": "hnsw",
  "status": "in_progress",
  "configFingerprint": { "provider": "...", "model": "...", ... },
  "timestamp": "2026-04-07T10:23:45Z"
}
```

Phase transitions: `vectors` → `hnsw` → `late-interaction` → `artifacts`. The progress file is cleared after successful pipeline completion. A config fingerprint mismatch discards stale progress.

### Files changed

- `core/indexing/incremental-tracker.js`: `updatePhaseProgress()`, `markPhaseComplete()`, `getPhaseProgress()`, `clearPhaseProgress()`
- `core/indexing/indexer-phases.js`: phase markers at boundaries

---

## 11. Memory Budget

### Concurrent peak (worst-case overlap)

| Milestone | V8 heap | Process RSS | OOM risk |
|-----------|---------|-------------|----------|
| Before | ~785 MB | ~885 MB | **High** at >200K chunks |
| After all phases | ~213 MB | ~313 MB | **None** |

Note: HNSW's 200 MB lives in USearch's C++ heap (process RSS), not V8's managed heap. V8's GC cannot see it and it won't trigger V8 OOM.
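
The heap-versus-RSS distinction can be observed with Node's `process.memoryUsage()`. The sketch below is illustrative only: `memorySnapshotMB` is a hypothetical helper, and a 64 MiB `Buffer` stands in for off-heap memory. Memory allocated directly by a native addon may show up only in `rss`, so this is an approximation of the USearch case rather than a measurement of it.

```javascript
// Snapshot the counters that separate V8-managed heap from off-heap memory.
function memorySnapshotMB() {
  const toMB = (bytes) => Math.round(bytes / 1e5) / 10; // decimal MB, one decimal place
  const { rss, heapUsed, external, arrayBuffers } = process.memoryUsage();
  return {
    rss: toMB(rss),                   // whole-process resident set size
    heapUsed: toMB(heapUsed),         // V8 managed heap (what can hit the V8 heap limit)
    external: toMB(external),         // memory of C++ objects bound to JS objects
    arrayBuffers: toMB(arrayBuffers), // ArrayBuffers and Buffers, allocated off-heap
  };
}

const before = memorySnapshotMB();
// Stand-in for off-heap data: filling the Buffer forces the pages to be touched.
const offHeap = Buffer.alloc(64 * 1024 * 1024, 1);
const after = memorySnapshotMB();

console.log({ before, after, bytesHeld: offHeap.length });
```

In a run of this sketch, `arrayBuffers` should grow by roughly the size of the allocation while `heapUsed` stays close to its previous value, which mirrors why the HNSW graph inflates RSS without pushing V8 toward its heap limit.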
### Per-component breakdown (after)

| Component | Peak |
|-----------|------|
| Embeddings array | **0 MB** (streamed from SQLite) |
| All chunks array | **0 MB** (streamed from SQLite) |
| Late-interaction docs | **~33 MB** (1 segment during indexing) |
| Artifact build | **~60 MB** (streamed) |
| HNSW build (C++ heap) | **~200 MB** (unavoidable, not V8 heap) |
| Code graph | **~0.5 MB** (batched) |
| ONNX inference | **~100 MB** (transient, per batch) |
| SQLite + WAL | **~80 MB** (tuned cache + mmap) |

---

## 12. Design Decisions & Constraints

### Why SQLite streaming, not flat files

SQLite `.iterate()` provides O(1) memory per row with transactional guarantees. A custom flat file would be faster (~5–15% less overhead from per-row BLOB deserialization) but adds a persistence format to maintain. SQLite is already the source of truth for vector storage.

### Why time-based checkpoints, not count-based

USearch `save()` serializes the entire index. Save cost grows with index size: saving 200K vectors takes longer than saving 50K. A fixed count interval means later checkpoints are disproportionately expensive. Time-based guarantees bounded data loss regardless of corpus size.

### Why sequential-only for checkpoint resume

Non-sequential insertion orders (shuffle, diversity) permute the stream so that rowid is not a monotonic marker of progress. Storing the full permutation in the sidecar would work but adds significant complexity. The dominant production pattern (Qdrant, HAKES) uses per-segment version tracking, which requires immutable segments; that is a larger architectural change deferred for now.

### Why Option 3 for LI segments

Option 3 (flatten at search init) preserves current search semantics exactly. Option 1 (mmap per segment) saves memory at search time but changes the scoring hot path: each document's tokens must be accessed from a mmap'd segment file instead of a contiguous in-memory Map. Deferred until search-time memory becomes a constraint.
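
As a rough illustration of Option 3, the sketch below encodes and decodes segments using the 64-byte header from Section 5 and flattens them into one Map. Several details are assumptions made for illustration: little-endian byte order, a packed field layout with no alignment gaps, an `id` field on each document, and the function names `encodeSegment`, `decodeSegment` and `flattenSegments`. It is not the code in `late-interaction-index.js`.

```javascript
const HEADER_BYTES = 64;
const MAGIC = 0x4c495345; // "LISE"

// Build one segment: 64-byte header followed by a JSON array of documents.
// Assumed packed layout: magic@0 (u32), version@4 (u16), docCount@6 (u32),
// tokenDim@10 (u16), useInt8@12 (u8), then 51 zero bytes of padding.
function encodeSegment(docs, { tokenDim, useInt8 }) {
  const header = Buffer.alloc(HEADER_BYTES); // zero-filled, so padding is already set
  header.writeUInt32LE(MAGIC, 0);
  header.writeUInt16LE(1, 4);
  header.writeUInt32LE(docs.length, 6);
  header.writeUInt16LE(tokenDim, 10);
  header.writeUInt8(useInt8 ? 1 : 0, 12);
  return Buffer.concat([header, Buffer.from(JSON.stringify(docs), 'utf8')]);
}

// Validate the header and return the documents stored in the body.
function decodeSegment(buf) {
  if (buf.length < HEADER_BYTES || buf.readUInt32LE(0) !== MAGIC) {
    throw new Error('not a late-interaction segment');
  }
  const docCount = buf.readUInt32LE(6);
  const docs = JSON.parse(buf.subarray(HEADER_BYTES).toString('utf8'));
  if (docs.length !== docCount) throw new Error('docCount does not match body');
  return docs;
}

// Option 3: merge every segment into a single in-memory Map at search init.
function flattenSegments(segmentBuffers) {
  const documents = new Map();
  for (const buf of segmentBuffers) {
    for (const doc of decodeSegment(buf)) documents.set(doc.id, doc);
  }
  return documents;
}

const segA = encodeSegment([{ id: 'a', tokens: [1, 2] }, { id: 'b', tokens: [3] }], { tokenDim: 128, useInt8: true });
const segB = encodeSegment([{ id: 'c', tokens: [4] }], { tokenDim: 128, useInt8: true });
const flattened = flattenSegments([segA, segB]);
console.log(flattened.size); // 3
```

Because the result is an ordinary Map, scoring code that reads from it does not need to know the documents came from several files, which is the property Option 3 relies on.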
### Why temp tables for insertion order

The old `applyInsertionOrder` operated on paired in-memory arrays via index permutation (`indices.map(i => chunks[i])`). With streaming, random access isn't possible. A temp table in SQLite serves the same purpose: pre-compute the order, then join to stream in that order. The `indices` array is O(n) integers (~800 KB for 100K rows), which is negligible.

### USearch binding limitations

- `view()` is read-only mmap. It helps search time only.
- During construction, the HNSW graph lives in C++ heap (~200 MB for 100K@512d). Not addressable by V8 GC but counts toward process RSS.
- Native C++ supports mmap construction, but this is not exposed in JS.

---

## 13. Industry SOTA Reference

### Production vector databases (April 2026)

**Qdrant** (Rust, HNSW):
- Segments with configurable `flush_interval_sec`, mmap-backed vectors/graph
- Per-segment version tracking for WAL replay
- 2025–2026: Incremental HNSW indexing, inline storage, graph compression

**Milvus** (Go/C++):
- Full WAL (Kafka-backed cluster, RocksDB standalone)
- 2.6 (2025): Woodpecker WAL (object-storage persistence, O_DIRECT)

**Oracle 26ai** (2026):
- `ENABLE_CHECKPOINT` / `DISABLE_CHECKPOINT` for HNSW indexes
- Automatic checkpoints during creation, repopulation, refresh

**USearch** (C++, our library):
- `view(path)` for mmap read-only search; `save()/load()` for serialization
- Construction always in-memory; mmap construction not exposed in JS bindings

### Key takeaways

1. **Nobody has a clean incremental WAL for HNSW graphs.** Industry consensus: mmap the graph to disk and/or periodic checkpoint snapshots.
2. **Per-segment versioning** (Qdrant) is more robust than positional skip.
3. **fsync ordering is critical.** Atomic rename is NOT durable without `write → fsync(file) → rename → fsync(parent directory)`.
4. **WAL + checkpoint + segment-version-tracking** is the dominant pattern.

---

## 14. Research Sources

### Production systems

- Qdrant 1.17+ (Feb 2026): incremental HNSW, inline storage, per-segment versions
- Milvus 2.6 (2025): Woodpecker WAL, zero-disk durability
- Oracle 26ai (2026): HNSW checkpoint/snapshot architecture
- DuckDB VSS: Experimental HNSW persistence (WAL not yet implemented)
- OpenSearch k-NN #1599: Streaming vectors to JNI to avoid 2x memory
- TursoDB (2025): Tantivy Directory trait over B-Tree for transactional FTS

### Academic papers

- **SPFresh** (Harvard, 2024, arXiv:2410.14452): Incremental in-place HNSW
- **Starling** (Tongji, 2024, arXiv:2401.02116): I/O-efficient disk-resident graph
- **LSM-VEC** (NTU, 2025, arXiv:2505.17152): LSM-tree dynamic vector search with WAL
- **P-HNSW** (Konkuk, Sep 2025, doi:10.3390/app151910554): Crash-consistent HNSW
- **HAKES** (NUS, VLDB 2025, pvldb/vol18/p3049): Periodic checkpoints + re-insert
- **I/O Optimizations for Disk-Resident ANN** (Feb 2026, arXiv:2602.21514): I/O = 70–90% of latency
- **B+ANN** (Georgia Tech/IBM, Nov 2025, arXiv:2511.15557): B+ tree disk NN
- **SHINE** (Salzburg, Jul 2025, arXiv:2507.17647): Scalable HNSW in disaggregated memory
- **GoVector** (Northeastern, Aug 2025, arXiv:2508.15694): I/O-efficient HNSW caching
- **In-Place Updates of Graph Index** (CMU, Feb 2025, arXiv:2502.13826): Streaming ANN
- **Quake** (Waterloo/Apple, Jun 2025, arXiv:2506.03437): Adaptive vector index
- **FreshDiskANN** (Microsoft, 2021, arXiv:2105.09613): Foundational streaming graph ANN

### Durability fundamentals

- SQLite WAL docs: `autocheckpoint`, `wal_checkpoint(TRUNCATE)`, `mmap_size`
- POSIX fsync ordering: file fsync + directory fsync for durable rename
- Qdrant memory article: 1M vectors served with 135 MB RAM via full mmap

---

## 15. Streaming rebuild for large repos (2026-06): restores Phase B/C at any scale

> **Status**: Implemented.
> `core/indexing/streaming-vectors.js`,
> `core/indexing/indexer-phases.js` (gate), `core/ranking/late-interaction-index.js`
> (bounded build mode), `core/indexing/dedup/dedup-phase.js` (shared annotation).

### 15.1 The regression

Phases B (streaming embeddings) and C (segmented LI flush) above were quietly defeated for the **default local-model path** by two later, well-intentioned changes:

1. **Global length-sort embed.** `buildVectorIndex` sends *all* texts to the bucketer in one call (`batchSize = embedTexts.length`) so `callLocalModelBucketed` can globally length-sort for uniform batches. That makes `pipelinedEmbedAndInsert`'s outer loop run **once**, so the embeddings array + insert rows + write buffer accumulate the whole corpus before a single flush. Phase B's "embeddings[] peak: 0 MB" no longer holds.
2. **LI docs never evicted.** `LateInteractionIndex.add()` keeps every doc in `this.documents` for the whole build; `_flushSegment()` released only the rolling 10k `_currentSegment`, not `this.documents`. Phase C's "~33 MB (1 segment)" became O(all docs) of per-token slabs.

Add the always-resident chunk corpus (`chunkFiles()` → every chunk + every embed-text held through dedup/embed/LI) and the alias-insert materialising all alias rows at once, and peak heap is **O(repo)**. Measured on the default ~4 GB Node heap:

| Repo | Files | Chunks | Dedup | Dominant hog | In-memory result |
|------|-------|--------|-------|--------------|------------------|
| tursodatabase/libsql @59b922b | 24,238 | **431,260** | 94.2% alias (25,124 exemplars) | chunk corpus (~3 GB) + 406k alias rows | parallel: **hang → SIGKILL @ 6.5 GB RSS**; sequential: borderline |
| swc-project/swc | 62,609 | **216,940** | 17% alias (180,156 exemplars) | embed output + LI per-token (180k) | OOM / hang |

This is **backend-agnostic**: the hogs are JS-side (chunk objects, embedding arrays, LI slabs, alias rows), so CUDA, Metal, CoreML and ORT-CPU all crash the same way.
(The Apple parallel-LI Metal *hang* is an additional symptom; the underlying memory blow-up is universal.)

### 15.2 The fix: bounded-window streaming

For **large full rebuilds** (`filesToIndex.length ≥ SWEET_SEARCH_STREAM_MIN_FILES`, default 5000) the phase routes to `buildVectorsAndLiStreaming` instead of the in-memory `buildVectorIndex ‖ buildLateInteractionIndex`. Small repos and all incremental runs keep the byte-identical in-memory path, so **benchmark indexes are unaffected** (auto-selected by size; no opt-in flag; set `SWEET_SEARCH_STREAM_VECTORS=0` to force legacy).

```
1. PARSE+SPILL  parse files in windows (reuse chunkFiles), compute dedup
                fingerprints, apply the LI skip policy, spill each chunk to a
                temp SQLite store. Only lightweight per-chunk records stay
                resident (id, text length, path/hash, fingerprint, li-keep).
2. DEDUP        cluster the resident fingerprints GLOBALLY (identical clusters
                to the in-memory path; verified byte-for-byte on libsql:
                13,198 clusters / 406,136 aliases) and annotate the records.
3. EMBED        hydrate ONLY exemplar seqs (file-aligned windows) and insert via
                the UNCHANGED pipelinedEmbedAndInsert → callLocalModelBucketed.
4. ALIAS        hydrate ONLY alias seqs in windows; copy exemplar vectors via the
                UNCHANGED insertAliasVectors (orphan purge skipped: a fresh
                build can't orphan).
5. LI           feed LI-lite records (exemplar token-text only; eligible aliases
                are pointer-only) to the UNCHANGED buildLateInteractionIndex in
                bounded build mode: each flushed segment's per-token slabs are
                evicted (peak O(one segment)). A lightweight exemplar-id set
                keeps alias-pointer registration valid post-eviction.
```

Peak heap is **O(window)** + a tiny O(repo) bookkeeping array (ids/offsets/fingerprints), regardless of repo size or backend.

### 15.3 What is deliberately NOT changed

- **The tuned compute-batching.** `callLocalModelBucketed` (cache-aware batch sizing; see §"Cache-aware batching" in the README) and `buildLateInteractionBatches` are reused verbatim.
  Streaming only changes *where chunk text lives* (disk vs heap) and *how results are flushed*, not how compute batches are formed. For repos that fit one window the embed call is identical to today.
- **Global dedup.** Preserved (clustering runs over all fingerprints), so dup-heavy repos keep their alias short-cut and don't re-embed everything. This is the "don't slow down" guard for libsql-class repos.
- **On-disk format.** codebase.db vectors + atomic swap, SSLX-v3 LI segments, and binary-HNSW/int8 artifacts are all identical. Existing goldens stay readable.

### 15.4 Cost

The spill writes each chunk to a temp SQLite file once and reads back only the seqs each pass needs (exemplars for embed, aliases for alias-insert; LI input is assembled during those passes). The IO is a few seconds against a multi-minute encode, so indexing throughput is unchanged. The temp store is deleted in a `finally`.

### 15.5 Results

**Memory (libsql @59b922b vectors phase, ~431k chunks):**

| Path | Peak RSS | Outcome |
|------|----------|---------|
| In-memory, parallel embed+LI (Apple default) | **6.5 GB** | hang → SIGKILL |
| In-memory, sequential (`--max-old-space-size=4096`) | **~6.0 GB** | completes, borderline |
| **Streaming** (default heap) | **~4.2 GB** | bounded |
| **Streaming** (hard `--max-old-space-size=2048`) | ~3.3 GB RSS | **V8 heap held under 2 GB** through parse + global dedup + embed |

The streaming RSS is dominated by the *evictable* 1.3 GB code-graph.db mmap + the model; the V8 managed heap (what OOMs) is far lower and bounded by the window, not the repo. Streaming uses **less** memory than the in-memory path *and* removes the O(repo) growth entirely.

**Correctness (no MRR regression):** indexing a corpus via streaming vs in-memory produces a **byte-identical `codebase.db`**: every embedding BLOB and the full exemplar/alias dedup partition match exactly (maxAbsDiff = 0), so search results, and therefore MRR, are identical.
This holds exactly for any repo whose exemplars fit one embed window (GCSN-class benchmarks); larger repos differ only by the floating-point batch-shape nondeterminism already present in the in-memory baseline. At libsql scale the global dedup is byte-identical (13,198 clusters / 406,136 aliases).

**End-to-end:** `tests/integration/streaming-vectors.integration.test.js` indexes a synthetic pathological repo (a 250k-line >1 MB generated file + a binary blob + 160 files incl. near-duplicates) through the forced streaming path under a constrained heap and asserts that it completes, that the oversized + binary files are skipped by admission, that global dedup runs, and that an in-process `SweetSearch` query returns ranked results. Every change is in the JS accumulation layer, so this is backend-agnostic (CUDA / Metal / CoreML / ORT-CPU).

> Note: a *full* end-to-end libsql run on Apple Silicon intermittently stalls in
> the CoreML/ANE embed under sustained load on a hot machine (the same
> `embedBatch` call the in-memory path makes; orthogonal to this memory fix).
> Bounded memory, byte-identical-index parity, and the end-to-end integration
> test establish the fix independently of that backend flakiness; ORT-CPU
> completes libsql without it.
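
For readers who want to reproduce a parity check like the `maxAbsDiff = 0` result above, a minimal sketch follows. It assumes each embedding BLOB is a packed little-endian float32 array; the `maxAbsDiff` and `toBlob` helpers are hypothetical and are not taken from the test suite.

```javascript
// Largest element-wise absolute difference between two float32 BLOBs.
// Assumption: both Buffers hold packed little-endian float32 values.
function maxAbsDiff(blobA, blobB) {
  if (blobA.length !== blobB.length || blobA.length % 4 !== 0) {
    throw new Error('BLOBs must have equal length and hold whole float32 values');
  }
  let max = 0;
  for (let offset = 0; offset < blobA.length; offset += 4) {
    const diff = Math.abs(blobA.readFloatLE(offset) - blobB.readFloatLE(offset));
    if (diff > max) max = diff;
  }
  return max;
}

// Helper for the demo: serialize numbers the same way the check reads them.
function toBlob(values) {
  const buf = Buffer.alloc(values.length * 4);
  values.forEach((value, i) => buf.writeFloatLE(value, i * 4));
  return buf;
}

const streamed = toBlob([0.25, -1.5, 3]);
const inMemory = toBlob([0.25, -1.5, 3]);
const drifted = toBlob([0.25, -1.0, 3]);

console.log(maxAbsDiff(streamed, inMemory)); // 0
console.log(maxAbsDiff(streamed, drifted));  // 0.5
```

A result of exactly 0 across every row is a stricter claim than "within tolerance", which is why it supports the byte-identical statement; a plain `Buffer.equals` comparison would be an even simpler check when only a yes/no answer is needed.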