# Zero-Copy Memory-Mapped Index (`*.mmap`) To enable ultra-low-latency real-time lookups and keep memory usage extremely low during watch sessions, DiskTracker implements a custom memory-mapped binary search index next to the SQLite database. For the default `~/.disktracker/data.db`, the mmap file is `~/.disktracker/data.mmap`. --- ## 💾 Packed Binary Representation Directory paths and computed metadata are packed into a flat binary file structure mapping directly to in-memory slices: * **20-byte Header**: Stores magic bytes (`DSKTRKMM`), format version, entry count, and path-pool byte length. * **16-byte Entries**: Each entry stores a path offset, path length, and recursive `total_bytes` value. * **Contiguous Path Pool**: Paths are packed once after the entry table, avoiding per-entry string allocation while keeping lookups byte-oriented. --- ## 🔍 Zero-Copy $O(\log N)$ Lookup Standard databases or in-memory tables consume substantial memory and require deserializing structures into heap memory. DiskTracker avoids this completely: * **Alphabetical Binary Search**: Paths are stored alphabetically sorted on disk inside the memory-mapped file. * **Zero Heap Allocations**: Lookups perform a pure binary search over the raw memory-mapped byte stream. Path matching is performed directly on the mapped pointers without copying or allocating heap memory. * **$O(\log N)$ Time Complexity**: Resolves directory size in logarithmic time over sorted byte paths. --- ## 🏗️ LSM-Style Transient Overlay Because memory-mapped files cannot be dynamically resized in place without expensive operating system page faults or file fragmentation, DiskTracker utilizes a Log-Structured Merge (LSM)-style structure for live mutations: * **Active In-Memory Writes**: Live changes are buffered into a transient `HashMap, OverlayEntry>`. * **`Deleted` Tombstones**: When a subtree is deleted, tombstones are written for matching mapped and overlay paths so later lookups cannot leak stale sizes. * **Unified Reader Interface**: Reads query the overlay first. If a modified value or tombstone matches, that result wins; otherwise, lookup falls back to the underlying `$O(\log N)$` memory map. --- ## 🔄 Double-Buffered Compaction To merge the transient in-memory overlay map back into the primary memory-mapped index without interrupting active watch sessions or locking files: 1. **Temp File Allocation**: The compaction engine writes a fully compacted, sorted index directly to a temporary file beside the target mmap file (for example, `data.mmap.tmp`). 2. **Flush & Page Sync**: Calls OS flush operations (`sync_all`) to ensure data is physically written to disk. 3. **Cross-Platform File Swap**: Drops the active memory map pointer to release the file handle (crucial on Windows to avoid access errors) and atomically renames the temp file to override the primary mmap file. 4. **Re-hydration**: Instantly re-maps the new compacted mmap file, clearing the transient in-memory overlay map safely. --- ## 🚀 Fast Boot Re-hydration When `disktracker watch` starts, it bypasses full disk scans using instant re-hydration: * **Mmap Hydration**: If the database-adjacent `.mmap` file exists, DiskTracker maps it directly and initializes the incremental engine from the packed entries. * **Snapshot Reuse**: If `watch_state` contains a `last_snapshot_id`, the watch session reuses that snapshot id after successful mmap hydration. * **Fallback Scan**: If the `.mmap` file is absent or cannot be loaded, DiskTracker performs physical initialization scans and writes a fresh `.mmap` file for the next watch boot.