# System Architecture

DiskTracker is engineered for maximum throughput, low overhead, and persistent filesystem observability. Its design combines a concurrent, lock-free parallel scanner, a non-pointer-chasing node arena, a memory-mapped fast index lookup, and an incremental, lazy mutation reconciler.

---

## 🗺️ Component Interactions

The following diagram illustrates how DiskTracker modules interact to deliver high-performance directory scans and real-time observability:

```mermaid
graph TD
    %% CLI / User Input
    CLI[disktracker-cli] -->|Command Options| Core[disktracker-core]
    CLI -->|Watch Command| Watcher[disktracker-watch]
    CLI -->|Reconcile Command| DB_Store[disktracker-db]

    %% Parallel Scanning
    Core -->|1. Cold Parallel Scan| Deque[crossbeam-deque work pool]
    Core -->|2. Warm Validation| Warm_Cache[SnapshotTree fingerprints]
    Deque -->|Populates| Arena[Segment-Allocated Arena: NodeHot & NodeCold]
    Arena -->|Bulk Writes| DB_Store

    %% Real-time Monitoring & Hydration
    Watcher -->|Instant Hydration| MmapIndex[*.mmap: Memory-Mapped Files]
    MmapIndex -->|O log N Lookup| LSM[LSM In-Memory Overlay & Tombstones]
    Watcher -->|FS Events| DirtyQ[Dirty Queue Deduplicator]
    DirtyQ -->|Batches| MutationLog[SQLite mutation_log Table]

    %% Database Sync & Lazy Reconcile
    DB_Store -->|WAL Configured SQLite| SQL[(data.db)]
    DB_Store -->|Lazy Reconcile| Propagate[Bottom-up Delta Propagator]
    Propagate -->|Updates| SQL

    %% Compaction Pass
    LSM -->|Atomic Compaction| DoubleBuffer[mmap.tmp Write & Renaming]
    DoubleBuffer -->|Compacted Index| MmapIndex
```

---

## 🛠️ Modular Crate Structure

DiskTracker is built as an extensible workspace comprised of five highly focused systems crates:

### 1. [`disktracker-core`](../crates/disktracker-core)

The high-speed parallel filesystem traversal engine. Manages work-stealing thread pools, segmented memory arenas, and raw OS traversal abstractions.

### 2. [`disktracker-events`](../crates/disktracker-events)

Defines filesystem mutation events (`FsEvent`) and handles dirty path deduplication structures (`DirtyQueue`).

### 3. [`disktracker-watch`](../crates/disktracker-watch)

Connects OS filesystem change notifications (`notify`), manages the database-adjacent `*.mmap` file, maintains the LSM-style overlay, and propagates local size deltas through cached parent paths.

### 4. [`disktracker-db`](../crates/disktracker-db)

Houses the SQLite connection manager, custom database WAL migrations, snapshot differences logic, explain attribution engine, and bottom-up reconciliation routines.

### 5. [`disktracker-cli`](../crates/disktracker-cli)

Orchestrates command argument validation, terminal progress indicators, and formats structured JSON or tabular outputs.

---

## 💾 Database Schema

DiskTracker implements a fully indexed SQLite schema structured for snapshots, diffs, live events, and watcher reconciliation. The active schema is defined in [`crates/disktracker-db/src/schema.rs`](../crates/disktracker-db/src/schema.rs); the summary below mirrors the current table and index names.

```sql
CREATE TABLE IF NOT EXISTS snapshots (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    scan_root   TEXT NOT NULL,
    started_at  INTEGER NOT NULL,
    finished_at INTEGER NOT NULL,
    total_files INTEGER NOT NULL,
    total_bytes INTEGER NOT NULL,
    error_count INTEGER NOT NULL DEFAULT 0,
    host        TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS dir_snapshots (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id    INTEGER NOT NULL REFERENCES snapshots(id) ON DELETE CASCADE,
    path_blob      BLOB NOT NULL,
    path_utf8      TEXT,
    depth          INTEGER NOT NULL,
    total_bytes    INTEGER NOT NULL,
    file_count     INTEGER NOT NULL,
    mtime          INTEGER NOT NULL,
    dev            INTEGER NOT NULL DEFAULT 0,
    ino            INTEGER NOT NULL DEFAULT 0,
    fingerprint_lo INTEGER NOT NULL DEFAULT 0,
    fingerprint_hi INTEGER NOT NULL DEFAULT 0,
    direct_bytes   INTEGER NOT NULL DEFAULT 0,
    child_count    INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE IF NOT EXISTS diff_cache (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_a  INTEGER NOT NULL REFERENCES snapshots(id) ON DELETE CASCADE,
    snapshot_b  INTEGER NOT NULL REFERENCES snapshots(id) ON DELETE CASCADE,
    path_blob   BLOB NOT NULL,
    bytes_a     INTEGER,
    bytes_b     INTEGER,
    delta_bytes INTEGER NOT NULL
);

CREATE TABLE IF NOT EXISTS fs_events (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp   INTEGER NOT NULL,
    event_type  INTEGER NOT NULL,
    path_blob   BLOB NOT NULL,
    delta_bytes INTEGER,
    is_dir      INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE IF NOT EXISTS dir_deltas (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id    INTEGER NOT NULL,
    path_blob      BLOB NOT NULL,
    path_utf8      TEXT,
    previous_bytes INTEGER,
    current_bytes  INTEGER NOT NULL,
    delta_bytes    INTEGER NOT NULL,
    recorded_at    INTEGER NOT NULL
);

CREATE TABLE IF NOT EXISTS watch_state (
    id                  INTEGER PRIMARY KEY,
    watch_root          BLOB NOT NULL,
    last_event_time     INTEGER,
    last_reconcile_time INTEGER,
    last_snapshot_id    INTEGER
);

CREATE TABLE IF NOT EXISTS mutation_log (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp     INTEGER NOT NULL,
    mutation_type INTEGER NOT NULL,
    dev           INTEGER NOT NULL,
    ino           INTEGER NOT NULL,
    path_blob     BLOB NOT NULL,
    old_size      INTEGER,
    new_size      INTEGER,
    old_path_blob BLOB
);

CREATE INDEX IF NOT EXISTS idx_dir_snapshot_id ON dir_snapshots(snapshot_id);
CREATE INDEX IF NOT EXISTS idx_dir_path_bytes ON dir_snapshots(path_blob, snapshot_id);
CREATE INDEX IF NOT EXISTS idx_dir_identity ON dir_snapshots(dev, ino, snapshot_id);
CREATE INDEX IF NOT EXISTS idx_diff_delta ON diff_cache(snapshot_a, snapshot_b, delta_bytes);
CREATE INDEX IF NOT EXISTS idx_fs_events_time ON fs_events(timestamp);
CREATE INDEX IF NOT EXISTS idx_dir_deltas_snapshot ON dir_deltas(snapshot_id);
CREATE INDEX IF NOT EXISTS idx_dir_deltas_path ON dir_deltas(path_blob, recorded_at);
CREATE INDEX IF NOT EXISTS idx_mutation_log_time ON mutation_log(timestamp);
CREATE INDEX IF NOT EXISTS idx_mutation_log_identity ON mutation_log(dev, ino);
```

The connection setup enables `journal_mode = WAL`, `synchronous = NORMAL`, `cache_size = -65536`, `mmap_size = 268435456`, `temp_store = MEMORY`, and foreign-key enforcement.
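
To make the units in those connection settings concrete, here is a minimal, standard-library-only sketch. The pragma names and values are taken from the list above, with foreign-key enforcement written as SQLite's `foreign_keys` pragma. The identifiers `PRAGMAS`, `pragma_batch`, and `cache_bytes` are hypothetical and are not taken from `disktracker-db`; the real connection manager may apply the settings differently.

```rust
// Hypothetical illustration: these names do not come from the DiskTracker
// source. Only the pragma names and values are taken from the text above.
const PRAGMAS: &[(&str, &str)] = &[
    ("journal_mode", "WAL"),
    ("synchronous", "NORMAL"),
    ("cache_size", "-65536"),    // negative: a budget in KiB (64 MiB here)
    ("mmap_size", "268435456"),  // bytes (256 MiB)
    ("temp_store", "MEMORY"),
    ("foreign_keys", "ON"),      // foreign-key enforcement
];

/// Render the pragmas as one SQL batch, one statement per line.
fn pragma_batch() -> String {
    PRAGMAS
        .iter()
        .map(|(name, value)| format!("PRAGMA {} = {};\n", name, value))
        .collect()
}

/// Approximate page-cache budget in bytes for a given `cache_size`.
/// SQLite reads a negative N as roughly |N| * 1024 bytes and a
/// non-negative N as a page count.
fn cache_bytes(cache_size: i64, page_size: u64) -> u64 {
    if cache_size < 0 {
        cache_size.unsigned_abs() * 1024
    } else {
        cache_size as u64 * page_size
    }
}
```

With these values, `cache_bytes(-65536, 4096)` comes to 64 MiB regardless of page size, and `mmap_size` allows up to 256 MiB of memory-mapped I/O. The string from `pragma_batch()` could be handed to a driver's batch-execute call (for example rusqlite's `Connection::execute_batch`, assuming that is the driver in use, which the text above does not state).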