# High-Performance Parallel & Warm Scanning

DiskTracker combines several low-level systems optimizations to achieve high traversal speeds.

---

## ⚡ Work-Stealing Scanner

At the core of DiskTracker's cold scan engine is a concurrent task pool built on [`crossbeam-deque`](https://crates.io/crates/crossbeam-deque).

* **Work-Stealing Deques**: Each thread pulls directory paths from its own local FIFO deque and steals from other threads' deques when it runs dry, which minimizes global lock contention.
* **Queue Depth Maximization**: Keeps many directory reads in flight on Solid-State Drives (SSDs) to exploit their parallel IOPS and deep hardware queues.

---

## 📦 Segment-Allocated Pathless Arena

Traversing millions of directories causes heavy heap fragmentation and pointer-chasing delays in standard object-oriented hierarchies. DiskTracker avoids this with a flat, segment-allocated pathless arena:

* **Separation of Hot/Cold Fields**:
  * **`NodeHot` Array**: 24-byte, cache-line-friendly records containing total directory sizes, file/directory counts, and child ranges.
  * **`NodeCold` Array**: Holds infrequently referenced data such as strings (like directory names) and device details (`dev`, `ino`).
* **No Per-Node Heap Allocation**: Packs nodes into contiguous blocks of memory instead of allocating each one individually, which avoids per-node pointer chasing and boosts L3 cache hit rates.

---

## 🏷️ Warm Scan Snapshot Validation

Warm scans load the previous snapshot into a `SnapshotTree` and validate directories using filesystem identity plus subtree fingerprints:

* **Stable Identifiers**: Stores device IDs (`dev`) and inode numbers (`ino`) from previous snapshots.
* **Subtree Fingerprints**: Persists `fingerprint_lo`, `fingerprint_hi`, `direct_bytes`, and `child_count` so warm scans can classify directories as `trusted`, `shallow_changed`, `deep_changed`, `identity_mismatch`, or `unknown`.
* **Validation Telemetry**: Reports the trust rate and per-result counters after warm scans.

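
The classification step described in the list above can be sketched roughly as follows. This is a minimal illustration, not DiskTracker's actual implementation: the `DirRecord` struct, the `classify` and `trust_rate` functions, and the exact meaning assigned to each result (for example, treating a `direct_bytes` or `child_count` difference as `shallow_changed`) are assumptions made for the example. Only the field names and the five result names come from the text.

```rust
/// Hypothetical per-directory record; field names follow the snapshot fields
/// listed above, but the layout itself is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DirRecord {
    pub dev: u64,
    pub ino: u64,
    pub fingerprint_lo: u64,
    pub fingerprint_hi: u64,
    pub direct_bytes: u64,
    pub child_count: u32,
}

/// The five validation results named in the documentation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Validation {
    Trusted,
    ShallowChanged,
    DeepChanged,
    IdentityMismatch,
    Unknown,
}

/// Compare a snapshot record against what is observed now.
/// The ordering of checks is an assumed policy, not the project's.
pub fn classify(previous: Option<&DirRecord>, current: &DirRecord) -> Validation {
    let prev = match previous {
        Some(p) => p,
        // No snapshot entry for this directory.
        None => return Validation::Unknown,
    };
    // A different device or inode means the path now points elsewhere.
    if prev.dev != current.dev || prev.ino != current.ino {
        return Validation::IdentityMismatch;
    }
    // Assumed: direct size or child count differences are "shallow".
    if prev.direct_bytes != current.direct_bytes || prev.child_count != current.child_count {
        return Validation::ShallowChanged;
    }
    // Assumed: same direct contents but a different subtree fingerprint is "deep".
    if prev.fingerprint_lo != current.fingerprint_lo
        || prev.fingerprint_hi != current.fingerprint_hi
    {
        return Validation::DeepChanged;
    }
    Validation::Trusted
}

/// Assumed definition of trust rate: fraction of results that are `Trusted`.
pub fn trust_rate(results: &[Validation]) -> f64 {
    if results.is_empty() {
        return 0.0;
    }
    let trusted = results.iter().filter(|r| **r == Validation::Trusted).count();
    trusted as f64 / results.len() as f64
}
```

Under these assumptions, a directory whose identity and direct contents match but whose fingerprint differs would be reported as `DeepChanged`, and only `Trusted` results count toward the trust rate.
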
The current benchmark harness records validation stats alongside syscall totals.

---

## 📂 Pre-Traversal Skip Filters

DiskTracker evaluates ignore rules and skip criteria *before* invoking `readdir`/`getdents64` on a directory.

* **Pre-traversal evaluation**: Skips directory subtrees (like `node_modules` or `.git`) at the path level.
* **Reduced Context Switching**: Ignored subtrees are never opened or read, which cuts user-kernel boundary transitions and context switches.

---

## 🐧 Linux Raw Traversal Optimizations

On Linux, DiskTracker can read directory entries through raw `getdents64` calls:

* **64KB Reusable Buffers**: Reads kernel directory structures into a reusable user-space buffer, bypassing the allocations made by the glibc wrappers.
* **Direct Struct Decoding**: Decodes the variable-length `linux_dirent64` records directly from the buffer and uses the kernel-provided file type (`d_type`) when available.

---

## 📊 Low-Level Syscall Telemetry

To prevent silent performance regressions across changes, DiskTracker integrates thread-safe, atomic syscall telemetry:

* **Metric Counters**: Tracks counts of `openat`, `statx`, and `getdents` calls during benchmarked operations.
* **CI Test Suite Assertions**: Continuous integration runs assert that syscall counts and timings do not regress against the checked-in baseline.
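
As an illustration of how such counters and baseline checks might fit together, here is a minimal sketch using only the Rust standard library. The type and method names (`SyscallCounters`, `SyscallTotals`, `record_openat`, `regressions`) are hypothetical and are not taken from DiskTracker's code. The baseline comparison shown covers only syscall counts, not timings.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical thread-safe counters, one per tracked syscall.
#[derive(Debug, Default)]
pub struct SyscallCounters {
    openat: AtomicU64,
    statx: AtomicU64,
    getdents: AtomicU64,
}

/// Plain snapshot of the counters, suitable for comparing with a baseline.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SyscallTotals {
    pub openat: u64,
    pub statx: u64,
    pub getdents: u64,
}

impl SyscallCounters {
    // Relaxed ordering is enough here: each counter is an independent tally
    // and is only read after the worker threads have been joined.
    pub fn record_openat(&self) {
        self.openat.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_statx(&self) {
        self.statx.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_getdents(&self) {
        self.getdents.fetch_add(1, Ordering::Relaxed);
    }

    /// Read all counters into a plain struct.
    pub fn totals(&self) -> SyscallTotals {
        SyscallTotals {
            openat: self.openat.load(Ordering::Relaxed),
            statx: self.statx.load(Ordering::Relaxed),
            getdents: self.getdents.load(Ordering::Relaxed),
        }
    }
}

impl SyscallTotals {
    /// Names of the syscalls whose count exceeds the baseline.
    /// An empty result means no count regressed.
    pub fn regressions(&self, baseline: &SyscallTotals) -> Vec<&'static str> {
        let mut over = Vec::new();
        if self.openat > baseline.openat {
            over.push("openat");
        }
        if self.statx > baseline.statx {
            over.push("statx");
        }
        if self.getdents > baseline.getdents {
            over.push("getdents");
        }
        over
    }
}
```

A CI test in this style could run a benchmark scan, call `totals()`, and assert that `regressions(&baseline)` is empty, where `baseline` is loaded from the checked-in file.
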