# Reproducible Benchmark Suite & Comparative Profiling

DiskTracker contains a built-in benchmark harness and automated scripts to allow developer-reproducible profiling on distinct hardware configurations.

---

## 🛠️ The Standalone Benchmark Utility (`bench_tool`)

To prevent external filesystem volatility and ensure deterministic runs, DiskTracker includes a standalone test directory generator and profiling harness (`bench_tool`).

### What `bench_tool` Does

1. **Generates a Deterministic Dataset**: Creates a transient synthetic directory structure containing **50,001 filesystem entries** (5,001 directories, 45,000 files).
2. **Performs Parallel Cold Traversal**: Runs an initial cold scan using parallel work-stealing, measuring exact thread scaling.
3. **Performs Warm Snapshot Validation**: Runs subsequent scans on the same dataset using `SnapshotTree` fingerprints and validation telemetry.
4. **Tracks Raw Kernel Syscalls**: Leverages thread-safe atomic telemetry to count `openat`, `statx`, and `getdents` calls during DiskTracker scans.
5. **Benchmarks Installed Competitors**: Automatically runs identical traversals using `fd`, `dust`, `dua`, `ncdu`, and `gdu` if they are installed on the local system.

### How to Run

```bash
cargo run --release -p disktracker-cli --bin bench_tool
```

### Latest Local Run

The latest local run on May 22, 2026 generated **5,001 directories and 45,000 files** under `/tmp/disktracker_bench`. The platform reported `/tmp` as rotational storage, so auto parallelism was capped to one worker.


| Run | Wall Time | Files | Dirs | Total Bytes | `openat` | `statx` | `getdents` | Total Syscalls | Peak RSS |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Serial cold scan | 143 ms | 45,000 | 5,001 | 45,000 | 5,001 | 50,001 | 10,002 | 65,004 | 12,736 KB |
| Auto cold scan, HDD-capped | 292 ms | 45,000 | 5,001 | 45,000 | 10,002 | 50,002 | 10,002 | 70,006 | 13,556 KB |
| Warm unchanged scan | 130 ms | 45,000 | 5,001 | 45,000 | 5,001 | 55,001 | 10,002 | 70,004 | 15,176 KB |
| Warm scan, 1 file changed | 138 ms | 45,000 | 5,001 | 45,015 | 5,001 | 55,001 | 10,002 | 70,004 | 15,580 KB |
| Warm scan, deep subtree changed | 149 ms | 45,001 | 5,001 | 45,031 | 5,001 | 55,002 | 10,002 | 70,005 | 15,984 KB |
| Warm scan, 100 scattered changes | 151 ms | 45,001 | 5,001 | 47,516 | 5,001 | 55,002 | 10,002 | 70,005 | 16,388 KB |

The competitor tools (`fd`, `dust`, `dua`, `ncdu`, and `gdu`) were not installed in this environment, so the local run did not produce comparative timings.

---

## ✅ CI Regression Baseline

The checked-in performance regression test uses a smaller deterministic fixture to keep CI fast: **51 directories, 450 files, and 450 bytes**. Its baseline lives in [`crates/disktracker-core/tests/perf_baseline.json`](../crates/disktracker-core/tests/perf_baseline.json).

| Run | Wall Time | Files | Dirs | Total Bytes | Total Syscalls |
| :--- | ---: | ---: | ---: | ---: | ---: |
| Serial cold scan | 1 ms | 450 | 51 | 450 | 654 |
| Parallel cold scan | 2 ms | 450 | 51 | 450 | 706 |
| Warm unchanged scan | 1 ms | 450 | 51 | 450 | 704 |
| Warm scan, 1 file changed | 1 ms | 450 | 51 | 465 | 704 |
| Warm scan, deep subtree changed | 1 ms | 451 | 51 | 481 | 705 |
| Warm scan, scattered changes | 1 ms | 451 | 51 | 716 | 705 |

---

## 🏃 Run Dataset Benchmarks

DiskTracker includes automated scripts to benchmark actual physical folders of varying structures (such as high-contention node packages, Git objects, and large systems paths like `/usr`).

Execute the comprehensive dataset benchmarks:

```bash
./scripts/bench_datasets.sh
```

---

## 💾 SSD vs HDD Dynamic Hardware Scaling

DiskTracker scales its concurrency to match the physical performance limits of the underlying storage hardware:

### High-Speed Solid-State Drives (NVMe/SATA SSDs)

* **Highly Parallel Workload**: SSDs have no mechanical heads and benefit from parallel read channels.
* **Deep Queue Depth Utilization**: DiskTracker spawns multiple threads to queue directory read operations simultaneously, exploiting the drive's parallel IOPS.

### Rotational Drives (Mechanical HDDs)

* **Capping Parallelism**: Running multiple parallel read threads on a spinning magnetic hard drive causes head thrashing, which severely degrades performance.
* **Automatic Fallback**: DiskTracker detects rotational drives (using stable OS device parameters) and automatically restricts scanning to a **single thread** to preserve contiguous physical reads and avoid thrashing.
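
### Example: Selecting a Worker Count (Sketch)

The fallback described above can be illustrated with a minimal sketch. It is not DiskTracker's actual implementation: this document does not say which OS device parameters DiskTracker reads, so the sketch assumes Linux and the sysfs attribute `/sys/block/<device>/queue/rotational`. The function names (`parse_rotational`, `is_rotational`, `pick_worker_count`, `auto_worker_count`) are hypothetical, and treating an unknown device as non-rotational is a policy choice made only for this example. Mapping a path such as `/tmp` to its backing block device is out of scope here.

```rust
use std::fs;
use std::path::Path;
use std::thread;

/// Parses the contents of a Linux `queue/rotational` sysfs attribute.
/// "1" means rotational (HDD), "0" means non-rotational (SSD/NVMe).
fn parse_rotational(contents: &str) -> Option<bool> {
    match contents.trim() {
        "1" => Some(true),
        "0" => Some(false),
        _ => None,
    }
}

/// Reads `/sys/block/<device>/queue/rotational` (assumed detection source).
/// Returns `None` when the attribute is missing or unreadable.
fn is_rotational(device: &str) -> Option<bool> {
    let path = Path::new("/sys/block").join(device).join("queue/rotational");
    fs::read_to_string(path)
        .ok()
        .and_then(|s| parse_rotational(&s))
}

/// Chooses a worker count: one worker for rotational storage,
/// otherwise every available worker (never fewer than one).
fn pick_worker_count(rotational: Option<bool>, available: usize) -> usize {
    match rotational {
        Some(true) => 1,
        // Assumption of this sketch: unknown devices are scanned in parallel.
        Some(false) | None => available.max(1),
    }
}

/// Combines detection with the parallelism reported by the standard library.
fn auto_worker_count(device: &str) -> usize {
    let available = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    pick_worker_count(is_rotational(device), available)
}
```

Calling `auto_worker_count("sda")` on a Linux machine whose `sda` is a spinning disk would return `1` under these assumptions. Keeping the policy in `pick_worker_count`, separate from the sysfs read, lets the capping rule be tested without real hardware.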