# Reproducible Benchmark Suite & Comparative Profiling

DiskTracker ships a built-in benchmark harness and automated scripts so that developers can reproduce profiling runs on different hardware configurations.

---

## 🛠️ The Standalone Benchmark Utility (`bench_tool`)

To isolate runs from unrelated filesystem activity and keep them deterministic, DiskTracker includes `bench_tool`, a standalone test-directory generator and profiling harness.

### What `bench_tool` Does

1. **Generates a Deterministic Dataset**: Creates a transient synthetic directory structure containing **50,001 filesystem entries** (5,001 directories, 45,000 files).
2. **Performs Parallel Cold Traversal**: Runs an initial cold scan using parallel work-stealing and measures how the scan scales with thread count.
3. **Performs Warm Snapshot Validation**: Runs subsequent scans on the same dataset using `SnapshotTree` fingerprints and validation telemetry.
4. **Tracks Raw Kernel Syscalls**: Uses thread-safe atomic counters to record the number of `openat`, `statx`, and `getdents` calls made during DiskTracker scans.
5. **Benchmarks Installed Competitors**: Automatically runs identical traversals using `fd`, `dust`, `dua`, `ncdu`, and `gdu` if they are installed on the local system.
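
The syscall tracking in step 4 can be pictured with a minimal sketch. It is not DiskTracker's actual telemetry code: the `SyscallCounters` and `SyscallSnapshot` types, their method names, and `simulate_scan` are all hypothetical. The per-directory pattern it simulates (one `openat`, two `getdents`, one `statx` per entry) is an assumption chosen because it matches the ratios in the serial cold scan row of the results table.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical telemetry struct: one atomic counter per tracked syscall.
#[derive(Default)]
pub struct SyscallCounters {
    openat: AtomicU64,
    statx: AtomicU64,
    getdents: AtomicU64,
}

/// Plain-value copy of the counters, taken once the scan has finished.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SyscallSnapshot {
    pub openat: u64,
    pub statx: u64,
    pub getdents: u64,
}

impl SyscallSnapshot {
    pub fn total(&self) -> u64 {
        self.openat + self.statx + self.getdents
    }
}

impl SyscallCounters {
    // Relaxed ordering is enough for pure counters: only the final sums
    // matter, and they are read after the worker threads have been joined.
    pub fn record_openat(&self) {
        self.openat.fetch_add(1, Ordering::Relaxed);
    }
    pub fn record_statx(&self) {
        self.statx.fetch_add(1, Ordering::Relaxed);
    }
    pub fn record_getdents(&self) {
        self.getdents.fetch_add(1, Ordering::Relaxed);
    }
    pub fn snapshot(&self) -> SyscallSnapshot {
        SyscallSnapshot {
            openat: self.openat.load(Ordering::Relaxed),
            statx: self.statx.load(Ordering::Relaxed),
            getdents: self.getdents.load(Ordering::Relaxed),
        }
    }
}

/// Simulates `workers` threads that each "scan" `dirs_per_worker` directories
/// holding `files_per_dir` files. No real syscalls are issued; the function
/// only shows that concurrent increments are not lost.
pub fn simulate_scan(workers: usize, dirs_per_worker: usize, files_per_dir: usize) -> SyscallSnapshot {
    let counters = Arc::new(SyscallCounters::default());
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let c = Arc::clone(&counters);
            thread::spawn(move || {
                for _ in 0..dirs_per_worker {
                    c.record_openat(); // open the directory
                    c.record_getdents(); // read its entries
                    c.record_getdents(); // final call that reports end of directory
                    c.record_statx(); // the directory itself
                    for _ in 0..files_per_dir {
                        c.record_statx(); // each file in it
                    }
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    counters.snapshot()
}
```

Calling `simulate_scan(4, 100, 9)` from a `main` function and printing the snapshot shows the same totals on every run, regardless of how the threads interleave, which is the property the benchmark tables rely on.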

### How to Run

```bash
cargo run --release -p disktracker-cli --bin bench_tool
```

### Latest Local Run

The latest local run on May 22, 2026 generated **5,001 directories and 45,000 files** under `/tmp/disktracker_bench`. The platform reported `/tmp` as rotational storage, so auto parallelism was capped to one worker.

| Run | Wall Time | Files | Dirs | Total Bytes | `openat` | `statx` | `getdents` | Total Syscalls | Peak RSS |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Serial cold scan | 143 ms | 45,000 | 5,001 | 45,000 | 5,001 | 50,001 | 10,002 | 65,004 | 12,736 KB |
| Auto cold scan, HDD-capped | 292 ms | 45,000 | 5,001 | 45,000 | 10,002 | 50,002 | 10,002 | 70,006 | 13,556 KB |
| Warm unchanged scan | 130 ms | 45,000 | 5,001 | 45,000 | 5,001 | 55,001 | 10,002 | 70,004 | 15,176 KB |
| Warm scan, 1 file changed | 138 ms | 45,000 | 5,001 | 45,015 | 5,001 | 55,001 | 10,002 | 70,004 | 15,580 KB |
| Warm scan, deep subtree changed | 149 ms | 45,001 | 5,001 | 45,031 | 5,001 | 55,002 | 10,002 | 70,005 | 15,984 KB |
| Warm scan, 100 scattered changes | 151 ms | 45,001 | 5,001 | 47,516 | 5,001 | 55,002 | 10,002 | 70,005 | 16,388 KB |

The competitor tools (`fd`, `dust`, `dua`, `ncdu`, and `gdu`) were not installed in this environment, so the local run did not produce comparative timings.
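
For readers who want to build a comparable dataset by hand, the sketch below generates a fixture with the same shape as the counts reported above. The layout is an assumption, not a description of what `bench_tool` does: one root directory, a flat set of subdirectories, and nine one-byte files per subdirectory. That choice reproduces the reported totals (1 + 5,000 directories, 5,000 × 9 files, 45,000 bytes), but the real generator may nest directories differently. The names `generate_fixture`, `count_tree`, and `TreeStats` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Directory, file, and byte totals for a tree.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct TreeStats {
    pub dirs: u64,
    pub files: u64,
    pub bytes: u64,
}

/// Creates `root` plus `subdirs` child directories, each holding
/// `files_per_dir` one-byte files. Names are fixed, so repeated runs
/// produce an identical tree. Fails if a subdirectory already exists.
pub fn generate_fixture(root: &Path, subdirs: u64, files_per_dir: u64) -> io::Result<TreeStats> {
    fs::create_dir_all(root)?;
    let mut stats = TreeStats { dirs: 1, files: 0, bytes: 0 }; // count the root
    for d in 0..subdirs {
        let dir = root.join(format!("dir_{d:05}"));
        fs::create_dir(&dir)?;
        stats.dirs += 1;
        for f in 0..files_per_dir {
            fs::write(dir.join(format!("file_{f}.dat")), b"x")?;
            stats.files += 1;
            stats.bytes += 1;
        }
    }
    Ok(stats)
}

/// Walks the tree iteratively and recounts it, as a sanity check that
/// what is on disk matches what the generator reported.
pub fn count_tree(root: &Path) -> io::Result<TreeStats> {
    let mut stats = TreeStats::default();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        stats.dirs += 1;
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let meta = entry.metadata()?; // does not follow symlinks
            if meta.is_dir() {
                pending.push(entry.path());
            } else {
                stats.files += 1;
                stats.bytes += meta.len();
            }
        }
    }
    Ok(stats)
}
```

Under these assumptions, `generate_fixture(root, 5_000, 9)` yields 5,001 directories and 45,000 files, and `generate_fixture(root, 50, 9)` yields the 51-directory, 450-file shape used by the CI fixture.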

---

## ✅ CI Regression Baseline

The checked-in performance regression test uses a smaller deterministic fixture to keep CI fast: **51 directories, 450 files, and 450 bytes**. Its baseline lives in [`crates/disktracker-core/tests/perf_baseline.json`](../crates/disktracker-core/tests/perf_baseline.json).

| Run | Wall Time | Files | Dirs | Total Bytes | Total Syscalls |
| :--- | ---: | ---: | ---: | ---: | ---: |
| Serial cold scan | 1 ms | 450 | 51 | 450 | 654 |
| Parallel cold scan | 2 ms | 450 | 51 | 450 | 706 |
| Warm unchanged scan | 1 ms | 450 | 51 | 450 | 704 |
| Warm scan, 1 file changed | 1 ms | 450 | 51 | 465 | 704 |
| Warm scan, deep subtree changed | 1 ms | 451 | 51 | 481 | 705 |
| Warm scan, scattered changes | 1 ms | 451 | 51 | 716 | 705 |
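
A regression check against such a baseline could look roughly like the sketch below. It is illustrative only: the schema of `perf_baseline.json` is not shown in this document, so the `RunMetrics` fields, the `Tolerance` type, and the pass/fail rules are all assumptions. The sketch treats file, directory, and byte counts as exact (the fixture is deterministic) and allows some slack on syscall counts and wall time.

```rust
/// Metrics for one benchmark run. Field names are hypothetical and are
/// not taken from the real baseline file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RunMetrics {
    pub wall_ms: u64,
    pub files: u64,
    pub dirs: u64,
    pub total_bytes: u64,
    pub total_syscalls: u64,
}

/// Assumed tolerances: absolute slack for wall time, relative for syscalls.
#[derive(Debug, Clone, Copy)]
pub struct Tolerance {
    pub wall_slack_ms: u64,
    pub syscall_slack_pct: u64,
}

/// Returns `Ok(())` if `measured` is within tolerance of `baseline`,
/// otherwise a list of human-readable failure messages.
pub fn check_regression(
    baseline: &RunMetrics,
    measured: &RunMetrics,
    tol: Tolerance,
) -> Result<(), Vec<String>> {
    let mut failures = Vec::new();

    // The fixture is deterministic, so these counts must match exactly.
    for (name, want, got) in [
        ("files", baseline.files, measured.files),
        ("dirs", baseline.dirs, measured.dirs),
        ("total_bytes", baseline.total_bytes, measured.total_bytes),
    ] {
        if want != got {
            failures.push(format!("{name}: expected {want}, got {got}"));
        }
    }

    // Syscalls may grow by at most `syscall_slack_pct` percent.
    let syscall_limit =
        baseline.total_syscalls + baseline.total_syscalls * tol.syscall_slack_pct / 100;
    if measured.total_syscalls > syscall_limit {
        failures.push(format!(
            "total_syscalls: {} exceeds limit {}",
            measured.total_syscalls, syscall_limit
        ));
    }

    // Wall time is noisy on shared CI runners, so use absolute slack.
    let wall_limit = baseline.wall_ms + tol.wall_slack_ms;
    if measured.wall_ms > wall_limit {
        failures.push(format!(
            "wall_ms: {} exceeds limit {}",
            measured.wall_ms, wall_limit
        ));
    }

    if failures.is_empty() { Ok(()) } else { Err(failures) }
}
```

With the serial cold scan row as the baseline (654 syscalls) and a 5% syscall tolerance, a run that issues 706 syscalls would be flagged, since the limit works out to 686.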

---

## 🏃 Run Dataset Benchmarks

DiskTracker also includes automated scripts that benchmark real on-disk directories with varying structures, such as high-contention Node package trees, Git object stores, and large system paths like `/usr`.

Run the full set of dataset benchmarks:

```bash
./scripts/bench_datasets.sh
```

---

## 💾 SSD vs HDD Dynamic Hardware Scaling

DiskTracker adjusts its scan concurrency to match what the underlying storage hardware can sustain:

### High-Speed Solid-State Drives (NVMe/SATA SSDs)

* **Highly Parallel Workload**: SSDs have no mechanical heads and serve reads over parallel channels.
* **Deep Queue Depth Utilization**: DiskTracker spawns multiple threads so that many directory reads are queued at once, taking advantage of the IOPS an SSD can deliver in parallel.

### Rotational Drives (Mechanical HDDs)

* **Capping Parallelism**: Issuing reads from multiple threads on a spinning hard drive causes head thrashing, which severely degrades performance.
* **Automatic Fallback**: DiskTracker detects rotational drives (using stable OS device parameters) and automatically restricts scanning to a **single thread**, keeping reads as sequential as possible and avoiding thrashing.
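
The fallback can be sketched as follows. This document does not say which OS parameters DiskTracker reads, so the sketch assumes one common Linux mechanism: the sysfs file `/sys/block/<device>/queue/rotational`, which contains `1` for rotational devices and `0` otherwise. The type and function names are hypothetical, mapping a scan path to its block device is left out, and treating an unknown device as parallel-capable is an illustrative policy choice.

```rust
use std::fs;
use std::num::NonZeroUsize;
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StorageKind {
    Rotational,
    SolidState,
    Unknown,
}

/// Interprets the contents of a sysfs `queue/rotational` file.
pub fn parse_rotational_flag(contents: &str) -> StorageKind {
    match contents.trim() {
        "1" => StorageKind::Rotational,
        "0" => StorageKind::SolidState,
        _ => StorageKind::Unknown,
    }
}

/// Reads the rotational flag for a block device name such as "sda".
/// Any I/O error (missing device, non-Linux system) yields `Unknown`.
pub fn detect_storage_kind(block_device: &str) -> StorageKind {
    let path = format!("/sys/block/{block_device}/queue/rotational");
    fs::read_to_string(path)
        .map(|s| parse_rotational_flag(&s))
        .unwrap_or(StorageKind::Unknown)
}

/// Picks a worker count: one thread for rotational media, otherwise all
/// available cores (never fewer than one).
pub fn worker_count(kind: StorageKind, available: usize) -> usize {
    match kind {
        StorageKind::Rotational => 1,
        StorageKind::SolidState | StorageKind::Unknown => available.max(1),
    }
}

/// Combines detection with the core count reported by the standard library.
pub fn auto_worker_count(block_device: &str) -> usize {
    let available = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    worker_count(detect_storage_kind(block_device), available)
}
```

Keeping `parse_rotational_flag` and `worker_count` as pure functions makes the policy easy to unit-test without access to real hardware.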