---
name: benchmark
description: Run performance benchmarks for transform changes. Use when the user asks to benchmark, measure performance, compare speed, or when changes affect apply methods, functional layer, get_params, or core pipeline code.
---

# Benchmark

Any change touching `apply_*`, `functional.py`, `get_params`, `get_params_dependent_on_data`, `composition.py`, or `transforms_interface.py` **must** include benchmark results.

## Standard Matrix

Always benchmark all 9 combinations:

| Size      | Channels | Use case                             |
|-----------|----------|--------------------------------------|
| 256×256   | 1        | Grayscale classification             |
| 256×256   | 3        | RGB classification                   |
| 256×256   | 5        | Multispectral                        |
| 512×512   | 1        | Depth maps                           |
| 512×512   | 3        | Detection/segmentation (YOLO, U-Net) |
| 512×512   | 5        | Multispectral segmentation           |
| 1024×1024 | 1        | Medical imaging                      |
| 1024×1024 | 3        | High-res segmentation                |
| 1024×1024 | 5        | Satellite imagery                    |

Skip channel counts the transform explicitly doesn't support. Always include the channel axis: grayscale inputs are `(H, W, 1)`, not `(H, W)`.

If the optimization changes dtype conversion or a `@uint8_io` / `@float32_io` wrapped function, benchmark the hot dtype and add correctness tests for the other supported dtype. For example, a uint8-only speedup in a `@uint8_io` function still needs a float32 regression test that verifies wrapper round-tripping.
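Such a regression test can be sketched as follows (a minimal sketch: `assert_dtype_consistency` and the toy `invert` function are hypothetical names for illustration, not albumentations APIs; the real test would call the wrapped function under change):

```python
import numpy as np

def assert_dtype_consistency(func, **params):
    """Verify a uint8-optimized function still handles float32 correctly:
    output dtype is preserved and results agree up to quantization error."""
    rng = np.random.default_rng(0)
    img_u8 = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
    img_f32 = img_u8.astype(np.float32) / 255.0  # float32 convention: [0, 1]

    out_u8 = func(img_u8, **params)
    out_f32 = func(img_f32, **params)

    assert out_u8.dtype == np.uint8, "uint8 input must round-trip to uint8"
    assert out_f32.dtype == np.float32, "float32 input must round-trip to float32"
    # Outputs should agree to within one quantization step (1/255)
    np.testing.assert_allclose(out_f32, out_u8.astype(np.float32) / 255.0, atol=2 / 255)

def invert(img):
    """Toy dtype-dispatching function standing in for a wrapped transform."""
    if img.dtype == np.uint8:
        return 255 - img
    return 1.0 - img

assert_dtype_consistency(invert)
```

The same pattern works in reverse for `@float32_io` functions: benchmark float32, then assert the uint8 path still matches.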
## Template: Isolated Function

```python
import timeit

import numpy as np

SIZES = {"small": (256, 256), "medium": (512, 512), "large": (1024, 1024)}
CHANNELS = [1, 3, 5]
N = 100

# old_func / new_func / params: the original and optimized implementations
# under test, plus their keyword arguments.
for size_name, (h, w) in SIZES.items():
    for ch in CHANNELS:
        shape = (h, w, ch)
        img = np.random.randint(0, 256, shape, dtype=np.uint8)
        # Bind img as a default argument so each lambda captures the current image
        old_t = timeit.timeit(lambda img=img: old_func(img, **params), number=N)
        new_t = timeit.timeit(lambda img=img: new_func(img, **params), number=N)
        print(f"{size_name} {h}x{w}x{ch}: old={old_t:.4f}s new={new_t:.4f}s speedup={old_t/new_t:.2f}x")
```

## Template: Full Pipeline (Compose)

```python
import timeit

import numpy as np

import albumentations as A

SIZES = {"small": (256, 256), "medium": (512, 512), "large": (1024, 1024)}
CHANNELS = [1, 3, 5]

transform = A.Compose([A.YourTransform(p=1.0)])

for size_name, (h, w) in SIZES.items():
    for ch in CHANNELS:
        shape = (h, w, ch)
        img = np.random.randint(0, 256, shape, dtype=np.uint8)
        t = timeit.timeit(lambda img=img: transform(image=img), number=100)
        print(f"{size_name} {h}x{w}x{ch}: {t:.4f}s (100 calls)")
```

## Workflow

1. **Before**: run the benchmark on the current `main` / original code; save output to a JSON file
2. **After**: run the benchmark on the modified code; save output to a JSON file
3. **Compare**: load both JSON files and compute speedup = old_time / new_time for each transform/size combo
4.
**Report** results in the PR/commit message body

### JSON Output Format

Save benchmark results as JSON for automated comparison:

```python
import json

# all_results: iterable of (transform_name, (h, w), ch, elapsed) tuples
# collected from the benchmark runs above; N is the iteration count used.
results = {}
for transform_name, (h, w), ch, elapsed in all_results:
    key = f"{transform_name}_{h}x{w}x{ch}"
    results[key] = {"time": elapsed, "iterations": N}

with open("benchmark_results.json", "w") as f:
    json.dump(results, f, indent=2)
```

### Comparison Script Pattern

```python
import json

with open("bench_old.json") as f:
    old = json.load(f)
with open("bench_new.json") as f:
    new = json.load(f)

for key in sorted(old):
    if key in new:
        speedup = old[key]["time"] / new[key]["time"]
        indicator = "FASTER" if speedup > 1.05 else "SLOWER" if speedup < 0.95 else "SAME"
        print(f"{key}: {old[key]['time']:.4f}s -> {new[key]['time']:.4f}s {speedup:.2f}x {indicator}")
```

## Reporting Format

```
Benchmark (uint8, 100 iterations):

Function direct:
  256x256x1 — Before: 0.0200s  After: 0.0100s  Speedup: 2.00x
  256x256x3 — Before: 0.0500s  After: 0.0300s  Speedup: 1.67x
  ...

Compose single:
  256x256x1 — 0.0120s
  256x256x3 — 0.0340s
  ...
```

## Template: Batch (apply_to_images)

When benchmarking batch optimizations (kernel pre-computation, 4D indexing, pre-allocated loops):

```python
import timeit

import numpy as np

import albumentations as A

BATCH_SIZES = [4, 8, 16]
SIZES = {"small": (256, 256), "medium": (512, 512)}

transform = A.Compose([A.YourTransform(p=1.0)])

for batch_size in BATCH_SIZES:
    for size_name, (h, w) in SIZES.items():
        # Grayscale batch — benefits from the reshape trick
        images = [np.random.randint(0, 256, (h, w, 1), dtype=np.uint8) for _ in range(batch_size)]
        t = timeit.timeit(lambda images=images: transform(images=images), number=50)
        print(f"batch={batch_size} {size_name} {h}x{w}x1: {t:.4f}s")

        # RGB batch — baseline
        images_rgb = [np.random.randint(0, 256, (h, w, 3), dtype=np.uint8) for _ in range(batch_size)]
        t = timeit.timeit(lambda images=images_rgb: transform(images=images), number=50)
        print(f"batch={batch_size} {size_name} {h}x{w}x3: {t:.4f}s")
```

## Rules

- Run on the **same machine**, back-to-back, under the same conditions
- Use at least **100 iterations** for fast functions; fewer for slow ones (aim for >1 s total runtime)
- Test **both uint8 and float32** if the change affects dtype handling. If benchmarking only the hot dtype, add correctness tests for the other dtype.
- A **>5% regression** on any combination requires justification or rework
- If adding a new transform, benchmark against the equivalent naive numpy implementation
- For batch optimizations, compare 1-channel, 3-channel RGB, and 5-channel multichannel inputs to verify the speedup holds across channel counts
- Keep channel-last shapes throughout: images `(H,W,C)`, image batches `(N,H,W,C)`, volumes `(D,H,W,C)`, volume batches `(N,D,H,W,C)`
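The ">1 s total" rule above can be automated with a small calibration helper (a sketch only; `calibrated_timeit` is a hypothetical name, not an albumentations or stdlib API — it probes the function once to estimate per-call cost, then picks an iteration count):

```python
import timeit

def calibrated_timeit(fn, target_seconds=1.0, min_iterations=100):
    """Pick an iteration count so total runtime exceeds target_seconds,
    never dropping below min_iterations; return (per_call_seconds, n)."""
    # Short probe run to estimate per-call cost
    probe_n = 10
    per_call_est = timeit.timeit(fn, number=probe_n) / probe_n
    n = max(min_iterations, int(target_seconds / max(per_call_est, 1e-9)) + 1)
    total = timeit.timeit(fn, number=n)
    return total / n, n

per_call, n = calibrated_timeit(lambda: sum(range(1000)))
print(f"{n} iterations, {per_call * 1e6:.2f} us/call")
```

This keeps fast functions from being measured over a noisy handful of iterations while keeping slow ones from taking minutes per matrix cell.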