# Performance

## Benchmark Results

### Flat Files

These are the results from parsing files of the NYC Yellow Taxi Trip data set (subset 2016-01 to 2025-11, ~9.2 GB overall, ~650M rows), running on a MacBook Pro M3 Max. The test (`FlatPerformanceTest`) parses all files and adds up the values of three columns (out of 20). The results shown are for:

* The row reader API, using indexed access (mapping field names to indexes once upfront)
* The columnar reader API, using indexed access

```
====================================================================================================
PERFORMANCE TEST RESULTS
====================================================================================================

Environment:
  CPU cores:          16
  Java version:       25
  OS:                 Mac OS X aarch64

Data:
  Files processed:    119
  Total rows:         651,209,003
  Total size:         9,241.1 MB
  Runs per contender: 5

Correctness Verification:
                                         passenger_count     trip_distance       fare_amount
  Hardwood (multifile indexed)               972,078,547  2,701,223,013.48  9,166,943,759.83
  Hardwood (column reader multifile)         972,078,547  2,701,223,013.48  9,166,943,759.83

Performance (all runs):
  Contender                                   Time (s)    Records/sec  Records/sec/core    MB/sec
  -----------------------------------------------------------------------------------------------
  Hardwood (multifile indexed) [1]                2.75    236,975,620        14,810,976    3362.8
  Hardwood (multifile indexed) [2]                2.78    234,669,911        14,666,869    3330.1
  Hardwood (multifile indexed) [3]                2.70    240,831,732        15,051,983    3417.6
  Hardwood (multifile indexed) [4]                2.70    240,831,732        15,051,983    3417.6
  Hardwood (multifile indexed) [5]                2.68    242,897,800        15,181,113    3446.9
  Hardwood (multifile indexed) [AVG]              2.72    239,239,163        14,952,448    3395.0
    min: 2.68s, max: 2.78s, spread: 0.09s

  Hardwood (column reader multifile) [1]          1.30    502,476,083        31,404,755    7130.5
  Hardwood (column reader multifile) [2]          1.11    584,568,225        36,535,514    8295.4
  Hardwood (column reader multifile) [3]          1.06    614,348,116        38,396,757    8718.0
  Hardwood (column reader multifile) [4]          1.06    616,091,772        38,505,736    8742.8
  Hardwood (column reader multifile) [5]          1.08    603,530,123        37,720,633    8564.5
  Hardwood (column reader multifile) [AVG]        1.12    580,917,933        36,307,371    8243.6
    min: 1.06s, max: 1.30s, spread: 0.24s

====================================================================================================
```

### Nested Files

These are the results from parsing a file with points of interest from the Overture Maps data set (~900 MB, ~9M rows), running on a MacBook Pro M3 Max. The test (`NestedPerformanceTest`) parses all columns of the file and determines min/max values, max array lengths, etc. As above, the results shown are for the row reader API and the columnar API with indexed access.

```
====================================================================================================
NESTED SCHEMA PERFORMANCE TEST RESULTS
====================================================================================================

Environment:
  CPU cores:          16
  Java version:       25
  OS:                 Mac OS X aarch64

Data:
  Total rows:         9,152,540
  File size:          882.2 MB
  Runs per contender: 5

Correctness Verification:
                           min_ver   max_ver        rows    websites     sources   addresses
  Hardwood (indexed)             1         9   9,152,540   3,687,576  18,305,080   9,152,540
  Hardwood (columnar)            1         9   9,152,540   3,687,576  18,305,080   9,152,540

Performance (all runs):
  Contender                                   Time (s)    Records/sec  Records/sec/core    MB/sec
  -----------------------------------------------------------------------------------------------
  Hardwood (indexed) [1]                          2.22      4,120,910           257,557     397.2
  Hardwood (indexed) [2]                          1.92      4,759,511           297,469     458.8
  Hardwood (indexed) [3]                          1.89      4,855,459           303,466     468.0
  Hardwood (indexed) [4]                          1.88      4,876,153           304,760     470.0
  Hardwood (indexed) [5]                          1.88      4,858,036           303,627     468.3
  Hardwood (indexed) [AVG]                        1.96      4,674,433           292,152     450.6
    min: 1.88s, max: 2.22s, spread: 0.34s

  Hardwood (columnar) [1]                         1.34      6,830,254           426,891     658.4
  Hardwood (columnar) [2]                         1.32      6,918,020           432,376     666.8
  Hardwood (columnar) [3]                         1.24      7,363,266           460,204     709.8
  Hardwood (columnar) [4]                         1.24      7,404,968           462,810     713.8
  Hardwood (columnar) [5]                         1.22      7,477,565           467,348     720.8
  Hardwood (columnar) [AVG]                       1.27      7,189,741           449,359     693.0
    min: 1.22s, max: 1.34s, spread: 0.12s

====================================================================================================
```

## Running Performance Tests

The performance testing modules are not included in the default build. Enable them with `-Pperformance-test`.

### End-to-End Performance Tests

There are two end-to-end performance tests: one for flat schemas (NYC Yellow Taxi Trip data) and one for nested schemas (Overture Maps POI data). Test data is downloaded automatically on the first run.

```shell
./mvnw test -Pperformance-test
```

**Flat schema test** (`FlatPerformanceTest`): reads ~9 GB of taxi trip data (2016-2025, ~650M rows) and sums three columns.

| Property | Default | Description |
|----------|---------|-------------|
| `perf.contenders` | `HARDWOOD_MULTIFILE_INDEXED` | Comma-separated list of contenders, or `all` |
| `perf.start` | `2016-01` | Start year-month for data range |
| `perf.end` | `2025-11` | End year-month for data range |
| `perf.runs` | `10` | Number of timed runs per contender |

Available contenders: `HARDWOOD_INDEXED`, `HARDWOOD_NAMED`, `HARDWOOD_PROJECTION`, `HARDWOOD_MULTIFILE_INDEXED`, `HARDWOOD_MULTIFILE_NAMED`, `HARDWOOD_COLUMN_READER`, `HARDWOOD_COLUMN_READER_MULTIFILE`, `PARQUET_JAVA_INDEXED`, `PARQUET_JAVA_NAMED`.

**Nested schema test** (`NestedPerformanceTest`): reads ~900 MB of Overture Maps POI data (~9M rows) with deeply nested columns.

| Property | Default | Description |
|----------|---------|-------------|
| `perf.contenders` | `HARDWOOD_NAMED` | Comma-separated list of contenders, or `all` |
| `perf.runs` | `5` | Number of timed runs per contender |

Available contenders: `HARDWOOD_INDEXED`, `HARDWOOD_NAMED`, `HARDWOOD_COLUMNAR`, `PARQUET_JAVA`.

**Examples:**

```shell
# Run all contenders for the flat test, limited to 2025 data
./mvnw test -Pperformance-test -Dtest=FlatPerformanceTest -Dperf.contenders=all -Dperf.start=2025-01

# Compare multifile indexed vs named access
./mvnw test -Pperformance-test -Dperf.contenders=HARDWOOD_MULTIFILE_INDEXED,HARDWOOD_MULTIFILE_NAMED

# Run nested test only
./mvnw test -Pperformance-test -Dtest=NestedPerformanceTest -Dperf.contenders=all
```

### PyArrow Comparison Tests

Python counterparts of the Java performance tests using PyArrow, for cross-platform comparison. These scripts require a Python environment with PyArrow installed (use the `.venv` venv).

**Flat schema** (`flat_performance_test.py`): counterpart of `FlatPerformanceTest.java`:

```shell
cd performance-testing/end-to-end

# Run all contenders (single-threaded and multi-threaded), 5 runs each
python flat_performance_test.py

# Single-threaded only
python flat_performance_test.py -c single_threaded

# Multi-threaded, 10 runs
python flat_performance_test.py -c multi_threaded -r 10
```

**Nested schema** (`nested_performance_test.py`): counterpart of `NestedPerformanceTest.java`:

```shell
cd performance-testing/end-to-end

# Run all contenders, 5 runs each
python nested_performance_test.py

# Single-threaded only, 3 runs
python nested_performance_test.py -c single_threaded -r 3
```

**Options:**

| Flag | Default | Description |
|------|---------|-------------|
| `-c`, `--contenders` | `all` | Contenders to run: `single_threaded`, `multi_threaded`, or `all` |
| `-r`, `--runs` | `5` | Number of timed runs per contender |

**Notes on comparability:**

- The flat test uses column projection (reads only the 3 summed columns), matching the Hardwood projection and column-reader contenders. The parquet-java contenders in `FlatPerformanceTest.java` read all columns without projection, so direct comparison against parquet-java is not apples-to-apples.
- PyArrow uses vectorized columnar operations (C++ engine) rather than row-by-row iteration.
- The `single_threaded` contender (`use_threads=False`) is most comparable to single-threaded parquet-java; `multi_threaded` is comparable to Hardwood's parallel reading.

### JMH Micro-Benchmarks

For detailed micro-benchmarks, build the JMH benchmark JAR and run it directly:

```shell
# Build the benchmark JAR
./mvnw package -Pperformance-test -pl performance-testing/micro-benchmarks -am -DskipTests

# Run all benchmarks (with Vector API for SIMD support)
java --add-modules jdk.incubator.vector \
  -jar performance-testing/micro-benchmarks/target/benchmarks.jar \
  -p dataDir=performance-testing/test-data-setup/target/tlc-trip-record-data

# Run a specific benchmark
java --add-modules jdk.incubator.vector \
  -jar performance-testing/micro-benchmarks/target/benchmarks.jar \
  "PageHandlingBenchmark.decodePages" \
  -p dataDir=performance-testing/test-data-setup/target/tlc-trip-record-data

# Run SIMD benchmark comparing scalar vs vectorized operations
java --add-modules jdk.incubator.vector \
  -jar performance-testing/micro-benchmarks/target/benchmarks.jar SimdBenchmark \
  -p size=1024,8192,65536 -p implementation=scalar,auto

# List available benchmarks
java --add-modules jdk.incubator.vector \
  -jar performance-testing/micro-benchmarks/target/benchmarks.jar -l
```

**Available benchmarks:**

| Benchmark | Description |
|-----------|-------------|
| `MemoryMapBenchmark.memoryMapToByteArray` | Memory map a file and copy to byte array |
| `PageHandlingBenchmark.a_decompressPages` | Scan and decompress all pages |
| `PageHandlingBenchmark.b_decodePages` | Scan, decompress, and decode all pages |
| `PipelineBenchmark.a_assembleColumns` | Synchronous page decoding + column assembly |
| `PipelineBenchmark.b_consumeRows` | Full pipeline with row-oriented access |
| `SimdBenchmark.*` | SIMD operations (countNonNulls, markNulls, dictionary, bit unpacking) |

**JMH options:**

| Option | Description |
|--------|-------------|
| `-wi` | Number of warmup iterations (default: 3) |
| `-i` | Number of measurement iterations (default: 5) |
| `-f` | Number of forks (default: 2) |
| `-p param=value` | Set benchmark parameter |
| `-l` | List available benchmarks |
| `-h` | Show help |

**Note:** The taxi data files use GZIP compression (2016-01 to 2023-01) and ZSTD compression (2023-02 onwards). The default benchmark file is `yellow_tripdata_2025-05.parquet` (ZSTD, 75 MB).
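
**Reading the throughput columns:** The sketch below shows how the derived columns in the Benchmark Results tables (Records/sec, Records/sec/core, MB/sec, and the min/max/spread line) relate to the measured times. It assumes each metric is a plain ratio of rows or megabytes to wall-clock time, which is an assumption and is not taken from the test source. The helper name `summarize_runs` is hypothetical. The reports appear to use unrounded times, so recomputing from the two-decimal times shown in the tables only approximates most rows.

```python
from statistics import mean


def summarize_runs(times_s, total_rows, size_mb, cores):
    """Derive throughput metrics from wall-clock times (hypothetical helper).

    Assumes every metric is a simple ratio: rows / time, rows / time / cores,
    and megabytes / time. This mirrors the report columns but is not the
    project's actual reporting code.
    """
    runs = []
    for t in times_s:
        records_per_sec = total_rows / t
        runs.append({
            "time_s": t,
            "records_per_sec": records_per_sec,
            "records_per_sec_per_core": records_per_sec / cores,
            "mb_per_sec": size_mb / t,
        })
    return {
        "runs": runs,
        "avg_time_s": mean(times_s),
        "min_time_s": min(times_s),
        "max_time_s": max(times_s),
        "spread_s": max(times_s) - min(times_s),
    }


# Rounded times of the five "Hardwood (columnar)" runs from the nested results.
nested_columnar = summarize_runs(
    times_s=[1.34, 1.32, 1.24, 1.24, 1.22],
    total_rows=9_152_540,
    size_mb=882.2,
    cores=16,
)

first = nested_columnar["runs"][0]
print(f"{first['records_per_sec']:,.0f} records/sec, {first['mb_per_sec']:.1f} MB/sec")
print(f"avg {nested_columnar['avg_time_s']:.2f}s, spread {nested_columnar['spread_s']:.2f}s")
```

For the first columnar run this gives 6,830,254 records/sec and 658.4 MB/sec, matching the reported row. The other runs differ slightly from the report because of the rounded input times.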