// [Overview](#Overview) / [Examples](#Examples) / [API](#API) / [FAQ](#FAQ) / [Resources](#Resources) ## `mph`: [Minimal] Static perfect hash library [![MIT Licence](http://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/license/mit) [![Version](https://img.shields.io/github/v/release/qlibs/mph)](https://github.com/qlibs/mph/releases) [![Build](https://img.shields.io/badge/build-green.svg)](https://godbolt.org/z/MsE4nnhvc) [![Try it online](https://img.shields.io/badge/try%20it-online-blue.svg)](https://godbolt.org/z/jcPPsbEvK) > https://en.wikipedia.org/wiki/Perfect_hash_function ### Use cases > A static perfect hash function maps a set of keys known in advance to a set of values with no collisions ### Features - Single header (https://raw.githubusercontent.com/qlibs/mph/main/mph) / C++20 module (https://raw.githubusercontent.com/qlibs/mph/main/mph.cppm) - Self verification upon include (can be disabled by `-DNTEST` - see [FAQ](#faq)) - Compiles cleanly with ([`-Wall -Wextra -Werror -pedantic -pedantic-errors -fno-exceptions -fno-rtti`](https://godbolt.org/z/WraE4q1dE)) - Minimal [API](#api) - Optimized run-time execution (see [performance](#performance) / [benchmarks](#benchmarks)) - Fast compilation times (see [compilation](#compilation)) - Trade-offs (see [FAQ](#faq)) ### Requirements - C++20 ([gcc-12+, clang-15+](https://godbolt.org/z/WraE4q1dE)) / [optional] ([bmi2](https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set)), [optional] ([simd](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data)), ### Overview > Hello world (https://godbolt.org/z/dzd6o3Pxo) ```cpp enum class color { red, green, blue }; constexpr auto colors = std::array{ std::pair{"red"sv, color::red}, std::pair{"green"sv, color::green}, std::pair{"blue"sv, color::blue}, }; static_assert(color::green == mph::lookup("green")); static_assert(color::red == mph::lookup("red")); static_assert(color::blue == mph::lookup("blue")); std::print("{}", mph::lookup("green"sv)); // prints 1 ``` > [!NOTE] > `mph::lookup` assumes only valid input and returns mapped value direclty. ```cpp static_assert(not mph::find("unknown")); static_assert(mph::find("green")); static_assert(mph::find("red")); static_assert(mph::find("blue")); std::print("{}", *mph::find("green"sv)); // prints 1 ``` > [!NOTE] > `mph::find` doesnt assume valid input and returns optional of mapped value. > [Minimal] Lookup (https://godbolt.org/z/rqYj9a1cr) ```cpp int lookup(int id) { static constexpr std::array ids{ std::pair{54u, 91u}, std::pair{64u, 324u}, std::pair{91u, 234u}, }; return mph::lookup(id); } ``` ```cpp lookup: // g++ -DNDEBUG -std=c++20 -O3 imull $1275516394, %edi, %eax shrl $23, %eax movl $24029728, %ecx shrxl %eax, %ecx, %eax andl $511, %eax retq ``` > Lookup (https://godbolt.org/z/vv6W4nGfb) ```cpp int lookup(int id) { static constexpr std::array ids{ std::pair{54u, 91u}, std::pair{324u, 54u}, std::pair{64u, 324u}, std::pair{234u, 64u}, std::pair{91u, 234u}, }; return mph::lookup(id); } ``` ```cpp lookup: // g++ -DNDEBUG -std=c++20 -O3 andl $7, %edi leaq lookup(%rip), %rax movl (%rax,%rdi,4), %eax retq lookup: .long 324 .long 0 .long 64 .long 234 .long 54 .long 0 .long 91 ``` > Find (https://godbolt.org/z/qMzxKK4sd) ```cpp int find(int id) { static constexpr std::array ids{ std::pair{27629, 1}, std::pair{6280, 2}, // 1..128 pairs... std::pair{33691, 128}, }; return *mph::find(id); } ``` ```cpp find: // g++ -DNDEBUG -std=c++20 -O3 -mbmi2 -mavx512f vpbroadcastd %edi, %zmm0 shll $4, %edi movzbl %dil, %ecx leaq find vpcmpeqd (%rdx,%rcx,4), %zmm0, %k0 kmovw %k0, %esi kortestw %k0, %k0 rep bsfq %rax, %rax movl $64, %eax addl %eax, %ecx xorl %eax, %eax testw %si, %si cmovnel 1024(%rdx,%rcx,4), %eax vzeroupper retq find: ... // see godbolt ``` > Find Strings (https://godbolt.org/z/KaKzf7Pax) ```cpp int find(std::span str) { static constexpr auto symbols = std::array{ std::pair{"AMZN "sv, 1}, std::pair{"AAPL "sv, 2}, std::pair{"GOOGL "sv, 3}, std::pair{"META "sv, 4}, std::pair{"MSFT "sv, 5}, std::pair{"NVDA "sv, 6}, std::pair{"TSLA "sv, 7}, }; return *mph::find(str); } ``` ```cpp find: // g++ -DNDEBUG -std=c++20 -O3 -mbmi2 movq 8(%rsi), %rax movl $1031, %ecx leaq find(%rip), %rdx xorl %esi, %esi movq (%rax), %rax pextq %rcx, %rax, %rcx shll $4, %ecx cmpq (%rcx,%rdx), %rax movzbl 8(%rcx,%rdx), %eax cmovnel %esi, %eax retq find: ... // see godbolt ``` > Find Strings (https://godbolt.org/z/fdMPsYWjE) ```cpp int find(std::string_view str) { using std::literals::operator""sv; // values assigned from 0..N-1 static constexpr std::array symbols{ "BTC "sv, "ETH "sv, "BNB "sv, "SOL "sv, "XRP "sv, "DOGE"sv, "TON "sv, "ADA "sv, "SHIB"sv, "AVAX"sv, "LINK"sv, "BCH "sv, }; return *mph::find(str); } ``` ```cpp find: // g++ -DNDEBUG -std=c++20 -O3 -mbmi2 shll $3, %edi bzhil %edi, (%rsi), %eax movl $789, %ecx pextl %ecx, %eax, %ecx leaq find(%rip), %rdx xorl %esi, %esi cmpl (%rdx,%rcx,8), %eax movzbl 4(%rdx,%rcx,8), %eax cmovnel %esi, %eax retq find: ... // see godbolt ``` ### Examples - [feature] `lookup/find` customization point - https://godbolt.org/z/enqeGxKK9 - [feature] `to` customization point - https://godbolt.org/z/jTMx4n6j3 - [example] `branchless dispatcher` - https://godbolt.org/z/5PTE3ercE - [performance - https://wg21.link/P2996] `enum_to_string` - https://godbolt.org/z/ojohP6j7f - [performance - https://wg21.link/P2996] `string_to_enum` - https://godbolt.org/z/83vGhY7M8 ### Benchmarks (run-time) > https://github.com/qlibs/mph/tree/benchmark `clang++ -std=c++20 -O3 -DNDEBUG -mbmi2 benchmark.cpp` ``` | ns/op | op/s | err% |total | benchmark |------:|---------------:|-----:|-----:|:---------- | 12.25 | 81,602,449.70 | 0.3% | 0.15 | `random_strings_5_len_4.std.map` | 5.56 | 179,750,906.50 | 0.2% | 0.07 | `random_strings_5_len_4.std.unordered_map` | 9.17 | 109,096,850.98 | 0.2% | 0.11 | `random_strings_5_len_4.boost.unordered_map` | 13.48 | 74,210,250.54 | 0.3% | 0.16 | `random_strings_5_len_4.boost.flat_map` | 7.70 | 129,942,965.18 | 0.3% | 0.09 | `random_strings_5_len_4.gperf` | 1.61 | 621,532,188.81 | 0.1% | 0.02 | `random_strings_5_len_4.mph` | 14.66 | 68,218,086.71 | 0.8% | 0.18 | `random_strings_5_len_8.std.map` | 13.45 | 74,365,239.56 | 0.2% | 0.16 | `random_strings_5_len_8.std.unordered_map` | 9.68 | 103,355,605.09 | 0.2% | 0.12 | `random_strings_5_len_8.boost.unordered_map` | 16.00 | 62,517,180.19 | 0.4% | 0.19 | `random_strings_5_len_8.boost.flat_map` | 7.70 | 129,809,356.36 | 0.3% | 0.09 | `random_strings_5_len_8.gperf` | 1.58 | 633,084,194.24 | 0.1% | 0.02 | `random_strings_5_len_8.mph` | 17.21 | 58,109,576.87 | 0.3% | 0.21 | `random_strings_6_len_2_5.std.map` | 15.28 | 65,461,167.99 | 0.2% | 0.18 | `random_strings_6_len_2_5.std.unordered_map` | 12.21 | 81,931,391.20 | 0.4% | 0.15 | `random_strings_6_len_2_5.boost.unordered_map` | 17.15 | 58,323,741.08 | 0.5% | 0.21 | `random_strings_6_len_2_5.boost.flat_map` | 7.94 | 125,883,197.55 | 0.5% | 0.09 | `random_strings_6_len_2_5.gperf` | 6.05 | 165,239,616.00 | 0.5% | 0.07 | `random_strings_6_len_2_5.mph` | 31.61 | 31,631,402.94 | 0.2% | 0.38 | `random_strings_100_len_8.std.map` | 15.32 | 65,280,594.09 | 0.2% | 0.18 | `random_strings_100_len_8.std.unordered_map` | 17.13 | 58,383,850.20 | 0.3% | 0.20 | `random_strings_100_len_8.boost.unordered_map` | 31.42 | 31,822,519.67 | 0.2% | 0.38 | `random_strings_100_len_8.boost.flat_map` | 8.04 | 124,397,773.85 | 0.2% | 0.10 | `random_strings_100_len_8.gperf` | 1.58 | 632,813,481.73 | 0.1% | 0.02 | `random_strings_100_len_8.mph` | 32.62 | 30,656,015.03 | 0.3% | 0.39 | `random_strings_100_len_1_8.std.map` | 19.34 | 51,697,107.73 | 0.5% | 0.23 | `random_strings_100_len_1_8.std.unordered_map` | 19.51 | 51,254,525.17 | 0.3% | 0.23 | `random_strings_100_len_1_8.boost.unordered_map` | 33.58 | 29,780,574.17 | 0.6% | 0.40 | `random_strings_100_len_1_8.boost.flat_map` | 13.06 | 76,577,037.07 | 0.7% | 0.16 | `random_strings_100_len_1_8.gperf` | 6.02 | 166,100,665.07 | 0.2% | 0.07 | `random_strings_100_len_1_8.mph` | 1.28 | 778,723,795.75 | 0.1% | 0.02 | `random_uints_5.mph` ``` `g++ -std=c++20 -O3 -DNDEBUG -mbmi2 benchmark.cpp` ``` | ns/op | op/s | err% |total | benchmark |------:|---------------:|-----:|-----:|:---------- | 12.28 | 81,460,330.38 | 0.9% | 0.15 | `random_strings_5_len_4.std.map` | 5.29 | 188,967,241.90 | 0.3% | 0.06 | `random_strings_5_len_4.std.unordered_map` | 9.69 | 103,163,192.67 | 0.2% | 0.12 | `random_strings_5_len_4.boost.unordered_map` | 13.56 | 73,756,333.08 | 0.4% | 0.16 | `random_strings_5_len_4.boost.flat_map` | 7.69 | 130,055,662.66 | 0.6% | 0.09 | `random_strings_5_len_4.gperf` | 1.39 | 718,910,252.82 | 0.1% | 0.02 | `random_strings_5_len_4.mph` | 14.26 | 70,103,007.82 | 2.4% | 0.17 | `random_strings_5_len_8.std.map` | 13.36 | 74,871,047.51 | 0.4% | 0.16 | `random_strings_5_len_8.std.unordered_map` | 9.82 | 101,802,074.00 | 0.3% | 0.12 | `random_strings_5_len_8.boost.unordered_map` | 15.97 | 62,621,571.95 | 0.3% | 0.19 | `random_strings_5_len_8.boost.flat_map` | 7.92 | 126,265,206.30 | 0.3% | 0.09 | `random_strings_5_len_8.gperf` | 1.40 | 713,596,376.62 | 0.4% | 0.02 | `random_strings_5_len_8.mph` | 15.98 | 62,576,142.34 | 0.5% | 0.19 | `random_strings_6_len_2_5.std.map` | 17.56 | 56,957,868.12 | 0.5% | 0.21 | `random_strings_6_len_2_5.std.unordered_map` | 11.68 | 85,637,378.45 | 0.3% | 0.14 | `random_strings_6_len_2_5.boost.unordered_map` | 17.25 | 57,965,732.68 | 0.6% | 0.21 | `random_strings_6_len_2_5.boost.flat_map` | 9.13 | 109,580,632.48 | 0.7% | 0.11 | `random_strings_6_len_2_5.gperf` | 7.17 | 139,563,745.72 | 0.4% | 0.09 | `random_strings_6_len_2_5.mph` | 30.20 | 33,117,522.76 | 0.7% | 0.36 | `random_strings_100_len_8.std.map` | 15.01 | 66,627,962.89 | 0.4% | 0.18 | `random_strings_100_len_8.std.unordered_map` | 16.79 | 59,559,414.60 | 0.6% | 0.20 | `random_strings_100_len_8.boost.unordered_map` | 31.36 | 31,884,629.57 | 0.8% | 0.38 | `random_strings_100_len_8.boost.flat_map` | 7.75 | 128,973,947.61 | 0.7% | 0.09 | `random_strings_100_len_8.gperf` | 1.50 | 667,041,673.54 | 0.1% | 0.02 | `random_strings_100_len_8.mph` | 30.92 | 32,340,612.08 | 0.4% | 0.37 | `random_strings_100_len_1_8.std.map` | 25.35 | 39,450,222.09 | 0.4% | 0.30 | `random_strings_100_len_1_8.std.unordered_map` | 19.76 | 50,609,820.90 | 0.2% | 0.24 | `random_strings_100_len_1_8.boost.unordered_map` | 32.39 | 30,878,018.77 | 0.6% | 0.39 | `random_strings_100_len_1_8.boost.flat_map` | 11.20 | 89,270,687.92 | 0.2% | 0.13 | `random_strings_100_len_1_8.gperf` | 7.17 | 139,471,159.67 | 0.5% | 0.09 | `random_strings_100_len_1_8.mph` | 1.93 | 519,047,110.39 | 0.3% | 0.02 | `random_uints_5.mph` ``` ### Benchmarks (compilation-time) > https://qlibs.github.io/mph/perfect_hashing [![Benchmark](https://raw.githubusercontent.com/qlibs/mph/benchmark/perfect_hashing/benchmark_int_to_int.png)](https://qlibs.github.io/mph/perfect_hashing) [![Benchmark](https://raw.githubusercontent.com/qlibs/mph/benchmark/perfect_hashing/benchmark_str_to_int.png)](https://qlibs.github.io/mph/perfect_hashing) ### API ```cpp namespace mph::inline v5_0_5 { /** * Static [minimal] perfect hash lookup function * @tparam entries constexpr array of keys or key/value pairs */ template inline constexpr auto lookup = [](const auto& key) { if constexpr(constexpr lookup$magic_lut lookup{}; lookup) { return lookup(key); } else { return lookup$pext(key); } }; /** * Static perfect hash find function * @tparam entries constexpr array of keys or key/value pairs */ template inline constexpr auto find = [](const auto& key, const auto& unknown = {}) -> optional { if constexpr (entries.size() == 0u) { return unknown; } else if constexpr (entries.size() <= 64u) { return find$pext.operator()(key, unknown); } else { constexpr auto bucket_size = simd_size_v>; return find$simd.operator()(key, unknown); } }; } // namespace mph ``` ### FAQ > - Trade-offs? > > > `mph` supports different types of key/value pairs and thousands of key/value pairs, but not millions - (see [benchmarks](#benchmarks)). > > - All keys have to fit into `uint128_t`, that includes strings. > - If the above criteria are not satisfied `mph` will [SFINAE](https://en.wikipedia.org/wiki/Substitution_failure_is_not_an_error) away `lookup` function. > - In such case different backup policy should be used instead (which can be also used as customization point for user-defined `lookup` implementation), for example: > > ```cpp > template requires (entries.size() > 1'000'000) > inline constexpr auto mph::find = > [](const auto& key, const auto& unknown = {}) -> optional { ... } > ``` > > - How `mph` is working under the hood? > > > `mph` takes advantage of knowing the key/value pairs at compile-time as well as the specific hardware instructions. > The following is a pseudo code of the `lookup` algorithm for minimal perfect hash table. > > ```python > def lookup$magic_lut[entries: array](key : any, max_attempts = 100'000): > # 0. magic and lut for entries [compile-time] > nbits = sizeof(u32) * CHAR_BIT - countl_zero(max(entries.second)) > mask = (1u << nbits) - 1u; > shift = sizeof(u32) * CHAR_BIT - nbits; > lut = {}; > while max_attempts--: > magic = rand() > for k, v in entries: > lut |= v << (k * magic >> shift); > > for k, v in entries: > if (lut >> (k * magic >> shift) & mask) != v: > lut = {} > break > > assert magic != 0 and lut != 0 and shift != 0 and mask != 0 > > # 1. lookup [run-time] > return (lut >> ((key * magic) >> shift)) & mask; > ``` > > > The following is a pseudo code of the `find` algorithm for perfect hash table. > > ```python > # word: 00101011 > # mask: 11100001 > # &: 000____1 > # pext: ____0001 # intel/intrinsics-guide/index.html#text=pext > def pext(a : uN, mask : uN): > dst, m, k = ([], 0, 0) > > while m < nbits(a): > if mask[m] == 1: > dst.append(a[m]) > k += 1 > m += 1 > > return uN(dst) > ``` > > ```python > def find$pext[entries: array](key : any, unknown: any): > # 0. find mask which uniquely identifies all keys [compile-time] > mask = 0b111111... > > for i in range(nbits(mask)): > masked = [] > mask.unset(i) > > for k, v in entries: > masked.append(k & mask) > > if not unique(masked): > mask.set(i) > > assert unique(masked) > assert mask != ~mask{} > > # 1. create lookup table [compile-time] > lookup = array(typeof(entries[0]), 2**popcount(mask)) > for k, v in entries: > lookup[pext(k, mask)] = (k, v) > > # 2. lookup [run-time] # if key is a string convert to integral first (memcpy) > k, v = lookup[pext(key, mask)] > > if k == key: # cmove > return v > else: > return unknown > ``` > > ```python > def find$simd[entries: array](key : any, unknown: any): > # 0. find mask which uniquely identifies all keys [compile-time] > mask = 0b111111... > bucket_size = simd_size_v > > for i in range(nbits(mask)): > masked = [] > mask.unset(i) > > for k, v in entries: > masked.append(k & mask) > > if not unique(masked, bucket_size): > mask.set(i) > > assert unique(masked, bucket_size) > assert mask != ~mask{} > > # 1. create lookup table [compile-time] > keys = array(typeof(entries[0].first), bucket_size * 2**popcount(mask)) > values = array(typeof(entries[0].second), bucket_size * 2**popcount(mask)) > for k, v in entries: > slot = pext(k, mask) > while (keys[slot]) slot++; > keys[slot] = k > values[slot] = v > > # 2. lookup [run-time] # if key is a string convert to integral first (memcpy) > index = bucket_size * pext(key, mask) > match = k == keys[&index] # simd element-wise comparison > > if any_of(match): > return values[index + find_first_set(match)] > else: > return unknown > ``` > > > [More information](https://www.youtube.com/watch?v=kCATOctR0BA) > > - How to tweak `lookup/find` performance for my data/use case? > > > Always measure! > > - [[bmi2](https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set) ([Intel Haswell](Intel)+, [AMD Zen3](https://en.wikipedia.org/wiki/Zen_3)+)] hardware instruction acceleration is faster than software emulation. (AMD Zen2 pext takes 18 cycles, is worth disabling hardware accelerated version) > - For integral keys, use u32 or u64. > - For strings, consider aligning the input data and passing it with compile-time size via `span`, `array`. > - If all strings length is less than 4 that will be more optimized than if all string length will be less than 8 and 16. That will make the lookup table smaller and getting the value will have one instruction less. > - Experiment with different `probability` values to optimize lookups. Especially benefitial if its known that input keys are always coming from predefined `entries` (probability = 100) as it will avoid the comparison. > - Consider passing cache size alignment (`hardware_destructive_interference_size` - usually `64u`) to the `lookup/find`. That will align the underlying lookup table. > > - How to fix compilation error `constexpr evaluation hit maximum step limit`? > > > The following options can be used to increase the limits, however, compilation-times should be monitored. > > ``` > gcc: -fconstexpr-ops-limit=N > clang: -fconstexpr-steps=N > ``` > > - Is support for [bmi2](https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set) instructions required? > > > `mph` works on platforms without `bmi2` instructions which can be emulated with some limitations (*). > > ```cpp > // bmi2 > mov ecx, 789 > pext ecx, eax, ecx > ``` > > > [intel.com/pext](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=pext) / [uops.info/pext](https://uops.info/table.html?search=PEXT%20(R64%2C%20R64%2C%20R64)&cb_lat=on&cb_HSW=on&cb_BDW=on&cb_SKL=on&cb_CFL=on&cb_CLX=on&cb_ICL=on&cb_TGL=on&cb_RKL=on&cb_ZEN2=on&cb_ZEN3=on&cb_ZEN4=on&cb_measurements=on&cb_bmi=on) > > ```cpp > // no bmi2 > mov ecx, eax > and ecx, 789 > imul ecx, ecx, 57 > shr ecx, 2 > and ecx, 248 > ``` > > > https://stackoverflow.com/questions/14547087/extracting-bits-with-a-single-multiplication (*) > > - How to disable `cmov` generation? > > > Set `probability` value to something else than `50u` (default) - it means that the input data is predictable in some way and `jmp` will be generated instead. Additionaly the following compiler options can be used. > > ``` > clang: -mllvm -x86-cmov-converter=false > ``` > > - How to disable running tests at compile-time? > > > When `-DNTEST` is defined static_asserts tests wont be executed upon inclusion. > Note: Use with caution as disabling tests means that there are no gurantees upon inclusion that given compiler/env combination works as expected. > > - Similar projects? > > > [gperf](https://www.gnu.org/software/gperf), [frozen](https://github.com/serge-sans-paille/frozen), [nbperf](https://github.com/rurban/nbperf), [cmph](https://cmph.sourceforge.net), [perfecthash](https://github.com/tpn/perfecthash), [lemonhash](https://github.com/ByteHamster/LeMonHash), [pthash](https://github.com/jermp/pthash), [shockhash](https://github.com/ByteHamster/ShockHash), [burr](https://github.com/lorenzhs/BuRR), [hash-prospector](https://github.com/skeeto/hash-prospector) ### Resources > - Minimal Perfect Hashing - http://stevehanov.ca/blog/index.php?id=119 > - Pefect Hashing - https://github.com/tpn/pdfs/tree/master/Perfect%20Hashing > - Hash Functions - https://nullprogram.com/blog/2018/07/31 > - `gperf` - https://www.dre.vanderbilt.edu/~schmidt/PDF/C++-USENIX-90.pdf > - `cmph` - https://cmph.sourceforge.net/papers > - `pthash` - https://arxiv.org/pdf/2104.10402 > - `smasher` - https://github.com/rurban/smhasher ### License > - [MIT](LICENSE)