| # Foldhash |
| |
| This repository contains foldhash, a fast, non-cryptographic, minimally |
| DoS-resistant hashing algorithm implemented in Rust designed for computational |
| uses such as hash maps, bloom filters, count sketching, etc. |
| |
| When should you **not** use foldhash: |
| |
| - You are afraid of people studying your long-running program's behavior to |
| reverse engineer its internal random state and using this knowledge to create |
| many colliding inputs for computational complexity attacks. For more details |
| see the section "HashDoS resistance". |
| |
| - You expect foldhash to have a consistent output across versions or |
| platforms, such as for persistent file formats or communication protocols. |
| |
| - You are relying on foldhash's properties for any kind of security. |
| Foldhash is **not appropriate for any cryptographic purpose**. |
| |
| Foldhash has two variants, one optimized for speed which is ideal for data |
| structures such as hash maps and bloom filters, and one optimized for |
| statistical quality which is ideal for algorithms such as |
| [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) and |
| [MinHash](https://en.wikipedia.org/wiki/MinHash). |
| |
| Foldhash can be used in a `#![no_std]` environment by disabling its default |
| `"std"` feature. |
| |
| |
| ## Performance |
| |
| We evaluated foldhash against three commonly used hashes in Rust: |
| [aHash](https://github.com/tkaitchuck/aHash) v0.8.11, |
| [fxhash](https://github.com/cbreeden/fxhash) v0.2.1, and |
| [SipHash-1-3](https://en.wikipedia.org/wiki/SipHash), the default hash algorithm |
| in Rust at the time of writing. We evaluated both variants foldhash provides, |
| `foldhash-f` and `foldhash-q`, which correspond to `foldhash::fast` and |
| `foldhash::quality` in the crate respectively. |
| |
| First we note that hashers with random state inflate the size of your `HashMap`, |
| which may or may not be important for your performance: |
| ```rust |
| std::mem::size_of::<foldhash::HashMap<u32, u32>>() = 40 // (both variants) |
| std::mem::size_of::<ahash::HashMap<u32, u32>>() = 64 |
| std::mem::size_of::<fxhash::FxHashMap<u32, u32>>() = 32 |
| std::mem::size_of::<std::collections::HashMap<u32, u32>>() = 48 |
| ``` |
| |
| We tested runtime performance on two machines, one with a 2023 Apple M2 CPU, one |
| with a 2023 Intel Xeon Platinum 8481C server CPU, both with stable Rust 1.80.1. |
| Since one of our competitors (aHash) is reliant on AES-based instructions for |
| optimal performance we have included both a benchmark with and without |
| `-C target-cpu=native` for the Intel machine. |
| |
| We tested across a wide variety of data types we consider representative of |
| types / distributions one might hash in the real world, in the context of a hash |
| table key: |
| |
| - `u32` - random 32-bit unsigned integers, |
| - `u32pair` - pairs of random 32-bit unsigned integers, |
| - `u64` - random 64-bit unsigned integers, |
| - `u64pair` - pairs of random 64-bit unsigned integers, |
| - `u64lobits` - 64-bit unsigned integers where only the bottom 16 bits vary, |
| - `u64hibits` - 64-bit unsigned integers where only the top 16 bits vary, |
| - `ipv4` - [`std::net::Ipv4Addr`](https://doc.rust-lang.org/std/net/struct.Ipv4Addr.html), which is equivalent to `[u8; 4]`, |
| - `ipv6` - [`std::net::Ipv6Addr`](https://doc.rust-lang.org/std/net/struct.Ipv6Addr.html), which is equivalent to `[u8; 16]`, |
| - `rgba` - random `(u8, u8, u8, u8)` tuples, |
| - `strenglishword` - strings containing words sampled uniformly from the top 10,000 most common English words, |
| - `struuid` - random UUIDs, hashed in string representation, |
| - `strurl` - strings containing URLs sampled uniformly from a corpus of 10,000 URLs, |
| - `strdate` - random `YYYY-MM-DD` date strings, |
| - `accesslog` - `(u128, u32, chrono::NaiveDate, bool)`, meant to simulate a typical |
| larger compound type, in this case `(resource_id, user_id, date, success)` |
| for an access log. |
| - `kilobyte` - random bytestrings one kilobyte in length, |
| - `tenkilobyte` - random bytestrings ten kilobytes in length. |
| |
| We tested the performance of hashing the above data types in the following four contexts: |
| |
| - `hashonly` - only the time it takes to hash the value, |
| - `lookupmiss` - the time it takes to do a lookup in a 1,000 element hash map |
| of random elements, only sampling keys of which we know that are not in the hash map, |
| - `lookuphit` - similar to `lookupmiss`, except the keys are sampled from keys |
| known to be in the hash map, |
| - `setbuild` - the time it takes to construct a `HashSet` of 1,000 elements |
| from 1,000 randomly sampled elements each repeated 10 times (so 10,000 inserts, |
| with ~90% duplicates). |
| |
| All times are reported as expected time per operation, so one hash, one lookup, |
| or one insert respectively. The full results [can be found |
| here](https://gist.github.com/orlp/1271ad5b8b775c651cc55773888858eb). To |
| summarize, we will only show the results for `u64` and `strenglishword` here, as |
| well as the observed geometric mean and average rank over the full benchmark. |
| |
| ``` |
| Xeon 8481c |
| |
| ┌────────────────┬────────────┬────────────┬────────────┬─────────┬─────────┬─────────┐ |
| │ avg_rank ┆ 1.58 ┆ 2.66 ┆ 2.09 ┆ 3.70 ┆ 4.97 │ |
| │ geometric_mean ┆ 6.21 ┆ 7.01 ┆ 7.56 ┆ 8.74 ┆ 28.70 │ |
| ╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡ |
| │ distr ┆ bench ┆ foldhash-f ┆ foldhash-q ┆ fxhash ┆ ahash ┆ siphash │ |
| ╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡ |
| │ u64 ┆ hashonly ┆ 0.79 ┆ 1.03 ┆ 0.67 ┆ 1.23 ┆ 9.09 │ |
| │ u64 ┆ lookupmiss ┆ 2.01 ┆ 2.44 ┆ 1.73 ┆ 2.73 ┆ 12.03 │ |
| │ u64 ┆ lookuphit ┆ 3.04 ┆ 3.59 ┆ 2.64 ┆ 3.84 ┆ 12.65 │ |
| │ u64 ┆ setbuild ┆ 6.13 ┆ 6.52 ┆ 4.88 ┆ 6.66 ┆ 17.80 │ |
| | ... ┆ ... ┆ ... ┆ ... ┆ ... ┆ ... ┆ ... | |
| │ strenglishword ┆ hashonly ┆ 2.63 ┆ 2.98 ┆ 3.24 ┆ 3.57 ┆ 11.87 │ |
| │ strenglishword ┆ lookupmiss ┆ 4.63 ┆ 5.03 ┆ 4.51 ┆ 5.86 ┆ 15.19 │ |
| │ strenglishword ┆ lookuphit ┆ 8.62 ┆ 9.25 ┆ 8.28 ┆ 10.06 ┆ 21.35 │ |
| │ strenglishword ┆ setbuild ┆ 14.77 ┆ 15.57 ┆ 18.86 ┆ 15.72 ┆ 35.36 │ |
| └────────────────┴────────────┴────────────┴────────────┴─────────┴─────────┴─────────┘ |
| |
| Xeon 8481c with RUSTFLAGS="-C target-cpu=native" |
| |
| ┌────────────────┬────────────┬────────────┬────────────┬─────────┬─────────┬─────────┐ |
| │ avg_rank ┆ 1.89 ┆ 3.12 ┆ 2.25 ┆ 2.77 ┆ 4.97 │ |
| │ geometric_mean ┆ 6.00 ┆ 6.82 ┆ 7.39 ┆ 6.94 ┆ 29.49 │ |
| ╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡ |
| │ distr ┆ bench ┆ foldhash-f ┆ foldhash-q ┆ fxhash ┆ ahash ┆ siphash │ |
| ╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡ |
| │ u64 ┆ hashonly ┆ 0.79 ┆ 1.01 ┆ 0.67 ┆ 1.34 ┆ 9.24 │ |
| │ u64 ┆ lookupmiss ┆ 1.68 ┆ 2.12 ┆ 1.62 ┆ 1.96 ┆ 12.04 │ |
| │ u64 ┆ lookuphit ┆ 2.68 ┆ 3.19 ┆ 2.28 ┆ 3.16 ┆ 13.09 │ |
| │ u64 ┆ setbuild ┆ 6.16 ┆ 6.42 ┆ 4.75 ┆ 7.03 ┆ 18.88 │ |
| | ... ┆ ... ┆ ... ┆ ... ┆ ... ┆ ... ┆ ... | |
| │ strenglishword ┆ hashonly ┆ 2.60 ┆ 2.97 ┆ 3.25 ┆ 3.04 ┆ 11.58 │ |
| │ strenglishword ┆ lookupmiss ┆ 4.41 ┆ 4.96 ┆ 4.82 ┆ 4.79 ┆ 32.31 │ |
| │ strenglishword ┆ lookuphit ┆ 8.68 ┆ 9.35 ┆ 8.46 ┆ 8.63 ┆ 21.48 │ |
| │ strenglishword ┆ setbuild ┆ 15.01 ┆ 16.34 ┆ 19.34 ┆ 15.37 ┆ 35.22 │ |
| └────────────────┴────────────┴────────────┴────────────┴─────────┴─────────┴─────────┘ |
| |
| Apple M2 |
| |
| ┌────────────────┬────────────┬────────────┬────────────┬─────────┬─────────┬─────────┐ |
| │ avg_rank ┆ 1.62 ┆ 2.81 ┆ 2.02 ┆ 3.58 ┆ 4.97 │ |
| │ geometric_mean ┆ 4.41 ┆ 4.86 ┆ 5.39 ┆ 5.71 ┆ 21.94 │ |
| ╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡ |
| │ distr ┆ bench ┆ foldhash-f ┆ foldhash-q ┆ fxhash ┆ ahash ┆ siphash │ |
| ╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡ |
| │ u64 ┆ hashonly ┆ 0.60 ┆ 0.70 ┆ 0.41 ┆ 0.78 ┆ 6.61 │ |
| │ u64 ┆ lookupmiss ┆ 1.50 ┆ 1.61 ┆ 1.23 ┆ 1.65 ┆ 8.28 │ |
| │ u64 ┆ lookuphit ┆ 1.78 ┆ 2.10 ┆ 1.57 ┆ 2.25 ┆ 8.53 │ |
| │ u64 ┆ setbuild ┆ 4.74 ┆ 5.19 ┆ 3.61 ┆ 5.38 ┆ 15.36 │ |
| | ... ┆ ... ┆ ... ┆ ... ┆ ... ┆ ... ┆ ... | |
| │ strenglishword ┆ hashonly ┆ 1.84 ┆ 2.13 ┆ 1.85 ┆ 2.13 ┆ 11.61 │ |
| │ strenglishword ┆ lookupmiss ┆ 2.71 ┆ 2.96 ┆ 2.47 ┆ 2.99 ┆ 9.27 │ |
| │ strenglishword ┆ lookuphit ┆ 7.54 ┆ 8.77 ┆ 7.83 ┆ 8.77 ┆ 18.65 │ |
| │ strenglishword ┆ setbuild ┆ 16.61 ┆ 17.09 ┆ 14.83 ┆ 16.52 ┆ 26.42 │ |
| └────────────────┴────────────┴────────────┴────────────┴─────────┴─────────┴─────────┘ |
| ``` |
| |
| We note from the above benchmark that for hash table performance the extra |
| quality that `foldhash-q` provides is almost never actually worth the small but |
| also non-negligible computational overhead it has over `foldhash-f`. This is our |
| justification for providing `foldhash::fast` as a default choice for hash |
| tables, even though it has measurable biases (see also the "Quality" section). |
| |
| fxhash generally does fairly well for small inputs on the benchmarks, however it |
| has structural weaknesses as a hash which makes it ill-advised to use as a |
| general-purpose hash function in our opinion. For example the `lookuphit` |
| benchmark on Apple M2 for `u64hibits` takes 1.77 nanoseconds per lookup for |
| foldhash, but 67.72 nanoseconds for fxhash (due to everything colliding - the |
| effects would be even worse with a larger hash map). In our opinion foldhash-f |
| strikes the right balance between quality and performance for hash tables, |
| whereas fxhash flies a bit too close to the sun. |
| |
| aHash is faster than foldhash for medium-long strings when compiled with |
| AES instruction support, but is slower in almost every other scenario or when |
| AES instructions are unavailable. |
| |
| |
| ## Quality |
| |
| Foldhash-f is a fairly strong hash in terms of collisions *on its full 64-bit |
| output*. However, statistical tests such as |
| [SMHasher3](https://gitlab.com/fwojcik/smhasher3) can distinguish it from an ideal |
| hash function in tests that focus on the relationship between individual |
| input/output bits. One such property is avalanching: changing a single bit in |
| the input does not flip every other bit with 50% probability when using |
| foldhash-f like it should if it behaved like a proper random oracle. |
| |
| As the benchmarks above show, spending more effort in foldhash-f to improve the |
| hash quality does not lead to better hash table performance. However, there are |
| also use cases for hash functions where it is important that (each bit of) the |
| hash is unbiased and a random function of all bits of the input, such as in |
| algorithms as HyperLogLog or MinHash. |
| |
| For this purpose we also provide foldhash-q, which is simply a post-processed |
| version of foldhash-f to properly avalanche all the bits. Foldhash-q passes the |
| [SMHasher3](https://gitlab.com/fwojcik/smhasher3) test suite [without any |
| failures](https://github.com/orlp/foldhash_smhasher3). You can also plot the |
| worst-case probability (where 50% is ideal) that any particular output bit flips |
| if you flip an input bit, which nicely visualizes how fxhash and foldhash-f |
| fail this avalanche property but foldhash-q and SipHash-1-3 pass: |
| |
| |
| | FxHash | Foldhash-f | Foldhash-q | SipHash-1-3 | |
| |--------|------------|------------|-------------| |
| | <img src="assets/avalanche-fxhash.png" width=300> | <img src="assets/avalanche-foldhash-fast.png" width=300> | <img src="assets/avalanche-foldhash-quality.png" width=300> | <img src="assets/avalanche-siphash.png" width=300> |
| |
| |
| ## Background |
| |
| The name foldhash is derived from the *folded multiply*. This technique |
| compresses two 64-bit words into a single 64-bit word while simultaneously |
| thoroughly mixing the bits. It does this using a 64 x 64 bit -> 128 bit |
| multiplication followed by folding the two halves of the 128-bit product |
| together using a XOR operation: |
| |
| ```rust |
| let full = (x as u128) * (y as u128); |
| let lo = full as u64; |
| let hi = (full >> 64) as u64; |
| let folded = lo ^ hi; |
| ``` |
| |
| We're not aware of a formal analysis of this operation, but empirically it works |
| very well. An informal intuition for why it should work is that multiplication |
| can be seen as the sum of many shifted copies of one of the arguments, only |
| including those shifted copies where the other argument has set bits, e.g. for |
| multiplying 4-bit words `abcd` and `efgh`: |
| |
| ``` |
| abcd * efgh = |
| |
| abcd * e |
| abcd * f |
| abcd * g |
| abcd * h |
| --------------- + |
| ``` |
| |
| Note that the middle bits of the product are a function of many of the input |
| bits, whereas the top-most and bottom-most bits are impacted by fewer of the |
| input bits. By folding the top half back onto the bottom half these effects |
| compensate each other, making all the output bits affected by much of the input. |
| |
| We did not invent the folded multiply, it was previously used in essentially the |
| same way in [aHash](https://github.com/tkaitchuck/aHash), |
| [wyhash](https://github.com/wangyi-fudan/wyhash), and |
| [xxhash3](https://github.com/Cyan4973/xxHash). The operation was also used |
| in [mum-hash](https://github.com/vnmakarov/mum-hash), and probably others. |
| We do not know who originally invented it, the earliest reference |
| we could find was Steven Fuerst [blogging about it](https://web.archive.org/web/20121213174842/http://locklessinc.com/articles/crypto_hash/) |
| in 2012. |
| |
| |
| ## HashDoS resistance |
| |
| The folded multiply has a fairly glaring flaw: if one of the halves is zero, the |
| output is zero. This makes it trivial to create a large number of hash |
| collisions (even by accident, as zeroes are a common input to hashes). To combat |
| this, every folded multiply in foldhash has the following form: |
| |
| ```rust |
| folded_multiply(input1 ^ secret1, input2 ^ secret2) |
| ``` |
| |
| Here `secret1` or `secret2` are either secret random numbers generated by |
| foldhash beforehand, or partial hash results influenced by such a secret prior. |
| This (plus other careful design throughout the hash function) ensures that it is |
| not possible to create a list of inputs that collide for every instance of |
| foldhash, and also prevents certain access patterns on hash tables going |
| quadratric by ensuring that each hash table uses a different seed and thus a |
| different access pattern. It is these two properties that we refer to when we |
| claim foldhash is "minimally DoS-resistant": it does the bare minimum to defeat |
| very simple attacks. |
| |
| However, to be crystal clear, **foldhash does not claim to provide HashDoS |
| resistance against interactive attackers**. For a student of cryptography it |
| should be trivial to derive the secret values from direct observation of hash |
| outputs, and feasible to derive the secret values from indirect observation of |
| hashes, such as through timing attacks or hash table iteration. Once an attacker |
| knows the secret values, they can once again create infinite hash collisions |
| with ease. |