Comments (23)
You probably want to look at XXH3. The author really gets it. The benchmarks cover the space well: variable sizes, measuring latency instead of throughput.
How does aHash fare on SMHasher and particularly this fork.
from hashbrown.
Cyan4973/xxHash#155
Meow seems to be the fastest non cryptographic hash algorithm to exist, at least when hardware SIMD is available.
I haven't compared it in a benchmark with ahash, but looking at the code, it's very clearly designed for throughput not latency. It might even beat ahash when dealing with 1mb of data (Though I would not be too confident). But that's rarely the case with a hashmap in memory. A hashmap needs to be able to get a value in just a couple of CPU cycles when dealing with a u64 or a 5 byte string for a key. That case is far more common.
from hashbrown.
I would humbly submit aHash as a possible alternative: https://github.com/tkaitchuck/aHash
It obviously is architecture specific, but it would be possible to implement a fallback without any runtime overhead.
Either way it may be worth taking a look at how aHash handles strings. Aside from the actual scrambling function, I think the way aHash extracts the data is faster. When implementing it I became convinced that FxHash could do a lot better with poorly aligned data.
from hashbrown.
@tkaitchuck aHash looks great, except for the part where it is not always supported by the CPU. Unfortunately the hash function is called often enough that doing dynamic dispatch based on the CPU features would cost too much performance.
from hashbrown.
It ought to be possible to do statically with no runtime overhead. For example it is possible define something like:
pub struct HasherState { buffer: [u64; 2] }
#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "aes"))]
impl Hasher for HasherState {
// Ahash's implementation
}
#[cfg(not(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "aes")))]
impl Hasher for HasherState {
// Fallback hash implementation
}
Then to the calling code there is always one implementation of the trait, statically dispatched. Which one get's compiled just depends of which architecture you are targeting.
If that sounds interesting I could take a stab at implementing a fallback inside of the aHash repo.
from hashbrown.
I've added a fallback algorithm to aHash. The details are in the Readme. (It was inspired by FxHash), it's also keyed and should be considerably harder to attack than MermerHash2 was (or fx which is trivial). I'm still not totally happy with it's security or performance, but it's still better than anything else I'm aware of at that level of performance. Take a look. I'll continue to tweak it.
from hashbrown.
@Amanieu Do you have any thoughts? The aHash fallback algorithm speed now beats FXHash at any string longer than 4 bytes. It also is within a nano second for all primitives. (Which I think is as good as can be done. FXHash really plays fast and loose when hashing primitives. I can easily imagine a HashMap<u64, _> where the key is a 'size' or 'location' in bytes which would turn the map into a list if the values happened to be a whole number of KB.) If AESNI is available, aHash will beat FX's speed at everything except primitives (only off by 1/3rd of a ns) and strings of exactly 4 bytes.
The benchmarks included in the README are against the generic code path, so as you can see the dispatch is done completely at compile time, and it adds no overhead.
PS: Do you know how the snippet you have at the top performs on non-x86? I considered such an approach in the fallback, knowing it would get translated to a 32/64 bit multiply on Intel, but I was afraid that on other architectures it might turn into a 128bit multiply, which could really suck. -- UPDATE: It appears ARM and WebAssembly don't have such an instruction.
from hashbrown.
@tkaitchuck One of the major use cases for hash table is with integer keys. Can you provide some benchmarks when hashing simple u32
values?
from hashbrown.
Ah nevermind, I see the results for u32
in the link you provided. Can you try running the hashbrown benchmarks with the hasher replaced with aHash (with and without AES)? I am particularly interested in the effect this has on lookup performance (if any).
from hashbrown.
(Updated)
With FX:
running 7 tests
test find_existing ... bench: 4,797 ns/iter (+/- 181)
test find_nonexisting ... bench: 4,060 ns/iter (+/- 291)
test get_remove_insert ... bench: 29 ns/iter (+/- 0)
test grow_by_insertion ... bench: 166 ns/iter (+/- 22)
test hashmap_as_queue ... bench: 21 ns/iter (+/- 0)
test new_drop ... bench: 0 ns/iter (+/- 0)
test new_insert_drop ... bench: 49 ns/iter (+/- 1)
aHash without AES
running 7 tests
test find_existing ... bench: 5,713 ns/iter (+/- 180)
test find_nonexisting ... bench: 5,473 ns/iter (+/- 95)
test get_remove_insert ... bench: 65 ns/iter (+/- 1)
test grow_by_insertion ... bench: 131 ns/iter (+/- 20)
test hashmap_as_queue ... bench: 42 ns/iter (+/- 0)
test new_drop ... bench: 0 ns/iter (+/- 0)
test new_insert_drop ... bench: 52 ns/iter (+/- 0)
aHash with AES:
running 7 tests
test find_existing ... bench: 5,401 ns/iter (+/- 109)
test find_nonexisting ... bench: 5,025 ns/iter (+/- 87)
test get_remove_insert ... bench: 36 ns/iter (+/- 1)
test grow_by_insertion ... bench: 140 ns/iter (+/- 23)
test hashmap_as_queue ... bench: 23 ns/iter (+/- 0)
test new_drop ... bench: 0 ns/iter (+/- 0)
test new_insert_drop ... bench: 43 ns/iter (+/- 0)
There is a bit of a performance hit because all those run with i32 keys which are just not as fast in aHash as in FX. A few things seem notable:
- There is a bit of a reduction in variance in find_nonexisting.
- The biggest hit is in get_remove_insert in the fallback. (I don't fully understand this, but it does not appear to be strongly influenced to tweaks to the algorithm)
- grow_by_insertion sees a performance boost with aHash with or without AES despite the lower i32 perf. I don't know what that could be if not collisions.
I could improve the performance of the fallback hasher on primitive types like i32 if it could be guaranteed the the hasher is only ever passed a single value. The only way I can think to make such a thing work, is to have the user use a macro to instantiate the map, where the macro looks at the key and creates one hasherBuilder for known primitives and a different one if it is something else. This strikes me a too invasive, as it requires changing users' code.
from hashbrown.
@Amanieu I figured out a way to speed up primitives (without changing the actual algorithm!), and released 0.1.13. (benchmarks are in the README) and I updated my comment above to reflect the impact on hashbrown using it.
from hashbrown.
@Amanieu what is the next step? Should I create a pull request? Do you want to see some string benchmarks? A comparison of a number of different hashes?
from hashbrown.
Sorry for the late response, I've been busy with some things lately.
I am hesitant to use aHash directly as the default hasher because it is slower than the current hasher. However if it is as secure as SipHash it might be a good idea to use it as a replacement for the default hasher in libstd.
from hashbrown.
Update: I was able to improve the performance a the fallback algorithm a bit. I've also submitted a PR for the additional benchmarks I've been comparing with. Here are what I observe locally (FxHash on the left AHash on the right.):
FXHash vs Ahash with fallback:
find_existing 4764 (+/- 270) ns 5050 (+/- 122) ns 106.00%
find_existing_high_bits 145,878 (+/- 3,905) ns 5451 (+/- 210) ns 3.74%
find_nonexisting 4066 (+/- 154) ns 4416 (+/- 73) ns 108.61%
get_remove_insert 30 (+/- 1) ns 32 (+/- 1) ns 106.67%
grow_by_insertion 152 (+/- 18) ns 134 (+/- 13) ns 88.16%
grow_by_insertion_kb 248 (+/- 361) ns 178 (+/- 5) ns 71.77%
hashmap_as_queue 21 (+/- 0) ns 22 (+/- 0) ns 104.76%
insert_8_char_string 12,704 (+/- 429) ns 9692 (+/- 174) ns 76.29%
new_insert_drop 73 (+/- 2) ns 73 (+/- 1) ns 100.00%
FXHash vs Ahash with AES
find_existing 4764 (+/- 270) ns 5253 (+/- 106) ns 110.26%
find_existing_high_bits 145,878 (+/- 3,905) ns 6106 (+/- 212) ns 4.19%
find_nonexisting 4066 (+/- 154) ns 5150 (+/- 88) ns 126.66%
get_remove_insert 30 (+/- 1) ns 37 (+/- 0) ns 123.33%
grow_by_insertion 152 (+/- 18) ns 137 (+/- 25) ns 90.13%
grow_by_insertion_kb 248 (+/- 361) ns 145 (+/- 3) ns 58.47%
hashmap_as_queue 21 (+/- 0) ns 23 (+/- 0) ns 109.52%
insert_8_char_string 12,704 (+/- 429) ns 8129 (+/- 196) ns 63.99%
new_insert_drop 73 (+/- 2) ns 64 (+/- 2) ns 87.67%
With this new version, the performance gap is a lot narrower, and with strings > 4 characters aHash is faster. Also FxHash really falls apart on the find_existing_high_bits test. I realize that isn't "normal" data but as a user I would never expect that level of performance drop (vs find_existing).
@Amanieu Is there a good way to actually get a representative sample of what users actually USE as their keys? That would allow us to make a benchmark that is a lot less abstract.
from hashbrown.
@tkaitchuck Your benchmarks (in the aHash
repo) seem to use the fxhash
crate, which is not the same as the FxHash
in hashbrown
or rustc_hash
. Are you using the hashbrown
version here? Testing with this PR would be interesting too.
from hashbrown.
@Zoxc AHash's readme refers to the public crate. The above was using the version in this repo.
Your comment did get me wondering if there was a performance difference: There is.
In my test they are the identical for u8-u128 and strings 68 bytes and up. However for small strings this version is much better (Though it doesn't have that one really odd good point at 4 bytes)
u8 time: 1.1536 ns
u16 time: 1.1517 ns
u32 time: 1.1483 ns
u64 time: 1.1584 ns
u128 time: 1.4320 ns
1 bytes time: 1.8159 ns
3 bytes time: 2.5453 ns
4 bytes time: 2.3543 ns
7 bytes time: 2.9000 ns
8 bytes time: 2.6467 ns
15 bytes time: 3.8354 ns
16 bytes time: 2.9812 ns
24 bytes time: 3.9831 ns
68 bytes time: 6.8920 ns
132 bytes time: 14.674 ns
1024 bytes time: 205.61 ns
(Still not as good as aHash with or without fallback for strings, but much better overall)
from hashbrown.
After thinking about this a bit and looking at alternatives, I think that AHash would be great as the default hasher for hashbrown. However there are a few issues that currently prevent me from doing so:
- The
ahash
crate needs to be#![no_std]
. - The
ahash
crate needs to compile on stable, and on Rust 1.31 which is the minimum Rust version for hashbrown.
from hashbrown.
ping @tkaitchuck
from hashbrown.
@alkis there is an open issue to do that here: tkaitchuck/aHash#6
If you are interested in contributing the needed glue code, it would be very much appreciated.
from hashbrown.
@Amanieu aHash now builds on stable and is no-std.
from hashbrown.
Cyan4973/xxHash#155
Meow seems to be the fastest non cryptographic hash algorithm to exist, at least when hardware SIMD is available.
from hashbrown.
Cyan4973/xxHash#155
Meow seems to be the fastest non cryptographic hash algorithm to exist, at least when hardware SIMD is available.I haven't compared it in a benchmark with ahash, but looking at the code, it's very clearly designed for throughput not latency. It might even beat ahash when dealing with 1mb of data (Though I would not be too confident). But that's rarely the case with a hashmap in memory. A hashmap needs to be able to get a value in just a couple of CPU cycles when dealing with a u64 or a 5 byte string for a key. That case is far more common.
Are you talking about meow or XXH3? If the latter I find this surprising because the blogpost I linked above specifically discusses latency for small inputs as a specific design goal.
from hashbrown.
Are you talking about meow or XXH3?
No I wasn't. I see that. H3 now adds a fast path for short inputs, and it looks eerily familiar. It uses many of the same tricks I do in aHash. It says a lot about the constraints of the problem that independent solutions can endup so similar.
from hashbrown.
Related Issues (20)
- Why the identity function can be used as unlikely function? HOT 3
- `hashbrown` fails to compile as a transitive dependency HOT 2
- allocator-api2 default-feature? HOT 2
- Compiling hashbrown 0.14.2 for aarch64-unknown-linux-gnu with "target-cpu=cortex-a53" generates illegal instructions HOT 2
- Switching to GxHash? HOT 9
- Feature: increase capacity according to the actual size returned by the allocator HOT 2
- Hashbrown crash due to bad malloc HOT 1
- 0.14.3 - no method named `clear` found for struct `HashMap` in the current scope HOT 5
- Benchmark biaised due to no fence around input
- assertion failed: buckets.is_power_of_two() HOT 8
- Build breaks on nightly due to use of `stdsimd` rust feature in ahash 0.8.6 HOT 2
- Was swap-remove behavior ever considered when removing entries? HOT 10
- Consider returning to 1.63.0 MSRV HOT 1
- How to calculate the size of the hashbrown::HashMap at runtime? HOT 1
- LLVM failed to use the knowledge from a never-overflow assumption HOT 13
- Library test `map::test_map::test_clone_from_memory_leaks` errors with using uninitialized data under valgrind and miri
- update to ahash 0.8.7 or after to use new stdsimd feature portable_simd HOT 1
- Insertion performance with arena allocators HOT 3
- Do not grow the raw table when lots of deletion and insertion is performed HOT 3
- Unsound usages of unsafe implementation from `usize` to `T` HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from hashbrown.