Giter VIP home page Giter VIP logo

Comments (23)

alkis avatar alkis commented on July 21, 2024 4

You probably want to look at XXH3. The author really gets it. The benchmarks cover the space well: variable sizes, measuring latency instead of throughput.

How does aHash fare on SMHasher and particularly this fork.

from hashbrown.

tkaitchuck avatar tkaitchuck commented on July 21, 2024 1

Cyan4973/xxHash#155
Meow seems to be the fastest non cryptographic hash algorithm to exist, at least when hardware SIMD is available.

I haven't compared it in a benchmark with ahash, but looking at the code, it's very clearly designed for throughput not latency. It might even beat ahash when dealing with 1mb of data (Though I would not be too confident). But that's rarely the case with a hashmap in memory. A hashmap needs to be able to get a value in just a couple of CPU cycles when dealing with a u64 or a 5 byte string for a key. That case is far more common.

from hashbrown.

tkaitchuck avatar tkaitchuck commented on July 21, 2024

I would humbly submit aHash as a possible alternative: https://github.com/tkaitchuck/aHash
It obviously is architecture specific, but it would be possible to implement a fallback without any runtime overhead.
Either way it may be worth taking a look at how aHash handles strings. Aside from the actual scrambling function, I think the way aHash extracts the data is faster. When implementing it I became convinced that FxHash could do a lot better with poorly aligned data.

from hashbrown.

Amanieu avatar Amanieu commented on July 21, 2024

@tkaitchuck aHash looks great, except for the part where it is not always supported by the CPU. Unfortunately the hash function is called often enough that doing dynamic dispatch based on the CPU features would cost too much performance.

from hashbrown.

tkaitchuck avatar tkaitchuck commented on July 21, 2024

It ought to be possible to do statically with no runtime overhead. For example it is possible define something like:

pub struct HasherState { buffer: [u64; 2] }
#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "aes"))]
impl Hasher for HasherState {
    // Ahash's implementation
}
#[cfg(not(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "aes")))]
impl Hasher for HasherState {
    // Fallback hash implementation
}

Then to the calling code there is always one implementation of the trait, statically dispatched. Which one get's compiled just depends of which architecture you are targeting.

If that sounds interesting I could take a stab at implementing a fallback inside of the aHash repo.

from hashbrown.

tkaitchuck avatar tkaitchuck commented on July 21, 2024

I've added a fallback algorithm to aHash. The details are in the Readme. (It was inspired by FxHash), it's also keyed and should be considerably harder to attack than MermerHash2 was (or fx which is trivial). I'm still not totally happy with it's security or performance, but it's still better than anything else I'm aware of at that level of performance. Take a look. I'll continue to tweak it.

from hashbrown.

tkaitchuck avatar tkaitchuck commented on July 21, 2024

@Amanieu Do you have any thoughts? The aHash fallback algorithm speed now beats FXHash at any string longer than 4 bytes. It also is within a nano second for all primitives. (Which I think is as good as can be done. FXHash really plays fast and loose when hashing primitives. I can easily imagine a HashMap<u64, _> where the key is a 'size' or 'location' in bytes which would turn the map into a list if the values happened to be a whole number of KB.) If AESNI is available, aHash will beat FX's speed at everything except primitives (only off by 1/3rd of a ns) and strings of exactly 4 bytes.

The benchmarks included in the README are against the generic code path, so as you can see the dispatch is done completely at compile time, and it adds no overhead.

PS: Do you know how the snippet you have at the top performs on non-x86? I considered such an approach in the fallback, knowing it would get translated to a 32/64 bit multiply on Intel, but I was afraid that on other architectures it might turn into a 128bit multiply, which could really suck. -- UPDATE: It appears ARM and WebAssembly don't have such an instruction.

from hashbrown.

Amanieu avatar Amanieu commented on July 21, 2024

@tkaitchuck One of the major use cases for hash table is with integer keys. Can you provide some benchmarks when hashing simple u32 values?

from hashbrown.

Amanieu avatar Amanieu commented on July 21, 2024

Ah nevermind, I see the results for u32 in the link you provided. Can you try running the hashbrown benchmarks with the hasher replaced with aHash (with and without AES)? I am particularly interested in the effect this has on lookup performance (if any).

from hashbrown.

tkaitchuck avatar tkaitchuck commented on July 21, 2024

(Updated)

With FX:
running 7 tests
test find_existing     ... bench:       4,797 ns/iter (+/- 181)
test find_nonexisting  ... bench:       4,060 ns/iter (+/- 291)
test get_remove_insert ... bench:          29 ns/iter (+/- 0)
test grow_by_insertion ... bench:         166 ns/iter (+/- 22)
test hashmap_as_queue  ... bench:          21 ns/iter (+/- 0)
test new_drop          ... bench:           0 ns/iter (+/- 0)
test new_insert_drop   ... bench:          49 ns/iter (+/- 1)

aHash without AES
running 7 tests
test find_existing     ... bench:       5,713 ns/iter (+/- 180)
test find_nonexisting  ... bench:       5,473 ns/iter (+/- 95)
test get_remove_insert ... bench:          65 ns/iter (+/- 1)
test grow_by_insertion ... bench:         131 ns/iter (+/- 20)
test hashmap_as_queue  ... bench:          42 ns/iter (+/- 0)
test new_drop          ... bench:           0 ns/iter (+/- 0)
test new_insert_drop   ... bench:          52 ns/iter (+/- 0)

aHash with AES:
running 7 tests
test find_existing     ... bench:       5,401 ns/iter (+/- 109)
test find_nonexisting  ... bench:       5,025 ns/iter (+/- 87)
test get_remove_insert ... bench:          36 ns/iter (+/- 1)
test grow_by_insertion ... bench:         140 ns/iter (+/- 23)
test hashmap_as_queue  ... bench:          23 ns/iter (+/- 0)
test new_drop          ... bench:           0 ns/iter (+/- 0)
test new_insert_drop   ... bench:          43 ns/iter (+/- 0)

There is a bit of a performance hit because all those run with i32 keys which are just not as fast in aHash as in FX. A few things seem notable:

  • There is a bit of a reduction in variance in find_nonexisting.
  • The biggest hit is in get_remove_insert in the fallback. (I don't fully understand this, but it does not appear to be strongly influenced to tweaks to the algorithm)
  • grow_by_insertion sees a performance boost with aHash with or without AES despite the lower i32 perf. I don't know what that could be if not collisions.

I could improve the performance of the fallback hasher on primitive types like i32 if it could be guaranteed the the hasher is only ever passed a single value. The only way I can think to make such a thing work, is to have the user use a macro to instantiate the map, where the macro looks at the key and creates one hasherBuilder for known primitives and a different one if it is something else. This strikes me a too invasive, as it requires changing users' code.

from hashbrown.

tkaitchuck avatar tkaitchuck commented on July 21, 2024

@Amanieu I figured out a way to speed up primitives (without changing the actual algorithm!), and released 0.1.13. (benchmarks are in the README) and I updated my comment above to reflect the impact on hashbrown using it.

from hashbrown.

tkaitchuck avatar tkaitchuck commented on July 21, 2024

@Amanieu what is the next step? Should I create a pull request? Do you want to see some string benchmarks? A comparison of a number of different hashes?

from hashbrown.

Amanieu avatar Amanieu commented on July 21, 2024

Sorry for the late response, I've been busy with some things lately.

I am hesitant to use aHash directly as the default hasher because it is slower than the current hasher. However if it is as secure as SipHash it might be a good idea to use it as a replacement for the default hasher in libstd.

from hashbrown.

tkaitchuck avatar tkaitchuck commented on July 21, 2024

Update: I was able to improve the performance a the fallback algorithm a bit. I've also submitted a PR for the additional benchmarks I've been comparing with. Here are what I observe locally (FxHash on the left AHash on the right.):

FXHash                               vs		   Ahash with fallback:	
find_existing    	   4764	(+/- 270) ns	   5050	(+/- 122) ns	106.00%
find_existing_high_bits	145,878	(+/- 3,905) ns	   5451	(+/- 210) ns	  3.74%
find_nonexisting	   4066	(+/- 154) ns	   4416	(+/- 73) ns	108.61%
get_remove_insert	     30	(+/- 1) ns	     32	(+/- 1) ns 	106.67%
grow_by_insertion	    152	(+/- 18) ns	    134	(+/- 13) ns	 88.16%
grow_by_insertion_kb	    248	(+/- 361) ns	    178	(+/- 5) ns 	 71.77%
hashmap_as_queue	     21	(+/- 0) ns  	     22	(+/- 0) ns 	104.76%
insert_8_char_string	 12,704	(+/- 429) ns	   9692	(+/- 174) ns	 76.29%
new_insert_drop              73	(+/- 2) ns  	     73	(+/- 1) ns 	100.00%

FXHash                               vs		   Ahash with AES
find_existing              4764 (+/- 270) ns	   5253	(+/- 106) ns	110.26%
find_existing_high_bits	145,878	(+/- 3,905) ns	   6106	(+/- 212) ns	  4.19%
find_nonexisting	   4066	(+/- 154) ns	   5150	(+/- 88) ns	126.66%
get_remove_insert	     30	(+/- 1) ns	     37	(+/- 0) ns 	123.33%
grow_by_insertion	    152	(+/- 18) ns	    137	(+/- 25) ns	 90.13%
grow_by_insertion_kb	    248	(+/- 361) ns	    145	(+/- 3) ns 	 58.47%
hashmap_as_queue	     21	(+/- 0) ns  	     23	(+/- 0) ns 	109.52%
insert_8_char_string	 12,704	(+/- 429) ns	   8129	(+/- 196) ns	 63.99%
new_insert_drop     	     73	(+/- 2) ns  	     64	(+/- 2) ns 	 87.67%

With this new version, the performance gap is a lot narrower, and with strings > 4 characters aHash is faster. Also FxHash really falls apart on the find_existing_high_bits test. I realize that isn't "normal" data but as a user I would never expect that level of performance drop (vs find_existing).

@Amanieu Is there a good way to actually get a representative sample of what users actually USE as their keys? That would allow us to make a benchmark that is a lot less abstract.

from hashbrown.

Zoxc avatar Zoxc commented on July 21, 2024

@tkaitchuck Your benchmarks (in the aHash repo) seem to use the fxhash crate, which is not the same as the FxHash in hashbrown or rustc_hash. Are you using the hashbrown version here? Testing with this PR would be interesting too.

from hashbrown.

tkaitchuck avatar tkaitchuck commented on July 21, 2024

@Zoxc AHash's readme refers to the public crate. The above was using the version in this repo.
Your comment did get me wondering if there was a performance difference: There is.
In my test they are the identical for u8-u128 and strings 68 bytes and up. However for small strings this version is much better (Though it doesn't have that one really odd good point at 4 bytes)

u8                   time:    1.1536 ns                    
u16                  time:    1.1517 ns                     
u32                  time:    1.1483 ns                     
u64                  time:    1.1584 ns                     
u128                 time:    1.4320 ns                      
  1 bytes            time:    1.8159 ns                            
  3 bytes            time:    2.5453 ns                              
  4 bytes            time:    2.3543 ns                               
  7 bytes            time:    2.9000 ns                                  
  8 bytes            time:    2.6467 ns                                                                         
  15 bytes           time:    3.8354 ns 
  16 bytes           time:    2.9812 ns                                  
  24 bytes           time:    3.9831 ns 
  68 bytes           time:    6.8920 ns 
 132 bytes           time:    14.674 ns 
1024 bytes           time:    205.61 ns 

(Still not as good as aHash with or without fallback for strings, but much better overall)

from hashbrown.

Amanieu avatar Amanieu commented on July 21, 2024

After thinking about this a bit and looking at alternatives, I think that AHash would be great as the default hasher for hashbrown. However there are a few issues that currently prevent me from doing so:

  • The ahash crate needs to be #![no_std].
  • The ahash crate needs to compile on stable, and on Rust 1.31 which is the minimum Rust version for hashbrown.

from hashbrown.

Amanieu avatar Amanieu commented on July 21, 2024

ping @tkaitchuck

from hashbrown.

tkaitchuck avatar tkaitchuck commented on July 21, 2024

@alkis there is an open issue to do that here: tkaitchuck/aHash#6
If you are interested in contributing the needed glue code, it would be very much appreciated.

from hashbrown.

tkaitchuck avatar tkaitchuck commented on July 21, 2024

@Amanieu aHash now builds on stable and is no-std.

from hashbrown.

LifeIsStrange avatar LifeIsStrange commented on July 21, 2024

Cyan4973/xxHash#155
Meow seems to be the fastest non cryptographic hash algorithm to exist, at least when hardware SIMD is available.

from hashbrown.

alkis avatar alkis commented on July 21, 2024

Cyan4973/xxHash#155
Meow seems to be the fastest non cryptographic hash algorithm to exist, at least when hardware SIMD is available.

I haven't compared it in a benchmark with ahash, but looking at the code, it's very clearly designed for throughput not latency. It might even beat ahash when dealing with 1mb of data (Though I would not be too confident). But that's rarely the case with a hashmap in memory. A hashmap needs to be able to get a value in just a couple of CPU cycles when dealing with a u64 or a 5 byte string for a key. That case is far more common.

Are you talking about meow or XXH3? If the latter I find this surprising because the blogpost I linked above specifically discusses latency for small inputs as a specific design goal.

from hashbrown.

tkaitchuck avatar tkaitchuck commented on July 21, 2024

Are you talking about meow or XXH3?

No I wasn't. I see that. H3 now adds a fast path for short inputs, and it looks eerily familiar. It uses many of the same tricks I do in aHash. It says a lot about the constraints of the problem that independent solutions can endup so similar.

from hashbrown.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.