Comments (8)
Another feature I'd appreciate is to be able to somehow pass a mutable environment through to both closures since I somehow need to reference the virtual machine being used. This currently can't be done because eq
and hasher
refer to separate closures and both can't capture the same variables mutably, so I currently have to resort to using thread locals.
This could be accomplished by either passing an environment into the closures or using a different common trait for equality checks and hashing.
impl RawTable {
pub fn find_or_find_insert_slot2<C: ?Sized, E>(
&mut self,
ctx: &mut C,
hash: u64,
mut eq: impl FnMut(&mut C, &T) -> Result<bool, E>,
hasher: impl Fn(&mut C, &T) -> Result<u64, E>,
) -> Result<Result<Bucket<T>, InsertSlot>, E> {
todo!()
}
}
EDIT: Here are benchmarks and size comparisons for my branch which includes fallible eq
, hasher
and passing around a mutable context:
Benches without the changes
test clone_from_large ... bench: 6,256 ns/iter (+/- 29)
test clone_from_small ... bench: 65 ns/iter (+/- 0)
test clone_large ... bench: 6,509 ns/iter (+/- 40)
test clone_small ... bench: 111 ns/iter (+/- 0)
test grow_insert_ahash_highbits ... bench: 20,852 ns/iter (+/- 154)
test grow_insert_ahash_random ... bench: 20,862 ns/iter (+/- 159)
test grow_insert_ahash_serial ... bench: 20,816 ns/iter (+/- 161)
test grow_insert_std_highbits ... bench: 37,082 ns/iter (+/- 711)
test grow_insert_std_random ... bench: 36,962 ns/iter (+/- 214)
test grow_insert_std_serial ... bench: 36,871 ns/iter (+/- 169)
test insert_ahash_highbits ... bench: 19,712 ns/iter (+/- 253)
test insert_ahash_random ... bench: 19,947 ns/iter (+/- 84)
test insert_ahash_serial ... bench: 20,242 ns/iter (+/- 91)
test insert_erase_ahash_highbits ... bench: 21,603 ns/iter (+/- 620)
test insert_erase_ahash_random ... bench: 21,112 ns/iter (+/- 57)
test insert_erase_ahash_serial ... bench: 21,103 ns/iter (+/- 85)
test insert_erase_std_highbits ... bench: 34,403 ns/iter (+/- 273)
test insert_erase_std_random ... bench: 35,150 ns/iter (+/- 228)
test insert_erase_std_serial ... bench: 34,370 ns/iter (+/- 225)
test insert_std_highbits ... bench: 25,299 ns/iter (+/- 294)
test insert_std_random ... bench: 25,439 ns/iter (+/- 410)
test insert_std_serial ... bench: 25,238 ns/iter (+/- 166)
test iter_ahash_highbits ... bench: 812 ns/iter (+/- 5)
test iter_ahash_random ... bench: 812 ns/iter (+/- 6)
test iter_ahash_serial ... bench: 835 ns/iter (+/- 15)
test iter_std_highbits ... bench: 811 ns/iter (+/- 1)
test iter_std_random ... bench: 818 ns/iter (+/- 4)
test iter_std_serial ... bench: 819 ns/iter (+/- 8)
test lookup_ahash_highbits ... bench: 4,433 ns/iter (+/- 181)
test lookup_ahash_random ... bench: 4,357 ns/iter (+/- 15)
test lookup_ahash_serial ... bench: 4,072 ns/iter (+/- 70)
test lookup_fail_ahash_highbits ... bench: 4,278 ns/iter (+/- 62)
test lookup_fail_ahash_random ... bench: 4,236 ns/iter (+/- 11)
test lookup_fail_ahash_serial ... bench: 3,914 ns/iter (+/- 14)
test lookup_fail_std_highbits ... bench: 10,352 ns/iter (+/- 165)
test lookup_fail_std_random ... bench: 10,337 ns/iter (+/- 24)
test lookup_fail_std_serial ... bench: 10,248 ns/iter (+/- 46)
test lookup_std_highbits ... bench: 10,319 ns/iter (+/- 48)
test lookup_std_random ... bench: 10,570 ns/iter (+/- 61)
test lookup_std_serial ... bench: 10,356 ns/iter (+/- 65)
test rehash_in_place ... bench: 176,636 ns/iter (+/- 1,843)
test insert ... bench: 8,853 ns/iter (+/- 29)
test insert_unique_unchecked ... bench: 6,084 ns/iter (+/- 41)
Benches with the changes
test clone_from_large ... bench: 6,220 ns/iter (+/- 73)
test clone_from_small ... bench: 66 ns/iter (+/- 0)
test clone_large ... bench: 6,504 ns/iter (+/- 74)
test clone_small ... bench: 112 ns/iter (+/- 1)
test grow_insert_ahash_highbits ... bench: 20,686 ns/iter (+/- 323)
test grow_insert_ahash_random ... bench: 20,665 ns/iter (+/- 166)
test grow_insert_ahash_serial ... bench: 20,583 ns/iter (+/- 157)
test grow_insert_std_highbits ... bench: 37,383 ns/iter (+/- 523)
test grow_insert_std_random ... bench: 37,318 ns/iter (+/- 391)
test grow_insert_std_serial ... bench: 37,240 ns/iter (+/- 499)
test insert_ahash_highbits ... bench: 19,885 ns/iter (+/- 210)
test insert_ahash_random ... bench: 19,944 ns/iter (+/- 131)
test insert_ahash_serial ... bench: 20,343 ns/iter (+/- 132)
test insert_erase_ahash_highbits ... bench: 21,533 ns/iter (+/- 192)
test insert_erase_ahash_random ... bench: 21,163 ns/iter (+/- 382)
test insert_erase_ahash_serial ... bench: 21,108 ns/iter (+/- 477)
test insert_erase_std_highbits ... bench: 34,268 ns/iter (+/- 989)
test insert_erase_std_random ... bench: 35,408 ns/iter (+/- 1,840)
test insert_erase_std_serial ... bench: 34,213 ns/iter (+/- 1,801)
test insert_std_highbits ... bench: 25,093 ns/iter (+/- 141)
test insert_std_random ... bench: 25,007 ns/iter (+/- 159)
test insert_std_serial ... bench: 25,403 ns/iter (+/- 85)
test iter_ahash_highbits ... bench: 812 ns/iter (+/- 4)
test iter_ahash_random ... bench: 812 ns/iter (+/- 9)
test iter_ahash_serial ... bench: 828 ns/iter (+/- 13)
test iter_std_highbits ... bench: 812 ns/iter (+/- 4)
test iter_std_random ... bench: 821 ns/iter (+/- 11)
test iter_std_serial ... bench: 813 ns/iter (+/- 10)
test lookup_ahash_highbits ... bench: 4,456 ns/iter (+/- 34)
test lookup_ahash_random ... bench: 4,363 ns/iter (+/- 34)
test lookup_ahash_serial ... bench: 4,074 ns/iter (+/- 32)
test lookup_fail_ahash_highbits ... bench: 4,284 ns/iter (+/- 41)
test lookup_fail_ahash_random ... bench: 4,235 ns/iter (+/- 14)
test lookup_fail_ahash_serial ... bench: 3,914 ns/iter (+/- 28)
test lookup_fail_std_highbits ... bench: 10,314 ns/iter (+/- 62)
test lookup_fail_std_random ... bench: 10,411 ns/iter (+/- 218)
test lookup_fail_std_serial ... bench: 10,241 ns/iter (+/- 21)
test lookup_std_highbits ... bench: 10,371 ns/iter (+/- 48)
test lookup_std_random ... bench: 10,575 ns/iter (+/- 86)
test lookup_std_serial ... bench: 10,453 ns/iter (+/- 189)
test rehash_in_place ... bench: 176,986 ns/iter (+/- 1,336)
test insert ... bench: 8,747 ns/iter (+/- 204)
test insert_unique_unchecked ... bench: 5,987 ns/iter (+/- 41)
Binary size differences when applying my branch to the [gimli-rs/object](https://github.com/gimli-rs/object) repo and doing a release build
--- without.txt 2023-08-23 20:46:02.666016800 +0200
+++ with.txt 2023-08-23 20:46:11.031017000 +0200
@@ -1,28 +1,28 @@
-target/release/ar 4715408
+target/release/ar 4715472
target/release/ar.d 3315
target/release/build 4096
target/release/deps 4096
-target/release/dyldcachedump 4723616
+target/release/dyldcachedump 4723664
target/release/dyldcachedump.d 3337
-target/release/elfcopy 4888528
+target/release/elfcopy 4899232
target/release/elfcopy.d 3325
-target/release/elftoefi 4739656
+target/release/elftoefi 4739704
target/release/elftoefi.d 3327
target/release/examples 4096
target/release/incremental 4096
target/release/libobject.d 2965
-target/release/libobject.rlib 11035614
+target/release/libobject.rlib 11051504
target/release/libobject_examples.d 3287
-target/release/libobject_examples.rlib 3826960
-target/release/nm 4867848
+target/release/libobject_examples.rlib 3828668
+target/release/nm 4882424
target/release/nm.d 3315
-target/release/objcopy 5067632
+target/release/objcopy 5077600
target/release/objcopy.d 3325
-target/release/objdump 5046328
+target/release/objdump 5046240
target/release/objdump.d 3325
-target/release/objectmap 4859896
+target/release/objectmap 4859768
target/release/objectmap.d 3329
-target/release/pecopy 4724840
+target/release/pecopy 4724912
target/release/pecopy.d 3323
-target/release/readobj 5334536
+target/release/readobj 5334816
target/release/readobj.d 3325
from hashbrown.
I don't think supporting fallible hasher is worth adding this level of complexity to the API. In most cases this isn't needed, and you can always emulate it by returning a hash value of 0 and recording the error in an Option
that you check after the hash table operation is complete.
However I do think there is value in find_or_find_insert_slot_with_context
: it is very common to want both eq
and hasher
to have access to the same object. It just isn't an issue in practice since mutable access is not usually required (shared access is sufficient).
from hashbrown.
I don't think supporting fallible hasher is worth adding this level of complexity to the API. In most cases this isn't needed, and you can always emulate it by returning a hash value of 0 and recording the error in an
Option
that you check after the hash table operation is complete.
I considered this initially, but would this not result in potentially ill-defined downstream behaviors in fallible cases (same for eq
)? Preferably I'd want to ensure that any errors prevent the modifications from taking place to achieve my version of exception safety.
from hashbrown.
hasher
absolutely cannot be allowed to fail, we rely on this when rehashing the table. See rehash_in_place
where if the hasher panics then we drop all elements in the table. If this is not something that you can guarantee then you may want to consider caching hash values separately so they are always available.
eq
failing is less serious since you can always just return false
, and it doesn't modify the table.
from hashbrown.
hasher
absolutely cannot be allowed to fail, we rely on this when rehashing the table. Seerehash_in_place
where if the hasher panics then we drop all elements in the table.
Right, so the behavior I'd want to emulate is what would happen if the hash implementation panics today (which is how I believe this PR works). Think of this error propagation as my own "structured panics".
If this is not something that you can guarantee then you may want to consider caching hash values separately so they are always available.
What do you mean? Are raw tables not exception safe?
eq
failing is less serious since you can always just returnfalse
, and it doesn't modify the table.
Doesn't that depend on where it fails? If it fails during insertion (e.g. find_or_find_insert_slot
) and eq
returns false
wouldn't the table allocate another bucket since two values are not considered eq but have the same hash?
from hashbrown.
I couldn't produce a modification for a failing eq
, but here is one which is avoided for a failing hasher
:
use hashbrown::raw::RawTable;
fn main() {
let mut table = RawTable::<()>::new();
unsafe {
// Insert three elements (with the same hash) to fill up initial capacity:
for _ in 0..3u64 {
let Err(slot) = table.find_or_find_insert_slot(0, |_| false, |_| 0) else {
panic!();
};
table.insert_in_slot(0, slot, ());
}
dbg!(table.capacity()); // 3
// Table does not grow when `hasher` fails like this:
let Err(()) = table.find_or_find_insert_slot_with(&mut (), 0, |_, _| Ok(false), |_, _| Err(())) else {
panic!();
};
dbg!(table.capacity()); // 3
// Table does grow when a dummy value is returned instead:
let Err(..) = table.find_or_find_insert_slot(0, |_| false, |_| 0) else {
panic!();
};
dbg!(table.capacity()); // 7
}
}
from hashbrown.
What do you mean? Are raw tables not exception safe?
Not completely: because we don't store the full hash in the table, we rely on the hasher when rehashing the table in-place. If the hasher panics then we have no way of knowing where to place elements in the table. It's exception-sense in the sense that you won't get UB, but you may lose elements in the table.
Lines 2277 to 2280 in bb6521e
Doesn't that depend on where it fails? If it fails during insertion (e.g.
find_or_find_insert_slot
) andeq
returnsfalse
wouldn't the table allocate another bucket since two values are not considered eq but have the same hash?
find_or_find_insert_slot
first calls reserve(1)
to ensure there is enough space in the table in case an insertion is needed, before doing anything. This function doesn't actually perform any insertion into the table, that is done by insert_in_slot
.
from hashbrown.
Right. Thanks for the clarity. So that is what I gathered from reading the implementation as well. The only extent I expect the table to be exception safe is that it doesn't permit anything memory unsafe to happen if we do end up catching a panic. But the exact end state is not guaranteed to be the same as prior to the attempted modification.
I am still interested to ensure that my "errors" exhibit the same behaviour as a caught Rust panic, mainly because I want to provide as much feature parity between Rust and Rune as possible. So I still want to provide fallibility to the methods.
It's nice that you're interested in passing the context around though (possibly the bigger change). But until fallible methods are supported or the exact behavior of returning "dummy values" is understood and tested* I'll probably end up maintaining a private fork of the raw table with those minor changes.
*: To the degree that it's possible to emulate the behavior of an equivalent Rust panic.
from hashbrown.
Related Issues (20)
- ahash shouldn't be the default Hasher HOT 3
- UB on aarch64_be-unknown-linux-gnu_ilp32 HOT 1
- latest/recent rev appears to break ahash/compile-time-rng usage HOT 2
- Why the identity function can be used as unlikely function? HOT 3
- `hashbrown` fails to compile as a transitive dependency HOT 2
- allocator-api2 default-feature? HOT 2
- Compiling hashbrown 0.14.2 for aarch64-unknown-linux-gnu with "target-cpu=cortex-a53" generates illegal instructions HOT 2
- Switching to GxHash? HOT 9
- Feature: increase capacity according to the actual size returned by the allocator HOT 2
- Hashbrown crash due to bad malloc HOT 1
- 0.14.3 - no method named `clear` found for struct `HashMap` in the current scope HOT 5
- Benchmark biaised due to no fence around input
- assertion failed: buckets.is_power_of_two() HOT 8
- Build breaks on nightly due to use of `stdsimd` rust feature in ahash 0.8.6 HOT 2
- Was swap-remove behavior ever considered when removing entries? HOT 10
- Consider returning to 1.63.0 MSRV HOT 1
- How to calculate the size of the hashbrown::HashMap at runtime? HOT 1
- LLVM failed to use the knowledge from a never-overflow assumption HOT 13
- Library test `map::test_map::test_clone_from_memory_leaks` errors with using uninitialized data under valgrind and miri
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from hashbrown.