Comments (2)
Hi @d33tah
I'm surprised it's not fast enough for your use case! Are you hashing the raw data and inserting the hash or is record[1]
the raw data? What sort of numbers are you seeing?
I just ran a quick check using xxhash (though in reality the choice of hash here has no impact):
pub fn xxhash_bench(c: &mut Criterion) {
use std::hash::BuildHasher;
use std::hash::Hasher;
use twox_hash::*;
// Really any u64/hash bytes is fine, but this shows how to use xxhash.
let mut bloom = CompressedBitmap::new(FilterSize::KeyBytes2);
let mut hasher = RandomXxHashBuilder64::default().build_hasher();
hasher.write(&[42]); // Hash the raw data bytes
let hash = hasher.finish().to_le_bytes();
c.bench_function("xxhash_insert", |b| b.iter(|| bloom.insert_hash(hash)));
c.bench_function("xxhash_lookup hit", |b| {
b.iter(|| black_box(bloom.contains_hash(hash)))
});
c.bench_function("xxhash_lookup miss, partial match", |b| {
b.iter(|| black_box(bloom.contains_hash([1, 2, 3, 4, 5, 6, 7, 8])))
});
}
And got:
curiosity:bloom2 % cargo bench -- xxhash
Compiling bloom2 v0.1.0 (/Users/dom/Documents/rust/bloom2)
Finished bench [optimized] target(s) in 12.46s
Running target/release/deps/bench-8985f05ed1c18d86
Gnuplot not found, using plotters backend
xxhash_insert time: [26.184 ns 26.251 ns 26.321 ns]
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low severe
2 (2.00%) high mild
2 (2.00%) high severe
xxhash_lookup hit time: [19.978 ns 20.069 ns 20.176 ns]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
xxhash_lookup miss, partial match
time: [4.6333 ns 4.6693 ns 4.7297 ns]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
I'm compiling with -Ctarget-cpu=native
to enable SSE/AVX and allow rust to use the all the features of the CPU - if your aiming for flexible builds (say, packaging up the binary to be run by 3rd parties on unknown CPUs) then skip this - it will still be pretty fast! But if you're deploying binaries to known hardware, everything will be a bit faster with this enabled and is worth doing!
To answer your actual question: sounds like a great idea!
Maybe something like:
fn try_insert<T: AsRef<[u8]>>(&self, hash: T) -> bool
Where the returned bool
is true/false depending on if it is inserted?
I think this crate could do with some love to be honest - adding a generic parameter for the default hasher (xxhash?) the user can override instead of requiring them to pass hashes directly, and adding more utility methods seems like an obvious improvement.
My original use case was dealing with hashes, so I didn't want the bloom filter to then _re_hash the hash I was giving it, so that's why it wound up like this, but now it's public that is likely not the most useful form this crate could take...
Dom
from bloom2.
@domodwyer thanks for the RUSTFLAGS hint! It speeded up the program twice. Now I guess that the next low hanging fruit is the try_insert function you proposed. If you're interested in adding it, please ping me once it's ready for testing. :)
from bloom2.
Related Issues (4)
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from bloom2.