domodwyer / bloom2 Goto Github PK
View Code? Open in Web Editor NEWA fast 2-level, sparse bloom filter implementation
License: BSD 3-Clause "New" or "Revised" License
A fast 2-level, sparse bloom filter implementation
License: BSD 3-Clause "New" or "Revised" License
I'd guessing for high loads, compared to a plain bloom filter with just a block of memory, the tree overhead (pointers and stuff) might invalidate the memory saving benefit, do you have data on that, like memory saving for different loads, and at which load bloom2 becomes beneficial?
I could write some tests but wondering if you already have that data on hand.
Thanks.
When I initialize the Bloom2 filter like so:
let mut bloom_filter: Bloom2<RandomState, CompressedBitmap, MyStruct> = BloomFilterBuilder::default()
.size(FilterSize::KeyBytes4)
.build();
I receive the following error when my program is run:
The application panicked (crashed).
Message: index out of bounds: the len is 16 but the index is 275465
Location: C:\Users\User Name\.cargo\registry\src\github.com-1ecc6299db9ec823\bloom2-0.2.0\src\bitmap\compressed_bitmap.rs:254
However, if I initialize like this:
let mut bloom_filter: Bloom2<RandomState, CompressedBitmap, MyStruct> = Bloom2::default();
It works well. Nice and fast!
If you require more detail, let me know!
Hi,
I came here while researching a Bloom filter construction which would yield minimal transmission size and fast intersection. Your construction seems interesting, because it also minimises the number of hash functions (in fact, you’re using only one).
From a cursory glance over the code, it is unclear to me, however, when exactly this will result in space savings. Usually, we would expect the set bits in a bloom filter to be essentially randomly distributed, and I’m having trouble to see how this would be different here. Iow, as the load factor increases, the probability of the block map to be all-1s approaches 1. This means that we’ll have the underlying blocks of the bitmap initialised, even if they are still sparse (ie. contain mostly zeroes).
It’s possible that I’m missing something obvious. Would you mind explaining what that might be?
Thanks!
Hi! Thanks for the library! I played with it for a while, though unfortunately it wasn't fast enough for my use case. Does it use CPU's built-in hash capabilities to quickly create hashes?
Anyway, I might have a feature request: when counting how many entries are in a given set, I was trying to use the library but found that it's probably hashing twice the same entries. Here's the relevant code:
if !filter.contains_hash(&record[1]) {
filter.insert_hash(&record[1]);
cnt += 1;
}
Would it make sense to add insert_if_not_exists() that returns information about whether the hash was already there?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.