domodwyer / bloom2 Goto Github PK

View Code? Open in Web Editor NEW

9.0 9.0 0.0 50 KB

A fast 2-level, sparse bloom filter implementation

License: BSD 3-Clause "New" or "Revised" License

Rust 100.00%

bloom2's People

Contributors

Stargazers

Watchers

bloom2's Issues

Question regarding actual memory usage savings

I'd guessing for high loads, compared to a plain bloom filter with just a block of memory, the tree overhead (pointers and stuff) might invalidate the memory saving benefit, do you have data on that, like memory saving for different loads, and at which load bloom2 becomes beneficial?

I could write some tests but wondering if you already have that data on hand.

Thanks.

Index out of bounds

When I initialize the Bloom2 filter like so:

let mut bloom_filter: Bloom2<RandomState, CompressedBitmap, MyStruct> = BloomFilterBuilder::default()
    .size(FilterSize::KeyBytes4)
    .build();

I receive the following error when my program is run:

The application panicked (crashed).
Message:  index out of bounds: the len is 16 but the index is 275465
Location: C:\Users\User Name\.cargo\registry\src\github.com-1ecc6299db9ec823\bloom2-0.2.0\src\bitmap\compressed_bitmap.rs:254

However, if I initialize like this:

let mut bloom_filter: Bloom2<RandomState, CompressedBitmap, MyStruct> = Bloom2::default();

It works well. Nice and fast!

If you require more detail, let me know!

Question about key distribution

Hi,

I came here while researching a Bloom filter construction which would yield minimal transmission size and fast intersection. Your construction seems interesting, because it also minimises the number of hash functions (in fact, you’re using only one).

From a cursory glance over the code, it is unclear to me, however, when exactly this will result in space savings. Usually, we would expect the set bits in a bloom filter to be essentially randomly distributed, and I’m having trouble to see how this would be different here. Iow, as the load factor increases, the probability of the block map to be all-1s approaches 1. This means that we’ll have the underlying blocks of the bitmap initialised, even if they are still sparse (ie. contain mostly zeroes).

It’s possible that I’m missing something obvious. Would you mind explaining what that might be?

Thanks!

Feature request: find_or_insert?

Hi! Thanks for the library! I played with it for a while, though unfortunately it wasn't fast enough for my use case. Does it use CPU's built-in hash capabilities to quickly create hashes?

Anyway, I might have a feature request: when counting how many entries are in a given set, I was trying to use the library but found that it's probably hashing twice the same entries. Here's the relevant code:

        if !filter.contains_hash(&record[1]) {
            filter.insert_hash(&record[1]);
            cnt += 1;
        }

Would it make sense to add insert_if_not_exists() that returns information about whether the hash was already there?

domodwyer / bloom2 Goto Github PK

bloom2's People

Contributors

Stargazers

Watchers

bloom2's Issues

Question regarding actual memory usage savings

Index out of bounds

Question about key distribution

Feature request: find_or_insert?

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent