Giter VIP home page Giter VIP logo

bloom2's People

Contributors

dependabot-preview[bot] avatar dependabot[bot] avatar domodwyer avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

bloom2's Issues

Question regarding actual memory usage savings

I'd guessing for high loads, compared to a plain bloom filter with just a block of memory, the tree overhead (pointers and stuff) might invalidate the memory saving benefit, do you have data on that, like memory saving for different loads, and at which load bloom2 becomes beneficial?

I could write some tests but wondering if you already have that data on hand.

Thanks.

Index out of bounds

When I initialize the Bloom2 filter like so:

let mut bloom_filter: Bloom2<RandomState, CompressedBitmap, MyStruct> = BloomFilterBuilder::default()
    .size(FilterSize::KeyBytes4)
    .build();

I receive the following error when my program is run:

The application panicked (crashed).
Message:  index out of bounds: the len is 16 but the index is 275465
Location: C:\Users\User Name\.cargo\registry\src\github.com-1ecc6299db9ec823\bloom2-0.2.0\src\bitmap\compressed_bitmap.rs:254

However, if I initialize like this:

let mut bloom_filter: Bloom2<RandomState, CompressedBitmap, MyStruct> = Bloom2::default();

It works well. Nice and fast!

If you require more detail, let me know!

Question about key distribution

Hi,

I came here while researching a Bloom filter construction which would yield minimal transmission size and fast intersection. Your construction seems interesting, because it also minimises the number of hash functions (in fact, you’re using only one).

From a cursory glance over the code, it is unclear to me, however, when exactly this will result in space savings. Usually, we would expect the set bits in a bloom filter to be essentially randomly distributed, and I’m having trouble to see how this would be different here. Iow, as the load factor increases, the probability of the block map to be all-1s approaches 1. This means that we’ll have the underlying blocks of the bitmap initialised, even if they are still sparse (ie. contain mostly zeroes).

It’s possible that I’m missing something obvious. Would you mind explaining what that might be?

Thanks!

Feature request: find_or_insert?

Hi! Thanks for the library! I played with it for a while, though unfortunately it wasn't fast enough for my use case. Does it use CPU's built-in hash capabilities to quickly create hashes?

Anyway, I might have a feature request: when counting how many entries are in a given set, I was trying to use the library but found that it's probably hashing twice the same entries. Here's the relevant code:

        if !filter.contains_hash(&record[1]) {
            filter.insert_hash(&record[1]);
            cnt += 1;
        }

Would it make sense to add insert_if_not_exists() that returns information about whether the hash was already there?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.