Giter VIP home page Giter VIP logo

bloomfilter's Introduction

bloomfilter

Build Status GitHub release GitHub license

An empty Bloom filter is a bit array of m bits, all set to 0. There must also be hashNumber different hash functions defined, each of which maps or hashes some set element to one of the m array positions, generating a uniform random distribution. Typically, hashNumber is a constant, much smaller than m, which is proportional to the number of elements to be added; the precise choice of hashNumber and the constant of proportionality of m are determined by the intended false positive probability of the filter.
If you are not familiar with it, you can refer to Wikipedia's detailed analysis.
If you use python, you can refer to the bloomfilter written in python

Overview

This project implements bloomfilter and can select and extend different BitSets or other storage methods. For example, BitSet supports Java and Redis. It can also be applied to scenarios where various data levels and different false positive probability are required.

Release Log:
bloomfilter version 1.0.0 Released

Download source code and binary jar package on the release page.

Quick Start

Download & Install

This is not necessary now. The dependency uploaded to the maven central repository.

git clone https://github.com/wxisme/bloomfilter.git

cd bloomfilter

mvn clean install

You can also download source code or binary jar package on the release page directly.

Add maven dependency

<dependency>
    <groupId>com.github.wxisme</groupId>
    <artifactId>bloomfilter</artifactId>
    <version>1.0.0</version>
</dependency>

Test

public class JavaBitSetTest {

    public static void main(String[] args) {
        //(falsePositiveProbability, expectedNumberOfElements)
        BloomFilter<String> filter = new BloomFilter<String>(0.0001, 10000);
        filter.bind(new JavaBitSet());

        filter.add("filter");
        System.out.println(filter.contains("filter"));
        System.out.println(filter.contains("bloom"));
        filter.add("bitset");
        filter.add("redis");
        System.out.println(filter.contains("bitset"));
        System.out.println(filter.contains("redis"));
        System.out.println(filter.contains("mysql"));
        System.out.println(filter.contains("linux"));
        System.out.println(filter.count());
        System.out.println(filter.isEmpty());
        filter.clear();
        System.out.println(filter.isEmpty());
        System.out.println(filter.contains("filter"));

    }

}

Expected test result is :

true
false
true
true
false
false
3
false
true
false

Redis and other extensions

Redis

You can easily use bloomfilter on redis,here is a simple example:

public class RedisBitSetTest {

    public static void main(String[] args) {

        //Don't forget auth password, you better use the configured redis client connection.
        //It should be noted that bloomfilter is not responsible for closing and returning redis connection resources.

        //(falsePositiveProbability, expectedNumberOfElements)
        BloomFilter<String> filter = new BloomFilter<String>(0.0001, 10000);
        Jedis jedis = new Jedis("127.0.0.1", 6379);
        jedis.auth("1234");
        filter.bind(new RedisBitSet(jedis, "bloomfilter:key:name"));

        //if you have a redis cluster
        //Set<HostAndPort> nodes = new HashSet<>();
        //nodes.add(new HostAndPort("127.0.0.1", 6379));

        //filter.bind(new RedisBitSet(new JedisCluster(nodes), "bloomfilter:key:name"));

        //you can also use jedispool
        //JedisPool jedisPool = new JedisPool("127.0.0.1", 6379);
        //Jedis jedis = jedisPool.getResource();
        //filter.bind(new RedisBitSet(jedis, "bloomfilter:key:name"));

        filter.add("filter");
        System.out.println(filter.contains("filter"));
        System.out.println(filter.contains("bloom"));
        filter.add("bitset");
        filter.add("redis");
        System.out.println(filter.contains("bitset"));
        System.out.println(filter.contains("redis"));
        System.out.println(filter.contains("mysql"));
        System.out.println(filter.contains("linux"));
        System.out.println(filter.count());
        System.out.println(filter.isEmpty());
        filter.clear();
        System.out.println(filter.isEmpty());
        System.out.println(filter.contains("filter"));
    }
}

Don't forget auth password, you better use the configured redis client connection. It should be noted that bloomfilter is not responsible for closing and returning redis connection resources.
If you are not familiar with redis, you can refer to the following hyperlink:
https://redis.io
https://github.com/xetorthio/jedis

Test result is :

true
false
true
true
false
false
3
false
true
false

Extensions

If you want to use your own data structure, you can directly implement the BaseBitSet interface.
Example:

public class YourBitSet implements BaseBitSet {

    private int[] data;//boolean array

    public YourBitSet(int size) {
        data = new int[size];
    }

    @Override
    public void set(int bitIndex) {
        data[bitIndex] = 1;
    }

    @Override
    public void set(int bitIndex, boolean value) {
        if (value)
            data[bitIndex] = 1;
        else data[bitIndex] = 0;
    }

    @Override
    public boolean get(int bitIndex) {
        return data[bitIndex] == 1;
    }

    @Override
    public void clear(int bitIndex) {
        data[bitIndex] = 0;
    }

    @Override
    public void clear() {
        Arrays.fill(data, 0);
    }

    @Override
    public long size() {
        long size = 0;
        for (int d : data)
            if (d == 1)
                size++;
        return size;
    }

    @Override
    public boolean isEmpty() {
        return size() <= 0;
    }
}

Test:

public class YourBitSetTest {

    public static void main(String[] args) {
        //(falsePositiveProbability, expectedNumberOfElements)
        BloomFilter<String> filter = new BloomFilter<String>(0.0001, 10000);
        filter.bind(new YourBitSet(1000000));

        filter.add("filter");
        System.out.println(filter.contains("filter"));
        System.out.println(filter.contains("bloom"));
        filter.add("bitset");
        filter.add("redis");
        System.out.println(filter.contains("bitset"));
        System.out.println(filter.contains("redis"));
        System.out.println(filter.contains("mysql"));
        System.out.println(filter.contains("linux"));
        System.out.println(filter.count());
        System.out.println(filter.isEmpty());
        filter.clear();
        System.out.println(filter.isEmpty());
        System.out.println(filter.contains("filter"));

    }

}

Test result is :

true
false
true
true
false
false
3
false
true
false

Welcome to submit your extension code or put any question issues on github.

License

Bloomfilter is released under the GNU Lesser General Public License v3.0.
You can just use maven to import dependency package or use your own compiled jar package.
You may copy this code directly into your project if you leave the LGPL-comment in place and reference the following hyperlink:
https://github.com/MagnusS/Java-BloomFilter
https://github.com/wxisme/bloomfilter

bloomfilter's People

Contributors

wangxu0 avatar wxisme avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.