Giter VIP home page Giter VIP logo

xorfilter's People

Contributors

ayazhafiz avatar canrau avatar detailyang avatar el10savio avatar funny-falcon avatar github-actions[bot] avatar lemire avatar mraerino avatar prataprc avatar slimsag avatar veggiedefender avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

xorfilter's Issues

[Proposal] Improve xorfilter.go codebase

I would like to refactor the xorfilter.go to improve code quality & readability by doing the following:

  1. Separate the struct definitions & hashing functions to other files so that it can be made more modular.

  2. Reduce the number of lines by combining the same procedures to its own function so that it can be reused easily.

  3. Probably add more comments, tests, and benchmarks along the way.

Is there anything else that can be added?

Handle gracefully duplicated keys

Currently the user is responsible to ensure that there are no duplicated keys. We should handle this for the user (it is relatively easy to do efficiently, without even sorting). Try to construct the filter once (optimistically). The duplicated keys will be in the unmatched keys, just visit these keys.

Build error with older golang version.

$ go version
go version go1.10.4 linux/amd64
$ go build
# github.com/FastFilter/xorfilter
/home/prataprc/myworld/devgo/src/github.com/FastFilter/xorfilter/xorfilter.go:54:9: invalid operation: n << (c & 63) (shift count type int, must be unsigned integer)
/home/prataprc/myworld/devgo/src/github.com/FastFilter/xorfilter/xorfilter.go:54:27: invalid operation: n >> (-c & 63) (shift count type int, must be unsigned integer)

Note: Build is okay with latest golang version.

Streaming use cases

Great algorithm you got there. The API requires a slice to be passed in to populate. If I'm interested in adding entities on the fly without having to load/allocate my entire dataset, would this be possible?

Supportable aggregation operations?

Does it make sense to intersect two xorfilters? That is, is the bitwise intersection of two xorfilters an xorfilter with sensible false positive rate and no false negatives?

Cut releases using tags

Would you be able to cut release tags so that tooling like dependabot or renovate will not propose an upgrade on every commit?

You could use something like the really nifty release-please to automate that process along with a nice changelog.

Golang Interface in place for Source64 and Hash64?

Hi,

First, Thank you for your paper and implementation/s!

I'm wondering if it's possible to re-write this that would accept a more idiomatic 'Go' way? say, accepting Source64 and Hash64 instead of using SplitMix64 and Murmur (with that being said, I'm wondering if the random and hash functions can also be changed, say using Mersenne or XoRoShiRo generators and/or T1ha or Metro for the hash). I'm not sure if the literature mentions anything specific random generators but I know you specifically chose Murmur for the hash (in which case, Hash64 interface isn't necessary).

binaryfusefilter cannot handle some sets of large, duplicate keys

My understanding is that after #22 duplicate keys should be supported in binary fuse filters. I am working with uint64 keys that are CRC64 checksums (so quite large numbers) and am regularly finding sets that result in:

Unexpected error: too many iterations, you probably have duplicate keys

I've managed to reduce one such set down to just 121 keys, I will send a PR to add a test for this. Is it possible to fix this, or are there limitations to the duplicate key checking in some way?

permit empty filter construction

PopulateBinaryFuse8([]uint64{}) currently panics. It should return an error or preferably just construct a trivial tiny filter. Adding if size == 0 { size = 1 } is sufficient to prevent panicking.

Tunable error rate

It would be great to have a way to change/tune the false negative rate.

Implement the compressed xor+

I noticed your paper mentioned xor+ but didn't see any mention or implementation of compression. Just curious if you intend to provide that here as well?

xorfilter cannot pass this case

xorfilter cannot pass this case.
The Contains() cannot be returned correctly.

package main

import (
    "math/rand"
    "unsafe"

    "github.com/FastFilter/xorfilter"
    "github.com/cespare/xxhash/v2"
)

func encode(v1, v2 int32) []byte {
    v := make([]byte, 8)
    v = append(v, unsafe.Slice((*byte)(unsafe.Pointer(&v1)), 4)...)
    v = append(v, unsafe.Slice((*byte)(unsafe.Pointer(&v2)), 4)...)
    return v
}

func main() {
    hashes := make([]uint64, 0)
    for i := 0; i < 40000; i++ {
        v := encode(int32(rand.Intn(10)), int32(rand.Intn(100000)))
        hashes = append(hashes, xxhash.Sum64(v))
    }
    inner, err := xorfilter.PopulateBinaryFuse8(hashes)
    if err != nil {
        panic(err)
    }
    for i, d := range hashes {
        e := inner.Contains(d)
        if !e {
            panic(i)
        }
    }
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.