fastfilter / xorfilter Goto Github PK
View Code? Open in Web Editor NEWGo library implementing binary fuse and xor filters
License: Apache License 2.0
Go library implementing binary fuse and xor filters
License: Apache License 2.0
I would like to refactor the xorfilter.go to improve code quality & readability by doing the following:
Separate the struct definitions & hashing functions to other files so that it can be made more modular.
Reduce the number of lines by combining the same procedures to its own function so that it can be reused easily.
Probably add more comments, tests, and benchmarks along the way.
Is there anything else that can be added?
Currently the user is responsible to ensure that there are no duplicated keys. We should handle this for the user (it is relatively easy to do efficiently, without even sorting). Try to construct the filter once (optimistically). The duplicated keys will be in the unmatched keys, just visit these keys.
$ go version
go version go1.10.4 linux/amd64
$ go build
# github.com/FastFilter/xorfilter
/home/prataprc/myworld/devgo/src/github.com/FastFilter/xorfilter/xorfilter.go:54:9: invalid operation: n << (c & 63) (shift count type int, must be unsigned integer)
/home/prataprc/myworld/devgo/src/github.com/FastFilter/xorfilter/xorfilter.go:54:27: invalid operation: n >> (-c & 63) (shift count type int, must be unsigned integer)
Note: Build is okay with latest golang version.
Great algorithm you got there. The API requires a slice to be passed in to populate. If I'm interested in adding entities on the fly without having to load/allocate my entire dataset, would this be possible?
Does it make sense to intersect two xorfilters? That is, is the bitwise intersection of two xorfilters an xorfilter with sensible false positive rate and no false negatives?
With .Net Core available for free everywhere, would you like a C# implementation?
Would you be able to cut release tags so that tooling like dependabot or renovate will not propose an upgrade on every commit?
You could use something like the really nifty release-please
to automate that process along with a nice changelog.
Hi,
First, Thank you for your paper and implementation/s!
I'm wondering if it's possible to re-write this that would accept a more idiomatic 'Go' way? say, accepting Source64 and Hash64 instead of using SplitMix64 and Murmur (with that being said, I'm wondering if the random and hash functions can also be changed, say using Mersenne or XoRoShiRo generators and/or T1ha or Metro for the hash). I'm not sure if the literature mentions anything specific random generators but I know you specifically chose Murmur for the hash (in which case, Hash64 interface isn't necessary).
My understanding is that after #22 duplicate keys should be supported in binary fuse filters. I am working with uint64 keys that are CRC64 checksums (so quite large numbers) and am regularly finding sets that result in:
Unexpected error: too many iterations, you probably have duplicate keys
I've managed to reduce one such set down to just 121 keys, I will send a PR to add a test for this. Is it possible to fix this, or are there limitations to the duplicate key checking in some way?
PopulateBinaryFuse8([]uint64{})
currently panics. It should return an error or preferably just construct a trivial tiny filter. Adding if size == 0 { size = 1 }
is sufficient to prevent panicking.
It would be great to have a way to change/tune the false negative rate.
I noticed your paper mentioned xor+ but didn't see any mention or implementation of compression. Just curious if you intend to provide that here as well?
xorfilter cannot pass this case.
The Contains() cannot be returned correctly.
package main
import (
"math/rand"
"unsafe"
"github.com/FastFilter/xorfilter"
"github.com/cespare/xxhash/v2"
)
func encode(v1, v2 int32) []byte {
v := make([]byte, 8)
v = append(v, unsafe.Slice((*byte)(unsafe.Pointer(&v1)), 4)...)
v = append(v, unsafe.Slice((*byte)(unsafe.Pointer(&v2)), 4)...)
return v
}
func main() {
hashes := make([]uint64, 0)
for i := 0; i < 40000; i++ {
v := encode(int32(rand.Intn(10)), int32(rand.Intn(100000)))
hashes = append(hashes, xxhash.Sum64(v))
}
inner, err := xorfilter.PopulateBinaryFuse8(hashes)
if err != nil {
panic(err)
}
for i, d := range hashes {
e := inner.Contains(d)
if !e {
panic(i)
}
}
}
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.