Giter VIP home page Giter VIP logo

roaring's Introduction

roaring Build Status Coverage Status GoDoc Go Report Card

This is a go port of the Roaring bitmap data structure.

Roaring is used by Apache Spark, Apache Kylin, Netflix Atlas, LinkedIn Pinot, Druid.io, Whoosh, Pilosa, and Apache Lucene (as well as supporting systems such as Solr and Elastic).

Roaring bitmaps are found to work well in many important applications:

Use Roaring for bitmap compression whenever possible. Do not use other bitmap compression methods (Wang et al., SIGMOD 2017)

The roaring Go library is used by

  • Cloud Torrent: a self-hosted remote torrent client
  • runv: an Hypervisor-based runtime for the Open Containers Initiative

There are also Java and C/C++ versions. The Java, C, C++ and Go version are binary compatible: e.g, you can save bitmaps from a Java program and load them back in Go, and vice versa. We have a format specification.

This code is licensed under Apache License, Version 2.0 (ASL2.0).

Copyright 2016 by the authors.

References

Dependencies

Dependencies are fetched automatically by giving the -t flag to go get.

they include

  • github.com/smartystreets/goconvey/convey
  • github.com/willf/bitset
  • github.com/mschoch/smat

Note that the smat library requires Go 1.6 or better.

Installation

  • go get -t github.com/RoaringBitmap/roaring

Example

Here is a simplified but complete example:

package main

import (
    "fmt"
    "github.com/RoaringBitmap/roaring"
    "bytes"
)


func main() {
    // example inspired by https://github.com/fzandona/goroar
    fmt.Println("==roaring==")
    rb1 := roaring.BitmapOf(1, 2, 3, 4, 5, 100, 1000)
    fmt.Println(rb1.String())

    rb2 := roaring.BitmapOf(3, 4, 1000)
    fmt.Println(rb2.String())

    rb3 := roaring.NewBitmap()
    fmt.Println(rb3.String())

    fmt.Println("Cardinality: ", rb1.GetCardinality())

    fmt.Println("Contains 3? ", rb1.Contains(3))

    rb1.And(rb2)

    rb3.Add(1)
    rb3.Add(5)

    rb3.Or(rb1)

    // prints 1, 3, 4, 5, 1000
    i := rb3.Iterator()
    for i.HasNext() {
        fmt.Println(i.Next())
    }
    fmt.Println()

    // next we include an example of serialization
    buf := new(bytes.Buffer)
    rb1.WriteTo(buf) // we omit error handling
    newrb:= roaring.NewBitmap()
    newrb.ReadFrom(buf)
    if rb1.Equals(newrb) {
    	fmt.Println("I wrote the content to a byte stream and read it back.")
    }
}

If you wish to use serialization and handle errors, you might want to consider the following sample of code:

	rb := BitmapOf(1, 2, 3, 4, 5, 100, 1000)
	buf := new(bytes.Buffer)
	size,err:=rb.WriteTo(buf)
	if err != nil {
		t.Errorf("Failed writing")
	}
	newrb:= NewBitmap()
	size,err=newrb.ReadFrom(buf)
	if err != nil {
		t.Errorf("Failed reading")
	}
	if ! rb.Equals(newrb) {
		t.Errorf("Cannot retrieve serialized version")
	}

Given N integers in [0,x), then the serialized size in bytes of a Roaring bitmap should never exceed this bound:

8 + 9 * ((long)x+65535)/65536 + 2 * N

That is, given a fixed overhead for the universe size (x), Roaring bitmaps never use more than 2 bytes per integer. You can call BoundSerializedSizeInBytes for a more precise estimate.

Documentation

Current documentation is available at http://godoc.org/github.com/RoaringBitmap/roaring

Goroutine safety

In general, it should not generally be considered safe to access the same bitmaps using different goroutines--they are left unsynchronized for performance. Should you want to access a Bitmap from more than one goroutine, you should provide synchronization. Typically this is done by using channels to pass the *Bitmap around (in Go style; so there is only ever one owner), or by using sync.Mutex to serialize operations on Bitmaps.

Coverage

We test our software. For a report on our test coverage, see

https://coveralls.io/github/RoaringBitmap/roaring?branch=master

Benchmark

Type

     go test -bench Benchmark -run -

Iterative use

You can use roaring with gore:

  • go get -u github.com/motemen/gore
  • Make sure that $GOPATH/bin is in your $PATH.
  • go get github/RoaringBitmap/roaring
$ gore
gore version 0.2.6  :help for help
gore> :import github.com/RoaringBitmap/roaring
gore> x:=roaring.New()
gore> x.Add(1)
gore> x.String()
"{1}"

Fuzzy testing

You can help us test further the library with fuzzy testing:

     go get github.com/dvyukov/go-fuzz/go-fuzz
     go get github.com/dvyukov/go-fuzz/go-fuzz-build
     go test -tags=gofuzz -run=TestGenerateSmatCorpus
     go-fuzz-build github.com/RoaringBitmap/roaring
     go-fuzz -bin=./roaring-fuzz.zip -workdir=workdir/ -timeout=200

Let it run, and if the # of crashers is > 0, check out the reports in the workdir where you should be able to find the panic goroutine stack traces.

Alternative in Go

There is a Go version wrapping the C/C++ implementation https://github.com/RoaringBitmap/gocroaring

For an alternative implementation in Go, see https://github.com/fzandona/goroar The two versions were written independently.

Mailing list/discussion group

https://groups.google.com/forum/#!forum/roaring-bitmaps

roaring's People

Contributors

0x4139 avatar boutros avatar bpot avatar brentp avatar coornail avatar cornelk avatar glycerine avatar lemire avatar statik avatar steveyen avatar tgruben avatar willglynn avatar

Watchers

 avatar  avatar  avatar

Forkers

bitmapdevs

roaring's Issues

the new rangeOfOnes implementation reveals some bugs

when roaringarray.go's func rangeOfOnes(start, last int) container is implemented purely with bitmaps, arrays, or rle16 containers, I see test suite failure. Here are the 3 options:

For context: this

https://github.com/RoaringBitmap/roaring/blob/master/roaringarray.go#L42

is being replaced by:

 func rangeOfOnes(start, last int) container {
       return newBitmapContainerwithRange(start, last) // option 1
       //return newArrayContainerRange(start, last) // option 2
       //return newRunContainer16Range(uint64(start), uint64(last)) // option3
 }

and here are the suite fails, which are different in each case:

bitmaps for runs, test failures:

func TestDoubleAndNotBug01(t *testing.T)  fails, line 1681

Failures:

  * /Users/jaten/go/src/github.com/glycerine/roaring/roaring_test.go 
  Line 1681:
  Expected: true
  Actual:   false

----------

arrays for runs: 2 distinct fails:

Failures:

  * /Users/jaten/go/src/github.com/glycerine/roaring/roaring_test.go 
  Line 1616:
  Expected: 'true'
  Actual:   'false'
  (Should be equal)


--- FAIL: TestDoubleAdd2 (0.01s)

Failures:

  * /Users/jaten/go/src/github.com/glycerine/roaring/roaring_test.go 
  Line 1629:
  Expected: 'true'
  Actual:   'false'
  (Should be equal)



--- FAIL: TestDoubleAdd3 (0.01s)

----------
with rle16:


Failures:

  * /Users/jaten/go/src/github.com/glycerine/roaring/roaring_test.go 
  Line 787:
  Expected: '96617'
  Actual:   '9'
  (Should be equal)



Failures:

  * /Users/jaten/go/src/github.com/glycerine/roaring/roaring_test.go 
  Line 865:
  Expected: '35392'
  Actual:   '1928'
  (Should be equal)

  * /Users/jaten/go/src/github.com/glycerine/roaring/roaring_test.go 
  Line 886:
  Expected: '131999'
  Actual:   '131351'
  (Should be equal)

Failures:

  * /Users/jaten/go/src/github.com/glycerine/roaring/roaring_test.go 
  Line 933:
  Expected: 'true'
  Actual:   'false'
  (Should be equal)

Failures:

  * /Users/jaten/go/src/github.com/glycerine/roaring/roaring_test.go 
  Line 1589:
  Expected: 'true'
  Actual:   'false'
  (Should be equal)


--- FAIL: TestFlipBigA (0.12s)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.