An efficient and light-weight ordered set of 32 bits integers. This is a Python wrapper for the C library CRoaring.
The wrapping used to be done with Ctypes
. We recently switched to
Cython
for the following reasons:
- Much better performances for short function calls (e.g. addition/deletion of a single element).
- Easier installation, no need to install manually the C library, it is now distributed with PyRoaring.
- Extensibility, it will be easier to write efficient codes with Cython (e.g. some data structures based on roaring bitmaps).
If for some reason you wish to keep using the old version, based on
Ctypes
, use PyRoaring
0.0.7.
- Environment like Linux and MacOS
- Python 2.7, or Python 3.4 or better
- A recent C compiler like GCC
- The package manager
pip
- The Python package
hypothesis
(optional, for testing) - The Python package
Cython
(optional, for compiling pyroaring from the sources) - The Python package
wheel
(optional, to build a wheel for the library)
To install pyroaring on your local account, use the following command:
pip install pyroaring --user # installs PyRoaringBitMap
For a system-wide installation, use the following command:
pip install pyroaring
Naturally, the latter may require superuser rights (consider prefixing
the commands by sudo
).
If you want to use Python 3 and your system defaults on Python 2.7, you
may need to adjust the above commands, e.g., replace pip
by pip3
.
If you want to compile (and install) pyroaring by yourself, for instance
to modify the Cython sources or because you do not have pip
, follow
these steps.
Note that the Python package Cython
is required. You may be able install
it as:
pip install --upgrade setuptools -user
pip install cython --user
Clone this repository.
git clone https://github.com/Ezibenroc/PyRoaringBitMap.git
cd PyRoaringBitMap
git submodule init && git submodule update
Build pyroaring locally, e.g. to test a new feature you made.
python setup.py build_ext -i
On macOS this may fail with errors because setuptools adds -arch x86_64 -arch i386
to the compiler command, which may conflict with the -march=native
flag. You can overwrite this behavior by setting the ARCHFLAGS flag:
ARCHFLAGS="" python setup.py build_ext -i
Then you can test the new code:
pip install hypothesis --user
python test.py # run the tests, optional but recommended
Install pyroaring (use this if you do not have pip
).
python setup.py install # may require superuser rights, add option --user if you wish to install it on your local account
Package pyroaring.
python setup.py sdist
pip install dist/pyroaring-0.1.?.tar.gz # optionnal, to install the package
Build a wheel.
python setup.py bdist_wheel
For all the above commands, two environment variables can be used to control the compilation.
DEBUG=1
to build pyroaring in debug mode.ARCHI=<cpu-type>
to build pyroaring for the given platform. The platform may be any keyword given to the-march
option of gcc (see the documentation). Note that cross-compiling for a 32-bit architecture from a 64-bit architecture is not supported.
Example of use:
DEBUG=1 ARCHI=x86-64 python setup.py build_ext
First, you can run the tests to make sure everything is ok:
pip install hypothesis --user
python test.py
Note: you can define the environment variable HYPOTHESIS_PROFILE
to debug
to print every values,
or ci
to perform more tests (warning: very long). For instance:
HYPOTHESIS_PROFILE=debug ./test.py
You can use a bitmap nearly as the classical Python set in your code:
from pyroaring import BitMap
bm1 = BitMap()
bm1.add(3)
bm1.add(18)
bm2 = BitMap([3, 27, 42])
print("bm1 = %s" % bm1)
print("bm2 = %s" % bm2)
print("bm1 & bm2 = %s" % (bm1&bm2))
print("bm1 | bm2 = %s" % (bm1|bm2))
Output:
bm1 = BitMap([3, 18]) bm2 = BitMap([3, 27, 42]) bm1 & bm2 = BitMap([3]) bm1 | bm2 = BitMap([3, 18, 27, 42])
Pyroaring
is compared with the built-in set
and other implementations:
- A Python wrapper of CRoaring called
python-croaring
- A Cython implementation of Roaring bitmaps called
roaringbitmap
- A Python implemenntation of ordered sets called
sortedcontainers
The script quick_bench.py
measures the time of different set
operations. It uses randomly generated sets of size 1e6 and density
0.125. For each operation, the average time (in seconds) of 30 tests
is reported.
The results have been obtained with:
- CPU Intel Xeon CPU E5-2630 v3
- CPython version 3.5.3
- gcc version 6.3.0
- Cython version 0.28.3
- pyroaring commit dcf448a
- python-croaring commit 3aa61dd
- roaringbitmap commit 502d78d
- sortedcontainers commit 7d6a28c
operation | pyroaring | python-croaring | roaringbitmap | set | sortedcontainers |
---|---|---|---|---|---|
range constructor | 3.09e-04 | 1.48e-04 | 8.72e-05 | 7.29e-02 | 2.08e-01 |
ordered list constructor | 3.45e-02 | 6.93e-02 | 1.45e-01 | 1.86e-01 | 5.74e-01 |
list constructor | 1.23e-01 | 1.33e-01 | 1.55e-01 | 1.12e-01 | 5.12e-01 |
ordered array constructor | 5.06e-03 | 6.42e-03 | 2.89e-01 | 9.82e-02 | 3.01e-01 |
array constructor | 1.13e-01 | 1.18e-01 | 4.63e-01 | 1.45e-01 | 5.08e-01 |
element addition | 3.08e-07 | 8.26e-07 | 2.21e-07 | 1.50e-07 | 1.18e-06 |
element removal | 3.44e-07 | 8.17e-07 | 2.61e-07 | 1.78e-07 | 4.26e-07 |
membership test | 1.24e-07 | 1.00e-06 | 1.50e-07 | 1.00e-07 | 5.72e-07 |
union | 1.61e-04 | 1.96e-04 | 1.44e-04 | 2.15e-01 | 1.11e+00 |
intersection | 9.08e-04 | 9.48e-04 | 9.26e-04 | 5.22e-02 | 1.65e-01 |
difference | 1.57e-04 | 1.97e-04 | 1.43e-04 | 1.56e-01 | 4.84e-01 |
symmetric diference | 1.62e-04 | 2.01e-04 | 1.44e-04 | 2.62e-01 | 9.13e-01 |
equality test | 7.80e-05 | 7.82e-05 | 5.89e-05 | 1.81e-02 | 1.81e-02 |
subset test | 7.92e-05 | 8.12e-05 | 8.22e-05 | 1.81e-02 | 1.81e-02 |
conversion to list | 4.71e-02 | 2.78e-01 | 4.35e-02 | 5.77e-02 | 5.32e-02 |
pickle dump & load | 4.02e-04 | 6.27e-04 | 5.08e-04 | 2.41e-01 | 5.75e-01 |
"naive" conversion to array | 5.12e-02 | 2.92e-01 | 4.75e-02 | 1.20e-01 | 1.18e-01 |
"optimized" conversion to array | 1.27e-03 | 3.40e-02 | nan | nan | nan |
selection | 1.77e-06 | 5.33e-05 | 1.14e-06 | nan | 1.64e-05 |
contiguous slice | 9.38e-05 | 9.51e-05 | 6.99e-05 | nan | 2.04e-02 |
slice | 2.88e-03 | 3.04e-01 | 1.00e-01 | nan | 4.74e-01 |
small slice | 8.93e-05 | 3.00e-01 | 3.60e-03 | nan | 1.79e-02 |