Comments (4)
I'd like to see actual benchmarks showing how this is faster/smaller compared to an OOBTree
(in fact I'd like to see actual benchmarks comparing OOBTree to an LLBTree to see what gains are possible)
Here's some quick back-of-the-envelope type numbers from CPython 3.7, testing inserting the integer keys from 0 to 4,194,304 and the values the negatives of those using pyperf:
$ python -m pyperf timeit \
-s 'from BTrees import family64; bt = family64.OO.BTree()' \
'for i in range(2**22): bt[i] = -i'
(Yes, if you pass a dict
directly to the BTree constructor the iteration happens in C, but that turns out not to make much difference.)
Comparing the times for the standard dict
, the LLBTree, and the OOBTree, the OOBTree does not come out looking good.
Benchmark | dict | LLBTree | OOBTree |
---|---|---|---|
timeit | 542 ms | 699 ms: 1.29x slower (+29%) | 2.96 sec: 5.46x slower (+446%) |
The picture is different if you look at memory allocations (--tracemalloc
):
Benchmark | dict | LLBTree | OOBTree |
---|---|---|---|
timeit | 392.5 MB | 12.3 MB: 31.91x smaller (-97%) | 263.8 MB: 1.49x smaller (-33%) |
Testing i in bt
where bt
is empty is approximately equal timing for all three maps. If bt
is pre-populated with the full range and then tested for each key, we get this:
Benchmark | dict | LLBTree | OOBTree |
---|---|---|---|
timeit | 397 ms | 710 ms: 1.79x slower (+79%) | 4.22 sec: 10.63x slower (+963%) |
Timing on CPython 2.7 is similar (I only ran the insert test):
Benchmark | dict | LLBTree | OOBTree |
---|---|---|---|
timeit | 484 ms | 697 ms: 1.44x slower (+44%) | 2.79 sec: 5.75x slower (+475%) |
And here's PyPy7.1 (2.7) doing its best to JIT the pure-Python BTree code:
Benchmark | dict | LLBTree | OOBTree |
---|---|---|---|
timeit | 679 ms | 4.45 sec: 6.56x slower (+556%) | 5.22 sec: 7.68x slower (+668%) |
from btrees.
In a more general way, it seems it is hard to use large integers either as keys or values when common C types are too small to contains those integers. This is too bad because integer/long BTrees flavors are announced to have better performances.
>>> import sys, BTrees.LLBTree
>>> BTrees.LLBTree.LLBTree({sys.maxsize+1:0})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: long integer out of range
>>> BTrees.LLBTree.LLBTree({0: sys.maxsize+1})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: long integer out of range
Could representing byte arrays might be a solution here?
from btrees.
I do not feel qualified to reply to this, but since nobody else is replying, here are some thoughts:
- a 128-bit btree version would be a possibility, provided there's a native 128-bit integer type supported by your C compiler, but what are you going to call it? L is already taken.
- do 32-bit platforms support 128-bit integers? if not, are we ready to drop 32-bit support (or make it optional)?
- somebody (i.e. you, since you want this) would have to come up with a patch adding the new btree version
- I'd like to see actual benchmarks showing how this is faster/smaller compared to an OOBTree
- (in fact I'd like to see actual benchmarks comparing OOBTree to an LLBTree to see what gains are possible)
from btrees.
If necessary, 128-bit support could be implemented using two 64-bit values (assuming we're careful!), and we'd get all the memory advantages and at least most of the performance advantages, and could make it work on any platform with 64-bit C types. An available 128-bit type might offer better performance, but I'd worry about having two flavors that need to be tested, especially with not all developers having easy access to both hardware platforms. I think the variations in support are too great to support single 128-bit types directly.
It's unclear whether it makes more sense to use 16-byte bytes
objects or int
objects (long
in Python 2.7) as the Python-facing representation, and there's the question of whether signed or unsigned interpretation makes sense (if using integers instead of bytes
).
Coming up with a good name would certainly be a good start.
from btrees.
Related Issues (20)
- Release 4.7.3 HOT 2
- Pylance has problem resolving imports HOT 5
- Add support to release aarch64 wheels HOT 2
- Python 2: OxBTrees allow types as keys; Python 3 does not
- Can we move to the src/ layout? HOT 3
- Support PURE_PYTHON=0 to require C extensions
- BTree.get() swallows POSKeyError on internal corruption (C only) HOT 2
- Python/C Inconsistency: Detecting classes that just implement `__eq__`
- Python/C Inconsistency: Getting/Setting max_internal_size on the BTree class
- Regression in 4.9: Subclasses can't use @adapter
- fsBTree and fsTreeSet broken in 4.9.0/4.9.1
- fsBTree.difference fails when the second argument is a set HOT 2
- Zope5.2.1 install warnings HOT 3
- Convert to meta/config HOT 2
- Consider using cibuldwheel for building binary wheels. HOT 1
- 4.10.0: sphinx warnings `reference target not found` HOT 3
- 4.10.0: pytest is failing in some units HOT 6
- btrees not installing on m1 computer HOT 2
- Get a random element from a BTree HOT 2
- 'IFBucket' object has no attribute 'byValue' when running with PURE_PYTHON HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from btrees.