Historically my company has used uuid4 to identify it

128-bits long support about btrees HOT 4 OPEN

zopefoundation commented on July 18, 2024 1

128-bits long support

from btrees.

Comments (4)

jamadden commented on July 18, 2024 2

I'd like to see actual benchmarks showing how this is faster/smaller compared to an OOBTree
(in fact I'd like to see actual benchmarks comparing OOBTree to an LLBTree to see what gains are possible)

Here's some quick back-of-the-envelope type numbers from CPython 3.7, testing inserting the integer keys from 0 to 4,194,304 and the values the negatives of those using pyperf:

$ python -m pyperf timeit  \
     -s 'from BTrees import family64; bt = family64.OO.BTree()' \
     'for i in range(2**22): bt[i] = -i'

(Yes, if you pass a dict directly to the BTree constructor the iteration happens in C, but that turns out not to make much difference.)

Comparing the times for the standard dict, the LLBTree, and the OOBTree, the OOBTree does not come out looking good.

Benchmark	dict	LLBTree	OOBTree
timeit	542 ms	699 ms: 1.29x slower (+29%)	2.96 sec: 5.46x slower (+446%)

The picture is different if you look at memory allocations (--tracemalloc):

Benchmark	dict	LLBTree	OOBTree
timeit	392.5 MB	12.3 MB: 31.91x smaller (-97%)	263.8 MB: 1.49x smaller (-33%)

Testing i in bt where bt is empty is approximately equal timing for all three maps. If bt is pre-populated with the full range and then tested for each key, we get this:

Benchmark	dict	LLBTree	OOBTree
timeit	397 ms	710 ms: 1.79x slower (+79%)	4.22 sec: 10.63x slower (+963%)

Timing on CPython 2.7 is similar (I only ran the insert test):

Benchmark	dict	LLBTree	OOBTree
timeit	484 ms	697 ms: 1.44x slower (+44%)	2.79 sec: 5.75x slower (+475%)

And here's PyPy7.1 (2.7) doing its best to JIT the pure-Python BTree code:

Benchmark	dict	LLBTree	OOBTree
timeit	679 ms	4.45 sec: 6.56x slower (+556%)	5.22 sec: 7.68x slower (+668%)

from btrees.

azmeuk commented on July 18, 2024

In a more general way, it seems it is hard to use large integers either as keys or values when common C types are too small to contains those integers. This is too bad because integer/long BTrees flavors are announced to have better performances.

>>> import sys, BTrees.LLBTree

>>> BTrees.LLBTree.LLBTree({sys.maxsize+1:0})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: long integer out of range

>>> BTrees.LLBTree.LLBTree({0: sys.maxsize+1})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: long integer out of range

Could representing byte arrays might be a solution here?

from btrees.

mgedmin commented on July 18, 2024

I do not feel qualified to reply to this, but since nobody else is replying, here are some thoughts:

a 128-bit btree version would be a possibility, provided there's a native 128-bit integer type supported by your C compiler, but what are you going to call it? L is already taken.
do 32-bit platforms support 128-bit integers? if not, are we ready to drop 32-bit support (or make it optional)?
somebody (i.e. you, since you want this) would have to come up with a patch adding the new btree version
I'd like to see actual benchmarks showing how this is faster/smaller compared to an OOBTree
(in fact I'd like to see actual benchmarks comparing OOBTree to an LLBTree to see what gains are possible)

from btrees.

freddrake commented on July 18, 2024

If necessary, 128-bit support could be implemented using two 64-bit values (assuming we're careful!), and we'd get all the memory advantages and at least most of the performance advantages, and could make it work on any platform with 64-bit C types. An available 128-bit type might offer better performance, but I'd worry about having two flavors that need to be tested, especially with not all developers having easy access to both hardware platforms. I think the variations in support are too great to support single 128-bit types directly.

It's unclear whether it makes more sense to use 16-byte bytes objects or int objects (long in Python 2.7) as the Python-facing representation, and there's the question of whether signed or unsigned interpretation makes sense (if using integers instead of bytes).

Coming up with a good name would certainly be a good start.

from btrees.

128-bits long support about btrees HOT 4 OPEN

Comments (4)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent