Giter VIP home page Giter VIP logo

Comments (4)

jamadden avatar jamadden commented on July 18, 2024 2

I'd like to see actual benchmarks showing how this is faster/smaller compared to an OOBTree
(in fact I'd like to see actual benchmarks comparing OOBTree to an LLBTree to see what gains are possible)

Here's some quick back-of-the-envelope type numbers from CPython 3.7, testing inserting the integer keys from 0 to 4,194,304 and the values the negatives of those using pyperf:

$ python -m pyperf timeit  \
     -s 'from BTrees import family64; bt = family64.OO.BTree()' \
     'for i in range(2**22): bt[i] = -i' 

(Yes, if you pass a dict directly to the BTree constructor the iteration happens in C, but that turns out not to make much difference.)

Comparing the times for the standard dict, the LLBTree, and the OOBTree, the OOBTree does not come out looking good.

Benchmark dict LLBTree OOBTree
timeit 542 ms 699 ms: 1.29x slower (+29%) 2.96 sec: 5.46x slower (+446%)

The picture is different if you look at memory allocations (--tracemalloc):

Benchmark dict LLBTree OOBTree
timeit 392.5 MB 12.3 MB: 31.91x smaller (-97%) 263.8 MB: 1.49x smaller (-33%)

Testing i in bt where bt is empty is approximately equal timing for all three maps. If bt is pre-populated with the full range and then tested for each key, we get this:

Benchmark dict LLBTree OOBTree
timeit 397 ms 710 ms: 1.79x slower (+79%) 4.22 sec: 10.63x slower (+963%)

Timing on CPython 2.7 is similar (I only ran the insert test):

Benchmark dict LLBTree OOBTree
timeit 484 ms 697 ms: 1.44x slower (+44%) 2.79 sec: 5.75x slower (+475%)

And here's PyPy7.1 (2.7) doing its best to JIT the pure-Python BTree code:

Benchmark dict LLBTree OOBTree
timeit 679 ms 4.45 sec: 6.56x slower (+556%) 5.22 sec: 7.68x slower (+668%)

from btrees.

azmeuk avatar azmeuk commented on July 18, 2024

In a more general way, it seems it is hard to use large integers either as keys or values when common C types are too small to contains those integers. This is too bad because integer/long BTrees flavors are announced to have better performances.

>>> import sys, BTrees.LLBTree

>>> BTrees.LLBTree.LLBTree({sys.maxsize+1:0})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: long integer out of range

>>> BTrees.LLBTree.LLBTree({0: sys.maxsize+1})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: long integer out of range

Could representing byte arrays might be a solution here?

from btrees.

mgedmin avatar mgedmin commented on July 18, 2024

I do not feel qualified to reply to this, but since nobody else is replying, here are some thoughts:

  • a 128-bit btree version would be a possibility, provided there's a native 128-bit integer type supported by your C compiler, but what are you going to call it? L is already taken.
  • do 32-bit platforms support 128-bit integers? if not, are we ready to drop 32-bit support (or make it optional)?
  • somebody (i.e. you, since you want this) would have to come up with a patch adding the new btree version
  • I'd like to see actual benchmarks showing how this is faster/smaller compared to an OOBTree
  • (in fact I'd like to see actual benchmarks comparing OOBTree to an LLBTree to see what gains are possible)

from btrees.

freddrake avatar freddrake commented on July 18, 2024

If necessary, 128-bit support could be implemented using two 64-bit values (assuming we're careful!), and we'd get all the memory advantages and at least most of the performance advantages, and could make it work on any platform with 64-bit C types. An available 128-bit type might offer better performance, but I'd worry about having two flavors that need to be tested, especially with not all developers having easy access to both hardware platforms. I think the variations in support are too great to support single 128-bit types directly.

It's unclear whether it makes more sense to use 16-byte bytes objects or int objects (long in Python 2.7) as the Python-facing representation, and there's the question of whether signed or unsigned interpretation makes sense (if using integers instead of bytes).

Coming up with a good name would certainly be a good start.

from btrees.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.