...to verify that bidict performance is comparable to manually managing two dicts and

Put up a general note about performance at <a href="https://bidict.readthedocs.org/en/

Thanks for the tip, <a class="user-mention notranslate" data-hovercard-type="user" dat

You can see an example of the results I'm getting e.g. <a href="https://api.travis-ci.

Thanks for taking a look <a class="user-mention notranslate" data-hovercard-type="user

You can check out the latest work on this here: <a href="https://github.com/jab/bi

add performance tests about bidict HOT 15 CLOSED

jab commented on June 7, 2024

add performance tests

from bidict.

Comments (15)

jab commented on June 7, 2024

I could imagine implementing these tests by using a profiler to count the total number of bytecode instructions required when using bidict to accomplish some basic tasks, and then compare this to using two dicts kept in sync manually (baseline). (Wall clock times could be measured too.)

The tests could then either pass or fail depending on whether bidict's results are within some threshold of the baseline results.

Running these tests could also generate a more detailed report that would allow seeing how much a change to the code improves or degrades performance on the tests.

Any suggestions on how to add something like this to our test suite, @tomviner?

from bidict.

jab commented on June 7, 2024

Would also be cool to see how performance with pypy compares to cpython.

from bidict.

jab commented on June 7, 2024

Put up a general note about performance at https://bidict.readthedocs.org/en/master/performance.html (in part to address some more or less well-founded concern from potential users -- see here and #1 + #1 (comment).

Once we have performance tests running as part of CI, I can link to our results from various of those places.

from bidict.

thedrow commented on June 7, 2024

You can just use pytest-benchmark for performance tests.

from bidict.

jab commented on June 7, 2024

Thanks for the tip, @thedrow. Look forward to checking that out.

from bidict.

jab commented on June 7, 2024

Took a first crack at this. Haven't done anything with pytest-benchmark before. Does 8e6686b look like a reasonable starting place? /cc @thedrow et al.

from bidict.

jab commented on June 7, 2024

You can see an example of the results I'm getting e.g. here. /cc @lordmauve in case this is of interest

from bidict.

thedrow commented on June 7, 2024

Yes. Looks correct.
You can now start profiling and see where you can improve your results.

from bidict.

jab commented on June 7, 2024

Thanks for taking a look @thedrow.

Before I profile, I'd first like to adapt the current benchmarks to demonstrate the performance not just of a 10-item bidict, but rather the asymptotic behavior as number of elements increases. I'd like to confirm that bidict's space and time complexity is on the same order of the space and time complexity of manually keeping two inverse dicts in sync.

Please let me know if you have any other suggestions, and thanks again!

from bidict.

jab commented on June 7, 2024

You can check out the latest work on this here:
https://github.com/jab/bidict/compare/benchmark?expand=1

Check out the "benchmark" branch on Travis for the latest benchmark results.

With these, I was able to make some micro-optimizations to bidict._update, and the benchmarks allow us to see that in the Travis builds, test_bidict_init went from 8.27x slower than test_invdict_init on average without the optimizations, to 6.52x slower on average with the optimizations.

The benchmarks are currently pretty crude, and I have doubts about how representative they are of real-world workloads (not to mention pytest-benchmark advises against running in a VM, which there's no way around on Travis). I'd love to work together with someone on this to check over and improve on this work. Anyone interested? @lordmauve, would love your take on this. Anyone interested, please ping me on Gitter and I'll look forward to getting these landed.

from bidict.

lordmauve commented on June 7, 2024

Function call overhead is pretty high in Python. Maintaining two dicts manually, incline in your own code, would be pretty hard to beat due to the elimination of function call overhead.

We're unlikely to get down to performance comparable to that without a C implementation.

However, that's not to say that big improvements can't be made. The overhead of bidict for reads could be minimal. Inlining _put would help for writes.

from bidict.

jab commented on June 7, 2024

Thanks for taking a look, @lordmauve. I think the read performance is already good enough. As for the write performance, inlining _put turned out not to make much of a difference in my testing, so I'm going to leave the function call in place. I don't have experience speeding up Python with C, do you? I'd definitely be interested to accept a (pypy-compatible) PR that accomplishes this.

While I was looking at this, I decided bidict's update method (and other bulk insert operations, including __init__) should be atomic, which they previously weren't. i.e. If you tried to add several items, where one of them would result in a ValueExistsException, it was previously likely that many of them would be added until the exception-causing one was reached. This resulted in bidict's init benchmark slowing down from about 8x that of the inverse dict's to about 13x with CPython, and about 7x with PyPy. I think the safety gains probably make this worth it, but would love to hear feedback.

We could also consider adding a way to perform a bulk update that ignores new mappings that would clobber existing keys or values, rather than either allowing them to succeed or causing the entire update to fail. Does anyone think that would be worth it?

Also, after doing the above, it made sense to change put to accept keyword arguments allowing you to customize the key- and value-overwriting behavior, with the defaults matching the current behavior (i.e. don't allow either kind of overwrite). This makes put more flexible without breaking backwards compatibility, so it seemed worthwhile. (Now you can even say e.g. put(key, val, overwrite_key=True, overwrite_val=False) where previously there was no way to do so.) And with a more flexible put operation, adding a complementary putall was a quick and easy win, bringing the bulk API in line with the single-item API.

Could you please take a look at #33 and let me know what you think? I also left some questions in the benchmark tests I'd love some comments on. Thanks in advance, and hope you find these latest changes helpful.

from bidict.

jab commented on June 7, 2024

I just switched from the "overwrite" flags to the "collision behavior" abstraction, allowing for a third option besides "overwrite" or "raise": "ignore". PTAL! a039c42

from bidict.

jab commented on June 7, 2024

As of f6227e9, got the test_bidict_init benchmark down to only ~3.5x slower on average than the test_invdict_init benchmark -- down from ~13x slower before. This is the fastest it's ever been, even compared to before the new atomicity guarantees were added. ✨ 🚀 ✨

from bidict.

jab commented on June 7, 2024

Fixed in 0.12.0.

from bidict.

add performance tests about bidict HOT 15 CLOSED

Comments (15)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent