Giter VIP home page Giter VIP logo

Comments (15)

jab avatar jab commented on June 7, 2024

I could imagine implementing these tests by using a profiler to count the total number of bytecode instructions required when using bidict to accomplish some basic tasks, and then compare this to using two dicts kept in sync manually (baseline). (Wall clock times could be measured too.)

The tests could then either pass or fail depending on whether bidict's results are within some threshold of the baseline results.

Running these tests could also generate a more detailed report that would allow seeing how much a change to the code improves or degrades performance on the tests.

Any suggestions on how to add something like this to our test suite, @tomviner?

from bidict.

jab avatar jab commented on June 7, 2024

Would also be cool to see how performance with pypy compares to cpython.

from bidict.

jab avatar jab commented on June 7, 2024

Put up a general note about performance at https://bidict.readthedocs.org/en/master/performance.html (in part to address some more or less well-founded concern from potential users -- see here and #1 + #1 (comment).

Once we have performance tests running as part of CI, I can link to our results from various of those places.

from bidict.

thedrow avatar thedrow commented on June 7, 2024

You can just use pytest-benchmark for performance tests.

from bidict.

jab avatar jab commented on June 7, 2024

Thanks for the tip, @thedrow. Look forward to checking that out.

from bidict.

jab avatar jab commented on June 7, 2024

Took a first crack at this. Haven't done anything with pytest-benchmark before. Does 8e6686b look like a reasonable starting place? /cc @thedrow et al.

from bidict.

jab avatar jab commented on June 7, 2024

You can see an example of the results I'm getting e.g. here. /cc @lordmauve in case this is of interest

from bidict.

thedrow avatar thedrow commented on June 7, 2024

Yes. Looks correct.
You can now start profiling and see where you can improve your results.

from bidict.

jab avatar jab commented on June 7, 2024

Thanks for taking a look @thedrow.

Before I profile, I'd first like to adapt the current benchmarks to demonstrate the performance not just of a 10-item bidict, but rather the asymptotic behavior as number of elements increases. I'd like to confirm that bidict's space and time complexity is on the same order of the space and time complexity of manually keeping two inverse dicts in sync.

Please let me know if you have any other suggestions, and thanks again!

from bidict.

jab avatar jab commented on June 7, 2024

You can check out the latest work on this here:
https://github.com/jab/bidict/compare/benchmark?expand=1

Check out the "benchmark" branch on Travis for the latest benchmark results.

With these, I was able to make some micro-optimizations to bidict._update, and the benchmarks allow us to see that in the Travis builds, test_bidict_init went from 8.27x slower than test_invdict_init on average without the optimizations, to 6.52x slower on average with the optimizations.

The benchmarks are currently pretty crude, and I have doubts about how representative they are of real-world workloads (not to mention pytest-benchmark advises against running in a VM, which there's no way around on Travis). I'd love to work together with someone on this to check over and improve on this work. Anyone interested? @lordmauve, would love your take on this. Anyone interested, please ping me on Gitter and I'll look forward to getting these landed.

from bidict.

lordmauve avatar lordmauve commented on June 7, 2024

Function call overhead is pretty high in Python. Maintaining two dicts manually, incline in your own code, would be pretty hard to beat due to the elimination of function call overhead.

We're unlikely to get down to performance comparable to that without a C implementation.

However, that's not to say that big improvements can't be made. The overhead of bidict for reads could be minimal. Inlining _put would help for writes.

from bidict.

jab avatar jab commented on June 7, 2024

Thanks for taking a look, @lordmauve. I think the read performance is already good enough. As for the write performance, inlining _put turned out not to make much of a difference in my testing, so I'm going to leave the function call in place. I don't have experience speeding up Python with C, do you? I'd definitely be interested to accept a (pypy-compatible) PR that accomplishes this.

While I was looking at this, I decided bidict's update method (and other bulk insert operations, including __init__) should be atomic, which they previously weren't. i.e. If you tried to add several items, where one of them would result in a ValueExistsException, it was previously likely that many of them would be added until the exception-causing one was reached. This resulted in bidict's init benchmark slowing down from about 8x that of the inverse dict's to about 13x with CPython, and about 7x with PyPy. I think the safety gains probably make this worth it, but would love to hear feedback.

We could also consider adding a way to perform a bulk update that ignores new mappings that would clobber existing keys or values, rather than either allowing them to succeed or causing the entire update to fail. Does anyone think that would be worth it?

Also, after doing the above, it made sense to change put to accept keyword arguments allowing you to customize the key- and value-overwriting behavior, with the defaults matching the current behavior (i.e. don't allow either kind of overwrite). This makes put more flexible without breaking backwards compatibility, so it seemed worthwhile. (Now you can even say e.g. put(key, val, overwrite_key=True, overwrite_val=False) where previously there was no way to do so.) And with a more flexible put operation, adding a complementary putall was a quick and easy win, bringing the bulk API in line with the single-item API.

Could you please take a look at #33 and let me know what you think? I also left some questions in the benchmark tests I'd love some comments on. Thanks in advance, and hope you find these latest changes helpful.

from bidict.

jab avatar jab commented on June 7, 2024

I just switched from the "overwrite" flags to the "collision behavior" abstraction, allowing for a third option besides "overwrite" or "raise": "ignore". PTAL! a039c42

from bidict.

jab avatar jab commented on June 7, 2024

As of f6227e9, got the test_bidict_init benchmark down to only ~3.5x slower on average than the test_invdict_init benchmark -- down from ~13x slower before. This is the fastest it's ever been, even compared to before the new atomicity guarantees were added. ✨ 🚀 ✨

from bidict.

jab avatar jab commented on June 7, 2024

Fixed in 0.12.0.

from bidict.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.