Comments (15)
I could imagine implementing these tests by using a profiler to count the total number of bytecode instructions required when using bidict to accomplish some basic tasks, and then compare this to using two dicts kept in sync manually (baseline). (Wall clock times could be measured too.)
The tests could then either pass or fail depending on whether bidict's results are within some threshold of the baseline results.
Running these tests could also generate a more detailed report that would allow seeing how much a change to the code improves or degrades performance on the tests.
Any suggestions on how to add something like this to our test suite, @tomviner?
from bidict.
Would also be cool to see how performance with pypy compares to cpython.
from bidict.
Put up a general note about performance at https://bidict.readthedocs.org/en/master/performance.html (in part to address some more or less well-founded concern from potential users -- see here and #1 + #1 (comment).
Once we have performance tests running as part of CI, I can link to our results from various of those places.
from bidict.
You can just use pytest-benchmark for performance tests.
from bidict.
Thanks for the tip, @thedrow. Look forward to checking that out.
from bidict.
Took a first crack at this. Haven't done anything with pytest-benchmark before. Does 8e6686b look like a reasonable starting place? /cc @thedrow et al.
from bidict.
You can see an example of the results I'm getting e.g. here. /cc @lordmauve in case this is of interest
from bidict.
Yes. Looks correct.
You can now start profiling and see where you can improve your results.
from bidict.
Thanks for taking a look @thedrow.
Before I profile, I'd first like to adapt the current benchmarks to demonstrate the performance not just of a 10-item bidict, but rather the asymptotic behavior as number of elements increases. I'd like to confirm that bidict's space and time complexity is on the same order of the space and time complexity of manually keeping two inverse dicts in sync.
Please let me know if you have any other suggestions, and thanks again!
from bidict.
You can check out the latest work on this here:
https://github.com/jab/bidict/compare/benchmark?expand=1
Check out the "benchmark" branch on Travis for the latest benchmark results.
With these, I was able to make some micro-optimizations to bidict._update
, and the benchmarks allow us to see that in the Travis builds, test_bidict_init went from 8.27x slower than test_invdict_init on average without the optimizations, to 6.52x slower on average with the optimizations.
The benchmarks are currently pretty crude, and I have doubts about how representative they are of real-world workloads (not to mention pytest-benchmark advises against running in a VM, which there's no way around on Travis). I'd love to work together with someone on this to check over and improve on this work. Anyone interested? @lordmauve, would love your take on this. Anyone interested, please ping me on Gitter and I'll look forward to getting these landed.
from bidict.
Function call overhead is pretty high in Python. Maintaining two dicts manually, incline in your own code, would be pretty hard to beat due to the elimination of function call overhead.
We're unlikely to get down to performance comparable to that without a C implementation.
However, that's not to say that big improvements can't be made. The overhead of bidict for reads could be minimal. Inlining _put would help for writes.
from bidict.
Thanks for taking a look, @lordmauve. I think the read performance is already good enough. As for the write performance, inlining _put
turned out not to make much of a difference in my testing, so I'm going to leave the function call in place. I don't have experience speeding up Python with C, do you? I'd definitely be interested to accept a (pypy-compatible) PR that accomplishes this.
While I was looking at this, I decided bidict's update
method (and other bulk insert operations, including __init__
) should be atomic, which they previously weren't. i.e. If you tried to add several items, where one of them would result in a ValueExistsException
, it was previously likely that many of them would be added until the exception-causing one was reached. This resulted in bidict's init
benchmark slowing down from about 8x that of the inverse dict's to about 13x with CPython, and about 7x with PyPy. I think the safety gains probably make this worth it, but would love to hear feedback.
We could also consider adding a way to perform a bulk update that ignores new mappings that would clobber existing keys or values, rather than either allowing them to succeed or causing the entire update to fail. Does anyone think that would be worth it?
Also, after doing the above, it made sense to change put
to accept keyword arguments allowing you to customize the key- and value-overwriting behavior, with the defaults matching the current behavior (i.e. don't allow either kind of overwrite). This makes put
more flexible without breaking backwards compatibility, so it seemed worthwhile. (Now you can even say e.g. put(key, val, overwrite_key=True, overwrite_val=False)
where previously there was no way to do so.) And with a more flexible put
operation, adding a complementary putall
was a quick and easy win, bringing the bulk API in line with the single-item API.
Could you please take a look at #33 and let me know what you think? I also left some questions in the benchmark tests I'd love some comments on. Thanks in advance, and hope you find these latest changes helpful.
from bidict.
I just switched from the "overwrite" flags to the "collision behavior" abstraction, allowing for a third option besides "overwrite" or "raise": "ignore". PTAL! a039c42
from bidict.
As of f6227e9, got the test_bidict_init benchmark down to only ~3.5x slower on average than the test_invdict_init benchmark -- down from ~13x slower before. This is the fastest it's ever been, even compared to before the new atomicity guarantees were added. ✨ 🚀 ✨
from bidict.
Fixed in 0.12.0.
from bidict.
Related Issues (20)
- Bug in BidirectionalMapping.__subclasshook__(): Any class with an inverse attribute is considered a collections.abc.Mapping HOT 8
- Suggestion: Include licence in documentation HOT 3
- Not installable from GitHub release tarball HOT 6
- Add more properties based tests HOT 1
- global name 'BidirectionalMapping' is not defined HOT 21
- github git tags out of sync since 0.18.4 HOT 3
- Missing __all__ in bidict/__init__.py leads to implicit reexport error with mypy in strict mode. HOT 9
- logo HOT 5
- Dependency Dashboard
- Type hints for Bidict type HOT 2
- Try slipcover HOT 1
- Maybe use nix for devcontainer and GHA “tests” workflow
- Why it returns None when I use bidict?
- Will you add a persistence method? HOT 1
- Improve OpenSSF Scorecard HOT 6
- Swap syntax does not work HOT 4
- Automate upgrading dev dependencies HOT 1
- Type alias definition for `MISSING` results in type violations HOT 1
- test fails with python3.12 HOT 3
- Remove dead batteries
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from bidict.