Comments (13)
I'm a bit confused by the comments in the test suite:
# Check that a None key can be deleted in Python 2.
# This doesn't work on Python 3 because None is unorderable,
# so the tree can't be searched. But None also can't be inserted,
# and we don't support migrating Python 2 databases to Python 3.
This is stated in a test function, which is explicitly run for py2 only. While at the same time in the very same file some lines above, there are a couple of tests using None
as a key which are done for py2 as well as for py3. So I can add None
as a key, but it doesn't get tested, if I can delete it?! I guess some more consistent behaviour would be desireable in general. Also, not supporting py2 databases is something I'd hope should be considered last resort.
To add some more incentives to fix this issue, I'd set a bug bounty of $300 to get this sorted out in a consistent way.
from btrees.
Those comments are presumably outdated. Support for None was added back to BTrees in 4.4.0, and I'm guessing those comments weren't updated.
The support for None was special cased to make Python 3 behave like Python 2.
In general, though, I don't know what we can do about arbitrary unorderable keys. It doesn't seem practical to try to special case tuples of any length that may have None at any position.
>>> sys.version_info
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> (1, None) < (None, 1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'int' and 'NoneType'
from btrees.
Would it be possible, to let the user provide a custom compare function? So it might be possible, to recreate the previous behaviour, even if it may be rather slow.
from btrees.
Wouldn't a "custom compare function" just be a class implementing __cmp__
? Then you can make sure it works the same in both Python 2 and 3.
Note that it's possible to implement __cmp__
in a collections.namedtuple
subclass IIRC.
from btrees.
tree[('c', None)] # works
tree[('a', None)] # TypeError
This is another example of #52 (empty tree will accept a non-orderable type, but non-empty trees won't).
from btrees.
Nitpick: there's no __cmp__
on Python 3, you have to implement __lt__
and all the other rich comparison functions (usuall the help of functools.total_ordering).
from btrees.
I'd rather like to keep the original None
and native tuples in there and not make a custom comparable class out of it, in order to remain backwards compatible. Also, I consider the pickling of custom classes, even if comfortable in the beginning, as very painful in the long run, as it makes code refactoring always require database updates. And the only way that I see to avoid that, is to work with inbuilt types only. Providing a sort key on the other hand is rather wide spread throughout the whole standard lib.
from btrees.
There are three ways I have thought of to provide sort keys, and they all have drawbacks as far as BTrees itself goes.
I'll prefix this by saying that I feel that providing a comparison function (cmp
) is right out. That pattern is deprecated in Python 2 and gone in Python 3 (as @mdedmin reminded us), and it would be unPythonic to try to revive it publicly.
So, implementations. The first way is to copy sorted
and list.sort
and accept a key
function for everything that might need to do some comparisons. That includes both __setitem__
and __getitem__
for the tree[key]
and tree[key] = value
cases. As BTree is a dict-like object, and a dict-like object only accepts one parameter for those two methods, that just can't work, unless you start down the path of "magic" key values that the BTree knows to inspect and split into a key and a function pair...and nobody wants to deal with that. Nobody wants to deal with having to pass a key function every time they try to subscript a tree either.
The second way would be to make the key function a property of the BTree itself. That avoids the dict-like issues, it's true. But lambda functions can't be pickled, so the key function would have to be named as a module global---and at that point you're right back to "it makes code refactoring always require database updates" (which isn't strictly true, there are ways to manage that pretty cleanly, starting with leaving BWC imports in place moving up to zope.deprecation
). It's not clear to me that a module global function is any better than a module global class with custom comparison functions anyway. (This approach has the secondary problem of needing a new pickle format for the tree, which is complicated but possible, I think.)
The last way I thought of is to define a _sort_key
instance method for OOBTree and have it internally be called every time the BTree needed to do a comparison (which is a lot). Then you could subclass the OOBTree and override that method. This has performance implications even for those trees that don't use it, which is not good (the C implementation uses a macro to inline the call to the C API comparison function, and this would add the substantial overhead of calling a python function, and it's non-trivial to make sure it's called the absolute minimum number of times; the previous approach could probably do a null-pointer check to avoid this overhead but I don't think that's possible with a potential subclass method). Plus, that's right back to the "it makes code refactoring always require database updates" concern, and once again, it's not strictly clear to me how that's any kind of an improvement over just implementing the comparison in the datatype---surely if they're meant to be sortable in a BTree, it would be good to be able to sort them other places too?
The BTree approach (and indeed, the most general Python approach) to sorting non-orderable objects is to have the user define the order by implementing the appropriate comparison methods. That seems more sound to me than any of the options I discussed above.
If one can accept having custom objects live in the ZODB, then I think code compatibility can easily be kept by using a wrapper (similar to zope.container) around the OOBTree that checks incoming keys to see if they are the appropriate namedtuple comparable subclass, and if not, converting them.
from btrees.
Big thanks for your long discussion of this issue! I'm afraid I have to agree with you, that the cleanest way would be sortable objects then. In that case, there is not much more to talk about for this issue, except that it's rather unintuitive that the first tuple with None
gets accepted.
from btrees.
it's rather unintuitive that the first tuple with
None
gets accepted
I was thinking about that some. The only reason the first tuple got accepted was because it was automatically already higher in sort order than anything else in the tree simply based on the first element (my comment about this being like #52 is incorrect). If the first tuple with None
in it was equal to some other element in its first index, that wouldn't happen:
>>> from BTrees.OOBTree import OOBTree
>>> tree = OOBTree()
>>> tree[('a', 100)] = 1
>>> tree[('a', None)] = 2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < NoneType()
The same thing happens in reverse:
>>> tree = OOBTree()
>>> tree[('a', None)] = 1
>>> tree[('a', 100)] = 2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: NoneType() < int()
In other words, this is still a continuing result of mixing objects that aren't mutually comparable in the same tree. Because tuples lazily only compare as many elements as necessary to determine the result it can show up at any point, just depending on the data.
>>> tree = OOBTree()
>>> tree[(None, 100)] = 1
>>> tree[('a', 100)] = 2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: NoneType() < str()
So maybe it's the tuple's behaviour that's unintuitive
>>> ('a', 100) < ('a',)
False
>>> ('a', 100) < ('a', None)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < NoneType()
from btrees.
Python has a surprising number of footguns for a safe language. :/
from btrees.
I fully agree, that the tuple behaviour is unintuitive and imho not a good fit for a DB key. Some custom tuple class, fixing length to a certain value, and providing comparison for occuring None
s might be the solution for that.
from btrees.
Cool. Are we ready to close this issue then?
from btrees.
Related Issues (20)
- Release 4.7.3 HOT 2
- Pylance has problem resolving imports HOT 5
- Add support to release aarch64 wheels HOT 2
- Python 2: OxBTrees allow types as keys; Python 3 does not
- Can we move to the src/ layout? HOT 3
- Support PURE_PYTHON=0 to require C extensions
- BTree.get() swallows POSKeyError on internal corruption (C only) HOT 2
- Python/C Inconsistency: Detecting classes that just implement `__eq__`
- Python/C Inconsistency: Getting/Setting max_internal_size on the BTree class
- Regression in 4.9: Subclasses can't use @adapter
- fsBTree and fsTreeSet broken in 4.9.0/4.9.1
- fsBTree.difference fails when the second argument is a set HOT 2
- Zope5.2.1 install warnings HOT 3
- Convert to meta/config HOT 2
- Consider using cibuldwheel for building binary wheels. HOT 1
- 4.10.0: sphinx warnings `reference target not found` HOT 3
- 4.10.0: pytest is failing in some units HOT 6
- btrees not installing on m1 computer HOT 2
- Get a random element from a BTree HOT 2
- 'IFBucket' object has no attribute 'byValue' when running with PURE_PYTHON HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from btrees.