Giter VIP home page Giter VIP logo

Comments (13)

ml31415 avatar ml31415 commented on July 18, 2024

I'm a bit confused by the comments in the test suite:

        # Check that a None key can be deleted in Python 2.
        # This doesn't work on Python 3 because None is unorderable,
        # so the tree can't be searched. But None also can't be inserted,
        # and we don't support migrating Python 2 databases to Python 3.

This is stated in a test function, which is explicitly run for py2 only. While at the same time in the very same file some lines above, there are a couple of tests using None as a key which are done for py2 as well as for py3. So I can add None as a key, but it doesn't get tested, if I can delete it?! I guess some more consistent behaviour would be desireable in general. Also, not supporting py2 databases is something I'd hope should be considered last resort.

To add some more incentives to fix this issue, I'd set a bug bounty of $300 to get this sorted out in a consistent way.

from btrees.

jamadden avatar jamadden commented on July 18, 2024

Those comments are presumably outdated. Support for None was added back to BTrees in 4.4.0, and I'm guessing those comments weren't updated.

The support for None was special cased to make Python 3 behave like Python 2.

In general, though, I don't know what we can do about arbitrary unorderable keys. It doesn't seem practical to try to special case tuples of any length that may have None at any position.

>>> sys.version_info
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> (1, None) < (None, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'int' and 'NoneType'

from btrees.

ml31415 avatar ml31415 commented on July 18, 2024

Would it be possible, to let the user provide a custom compare function? So it might be possible, to recreate the previous behaviour, even if it may be rather slow.

from btrees.

jamadden avatar jamadden commented on July 18, 2024

Wouldn't a "custom compare function" just be a class implementing __cmp__? Then you can make sure it works the same in both Python 2 and 3.

Note that it's possible to implement __cmp__ in a collections.namedtuple subclass IIRC.

from btrees.

jamadden avatar jamadden commented on July 18, 2024
tree[('c', None)] # works
tree[('a', None)] # TypeError

This is another example of #52 (empty tree will accept a non-orderable type, but non-empty trees won't).

from btrees.

mgedmin avatar mgedmin commented on July 18, 2024

Nitpick: there's no __cmp__ on Python 3, you have to implement __lt__ and all the other rich comparison functions (usuall the help of functools.total_ordering).

from btrees.

ml31415 avatar ml31415 commented on July 18, 2024

I'd rather like to keep the original None and native tuples in there and not make a custom comparable class out of it, in order to remain backwards compatible. Also, I consider the pickling of custom classes, even if comfortable in the beginning, as very painful in the long run, as it makes code refactoring always require database updates. And the only way that I see to avoid that, is to work with inbuilt types only. Providing a sort key on the other hand is rather wide spread throughout the whole standard lib.

from btrees.

jamadden avatar jamadden commented on July 18, 2024

There are three ways I have thought of to provide sort keys, and they all have drawbacks as far as BTrees itself goes.

I'll prefix this by saying that I feel that providing a comparison function (cmp) is right out. That pattern is deprecated in Python 2 and gone in Python 3 (as @mdedmin reminded us), and it would be unPythonic to try to revive it publicly.

So, implementations. The first way is to copy sorted and list.sort and accept a key function for everything that might need to do some comparisons. That includes both __setitem__ and __getitem__ for the tree[key] and tree[key] = value cases. As BTree is a dict-like object, and a dict-like object only accepts one parameter for those two methods, that just can't work, unless you start down the path of "magic" key values that the BTree knows to inspect and split into a key and a function pair...and nobody wants to deal with that. Nobody wants to deal with having to pass a key function every time they try to subscript a tree either.

The second way would be to make the key function a property of the BTree itself. That avoids the dict-like issues, it's true. But lambda functions can't be pickled, so the key function would have to be named as a module global---and at that point you're right back to "it makes code refactoring always require database updates" (which isn't strictly true, there are ways to manage that pretty cleanly, starting with leaving BWC imports in place moving up to zope.deprecation). It's not clear to me that a module global function is any better than a module global class with custom comparison functions anyway. (This approach has the secondary problem of needing a new pickle format for the tree, which is complicated but possible, I think.)

The last way I thought of is to define a _sort_key instance method for OOBTree and have it internally be called every time the BTree needed to do a comparison (which is a lot). Then you could subclass the OOBTree and override that method. This has performance implications even for those trees that don't use it, which is not good (the C implementation uses a macro to inline the call to the C API comparison function, and this would add the substantial overhead of calling a python function, and it's non-trivial to make sure it's called the absolute minimum number of times; the previous approach could probably do a null-pointer check to avoid this overhead but I don't think that's possible with a potential subclass method). Plus, that's right back to the "it makes code refactoring always require database updates" concern, and once again, it's not strictly clear to me how that's any kind of an improvement over just implementing the comparison in the datatype---surely if they're meant to be sortable in a BTree, it would be good to be able to sort them other places too?

The BTree approach (and indeed, the most general Python approach) to sorting non-orderable objects is to have the user define the order by implementing the appropriate comparison methods. That seems more sound to me than any of the options I discussed above.

If one can accept having custom objects live in the ZODB, then I think code compatibility can easily be kept by using a wrapper (similar to zope.container) around the OOBTree that checks incoming keys to see if they are the appropriate namedtuple comparable subclass, and if not, converting them.

from btrees.

ml31415 avatar ml31415 commented on July 18, 2024

Big thanks for your long discussion of this issue! I'm afraid I have to agree with you, that the cleanest way would be sortable objects then. In that case, there is not much more to talk about for this issue, except that it's rather unintuitive that the first tuple with None gets accepted.

from btrees.

jamadden avatar jamadden commented on July 18, 2024

it's rather unintuitive that the first tuple with None gets accepted

I was thinking about that some. The only reason the first tuple got accepted was because it was automatically already higher in sort order than anything else in the tree simply based on the first element (my comment about this being like #52 is incorrect). If the first tuple with None in it was equal to some other element in its first index, that wouldn't happen:

>>> from BTrees.OOBTree import OOBTree
>>> tree = OOBTree()
>>> tree[('a', 100)] = 1
>>> tree[('a', None)] = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < NoneType()

The same thing happens in reverse:

>>> tree = OOBTree()
>>> tree[('a', None)] = 1
>>> tree[('a', 100)] = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: NoneType() < int()

In other words, this is still a continuing result of mixing objects that aren't mutually comparable in the same tree. Because tuples lazily only compare as many elements as necessary to determine the result it can show up at any point, just depending on the data.

>>> tree = OOBTree()
>>> tree[(None, 100)] = 1
>>> tree[('a', 100)] = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: NoneType() < str()

So maybe it's the tuple's behaviour that's unintuitive 😉

>>> ('a', 100) < ('a',)
False
>>> ('a', 100) < ('a', None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < NoneType()

from btrees.

mgedmin avatar mgedmin commented on July 18, 2024

Python has a surprising number of footguns for a safe language. :/

from btrees.

ml31415 avatar ml31415 commented on July 18, 2024

I fully agree, that the tuple behaviour is unintuitive and imho not a good fit for a DB key. Some custom tuple class, fixing length to a certain value, and providing comparison for occuring Nones might be the solution for that.

from btrees.

jamadden avatar jamadden commented on July 18, 2024

Cool. Are we ready to close this issue then?

from btrees.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.