This grows out of the discussion in <a class="issue-link js-issue-link" data-error-tex

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Pickling issues between C and Python implementations. about btrees HOT 19 CLOSED

zopefoundation commented on July 18, 2024

Pickling issues between C and Python implementations.

from btrees.

Comments (19)

mgedmin commented on July 18, 2024

This sounds like a serious bug that ought to be fixed.

from btrees.

jamadden commented on July 18, 2024

It may be as simple as reversing the names of the Python implementations. Instead of:

class OOBTreePy(BTree): 
   ...

try:
    from ._OOBTree import OOBucket
except ImportError as e: #pragma NO COVER w/ C extensions
    OOBTree = OOBTreePy
    ...
else: #pragma NO COVER w/o C extensions
    from ._OOBTree import OOBTree
    ...

Do this:

class OOBTree(BTree): 
   ...

OOBTreePy = OOBTree # NEW; expose the Python implementation always
try:
    from ._OOBTree import OOBucket
except ImportError as e: #pragma NO COVER w/ C extensions
    pass # Nothing to do here anymore
else: #pragma NO COVER w/o C extensions
    from ._OOBTree import OOBTree
    ...

I haven't really thought this through...

from btrees.

jamadden commented on July 18, 2024

That does appear to work.

The second issue (pickling empty BTrees) appears in the plain pure-python implementation, without any cross-implementation factors in play.

I'm working on a branch to fix these.

from btrees.

jamadden commented on July 18, 2024

Argh. No, no that doesn't work. I get test failures as soon as ZODB is available with the C implementation.

from btrees.

ml31415 commented on July 18, 2024

Maybe rather something like this from the persistent module?

https://github.com/zopefoundation/persistent/blob/master/persistent/__init__.py

from btrees.

jamadden commented on July 18, 2024

@ml31415 I'm not sure what you're suggesting. Persistent doesn't need to do anything special to make sure that both the C and Python implementations pickle and unpickle the same way (cross implementation) because people usually only pickle subclasses of Persistent. So the module and/or class name differences don't matter much.

from btrees.

ml31415 commented on July 18, 2024

Well, but it applies to PersistentList and PersistentMapping. If I remember correctly, they both got the same class location, independent of implementation.

from btrees.

jamadden commented on July 18, 2024

PersistentList and PersistentMapping only have one implementation, in Python.

from btrees.

ml31415 commented on July 18, 2024

doh. Ok, I'm out and leave this to the experts :)

from btrees.

jamadden commented on July 18, 2024

By adding a __reduce__ method to the Python base class BTrees._base.Base and setting a few attributes, it's possible to make the Python versions pickle with the same class names as the C implementation: because __reduce__ returns the class to pickle, the Python implementation can return the C class. At unpickling time, the best available implementation is used. All pickles are otherwise unchanged and backward and forward compatibility is assured.

This has the minor problem that any "legacy" pickles that still have the Python class name in them will unpickle as the Python class (so Michael's DB would still be corrupted). I don't see any way to get around that while maintaining pickle compatibility, short of doing some nasty things with stack introspection (which is slow on PyPy). But that seems livable, especially because it could be worked around during unpickling by setting find_global.

However, there's a substantial problem that turns out to be a deal breaker: the approach of returning a different class from __reduce__ is simply not allowed under pickle protocols >= 2 if the usual copy_reg.__newobj__ is used; pickling raises an exception in that case (those protocols have a special opcode to directly call cls.__new__ and they make sure that __reduce__ returns the same class as the object being pickled. For some reason.). Those protocols are the default on Python 3. This can be handled by using a custom constructor function instead of copy_reg.__newobj__.

So far, I'm at a loss for any way to make the C and Python pickles always result in the C version if it's available (and preferably being equal) that doesn't change the pickle format, meaning older versions couldn't unpickle new pickles.

Thoughts?

from btrees.

jamadden commented on July 18, 2024

FWIW, the changes so far are at master...NextThought:issue_2_pickle

from btrees.

ml31415 commented on July 18, 2024

Fixed my db by doing lots of undos this afternoon, so as long as noone else complains ... I wouldn't bother about me or reverting any messed up objects. Besides that, would it be so bad to use a custom constructor?

from btrees.

jamadden commented on July 18, 2024

A custom constructor means at least these things:

The new pickles can't be read by older releases. This can be annoying for moving databases between environments. It may or may not be a show stopper, I don't know.
The new constructor function effectively becomes part of the API of the package and can't be removed without breaking pickles.
The higher pickle protocol optimizations don't kick in, so the pickles are slightly larger/slower than they would be using those protocols.
The pickle formats produced by C and Python implementations would diverge unless someone wants to go in and mess with the C code (I don't). They're different now, so that's not a huge deal, but it would be nice if they were the same (I was this close to that, until I tested on Python 3).

from btrees.

jamadden commented on July 18, 2024

Here's a subversive thought: When performing the class check for the protocol >= 2, all implementations of pickle (pickle.py, cPickle, zodbpickle) do the equivalent of obj.__class__; that is, they don't use type(obj) or PyObject*->tp_type (or whatever the C equivalent is).

This means we can lie about the __class__ with an appropriate @property definition such that we return the C implementation. In turn, this means that the Python __reduce__ can return the C implementation for all protocols: no custom constructor, and the pickles for C and Python are identical. Yay!

Downsides: calling the property getter function may add overhead to accessing __class__, which is done a lot, apparently; may need a metaclass to make issubclass work as expected (not sure).

from btrees.

mgedmin commented on July 18, 2024

such that we return the C implementation

What if there's no C implementation? E.g. the user didn't have the right libraries/headers when they installed BTrees?

from btrees.

jamadden commented on July 18, 2024

What if there's no C implementation?

My proof-of-concept handles this by making sure that the Python implementation has the same __name__ as the C implementation would and returning that. e.g., https://github.com/NextThought/BTrees/blob/issue_2_pickle/BTrees/_base.py#L1538

from btrees.

jamadden commented on July 18, 2024

With the __class__ approach, all tox environments for the branch I referenced before are green and pickles are identical.

There's no need for a metaclass to make isinstance work out. For reasons having to do with the way isinstance is implemented, it's possible to be an instance of two unrelated classes without any subclass customization:

>>> class X(object):
...   pass
...
>>> class Y(object):
...   @property
...   def __class__(self):
...     return X
...
>>> y = Y()
>>> isinstance(y, type(y)) # Y
True
>>> isinstance(y, y.__class__) # X
True

In isinstance(inst, Type), if the C object's ob_type is identical to or a subclass of Type, checking is short-circuited---that's the first case. Otherwise, inst.__class__ is checked to see if it's a subclass of Type---that's the second case.

from btrees.

jamadden commented on July 18, 2024

I've opened PR #20 for the __class__ based approach.

from btrees.

ml31415 commented on July 18, 2024

Looks good, simpler than expected!

from btrees.

Pickling issues between C and Python implementations. about btrees HOT 19 CLOSED

Comments (19)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent