Comments (19)
This sounds like a serious bug that ought to be fixed.
from btrees.
It may be as simple as reversing the names of the Python implementations. Instead of:
class OOBTreePy(BTree):
...
try:
from ._OOBTree import OOBucket
except ImportError as e: #pragma NO COVER w/ C extensions
OOBTree = OOBTreePy
...
else: #pragma NO COVER w/o C extensions
from ._OOBTree import OOBTree
...
Do this:
class OOBTree(BTree):
...
OOBTreePy = OOBTree # NEW; expose the Python implementation always
try:
from ._OOBTree import OOBucket
except ImportError as e: #pragma NO COVER w/ C extensions
pass # Nothing to do here anymore
else: #pragma NO COVER w/o C extensions
from ._OOBTree import OOBTree
...
I haven't really thought this through...
from btrees.
That does appear to work.
The second issue (pickling empty BTrees) appears in the plain pure-python implementation, without any cross-implementation factors in play.
I'm working on a branch to fix these.
from btrees.
Argh. No, no that doesn't work. I get test failures as soon as ZODB is available with the C implementation.
from btrees.
Maybe rather something like this from the persistent module?
https://github.com/zopefoundation/persistent/blob/master/persistent/__init__.py
from btrees.
@ml31415 I'm not sure what you're suggesting. Persistent doesn't need to do anything special to make sure that both the C and Python implementations pickle and unpickle the same way (cross implementation) because people usually only pickle subclasses of Persistent. So the module and/or class name differences don't matter much.
from btrees.
Well, but it applies to PersistentList and PersistentMapping. If I remember correctly, they both got the same class location, independent of implementation.
from btrees.
PersistentList and PersistentMapping only have one implementation, in Python.
from btrees.
doh. Ok, I'm out and leave this to the experts :)
from btrees.
By adding a __reduce__
method to the Python base class BTrees._base.Base
and setting a few attributes, it's possible to make the Python versions pickle with the same class names as the C implementation: because __reduce__
returns the class to pickle, the Python implementation can return the C class. At unpickling time, the best available implementation is used. All pickles are otherwise unchanged and backward and forward compatibility is assured.
This has the minor problem that any "legacy" pickles that still have the Python class name in them will unpickle as the Python class (so Michael's DB would still be corrupted). I don't see any way to get around that while maintaining pickle compatibility, short of doing some nasty things with stack introspection (which is slow on PyPy). But that seems livable, especially because it could be worked around during unpickling by setting find_global
.
However, there's a substantial problem that turns out to be a deal breaker: the approach of returning a different class from __reduce__
is simply not allowed under pickle protocols >= 2 if the usual copy_reg.__newobj__
is used; pickling raises an exception in that case (those protocols have a special opcode to directly call cls.__new__
and they make sure that __reduce__
returns the same class as the object being pickled. For some reason.). Those protocols are the default on Python 3. This can be handled by using a custom constructor function instead of copy_reg.__newobj__
.
So far, I'm at a loss for any way to make the C and Python pickles always result in the C version if it's available (and preferably being equal) that doesn't change the pickle format, meaning older versions couldn't unpickle new pickles.
Thoughts?
from btrees.
FWIW, the changes so far are at master...NextThought:issue_2_pickle
from btrees.
Fixed my db by doing lots of undos this afternoon, so as long as noone else complains ... I wouldn't bother about me or reverting any messed up objects. Besides that, would it be so bad to use a custom constructor?
from btrees.
A custom constructor means at least these things:
- The new pickles can't be read by older releases. This can be annoying for moving databases between environments. It may or may not be a show stopper, I don't know.
- The new constructor function effectively becomes part of the API of the package and can't be removed without breaking pickles.
- The higher pickle protocol optimizations don't kick in, so the pickles are slightly larger/slower than they would be using those protocols.
- The pickle formats produced by C and Python implementations would diverge unless someone wants to go in and mess with the C code (I don't). They're different now, so that's not a huge deal, but it would be nice if they were the same (I was this close to that, until I tested on Python 3).
from btrees.
Here's a subversive thought: When performing the class check for the protocol >= 2, all implementations of pickle
(pickle.py
, cPickle
, zodbpickle
) do the equivalent of obj.__class__
; that is, they don't use type(obj)
or PyObject*->tp_type
(or whatever the C equivalent is).
This means we can lie about the __class__
with an appropriate @property
definition such that we return the C implementation. In turn, this means that the Python __reduce__
can return the C implementation for all protocols: no custom constructor, and the pickles for C and Python are identical. Yay!
Downsides: calling the property getter function may add overhead to accessing __class__
, which is done a lot, apparently; may need a metaclass to make issubclass
work as expected (not sure).
from btrees.
such that we return the C implementation
What if there's no C implementation? E.g. the user didn't have the right libraries/headers when they installed BTrees?
from btrees.
What if there's no C implementation?
My proof-of-concept handles this by making sure that the Python implementation has the same __name__
as the C implementation would and returning that. e.g., https://github.com/NextThought/BTrees/blob/issue_2_pickle/BTrees/_base.py#L1538
from btrees.
With the __class__
approach, all tox environments for the branch I referenced before are green and pickles are identical.
There's no need for a metaclass to make isinstance
work out. For reasons having to do with the way isinstance
is implemented, it's possible to be an instance of two unrelated classes without any subclass customization:
>>> class X(object):
... pass
...
>>> class Y(object):
... @property
... def __class__(self):
... return X
...
>>> y = Y()
>>> isinstance(y, type(y)) # Y
True
>>> isinstance(y, y.__class__) # X
True
In isinstance(inst, Type)
, if the C object's ob_type
is identical to or a subclass of Type, checking is short-circuited---that's the first case. Otherwise, inst.__class__
is checked to see if it's a subclass of Type---that's the second case.
from btrees.
I've opened PR #20 for the __class__
based approach.
from btrees.
Looks good, simpler than expected!
from btrees.
Related Issues (20)
- Python 2: OxBTrees allow types as keys; Python 3 does not
- Can we move to the src/ layout? HOT 3
- Support PURE_PYTHON=0 to require C extensions
- BTree.get() swallows POSKeyError on internal corruption (C only) HOT 2
- Python/C Inconsistency: Detecting classes that just implement `__eq__`
- Python/C Inconsistency: Getting/Setting max_internal_size on the BTree class
- Regression in 4.9: Subclasses can't use @adapter
- fsBTree and fsTreeSet broken in 4.9.0/4.9.1
- fsBTree.difference fails when the second argument is a set HOT 2
- Zope5.2.1 install warnings HOT 3
- Convert to meta/config HOT 2
- Consider using cibuldwheel for building binary wheels. HOT 1
- 4.10.0: sphinx warnings `reference target not found` HOT 3
- 4.10.0: pytest is failing in some units HOT 6
- btrees not installing on m1 computer HOT 2
- Get a random element from a BTree HOT 2
- 'IFBucket' object has no attribute 'byValue' when running with PURE_PYTHON HOT 1
- Length Object not working from outside tree class HOT 4
- 5.2: pytest fails in 24 units with pytest 8.2.1 (mostly with `AssertionError: TypeError not raised`) HOT 9
- Move global statics to module state
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from btrees.