biqqles / dataclassy
A fast and flexible reimplementation of data classes
Home Page: https://pypi.org/project/dataclassy
License: Mozilla Public License 2.0
`inspect.Signature` is irritating to use, and so we convert it to a string to check function signatures against their expected value. The problem with this is that Python periodically changes the behaviour of `Signature.__str__`. This results in spurious test failures (the tests are currently set up for the format produced in 3.7..3.9).
This blocks #41.
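One way to sidestep the formatting churn is to compare structured `Signature` properties instead of its string form. A minimal sketch (not dataclassy's actual test code):

```python
import inspect

def f(a, b=1, *, c): ...

sig = inspect.signature(f)
# str(sig) is fragile: its exact format has changed between Python releases.
# Comparing the structured parameter data is stable across versions:
expected = [
    ("a", inspect.Parameter.POSITIONAL_OR_KEYWORD),
    ("b", inspect.Parameter.POSITIONAL_OR_KEYWORD),
    ("c", inspect.Parameter.KEYWORD_ONLY),
]
actual = [(p.name, p.kind) for p in sig.parameters.values()]
print(actual == expected)  # True
```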
We ran into this mypy typing error below when upgrading from 0.10.3 to 0.11.0.
Here are the steps to reproduce:
x.py:
import dataclassy

@dataclassy.dataclass(slots=True, kwargs=True, repr=True)
class Foo:
    bar: str = "bar"

if __name__ == "__main__":
    x = Foo()
    print(x.bar)
$ python3 -mvenv ~/env3
$ ~/env3/bin/pip install mypy dataclassy==0.10.3
Collecting mypy
Using cached mypy-0.910-cp38-cp38-manylinux2010_x86_64.whl (22.8 MB)
Collecting dataclassy==0.10.3
Using cached dataclassy-0.10.3-py3-none-any.whl (23 kB)
Collecting typing-extensions>=3.7.4
Using cached typing_extensions-3.10.0.2-py3-none-any.whl (26 kB)
Collecting toml
Using cached toml-0.10.2-py2.py3-none-any.whl (16 kB)
Collecting mypy-extensions<0.5.0,>=0.4.3
Using cached mypy_extensions-0.4.3-py2.py3-none-any.whl (4.5 kB)
Installing collected packages: typing-extensions, toml, mypy-extensions, mypy, dataclassy
Successfully installed dataclassy-0.10.3 mypy-0.910 mypy-extensions-0.4.3 toml-0.10.2 typing-extensions-3.10.0.2
$ ~/env3/bin/python x.py
bar
$ ~/env3/bin/mypy x.py
Success: no issues found in 1 source file
$ ~/env3/bin/pip install --upgrade dataclassy==0.11.0
Collecting dataclassy==0.11.0
Using cached dataclassy-0.11.0-py3-none-any.whl (23 kB)
Installing collected packages: dataclassy
Attempting uninstall: dataclassy
Found existing installation: dataclassy 0.10.3
Uninstalling dataclassy-0.10.3:
Successfully uninstalled dataclassy-0.10.3
Successfully installed dataclassy-0.11.0
$ ~/env3/bin/python x.py
bar
$ ~/env3/bin/mypy x.py
x.py:4: error: Cannot instantiate type "Type[<nothing>]"
Found 1 error in 1 file (checked 1 source file)
`DataClassMeta` currently collects the contents of the inherited classes' dicts and passes this as the dictionary of the new class to `type.__new__`. This can prevent `type.__new__` from doing method inheritance "correctly" and is the reason this test fails.
Following on from the example in #6, this currently works: `s = State(7, path=1)`. `s.path` is `1`, but that's not obvious, and it would normally cause `TypeError: __new__() got multiple values for argument 'path'`, as it does if `__init__` is not defined.
The solution to this could be making all the arguments in `__new__` keyword-only, then in `__call__` converting positional arguments destined for it into keyword arguments. Removing `*args` from `__new__` means that duplicated positional arguments will not be silently passed back to `__call__` and will instead raise the normal exception. Since `__new__` is (normally) never called by the user directly, this would also be fully backwards compatible.
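A stdlib-only sketch of the idea (the names `FieldsMeta` and this simplified `State` are illustrative, not dataclassy's internals): the metaclass's `__call__` zips positional arguments with the declared field names, so a duplicated argument raises the normal TypeError instead of being silently accepted.

```python
class FieldsMeta(type):
    def __call__(cls, *args, **kwargs):
        fields = list(cls.__annotations__)        # declared field order
        as_kwargs = dict(zip(fields, args))       # positional -> keyword
        overlap = as_kwargs.keys() & kwargs.keys()
        if overlap:
            # the duplicate now surfaces as the usual exception
            raise TypeError(f"__new__() got multiple values "
                            f"for argument {sorted(overlap)[0]!r}")
        return super().__call__(**as_kwargs, **kwargs)

class State(metaclass=FieldsMeta):
    path: int = 1
    def __init__(self, **kwargs):                 # now effectively keyword-only
        for name, value in kwargs.items():
            setattr(self, name, value)

print(State(7).path)   # 7
try:
    State(7, path=1)
except TypeError as e:
    print(e)           # __new__() got multiple values for argument 'path'
```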
I just found dataclassy and it looks really useful! I wanted to test it together with https://github.com/s-knibbs/dataclasses-jsonschema which I'm using in one of my projects, but got this:
>>> from dataclassy import dataclass
>>> from dataclasses_jsonschema import JsonSchemaMixin
>>> @dataclass
... class Item(JsonSchemaMixin):
...     id: str = ""
...
>>> it=Item()
>>> it.to_json()
Traceback (most recent call last):
File "/usr/lib/python3.8/dataclasses.py", line 1031, in fields
fields = getattr(class_or_instance, _FIELDS)
AttributeError: type object 'Item' has no attribute '__dataclass_fields__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/zdenal/arcor2_repos/arcor2/build-support/.venv/lib/python3.8/site-packages/dataclasses_jsonschema/__init__.py", line 861, in to_json
return json.dumps(self.to_dict(omit_none, validate), **json_kwargs)
File "/home/zdenal/arcor2_repos/arcor2/build-support/.venv/lib/python3.8/site-packages/dataclasses_jsonschema/__init__.py", line 389, in to_dict
for f in self._get_fields():
File "/home/zdenal/arcor2_repos/arcor2/build-support/.venv/lib/python3.8/site-packages/dataclasses_jsonschema/__init__.py", line 380, in _get_fields
cls.__mapped_fields = _get_fields_uncached()
File "/home/zdenal/arcor2_repos/arcor2/build-support/.venv/lib/python3.8/site-packages/dataclasses_jsonschema/__init__.py", line 353, in _get_fields_uncached
for f in fields(cls):
File "/usr/lib/python3.8/dataclasses.py", line 1033, in fields
raise TypeError('must be called with a dataclass type or instance')
TypeError: must be called with a dataclass type or instance
Would it be possible to add/emulate `__dataclass_fields__` to make these two compatible?
Is there a way to add metadata to the fields? For instance, with the built-in `dataclasses` module, I use

title: Optional[str] = field(default=None, metadata={"sa": Column(String)})
url: Optional[str] = field(default=None, metadata={"sa": Column(String)})

The `metadata` dict allows SQLAlchemy to read the generated dataclass's annotations and map it automatically. Dataclassy doesn't seem to have the `field` generator function, so is it possible to attach metadata?
How do I combine dataclassy's dataclass with dataclass-property's dataclass, or marshmallow_dataclass with marshmallow.validate? I need subclassing as well as validation. I think there's no other way than to write normal classes with property getters and setters. All my dataclasses have one ABC parent.
I have existing code that uses dataclasses with `__post_init__`. It would be great if dataclassy would allow me to continue using this function without having to refactor all of my old dataclasses.
`__post_init__` has the added benefit of allowing me to skip redefining the parameter names a second time, e.g.
@dataclass
class SolverState:
    possible: Tuple[FrozenSet[str], ...] = ()
    switches: Tuple[str, ...] = ()
    path: Tuple[bool, ...] = ()
    roles: Dict[str, int] = field(default_factory=dict)
    count_true: int = -1

    def __post_init__(self) -> None:
        if not self.possible:
            self.possible = (5,)
        if self.count_true == -1:
            self.count_true = self.path.count(True)
        if not self.roles:
            self.roles = self._roles_init()
@dataclass(slots=True)
class SolverState:
    possible: Tuple[FrozenSet[str], ...] = ()
    switches: Tuple[str, ...] = ()
    path: Tuple[bool, ...] = ()
    roles: Dict[str, int] = {}
    count_true: int = -1

    def __init__(
        self,
        possible: Tuple[FrozenSet[str], ...] = (),
        switches: Tuple[str, ...] = (),
        path: Tuple[bool, ...] = (),
        roles: Dict[str, int] = {},
        count_true: int = -1,
    ) -> None:
        if not self.possible:
            self.possible = (5,)
        if self.count_true == -1:
            self.count_true = self.path.count(True)
        if not self.roles:
            self.roles = self._roles_init()
Back when dataclassy used `__new__` to set class fields (pre-v0.7), `__init__` was left free for the user to use for post-initialisation logic. This worked well, but since v0.7, dataclassy has used `__init__` for its own purposes. To retain compatibility, an `__init__` method defined on a class where `init=True` is aliased to `__post_init__`. This is rather confusing and raises a lot of uncertain edge cases: for example, what happens when both `__init__` and `__post_init__` are defined? In addition, strictly speaking these theoretical subtle differences in behaviour mean that test cases should test the same scenarios for both method names.
Finally, judging by the code examples in the issues here, no one really uses it. Therefore I now think this compatibility should be removed. I expect this to be the last breaking change before a stable release (and there haven't been many!).
My plan is to:
- add a `DeprecationWarning` in the next release (v0.9)
- make `__init__` equal to `__post_init__` if the former is not generated (`init=False` or no fields) and the latter is defined. This is the only reason not to use `__post_init__` currently, and fixes another bug in dataclasses.

The current `__hash__` implementation is not good. It simply filters the fields of the instance it's being applied to, ignores fields that aren't hashable and creates a tuple which it then hashes. This is slow and also provides no warning if a field that should be hashable is not, potentially causing hash collisions. The fact that no one has complained so far is likely testament to the fact that it's almost always better to write a `__hash__` manually anyway. But this should be improved.
There are at least two ways to do this:
- Use `typing.get_type_hints` at class initialisation time, to work out which fields are hashable. This is slow and wouldn't give the flexibility of being able to determine which fields to include in the hash.
- Add a type hint similar to `Internal`, called `Hashed`, which would include the field in the hash. I think this is better. The only caveat is that I'm pretty sure this breaks tools like mypy. Maybe there is a way to tell it to treat the "wrapped" type as the significant one, like with `Optional`.
Still debugging, but it appears to be caused by this line in this commit (which is present starting in v0.10.1):
0f34238#diff-daf1e1a0a18a32d8c50faa743257d9e130ac95adeab1fddb2932525d147c7520R80
Error:
ImportError while loading conftest '~/work/ApeWorX/ape/tests/conftest.py'.
...
src/ape_accounts/accounts.py:34: in <module>
class KeyfileAccount(AccountAPI):
~/.venv/python3.8/site-packages/dataclassy/dataclass.py:80: in __new__
all_attrs = {a for b in bases for a in dir(b) if is_user_func(getattr(b, a))} | dict_.keys()
~/.venv/python3.8/site-packages/dataclassy/dataclass.py:80: in <setcomp>
all_attrs = {a for b in bases for a in dir(b) if is_user_func(getattr(b, a))} | dict_.keys()
E AttributeError: __abstractmethods__
I'm probably using it incorrectly; I would be open to a better way to create subclassable `dataclass` classes with abstract methods. It's a pretty integral part of how we're using it for our plugin system.
For reference, `AccountAPI` is a `dataclassy.dataclass` using the `abc.ABCMeta` abstract metaclass and defining some `abc.abstractmethod`s. You can check it out here:
https://github.com/ApeWorX/ape/blob/687032e0b122a46fc73184fd7a0bea18e7b1a383/src/ape/api/accounts.py#L62
Is there a proposed way to define a dataclass with a custom metaclass? It seems like it would have to inherit from DataClassMeta and the signature of apply_metaclass takes the metaclass as a kwarg but the dataclass decorator doesn't pass along an option to override that. Currently, this is ignored:
@dataclass
class DataClass(metaclass=CustomMetaClass):
    prop: str
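The usual rule behind this is that a class must be created with a metaclass deriving from the metaclasses of all its bases. A stdlib-only sketch of the combined-metaclass pattern, where `DataMeta` stands in for dataclassy's `DataClassMeta`:

```python
class DataMeta(type):          # stand-in for DataClassMeta
    pass

class CustomMetaClass(type):
    pass

class CombinedMeta(DataMeta, CustomMetaClass):
    pass

class Base(metaclass=CustomMetaClass):
    pass

# Creating the class through the combined metaclass satisfies both hierarchies:
DataClass = CombinedMeta("DataClass", (Base,), {"__annotations__": {"prop": str}})
print(type(DataClass).__name__)              # CombinedMeta
print(isinstance(DataClass, CustomMetaClass))  # True
```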
This project looks great!
I've been heavily using dacite - in particular, the from_dict
method to initialise a nested data class from a nested dictionary.
Have you come across this library? What would be required for these to work together?
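For context, a minimal stdlib-only sketch of what a `from_dict`-style helper does (using standard `dataclasses` as a stand-in for dataclassy classes; `from_dict` here is a hand-rolled illustration, not dacite's implementation):

```python
from dataclasses import dataclass
from typing import get_type_hints

@dataclass
class Inner:
    x: int

@dataclass
class Outer:
    inner: Inner
    name: str

def from_dict(cls, data):
    """Recursively build cls from a nested dict, matching keys to annotated fields."""
    hints = get_type_hints(cls)
    kwargs = {}
    for field, hint in hints.items():
        value = data[field]
        # Recurse when the target field is itself an annotated class and the value is a dict
        if isinstance(value, dict) and hasattr(hint, "__annotations__"):
            value = from_dict(hint, value)
        kwargs[field] = value
    return cls(**kwargs)

o = from_dict(Outer, {"inner": {"x": 1}, "name": "a"})
print(o)  # Outer(inner=Inner(x=1), name='a')
```

The main requirement for dataclassy support would be that the library can discover field names and types from the class, which dacite currently does via `dataclasses.fields`.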
Thanks for creating dataclassy! In terms of features it does everything I want, but I'm also very sensitive to performance since I'm working on a project that constructs hundreds of thousands of small objects in a tight loop.
I've observed roughly a 33% slowdown for small object creation between dataclasses and dataclassy in CPython 3.9.0 (benchmark below). Do you have any thoughts on this? How much of a priority is performance for dataclassy? Would you welcome a PR that closed the performance gap at the cost of significantly longer or more complex code?
from timeit import timeit
import dataclasses
import dataclassy

@dataclasses.dataclass
class Foo1:
    __slots__ = 'x', 'y'
    x: int
    y: int

@dataclassy.dataclass(slots=True)
class Foo2:
    x: int
    y: int

def f1():
    Foo1(1, 2)

def f2():
    Foo2(1, 2)

print(timeit(f1, number=1000000))
print(timeit(f2, number=1000000))
Simple example
from dataclassy import dataclass, as_dict

@dataclass
class MyClass:
    _my_internal_tracking_field: str = "Parent Class"
    my_other_field: str
>>> my_instance = MyClass(my_other_field="test")
>>> my_instance.__repr__
<bound method __repr__ of MyClass(my_other_field='test')>
>>> as_dict(my_instance)
{'_my_internal_tracking_field': 'Parent Class', 'my_other_field': 'test'}
From the README, `hide_internals` is `True` by default but seems to only apply to:

__repr__ and excludes them from comparison and iteration

Is there any case where we would definitely want to expose internal fields in the dictionary representation?
I see `internals=True` by default here in functions.py; maybe it should pass the `hide_internals` option instead of setting `True` automatically.
https://github.com/biqqles/dataclassy/blob/master/dataclassy/functions.py#L65-L69
I don't mind making this change but just want to see if I am missing any use cases.
As suggested in the README, the non-copyable fields are populated in the constructor:
import asyncio
import dataclassy

@dataclassy.dataclass(slots=True)
class C:
    cnd: asyncio.Condition = None
    evt: asyncio.Event = None

    def __init__(self) -> None:
        self.cnd = asyncio.Condition()
        self.evt = asyncio.Event()

c = C()
print(repr(c.cnd))
print(repr(c.evt))
In that case, the strict mypy typing fails:
$ python --version
Python 3.9.1
$ mypy --version
mypy 0.800
$ mypy --strict _dcy_late_init.py
_dcy_late_init.py:6: error: Incompatible types in assignment (expression has type "None", variable has type "Condition")
_dcy_late_init.py:7: error: Incompatible types in assignment (expression has type "None", variable has type "Event")
_dcy_late_init.py:10: error: Property "cnd" defined in "C" is read-only
_dcy_late_init.py:11: error: Property "evt" defined in "C" is read-only
# mypy.ini
[mypy]
warn_unused_configs = True
ignore_missing_imports = True
plugins = dataclassy.mypy
`dataclassy` is installed from git as of 1273387.
Actually, two issues at once:
- Assigning `None` to a non-optional field. Making it `Optional[...]` is not desired, as it requires changing the logic all around the code (or mypy will complain on accessing fields of `None` all around).
- Is there any other way of initialising the non-copyable objects (a functional equivalent of dataclasses' `default_factory`)?
I would suggest auto-instantiating the objects if the default value is a class. The cases when a class is actually a value to be stored can be detected by having a field annotation `Type[...]`; in that case, no instantiation should happen. But I am not sure if this covers all the use-cases and does not break anything.
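A stdlib-only sketch of the proposed rule (the `resolve_default` helper and `Event` stand-in are hypothetical, not dataclassy code): instantiate class defaults, unless the field annotation is `Type[...]`, which signals that the class itself is the stored value.

```python
import inspect
from typing import Type

def resolve_default(annotation, default):
    """Instantiate class defaults unless the field is annotated Type[...]."""
    # Type[X] has __origin__ == type; plain class annotations have no __origin__
    is_type_field = getattr(annotation, "__origin__", None) is type
    if inspect.isclass(default) and not is_type_field:
        return default()
    return default

class Event:  # stand-in for e.g. asyncio.Event
    pass

print(isinstance(resolve_default(Event, Event), Event))  # True: instantiated
print(resolve_default(Type[Event], Event) is Event)      # True: kept as the class
```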
So it's that time of year when I finally have a bit more time to experiment with my projects. I've been thinking about the uninspiring performance when (and only when) a custom `__init__` is used, as @TylerYep pointed out in #6 (comment). This arises because the current `__call__` implementation is generic and dynamically modifies the parameters to either `__new__` or `__init__`; what would be really good is to generate a static `__call__` that does this with no overhead. To keep performance good and code simple, this method would both initialise the instance with its parameters and redirect any additional ones to `__init__` if it's defined. However, because an object's `__call__` method has to be defined on its type (i.e. in the case of a class, its metaclass), this means we have to dynamically create a subclass of the metaclass at the time of decorator use.
I've implemented this, and though it works surprisingly well in my other tests, the fact that it breaks multiple inheritance (highlighted below) means it's nowhere near ready for rolling out in releases yet. Despite this, I'm curious to see how well it works in the code of others, if you would like to test it. The code is in the branch `static-call`.
Advantages:
- Only one method is generated (`__call__`), not `__call__`, `__new__` and `__signature__`
- Using `__new__` could be considered hacky as it's not "supposed" to be used to initialise the class instance

Disadvantages:
- You can no longer call `__new__` on a data class to instantiate it without `__init__` being executed. I'm not sure if this is actually useful (or a good idea to do), but it is a feature nonetheless
- Assume `B` is a subclass of `A`. With this method, `type(A) is not type(B)`, which is unusual and surprising. Only `issubclass(type(B), type(A))` is true. However, Python supports this "dynamic metaclass" paradigm fine. Besides looking odd (and how often do you compare the types of classes, really!) I've found no side effects other than...
- If you use multiple inheritance, e.g. `class C(A, B)`, now you have to do something like:

class CMeta(type(A), type(B)):
    pass

@dataclass(meta=CMeta)
class C:
    ...
Hi again! I'm getting the following error (simplified for your convenience):
@dataclass
class State:
    path: Tuple[int, ...] = 1

    def __init__(self):
        pass

raises_error = State(7)
The error message:
cls = <class 'tests.State'>, args = (7,), kwargs = {}, instance = State(path=1)
def __call__(cls, *args, **kwargs):
instance = cls.__new__(cls, *args, **kwargs)
args = args[cls.__new__.__code__.co_argcount - 1:] # -1 for 'cls'
for parameter in kwargs.keys() & cls.__annotations__.keys():
del kwargs[parameter]
> instance.__init__(*args, **kwargs)
E TypeError: __init__() takes 1 positional argument but 2 were given
dataclassy/dataclass.py:111: TypeError
The code works fine when I add the keyword (`State(path=7)`), remove the default, or remove the `__init__(self)` method.
I found a bug when I attempt to mix metaclasses with `dataclassy`. Here is what I was doing:

@abstractdataclass
class ConverterAPI(Generic[ABIType]):
    ...

    @abstractmethod
    def convert(self, value: str) -> ABIType:
        ...

`Generic` has nothing to do with dataclassy's mechanics, but I could see how the metaclass might interfere.
This is the error it raises in Python 3.6 (from my CI):
...
class ConverterAPI(Generic[ABIType]):
/opt/hostedtoolcache/Python/3.6.13/x64/lib/python3.6/site-packages/dataclassy/decorator.py:40: in dataclass
return apply_metaclass(cls)
/opt/hostedtoolcache/Python/3.6.13/x64/lib/python3.6/site-packages/dataclassy/decorator.py:35: in apply_metaclass
return metaclass(to_class.__name__, to_class.__bases__, dict_, **options)
/opt/hostedtoolcache/Python/3.6.13/x64/lib/python3.6/site-packages/dataclassy/dataclass.py:137: in __new__
return super().__new__(mcs, name, bases, dict_)
/opt/hostedtoolcache/Python/3.6.13/x64/lib/python3.6/abc.py:133: in __new__
cls = super().__new__(mcls, name, bases, namespace, **kwargs)
E TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
For reference, here is the code that defines the `abstractdataclass` decorator:

from abc import ABCMeta, abstractmethod
from functools import partial

from dataclassy import dataclass
from dataclassy.dataclass import DataClassMeta

class AbstractDataClassMeta(DataClassMeta, ABCMeta):
    pass

abstractdataclass = partial(dataclass, kwargs=True, meta=AbstractDataClassMeta)
It works perfectly fine with Python 3.7+, I think related to this:
https://stackoverflow.com/questions/11276037/resolving-metaclass-conflicts
I encountered this issue while testing dataclassy. From reading the documentation, it looks like I should pass something into the `meta=` parameter of the decorator? I'm not quite sure how to fix this. If this is easily solvable, maybe you could edit the error message to explain what to do.
This was while testing the performance branch, but I suspect it applies to the current main branch as well.
from typing import Any, Generic, Mapping, TypeVar
from dataclassy import dataclass

V = TypeVar("V")

@dataclass
class Edge(Generic[V], Mapping[str, Any]):
    start: V
    end: V
    weight: float
or a simpler version:
from typing import Mapping
from dataclassy import dataclass

@dataclass
class Edge(Mapping[str, float]):
    weight: float
Running this code gives:
Traceback (most recent call last):
File "/Users/tyler.yep/Documents/Github/workshop/explore/ff.py", line 8, in <module>
class Edge( Mapping[str, Any]):
File "/Users/tyler.yep/Documents/Github/dataclassy/dataclassy/decorator.py", line 25, in dataclass
return apply_metaclass(cls)
File "/Users/tyler.yep/Documents/Github/dataclassy/dataclassy/decorator.py", line 20, in apply_metaclass
return metaclass(to_class.__name__, to_class.__bases__, dict_, **options)
File "/Users/tyler.yep/Documents/Github/dataclassy/dataclassy/dataclass.py", line 96, in __new__
return type.__new__(mcs, name, bases, dict_)
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
Thanks!
In the docs, it states that a copy is made of data items provided as defaults. But in core dataclasses, you can also supply a `default_factory`. How would you do this in dataclassy?
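A stdlib-only sketch of the copy-on-init behaviour the docs describe (the `Point` class and hand-rolled `__init__` are illustrative; dataclassy generates this for you), showing why a mutable default written directly usually covers the `default_factory` use case:

```python
import copy

class Point:
    # class-level mutable default, shallow-copied per instance
    # so instances don't share it (mimicking dataclassy's documented behaviour)
    tags: list = []

    def __init__(self):
        self.tags = copy.copy(type(self).tags)

a, b = Point(), Point()
a.tags.append(1)
print(a.tags)  # [1]
print(b.tags)  # []  -- b's default was not mutated
```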
Commit 5a2d1a5, included in release v0.7.3, somehow introduced a large performance penalty to class initialisation in some cases. Testing this function with timeit:
python -m timeit -s "import flint; flint.paths.set_install_path('...')" -n1 -r1 "flint.routines.get_markets()"
The runtime goes from ~1.2 seconds to ~5 seconds.
Will update this as I learn more.
Tagging @hroskes as this is related to a change you suggested.
Classes should support the PEP 622 Structural Pattern Matching feature of Python 3.10. From reading the PEP it seems this should be rather simple: simply add a `__match_args__` tuple which is the same as the arguments to `__init__`.
First of all, great work!
I wondered if the verbosity of the code could be further reduced by allowing the factory function to be used without arguments. In that case, the argument to `factory` should be derived from the type annotation. E.g.

@dataclass(kw_only=True)
class GenerateCodeFn(Entity):
    post_process_source_code_fn: PostProcessSourceCodeFn = factory

Of course, this is only a minor improvement, but still potentially a useful one.
By the way, it seems that the following alternative has a drawback (for me), since VS Code fails to detect the type of `post_process_source_code_fn` and therefore symbol lookup doesn't work.

@dataclass(kw_only=True)
class GenerateCodeFn(Entity):
    post_process_source_code_fn = factory(PostProcessSourceCodeFn)
I want to use slots in CPython using dataclassy, but I also want to be able to hide initializers for certain variables from the user. For example:

@dataclass(slots=True)
class TestDataClass:
    normal: float
    _hidden: str = post_init("Test")
    _post: float = post_init()

    def __post_init__(self):
        self._post = 3.141 * self.normal * self.normal

What I'm doing here is making `_hidden` and `_post` unable to be initialized as part of the given args on construction (meaning you can't set it via `_hidden="hello"`). It's possible to do this kind of thing without slots by just initializing it in `__post_init__`, but with slots you can't without having to define `__slots__` at the top, which looks ugly.
Is it possible to add this functionality? Or am I misreading documentation and this is already possible?
Set up a CI pipeline that runs the unit tests for the supported Python versions (3.6..3.10).
Hi,
Great stuff. Only one question:
Old way:

from dataclasses import dataclass

@dataclass
class Try:
    a: str
    b: str

p = Try()

While typing "Try(" VS Code gives me: (a: str, b: str) -> None as a hint.
When I am using dataclassy this is not presented.
Did I miss something, or is there a simple solution?
Thanks in advance.
Bert
I'm not quite sure where this is happening in my application (to be able to give a MWE), but there appears to be an issue with using this library in Py3.6 (before the inclusion of `dataclasses` into the stdlib):
File "python3.6/site-packages/dataclassy/decorator.py", line 40, in dataclass
return apply_metaclass(cls)
File "python3.6/site-packages/dataclassy/decorator.py", line 35, in apply_metaclass
return metaclass(to_class.__name__, to_class.__bases__, dict_, **options)
File "python3.6/site-packages/dataclassy/dataclass.py", line 120, in __new__
dict_.setdefault('__hash__', generate_hash(all_annotations))
File "python3.6/site-packages/dataclassy/dataclass.py", line 170, in generate_hash
hash_of = ', '.join(['self.__class__', *(f'self.{f}' for f, h in annotations.items() if Hashed.is_hinted(h))])
File "python3.6/site-packages/dataclassy/dataclass.py", line 170, in <genexpr>
hash_of = ', '.join(['self.__class__', *(f'self.{f}' for f, h in annotations.items() if Hashed.is_hinted(h))])
File "python3.6/site-packages/dataclassy/dataclass.py", line 29, in is_hinted
return (hasattr(hint, '__args__') and cls in hint.__args__ or
TypeError: argument of type 'NoneType' is not iterable
I think the breaking assumption is that `hint.__args__` is not `None` in this case.
First of all thanks for a great tool!
For some reason, this raises `TypeError: super(type, obj): obj must be an instance or subtype of type`:

from dataclassy import dataclass

@dataclass
class Dummy:
    def __setattr__(self, name, value):
        # do some checks
        super().__setattr__(name, value)

d = Dummy()
d.x = None
Code speaks more precisely than words:
In [1]: from dataclassy import dataclass
   ...:
   ...: @dataclass(slots=True)
   ...: class Base:
   ...:     foo: int
   ...:
   ...: @dataclass(slots=True)
   ...: class Derived(Base):
   ...:     bar: int
In [2]: Base.__slots__
Out[2]: ('foo',)
In [3]: Derived.__slots__
Out[3]: ()
In [4]: Derived(1, 2)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-24-1d764d7cf19f> in <module>
----> 1 Derived(1, 2)
<string> in __init__(self, foo, bar)
AttributeError: 'Derived' object has no attribute 'bar'
My expectation, if it isn't obvious, is that `Derived.__slots__ == ("bar",)`, just as would be true of any other Python derived class adding additional slots to its parent.
The desired effect can be achieved by omitting the decorator on the derived class:
In [5]: class DerivedNoDecorator(Base):
   ...:     bar: int
In [6]: DerivedNoDecorator.__slots__
Out[6]: ('bar',)
In [7]: DerivedNoDecorator(1,2)
Out[7]: DerivedNoDecorator(foo=1, bar=2)
but this seems non-ideal. If I use the built-in `dataclasses` with `__slots__`, I can decorate the derived class without any issues, so it would be much clearer if `dataclassy` behaved the same way:
In [1]: from dataclasses import dataclass

In [2]: @dataclass
   ...: class Base:
   ...:     __slots__ = ("foo")
   ...:     foo: int

In [3]: @dataclass
   ...: class Derived(Base):
   ...:     __slots__ = ("bar")
   ...:     bar: int
In [4]: Derived.__slots__
Out[4]: 'bar'
In [5]: Derived(1, 2)
Out[5]: Derived(foo=1, bar=2)
Of course, I also want to use default values, so I would much prefer to use `dataclassy`. You'll probably not be surprised to learn that I came here via your SO answer.
I've tried to do some debugging myself, and it looks like for the `Base`/`Derived` example that I began with, we end up in `DataClassMeta.__new__` three times, during the third of which we begin with `dict_["__slots__"] == ("bar",)` and `bases == (Base,)`, `Base` thus providing `foo` to `all_slots`, so at line 110, we set `dict_["__slots__"] = ()`. I literally learned what a metaclass is today in order to try to figure this out, so I might be off the mark, but I think the problem is that `DataClassMeta.__new__` is invoked once by the `dataclass` decorator, and then once again going up the chain from `Derived` -> `Base` -> `DataClassMeta` (-> `type`), which sees the result of the first invocation and decides it doesn't need to add any new slots, only to have the result of the first invocation lost in the final result. Does that sound like a reasonable version of events?
This is a lot more nuts and bolts of Python than I normally dig into, so I can't say I know what to do about it, but perhaps there's an obvious or at least feasible solution here that you can illuminate. Thanks in advance.
EDIT: I just finished the docs, and I realized that you tout the need to only apply the decorator to base classes as a feature. I think I'm inclined to agree, but it probably also explains why this issue wasn't considered during design. I'll probably go ahead and remove the decorator from subclasses in my project for now, but I still believe that this should be fixed to more closely mimic the `dataclasses` API.
This issue is similar to #11, but it happens on circular references of dataclass objects rather than a single dataclass with a property that references itself.
Here's how to reproduce infinite recursion in dataclassy's `__repr__` method with `dataclassy==0.10.2` installed from PyPI:
from dataclassy import dataclass
from typing import Optional

@dataclass
class A:
    b: Optional["B"] = None

@dataclass
class B:
    c: Optional["C"] = None

@dataclass
class C:
    a: Optional[A] = None

a = A()
b = B()
c = C(a=a)
a.b, b.c = b, c
print(repr(a))
Causes this traceback:
...
File "dataclassy/dataclass.py", line 231, in __repr__
field_values = ', '.join(f'{f}=...' if v is self
File "dataclassy/dataclass.py", line 232, in <genexpr>
else f'{f}={v!r}' for f, v in values(self, show_internals).items())
File "dataclassy/dataclass.py", line 232, in __repr__
else f'{f}={v!r}' for f, v in values(self, show_internals).items())
File "dataclassy/functions.py", line 37, in values
return {f: getattr(dataclass, f) for f in fields(dataclass, internals)}
File "dataclassy/functions.py", line 28, in fields
assert is_dataclass(dataclass)
File "dataclassy/functions.py", line 16, in is_dataclass
return isinstance(obj, DataClassMeta) or is_dataclass_instance(obj)
File "dataclassy/functions.py", line 21, in is_dataclass_instance
return isinstance(type(obj), DataClassMeta)
RecursionError: maximum recursion depth exceeded while calling a Python object
This was fixed in this commit: 0246588
Then reverted in this commit: 73ebd10
If you add `recursive_repr` back then the issue goes away.
First of all, love dataclassy, it solves all of my problems with dataclasses/namedtuples and the like wonderfully :)
Only problem is the lack of checking with mypy! The following demonstrates the issue:

from dataclassy import dataclass

@dataclass
class Thing:
    prop: int

x = Thing(1)
$ mypy --strict foo.py
foo.py:1: error: Skipping analyzing 'dataclassy': found module but no type hints or library stubs
foo.py:1: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports
foo.py:7: error: Too many arguments for "Thing"
Found 1 error in 1 file (checked 1 source file)
Am not sure what the solution is here. Can this be solved by simply adding a `dataclassy.pyi` stub file and a `py.typed` marker to the module?
I tried stubgen, which at least got the arguments for the decorator...
The following example raises a TypeError when using `dataclassy.dataclass`, while it works fine with `dataclasses.dataclass`.

from dataclassy import dataclass
from dataclasses import InitVar
from typing import Any

@dataclass
class Foo:
    initvar: InitVar[Any]

    def __post_init__(self, initvar):
        pass

foo = Foo(initvar="foo")  # TypeError: Foo.__post_init__() missing 1 required positional argument: 'initvar'
Is this intended behavior?
from dataclassy import dataclass

@dataclass
class Foo:
    bar = []
>>> a = Foo()
>>> b = Foo()
>>> a.bar.append(1)
>>> b.bar
[1]
I get the same results on the last two stable releases (`0.7.0` and `0.6.2`). If I'm reading correctly, the README says it's not intended:
A shallow copy will be created for mutable arguments (defined as those defining a copy method). This means that default field values that are mutable (e.g. a list) will not be mutated between instances.
Hi,
Having a rather subtle issue with properties in inherited classes more than 2 levels deep:

from dataclassy import dataclass

@dataclass(slots=True)
class Foo(object):
    foo_prop: str = "foo_prop_value"

class Bar(Foo):
    bar_prop: str = "bar_prop_value"

class FooBar(Bar):
    pass

get_foo_prop = Foo.foo_prop
get_bar_prop = Bar.bar_prop

print("get bar prop from Bar: ", get_bar_prop.__get__(Bar()))
print("get bar prop from FooBar: ", get_bar_prop.__get__(FooBar()))
print()
print("get foo prop from Foo: ", getattr(Foo(), get_foo_prop.__name__))
print("get foo prop from Bar: ", getattr(Bar(), get_foo_prop.__name__))
print("get foo prop from FooBar: ", getattr(FooBar(), get_foo_prop.__name__))
print()
print("get foo prop from Foo: ", get_foo_prop.__get__(Foo()))
print("get foo prop from Bar: ", get_foo_prop.__get__(Bar()))
print("get foo prop from FooBar: ", get_foo_prop.__get__(FooBar()))
output:
get bar prop from Bar: bar_prop_value
get bar prop from FooBar: bar_prop_value
get foo prop from Foo: foo_prop_value
get foo prop from Bar: foo_prop_value
get foo prop from FooBar: foo_prop_value
get foo prop from Foo: foo_prop_value
get foo prop from Bar: foo_prop_value
Traceback (most recent call last):
File "test.py", line 29, in <module>
print("get foo prop from FooBar: ", get_foo_prop.__get__(FooBar()))
AttributeError: foo_prop
It's odd that __get__ causes an AttributeError on FooBar (two levels of inheritance) but getattr works fine. I'd expect both to work.
Worth noting this also breaks dataclasses, but seems like something we should fix.
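One plausible mechanism (an assumption, not verified against dataclassy's internals) is slot shadowing: if a subclass re-declares a slot name its parent already defines, the parent's descriptor points at a storage slot that the shadowing descriptor never fills. A pure-Python sketch:

```python
class A:
    __slots__ = ('x',)

class B(A):
    __slots__ = ('x',)  # re-declares the slot: B gets its own descriptor and storage

b = B()
b.x = 1                  # attribute lookup finds B's descriptor first
print(B.x.__get__(b))    # 1
try:
    A.x.__get__(b)       # A's descriptor reads A's storage slot, which was never set
except AttributeError as err:
    print('AttributeError:', err)
```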
>>> from typing import Optional
>>> import dataclassy as dataclass
>>> @dataclass.dataclass
... class Node:
... value: str
... linked_node: Optional[Node] = None
...
>>> n = Node("n")
>>> n.linked_node = n
>>> n
---------------------------------------------------------------------------
RecursionError Traceback (most recent call last)
...
RecursionError: maximum recursion depth exceeded while calling a Python object
A stdlib tool to avoid this is reprlib.recursive_repr.
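As a sketch of that suggestion, here is recursive_repr on a hand-written class (not dataclassy's generated __repr__); on re-entry it substitutes '...' instead of recursing:

```python
from reprlib import recursive_repr

class Node:
    def __init__(self, value, linked_node=None):
        self.value = value
        self.linked_node = linked_node

    @recursive_repr()  # substitutes '...' when __repr__ re-enters itself
    def __repr__(self):
        return f'Node(value={self.value!r}, linked_node={self.linked_node!r})'

n = Node('n')
n.linked_node = n
print(repr(n))  # Node(value='n', linked_node=...)
```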
Multiple inheritance works, but we need tests for this.
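A sketch of the kind of test that could cover this, written here against stdlib dataclasses so it runs anywhere; a dataclassy test would swap the import:

```python
from dataclasses import dataclass

@dataclass
class A:
    a: int = 1

@dataclass
class B:
    b: int = 2

@dataclass
class C(A, B):
    c: int = 3

# fields from both bases plus the subclass should all be settable
c = C(a=10, b=20, c=30)
assert (c.a, c.b, c.c) == (10, 20, 30)
assert (C().a, C().b, C().c) == (1, 2, 3)  # defaults survive multiple inheritance
```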
As of version 0.7.0 it looks like inheritance no longer works the way it used to. I assume this is a bug as there was no breaking change documented, but lmk if I'm wrong.
Basically this used to work:
@dataclass
class Derps(object):
    name: str = ""

class Terps(Derps):
    _test: int = 0

    def __init__(self, test: int):
        self._test = test

Terps(test=3)
But as of version 0.7.0 it throws:
Terps(test=3)
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: __init__() got an unexpected keyword argument 'test'
I suspect this is in relation to the post_init changes. If I get time tomorrow/on the weekend I'll have a look, but you may have a better idea than me :)
Great work on this lib so far. Much more intuitive than dataclasses and even solves some bugs (e.g in dataclasses you can't define a post_init parameter with the same name as a method on the class).
The only issue I've run into so far is that if you specify init=False but you have a custom __init__, you can't create your class with positional arguments, and you have to always specify keyword arguments.
Seems to me that we should just pass all arguments straight through to the user-defined __init__ when init=False, and just call __new__ with no args.
I've done a PR (#9) that fixes this issue I think without breaking any existing usage, but keen to get your thoughts.
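A hypothetical sketch of that proposal in plain Python (not dataclassy's actual metaclass): __call__ creates the instance with a bare __new__ and forwards every argument, positional or keyword, to the user-defined __init__:

```python
class PassThroughMeta(type):
    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls)           # __new__ called with no extra args
        obj.__init__(*args, **kwargs)    # all arguments go straight through
        return obj

class Point(metaclass=PassThroughMeta):
    def __init__(self, x, y=0):
        self.x, self.y = x, y

p = Point(1, 2)   # positional arguments work with the custom __init__
print(p.x, p.y)   # 1 2
```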
I'm comparing this package to the standard dataclasses. In this case, the __init__ method will only accept the wheels argument:
from dataclasses import dataclass, field

@dataclass(eq=False)
class Vehicle:
    wheels: int
    _wheels: int = field(init=False, repr=False)
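The stdlib version can be exercised directly; a default=0 is added here (my addition, not in the snippet above) so _wheels has a value even though __init__ never sets it:

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Vehicle:
    wheels: int
    _wheels: int = field(init=False, repr=False, default=0)

v = Vehicle(4)        # only 'wheels' is an __init__ parameter
print(v.wheels)       # 4
try:
    Vehicle(4, 2)     # passing a second positional argument is rejected
except TypeError as err:
    print('TypeError:', err)
```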
However, the equivalent with dataclassy:
from dataclassy import dataclass

@dataclass(eq=False)
class Vehicle:
    wheels: int
    _wheels: int = None
will accept both wheels and _wheels as arguments. Is this intended?
Per the docs (emphasis mine),
If true (the default), generate an eq method that compares this data class to another of the same type as if they were tuples created by as_tuple.
However, it appears that internals are ignored, just as they are in __repr__ when hide_internals=True:
In [1]: from dataclassy import dataclass, as_tuple
In [2]: from typing import Optional
In [3]: @dataclass
...: class Foo():
...: _private: Optional[int] = None
...:
In [4]: Foo() == Foo(5)
Out[4]: True
In [5]: as_tuple(Foo()) == as_tuple(Foo(5))
Out[5]: False
In [6]: @dataclass
...: class Bar():
...: public: Optional[int] = None
...:
In [7]: Bar() == Bar(5)
Out[7]: False
This may have been intentional, but it is also surprising, given that
In [1]: from dataclasses import dataclass
In [2]: from typing import Optional
In [3]: @dataclass
...: class Foo():
...: _private: Optional[int] = None
...:
In [4]: Foo() == Foo(5)
Out[4]: False
and especially given that I can do this:
In [1]: from dataclassy import dataclass
In [2]: from typing import Optional
In [3]: @dataclass(hide_internals=False)
...: class Baz():
...: _private: Optional[int] = None
...:
In [4]: Baz() == Baz(5)
Out[4]: True
In [5]: repr(Baz()) == repr(Baz(5))
Out[5]: False
which just feels inconsistent to me (__repr__ may choose to hide some internal state, so (a == b) == False but (repr(a) == repr(b)) == True seems fine, but not the inverse, as above).
As for the why: it appears that at dataclass.py:147, inside DataClassMeta.__init__, you decline to pass internals to fields, so the default of False is used when creating the expression for __tuple__, which is of course what is used inside __eq__.
tuple_expr = ', '.join((*(f'self.{f}' for f in fields(cls)), '')) # '' ensures closing comma
cls.__tuple__ = property(eval(f'lambda self: ({tuple_expr})'))
def fields(dataclass: Type[DataClass], internals=False) -> Dict[str, Type]:
def __eq__(self: DataClass, other: DataClass):
return type(self) is type(other) and self.__tuple__ == other.__tuple__
However, as_tuple has no concept of internals and in general functions quite differently.
I see two ways to fix this:
1. Pass internals=True to fields at dataclass.py:147, or make it configurable with a decorator param, something like consider_internals_eq=True. I think this should be True by default so as not to be obscurely inconsistent with dataclasses. I don't think this should be tied to hide_internals, since one may not want internals returned from __repr__ but still want them considered in __eq__.
2. Use as_tuple in __eq__. This seems right because it unifies two things that seem intuitively the same. However, your comments mention "efficient representation for internal methods" re: __tuple__. I'm not sure offhand how much more efficient evaluating a static expression is vs. recursing through a structure, but if the difference is significant, maybe unification isn't the way to go.

Please let me know which you prefer and why.
First of all, thanks for this great library. I like how it makes data classes easier than the built-in dataclasses, and especially the support for __slots__.
While switching my framework to dataclassy, I've hit one problem that I cannot express in code properly: how can I declare pseudo-positional InitVars?
Here is the equivalent code for dataclasses:
import dataclasses

@dataclasses.dataclass()
class Selector:
    arg1: dataclasses.InitVar[Union[None, str, Marker]] = None
    arg2: dataclasses.InitVar[Union[None, str, Marker]] = None
    arg3: dataclasses.InitVar[Union[None, str, Marker]] = None
    argN: dataclasses.InitVar[None] = None  # a runtime guard against too many positional arguments
    group: Optional[str] = None
    version: Optional[str] = None
    plural: Optional[str] = None
    # ... more things here

    def __post_init__(
            self,
            arg1: Union[None, str, Marker],
            arg2: Union[None, str, Marker],
            arg3: Union[None, str, Marker],
            argN: None,  # a runtime guard against too many positional arguments
    ) -> None:
        ...
The supposed use-case is:
# All notations are equivalent and must create exactly the same objects:
CRDS = Selector('apiextensions.k8s.io', 'customresourcedefinitions')
CRDS = Selector('apiextensions.k8s.io', plural='customresourcedefinitions')
CRDS = Selector('apiextensions.k8s.io', None, plural='customresourcedefinitions')
CRDS = Selector('apiextensions.k8s.io', version=None, plural='customresourcedefinitions')
CRDS = Selector(group='apiextensions.k8s.io', version=None, plural='customresourcedefinitions')
I.e., it is either explicitly specifying the kwargs to be stored on the data class, or passing them as positional (pseudo-positional) init-vars. The positional init-vars arg1..arg3 are then interpreted in the post-init method and stored in the group/version/plural/etc. fields as seems appropriate. In some cases, it might even parse and split a positional init-arg into several fields: e.g. Selector('apiextensions.k8s.io/v1') would be split into Selector(group='apiextensions.k8s.io', version='v1').
The details are not essential; the only essential part is that the post-init method contains some logic for converting these pseudo-positional arg1..argN into the actual, useful, storable fields.
For the full example:
When I try to do this the dataclassy way and remove the InitVar[] declarations, the positional arguments go to the first fields: e.g. group & version for the first example line, while the intention is to interpret them as group & plural (as would be implemented in the post-init function), which expectedly gives wrong results.
If I keep the arg1..argN fields at the top of the list of fields, they are accepted as needed, but they are stored on the object as same-named fields. I can make them internal and hide them from reprs, but I would prefer not to store them at all and keep them init-only.
What would be the best way to implement the positional init-only variables with dataclassy?
Thank you in advance.
from dataclassy import dataclass

@dataclass
class Base:
    a: int = 0

@dataclass
class Inherit(Base):
    b: int = 0

    def __post_init__(self) -> None:
        pass

inst = Inherit()
Traceback (most recent call last):
File "test.py", line 17, in <module>
inst = Inherit()
File "<string>", line 4, in __init__
File "<string>", line 4, in __init__
File "<string>", line 4, in __init__
[Previous line repeated 995 more times]
RecursionError: maximum recursion depth exceeded