biqqles / dataclassy
A fast and flexible reimplementation of data classes
Home Page: https://pypi.org/project/dataclassy
License: Mozilla Public License 2.0
`inspect.Signature` is irritating to use, and so we convert it to a string to check function signatures against their expected value. The problem with this is that Python periodically changes the behaviour of `Signature.__str__`. This results in spurious test failures (the tests are currently set up for the format produced in 3.7..3.9).
This blocks #41.
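One way to sidestep the formatting churn is to compare structured `Signature` properties instead of its string form. A minimal sketch (not dataclassy's actual test code):

```python
import inspect

def f(a, b=1, *, c): ...

sig = inspect.signature(f)
# str(sig) is fragile: its exact format has changed between Python releases.
# Comparing the structured parameter data is stable across versions:
expected = [
    ("a", inspect.Parameter.POSITIONAL_OR_KEYWORD),
    ("b", inspect.Parameter.POSITIONAL_OR_KEYWORD),
    ("c", inspect.Parameter.KEYWORD_ONLY),
]
actual = [(p.name, p.kind) for p in sig.parameters.values()]
print(actual == expected)  # True
```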
We ran into this mypy typing error below when upgrading from 0.10.3 to 0.11.0.
Here are the steps to reproduce:
x.py:
import dataclassy

@dataclassy.dataclass(slots=True, kwargs=True, repr=True)
class Foo:
    bar: str = "bar"

if __name__ == "__main__":
    x = Foo()
    print(x.bar)
$ python3 -mvenv ~/env3
$ ~/env3/bin/pip install mypy dataclassy==0.10.3
Collecting mypy
Using cached mypy-0.910-cp38-cp38-manylinux2010_x86_64.whl (22.8 MB)
Collecting dataclassy==0.10.3
Using cached dataclassy-0.10.3-py3-none-any.whl (23 kB)
Collecting typing-extensions>=3.7.4
Using cached typing_extensions-3.10.0.2-py3-none-any.whl (26 kB)
Collecting toml
Using cached toml-0.10.2-py2.py3-none-any.whl (16 kB)
Collecting mypy-extensions<0.5.0,>=0.4.3
Using cached mypy_extensions-0.4.3-py2.py3-none-any.whl (4.5 kB)
Installing collected packages: typing-extensions, toml, mypy-extensions, mypy, dataclassy
Successfully installed dataclassy-0.10.3 mypy-0.910 mypy-extensions-0.4.3 toml-0.10.2 typing-extensions-3.10.0.2
$ ~/env3/bin/python x.py
bar
$ ~/env3/bin/mypy x.py
Success: no issues found in 1 source file
$ ~/env3/bin/pip install --upgrade dataclassy==0.11.0
Collecting dataclassy==0.11.0
Using cached dataclassy-0.11.0-py3-none-any.whl (23 kB)
Installing collected packages: dataclassy
Attempting uninstall: dataclassy
Found existing installation: dataclassy 0.10.3
Uninstalling dataclassy-0.10.3:
Successfully uninstalled dataclassy-0.10.3
Successfully installed dataclassy-0.11.0
$ ~/env3/bin/python x.py
bar
$ ~/env3/bin/mypy x.py
x.py:4: error: Cannot instantiate type "Type[<nothing>]"
Found 1 error in 1 file (checked 1 source file)
`DataClassMeta` currently collects the contents of the inherited classes' dicts and passes this as the dictionary of the new class to `type.__new__`. This can prevent `type.__new__` from doing method inheritance "correctly" and is the reason this test fails.
Following on from the example in #6, this currently works: `s = State(7, path=1)`. `s.path` is `1`, but that's not obvious, and it would normally cause `TypeError: __new__() got multiple values for argument 'path'`, as it does if `__init__` is not defined.
The solution to this could be making all the arguments in `__new__` keyword-only, then in `__call__` converting positional arguments destined for it into keyword arguments. Removing `*args` from `__new__` means that duplicated positional arguments will not be silently passed back to `__call__` and will instead raise the normal exception. Since `__new__` is (normally) never called by the user directly, this would also be fully backwards compatible.
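A stdlib-only sketch of the idea (the names `FieldsMeta` and this simplified `State` are illustrative, not dataclassy's internals): the metaclass's `__call__` zips positional arguments with the declared field names, so a duplicated argument raises the normal TypeError instead of being silently accepted.

```python
class FieldsMeta(type):
    def __call__(cls, *args, **kwargs):
        fields = list(cls.__annotations__)        # declared field order
        as_kwargs = dict(zip(fields, args))       # positional -> keyword
        overlap = as_kwargs.keys() & kwargs.keys()
        if overlap:
            # the duplicate now surfaces as the usual exception
            raise TypeError(f"__new__() got multiple values "
                            f"for argument {sorted(overlap)[0]!r}")
        return super().__call__(**as_kwargs, **kwargs)

class State(metaclass=FieldsMeta):
    path: int = 1
    def __init__(self, **kwargs):                 # now effectively keyword-only
        for name, value in kwargs.items():
            setattr(self, name, value)

print(State(7).path)   # 7
try:
    State(7, path=1)
except TypeError as e:
    print(e)           # __new__() got multiple values for argument 'path'
```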
I just found dataclassy and it looks really useful! I wanted to test it together with https://github.com/s-knibbs/dataclasses-jsonschema which I'm using in one of my projects, but got this:
>>> from dataclassy import dataclass
>>> from dataclasses_jsonschema import JsonSchemaMixin
>>> @dataclass
... class Item(JsonSchemaMixin):
...     id: str = ""
...
>>> it=Item()
>>> it.to_json()
Traceback (most recent call last):
File "/usr/lib/python3.8/dataclasses.py", line 1031, in fields
fields = getattr(class_or_instance, _FIELDS)
AttributeError: type object 'Item' has no attribute '__dataclass_fields__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/zdenal/arcor2_repos/arcor2/build-support/.venv/lib/python3.8/site-packages/dataclasses_jsonschema/__init__.py", line 861, in to_json
return json.dumps(self.to_dict(omit_none, validate), **json_kwargs)
File "/home/zdenal/arcor2_repos/arcor2/build-support/.venv/lib/python3.8/site-packages/dataclasses_jsonschema/__init__.py", line 389, in to_dict
for f in self._get_fields():
File "/home/zdenal/arcor2_repos/arcor2/build-support/.venv/lib/python3.8/site-packages/dataclasses_jsonschema/__init__.py", line 380, in _get_fields
cls.__mapped_fields = _get_fields_uncached()
File "/home/zdenal/arcor2_repos/arcor2/build-support/.venv/lib/python3.8/site-packages/dataclasses_jsonschema/__init__.py", line 353, in _get_fields_uncached
for f in fields(cls):
File "/usr/lib/python3.8/dataclasses.py", line 1033, in fields
raise TypeError('must be called with a dataclass type or instance')
TypeError: must be called with a dataclass type or instance
Would it be possible to add/emulate `__dataclass_fields__` to make these two compatible?
Is there a way to add metadata to the fields? For instance, with the built-in `dataclasses` module, I use

title: Optional[str] = field(default=None, metadata={"sa": Column(String)})
url: Optional[str] = field(default=None, metadata={"sa": Column(String)})

The `metadata` dict allows SQLAlchemy to read the generated dataclass's annotations and map it automatically. Dataclassy doesn't seem to have the `field` generator function, so is it possible to attach metadata?
How do I combine dataclassy's dataclass with dataclass-property's dataclass, or marshmallow_dataclass with marshmallow.validate? I need subclassing as well as validation. I think there's no other way than to write normal classes with property getters and setters. All my dataclasses have one ABC parent.
I have existing code that uses dataclasses with `__post_init__`. It would be great if dataclassy would allow me to continue using this function without having to refactor all of my old dataclasses.
`__post_init__` has the added benefit of allowing me to skip redefining the parameter names a second time, e.g.
@dataclass
class SolverState:
    possible: Tuple[FrozenSet[str], ...] = ()
    switches: Tuple[str, ...] = ()
    path: Tuple[bool, ...] = ()
    roles: Dict[str, int] = field(default_factory=dict)
    count_true: int = -1

    def __post_init__(self) -> None:
        if not self.possible:
            self.possible = (5,)
        if self.count_true == -1:
            self.count_true = self.path.count(True)
        if not self.roles:
            self.roles = self._roles_init()
@dataclass(slots=True)
class SolverState:
    possible: Tuple[FrozenSet[str], ...] = ()
    switches: Tuple[str, ...] = ()
    path: Tuple[bool, ...] = ()
    roles: Dict[str, int] = {}
    count_true: int = -1

    def __init__(
        self,
        possible: Tuple[FrozenSet[str], ...] = (),
        switches: Tuple[str, ...] = (),
        path: Tuple[bool, ...] = (),
        roles: Dict[str, int] = {},
        count_true: int = -1,
    ) -> None:
        if not self.possible:
            self.possible = (5,)
        if self.count_true == -1:
            self.count_true = self.path.count(True)
        if not self.roles:
            self.roles = self._roles_init()
Back when dataclassy used `__new__` to set class fields (pre-v0.7), `__init__` was left free for the user to use for post-initialisation logic. This worked well, but since v0.7, dataclassy has used `__init__` for its own purposes. To retain compatibility, an `__init__` method defined on a class where `init=True` is aliased to `__post_init__`. This is rather confusing and raises a lot of uncertain edge cases: for example, what happens when both `__init__` and `__post_init__` are defined? In addition, strictly speaking these theoretical subtle differences in behaviour mean that test cases should test the same scenarios for both method names.
Finally, judging by the code examples in the issues here, no one really uses it. Therefore I now think this compatibility should be removed. I expect this to be the last breaking change before a stable release (and there haven't been many!).
My plan is to:
- add a `DeprecationWarning` in the next release (v0.9)
- make `__init__` equal to `__post_init__` if the former is not generated (`init=False` or no fields) and the latter is defined. This is the only reason not to use `__post_init__` currently, and fixes another bug in dataclasses.

The current `__hash__` implementation is not good. It simply filters the fields of the instance it's being applied to, ignores fields that aren't hashable and creates a tuple which it then hashes. This is slow and also provides no warning if a field that should be hashable is not, potentially causing hash collisions. The fact that no one has complained so far is likely testament to the fact that it's almost always better to write a `__hash__` manually anyway. But this should be improved.
There are at least two ways to do this:
- Use `typing.get_type_hints` at class initialisation time, to work out which fields are hashable. This is slow and wouldn't give the flexibility of being able to determine which fields to include in the hash.
- Add a type hint similar to `Internal`, called `Hashed`, which would include the field in the hash. I think this is better. The only caveat is that I'm pretty sure this breaks tools like mypy. Maybe there is a way to tell it to treat the "wrapped" type as the significant one, like with `Optional`.
Still debugging, but it appears to be caused by this line in this commit (which is present starting in v0.10.1):
0f34238#diff-daf1e1a0a18a32d8c50faa743257d9e130ac95adeab1fddb2932525d147c7520R80
Error:
ImportError while loading conftest '~/work/ApeWorX/ape/tests/conftest.py'.
...
src/ape_accounts/accounts.py:34: in <module>
class KeyfileAccount(AccountAPI):
~/.venv/python3.8/site-packages/dataclassy/dataclass.py:80: in __new__
all_attrs = {a for b in bases for a in dir(b) if is_user_func(getattr(b, a))} | dict_.keys()
~/.venv/python3.8/site-packages/dataclassy/dataclass.py:80: in <setcomp>
all_attrs = {a for b in bases for a in dir(b) if is_user_func(getattr(b, a))} | dict_.keys()
E AttributeError: __abstractmethods__
I'm probably using it incorrectly; I would be open to a better way to create subclassable `dataclass` classes with abstract methods. It's a pretty integral part of how we're using it for our plugin system.
For reference, `AccountAPI` is a `dataclassy.dataclass` using the `abc.ABCMeta` abstract metaclass and defining some `abc.abstractmethod`s. You can check it out here:
https://github.com/ApeWorX/ape/blob/687032e0b122a46fc73184fd7a0bea18e7b1a383/src/ape/api/accounts.py#L62
Is there a proposed way to define a dataclass with a custom metaclass? It seems like it would have to inherit from DataClassMeta and the signature of apply_metaclass takes the metaclass as a kwarg but the dataclass decorator doesn't pass along an option to override that. Currently, this is ignored:
@dataclass
class DataClass(metaclass=CustomMetaClass):
    prop: str
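The usual rule behind this is that a class must be created with a metaclass deriving from the metaclasses of all its bases. A stdlib-only sketch of the combined-metaclass pattern, where `DataMeta` stands in for dataclassy's `DataClassMeta`:

```python
class DataMeta(type):          # stand-in for DataClassMeta
    pass

class CustomMetaClass(type):
    pass

class CombinedMeta(DataMeta, CustomMetaClass):
    pass

class Base(metaclass=CustomMetaClass):
    pass

# Creating the class through the combined metaclass satisfies both hierarchies:
DataClass = CombinedMeta("DataClass", (Base,), {"__annotations__": {"prop": str}})
print(type(DataClass).__name__)              # CombinedMeta
print(isinstance(DataClass, CustomMetaClass))  # True
```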
This project looks great!
I've been heavily using dacite - in particular, the from_dict
method to initialise a nested data class from a nested dictionary.
Have you come across this library? What would be required for these to work together?
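For context, a minimal stdlib-only sketch of what a `from_dict`-style helper does (using standard `dataclasses` as a stand-in for dataclassy classes; `from_dict` here is a hand-rolled illustration, not dacite's implementation):

```python
from dataclasses import dataclass
from typing import get_type_hints

@dataclass
class Inner:
    x: int

@dataclass
class Outer:
    inner: Inner
    name: str

def from_dict(cls, data):
    """Recursively build cls from a nested dict, matching keys to annotated fields."""
    hints = get_type_hints(cls)
    kwargs = {}
    for field, hint in hints.items():
        value = data[field]
        # Recurse when the target field is itself an annotated class and the value is a dict
        if isinstance(value, dict) and hasattr(hint, "__annotations__"):
            value = from_dict(hint, value)
        kwargs[field] = value
    return cls(**kwargs)

o = from_dict(Outer, {"inner": {"x": 1}, "name": "a"})
print(o)  # Outer(inner=Inner(x=1), name='a')
```

The main requirement for dataclassy support would be that the library can discover field names and types from the class, which dacite currently does via `dataclasses.fields`.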
Thanks for creating dataclassy! In terms of features it does everything I want, but I'm also very sensitive to performance since I'm working on a project that constructs hundreds of thousands of small objects in a tight loop.
I've observed roughly a 33% slowdown for small object creation between dataclasses and dataclassy in CPython 3.9.0 (benchmark below). Do you have any thoughts on this? How much of a priority is performance for dataclassy? Would you welcome a PR that closed the performance gap at the cost of significantly longer or more complex code?
from timeit import timeit
import dataclasses
import dataclassy

@dataclasses.dataclass
class Foo1:
    __slots__ = 'x', 'y'
    x: int
    y: int

@dataclassy.dataclass(slots=True)
class Foo2:
    x: int
    y: int

def f1():
    Foo1(1, 2)

def f2():
    Foo2(1, 2)

print(timeit(f1, number=1000000))
print(timeit(f2, number=1000000))
Simple example
from dataclassy import dataclass, as_dict

@dataclass
class MyClass:
    _my_internal_tracking_field: str = "Parent Class"
    my_other_field: str
>>> my_instance = MyClass(my_other_field="test")
>>> my_instance.__repr__
<bound method __repr__ of MyClass(my_other_field='test')>
>>> as_dict(my_instance)
{'_my_internal_tracking_field': 'Parent Class', 'my_other_field': 'test'}
From the README, `hide_internals` is `True` by default but seems to only apply to:

__repr__ and excludes them from comparison and iteration

Is there any case where we would definitely want to expose internal fields in the dictionary representation?
I see `internals=True` by default here in functions.py; maybe it should pass the `hide_internals` option instead of setting `True` automatically.
https://github.com/biqqles/dataclassy/blob/master/dataclassy/functions.py#L65-L69
I don't mind making this change but just want to see if I am missing any use cases.
As suggested in the README, the non-copyable fields are populated in the constructor:
import asyncio
import dataclassy

@dataclassy.dataclass(slots=True)
class C:
    cnd: asyncio.Condition = None
    evt: asyncio.Event = None

    def __init__(self) -> None:
        self.cnd = asyncio.Condition()
        self.evt = asyncio.Event()

c = C()
print(repr(c.cnd))
print(repr(c.evt))
In that case, the strict mypy typing fails:
$ python --version
Python 3.9.1
$ mypy --version
mypy 0.800
$ mypy --strict _dcy_late_init.py
_dcy_late_init.py:6: error: Incompatible types in assignment (expression has type "None", variable has type "Condition")
_dcy_late_init.py:7: error: Incompatible types in assignment (expression has type "None", variable has type "Event")
_dcy_late_init.py:10: error: Property "cnd" defined in "C" is read-only
_dcy_late_init.py:11: error: Property "evt" defined in "C" is read-only
# mypy.ini
[mypy]
warn_unused_configs = True
ignore_missing_imports = True
plugins = dataclassy.mypy
`dataclassy` is installed from git as of 1273387.
Actually, two issues at once:
- Assigning `None` to a non-optional field. Making it `Optional[...]` is not desired, as it requires changing the logic all around the code (or mypy will complain on accessing fields of `None` all around).
- Is there any other way of initialising the non-copyable objects (a functional equivalent of dataclasses' `default_factory`)?
I would suggest auto-instantiating the objects if the default value is a class. The cases when a class is actually a value to be stored can be detected by having a field annotation `Type[...]`; in that case, no instantiation should happen. But I am not sure if this covers all the use-cases and does not break anything.
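A stdlib-only sketch of the proposed rule (the `resolve_default` helper and `Event` stand-in are hypothetical, not dataclassy code): instantiate class defaults, unless the field annotation is `Type[...]`, which signals that the class itself is the stored value.

```python
import inspect
from typing import Type

def resolve_default(annotation, default):
    """Instantiate class defaults unless the field is annotated Type[...]."""
    # Type[X] has __origin__ == type; plain class annotations have no __origin__
    is_type_field = getattr(annotation, "__origin__", None) is type
    if inspect.isclass(default) and not is_type_field:
        return default()
    return default

class Event:  # stand-in for e.g. asyncio.Event
    pass

print(isinstance(resolve_default(Event, Event), Event))  # True: instantiated
print(resolve_default(Type[Event], Event) is Event)      # True: kept as the class
```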
So it's that time of year when I finally have a bit more time to experiment with my projects. I've been thinking about the uninspiring performance when (and only when) a custom `__init__` is used, as @TylerYep pointed out in #6 (comment). This arises because the current `__call__` implementation is generic and dynamically modifies the parameters to either `__new__` or `__init__`; what would be really good is to generate a static `__call__` that does this with no overhead. To keep performance good and code simple, this method would both initialise the instance with its parameters and redirect any additional ones to `__init__` if it's defined. However, because an object's `__call__` method has to be defined on its type (i.e. in the case of a class, its metaclass), this means we have to dynamically create a subclass of the metaclass at the time of decorator use.
I've implemented this, and though it works surprisingly well in my other tests, the fact that it breaks multiple inheritance (highlighted below) means it's nowhere near ready for rolling out in releases yet. Despite this, I'm curious to see how well it works in the code of others, if you would like to test it. The code is in the branch `static-call`.
Advantages:
- Only one method is generated (`__call__`), not `__call__`, `__new__` and `__signature__`
- Using `__new__` could be considered hacky as it's not "supposed" to be used to initialise the class instance

Disadvantages:
- You can no longer call `__new__` on a data class to instantiate it without `__init__` being executed. I'm not sure if this is actually useful (or a good idea to do), but it is a feature nonetheless
- Assume `B` is a subclass of `A`. With this method, `type(A) is not type(B)`, which is unusual and surprising. Only `issubclass(type(B), type(A))` is true. However, Python supports this "dynamic metaclass" paradigm fine. Besides looking odd (and how often do you compare the types of classes, really!) I've found no side effects other than...
- If you use multiple inheritance, e.g. `class C(A, B)`, now you have to do something like:

class CMeta(type(A), type(B)):
    pass

@dataclass(meta=CMeta)
class C:
    ...
Hi again! I'm getting the following error (simplified for your convenience):
@dataclass
class State:
    path: Tuple[int, ...] = 1

    def __init__(self):
        pass

raises_error = State(7)
The error message:
cls = <class 'tests.State'>, args = (7,), kwargs = {}, instance = State(path=1)
def __call__(cls, *args, **kwargs):
instance = cls.__new__(cls, *args, **kwargs)
args = args[cls.__new__.__code__.co_argcount - 1:] # -1 for 'cls'
for parameter in kwargs.keys() & cls.__annotations__.keys():
del kwargs[parameter]
> instance.__init__(*args, **kwargs)
E TypeError: __init__() takes 1 positional argument but 2 were given
dataclassy/dataclass.py:111: TypeError
The code works fine when I add the keyword (`State(path=7)`), remove the default, or remove the `__init__(self)` method.
I found a bug when I attempt to mix metaclasses with `dataclassy`. Here is what I was doing:

@abstractdataclass
class ConverterAPI(Generic[ABIType]):
    ...

    @abstractmethod
    def convert(self, value: str) -> ABIType:
        ...

`Generic` has nothing to do with dataclassy's mechanics, but I could see how the metaclass might interfere.
This is the error it raises in Python 3.6 (from my CI):
...
class ConverterAPI(Generic[ABIType]):
/opt/hostedtoolcache/Python/3.6.13/x64/lib/python3.6/site-packages/dataclassy/decorator.py:40: in dataclass
return apply_metaclass(cls)
/opt/hostedtoolcache/Python/3.6.13/x64/lib/python3.6/site-packages/dataclassy/decorator.py:35: in apply_metaclass
return metaclass(to_class.__name__, to_class.__bases__, dict_, **options)
/opt/hostedtoolcache/Python/3.6.13/x64/lib/python3.6/site-packages/dataclassy/dataclass.py:137: in __new__
return super().__new__(mcs, name, bases, dict_)
/opt/hostedtoolcache/Python/3.6.13/x64/lib/python3.6/abc.py:133: in __new__
cls = super().__new__(mcls, name, bases, namespace, **kwargs)
E TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
For reference, here is the code that defines the `abstractdataclass` decorator:

from abc import ABCMeta, abstractmethod
from functools import partial

from dataclassy import dataclass
from dataclassy.dataclass import DataClassMeta

class AbstractDataClassMeta(DataClassMeta, ABCMeta):
    pass

abstractdataclass = partial(dataclass, kwargs=True, meta=AbstractDataClassMeta)
It works perfectly fine with Python 3.7+, I think related to this:
https://stackoverflow.com/questions/11276037/resolving-metaclass-conflicts
I encountered this issue while testing dataclassy. From reading the documentation, it looks like I should pass something into the `meta=` parameter of the decorator? I'm not quite sure how to fix this. If this is easily solvable, maybe you could edit the error message to explain what to do.
This was while testing the performance branch, but I suspect it applies to the current main branch as well.
from typing import Any, Generic, Mapping, TypeVar
from dataclassy import dataclass

V = TypeVar("V")

@dataclass
class Edge(Generic[V], Mapping[str, Any]):
    start: V
    end: V
    weight: float
or a simpler version:
from typing import Mapping
from dataclassy import dataclass

@dataclass
class Edge(Mapping[str, float]):
    weight: float
Running this code gives:
Traceback (most recent call last):
File "/Users/tyler.yep/Documents/Github/workshop/explore/ff.py", line 8, in <module>
class Edge( Mapping[str, Any]):
File "/Users/tyler.yep/Documents/Github/dataclassy/dataclassy/decorator.py", line 25, in dataclass
return apply_metaclass(cls)
File "/Users/tyler.yep/Documents/Github/dataclassy/dataclassy/decorator.py", line 20, in apply_metaclass
return metaclass(to_class.__name__, to_class.__bases__, dict_, **options)
File "/Users/tyler.yep/Documents/Github/dataclassy/dataclassy/dataclass.py", line 96, in __new__
return type.__new__(mcs, name, bases, dict_)
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
Thanks!
In the docs, it states that a copy is made of data items provided as defaults. But in core dataclasses, you can also supply a `default_factory`. How would you do this in dataclassy?
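A stdlib-only sketch of the copy-on-init behaviour the docs describe (the `Point` class and hand-rolled `__init__` are illustrative; dataclassy generates this for you), showing why a mutable default written directly usually covers the `default_factory` use case:

```python
import copy

class Point:
    # class-level mutable default, shallow-copied per instance
    # so instances don't share it (mimicking dataclassy's documented behaviour)
    tags: list = []

    def __init__(self):
        self.tags = copy.copy(type(self).tags)

a, b = Point(), Point()
a.tags.append(1)
print(a.tags)  # [1]
print(b.tags)  # []  -- b's default was not mutated
```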
Commit 5a2d1a5, included in release v0.7.3, somehow introduced a large performance penalty to class initialisation in some cases. Testing this function with timeit:
python -m timeit -s "import flint; flint.paths.set_install_path('...')" -n1 -r1 "flint.routines.get_markets()"
The runtime goes from ~1.2 seconds to ~5 seconds.
Will update this as I learn more.
Tagging @hroskes as this is related to a change you suggested.
Classes should support the PEP 622 Structural Pattern Matching feature of Python 3.10. From reading the PEP it seems this should be rather simple: simply add a `__match_args__` tuple which is the same as the arguments to `__init__`.
First of all, great work!
I wondered if the verbosity of the code could be further reduced by allowing the factory function to be used without arguments. In that case, the argument to `factory` should be derived from the type annotation. E.g.

@dataclass(kw_only=True)
class GenerateCodeFn(Entity):
    post_process_source_code_fn: PostProcessSourceCodeFn = factory

Of course, this is only a minor improvement, but still potentially a useful one.
By the way, it seems that the following alternative has a drawback (for me), since VS Code fails to detect the type of `post_process_source_code_fn` and therefore symbol lookup doesn't work.

@dataclass(kw_only=True)
class GenerateCodeFn(Entity):
    post_process_source_code_fn = factory(PostProcessSourceCodeFn)
I want to use slots in CPython using dataclassy, but I also want to be able to hide initializers for certain variables from the user. For example:

@dataclass(slots=True)
class TestDataClass:
    normal: float
    _hidden: str = post_init("Test")
    _post: float = post_init()

    def __post_init__(self):
        self._post = 3.141 * self.normal * self.normal

What I'm doing here is making `_hidden` and `_post` unable to be initialized as part of the given args on construction (meaning you can't set it via `_hidden="hello"`). It's possible to do this kind of thing without slots by just initializing it in `__post_init__`, but with slots you can't without having to define `__slots__` at the top, which looks ugly.
Is it possible to add this functionality? Or am I misreading documentation and this is already possible?
Set up a CI pipeline that runs the unit tests for the supported Python versions (3.6..3.10).
Hi,
Great stuff. Only one question:
Old way:

from dataclasses import dataclass

@dataclass
class Try:
    a: str
    b: str

p = Try()

While typing "Try(" VS Code gives me: (a: str, b: str) -> None as a hint.
When I am using dataclassy this is not presented.
Did I miss something, or is there a simple solution?
Thanks in advance.
Bert
I'm not quite sure where this is happening in my application (to be able to give a MWE), but there appears to be an issue with using this library in Py3.6 (before the inclusion of `dataclasses` into the stdlib):
File "python3.6/site-packages/dataclassy/decorator.py", line 40, in dataclass
return apply_metaclass(cls)
File "python3.6/site-packages/dataclassy/decorator.py", line 35, in apply_metaclass
return metaclass(to_class.__name__, to_class.__bases__, dict_, **options)
File "python3.6/site-packages/dataclassy/dataclass.py", line 120, in __new__
dict_.setdefault('__hash__', generate_hash(all_annotations))
File "python3.6/site-packages/dataclassy/dataclass.py", line 170, in generate_hash
hash_of = ', '.join(['self.__class__', *(f'self.{f}' for f, h in annotations.items() if Hashed.is_hinted(h))])
File "python3.6/site-packages/dataclassy/dataclass.py", line 170, in <genexpr>
hash_of = ', '.join(['self.__class__', *(f'self.{f}' for f, h in annotations.items() if Hashed.is_hinted(h))])
File "python3.6/site-packages/dataclassy/dataclass.py", line 29, in is_hinted
return (hasattr(hint, '__args__') and cls in hint.__args__ or
TypeError: argument of type 'NoneType' is not iterable
I think the breaking assumption is that `hint.__args__` is not `None` in this case.
First of all thanks for a great tool!
For some reason, this raises `TypeError: super(type, obj): obj must be an instance or subtype of type`:

from dataclassy import dataclass

@dataclass
class Dummy:
    def __setattr__(self, name, value):
        # do some checks
        super().__setattr__(name, value)

d = Dummy()
d.x = None
Code speaks more precisely than words:
In [1]: from dataclassy import dataclass
   ...:
   ...: @dataclass(slots=True)
   ...: class Base:
   ...:     foo: int
   ...:
   ...: @dataclass(slots=True)
   ...: class Derived(Base):
   ...:     bar: int
In [2]: Base.__slots__
Out[2]: ('foo',)
In [3]: Derived.__slots__
Out[3]: ()
In [4]: Derived(1, 2)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-24-1d764d7cf19f> in <module>
----> 1 Derived(1, 2)
<string> in __init__(self, foo, bar)
AttributeError: 'Derived' object has no attribute 'bar'
My expectation, if it isn't obvious, is that `Derived.__slots__ == ("bar",)`, just as would be true of any other Python derived class adding additional slots to its parent.
The desired effect can be achieved by omitting the decorator on the derived class:
In [5]: class DerivedNoDecorator(Base):
   ...:     bar: int
In [6]: DerivedNoDecorator.__slots__
Out[6]: ('bar',)
In [7]: DerivedNoDecorator(1,2)
Out[7]: DerivedNoDecorator(foo=1, bar=2)
but this seems non-ideal. If I use the built-in `dataclasses` with `__slots__`, I can decorate the derived class without any issues, so it would be much clearer if `dataclassy` behaved the same way:
In [1]: from dataclasses import dataclass

In [2]: @dataclass
   ...: class Base:
   ...:     __slots__ = ("foo")
   ...:     foo: int

In [3]: @dataclass
   ...: class Derived(Base):
   ...:     __slots__ = ("bar")
   ...:     bar: int
In [4]: Derived.__slots__
Out[4]: 'bar'
In [5]: Derived(1, 2)
Out[5]: Derived(foo=1, bar=2)
Of course, I also want to use default values, so I would much prefer to use `dataclassy`. You'll probably not be surprised to learn that I came here via your SO answer.
I've tried to do some debugging myself, and it looks like for the `Base`/`Derived` example that I began with, we end up in `DataClassMeta.__new__` three times, during the third of which we begin with `dict_["__slots__"] == ("bar",)` and `bases == (Base,)`, `Base` thus providing `foo` to `all_slots`, so at line 110, we set `dict_["__slots__"] = ()`. I literally learned what a metaclass is today in order to try to figure this out, so I might be off the mark, but I think the problem is that `DataClassMeta.__new__` is invoked once by the `dataclass` decorator, and then once again going up the chain from `Derived` -> `Base` -> `DataClassMeta` (-> `type`), which sees the result of the first invocation and decides it doesn't need to add any new slots, only to have the result of the first invocation lost in the final result. Does that sound like a reasonable version of events?
This is a lot more nuts and bolts of Python than I normally dig into, so I can't say I know what to do about it, but perhaps there's an obvious or at least feasible solution here that you can illuminate. Thanks in advance.
EDIT: I just finished the docs, and I realized that you tout the need to only apply the decorator to base classes as a feature. I think I'm inclined to agree, but it probably also explains why this issue wasn't considered during design. I'll probably go ahead and remove the decorator from subclasses in my project for now, but I still believe that this should be fixed to more closely mimic the `dataclasses` API.
This issue is similar to #11, but it happens on circular references of dataclass objects rather than a single dataclass with a property that references itself.
Here's how to reproduce infinite recursion in dataclassy's `__repr__` method with `dataclassy==0.10.2` installed from PyPI:
from dataclassy import dataclass
from typing import Optional

@dataclass
class A:
    b: Optional["B"] = None

@dataclass
class B:
    c: Optional["C"] = None

@dataclass
class C:
    a: Optional[A] = None

a = A()
b = B()
c = C(a=a)
a.b, b.c = b, c
print(repr(a))
Causes this traceback:
...
File "dataclassy/dataclass.py", line 231, in __repr__
field_values = ', '.join(f'{f}=...' if v is self
File "dataclassy/dataclass.py", line 232, in <genexpr>
else f'{f}={v!r}' for f, v in values(self, show_internals).items())
File "dataclassy/dataclass.py", line 232, in __repr__
else f'{f}={v!r}' for f, v in values(self, show_internals).items())
File "dataclassy/functions.py", line 37, in values
return {f: getattr(dataclass, f) for f in fields(dataclass, internals)}
File "dataclassy/functions.py", line 28, in fields
assert is_dataclass(dataclass)
File "dataclassy/functions.py", line 16, in is_dataclass
return isinstance(obj, DataClassMeta) or is_dataclass_instance(obj)
File "dataclassy/functions.py", line 21, in is_dataclass_instance
return isinstance(type(obj), DataClassMeta)
RecursionError: maximum recursion depth exceeded while calling a Python object
This was fixed in this commit: 0246588
Then reverted in this commit: 73ebd10
If you add `recursive_repr` back then the issue goes away.
First of all, love dataclassy, it solves all of my problems with dataclasses/namedtuples and the like wonderfully :)
Only problem is the lack of checking with mypy! The following demonstrates the issue:

from dataclassy import dataclass

@dataclass
class Thing:
    prop: int

x = Thing(1)
$ mypy --strict foo.py
foo.py:1: error: Skipping analyzing 'dataclassy': found module but no type hints or library stubs
foo.py:1: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports
foo.py:7: error: Too many arguments for "Thing"
Found 1 error in 1 file (checked 1 source file)
Am not sure what the solution is here. Can this be solved by simply adding a `dataclassy.pyi` stub file and a `py.typed` marker to the module?
I tried stubgen, which at least got the arguments for the decorator...
The following example raises a TypeError when using `dataclassy.dataclass`, while it works fine with `dataclasses.dataclass`.

from dataclassy import dataclass
from dataclasses import InitVar
from typing import Any

@dataclass
class Foo:
    initvar: InitVar[Any]

    def __post_init__(self, initvar):
        pass

foo = Foo(initvar="foo")  # TypeError: Foo.__post_init__() missing 1 required positional argument: 'initvar'
Is this intended behavior?
from dataclassy import dataclass

@dataclass
class Foo:
    bar = []
>>> a = Foo()
>>> b = Foo()
>>> a.bar.append(1)
>>> b.bar
[1]
I get the same results on the last two stable releases (`0.7.0` and `0.6.2`). If I'm reading correctly, the README says it's not intended:
A shallow copy will be created for mutable arguments (defined as those defining a copy method). This means that default field values that are mutable (e.g. a list) will not be mutated between instances.
Hi,
Having a rather subtle issue with properties in inherited classes more than 2 levels deep:

from dataclassy import dataclass

@dataclass(slots=True)
class Foo(object):
    foo_prop: str = "foo_prop_value"

class Bar(Foo):
    bar_prop: str = "bar_prop_value"

class FooBar(Bar):
    pass

get_foo_prop = Foo.foo_prop
get_bar_prop = Bar.bar_prop

print("get bar prop from Bar: ", get_bar_prop.__get__(Bar()))
print("get bar prop from FooBar: ", get_bar_prop.__get__(FooBar()))
print()
print("get foo prop from Foo: ", getattr(Foo(), get_foo_prop.__name__))
print("get foo prop from Bar: ", getattr(Bar(), get_foo_prop.__name__))
print("get foo prop from FooBar: ", getattr(FooBar(), get_foo_prop.__name__))
print()
print("get foo prop from Foo: ", get_foo_prop.__get__(Foo()))
print("get foo prop from Bar: ", get_foo_prop.__get__(Bar()))
print("get foo prop from FooBar: ", get_foo_prop.__get__(FooBar()))
output:
get bar prop from Bar: bar_prop_value
get bar prop from FooBar: bar_prop_value
get foo prop from Foo: foo_prop_value
get foo prop from Bar: foo_prop_value
get foo prop from FooBar: foo_prop_value
get foo prop from Foo: foo_prop_value
get foo prop from Bar: foo_prop_value
Traceback (most recent call last):
File "test.py", line 29, in <module>
print("get foo prop from FooBar: ", get_foo_prop.__get__(FooBar()))
AttributeError: foo_prop
It's odd that __get__ causes an AttributeError on FooBar (two levels of inheritance) but getattr works fine. I'd expect both to work.
Worth noting this also breaks dataclasses, but seems like something we should fix.
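One plausible mechanism (an assumption, not verified against dataclassy's internals) is slot shadowing: if a subclass re-declares a slot name its parent already defines, the parent's descriptor points at a storage slot that the shadowing descriptor never fills. A pure-Python sketch:

```python
class A:
    __slots__ = ('x',)

class B(A):
    __slots__ = ('x',)  # re-declares the slot: B gets its own descriptor and storage

b = B()
b.x = 1                  # attribute lookup finds B's descriptor first
print(B.x.__get__(b))    # 1
try:
    A.x.__get__(b)       # A's descriptor reads A's storage slot, which was never set
except AttributeError as err:
    print('AttributeError:', err)
```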
>>> from typing import Optional
>>> import dataclassy as dataclass
>>> @dataclass.dataclass
... class Node:
... value: str
... linked_node: Optional[Node] = None
...
>>> n = Node("n")
>>> n.linked_node = n
>>> n
---------------------------------------------------------------------------
RecursionError Traceback (most recent call last)
...
RecursionError: maximum recursion depth exceeded while calling a Python object
A stdlib tool to avoid this is reprlib.recursive_repr.
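As a sketch of that suggestion, here is recursive_repr on a hand-written class (not dataclassy's generated __repr__); on re-entry it substitutes '...' instead of recursing:

```python
from reprlib import recursive_repr

class Node:
    def __init__(self, value, linked_node=None):
        self.value = value
        self.linked_node = linked_node

    @recursive_repr()  # substitutes '...' when __repr__ re-enters itself
    def __repr__(self):
        return f'Node(value={self.value!r}, linked_node={self.linked_node!r})'

n = Node('n')
n.linked_node = n
print(repr(n))  # Node(value='n', linked_node=...)
```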
Multiple inheritance works, but we need tests for this.
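A sketch of the kind of test that could cover this, written here against stdlib dataclasses so it runs anywhere; a dataclassy test would swap the import:

```python
from dataclasses import dataclass

@dataclass
class A:
    a: int = 1

@dataclass
class B:
    b: int = 2

@dataclass
class C(A, B):
    c: int = 3

# fields from both bases plus the subclass should all be settable
c = C(a=10, b=20, c=30)
assert (c.a, c.b, c.c) == (10, 20, 30)
assert (C().a, C().b, C().c) == (1, 2, 3)  # defaults survive multiple inheritance
```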
As of version 0.7.0 it looks like inheritance no longer works the way it used to. I assume this is a bug as there was no breaking change documented, but lmk if I'm wrong.
Basically this used to work:
@dataclass
class Derps(object):
    name: str = ""

class Terps(Derps):
    _test: int = 0

    def __init__(self, test: int):
        self._test = test

Terps(test=3)
But as of version 0.7.0 it throws:
Terps(test=3)
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: __init__() got an unexpected keyword argument 'test'
I suspect this is in relation to the post_init changes. If I get time tomorrow/on the weekend I'll have a look, but you may have a better idea than me :)
Great work on this lib so far. Much more intuitive than dataclasses and even solves some bugs (e.g in dataclasses you can't define a post_init parameter with the same name as a method on the class).
The only issue I've run into so far is that if you specify init=False but you have a custom __init__, you can't create your class with positional arguments, and you have to always specify keyword arguments.
Seems to me that we should just pass all arguments straight through to the user-defined __init__ when init=False, and just call __new__ with no args.
I've done a PR (#9) that fixes this issue I think without breaking any existing usage, but keen to get your thoughts.
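A hypothetical sketch of that proposal in plain Python (not dataclassy's actual metaclass): __call__ creates the instance with a bare __new__ and forwards every argument, positional or keyword, to the user-defined __init__:

```python
class PassThroughMeta(type):
    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls)           # __new__ called with no extra args
        obj.__init__(*args, **kwargs)    # all arguments go straight through
        return obj

class Point(metaclass=PassThroughMeta):
    def __init__(self, x, y=0):
        self.x, self.y = x, y

p = Point(1, 2)   # positional arguments work with the custom __init__
print(p.x, p.y)   # 1 2
```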
I'm comparing this package to the standard dataclasses. In this case, the __init__ method will only accept the wheels argument:
from dataclasses import dataclass, field

@dataclass(eq=False)
class Vehicle:
    wheels: int
    _wheels: int = field(init=False, repr=False)
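The stdlib version can be exercised directly; a default=0 is added here (my addition, not in the snippet above) so _wheels has a value even though __init__ never sets it:

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Vehicle:
    wheels: int
    _wheels: int = field(init=False, repr=False, default=0)

v = Vehicle(4)        # only 'wheels' is an __init__ parameter
print(v.wheels)       # 4
try:
    Vehicle(4, 2)     # passing a second positional argument is rejected
except TypeError as err:
    print('TypeError:', err)
```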
However, the equivalent with dataclassy:
from dataclassy import dataclass

@dataclass(eq=False)
class Vehicle:
    wheels: int
    _wheels: int = None
will accept both wheels and _wheels as arguments. Is this intended?
Per the docs (emphasis mine),
If true (the default), generate an eq method that compares this data class to another of the same type as if they were tuples created by as_tuple.
However, it appears that internals are ignored, just as they are in __repr__ when hide_internals=True:
In [1]: from dataclassy import dataclass, as_tuple
In [2]: from typing import Optional
In [3]: @dataclass
...: class Foo():
...: _private: Optional[int] = None
...:
In [4]: Foo() == Foo(5)
Out[4]: True
In [5]: as_tuple(Foo()) == as_tuple(Foo(5))
Out[5]: False
In [6]: @dataclass
...: class Bar():
...: public: Optional[int] = None
...:
In [7]: Bar() == Bar(5)
Out[7]: False
This may have been intentional, but it is also surprising, given that
In [1]: from dataclasses import dataclass
In [2]: from typing import Optional
In [3]: @dataclass
...: class Foo():
...: _private: Optional[int] = None
...:
In [4]: Foo() == Foo(5)
Out[4]: False
and especially given that I can do this:
In [1]: from dataclassy import dataclass
In [2]: from typing import Optional
In [3]: @dataclass(hide_internals=False)
...: class Baz():
...: _private: Optional[int] = None
...:
In [4]: Baz() == Baz(5)
Out[4]: True
In [5]: repr(Baz()) == repr(Baz(5))
Out[5]: False
which just feels inconsistent to me (__repr__ may choose to hide some internal state, so (a == b) == False but (repr(a) == repr(b)) == True seems fine, but not the inverse, as above).
As for the why: it appears that at dataclass.py:147, inside DataClassMeta.__init__, you decline to pass internals to fields, so the default of False is used when creating the expression for __tuple__, which is of course what is used inside __eq__.
tuple_expr = ', '.join((*(f'self.{f}' for f in fields(cls)), '')) # '' ensures closing comma
cls.__tuple__ = property(eval(f'lambda self: ({tuple_expr})'))
def fields(dataclass: Type[DataClass], internals=False) -> Dict[str, Type]:
def __eq__(self: DataClass, other: DataClass):
return type(self) is type(other) and self.__tuple__ == other.__tuple__
However, as_tuple has no concept of internals and in general functions quite differently.
I see two ways to fix this:
1. Pass internals=True to fields at dataclass.py:147, or make it configurable with a decorator param, something like consider_internals_eq=True. I think this should be True by default so as not to be obscurely inconsistent with dataclasses. I don't think this should be tied to hide_internals, since one may not want internals returned from __repr__ but still want them considered in __eq__.
2. Use as_tuple in __eq__. This seems right because it unifies two things that seem intuitively the same. However, your comments mention "efficient representation for internal methods" re: __tuple__. I'm not sure offhand how much more efficient evaluating a static expression is vs. recursing through a structure, but if the difference is significant, maybe unification isn't the way to go.

Please let me know which you prefer and why.
First of all, thanks for this great library. I like how it makes data classes easier than the built-in dataclasses, and especially the support for __slots__.
While switching my framework to dataclassy, I've hit one problem that I cannot express in code properly: how can I declare pseudo-positional InitVars?
Here is the equivalent code for dataclasses:
import dataclasses

@dataclasses.dataclass()
class Selector:
    arg1: dataclasses.InitVar[Union[None, str, Marker]] = None
    arg2: dataclasses.InitVar[Union[None, str, Marker]] = None
    arg3: dataclasses.InitVar[Union[None, str, Marker]] = None
    argN: dataclasses.InitVar[None] = None  # a runtime guard against too many positional arguments
    group: Optional[str] = None
    version: Optional[str] = None
    plural: Optional[str] = None
    # ... more things here

    def __post_init__(
            self,
            arg1: Union[None, str, Marker],
            arg2: Union[None, str, Marker],
            arg3: Union[None, str, Marker],
            argN: None,  # a runtime guard against too many positional arguments
    ) -> None:
        ...
The supposed use-case is:
# All notations are equivalent and must create exactly the same objects:
CRDS = Selector('apiextensions.k8s.io', 'customresourcedefinitions')
CRDS = Selector('apiextensions.k8s.io', plural='customresourcedefinitions')
CRDS = Selector('apiextensions.k8s.io', None, plural='customresourcedefinitions')
CRDS = Selector('apiextensions.k8s.io', version=None, plural='customresourcedefinitions')
CRDS = Selector(group='apiextensions.k8s.io', version=None, plural='customresourcedefinitions')
I.e., it is either explicitly specifying the kwargs to be stored on the data class, or passing them as positional (pseudo-positional) init-vars. The positional init-vars arg1..arg3 are then interpreted in the post-init method and stored in the group/version/plural/etc. fields as seems appropriate. In some cases, it might even parse and split a positional init-arg into several fields: e.g. Selector('apiextensions.k8s.io/v1') would be split into Selector(group='apiextensions.k8s.io', version='v1').
The details are not essential; the only essential part is that the post-init method contains some logic for converting these pseudo-positional arg1..argN into the actual, useful, storable fields.
For the full example:
When I try to do this the dataclassy way and remove the InitVar[] declarations, the positional arguments go to the first fields: e.g. group & version for the first example line, while the intention is to interpret them as group & plural (as would be implemented in the post-init function), which expectedly gives wrong results.
If I keep the arg1..argN fields at the top of the list of fields, they are accepted as needed, but they are stored on the object as same-named fields. I can make them internal and hide them from reprs, but I would prefer not to store them at all and keep them init-only.
What would be the best way to implement the positional init-only variables with dataclassy?
Thank you in advance.
from dataclassy import dataclass

@dataclass
class Base:
    a: int = 0

@dataclass
class Inherit(Base):
    b: int = 0

    def __post_init__(self) -> None:
        pass

inst = Inherit()
Traceback (most recent call last):
File "test.py", line 17, in <module>
inst = Inherit()
File "<string>", line 4, in __init__
File "<string>", line 4, in __init__
File "<string>", line 4, in __init__
[Previous line repeated 995 more times]
RecursionError: maximum recursion depth exceeded