marcosschroh / dataclasses-avroschema

Generate Avro schemas from Python classes, generate Python code from Avro schemas, and serialize/deserialize Python instances with Avro schemas.

Home Page: https://marcosschroh.github.io/dataclasses-avroschema/

License: MIT License

Python 99.74% Shell 0.26%
avro apache-avro avro-schemas json-schema pydantic python3 serialization schema json code-generation

dataclasses-avroschema's People

Contributors

aaronfowles, anneum, bboggs-streambit, chapm250, cmatache-bamfunds, dependabot[bot], gioruffa, github-actions[bot], jairhenrique, judahrand, kevinjacobs-delfi, kevinkjt2000, lsantosdemoura, marcosschroh, masqueey, mhils, michal-rogala, nhuber-tc, nikitabarskov, offbyonee, omer-shtivi, oskar-bonde, pawelrubin, prometheus3375, rgson, sanjayjayaramu-cruise, vishalkuo, woile, xgamer4, yankees714


dataclasses-avroschema's Issues

typing.Optional[typing.List[int]] field fails during serialization

from dataclasses_avroschema import AvroModel
import typing
from dataclasses import dataclass, field


@dataclass
class X(AvroModel):
    y: typing.Optional[typing.List[int]]

X.avro_schema()

Fails with error:

AttributeError: type object 'list' has no attribute 'avro_schema_to_python'

Invalid schema when record is referenced only by two different child records

Describe the bug
Similar to #114 (but not identical): when two different nested record types each have a field of the same record type, and that record type is not itself a field of the parent record, the generated schema redefines the shared inner record type.

To Reproduce

from dataclasses import field
from typing import List, Optional


from dataclasses_avroschema import AvroModel
from fastavro import parse_schema


class Location(AvroModel):
    lat: float
    long: float
    altitude: Optional[float] = None
    bearing: Optional[float] = None


class Photo(AvroModel):
    filename: str
    data: bytes
    width: int
    height: int
    geo_tag: Optional[Location] = None


class Video(AvroModel):
    filename: str
    data: bytes
    duration: int
    geo_tag: Optional[Location] = None


class HolidayAlbum(AvroModel):
    album_name: str
    photos: List[Photo] = field(default_factory=list)
    videos: List[Video] = field(default_factory=list)

    class Meta:
        namespace = "test.namespace"


parse_schema(HolidayAlbum.avro_schema_to_python())
Traceback (most recent call last):
  File "test.py", line 40, in <module>
    parse_schema(HolidayAlbum.avro_schema_to_python())
  File "fastavro/_schema.pyx", line 106, in fastavro._schema.parse_schema
  File "fastavro/_schema.pyx", line 245, in fastavro._schema._parse_schema
  File "fastavro/_schema.pyx", line 290, in fastavro._schema.parse_field
  File "fastavro/_schema.pyx", line 190, in fastavro._schema._parse_schema
  File "fastavro/_schema.pyx", line 245, in fastavro._schema._parse_schema
  File "fastavro/_schema.pyx", line 290, in fastavro._schema.parse_field
  File "fastavro/_schema.pyx", line 115, in fastavro._schema._parse_schema
  File "fastavro/_schema.pyx", line 237, in fastavro._schema._parse_schema
fastavro._schema_common.SchemaParseException: redefined named type: test.namespace.Location

Expected behavior
A valid schema is generated, with test.namespace.Location only defined the first time it is encountered.

Add types.Int32

It would be nice to have explicit support for the Avro int type, for compatibility with existing software:

class Int32(int):
    ...
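For illustration, a sketch of how such a type might be used once added; the types.Int32 name follows the proposal above and is an assumption, not an existing API:

from dataclasses import dataclass

from dataclasses_avroschema import AvroModel, types


@dataclass
class Measurement(AvroModel):
    # assumed helper behaving like the proposed class Int32(int) above;
    # the field would map to Avro "int" instead of the default "long"
    counter: types.Int32 = 0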

Registering codecs without setup.py doesn't work for Avro schemas | Faust + AvroModel + dataclasses |

Describe the bug

I get the following error from topic.send() when using avro_serializer:

httpcore.ConnectError: [Errno 8] nodename nor servname provided, or not known

The error makes it look like something is wrong with Kafka, but the moment I switch value_serializer from avro_serializer to 'json', everything works fine.

Below is my model.py code:

class VmModel(faust.Record, AvroModel, serializer='avro_vm'):
    imo: int #= 435677
    carrier: str # = "MAERSK"
    port: str = "Singapore"

class VmsModel(faust.Record, AvroModel, serializer='avro_vms'):
    schedule: typing.List[VmModel] = dataclasses.field(default_factory=lambda: [{"imo": 435677, "carrier": "MAERSK", "port": "Singapore"},{"imo": 432377, "carrier": "ONE", "port": "Shanghai"},{"imo": 493377, "carrier": "YANGMING", "port": "Sydney"}])


and my avro.py

# Initialize Schema Registry Client
client = SchemaRegistryClient(url="http://schema-registry:8081")

# VESSEL_MILESTONE
# example of how to use it with dataclasses-avroschema
avro_vm_serializer = FaustSerializer(
    client, 
    "vessel_milestone", 
    VmModel.avro_schema())

def avro_vm_codec():
    return avro_vm_serializer


# VESSEL_MILESTONE_SCHEDULE
# example of how to use it with dataclasses-avroschema
avro_vms_serializer = FaustSerializer(
    client, 
    "vessel_milestone_schedule", 
    VmsModel.avro_schema())

def avro_vms_codec():
    return avro_vms_serializer

To Reproduce
Try to register avro_schema codecs without setup.py config using the following commands

codecs.register("avro_vm", avro_vm_codec)
codecs.register("avro_vms", avro_vms_codec) 

and the error I get

[2021-03-19 02:33:39,537] [78421] [ERROR] [^-App]: Crashed reason=ConnectError('[Errno 8] nodename nor servname provided, or not known')
Traceback (most recent call last):
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/httpx/_exceptions.py", line 342, in map_exceptions
    yield
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/httpx/_client.py", line 803, in _send_single_request
    timeout=timeout.as_dict(),
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/httpcore/_sync/connection_pool.py", line 189, in request
    method, url, headers=headers, stream=stream, timeout=timeout
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/httpcore/_sync/connection.py", line 83, in request
    self.socket = self._open_socket(timeout)
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/httpcore/_sync/connection.py", line 109, in _open_socket
    local_address=self.local_address,
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/httpcore/_backends/sync.py", line 144, in open_tcp_stream
    return SyncSocketStream(sock=sock)
  File "/Users/sagungargs/.pyenv/versions/3.7.7/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/httpcore/_exceptions.py", line 12, in map_exceptions
    raise to_exc(exc) from None
httpcore.ConnectError: [Errno 8] nodename nor servname provided, or not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/mode/services.py", line 802, in _execute_task
    await task
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/faust/app/base.py", line 969, in _wrapped
    return await task()
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/faust/app/base.py", line 1022, in around_timer
    await fun(*args)
  File "/Users/sagungargs/Work/portcast2/faustapp/hello_world.py", line 43, in publish_milestones
    await milestones.send(value=milestone, value_serializer=avro_vm_serializer)
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/faust/agents/agent.py", line 897, in send
    force=force,
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/faust/topics.py", line 198, in send
    callback=callback,
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/faust/channels.py", line 322, in _send_now
    callback,
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/faust/channels.py", line 269, in as_future_message
    value, value_serializer, schema, open_headers
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/faust/channels.py", line 703, in prepare_value
    self.app, value, serializer=value_serializer, headers=headers
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/faust/serializers/schemas.py", line 132, in dumps_value
    serializer=serializer or self.value_serializer,
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/faust/serializers/registry.py", line 180, in dumps_value
    return dumps(serializer, value)
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/faust/serializers/codecs.py", line 359, in dumps
    return get_codec(codec).dumps(obj) if codec else obj
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/faust/serializers/codecs.py", line 224, in dumps
    obj = cast(Codec, node)._dumps(obj)
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/schema_registry/serializers/faust_serializer.py", line 50, in _dumps
    return self.encode_record_with_schema(self.schema_subject, self.schema, payload)  # type: ignore
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/schema_registry/serializers/message_serializer.py", line 73, in encode_record_with_schema
    schema_id = self.schemaregistry_client.register(subject, avro_schema)
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/schema_registry/client/client.py", line 264, in register
    response = self.check_version(subject, avro_schema, headers=headers, timeout=timeout)
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/schema_registry/client/client.py", line 526, in check_version
    result, code = self.request(url, method=method, body=body, headers=headers, timeout=timeout)
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/schema_registry/client/client.py", line 219, in request
    response = session.request(method, url, headers=_headers, json=body, timeout=timeout)
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/httpx/_client.py", line 674, in request
    request, auth=auth, allow_redirects=allow_redirects, timeout=timeout
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/httpx/_client.py", line 704, in send
    request, auth=auth, timeout=timeout, allow_redirects=allow_redirects
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/httpx/_client.py", line 733, in _send_handling_redirects
    request, auth=auth, timeout=timeout, history=history
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/httpx/_client.py", line 769, in _send_handling_auth
    response = self._send_single_request(request, timeout)
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/httpx/_client.py", line 803, in _send_single_request
    timeout=timeout.as_dict(),
  File "/Users/sagungargs/.pyenv/versions/3.7.7/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/sagungargs/Work/portcast2/faustapp/.venv/lib/python3.7/site-packages/httpx/_exceptions.py", line 359, in map_exceptions
    raise mapped_exc(message, **kwargs) from exc  # type: ignore
httpx.ConnectError: [Errno 8] nodename nor servname provided, or not known

Expected behavior

The Avro schemas should work for nested objects. If I switch the serializer type to json in send(), everything works fine, but that defeats the purpose of using Avro.

@app.timer(interval=1.0,on_leader=True)
async def publish_milestones():
    logger.info('PUBLISHING ON LEADER FOR SINGLE VESSEL MILESTONE RECORD!')
    milestone = {"imo": 99999, "carrier": "CMDU", "port": "JNPT"}
    logger.info(milestone)
    logger.info(VmModel(**milestone))
    logger.info(VmModel.fake())
    logger.info(VmModel(imo=9999,carrier="CMDU",port="JNPT"))
    await milestones.send(value=milestone, value_serializer=avro_vm_serializer)

but the following works

@app.timer(interval=1.0,on_leader=True)
async def publish_milestones():
    logger.info('PUBLISHING ON LEADER FOR SINGLE VESSEL MILESTONE RECORD!')
    milestone = {"imo": 99999, "carrier": "CMDU", "port": "JNPT"}
    logger.info(milestone)
    logger.info(VmModel(**milestone))
    logger.info(VmModel.fake())
    logger.info(VmModel(imo=9999,carrier="CMDU",port="JNPT"))
    await milestones.send(value=milestone, value_serializer="json")

Nested metadata not respected

Describe the bug
If I override a schema_name attribute for a class that's used as a field, that schema_name isn't respected.

To Reproduce

from dataclasses_avroschema import AvroModel
from dataclasses import dataclass

@dataclass
class MyClass(AvroModel):
    field_1: str
    class Meta:
        schema_name = "custom_class" # <-- this is not respected
class MySecondClass(AvroModel):
    field_2: MyClass
    class Meta:
        schema_name = "custom_name"
        
MySecondClass.avro_schema_to_python()

This outputs

{'type': 'record',
 'name': 'custom_name',
 'fields': [{'name': 'field_2',
   'type': {'type': 'record',
    'name': 'MyClass', # <-- this line is wrong
    'fields': [{'name': 'field_1', 'type': 'string'}],}}],
}

Expected behavior

I would expect

{'type': 'record',
 'name': 'custom_name',
 'fields': [{'name': 'field_2',
   'type': {'type': 'record',
    'name': 'custom_class', # This is the important line
    'fields': [{'name': 'field_1', 'type': 'string'}],}}],
}

Support for Sequence, Mapping and Set types

We would like to have support for the following types:

typing.Sequence, typing.MutableSequence, typing.Mapping and typing.MutableMapping.

In the case of Sequence, the Avro type should be array, and for Mappings it should be map.

typing.AbstractSet, typing.Set and typing.MutableSet are an interesting case; maybe Set should be mapped to an array as well.
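For illustration, a sketch of the annotations this request covers (the model and field names are made up):

import typing
from dataclasses import dataclass

from dataclasses_avroschema import AvroModel


@dataclass
class Example(AvroModel):
    # requested mappings: Sequence/MutableSequence -> Avro array,
    # Mapping/MutableMapping -> Avro map, Set -> possibly an array as well
    items: typing.Sequence[int]
    counts: typing.Mapping[str, int]
    tags: typing.Set[str]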

How can I add documentation for field

I know I can add documentation for a Record, but I want to add documentation for each field in a dataclass (Record). How can I add it?

Example

@dataclass
class User(faust.Record, AvroModel):
    name: str
    age: int

In the above example I want to add documentation for the fields (name, age) in a User record. Is there a way to do it? If so, can you please provide an example of how to do it? Thanks!

Union field with default generating out-of-spec schema

According to the Avro specification, the default value of a union corresponds to the first listed type in the union.

default: A default value for this field, used when reading instances that lack this field (optional). Permitted values depend on the field's schema type, according to the table below. Default values for union fields correspond to the first schema in the union. Default values for bytes and fixed fields are JSON strings, where Unicode code points 0-255 are mapped to unsigned 8-bit byte values 0-255.

I've got three examples of this being broken below:

To Reproduce
Using this class:

@dataclass
class User(AvroModel):
    "An User"
    name: str
    age: int
    school_grade: typing.Optional[int]
    school_name: typing.Optional[str] = None
    school_id: typing.Union[int, str] = ''

it generates the following schema:

   "type": "record", 
   "name": "User",
   "fields": [
       {"name": "name", "type": "string"},
       {"name": "age", "type": "long"},
       {"name": "school_grade", "type": ["long", "null"]},
       {"name": "school_name", "type": ["string", "null"], "default": null},
       {"name": "school_id", "type": ["long", "string"], "default": ""}], 
   "doc": "An User"
}
  1. school_id is typed as a union of int and str, and a default that's a str, but the generated schema has a union beginning with long and a default that's a string.

  2. school_name is typed as an Optional[str] (a union [str, None] according to the docs) that defaults to None, but the generated schema has a union beginning with string and a default of null.

  3. This one isn't actually a bug, but I do think it potentially violates expectations. If I type something as Optional I'd expect null to be a possible default (else why would it be optional in the first place?). school_grade is typed as an Optional[int], but the generated schema has long as the first value, meaning null can't be the default.

A fix for the first two would be to verify that the provided default (if there is one) is in the union, and if so set that type as the first in the union on generation. Another option would be to verify and throw an exception if the type isn't the first listed in the union, which would be more informative, but less friendly and probably not worth it for something that's specific to the avro schema.

It would probably be worth changing the logic so Optional[T] generates the union [null, T] instead of [T, null] as well. It makes more intuitive sense that something that's optional allows null to be the default than for it to require something of the type to be the default and require null to be explicitly assigned.
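A minimal sketch of the reordering described above, independent of the library's internals (function and variable names are illustrative):

def reorder_union(avro_types, default):
    # map the Python default onto the Avro type name it corresponds to
    if default is None:
        target = "null"
    elif isinstance(default, bool):
        target = "boolean"
    elif isinstance(default, int):
        target = "long"
    elif isinstance(default, str):
        target = "string"
    else:
        return avro_types
    # move the matching type to the front, as the Avro spec requires
    if target in avro_types:
        avro_types = [target] + [t for t in avro_types if t != target]
    return avro_types


print(reorder_union(["long", "string"], ""))    # ['string', 'long']
print(reorder_union(["string", "null"], None))  # ['null', 'string']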

EDIT:
So specifically, the generated schema should look like this (including making Optional[T] generate [null, T]).

   "type": "record", 
   "name": "User",
   "fields": [
       {"name": "name", "type": "string"},
       {"name": "age", "type": "long"},
       {"name": "school_grade", "type": ["null", "long"]},
       {"name": "school_name", "type": ["null", "string"], "default": null},
       {"name": "school_id", "type": ["string", "long"], "default": ""}], 
   "doc": "An User"
}

null defaults for strings results in string nulls rather than avro nulls

import typing

from dataclasses_avroschema import AvroModel, types


class foo(AvroModel):
    foobar: str = None


print(foo.avro_schema())

Results in
{"type": "record", "name": "foo", "fields": [{"name": "foobar", "type": ["null", "string"], "default": "null"}], "doc": "foo(foobar: str = None)"}

Where it should instead result in
{"type": "record", "name": "foo", "fields": [{"name": "foobar", "type": ["null", "string"], "default": null}], "doc": "foo(foobar: str = None)"}

with "default": null not surrounded by strings

Do default values work?

Describe the bug
In the example it seems that fastavro supports default values, but when I try to deserialize a value without one, I get the following exception.

To Reproduce

from dataclasses import dataclass

import typing

from dataclasses_avroschema import AvroModel, types


@dataclass
class User(AvroModel):
    "An User"
    name: str
    age: int
    pets: typing.List[str]
    accounts: typing.Dict[str, int]
    favorite_colors: types.Enum = types.Enum(["BLUE", "YELLOW", "GREEN"])
    country: str = "Argentina"
    address: str = None

    class Meta:
        namespace = "User.v1"
        aliases = ["user-v1", "super user"]

# IMPORTANT: this is a JSON generated by .fake, but with the **country** removed.
fake_user_json_without_country = b'{"name": "MgXqfDAqzbgJSTTHDXtN", "age": 551, "pets": ["aRvwODwbOWfrkxYYkJiI"], "accounts": {"DQSZRzofFrNCiOhhIOvX": 4431}, "favorite_colors": "GREEN",  "address": {"string": "YgmVDKhXctMgODKkhNHJ"}}'


fake_deserialized_instance = User.deserialize(fake_user_json_without_country, serialization_type="avro-json")
# throws exception "KeyError: 'country'"

Exception:

ANACONDA_PATH_ON_WIN\python.exe ******/db_json_schema.py
Traceback (most recent call last):
  File "MY_PROJECT_PATHdb_json_schema.py", line 53, in <module>
    fake_deserialized_instance = User.deserialize(fake_user_json, serialization_type="avro-json")
  File "ANACONDA_PATH_ON_WIN\lib\site-packages\dataclasses_avroschema\schema_generator.py", line 94, in deserialize
    payload = serialization.deserialize(data, schema, serialization_type=serialization_type)
  File "ANACONDA_PATH_ON_WIN\lib\site-packages\dataclasses_avroschema\serialization.py", line 43, in deserialize
    payload = list(records)[0]
  File "ANACONDA_PATH_ON_WIN\lib\site-packages\fastavro\_read_py.py", line 950, in _elems
    yield read_data(
  File "ANACONDA_PATH_ON_WIN\lib\site-packages\fastavro\_read_py.py", line 558, in read_data
    data = reader_fn(
  File "ANACONDA_PATH_ON_WIN\lib\site-packages\fastavro\_read_py.py", line 446, in read_record
    record[field['name']] = read_data(
  File "ANACONDA_PATH_ON_WIN\lib\site-packages\fastavro\_read_py.py", line 558, in read_data
    data = reader_fn(
  File "ANACONDA_PATH_ON_WIN\lib\site-packages\fastavro\_read_py.py", line 239, in read_utf8
    return decoder.read_utf8()
  File "ANACONDA_PATH_ON_WIN\lib\site-packages\fastavro\io\json_decoder.py", line 91, in read_utf8
    return self.read_value()
  File "ANACONDA_PATH_ON_WIN\lib\site-packages\fastavro\io\json_decoder.py", line 36, in read_value
    return self._current[self._key]
KeyError: 'country'

Process finished with exit code 1

Expected behavior
It shouldn't throw an exception; it should apply the default value instead.

Nested AvroModel definitions create invalid Avro schemas

Describe the bug
When building a nested data structure of records, if you have multiple fields with the same type, you can end up generating a schema that tried to re-use the record names.

This seems very similar to #100, but the error occurs a level removed from where we can define an alternative name for the redefined field.

To Reproduce

from dataclasses_avroschema import AvroModel
from dataclasses import dataclass
import fastavro


@dataclass
class C(AvroModel):
    pass


@dataclass
class B(AvroModel):
    c: C


@dataclass
class A(AvroModel):
    b1: B
    b2: B

fastavro.parse_schema(A.avro_schema_to_python())
Traceback (most recent call last):
  File "sample.py", line 21, in <module>
    fastavro.parse_schema(A.avro_schema_to_python())
  File "fastavro/_schema.pyx", line 100, in fastavro._schema.parse_schema
  File "fastavro/_schema.pyx", line 239, in fastavro._schema._parse_schema
  File "fastavro/_schema.pyx", line 284, in fastavro._schema.parse_field
  File "fastavro/_schema.pyx", line 239, in fastavro._schema._parse_schema
  File "fastavro/_schema.pyx", line 284, in fastavro._schema.parse_field
  File "fastavro/_schema.pyx", line 231, in fastavro._schema._parse_schema
fastavro._schema_common.SchemaParseException: redefined named type: c_record

For completeness' sake, here is what the generated schema looks like:

{'doc': 'A(b1: __main__.B, b2: __main__.B)',
 'fields': [{'name': 'b1',
             'type': {'doc': 'B(c: __main__.C)',
                      'fields': [{'name': 'c',
                                  'type': {'doc': 'C()',
                                           'fields': [],
                                           'name': 'c_record',
                                           'type': 'record'}}],
                      'name': 'b1_record',
                      'type': 'record'}},
            {'name': 'b2',
             'type': {'doc': 'B(c: __main__.C)',
                      'fields': [{'name': 'c',
                                  'type': {'doc': 'C()',
                                           'fields': [],
                                           'name': 'c_record',
                                           'type': 'record'}}],
                      'name': 'b2_record',
                      'type': 'record'}}],
 'name': 'A',
 'type': 'record'}

This only seems to happen when the parent class A has two fields of type B, and B itself has a field of another AvroModel class, C in this example.

Expected behavior
The schema should be constructed properly, without using a redefined name.

Support null default for decimal types

Is your feature request related to a problem? Please describe.
It doesn't appear that you can give decimal types a default of null. If it is possible, an example in the decimal documentation would fix this, as it's not immediately clear.

Describe the solution you'd like
A field along the lines of

foo_bar: decimal.Decimal = types.Decimal(
        scale=14, precision=29, default=None
    )

Would result in an avro output of

{
	"name": "foo_bar",
	"type": [
		"null",
		{
			"type": "bytes",
			"scale": 14,
			"precision": 29,
			"logicalType": "decimal"
		}
	],
	"default": null
}

Avro schema -> dataclass

In my case the Avro schema is primary and not controlled by me.
I'd like to generate a dataclass from the Avro schema.
Could you share your experience and give me a suggestion for my case?
Also, would it be appropriate to add such a schema decompiler to your project if a decent one were found or written?

Have you considered utilizing Pydantic?

This is really a fairly open ended question but I was wondering if this project could effectively wrap (or seamlessly integrate with) Pydantic. This could be as simple as seeing if it is possible to use their dataclass implementation or we could allow inheritance from their BaseModel?

Any thoughts on integrating with Pydantic to allow for 'smart' type inferences and checking on the Python side?

Enforcing namespace for user defined types is invalid

Describe the bug
Currently, if a type is repeated, its definition must have a namespace defined or a NameSpaceRequiredException is raised. However, a namespace is not a requirement in the avro spec. For example, this is a valid avro record:

{
    "type": "record",
    "name": "container",
    "fields": [
        {
            "name": "inner_ref",
            "type": {
                "type": "record",
                "name": "inner_ref",
                "fields": [
                    {
                        "name": "name",
                        "type": "string"
                    }
                ]
            }
        },
        {
            "name": "inner_ref_2",
            "type": "inner_ref"
        },
        {
            "name": "additional_field",
            "type": "long"
        }
    ]
}
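For what it's worth, the hand-written record above is accepted by fastavro as-is, with no namespace anywhere (a quick check, schema copied verbatim from above):

import fastavro

schema = {
    "type": "record",
    "name": "container",
    "fields": [
        {
            "name": "inner_ref",
            "type": {
                "type": "record",
                "name": "inner_ref",
                "fields": [{"name": "name", "type": "string"}],
            },
        },
        {"name": "inner_ref_2", "type": "inner_ref"},
        {"name": "additional_field", "type": "long"},
    ],
}

fastavro.parse_schema(schema)  # parses without raising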

To Reproduce

@dataclass
class InnerClass(AvroModel):
    name: str

    class Meta:
        schema_name = "inner_ref"


@dataclass
class ContainerClass(AvroModel):
    inner_ref: InnerClass
    inner_ref_2: InnerClass
    additional_field: int

We should be able to generate a schema without a namespace on InnerClass

Expected behavior
The above example should not raise a NameSpaceRequiredException

dictionary

I have an issue with the Avro schema generated for dictionaries.

    labels: typing.Dict[str, str] = dict

generate

{"name": "labels", "type": {"type": "map", "values": "string", "name": "label"}}

I think the issue comes from BaseField.render

        template = OrderedDict([("name", self.name), ("type", self.get_avro_type())] + self.get_metadata())

and DictField

    @property
    def avro_type(self) -> typing.Dict:
        return {"type": MAP, "values": self.values_type}

License discrepancy

I noticed that this project uses the MIT license, but the setup.py specifies the license as GPLv3. Based on the commit history, seems like MIT is the intended license, and I think the setup.py may just need updating?

Optional Enums don't work

Describe the bug
When trying to generate the schema with the .avro_schema() function, if there's an Optional[types.Enum] field defined, the generation fails.

To Reproduce
To reproduce just create a class with an optional Enum field and try to generate the avro schema:
Example field:

from dataclasses import dataclass
from typing import Optional
from dataclasses_avroschema import AvroModel, types


@dataclass
class Test(AvroModel):
  optional_enum: Optional[types.Enum] = types.Enum(
          symbols=["foo", "bar"], docs="an optional enum"
      )


Test.avro_schema()

This fails with the following error:

File "/home/ty-app/.local/lib/python3.7/site-packages/dataclasses_avroschema/schema_generator.py", line 66, in avro_schema
     avro_schema = cls.generate_schema(schema_type=AVRO)
   File "/home/ty-app/.local/lib/python3.7/site-packages/dataclasses_avroschema/schema_generator.py", line 54, in generate_schema
     cls.rendered_schema = cls.schema_def.render()
   File "/home/ty-app/.local/lib/python3.7/site-packages/dataclasses_avroschema/schema_definition.py", line 112, in render
     ("fields", self.get_rendered_fields()),
   File "/home/ty-app/.local/lib/python3.7/site-packages/dataclasses_avroschema/schema_definition.py", line 105, in get_rendered_fields
     return [field.render() for field in self.fields]
   File "/home/ty-app/.local/lib/python3.7/site-packages/dataclasses_avroschema/schema_definition.py", line 105, in <listcomp>
     return [field.render() for field in self.fields]
   File "/home/ty-app/.local/lib/python3.7/site-packages/dataclasses_avroschema/fields.py", line 162, in render
     template = OrderedDict([("name", self.name), ("type", self.get_avro_type())] + self.get_metadata())
   File "/home/ty-app/.local/lib/python3.7/site-packages/dataclasses_avroschema/fields.py", line 435, in get_avro_type
     self.unions = self.generate_unions_type()
   File "/home/ty-app/.local/lib/python3.7/site-packages/dataclasses_avroschema/fields.py", line 420, in generate_unions_type
     unions.append(default_field.get_avro_type())
   File "/home/ty-app/.local/lib/python3.7/site-packages/dataclasses_avroschema/fields.py", line 490, in get_avro_type
     "symbols": self.default.symbols,
AttributeError: '_MISSING_TYPE' object has no attribute 'symbols'

Expected behavior
This should render a field with a union of enum and null avro types

Priority ordering for record field naming

Describe the bug
Thanks for the quick fix on #171. One behavior I think should be switched is the priority for the schema name. In particular, I think that aliases should come first, then the metadata schema name, then the class name. The aliases seem like a specific user override. What do you think?

To Reproduce

@dataclass
class InnerClass(AvroModel):
    name: str

    class Meta:
        namespace = "dummy"
        schema_name = "my_name"

@dataclass
class ContainerClass(AvroModel):
    inner_ref: InnerClass
    inner_ref_2: InnerClass

    class Meta:
        namespace = "container_ns"
        alias_nested_items = {
            "inner_ref_2": "inner_ref_2"
        }

results in

{
    "type": "record",
    "name": "ContainerClass",
    "fields": [
        {
            "name": "inner_ref",
            "type": {
                "type": "record",
                "name": "my_name",
                "fields": [...],
                "namespace": "dummy"
            }
        },
        {
            "name": "inner_ref_2",
            "type": "dummy.my_name"
        },
    ],
    "namespace": "container_ns"
}

Expected behavior
The above code should respect alias_nested_items first:

{
    "type": "record",
    "name": "ContainerClass",
    "fields": [
        {
            "name": "inner_ref",
            "type": {
                "type": "record",
                "name": "my_name",
                "fields": [...],
                "namespace": "dummy"
            }
        },
        {
            "name": "inner_ref_2",
            "type":  {
                "type": "record",
                "name": "inner_ref_2",
                "fields": [...], // this should be redefined
                "namespace": "dummy"
            }
        },
    ],
    "namespace": "container_ns"
}

alias_nested_items not working for Lists

Describe the bug
When using alias_nested_items on e.g. Lists, the avro schema doesn't contain the alias.

To Reproduce

from dataclasses_avroschema import AvroModel
import typing

class Address(AvroModel):
    "An Address"
    street: str
    street_number: int

class User(AvroModel):
    "An User with Address"
    name: str
    age: int
    address: typing.List[Address]  # default name Address

    class Meta:
        alias_nested_items = {"address": "MySuperAddress"}

User.avro_schema()
{
  "type": "record",
  "name": "User",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "age",
      "type": "long"
    },
    {
      "name": "address",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "Address",
          "fields": [
            {
              "name": "street",
              "type": "string"
            },
            {
              "name": "street_number",
              "type": "long"
            }
          ],
          "doc": "An Address"
        },
        "name": "addres"
      }
    }
  ],
  "doc": "An User with Address"
}

Expected behavior
The expected behavior would be to also substitute the alias for List (and other types?), or to document this behavior.

Thanks!

Any benchmarking data?

This should really be a discussion item, rather than an issue but I don't see Discussions activated for this project.

I love the concept of this project! I wonder how it stacks up in terms of speed. Have you performed any benchmarks? I would personally be very keen to see how it compares against https://github.com/fastavro/fastavro/

Test fixtures are dangerously specified

Describe the bug

In the test field type fixtures, what is tuple notation in some places is generator notation in others. For example, this (and every other instance of ( ... for x in y )) is a generator, not a tuple: https://github.com/marcosschroh/dataclasses-avroschema/blob/master/tests/fields/consts.py#L92

This means that if you ever use a field fixture to parametrize more than one test, as in https://github.com/marcosschroh/dataclasses-avroschema/blob/3e35968350c45e5c42eb64dce7deb0514e5cf889/tests/fields/test_complex_types.py, the second test will have no parametrized instances, because the generator will yield no values on its second use.
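A minimal sketch of the underlying problem, outside of pytest:

# a generator expression can be consumed only once
gen = (x for x in [1, 2, 3])
print(list(gen))  # [1, 2, 3]
print(list(gen))  # [] -- already exhausted, a second pass yields nothing

# a tuple can be iterated any number of times
tup = tuple(x for x in [1, 2, 3])
print(list(tup))  # [1, 2, 3]
print(list(tup))  # [1, 2, 3]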

To Reproduce
Duplicate any of the tests in test_complex_types.py and change both bodies to assert False. One set of tests will "pass" because there will be no test instances.

Expected behavior
Every test that uses a field fixture should be parametrized with the full set of values, no matter how many tests share the fixture.

Support for typing.Optional

Hello, and thanks for the awesome project!

I'm looking to generate schemas with nullable fields from dataclass fields of type typing.Optional. It looks like a similar result can be achieved using a default value of = None, but (1) there are cases where a default might not be desired, and (2) this doesn't play nicely with type checkers like mypy without using typing.Optional.

I have a potential patch here (which works for my use case): marcosschroh:c730da6...yankees714:a28d2ac. If this looks like something you'd be interested in adding to the project, I'd be happy to open a PR!

Field.internal_field is None after 0.22

Describe the bug
I relied on the Field.internal_field attribute; it is now only set after a call to Field.generate_items_type(). Not sure whether this was intended or not, but at least it broke my code. (I'm dynamically adapting a payload to the schema before instantiating.)

To Reproduce

class Child(AvroModel):
    name: str

class Parent(AvroModel):
    children: typing.List[Child]

> assert Parent.get_fields()[0].internal_field is None
> Parent.get_fields()[0].generate_items_type()
> Parent.get_fields()[0].internal_field
RecordField(name='child', type=<class 'avrotest.Child'>..............

Expected behavior
I expected Field.internal_field to be populated from the start.

Versions are too locked down in setup.py

Describe the bug
As it is, it is impossible for me to use this library with exactly fastavro==1.3.5 due to the constraints imposed.

install_requires=["inflect", "fastavro==1.3.0", "pytz", "dacite", "faker",],

install_requires=["inflect==5.3.0", "fastavro==1.4.0", "pytz", "dacite==1.6.0", "faker==8.1.1",],

To Reproduce

$ pip install fastavro==1.3.5 dataclasses-avroschema==0.20.3
Collecting fastavro==1.3.5
  Using cached fastavro-1.3.5-cp38-cp38-macosx_10_14_x86_64.whl (475 kB)
Collecting dataclasses-avroschema==0.20.3
  Using cached dataclasses_avroschema-0.20.3-py3-none-any.whl
ERROR: Cannot install dataclasses-avroschema==0.20.3 and fastavro==1.3.5 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested fastavro==1.3.5
    dataclasses-avroschema 0.20.3 depends on fastavro==1.4.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

Or try the last version:

$ pip install fastavro==1.3.5 dataclasses-avroschema==0.20.2
Collecting fastavro==1.3.5
  Using cached fastavro-1.3.5-cp38-cp38-macosx_10_14_x86_64.whl (475 kB)
Collecting dataclasses-avroschema==0.20.2
  Using cached dataclasses_avroschema-0.20.2-py3-none-any.whl
Collecting dacite
  Using cached dacite-1.6.0-py3-none-any.whl (12 kB)
ERROR: Cannot install dataclasses-avroschema==0.20.2 and fastavro==1.3.5 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested fastavro==1.3.5
    dataclasses-avroschema 0.20.2 depends on fastavro==1.3.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

Expected behavior
It's a common expectation that libraries in Python only specify exact versions in rare circumstances; otherwise, version ranges should be used that specify a minimum version (and possibly a cap on the major version). Please consider using version ranges instead of specific versions. An easy minimum would be to change all the == to >=:

install_requires=["inflect>=5.3.0", "fastavro>=1.4.0", "pytz", "dacite>=1.6.0", "faker>=8.1.1",],

If a newer version of one of these does happen to break the library that's where this would be useful:

install_requires=["inflect>=5.3,<5.4", "fastavro>=1.4,<1.5", "pytz", "dacite>=1.6,<1.7", "faker>=8.1.1,<8.2",],

Or limit the major version instead:

install_requires=["inflect>=5.3,<6", "fastavro>=1.4,<2", "pytz", "dacite>=1.6,<2", "faker>=8.1.1,<9",],

Handle typing.Union inside typing.List

Currently we cannot handle the case typing.List[typing.Union[...]].

It is probably a general limitation that we cannot handle complex types inside a typing.List, with the exception of Record types.
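For reference, a minimal example of the currently unsupported annotation (the model and field names are illustrative):

import typing
from dataclasses import dataclass

from dataclasses_avroschema import AvroModel


@dataclass
class Event(AvroModel):
    # a list whose items may be either int or str; generating the schema
    # for this field is what currently fails
    values: typing.List[typing.Union[int, str]]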

If a class instance has a default value, there is an issue with the result

@dataclass
class User(AvroModel):
    "User with multiple Address"
    name: str = None
    age: int = None
    addresses: typing.List[Address]

address_data = {
    "street": "street",
    "street_number": 10,
}

# create an Address instance
address = Address(**address_data)

data_user = {
    # "name": None,
    # "age": 20,
    "addresses": [address],
}

# create a User instance
user = User(**data_user)

a = user.to_json()

result:
{'name': None, 'age': {'int': 20}, 'addresses': [{'street': 'street', 'street_number': 10}]}

You can see age is {'int': 20} instead of 'age': 20.

Support Numpy types

Is your feature request related to a problem? Please describe.
There is no way to define an Avro INT field or FLOAT field.

Describe the solution you'd like
We could support Numpy annotations which include np.int32 and np.float32.
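For illustration, a sketch of what the requested annotations might look like; numpy annotations are not currently handled, so the mapping in the comment is the proposal, not existing behaviour:

from dataclasses import dataclass

import numpy as np

from dataclasses_avroschema import AvroModel


@dataclass
class Measurement(AvroModel):
    # proposal: np.int32 -> Avro "int" and np.float32 -> Avro "float",
    # instead of the "long" / "double" produced for plain int and float
    count: np.int32
    value: np.float32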

Support for typing.Union, avro logical types and interoperation with faust

Hi,

I took the liberty of forking your code and taking a shot at some improvements I needed for my work.
The aim was to be able to use complex Faust models for schema generation in such a way that the resulting schema is compatible with the faust.Record.to_representation() and faust.Record.from_data() operations.
The second goal was to produce a schema which is compatible with fastavro. Fastavro does not handle nested schemas very well, as they pose a high risk of multiple definitions of the same records. I implemented a crude way to traverse the final schema and flatten it.
The code changes are in
slawak/dataclasses-avroschema:feature/faust-records

Feel free to look at the result and improve upon it.

importing annotations from future causes an exception to occur when calling avro_schema()

Describe the bug
doing a from __future__ import annotations in a file that defines an AvroModel makes it raise an exception when you call avro_schema()

To Reproduce

from __future__ import annotations

from dataclasses_avroschema import AvroModel
from dataclasses import dataclass


@dataclass
class A(AvroModel):
    foo: str

print(A.avro_schema())

This raises AttributeError: 'str' object has no attribute 'avro_schema_to_python'. Note that the example here doesn't actually use any features needed from that import but in my case I was adding a custom constructor as a class method and specifying the return type.

Expected behavior
Using the annotations feature from __future__ shouldn't impact this library.

Reorganize Documentation

The documentation seems a bit confusing.

Maybe we should order it with the following criteria:

  1. Avro Schemas and Dataclasses
  2. Field Specification (small introduction) and table with all types
  3. Primitive types. Explanation, examples and defaults
  4. Complex types. Explanation, examples and defaults
  5. Logical Types. Explanation, examples and defaults
  6. Schema relationships. Explanation, examples and defaults

Allow statement to specify the casing (camel case, snake case, uppercase, etc) for field names when serializing/deserializing

Feature Description
When using the Avro data format with different sources (some written in Scala, some in Python), it would be nice to have a converter between casings, so that one is not forced to use camel case in Python, for example.

Feature Example

from dataclasses import dataclass

@dataclass
class Event(AvroModel):
    event_id: str

Event.avro_schema(case='camel_case')

'{
    "type": "record",
    "name": "Event",
    "namespace": "Event.v1",
    "fields": [
        {"name": "eventId", "type": "string"}
    ]
}'

Event.serialize(case='camel_case')
Event.deserialize(case='snake_case')

Fields of type fixed

I am trying to create a field of type fixed, even something simple like this

{
   "type": "record",
   "name": "record_name",
   "fields": [
      {
         "name": "field_name",
         "type": {
            "type": "fixed",
            "size": "16"
         }
      }
   ]
}

I can't find any documentation for this and grep-ing the repository directory doesn't find anything relevant.

Support for decimal logicaltype?

Hello,

I'm wondering if there's plans to support the decimal logical type like there's support for the date/time logical types. If there's no plans, do you have suggested workarounds, or could you point me to the file(s) to modify so I could look into implementing it myself and putting up a pull request?

Thanks!
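For reference, the "Support null default for decimal types" issue above sketches a types.Decimal helper; assuming such a helper, a decimal field might look like this (precision and scale are illustrative):

import decimal
from dataclasses import dataclass

from dataclasses_avroschema import AvroModel, types


@dataclass
class Invoice(AvroModel):
    # assumed helper as in the issue above: scale/precision would become
    # the attributes of the "decimal" logicalType in the generated schema
    amount: decimal.Decimal = types.Decimal(scale=2, precision=10)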

UUID type is incorrect

Describe the bug
Assigning a UUID instance to an attribute for an AvroModel subclass results in AttributeError: type object 'UUID' has no attribute 'avro_schema_to_python'. Looking at your code, it appears you use uuid.uuid4 as the type for uuid objects. This is incorrect, as uuid4 is a method. UUID is the correct type. The avromodel code should not care how the UUID was generated (uuid1,2,3,4,5).

To Reproduce
I created an object such as the following:

@dataclass
class ATestClass(AvroModel):
    """Test."""

    id: UUID

Where UUID is from python's uuid package.

Do jsc = ATestClass(id=uuid.uuid4()) and then jsc.avro_schema() to get the error.

Expected behavior
I would expect that the UUID type is handled correctly.

I'm happy to help out implementing this if you would like. Thank you for making this much needed library. Cheers!

serialize() doesn't work if there is field value with None

serialize() doesn't work if there is field value with None:

data_user = {
    "name": None,
    "age": 20,
    "addresses": [address],
}

# create a User instance
user = User(**data_user)

user.serialize()

fastavro.schemaless_writer(file_like_output, schema, payload)

File "fastavro/_write.pyx", line 635, in fastavro._write.schemaless_writer
File "fastavro/_write.pyx", line 335, in fastavro._write.write_data
File "fastavro/_write.pyx", line 285, in fastavro._write.write_record
File "fastavro/_write.pyx", line 313, in fastavro._write.write_data
File "fastavro/_write.pyx", line 125, in fastavro._write.write_utf8
File "fastavro/_six.pyx", line 25, in fastavro._six.py3_utob
TypeError: encoding without a string argument

Use Python class __name__ for nested record typenames.

Is your feature request related to a problem? Please describe.
To ease the burden of converting existing schemas into dataclass format, it would be nice to have the generated schema be an exact match. Mismatching names on existing schemas can lead to the tremendous task of updating names in many places for a widely used schema in a large codebase.

Describe the solution you'd like
Looking at the documentation for dataclasses, I noticed that there is a metadata parameter on dataclasses.field that would probably be perfect for this. I'm picturing something like:

import dataclasses
from dataclasses_avroschema import AvroModel

@dataclasses.dataclass
class Address(AvroModel):
    stuff: str = ""

@dataclasses.dataclass
class Person(AvroModel):
    address: Address = dataclasses.field(default_factory=Address, metadata={"type_name": "Address"}) # I'd also be fine with record_type_name or something similar

Instead of

{'type': 'record',
 'name': 'Person',
 'fields': [{'name': 'address',
   'type': {'type': 'record',
    'name': 'address_record',
    'fields': [{'name': 'stuff', 'type': 'string', 'default': ''}],
    'doc': "Address(stuff: str = '')"}}],
 'doc': 'Person(address: __main__.Address = <factory>)'}

the generated schema would be

{'type': 'record',
 'name': 'Person',
 'fields': [{'name': 'address',
   'type': {'type': 'record',
    'name': 'Address',
    'fields': [{'name': 'stuff', 'type': 'string', 'default': ''}],
    'doc': "Address(stuff: str = '')"}}],
 'doc': 'Person(address: __main__.Address = <factory>)'}

Describe alternatives you've considered
Keeping what is already implemented, in the form of hand-coded Avro schema dictionaries 😢
Or patching the overrides in:

person_schema = Person.avro_schema_to_python()
person_schema["fields"][0]["name"] = "Address"  # Of course, looping over the array to find things with a matching field name that have the _record version is what I would do instead

Additional context
Record type names are already generated by the library from what I can tell
https://marcosschroh.github.io/dataclasses-avroschema/schema_relationships/#avoid-name-colision-in-multiple-relationships

record_name = self.type.__name__.lower()
if record_name not in self.name:
    name = f"{self.name}_{record_name}_record"
else:
    name = f"{self.name}_record"

Docs use enum, but Types seem to have been removed in the latest version

Describe the bug
Having run the examples on a previous release and played with them, I was able to run examples with Enums.
They have perhaps been removed for good reason, but I would like to know how to do the same thing in the latest version, and/or maybe the docs should be updated?

To Reproduce
Try to run the examples where Enums are used

Expected behavior
Expect to be able to run the examples in the readme without an exception saying that the Avro schema types do not contain Enum.

Specifying the key attribute

Is your feature request related to a problem? Please describe.
In Kafka it's sometimes necessary to specify which attribute is the key, for example, to enable log compaction.

Describe the solution you'd like
It can be achieved via a Meta class attribute:

@dataclass
class User(AvroModel):
    _id: str
    name: str

    class Meta:
        key = "_id"

or a special type

@dataclass
class User(AvroModel):
    _id: Key[str]
    name: str

Then, it can be accessed easily:

user = User.fake()
assert user.key == user._id

A key would have to follow the following rules:

  • key value has to match some field name
  • there can only be one key present
  • it cannot be optional
  • it cannot have a default value (tbh, I'm not sure if this wouldn't be too strict)

Describe alternatives you've considered
Currently, it can be done via @property or by somehow manipulating the metadata.

@dataclass
class User(AvroModel):
    _id: str
    name: str
    
    @property
    def key(self) -> bytes:
        return self._id.encode()

None with Schema Logical Types and Relationships

Hi,

As described in the documentation,

meeting_date: datetime.date = None

produce a schema like :

    {
      "name": "meeting_date",
      "type": {
        "type": "int",
        "logicalType": "date"
      },
      "default": "null"
    },

shouldn't it be instead

    {
      "name": "meeting_date",
      "type": [                                                                                                                                                                                                                                                                  
        {                                                                                                                                                                                                                                                                        
          "type": "int",
          "logicalType": "date"
        },
        "null"
      ],
      "default": "null"
    },

None and Relationships

    resources: Resources

produce a schema like :

{
      "name": "resources",
      "type": {
        "type": "record",
        "name": "Resources",
        "fields": [
              ...
        ],
        "doc": "..."
      },
      "default": "null"
    },

shouldn't it be instead

{
      "name": "resources",
      "type": {
        "type": "record",
        "name": "Resources",
        "fields": [
              ...
        ],
        "doc": "..."
      }
    },

Union types that contain a type requiring an avroschema.types specifier fail to generate a schema

Describe the bug
A field with a Union type, where the Union contains a type that requires a dataclasses_avroschema.types default to provide additional metadata (Decimal, Enum, etc) fails to generate a schema.

To Reproduce

@dataclass
class TestEnum(AvroModel):
    OptionalEnum: typing.Optional[types.Enum] = types.Enum(['Mr', 'Mrs', 'Miss', 'Ms'])
    # DateEnum = typing.Union[datetime, types.Enum] = types.Enum(['Mr', 'Mrs', 'Miss', 'Ms'])
    # DecimalEnum = typing.Union[decimal.Decimal, types.Enum] = (types.Decimal(precision=2, scale=1), types.Enum(['Mr', 'Mrs', 'Miss', 'Ms']))

print(TestEnum.avro_schema())

Generates the following stacktrace

faust_1  | Traceback (most recent call last):
faust_1  |   File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
faust_1  |     return _run_code(code, main_globals, None,
faust_1  |   File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
faust_1  |     exec(code, run_globals)
faust_1  |   File "/app/proj/__main__.py", line 17, in <module>
faust_1  |     print(TestEnum.avro_schema())
faust_1  |   File "/usr/local/lib/python3.8/site-packages/dataclasses_avroschema/schema_generator.py", line 58, in avro_schema
faust_1  |     return json.dumps(cls.generate_schema(schema_type=AVRO).render())
faust_1  |   File "/usr/local/lib/python3.8/site-packages/dataclasses_avroschema/schema_generator.py", line 46, in generate_schema
faust_1  |     cls.schema_def = cls._generate_avro_schema()
faust_1  |   File "/usr/local/lib/python3.8/site-packages/dataclasses_avroschema/schema_generator.py", line 54, in _generate_avro_schema
faust_1  |     return schema_definition.AvroSchemaDefinition("record", cls.klass, metadata=cls.metadata)
faust_1  |   File "<string>", line 9, in __init__
faust_1  |   File "/usr/local/lib/python3.8/site-packages/dataclasses_avroschema/schema_definition.py", line 55, in __post_init__
faust_1  |     self.fields = self.parse_dataclasses_fields()
faust_1  |   File "/usr/local/lib/python3.8/site-packages/dataclasses_avroschema/schema_definition.py", line 60, in parse_dataclasses_fields
faust_1  |     return self.parse_fields()
faust_1  |   File "/usr/local/lib/python3.8/site-packages/dataclasses_avroschema/schema_definition.py", line 63, in parse_fields
faust_1  |     return [
faust_1  |   File "/usr/local/lib/python3.8/site-packages/dataclasses_avroschema/schema_definition.py", line 64, in <listcomp>
faust_1  |     AvroField(
faust_1  |   File "/usr/local/lib/python3.8/site-packages/dataclasses_avroschema/fields.py", line 820, in field_factory
faust_1  |     return container_klass(  # type: ignore
faust_1  |   File "<string>", line 10, in __init__
faust_1  |   File "/usr/local/lib/python3.8/site-packages/dataclasses_avroschema/fields.py", line 355, in __post_init__
faust_1  |     self.unions = self.generate_unions_type()
faust_1  |   File "/usr/local/lib/python3.8/site-packages/dataclasses_avroschema/fields.py", line 382, in generate_unions_type
faust_1  |     unions.append(default_field.get_avro_type())
faust_1  |   File "/usr/local/lib/python3.8/site-packages/dataclasses_avroschema/fields.py", line 449, in get_avro_type
faust_1  |     "symbols": self.default.symbols,
faust_1  | AttributeError: '_MISSING_TYPE' object has no attribute 'symbols'

Expected behavior
An AvroSchema something like:

 {"type": "record", "name": "TestEnum", "fields": [{"name": "OptionalEnum", "type": ["null", {"type": "enum", "name": "OptionalEnum", "symbols": ["Mr", "Mrs", "Miss", "Ms"]}]}], "doc": "TestEnum(OptionalEnum: dataclasses_avroschema.types.Enum = ['Mr', 'Mrs', 'Miss', 'Ms'])"}

feat: support `NamedTuple` to `record` mapping

Is your feature request related to a problem? Please describe.
NamedTuple is currently unsupported.

Describe the solution you'd like
NamedTuple -> record

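For illustration, a sketch of the requested mapping; the class and the expected schema fragment below are made up for this example:

import typing


class Point(typing.NamedTuple):
    x: int
    y: int


# requested behaviour: a field annotated with a NamedTuple such as Point
# would map to a nested Avro record, e.g.
# {"type": "record", "name": "Point",
#  "fields": [{"name": "x", "type": "long"}, {"name": "y", "type": "long"}]}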

Nested schema resolution directly from dictionaries

I want to be able to create nested objects in the definition of the primary object, such that I can pass in dictionaries that do not know about Avro.

So from the tutorial:

from dataclasses import dataclass

import typing

from dataclasses_avroschema import AvroModel


@dataclass
class Address(AvroModel):
    "An Address"
    street: str
    street_number: int

@dataclass
class User(AvroModel):
    "User with multiple Address"
    name: str
    age: int
    addresses: typing.List[Address]

data_user = {
    "name": "john",
    "age": 20,
    "addresses": [{
        "street": "test",
        "street_number": 10,
    }],
}
user = User(**data_user)
assert type(user.addresses[0]) is Address

Inspiration for an implementation can be found at https://stackoverflow.com/questions/51564841/creating-nested-dataclass-objects-in-python/65326010

In order not to introduce backwards incompatibilities, an optional init argument or an alternative constructor could be introduced:


user = User(**data_user, nested=True)
assert type(user.addresses[0]) is Address


user = User.nested_init(**data_user)
assert type(user.addresses[0]) is Address

record types are duplicated in generated schema when referenced by more than one field

Describe the bug
If a dataclass has fields whose types are other dataclasses the generated schema properly embeds a "record" object that describes the child field. However if the same child field type appears more than once in the dataclass the same "record" object is duplicated in the schema. The resulting schema cannot be parsed by apache's avro python library - failing with an error such as: avro.schema.SchemaParseException: The name "Location" is already in use.

To Reproduce
The following simple set of dataclasses and the 3 lines of code after them will reproduce the error:

import avro.schema
from dataclasses import dataclass
from dataclasses_avroschema import AvroModel
from datetime import datetime
import json


@dataclass
class Location(AvroModel):
    latitude: float
    longitude: float


@dataclass
class Trip(AvroModel):

    start_time: datetime
    start_location: Location
    finish_time: datetime
    finish_location: Location


trip_schema_dict = Trip.avro_schema_to_python()
print(json.dumps(trip_schema_dict, indent=2))
trip_schema = avro.schema.parse(json.dumps(trip_schema_dict))

Expected behavior
The generated schema should be valid and parseable by the apache avro library.

Nested records redefine type when used in an array

Describe the bug
Here is a small example that reproduces the problem on version 0.25.2

from dataclasses_avroschema import AvroModel
from typing import List


class Location(AvroModel):
    start: int
    end: int

    class Meta:
        namespace = "types.test"


class IndexedLocations(AvroModel):
    idx: int
    locations: List[Location]

    class Meta:
        namespace = "types.test"


class TClass(AvroModel):
    location: Location
    idx_locations: IndexedLocations

    class Meta:
        namespace = "types.test"

This schema doesn't parse with fastavro:

fastavro.parse_schema(TClass.avro_schema_to_python())

gives error:

fastavro._schema_common.SchemaParseException: redefined named type: types.test.Location

The schema dict is

{
  "type": "record",
  "name": "TClass",
  "fields": [
    {
      "name": "location",
      "type": {
        "type": "record",
        "name": "Location",
        "fields": [
          {
            "name": "start",
            "type": "long"
          },
          {
            "name": "end",
            "type": "long"
          }
        ],
        "doc": "Location(start: int, end: int)",
        "namespace": "types.test"
      }
    },
    {
      "name": "idx_locations",
      "type": {
        "type": "record",
        "name": "IndexedLocations",
        "fields": [
          {
            "name": "idx",
            "type": "long"
          },
          {
            "name": "locations",
            "type": {
              "type": "array",
              "items": {
                "type": "record",
                "name": "Location",
                "fields": [
                  {
                    "name": "start",
                    "type": "long"
                  },
                  {
                    "name": "end",
                    "type": "long"
                  }
                ],
                "doc": "Location(start: int, end: int)",
                "namespace": "types.test"
              },
              "name": "location"
            }
          }
        ],
        "doc": "IndexedLocations(idx: int, locations: List[tests.test_types.Location])",
        "namespace": "types.test"
      }
    }
  ],
  "doc": "TClass(location: tests.test_types.Location, idx_locations: tests.test_types.IndexedLocations)",
  "namespace": "types.test"

It looks like the resulting schema dict is incorrect: it redefines the Location type for the array items when it should reference types.test.Location instead. My schema is much more complex but has a similar re-use of a type via List.

Default null record

I'm trying to create a nested record with a default value of null. I tried something like this, but I'm getting an AttributeError: 'NoneType' object has no attribute 'avro_schema_to_python' error.

import typing

from dataclasses_avroschema import AvroModel, types


class foobar(AvroModel):
    Name: str = None
    Parent: int = None


class foo(AvroModel):
    foobar: foobar = None


print(foo.avro_schema())

Support for timestamp at microsecond precision

Hi,

Microsecond-precision timestamps are covered in the spec at Timestamp (microsecond precision). I was looking at the way the library handles them, and it seems they're rounded to millis.

I don't have a PR to propose at the moment, but if there's interest I could write one to handle both use cases.

Proposal

  • Millis should probably stay as the default for the datetime.time type so as not to break existing assumptions/code
  • Users could pass some flag in the metadata to signal the field is in micros (opt-in, non-disruptive behaviour)

Implementation

Introduce extra logical type mappings datetime.time -> fields.TimeMicrosField and datetime.datetime -> fields.DatetimeMicrosField. This in turn would require some small changes to the internal API when resolving the field to use, according to the flag.

Let me know if you see a better solution, as I haven't tested the changes yet.
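For illustration, a hypothetical opt-in along the lines proposed above; the metadata key name is made up, not an existing API:

import dataclasses
import datetime

from dataclasses_avroschema import AvroModel


@dataclasses.dataclass
class Event(AvroModel):
    # hypothetical flag signalling that this field should use
    # timestamp-micros instead of the default timestamp-millis
    created_at: datetime.datetime = dataclasses.field(
        default_factory=datetime.datetime.now,
        metadata={"precision": "micros"},
    )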
