ch-robinson / dotnet-avro

An Avro implementation for .NET

Home Page: https://engineering.chrobinson.com/dotnet-avro/

License: MIT License

C# 100.00%

avro kafka schema-registry dotnet hacktoberfest

dotnet-avro's Issues

Tombstone support

Hi @dstelljes!

Maybe you could help us out... we are trying to produce a tombstone message to Kafka. From what I understand, it is not yet supported by confluent-kafka-dotnet: see confluentinc/confluent-kafka-dotnet#905.

We tried sending a null value to an Avro topic whose schema is a union of null and a record (the POCO), but the subscriber (a JDBC sink) doesn't seem to see this as a tombstone message and crashes with a null-reference exception while trying to deserialize the Avro message. Therefore, I think that we shouldn't send the tombstone as an Avro-serialized null value, but should short-circuit the serialization and send a plain (non-Avro-serialized) null, like the Java implementation does:

https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroSerializer.java#L56

Is this currently possible with this library?
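The short-circuit described above can be pictured as a thin wrapper around whatever serialization function is in use. This is only a sketch: `TombstoneAware.Wrap` is a hypothetical helper, not part of Chr.Avro or confluent-kafka-dotnet, and the inner function stands in for a real Avro serializer.

```csharp
using System;

public static class TombstoneAware
{
    // Wraps an existing serialization function so that a null value is passed
    // through as a null payload (a Kafka tombstone) instead of being encoded.
    // Hypothetical helper for illustration only.
    public static Func<T, byte[]> Wrap<T>(Func<T, byte[]> inner) where T : class
    {
        if (inner == null)
        {
            throw new ArgumentNullException(nameof(inner));
        }

        return value => value == null ? null : inner(value);
    }
}
```

With this shape, the union-of-null-and-record schema is never consulted for a null value, which is the behavior the Java serializer linked above appears to have.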

Set up Actions for CI/CD

Would be nice to have unit test checks on PRs and automated package/documentation releases.

How can I generate code from an .avsc file

How can I generate code from an .avsc file?

Support multidimensional arrays

Jagged (T[][]) arrays are currently supported; multidimensional (T[,]) are not. This would require some major changes to how the binary serdes treat arrays. Currently we just build a list and feed it to Enumerable.ToArray, which sidesteps the problem of knowing array size in advance.
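To illustrate why the size has to be known up front, here is a rough sketch of one possible approach: buffer the rows as lists (as the serdes do today for single-dimension arrays) and copy them into a T[,] once both dimensions are known. `RectangularArrays.FromRows` is a hypothetical name, and the sketch assumes a two-dimensional array encoded as an array of arrays.

```csharp
using System;
using System.Collections.Generic;

public static class RectangularArrays
{
    // Copies buffered rows into a T[,]. Both dimensions must be known before
    // the array can be allocated, unlike the list-then-ToArray approach.
    public static T[,] FromRows<T>(IReadOnlyList<IReadOnlyList<T>> rows)
    {
        int rowCount = rows.Count;
        int columnCount = rowCount == 0 ? 0 : rows[0].Count;
        var result = new T[rowCount, columnCount];

        for (int r = 0; r < rowCount; r++)
        {
            // A rectangular array cannot represent rows of different lengths.
            if (rows[r].Count != columnCount)
            {
                throw new InvalidOperationException(
                    $"Row {r} has {rows[r].Count} items; expected {columnCount}.");
            }

            for (int c = 0; c < columnCount; c++)
            {
                result[r, c] = rows[r][c];
            }
        }

        return result;
    }
}
```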

Support default values when building serialization functions

Open questions:

The serializer builder currently throws when a field on a record does not have a matching property on the .NET type being mapped. If a default value were used instead, would that obscure errors (like typos on member names)?
How would default values be represented on the Schema class?

Usage with confluent-kafka-dotnet's DependentProducerBuilder?

Hi--thank you for putting this library out there. It looks like it could help us get past several limitations we've encountered with the current Apache/Confluent versions.

The confluent-kafka-dotnet library provides a DependentProducerBuilder which can be used to create additional Avro producers for messages of different types, while still using a single underlying librdkafka handle.

Between this library and the Confluent one, I am not seeing a way to create Avro producers of multiple types that use a single handle. Am I missing anything, or is this something that would require extending your existing ProducerBuilder extension methods to cover DependentProducerBuilders as well?

Update mapping guide to reflect constructor deserialization changes.

At some point the mapping guide will need to be updated to reflect the changes in #38 & #42.

Confluent extension methods should take deserializer/serializer builder interfaces

Currently, extension methods with deserializer/serializer builder parameters use the concrete types instead of the interfaces.

Avro Generate CLI Message Incorrect

Usage wording is incorrect when parameters are missing on the avro generate functionality.

dotnet avro generate --registry-url http://localhost:8081
Either --id or --schema (and optionally --version) must be provided.

--schema is not correct however, since it should be --subject, as shown when --schema is used in the previous command:

dotnet avro generate --registry-url http://localhost:8081 --schema testTopic-value --version 1
Chr.Avro.Cli 7.0.2
Copyright (C) 2020 C.H. Robinson

ERROR(S):
  Option 'schema' is unknown.
USAGE:
Generate code for a schema by ID:
  dotnet avro generate --id 120 --registry-url http://registry:8081

  -r, --registry-url    The URL of the schema registry.

  -i, --id              If a subject/version is not specified, the ID of the schema.

  -s, --subject         If an ID is not specified, the subject of the schema.

  -v, --version         The version of the schema.

  --help                Display this help screen.

  --version             Display version information.

Unseal schema classes

Right now, the sealed Chr.Avro.Abstract classes could be a barrier to extensibility (an application may want to extend the abstract schema model based on some custom metadata).

Implement bytes schema case in Confluent wire format

According to this (and the wire format docs if you squint), Chr.Avro doesn’t conform to the Confluent wire format when serializing "bytes".

Remove BinaryDeserializer<T> and BinarySerializer<T>

Investigate emitting IBinaryDeserializer<T>/IBinarySerializer<T> implementations dynamically from the serde builders instead.

Decimal serialization fails with large numbers/scales

Able to reproduce with this test case:

using Chr.Avro.Representation;
using Chr.Avro.Serialization;
using Xunit;

namespace Chr.Avro.Bugs
{
    public class DecimalTriage
    {
        public const string Schema = @"{
  ""type"": ""bytes"",
  ""logicalType"": ""decimal"",
  ""precision"": 29,
  ""scale"": 14
}";

        [Fact]
        public void TestOverflow()
        {
            var schema = new JsonSchemaReader().Read(Schema);
            var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<decimal>(schema);
            var serializer = new BinarySerializerBuilder().BuildSerializer<decimal>(schema);

            var value = 999246978759766M;
            Assert.Equal(value, deserializer.Deserialize(serializer.Serialize(value)));
        }
    }
}

Fix is probably to create two BigIntegers, one for the whole part and one for the fractional part, instead of multiplying the decimal value by the scale: https://github.com/ch-robinson/dotnet-avro/blob/master/src/Chr.Avro.Binary/BinarySerializerBuilder.cs#L696
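For context on the failure: 999246978759766 × 10^14 is roughly 9.99 × 10^28, which exceeds decimal.MaxValue (about 7.9 × 10^28), so the scaled value cannot be formed as a System.Decimal. The sketch below takes a slightly different route from the one proposed above: it reads the decimal's 96-bit magnitude with decimal.GetBits and does all scaling in BigInteger. `DecimalScaling` is a hypothetical helper, not the library's code; the inverse does split the value into whole and fractional parts.

```csharp
using System;
using System.Numerics;

public static class DecimalScaling
{
    // Returns value * 10^scale as a BigInteger without ever forming that
    // product as a System.Decimal (which is where the overflow occurs).
    public static BigInteger ToUnscaled(decimal value, int scale)
    {
        int[] bits = decimal.GetBits(value);

        // bits[0..2] hold the 96-bit magnitude; bits[3] holds sign and scale.
        BigInteger magnitude = new BigInteger((uint)bits[0])
            | (new BigInteger((uint)bits[1]) << 32)
            | (new BigInteger((uint)bits[2]) << 64);
        int currentScale = (bits[3] >> 16) & 0xFF;
        bool negative = bits[3] < 0; // the sign lives in the top bit

        magnitude = scale >= currentScale
            ? magnitude * BigInteger.Pow(10, scale - currentScale)
            : magnitude / BigInteger.Pow(10, currentScale - scale); // truncates extra digits

        return negative ? -magnitude : magnitude;
    }

    // Inverse: converts the whole and fractional parts separately so that
    // neither conversion overflows. Assumes scale <= 28 so that 10^scale
    // fits in a decimal; the whole part can still overflow for huge inputs.
    public static decimal FromUnscaled(BigInteger unscaled, int scale)
    {
        BigInteger divisor = BigInteger.Pow(10, scale);
        BigInteger whole = BigInteger.DivRem(unscaled, divisor, out BigInteger remainder);
        return (decimal)whole + (decimal)remainder / (decimal)divisor;
    }
}
```

The unscaled BigInteger would then be written as big-endian two's-complement bytes, as the Avro decimal logical type requires.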

Type could not be found error using assembly and type console arguments

Using the --assembly and --type arguments on the command line of the Chr.Avro.Cli tool (e.g. create --type KafkaTests.Models.TestAvro --assembly C:\Projects\KafkaTests\bin\Debug\netcoreapp3.1\KafkaTests.Models.dll), I'm getting an error "The type could not be found. You may need to provide additional assemblies." Looking into the code, I can see that the ResolveType extension method is loading assemblies, but those assemblies do not get saved to any variables, so when the code gets to this line...

return Type.GetType(options.TypeName, ignoreCase: true, throwOnError: true);

... it's not actually using the assembly that was loaded from the previous lines. I'm proposing this be fixed by adding a collection of Assemblies that the Assembly.Load or the Assembly.LoadFrom methods return and then iterating through that array to find the type that belongs to one of these assemblies.

    internal static class TypeOptionExtensions
    {
        public static Type ResolveType(this IClrTypeOptions options)
        {
            List<Assembly> assemblies = new List<Assembly>();
            foreach (var assembly in options.AssemblyNames)
            {
                try
                {
                    // If found, save this assembly to the assemblies collection.
                    assemblies.Add(Assembly.Load(assembly));
                    continue;
                }
                catch (FileNotFoundException)
                {
                    // nbd
                }
                catch (FileLoadException)
                {
                    // also nbd
                }

                try
                {
                    // If found, save this assembly to the assemblies collection.
                    assemblies.Add(Assembly.LoadFrom(Path.GetFullPath(assembly)));
                }
                catch (FileNotFoundException)
                {
                    throw new ProgramException(message: $"{assembly} could not be found. Make sure that you’ve provided either a recognizable name (e.g. System.Runtime) or a valid assembly path.");
                }
                catch (BadImageFormatException)
                {
                    throw new ProgramException(message: $"{assembly} is not valid. Check that the path you’re providing points to a valid assembly file.");
                }
            }

            try
            {
                // Iterate through the loaded assemblies to find the first one that contains the given type.
                foreach (var assembly in assemblies)
                {
                    Type type = assembly.GetType(options.TypeName, throwOnError: false, ignoreCase: true);
                    if (type != null)
                    {
                        return type;
                    }
                }

                return Type.GetType(options.TypeName, throwOnError: true, ignoreCase: true);
            }
            catch (TypeLoadException)
            {
                throw new ProgramException(message: "The type could not be found. You may need to provide additional assemblies.");
            }
        }
    }

Support parameterized constructors for record deserialization

Constructors could be a fallback for public setters. Proposed flow:

The deserializer builder looks for a matching publicly-writable field or property for each record field.
If exactly one match is not found for any record field, the deserializer builder looks for exactly one public constructor with exactly one matching parameter for each field (and no other non-optional parameters).
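The second step of the proposed flow could be sketched with plain reflection as follows. `ConstructorMatcher` and the `Point` sample type are hypothetical; matching parameters to field names case-insensitively is an assumption, not something the proposal specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class ConstructorMatcher
{
    // Returns the single public constructor that has exactly one parameter per
    // record field (matched by name, ignoring case) and whose remaining
    // parameters are all optional; returns null if there is no unique match.
    public static ConstructorInfo FindMatch(Type type, IReadOnlyCollection<string> fieldNames)
    {
        var candidates = type.GetConstructors()
            .Where(constructor =>
            {
                var parameters = constructor.GetParameters();
                bool everyFieldMatched = fieldNames.All(field =>
                    parameters.Count(p => IsMatch(p, field)) == 1);
                bool extrasAreOptional = parameters
                    .Where(p => !fieldNames.Any(field => IsMatch(p, field)))
                    .All(p => p.IsOptional);
                return everyFieldMatched && extrasAreOptional;
            })
            .ToList();

        return candidates.Count == 1 ? candidates[0] : null;
    }

    private static bool IsMatch(ParameterInfo parameter, string fieldName) =>
        string.Equals(parameter.Name, fieldName, StringComparison.OrdinalIgnoreCase);
}

// Sample immutable type with no public setters.
public class Point
{
    public Point(int x, int y) { X = x; Y = y; }
    public int X { get; }
    public int Y { get; }
}
```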

cc @evanbb

Evaluate support for Generic/Immutable collection classes

Follow up from #59: Also support immutable array/list types? More comprehensive deserialization support for (both mutable and immutable) stacks/sets/etc.?

Current support:

System.Collections.Generic

System.Collections.Immutable

Update NuGet icon definition

iconUrl is deprecated. Use icon instead.

In Chr.Avro.Build.props:

update PackageIconUrl -> PackageIcon
include docs/static/nuget-icon.png in the package (like the readme)

Account for nullable reference types when generating schemas

At present, the type resolvers take an all-or-nothing approach to determining nullability of reference types. The schema generator would be more useful if the resolvers took nullable reference type metadata into account.

This would be a breaking change given that it’s incongruent to the existing resolveReferenceTypesAsNullable option on the TypeResolver. One possible implementation:

Remove the resolveReferenceTypesAsNullable boolean and introduce a NullableReferenceTypeBehavior enum (like TemporalBehavior, TombstoneBehavior, etc.).
- None (never generate nullable union schemas; equivalent to resolveReferenceTypesAsNullable: false)
- Semantic (always generate nullable union schemas; equivalent to resolveReferenceTypesAsNullable: true)
- FromMetadata (look for nullable metadata, falling back to None behavior if oblivious)
Make FromMetadata the default behavior. This matches the current default behavior (resolveReferenceTypesAsNullable: false) for oblivious types and transparently enables better behavior for non-oblivious types.
Keep the default behavior of dotnet avro generate as long as nullable reference types are opt-in, but add a flag to support generating non-oblivious code.

Open questions:

What’s the safest way to reflect on NullableAttribute/NullableContextAttribute? Direct use of those types isn’t allowed in source, but it’s still possible to grab them by name:
```
var attribute = type.CustomAttributes.SingleOrDefault(a => a.AttributeType.FullName == "...");
```
For consistency, should we also enum-ify resolveUnderlyingEnumTypes?
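As a rough sketch of the by-name reflection idea, the helper below reads the nullable flag from a property, falling back to the NullableContextAttribute on the declaring type(s). The attribute names and flag values (0 oblivious, 1 not annotated, 2 annotated) are those emitted by the C# compiler; `NullableMetadata` and `SampleRecord` are hypothetical, and generic type arguments are ignored for brevity.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class NullableMetadata
{
    private const string NullableAttributeName =
        "System.Runtime.CompilerServices.NullableAttribute";
    private const string NullableContextAttributeName =
        "System.Runtime.CompilerServices.NullableContextAttribute";

    // Returns true (annotated nullable), false (not nullable), or null (oblivious).
    public static bool? IsNullable(PropertyInfo property)
    {
        byte? flag = ReadFlag(property.CustomAttributes, NullableAttributeName);

        // No attribute on the member: use the context of the enclosing type(s).
        for (Type? type = property.DeclaringType; flag == null && type != null; type = type.DeclaringType)
        {
            flag = ReadFlag(type.CustomAttributes, NullableContextAttributeName);
        }

        if (flag == 2) return true;
        if (flag == 1) return false;
        return null;
    }

    private static byte? ReadFlag(IEnumerable<CustomAttributeData> attributes, string fullName)
    {
        var data = attributes.FirstOrDefault(a => a.AttributeType.FullName == fullName);
        if (data == null || data.ConstructorArguments.Count != 1) return null;

        object? value = data.ConstructorArguments[0].Value;
        if (value is byte single) return single;

        // The byte[] form carries one flag per type in the signature; the first
        // element describes the outermost type.
        if (value is IReadOnlyCollection<CustomAttributeTypedArgument> items && items.Count > 0)
        {
            return items.First().Value as byte?;
        }

        return null;
    }
}

public class SampleRecord
{
    public string? Optional { get; set; }
    public string Required { get; set; } = string.Empty;
}
```

On .NET 6 and later, System.Reflection.NullabilityInfoContext may be a safer alternative to reading these attributes by hand.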

String deserializer doesn’t recognize all valid ISO 8601 strings when mapping to DateTime

ParseExact rejects strings that don’t conform exactly to the round-trip format:

var date = "2018-06-25T01:01:00.000Z";

DateTime.ParseExact(date, "O", CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);
// System.FormatException: String was not recognized as a valid DateTime.
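The "O" format expects exactly seven fractional-second digits, which is why the three-digit value above is rejected. One possible workaround, sketched here with a hypothetical `Iso8601.ParseDateTime` helper, is to try the strict format first and fall back to DateTime.Parse with the same styles. The fallback is lenient and will also accept strings that are not ISO 8601, so this is a trade-off rather than a complete fix.

```csharp
using System;
using System.Globalization;

public static class Iso8601
{
    // Tries the strict round-trip ("O") format first, then falls back to the
    // more lenient DateTime.Parse, which tolerates fewer fractional digits.
    public static DateTime ParseDateTime(string value)
    {
        if (DateTime.TryParseExact(
            value,
            "O",
            CultureInfo.InvariantCulture,
            DateTimeStyles.RoundtripKind,
            out DateTime exact))
        {
            return exact;
        }

        return DateTime.Parse(value, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);
    }
}
```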

Add readme info for benchmarks

The benchmark project doesn't appear in the solution explorer when opening the solution in Visual Studio. This issue is for adding some documentation to the readme on how to run the benchmarks.

Failing to provide values for enum-typed properties can silently produce corrupt Avro data

In integrating Chr.Avro with the Confluent stack and producing from POCOs, we've run across a case where Avro-encoded messages are corrupt when enum-typed properties are null. Specifically, if the enum-typed property in the POCO is not a Nullable but no value is passed, the serialized bytes may either default to the first value of the enum, or not write any bytes for that property at all. This simplified code sample which was run against v5.0.1 demonstrates:

using System;
using Chr.Avro.Representation;
using Chr.Avro.Serialization;

namespace IssueExample
{
    public enum MyExplicitEnum
    {
        MyEnumOne = 2,
        MyEnumTwo = 3
    }

    public enum MyImplicitEnum
    {
        MyEnumOne,
        MyEnumTwo
    }

    public class TestRecord
    {
        public bool MyBool { get; set; }
        public MyExplicitEnum MyExplicitEnum { get; set; }
        public string MyString { get; set; }
        public MyImplicitEnum MyImplicitEnum { get; set; }
    }

    internal static class Program
    {
        private const string SchemaJson = @"
        {
            ""type"": ""record"",
            ""name"": ""TestRecord"",
            ""fields"": [
                {
                    ""name"": ""MyBool"",
                    ""type"":  ""boolean""
                },
                {
                    ""name"": ""MyExplicitEnum"",
                    ""type"": {
                        ""type"": ""enum"",
                        ""name"": ""MyExplicitEnum"",
                        ""symbols"": [
                            ""MyEnumOne"",
                            ""MyEnumTwo""
                        ]
                    }
                },
                {
                    ""name"": ""MyString"",
                    ""type"": ""string""
                },
                {
                    ""name"": ""MyImplicitEnum"",
                    ""type"": {
                        ""type"": ""enum"",
                        ""name"": ""MyImplicitEnum"",
                        ""symbols"": [
                            ""MyEnumOne"",
                            ""MyEnumTwo""
                        ]
                    }
                }
            ]
        }
        ";

        private static void Main()
        {
            var schema = new JsonSchemaReader().Read(SchemaJson);
            var serializer = new BinarySerializerBuilder().BuildSerializer<TestRecord>(schema);

            var enumsSpecified = new TestRecord
            {
                MyBool = true,
                MyExplicitEnum = MyExplicitEnum.MyEnumTwo,
                MyString = "abcd",
                MyImplicitEnum = MyImplicitEnum.MyEnumTwo
            };

            // This prints 01-02-08-61-62-63-64-02... all OK:
            Console.WriteLine(BitConverter.ToString(serializer.Serialize(enumsSpecified)));

            var enumsOmitted = new TestRecord
            {
                MyBool = true,
                MyString = "abcd"
            };

            // But this prints 01-08-61-62-63-64-00... MyImplicitEnum has been "defaulted" to 00 (i.e. "MyEnumOne") and
            // MyExplicitEnum is not encoded at all, leading to corrupt data:
            Console.WriteLine(BitConverter.ToString(serializer.Serialize(enumsOmitted)));

            // If the JSON Avro schema is changed to make the two enumeration fields union types of [null, <the enum type>],
            // and we run again we get similar troubles:
            //   enumsSpecified: 01-02-02-08-61-62-63-64-02-02
            //   enumsOmitted:   01-02-08-61-62-63-64-02-00

            // Only if the types of the enums in the `TestRecord` POCO are made nullable does it behave as expected:
            //   Nullable in POCO, NOT nullable in Avro schema:
            //     Throws "System.InvalidOperationException: The binary operator Equal is not defined for the types
            //     'System.Nullable`1[IssueExample.MyExplicitEnum]' and 'IssueExample.MyExplicitEnum'."
            //   Nullable in POCO, nullable in Avro schema:
            //     enumsSpecified: 01-02-02-08-61-62-63-64-02-02
            //     enumsOmitted:   01-00-08-61-62-63-64-00
        }
    }
}

When a non-Nullable property is left null on a POCO passed to Chr.Avro, I think I would expect to see a null reference exception instead. Does that seem reasonable? In the meantime we're just going to make the problem fields nullable in the POCOs we are passing to Chr.Avro.
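One detail worth spelling out: a non-nullable enum property is never actually null. When it is not assigned, it holds default(T), i.e. the underlying value 0, which for MyExplicitEnum (values 2 and 3) matches no symbol at all. A guard along the following lines would surface that case before serialization. This is only a sketch of the idea; `EnumGuard` is a hypothetical helper, and the enum is copied from the sample above.

```csharp
using System;

public static class EnumGuard
{
    // Throws if the value does not correspond to a declared enum member,
    // which is what happens when an enum with no zero member is left unset.
    public static TEnum EnsureDefined<TEnum>(TEnum value, string propertyName)
        where TEnum : struct, Enum
    {
        if (!Enum.IsDefined(typeof(TEnum), value))
        {
            throw new ArgumentOutOfRangeException(
                propertyName,
                value,
                $"{value} is not a defined {typeof(TEnum).Name} value.");
        }

        return value;
    }
}

// Copied from the sample above: no member has the value 0.
public enum MyExplicitEnum
{
    MyEnumOne = 2,
    MyEnumTwo = 3
}
```

Note that MyImplicitEnum cannot be caught this way, because its default value 0 is a legitimate member (MyEnumOne).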

Clarify auto registration behavior

The registerAutomatically parameter can be deceptive. Even if true, the Schema Registry producer builder won’t attempt to register a schema that matches the type unless (1) there’s no existing schema or (2) mapping the type to the schema fails.

Proposal: Make registerAutomatically an enum instead of a boolean, something like:

enum AutomaticRegistrationBehavior
{
    Never,
    WhenIncompatible,
    Always
}
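A sketch of how the three values might drive the decision is shown below. The semantics are assumptions based on the description above: WhenIncompatible mirrors today's registerAutomatically: true behavior, and Always is assumed to register regardless of what already exists. `RegistrationPolicy` is a hypothetical name.

```csharp
using System;

public enum AutomaticRegistrationBehavior
{
    Never,
    WhenIncompatible,
    Always
}

public static class RegistrationPolicy
{
    // schemaExists: the subject already has a schema in the registry.
    // typeMapsToSchema: the .NET type can be mapped to that existing schema.
    public static bool ShouldRegister(
        AutomaticRegistrationBehavior behavior,
        bool schemaExists,
        bool typeMapsToSchema)
    {
        switch (behavior)
        {
            case AutomaticRegistrationBehavior.Always:
                return true;
            case AutomaticRegistrationBehavior.WhenIncompatible:
                return !schemaExists || !typeMapsToSchema;
            default:
                return false;
        }
    }
}
```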

Newtonsoft.Json to System.Text.Json

System.Text.Json lands with .NET Core 3.0 later this month. Switching to built-in JSON support should happen before tackling #2.

Invalid record alias; a definition for XYZ.ABC was already read.

@dstelljes While consuming a message with Chr.Avro (the message was produced by Apache.Avro), we are hitting an issue with aliases: System.IO.InvalidDataException: Invalid record alias; a definition for XYZ.ABC was already read.

{
	"type": "record",
	"name": "ABC",
	"namespace": "XYZ",
	"aliases": ["XYZ.ABC"],
	"fields": [{
			"name": "name",
			"type": ["null", "string"]
		}, {
			"name": "code",
			"type": ["null", "string"]
		},
		{
			"name": "cancel_date",
			"default": null,
			"type": ["null", {
				"type": "int",
				"logicalType": "date"
			}]
		}
	]
}

The same schema works fine after removing the aliases, but in our case we require aliases.
Can you suggest any workaround for this?

Missing support for loading additional support assemblies during avro create

If you are using extra libraries around the models that use attributes, dotnet avro create -a ASSEMBLY -t TYPE throws assembly loading exceptions. For example, if you are also using these models with the System.Text.Json serializer and have properties marked with [JsonPropertyName], then running the avro create command throws a "could not load file or assembly" error. What I think is necessary is to make the -a flag a list value, allowing the additional assemblies to be loaded. That, or add support for loading a csproj file instead, which would include all the NuGet references, etc. Or allow setting the current working directory. I'm guessing that assemblies are being loaded from the global install path of dotnet-avro.

In the steps to reproduce, the app wants to load System.Text.Json version 4.0.1.2, and this fails.
On my machine, when the NuGet install is not used, the assembly requires System.Text.Json version 4.0.1.0, and this succeeds.
I don't know why, but project references load fine.

Steps to reproduce:

Create new dotnet core console application
Use nuget to add System.Text.Json version 4.7.2
Create a dummy class to be the avro model
Add [JsonPropertyName()] attribute to a property of the model
Build
Run the command to build a schema from that assembly and type: dotnet-avro -a assembly -t type
The error will reproduce

Migrate .NET benchmarks to BenchmarkDotNet

As it’s an officially supported project and would give us better results: https://github.com/dotnet/BenchmarkDotNet

Support JSON encoding

https://avro.apache.org/docs/current/spec.html#json_encoding

Writing named schema for T fails when both T and Nullable<T> are present

Writing a schema for the following class fails with an InvalidSchemaException:

class TestClass
{
    public TestEnum NonNullable { get; set; }
    public TestEnum? Nullable { get; set; }
}

enum TestEnum { }

cc @worthyarchitecture

Exceptions in new version

Hi @dstelljes!

We see a big performance degradation when creating schemas after switching to master (I think this is 3.0). I think this is because of the many exceptions being thrown in code like:

foreach (var @case in Cases)
{
    try
    {
        return @case.Read(element, cache, scope);
    }
    catch (UnknownSchemaException exception)
    {
        exceptions.Add(exception);
    }
}

throw new AggregateException($"No schema reader case matched {element.ToString()}", exceptions);

Such code is located in JsonSchemaReader, TypeResolver, BinarySerializerBuilder.

This is especially performance heavy when the type/schema is located at the end of the array.

For this POCO...

	public class Foo
	{
		public string Bar { get; set; }
		public DateTime Date { get; set; }
	}

... I'm currently getting 391 exceptions. All handled, but it takes a long time.
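One common way to avoid exception-driven control flow in a loop like the one above is a Try-style contract, where a case reports a mismatch through its return value instead of throwing. The sketch below is generic and self-contained; `ICase`, `CaseRunner`, and the two sample cases are hypothetical and do not reflect the library's actual interfaces.

```csharp
using System;
using System.Collections.Generic;

// A case returns false when it does not apply, so no exception is created.
public interface ICase<TInput, TResult>
{
    bool TryRead(TInput input, out TResult result);
}

public static class CaseRunner
{
    // Only throws once, after every case has declined the input.
    public static TResult Read<TInput, TResult>(
        IEnumerable<ICase<TInput, TResult>> cases,
        TInput input)
    {
        foreach (var @case in cases)
        {
            if (@case.TryRead(input, out TResult result))
            {
                return result;
            }
        }

        throw new InvalidOperationException($"No case matched {input}.");
    }
}

public sealed class IntCase : ICase<string, object>
{
    public bool TryRead(string input, out object result)
    {
        bool matched = int.TryParse(input, out int parsed);
        result = matched ? (object)parsed : null;
        return matched;
    }
}

public sealed class BoolCase : ICase<string, object>
{
    public bool TryRead(string input, out object result)
    {
        bool matched = bool.TryParse(input, out bool parsed);
        result = matched ? (object)parsed : null;
        return matched;
    }
}
```

With this shape, a type that matches only the last case costs a handful of method calls rather than one thrown exception per preceding case.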

Support deserializing to System.Collections.ObjectModel collection types

Since the library already supports deserializing to concrete System.Collections.Generic/System.Collections.Immutable types, support for the other .NET Standard collections namespaces makes sense.

Align decimal serialization with other implementations

Currently, Chr.Avro decimal serdes truncate decimal numbers to the specified precision, which doesn’t really make sense. The serde builder should be pared down to match the Java implementation.

Support adding default values to generated schema

@dstelljes I'm creating this issue based on your comment on #7

Since default values aren't accessible via reflection, maybe an easier way to implement this functionality would be via an annotation? Something to the effect of

public class Message
{
  [AvroDefaultValue(null)]
  public int? Property { get; set; } = null;
}
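To make the annotation idea concrete, here is a minimal sketch of what such an attribute and its lookup could look like. `AvroDefaultValueAttribute` is the name proposed above and does not necessarily exist in the library; `DefaultValueReader` and the extra `Count` property are hypothetical additions for illustration.

```csharp
using System;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
public sealed class AvroDefaultValueAttribute : Attribute
{
    public AvroDefaultValueAttribute(object value)
    {
        Value = value;
    }

    public object Value { get; }
}

public static class DefaultValueReader
{
    // Returns true when the member declares a default, even if that default is
    // null; this keeps "default is null" distinct from "no default at all".
    public static bool TryGetDefault(MemberInfo member, out object value)
    {
        var attribute = member.GetCustomAttribute<AvroDefaultValueAttribute>();
        value = attribute?.Value;
        return attribute != null;
    }
}

public class Message
{
    [AvroDefaultValue(null)]
    public int? Property { get; set; } = null;

    [AvroDefaultValue(5)]
    public int Count { get; set; }

    public string Untagged { get; set; }
}
```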

Add DI examples to consumer/producer guides

Since .NET Core DI is ubiquitous now, we should have examples of using registered consumer/producer/registry clients in addition to the existing console app examples.

cc @LukeSchlangen

DateTimeOffset serialization does not preserve local date/time

Because DateTimeOffset is converted to a UTC-based DateTime prior to serialization, DateTimeOffsets created from local times are not deserialized back to the same representation.

The unit tests do not catch this because the equality operator for DateTimeOffset returns true if the represented points in time are the same. It does not care how they are represented as noted here.

To resolve this, DateTimeOffset should be handled separately from DateTime. The serializer and deserializer should use the ToString() and Parse() methods, respectively, on DateTimeOffset itself to preserve the original semantics of the DateTimeOffset value.

Reference:

BinarySerializerBuilder.cs / StringSerializerBuilderCase.BuildDelegate()
https://github.com/ch-robinson/dotnet-avro/blob/master/src/Chr.Avro.Binary/BinarySerializerBuilder.cs#L1953

BinaryDeserializerBuilder.cs / StringDeserializerBuilderCase.BuildDelegate()
https://github.com/ch-robinson/dotnet-avro/blob/development/src/Chr.Avro.Binary/BinaryDeserializerBuilder.cs#L1996

I'm still familiarizing myself with the code base here, but it appears that this implementation is consistent across the master, development, and 3.x branches as of this writing. I was focused on the string serializer because strings are the recommended schema type for handling DateTimes and DateTimeOffsets in this library. I have not looked closely at the micro- and millisecond logical type implementations to see if / how they might be affected by this.

This code illustrates the problem:

var loc = DateTime.Now;
var utc = loc.ToUniversalTime();
var dtoFromLocal = new DateTimeOffset(loc);
var dtoFromUtc = new DateTimeOffset(utc);

Although .NET considers dtoFromLocal and dtoFromUtc to be "equal", note that the DateTime and Offset properties differ, so they are not identical, and this can affect how they are rendered and persisted by other applications.

This code demonstrates round-tripping the value identically:

var dtoIn = DateTimeOffset.Now;
var dtoStr = dtoIn.ToString("O");
var dtoOut = DateTimeOffset.Parse(dtoStr);
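The distinction between "same instant" and "same representation" can be checked with DateTimeOffset.EqualsExact, which also compares offsets. The sketch below uses a fixed offset instead of the machine's local time zone so that it behaves the same everywhere; `DateTimeOffsetRoundTrip` is a hypothetical helper, not the library's serializer.

```csharp
using System;
using System.Globalization;

public static class DateTimeOffsetRoundTrip
{
    // Writes the value with its own offset instead of converting to UTC first.
    public static string Write(DateTimeOffset value) =>
        value.ToString("O", CultureInfo.InvariantCulture);

    // Parses the string back; the offset in the text is preserved.
    public static DateTimeOffset Read(string text) =>
        DateTimeOffset.Parse(text, CultureInfo.InvariantCulture);
}
```

For example, with `var local = new DateTimeOffset(2020, 1, 1, 12, 0, 0, TimeSpan.FromHours(-6));` and `var utc = local.ToUniversalTime();`, `local == utc` is true while `local.EqualsExact(utc)` is false, and `Read(Write(local)).EqualsExact(local)` is true. A unit test based on EqualsExact would catch the regression that the equality operator misses.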

Generate skipping delegates as well as deserialization delegates

This would probably involve renaming IBinaryDeserializerBuilderCase.BuildDelegate to BuildDeserializer and adding something like BuildSkipper.

Pros:

better for perf, especially when dealing with arrays
cleaner than this

Cons:

increase in complexity/surface

Migrate CLI to System.CommandLine

As it’s an officially supported project and would keep the project in line with dotnet conventions: https://github.com/dotnet/command-line-api

Add high-level serde builder cases to support polymorphic mapping

To support polymorphic mapping (i.e., mapping an interface or abstract class to concrete classes), an application has to provide custom cases for the serde builders. In practice, this is really onerous—building a custom deserializer case, for instance, entails copying and pasting the union deserializer case and tweaking it to work with a specific interface.

We should provide some high-level cases that enable most of that union logic to be recycled (or make the existing cases more extensible), something like:

public class UnionDeserializerBuilderCase : IDeserializerBuilderCase
{
    public Delegate BuildDelegate(TypeResolution resolution, Schema schema, ConcurrentDictionary<(Type, Schema), Delegate> cache)
    {
        // all of the complicated stuff here
    }

    protected virtual TypeResolution SelectType(TypeResolution resolution, Schema schema)
    {
        // resolution is the same as the one passed to BuildDelegate (the resolution for the interface or abstract type)
        // schema is a member of the union (SelectType is called for each member)

        // determine which concrete type applies; return the resolution for that concrete type

        // default implementation just returns the existing resolution
        return resolution;
    }
}

Then, it’d be easier to build cases that disambiguated interfaces:

public class EventDeserializerBuilderCase : UnionDeserializerBuilderCase
{
    protected override TypeResolution SelectType(TypeResolution resolution, Schema schema)
    {
        if (!(resolution is RecordResolution recordResolution) || recordResolution.Type != typeof(IEvent))
        {
            throw new UnsupportedTypeException(resolution.Type);
        }

        switch ((schema as RecordSchema)?.Name)
        {
            case "Concrete1":
                return Resolver.ResolveType<Concrete1>();

            // ...

            default:
                throw new UnsupportedSchemaException(schema);
        }
    }
}

Schema registry HTTP error responses are cached

We've recently been getting spates of tracebacks like the one below in an app of ours that uses Chr.Avro:

[("HResult": -2146233088), ("Message": "System.Net.Http.HttpRequestException: [https://schema-registry.***.com/] GatewayTimeout[https://schema-registry.***.com/] GatewayTimeout -1 
   at Confluent.SchemaRegistry.RestService.ExecuteOnOneInstanceAsync(Func`1 createRequest)
   at Confluent.SchemaRegistry.RestService.RequestAsync[T](String endPoint, HttpMethod method, Object[] jsonBody)
   at Confluent.SchemaRegistry.RestService.GetLatestSchemaAsync(String subject)
   at Confluent.SchemaRegistry.CachedSchemaRegistryClient.GetLatestSchemaAsync(String subject)
   at Chr.Avro.Confluent.AsyncSchemaRegistrySerializer`1.<SerializeAsync>b__24_0(String subject)
   at Chr.Avro.Confluent.AsyncSchemaRegistrySerializer`1.SerializeAsync(T data, SerializationContext context)
   at Confluent.Kafka.SyncOverAsync.SyncOverAsyncSerializer`1.Serialize(T data, SerializationContext context)
   at Confluent.Kafka.Producer`2.Produce(TopicPartition topicPartition, Message`2 message, Action`1 deliveryHandler)"), ...<snip>...
   at Confluent.SchemaRegistry.RestService.ExecuteOnOneInstanceAsync(Func`1 createRequest)
   at Confluent.SchemaRegistry.RestService.RequestAsync[T](String endPoint, HttpMethod method, Object[] jsonBody)
   at Confluent.SchemaRegistry.RestService.GetLatestSchemaAsync(String subject)
   at Confluent.SchemaRegistry.CachedSchemaRegistryClient.GetLatestSchemaAsync(String subject)
   at Chr.Avro.Confluent.AsyncSchemaRegistrySerializer`1.<SerializeAsync>b__24_0(String subject)
   at Chr.Avro.Confluent.AsyncSchemaRegistrySerializer`1.SerializeAsync(T data, SerializationContext context)
   at Confluent.Kafka.SyncOverAsync.SyncOverAsyncSerializer`1.Serialize(T data, SerializationContext context)
   at Confluent.Kafka.Producer`2.Produce(TopicPartition topicPartition, Message`2 message, Action`1 deliveryHandler)"), ("IsError": True), ("IsLocalError": True), ("IsBrokerError": False)]), ("Type": "Confluent.Kafka.ProduceException`2[[...<snip>...]]")]

After digging a bit, I think what is happening is that our application is--for reasons unrelated to this library or any C# code in general--receiving HTTP 504 responses in some of its initial attempts to contact the schema registry. When this happens, I think Chr.Avro caches this error result since here it is adding a single task to the cache, and after the initial add subsequent hits on the cache are awaiting that same task, leading to the HttpRequestException being raised on every access.

Of course, solving the 504s is a thing we should work on, but more specific to Chr.Avro: does it sound like I'm reading the code right, there? If so, would it make sense to try to come up with a way of skipping addition to the cache for HTTP 5xx response statuses?
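If the reading above is right, one possible mitigation is to evict a cached task when it fails, so the next access retries instead of re-awaiting the same faulted task. The sketch below is a generic illustration, not the library's code: `EvictingTaskCache` is a hypothetical name, and it evicts on any failure rather than only on HTTP 5xx responses.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class EvictingTaskCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Task<TValue>> cache =
        new ConcurrentDictionary<TKey, Task<TValue>>();

    public async Task<TValue> GetOrAddAsync(TKey key, Func<TKey, Task<TValue>> factory)
    {
        Task<TValue> task = cache.GetOrAdd(key, factory);

        try
        {
            return await task.ConfigureAwait(false);
        }
        catch
        {
            // Remove the entry only if it is still this failed task, so a
            // newer (possibly successful) task added meanwhile is not evicted.
            ((ICollection<KeyValuePair<TKey, Task<TValue>>>)cache)
                .Remove(new KeyValuePair<TKey, Task<TValue>>(key, task));
            throw;
        }
    }
}
```

Callers still see the original exception for the failed attempt; only subsequent lookups get a fresh request.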

Thank you!

Improve documentation home page

Some art/layout work would be nice, as well as answering some of these questions:

What makes Chr.Avro different from other Avro libraries? (mapping to POCOs, schema builder, CLI)
Where is it used? (dotnet avro, Chr.Avro.Confluent)
How do I get started? (consumer/producer guide, CLI guide)

Improve documentation around resolver customization

Maybe add a guide about customizing the resolver for higher-level classes (particularly wrt. resolution options)?

DateTime serialization

Hi @dstelljes!

Is it possible to create a schema with "logicalType": "date" instead of string?

Clean up builder/case contracts

IDictionary/ConcurrentDictionary use should be consistent.
Properties should be public/read-only instead of protected when appropriate.
Case implementations should throw if not compatible (no IsMatch).

There should be a consistent way for cases to be constructed with whatever instance is using them:

public static readonly IEnumerable<Func<SomeBuilder, SomeBuilderCase>> DefaultCaseBuilders;

public IEnumerable<SomeBuilderCase> Cases { get; }

public SomeBuilder(IEnumerable<Func<SomeBuilder, SomeBuilderCase>> caseBuilders)
{
    Cases = (caseBuilders ?? DefaultCaseBuilders).Select(builder => builder(this));
}

Output more friendly exception messages in case of serialization failure, like what property failed to serialize according to the schema

When I try to serialize an object whose string property is non-nullable in the Avro schema but set to null in code, I get System.ArgumentNullException: String reference not set to an instance of a String. (Parameter 's')
at System.Text.Encoding.GetBytes(String s)
at Transaction serializer(Closure , TransactionAvro )
at Chr.Avro.Serialization.BinarySerializer`1.Serialize(T value)

It is not clear which property failed. It would be better to have an exception like "Serialization failed, expected property {nameOfTheProperty} to be not null but was null", or "expected property type to be string but was int", etc.
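As a sketch of the kind of message requested here, a per-field wrapper could catch the low-level failure and rethrow it with the field name attached. `FieldSerializationException` and `FieldWriter` are hypothetical names; since the library builds its serializers from expression trees, a real implementation would likely look quite different.

```csharp
using System;

public sealed class FieldSerializationException : Exception
{
    public FieldSerializationException(string fieldName, Exception inner)
        : base($"Serialization failed for field '{fieldName}': {inner.Message}", inner)
    {
        FieldName = fieldName;
    }

    public string FieldName { get; }
}

public static class FieldWriter
{
    // Runs a per-field write action and rethrows any failure with the field
    // name attached; already-wrapped exceptions are left untouched so nested
    // records report the innermost field.
    public static void Write(string fieldName, Action write)
    {
        try
        {
            write();
        }
        catch (Exception exception) when (!(exception is FieldSerializationException))
        {
            throw new FieldSerializationException(fieldName, exception);
        }
    }
}
```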

Add interfaces for Schema Registry serde builder classes

SchemaRegistryDeserializerBuilder and SchemaRegistrySerializerBuilder currently only implement IDisposable. People might want to mock these, so we should add matching interfaces.

Schemas for flag enums should match underlying type

Currently, the schema builder always produces "long" for flag enums. This is inconsistent with the resolver’s resolveUnderlyingEnumTypes option, which will result in "int" or "long" depending on the underlying type.

Eliminate codec method calls

Replace the binary codec with a codec builder (should return Expressions given stream/value ParameterExpressions).

Investigate:

cost of ArrayLength in Read
- assigning count to a variable instead of using ArrayLength results in a slim (< 2%) boost for small arrays (12-byte benchmark) and an appreciable (~ 10%) boost for large arrays (2048-byte benchmark), regardless of whether count is a constant
possible use of ArrayPool in Read
Math.Abs vs. * -1 in ReadBlocks
NotEqual in ReadBoolean
- GreaterThan appears to be about 2% slower
widths in ReadInteger

Clarify support for (ReadOnly)Memory<T>, (ReadOnly)Span<T>, and ArraySegment<T>

Some implicit conversions are defined, but our documentation doesn’t make it clear what works and what doesn’t, and we don’t have any tests around these.

Support for polymorphism

I'm adding support for polymorphism via the use of an interface together with a SchemaKnownType attribute.

This will result in a union with multiple records.

Just checking on how you are feeling about this.

Support constructors for enumerable deserialization

Currently, Chr.Avro only supports deserializing lists as arrays or types assignable from List<T>. Similarly to #38, we could fall back to a constructor with a single IEnumerable<T> parameter (like on HashSet<T>).
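The constructor fallback could be sketched with reflection like this. `EnumerableConstructors.Create` is a hypothetical helper, and looking up the constructor by the exact parameter type IEnumerable<T> is a simplifying assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class EnumerableConstructors
{
    // Creates an instance of collectionType from the deserialized items using
    // a public constructor that takes a single IEnumerable<T> parameter.
    public static object Create<T>(Type collectionType, IEnumerable<T> items)
    {
        ConstructorInfo constructor = collectionType.GetConstructor(new[] { typeof(IEnumerable<T>) });

        if (constructor == null)
        {
            throw new NotSupportedException(
                $"{collectionType} has no public constructor taking IEnumerable<{typeof(T).Name}>.");
        }

        return constructor.Invoke(new object[] { items });
    }
}
```

For example, `EnumerableConstructors.Create(typeof(HashSet<int>), new List<int> { 1, 2, 2 })` yields a HashSet<int> built from the buffered list.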

Schema Registry serdes are incompatible with Confluent.SchemaRegistry 1.4.0

ISchemaRegistryClient.GetLatestSchemaAsync returns RegisteredSchema starting with 1.4.0:

Method not found: 'System.Threading.Tasks.Task`1<Confluent.SchemaRegistry.Schema> Confluent.SchemaRegistry.ISchemaRegistryClient.GetLatestSchemaAsync(System.String)'.

The new lower bound for Chr.Avro.Confluent should be 1.4.0, and since the Confluent clients follow librdkafka versioning, not semver, we should narrow the range of allowed versions (probably to patch instead of minor).

With this change, when fetching a schema, the serdes should check the type to ensure it's Avro.
