
azure-functions-kafka-extension's Issues

Nice to have deadlettering

Include an optional setting where you can specify a deadletter Kafka topic in your configuration, and the extension will deadletter to that topic for you after the configured retries have been exceeded. A hypothetical shape for such a setting is sketched below.
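The exact surface is an open question; purely as an illustration, the setting could hang off the trigger attribute. DeadLetterTopic and MaxRetries below are hypothetical names that do not exist in the extension today.

```csharp
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Kafka;
using Microsoft.Extensions.Logging;

public static class OrdersConsumer
{
    [FunctionName("OrdersConsumer")]
    public static void Run(
        [KafkaTrigger("BrokerList", "orders", ConsumerGroup = "orders-func",
            DeadLetterTopic = "orders-deadletter",  // hypothetical: target topic for exhausted messages
            MaxRetries = 5)]                        // hypothetical: retries before deadlettering
        KafkaEventData kafkaEvent,
        ILogger log)
    {
        log.LogInformation($"Processing {kafkaEvent.Value}");
    }
}
```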

Dynamic binding fails

When trying to bind to a KafkaAsyncCollector via IBinder/Binder.BindAsync<> with the KafkaAttribute, the following exception is thrown:

System.Private.CoreLib: Exception while executing function: ….. 
Microsoft.Azure.WebJobs.Host: Can't bind Kafka to type 'Microsoft.Azure.WebJobs.Extensions.Kafka.KafkaAsyncCollector'.

Please provide an example showing how to dynamically bind to the KafkaAsyncCollector.
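For reference, this is a minimal sketch of the kind of dynamic binding being attempted; the attribute arguments and the KafkaEventData usage are assumptions, and the BindAsync call is the line that currently throws.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Azure.WebJobs.Extensions.Kafka;

public static class DynamicKafkaOutput
{
    [FunctionName("DynamicKafkaOutput")]
    public static async Task Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req,
        IBinder binder)
    {
        // Attribute arguments are assumed; "BrokerList" would resolve from app settings.
        var attribute = new KafkaAttribute("BrokerList", "my-topic");

        // This is the call that currently fails with:
        // "Can't bind Kafka to type 'Microsoft.Azure.WebJobs.Extensions.Kafka.KafkaAsyncCollector'."
        var collector = await binder.BindAsync<IAsyncCollector<KafkaEventData>>(attribute);

        await collector.AddAsync(new KafkaEventData { Value = "hello" });
        await collector.FlushAsync();
    }
}
```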

Must have testing documentation

Add documentation about testing, with instructions on how to run the end-to-end tests locally.
Ideally, create the required topics from code to simplify E2E setup (a sketch follows).
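As a starting point, a setup step along these lines could create the topics before the tests run. This is a sketch assuming a local broker on localhost:9092 and example topic names, using Confluent.Kafka's admin client (which the extension already depends on).

```csharp
using System.Linq;
using Confluent.Kafka;
using Confluent.Kafka.Admin;

using var admin = new AdminClientBuilder(
    new AdminClientConfig { BootstrapServers = "localhost:9092" }).Build();

try
{
    // Example topic names; partition/replication values match a single-broker dev setup.
    await admin.CreateTopicsAsync(new[]
    {
        new TopicSpecification { Name = "e2e-string", NumPartitions = 4, ReplicationFactor = 1 },
        new TopicSpecification { Name = "e2e-avro",   NumPartitions = 4, ReplicationFactor = 1 },
    });
}
catch (CreateTopicsException ex)
{
    // Treat "already exists" as success so the setup step stays idempotent.
    if (ex.Results.Any(r => r.Error.Code != ErrorCode.TopicAlreadyExists))
    {
        throw;
    }
}
```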

Should support message headers

Add support for message header properties in Kafka events.

Trigger

  • Add headers to KafkaEventData
  • Expose the header properties on KafkaEventData and as binding properties

Output

  • Enable creation of Kafka messages with headers
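Since the extension sits on top of Confluent.Kafka, the underlying client already carries headers; the open question is how KafkaEventData exposes them. For context, this is how headers look when producing with Confluent.Kafka directly (broker address and topic are examples):

```csharp
using System.Text;
using Confluent.Kafka;

var message = new Message<Null, string>
{
    Value = "order-created",
    Headers = new Headers
    {
        { "correlation-id", Encoding.UTF8.GetBytes("abc-123") },
        { "source", Encoding.UTF8.GetBytes("orders-service") },
    },
};

using var producer = new ProducerBuilder<Null, string>(
    new ProducerConfig { BootstrapServers = "localhost:9092" }).Build();

await producer.ProduceAsync("orders", message);
```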

Should use the "new" way of writing bindings

The extension currently uses the "old" method of writing bindings, which some of the earlier bindings (Service Bus, Storage, etc.) still use.

There is a newer way to do this that doesn't require nearly as much code, and there is support for open generics.

You can see an example in the cosmos binding for a collector, which starts here: https://github.com/Azure/azure-webjobs-sdk-extensions/blob/dev/src/WebJobs.Extensions.CosmosDB/Config/CosmosDBExtensionConfigProvider.cs#L56

There's also some documentation on how to do this here: https://github.com/Azure/azure-webjobs-sdk/wiki/Creating-custom-input-and-output-bindings#binding-to-generic-types-with-opentypes

We should update this "old" method of doing bindings to the "new" method.
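To make the shape concrete, the "new" model centres on a fluent binding rule registered from an IExtensionConfigProvider. The sketch below is only an outline under that assumption; the collector factory is a placeholder, and the real provider would also register the trigger binding and converters.

```csharp
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Description;
using Microsoft.Azure.WebJobs.Extensions.Kafka;
using Microsoft.Azure.WebJobs.Host.Config;

[Extension("Kafka")]
internal class KafkaExtensionConfigProvider : IExtensionConfigProvider
{
    public void Initialize(ExtensionConfigContext context)
    {
        // One fluent rule replaces the hand-written binding/provider classes of the "old" model.
        var rule = context.AddBindingRule<KafkaAttribute>();
        rule.BindToCollector<KafkaEventData>(attr => CreateCollector(attr));
    }

    // Placeholder: the real implementation would build a producer from the attribute
    // (broker list, topic, serialiser) and return the existing KafkaAsyncCollector.
    private IAsyncCollector<KafkaEventData> CreateCollector(KafkaAttribute attribute)
        => throw new System.NotImplementedException();
}
```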

Nice to have: create topic if it does not exist

It would be nice to create the topic on the fly if it does not exist.
That would require topic information such as name, partition count, replication factor, etc.

When defining the attribute there should be a specific parameter indicating whether topic creation is allowed (a hypothetical shape is sketched below).
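Purely as an illustration of that opt-in parameter, the attribute could look like this; CreateTopicIfNotExists, NumPartitions and ReplicationFactor are hypothetical names that do not exist in the extension today.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Kafka;

public static class ProduceOrders
{
    [FunctionName("ProduceOrders")]
    public static async Task Run(
        [TimerTrigger("0 */5 * * * *")] TimerInfo timer,
        [Kafka("BrokerList", "orders",
            CreateTopicIfNotExists = true,  // hypothetical opt-in flag
            NumPartitions = 8,              // hypothetical: only used when creating
            ReplicationFactor = 3)]         // hypothetical: only used when creating
        IAsyncCollector<KafkaEventData> output)
    {
        await output.AddAsync(new KafkaEventData { Value = "hello" });
    }
}
```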

Must have StyleCop in project

Use StyleCop so all team members follow the same coding rules. Start from what has been used in the EventHubs extension.

Must have documentation

  • Must document usage of the trigger
  • Also document all the config options available in host.json
  • Must have a working end-to-end sample

Should review serialisation

Currently we support built-in serialisation for Avro and Protobuf.
Avro relies on Confluent.Kafka; Protobuf relies on Google.Protobuf.

Having serialisation built-in has the following advantages:

  • Performance when using a language worker: it removes the need to ship raw byte[] to the language worker and do the deserialisation there
  • Simplicity: the user doesn't have to write much code to get it going

Disadvantages:

  • Opinionated: we use specific libraries for serialisation, and there is currently no way to inject a different one. Pinning specific library versions can cause problems when building functions that depend on a different version of the same library.

Must be able to bind to different types in trigger functions

The trigger implementation has been tested using KafkaEventData and string as parameter types.
We must implement the following scenarios (a sketch of a couple of these signatures follows the list):

POCO (importance: high)

  • If the POCO class implements ISpecificRecord, the Avro deserialiser should be set when creating the KafkaListener
  • If the POCO class implements IMessage (the Google.Protobuf contract), the Protobuf deserialiser should be set when creating the KafkaListener

byte[] (importance: high)

  • Allows deserialisation to be implemented directly in the function.

IGenericRecord (importance: low)

  • If an Avro schema was provided and reading the fields is implemented directly in the function, the Avro deserialiser should be set during KafkaListener creation

string (importance: very low)

  • If an Avro schema was provided we should return a JSON representation of the object (currently it only does 1-level depth)
  • If a Protobuf contract was supplied we could return a JSON representation of the object (currently it only does 1-level depth)

Support key, partition, offset, timestamp and topic as stand-alone parameters (importance: medium)

  • In single dispatch
  • In multi dispatch
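As a reference point for the POCO and byte[] cases, the trigger signatures could look roughly like this; attribute arguments, topic names and the UserRecord type are assumptions.

```csharp
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Kafka;
using Microsoft.Extensions.Logging;

public static class TriggerParameterTypeSamples
{
    // Stand-in POCO: the Avro deserialiser would apply if this implemented ISpecificRecord,
    // the Protobuf one if it implemented Google.Protobuf's IMessage.
    public class UserRecord
    {
        public string Name { get; set; }
    }

    [FunctionName("PocoTrigger")]
    public static void PocoTrigger(
        [KafkaTrigger("BrokerList", "users", ConsumerGroup = "g1")] UserRecord user,
        ILogger log)
        => log.LogInformation($"User: {user.Name}");

    // byte[] case: deserialisation happens inside the function body.
    [FunctionName("RawTrigger")]
    public static void RawTrigger(
        [KafkaTrigger("BrokerList", "raw", ConsumerGroup = "g1")] byte[] payload,
        ILogger log)
        => log.LogInformation($"Received {payload.Length} bytes");
}
```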

Verify checkpoint saving strategy

Checkpoint saving is currently done using Consumer.Commit, which blocks the thread. An alternative is to use StoreOffset, which saves the checkpoint asynchronously in librdkafka.

Commit is more accurate, while StoreOffset offers better throughput.
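To illustrate the two options with Confluent.Kafka directly (not extension code; broker, group and topic names are examples):

```csharp
using System;
using Confluent.Kafka;

var config = new ConsumerConfig
{
    BootstrapServers = "localhost:9092",
    GroupId = "functions-consumer",
    EnableAutoCommit = true,        // background commit of stored offsets (StoreOffset path)
    EnableAutoOffsetStore = false,  // we decide when an offset counts as processed
};

using var consumer = new ConsumerBuilder<Ignore, string>(config).Build();
consumer.Subscribe("my-topic");

var result = consumer.Consume(TimeSpan.FromSeconds(5));
if (result != null)
{
    // Option 1: blocks until the broker acknowledges the committed offset.
    consumer.Commit(result);

    // Option 2: only marks the offset in librdkafka's local store; the auto-commit loop
    // flushes it asynchronously, trading some accuracy for throughput.
    // consumer.StoreOffset(result);
}
```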

Would love your feedback @jeffhollan, @anirudhgarg and @ryancrawcour

Must have each consumer lock on a single partition

It’s less a function requirement and more a Kafka constraint.
You may have 5 independent function instances running at the same time, but Kafka only allows one reader per partition per consumer group at a time.

In Event Hubs we leverage an SDK called the “EventProcessorHost”, which automatically coordinates which partitions are locked by which instances. So if only 1 instance is active, it will let that 1 instance lock all of them. Once a 2nd pops up and tries to connect, it will rebalance and let the consumers know.
I don’t know exactly how we’d do that in Kafka - I believe there’s a concept of a “leader” that needs to assign partitions. So in the example above, if only one function instance is active it would by default be the leader.
As soon as a 2nd gets scaled out, Kafka would ask the leader (instance 1) how many partitions should go to #2 and how many should stay with #1.

So I expect the trigger would need “leader logic” so that at any time any instance could be the leader, and as the leader it just evenly distributes partitions. Again, I’m not positive exactly how it’ll work in Kafka since we rely on the Event Processor Host SDK today, but this is what I’ve pieced together.
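For what it's worth, Kafka's consumer-group protocol already provides this coordination: the group coordinator on the broker (together with one member elected as group leader) reassigns partitions whenever an instance joins or leaves, so the listener mostly has to react to rebalances rather than implement its own leader logic. A sketch of how Confluent.Kafka surfaces those events (broker, group and topic names are examples):

```csharp
using System;
using Confluent.Kafka;

var config = new ConsumerConfig
{
    BootstrapServers = "localhost:9092",
    GroupId = "my-function-app",   // every function instance joins the same consumer group
};

using var consumer = new ConsumerBuilder<Ignore, byte[]>(config)
    .SetPartitionsAssignedHandler((c, partitions) =>
        Console.WriteLine($"Assigned: {string.Join(", ", partitions)}"))
    .SetPartitionsRevokedHandler((c, partitions) =>
        Console.WriteLine($"Revoked during rebalance: {string.Join(", ", partitions)}"))
    .Build();

consumer.Subscribe("my-topic");
```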

Nice to have custom retry configuration for checkpointing

A config option that would allow you to specify a retry of a batch: if the batch results in an exception, retry the batch instead of checkpointing past it.

Set the number of retries. At least 1 retry? At least 5 retries? Unlimited retries?
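A rough sketch of the proposed behaviour (not existing extension code); maxRetries stands in for the new config option and processBatchAsync for the function invocation:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Confluent.Kafka;

public static class BatchRetrySketch
{
    public static async Task ProcessWithRetryAsync(
        IConsumer<Ignore, string> consumer,
        IReadOnlyList<ConsumeResult<Ignore, string>> batch,
        Func<IReadOnlyList<ConsumeResult<Ignore, string>>, Task> processBatchAsync,
        int maxRetries)
    {
        for (var attempt = 0; ; attempt++)
        {
            try
            {
                await processBatchAsync(batch);
                consumer.Commit(batch[batch.Count - 1]); // checkpoint only after success
                return;
            }
            catch (Exception) when (attempt < maxRetries)
            {
                // Retry the same batch instead of checkpointing past the failure.
            }
        }
    }
}
```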

Decide if we have strong-name signing

Ideally, our NuGet package should be strong-name signed. However, we use several libraries that are not strong-name signed.

I sent a request to Confluent and talked with them directly. They said they can do it.
confluentinc/confluent-kafka-dotnet#879

I'm planning to publish the first version without strong-name signing; however, if they introduce strong-name signing quickly, I'd like to go with it from the first version.

One downside of not having strong-name signing is that adding it later is a breaking change. Since we release it as alpha it might be OK; however, if they provide a signed package very fast, I'd be happy to start with signing.

Should have TLS client certificate auth

Cert-based auth is very typical with Kafka, especially when the function runs in a container or on Azure while the Kafka cluster runs on VMs or Confluent Cloud. Having this support could be important for targeting production workloads.
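At the librdkafka level this maps to the SSL settings below; the extension would need to surface them in its configuration. Shown here with Confluent.Kafka's ClientConfig property names and example paths:

```csharp
using Confluent.Kafka;

var config = new ConsumerConfig
{
    BootstrapServers = "my-kafka:9093",
    GroupId = "my-function-app",
    SecurityProtocol = SecurityProtocol.Ssl,
    SslCaLocation = "/certs/ca.pem",              // CA used to verify the brokers
    SslCertificateLocation = "/certs/client.pem", // client certificate presented to the broker
    SslKeyLocation = "/certs/client.key",         // client private key
    SslKeyPassword = "changeit",                  // only if the key is encrypted
};

using var consumer = new ConsumerBuilder<Ignore, byte[]>(config).Build();
consumer.Subscribe("my-topic");
```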

Must have a custom scale controller

Until this extension is supported by the Functions scale controller we will need logic to handle scale.

Something (a web app or a WebJob, running in the same App Service?) will need to check the "queue length" of the configured Kafka topic(s) and determine whether the current number of function instances is keeping up adequately.

If we're falling behind, we need to scale out.
If we're ahead, or have drained the queue then we need to scale back in.

@jeffhollan & @anirudhgarg can you please add some details to this requirement that will get us to an MVP scale controller.
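As a starting point, the "queue length" here is the consumer lag: the gap between each partition's high watermark and the consumer group's committed offset. A sketch of measuring it with Confluent.Kafka (the consumer/admin clients, topic name and timeouts are assumptions):

```csharp
using System;
using System.Linq;
using Confluent.Kafka;

public static class LagProbe
{
    // Total lag across the topic: sum over partitions of (high watermark - committed offset).
    public static long GetTotalLag(IConsumer<Ignore, byte[]> consumer, IAdminClient admin, string topic)
    {
        var partitions = admin.GetMetadata(topic, TimeSpan.FromSeconds(10))
            .Topics.Single(t => t.Topic == topic)
            .Partitions.Select(p => new TopicPartition(topic, p.PartitionId))
            .ToList();

        var committed = consumer.Committed(partitions, TimeSpan.FromSeconds(10));

        long lag = 0;
        foreach (var tpo in committed)
        {
            var watermarks = consumer.QueryWatermarkOffsets(tpo.TopicPartition, TimeSpan.FromSeconds(10));
            long consumedUpTo = tpo.Offset == Offset.Unset ? watermarks.Low.Value : tpo.Offset.Value;
            lag += watermarks.High.Value - consumedUpTo;
        }

        return lag; // growing lag => scale out; lag near zero => scale back in
    }
}
```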

Javascript / Typescript example for input and output bindings

As a developer, I would like an example that shows me how to bind inputs and outputs with Kafka so that I can quickly build an Azure Function with Kafka.

There is this example from a KEDA repo but it doesn't have:

  • An example of how to configure an output connection to Kafka and send one or more messages to that output connection.
  • An example of how to process multiple input messages from Kafka in one function execution.

I'm happy to send a PR if this would be helpful. Let me know where the right spot is to put this example (in this repo, or by expanding the KEDA sample) and share rough notes on how to do this with the current interface (if possible).

Should refactor to accommodate multi-language support

I talked with the Azure Functions product team about the Java bindings, and we found out that the current design doesn't support multiple languages.

e.g. We have "Type" for KafkaTriggerAttribute.

We can only use basic types and POCOs for that. I'll set up a meeting to discuss this; I'm posting this issue so it isn't forgotten.
