azure / azure-functions-kafka-extension
Kafka extension for Azure Functions
License: MIT License
We need two things to publish the NuGet package.
We can reference azure-functions-eventgrid-extension as an example.
Include an optional setting where you can specify a dead-letter Kafka topic in your config somewhere, and the extension will dead-letter to it after retries have been exceeded.
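One possible shape for such a setting, sketched against the trigger's function.json binding; the `deadLetterTopic` and `maxRetries` property names are hypothetical and not part of the current extension:

```json
{
  "type": "kafkaTrigger",
  "name": "event",
  "direction": "in",
  "brokerList": "BrokerList",
  "topic": "orders",
  "consumerGroup": "$Default",
  "deadLetterTopic": "orders-deadletter",
  "maxRetries": 5
}
```

After `maxRetries` failed attempts, the extension would produce the message to `orders-deadletter` and then checkpoint past it.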
When trying to bind to a KafkaAsyncCollector dynamically via Binder.BindAsync<> with a KafkaAttribute, the following exception is thrown:
System.Private.CoreLib: Exception while executing function: …..
Microsoft.Azure.WebJobs.Host: Can't bind Kafka to type 'Microsoft.Azure.WebJobs.Extensions.Kafka.KafkaAsyncCollector'.
Please provide an example showing how to dynamically bind to the KafkaAsyncCollector.
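For reference, once the binding works, a dynamic output binding would presumably look something like the sketch below. The `KafkaAttribute` constructor arguments here are assumptions; check the extension source for the actual signature:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Kafka;

public static class DynamicKafkaOutput
{
    // Sketch only: bind an IAsyncCollector at runtime instead of
    // declaratively, so the topic can be chosen per invocation.
    public static async Task Run(
        [TimerTrigger("0 */5 * * * *")] TimerInfo timer,
        IBinder binder)
    {
        var attribute = new KafkaAttribute("BrokerList", "my-topic");
        var collector = await binder.BindAsync<IAsyncCollector<string>>(attribute);
        await collector.AddAsync("hello");
        await collector.FlushAsync(CancellationToken.None);
    }
}
```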
This constructor gets used by the trigger in places but effectively does nothing. It needs to be implemented or removed.
cc @fbeltrao
Add testing documentation that explains how to run the end-to-end tests locally.
Ideally, create the required topics from code to simplify E2E setup.
Add support for message header properties in Kafka events:
Trigger
Output
Currently uses the "old" method of writing bindings, which some earlier bindings (SB/Storage/etc) all still use.
There is a newer way to do this that doesn't require nearly as much code, and there is support for open generics.
You can see an example in the cosmos binding for a collector, which starts here: https://github.com/Azure/azure-webjobs-sdk-extensions/blob/dev/src/WebJobs.Extensions.CosmosDB/Config/CosmosDBExtensionConfigProvider.cs#L56
There's also some documentation on how to do this here: https://github.com/Azure/azure-webjobs-sdk/wiki/Creating-custom-input-and-output-bindings#binding-to-generic-types-with-opentypes
We should update this "old" method of doing bindings to the "new" method.
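A rough sketch of what the "new" style looks like in an `IExtensionConfigProvider`, modeled on the Cosmos DB example linked above; the names and the collector construction are illustrative, not the real wiring:

```csharp
using Microsoft.Azure.WebJobs.Host.Config;

public class KafkaExtensionConfigProvider : IExtensionConfigProvider
{
    public void Initialize(ExtensionConfigContext context)
    {
        // One binding rule per attribute: the rule declaratively maps
        // KafkaAttribute to a collector instead of hand-written binders.
        var rule = context.AddBindingRule<KafkaAttribute>();
        rule.BindToCollector<KafkaEventData>(
            attribute => new KafkaAsyncCollector(/* config from attribute */));
    }
}
```

The open-generics support mentioned in the wiki lets the same rule serve `IAsyncCollector<T>` for arbitrary `T` via `OpenType` converters, which is what eliminates most of the hand-written code.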
It would be nice to create the topic on the fly if it does not exist.
That would require topic information such as the name, partition count, replication factor, etc.
The attribute should expose a specific parameter indicating whether creation is allowed.
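A sketch of how topic creation could work with Confluent.Kafka's admin client; the partition count, replication factor, and the idea of driving them from the attribute are assumptions taken from this issue:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Confluent.Kafka;
using Confluent.Kafka.Admin;

public static class TopicCreator
{
    public static async Task EnsureTopicAsync(
        string brokerList, string topic, int partitions, short replicationFactor)
    {
        using (var admin = new AdminClientBuilder(
            new AdminClientConfig { BootstrapServers = brokerList }).Build())
        {
            try
            {
                await admin.CreateTopicsAsync(new[]
                {
                    new TopicSpecification
                    {
                        Name = topic,
                        NumPartitions = partitions,
                        ReplicationFactor = replicationFactor,
                    }
                });
            }
            catch (CreateTopicsException e) when (
                e.Results.All(r => r.Error.Code == ErrorCode.TopicAlreadyExists))
            {
                // Topic already exists: nothing to do.
            }
        }
    }
}
```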
Use StyleCop so all team members follow the same coding rules. Start from the rule set used in the Event Hubs extension.
This issue has happened once so far and looks like a known issue.
Consumers must shut down gracefully to ensure that offsets are not committed after the process has stopped.
Many triggers today depend on an Azure Storage account for one reason or another. This trigger should not, and should be runnable even if "AzureWebJobsStorage" is not defined.
More of a requirement than an issue, but creating it here as an FYI.
/cc @ryancrawcour
Our CI/CD process (#25) needs to include registrations for each OSS component we're using.
We need to do two registrations for each component -
Must document usage of the trigger.
Also document all the config options available in host.json.
Must have a working end-to-end sample.
As Confluent.Kafka approaches its 1.0 release, we should track the releases they ship closely.
We should update our references and validate the functionality and contracts we use.
We need an alternative to auto-commit: a better commit strategy to optimise workload throughput.
Not urgent, just tracking: the code for all other extensions lives under the "azure" org, while this one is under the "microsoft" org. It would be good to move this over to "azure" sometime.
On success or failure of batch processing, we need to "checkpoint" and continue.
Follow the same behaviour as the current Event Hubs trigger.
Currently we support built-in serialisation for avro and protobuf.
Avro relies on Confluent.Kafka. Protobuf relies on google.protobuf.
Having serialisation built-in has the following advantages:
Disadvantages:
Should support an external Avro schema registry. Currently only fixed Avro schemas are supported.
After an entire batch has been processed, the next batch can be pulled
The trigger implementation has been tested using KafkaEventData and string as parameter types.
We must implement the following scenarios:
POCO (importance: high)
byte[] (importance: high)
IGenericRecord (importance: low)
string (importance: very low)
support key, partition, offset, timestamp and topic as stand-alone parameters (importance: medium)
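For reference, the target signatures might look roughly like this; the `KafkaTrigger` attribute arguments and property names are assumptions based on the current trigger:

```csharp
using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Kafka;

public class Order { public string Id { get; set; } }

public static class TriggerShapes
{
    // POCO: payload deserialised into a user type.
    public static void RunPoco(
        [KafkaTrigger("BrokerList", "orders", ConsumerGroup = "g1")] Order order) { }

    // byte[]: raw payload, no deserialisation.
    public static void RunBytes(
        [KafkaTrigger("BrokerList", "orders", ConsumerGroup = "g1")] byte[] body) { }

    // Stand-alone metadata parameters alongside the payload.
    public static void RunWithMetadata(
        [KafkaTrigger("BrokerList", "orders", ConsumerGroup = "g1")] string body,
        string key, int partition, long offset, DateTime timestamp, string topic) { }
}
```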
Must be able to write output to a Kafka topic using output bindings
This is not high priority; however, Source Link might help customers debug this trigger.
https://github.com/dotnet/sourcelink
Checkpoint saving is currently done using Consumer.Commit, which blocks the thread. An alternative is StoreOffset, which saves the checkpoint asynchronously in librdkafka.
Commit is more accurate, while StoreOffset offers better throughput.
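A minimal sketch of the two checkpoint styles against the Confluent.Kafka 1.x API; note that `StoreOffset` only gives at-least-once behaviour if `EnableAutoOffsetStore = false` is set in `ConsumerConfig`, so offsets are stored only after the function has processed the message:

```csharp
using Confluent.Kafka;

public static class Checkpointer
{
    public static void Checkpoint(
        IConsumer<string, string> consumer,
        ConsumeResult<string, string> result,
        bool useAsyncStore)
    {
        if (useAsyncStore)
        {
            // Hands the offset to librdkafka's local store; the background
            // auto-commit loop flushes it to the broker later. Fast, but a
            // crash can lose the last interval's checkpoints (reprocessing).
            consumer.StoreOffset(result);
        }
        else
        {
            // Synchronous commit: blocks until the broker acknowledges.
            // Accurate, but stalls the consuming thread.
            consumer.Commit(result);
        }
    }
}
```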
Would love your feedback @jeffhollan, @anirudhgarg and @ryancrawcour
Would need a "rich binding type" on the worker
It's less a Functions requirement and more a Kafka limitation.
Kafka only allows one reader per partition per consumer group at a time, but you may have 5 independent function instances running simultaneously.
In Event Hubs we leverage an SDK called the "EventProcessorHost". This automatically helps coordinate which partitions are locked by which instances. So if only 1 instance is active, it will let that 1 instance lock all of them. Once a 2nd pops up and tries to connect, it will rebalance and let the consumers know.
I don't know exactly how we'd do that in Kafka; I believe there's a concept of a "leader" that needs to assign partitions. So in the example above, if only one function instance is active, it would by default be the leader.
As soon as a 2nd instance gets scaled out, Kafka would ask the leader (instance 1) how many partitions should go to #2 and how many should stay with #1.
So I expect the trigger would need "leader logic" so that at any time any instance could become the leader, and as the leader it would evenly distribute partitions. Again, I'm not positive exactly how this will work in Kafka since we rely on the Event Processor Host SDK today, but this is what I've pieced together.
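The even split described above can be modeled as pure logic, independent of which instance happens to be the leader; this mirrors what Kafka's built-in round-robin assignor produces, and the function below is only an illustration:

```csharp
using System;
using System.Collections.Generic;

public static class PartitionAssignor
{
    // Round-robin: partition p goes to instance p % instanceCount,
    // so partitions stay evenly spread as instances come and go.
    public static Dictionary<int, List<int>> Assign(int partitionCount, int instanceCount)
    {
        var assignment = new Dictionary<int, List<int>>();
        for (var i = 0; i < instanceCount; i++)
            assignment[i] = new List<int>();
        for (var p = 0; p < partitionCount; p++)
            assignment[p % instanceCount].Add(p);
        return assignment;
    }
}
```

With 4 partitions and 1 instance, that instance holds all 4; when a 2nd instance joins, re-running the assignment yields 2 partitions each.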
After pulling the batch, the trigger can either pass in the entire batch of messages (KafkaMessage[]) or a single message. In Event Hubs this is a flag in function.json called "cardinality".
Batch should be the DEFAULT mode (from a template point of view). We know from Event Hubs that this gives better perf and more predictable checkpoint handling.
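Following the Event Hubs convention, batch mode in function.json might look like this; the kafkaTrigger property names are assumed to mirror the Event Hubs trigger and are not confirmed:

```json
{
  "bindings": [
    {
      "type": "kafkaTrigger",
      "direction": "in",
      "name": "events",
      "brokerList": "BrokerList",
      "topic": "orders",
      "consumerGroup": "$Default",
      "cardinality": "many"
    }
  ]
}
```

With `"cardinality": "many"` the function parameter receives the whole batch (e.g. `KafkaEventData[]`); `"one"` would deliver a single message per invocation.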
Must have AzDO CI/CD pipelines to auto build, test, and publish component
Add a config option that allows you to specify retrying of a batch. If the batch results in an exception, instead of checkpointing, retry the batch.
Set the number of retries. At least 1 retry? At least 5 retries? Unlimited retries?
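One hypothetical shape for the setting in host.json; none of these property names exist in the extension today:

```json
{
  "version": "2.0",
  "extensions": {
    "kafka": {
      "batchRetryCount": 5,
      "batchRetryIntervalInMs": 1000
    }
  }
}
```

A sentinel value such as `-1` could mean unlimited retries, covering the "retry forever" option raised above.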
The Kafka output binding for Python gives a nested collection error from the host. This is not related to the Python code.
Ideally, our NuGet package should be strong-name signed. However, we use several libraries that are not strong-name signed.
I sent a request to Confluent and talked with them directly. They said they can do it.
confluentinc/confluent-kafka-dotnet#879
I'm planning for us to publish the first version without strong-name signing; however, if they introduce strong-name signing quickly, I'd like to go with it from the first version.
One downside of not having strong-name signing is that if you add it later, it causes a breaking change. Since we release as alpha it might be OK; however, if they provide it very fast, I'd be happy to start with signing.
Ensure that, as part of the e2e tests, connections to a secure Kafka broker are verified.
Cert-based auth is very typical with Kafka, especially when you have Functions running in a container or on Azure while the Kafka clusters run on VMs or Confluent Cloud. Having this support could be important for targeting production workloads.
Until this extension is supported by the Functions scale controller, we will need logic to handle scale.
Something (a web app or a WebJob, running in the same App Service?) will need to check the "queue length" of the configured Kafka topic(s) and determine whether the current number of Function instances is keeping up adequately.
If we're falling behind, we need to scale out.
If we're ahead, or have drained the queue, we need to scale back in.
@jeffhollan & @anirudhgarg, can you please add some details to this requirement that will get us to an MVP scale controller?
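As a starting point, the watcher's decision could be as simple as comparing total consumer-group lag (the sum over partitions of high-water mark minus committed offset) against a target lag per instance. A pure-logic sketch, with all thresholds made up for illustration:

```csharp
using System;

public static class ScaleDecider
{
    // Returns the desired instance count given total lag across all
    // partitions. lagPerInstance is the backlog one instance is
    // expected to keep up with; maxInstances caps scale-out.
    public static int DesiredInstances(
        long totalLag, long lagPerInstance, int maxInstances)
    {
        if (totalLag <= 0)
            return 0; // queue drained: scale back in (to zero here)

        var needed = (int)Math.Ceiling(totalLag / (double)lagPerInstance);
        return Math.Min(Math.Max(needed, 1), maxInstances);
    }
}
```

For example, a total lag of 2,500 with a 1,000-message-per-instance target asks for 3 instances.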
Their documentation has been updated over the last few days, indicating we should be using version 1.0.0-RC2.
When releasing new versions to NuGet it would be nice to automate a GitHub release that corresponds with this, as 3rd parties and other interested parties monitor GitHub release feed for changes.
Reference - https://twitter.com/marcduiker/status/1122416965388242944
For info on automating releases in GitHub
https://developer.github.com/v3/repos/releases/#create-a-release
Today we create a single producer per brokerList, keyType, and valueType, meaning that multiple topics with the same message type will share the same producer.
We should investigate using the DependentProducerBuilder since librdkafka does not know about serialisers.
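A sketch of how DependentProducerBuilder shares one librdkafka handle across differently typed producers; because serialisation happens in the .NET layer, the native handle is type-agnostic:

```csharp
using Confluent.Kafka;

public static class SharedProducerExample
{
    public static void Build(string brokerList)
    {
        var config = new ProducerConfig { BootstrapServers = brokerList };

        // One underlying librdkafka client: one connection set, one queue.
        using (var root = new ProducerBuilder<byte[], byte[]>(config).Build())
        // Typed views over the same Handle; no extra connections created.
        using (var stringProducer =
            new DependentProducerBuilder<string, string>(root.Handle).Build())
        using (var byteProducer =
            new DependentProducerBuilder<string, byte[]>(root.Handle).Build())
        {
            // Produce through either view; both share root's broker state.
        }
    }
}
```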
As a developer, I would like an example that shows me how to bind inputs and outputs with Kafka, so that I can quickly build an Azure Function with Kafka.
There is this example from a KEDA repo, but it doesn't have:
I'm happy to send a PR if this would be helpful. Let me know where the right spot is to put this example (in this repo or expand the KEDA sample) and rough notes on how to do this with the current interface (if possible).
All config values should be settable in host.json and override any defaults being used.
Follow the convention of other triggers.
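For illustration, a host.json section in the conventional "extensions" shape; the specific property names below are assumptions, not the extension's actual settings:

```json
{
  "version": "2.0",
  "extensions": {
    "kafka": {
      "maxBatchSize": 64,
      "autoCommitIntervalMs": 200,
      "subscriberIntervalInSeconds": 1
    }
  }
}
```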
I talked with the Azure Functions product team about the Java bindings, and we found out that the current design doesn't support multiple languages.
e.g. we have "Type" on KafkaTriggerAttribute.
We can only use basic types and POCOs for that. I'll set up a meeting to discuss this; I'm posting this issue so we don't forget.
We would need the Maven attributes to know how to use this.
BeginProduce requires a call to Flush, which blocks the thread.
To avoid blocking, we should replace it with ProduceAsync.
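A sketch of the async replacement against the Confluent.Kafka 1.x API; awaiting the delivery-report tasks replaces the blocking Flush call:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Confluent.Kafka;

public static class AsyncPublisher
{
    // Each ProduceAsync returns a Task<DeliveryResult<...>> that completes
    // when librdkafka receives the delivery report, so no Flush is needed.
    public static Task PublishBatchAsync(
        IProducer<string, string> producer,
        string topic,
        IEnumerable<Message<string, string>> messages)
    {
        return Task.WhenAll(messages.Select(m => producer.ProduceAsync(topic, m)));
    }
}
```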
We shouldn't be publishing NuGet packages for PR validation builds. Packages should only be published once master (or dev) is updated.