azure / azure-functions-kafka-extension
Kafka extension for Azure Functions
License: MIT License
We need two things to publish the NuGet package.
We can reference azure-functions-eventgrid-extension as an example.
Include an optional setting where you can specify a dead-letter Kafka topic in your config somewhere, and the extension will dead-letter to it after retries have been exceeded.
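One possible shape for such a setting, sketched against the trigger's function.json binding; the `deadLetterTopic` and `maxRetries` property names are hypothetical and not part of the current extension:

```json
{
  "type": "kafkaTrigger",
  "name": "event",
  "direction": "in",
  "brokerList": "BrokerList",
  "topic": "orders",
  "consumerGroup": "$Default",
  "deadLetterTopic": "orders-deadletter",
  "maxRetries": 5
}
```

After `maxRetries` failed attempts, the extension would produce the message to `orders-deadletter` and then checkpoint past it.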
When trying to bind to a KafkaAsyncCollector dynamically via Binder.BindAsync<> with a KafkaAttribute, the following exception is thrown:
System.Private.CoreLib: Exception while executing function: …..
Microsoft.Azure.WebJobs.Host: Can't bind Kafka to type 'Microsoft.Azure.WebJobs.Extensions.Kafka.KafkaAsyncCollector'.
Please provide an example showing how to dynamically bind to the KafkaAsyncCollector.
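For reference, once the binding works, a dynamic output binding would presumably look something like the sketch below. The `KafkaAttribute` constructor arguments here are assumptions; check the extension source for the actual signature:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Kafka;

public static class DynamicKafkaOutput
{
    // Sketch only: bind an IAsyncCollector at runtime instead of
    // declaratively, so the topic can be chosen per invocation.
    public static async Task Run(
        [TimerTrigger("0 */5 * * * *")] TimerInfo timer,
        IBinder binder)
    {
        var attribute = new KafkaAttribute("BrokerList", "my-topic");
        var collector = await binder.BindAsync<IAsyncCollector<string>>(attribute);
        await collector.AddAsync("hello");
        await collector.FlushAsync(CancellationToken.None);
    }
}
```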
This constructor gets used by the trigger in places but effectively does nothing. It needs to be implemented or removed.
cc @fbeltrao
Add testing documentation that explains how to run the end-to-end tests locally.
Ideally, create the required topics from code to simplify E2E setup.
Add support for message header properties in Kafka events:
Trigger
Output
Currently uses the "old" method of writing bindings, which some earlier bindings (SB/Storage/etc) all still use.
There is a newer way to do this that doesn't require nearly as much code, and there is support for open generics.
You can see an example in the cosmos binding for a collector, which starts here: https://github.com/Azure/azure-webjobs-sdk-extensions/blob/dev/src/WebJobs.Extensions.CosmosDB/Config/CosmosDBExtensionConfigProvider.cs#L56
There's also some documentation on how to do this here: https://github.com/Azure/azure-webjobs-sdk/wiki/Creating-custom-input-and-output-bindings#binding-to-generic-types-with-opentypes
We should update this "old" method of doing bindings to the "new" method.
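A rough sketch of what the "new" style looks like in an `IExtensionConfigProvider`, modeled on the Cosmos DB example linked above; the names and the collector construction are illustrative, not the real wiring:

```csharp
using Microsoft.Azure.WebJobs.Host.Config;

public class KafkaExtensionConfigProvider : IExtensionConfigProvider
{
    public void Initialize(ExtensionConfigContext context)
    {
        // One binding rule per attribute: the rule declaratively maps
        // KafkaAttribute to a collector instead of hand-written binders.
        var rule = context.AddBindingRule<KafkaAttribute>();
        rule.BindToCollector<KafkaEventData>(
            attribute => new KafkaAsyncCollector(/* config from attribute */));
    }
}
```

The open-generics support mentioned in the wiki lets the same rule serve `IAsyncCollector<T>` for arbitrary `T` via `OpenType` converters, which is what eliminates most of the hand-written code.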
It would be nice to create the topic on the fly if it does not exist.
That would require topic information such as the name, partition count, replication factor, etc.
The attribute should expose a specific parameter indicating whether creation is allowed.
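A sketch of how topic creation could work with Confluent.Kafka's admin client; the partition count, replication factor, and the idea of driving them from the attribute are assumptions taken from this issue:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Confluent.Kafka;
using Confluent.Kafka.Admin;

public static class TopicCreator
{
    public static async Task EnsureTopicAsync(
        string brokerList, string topic, int partitions, short replicationFactor)
    {
        using (var admin = new AdminClientBuilder(
            new AdminClientConfig { BootstrapServers = brokerList }).Build())
        {
            try
            {
                await admin.CreateTopicsAsync(new[]
                {
                    new TopicSpecification
                    {
                        Name = topic,
                        NumPartitions = partitions,
                        ReplicationFactor = replicationFactor,
                    }
                });
            }
            catch (CreateTopicsException e) when (
                e.Results.All(r => r.Error.Code == ErrorCode.TopicAlreadyExists))
            {
                // Topic already exists: nothing to do.
            }
        }
    }
}
```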
Use StyleCop so all team members follow the same coding rules. Start from the rule set used in the Event Hubs extension.
This issue has happened once so far and looks like a known issue.
Consumers must shut down gracefully to ensure that offsets are not committed after the process has stopped.
Many triggers today depend on an Azure Storage account for one reason or another. This trigger should not, and should be runnable even if "AzureWebJobsStorage" is not defined.
More of a requirement than an issue, but creating it here as an FYI.
/cc @ryancrawcour
Our CI/CD process (#25) needs to include registrations for each OSS component we're using.
We need to do two registrations for each component -
Must document usage of the trigger.
Also document all the config options available in host.json.
Must have a working end-to-end sample.
As Confluent.Kafka approaches its 1.0 release, we should track the releases they ship closely.
We should update our references and validate the functionality and contracts we use.
We need an alternative to auto-commit: a better commit strategy to optimise workload throughput.
Not urgent, just tracking: the code for all other extensions lives under the "azure" org, while this one is under the "microsoft" org. It would be good to move this over to "azure" sometime.
On success or failure of batch processing, we need to "checkpoint" and continue.
Follow the same behaviour as the current Event Hubs trigger.
Currently we support built-in serialisation for avro and protobuf.
Avro relies on Confluent.Kafka. Protobuf relies on google.protobuf.
Having serialisation built-in has the following advantages:
Disadvantages:
Should support an external Avro schema registry. Currently only fixed Avro schemas are supported.
After an entire batch has been processed, the next batch can be pulled
The trigger implementation has been tested using KafkaEventData and string as parameter types.
We must implement the following scenarios:
POCO (importance: high)
byte[] (importance: high)
IGenericRecord (importance: low)
string (importance: very low)
support key, partition, offset, timestamp and topic as stand-alone parameters (importance: medium)
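For reference, the target signatures might look roughly like this; the `KafkaTrigger` attribute arguments and property names are assumptions based on the current trigger:

```csharp
using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Kafka;

public class Order { public string Id { get; set; } }

public static class TriggerShapes
{
    // POCO: payload deserialised into a user type.
    public static void RunPoco(
        [KafkaTrigger("BrokerList", "orders", ConsumerGroup = "g1")] Order order) { }

    // byte[]: raw payload, no deserialisation.
    public static void RunBytes(
        [KafkaTrigger("BrokerList", "orders", ConsumerGroup = "g1")] byte[] body) { }

    // Stand-alone metadata parameters alongside the payload.
    public static void RunWithMetadata(
        [KafkaTrigger("BrokerList", "orders", ConsumerGroup = "g1")] string body,
        string key, int partition, long offset, DateTime timestamp, string topic) { }
}
```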
Must be able to write output to a Kafka topic using output bindings
This is not high priority; however, Source Link might help customers debug this trigger.
https://github.com/dotnet/sourcelink
Checkpoint saving is currently done using Consumer.Commit, which blocks the thread. An alternative is StoreOffset, which saves the checkpoint asynchronously in librdkafka.
Commit is more accurate, while StoreOffset offers better throughput.
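A minimal sketch of the two checkpoint styles against the Confluent.Kafka 1.x API; note that `StoreOffset` only gives at-least-once behaviour if `EnableAutoOffsetStore = false` is set in `ConsumerConfig`, so offsets are stored only after the function has processed the message:

```csharp
using Confluent.Kafka;

public static class Checkpointer
{
    public static void Checkpoint(
        IConsumer<string, string> consumer,
        ConsumeResult<string, string> result,
        bool useAsyncStore)
    {
        if (useAsyncStore)
        {
            // Hands the offset to librdkafka's local store; the background
            // auto-commit loop flushes it to the broker later. Fast, but a
            // crash can lose the last interval's checkpoints (reprocessing).
            consumer.StoreOffset(result);
        }
        else
        {
            // Synchronous commit: blocks until the broker acknowledges.
            // Accurate, but stalls the consuming thread.
            consumer.Commit(result);
        }
    }
}
```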
Would love your feedback @jeffhollan, @anirudhgarg and @ryancrawcour
Would need a "rich binding type" on the worker
It's less a Functions requirement and more a Kafka limitation.
Kafka only allows one reader per partition per consumer group at a time, but you may have 5 independent function instances running simultaneously.
In Event Hubs we leverage an SDK called the "EventProcessorHost". This automatically helps coordinate which partitions are locked by which instances. So if only 1 instance is active, it will let that 1 instance lock all of them. Once a 2nd pops up and tries to connect, it will rebalance and let the consumers know.
I don't know exactly how we'd do that in Kafka; I believe there's a concept of a "leader" that needs to assign partitions. So in the example above, if only one function instance is active, it would by default be the leader.
As soon as a 2nd instance gets scaled out, Kafka would ask the leader (instance 1) how many partitions should go to #2 and how many should stay with #1.
So I expect the trigger would need "leader logic" so that at any time any instance could become the leader, and as the leader it would evenly distribute partitions. Again, I'm not positive exactly how this will work in Kafka since we rely on the Event Processor Host SDK today, but this is what I've pieced together.
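The even split described above can be modeled as pure logic, independent of which instance happens to be the leader; this mirrors what Kafka's built-in round-robin assignor produces, and the function below is only an illustration:

```csharp
using System;
using System.Collections.Generic;

public static class PartitionAssignor
{
    // Round-robin: partition p goes to instance p % instanceCount,
    // so partitions stay evenly spread as instances come and go.
    public static Dictionary<int, List<int>> Assign(int partitionCount, int instanceCount)
    {
        var assignment = new Dictionary<int, List<int>>();
        for (var i = 0; i < instanceCount; i++)
            assignment[i] = new List<int>();
        for (var p = 0; p < partitionCount; p++)
            assignment[p % instanceCount].Add(p);
        return assignment;
    }
}
```

With 4 partitions and 1 instance, that instance holds all 4; when a 2nd instance joins, re-running the assignment yields 2 partitions each.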
After pulling the batch, the trigger can either pass in the entire batch of messages (KafkaMessage[]) or a single message. In Event Hubs this is a flag in function.json called "cardinality".
Batch should be the DEFAULT mode (from a template point of view). We know from Event Hubs that this gives better perf and more predictable checkpoint handling.
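Following the Event Hubs convention, batch mode in function.json might look like this; the kafkaTrigger property names are assumed to mirror the Event Hubs trigger and are not confirmed:

```json
{
  "bindings": [
    {
      "type": "kafkaTrigger",
      "direction": "in",
      "name": "events",
      "brokerList": "BrokerList",
      "topic": "orders",
      "consumerGroup": "$Default",
      "cardinality": "many"
    }
  ]
}
```

With `"cardinality": "many"` the function parameter receives the whole batch (e.g. `KafkaEventData[]`); `"one"` would deliver a single message per invocation.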
Must have AzDO CI/CD pipelines to auto build, test, and publish component
Add a config option that allows you to specify retrying of a batch. If the batch results in an exception, instead of checkpointing, retry the batch.
Set the number of retries. At least 1 retry? At least 5 retries? Unlimited retries?
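One hypothetical shape for the setting in host.json; none of these property names exist in the extension today:

```json
{
  "version": "2.0",
  "extensions": {
    "kafka": {
      "batchRetryCount": 5,
      "batchRetryIntervalInMs": 1000
    }
  }
}
```

A sentinel value such as `-1` could mean unlimited retries, covering the "retry forever" option raised above.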
The Kafka output binding for Python gives a nested collection error from the host. This is not related to the Python code.
Ideally, our NuGet package should be strong-name signed. However, we use several libraries that are not strong-name signed.
I sent a request to Confluent and talked with them directly. They said they can do it.
confluentinc/confluent-kafka-dotnet#879
I'm planning for us to publish the first version without strong-name signing; however, if they introduce strong-name signing quickly, I'd like to go with it from the first version.
One downside of not having strong-name signing is that if you add it later, it causes a breaking change. Since we release as alpha it might be OK; however, if they provide it very fast, I'd be happy to start with signing.
Ensure that, as part of the e2e tests, connections to a secure Kafka broker are verified.
Cert-based auth is very typical with Kafka, especially when you have Functions running in a container or on Azure while the Kafka clusters run on VMs or Confluent Cloud. Having this support could be important for targeting production workloads.
Until this extension is supported by the Functions scale controller, we will need logic to handle scale.
Something (a web app or a WebJob, running in the same App Service?) will need to check the "queue length" of the configured Kafka topic(s) and determine whether the current number of Function instances is keeping up adequately.
If we're falling behind, we need to scale out.
If we're ahead, or have drained the queue, we need to scale back in.
@jeffhollan & @anirudhgarg, can you please add some details to this requirement that will get us to an MVP scale controller?
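As a starting point, the watcher's decision could be as simple as comparing total consumer-group lag (the sum over partitions of high-water mark minus committed offset) against a target lag per instance. A pure-logic sketch, with all thresholds made up for illustration:

```csharp
using System;

public static class ScaleDecider
{
    // Returns the desired instance count given total lag across all
    // partitions. lagPerInstance is the backlog one instance is
    // expected to keep up with; maxInstances caps scale-out.
    public static int DesiredInstances(
        long totalLag, long lagPerInstance, int maxInstances)
    {
        if (totalLag <= 0)
            return 0; // queue drained: scale back in (to zero here)

        var needed = (int)Math.Ceiling(totalLag / (double)lagPerInstance);
        return Math.Min(Math.Max(needed, 1), maxInstances);
    }
}
```

For example, a total lag of 2,500 with a 1,000-message-per-instance target asks for 3 instances.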
Their documentation has been updated over the last few days, indicating we should be using version 1.0.0-RC2.
When releasing new versions to NuGet it would be nice to automate a GitHub release that corresponds with this, as 3rd parties and other interested parties monitor GitHub release feed for changes.
Reference - https://twitter.com/marcduiker/status/1122416965388242944
For info on automating releases in GitHub
https://developer.github.com/v3/repos/releases/#create-a-release
Today we create a single producer per brokerList, keyType, and valueType, meaning that multiple topics with the same message type will share the same producer.
We should investigate using the DependentProducerBuilder since librdkafka does not know about serialisers.
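A sketch of how DependentProducerBuilder shares one librdkafka handle across differently typed producers; because serialisation happens in the .NET layer, the native handle is type-agnostic:

```csharp
using Confluent.Kafka;

public static class SharedProducerExample
{
    public static void Build(string brokerList)
    {
        var config = new ProducerConfig { BootstrapServers = brokerList };

        // One underlying librdkafka client: one connection set, one queue.
        using (var root = new ProducerBuilder<byte[], byte[]>(config).Build())
        // Typed views over the same Handle; no extra connections created.
        using (var stringProducer =
            new DependentProducerBuilder<string, string>(root.Handle).Build())
        using (var byteProducer =
            new DependentProducerBuilder<string, byte[]>(root.Handle).Build())
        {
            // Produce through either view; both share root's broker state.
        }
    }
}
```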
As a developer, I would like an example that shows me how to bind inputs and outputs with Kafka, so that I can quickly build an Azure Function with Kafka.
There is this example from a KEDA repo, but it doesn't have:
I'm happy to send a PR if this would be helpful. Let me know where the right spot is to put this example (in this repo or expand the KEDA sample) and rough notes on how to do this with the current interface (if possible).
All config values should be settable in host.json and override any defaults being used.
Follow the convention of other triggers.
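For illustration, a host.json section in the conventional "extensions" shape; the specific property names below are assumptions, not the extension's actual settings:

```json
{
  "version": "2.0",
  "extensions": {
    "kafka": {
      "maxBatchSize": 64,
      "autoCommitIntervalMs": 200,
      "subscriberIntervalInSeconds": 1
    }
  }
}
```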
I talked with the Azure Functions product team about the Java bindings, and we found out that the current design doesn't support multiple languages.
e.g. we have "Type" on KafkaTriggerAttribute.
We can only use basic types and POCOs for that. I'll set up a meeting to discuss this; I'm posting this issue so we don't forget.
We would need the Maven attributes to know how to use this.
BeginProduce requires a call to Flush, which blocks the thread.
To avoid blocking, we should replace it with ProduceAsync.
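A sketch of the async replacement against the Confluent.Kafka 1.x API; awaiting the delivery-report tasks replaces the blocking Flush call:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Confluent.Kafka;

public static class AsyncPublisher
{
    // Each ProduceAsync returns a Task<DeliveryResult<...>> that completes
    // when librdkafka receives the delivery report, so no Flush is needed.
    public static Task PublishBatchAsync(
        IProducer<string, string> producer,
        string topic,
        IEnumerable<Message<string, string>> messages)
    {
        return Task.WhenAll(messages.Select(m => producer.ProduceAsync(topic, m)));
    }
}
```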
We shouldn't be publishing NuGet packages for PR validation builds. Packages should only be published once master (or dev) is updated.