Comments (4)
From the Kafka O'Reilly guide:
Automatic Commit
The easiest way to commit offsets is to allow the consumer to do it for you. If you configure enable.auto.commit = true, then every 5 seconds the consumer will commit the largest offset your client received from poll(). The 5-second interval is the default and is controlled by setting auto.commit.interval.ms. Like everything else in the consumer, the automatic commits are driven by the poll loop. Whenever you poll, the consumer checks if it's time to commit, and if it is, it will commit the offsets it returned in the last poll.
Before using this convenient option, however, it is important to understand the consequences.
Consider that by default, automatic commit occurs every 5 seconds. Suppose that we are 3 seconds after the most recent commit and a rebalance is triggered. After the rebalancing, all consumers will start consuming from the last offset committed. In this case the offset is 3 seconds old, so all the events that arrived in those 3 seconds will be processed twice. It is possible to configure the commit interval to commit more frequently and reduce the window in which records will be duplicated, but it is impossible to completely eliminate them.
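A toy simulation may make the duplicate window described above concrete. This is not Kafka client code: `simulate_auto_commit`, its parameters, and the one-event-per-second workload are assumptions invented for illustration, and each event is assumed to be polled and processed at the moment it arrives.

```python
def simulate_auto_commit(events, commit_interval_ms, rebalance_at_ms):
    """Return the offsets that get processed twice after a rebalance.

    events: list of (arrival_ms, offset), sorted by arrival time.
    Hypothetical model: every event triggers one poll() call.
    """
    committed = 0       # next offset to read, as stored on the broker
    last_commit_ms = 0  # time of the most recent automatic commit
    position = 0        # next offset this consumer would read

    for arrival_ms, offset in events:
        if arrival_ms >= rebalance_at_ms:
            break  # the rebalance interrupts the poll loop
        # poll() first checks whether it is time to commit, and if so
        # commits what the previous polls returned.
        if arrival_ms - last_commit_ms >= commit_interval_ms:
            committed = position
            last_commit_ms = arrival_ms
        position = offset + 1  # this event is returned and processed

    # After the rebalance, consumption restarts from the committed offset,
    # so everything in [committed, position) is processed a second time.
    return list(range(committed, position))


# One event per second, offsets 0..9; rebalance 3.5 s after the commit at 5 s.
events = [(1000 * (i + 1), i) for i in range(10)]
print(simulate_auto_commit(events, 5000, 8500))  # [4, 5, 6, 7]
print(simulate_auto_commit(events, 1000, 8500))  # [7]
```

Shrinking the interval from 5 seconds to 1 second reduces the duplicates from four events to one in this model, but never to zero.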
Note that with auto-commit enabled, a call to poll will always commit the last offset returned by the previous poll. It doesn't know which events were actually processed, so it is critical to always process all the events returned by poll before calling poll again (or before calling close(), which will also automatically commit offsets). This is usually not an issue, but pay attention when you handle exceptions or otherwise exit the poll loop prematurely.
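The risk of leaving the poll loop early can be sketched with a toy consumer. `FakeAutoCommitConsumer` is a made-up stand-in, not a real client class, and it assumes the commit interval has already elapsed on every call, so both poll() and close() commit everything that earlier polls returned.

```python
class FakeAutoCommitConsumer:
    """Toy model of auto-commit: poll() and close() commit returned offsets."""

    def __init__(self, log, batch_size):
        self.log = log
        self.batch_size = batch_size
        self.position = 0   # next offset to hand out
        self.committed = 0  # next offset to read after a restart

    def poll(self):
        # Commit whatever previous polls returned, processed or not.
        self.committed = self.position
        batch = self.log[self.position:self.position + self.batch_size]
        self.position += len(batch)
        return batch

    def close(self):
        # close() also commits the offsets returned by the last poll.
        self.committed = self.position


consumer = FakeAutoCommitConsumer(["a", "b", "c", "d"], batch_size=4)
processed = []
try:
    for record in consumer.poll():
        if record == "c":
            raise ValueError("cannot process c")  # simulated failure
        processed.append(record)
except ValueError:
    pass  # the loop exits with part of the batch unprocessed
finally:
    consumer.close()

print(processed)           # ['a', 'b']
print(consumer.committed)  # 4
```

In this model the committed offset moves past "c" and "d" even though neither was processed, so a restarted consumer would skip them.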
Automatic commits are convenient, but they don't give developers enough control to avoid duplicate messages.
from kaffe.
@rwdaigle That is definitely the case. Offsets are committed back to Kafka periodically; they are not tracked in real time. Whenever a group consumer that uses Kafka to track its offsets restarts, you should expect at least some messages to be consumed again.
Kafka clients allow the offset commit interval to be configured, defaulting to the (iirc) Kafka-recommended value of 5 seconds. That strikes a balance between limiting repeat message consumption and overwhelming Kafka with a lot of offset updates.
from kaffe.
In our case, Brod does an excellent job of following the spec and controlling the poll loop. Kaffe uses acknowledged synchronous consumption, which is why the handle message function requires an :ok response. Kaffe blocks waiting for that required response and then acknowledges the offset as successfully processed to Brod so that it can be committed back to Kafka.
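The acknowledge-after-processing pattern described here can be sketched in a few lines. This is a language-neutral illustration in Python, not Kaffe or Brod code: `consume_with_ack`, `handle_message`, and `ack` are hypothetical names, and the string "ok" stands in for the :ok response.

```python
def consume_with_ack(messages, handle_message, ack):
    """Process messages one at a time and acknowledge only after success.

    messages: iterable of (offset, payload) pairs.
    handle_message: callable that must return "ok" when processing succeeds.
    ack: callable that receives each successfully processed offset.
    """
    for offset, payload in messages:
        # Block until the handler finishes with this message.
        result = handle_message(payload)
        if result != "ok":
            # Stop without acknowledging; this offset stays uncommitted
            # and the message will be delivered again later.
            raise RuntimeError(f"handler returned {result!r} at offset {offset}")
        # Only now is the offset reported as safe to commit.
        ack(offset)


acked = []
consume_with_ack([(0, "a"), (1, "b")], lambda payload: "ok", acked.append)
print(acked)  # [0, 1]
```

Because an offset is acknowledged only after its handler returns, a crash in the middle of processing leads to redelivery rather than a skipped message. Duplicates are still possible if the process stops after processing but before the acknowledged offset is committed.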
from kaffe.
Great info
Yet another reason why Kafka consumers should be implemented to be idempotent: at-least-once delivery is all that can be guaranteed!
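As a minimal sketch of what idempotent handling could look like, the handler below remembers which (topic, partition, offset) triples it has already applied and ignores repeats. `IdempotentHandler` and its fields are hypothetical, and the in-memory set is an assumption made for brevity; a real consumer would need to persist this record together with the side effect.

```python
class IdempotentHandler:
    """Apply each message at most once, keyed by (topic, partition, offset)."""

    def __init__(self):
        self.seen = set()  # identifiers of messages already applied
        self.total = 0     # example side effect: a running sum

    def handle(self, topic, partition, offset, amount):
        key = (topic, partition, offset)
        if key in self.seen:
            return False  # redelivered message, skip the side effect
        self.total += amount
        self.seen.add(key)
        return True


handler = IdempotentHandler()
handler.handle("payments", 0, 4, 10)
handler.handle("payments", 0, 5, 20)
handler.handle("payments", 0, 4, 10)  # same message delivered again
print(handler.total)  # 30
```

The redelivered message at offset 4 leaves the total unchanged, which is the property that makes at-least-once delivery safe to build on.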
from kaffe.