
Comments (4)

sdball commented on June 22, 2024

From the Kafka O'Reilly guide:

Automatic Commit

The easiest way to commit offsets is to allow the consumer to do it for you. If you configure enable.auto.commit = true, then every 5 seconds the consumer will commit the largest offset your client received from poll(). The 5-second interval is the default and is controlled by setting auto.commit.interval.ms. Like everything else in the consumer, the automatic commits are driven by the poll loop. Whenever you poll, the consumer checks if it's time to commit, and if it is, it will commit the offsets it returned in the last poll.

Before using this convenient option, however, it is important to understand the consequences.

Consider that, by default, automatic commits occur every 5 seconds. Suppose we are 3 seconds after the most recent commit when a rebalance is triggered. After the rebalance, all consumers will start consuming from the last committed offset. In this case the offset is 3 seconds old, so all the events that arrived in those 3 seconds will be processed twice. It is possible to configure the commit interval to commit more frequently and reduce the window in which records will be duplicated, but it is impossible to eliminate duplicates completely.

Note that with auto-commit enabled, a call to poll will always commit the last offset returned by the previous poll. It doesn't know which events were actually processed, so it is critical to always process all the events returned by poll before calling poll again (calling close() will also automatically commit offsets). This is usually not an issue, but pay attention when you handle exceptions or otherwise exit the poll loop prematurely.

Automatic commits are convenient, but they don't give developers enough control to avoid duplicate messages.
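To make that replay window concrete, here's a quick back-of-the-envelope sketch in Elixir. Only the 5-second default comes from the guide; the throughput number is made up:

```elixir
# Rough upper bound on messages replayed after a rebalance with
# auto-commit: everything consumed since the last periodic commit.
commit_interval_s = 5    # auto.commit.interval.ms default (5000 ms)
msgs_per_sec = 1_000     # hypothetical topic throughput
max_replayed = commit_interval_s * msgs_per_sec
IO.puts("worst case ~#{max_replayed} messages processed twice")
```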


sdball commented on June 22, 2024

@rwdaigle That is definitely the case. Offsets are committed back to Kafka periodically; the consumer is not tracking them in real time. Whenever a group consumer that uses Kafka to track its offsets restarts, at least some messages are guaranteed to be consumed again.

Kafka clients allow the offset commit interval to be configured, defaulting to the (iirc) Kafka-recommended value of 5 seconds. That strikes a balance between limiting the amount of repeat message consumption and overwhelming Kafka with a flood of offset updates.
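For the Elixir stack this thread is about, that knob lives in brod's group configuration, which Kaffe sits on top of. A minimal sketch, assuming a brod client named :my_client and hypothetical topic, group, and handler names:

```elixir
# brod group config: offsets are committed back to Kafka on a timer,
# analogous to the Java client's auto-commit interval.
group_config = [
  offset_commit_policy: :commit_to_kafka_v2,  # store offsets in Kafka
  offset_commit_interval_seconds: 5           # default; lower it to shrink the replay window
]

:brod.start_link_group_subscriber(
  :my_client,                 # hypothetical brod client
  "my-consumer-group",        # hypothetical group id
  ["my-topic"],
  group_config,
  [begin_offset: :earliest],  # consumer config
  MyHandler,                  # module implementing the :brod_group_subscriber behaviour
  []                          # handler init arg
)
```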


sdball commented on June 22, 2024

In our case, Brod does an excellent job of following the spec and controlling the poll loop. Kaffe uses acknowledged synchronous consumption, which is why the message handler function is required to return :ok. Kaffe blocks waiting for that required response and then acknowledges the offset as successfully processed to Brod, so that it can be committed back to Kafka.
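That contract looks roughly like this in a Kaffe message handler (the module name is made up; the :ok return is the acknowledgement described above):

```elixir
defmodule MessageProcessor do
  # Kaffe calls this for each message and blocks until it returns.
  def handle_message(%{key: key, value: value} = _message) do
    IO.puts("#{key}: #{value}")
    :ok  # only after this ack does Kaffe mark the offset as processed
  end
end
```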


rwdaigle commented on June 22, 2024

Great info 👍, thanks!

Yet another reason why Kafka consumers should be implemented to be idempotent: Kafka can only guarantee at-least-once delivery!
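One illustrative way to get idempotency (not from this thread; every name here is hypothetical) is to key off the message's topic/partition/offset and skip anything already seen. In-memory ETS keeps the sketch short; a real consumer would persist this state alongside its side effects so it survives restarts, which is exactly when replays happen:

```elixir
defmodule IdempotentProcessor do
  def init do
    # :set + insert_new gives an atomic "first writer wins" check
    :ets.new(:processed_offsets, [:set, :named_table, :public])
  end

  def handle_message(%{topic: t, partition: p, offset: o} = message) do
    if :ets.insert_new(:processed_offsets, {{t, p, o}}) do
      do_work(message)  # first delivery: perform the side effect
    end

    :ok  # ack either way; replayed duplicates are silently dropped
  end

  defp do_work(_message) do
    # database write, HTTP call, etc. goes here
    :ok
  end
end
```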

