
Comments (4)

sdball commented on June 22, 2024

From the Kafka O'Reilly guide:

Automatic Commit

The easiest way to commit offsets is to allow the consumer to do it for you. If you configure enable.auto.commit = true, then every 5 seconds the consumer will commit the largest offset your client received from poll(). The 5-second interval is the default and is controlled by setting auto.commit.interval.ms. Like everything else in the consumer, the automatic commits are driven by the poll loop. Whenever you poll, the consumer checks if it's time to commit, and if it is, it will commit the offsets it returned in the last poll.

Before using this convenient option, however, it is important to understand the consequences.

Consider that, by default, automatic commits occur every 5 seconds. Suppose we are 3 seconds after the most recent commit when a rebalance is triggered. After the rebalance, all consumers will start consuming from the last committed offset. In this case the offset is 3 seconds old, so all the events that arrived in those 3 seconds will be processed twice. It is possible to configure the commit interval to commit more frequently and reduce the window in which records will be duplicated, but it is impossible to eliminate duplicates completely.

Note that with auto-commit enabled, a call to poll will always commit the last offset returned by the previous poll. It doesn't know which events were actually processed, so it is critical to always process all the events returned by poll before calling poll again (calling close() will also automatically commit offsets). This is usually not an issue, but pay attention when you handle exceptions or otherwise exit the poll loop prematurely.

Automatic commits are convenient, but they don't give developers enough control to avoid duplicate messages.
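To make that replay window concrete, here's a quick back-of-the-envelope sketch in Elixir. Only the 5-second default comes from the guide; the throughput number is made up:

```elixir
# Rough upper bound on messages replayed after a rebalance with
# auto-commit: everything consumed since the last periodic commit.
commit_interval_s = 5    # auto.commit.interval.ms default (5000 ms)
msgs_per_sec = 1_000     # hypothetical topic throughput
max_replayed = commit_interval_s * msgs_per_sec
IO.puts("worst case ~#{max_replayed} messages processed twice")
```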


sdball commented on June 22, 2024

@rwdaigle That is definitely the case. Offsets are committed back to Kafka periodically; the consumer is not tracking them in real time. Whenever a group consumer that uses Kafka to track its offsets restarts, at least some messages are guaranteed to be consumed again.

Kafka clients allow the offset commit interval to be configured, defaulting to the (iirc) Kafka-recommended value of 5 seconds. That strikes a balance between limiting the amount of repeat message consumption and overwhelming Kafka with a flood of offset updates.
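For the Elixir stack this thread is about, that knob lives in brod's group configuration, which Kaffe sits on top of. A minimal sketch, assuming a brod client named :my_client and hypothetical topic, group, and handler names:

```elixir
# brod group config: offsets are committed back to Kafka on a timer,
# analogous to the Java client's auto-commit interval.
group_config = [
  offset_commit_policy: :commit_to_kafka_v2,  # store offsets in Kafka
  offset_commit_interval_seconds: 5           # default; lower it to shrink the replay window
]

:brod.start_link_group_subscriber(
  :my_client,                 # hypothetical brod client
  "my-consumer-group",        # hypothetical group id
  ["my-topic"],
  group_config,
  [begin_offset: :earliest],  # consumer config
  MyHandler,                  # module implementing the :brod_group_subscriber behaviour
  []                          # handler init arg
)
```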


sdball commented on June 22, 2024

In our case, Brod does an excellent job of following the spec and controlling the poll loop. Kaffe uses acknowledged synchronous consumption, which is why the message handler function is required to return :ok. Kaffe blocks waiting for that required response and then acknowledges the offset as successfully processed to Brod, so that it can be committed back to Kafka.
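That contract looks roughly like this in a Kaffe message handler (the module name is made up; the :ok return is the acknowledgement described above):

```elixir
defmodule MessageProcessor do
  # Kaffe calls this for each message and blocks until it returns.
  def handle_message(%{key: key, value: value} = _message) do
    IO.puts("#{key}: #{value}")
    :ok  # only after this ack does Kaffe mark the offset as processed
  end
end
```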


rwdaigle commented on June 22, 2024

Great info 👍, thanks!

Yet another reason why Kafka consumers should be implemented to be idempotent: Kafka can only guarantee at-least-once delivery!
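One illustrative way to get idempotency (not from this thread; every name here is hypothetical) is to key off the message's topic/partition/offset and skip anything already seen. In-memory ETS keeps the sketch short; a real consumer would persist this state alongside its side effects so it survives restarts, which is exactly when replays happen:

```elixir
defmodule IdempotentProcessor do
  def init do
    # :set + insert_new gives an atomic "first writer wins" check
    :ets.new(:processed_offsets, [:set, :named_table, :public])
  end

  def handle_message(%{topic: t, partition: p, offset: o} = message) do
    if :ets.insert_new(:processed_offsets, {{t, p, o}}) do
      do_work(message)  # first delivery: perform the side effect
    end

    :ok  # ack either way; replayed duplicates are silently dropped
  end

  defp do_work(_message) do
    # database write, HTTP call, etc. goes here
    :ok
  end
end
```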

