Comments (16)
Thanks for your help 👍
from kaffe.
Alright.
This was actually a bug introduced in brod
version 3.3.0
where
the check (EC =:= ?EC_NONE andalso Offset =:= -1)
was removed.
Forgot that offset = -1
is actually a 'no-value' indicator from kafka.
It was removed because we received complains from someone
who wanted to fake a commit with offset = -2 | -1
and expected brod
to pick it up after restart.
Since brod
before 3.3.3
applies a +1
as begin_offset,
it by accident worked without causing any attention since starting from offset = 0 (earliest)
is usually the default setting.
Will fix.
from kaffe.
@rawkode interesting. We haven't come across this particular issue ourselves. Are you able to confirm that messages are making it to the topics via a tool like kafkacat? Either by reading from the beginning of the topic or manually adding new messages and verifying kafkacat can output them?
cc/ @objectuser
from kaffe.
Yes, @davidsantoso - we only notice this on a couple of our Kaffe services, while a few others are fine.
I'm pretty stumped!
from kaffe.
root@kafka-kafka-0:/opt/kafka# ./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group mindetic-user-service
Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
events 0 4645 4646 1 '[email protected]'/<0.8650.0>-25c13112-0011-499c-aba0-565d1a4ff1c8/10.52.26.3 '[email protected]'/<0.8650.0>
- - - - - '[email protected]'/<0.10200.0>-9e5a88bf-fa37-4a0d-921a-0ae947699bb9/10.52.25.6 '[email protected]'/<0.10200.0>
root@kafka-kafka-0:/opt/kafka# ./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group mindetic-space-service-2017-12-05_2027
Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
from kaffe.
I can't even reset the offsets, as the connection is "stable". It's as if the re-balancing lasts forever?
./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --topic events:0 --reset-offsets --execute --to-offset 0 --group mindetic-space-service-2017-12-05_2027
Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).
Error: Assignments can only be reset if the group 'mindetic-space-service-2017-12-05_2027' is inactive, but the current state is Stable.
TOPIC PARTITION NEW-OFFSET
root@kafka-kafka-0:/opt/kafka#
from kaffe.
@rawkode That's very interesting. Am I interpreting the name of your consumer group correctly as indicating it's brand new? I was going to suggest creating a new one to see if it was the problem.
When the Kaffe processes start up, they should emit some reblanacing information to the logs. Is there any indication in there that the rebalance is problematic?
Do you know if this happens only with the latest version of Kaffe (which uses Brod 3) or if you can reproduce it with the previous version?
from kaffe.
I've tried 1.4 and 1.3. consumer is new, yes.
from kaffe.
Sorry, @objectuser - this is all Kaffe reports:
21:08:09.687 [info] Group member (mindetic-space-service-2017-12-05_2027,coor=#PID<0.1575.0>,cb=#PID<0.1572.0>,generation=0):
connected to group coordinator kafka-kafka-1.kafka-kafka-headless.mindetic.svc.cluster.local:9092
21:08:09.693 [info] Group member (mindetic-space-service-2017-12-05_2027,coor=#PID<0.1575.0>,cb=#PID<0.1572.0>,generation=5):
elected=true
21:08:09.703 [info] Group member (mindetic-space-service-2017-12-05_2027,coor=#PID<0.1575.0>,cb=#PID<0.1572.0>,generation=5):
assignments received:
events:
partition=0 begin_offset=-1
21:08:09.714 [info] client :"mindetic-space-service-2017-12-05_2027" connected to kafka-kafka-0.kafka-kafka-headless.mindetic.svc.cluster.local:9092
21:18:09.711 [info] client :"mindetic-space-service-2017-12-05_2027" metadata socket down kafka-kafka.mindetic.svc.cluster.local:9092
Reason:{:shutdown, :tcp_closed}
21:18:09.773 [info] client :kaffe_producer_client metadata socket down kafka-kafka.mindetic.svc.cluster.local:9092
Reason:{:shutdown, :tcp_closed}
from kaffe.
@rawkode Thanks, that helps (2x). So Kaffe sees a single partition and thinks it should start at the beginning. That makes sense.
Can you share your Kaffe consumer configuration?
from kaffe.
21:08:09.634 [debug] Configuration for kaffe:consumer loaded: [message_handler: SpaceService.EventHandler, async_message_ack: false, start_with_earliest_message: true, endpoints: ["kafka-kafka.mindetic.svc.cluster.local": 9092], topics: ["events"], consumer_group: "mindetic-space-service-2017-12-05_2027"]
from kaffe.
Checked this afternoon and it started to consume messages, but weirdly ...
Even though begin_offset=-1
, it consumed offset=4646
02:01:02.319 [info] Group member (mindetic-space-service-2017-12-05_2027,coor=#PID<0.1753.0>,cb=#PID<0.1752.0>,generation=0):
re-joining group, reason::GroupCoordinatorNotAvailable
02:01:02.321 [info] Group member (mindetic-space-service-2017-12-05_2027,coor=#PID<0.1753.0>,cb=#PID<0.1752.0>,generation=0):
failed to join group
reason::GroupCoordinatorNotAvailable
02:01:03.322 [info] Group member (mindetic-space-service-2017-12-05_2027,coor=#PID<0.1753.0>,cb=#PID<0.1752.0>,generation=0):
re-joining group, reason::GroupCoordinatorNotAvailable
02:01:03.323 [info] Group member (mindetic-space-service-2017-12-05_2027,coor=#PID<0.1753.0>,cb=#PID<0.1752.0>,generation=0):
failed to join group
reason::GroupCoordinatorNotAvailable
02:01:04.338 [info] Group member (mindetic-space-service-2017-12-05_2027,coor=#PID<0.1753.0>,cb=#PID<0.1752.0>,generation=0):
re-joining group, reason::GroupCoordinatorNotAvailable
02:01:04.355 [info] Group member (mindetic-space-service-2017-12-05_2027,coor=#PID<0.1753.0>,cb=#PID<0.1752.0>,generation=0):
connected to group coordinator kafka-kafka-1.kafka-kafka-headless.mindetic.svc.cluster.local:9092
02:01:04.371 [info] Group member (mindetic-space-service-2017-12-05_2027,coor=#PID<0.1753.0>,cb=#PID<0.1752.0>,generation=0):
failed to join group
reason::NotCoordinatorForGroup
02:01:05.372 [info] Group member (mindetic-space-service-2017-12-05_2027,coor=#PID<0.1753.0>,cb=#PID<0.1752.0>,generation=0):
Leaving group, reason: :max_rejoin_attempts
02:01:05.431 [error] GenServer #PID<0.1753.0> terminating
** (stop) :max_rejoin_attempts
Last message: {:lo_cmd_stabilize, 5, :NotCoordinatorForGroup}
State: {:state, :"mindetic-space-service-2017-12-05_2027", "mindetic-space-service-2017-12-05_2027", "", :undefined, 0, ["events"], {'kafka-kafka-1.kafka-kafka-headless.mindetic.svc.cluster.local', 9092}, #PID<0.1756.0>, :undefined, [], false, #PID<0.1752.0>, :brod_group_subscriber, [], #Reference<0.3899808867.1385955330.107438>, :roundrobin_v2, 10, 2, 5, 1, :undefined, :commit_to_kafka_v2, 5, :roundrobin_v2}
=CRASH REPORT==== 6-Dec-2017::02:01:05 ===
crasher:
initial call: brod_group_coordinator:init/1
pid: <0.1753.0>
registered_name: []
exception exit: max_rejoin_attempts
in function gen_server:handle_common_reply/8 (gen_server.erl, line 726)
ancestors: [<0.1752.0>,'Elixir.SpaceService.Supervisor',<0.1456.0>]
message_queue_len: 0
messages: []
links: [<0.1752.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 1598
stack_size: 27
reductions: 1380
neighbours:
neighbour:
pid: <0.1752.0>
registered_name: []
initial call: brod_group_subscriber:init/1
current_function: {gen_server,loop,7}
ancestors: ['Elixir.SpaceService.Supervisor',<0.1456.0>]
message_queue_len: 0
links: [<0.1457.0>,<0.1753.0>]
trap_exit: false
status: waiting
heap_size: 610
stack_size: 10
reductions: 200
current_stacktrace: [{gen_server,loop,7,[{file,"gen_server.erl"},{line,381}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,247}]}]
=SUPERVISOR REPORT==== 6-Dec-2017::02:01:05 ===
Supervisor: {local,'Elixir.SpaceService.Supervisor'}
Context: child_terminated
Reason: max_rejoin_attempts
Offender: [{pid,<0.1752.0>},
{id,'Elixir.Kaffe.Consumer'},
{mfargs,{'Elixir.Kaffe.Consumer',start_link,[]}},
{restart_type,permanent},
{shutdown,5000},
{child_type,worker}]
02:01:05.437 [info] Group member (mindetic-space-service-2017-12-05_2027,coor=#PID<0.1758.0>,cb=#PID<0.1757.0>,generation=0):
connected to group coordinator kafka-kafka-1.kafka-kafka-headless.mindetic.svc.cluster.local:9092
02:01:14.762 [info] Group member (mindetic-space-service-2017-12-05_2027,coor=#PID<0.1758.0>,cb=#PID<0.1757.0>,generation=6):
elected=true
02:01:14.831 [info] Group member (mindetic-space-service-2017-12-05_2027,coor=#PID<0.1758.0>,cb=#PID<0.1757.0>,generation=6):
assignments received:
events:
partition=0 begin_offset=-1
02:01:14.858 [info] client :"mindetic-space-service-2017-12-05_2027" connected to kafka-kafka-0.kafka-kafka-headless.mindetic.svc.cluster.local:9092
02:02:21.119 [info] Heh, we have a message!
02:02:21.119 [info] Message: %{attributes: 0, crc: 3016209889, key: "", magic_byte: 1, offset: 4646, partition: 0, topic: "events", ts: -1, ts_type: :create, value: "{\"version\":1,\"type\":\"TaskAttachmentSynced\",\"serial_number\":29,\"payload\":{\"title\":\"title test\",\"task_identifier\":\"11f3c7e1-7a84-4d18-821e-5c343aef1629\"},\"metadata\":{},\"identifier\":\"e7dfc6af-133e-4ea4-8471-c52b2431606d\"}"}
from kaffe.
That is very strange. Do you see any updates on the offset for the consumer group?
from kaffe.
@objectuser I've gotten closer. Kaffe isn't to blame.
I'm now running Kaffe 1.4 with brod 3.3.2
and kafka_protocol 1.1.0
Updating brod and kafka_protocol to 3.3.3 and 1.1.1 breaks everything
from kaffe.
That's even crazier! I bet the Brod folks will be happy for an issue on that.
from kaffe.
I'm not quite sure if I understand what was the issue here.
brod
3.3.3 upgraded brod
implemented consumer group protocol from roundrobin
to roundrobin_v2
in order to be compliant with the kafka convention (i.e. commit the next begin_offset instead of the last consumed offset).
roundrobin
offsets should be upgraded to v2
by 3.3.3
code smoothly.
However, getting assignment -1
in the old implementation means fetch from Offset = -1 + 1 = 0
.
Now it means fetch from latest
.
If you want to make it fetch from earliest
instead, reset to -2
.
from kaffe.
Related Issues (20)
- Defining multiple handlers HOT 1
- worker_per_topic_partition with multiple topics HOT 1
- Examples not compatible with Elixir 1.10 or 1.11 HOT 2
- extract_der is giving error with SSL HOT 2
- Undefined function exponential_backoff HOT 10
- Offset doesn't get updated between runs and runs crash with OOM errors HOT 4
- async ack - lots of duplicate messages until I restart the application HOT 2
- Kaffe.Producer.produce_sync raises on timeout
- How to set kafka headers when publishing message HOT 1
- Invalid call to raise/reraise on brod/kpro error
- Add support for SCRAM mechanism in SASL authentication. HOT 1
- Module to help write ExUnit tests
- It's impossible to create 2 separate consumers for different topics
- Running mix with kaffe deps fails to download pc package from hex
- Wrong place for configuration
- Config dump on error leaks credentials HOT 1
- Endpoints configuration is weird for using ENV vars
- Missing documentation HOT 8
- Connecting to a TLS-based Kafka instance under AWS MSK? HOT 18
- Receives notification when rebalance in progress/assignments revoked HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from kaffe.