Comments (12)

sdball commented on July 21, 2024

@objectuser The logs here might not show the full rebalancing that's happening with the join of consumer.1.

With these logs we see:

  • consumer.1 starts
  • consumer.2 starts
  • consumer.2 is elected the leader and sees itself as the only consumer and gets all assignments
  • consumer.2 sees a rebalance event
  • consumer.1 joins and is elected the leader
  • consumer.1 gets assigned all partitions because it sees itself as the only member of the group

We need more output from consumer.2 to see the result of its rebalance.

from kaffe.

objectuser commented on July 21, 2024

Looks like the next output from consumer.2 is a message being processed. These are consecutive lines:

2017-03-08T19:49:23.442167+00:00 app[consumer.2]: re-joining group, reason::RebalanceInProgress
2017-03-08T19:49:23.455289+00:00 app[consumer.2]: count#kafka-index.consumer.message.processed.count=1

Looking for more output now.

objectuser commented on July 21, 2024

Looking at both, just for those kinds of messages:

2017-03-08T19:49:20.101522+00:00 app[consumer.2]: group coordinator (groupId=index,memberId=,generation=0,pid=#PID<0.195.0>):
2017-03-08T19:49:20.101542+00:00 app[consumer.2]: connected to group coordinator ec2-34-195-140-72.compute-1.amazonaws.com:9096
2017-03-08T19:49:20.142827+00:00 app[consumer.2]: group coordinator (groupId=index,memberId=nonode@nohost/<0.195.0>-59c8c562-c8de-4464-9360-dfeb0a719e98,generation=1,pid=#PID<0.195.0>):
2017-03-08T19:49:20.142840+00:00 app[consumer.2]: elected=true
2017-03-08T19:49:20.143712+00:00 app[consumer.2]: group coordinator (groupId=index,memberId=nonode@nohost/<0.195.0>-59c8c562-c8de-4464-9360-dfeb0a719e98,generation=1,pid=#PID<0.195.0>):
2017-03-08T19:49:20.143714+00:00 app[consumer.2]: assignments received:
2017-03-08T19:49:20.143715+00:00 app[consumer.2]: index-batch:
...
2017-03-08T19:49:20.143745+00:00 app[consumer.2]: whitelist:
...
2017-03-08T19:49:21.771443+00:00 app[consumer.1]: group coordinator (groupId=index,memberId=,generation=0,pid=#PID<0.195.0>):
2017-03-08T19:49:21.771469+00:00 app[consumer.1]: connected to group coordinator ec2-34-195-140-72.compute-1.amazonaws.com:9096
2017-03-08T19:49:23.442157+00:00 app[consumer.2]: group coordinator (groupId=index,memberId=nonode@nohost/<0.195.0>-59c8c562-c8de-4464-9360-dfeb0a719e98,generation=1,pid=#PID<0.195.0>):
2017-03-08T19:49:23.442167+00:00 app[consumer.2]: re-joining group, reason::RebalanceInProgress
2017-03-08T19:49:31.355289+00:00 app[consumer.1]: group coordinator (groupId=index,memberId=nonode@nohost/<0.195.0>-526d20e3-a23a-4c24-9c7a-ee38c210b88f,generation=2,pid=#PID<0.195.0>):
2017-03-08T19:49:31.355299+00:00 app[consumer.1]: elected=true
2017-03-08T19:49:31.429594+00:00 app[consumer.1]: group coordinator (groupId=index,memberId=nonode@nohost/<0.195.0>-526d20e3-a23a-4c24-9c7a-ee38c210b88f,generation=2,pid=#PID<0.195.0>):
2017-03-08T19:49:31.429597+00:00 app[consumer.1]: assignments received:
2017-03-08T19:49:31.429598+00:00 app[consumer.1]: index-batch:
...
2017-03-08T19:49:31.429632+00:00 app[consumer.1]: whitelist:

After that, I just see message processing metrics and errors.

So I don't see any elected=false or any reassignments to consumer.2 after consumer.1 is elected.
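The manual filtering above can be sketched as a small script. This is only an illustration, assuming the Heroku-style line layout shown in the excerpt (`<timestamp> app[<process>]: <message>`); the function names `summarize` and `missing_latest_assignment` are hypothetical, and the member IDs in the sample are shortened placeholders rather than the real ones.

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the excerpt: "<timestamp> app[<process>]: <message>"
LINE = re.compile(r"^(?P<ts>\S+) app\[(?P<proc>[^\]]+)\]: (?P<msg>.*)$")
GENERATION = re.compile(r"generation=(\d+)")
EVENT_PREFIXES = ("elected=", "assignments received", "re-joining group")


def summarize(lines):
    """Return {process: [(generation, event), ...]} for group membership events."""
    events = defaultdict(list)
    generation = {}  # last generation announced by each process
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # skip elided ("...") or unrelated lines
        proc, msg = match["proc"], match["msg"]
        gen = GENERATION.search(msg)
        if msg.startswith("group coordinator") and gen:
            generation[proc] = int(gen.group(1))
        elif msg.startswith(EVENT_PREFIXES):
            events[proc].append((generation.get(proc), msg.rstrip(":")))
    return dict(events)


def missing_latest_assignment(events):
    """List processes that never logged assignments for the newest generation."""
    latest = max(g for evs in events.values() for g, _ in evs if g is not None)
    return sorted(
        proc for proc, evs in events.items()
        if (latest, "assignments received") not in evs
    )


# Sample mirrors the excerpt; memberId values are abbreviated placeholders.
SAMPLE = """\
2017-03-08T19:49:20.142827+00:00 app[consumer.2]: group coordinator (groupId=index,memberId=member-a,generation=1,pid=#PID<0.195.0>):
2017-03-08T19:49:20.142840+00:00 app[consumer.2]: elected=true
2017-03-08T19:49:20.143712+00:00 app[consumer.2]: group coordinator (groupId=index,memberId=member-a,generation=1,pid=#PID<0.195.0>):
2017-03-08T19:49:20.143714+00:00 app[consumer.2]: assignments received:
2017-03-08T19:49:23.442157+00:00 app[consumer.2]: group coordinator (groupId=index,memberId=member-a,generation=1,pid=#PID<0.195.0>):
2017-03-08T19:49:23.442167+00:00 app[consumer.2]: re-joining group, reason::RebalanceInProgress
2017-03-08T19:49:31.355289+00:00 app[consumer.1]: group coordinator (groupId=index,memberId=member-b,generation=2,pid=#PID<0.195.0>):
2017-03-08T19:49:31.355299+00:00 app[consumer.1]: elected=true
2017-03-08T19:49:31.429594+00:00 app[consumer.1]: group coordinator (groupId=index,memberId=member-b,generation=2,pid=#PID<0.195.0>):
2017-03-08T19:49:31.429597+00:00 app[consumer.1]: assignments received:
"""

if __name__ == "__main__":
    summary = summarize(SAMPLE.splitlines())
    for proc, evs in sorted(summary.items()):
        print(proc, evs)
    print("no assignment in latest generation:", missing_latest_assignment(summary))
```

Run against the full logs, a process listed by `missing_latest_assignment` is one that re-joined but never logged assignments for the newest generation, which is the symptom described for consumer.2.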

sdball commented on July 21, 2024

@objectuser Try bringing up two local nodes pointing to local kafka topics with multiple partitions and see if the behavior is reproducible. If we can isolate it down then we can start to address it.

To add partitions to a topic on our kafka dev-services machine:

vagrant@kafka:/opt/kafka_2.11-0.10.1.0/bin$ ./kafka-topics.sh --zookeeper localhost:2181 --topic whitelist --partitions 32 --alter
WARNING: If partitions are increased for a topic that has a key, the partition logic or ordering of the messages will be affected
Adding partitions succeeded!
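For reference when reproducing locally, here is a rough sketch of what a healthy rebalance should produce with two consumers and 32 partitions, compared with the behavior reported in this issue. It assumes a simple round-robin split, which may not match the assignment strategy kaffe or its underlying client actually uses; `assign_round_robin` is a hypothetical helper, not part of any library.

```python
def assign_round_robin(members, topic, partition_count):
    """Spread partitions 0..partition_count-1 across members in sorted order.

    Illustrative only: the real group leader may use a different strategy.
    """
    if not members:
        raise ValueError("at least one group member is required")
    ordered = sorted(members)
    assignments = {member: [] for member in ordered}
    for partition in range(partition_count):
        owner = ordered[partition % len(ordered)]
        assignments[owner].append((topic, partition))
    return assignments


if __name__ == "__main__":
    # Expected: the leader sees both members and splits the topic between them.
    healthy = assign_round_robin(["consumer.1", "consumer.2"], "whitelist", 32)
    # Reported: the leader sees only itself, so it takes every partition.
    reported = assign_round_robin(["consumer.1"], "whitelist", 32)

    for label, result in (("healthy", healthy), ("reported", reported)):
        counts = {member: len(parts) for member, parts in result.items()}
        print(label, counts)
```

If the local reproduction shows each consumer with roughly half of the partitions after the second node joins, the group is rebalancing normally; one consumer holding all 32 matches the production symptom.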

objectuser commented on July 21, 2024

Really interesting results. It seems to work everywhere just fine except Indexer production. I tested Indexer locally, staging and production. I even tested Keyster locally, staging and production. Only Indexer production has an issue.

I did see weird behavior the first time I ran it locally, like it wasn't seeing all the partitions. It only saw 0. For some reason what you said about using index as a consumer group was in my head and I changed it locally and it saw all the partitions. But it sees all the partitions in production. So I don't know. I'm going to change it and see what happens.

objectuser commented on July 21, 2024

Implemented spreedly/kafka-whitelist-index#35.

Now the partitions seem to be assigned correctly.

I'm hesitant to say it was the consumer group name, since the same name worked fine in staging. But I wonder if the consumer group "went bad" somehow.

rwdaigle commented on July 21, 2024

Interesting!

I wonder if, because staging Kafka is multi-tenant, Heroku is doing some behind-the-scenes namespacing that means we don't see the same behavior. E.g., in prod it's the index CG, but in staging it's prefix-index. So if there were an internal index CG that Kafka uses, it'd only collide in the prod instance.

Plausible?

objectuser commented on July 21, 2024

It's certainly possible and that's definitely a difference between staging and production.

I think it's more likely that there's something wrong with that consumer group. I didn't find anything about reserved names in Kafka. I might try deleting the index consumer group and see if that "fixes" it.

objectuser commented on July 21, 2024

Heroku doesn't support the kafka:consumer-groups commands on dedicated clusters.

heroku kafka:consumer-groups:destroy index
 ▸    This command will affect the cluster: kafka-curly-37251, which is on
 ▸    spreedly-kafka
 ▸    To proceed, type spreedly-kafka or re-run this command
 ▸    with --confirm spreedly-kafka

> spreedly-kafka
Deleting consumer group index... !
 ▸    this command is not required or enabled on dedicated clusters

That's very unfortunate.

sdball commented on July 21, 2024

Great investigation!

Consumer groups are (well, should be) automatically purged from Kafka after some period of inactivity: whatever the retention period is on Kafka's consumer group topic.

objectuser commented on July 21, 2024

OK, good. I'll ask Slack to remind me to revisit in two weeks and see if there are different results. (We need reminders in GitHub issues.)

objectuser commented on July 21, 2024

In the end, this was resolved in #20.
