
Comments (10)

urbandan commented on June 12, 2024

Thanks, fixed the formatting.

from strimzi-kafka-operator.

scholzj commented on June 12, 2024

Thanks for raising the issue. Could you please format the YAMLs to make them readable? Thanks.


ShubhamRwt commented on June 12, 2024

@urbandan I tried it with the latest operator (0.41.0) and was not able to reproduce this error. For me, when the node pool is scaled up to 4 brokers, the KafkaRebalance moves into the NotReady state because the Kafka cluster is in the NotReady state while the pods are still coming up, which should be the behavior we want. Once the Kafka cluster is up with all pods running, the KafkaRebalance moves back to the ProposalReady state. I will try with 0.40.0 now.


scholzj commented on June 12, 2024

@ShubhamRwt Why would the Kafka cluster move to NotReady just because you scaled up the node pool to 4 nodes? That sounds like you had some other issue interfering with the reproducer.


ShubhamRwt commented on June 12, 2024

@scholzj I meant that when we scale up the nodes, the pods for Cruise Control and the 4th node will still be coming up. That means the Kafka cluster is not ready yet, and the Rebalance operator has logic for this case: if the Kafka cluster is not up yet, it reports that the Kafka cluster is not Ready.


scholzj commented on June 12, 2024

No, the Kafka cluster should stay in Ready while scaling up.


scholzj commented on June 12, 2024

Also, look at it differently: what happens if you simply delete the CC pod while a rebalance is in progress?


scholzj commented on June 12, 2024

Discussed on the community call on 16.5.2024: Assuming this can be reproduced (see the discussion above), this should be addressed by failing the rebalance or restarting the process if possible. Let's keep it in triage for next time to make sure it is reproducible and to discuss the options.

Note: This should be already handled by the Topic Operator when changing the replication factor, where the TO detects this and automatically restarts the RF change. @fvaleri will double-check.


fvaleri commented on June 12, 2024

Yeah, we have this corner case covered in the Topic Operator.

TL;DR: Cruise Control has no memory of the task it was working on before a restart, so the operator is responsible for detecting this event and resubmitting any ongoing task.


The operator periodically calls the user_tasks endpoint with one or more Cruise Control-generated User-Task-ID values to check the status of pending tasks. If it gets back an empty task list, Cruise Control has restarted, and the tasks may or may not have been completed. The operator must then reset its internal state (switching the resource state back from ongoing to pending) and resubmit the tasks.

Note that there is a small chance that the task completed just before Cruise Control restarted, but the operator did not have time to observe it. In this case, the new task submission would be a duplicate. This is not a problem in practice: the work has already been done, so the duplicated task completes quickly (no-op).
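
As a rough illustration of the detection logic described above, here is a minimal Python sketch. It is not the Topic Operator's actual code: `TrackedTask`, `reconcile`, and `FakeCruiseControl` are hypothetical names, the fake stands in for the real user_tasks call, and the `"Active"` / `"Completed"` status strings are assumed for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class State(Enum):
    PENDING = "pending"
    ONGOING = "ongoing"
    COMPLETED = "completed"


@dataclass
class TrackedTask:
    """Operator-side record of one task (hypothetical structure)."""
    name: str
    state: State = State.PENDING
    user_task_id: Optional[str] = None


def reconcile(
    tasks: List[TrackedTask],
    submit: Callable[[str], str],
    fetch_user_tasks: Callable[[List[str]], Dict[str, str]],
) -> None:
    """One periodic pass: check ongoing tasks, then (re)submit pending ones."""
    ongoing = [t for t in tasks if t.state is State.ONGOING]
    if ongoing:
        statuses = fetch_user_tasks([t.user_task_id for t in ongoing])
        if not statuses:
            # Empty list: Cruise Control restarted and forgot our tasks.
            # Switch them back from ongoing to pending so they are resubmitted.
            for t in ongoing:
                t.state = State.PENDING
                t.user_task_id = None
        else:
            for t in ongoing:
                if statuses.get(t.user_task_id) == "Completed":
                    t.state = State.COMPLETED
    for t in tasks:
        if t.state is State.PENDING:
            t.user_task_id = submit(t.name)
            t.state = State.ONGOING


class FakeCruiseControl:
    """In-memory stand-in that loses all task state on restart."""

    def __init__(self) -> None:
        self._tasks: Dict[str, str] = {}
        self._counter = 0

    def submit(self, name: str) -> str:
        self._counter += 1
        task_id = f"task-{self._counter}"
        self._tasks[task_id] = "Active"
        return task_id

    def fetch_user_tasks(self, ids: List[str]) -> Dict[str, str]:
        return {i: self._tasks[i] for i in ids if i in self._tasks}

    def complete(self, task_id: str) -> None:
        self._tasks[task_id] = "Completed"

    def restart(self) -> None:
        self._tasks.clear()
```

In this sketch, a task that finished just before the restart is simply submitted again, which matches the duplicate-submission case described above: the resubmission is harmless because the work is already done.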

