Refreshing a remove brokers operation on KafkaRebalance resource while one rebalancing is already running can drive to a NotReady state
#10571
Comments
Triaged on 19.9.2024: This should be fixed.
After some investigation, I came to the conclusion that it is not straightforward to solve this issue with the suggested flag, due to our current rebalance reconcile flow. The current flow is:
Setting
Although it is not simple for the operator to automatically refresh the rebalance in this scenario, the user would be notified with the reason for
One improvement we could make is to handle this error and modify the error message slightly. Instead of the
@ppatierno please let me know what you think.
So, on the one hand, I would leave the message as it is, because that is exactly what we get from Cruise Control, rather than start handling specific errors (we don't know how many others we may face in the future) and replacing the message with one that is more understandable for the user.
1. Create a Kafka custom resource (for example, with 7 brokers) with the cruiseControl field set, so that Cruise Control runs within the cluster deployment.
2. Run a rebalancing by creating a KafkaRebalance custom resource to remove nodes 5 and 6 (with auto-approval enabled).
3. Wait for the rebalancing to go from PendingProposal to ProposalReady and then automatically (auto-approval enabled) to Rebalancing.
4. While the rebalancing is running, ask for a new rebalancing (using the "refresh" annotation on the already existing custom resource) that includes nodes 3 and 4 as well, so that all of 3, 4, 5 and 6 are removed.

Sometimes (depending on the timing and on where Cruise Control is in the current rebalancing), the operator logs an error and the KafkaRebalance moves to the NotReady state.

It seems that asking for a new rebalancing with different nodes to remove requires the currently running task to be stopped via stop_ongoing_execution=true in the query string of the POST request to the REST API.

Maybe we should add this parameter to any POST operation for rebalancing when our intention is not to wait for the current operation to end but to start a new one straight away.
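To make the suggestion concrete, here is a minimal sketch of how the query string for such a request could be built. It is not the operator's actual code (the operator is written in Java); the helper name `remove_broker_url` and the host are hypothetical, and only the `brokerid`, `dryrun` and `stop_ongoing_execution` parameters of the Cruise Control `remove_broker` endpoint are shown, assuming the default `/kafkacruisecontrol` path prefix.

```python
from urllib.parse import urlencode


def remove_broker_url(base_url, broker_ids, stop_ongoing_execution=False, dryrun=False):
    """Build the URL for a Cruise Control remove_broker POST request.

    Hypothetical helper for illustration only; base_url is an assumed
    Cruise Control address, not a value taken from this issue.
    """
    params = {
        # Cruise Control expects a comma-separated list of broker ids.
        "brokerid": ",".join(str(b) for b in broker_ids),
        "dryrun": str(dryrun).lower(),
    }
    if stop_ongoing_execution:
        # Ask Cruise Control to stop the running execution before starting this one.
        params["stop_ongoing_execution"] = "true"
    # Keep commas unescaped so the broker list stays readable in the URL.
    return f"{base_url}/kafkacruisecontrol/remove_broker?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    # Refresh while a rebalance is running: remove brokers 3, 4, 5 and 6.
    print(remove_broker_url("http://localhost:9090", [3, 4, 5, 6], stop_ongoing_execution=True))
```

Leaving the flag off by default keeps the existing behaviour for the first request; only a refresh issued while a rebalance is still executing would need to set it.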