operator v1: store NodePoolSpec in STS annotations & refactor nodePool deletion slightly #323
Conversation
Force-pushed from 368e073 to c4b90ea
Force-pushed from c4b90ea to fc0b719
Force-pushed from fc0b719 to de2e474
```diff
@@ -102,9 +102,8 @@ func (r *StatefulSetResource) handleScaling(ctx context.Context) error {
 		return r.setCurrentReplicas(ctx, *r.nodePool.Replicas, r.nodePool.Name, r.logger)
 	}
 
-	if ptr.Deref(r.nodePool.Replicas, 0) == npCurrentReplicas && !r.nodePool.Deleted {
+	if ptr.Deref(r.nodePool.Replicas, 0) == npCurrentReplicas {
```
we do not need this special case anymore (great!)
LGTM.
In my humble opinion, it would be good to have a regression test that would catch any unnecessary restarts.
You could file a JIRA issue to address the test.
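For illustration, a minimal sketch of the kind of regression test suggested here, assuming a controller-runtime fake client; newTestReconciler and the object names are hypothetical helpers, not part of this PR:

```go
package controllers

import (
	"context"
	"testing"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
)

// Hypothetical sketch: newTestReconciler (returning the reconciler and its
// fake client, pre-seeded with a cluster and its StatefulSet) is an assumed
// test helper.
func TestNoUnnecessaryRestartOnSteadyStateReconcile(t *testing.T) {
	ctx := context.Background()
	r, c := newTestReconciler(t)
	key := types.NamespacedName{Namespace: "default", Name: "redpanda"}
	req := ctrl.Request{NamespacedName: key}

	// The first reconcile may legitimately update the STS, e.g. to add
	// the NodePoolSpec annotation on upgrade.
	if _, err := r.Reconcile(ctx, req); err != nil {
		t.Fatal(err)
	}
	var before appsv1.StatefulSet
	if err := c.Get(ctx, key, &before); err != nil {
		t.Fatal(err)
	}

	// A steady-state reconcile must leave the STS untouched. The fake
	// client bumps ResourceVersion on every write, so equality proves
	// the reconcile was a no-op (no rolling restart triggered).
	if _, err := r.Reconcile(ctx, req); err != nil {
		t.Fatal(err)
	}
	var after appsv1.StatefulSet
	if err := c.Get(ctx, key, &after); err != nil {
		t.Fatal(err)
	}
	if before.ResourceVersion != after.ResourceVersion {
		t.Fatalf("StatefulSet was updated by a steady-state reconcile: %s -> %s",
			before.ResourceVersion, after.ResourceVersion)
	}
}
```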
```go
			redpandaContainer = &container
			break
	// …
	var np vectorizedv1alpha1.NodePoolSpec
	if nodePoolSpecJSON, ok := sts.Annotations[labels.NodePoolSpecKey]; ok {
```
Does this need to work with already existing NodePool specs? If there is no annotation here, what's the expected behavior?
As part of the reconciliation, this is set. So once this version is deployed, it will update the StatefulSet and add the annotation. In addition, NodePools are not yet in production. The only case not covered is nodepools that are already being deleted - but since there are no prod clusters yet, we'll be fine.
If we expect it to always be present, I feel we should handle the else case here with either an error or a panic.
LGTM provided we add some error handling around the missing-key case.
We should add a regression test that would fail without this change, but we don't have to block merging on getting that test added.
Force-pushed from de2e474 to 6921473
Good call out on the annotation-not-found case. Added the else block.
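For reference, a minimal sketch of what the added else block could look like; that the enclosing function returns an error, and the exact message, are assumptions:

```go
// Assumes encoding/json and fmt are imported in the enclosing file.
var np vectorizedv1alpha1.NodePoolSpec
if nodePoolSpecJSON, ok := sts.Annotations[labels.NodePoolSpecKey]; ok {
	if err := json.Unmarshal([]byte(nodePoolSpecJSON), &np); err != nil {
		return fmt.Errorf("unmarshaling NodePoolSpec annotation: %w", err)
	}
} else {
	// The reconciler always writes this annotation, so a missing key is
	// unexpected; fail loudly instead of synthesizing a spec.
	return fmt.Errorf("statefulset %s/%s is missing the %s annotation",
		sts.Namespace, sts.Name, labels.NodePoolSpecKey)
}
```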
We encountered corner cases where it becomes extremely difficult to synthesize a NodePoolSpec just by looking at the StatefulSet - which is our fallback if a nodePool was removed from the spec. AdditionalCommandlineArguments is hard to reconstruct, because we'd need to pull it out of the args field in the pod spec of the STS, removing all "other default" args - very error prone.
In practice, this caused a rolling restart of a nodePool because its arguments were not assembled correctly: the moment it got deleted from the spec, a diff came up (args missing).
To fix this properly, we change our strategy: we now store the NodePoolSpec used to create the STS in the STS itself, as an annotation. This way we can always find the NodePoolSpec that created the (deleted) STS.
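A minimal sketch of the write side, assuming encoding/json, the labels.NodePoolSpecKey constant from the diff above, and that r.nodePool holds the pool's spec (the exact placement within reconciliation is an assumption):

```go
// During reconciliation, persist the NodePoolSpec that produced this STS
// so it can be recovered even after the pool is removed from the spec.
nodePoolSpecJSON, err := json.Marshal(r.nodePool)
if err != nil {
	return fmt.Errorf("marshaling NodePoolSpec: %w", err)
}
if sts.Annotations == nil {
	sts.Annotations = map[string]string{}
}
sts.Annotations[labels.NodePoolSpecKey] = string(nodePoolSpecJSON)
```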
In addition, we take this chance to remove small special cases for handling deleted nodepools:

- Do not set replicas=currentReplicas anymore. It was more of a trick. Instead, we now set replicas=0 for a deleted nodePool, which exactly represents what should happen with it (intent to scale down to zero).
- Remove the check for the Deleted bool in the scale-down handler. It prevented replicas=currentReplicas from being accepted as "do nothing" for a deleted nodepool, so that control flow would proceed and downscaling happened. This was not very explicit, and it was hard to figure out why downscaling even works for deleted NodePools. With the refactor, replicas is 0, and no special case is needed for deletion anymore.
This way, nodePool deletion works more like an ordinary scale down.
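Sketched in code (shape assumed, reusing the ptr helper from k8s.io/utils/ptr already visible in the diff), the refactor reduces deletion to an ordinary scale down:

```go
// A deleted nodePool now simply reports zero desired replicas; the
// ordinary scale-down path in handleScaling drains it like any other
// downscale, with no Deleted special case left.
if r.nodePool.Deleted {
	r.nodePool.Replicas = ptr.To[int32](0)
}
```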