
operator v1: store NodePoolSpec in STS annotations & refactor nodePool deletion slightly #323

Merged
merged 1 commit into main from jb/virtual-nodepool-improvement on Nov 21, 2024

Conversation

@birdayz (Contributor) commented Nov 20, 2024

We encountered corner cases where it becomes extremely difficult to synthesize a NodePoolSpec just by looking at the StatefulSet, which is our fallback if a nodePool was removed from the spec. AdditionalCommandlineArguments is especially hard to reconstruct, because we would need to extract it from the args field in the pod spec of the STS and strip out all the "other default" args - a very error-prone process.

In practice, this caused a rolling restart of a nodePool because its arguments were not reassembled correctly: the moment the nodePool was deleted from the spec, a diff came up (args missing).

To fix this properly, we change our strategy: we now store the NodePoolSpec used to create the STS in the STS itself, as an annotation. This way we can always recover the NodePoolSpec that created the (now deleted) STS.
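
To illustrate the approach, here is a minimal sketch in Go of what writing the annotation can look like. This is not the exact PR code: the helper name is hypothetical, while labels.NodePoolSpecKey and vectorizedv1alpha1.NodePoolSpec are taken from the diff further down. Assumed imports: encoding/json, fmt, and appsv1 "k8s.io/api/apps/v1".

// Sketch only: persist the NodePoolSpec on the StatefulSet so it can be
// recovered even after the nodePool is removed from the Cluster spec.
// setNodePoolSpecAnnotation is a hypothetical helper name.
func setNodePoolSpecAnnotation(sts *appsv1.StatefulSet, np vectorizedv1alpha1.NodePoolSpec) error {
	nodePoolSpecJSON, err := json.Marshal(np)
	if err != nil {
		return fmt.Errorf("marshaling NodePoolSpec: %w", err)
	}
	if sts.Annotations == nil {
		sts.Annotations = map[string]string{}
	}
	sts.Annotations[labels.NodePoolSpecKey] = string(nodePoolSpecJSON)
	return nil
}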

In addition, we take this chance to remove small special cases for
handling deleted nodePools:

  • Do not set replicas=currentReplicas anymore; that was more of a trick.
    Instead, we now set replicas=0 for a deleted nodePool, which exactly
    represents what should happen to it (the intent to scale down to zero).
  • Drop the check of the Deleted bool in the scale-down handler. That
    check prevented replicas=currentReplicas from being accepted as "do
    nothing" for a deleted nodePool, so the control flow would proceed and
    downscaling would happen. This was not very explicit, and it was hard
    to figure out why downscaling even worked for deleted NodePools. With
    the refactor, replicas is 0, so no special case is needed for deletion
    anymore.

This way, nodePool deletion works more like an ordinary scale down.
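
Sketched in Go, the new rule is roughly the following (r.nodePool.Replicas and r.nodePool.Deleted are taken from the diff in this PR; the surrounding handler is simplified, and ptr is the k8s.io/utils/ptr package already used in the diff):

// Sketch: a deleted nodePool is expressed as an ordinary scale-down to zero,
// instead of pinning replicas to currentReplicas as before.
replicas := r.nodePool.Replicas
if r.nodePool.Deleted {
	replicas = ptr.To(int32(0)) // intent: scale down to zero
}
// From here on, the regular scale-down path applies; no Deleted special case.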

@birdayz changed the title from "operator v1: store NodePoolSpec in STS annotations" to "[WIP] operator v1: store NodePoolSpec in STS annotations" on Nov 20, 2024
@birdayz force-pushed the jb/virtual-nodepool-improvement branch from 368e073 to c4b90ea on November 20, 2024 22:42
@birdayz changed the title from "[WIP] operator v1: store NodePoolSpec in STS annotations" to "operator v1: store NodePoolSpec in STS annotations" on Nov 20, 2024
@birdayz marked this pull request as ready for review on November 20, 2024 22:44
@birdayz marked this pull request as draft on November 20, 2024 22:45
@birdayz changed the title from "operator v1: store NodePoolSpec in STS annotations" to "[WIP] operator v1: store NodePoolSpec in STS annotations" on Nov 20, 2024
@birdayz force-pushed the jb/virtual-nodepool-improvement branch from c4b90ea to fc0b719 on November 21, 2024 08:44
@birdayz changed the title from "[WIP] operator v1: store NodePoolSpec in STS annotations" to "operator v1: store NodePoolSpec in STS annotations & refactor nodePool deletion slightly" on Nov 21, 2024
@birdayz marked this pull request as ready for review on November 21, 2024 08:51
@birdayz force-pushed the jb/virtual-nodepool-improvement branch from fc0b719 to de2e474 on November 21, 2024 09:02
@@ -102,9 +102,8 @@ func (r *StatefulSetResource) handleScaling(ctx context.Context) error {
 		return r.setCurrentReplicas(ctx, *r.nodePool.Replicas, r.nodePool.Name, r.logger)
 	}
 
-	if ptr.Deref(r.nodePool.Replicas, 0) == npCurrentReplicas && !r.nodePool.Deleted {
+	if ptr.Deref(r.nodePool.Replicas, 0) == npCurrentReplicas {
@birdayz (Contributor, Author) commented on the diff above:

we do not need this special case anymore (great!)

@RafalKorepta (Contributor) left a comment:

LGTM.

In my humble opinion, it would be good to have a regression test that would catch any unnecessary restarts.

You could file a JIRA issue to address the test.

redpandaContainer = &container
break
var np vectorizedv1alpha1.NodePoolSpec
if nodePoolSpecJSON, ok := sts.Annotations[labels.NodePoolSpecKey]; ok {
A reviewer (Contributor) commented on the snippet above:

Does this need to work with already existing NodePool specs? If there is no annotation here, what's the expected behavior?

@birdayz (Contributor, Author) replied:

As part of the reconciliation, this annotation is set. So once this version is deployed, it will update the StatefulSet and add the annotation. In addition, NodePools are not yet in production. The only case not covered is nodePools that are already being deleted, but since there are no prod clusters yet, we'll be fine.

A reviewer (Contributor) replied:

If we expect it to always be present, I feel we should handle the else case here with either an error or a panic.

@chrisseto (Contributor) left a comment:

LGTM provided we add some error handling around the missing-key case.

We should add a regression test that would fail without this change, but we don't have to block merging on getting that test added.

@birdayz force-pushed the jb/virtual-nodepool-improvement branch from de2e474 to 6921473 on November 21, 2024 15:24
@birdayz (Contributor, Author) commented Nov 21, 2024:

Good callout on the annotation-not-found case; added the else block.
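
For context, the added missing-annotation handling can look roughly like this (a sketch assuming the surrounding function returns an error, as the reviewers suggested; the exact error message and return path in the commit may differ):

var np vectorizedv1alpha1.NodePoolSpec
nodePoolSpecJSON, ok := sts.Annotations[labels.NodePoolSpecKey]
if !ok {
	// Fail loudly instead of silently re-synthesizing the spec from the STS.
	return fmt.Errorf("statefulset %s is missing the %s annotation", sts.Name, labels.NodePoolSpecKey)
}
if err := json.Unmarshal([]byte(nodePoolSpecJSON), &np); err != nil {
	return fmt.Errorf("unmarshaling NodePoolSpec annotation: %w", err)
}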

@birdayz merged commit 27896a4 into main on Nov 21, 2024
5 checks passed