Avoid bogus errors during deletion
When deleting the DRPC we may need to adopt the VRG, delete the
secondary VRG, wait until the secondary VRG is deleted, delete the
primary VRG, and wait until the primary VRG is deleted. This takes 60-90
seconds and many reconciles (18 seen in an e2e test), and creates a huge
amount of noise in the log.

Suppress the noise using a util.OperationInProgress error. When the
reconcile is successful but the deletion is still in progress, we return
a util.OperationInProgress error describing the current progression. The
top-level error handler logs an INFO message and requeues the request.

With this change we will see multiple logs for the secondary VRG:

    INFO    Deleting DRPC in progress {"reason": "secondary VRG deletion in progress"}
    ...

And finally more logs for the primary VRG:

    INFO    Deleting DRPC in progress {"reason": "primary VRG deletion in progress"}
    ...

Notes:

- We logged errors during finalizeDRPC twice: once as an INFO log, and
  once as an ERROR with a stack trace when we returned the error from the
  reconcile. Remove the duplicate INFO log.

- The linter is not happy about the new nested if. We can avoid this by
  extracting a helper to handle finalize errors, but I want to keep the
  change minimal for easy backport. We can improve this later upstream.

Signed-off-by: Nir Soffer <[email protected]>
nirs committed Dec 2, 2024
1 parent 7c66c06 commit 299ec4f
Showing 1 changed file with 15 additions and 6 deletions.
21 changes: 15 additions & 6 deletions internal/controller/drplacementcontrol_controller.go
@@ -29,6 +29,7 @@ import (

rmn "github.com/ramendr/ramen/api/v1alpha1"
argocdv1alpha1hack "github.com/ramendr/ramen/internal/controller/argocd"
"github.com/ramendr/ramen/internal/controller/util"

Check failure on line 32 in internal/controller/drplacementcontrol_controller.go (GitHub Actions / Golangci Lint): ST1019: package "github.com/ramendr/ramen/internal/controller/util" is being imported more than once (stylecheck)
rmnutil "github.com/ramendr/ramen/internal/controller/util"

Check failure on line 33 in internal/controller/drplacementcontrol_controller.go (GitHub Actions / Golangci Lint): duplicated-imports: Package "github.com/ramendr/ramen/internal/controller/util" already imported (revive)
"github.com/ramendr/ramen/internal/controller/volsync"
clrapiv1beta1 "open-cluster-management.io/api/cluster/v1beta1"
@@ -119,7 +120,7 @@ func (r *DRPlacementControlReconciler) SetupWithManager(mgr ctrl.Manager) error
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/[email protected]/pkg/reconcile
//
//nolint:funlen,gocognit,gocyclo,cyclop
//nolint:funlen,gocognit,gocyclo,cyclop,nestif
func (r *DRPlacementControlReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := r.Log.WithValues("DRPC", req.NamespacedName, "rid", uuid.New())

@@ -166,13 +167,21 @@ func (r *DRPlacementControlReconciler) Reconcile(ctx context.Context, req ctrl.R
// then the DRPC should be deleted as well. The least we should do here is to clean up DPRC.
err := r.processDeletion(ctx, drpc, placementObj, logger)
if err != nil {
logger.Info(fmt.Sprintf("Error in deleting DRPC: (%v)", err))

statusErr := r.setDeletionStatusAndUpdate(ctx, drpc)
if statusErr != nil {
err = fmt.Errorf("drpc deletion failed: %w and status update failed: %w", err, statusErr)

return ctrl.Result{}, err
}

// Is this an expected condition?
if errorswrapper.Is(err, util.OperationInProgress("")) {
logger.Info("Deleting DRPC in progress", "reason", err)

return ctrl.Result{Requeue: true}, nil
}

// Unexpected error.
return ctrl.Result{}, err
}

@@ -736,7 +745,7 @@ func (r *DRPlacementControlReconciler) cleanupVRGs(
}

if len(vrgs) != 0 {
return fmt.Errorf("waiting for VRGs count to go to zero")
return util.OperationInProgress("waiting for VRGs count to go to zero")
}

// delete MCVs
@@ -761,7 +770,7 @@ func (r *DRPlacementControlReconciler) ensureVRGsDeleted(
for cluster, vrg := range vrgs {
if vrg.Spec.ReplicationState == replicationState {
if !ensureVRGsManagedByDRPC(r.Log, mwu, vrgs, drpc, vrgNamespace) {
return fmt.Errorf("%s VRG adoption in progress", replicationState)
return util.OperationInProgress(fmt.Sprintf("%s VRG adoption in progress", replicationState))
}

if err := mwu.DeleteManifestWork(mwu.BuildManifestWorkName(rmnutil.MWTypeVRG), cluster); err != nil {
@@ -773,7 +782,7 @@ func (r *DRPlacementControlReconciler) ensureVRGsDeleted(
}

if inProgress {
return fmt.Errorf("%s VRG manifestwork deletion in progress", replicationState)
return util.OperationInProgress(fmt.Sprintf("%s VRG manifestwork deletion in progress", replicationState))
}

return nil