Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add volume group replication controller code #610

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Nikhil-Ladha
Copy link
Contributor

@Nikhil-Ladha Nikhil-Ladha commented Jul 8, 2024

This PR adds the following codes/logic:

  • Added volume group replication controller logic
  • Added generated crds, rbacs
  • Added docs to create VGR, VGRClass and VGRContent CRs

@Nikhil-Ladha
Copy link
Contributor Author

Nikhil-Ladha commented Jul 8, 2024

The changes are not yet tested, will wait for #605 to get merged and then update a few things and test it.
In the meantime, please feel free to provide some initial reviews.
Thanks :)

internal/proto/volumegroup.proto Outdated Show resolved Hide resolved
internal/proto/volumegroup.proto Outdated Show resolved Hide resolved
Copy link
Member

@Madhu-1 Madhu-1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still need to do few more rounds of review, just done it for few files

controllers/replication.storage/finalizers.go Outdated Show resolved Hide resolved
controllers/replication.storage/finalizers.go Outdated Show resolved Hide resolved
controllers/replication.storage/finalizers.go Outdated Show resolved Hide resolved
controllers/replication.storage/utils.go Outdated Show resolved Hide resolved
// Check if the PVC has any labels defined
objLabels := obj.GetLabels()
if len(objLabels) == 0 {
return []reconcile.Request{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there can be a case where someone removed the label from the PVC, we need to remove that from the VGR as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand from the docs, the enqueue function would be ran for both the old,new objects in case of update function. So, when the request is ran for older object, the details from the pvc annotation will be fetched and the VGR reconcile request will be enqueued, and since the object is already updated in the cluster the selector won't inlcude that PVC.
Though this is all theoritical, will test it to confirm.

Comment on lines 445 to 473
vgrObjsList := &replicationv1alpha1.VolumeGroupReplicationList{}
logger := log.FromContext(context)
err := r.Client.List(context, vgrObjsList)
if err != nil {
logger.Error(err, "failed to list VolumeGroupReplication instances")
return []reconcile.Request{}
}

// Check if the pvc labels match any VGRs based on selectors present in it's annotation
for _, vgr := range vgrObjsList.Items {
if vgr.Annotations != nil && vgr.Annotations["pvcSelector"] != "" {
labelSelector, err := labels.Parse(vgr.Annotations["pvcSelector"])
if err != nil {
logger.Error(err, "failed to parse selector from VolumeGroupReplication's annotation")
return []reconcile.Request{}
}
objLabelsSet := labels.Set(objLabels)
if labelSelector.Matches(objLabelsSet) {
return []reconcile.Request{
{
NamespacedName: types.NamespacedName{
Namespace: vgr.Namespace,
Name: vgr.Name,
},
},
}
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@Nikhil-Ladha Nikhil-Ladha Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I will just update the code to fetch the VGR name from the annotation, since now we are adding the owner annotation to PVC.
EDIT: Ok, can't exactly do that. Since, new PVCs won't be considered then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is we can't use a helper function like the one you mentioned is because the selector is not a single field that we can compare or so, and to match labels we need to create a selector requirement as done in the getPVCFromDataSource function.

@mergify mergify bot added the api Change to the API, requires extra care label Jul 12, 2024
@mergify mergify bot requested a review from ShyamsundarR July 12, 2024 12:02
@Nikhil-Ladha Nikhil-Ladha force-pushed the add-vgr-controller-us branch 7 times, most recently from 0121241 to 8e8830d Compare July 18, 2024 11:38
@mergify mergify bot added the vendor Pull requests that update vendored dependencies label Jul 18, 2024
Copy link
Collaborator

@nixpanic nixpanic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add documentation with an example in the docs/ directory? That makes it easier to visualize how things are supposed to work.

@Nikhil-Ladha Nikhil-Ladha force-pushed the add-vgr-controller-us branch 2 times, most recently from c208c80 to cb4b33d Compare August 4, 2024 07:47
@nixpanic nixpanic dismissed their stale review August 5, 2024 09:49

Nothing blocking, only test results missing.


"github.com/csi-addons/kubernetes-csi-addons/internal/controller/replication.storage/replication"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reorganize imports, internal pkgs go last

- Then, create the VR CR and add VGR name/namespace as the annotation to it
- Update the VGR status with the VR status.

In case of deletion:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle deletion case of object first.
maybe define an instance object like https://github.com/csi-addons/kubernetes-csi-addons/blob/main/internal/controller/replication.storage/volumereplication_controller.go#L616
and then write helper functions with it ?

The reconcile function is >250 lines.

Copy link
Contributor Author

@Nikhil-Ladha Nikhil-Ladha Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle deletion case of object first.

I feel, if we handle deletion first then it becomes a bit arbritary while reading the code to understand what exactly is happening and why. Having it in the end tells the reader that these all operations happened when the CR was created and that's why all these cleanups are needed.

maybe define an instance object like https://github.com/csi-addons/kubernetes-csi-addons/blob/main/internal/controller/replication.storage/volumereplication_controller.go#L616
and then write helper functions with it ?

I don't see any use case of defining such an instance object here, the only helper function here is the getPVCFromDataSource and it doesn't makes sense to add this extra type for a single function.

The reconcile function is >250 lines.

It's a bit long I agree, but are there any rules defined for the lenght of the reconcile function?
Secondly, the operations that are happening in the reconcile function are very basic and can't be broken into individual helper functions. Those, that can be broken are already done. If you have any suggestions for it, I would be happy to know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code to move the creation of CRs into their individual functions.
Now it's < 250 :)

@Nikhil-Ladha
Copy link
Contributor Author

Looks very promising to me, thanks!

Can you include some testing results in a PR-comment?

Here's the result of creating the VGR:

apiVersion: v1
items:
- apiVersion: replication.storage.openshift.io/v1alpha1
  kind: VolumeGroupReplication
  metadata:
    annotations:
      pvcSelector: group=replication
      replication.storage.openshift.io/volume-replication-name: vr-c4e68900-e0c5-465b-802a-c6fd011ff40c
      replication.storage.openshift.io/volumegroupreplication-content-name: vgrcontent-c4e68900-e0c5-465b-802a-c6fd011ff40c
    creationTimestamp: "2024-08-08T05:53:25Z"
    finalizers:
    - replication.storage.openshift.io/vgr-protection
    generation: 3
    labels:
      app.kubernetes.io/created-by: kubernetes-csi-addons
      app.kubernetes.io/instance: volumegroupreplication-sample
      app.kubernetes.io/managed-by: kustomize
      app.kubernetes.io/name: volumegroupreplication
      app.kubernetes.io/part-of: kubernetes-csi-addons
    name: volumegroupreplication-sample
    namespace: rook-ceph
    resourceVersion: "193975"
    uid: c4e68900-e0c5-465b-802a-c6fd011ff40c
  spec:
    autoResync: false
    replicationState: primary
    source:
      selector:
        matchLabels:
          group: replication
    volumeGroupReplicationClassName: volumegroupreplicationclass-sample
    volumeGroupReplicationContentName: vgrcontent-c4e68900-e0c5-465b-802a-c6fd011ff40c
    volumeReplicationClassName: rbd-volumereplicationclass
    volumeReplicationName: vr-c4e68900-e0c5-465b-802a-c6fd011ff40c
  status:
    conditions:
    - lastTransitionTime: "2024-08-08T05:53:26Z"
      message: ""
      observedGeneration: 1
      reason: FailedToPromote
      status: "False"
      type: Completed
    - lastTransitionTime: "2024-08-08T05:53:26Z"
      message: ""
      observedGeneration: 1
      reason: Error
      status: "True"
      type: Degraded
    - lastTransitionTime: "2024-08-08T05:53:26Z"
      message: ""
      observedGeneration: 1
      reason: NotResyncing
      status: "False"
      type: Resyncing
    message: 'volume 0001-0009-rook-ceph-0000000000000002-104cfb18-4ea6-4916-b301-94401d466e34
      not found: Failed as image not found (internal RBD image not found)'
    observedGeneration: 1
    persistentVolumeClaimsRefList:
    - name: test-pvc
    state: Unknown

The CR is created and the dependend CRs i.e, VR and VGRContent created as well

apiVersion: v1
items:
- apiVersion: replication.storage.openshift.io/v1alpha1
  kind: VolumeGroupReplicationContent
  metadata:
    annotations:
      replication.storage.openshift.io/volumegroupref: volumegroupreplication-sample/rook-ceph
    creationTimestamp: "2024-08-08T05:53:25Z"
    finalizers:
    - replication.storage.openshift.io/vgr-protection
    generation: 2
    name: vgrcontent-c4e68900-e0c5-465b-802a-c6fd011ff40c
    resourceVersion: "193969"
    uid: 10c4a992-107b-49ad-aa77-6bc989f42184
  spec:
    provisioner: rook-ceph.rbd.csi.ceph.com
    source:
      volumeHandles:
      - 0001-0009-rook-ceph-0000000000000002-6526bb74-044a-489f-a821-7d8c042719b4
    volumeGroupReplicationClassName: volumegroupreplicationclass-sample
    volumeGroupReplicationHandle: 0001-0009-rook-ceph-0000000000000002-104cfb18-4ea6-4916-b301-94401d466e34
    volumeGroupReplicationRef:
      apiVersion: replication.storage.openshift.io/v1alpha1
      kind: VolumeGroupReplication
      name: volumegroupreplication-sample
      namespace: rook-ceph
      uid: c4e68900-e0c5-465b-802a-c6fd011ff40c
  status:
    persistentVolumeRefList:
    - name: pvc-7228cbb8-7a2f-4daf-bea4-4dabbd2ecabe
apiVersion: v1
items:
- apiVersion: replication.storage.openshift.io/v1alpha1
  kind: VolumeReplication
  metadata:
    annotations:
      replication.storage.openshift.io/volumegroupref: volumegroupreplication-sample/rook-ceph
    creationTimestamp: "2024-08-08T05:53:26Z"
    finalizers:
    - replication.storage.openshift.io
    generation: 1
    name: vr-c4e68900-e0c5-465b-802a-c6fd011ff40c
    namespace: rook-ceph
    ownerReferences:
    - apiVersion: replication.storage.openshift.io/v1alpha1
      kind: VolumeGroupReplication
      name: volumegroupreplication-sample
      uid: c4e68900-e0c5-465b-802a-c6fd011ff40c
    resourceVersion: "193974"
    uid: 2f1f45bf-417b-41ad-9973-71741dda555d
  spec:
    autoResync: false
    dataSource:
      apiGroup: replication.storage.openshift.io
      kind: VolumeGroupReplication
      name: volumegroupreplication-sample
    replicationHandle: ""
    replicationState: primary
    volumeReplicationClass: rbd-volumereplicationclass
  status:
    conditions:
    - lastTransitionTime: "2024-08-08T05:53:26Z"
      message: ""
      observedGeneration: 1
      reason: FailedToPromote
      status: "False"
      type: Completed
    - lastTransitionTime: "2024-08-08T05:53:26Z"
      message: ""
      observedGeneration: 1
      reason: Error
      status: "True"
      type: Degraded
    - lastTransitionTime: "2024-08-08T05:53:26Z"
      message: ""
      observedGeneration: 1
      reason: NotResyncing
      status: "False"
      type: Resyncing
    message: 'volume 0001-0009-rook-ceph-0000000000000002-104cfb18-4ea6-4916-b301-94401d466e34
      not found: Failed as image not found (internal RBD image not found)'
    observedGeneration: 1
    state: Unknown

@nixpanic
Copy link
Collaborator

There are a few minor conflicts that need to be resolved.

Is there a way to set the status.conditions[*].message instead, or in addition to the single (last?) status.message?

@Nikhil-Ladha
Copy link
Contributor Author

Nikhil-Ladha commented Aug 27, 2024

There are a few minor conflicts that need to be resolved.

Resolved the conflicts.

Is there a way to set the status.conditions[*].message instead, or in addition to the single (last?) status.message?

Well, we are just copying the status from VolumeReplication CR, so I am not sure if there is any way we can change that other than updating the VolumeReplication CR :/

added controller logic for volume group replication

Signed-off-by: Nikhil-Ladha <[email protected]>
add generated changes for crds, rbacs

Signed-off-by: Nikhil-Ladha <[email protected]>
add docs for volumegroupreplication and volumegroupreplicationclass
CRs.

Signed-off-by: Nikhil-Ladha <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Change to the API, requires extra care vendor Pull requests that update vendored dependencies
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants