-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new Healer operation #40
base: main
Are you sure you want to change the base?
Conversation
39765f5
to
e70074f
Compare
Yes. The |
Thanks @nixpanic, a few minutes later I had realized that I need to add the proto file too, so pushed the proto file too, PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need to add a new capability in https://github.com/csi-addons/spec/tree/main/identity#getcapabilities too.
// active (staged/published) volume. | ||
service HealerNode { | ||
// NodeHealer is a procedure that gets called on the CSI NodePlugin. | ||
rpc NodeHealer (NodeHealerRequest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call this HealVolume
?
I would expect a way to check if a volume needs healing. The current NodeHealer
call seems to do that, but also is expected to do the healing?
It would probably be useful to have a check, and only perform healing if the check returns that it is needed. This then can provide better feedback and tracking of actions that were performed. It might even be useful to fallback to some other recovery/healing mechanism in case volume-healing failed (extension for HealNode
which might do a reboot or inform medik8s or similar?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nixpanic
NodeHealer() is called on CSI NodePlugin, so it's up to SP to implement/perform healing or not, right? if so why do we need a flag/check?
I'm probably missing something? Is this flag set as part of the request or response?
- If it is part of request then who will be setting it because we don't have any CRDs here.
- If it is part of the response, may be based on the response CSI addon might want to do a reboot or similar?
Could you please make it more clear?
Also cannot we detect healing failed from the message string which is part of the response?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider two procedures:
GetHealth()
returning the state of the volume, possibly with a selection of recovery options (VolumeHealing
orNodeHealing
required)HealVolume()
doing the actual healing, if it can heal the volume at all- the CSI-Addons implementation for the CO can then implement a
HealNode
function (with Kubernetes medik8s might be an option)
This makes it easier to follow the actions, and improve the reporting to users (Kubernetes events maybe?).
CRDs are Kubernetes specific, so such definitions should go in the kubernetes-csi-addons repository instead. There probably should be a way for the CO to configure the checking interval, maybe even per volume. When using Kubernetes, an annotation on the PVC might be suitable for that (depending on the number of options).
Not only the |
Healer Operation is used both for healing the volume and checking for health of the volume, it is upto the CSI driver whether to implement the healer mechanism or just live with health checks. Signed-off-by: Prasanna Kumar Kalever <[email protected]>
Change to the build scripts Signed-off-by: Prasanna Kumar Kalever <[email protected]>
Fixes: csi-addons#20 Signed-off-by: Prasanna Kumar Kalever <[email protected]>
Healer Operation is used both for healing the volume and checking for
health of the volume, it is up to the CSI driver whether to implement the
healer mechanism or just live with health checks.
Fixes: #20
Signed-off-by: Prasanna Kumar Kalever <[email protected]>
NOTE: I will add the proto auto gen files once the review is finalized