[MIG-309] Design: Display MTC version on a per-cluster basis, and warn if not in sync on plan #1096
@eriknelson It's not clear what the requirement is for the warning. Do all of the clusters need to have the same version of MTC installed? That's what's suggested by Derek in MIG-309, but I'm not sure if that's the requirement.
Yes, all clusters for a given migration should be running the same X.Y.Z version; otherwise we should warn. It's a little nuanced, because clusters don't have any designation of "source" or "destination" until they've been added to a plan, and of course a cluster can also be BOTH a source and a destination simultaneously if you have more than one plan. So it feels like the MTC version should be a property of a registered cluster and should be displayed, possibly as another column for each of them in the cluster screen. Then as part of plan validation, we should add a warning in the event that the source, destination, and control cluster (the cluster running the controller+UI, which doesn't have to be the source or destination, but probably is one of them) are running different versions.
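A rough sketch of what that could look like on the MigCluster status, so the UI has a field to read for the version column (the field name and pared-down type here are assumptions for illustration, not the actual mig-controller API):

```go
// Illustrative only: a field the operator or controller could populate
// so the UI can render a per-cluster version column. The field name
// and pared-down type are assumptions, not the real mig-controller API.
package v1alpha1

// MigClusterStatus is a minimal stand-in for the real status type.
type MigClusterStatus struct {
	// OperatorVersion is the X.Y.Z MTC version installed on this
	// cluster, as reported by its operator or controller.
	OperatorVersion string `json:"operatorVersion,omitempty"`
}
```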
@eriknelson This is along the lines of what I was thinking. Definitely add the version number to the clusters table and display a warning on clusters that don't match the controller cluster version. I was also thinking that if a plan will need to be "thrown out" when it's created with mismatched versions among source, destination, and controller, we should just not allow clusters whose versions don't match the controller to be added to a plan. (I need to understand what happens to a plan if a cluster is updated to a newer version.) Finally, if a plan does get created with mismatched versions, or one of the clusters gets updated after plan creation, we should tag the plan with a warning (or error?) in the plans table. Who can I ask about what happens when a cluster that's part of a plan gets updated?
@eriknelson I had assumed that on the controller side we might actually want to put any remote cluster in a non-ready state if the versions don't match, rather than just add a warning on the MigCluster, but maybe that would break dev environments where we're not running on known/labeled releases?
I'm not sure, on the controller side, that we even need to involve the MigPlan in this, since it's not so much a question of comparing source and destination cluster versions to each other as comparing each remote cluster to the version of the MTC controller cluster.
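For illustration, a minimal sketch of that check -- each registered cluster compared directly against the control cluster's version, with no MigPlan involved (all names here are hypothetical):

```go
package validation

import "fmt"

// mtcVersionWarning compares a remote cluster's reported MTC version
// against the control cluster's own version and returns a warning
// message when they differ. No MigPlan is consulted; every registered
// cluster is checked against the control cluster directly.
func mtcVersionWarning(clusterName, clusterVersion, controlVersion string) (string, bool) {
	if clusterVersion == controlVersion {
		return "", false
	}
	msg := fmt.Sprintf(
		"cluster %q is running MTC %s but the control cluster is running MTC %s; "+
			"mismatched versions are untested and strongly discouraged",
		clusterName, clusterVersion, controlVersion)
	return msg, true
}
```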
+1
+1, I think this makes sense, although I would still want to see them in the plan list, but greyed out with some kind of warning + tooltip text that tells me I can't select the cluster because it's not the same version as the control cluster.
It's possible the version diff is not a problem, but it's definitely not advised. Cluster version drift can occur after a plan is created (I think that's what you're getting at), so we probably want to alert the user to this immediately prior to any kind of stage/migrate/rollback action.
We can't concretely answer why a version difference may or may not be a problem; we don't know the full scope of compatibility issues that will exist in future versions (and even if we did, it's not practical for us to define the problems that could arise with every possible permutation). All that to say, a warning like "these clusters are running different versions of MTC: X, Y. It's strongly advised you run the latest available version to prevent any compatibility issues" should be sufficient to push users to upgrade their clusters to the same version.
As I read the above, putting remote clusters that are not running the same version as the control cluster into a non-ready state would actually resolve this, but it's the more extreme choice compared to "warn, not block". It doesn't seem very wise to attempt migrations with different versions of MTC on your clusters; I don't know why someone would ever wish to do that. This is not a case of "I know better" like, say, a user who knows their special storage should support move. @vconzola I'd defer to you and Marco about whether or not we want to actually push the cluster into a non-ready state if it doesn't match the control cluster version (and therefore block migrations).
@mberube99 Question for you about clusters running different versions of MTC. TL;DR of the above is that if a source or target cluster is not running the same version as the controller cluster, a migration is likely to fail, but we don't know for sure whether it will or why. So, the question is: should clusters that are not running the same version as the controller cluster be put into a non-ready state, meaning migration is blocked? Or should we warn, but not block, in this case?
You know me. I'm a believer in the "warn, do not block" philosophy. But the warning description should be clear that something is wrong and should be fixed.
So if we warn, the validation would still be a MigCluster validation, and the warning would need to go on the MigCluster. This feels like the logical place to put it -- warn the user when adding the MigCluster, and again if the situation changes later. The MigPlan UI could pick up the warning when listing clusters. I still think this is a separate category from "custom storage looks different, so this may not work", but it's fine as long as we document what the warning means. The thing is -- this could result in larger-than-expected failures. If the MTC versions don't match, it means the remote cluster is running a version of velero and an operator-generated config that has never been tested with this version. There may be expected ConfigMaps missing (which could cause hard failures on the controller side), or the src/dest velero versions may be incompatible with each other or with the way the controller is creating CRs. This is not to say that we can't go with warn instead of fail, but we'd better document that "really bad things" could happen.
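As a sketch of how that MigCluster-level warning could look -- with pared-down stand-ins for the controller's condition types, since the real mig-controller has its own condition machinery and every name here is illustrative:

```go
package validation

import "fmt"

// Pared-down stand-ins for the controller's condition machinery; all
// names here are assumptions for illustration.
type Condition struct {
	Type, Status, Category, Message string
}

type MigClusterStatus struct {
	OperatorVersion string
	Conditions      []Condition
}

// setVersionMismatchWarning appends a warning-category condition when
// the cluster's MTC version differs from the control cluster's,
// warning rather than pushing the cluster into a non-ready state.
func setVersionMismatchWarning(status *MigClusterStatus, controlVersion string) {
	if status.OperatorVersion == controlVersion {
		return
	}
	status.Conditions = append(status.Conditions, Condition{
		Type:     "MTCVersionMismatch", // hypothetical condition type
		Status:   "True",
		Category: "Warn", // warn, do not block
		Message: fmt.Sprintf(
			"MTC version %s does not match control cluster version %s; this "+
				"combination is untested and failures may be severe",
			status.OperatorVersion, controlVersion),
	})
}
```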
@sseago By "really bad things" you don't mean data loss, do you? If there's potential for data loss I think we should block.
@vconzola While data loss seems unlikely, using an untested combination of velero versions with MTC on production data feels like a really bad idea. This is in contrast to our other "don't warn" scenarios, which were limited to actions on particular PVs where customers might be on non-standard provisioners that we haven't tested. But even there, I suppose there's potential for data loss if a customer is using a storage class we never tested.
@vconzola We're being vague because the behavior of a migration between two differently versioned clusters is fundamentally indeterminate; we don't know if it will work, or what will happen if things go wrong. It's undefined. Data loss is absolutely within the scope of possibility. Based on that, I would argue this should be blocked.
Attaching to: https://issues.redhat.com/browse/MIG-309 (Warn when MTC versions are out of sync)
Related: https://issues.redhat.com/browse/MIG-438 (Show MTC version in the UI)
We have an interest in displaying the MTC version in the UI, but really we need
to display the version that each cluster has installed. There are a few ways
we could approach this; either the operator or the controller could write the
version to the status of a resource. The MigCluster is a candidate, and that
would make it easy for the UI to read this information.
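As a rough sketch of the writing side, assuming the controller learns its own version from an environment variable set by the operator (an assumption for illustration, not the actual mechanism):

```go
package controller

import "os"

// MigCluster is a pared-down stand-in for the real CR type.
type MigCluster struct {
	Name   string
	Status struct {
		OperatorVersion string
	}
}

// recordVersion stamps the running MTC version onto the cluster's
// status during reconciliation. A real controller would follow this
// with a status update against the API server; the MTC_VERSION env
// var is a hypothetical version source.
func recordVersion(cluster *MigCluster) bool {
	version := os.Getenv("MTC_VERSION")
	if version == "" || cluster.Status.OperatorVersion == version {
		return false
	}
	cluster.Status.OperatorVersion = version
	return true // caller should persist the status change
}
```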
Separately, we need a way to warn when the versions are out of sync with the
"control cluster" (the cluster running the controller).