Replace deprecated pykube(-ng) library with official Kubernetes python library #3093
base: master
Conversation
```python
auth_method = luigi.Parameter(
    default="kubeconfig",
    description="Authorization method to access the cluster")
kubeconfig_path = luigi.Parameter(
    default="~/.kube/config",
    description="Path to kubeconfig file for cluster authentication")
max_retrials = luigi.IntParameter(
    default=0,
    description="Max retrials in event of job failure")
```
A note for reviewers: these settings are safe to remove and no longer need to live in code, as the upstream Kubernetes library handles them automatically via environment variables. The max_retrials parameter also (unless I misunderstand) is completely unnecessary, as it duplicates the functionality of backoff_limit. Arguably, the one config option here I could keep is auth_method, so an end user can "force" which auth method to use. But the Kubernetes library should be able to automatically detect and cascade into the "mode" it needs to function.
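To illustrate the backoff_limit duplication mentioned above, here is a minimal Job manifest sketch (a plain dict, not the PR's actual code; the job name and container are hypothetical). Kubernetes itself retries failed pods up to `spec.backoffLimit` times, which is why a Luigi-side max_retrials is redundant:

```python
# Sketch: Kubernetes-native retry handling via spec.backoffLimit.
# All names/values here are illustrative, not from the PR.
job_manifest = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "example-task"},  # hypothetical job name
    "spec": {
        # Kubernetes retries the pod up to this many times on failure,
        # making a separate Luigi max_retrials parameter unnecessary.
        "backoffLimit": 2,
        "template": {
            "spec": {
                "containers": [
                    {"name": "main", "image": "busybox", "command": ["true"]}
                ],
                "restartPolicy": "Never",
            }
        },
    },
}
```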
luigi/contrib/kubernetes.py
Outdated
```python
try:
    kubernetes_api.config.load_incluster_config()
except Exception:
    try:
        kubernetes_api.config.load_kube_config()
    except Exception:
        raise
```
Note for reviewers: this is the "new" cascading automated configuration handling. If we're running in Kubernetes, it will automatically detect this and use the in-cluster configuration and API endpoints; if that fails, it will attempt to use the kubeconfig file and the env vars that support it. This logic could (arguably) be forced one way or the other via a config variable.
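The cascade described above, plus the optional "force" override, could look roughly like this. This is a sketch, not the PR's code: the loader functions are injected as parameters so it runs without a cluster (in real use you would pass `kubernetes.config.load_incluster_config` and `kubernetes.config.load_kube_config`), and `auth_method` is a hypothetical override, not an existing Luigi parameter.

```python
# Sketch of cascading Kubernetes config loading with an optional override.
def load_cluster_config(incluster_loader, kubeconfig_loader, auth_method=None):
    """Return which config mode was used: 'in-cluster' or 'kubeconfig'."""
    if auth_method == "in-cluster":
        incluster_loader()          # caller explicitly forced in-cluster mode
        return "in-cluster"
    if auth_method == "kubeconfig":
        kubeconfig_loader()         # caller explicitly forced kubeconfig mode
        return "kubeconfig"
    # Default cascade: try in-cluster first, then fall back to kubeconfig.
    try:
        incluster_loader()
        return "in-cluster"
    except Exception:
        kubeconfig_loader()         # let this raise if it also fails
        return "kubeconfig"
```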
left a few comments; once they're addressed, i'll approve
Note: I am merging and fixing up MR #3089 in this MR. Picture of proof of work attached; I added tons of debug logging as well. The MR will be updated shortly, I believe addressing all issues...
@dav009 I merged(ish) your branch into mine logically, but added code to detect whether a scale-up failed (i.e., the job won't fit even after scaling up), which the previous code would loop on infinitely because it wasn't looking at the Kubernetes event stream. Pic of proof...
@dlstadther Please have another look and review the code, but please don't merge yet. I've decided I need to add some further tests, because it's really abysmal not to have any decent ones here to ensure things work as they should; there are so many different kinds of failure conditions being detected here, and I think we should test them. I'm not fantastic with pytest so it might be a bit slow, but I'm working on adding tests that I think are critical before merging this. I'm only able to work on this in free time, so if anyone else can or wants to help me with tests, please do. I've left some of my hacking at tests commented out.
And @dav009, please review the code here and make sure I encapsulated what you were trying to do with your MR. Overall, this MR will bring much better QoL support for Kubernetes. There are still tons more things I think could or should be done in this library, but they're mostly pedantic improvements and could land in future merges instead. I'm still working on redoing some of the tests and will sort of include your tests once I fix them. Things I'd love to see in the future are, for example, an automated retry and/or exponential backoff mechanism for API failures, timeouts, or general internet jitter and brief unavailability.
Force-pushed from e991674 to d340099
@AndrewFarley Amazing job, thank you. I was also quite frustrated with how outdated this is; I will try to review this soonish. 😅 I made an internal library for my organisation that addresses these issues (old pykube) and also takes other issues (logging) into account. I am testing it at the moment; we were planning on releasing it at some point.
@AndrewFarley In my team we ended up implementing this kubernetes contrib update as an independent package:
Ping. Folks, I'm not using Luigi, but this MR still really needs to happen. If anyone has time to rebase and take over this PR, please let me know; I'm happy to transfer my fork to you or add you permissions. I know a few folks are using my fork already, because it's necessary if you want Kubernetes support, so perhaps one of you can take the reins?
Hi, why has this not been reviewed yet? I tested it locally and it seems to work great.
Hah, I wrote this two years ago and still no one has merged it. Is this project dead? I'm really glad I recommended that a client of mine not choose Luigi. Eesh.
Force-pushed from d340099 to a1ef345
I just rebased, re-pushed, and validated that this still works on Kubernetes. Hopefully, in the hopes and dreams of fellow engineers, someone can see it in their heart to merge this. Or just don't use Kubernetes. :P Up to you
Apologies for this not being reviewed. As far as "is this project dead?" goes: there is no consistent maintainer involvement (myself included). The project is still owned by Spotify but not actively developed (to my knowledge, mostly due to their articles about their intent to replace Luigi with Flyte; note, this is just my personal assumption). While I have permissions to merge PRs and push to PyPI, I have not used Luigi for development since changing employers a few years ago. I wish to see this project continue, but without an active community and additional maintainer participation, there's only so much I can do. I sympathize with your disappointment, and I'm here to help as I reasonably can in my spare time.
In terms of review of this PR, I'm open to merging this code. However, one item prevents me and one item deters me:
- flake8 tests are failing
- a bunch of the unit tests are commented out with TODO notes

I am willing to let point 2 slide so that master has a functional k8s module. But point 1 will need to be fixed before I can click the merge button.
@dlstadther I've just fixed the code styling to satisfy that need. I also don't use Luigi anymore, and I don't really have tons of time to write and update the tests to validate that this works. I can confirm it did work for me once upon a time, and others above have recently checked its (basic) functionality. I can/should leave the TODOs in the test code commented out, in hopes that someone else can take up the reins and add the tests. At least the flake8 tests pass now: https://github.com/spotify/luigi/actions/runs/4034116089/jobs/6935056885 But it appears there are some other errors now? I don't have much time this weekend, but I will try to review this further sometime soon.
@AndrewFarley, thank you for the quick turnaround addressing the failing flake8 tests. I'm good with getting this included once test results allow me to merge. I hope whoever comes to use this updated k8s support within luigi will contribute back any improvements and test coverage items. I just checked the current failures, and they're all one failure:
So it looks to me like once a kube config is mocked (or however it is expected) for tests, we'd be good to go for now.
Well, in order for the Kubernetes stuff to work, it needs to initialize the library and config somehow. I will need to have a think and a Google around for some creative way to work around this for testing on machines without K8s support.
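One possible workaround for the test-only problem above (a sketch under the assumption that tests run with no cluster or kubeconfig at all, not the final test code): inject a fake `kubernetes` module into `sys.modules` before the module under test imports it, so the config-loading calls become harmless mocks.

```python
# Sketch: stub out the kubernetes library for test environments.
import sys
import types
from unittest import mock

fake_config = mock.MagicMock()          # stands in for kubernetes.config
fake_k8s = types.ModuleType("kubernetes")
fake_k8s.config = fake_config

# Register the fakes so any later "import kubernetes" resolves to them.
sys.modules["kubernetes"] = fake_k8s
sys.modules["kubernetes.config"] = fake_config

import kubernetes  # resolves to the fake module injected above

# In the real tests, this call would happen inside luigi/contrib/kubernetes.py;
# here it is a no-op MagicMock call instead of reading a kubeconfig.
kubernetes.config.load_kube_config()
```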
Description
The pykube and pykube-ng libraries that Luigi depends on are flawed and don't seem to support more recent versions of Kubernetes. Both libraries appear to be abandonware, so we should pivot to the official Python library for Kubernetes. This pivot simplifies some of the handling in this module, as certain configuration values are already supported and configurable via environment variables, as is the widely accepted standard for kubectl and its supported tools and toolchain. This also adds the following...
Motivation and Context
Luigi wasn't working for me on a number of my clusters, and after further investigation this was largely due to the outdated and deprecated libraries in use. I've found numerous other projects that have pivoted over to the official library with a lot of success, and used those largely as examples for how to do this.
I've added extra comments and documentation in all places where it makes sense, so others can more easily understand how to use and adopt Kubernetes support in Luigi.
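The environment-variable convention mentioned in the description follows kubectl's behavior: the kubeconfig path comes from the standard `KUBECONFIG` environment variable, falling back to `~/.kube/config`. A minimal sketch (the helper name is illustrative, not from the PR; note `KUBECONFIG` may also hold a colon-separated list, which this sketch takes verbatim):

```python
import os

# Sketch of kubectl-style kubeconfig resolution via the KUBECONFIG env var.
def default_kubeconfig_path():
    # Prefer the standard env var; otherwise fall back to the usual location.
    return os.environ.get("KUBECONFIG", os.path.expanduser("~/.kube/config"))
```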
Have you tested this? If so, how?
I've updated the unit tests and run a few test pipelines successfully with all features.
Items to complete