-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Share docker volume name in server state #384
Comments
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗 |
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/a-cull-idle-user-service-that-deletes-pvs/4742/14 |
From my first inspection it seems rather straight-forward: The volume name is stored in its dict - either as a string or as a dictionairy. It only needs to be stored into the state dictionary in the get_state method. Well, that I believed until I gave it a second glance. Now the details start - first the intention of sharing this is to identify the docker volumes that only contain the user's data and therefore can be deleted after a certain amount of time (see top). Hence, I am only interested in the docker volumes that contain the username to avoid deleting docker volumes shared between several users. Furthermore, to avoid malicious attacks, the check should be quite strict to avoid loopholes, (e.g. a user is called In case things are implemented as described above, are there any naming conventions to consider regarding the state dictionairy / JSON object key? What should it be? Because the previous description of reverse-engineering sounds slightly dangerous to me, is there a chance of keeping track of docker volumes in another way? Maybe even just some kind of flat file representation such as JSON? |
One suggestion I have is that when the docker volume template is filled, the answer could be more than just the prepared string. Instead, the answer could be a dictionairy that includes a flag for each template variable that has been used. If user-specific template variables have been used, that means the docker volume is user-specific as well. That would require to make adjustments to the volumenamingstrategy. Would that be acceptable for everybody? |
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/a-cull-idle-user-service-that-deletes-pvs/4742/16 |
I wonder if docker volume labels would be a better fit for this information? Rather than passing docker-specific information (e.g. volume names) through jupyterhub to a culler that also needs custom docker logic to understand the information, maybe it makes sense to query docker volumes by label directly and not rely on jupyterhub for passing the information along? To implement the deletion, you need an understanding of finding and deleting docker volumes anyway and how to identify which ones should be deleted. It seems like it would be simpler to do that directly via docker rather than pass that info all the way through jupyterhub. For example (cli, not Python, but should be doable either way):
What do you think? |
Well, for the general approach the only issue here is that I need a way how to get the information whether the user was inactive. The logic which is currently kept in the idle-culler service is required here. I have no beautiful solution how that and your proposed commands can go hand in hand. For my specific usecase, however, your approach is very useful. But instead of CLI, I would need the label as a custom setting in https://github.com/jupyterhub/dockerspawner/blob/master/dockerspawner/dockerspawner.py . But I guess that docker volume labels should be a suitable extension of the dockerspawner functionality? |
Another lead to follow might be not to use the idle-culler service but instead the culling logic that is incorporated into the JupyterHub itself, which you also highlighted in this forum post. However, neither |
The volume template changes further suggests we should have perhaps been storing the volume(s) somehow in state, so that loading jupyterhub with an updated configuration could use the new scheme only for servers that didn't already have a volume mounted. It seems pretty hard, though, to reconcile:
It's an unrelated use case, but another indicator that we should have been persisting it in state. |
Proposed change
Add the user's docker volume name to the server state. This would allow removing them in an automated fashion through the API by the
jupyterhub-idle-culler
service as described in this GitHub issue.Alternative options
I fork this repository and maintain my personal copy that does exactly that. Or I do some monkey-patching similar to this JupyterHub config.
Who would use this feature?
Everyone who wants to automatically tidy up docker volumes of inactive users measured by the user's activity at the JupyterHub.
The text was updated successfully, but these errors were encountered: