content on dependency management #38

Open
nehamoopen opened this issue Jul 25, 2023 · 1 comment

Comments

@nehamoopen
Collaborator

nehamoopen commented Jul 25, 2023

I think the section on dependency management could improve with some (re)exploration of the topic and (potential) reorganization of the content.

It would be nice to present options along the reproducibility spectrum, from easy (noting dependencies in a README file) to advanced solutions (containerization). During the workshop, we dive into the easy and middle-ground solutions. We should also be better prepared to explain how these options differ, for example how renv environments differ from Docker containers.

I'm going to organize the ideas per programming language for now:

R

  • easy: Suggest the annotater package to annotate package load calls.
  • easy: Use sessionInfo() to print version information about R, the OS and attached or loaded packages + automate writing the output of sessionInfo() into the README or another file.
  • middle-ground: Look into the groundhog package. This is an interesting solution but it has some caveats, see: https://www.brodrigues.co/blog/2023-01-12-repro_r/
  • advanced: Really figure out how renv works, including common issues. Some things to consider: should we ask participants to initialize renv at the beginning of the workshop, when they reorganize their project? And how do we ensure that renv records only the project libraries in the lockfile and not the system/global libraries (this happens now and then during the workshop)?

Python

I'm not aware of easy and middle-ground solutions in Python that are comparable to the ones listed for R above. It might be nice to do some research into it but I don't think they're necessary to include in the workshop if they are not standard/best practices.
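
As a starting point for that research, here is a minimal sketch of what a sessionInfo()-like helper could look like in Python, using only the standard library. The function names (`session_info`, `format_session_info`, `write_session_info`) and the output file name are hypothetical and chosen for illustration; this is not an established best practice, and the package names have to be passed in explicitly.

```python
import platform
from importlib import metadata
from pathlib import Path


def session_info(packages):
    """Collect the Python version, OS description and versions of named packages."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the environment being inspected.
            info["packages"][name] = "not installed"
    return info


def format_session_info(info):
    """Turn the collected information into plain text for a README or log."""
    lines = [f"Python {info['python']}", f"OS: {info['os']}", "Packages:"]
    for name, version in sorted(info["packages"].items()):
        lines.append(f"  {name}: {version}")
    return "\n".join(lines)


def write_session_info(path, packages):
    """Write the formatted report to a file and return its path."""
    path = Path(path)
    path.write_text(format_session_info(session_info(packages)) + "\n", encoding="utf-8")
    return path
```

Calling `write_session_info("SESSION_INFO.txt", ["numpy", "pandas"])` at the end of an analysis script would record the versions that were actually used, similar in spirit to saving the output of sessionInfo() in R.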

  • advanced: Should we look into virtual environments for Python? venv is a module in Python's standard library; I think there are also pipenv and conda environments if you use those package managers. Similar to renv: should we ask participants to initialize these environments at the beginning of the workshop?
  • advanced: Look into the generation of requirements.txt and environment.yml again. Similar to renv: how do we ensure that only the project libraries are recorded and not the system/global libraries? We also assume that everyone uses either pip or conda. Is that correct?
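
One way to approach the "project libraries only" question is to check, before recording anything, whether the interpreter is running inside an isolated environment at all. The sketch below is a rough heuristic under stated assumptions: the function name `classify_environment` is hypothetical, the venv check relies on `sys.prefix` differing from `sys.base_prefix`, and the conda check assumes that an activated conda environment sets the `CONDA_DEFAULT_ENV` variable (with `base` meaning the global conda installation).

```python
import os
import sys


def classify_environment(prefix=None, base_prefix=None, environ=None):
    """Return "venv", "conda" or "global" for the running interpreter.

    The arguments default to the live interpreter state; they are only
    parameters so that the logic can be tried out with made-up values.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    environ = os.environ if environ is None else environ

    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the Python installation behind it.
    if prefix != base_prefix:
        return "venv"

    # Assumption: conda exports CONDA_DEFAULT_ENV on activation, and the
    # name "base" refers to the global conda installation.
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    if conda_env and conda_env != "base":
        return "conda"

    return "global"
```

A script that writes requirements could refuse to run when `classify_environment()` returns `"global"`, which would catch the case where participants forgot to activate their environment.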

Other

  • Binder could be a demo or optional?
  • curate any MATLAB resources on the topic that participants have shared
  • provide links to Docker tutorials
@StefanoRapisarda

In my opinion, conda is the most straightforward of the three for Python (though not necessarily the best). We can briefly mention all the options with their pros and cons. I don't know the differences in detail, but some may matter depending on the project and how you want to share it in the future (for example, as a package). Better check with the SE about this? Maybe they already know.

About the advanced Python options: once the virtual environment is initialised, generating the requirements and environment files should be pretty straightforward, and they should refer only to the local environment.
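
To illustrate why the output stays local, here is a minimal sketch that lists the distributions visible to the running interpreter and writes them as pinned requirements, roughly what `pip freeze` reports. The function names are hypothetical. It assumes the script is run with the environment's own interpreter and that the environment was created without access to the system site-packages (the default for venv).

```python
from importlib import metadata
from pathlib import Path


def pinned_requirements():
    """Return sorted "name==version" lines for the visible distributions."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip broken installs with incomplete metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


def write_requirements(path="requirements.txt"):
    """Write the pinned requirements to a file and return the lines written."""
    lines = pinned_requirements()
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return lines
```

Because `metadata.distributions()` searches the interpreter's own `sys.path`, running this inside an activated environment reports that environment's packages; running it with the global interpreter reports the global ones, which is the situation to avoid.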

As mentioned, we can discuss how deep we want to go in terms of reproducibility:

  • dependencies and README files;
  • virtual environments;
  • "dockerization";
  • creation of official packages (this is last in the list because it usually requires tests, which would be an additional topic for the course).

If we want to cover both R and Python in a single shot, we can stop at virtual environments (as was done in the RepCo version I followed). Otherwise, if we cover only R for example, we can say something about how to create a proper R package (or at least about the basic settings, to stay open to the possibility of creating a package in the future).

About what to have them do at the beginning: if we go for virtual environments, I would have them work on a dummy project from scratch (creation of the virtual environment -> creation of basic scripts -> generation of basic documentation and requirements -> publication on GitHub). If they try to put an existing project into a virtual environment, they could run into several problems. If it is only a README and requirements compiled "by hand" (and other best practices), then they could work with their own projects.
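
The "dummy project from scratch" route could be scripted along these lines. This is only a sketch: the layout (`scripts/analysis.py`, `README.md`, `requirements.txt`, `.venv`) and the function name `scaffold_project` are assumptions made for illustration, and the publication step on GitHub is left out.

```python
import venv
from pathlib import Path


def scaffold_project(root, create_env=True):
    """Create a small dummy project and return the files it contains."""
    root = Path(root)
    (root / "scripts").mkdir(parents=True, exist_ok=True)

    if create_env:
        # Standard-library equivalent of running `python -m venv .venv`.
        venv.create(root / ".venv", with_pip=True)

    (root / "scripts" / "analysis.py").write_text(
        'print("hello from the dummy project")\n', encoding="utf-8"
    )
    (root / "README.md").write_text(
        "# Dummy project\n\nDependencies are listed in requirements.txt.\n",
        encoding="utf-8",
    )
    # Left empty here; to be filled in once packages have been installed.
    (root / "requirements.txt").write_text("", encoding="utf-8")
    # Keep the environment itself out of version control.
    (root / ".gitignore").write_text(".venv/\n", encoding="utf-8")

    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and ".venv" not in p.parts
    )
```

Starting from an empty directory like this avoids the problems of retrofitting an environment onto an existing project, since nothing has been installed globally for the project yet.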
