Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a docker/dagger-based testing workflow, adopt PEP 517/621 #300

Open
wants to merge 39 commits into
base: master
Choose a base branch
from

Conversation

Zeitsperre
Copy link
Contributor

@Zeitsperre Zeitsperre commented Jul 13, 2023

Changes

  • Stages a docker-based testing workflow (using Dagger) for running integration testing workflows both locally and on CI systems.
    • Include a linting stage using pre-commit
  • Adds a Dockerfile to stage the testing image
  • Adds a yaml linting step and a documentation code style checking step to the pre-commit configuration.
  • Updates the name of sphinx-rtd-theme (previously sphinx_rtd_theme).
  • Updates the setup build system to use PEP 517/PEP 621 using flit.

@Zeitsperre Zeitsperre requested a review from tlvu July 17, 2023 21:09
@Zeitsperre Zeitsperre marked this pull request as ready for review July 17, 2023 21:09
@Zeitsperre Zeitsperre changed the title Add a docker/dagger-based testing workflow Add a docker/dagger-based testing workflow, adopt PEP 517/621 Jul 17, 2023
Copy link
Contributor

@tlvu tlvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff. I am not too familiar with Dagger and Github actions in general but I just don't see where the same pytest as our Jenkins is being run.

Also same questions as all other pipelines you've been adding lately: can we trigger this manually without having to push new commits?

Dockerfile Outdated
@@ -0,0 +1,33 @@
# vim:set ft=dockerfile:
ARG BASE_IMAGE_TAG
FROM pavics/workflow-tests:${BASE_IMAGE_TAG:-latest}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest is whenever someone push something to master, which is never tested ! The current version in production is 230601, this one should be the default. Hum, how to keep this one up-to-date without additional manual commit on each Jupyter env release ...

Dockerfile Outdated Show resolved Hide resolved
Dockerfile Outdated Show resolved Hide resolved
Dockerfile Outdated Show resolved Hide resolved
Dockerfile Outdated Show resolved Hide resolved
LICENSE Show resolved Hide resolved
ci/dagger-pipeline.py Outdated Show resolved Hide resolved
@Zeitsperre
Copy link
Contributor Author

@tlvu @tlogan2000

While it takes several hours to run, it does indeed run! I haven't checked why, but a handful of notebooks are currently failing (many of them are passing). I imagine that there might be some missing configurations/environment variables.

Given the egregious amount of time needed to run these, I think we might need to either simplify some tests or only run this action via workflow dispatch or on a weekly/biweekly schedule. Thoughts?

@Zeitsperre
Copy link
Contributor Author

@tlvu

Issues affecting the notebooks:

  • Docker image is missing pyogrio (eccc-geoapi-climate-stations.ipynb, eccc-geoapi-xclim)
  • Docker image is missing pyesgf (esgf-dap.ipynb)
  • Issues with ESMPy installation (ImportError: The ESMFMKFILE environment variable is not available.; regridding.ipynb)
  • Mismatch in expected return (?) (mismatch 'application/javascript'; forecasts.ipynb::Cell 2)

@huard
Copy link
Contributor

huard commented Jul 19, 2023

xESMF error may be related to pangeo-data/xESMF#246

@tlvu
Copy link
Contributor

tlvu commented Jul 21, 2023

@tlvu

Issues affecting the notebooks:

* Docker image is missing `pyogrio` (eccc-geoapi-climate-stations.ipynb, eccc-geoapi-xclim)

* Docker image is missing `pyesgf` (esgf-dap.ipynb)

* Issues with ESMPy installation (`ImportError: The ESMFMKFILE environment variable is not available.`; regridding.ipynb)

@Zeitsperre

Nothing is missing in the Jupyter docker image. As I said in comment #300 (comment) earlier, the default Jupyter image should not be latest because that one is rebuilt randomly and never tested. Please use the current tested production version 230601.

* Mismatch in expected return (?) (`mismatch 'application/javascript'`; forecasts.ipynb::Cell 2)

This one is not expected. Intermittent or 100% reproducible? Might be due to latest image again. Retry with production 230601.

@tlvu
Copy link
Contributor

tlvu commented Jul 21, 2023

@tlvu @tlogan2000

While it takes several hours to run, it does indeed run! I haven't checked why, but a handful of notebooks are currently failing (many of them are passing). I imagine that there might be some missing configurations/environment variables.

https://github.com/Ouranosinc/pavics-sdi/actions/runs/5600724938/job/15171619608

Wow =========== 53 failed, 89 passed, 15 warnings in 17512.32s (4:51:52) ===========

@Zeitsperre
All the html and embeded javascript diff error because I forgot to tell you how to ignore them in output change, sorry !: https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/blob/d2c89f6ef72e189fbba90e7370d6c1aacdd7bbdf/conftest.py#L1-L6

Once we clear those noise and use the proper Jupyter image version, we can better see what's left to fix.

Given the egregious amount of time needed to run these, I think we might need to either simplify some tests or only run this action via workflow dispatch or on a weekly/biweekly schedule. Thoughts?

Can we run daily to catch errors faster? Agreed that almost 5h runtime duration is too much to run on every PR merge. However, if not run on each PR merge, how to automatically associate any errors found to the proper PR?

@Zeitsperre
Copy link
Contributor Author

@tlvu

Wow =========== 53 failed, 89 passed, 15 warnings in 17512.32s (4:51:52) ===========

@Zeitsperre All the html and embeded javascript diff error because I forgot to tell you how to ignore them in output change, sorry !: https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/blob/d2c89f6ef72e189fbba90e7370d6c1aacdd7bbdf/conftest.py#L1-L6

Added the override in and directed pytest to use it, but it doesn't seem to do anything? Am I missing something here?

Can we run daily to catch errors faster? Agreed that almost 5h runtime duration is too much to run on every PR merge. However, if not run on each PR merge, how to automatically associate any errors found to the proper PR?

Done. I've set it to run at 4:00 AM every day (so that we see the errors bright and early a few hours later).

@tlvu
Copy link
Contributor

tlvu commented Jul 26, 2023

Wow =========== 53 failed, 89 passed, 15 warnings in 17512.32s (4:51:52) ===========
@Zeitsperre All the html and embeded javascript diff error because I forgot to tell you how to ignore them in output change, sorry !: https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/blob/d2c89f6ef72e189fbba90e7370d6c1aacdd7bbdf/conftest.py#L1-L6

Added the override in and directed pytest to use it, but it doesn't seem to do anything? Am I missing something here?

@Zeitsperre
Nothing special to add.

However I noticed your conftest.py file is located at /code/tests while all your notebooks are under /code/docs/. Not sure, but maybe they should be under the same tree? Or maybe your starting working dir should be were the conftest.py file is located?

@tlvu
Copy link
Contributor

tlvu commented Aug 15, 2023

@Zeitsperre See comment #300 (comment)

Nothing is missing in the Jupyter docker image. As I said in comment #300 (comment) earlier, the default Jupyter image should not be latest because that one is rebuilt randomly and never tested. Please use the current tested production version 230601.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants