Addressing comments to the pull request "Replace test_data_quality_at_scale.ipynb #208" #230

Closed: komashk wants to merge 4 commits

komashk (Contributor) commented Aug 16, 2024

Updated the dataset (Amazon product reviews replaced with synthetic data) and added a couple of new examples.

Issues: #207, #209

Description of changes:

Two updates:

* For test_data_quality_at_scale.ipynb: updated the tutorial accompanying the blog post "Testing data quality at scale with PyDeequ". The blog post was recently updated and republished.
* For the other ipynb tutorials (analyzers, profiles, repository, suggestions, verifications): updated the S3 links and added a declaration of the Spark version before loading the library.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

The comments on pull request #208 have been addressed.

Updated the dataset (amazon products reviews replaced with a synthetic data), added a couple of new examples
Replaced Amazon Reviews with a synthetically generated reviews dataset; added declaration of the SPARK version
Commented that Spark version needs to be indicated prior to importing pydeequ; specified that example is being shown with Spark 3.5/pydeequ 1.4.0
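
The ordering requirement in the last commit note can be sketched as below. This is a minimal illustration, assuming PyDeequ reads the `SPARK_VERSION` environment variable when it is imported; the value `"3.5"` is taken from the commit note, and the `pydeequ_available` flag is a hypothetical name used only so the snippet also runs where PyDeequ is not installed.

```python
import os

# Assumption: PyDeequ consults SPARK_VERSION at import time, so the variable
# has to be set before the import statement below is executed.
os.environ["SPARK_VERSION"] = "3.5"  # Spark version named in the commit note

try:
    import pydeequ  # deliberately imported after the environment setup
    pydeequ_available = True  # hypothetical flag, for illustration only
except ImportError:
    # PyDeequ (or its PySpark dependency) is not installed here.
    pydeequ_available = False

print("SPARK_VERSION =", os.environ["SPARK_VERSION"])
print("pydeequ importable:", pydeequ_available)
```

In a notebook, this would sit in the first cell, ahead of any cell that imports `pydeequ`.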
chenliu0831 (Contributor) commented

Why are there duplicated notebooks both outside and inside tutorials/? Let's move everything into tutorials/.

komashk added a commit to komashk/python-deequ that referenced this pull request Aug 21, 2024
komashk (Contributor, Author) commented Aug 21, 2024

@chenliu0831 Acknowledged. In a rush, I added the updated files without moving them to the tutorials folder. Fixing this and opening a new PR.

komashk closed this Aug 21, 2024
chenliu0831 pushed a commit that referenced this pull request Aug 21, 2024
chenliu0831 pushed a commit that referenced this pull request Sep 6, 2024
…rials new data (#233)

* updated the notebooks to use a new synthetic data for demonstration; addressed PR comments #208 and #230

* added notebooks and python module that outline generation of the synthetic data used for 2 AWS blogs on PyDeequ