This repository has been archived by the owner on Jun 1, 2023. It is now read-only.

rebuild NWIS, WQP, Ecosheds data #52

Merged
merged 17 commits into from
Apr 25, 2022


Member

@limnoliver limnoliver commented Mar 29, 2022

This PR re-pulls NWIS and WQP temperature data for the nation.

For data updates:

38094d6 - For WQP data, there was a gain in sites (17k) but a loss in the number of records (792k). I compared the old data file to the new one to try to explain some of this. Note that the pull includes sub-daily data, and there was a gain in the number of unique site-days (311k). On the surface, the loss of records is worrisome, but after parsing the differences, I'm not worried:

  • There were 1678 sites in the old pull that are not in the new pull. This amounts to a loss of 2.88 million records (but only 77k unique site-days).
  • There were 2451 sites that had fewer records in the new pull compared to the old pull. This amounts to a loss of 321k records (but only 8k unique site-days).
  • There were 15552 sites in the new pull that were not in the old pull. This amounts to a gain of 565k new records and 131k new site-days.
  • There were 24005 sites that had a gain in the number of records from the old pull to the new pull (a gain of 1.78 million records and 265k site-days).
  • The majority of sites (341238) had no change in the number of records (but +46 site-days).
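The breakdown above can be reproduced with a per-site comparison of the two pulls. The following is a minimal sketch in Python rather than the pipeline's own R code, and it is not the code used for this PR. It assumes each pull can be reduced to (site ID, date) pairs, one per record; the function names and the toy site IDs are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_pull(records):
    """Return (records per site, unique dates per site) for one pull.

    `records` is an iterable of (site_id, date) pairs, one per record,
    so sub-daily data contributes several records to the same site-day.
    """
    n_records = Counter()
    dates = defaultdict(set)
    for site_id, date in records:
        n_records[site_id] += 1
        dates[site_id].add(date)
    n_days = {site: len(site_dates) for site, site_dates in dates.items()}
    return n_records, n_days


def compare_pulls(old_records, new_records):
    """Group sites by how their record count changed between pulls."""
    old_n, old_days = summarize_pull(old_records)
    new_n, new_days = summarize_pull(new_records)
    summary = defaultdict(lambda: {"sites": 0, "records": 0, "site_days": 0})
    for site in set(old_n) | set(new_n):
        before, after = old_n.get(site, 0), new_n.get(site, 0)
        if after == 0:
            group = "dropped"      # in the old pull only
        elif before == 0:
            group = "added"        # in the new pull only
        elif after < before:
            group = "fewer"
        elif after > before:
            group = "more"
        else:
            group = "unchanged"
        row = summary[group]
        row["sites"] += 1
        row["records"] += after - before
        row["site_days"] += new_days.get(site, 0) - old_days.get(site, 0)
    return dict(summary)


# Toy data with hypothetical site IDs.
old = [("A", "2020-01-01"), ("A", "2020-01-01"), ("B", "2020-01-01"),
       ("C", "2020-01-01"), ("C", "2020-01-02")]
new = [("A", "2020-01-01"), ("B", "2020-01-01"), ("B", "2020-01-02"),
       ("D", "2020-01-01")]
result = compare_pulls(old, new)
```

Tracking records and site-days separately matters here because sub-daily data can make a large change in records correspond to a small change in site-days, as in the first two bullets.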

Of the missing sites, only one was in the inventory/partition files (i.e., was returned from whatWQPsites). I spot-checked a few of the missing sites with the most data by looking them up on the WQP homepage. Some appear to have had MonitoringLocationIdentifier changes (e.g., BTMUA-INTAKE became BTMUA-Intake), while others seem to no longer exist (e.g., KENAI_WQX-10000117). It doesn't seem like we systematically missed them in our pull (e.g., through a bad partition pull or whatWQPsites changes).
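A case-only identifier change like BTMUA-INTAKE to BTMUA-Intake can be checked for across all missing sites instead of by spot check. The following is a minimal sketch in Python, not code from this repository. It assumes the missing IDs and the new pull's IDs are available as plain lists of strings; the function name and the ID `OTHER-1` are hypothetical.

```python
def case_only_renames(missing_ids, new_ids):
    """Map each missing ID to new IDs that differ from it only by letter case."""
    by_folded = {}
    for new_id in new_ids:
        by_folded.setdefault(new_id.casefold(), []).append(new_id)

    renames = {}
    for old_id in missing_ids:
        candidates = [c for c in by_folded.get(old_id.casefold(), [])
                      if c != old_id]
        if candidates:
            renames[old_id] = candidates
    return renames


missing = ["BTMUA-INTAKE", "KENAI_WQX-10000117"]
new = ["BTMUA-Intake", "OTHER-1"]  # OTHER-1 is a made-up ID
renames = case_only_renames(missing, new)
```

Sites left out of the result, like KENAI_WQX-10000117 in this toy input, would still need a manual lookup, since this only catches changes in capitalization.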

b0017c9 - expected increases in sites and data from NWIS.

For moderate code changes:

42086ab - retaining all temperature columns that are returned from NWIS instead of a priori selecting one.

909988d - pulls HUC22, which includes US territories; ef1784d modifies some spatial filters to ensure those sites are retained.

682466f - use .qs files instead of .rds files for temporary files. Forgot to do this with the WQP pull -- will do this next time!

9bd0aa8 - modified the task tables for WQP pulls to include the pull date so new pulls would be triggered.
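The idea behind the task-table change is that a pipeline which skips up-to-date tasks compares each task's inputs with what it saw last time, so adding the pull date to those inputs makes every task look new. The following is a language-neutral sketch in Python and not the repository's R task-table code. The hashing scheme, the function names, and the partition names are assumptions made for illustration.

```python
import hashlib
import json


def task_hash(task):
    """Hash a task's inputs; a changed hash means the task must be rerun."""
    payload = json.dumps(task, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_tasks(partitions, pull_date=None):
    """Build one task per partition, optionally stamped with the pull date."""
    tasks = {}
    for name in partitions:
        task = {"partition": name}
        if pull_date is not None:
            task["pull_date"] = pull_date
        tasks[name] = task_hash(task)
    return tasks


partitions = ["wqp_001", "wqp_002"]  # hypothetical partition names

# Without a pull date, a later build produces identical hashes: nothing reruns.
no_date_then = build_tasks(partitions)
no_date_now = build_tasks(partitions)

# With a pull date, a new date changes every hash: every partition reruns.
dated_then = build_tasks(partitions, pull_date="2021-06-01")
dated_now = build_tasks(partitions, pull_date="2022-03-29")
stale = [name for name in partitions if dated_then[name] != dated_now[name]]
```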

…d task table to include pull date so pull date retriggers pull of each year chunk. Closes USGS-R#50
…h fewer sites and obs than the inventory suggested there were (674k sites and 6.2 mil records)
…etain all temperature data and report any location info we can from the column name.
@limnoliver limnoliver changed the title rebuild NWIS and WQP data rebuild NWIS, WQP, Ecosheds data Apr 20, 2022
Member Author

limnoliver commented Apr 20, 2022

dcd9a64 - This now includes an Ecosheds rebuild after receiving a new snapshot of Ecosheds from Jeff Walker. This produced increases in both records and sites as expected.

Also note, NorWest is a static database, so no updates to those data.

@padilla410 padilla410 self-requested a review April 20, 2022 20:33
@padilla410 padilla410 marked this pull request as ready for review April 20, 2022 21:06
Collaborator

padilla410 commented Apr 20, 2022

Looks good to me, Sam. Only one question on retaining some commented-out code.

The use of qs and the updates required to get the territories are similar to the work on the national-flow-repo (minus the spatial work).

Side note, I converted this PR from draft to review.

Feel free to merge.

select(-agency_cd, -count_nu)

if (!all(names(fixed_dups) %in%
# fixed_dups <- dat_long %>%
Collaborator


Is there a need to hold on to this commented-out code? Do you foresee going back on this change in the future?

Member Author


Thanks for spotting this. I think I just did this during development, but will delete now.

Member Author


Actually, I'm keeping the commented-out code in so targets stay "clean".

@padilla410
Collaborator

Looks good. Without running it, it looks like it does what you're saying.

@limnoliver limnoliver merged commit d1d11f9 into USGS-R:main Apr 25, 2022