This repository has been archived by the owner on Jun 1, 2023. It is now read-only.

Make GF download reproducible #47

Open
aappling-usgs opened this issue Jun 16, 2020 · 0 comments

Comments

Member

aappling-usgs commented Jun 16, 2020

Right now we have an .ind file and a build/status file to represent the download of the Geospatial Fabric (GF), and the corresponding command creates the file but doesn't push it to Drive. That's sorta weird and probably creates some fragilities that I haven't quite pinned down.

We're also referring to the file in later targets with I('1_network/in/GeospatialFabric_National.gdb') rather than via the .ind file, so if the .gdb contents change, downstream targets won't get rebuilt. That's definitely fragile.
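For comparison, here is a minimal sketch of what an indicator file buys us, written in Python rather than the project's R/scipiper code. It records a content hash of the data, so anything that depends on the indicator sees a change when the data change, whereas the path string passed to I() never changes. The helper names (hash_path, write_ind, read_ind) and the one-line "hash: <md5>" format are hypothetical, and hashing a .gdb by walking every file in the directory is just one assumed way to fingerprint it.

```python
import hashlib
import os
import tempfile


def hash_path(path):
    """MD5 over a single file, or over every file under a directory.

    A .gdb is a directory, so files are visited in sorted order to keep
    the hash stable across runs.
    """
    md5 = hashlib.md5()
    if os.path.isdir(path):
        for root, dirs, files in os.walk(path):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                full = os.path.join(root, name)
                # Include the relative path so renamed files change the hash
                md5.update(os.path.relpath(full, path).encode())
                with open(full, "rb") as f:
                    md5.update(f.read())
    else:
        with open(path, "rb") as f:
            md5.update(f.read())
    return md5.hexdigest()


def write_ind(data_path, ind_path):
    """Write a tiny indicator file (hypothetical 'hash: <md5>' format)."""
    with open(ind_path, "w") as f:
        f.write(f"hash: {hash_path(data_path)}\n")


def read_ind(ind_path):
    with open(ind_path) as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        gdb = os.path.join(tmp, "GeospatialFabric_National.gdb")
        os.mkdir(gdb)
        table = os.path.join(gdb, "example.table")
        with open(table, "w") as f:
            f.write("version 1")
        write_ind(gdb, gdb + ".ind")
        before = read_ind(gdb + ".ind")
        with open(table, "w") as f:
            f.write("version 2")  # GF contents change, path does not
        write_ind(gdb, gdb + ".ind")
        # The I() path string is identical, but the indicator differs
        print(before != read_ind(gdb + ".ind"))  # prints True
```

A target keyed on the .ind would be rebuilt after the second write; a target keyed on the literal path string would not.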

This file is tricky because it's a huge download, so we'd prefer that not everyone has to download it.

I proposed a handful of solutions in a Teams thread with Hayley and Sam today, but now I think they're all wrong. At the moment I think the solution might be to:

  • convert the current target into a getter in getters.yml that produces a summary file (.yml extension)
  • create an .ind target in 1_network.yml that builds the getter target?
  • make downstream targets depend on the .ind target and call sc_retrieve.
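The three steps above can be sketched as plain Python functions to check the build order. This is an illustration, not scipiper itself: download_gf, getter_target, retrieve, ind_target, and downstream_target are hypothetical stand-ins for the GF download, the getter in getters.yml, sc_retrieve, the .ind target in 1_network.yml, and a downstream target. The download counter is there to show whether the big file gets fetched more than once.

```python
import hashlib
import os

# Hypothetical stand-in for the slow national GF download. The counter lets
# us check how many times the large file would actually be fetched.
DOWNLOAD_CALLS = {"n": 0}


def download_gf(dest):
    DOWNLOAD_CALLS["n"] += 1
    with open(dest, "wb") as f:
        f.write(b"pretend this is many gigabytes of GF")


def file_md5(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def getter_target(data_path, summary_path):
    """Getter (cf. getters.yml): download if missing, then write a .yml summary."""
    if not os.path.exists(data_path):
        download_gf(data_path)
    with open(summary_path, "w") as f:
        f.write(f"hash: {file_md5(data_path)}\n")
    return summary_path


def retrieve(data_path, summary_path):
    """Loose analogue of sc_retrieve: build the getter only if its summary is absent.

    It does not compare the summary with any committed .ind file.
    """
    if not os.path.exists(summary_path):
        getter_target(data_path, summary_path)
    return data_path


def ind_target(data_path, summary_path, ind_path):
    """.ind target (cf. 1_network.yml): build the getter, then copy its summary."""
    retrieve(data_path, summary_path)
    with open(summary_path) as src, open(ind_path, "w") as dst:
        dst.write(src.read())
    return ind_path


def downstream_target(data_path, summary_path, ind_path):
    """Downstream target: depends on the .ind, reads the data via retrieve()."""
    if not os.path.exists(ind_path):
        raise FileNotFoundError(ind_path)  # the dependency must be built first
    with open(retrieve(data_path, summary_path), "rb") as f:
        return len(f.read())
```

In this toy version the first scenario downloads once: ind_target builds the getter, and a later downstream_target call finds the summary already present. Note that retrieve only checks that the summary exists, not that it agrees with the committed .ind, which is the gap the scenarios below run into.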

Scenarios:

  • Nobody has ever downloaded the file: neither target is built; the .ind target gets requested, which builds the getter target and then creates the .ind file; the .ind file gets committed to git.
  • Somebody else has downloaded the file but you haven't: you get the .ind file, so you only rebuild downstream targets if there's a need. If you do rebuild downstream targets, those builds call sc_retrieve to build the getter, but this approach doesn't rebuild the .ind file, so there's potential for a mismatch between the .ind and an updated GF file. And isn't there potential for a double build somewhere in here, too? So I still don't have it right...
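One hedged way to surface the .ind/GF mismatch from the second scenario is to have downstream targets compare the hash recorded in the committed .ind against the local copy before using it. This Python sketch assumes a hypothetical one-line "hash: <md5>" indicator format; it is not how scipiper's sc_retrieve behaves, and file_md5, read_ind_hash, and ind_matches_data are made-up names.

```python
import hashlib


def file_md5(path):
    """MD5 of a single file's bytes."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def read_ind_hash(ind_path):
    """Parse the 'hash: <md5>' line from a (hypothetical-format) indicator file."""
    with open(ind_path) as f:
        for line in f:
            key, _, value = line.partition(":")
            if key.strip() == "hash":
                return value.strip()
    raise ValueError(f"no hash line in {ind_path}")


def ind_matches_data(data_path, ind_path):
    """True if the local data file is the version the committed .ind describes."""
    return file_md5(data_path) == read_ind_hash(ind_path)
```

Raising an error when ind_matches_data() returns False would turn a silent mismatch into a loud one, at the cost of hashing the large local file on each downstream build.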

Why is this so hard for me today? Don't we handle big input files all the time?
