Help needed in Arctos: filtered flat definitions #8341

Open
happiah-madson opened this issue Nov 26, 2024 · 9 comments
Labels
Aggregator issues: e.g., GBIF, iDigBio, etc. · Data Quality · Help wanted: I have a question on how to use Arctos · Priority-Normal (Not urgent): Normal because this needs to get done but not immediately.

Comments

@happiah-madson

Tell us what you are trying to do

I'm looking at the DWCMapping (https://docs.google.com/spreadsheets/d/1aCBYX9ErjicL8VdNdHbJUI0JTwWu6L4D_37gJ7IneRY/edit?pli=1&gid=0#gid=0) and realizing a lot of the Arctos values are from filtered.flat but I don't know where to look to find the definitions of what is exported in filtered.flat.

for example:
filtered_flat.coordinateuncertaintyinmeters

as far as I know, we don't enter data into a field called coordinateuncertaintyinmeters...does Arctos convert all coordinate_max_error_distance into meters and export a standard field in filtered_flat? @dustymc

What are relevant pages in Arctos

Provide a link to or a description of the page where you need help.

@happiah-madson happiah-madson added Priority-Normal (Not urgent) Normal because this needs to get done but not immediately. Aggregator issues e.g., GBIF, iDigBio, etc Data Quality labels Nov 26, 2024
@dustymc dustymc added this to the Community Forum milestone Nov 26, 2024
@dustymc
Contributor

dustymc commented Nov 26, 2024

There's not much of a clean answer to this. Nothing is simple, and that's the nature of cramming a structure that is complex (because it needs to be) into a spreadsheet. https://github.com/ArctosDB/PG_DDL/blob/master/function/flat_components/update_flat_row.sql makes flat and https://github.com/ArctosDB/PG_DDL/blob/master/function/flat_components/update_filtered_flat.sql makes filtered_flat, and of course those both rely on ~hundreds of little pieces of complexity.

coordinate_max_error_distance

... which comes from bulkloader, a pre-complex spreadsheet-ish thing and is not what's stored.

@happiah-madson
Author

I'm not fancy enough to see those github links. What I am hearing is that I shouldn't dig too much 😬

@mkoo
Member

mkoo commented Nov 26, 2024

Let me try a nonfancy response with no github blob links: The definitions are basically DarwinCore terminology, so I see a column for the dwc link (DwC = dwc = DarwinCore), but they are not all filled in (that can be fixed). But you can look them up right here: https://dwc.tdwg.org/terms/

So in your example-- Uncertainty in Meters = https://dwc.tdwg.org/terms/#dwc:coordinateUncertaintyInMeters
(sigh, we used to teach weeklong workshops with this!)

If we're all going to rely on that spreadsheet more and more, I'm going to make it more useful then! Let me know if this helps/ resolves your original question @happiah-madson

@mkoo
Member

mkoo commented Nov 26, 2024

(ps. we should also talk about calculating and storing Uncertainty in Meters for all coordinate estimates... another topic though!)

@mkoo mkoo added the Help wanted I have a question on how to use Arctos label Nov 26, 2024
@happiah-madson
Author

Yep, I totally get the DwC fields and know how to navigate the dwc/tdwg pages. My question was more about which Arctos field is being mapped, as I don't ever enter data into a field called "uncertaintyInMeters" in Arctos. Instead we enter coordinate_max_error_distance and specify units, and I didn't realize that there was maybe some under-the-covers magic that converts everyone's inputs into meters for the export.

@happiah-madson
Author

In sum, I think there is more fancy behind-the-scenes Arctos stuff happening than I realize and I was looking for a way to learn what is happening! I'm used to expecting only the data I put into a database to come out of it, not new (or interpreted or something) data to come out, too.

@dustymc
Contributor

dustymc commented Nov 26, 2024

more fancy behind-the-scenes Arctos stuff happening than I realize

Probably!

calculating and storing Uncertainty in Meters for all coordinate estimates

We could, but my conversion function mostly keeps up with the demand so I haven't bothered other than with the 'winning' locality in flat. (It's the constant battle: is the cost of async processing and storage or the cost of real-time calculation easier to pay? The answer here - and often - is "yes....")

expecting only the data I put into a database to come out of it,

Nope, "in" "stored" and "out" are three VERY different states! (And flat is sorta floating between 'stored' and 'out' so even that's a bit of a simplification.)

To actually attempt to answer a piece of the question: If you have simple 1:1-ish unencumbered point-radius-based localities, then flat is mostly what you put in converted to meters. If your data are more complicated then the answer is too.
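For the simple point-radius case, the "converted to meters" step can be pictured with a rough sketch like the one below. This is not the actual Arctos conversion function (that lives in the PG_DDL SQL linked above); the unit codes, the `METERS_PER_UNIT` table, the `to_meters` helper, and the `units` key are all hypothetical names made up for illustration.

```python
# Hypothetical unit table: these codes are illustrative only and are
# NOT taken from the Arctos code tables.
METERS_PER_UNIT = {
    "m": 1.0,
    "km": 1000.0,
    "ft": 0.3048,     # international foot
    "yd": 0.9144,     # international yard
    "mi": 1609.344,   # international mile
}


def to_meters(distance, unit):
    """Return the entered error distance in meters, or None if it cannot be converted."""
    if distance is None or unit is None:
        return None
    factor = METERS_PER_UNIT.get(unit.strip().lower())
    if factor is None:
        # Unknown unit: return nothing rather than guess.
        return None
    return distance * factor


# What goes "in": a distance plus a unit (the "units" key is an assumed name).
entered = {"coordinate_max_error_distance": 0.5, "units": "mi"}

# What comes "out": a single value in meters, as in coordinateuncertaintyinmeters.
exported = to_meters(entered["coordinate_max_error_distance"], entered["units"])
print(exported)
```

The point of the sketch is only that the exported number is derived from two entered values, so it will not match anything typed in unless the entry was already in meters.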

@mkoo
Member

mkoo commented Nov 26, 2024

If your data are more complicated then the answer is too.

That's basically the answer to your question (sorry I oversimplified it!). Only a handful are "straight" value-to-field mappings. There is almost always some munging/processing/filtering before output. It would probably be worthwhile to have a better data dictionary for filtered_flat.

@Mesibov

Mesibov commented Nov 27, 2024

Let me try a nonfancy response with no github blob links: The definitions are basically DarwinCore terminology, so I see a column for the dwc link (DwC = dwc = DarwinCore), but they are not all filled in (that can be fixed). But you can look them up right here: https://dwc.tdwg.org/terms/

So in your example-- Uncertainty in Meters = https://dwc.tdwg.org/terms/#dwc:coordinateUncertaintyInMeters (sigh, we used to teach weeklong workshops with this!)

If we're all going to rely on that spreadsheet more and more, I'm going to make it more useful then! Let me know if this helps/ resolves your original question @happiah-madson

@mkoo, those would have been the VertNet workshops from years ago? Interested to know what your current thinking is on cUIM - please email directly, [email protected]

4 participants