RAATD dataset for use case #19
The RAATD data is semi-famous in my circles now. I know how much work has gone into straightening that data out, and I would love to implement that example in this format!
It's possible that there's some overlap with this data example I've solicited from @ianjonsen as well: https://github.com/ianjonsen/tdwg_imos/wiki/Argos-satellite-tracking-of-southern-elephant-seals but I know the full RAATD dataset will have more coincident deployments in it, and I think that's a very useful example.
It might be best to go with the RAATD data, depending on how much of it @Antonarctica has (probably all of it, Anton?). It would be a more comprehensive test case, with multiple species, deployment locations, tag types, etc. The data aren't in the public domain yet, as the Scientific Data paper hasn't come out, but I expect this will happen quite soon (possibly in the next 1-2 months).
Hi
What's the best way to go about working on that? Should we do it the way Movebank has, with a few examples in a spreadsheet on GitHub and the larger dataset sitting elsewhere?
The original compiled datasets will reside at the Australian Antarctic Data Centre (this was decided a while ago; if we were making the decision now, we might have gone with Zenodo). The plan is to have a full copy published through the biodiversity.aq IPT in Darwin Core Event core format (feeding into OBIS/GBIF). It would be good if that could also be linked to Movebank, but I'm not that knowledgeable about Movebank and how the flow would best work (also, some of the data might already be in there). Of course, this is a good example to follow, I guess. We have standardised data and filtered data. We have metadata on the deployment (see below) for the standardised data
and for the filtered data, e.g.:
Metadata variables
I made a first attempt to format the RAATD data in the Darwin Core format, with an Event core and an Occurrence extension. Before we push the whole dataset into this format, it might be useful for you to have a look at it and give some feedback on the approach taken. The most important formatting discussion points are written down in the README file. The data, the R script used to format them, and the README can be found here: If possible, can someone with admin rights merge my fork of this repo with tdwg/dwc-for-biologging?
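A minimal Python sketch of what an Event core plus Occurrence extension layout for tracking data can look like. It assumes, hypothetically, one parent event per tag deployment, one child event per location fix, and one occurrence per child event; the names `Fix` and `to_dwc_rows` and the ID scheme are illustrative only, and the actual formatting choices are the ones documented in the README and R script mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    """One location fix from a tag deployment (hypothetical input shape)."""
    timestamp: str  # ISO 8601 strings in one consistent format, so they sort correctly
    lat: float
    lon: float


def to_dwc_rows(deployment_id, scientific_name, fixes):
    """Return (event_rows, occurrence_rows) as lists of dicts keyed by Darwin Core terms."""
    if not fixes:
        raise ValueError("a deployment needs at least one fix")
    ordered = sorted(fixes, key=lambda f: f.timestamp)

    # Parent event for the whole deployment, dated with an ISO 8601 start/end interval
    events = [{
        "eventID": deployment_id,
        "parentEventID": "",
        "eventDate": f"{ordered[0].timestamp}/{ordered[-1].timestamp}",
        "decimalLatitude": "",
        "decimalLongitude": "",
    }]
    occurrences = []

    for i, fix in enumerate(ordered, start=1):
        event_id = f"{deployment_id}_{i}"  # hypothetical ID scheme
        # Child event: one per location fix, linked to the deployment
        events.append({
            "eventID": event_id,
            "parentEventID": deployment_id,
            "eventDate": fix.timestamp,
            "decimalLatitude": fix.lat,
            "decimalLongitude": fix.lon,
        })
        # Occurrence extension row: the tagged animal recorded at that fix
        occurrences.append({
            "eventID": event_id,
            "occurrenceID": f"{event_id}_occ",
            "scientificName": scientific_name,
            "basisOfRecord": "MachineObservation",
            "occurrenceStatus": "present",
        })
    return events, occurrences
```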
This looks pretty good. The only issue I've noticed so far is that the variable "fieldnotes" contains the Argos location quality indices. These indices are essential for Argos location quality control and other movement modelling processes, and they should have a more informative variable name. Will the schema allow "location quality" to be used instead of "fieldnotes"? I would worry that anything named "fieldnotes" would be one of the first variables stripped by automated data processing workflows. Additionally, the values should simply be in the set {3,2,1,0,"A","B","Z"} or {3,2,1,0,-1,-2,-9}, rather than "location_quality= Z", etc.
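As a sketch of the clean-up suggested here, the Python below pulls the bare Argos location class out of a string like `location_quality= Z` and maps it to the numeric set. It assumes the two sets above correspond position by position (A → -1, B → -2, Z → -9); the helper names and the regular expression are hypothetical, not part of the published R script.

```python
import re
from typing import Optional

# Assumed positional correspondence between the two sets quoted above
ARGOS_LC_TO_CODE = {"3": 3, "2": 2, "1": 1, "0": 0, "A": -1, "B": -2, "Z": -9}

# Matches strings such as "location_quality= Z" or "location_quality=2"
_LQ_PATTERN = re.compile(r"location_quality\s*=\s*([0-3ABZ])\b", re.IGNORECASE)


def parse_argos_lc(fieldnotes: str) -> Optional[str]:
    """Return the bare location class ("3", "2", "1", "0", "A", "B" or "Z"), or None if absent."""
    match = _LQ_PATTERN.search(fieldnotes)
    return match.group(1).upper() if match else None


def argos_lc_to_code(lc: str) -> int:
    """Map a bare location class to its numeric code; raises KeyError for unknown classes."""
    return ARGOS_LC_TO_CODE[lc.upper()]


print(parse_argos_lc("location_quality= Z"))                     # Z
print(argos_lc_to_code(parse_argos_lc("location_quality= B")))  # -2
```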
"location quality" is not part of the Standard darwincore terms (an overview here: https://dwc.tdwg.org/terms/). A couple of options come to mind with option 2 maybe being e good compromise
Not sure how other solved this. |
Thank you for pulling this together, Maxime! I'll see if I have the right permissions to merge your fork into a demo/example subfolder here. I know that we're encouraged to keep dynamicProperties sparse if we can at all help it, but I agree with option 2, and I can see the value in designating a transient variable that would only be available in certain subclasses of biologging location data. Option 1 would be inviting ourselves to repurpose locationRemarks as dynamicPropertiesAboutLocations; probably nobody will like us for doing that! Short of translating Argos location qualities into coordinateUncertaintyInMeters, I don't know what else we'd do other than include something in dynamicProperties. To completely convince myself, I'm going to poke around a few other example DwC occurrence archives in GBIF/OBIS that use Argos for location data. So far the ones I've found have not included the quality info inline and have simply alluded to the fact that they 'filtered erroneous location data' in the archive-level metadata, so that's a bar that I think we can clear with your proposed solution.
wrt Argos location data: option 3 has some merits, as there are now different "flavours" of Argos location data that could be captured in the "measurementDeterminedBy" variable: 1) locations based on CLS Argos' old Least-Squares algorithm; 2) locations based on their Kalman Filter algorithm; 3) locations based on their Kalman Filter & Smoother algorithms (users have to pay additional fees for this and it's only available in post-processing, so I'd guess it's relatively rare). "measurementMethod" could be used to identify the type of location data (Argos, GPS, GLS, ...), no? In the case of "old" Least-Squares data, all you get is a "location quality" class for each observation. It is an index of accuracy, so it could be captured by "measurementAccuracy". The Kalman Filtered and Kalman Filtered & Smoothed flavours have "location quality" and error ellipse variables (Ellipse Semi-Major Axis, Ellipse Semi-Minor Axis, Ellipse Orientation). These are all important for modelling (location quality control and other applications). I'd guess I'm preaching to the choir here, but you would never want to archive/serve Argos data that had "erroneous location data" filtered or otherwise removed. I'd think you'd want to provide filtered (or otherwise quality-controlled) location data either as a separate, derived ("modelled" in the broadest sense) version of the data, or via a flag that indicates whether a record passed or failed the quality control process(es). I'd guess the metadata would have to capture the essentials of the quality control process applied. In the case of statistical quality control processes, e.g. state-space models, this is where coordinateUncertaintyInMeters can be used to capture the estimated location uncertainty.
@jdpye if memory serves me right, the OBIS logic would be to put it all in the extended Measurement or Fact (eMoF) extension (lat, long, location quality) and have a simplified track or range polygon at the event level. @ianjonsen the standardised vs. filtered discussion is one that keeps coming back. Given that OBIS and GBIF mainly deal with primary observations, my feeling is that filtered data would be quite heavily processed and not really be the primary observation anymore (also, you ideally try to keep all of that close together). I'm happy with option 2 as an intermediate solution for now, @msweetlove, so we can do a first push. Depending on how the discussion goes, we can always redo the export to GBIF/OBIS. In any case, with any approach, for me the data on OBIS and GBIF would be a lead into discovering more detailed information, which could be at Movebank, the AADC, or another online repository. For instance, for the Herring Gull data @peterdesmet used Zenodo.
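A rough Python sketch of how option 3 and the OBIS-style eMoF approach could look for a single Argos fix: the sensor type goes in `measurementMethod` and the algorithm flavour in `measurementDeterminedBy`, as proposed above, with the location class as its own row and the error ellipse rows added only when they exist (the Kalman Filter flavours). The `measurementType` labels, the assumed units, and the function name are placeholders, not an agreed vocabulary.

```python
from typing import Optional, Sequence

# Placeholder measurementType labels and assumed units for the Kalman Filter error ellipse
ELLIPSE_TERMS = (
    ("Argos error ellipse semi-major axis", "m"),
    ("Argos error ellipse semi-minor axis", "m"),
    ("Argos error ellipse orientation", "degrees"),
)


def argos_emof_rows(occurrence_id: str, location_class: str, algorithm: str,
                    ellipse: Optional[Sequence[float]] = None) -> list:
    """Build eMoF rows (dicts keyed by Darwin Core MeasurementOrFact terms) for one Argos fix."""
    shared = {
        "occurrenceID": occurrence_id,
        "measurementMethod": "Argos",          # type of location data
        "measurementDeterminedBy": algorithm,  # e.g. "Least-Squares" or "Kalman Filter"
    }
    # The location class is always present, whatever the algorithm
    rows = [{**shared,
             "measurementType": "Argos location class",
             "measurementValue": location_class,
             "measurementUnit": ""}]
    # Ellipse rows only exist for Kalman Filter (and Filter & Smoother) outputs
    if ellipse is not None:
        if len(ellipse) != len(ELLIPSE_TERMS):
            raise ValueError("ellipse must be (semi_major, semi_minor, orientation)")
        for (mtype, unit), value in zip(ELLIPSE_TERMS, ellipse):
            rows.append({**shared,
                         "measurementType": mtype,
                         "measurementValue": value,
                         "measurementUnit": unit})
    return rows
```

Keeping one row per variable means Least-Squares fixes simply produce fewer rows, rather than needing empty ellipse columns.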
@Antonarctica yes, that makes sense; I knew I was wandering off into things beyond the primary observations.
What about using some of the location class terms for the Argos location qualities? For example, georeferenceProtocol and The latter recommends use of a controlled vocabulary, which the Argos location quality essentially is.
The Argos LQs look to fit very neatly in those columns. We could set a good example with those. |
@peggynewman @jdpye @msweetlove
Sounds like a great solution
Yes, something like that, although a sanity check from @peterdesmet would be appreciated. Movebank have added Argos terms to their vocabulary in NERC, and it only refers to the Argos 2011 manual but doesn't link to it. They have "Argos LC", which must be the label they use. In the absence of a proper vocabulary, a link out to the manual seems like the right thing to do.
Just throwing a comment here to see what still needs to happen :-)
I hope nothing... it is public now: https://ipt.biodiversity.aq/resource?r=scar_raatd_trackingdata, after a long time calculating on @msweetlove's computer and finding some small errors. I guess we'll register it next week...
Yeah, the last open ticket we had about eventDates looks to be fixed in that DwC-A. I think this is good to go! Is this PR's branch up to date?
And the RAATD dataset is also published in OBIS https://obis.org/dataset/48cb8624-a221-47ed-9a6d-b99b0bb394e0 |
Looks like Mirounga leonina still needs a scientificNameID, but there aren't too many more i's to dot and t's to cross once we have the latest scriptlet and data example in the msweetlove:master branch.
@jdpye all scientificNameIDs were collected from WoRMS in an automated loop. If the field is blank for a species, it means there was no exact match in the WoRMS database, or there were multiple matches that could not be resolved automatically. I'll clean up the R script and put it online today.
The R script for formatting the RAATD data is available here
@jdpye I updated the occurrence file to add the scientificNameID of Mirounga leonina. It was left blank because of multiple matches that could not be resolved automatically.
Thanks Max! I suspected it was something like that. I've sometimes had to parse the AcceptedStatus of the results to arrive at the one that's approved for my species. Other times, there are still ambiguities and I have to do as you did. I'll review this now!
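One way the disambiguation described here could be automated, sketched in Python under the assumption that each WoRMS match is a dict with `scientificname`, `status`, and `lsid` keys (the shape of WoRMS REST JSON records; check this against the actual response). The function name and the example records, including their LSIDs, are hypothetical. Returning `None` leaves the scientificNameID blank for manual review, as in the workflow described above.

```python
from typing import Optional


def pick_accepted_lsid(records: list, name: str) -> Optional[str]:
    """Return the LSID of an unambiguous exact-name match, or None so it can be resolved by hand."""
    exact = [r for r in records if r.get("scientificname") == name]
    if len(exact) == 1:
        return exact[0].get("lsid")
    # Several exact matches: keep the accepted name, but only if exactly one is accepted
    accepted = [r for r in exact if r.get("status") == "accepted"]
    if len(accepted) == 1:
        return accepted[0].get("lsid")
    return None  # no match, or still ambiguous


# Hypothetical records with placeholder LSIDs
matches = [
    {"scientificname": "Genus species", "status": "unaccepted", "lsid": "urn:lsid:example:1"},
    {"scientificname": "Genus species", "status": "accepted", "lsid": "urn:lsid:example:2"},
]
print(pick_accepted_lsid(matches, "Genus species"))  # urn:lsid:example:2
```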
Are the updated file and workflow in the master branch of the msweetlove fork?
The updated occurrence file can be found here: https://ipt.biodiversity.aq/resource?r=scar_raatd_trackingdata. For this step I used just two trivial lines of code, so I didn't update the script.
Hello,
@Antonarctica has a dataset from Retrospective Analysis of Antarctic Tracking Data Project which could be interesting for the use cases. The project uses a mix of sensors:
Global Location Sensors (GLS loggers or geolocators), satellite-relayed Platform Terminal Transmitters (PTTs), and Global Positioning System devices (GPS)
Thanks a lot!