From the GBIF Community Webinar - Seeking Use Cases - Suggestion of Events and Community connections #4640
Replies: 17 comments
-
Yes, I agree we need to provide some of our use cases, especially entities and presence/absence data, which we capture for parasites and need to develop for pathogens and environmental DNA. Also see the issue raised by David Shorthouse on the need for stable identifiers and the discussion at gbif/pipelines#677.
-
Link to discussions on Discourse:
-
See https://github.com/ArctosDB/internal/issues/168. I don't think that includes entities because they are (or should be!) fairly trivial; e.g., GBIF already handles them reasonably well. Leave an example of the parasite case over there and I'll include it in the next iteration.
-
From the webinar, it didn't sound like they were really planning anything that would include or work with entities. What happens when you have one that occurs in multiple places and times, for multiple parts and uses, but is all the same individual? I could see some of our objects and entities spanning several of the data models they are developing as well. For example: an eagle caught on a camera trap that also has a blood sample that was genetically sequenced, has an egg specimen from a nest with digital media attached, and is part of an eBird taxonomic checklist that the researcher used to track occurrences from point counts at a site during a survey.
-
Entities can be just about anything; at least for now, all we need to care about are those used for Organisms. https://dwc.tdwg.org/list/#dwc_organismID is the concept (search the page for organismID, because for some crazy reason that page eats anchors).
I think that fits OK in the GUM. From a slightly different perspective, those places and times are derived data: they come from what some Organism's "child" records (Occurrences for now, maybe something a bit more generic going forward) document, and the Organism itself is just a unifying identifier. I think there's some text about organisms in the docs linked from the internal issue.
-
The not-Occurrences model would completely change that landscape for Arctos. We're currently providing OccurrenceIDs, which we make up on the fly as the price of admission for using the Occurrence model. They have no stability because they're not "real." (They are persistently resolvable, however: they still lead to catalog records even when the things we use to generate them have changed.) Getting out of that model should move us toward something where the stable, resolvable identifiers we assign to the things we actually catalog are central.
-
We should definitely pose this use case. We have the data to model as well; although it may not be all we wish it could be, going through this process will hopefully make it better!
-
Yes, and the MSB and MEPA communities are actively pushing for development of a pathogen model, so I will post that separately as well, since we need to initiate that discussion on GitHub.
-
Anything I can help with here? In the hope that some comments might help: in the opening comment, my email should be [email protected]. It sounds like Arctos Entities are comparable to GUM EntityOfInterests. There is no constraint on what type of Entity that might be; the vocabulary for EntityType is wide open. I would be happy to help come up with use cases. The ones we want to develop initially should bring a new challenge that can't be met with the Darwin Core Archive star schema. However, we are already thinking of developing use cases that bring nothing new to the GUM model but would help a particular community understand how a particular problem is resolved with the GUM and an associated publishing model. An example is an ocean trawl, the lots of fish-like things that come from it, and further individual preparations within those lots that are not separated out as individual specimens with their own identifiers. The model has no problem handling that, but people might have trouble handling it with the model if it isn't demonstrated. The same criterion could be applied to potential new use cases. BTW, our current highest priority for development is enshrined in the Arctos "down with Occurrences" issue.
I'd like to clarify this. We are not overwhelmed; we are prioritizing and working on several levels at once using the agile development paradigm. That's why a version of the IPT can already publish Camera Trap DP data in the Frictionless Data format even while we are still ironing out the GUM. We began with a list of use cases that each had the potential to bring something new to the GUM. We have developed 11 of those to the stage of being ready for public review. With those 11 in hand, we are taking some further quickly because we have excellent engagement from the interested community, while others are awaiting review by stakeholders, others aren't written yet, and others are being solicited. With two of us, and only me doing the modeling, we have to choose wisely and be efficient. Choosing wisely means picking use cases with an obviously favorable impact/effort ratio. Presence/absence of parasites is definitely novel, as the location in that case is an Organism (with its own geographic location, of course), and the point is not so much where the parasite occurs geographically as the parasite load, though the former is tractable from the locations of the hosts. It may also integrate nicely with biotic interactions (see that use case).
Maybe I don't understand Arctos entities, but if I do, the EntityOfInterest in the GUM is the counterpart. In many use cases, at least one of the types of EntityOfInterest is an Organism. Sometimes it is an identified Organism (one you can point to); sometimes it is some Organism (a proxy, because you don't have any persistent evidence). So, in the GUM, Organisms running around can be EntityOfInterests on multiple occasions (Events) with distinct evidence each time. If people get their identifiers in order, we could even put them all together across independent data sets in a view of the Event history of that Organism and its evidence. In principle, an EntityOfInterest can be anything, and there can be proximate and ultimate EntityOfInterests. A proximate one could be a mouse from a trap (entityType "dwc:Organism"), while an ultimate one might be the small mammal species diversity of a park (entityType "geographic species diversity" or something). If I have Arctos entities wrong, I apologize and will do my best to rectify the situation; assistance welcome.
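To make the "Event history of an Organism" idea concrete, here is a minimal Python sketch, assuming each published record carries a shared Organism identifier. The `EvidenceRecord` type, its fields, `event_history()`, and the example values are all hypothetical illustrations, not the GUM schema or any GBIF API.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical, simplified record type; field names are illustrative
# and are NOT the GUM's actual schema.
@dataclass(frozen=True)
class EvidenceRecord:
    dataset: str      # the independent data set that published this record
    event_date: str   # ISO 8601 date of the Event, e.g. "2021-06-01"
    locality: str
    evidence: str     # e.g. "camera trap image", "blood sample sequence"
    organism_id: str  # shared identifier of the Organism (the EntityOfInterest)


def event_history(records):
    """Group records by organism_id, ordering each group by event date.

    Records from different data sets only merge if they carry the same
    organism_id, which is why consistent identifiers matter.
    """
    by_organism = defaultdict(list)
    for record in records:
        by_organism[record.organism_id].append(record)
    # ISO 8601 dates written in the same format sort correctly as strings.
    return {oid: sorted(group, key=lambda r: r.event_date)
            for oid, group in by_organism.items()}


# Made-up example: one eagle with evidence published by two data sets.
records = [
    EvidenceRecord("camera-trap-project", "2021-06-01", "Site X",
                   "camera trap image", "example:eagle-1"),
    EvidenceRecord("tissue-collection", "2020-05-10", "Site X",
                   "blood sample sequence", "example:eagle-1"),
]
for rec in event_history(records)["example:eagle-1"]:
    print(rec.event_date, rec.dataset, rec.evidence)
```

The join key is deliberately just the identifier: nothing about place or time is used to decide which records belong to the same Organism.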
It does. Very well, if you identify (put an identifier on) your entity.
Exactly.
Yes, such as identifiers for MaterialEntities. GBIF is keen on building a Material index; those that come in with persistent resolvable identifiers would play best in that sandbox.
-
A great opportunity to think about the "collecting" event in multiple ways.
-
You do, they're just identifiers. (Not really, but that's probably close enough for those used as Organisms at the moment.)
Exactly. Entities do nothing(ish) new within Arctos, but they're prettier (maybe) in spreadsheets, and you can grab one, stick it in your Excel database, and semi-automagically join the party at GBIF (or anywhere Organisms are compiled).
FYI, that's a social problem at this point: they're actually functional in Arctos, but I need some committed buy-in before I can share them (without making Arctos look broken, and rightly so, when they vaporize).
Maybe I'm not seeing something obvious, but that entire discussion seems utterly incapable of crossing into reality from here. If we recorded all events to the cubic centimeter and the second, that might almost be realistic, but the reality is that we have things like "Indiana, before tomorrow," and that simply cannot be used to stitch hosts and parasites (or much of anything else) back together. Fortunately, we have no need to make event-based inferences: we can just make direct, unambiguous assertions.
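A minimal Python sketch of the contrast above, using hypothetical host and parasite records (all identifiers and fields are illustrative, not Arctos or GUM structures): matching by coarse Event data such as "Indiana, before tomorrow" returns every host that shares those values, whereas a direct assertion, here a `host_id` stored on the parasite record, resolves to exactly one host.

```python
# Hypothetical records; identifiers and field names are illustrative only.
hosts = [
    {"id": "host-A", "locality": "Indiana", "date": "before tomorrow"},
    {"id": "host-B", "locality": "Indiana", "date": "before tomorrow"},
]
parasites = [
    # The parasite record directly asserts which host it came from.
    {"id": "para-1", "locality": "Indiana", "date": "before tomorrow",
     "host_id": "host-B"},
]


def infer_hosts_by_event(parasite, hosts):
    """Event-based inference: any host sharing locality and date is a candidate."""
    return [h["id"] for h in hosts
            if h["locality"] == parasite["locality"]
            and h["date"] == parasite["date"]]


def host_by_assertion(parasite, hosts):
    """Direct assertion: follow the explicit host identifier on the parasite."""
    by_id = {h["id"]: h for h in hosts}
    return by_id.get(parasite["host_id"])


p = parasites[0]
print(infer_hosts_by_event(p, hosts))  # ambiguous: both hosts match
print(host_by_assertion(p, hosts))     # unambiguous: only host-B
```

Coarser Event data only makes the inferred candidate list longer; the asserted link is unaffected by how precisely the Event was recorded.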
-
I am talking about the "collection" of the parasite from the host, which is an event of a sort that we do not currently track.
-
Adding important links here: Discourse
-
Parasites might be a narrative for 14: Humboldt Core Monitoring and Absence data. We examined and either did or did not find them.
-
@ewommack, re your catalog question, this is from the Diversifying the GBIF data model document:
-
@Nicole-Ridgwell-NMMNHS @aklompma @cefilipek @lmtabak @wellerjes @mvzhuang any interest in coming up with paleo use cases?
-
Agents have been mentioned as a potential catalog; that might be a relatively easy way to have an impact.
-
At the GBIF Community Webinar on the proposed Grand Unifying Model, they said they are looking for new use cases.
A couple of the things they suggested seemed specific to Arctos and might be worth pushing forward:
To recommend new use cases, contact [email protected], [email protected], or bit.ly/data-model-forum.
They did seem really overwhelmed in the discussion, though: they said they wanted more, and then in the next sentence said they haven't caught up and haven't gotten through what they already have.