New Term - parentOccurrenceID #413
Comments
Thank you @fja062 for this submission. I have added labels and feel the demand justification has been met. Now we'll have to figure out if this is the best solution. Would you be willing to submit one or more use cases highlighting why you arrived at this solution? It would be interesting to see whether what you need could instead be modeled with Events and parentEventID, for example. |
We have dealt with this before in our data modelling and implementation. Our conclusion was that it mostly applies in cases where an instance of Organism has individual constituents (e.g., a wolf pack, whale pod, or school of fish has individual organism constituents). Thus, we ended up implementing a parentOrganismID property to track hierarchical relationships among organisms. As such, only the top-level Organism instance in the hierarchy would need to participate in a single Occurrence instance, and all the child Organism instances would inherit the Occurrence participation. But this has limitations. First of all, you can't always assume that all constituents of a compound Organism instance participated in a particular Occurrence (e.g., if one or more members of a particular wolf pack were absent during a particular documented Occurrence). Second, there are properties of Occurrence (e.g., sex, lifeStage, behavior, reproductiveCondition, etc.) that are particular to individual members of a compound Organism instance. Our solution to this was simply to mint additional Occurrence instances for each member of a compound Organism instance, and apply the properties to the individuals accordingly. In that context, these are not so much parent-child Occurrence relationships as a set of related/parallel Occurrence instances that happen to share the same Location and Time, and/or involve Organism instances that are collectively part of a broader-scope Organism instance. I'm not opposed to adding this term, but I agree with @tucotuco that it would need to be fleshed out with specific use cases to clarify when people would use this term, what the "parent" Occurrence instance is, and what the "child"/"children" instances are (and how many "generations" are allowable). It also needs to tease out the implications for Organisms as participants in Occurrence instances vs. MaterialSample instances participating directly in Occurrence instances.
I can elaborate more on what I mean by that, but this post is already too long so will refrain unless asked. |
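For readers following along, the parentOrganismID approach described above might be sketched roughly as follows. This is only an illustration under assumptions: the identifiers (`pack-1`, `wolf-a`, `wolf-b`, `occ-1`), the dictionary layout, and the helper `inherited_participants` are hypothetical and are not taken from any actual implementation mentioned in this thread.

```python
# Hypothetical organism records: parentOrganismID links a member to its group.
organisms = {
    "pack-1": {"parentOrganismID": None},
    "wolf-a": {"parentOrganismID": "pack-1"},
    "wolf-b": {"parentOrganismID": "pack-1"},
}

# Only the top-level Organism is linked to the Occurrence directly.
occurrence_participants = {"occ-1": ["pack-1"]}


def inherited_participants(occurrence_id):
    """Return the direct participants plus all of their descendant organisms."""
    result = []
    stack = list(occurrence_participants[occurrence_id])
    while stack:
        current = stack.pop()
        result.append(current)
        # Add every organism whose parent is the one just visited.
        stack.extend(
            oid for oid, rec in organisms.items()
            if rec["parentOrganismID"] == current
        )
    return sorted(result)


participants = inherited_participants("occ-1")
```

The sketch also makes the first limitation visible: `wolf-b` is returned as a participant whether or not it was actually present, because participation is inferred from the hierarchy alone.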
I really don't like this use of the word 'parent'. It is a metaphor in a context where the original usage is not alien, thus prone to confusion or misuse sooner or later. Furthermore, 'collection of which this individual is a member' and 'structure of which this individual is an element' are rather different as well. I would recommend carefully distinguishing the different cases that you appear to be trying to aggregate here. |
The term that we propose applies to instantaneous, or snapshot, occurrences of an aggregation of individuals and can therefore be used in situations where the ID of a group is unknown. For our proposition, the term ‘group’ recorded in individualCount refers to a temporal aggregation of individuals, and not a social group. One such use case is hunting bag data, where the total number of individuals hunted throughout a hunting season corresponds to the individualCount of a parent Occurrence. Information that may be partially available on sex and lifeStage may be detailed in nested child Occurrences. Imagine a hunting season where 30 individuals are shot, and we know that 10 are female, 10 are male, and 3 of the females are adults. This partial information of the Occurrence demographics is hard to store in the main Occurrence, but can be described cleanly in child Occurrences. Another use case is when at least some individuals observed in the individualCount of an Occurrence contain individual ID tags. For example, a group of Ibex were observed in France, and 4 of the 7 individuals were tagged. We can record this partial information in nested child Occurrences of the parent Occurrence ID. In the example provided by @deepreef, the ‘group’ refers to a social group, which requires prior knowledge on the number of members etc. contained in the social group. By making the individualCount more generic, irrespective of the genetic and social relationship between the individuals observed, the parent Occurrence refers to the total number of individuals observed (and not necessarily the proportion of the social group observed). This structuring of child-parent Occurrences is therefore very general, and applicable to any group occurrence where partial information is available. This will be particularly useful for snapshot occurrences such as camera trap or hunting bag data. |
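The hunting bag use case above could be laid out along these lines. This is a minimal sketch, not a definitive encoding: `parentOccurrenceID` is the term proposed in this issue (not an existing Darwin Core term), the identifiers and helper functions are hypothetical, and placing the adult females under the female record rather than directly under the top-level record is an assumption.

```python
# Hypothetical records for the hunting-bag example (30 shot, 10 female,
# 10 male, 3 of the females adult). All identifiers are made up.
bag_occurrences = [
    {"occurrenceID": "bag-1", "parentOccurrenceID": None, "individualCount": 30},
    {"occurrenceID": "bag-1-f", "parentOccurrenceID": "bag-1",
     "individualCount": 10, "sex": "female"},
    {"occurrenceID": "bag-1-m", "parentOccurrenceID": "bag-1",
     "individualCount": 10, "sex": "male"},
    # Assumption: the adult females are nested under the female record.
    {"occurrenceID": "bag-1-f-ad", "parentOccurrenceID": "bag-1-f",
     "individualCount": 3, "sex": "female", "lifeStage": "adult"},
]


def direct_children(records, parent_id):
    """Return the records whose parentOccurrenceID points at parent_id."""
    return [r for r in records if r["parentOccurrenceID"] == parent_id]


def unaccounted(records, parent_id):
    """Individuals in the parent that no direct child record describes."""
    parent = next(r for r in records if r["occurrenceID"] == parent_id)
    described = sum(
        c["individualCount"] for c in direct_children(records, parent_id)
    )
    return parent["individualCount"] - described


unknown_sex = unaccounted(bag_occurrences, "bag-1")             # 30 - (10 + 10)
females_unknown_stage = unaccounted(bag_occurrences, "bag-1-f")  # 10 - 3
```

Under these assumptions, the partial information stays partial: the individuals of unknown sex are simply those not covered by any child record, rather than needing a record of their own.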
I'll attempt to describe a use case I see for this. A regular survey cruise is conducted every spring. It uses a trawl method. On a particular day at a particular location 614 Hippoglossoides platessoides are caught. It is not possible to weigh that many fish, so a subsample is taken (5.5 kg) and a weight of 27 kg is calculated for all 614 fish. Then 22 of those fish are measured for length, with four being 14 cm, one being 15 cm, two being 17 cm, and so on. I don't see a way to represent this well in an occurrence table.
To me these are all the same occurrence of a taxon at a place and time but just subsets of the occurrence to facilitate measurements. OBIS's answer is to call all the subsets an event subset (third table) but I find this a bit confusing myself because these are subsets of the 614 individuals caught and I think it makes it more abstract to call these event subsets. A concern I have is that downstream users will see these as separate occurrences and think there were 636 individuals at that location and time. I would argue for having a parentOccurrenceID for the 614 individuals and then nested occurrences for the subsample weight and 22 length measurements. |
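The double-counting concern can be illustrated with a small sketch. It assumes the proposed `parentOccurrenceID` term and uses hypothetical identifiers and a hypothetical helper, `total_individuals`; the weighed subsample is left out because its number of individuals is not given above.

```python
# Hypothetical records for the trawl example: one catch of 614 fish and a
# nested record for the 22 fish that were measured for length.
trawl_occurrences = [
    {"occurrenceID": "catch-1", "parentOccurrenceID": None,
     "individualCount": 614},
    {"occurrenceID": "catch-1-measured", "parentOccurrenceID": "catch-1",
     "individualCount": 22},
]


def total_individuals(records):
    """Sum individualCount over top-level occurrences only."""
    return sum(
        r["individualCount"]
        for r in records
        if r.get("parentOccurrenceID") is None
    )


# Treating every row as an independent occurrence counts the 22 fish twice.
naive_total = sum(r["individualCount"] for r in trawl_occurrences)
nested_total = total_individuals(trawl_occurrences)
```

Here `naive_total` is 636 and `nested_total` is 614, which is the distinction a downstream user would need the parent link to recover.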
Thanks, @fja062 ! This is very helpful!
In this case, I would generate three Occurrence records, one with I assume I would need to do likewise if we had a term for I can certainly see the value in that, but the question is whether the new term allowing for this simple aggregation is sufficiently more effective or efficient than the alternative (i.e., aggregating them by Of course, the alternative method of aggregating requires more "work", and breaks if the associated
Similar solutions and questions for this use case as well. Again, I see the potential advantage of this, but I worry if the advantage offsets the potential cost (including costs of dealing with logical inconsistencies of differing associated I suppose the most common use case would involve my "fourth" However, having a I realize this is just sort of a long and winding ramble, but I'm still trying to get my head around the relative costs and benefits of introducing this new term.
Yeah, I guess that's the crux. In my mind, an On a related note in response to @fja062 :
Yes, that is true for the example I gave, but it's not necessarily the case for all "A particular organism or defined group of organisms considered to be taxonomically homogeneous." This includes any aggregation of taxonomically homogeneous individuals, regardless of whether they are defined in the context of social groups, kin, or any kind of ephemeral sets of individuals (e.g., flocks of birds, schools of fish, etc.). It even potentially includes "every individual identifiable to a particular taxon that has ever lived, or ever will live." Granted, (almost) no one in TDWG-land thinks of it this way, but going by the definitions (and the pure logic of structuring information), it's technically true. |
Then what is the purpose of |
This term predates the establishment of the Also, I should point out that the definition of an instance of the "An existence of an Organism (sensu http://rs.tdwg.org/dwc/terms/Organism) at a particular place at a particular time." Ref: https://dwc.tdwg.org/terms/#occurrence
An instance of If you wanted to record the fact that 311 of them were adult male, 209 of them were adult female, and 94 of them were subadults (sex indeterminate), then ideally you'd create three instances of I use the term "ideally" above because in a practical sense, almost no one actually does it this way. The question of whether we all should be doing this way is open to debate. But we probably ought to at least adhere to the existing DwC definitions of terms as closely as possible. |
@deepreef thanks for the detailed reply!
Could you give an example of where you might expect to see such a situation?
If we define an
and
The beauty I see in the structure of nested occurrences is that each primary (parent) occurrence supplies basic observation information (in this case
I do see though the potential for confusion if the parent and child |
I felt that the word According to my understanding, the concept seems to be similar to MaterialGroup for the new data model, see Environmental and community measurements? Would it make more sense to name it in a similar way? @fja062 I am a little confused about the tables. Why are they referring to |
@ymgan Yep that was exactly my concern too - see my comment above #413 (comment) |
I share the concerns about the term "parent" in this context, but I'll focus my comments here in response to @fja062 :
My point was less about the logic than about the mechanics. Suppose we have a "parent" Occurrence record with three "child" Occurrence records, but the Another way of looking at this is coming up with an elegant way (other than instances of I guess I'm biased, because I think we should abandon ALL |
Thanks for the interesting discussion points raised. A common theme seems to be an issue with the usage of the vocabulary 'parent' and 'child'. However, this terminology is already in place elsewhere in DwC, for example for the
I'm not sure that this scenario could arise if every entry of a child occurrence makes reference both to the eventID (=parentEventID in the tables above) and the so-called
Indeed, the parentXXXID configuration allows us to build a hierarchical structure.
I find the resourceRelationship extension quite heavy, and not elegant for the use cases described above. Use of the resourceRelationship would result in the splitting of a single data set into several, with potential for the loss of the understanding of the 'superset' effect captured in a hierarchy. I imagine it would also be more complicated to associate the measurement or facts to the corresponding parent and child occurrences, for example where a hierarchical structure could allow measurements or facts to apply at differing levels in the hierarchy without repetition of the MoF information.
This is an interesting suggestion @ymgan. Could you give an example of how you think this might work in practice? |
I'm sorry I missed this discussion when it began, because the "nested occurrences" idea is a novel and interesting solution to a difficult problem. However, while it suits relational databasing, it has the unfortunate effect of greatly multiplying record numbers in flat-file datasets, which is what GBIF users expect. It also seems a bit like a "slippery slope". If occurrences can be nested in order to disaggregate individual data items, why not do the same for recordedBy with multiple recorders and multiple recordedByIDs? And identifiedBy? Getting back to the "hunting bag" example, what would be the objections to the following entries in a single record? occurrenceID = [something unique] |
Hi, @guillaumebody suggested the following as potential solution for a dataset from Antarctic GBIF/OBIS. (Thanks Guillaume) I thought this could work~ For the context, the researchers were assessing the diet of Pachyptila belcheri (occ_001), which preyed on Crustacea (occ_002) and Euphausia vallentini (occ_003) in this example. Event core and eventID are omitted here for simplicity. occurrence
eMoF
|
New term
Proposed attributes of the new term: