New Term - parentMaterialSampleID #344
Comments
Excellent!
Thanks! RE: the example, I was following the template of other similar terms in DwC (e.g., materialSampleID). Also, I generally try to minimize the inclusion of dereferencing metadata from identifiers; but that's more of a personal preference.
Do you consider |
Example of a preserved specimen of bluethroat with a blood sample extracted for DNA, in support of the need for the proposed parentMaterialSampleID term.
- Preserved specimen (mounted): basisOfRecord = PreservedSpecimen
- Blood sample: basisOfRecord = MaterialSample
- DNA sample (not yet here, but available for other)
...apropos, which raises the question of whether reusing the UUID for occurrenceID as the UUID for materialSampleID is the correct use at all. However, a value for occurrenceID is mandatory for the records to be published in GBIF.
Here is an example of a bluethroat (Luscinia svecica subsp. svecica) from which 7 MaterialSamples were extracted, in support of the need for the proposed parentMaterialSampleID term. For many of these bluethroats we lack a parentMaterialSampleID to describe the hierarchy between material samples and their sub-samples for DNA, that is, whether the DNA sample was sub-sampled from the blood sample, the tissue sample, the sperm sample, etc., each preserved as a separate biobank MaterialSample.
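A minimal sketch of how parentMaterialSampleID could express the hierarchy described above, assuming each record carries its own materialSampleID and an optional parentMaterialSampleID. The short identifiers (`specimen-1`, `blood-1`, and so on) and the helper name `ancestors` are hypothetical placeholders for illustration, not values from the bluethroat dataset.

```python
from typing import Dict, List, Optional

# Hypothetical records: identifiers are placeholders, not real dataset values.
records = [
    {"materialSampleID": "specimen-1", "parentMaterialSampleID": None,
     "basisOfRecord": "PreservedSpecimen"},
    {"materialSampleID": "blood-1", "parentMaterialSampleID": "specimen-1",
     "basisOfRecord": "MaterialSample"},
    {"materialSampleID": "tissue-1", "parentMaterialSampleID": "specimen-1",
     "basisOfRecord": "MaterialSample"},
    {"materialSampleID": "dna-1", "parentMaterialSampleID": "blood-1",
     "basisOfRecord": "MaterialSample"},
]

# Map each sample to its parent (None for the top-level specimen).
parent_of: Dict[str, Optional[str]] = {
    r["materialSampleID"]: r["parentMaterialSampleID"] for r in records
}


def ancestors(sample_id: str) -> List[str]:
    """Return the chain of parents, nearest first, guarding against cycles."""
    chain: List[str] = []
    seen = {sample_id}
    current = parent_of.get(sample_id)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return chain


print(ancestors("dna-1"))
```

With a parent link on every sub-sample, the question "was this DNA sample taken from the blood or from the tissue?" becomes a simple lookup rather than something inferred from free text.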
@dagendresen : MANY thanks for the great example!
Well... I guess it's technically not "dereferencing" metadata (like

I don't want to hijack this thread, but just to make a point... this is the closest representation of the actual identifier for your organismID in the post above, that can be rendered in textual form:

A less cumbersome way to display this value to human eyeballs would be in hexadecimal form:

It could also be represented as a decimal number:

The most text-economical way to represent it is in base64:

Of course, the most common way to represent it (and the way most people provide them to GBIF) is in the so-called canonical textual representation:

Microsoft unhelpfully represents them sometimes using upper-case letters:

I get why it's useful in the context of RFC 4122 to pre-pend them with the aforementioned metadata (

But here's my point: the actual identifier is 128 consecutive 1s and 0s -- which is how most database systems actually store them on disk, in the form of 16-byte numbers. However, they're almost always presented (and consumed) as text strings -- usually UTF text strings, which make them a whopping 576 bits in canonical form. So basically, we're consuming 4x as many bytes as the actual identifier, just to make them a little bit more human-friendly. You could argue that the form

OK... like I said, I don't want to hijack this thread with a diatribe about identifiers, but it appears that ship has already left the barn (or something like that).
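The representations listed in the comment above can be sketched with Python's standard `uuid` and `base64` modules. This is only an illustration: the UUID used here is borrowed from the examples in the proposal and is an assumption, not necessarily the organismID the comment refers to. The 576-bit figure matches 36 characters at 2 bytes each (UTF-16), which is also an assumption about the arithmetic behind it; in UTF-8 the same string takes 288 bits.

```python
import base64
import uuid

# Assumption: UUID borrowed from the proposal's examples, used purely
# for illustration; it may not be the organismID discussed above.
u = uuid.UUID("11142195-4865-4b52-baed-1b76a39613a3")

representations = {
    "binary": format(u.int, "0128b"),           # 128 consecutive 1s and 0s
    "hexadecimal": u.hex,                       # 32 hex digits, no hyphens
    "decimal": str(u.int),                      # one large integer
    "base64": base64.b64encode(u.bytes).decode("ascii"),
    "canonical": str(u),                        # 8-4-4-4-12, lower case
    "upper-case": str(u).upper(),               # same value, upper case
    "urn": u.urn,                               # RFC 4122 URN form
}

for name, text in representations.items():
    print(f"{name:12} {len(text):4} chars  {text}")

# Storage cost of the raw identifier versus its canonical text form.
raw_bits = len(u.bytes) * 8
utf16_bits = len(str(u).encode("utf-16-le")) * 8
utf8_bits = len(str(u).encode("utf-8")) * 8
print(raw_bits, utf16_bits, utf8_bits)
```

All seven strings parse back to the same 16-byte value, which is the point being made: the text forms differ only in presentation.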
@deepreef This looks like a solid proposal. I took pause at first at the "and potentially other MaterialSamples were derived, or which they collectively comprise" in the definition. It seemed odd to refer to other entities than required to define the concept, but these additions really do help to nail down more broadly how to use the term in practice, and they do nothing to obscure the immediate concept, so I end up quite liking it.
Thanks, @tucotuco
Yeah, that's the part of the proposal I was most queasy about. I modelled the definition after the existing definition for

I originally had it as: "An identifier for the broader MaterialSample from which this and potentially other MaterialSamples were derived." But that seemed incomplete, so I added the extra ", or which they collectively comprise" (to avoid people nit-picking the definition of "derived").

Yup! And not chosen at random either (here's a hint: search for

In any case... I've added the example from @dagendresen as a second one (even though I'm queasy on the urn:uuid: thing...)
Haha @deepreef wrote:
Me either. But here goes. Many moons ago Greg and I asked: do we need the prefix? (Answer: no.) Who or what really needs the "urn:uuid" declaration? A machine can figure out it's a UUID. A human can see it. The field itself comes with expectations of what to find in it. The prefix is redundant, no?
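As a rough illustration of the "a machine can figure out it's a UUID" argument, here is a minimal sketch that accepts a value with or without the `urn:uuid:` prefix and returns the canonical form. The function name `normalize_uuid` is hypothetical. It relies on Python's `uuid.UUID` constructor, which tolerates the URN prefix, braces, and upper-case letters.

```python
import uuid
from typing import Optional


def normalize_uuid(value: str) -> Optional[str]:
    """Return the canonical lower-case UUID string, or None if not a UUID."""
    try:
        # uuid.UUID strips a leading "urn:uuid:", braces, and hyphens itself.
        return str(uuid.UUID(value.strip()))
    except ValueError:
        # Not every identifier in this field has to be a UUID.
        return None


print(normalize_uuid("urn:uuid:11142195-4865-4b52-baed-1b76a39613a3"))
print(normalize_uuid("6E43B33D-88CE-4A37-AD94-74D6C99B9E25"))
print(normalize_uuid("dwc:preparations"))
```

Both example values from the proposal normalize to the same canonical shape, so a consumer does not need the prefix to recognize or compare them.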
New term
Proposed attributes of the new term:
Examples: 6e43b33d-88ce-4a37-ad94-74d6c99b9e25, urn:uuid:11142195-4865-4b52-baed-1b76a39613a3
Term originally proposed a year ago by @thomasstjerne on the GBIF GitHub. Discussion around changes to MaterialSample on DwC (#314) and GBIF issue #37. This new term has direct relevance to dwc:preparations, in cases where multiple different preparations are derived from the same whole specimen.