Result (from an Observation) #38

Open
mbeaufils opened this issue Mar 31, 2022 · 13 comments
Labels
Book A - Observation & Measurements O&M part of the Conceptual Model

Comments

@mbeaufils
Collaborator

mbeaufils commented Mar 31, 2022

As defined in OMS

The result of the Observation as defined in #19

If a reference to a result is provided, the association Range with the role result SHALL be used.

NOTE 1 The result can be of Any type as it may represent the value of any feature property.
NOTE 2 If the observed property is a spatial operation or function, the type of the result may be a coverage.
NOTE 3 In some contexts, particularly in earth and environmental sciences, the term “observation” is used to refer to the result itself.

An Observation SHALL have exactly 1 result.

@mbeaufils mbeaufils added the Book A - Observation & Measurements O&M part of the Conceptual Model label Mar 31, 2022
@mbeaufils
Collaborator Author

mbeaufils commented Mar 31, 2022

I would like to explore with you the diversity of Any type for our geotechnical purpose.

Please provide some examples of what it could be.

@dponti
Collaborator

dponti commented Mar 31, 2022

An Observation SHALL have exactly 1 result .
NOTE 1 The result can be of Any type

Hmm, if any type means that a result at one location can have more than one observable property, or that one observable property can be observed at multiple locations, or both, then I'm ok with this. Otherwise I would argue that an Observation can have more than one result.

Conceptually, I understand the restriction, but as far as implementation, it would get awfully messy to restrict an observation at one location to having only one property-value pair.

Example from CPT: at a depth of 1 meter, the sensors on the probe return values for tip resistance (qc), sleeve friction (fs), friction ratio (Rf), and pore pressure (u). qc, fs and u are "measured" properties, whereas Rf is calculated. This would look like:

Location: 1.00 (in the LRS of the sounding = depth below land surface in m)
Observed Properties: qc,fs,Rf,u
Result: 0.3400,0.40,0.0040,0.0182

One could argue that this set constitutes one observation with 4 observed properties - the 4 are related in that they all are observed at the same location. One could also argue, (although I wouldn't) that this is an observation collection with 4 separate observations at one location.

1 cm lower, we have another set of results:

Location: 1.01
Observed Properties: qc,fs,Rf,u
Result: 0.3367,0.40,0.0030,0.0184

This could be another observation with 4 results at a different location.
Combining these two sets of data, one could consider the data at each depth an observation collection.

Alternatively, following Note 2, these same data can be encoded as a coverage:
Spatial Domain:
1.00
1.01
Observed Properties: qc,fs,Rf,u
Result:
0.3400,0.40,0.0040,0.0182
0.3367,0.40,0.0030,0.0184

In which case one would argue these data constitute one observation at multiple locations and reporting values of 4 properties.
You could also flip rows and columns and determine that this set then constitutes a collection of 4 observations with each observation returning one observable property result (meeting the definition above) at multiple locations.
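
The two readings of the same table can be sketched in a few lines of Python. This is only an illustration using the two depths and four properties from the CPT example above; the variable names are made up for this sketch and are not part of DIGGS or O&M.

```python
# Illustrative only: names are hypothetical, values come from the CPT example above.
depths = [1.00, 1.01]                   # spatial domain (m, in the LRS of the sounding)
properties = ["qc", "fs", "Rf", "u"]    # observed properties
rows = [
    [0.3400, 0.40, 0.0040, 0.0182],     # values at 1.00 m
    [0.3367, 0.40, 0.0030, 0.0184],     # values at 1.01 m
]

# View 1: one record per location, each carrying all four properties.
by_location = {d: dict(zip(properties, r)) for d, r in zip(depths, rows)}

# View 2: flip rows and columns, giving one series per property over all locations.
by_property = {p: list(col) for p, col in zip(properties, zip(*rows))}

print(by_location[1.01]["qc"])   # qc at 1.01 m
print(by_property["u"])          # pore pressure at every depth
```

Both views hold exactly the same numbers; only the grouping changes, which is why the choice between "one observation" and "a collection of observations" is mostly a modelling decision.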

It doesn't really matter whether you think of these as observations or various collections of observations. But what is important to remember is that all of these "results" are produced by the exact same procedure. So if they are multiple observations, you'll need to instantiate a procedure for each observation. To encode these data to have one observation incorporate one result property-value pair at one location is really inefficient encoding.

I would argue that the most efficient way to handle this type of data is to consider the CPT results to be ONE observation and encode the data into one Observation object, something like the following (with some explicit GML properties and following GML's object-property rules) and some additional stuff that observations require:

<Observation gml:id="Obs1">
   <observer>Vertec Corp</observer>
   <samplingTime>2018-08-17T14:30:00</samplingTime>
   <location>
     <MultiPointLocation srsName="#lrs" srsDimension="1" gml:id="MP001">
         <gml:posList>1.00 1.01</gml:posList>
     </MultiPointLocation>
   </location>
   <result>
       <parameters>
         <ResultProperties gml:id="rp1">
            <property>qc</property>
            <property>fs</property>
            <property>Rf</property>
            <property>u</property>
         </ResultProperties>
       </parameters>
       <data>
             0.3400,0.40,0.0040,0.0182
             0.3367,0.40,0.0030,0.0184
       </data>
   </result>
   <procedure>
      <ConePenetrationTestProcedure gml:id="CPT-proc1">
          <!-- Add metadata, for example -->
          <penetrometerType>piezocone</penetrometerType>
          <penetrationRate uom="cm/s">1</penetrationRate>
          <tipArea uom="cm2">15</tipArea>
      </ConePenetrationTestProcedure>
   </procedure>
</Observation>
Were this to be a real CPT test with results at several hundred depth locations, the only difference would be that the values within the <gml:posList> and <data> tags would expand; everything else would stay the same.

This is a simplified version of how we handle this in DIGGS. I'll post a couple of real examples later for different types of measurements, but this gets the general point across, I think. The main difference is that the DIGGS equivalent to the <property> tag contains another object that describes each observable property in more detail (eg. data type, units of measure, and measurement technique such as measured, calculated, or estimated). Other types of measurement-type observations would be encoded the exact same way, the only differences being:

  1. the number of <property> elements will vary depending on the observable properties that the Observation "observes". An Observation could consist of one observable property result at one location, for example.
  2. The object contained within the <procedure> tag changes, as do the properties it contains depending on the procedure used to produce the results.
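
To show that this layout is easy to consume, here is a minimal Python sketch that reads the positions, property names and data rows from a trimmed copy of the simplified example. It is not DIGGS code: the element names are the simplified ones used above, the `xmlns:gml` declaration (the GML 3.2 namespace) is added here as an assumption so that the fragment parses on its own, and the data rows are listed in the same order as the positions.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml/3.2"  # GML 3.2 namespace (assumed for this sketch)

# Trimmed copy of the simplified example, with the namespace declared.
doc = f"""
<Observation xmlns:gml="{GML}" gml:id="Obs1">
  <location>
    <MultiPointLocation srsName="#lrs" srsDimension="1" gml:id="MP001">
      <gml:posList>1.00 1.01</gml:posList>
    </MultiPointLocation>
  </location>
  <result>
    <parameters>
      <ResultProperties gml:id="rp1">
        <property>qc</property>
        <property>fs</property>
        <property>Rf</property>
        <property>u</property>
      </ResultProperties>
    </parameters>
    <data>
      0.3400,0.40,0.0040,0.0182
      0.3367,0.40,0.0030,0.0184
    </data>
  </result>
</Observation>
"""

root = ET.fromstring(doc)

# Domain: one depth per position in the posList.
depths = [float(v) for v in root.find(f".//{{{GML}}}posList").text.split()]
# Range description: the ordered list of observed properties.
names = [p.text for p in root.iter("property")]
# Range values: one comma-separated row per position.
rows = [[float(v) for v in line.split(",")]
        for line in root.find("result/data").text.split()]

# Basic consistency checks between domain, properties and data.
assert len(rows) == len(depths)
assert all(len(r) == len(names) for r in rows)

for depth, row in zip(depths, rows):
    print(depth, dict(zip(names, row)))
```

With several hundred depths, the parsing code stays the same; only the lists get longer.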


@dponti
Collaborator

dponti commented Mar 31, 2022

The post above did not originally contain the formatting to preserve the XML tags. Now corrected, so if you receive this in email please view on Github directly.

@dponti
Collaborator

dponti commented Apr 4, 2022

Attached are several xml examples of how DIGGS encodes results from

  1. CPT,
  2. in-situ pore pressure dissipation test (using a CPT rig),
  3. liquid limit, plastic limit, plasticity index (lab Atterberg test) and
  4. lab undrained shear strength from a triaxial test.

The file PorePressureDissipation.xml is a complete DIGGS instance, including information about the Sounding (which is a specialization of a DIGGS SamplingFeature - essentially equivalent to O&M SpatialSample). The file has two measurement properties that each contain a Test object, one for the cone measurements during insertion of the CPT probe, the other for a pore dissipation test within the same sounding that occurs at a depth of 32.48 m. The other two files just have the xml for the Observation.

I think this selection of observations is representative of the types of issues that come up when encoding geotechnical observations.

DIGGS has two Observation types - an observation for human-derived category-based results, and a measurement for instrument derived, largely numeric results. Observations are structured differently than measurements for a few reasons (not all of them necessarily good ones) and I won't show those here, but I think it's possible to structure category-based observations similarly to numeric ones. This should be a separate topic.

Within our measurement type, we have defined 3 specializations:

  1. Test - for results where the values relate to the location on the feature of interest and where those values either do not vary temporally, or where the temporal variation of the property is not relevant to the project's purpose.
  2. Monitor - for results obtained at a given location but where the value of the result property varies with time and it is the temporal variation that is important to record
  3. MaterialTest - where the result value is relevant only to the material tested, and does not relate to the property of a geographic feature of interest. This type of observation is used on "manufactured" samples such as grout or fills.

These measurement types are all structured similarly:

  1. They have some basic metadata like the observer (incorporated in the Role and/or Equipment objects), references to the sampling feature and/or sample observed and time of the test
  2. They have an outcome property, which is derived from a GML coverage. It includes the domain (location info or time, depending on the measurement type), the list of properties observed, and the value.
  3. A procedure object that contains both metadata about the test procedure, and any "intermediate results" or sub-observations relevant to producing the results contained in the outcome property. The procedure object's structure varies dependent on the procedure selected. This is not ideal, and certainly can be better organized in many cases, but this appeared to be the most expedient thing to do at the time. The variability in the approaches used for testing and the process steps involved makes this "procedure" part difficult to standardize. Especially as the "intermediate results" parts of some procedures are observations in their own right but are placed within the procedure object rather than the measurement object because they are tightly bound to the procedure only and don't directly relate to a feature of interest. How we can get the procedure pieces to fit within the standard O&M structure while keeping all of the stuff together that needs to be kept together that leads from "procedure observations" to the final observation results will take some thinking.

For the files attached:

The CPT observation (Test) produces several result properties distributed down the sounding (multipoint locations). The procedure object for CPT (called StaticConePenetrationTest) contains metadata only.

The pore pressure dissipation example is actually a Test, because the results of the measurement are relevant to a location and themselves are not time varying, but the PorePressureDissipationTest procedure is a monitoring activity, and so the "intermediate" test results of this observation are reported in a similar structure to the Monitor measurement.

The Atterberg example produces several results from a fairly simple lab test with a series of "sub-observations" in the AtterbergLimitsTest procedure object

The triaxial example produces only a single "result", but the procedure object is a complex set of both metadata and a sequence of observations from the saturation, consolidation, and shear phases of the test procedure.

Note all lab test procedures (here Atterberg and triaxial) carry specimen properties that hold info about the specimens used for testing.

Bottom line - the basic structure of these measurement types fit within O&M fine as long as, as I mentioned here, more than one result at more than one location (or time instant) can be assigned to one observation. But I think there are some questions about how we fit the intermediate "observations" within the O&M schema in a way that doesn't get incredibly difficult to parse and keep together as a cogent whole. I'll leave it at that. Please feel free to post comments/questions here and I'll try to address them and/or we can discuss further at our meetings.
DIGGSExamples.zip

@mbeaufils
Collaborator Author

mbeaufils commented Apr 4, 2022

Thanks @dponti for those examples. Very helpful. You mentioned a CPT example. I found one in the PorePressureDissipation.xml file. Did you put them together deliberately, as an example of a multi-procedure test?

Bottom line - the basic structure of these measurement types fit within O&M fine as long as, as I mentioned #38 (comment), more than one result at more than one location (or time instant) can be assigned to one observation.

This is indeed the topic of the discussion #36. Finding the right balance between some complex observations that may address several properties (ie. complex result) vs (very numerous) simple observations.

@dponti
Collaborator

dponti commented Apr 4, 2022

Yes, the PorePressureDissipation.xml file has both an example of an observation (Test) from the probe insertion itself (the test with the StaticConePenetrationTest procedure object), and another observation (Test) with a PorePressureDissipationTest at a specific depth. We consider the two as separate observations - not a multiprocedure test, although DIGGS does allow more than one procedure to be associated with an observation. Both tests occur in the same Sounding.

For practical encoding purposes, I think we will need to allow for complex observations - eg. multiple result properties at multiple locations (or time instants) where the procedure (including the metadata and values of any intermediate "sub-observations") is the same for all of these simple observations. I think we need to delve into O&M's definition of a procedure a bit more. If it is limited to just a process - eg. a recipe for how an observation is made, as opposed to both the recipe and the (observation-specific) metadata and intermediate values and results that are used in the procedure to end up with the observation results - then we need to think about how all of that info (eg. metadata and intermediate results) fits in. If we make all of that separate observations within an observation collection, this is problematic because you lose the sequencing of intermediate results and the fact that intermediate observation results aren't at the same hierarchical level - eg. they are not results that are estimates of the properties of the feature of interest.

@neilchadwick-dg
Collaborator

Here are some extracts from the AGS format data dictionary - for some data that all geo engineers will be familiar with. I have annotated to identify what 'category' I think the data belongs to. These are arbitrary categories I have come up with, but I offer this as a starting point. We can remap however we want.

I have not got around to sorting out some real data yet, but I can do so (not sure it helps though). However, please remember that AGS is 'real' - it is used day to day here in the UK - especially the parts shown here. It works!

Apologies - it is just an excel file for now so I guess you will have to download it. Happy to run through it at the meeting.

I've not thought through the implications of all this yet, but will try to do so now.

Example data structure from AGS.xlsx

@mbeaufils
Collaborator Author

Thanks @neilchadwick-dg

please remember that AGS is 'real' - it is used day to day here in the UK - especially the parts shown here. It works!

Not sure it will be relevant for our purpose then. As you know we are only chasing unicorns and data models that cannot be implemented.

Just kidding ;-). See you at the meeting!

@neilchadwick-dg
Collaborator

@dponti , right now I have just skimmed your posts - I will try to study further, but based on my quick read I think we are broadly aligned (as usual).

It is pure coincidence that I chose most of the same examples as you! I was going to include CPT dissipation tests but I was running out of time.

To me the following are a must:

  • a test can yield many results, not just the one (no point in us continuing if we are stuck on this)
  • test metadata must be provided once, and once only
  • these tricky 'intermediate' results must be addressed

I've not really covered the intermediate results problem, but I agree that we need to. There are some examples in AGS, and as I said last meeting, AGS is looking to incorporate those we don't yet include in a future enhancement (work has already started).

If O&M can't handle this sort of stuff, then that is O&M's problem, not ours. Our data structures are NOT complex, and certainly not unique. I'm sure they are repeated in many situations across many domains. However, maybe we are the first to 'challenge' O&M like this.

I would prefer we try to figure out what structure we need to cover our data. Then maybe consider how it meshes with O&M. I fear the tail is wagging the dog a bit here (hope that metaphor travels well around the world!)

@dponti
Collaborator

dponti commented Apr 7, 2022

To complete the loop for comparison, I've attached a DIGGS example for an SPT test. It has a single test result (raw N-value). The procedure object attached to the Test is called DrivenPenetrationTest - a more generic structure to incorporate the range of tests that pound a sampler into the ground and where the observed property is related to the number of blows required to reach some penetration distance. The procedure has a property (penetrationTestType) where the specific type of test is identified.

As opposed to the CPT test, where the results relate to a specific depth in the sounding (eg. the locations of each measurement are points (MultiPointLocation object) described in one dimension as a depth down the sounding), the location of the SPT test result is contained in an object called LinearExtent, the depth interval in the borehole where the SPT test occurs, in the case of the example, from 1 to 2.5 feet.

@mbeaufils asked me at our meeting today why, in the case of the CPT test, we separated the depth information from the data results. The reason for keeping the "domain" information separate from the results is so we can substitute in different location geometries that are suitable for the samplingFeature geometry being "observed" and for the test procedure used. For one-dimensional sampling features (or spatial samples), observations located within them can be either points or intervals (line segments). Within the linear referenced system of the 1-D sampling feature, a point will have a single value (eg. the depth) and an interval will have two values (the from and to depth, or the top and bottom depth). If we have an observation in a 2-dimensional sampling feature, its location can be defined by a point, a line string or a polygon. In the linear referenced system of a 2-D sampling feature, the coordinates of each of these geometries would be 2D, eg. a point would be described by a tuple (x,y), a line would be a list of x,y tuples and a polygon would be a list of x,y tuples where the starting point and ending point are the same. Having a location property in the measurement result that can contain different types of geometries allows us to use the same structure for any kind of Test, regardless of the procedure used or the geometry of the observed location.
SPTExample.xml.zip
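
The idea of a substitutable location can be sketched in Python as follows. This is a rough illustration, not DIGGS code: `LinearExtent` borrows the DIGGS object name mentioned above, while `PointLocation`, `Polygon2D` and `Result` are hypothetical names introduced only for this sketch. No SPT N-value is shown because none is quoted in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class PointLocation:
    """1-D point: a single depth in the LRS of the sampling feature."""
    depth: float


@dataclass(frozen=True)
class LinearExtent:
    """1-D interval: top and bottom depth in the LRS of the sampling feature."""
    top: float
    bottom: float

    def __post_init__(self):
        if self.bottom < self.top:
            raise ValueError("bottom depth must not be above top depth")


@dataclass(frozen=True)
class Polygon2D:
    """2-D polygon: a list of (x, y) tuples whose first and last points coincide."""
    ring: Tuple[Tuple[float, float], ...]

    def __post_init__(self):
        if len(self.ring) < 4 or self.ring[0] != self.ring[-1]:
            raise ValueError("polygon ring must be closed")


Location = Union[PointLocation, LinearExtent, Polygon2D]


@dataclass
class Result:
    """Same container regardless of which geometry describes the location."""
    location: Location
    values: Dict[str, float] = field(default_factory=dict)


# CPT reading at a point (values from the CPT example earlier in this thread).
cpt = Result(PointLocation(1.00), {"qc": 0.3400, "fs": 0.40})
# SPT interval from 1 to 2.5 feet; the result value itself is left empty here.
spt = Result(LinearExtent(1.0, 2.5))

print(type(cpt.location).__name__, type(spt.location).__name__)
```

Because the geometry is a separate object, the `Result` container does not change when a different kind of test or sampling feature is used.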

@dponti
Collaborator

dponti commented Apr 7, 2022

@neilchadwick-dg:
a test can yield many results, not just the one (no point in us continuing if we are stuck on this)
Totally agree, and to accommodate this by having to define separate observation objects for each location-observed property result pair would be implementation hell. However, we can conceptually think of each location-property result pair as "simple" observations and the grouping of these simple observations as an observation collection that could be associated with a single time instance (or time interval) and a single procedure. So in that sense, DIGGS' Test object and the various test tables in AGS might (I think) map into O&M's observation collection. If not cleanly, we can probably create a geotechnical specialization of observation/observation collection to do this.
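
As a rough sketch of that conceptual mapping, the following Python expands one complex result into its "simple" observations, all pointing at the same procedure. The names (`SimpleObservation`, `expand`) are hypothetical and belong to neither DIGGS, AGS nor O&M; the values are the two CPT depths used earlier in this thread.

```python
from typing import Iterator, List, NamedTuple, Sequence


class SimpleObservation(NamedTuple):
    """One location / observed property / value triple, tied to a shared procedure."""
    procedure_id: str
    location: float
    observed_property: str
    value: float


def expand(procedure_id: str,
           locations: Sequence[float],
           properties: Sequence[str],
           rows: Sequence[Sequence[float]]) -> Iterator[SimpleObservation]:
    """Yield one simple observation per (location, property) pair."""
    for location, row in zip(locations, rows):
        for prop, value in zip(properties, row):
            yield SimpleObservation(procedure_id, location, prop, value)


rows: List[List[float]] = [
    [0.3400, 0.40, 0.0040, 0.0182],   # values at 1.00 m
    [0.3367, 0.40, 0.0030, 0.0184],   # values at 1.01 m
]
collection = list(expand("CPT-proc1", [1.00, 1.01], ["qc", "fs", "Rf", "u"], rows))

print(len(collection))   # 2 locations x 4 properties = 8 simple observations
print(collection[0])
```

The procedure is referenced, not copied, by every simple observation, so the metadata still only needs to be stated once; the expansion is a view for conceptual purposes rather than a proposed encoding.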

test metadata must be provided once, and once only
these tricky 'intermediate' results must be addressed

Yup. I'm not sure where metadata fits into O&M, other than each object essentially having a metadata property. Or even more importantly what exactly constitutes what type of metadata. For example, is the cone tip surface area of a CPT test metadata for the observation, or metadata for the equipment used (eg. the cone) in the observation? I would probably argue the latter, but DIGGS has that property lumped into properties of the Test procedure (so does AGS). Ugh.

Perhaps this should be moved to the ObservingProcedure issue #34 . @mbeaufils mentioned at the meeting today that O&M's procedure can have anything in it. If that's the case, good, and I would argue that this object should be able to contain properties pertaining to the procedure specification (eg the recipe for the procedure), any metadata relevant to the procedure, as well as "intermediate results" - or sub-observations. Within these, however, we probably should try to define what types of data logically and properly should be contained in our geotech specialization of O&M's Procedure.

As a start, DIGGS has two abstract types of Procedure objects that both inherit from a "super" abstract class (AbstractTestProcedure). The two types are AbstractInSituTestProcedure and AbstractLaboratoryTestProcedure. Our StaticConePenetrationTest procedure object (eg. CPT) inherits from the in-situ class while a test procedure like AtterbergLimitsTest inherits from the laboratory class. I've attached a schema diagram for AbstractLaboratoryTestProcedure so you can see what properties (elements in lower case) are part of that class. All of the elements within the grey boxes are inherited from AbstractTestProcedure whereas the last two elements are unique to AbstractLaboratoryTestProcedure. AbstractInSituTestProcedure looks exactly the same, only it does not contain the specimen property.

A quick overview:

  1. A procedure can have a name, generic description, references to external files (like a report, photo or graphic), people or entities associated with the procedure (role), and general remarks.
  2. testProcedureMethod holds a Specification object, which describes the test process (the recipe). It could simply carry the name of a published standard (eg. ASTM test), or describe with standard parts and clauses, a test Specification.
  3. testProcedureEquipment contains an Equipment object of some type, with metadata properties about the equipment used for the observation.
  4. testingEnvironment contains metadata about external environmental factors observed during the observation (eg. temperature, weather, etc.)
  5. otherTestProperty - is a "catch-all" to record information that otherwise is not specified in the procedure, as a Parameter object that holds name-value pair properties.
  6. testEvent, records events that may occur and are observed during the time the procedure is being performed. The structure(s) of the event objects contained within this property are slightly different depending on whether it is a laboratory test or an insitu test
  7. specimen - holds Specimen objects that contain properties describing the specimen(s) used in the test procedure (applies to a laboratory test only).
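
The inheritance described above can be sketched in Python roughly as follows. The class names follow the DIGGS names given in this comment, but the field types are simplified placeholders (plain strings, lists and dicts) chosen for this sketch, and `penetrometer_type` is taken from the simplified CPT example earlier in the thread, not from the DIGGS schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AbstractTestProcedure:
    """Properties shared by all procedures (abstract by convention only in this sketch)."""
    name: Optional[str] = None
    description: Optional[str] = None
    test_procedure_method: Optional[str] = None            # the Specification (the recipe)
    test_procedure_equipment: List[str] = field(default_factory=list)
    testing_environment: Dict[str, str] = field(default_factory=dict)
    other_test_property: Dict[str, str] = field(default_factory=dict)  # name-value pairs
    test_event: List[str] = field(default_factory=list)


@dataclass
class AbstractInSituTestProcedure(AbstractTestProcedure):
    """In-situ procedures: same properties, no specimen."""


@dataclass
class AbstractLaboratoryTestProcedure(AbstractTestProcedure):
    """Laboratory procedures add the specimen property."""
    specimen: List[str] = field(default_factory=list)


@dataclass
class StaticConePenetrationTest(AbstractInSituTestProcedure):
    penetrometer_type: Optional[str] = None   # procedure-specific property (assumed)


@dataclass
class AtterbergLimitsTest(AbstractLaboratoryTestProcedure):
    """Procedure-specific properties and intermediate results would be added here."""


cpt = StaticConePenetrationTest(name="CPT", penetrometer_type="piezocone")
lab = AtterbergLimitsTest(name="Atterberg limits", specimen=["specimen-1"])

print(hasattr(cpt, "specimen"), hasattr(lab, "specimen"))
```

Only the laboratory branch carries `specimen`; everything else is inherited from the common base.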

After that, any specialized procedure can have any other properties added to it, and in DIGGS these are wildly different depending on the procedure. It is in this part of the procedure where "intermediate results" would go, along with additional metadata (some of which in our case probably should go into the equipment, specification, or other test property sections, etc.). The intermediate results or sub-observations can be reported as single elements (properties) or in nested objects for repeated measurements, trials or monitoring activities - similar to AGS' use of child tables within a test.

For a geotech implementation of a procedure, we should look at ways to perhaps standardize how such results would be encoded, if we think that intermediate results should be bound to the procedure object and not left as separate observation results.
Kernel_AbstractLaboratoryTestProcedure

@Didymograptus

The important thing is that the test/observation/result is reported in compliance with the relevant standard/best practice. Any implementation needs to be flexible enough to cope with the variety of requirements across different continents/countries/standards. IMHO we should be providing the framework, not the solution.

@mbeaufils
Collaborator Author

Following the meeting of April 14th, please find attached the proposed typology for observation.
image

Also proposed is a matrix for choosing how to declare your observation, based on two criteria.
image
