Explore relationship between templates and RDF Shapes/ShEx #51

Open
cmungall opened this issue Nov 12, 2019 · 15 comments

@cmungall
Collaborator

There are similarities and differences in semantics and use cases between templates (dosdps, robot, ottr) and shapes (shex, shacl).

We should explore these and formalize the linkages, and possibly even explore if there is a possible subsuming framework.

Some background: This is being driven in part by the go-shapes schema, which is used to validate GO-CAMs but is increasingly becoming a general source of all truth about GO. Originally we had shapes only for obo-core level classes such as BiologicalProcess and CellComponent. But we are seeing the need for deeper subclasses, e.g. a transport subclass that we can parameterize with start-location and end-location.

This is obviously partly duplicative with the dosdp templates for go. This is not super-satisfying. Aside from duplication of effort, the worst effect is duplication of mindshare and confusion over not having one source of truth.

A current very rough proposal:

  • have a convention for annotating shex with the information needed to put it on par with dosdps. Call this t-shex
    • shex is very nice for annotating any part of a shape with annotations
    • we can imagine annotating the range constraint with a variable name and the shape with a generator string
  • write a t-shex to dosdp (or robot template header) converter
    • note the shex would be abox-based, but it would be trivial to generalize to a defining tbox expression
    • OR adapt dosdp-tools to go from t-shex. This gets around a whole bunch of issues such as optional variables
  • gradually migrate patterns to t-shex

E.g.

<Transport> <BiologicalProcess> AND EXTRA a {
  has-start-location: <CellComponent> // dosdp:var "start"
  has-end-location: <CellComponent> // dosdp:var "end"
} // rdfs:comment "this is for transport"
     dosdp:labelGen "transport [from {{start}}] [to {{end}}]"
     dosdp:textdefGen "..."

no need for an equiv axiom generator: all the information is in the abox pattern

You could feed this either tuples (with optional fillers) or actual subgraphs, in order to do class generation
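A minimal sketch of the tuple-driven case, assuming the bracketed segments in the labelGen string above mean "omit this segment when its variable has no filler". The function name `render_label` and the filler values are hypothetical and not part of dosdp-tools or any ShEx tooling:

```python
import re


def render_label(template: str, fillers: dict) -> str:
    """Render a labelGen-style string with optional [...] segments.

    A bracketed segment is dropped when any {{var}} inside it has no filler.
    (Assumed semantics; the issue does not define the bracket syntax.)
    """

    def fill(text: str) -> str:
        # Substitute each {{var}}; raises KeyError when a filler is missing.
        return re.sub(r"\{\{(\w+)\}\}", lambda m: fillers[m.group(1)], text)

    def optional(match: re.Match) -> str:
        try:
            return fill(match.group(1))
        except KeyError:
            return ""  # drop the whole optional segment

    rendered = re.sub(r"\[([^\]]*)\]", optional, template)
    rendered = fill(rendered)  # variables outside brackets are required
    return " ".join(rendered.split())  # collapse whitespace left by dropped segments


LABEL_GEN = "transport [from {{start}}] [to {{end}}]"
print(render_label(LABEL_GEN, {"start": "nucleus", "end": "cytosol"}))
print(render_label(LABEL_GEN, {"end": "cytosol"}))
```

Feeding actual subgraphs instead of tuples would need a shape-matching step first; that is not attempted here.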

I am also assuming that in the future there will be many tools for doing things like driving form interfaces from shex/shacl (which are partly interconvertible)

I think there are many advantages to doing this for GO. We are becoming more abox-based. A lot of the standard tooling in ShEx is really nice, and it's a widely adopted standard.

This could just be creating busy work for other uses of dosdps, e.g. they have been phenomenally successful for phenotype reconciliation.

The counterpoint to all of this is skepticism about finding the One True Framework to bind them all (biolinkml?)

See Also

cc

@vanaukenk @dosumis @matentzn @balhoff @goodb @ukemi @jamesaoverton @beckyjackson

@matentzn
Collaborator

We will make this the topic of our next ODK call. I must admit that I lack the background to really understand what you are proposing here, but I generally want to start using shapes for the phenotype reconciliation effort soon, so it makes sense to coordinate with GO and DOSDP.

@dosumis
Collaborator

dosumis commented Nov 13, 2019

Makes sense. This was, of course, one of the motivating use-cases for DOSDPs in the first place - see instance_graph spec on DOSDP-schema.

@cmungall
Collaborator Author

cmungall commented Nov 13, 2019 via email

@dosumis
Collaborator

dosumis commented Nov 13, 2019

May only be of historical interest, but spec here:

https://github.com/INCATools/dead_simple_owl_design_patterns/blob/master/spec/DOSDP_schema_full.yaml#L411

& here:

https://github.com/INCATools/dead_simple_owl_design_patterns/blob/master/spec/DOSDP_schema_full.yaml#L153

@balhoff - did you ever get around to writing code for this? I think we discussed it at the time.

@wdduncan

This is quite interesting. I'm a little lost on details. I think you are proposing t-shex to be the ground truth ... right?
That is, dosdp would be transformed to t-shex. Or is it the other way round: t-shex would be transformed to dosdp?

@cmungall
Collaborator Author

I think you are proposing t-shex to be the ground truth ... right?

Correct

t-shex would be transformed to dosdp?

Correct

(of course there may be a bootstrapping and synchronization step where we iterate with the reverse)

And to be clear "t-shex" is nothing more than standard shex with some conventions as to how it is annotated (hmm, can we model that in shex itself, that's the kind of meta question @hsolbrig loves)

@wdduncan

Ok. So you are proposing to use t-shex to generate data by translating the t-shex into dosdp, and then the dosdp to OWL/RDF?

@cmungall
Collaborator Author

cmungall commented Nov 14, 2019 via email

@dosumis
Collaborator

dosumis commented Nov 15, 2019

I think this approach is fine if you're willing to limit design pattern expressivity: patterns that are entirely EquivalentClass axioms with no nested class expressions. The one case where I think this would be a loss for GO is GCIs used to align branches, e.g. I still think patterns with GCIs are the best way to align CC organization/assembly/disassembly in BP with the CC hierarchy. IIRC, I even wrote patterns for this.

@dosumis
Collaborator

dosumis commented Nov 15, 2019

Think this approach has the advantage that it should be reasonably transparent to those used to building GO-CAM models, in a way that perhaps DOSDPs have failed to be. OTOH - isn't there a danger that it will result in unsafe patterns: ones that apply to some broad subset of cases but cause misclassification outside of these? To prevent this I think you'd still need a strong editorial step between deriving DOSDPs from ShEx patterns and implementing them in the ontology.

@cmungall
Collaborator Author

Do you still have those GCI examples? I don't see them in the current ones: https://github.com/geneontology/go-ontology/blob/master/src/design_patterns/cc_disassembly.yaml

My so far vague thoughts are that we can always bring across any aspect of dosdps into t-shex annotations, and just treat it as an alternate syntax for dosdps.

But this isn't ideal if we want to embrace the abox shape as being the 'source of truth': we end up mixing the two in a slightly redundant way

I think the GCIs might be expressible in a more abox-centric way that can then be autogeneralized to tboxes, but this remains to be determined.

isn't there a danger that it will result in unsafe patterns

would this be in the tbox generalization step? Quite possibly; I need to think of some examples.

@dosumis
Collaborator

dosumis commented Dec 4, 2019

@cmungall
Collaborator Author

Another possibility here is to build this in to biolinkml yaml, cc @hsolbrig

https://github.com/biolink/biolinkml (note: completely independent of biolink itself)

A related ticket: https://github.com/biolink/biolinkml/issues/128

classes:
  transport:
    is_a: biological process
    slots:
      - start location
      - end location
    templates:
      name:
        as string value: "transport from {start location} to {end location}"
      definition:
        as string value: "...."
...

with equivalence/logdef pattern inferred automatically
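As a rough illustration of what "inferred automatically" could mean, the sketch below fills the name template and builds a genus-differentia expression from the slots. The `instantiate` function, the dict layout, and the choice of an existential restriction per slot rendered in Manchester-like syntax are all assumptions; this is not biolinkml's actual API or output:

```python
import re

# Hypothetical in-memory form of the class definition above.
TRANSPORT = {
    "is_a": "biological process",
    "slots": ["start location", "end location"],
    "templates": {"name": "transport from {start location} to {end location}"},
}


def instantiate(class_def: dict, fillers: dict) -> dict:
    """Generate a name and a candidate equivalence expression for one set of fillers."""
    missing = [slot for slot in class_def["slots"] if slot not in fillers]
    if missing:
        raise ValueError(f"no filler for slots: {missing}")

    # Fill {slot name} placeholders; slot names may contain spaces.
    name = re.sub(
        r"\{([^{}]+)\}",
        lambda m: fillers[m.group(1)],
        class_def["templates"]["name"],
    )

    # Assumed pattern: parent class AND (slot some filler) for every slot.
    parts = [f"'{class_def['is_a']}'"]
    parts += [f"('{slot}' some '{fillers[slot]}')" for slot in class_def["slots"]]
    return {"name": name, "equivalent_to": " and ".join(parts)}


result = instantiate(TRANSPORT, {"start location": "nucleus", "end location": "cytosol"})
print(result["name"])
print(result["equivalent_to"])
```

This version requires every slot to be filled; optional slots would need the template to mark which segments can be dropped.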

for GCIs, how about just specifying these directly as abox rules and inferring a SPARQL update?

e.g.

?cp results-in-org-of ?c1, ?c1 part-of ?c
->
exists: ?p
?cp part-of ?p
?p a :organization, ?p has-input ?c

there is a deterministic translation of this structure to an ugly SPARQL tbox update command
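To make the idea concrete, here is a sketch of the simpler abox-level translation only: the rule body becomes the WHERE clause, the head becomes the INSERT template, and the existential variable becomes a blank node (SPARQL 1.1 Update creates a fresh blank node per solution). The function name, the tuple encoding of triples, and the `http://example.org/` prefix are assumptions. The tbox-level translation mentioned above is not attempted:

```python
# Rule from the comment above, encoded as (subject, predicate, object) tuples.
BODY = [("?cp", ":results-in-org-of", "?c1"), ("?c1", ":part-of", "?c")]
HEAD = [("?cp", ":part-of", "?p"), ("?p", "a", ":organization"), ("?p", ":has-input", "?c")]


def rule_to_sparql_update(body, head, existentials=(), prefix_iri="http://example.org/"):
    """Translate an abox rule into a SPARQL 1.1 INSERT ... WHERE string."""
    existentials = set(existentials)

    def term(t, in_head):
        # Existential variables only occur in the head; emit them as blank nodes.
        if in_head and t in existentials:
            return "_:" + t[1:]
        return t

    def block(triples, in_head):
        return "\n".join(
            "  " + " ".join(term(t, in_head) for t in triple) + " ."
            for triple in triples
        )

    return (
        f"PREFIX : <{prefix_iri}>\n"
        "INSERT {\n" + block(head, True) + "\n}\n"
        "WHERE {\n" + block(body, False) + "\n}"
    )


print(rule_to_sparql_update(BODY, HEAD, existentials={"?p"}))
```

Note that this inserts a new node for every match, even when a suitable organization process already exists; a real implementation would probably need a FILTER NOT EXISTS guard.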

@cmungall
Collaborator Author

cmungall commented Jul 9, 2020

Thinking more about using the abox representation as primary (and using something like uml or biolinkml or shex) with derivations of tbox equiv axioms, @matentzn posed the question of what to do about complex patterns where the desired tbox expression employs nesting.

I would do this through simple composition of standard class definitions

e.g. for the subq case, we may have

classes:
  phenotype:
    slot_usage:
      has part:
        range: atomic phenotype
    to_str: "{atomic phenotype}"
  atomic phenotype:
    slots: [inheres in, type, qualifier]
  morphology phenotype:
    is_a: atomic phenotype
    slot_usage:
      type:
        range: morphology class
      inheres in:
        range: anatomical structure
    to_str: "{inheres in} morphology"
  abnormal morphology phenotype:
    is_a: morphology phenotype
    slot_usage:
      qualifier:
        range: abnormal class
    to_str: "abnormal {inheres in} morphology"
etc

this constrains the shape of aboxes and gives string gen/parse. E.g. "morphology of patient123's left femur".

the shape of tboxes follows directly from this, together with patterns for equivalence axioms, no need for writing owl in macros.
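A small sketch of the string gen/parse part, assuming `to_str` is a class-level key that subclasses inherit through `is_a` unless they override it. The dict layout and the function names `resolve_to_str`, `generate`, and `parse` are hypothetical, not biolinkml features:

```python
import re

# Hypothetical in-memory form of part of the schema above.
CLASSES = {
    "atomic phenotype": {"slots": ["inheres in", "type", "qualifier"]},
    "morphology phenotype": {
        "is_a": "atomic phenotype",
        "to_str": "{inheres in} morphology",
    },
    "abnormal morphology phenotype": {
        "is_a": "morphology phenotype",
        "to_str": "abnormal {inheres in} morphology",
    },
}


def resolve_to_str(classes: dict, name: str) -> str:
    """Walk up the is_a chain until a to_str template is found."""
    while name is not None:
        cls = classes[name]
        if "to_str" in cls:
            return cls["to_str"]
        name = cls.get("is_a")
    raise LookupError("no to_str template on this class or its ancestors")


def generate(classes: dict, name: str, fillers: dict) -> str:
    """Fill the class's template with slot values."""
    template = resolve_to_str(classes, name)
    return re.sub(r"\{([^{}]+)\}", lambda m: fillers[m.group(1)], template)


def parse(classes: dict, name: str, text: str):
    """Invert generate(): return slot values, or None if the text does not match."""
    template = resolve_to_str(classes, name)
    # re.split with a capturing group yields literal, slot, literal, slot, ...
    pieces = re.split(r"\{([^{}]+)\}", template)
    slots = pieces[1::2]
    pattern = "".join(
        re.escape(piece) if i % 2 == 0 else "(.+)" for i, piece in enumerate(pieces)
    )
    match = re.fullmatch(pattern, text)
    return dict(zip(slots, match.groups())) if match else None


print(generate(CLASSES, "abnormal morphology phenotype", {"inheres in": "left femur"}))
print(parse(CLASSES, "morphology phenotype", "left femur morphology"))
```

Parsing this way is ambiguous when a template has adjacent placeholders or when fillers contain the literal text; a real implementation would likely match fillers against ontology labels instead of `(.+)`.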

@cmungall
Collaborator Author

Here is an example of using biolinkml as a template language for a chemical ontology: https://github.com/cmungall/chemistry-ontology


4 participants