How should we standardise the use of database identifiers in OWL Axioms? #1981

matentzn · 2022-06-24T11:26:33Z

matentzn
Jun 24, 2022
Maintainer

We constantly have problems with using identifiers like HGNC, ENSEMBL and others in OWL because they do not have PURLs in the Semantic Web sense. So we get people using them in 50 different ways.

https://bioregistry.io/registry/hgnc lists at least four expansions, and some are even missing:

Mondo and others use the "old identifiers.org way":

https://identifiers.org/hgnc/16793 (Biolink preferred way)

I am sure http variants exist for all of these.

So people write axioms like (simplified here for illustration):

http://purl.obolibrary.org/obo/MONDO_123 RO:has_basis_in_dysfunction_of https://identifiers.org/hgnc/16793 or whatever other scheme above.

This clearly does not interoperate.
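To make the mismatch concrete, here is a minimal Python sketch. The prefix list is assembled only from URL patterns mentioned in this thread and is not exhaustive; the helper name `to_curie` and the choice of `hgnc:` as the target CURIE prefix are assumptions made for illustration.

```python
from typing import Optional

# URI prefixes for HGNC that appear in this thread (illustrative, not exhaustive).
HGNC_URI_PREFIXES = [
    "https://identifiers.org/hgnc/",
    "http://identifiers.org/hgnc/",
    "https://identifiers.org/hgnc:",
    "http://identifiers.org/hgnc:",
    "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/",
]


def to_curie(iri: str) -> Optional[str]:
    """Contract an IRI to an ``hgnc:<id>`` CURIE if it uses a known prefix."""
    for prefix in HGNC_URI_PREFIXES:
        if iri.startswith(prefix):
            return "hgnc:" + iri[len(prefix):]
    return None  # unknown expansion: nothing to contract


a = "https://identifiers.org/hgnc/16793"
b = "http://identifiers.org/hgnc:16793"

print(a == b)                      # False: the raw IRIs differ as strings
print(to_curie(a) == to_curie(b))  # True: both contract to hgnc:16793
```

An OWL reasoner compares IRIs exactly as the first `print` does, so two axioms that point at the same gene through different expansions are treated as referring to two unrelated entities.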

How should we deal with this heterogeneity for database identifiers without a formal PURL system?

Create an OBO version like we did with ChEBI and NCBITaxon?
Lobby the database owners to provide a proper PURL system?
Pick one of the existing expansions and declare it at bioregistry level as the "canonical" expansion?

Opinions welcome!

cthoyt · 2022-06-24T11:36:54Z

cthoyt
Jun 24, 2022
Collaborator

As you know, I'm not a huge supporter of the semantic web stuff because of this URI debacle.

How should we deal with this heterogeneity for database identifiers without a formal PURL system?

Create an OBO version like we did with ChEBI and NCBITaxon?

Yes, let's start making OBO dumps of everything. For example, HGNC is ready at https://github.com/pyobo/examples/tree/main/export/hgnc/2022-06-01, which lists its OBO, OWL, and OBO Graph JSON dumps. The same repo contains several other dumps.

Lobby the database owners to provide a proper PURL system?

There are always going to be discussions about which URIs have "ontological commitments", so why not just use the OBO PURL system to denote a URI for those commitments, so that people curating in the OBO world can annotate stuff to them? The situation is already bad enough that one more URI isn't going to make it worse. Then people can just go about using CURIEs in their OBO files as they'd like, without any extra stuff necessary.

Pick one of the existing expansions and declare it at bioregistry level as the "canonical" expansion?

I don't think using the Bioregistry to declare what's canonical is going to scale. There will always be conflict between what's practical, what's useful, and what the OBO community wants to serve its needs (which itself varies from ontology to ontology).


dosumis · 2022-06-24T11:38:48Z

dosumis
Jun 24, 2022

@cmungall has whole slide decks on this issue. For now I think the most viable solution is to use identifiers.org. This gives us a standard that we can all refer to, IRIs that resolve, and doesn't force us to mint new PURLs that keep up to date with DB releases.

3 replies

cthoyt Jun 24, 2022
Collaborator

@cmungall has whole slide decks on this issue.

That slide deck is here: https://docs.google.com/presentation/d/1aySEHTgkags7UPJYHyvQ9frYvAIqr1G5A3u7dGF26Y4/edit?usp=sharing

For now I think the most viable solution is to use identifiers.org. This gives us a standard that we can all refer to, IRIs that resolve

I think a lot of us agree that Identifiers.org is having some major issues on both the technical and governance sides that make it hard to trust its reliability and sustainability. Further, it doesn't really give a standard, since there are four ways to write every CURIE with an Identifiers.org URI (http vs. https and colon vs. slash-delimited CURIEs), it is missing tons of important prefixes, it doesn't make fixes when requested anymore, and its resolution service is broken in several instances. If you're curious about specifics, you can browse their issue tracker, which has examples of all of these issues and has been effectively neglected by the Identifiers.org team for the last 3 years. The alternative for most of these issues is to use the Bioregistry, but still, the Bioregistry is not meant to be yet another URI minting system, either.
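As a small illustration of the "four ways" point, the following Python sketch enumerates the spellings for one identifier. The function name `identifiers_org_variants` is hypothetical, and the sketch only builds strings; it makes no claim about which of these forms currently resolve.

```python
from itertools import product
from typing import List


def identifiers_org_variants(prefix: str, local_id: str) -> List[str]:
    """Enumerate http/https crossed with slash/colon delimiters."""
    return [
        f"{scheme}://identifiers.org/{prefix}{delimiter}{local_id}"
        for scheme, delimiter in product(("http", "https"), ("/", ":"))
    ]


variants = identifiers_org_variants("hgnc", "16793")
for uri in variants:
    print(uri)
```

Any tool that consumes these URIs has to recognise all four strings as the same entity, or pick one and rewrite the rest.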

and doesn't force us to mint new PURLs that keep up to date with DB releases.

Could you please elaborate on this?

matentzn Jun 24, 2022
Maintainer Author

and doesn't force us to mint new PURLs that keep up to date with DB releases.

Could you please elaborate on this?

We would need to run a pipeline to keep the OWL/OBO file up to date.

cthoyt Sep 25, 2023
Collaborator

That pipeline is here: https://github.com/biopragmatics/obo-db-ingest

jamesaoverton · 2022-06-24T14:37:36Z

jamesaoverton
Jun 24, 2022
Maintainer

Chris' slides are a very good overview. I have a bunch of these cases too, that we've handled in different ways. OBO has a bunch of database-translation ontologies already.

It would be good to collect some strategies and best practices, and maybe aim for a handful of general approaches with good documentation, but I don't think there will ever be a one-size-fits-all solution here. In each case, a lot depends on what the database exposes, what URLs/IRIs are used in the wild, how willing they are to work with us, etc.

I don't know enough about how Wikidata does this sort of thing, but it's always good to look carefully at their approaches.


dosumis · 2022-06-24T14:43:52Z

dosumis
Jun 24, 2022

and doesn't force us to mint new PURLs that keep up to date with DB releases.

Could you please elaborate on this?

https://identifiers.org/hgnc:{$id} --> https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/{$id}

vs

Generating a new OBO OWL file regularly with all the latest IDs.

The latter will not always be viable and requires resources, but may sometimes be worthwhile if there is both demand and a suitable API to code against (as in HGNC). Note: there are already competing ontologised versions of HGNC built off the API; see https://bioportal.bioontology.org/ontologies/HGNC-NR/?p=classes&conceptid=http%3A%2F%2Fidentifiers.org%2Fhgnc%2F49667
We need some way to advertise these translations and let people know their status, to try to discourage competing artefacts.
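The first option above, expanding an identifier through a URL template, needs no per-release pipeline because the template is the only thing that has to be maintained. A minimal Python sketch, using the two templates quoted in this comment; the dictionary keys and the `expand` helper are hypothetical names chosen for illustration:

```python
# Templates copied from this comment; "{$id}" is the placeholder used there.
URL_TEMPLATES = {
    "identifiers.org": "https://identifiers.org/hgnc:{$id}",
    "genenames.org": "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/{$id}",
}


def expand(local_id: str, template: str) -> str:
    """Fill the {$id} placeholder of a URL template with a local identifier."""
    return template.replace("{$id}", local_id)


for name, template in URL_TEMPLATES.items():
    print(name, expand("16793", template))
```

A new identifier in the source database is covered as soon as it exists, whereas an OBO/OWL dump only knows the identifiers that were present when it was last generated.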


cmungall · 2022-06-27T14:47:25Z

cmungall
Jun 27, 2022
Maintainer

If we go with OBO-ized versions of these databases, there are some questions we should be able to answer consistently, at least for each source:

Partially-in (what Rector et al. call a "conceptual coat rack") vs. all-in; see slide 58

Partial is more likely to be successful, at least in the short term. What that means is less logical axiomatization (effectively curtailing "relationship" tags in the existing obo files that Charlie provides). While it is tempting to cram in as much knowledge as possible, everything about this is hard to standardize, from the RO object property to the OWL axiom interpretation. Even a decision to model as annotation assertions (logically silent) is a decision. The is-a would have been controversial in the past, but we are now all in agreement that SO is fine here, no need for MSO. Note there can still be an "open market" for axiomatizations but the OBO core should IMO be a conceptual coat rack, at least at first.

Ownership and governance; see slide 62

This is hard, but I think we have the best chance of success if we stick to coat racks and don't try to get too clever or insert new concepts. For example, when translating a gene database, we make one class per gene ID in the source. We don't make fake groupings that are not in the source; the translation is completely isomorphic, unless there is agreement from the authority to diverge. Again, there can be an open market on enhanced versions.
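A rough Python sketch of what such a coat-rack translation could look like: one `[Term]` stanza per source record, a single is-a parent, and nothing else. The records, their placeholder names, and the `interacts_with` field are invented for illustration and are not real HGNC data; using SO:0000704 (the Sequence Ontology term for gene) as the parent is an assumption based on the remark above that SO is fine for the is-a.

```python
from typing import Dict, List

# Hypothetical source records; names and the extra field are placeholders.
RECORDS: List[Dict[str, str]] = [
    {"id": "16793", "name": "placeholder gene A", "interacts_with": "49667"},
    {"id": "49667", "name": "placeholder gene B"},
]


def to_obo_stanzas(
    records: List[Dict[str, str]],
    prefix: str = "HGNC",
    parent: str = "SO:0000704",  # assumed parent class, see lead-in
) -> str:
    """Emit one [Term] stanza per record, isomorphic to the source IDs."""
    stanzas = []
    for record in records:
        lines = [
            "[Term]",
            f"id: {prefix}:{record['id']}",
            f"name: {record['name']}",
            f"is_a: {parent}",
        ]
        # Deliberately no "relationship:" tags: extra knowledge such as
        # interacts_with is left to enhanced, separately governed versions.
        stanzas.append("\n".join(lines))
    return "\n\n".join(stanzas)


obo_text = to_obo_stanzas(RECORDS)
print(obo_text)
```

The output has exactly as many classes as the source has records, which keeps the translation easy to regenerate and easy to check against the authority.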

Status within OBO; see slide 44

I think these should be in a special category, and there should be automatic exemption from certain checks. We should not force providers to game the system and provide a fake ontological definition for something for which an ontological definition makes no sense. I don't expect this to be controversial but I am often surprised when I think that.

Ontology browsers can make use of this status too so they don't try and make fake ontological displays that explode when I expand the concept "variant" and get a flat list of a billion variants.

