-
As you know, I'm not a huge supporter of the semantic web stuff because of this URI debacle.
Yes, let's start making OBO dumps of everything. For example, HGNC is ready at https://github.com/pyobo/examples/tree/main/export/hgnc/2022-06-01, which lists the OBO, OWL, and OBO Graph JSON dumps of it. In this repo you can see several other dumps.
There are always going to be discussions about which URIs have "ontological commitments", so why not just use the OBO PURL system to denote a URI for those commitments, so that people curating in the OBO world can annotate stuff to them? The situation is already bad enough that one more URI isn't going to make it worse. Then people can just go about using CURIEs in their OBO files as they'd like, without anything extra being necessary.
I don't think using the Bioregistry to declare what's canonical is going to scale. There will always be conflict between what's practical, what's useful, and what the OBO community wants in order to serve its needs (which are themselves in conflict from one ontology to the next).
-
@cmungall has whole slide decks on this issue. For now I think the most viable solution is to use identifiers.org. This gives us a standard that we can all refer to, IRIs that resolve, and doesn't force us to mint new PURLs that keep up to date with DB releases.
-
Chris' slides are a very good overview. I have a bunch of these cases too, which we've handled in different ways. OBO has a bunch of database-translation ontologies already. It would be good to collect some strategies and best practises, and maybe aim for a handful of general approaches with good documentation, but I don't think there will ever be a one-size-fits-all solution here. In each case, a lot depends on what the database exposes, what URLs/IRIs are used in the wild, how willing they are to work with us, etc. I don't know enough about how Wikidata does this sort of thing, but it's always good to look carefully at their approaches.
-
Two options:

1. Use https://identifiers.org/hgnc:{$id}
2. Generate a new OBO OWL file regularly with all the latest IDs.

The latter will not always be viable and requires resources, but may sometimes be worthwhile if there is both demand and a suitable API to code against (as in HGNC). Note: there are already competing ontologised versions of HGNC built off the API; see https://bioportal.bioontology.org/ontologies/HGNC-NR/?p=classes&conceptid=http%3A%2F%2Fidentifiers.org%2Fhgnc%2F49667
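To make option 1 concrete: the point is that everyone expands a given CURIE with the same template, so the IRIs match. Below is a minimal sketch, assuming a hypothetical shared prefix map with one agreed template per prefix. The HGNC entry follows the identifiers.org pattern quoted above (written with Python's `{id}` placeholder instead of `{$id}`), and the MONDO entry uses the standard OBO PURL pattern; the map itself and the function name `expand_curie` are illustrative, not an existing tool.

```python
# Hypothetical shared prefix map: exactly one agreed IRI template per prefix.
# The HGNC template follows the identifiers.org pattern from the comment above;
# the MONDO template is the usual OBO PURL pattern.
PREFIX_TEMPLATES = {
    "hgnc": "https://identifiers.org/hgnc:{id}",
    "mondo": "http://purl.obolibrary.org/obo/MONDO_{id}",
}


def expand_curie(curie: str) -> str:
    """Expand a CURIE such as 'HGNC:16793' into a full IRI using the shared map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    # Prefixes are matched case-insensitively, so 'HGNC' and 'hgnc' agree.
    template = PREFIX_TEMPLATES.get(prefix.lower())
    if template is None:
        raise KeyError(f"no agreed expansion for prefix {prefix!r}")
    return template.format(id=local_id)


print(expand_curie("HGNC:16793"))  # https://identifiers.org/hgnc:16793
```

The design choice here is that there is no fallback: an unknown prefix is an error rather than a guessed IRI, which is exactly what stops a fifty-first variant from appearing.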
-
If we go with OBO-ized versions of these databases, there are some questions we should be able to answer consistently, at least for each source:

1. Partially-in (what Rector et al. call a "conceptual coat rack") vs. all-in; see: Partial is more likely to be successful, at least in the short term. That means less logical axiomatization (effectively curtailing "relationship" tags in the existing OBO files that Charlie provides). While it is tempting to cram in as much knowledge as possible, everything about this is hard to standardize, from the RO object property to the interpretation of the OWL axiom. Even a decision to model these as annotation assertions (logically silent) is a decision. The is-a would have been controversial in the past, but we are now all in agreement that SO is fine here; there is no need for MSO. Note there can still be an "open market" for axiomatizations, but the OBO core should IMO be a conceptual coat rack, at least at first.
2. Ownership and governance: This is hard, but I think we have the most chance of success if we stick to coat racks and don't try to get too clever or insert new concepts. For example, when translating a gene database, we make one class per gene ID in the source. We don't make fake groupings that are not in the source; the translation is completely isomorphic unless the authority agrees to diverge. Again, there can be an open market for enhanced versions.
3. Status within OBO, see: I think these should be in a special category, and there should be an automatic exemption from certain checks. We should not force providers to game the system and provide a fake ontological definition for something for which an ontological definition makes no sense. I don't expect this to be controversial, but I am often surprised when I think that. Ontology browsers can make use of this status too, so they don't try to make fake ontological displays that explode when I expand the concept "variant" and get a flat list of a billion variants.
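A rough sketch of what a "coat rack" translation could look like for a gene database: one OBO `[Term]` stanza per source ID, only a name and synonyms as annotations, a single is-a to SO `gene` (SO:0000704), no `relationship:` tags, and no invented grouping classes. The `GeneRecord` type, the sample records, and the choice of `RELATED` as the synonym scope are all assumptions for illustration, not how any existing converter works.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """Hypothetical minimal record from a source gene database."""
    local_id: str
    symbol: str
    synonyms: list = field(default_factory=list)


def to_obo_stanza(prefix: str, record: GeneRecord) -> str:
    """Render one source record as exactly one OBO [Term] stanza."""
    lines = [
        "[Term]",
        f"id: {prefix}:{record.local_id}",
        f"name: {record.symbol}",
    ]
    for syn in record.synonyms:
        # Escape double quotes so the synonym string stays well-formed.
        escaped = syn.replace('"', '\\"')
        # RELATED is an assumed scope; the source may justify EXACT instead.
        lines.append(f'synonym: "{escaped}" RELATED []')
    # The only logical axiom: is_a SO 'gene'. No relationship: tags are emitted.
    lines.append("is_a: SO:0000704 ! gene")
    return "\n".join(lines)


def translate(prefix: str, records: list) -> str:
    """Isomorphic translation: one class per source ID, no extra groupings."""
    seen = set()
    stanzas = []
    for record in records:
        if record.local_id in seen:
            raise ValueError(f"duplicate source ID: {record.local_id}")
        seen.add(record.local_id)
        stanzas.append(to_obo_stanza(prefix, record))
    return "\n\n".join(stanzas)


# Made-up records, for illustration only.
records = [GeneRecord("1", "GENEA"), GeneRecord("2", "GENEB", ["OLDB"])]
print(translate("HGNC", records))
```

Any richer axiomatization (RO relations, logical definitions) would live in a separate "enhanced" file layered on top, which is the open market mentioned above.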
-
We constantly have problems with using identifiers like HGNC, ENSEMBL and others in OWL because they do not have PURLs in the Semantic Web sense. So we get people using them in 50 different ways.
https://bioregistry.io/registry/hgnc lists at least four expansions, and some are even missing:
Mondo and others use the "old identifiers.org way":
I am sure http variants exist for all of these.
So people write axioms like this (simplified here for illustration):
http://purl.obolibrary.org/obo/MONDO_123 RO:has_basis_in_dysfunction_of https://identifiers.org/hgnc/16793, or with any of the other schemes above.
This clearly does not interoperate.
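A small sketch of why this breaks and what a consumer currently has to do about it: IRIs are compared as strings, so four spellings of the same HGNC gene are four different nodes, and the only fix downstream is to contract each known variant back to one CURIE. The pattern list below is an assumption covering a few identifiers.org spellings plus an OBO PURL form; it is illustrative, not an inventory of what is actually in use, and `normalize_hgnc` is a hypothetical helper.

```python
import re
from typing import Optional

# Hypothetical list of IRI spellings for HGNC IDs; each captures the local ID.
HGNC_IRI_PATTERNS = [
    re.compile(r"^https?://identifiers\.org/hgnc/(\d+)$", re.IGNORECASE),
    re.compile(r"^https?://identifiers\.org/hgnc:(\d+)$", re.IGNORECASE),
    re.compile(r"^https?://purl\.obolibrary\.org/obo/HGNC_(\d+)$"),
]


def normalize_hgnc(iri: str) -> Optional[str]:
    """Contract a known HGNC IRI variant to the CURIE 'HGNC:<id>', else None."""
    for pattern in HGNC_IRI_PATTERNS:
        match = pattern.match(iri)
        if match:
            return f"HGNC:{match.group(1)}"
    return None


variants = [
    "https://identifiers.org/hgnc/16793",
    "http://identifiers.org/hgnc/16793",
    "https://identifiers.org/hgnc:16793",
    "https://identifiers.org/HGNC:16793",
]
print(len(set(variants)))  # 4: as IRIs, these are four unrelated resources
print({normalize_hgnc(v) for v in variants})  # {'HGNC:16793'}
```

Every tool that consumes these axioms would need to carry a list like this and keep it current, which is the cost the question is asking how to avoid.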
How should we deal with this heterogeneity for database identifiers without a formal PURL system?
Opinions welcome!