Contexts don't guarantee bi-directional mapping #11

cthoyt · 2022-10-20T12:35:46Z

In the BioPortal context file (https://github.com/linkml/prefixmaps/blob/main/src/prefixmaps/data/bioportal.csv#L199-L202), two prefixes share the same URI prefix. This seems to invalidate the claim on the README about bijectivity (https://github.com/linkml/prefixmaps/blob/main/README.md?plain=1#L13).

Maybe this is a curation oversight, which I bet @caufieldjh has already found, since I know he recently put a lot of effort into looking through this content. However, this isn't resolved when loading content through the package, which makes me think that the package should be more careful about checking the integrity of content. The following code illustrates:

from prefixmaps import load_context
context = load_context("bioportal")
for e in context.prefix_expansions:
    if e.prefix in {"INVERSEROLES", "ISO-15926-2_2003"}:
         print(e)

# Output:
# PrefixExpansion(context='bioportal', prefix='INVERSEROLES', namespace='http://rds.posccaesar.org/2008/02/OWL/ISO-15926-2_2003#', status=<StatusType.canonical: 'canonical'>)
# PrefixExpansion(context='bioportal', prefix='ISO-15926-2_2003', namespace='http://rds.posccaesar.org/2008/02/OWL/ISO-15926-2_2003#', status=<StatusType.canonical: 'canonical'>)

This means the assumptions in Context.as_dict() are also incorrect, since this naively iterates through the expansions and picks out the prefix/URI prefix (namespace) pairs.

I'm pretty stumped trying to understand the data structure used in this package, it seems like a lot of things that could be grouped are not. Have you considered using a JSON structure?

For example, the curies package has a lot of overlap in terms of needing to represent a group of related prefixes and URI prefixes while denoting which is the "canonical" prefix and "canonical" URI prefix. This data structure is described in the curies documentation and a more full example with the whole Bioregistry can be found here.

Background: I'm currently trying to implement a more principled import of a Context object from this package into a curies.Converter in biopragmatics/curies#22 and am stuck since there's no way to decide which of these two canonical records should be the actual canonical record.

To Do

The text was updated successfully, but these errors were encountered:

caufieldjh · 2022-10-20T14:49:36Z

two prefixes share the same URI prefix

I'm going to chalk that up to curation error - one ontology is a variation on the other, so the easy solution would just be to assign ISO-15926-2_2003 as the canonical prefix. I'll update accordingly (and probably catch some other issues in the process).

But yes, that means the assumption is that the input maps are bijective, without explicit validation.

cthoyt · 2022-10-20T15:09:08Z

Great, thanks. I would say besides just implementing this fix, this issue requires adding some automated tests for all content that can be run on CI and will stop new data from being added that doesn't pass these requirements

caufieldjh · 2022-10-20T17:22:29Z

Fixes to those maps are in d58f199

cthoyt · 2022-10-25T20:14:48Z

Following adding tests in #15, this issue can be closed if there’s branch protection on the main branch and a rule that tests have to pass

cmungall · 2022-10-26T02:35:52Z

@caufieldjh curated content should really go in a curated.yaml, the csvs should be entirely autogenerated. In fact we would eventually like to cede the curated maps upstream to bioregistry, but it is useful to be use a placeholder here for now.

cthoyt · 2022-10-26T12:59:52Z

I added #26, which is also a blocker for closing this issue.

caufieldjh · 2022-10-26T14:41:39Z

@cmungall do you mean something along the same lines of https://raw.githubusercontent.com/geneontology/go-site/master/metadata/db-xrefs.yaml ?

cthoyt mentioned this issue Oct 20, 2022

Add support for prefixmaps context object biopragmatics/curies#22

Closed

1 task

cthoyt mentioned this issue Oct 20, 2022

Add data integrity tests #15

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Contexts don't guarantee bi-directional mapping #11

Contexts don't guarantee bi-directional mapping #11

cthoyt commented Oct 20, 2022 •

edited

Loading

caufieldjh commented Oct 20, 2022 •

edited

Loading

cthoyt commented Oct 20, 2022

caufieldjh commented Oct 20, 2022

cthoyt commented Oct 25, 2022 •

edited

Loading

cmungall commented Oct 26, 2022

cthoyt commented Oct 26, 2022

caufieldjh commented Oct 26, 2022

Contexts don't guarantee bi-directional mapping #11

Contexts don't guarantee bi-directional mapping #11

Comments

cthoyt commented Oct 20, 2022 • edited Loading

To Do

caufieldjh commented Oct 20, 2022 • edited Loading

cthoyt commented Oct 20, 2022

caufieldjh commented Oct 20, 2022

cthoyt commented Oct 25, 2022 • edited Loading

cmungall commented Oct 26, 2022

cthoyt commented Oct 26, 2022

caufieldjh commented Oct 26, 2022

cthoyt commented Oct 20, 2022 •

edited

Loading

caufieldjh commented Oct 20, 2022 •

edited

Loading

cthoyt commented Oct 25, 2022 •

edited

Loading