Contexts don't guarantee bi-directional mapping #11
Comments
I'm going to chalk that up to curation error - one ontology is a variation on the other, so the easy solution would just be to assign But yes, that means the assumption is that the input maps are bijective, without explicit validation.
Great, thanks. Besides implementing this fix, I would say this issue requires adding automated tests for all content that can be run on CI and that will stop new data from being added if it doesn't pass these requirements.
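A minimal sketch of the kind of content check that could run on CI, under stated assumptions: the column names (`context`, `prefix`, `namespace`, `status`), the `canonical` status value, the function name `find_bijectivity_violations`, and the sample rows are all illustrative and not taken from the package or from bioportal.csv.

```python
import csv
import io
from collections import defaultdict


def find_bijectivity_violations(csv_text):
    """Return (dup_prefixes, dup_namespaces) found among canonical rows.

    Hypothetical helper: assumes a header with prefix, namespace, status.
    """
    by_prefix = defaultdict(set)
    by_namespace = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: only rows marked canonical must be one-to-one.
        if row["status"] != "canonical":
            continue
        by_prefix[row["prefix"]].add(row["namespace"])
        by_namespace[row["namespace"]].add(row["prefix"])
    # A prefix with several namespaces, or a namespace with several
    # prefixes, breaks the bijective mapping.
    dup_prefixes = {p: sorted(n) for p, n in by_prefix.items() if len(n) > 1}
    dup_namespaces = {n: sorted(p) for n, p in by_namespace.items() if len(p) > 1}
    return dup_prefixes, dup_namespaces


# Made-up rows for illustration only.
SAMPLE = """context,prefix,namespace,status
example,FOO,http://example.org/foo/,canonical
example,FOO2,http://example.org/foo/,canonical
example,BAR,http://example.org/bar/,canonical
"""

dup_prefixes, dup_namespaces = find_bijectivity_violations(SAMPLE)
print(dup_prefixes)
print(dup_namespaces)
```

A CI test could then assert that both returned dictionaries are empty for every context file, so that a pull request introducing a shared URI prefix fails before it is merged.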
Fixes to those maps are in d58f199
Now that tests have been added in #15, this issue can be closed if there’s branch protection on the main branch and a rule that tests have to pass.
@caufieldjh curated content should really go in a curated.yaml; the CSVs should be entirely autogenerated. In fact, we would eventually like to cede the curated maps upstream to bioregistry, but it is useful to use a placeholder here for now.
I added #26, which is also a blocker for closing this issue.
@cmungall do you mean something along the same lines as https://raw.githubusercontent.com/geneontology/go-site/master/metadata/db-xrefs.yaml ?
In the BioPortal context file (https://github.com/linkml/prefixmaps/blob/main/src/prefixmaps/data/bioportal.csv#L199-L202), two prefixes share the same URI prefix. This seems to invalidate the claim on the README about bijectivity (https://github.com/linkml/prefixmaps/blob/main/README.md?plain=1#L13).
Maybe this is a curation oversight, which I bet @caufieldjh has already found, since I know he recently put a lot of effort into looking through this content. However, this isn't resolved when loading content through the package, which makes me think that the package should be more careful about checking the integrity of content. The following code illustrates:
This means the assumptions in Context.as_dict() are also incorrect, since this naively iterates through the expansions and picks out the prefix/URI prefix (namespace) pairs.

I'm pretty stumped trying to understand the data structure used in this package; it seems like a lot of things that could be grouped are not. Have you considered using a JSON structure?
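To make both points concrete, here is a hedged sketch of the failure mode and of a grouped alternative. It is not the package's code: the pairs are made up, `group_records` is a hypothetical helper, the field names (`prefix`, `prefix_synonyms`, `uri_prefix`) are illustrative, and treating the first prefix seen as canonical is an arbitrary placeholder rule.

```python
import json
from collections import defaultdict

# Made-up (prefix, namespace) pairs; two prefixes share one namespace.
pairs = [
    ("FOO", "http://example.org/foo/"),
    ("FOO2", "http://example.org/foo/"),
    ("BAR", "http://example.org/bar/"),
]

# Naive forward map: every prefix is kept, so nothing looks wrong yet.
forward = dict(pairs)

# Naive reverse map: the later pair silently overwrites the earlier one,
# so FOO disappears and the mapping is not actually invertible.
reverse = {namespace: prefix for prefix, namespace in pairs}


def group_records(pairs):
    """Group prefixes that share a namespace into one JSON-style record."""
    grouped = defaultdict(list)
    for prefix, namespace in pairs:
        grouped[namespace].append(prefix)
    records = []
    for namespace, prefixes in grouped.items():
        records.append(
            {
                # Placeholder rule: first prefix seen is treated as canonical.
                "prefix": prefixes[0],
                "prefix_synonyms": prefixes[1:],
                "uri_prefix": namespace,
            }
        )
    return records


records = group_records(pairs)
print(json.dumps(records, indent=2))
```

Grouping makes the collision explicit instead of hiding it, but it still needs an external decision about which prefix is canonical, which is exactly the information the flat pairs do not carry.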
For example, the curies package has a lot of overlap in terms of needing to represent a group of related prefixes and URI prefixes while denoting which is the "canonical" prefix and "canonical" URI prefix. This data structure is described in the curies documentation, and a fuller example with the whole Bioregistry can be found here.

Background: I'm currently trying to implement a more principled import of a Context object from this package into a curies.Converter in biopragmatics/curies#22 and am stuck, since there's no way to decide which of these two canonical records should be the actual canonical record.

To Do