Build the “composite” life stages ontology directly in Uberon #3443

Draft
wants to merge 9 commits into
base: master

@gouttegd gouttegd commented Dec 6, 2024

(Draft PR as this is a work that depends on:

This PR changes the way Uberon interacts with the “Species-Specific Life Stages Ontology” (SSLSO) project. Roughly, instead of relying on that project to provide us with a “pre-composited” version of the life stage ontologies, we do everything here in Uberon. All SSLSO has to do is to provide us with the mappings between their terms and the corresponding taxon-neutral terms in Uberon.

There are several reasons for such a change, the most important being that it keeps all the logic to create the “composite” ontologies in the same place, here in Uberon. Having the SSLSO perform its own compositing leads to a lot of duplicated code, a lot of unnecessary back-and-forth between Uberon and SSLSO (Uberon generating the bridges with FBdv and WBls, which are then fetched by SSLSO to produce ssso-merged-uberon, which is then fetched by Uberon to produce composite-metazoan), and a risk that the two composite pipelines (the one in SSLSO and the one in Uberon) are not kept in sync and therefore behave slightly differently (which is exactly the case currently).

The SSLSO (species-specific life stage ontology) is now a fairly normal
ODK-managed ontology that we can "import" (talking about our "local
imports" here, not imports in an ODK sense -- those are the imports that
are used to build Composite Metazoan) without any special treatment.
The SSLSO project provides its own mapping set, so we just need to fetch
it, then we can generate the bridge at the same time as all the other
bridges.

We do _not_ generate a distinct bridge for each species present in
SSLSO, and we will not do that until/unless there is an explicit demand
for it. All bridging axioms to SSLSO terms are in a single bridge,
except for HsapDv and MmusDv terms (we need MmusDv as a separate bridge
to construct composite-mouse; there is no real reason I can think of to
have a separate HsapDv bridge, but we always had it, so I can already
hear people screaming if I dare remove it.)
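
The bridge layout described above can be sketched as a simple routing step: bridging axioms go to a dedicated bridge when the target term is in HsapDv or MmusDv, and to the single shared bridge otherwise. This is only an illustration of that partitioning, not the code used in this PR; the bridge names, the `OtherDv` prefix, and the term IDs are hypothetical placeholders.

```python
# Illustrative sketch only. The bridge names below are hypothetical; the PR
# does not show the actual bridge file names.
SEPARATE_BRIDGES = {
    "HsapDv": "bridge-hsapdv",  # kept separate for historical reasons
    "MmusDv": "bridge-mmusdv",  # needed to construct composite-mouse
}
DEFAULT_BRIDGE = "bridge-sslso"  # single bridge for every other SSLSO term


def bridge_for(term_id: str) -> str:
    """Return the name of the bridge that should hold axioms for a CURIE."""
    prefix, _, _ = term_id.partition(":")
    return SEPARATE_BRIDGES.get(prefix, DEFAULT_BRIDGE)


def partition_mappings(object_ids):
    """Group mapped term IDs by the bridge they belong to."""
    bridges = {}
    for term_id in object_ids:
        bridges.setdefault(bridge_for(term_id), []).append(term_id)
    return bridges


if __name__ == "__main__":
    # Made-up IDs; "OtherDv" stands for any other species prefix in SSLSO.
    ids = ["HsapDv:0000001", "MmusDv:0000001", "OtherDv:0000001"]
    for bridge, terms in partition_mappings(ids).items():
        print(bridge, terms)
```

Keeping the exceptions in one small table means that adding a dedicated bridge for another species, should there be a demand for it, would be a one-line change in this sketch.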

Add a new product coming out of the Composite pipeline:
"composite-lifestages". This is basically the equivalent of the
"ssso-merged-uberon" product that used to be produced by the
"developmental stages ontology" project.

The intermediate file on the way to get to "composite-lifestages",
"collected-lifestages", is basically the equivalent of "ssso-merged".

@gouttegd gouttegd self-assigned this Dec 6, 2024
@gouttegd gouttegd added the tech, pipeline, composite, and bridge-files labels Dec 6, 2024

gouttegd commented Dec 6, 2024

QC workflow cancelled as it is bound to fail currently, since the new version of SSLSO is not publicly available yet.

Instead of calling the uberon:merge-species command repeatedly, once for
every species to merge, we call it only once, with a batch file listing
all the species for which a merge is required.

This removes some clutter from the Makefile, but most importantly this
also makes the whole operation much faster (from ~45min down to ~7min,
on my machine), because in batch mode the reasoner state is shared
between all merge operations -- we don't need to create a new reasoner
and have it reason over the ontology for every merge, which is what
takes the most time. The reasoner is initialised once at the beginning
of the first merge, and then only needs to be kept up to date for the
subsequent merges, which is much faster than creating a whole new
reasoner instance.
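
To make the source of the speed-up concrete, here is a toy model of the two strategies. It is not the uberon:merge-species implementation: `ToyReasoner`, `merge_species`, and the taxon names are hypothetical stand-ins, and the sketch only counts how often the expensive full classification happens versus the cheap incremental update.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    full_classifications: int = 0  # expensive: reason over the whole ontology
    incremental_updates: int = 0   # cheap: refresh an existing reasoner


class ToyReasoner:
    """Hypothetical stand-in for an OWL reasoner; it only records work done."""

    def __init__(self, ontology, stats):
        self.ontology = ontology
        self.stats = stats
        stats.full_classifications += 1  # creating a reasoner classifies everything

    def refresh(self):
        self.stats.incremental_updates += 1


def merge_species(ontology, taxon, reasoner):
    """Placeholder for one merge: change the ontology, then update the reasoner."""
    ontology.append(taxon)
    reasoner.refresh()


def merge_one_by_one(taxa):
    """One command invocation per species: a new reasoner every time."""
    stats = Stats()
    ontology = []
    for taxon in taxa:
        reasoner = ToyReasoner(ontology, stats)
        merge_species(ontology, taxon, reasoner)
    return stats


def merge_in_batch(taxa):
    """Batch mode: a single reasoner shared by all merges."""
    stats = Stats()
    ontology = []
    reasoner = ToyReasoner(ontology, stats)
    for taxon in taxa:
        merge_species(ontology, taxon, reasoner)
    return stats


if __name__ == "__main__":
    taxa = ["mouse", "human", "fly"]  # hypothetical list of species to merge
    print("one by one:", merge_one_by_one(taxa))
    print("batch:     ", merge_in_batch(taxa))
```

In this model the number of full classifications grows with the number of species in the one-by-one case and stays at one in batch mode, which is the effect described above.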

As with composite-vertebrate.owl and composite-metazoan.owl, we need a
separate rule to create the composite-lifestages.owl product. The
generic rule 'composite-%.owl' is not enough because the standard
ODK-generated Makefile already contains a more specific rule, which can
only be overridden by an equally specific rule.

Now that we generate those two additional products, we must take care
that they are not inadvertently committed to the repository.

Currently, the information about which bridges to generate and how, and
which species to unfold in composite-metazoan and how, is dispersed in
two different places: in the bridges/bridges.rules.m4 source file to
generate the bridges, and in config/tax-merges.tsv to generate
composite-metazoan.

This commit proposes to make those config data more manageable by moving
them all to a single config/taxa.yaml file, from which we derive (using
a relatively simple Python script) both the SSSOM/T rule file and the
batch file that drives the compositing process.
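
As a rough sketch of the single-source idea, the snippet below derives two outputs from one list of taxon records. Everything specific in it is an assumption made for illustration: the record fields, the `merge` flags, and both output formats are hypothetical, and the generated lines are placeholders, not real SSSOM/T rules nor the real batch file syntax. The records are written inline as Python dictionaries to keep the example self-contained; in practice they would be loaded from config/taxa.yaml.

```python
# Hypothetical stand-in for the parsed contents of config/taxa.yaml.
# Field names and the "merge" values are invented for this example.
TAXA = [
    {"name": "mouse", "prefix": "MmusDv", "taxon": "NCBITaxon:10090",
     "bridge": "mmusdv", "merge": True},
    {"name": "human", "prefix": "HsapDv", "taxon": "NCBITaxon:9606",
     "bridge": "hsapdv", "merge": False},
]


def render_bridge_rules(taxa):
    """Emit one placeholder line per bridge (NOT real SSSOM/T syntax)."""
    lines = [
        f"# bridge={t['bridge']} prefix={t['prefix']} taxon={t['taxon']}"
        for t in taxa
    ]
    return "\n".join(lines) + "\n"


def render_merge_batch(taxa):
    """Emit one placeholder line per taxon to merge (NOT the real batch format)."""
    return "".join(f"{t['taxon']}\t{t['name']}\n" for t in taxa if t["merge"])


if __name__ == "__main__":
    print(render_bridge_rules(TAXA), end="")
    print(render_merge_batch(TAXA), end="")
```

Because both outputs are computed from the same records, a taxon cannot be added to one file and forgotten in the other.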

Arguably, having the SSSOM/T ruleset generated by a Python script is
more maintainer-friendly than having it generated by M4 macros, given
that there are likely many more ontology engineers who can read and
write Python than ontology engineers who can read and write M4 (which I
believe is a shame, as M4 is a powerful and lightweight tool that can do
great things when used well, but that's unfortunately beside the point).