About 70% of our Panther ingest ends up in the dangling edges bin, and @leokim-l noticed, in a process they're running, that we may have low orthology coverage between human genes and genes from species other than mouse.
Improve QC output to get a better picture of which nodes are missing from the graph, which kinds of nodes are successfully merged, and which kinds of nodes work from the ingest without any normalization.
Look at whether there are Panther associations that we're bringing through the process that we don't actually want (is it being filtered by taxon currently? does it match the taxon list of genes we have?)
Dipper brought in ZFIN's human curated orthology; right now we only have Panther. Possibly split that off from this issue as its own new modular ingest.
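For the QC item, a minimal sketch of the kind of breakdown that might help: split edges into kept and dangling, and count which identifier prefixes are missing from the node set. The function names (`curie_prefix`, `dangling_report`) and all of the identifiers in the toy data are made up for illustration; this is not how the ingest currently does it.

```python
from collections import Counter


def curie_prefix(curie: str) -> str:
    """Return the part of a CURIE before the first colon, e.g. 'HGNC' for 'HGNC:1100'."""
    return curie.split(":", 1)[0]


def dangling_report(edges, node_ids):
    """Split edges into kept and dangling, and count prefixes of the missing nodes.

    edges: iterable of (subject, object) CURIE pairs
    node_ids: set of node CURIEs present in the graph
    """
    kept, dangling = [], []
    missing_prefixes = Counter()
    for subject, obj in edges:
        # An edge is dangling if either endpoint is absent from the node set.
        missing = [c for c in (subject, obj) if c not in node_ids]
        if missing:
            dangling.append((subject, obj))
            missing_prefixes.update(curie_prefix(c) for c in missing)
        else:
            kept.append((subject, obj))
    return kept, dangling, missing_prefixes


# Toy data only: these identifiers are placeholders, not real Panther output.
nodes = {"HGNC:1100", "MGI:104537", "ZFIN:ZDB-GENE-990415-8"}
edges = [
    ("HGNC:1100", "MGI:104537"),
    ("HGNC:1100", "Xenbase:XB-GENEPAGE-478063"),
    ("HGNC:1100", "ZFIN:ZDB-GENE-990415-8"),
    ("HGNC:1100", "Xenbase:XB-GENEPAGE-000001"),
]
kept, dangling, missing_prefixes = dangling_report(edges, nodes)
print(len(kept), len(dangling), dict(missing_prefixes))
```

Grouping by prefix rather than by full ID should make it obvious whether the dangling bin is dominated by one source's identifier scheme or spread across many.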
Additional info:
This is the neo4j query that is showing few results for species other than mouse:
`MATCH
(upheno:biolink:PhenotypicFeature WHERE upheno.id STARTS WITH "UPHENO:")<-[:biolink:subclass_of]-(phenotype:biolink:PhenotypicFeature)<-[gena:biolink:has_phenotype]-(gene:biolink:Gene)-[:biolink:orthologous_to]-(human_gene:biolink:Gene WHERE "NCBITaxon:9606" IN human_gene.in_taxon)
RETURN
upheno.id,
phenotype.id,
gene.id,
gena.negated,
CASE WHEN gene.in_taxon IS NOT NULL AND size(gene.in_taxon) > 0
THEN REDUCE(s = "", x IN gene.in_taxon | s + x + CASE WHEN x <> gene.in_taxon[size(gene.in_taxon)-1] THEN "|" ELSE "" END)
ELSE "" END AS gene_in_taxon,
human_gene.id,
gena.primary_knowledge_source,
gena.publications`
and here is the visualization showing the difference in counts
Right now I'm adding ZFIN's curated orthology, which should give us the best possible connections between human and zebrafish. I'm also planning to fix the missing XB-GENEPAGE to XB-GENE mappings, which will give us XenBase's own orthology.
What we missed in 351 was that the counts in Panther didn't change, even though the counts of what came out of our ingest did. Explaining that will probably require carefully tracing through the utils functions related to the ingest to see whether any filtering happens there. I didn't include my methodology for the counting there (boo past me!), but the presence of subject_taxon_label and object_taxon_label sure looks like the counts were coming from the finished KG. I wonder if the kind of identifier used by Panther changed, moving from one where we had good mappings to one where we didn't?
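One cheap way to test the identifier-change hunch would be to compare prefix counts for the gene IDs in an older and a newer Panther download. A rough sketch, assuming the IDs have already been pulled out as CURIE strings; `prefix_counts` and `prefix_shift` are hypothetical helpers and the IDs below are placeholders, not real Panther data.

```python
from collections import Counter


def prefix_counts(curies):
    """Count CURIEs by the prefix before the first colon."""
    return Counter(c.split(":", 1)[0] for c in curies)


def prefix_shift(old_curies, new_curies):
    """Return {prefix: new_count - old_count} for every prefix whose count changed."""
    old, new = prefix_counts(old_curies), prefix_counts(new_curies)
    # Counter returns 0 for prefixes it has not seen, so new or vanished
    # prefixes show up as positive or negative deltas.
    return {
        p: new[p] - old[p]
        for p in sorted(set(old) | set(new))
        if new[p] != old[p]
    }


# Placeholder IDs for illustration only.
old_ids = ["HGNC:1", "ZFIN:a", "ZFIN:b"]
new_ids = ["HGNC:1", "ZFIN:a", "ENSEMBL:x"]
shift = prefix_shift(old_ids, new_ids)
print(shift)
```

If a prefix we map well drops while one we map poorly grows by a similar amount, that would point at the source changing its identifiers rather than at filtering in our utils.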
Separately, I did some looking into DIOPT updates. They claim a 2021 build, but are missing ZFIN orthology that existed in 2020. It's wonderful when we can pull from an aggregator to get many sources in one easy go, but I don't think it's a great idea when the update cadence is so irregular.