This use case is described in the following publication:
Frédéric Lemoine, Olivier Gascuel
Gotree/Goalign: toolkit and Go API to facilitate the development of phylogenetic workflows,
NAR Genomics and Bioinformatics, Volume 3, Issue 3, September 2021, lqab075, doi
In this use case, we analyze a phylogenomic dataset inspired by Vanderpool et al. (PLoS Biology, 2020), in which the authors analyze a set of 1,730 primate genes in several ways. They infer the species tree either from individual gene trees using ASTRAL III or from the gene concatenation using maximum likelihood. Our use case follows the concatenation approach, using the groups of primate orthologous proteins available in OrthoDB.
To do so, the workflow first maps the GenBank identifiers of the 1,730 analyzed genes to their OrthoDB identifiers, then retrieves the groups of orthologous proteins shared by at least 90% of the 25 analyzed primates, and finally reconstructs a phylogenetic tree of these primates.
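The coverage criterion used in the second step can be illustrated with a minimal Python sketch. It is not taken from the workflow itself: the input format (a list of hypothetical `(group_id, species)` pairs) and the function name `filter_groups` are assumptions made for illustration. With 25 species, a 90% threshold means that a group must contain at least 23 distinct species, since 0.9 × 25 = 22.5.

```python
"""Hypothetical sketch: keep orthologous groups present in >= 90% of species."""
from typing import Dict, Iterable, List, Set, Tuple


def filter_groups(
    memberships: Iterable[Tuple[str, str]],
    n_species: int = 25,
    min_percent: int = 90,
) -> List[str]:
    """Return sorted IDs of groups covering >= min_percent of n_species.

    `memberships` is an assumed list of (group_id, species) pairs, e.g. as
    parsed from an OrthoDB export; several proteins of the same species in
    one group (paralogs) are counted once.
    """
    species_per_group: Dict[str, Set[str]] = {}
    for group_id, species in memberships:
        species_per_group.setdefault(group_id, set()).add(species)
    # Integer comparison avoids floating-point rounding: with 25 species and
    # 90%, a group needs at least 23 distinct species (22.5 rounded up).
    return sorted(
        gid
        for gid, sp in species_per_group.items()
        if len(sp) * 100 >= min_percent * n_species
    )


if __name__ == "__main__":
    species = [f"sp{i}" for i in range(25)]
    pairs = [("OG1", s) for s in species]        # all 25 species: kept
    pairs += [("OG2", s) for s in species[:23]]  # 23 species: kept
    pairs += [("OG3", s) for s in species[:22]]  # 22 species: dropped
    pairs += [("OG3", "sp0")]                    # duplicate species, counted once
    print(filter_groups(pairs))                  # ['OG1', 'OG2']
```

Comparing `len(sp) * 100` with `min_percent * n_species` keeps the test in integer arithmetic, so the 22.5 boundary is handled exactly.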
The tree can be visualized here
An archive containing the alignment, gene IDs, and trees can be downloaded from the v1.0 release.
- Prerequisites: the workflow only requires Singularity and Nextflow.
- Configuring the workflow: change the values of `executor`, `queue`, and `clusterOptions` in `nextflow.config`.
- Running the workflow:
nextflow run workflow.nf --itolkey <User iTOL key> --itolproject <iTOL project>