evolbioinfo/gotree_usecase


This repository contains the Gotree/Goalign use-case workflow

The present use-case is described in the following publication:

Frédéric Lemoine, Olivier Gascuel

Gotree/Goalign: toolkit and Go API to facilitate the development of phylogenetic workflows,

NAR Genomics and Bioinformatics, Volume 3, Issue 3, September 2021, lqab075, doi

Introduction

In this use case, we analyze a phylogenomic dataset inspired by Vanderpool et al. (PLoS Biology, 2020), in which the authors analyze a set of 1,730 genes in primates in different ways. They infer the species tree either from individual gene trees using ASTRAL III or from gene concatenation using maximum likelihood. Our use case is inspired by the concatenation study and uses the available groups of primate orthologous proteins in OrthoDB.

To do so, the workflow first maps the GenBank identifiers of the 1,730 analyzed genes to their OrthoDB identifiers, then retrieves the orthologous groups of proteins present in at least 90% of the 25 analyzed primates, and finally reconstructs a phylogenetic tree of these primates.
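The 90% coverage criterion can be illustrated with a minimal Python sketch. This is not the code used by the workflow. The function name `groups_above_coverage`, the data layout (a dictionary of group identifiers mapped to `(species, protein)` pairs), and all identifiers are hypothetical, and the sketch assumes that a group counts a species once, however many of its proteins the group contains.

```python
def groups_above_coverage(groups, n_species=25, min_percent=90):
    """Return the groups whose proteins cover at least min_percent of the species.

    groups maps a (hypothetical) group identifier to a list of
    (species, protein_id) pairs. Integer arithmetic is used for the
    comparison to avoid floating-point rounding at the threshold.
    """
    kept = {}
    for group_id, members in groups.items():
        # Count each species once, even if it has several proteins in the group.
        species = {sp for sp, _protein in members}
        if len(species) * 100 >= min_percent * n_species:
            kept[group_id] = members
    return kept


# Toy data: 25 placeholder species, one group covering 23 of them, one covering 22.
species = [f"species_{i}" for i in range(25)]
groups = {
    "group_A": [(sp, f"prot_A_{i}") for i, sp in enumerate(species[:23])],
    "group_B": [(sp, f"prot_B_{i}") for i, sp in enumerate(species[:22])],
}

kept = groups_above_coverage(groups)
print(sorted(kept))  # ['group_A']
```

With 25 species, the 90% threshold corresponds to 22.5 species, so a group must contain proteins from at least 23 species to be kept.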

Workflow

Workflow DAG

Results

The tree can be visualized here

Tree

An archive containing the alignment, gene ids, and trees can be downloaded from the v1.0 release.

Running the workflow

  • Prerequisites: The workflow only needs Singularity and Nextflow
  • Configuring the workflow: Change the values of executor, queue and clusterOptions in nextflow.config
  • Running the workflow: nextflow run workflow.nf --itolkey <User iTOL key> --itolproject <iTOL project>