neoTaxonomy

neoTaxonomy is a Python API for working with the NCBI taxonomy in a neo4j database.

License

neoTaxonomy - A python API to deal with NCBI taxonomy in a neo4j database

Copyright (C) 2016-2017 Paolo Cozzi <[email protected]>

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

Dependencies

neoTaxonomy depends on a local installation of neo4j. One convenient way to provide it is to run neo4j with docker, like this:

$ docker pull neo4j:3.1
$ docker run -d --publish=7474:7474 --publish=7687:7687 --name neo4j --env=NEO4J_AUTH=<user>/<password> neo4j:3.1

This will download and run the neo4j 3.1 image, publishing the standard HTTP (7474) and Bolt (7687) ports on your host.
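Before loading data, it can help to confirm that the container is actually listening. The following is a minimal sketch, not part of neoTaxonomy: `port_is_open` is a hypothetical helper that uses only the Python standard library, and it assumes the container is published on `localhost` with the two ports from the docker command above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it is not part of neoTaxonomy.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # refused connections, timeouts and name resolution failures land here
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # 7474 (HTTP) and 7687 (Bolt) are the ports published by the docker command
    for name, port in (("HTTP", 7474), ("Bolt", 7687)):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print("%s port %d: %s" % (name, port, state))
```

A reachable port only shows that something accepts TCP connections there; it does not verify the neo4j credentials.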

Installation

Download code from GitHub, then install using pip:

$ git clone https://github.com/bioinformatics-ptp/neoTaxonomy.git
$ cd neoTaxonomy
$ pip install .

Usage

Loading data into database

Download the taxdump archive from the NCBI taxonomy FTP site and unpack it:

$ mkdir taxdump
$ cd taxdump
$ wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
$ tar -xvzf taxdump.tar.gz

Then load nodes.dmp and names.dmp into the database with fillTaxonomyDB, passing the database connection parameters like this:

$ fillTaxonomyDB --nodes nodes.dmp --names names.dmp --host <host> --password=<password>
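To get a feel for what is being loaded, it can be useful to look at the taxdump file layout. In the NCBI dump files, fields are separated by a tab, a pipe and a tab, and every line ends with a tab and a pipe; the first three fields of nodes.dmp are the taxon id, the parent taxon id and the rank. The sketch below parses one such line. It is only an illustration of the file format, not how fillTaxonomyDB is implemented, and `parse_dmp_line`, `parse_node` and the truncated sample line are introduced here for the example.

```python
def parse_dmp_line(line):
    """Split one taxdump line into a list of stripped field values."""
    # Fields are separated by "\t|\t" and each line ends with "\t|"
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return [field.strip() for field in line.split("\t|\t")]


def parse_node(line):
    """Return (tax_id, parent_tax_id, rank) from a nodes.dmp line."""
    fields = parse_dmp_line(line)
    return int(fields[0]), int(fields[1]), fields[2]


# Illustrative nodes.dmp line, truncated to its first five fields
sample = "562\t|\t561\t|\tspecies\t|\tEC\t|\t0\t|\n"
print(parse_node(sample))
# (562, 561, 'species')
```

names.dmp uses the same delimiters, so `parse_dmp_line` applies to it as well; only the meaning of the fields differs.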

Using neoTaxonomy in scripts

An example Python program that gets the lineage for a given taxon id:

from neotaxonomy import TaxGraph
db = TaxGraph(host='localhost', user='neo4j', password='password')
db.connect()

# 562 is the E. coli taxon id

# get lineage only for ["superKingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"] (default)
print(db.getLineage(562))
# [u'k__Bacteria', u'p__Proteobacteria', u'c__Gammaproteobacteria', u'o__Enterobacterales', u'f__Enterobacteriaceae', u'g__Escherichia', u's__coli']

# get only genus and species for E. coli
print(db.getLineage(562, ranks=["Genus", "Species"]))
# [u'g__Escherichia', u's__coli']

# get full NCBI taxonomy
print(db.getFullLineage(562))
# [u'root', u'cellular organisms', u'Bacteria', u'Proteobacteria', u'Gammaproteobacteria', u'Enterobacterales', u'Enterobacteriaceae', u'Escherichia', u'Escherichia coli']

# get abbreviated NCBI taxonomy
print(db.getFullLineage(562, abbreviated=True))
# [u'root', u'Bacteria', u'Proteobacteria', u'Gammaproteobacteria', u'Enterobacterales', u'Enterobacteriaceae', u'Escherichia', u'Escherichia coli']
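
The prefixed list returned by getLineage is often easier to use downstream as a single delimited string or as a mapping from rank prefix to name. The sketch below shows one way to do both with plain Python. It works on a hard-coded copy of the output shown above, so it does not need a database; `lineage_to_string` and `lineage_to_dict` are hypothetical helpers, not part of neoTaxonomy.

```python
def lineage_to_string(lineage, separator=";"):
    """Join the lineage items into one delimited string."""
    return separator.join(lineage)


def lineage_to_dict(lineage):
    """Map each rank prefix (e.g. 'g') to its name (e.g. 'Escherichia')."""
    result = {}
    for item in lineage:
        # "g__Escherichia" -> prefix "g", name "Escherichia"
        prefix, _, name = item.partition("__")
        result[prefix] = name
    return result


# Copy of the getLineage(562) output from the example above
lineage = ['k__Bacteria', 'p__Proteobacteria', 'c__Gammaproteobacteria',
           'o__Enterobacterales', 'f__Enterobacteriaceae',
           'g__Escherichia', 's__coli']

print(lineage_to_string(lineage))
# k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__coli

print(lineage_to_dict(lineage)["g"])
# Escherichia
```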
