Assemblyinfo: Interact with assembly metadata in Python

Assemblyinfo simplifies the management and analysis of genome assembly metadata in Python.

This package provides:

  • Efficient tools for querying and manipulating assembly information datasets.
  • Streamlined methods for importing, exporting, and converting between common chromosome formats.
  • Utilities for retrieving assembly statistics across different versions or species.

Read the documentation for more information.

Installation

Assemblyinfo is available on PyPI:

pip install assemblyinfo

Basic operations on chromosome data

Assemblyinfo offers a flexible and straightforward interface for interacting with assembly data and running basic queries.

import assemblyinfo

db = assemblyinfo.connect()
hg38 = db.assembly_info("hg38", roles=["assembled"])

The resulting object gives easy access to chromosome sizes:

hg38.chromsizes

> name
> chr1     248956422
> chr2     242193529
> ...

chromosome equivalences:

hg38.chromeq

>      ncbi     genbank        refseq
> chr1     1  CM000663.2  NC_000001.11
> chr2     2  CM000664.2  NC_000002.12
> chr3     3  CM000665.2  NC_000003.12
> ...

or assembly metadata:

hg38.metadata

> {'assembly_level': 'Chromosome',
 'assembly_type': 'haploid-with-alt-loci',
 'bioproject': 'PRJNA168',
 'submitter': 'Genome Reference Consortium',
 'synonyms': ['GRCh38', 'hg38'],
 'taxid': '9606',
 'species': 'homo_sapiens',
 'common_name': 'human',
 ... }

and more!
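
As a rough sketch of how the sizes and equivalences shown above might be combined, the snippet below re-keys chromosome sizes by RefSeq accession and writes them as tab-separated `name<TAB>length` lines. It uses plain dictionaries copied from the output above as stand-ins, so it runs without assemblyinfo installed; the helper names `write_chromsizes` and `rename_chromsizes` are hypothetical and not part of the package.

```python
import io

# Stand-ins for hg38.chromsizes and hg38.chromeq, copied from the output above.
# ASSUMPTION: the real objects are pandas structures; plain dicts are used here
# so the sketch runs on its own.
chromsizes = {"chr1": 248956422, "chr2": 242193529}
chromeq = {
    "chr1": {"ncbi": "1", "genbank": "CM000663.2", "refseq": "NC_000001.11"},
    "chr2": {"ncbi": "2", "genbank": "CM000664.2", "refseq": "NC_000002.12"},
}


def write_chromsizes(sizes, handle):
    """Write one 'name<TAB>length' line per chromosome (hypothetical helper)."""
    for name, length in sizes.items():
        handle.write(f"{name}\t{length}\n")


def rename_chromsizes(sizes, equivalences, scheme):
    """Re-key sizes by another naming scheme such as 'refseq' (hypothetical helper)."""
    return {equivalences[name][scheme]: length for name, length in sizes.items()}


# Write to an in-memory buffer; an open file handle would work the same way.
buffer = io.StringIO()
write_chromsizes(rename_chromsizes(chromsizes, chromeq, "refseq"), buffer)
refseq_chromsizes_text = buffer.getvalue()
print(refseq_chromsizes_text, end="")
```

If `hg38.chromsizes` is a pandas Series and `hg38.chromeq` a DataFrame, as the printed output suggests, `hg38.chromsizes.items()` should iterate in the same way and `hg38.chromeq.loc[name, scheme]` would take the place of the nested dictionary lookup. Treat both as assumptions to check against the documentation.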

Request an assembly

Feel free to open an issue and request a non-reference assembly! Currently supported species are:

['caenorhabditis_elegans',
 'homo_sapiens',
 'mus_musculus',
 'drosophila_melanogaster',
 'danio_rerio',
 'bos_taurus',
 'gallus_gallus',
 'canis_lupus_familiaris']

You can also list the specific assemblies that are supported:

db = assemblyinfo.connect()
db.available_assemblies()

Citing

If you use assemblyinfo in your work, please cite it as:

@software{assemblyinfo_2024,
  author       = {Open2C},
  title        = {assemblyinfo},
  year         = {2024},
  publisher    = {GitHub},
  version      = {v0.0.1},
  url          = {https://github.com/open2c/assemblyinfo}
}
