🔖 release v0.4.3 #36

Merged
merged 29 commits into from
Nov 11, 2021
678d8c9
:bookmark: Bump version: 0.4.2 → 0.4.3.dev0
bunop Sep 30, 2021
d6822f9
:white_check_mark: test FID in import illumina multibreed report
bunop Sep 30, 2021
1209a34
:bug: bug fixed in importing multibreed reportfile
bunop Sep 30, 2021
4f33af4
:twisted_rightwards_arrows: Merge pull request #33 from cnr-ibba/issu…
bunop Oct 1, 2021
7d8c38e
:memo: add readme for FTP
bunop Oct 1, 2021
88d56f8
:recycle: plot image for SMARTER notebook/results
bunop Oct 1, 2021
113b1e5
:sparkles: deal with half-missing SNPs in genotypes
bunop Oct 8, 2021
8168917
:sparkles: import sweden background sheep dataset
bunop Oct 8, 2021
197eb9f
:pushpin: pin to a custum version of plinkio
bunop Oct 11, 2021
c2598fc
:sparkles: import french goat foreground dataset
bunop Oct 11, 2021
6ef2717
:wrench: allow phenotypes for ambigous sex animals
bunop Oct 11, 2021
6263e09
:sparkles: import phenotypes using alias
bunop Oct 11, 2021
46be740
:sparkles: import phenotypes from Uruguay
bunop Oct 11, 2021
56732bb
:monocle_face: update smarter results notebook
bunop Oct 13, 2021
7b2ceca
:recycle: update greek metadata
bunop Oct 15, 2021
a4b5c21
:bug: fix greek metadata
bunop Oct 20, 2021
c40cead
:recycle: load greek metadata from phenotypes dataset
bunop Oct 20, 2021
de27534
:wrench: pack and checksum merged genotypes
bunop Oct 20, 2021
215be1f
:arrow_up: Bump babel from 2.9.0 to 2.9.1
dependabot[bot] Oct 21, 2021
dd9218c
:arrow_up: Bump dask from 2021.2.0 to 2021.10.0
dependabot[bot] Oct 27, 2021
62096f8
:recycle: update requirements.txt
bunop Nov 10, 2021
a504b3b
:card_file_box: model SampleSpecie.type_ attribute
bunop Nov 10, 2021
9a42cfd
:white_check_mark: fix features tests
bunop Nov 10, 2021
5da3411
:white_check_mark: fix data tests
bunop Nov 10, 2021
389db1a
:sparkles: track database status and costants
bunop Nov 10, 2021
a4a1260
:mute: suppress debug logs
bunop Nov 10, 2021
0afa0f1
:twisted_rightwards_arrows: Merge pull request #39 from cnr-ibba/issu…
bunop Nov 11, 2021
3c52132
:recycle: split greek foreground metadata in two datasets
bunop Nov 11, 2021
a6ed1ab
:memo: updare FTP README
bunop Nov 11, 2021
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
-current_version = 0.4.2
+current_version = 0.4.3.dev0
commit = True
tag = False
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\.(?P<release>[a-z]+)(?P<build>\d+))?
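The `parse` option in this hunk is a regular expression with named groups, where the `release` and `build` parts are optional. As a rough illustration of how the old and new `current_version` values split into parts, here is a minimal Python sketch. It assumes the pattern is matched against the whole version string; `parse_version` is a hypothetical helper and not part of the project or of the bumpversion tool.

```python
import re

# Pattern copied from the `parse` option of .bumpversion.cfg.
VERSION_RE = re.compile(
    r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(\.(?P<release>[a-z]+)(?P<build>\d+))?"
)


def parse_version(text: str) -> dict:
    """Split a version string into its named parts (hypothetical helper).

    Assumption: the whole string must match, so fullmatch() is used here.
    """
    match = VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid version: {text!r}")
    return match.groupdict()


if __name__ == "__main__":
    # A stable version has no release/build parts (both are None).
    print(parse_version("0.4.2"))
    # A development version carries release="dev" and build="0".
    print(parse_version("0.4.3.dev0"))
```

Optional groups that do not take part in the match are reported as `None`, which makes it easy to tell a stable version from a development one.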
25 changes: 22 additions & 3 deletions HISTORY.rst
@@ -7,7 +7,7 @@ TODO

* ``illumina_top`` attribute should be referred to variants, while
in ``locations`` should be stored the read value from data source.
-illumina_top shouldn't change within the same SNP, indipendently from data source
+illumina_top shouldn't change within the same SNP, independently from data source
* Check chromosomes in *Variants locations*: mind to **scaffold**, **null**, and
**non-autosomal** chromosomes for *Goat* and *Sheep*
* Enable continuous integration
@@ -22,6 +22,25 @@ TODO
both in *import_samples* and *import_metadata* scripts
* define a collection for all available *purpose* phenotypes

0.4.3.dev0
----------

* Track database status and constants
* Add *foreground/background* type attribute in ``SampleSpecies``
* Update dependencies
* Add make rule to pack results and make checksum
* Move greek foreground metadata to a custom phenotypes dataset
* Update greek foreground metadata
* Import phenotypes from Uruguay
* Import phenotypes using alias
* Allow phenotypes for ambiguous sex animals
* Import french goat foreground dataset
* Pin ``plinkio`` to support *extra-chroms* in plink binary files
* Import 5 Sweden Sheep background genotypes
* Force *half-missing* SNPs to be MISSING
* Add the README.txt.ftp
* Bug fixed in importing multibreed reportfile (setting FID properly in output)

0.4.2 (2021-08-27)
------------------

@@ -40,7 +59,7 @@ TODO
* Import french foreground sheep dataset
* Use ``elemMatch`` in projection in ``plinkio.SmarterMixin.fetch_coordinates``
(ex: ``VariantSheep.objects.fields(elemMatch__locations={"imported_from": "SNPchiMp v.3", "version": "Oar_v4.0"})``)
-* Use ``elemMatch`` to search a SNP within the desidered coordinate systems in ``plinkio.SmarterMixin.fetch_coordinates``
+* Use ``elemMatch`` to search a SNP within the desired coordinate systems in ``plinkio.SmarterMixin.fetch_coordinates``
* Skip SNPchimp indels when importing from SNPchimp
* Skip illumina indels when reading from manifest

@@ -77,7 +96,7 @@ TODO
* Fix bug in importing dataset order
* Model affymetrix fields
* Read from affymetrix manifest file
-* Track illumina manifactured date
+* Track illumina manufactured date

0.3.1 (2021-06-11)
------------------
107 changes: 78 additions & 29 deletions Makefile

Large diffs are not rendered by default.

85 changes: 85 additions & 0 deletions data/processed/README.txt.ftp
@@ -0,0 +1,85 @@

SMARTER Genotype FTP Repository
===============================

Welcome to the SMARTER Genotype FTP repository! This repository collects the processed
genotypes for the SMARTER project, created using the https://github.com/cnr-ibba/SMARTER-database
software.

Folder structure
----------------

All genotypes are merged into a single set of PLINK binary files for each species
and for each of the most common assemblies. The PLINK binary files are compressed and
stored in a .zip archive. This repository is structured like this:

.
├── GOAT
│   ├── ARS1
│   │   └── archive
│   └── CHI1
│       └── archive
└── SHEEP
    ├── OAR3
    │   └── archive
    └── OAR4
        └── archive

where the GOAT and SHEEP folders collect data for Goat and Sheep respectively, and all
genotypes are collected and processed within their major assemblies. For example,
if you need to retrieve the Goat genotypes in the latest assembly version, you
need to go to the GOAT/ARS1 folder. All imported samples and variants are
present in a single dataset for each species / assembly.

NOTE: CHI1 and OAR4 will be available soon

File naming convention
----------------------

File names follow this schema:

SMARTER-<genus/species initials>-<assembly>-top-<version>.<ext>

So the PLINK file prefix SMARTER-OA-OAR3-top-0.4.2 stands for SMARTER Sheep (Ovis aries)
OAR3 release v0.4.2. There will always be 6 different extensions, which correspond
to the PLINK binary files (.bed, .bim, .fam, .hh, .log, .sex; for more information,
see the PLINK documentation: https://www.cog-genomics.org/plink/1.9/formats). These
files are compressed and archived in the same .zip archive, which follows the naming
convention of the PLINK binary files. There is also a .md5 file that can be used to
verify file integrity.
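As an illustration of the naming schema and of the integrity check, here is a minimal Python sketch. The regular expression, the helper names (`parse_name`, `md5sum`, `is_intact`) and the assumption that the only pre-release suffix is `dev<N>` are hypothetical and not taken from the SMARTER-database code. The layout of the .md5 file is not described here, so the expected checksum is passed in as a plain string.

```python
import hashlib
import re

# Hypothetical pattern for the schema
# SMARTER-<genus/species initials>-<assembly>-top-<version>.<ext>
# Assumption: the only pre-release suffix is "dev<N>", as in 0.4.3.dev0.
FILENAME_RE = re.compile(
    r"SMARTER-(?P<species>[A-Z]+)-(?P<assembly>[A-Za-z0-9]+)-top-"
    r"(?P<version>\d+\.\d+\.\d+(?:\.dev\d+)?)"
    r"(?:\.(?P<ext>[A-Za-z0-9]+))?"
)


def parse_name(name: str) -> dict:
    """Split a file name (or a bare prefix) into its schema fields."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return match.groupdict()


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with an expected hexadecimal value."""
    return md5sum(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    print(parse_name("SMARTER-OA-OAR3-top-0.4.2.zip"))
```

A bare prefix such as SMARTER-OA-OAR3-top-0.4.2 is also accepted, in which case the `ext` field is `None`.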

Datasets releases
-----------------

Datasets are released using the same versioning system used by the https://github.com/cnr-ibba/SMARTER-database
project, since datasets are generated using the corresponding software version.
New versions could introduce new samples, new variants or updates/fixes in genotype
positions. For detailed information on the changes in a new version, please refer to
https://github.com/cnr-ibba/SMARTER-database/blob/master/HISTORY.rst. When an older
dataset version is replaced by a new dataset version, the older dataset version is
moved into the archive folder inside the Species/Assembly folder. This happens only
when the genotypes change between the two versions (and not when only the
metadata or the SMARTER-database software change). You can therefore retrieve an older
SMARTER dataset version only if the genotypes changed between the old and the new releases.

Stable and latest releases
~~~~~~~~~~~~~~~~~~~~~~~~~~

Datasets with a version like 0.4.2 are stable versions and were generated using
the same stable version (aka tag) of the https://github.com/cnr-ibba/SMARTER-database
project. Versions with a dev suffix like 0.4.3.dev0 are not intended to be stable:
they could contain the latest information or fixes, but there is no guarantee that such
files will not be updated again later or that they were generated with the latest
SMARTER-database software. When the SMARTER-database software is released, a new
stable dataset version will replace the unstable dataset. Once a dataset is stable,
new changes and fixes will be released using a newer version.

Accessing metadata and subsetting dataset
-----------------------------------------

Datasets are composed of all the samples imported within the SMARTER project. However, there
are additional metadata that could be used to subset the dataset according to the user's
needs. These data are made available through the SMARTER Data Portal (https://webserver.ibba.cnr.it/smarter/)
and the SMARTER API (https://webserver.ibba.cnr.it/smarter/). Sample IDs and breed codes
available through these interfaces are the same used in the genotypes dataset and
could be used to subset the data according to the user's needs.
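One possible way to do the subsetting is to write the selected samples to a text file and pass it to the PLINK `--keep` option, which reads family IDs in the first column and within-family IDs in the second. The Python sketch below assumes that the breed code is stored as the family ID (FID) and the sample ID as the within-family ID (IID). The helper names and the sample values are made-up placeholders, and the command is only built, not executed.

```python
from typing import Iterable, List, Tuple


def write_keep_file(samples: Iterable[Tuple[str, str]], path: str) -> int:
    """Write (FID, IID) pairs, one per line, in the layout read by PLINK --keep.

    Assumption: FID is the breed code and IID is the sample ID.
    Returns the number of samples written.
    """
    count = 0
    with open(path, "w") as handle:
        for fid, iid in samples:
            handle.write(f"{fid}\t{iid}\n")
            count += 1
    return count


def plink_subset_command(prefix: str, keep_path: str, out_prefix: str) -> List[str]:
    """Build (without running) a PLINK 1.9 command that keeps only the listed samples."""
    return [
        "plink",
        "--bfile", prefix,       # input .bed/.bim/.fam prefix
        "--keep", keep_path,     # file with the FID/IID pairs to retain
        "--make-bed",            # write a new binary fileset
        "--out", out_prefix,
    ]


if __name__ == "__main__":
    # Placeholder breed codes and sample IDs, for illustration only.
    selected = [("BREED_A", "sample_1"), ("BREED_A", "sample_2")]
    write_keep_file(selected, "keep.txt")
    print(" ".join(plink_subset_command("SMARTER-OA-OAR3-top-0.4.2", "keep.txt", "subset")))
```

Depending on the dataset, PLINK may also need a chromosome-set option for non-human species (for example `--chr-set` or `--allow-extra-chr`); check the PLINK documentation before running the command.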