Releases · google/deepconsensus
DeepConsensus 1.2.0
- DeepConsensus v1.2 introduces a new model that improves runtime by approximately 12% via changes to the neural network architecture (replacement of compute-intensive normalization layers).
- The new model takes the base quality scores from CCS as an additional input, improving yield at empirical Q30 and Q40 over CCS.
- Updated training to include data from maize (Z. mays B73) in addition to the data from CHM13. The inclusion of maize data slightly improves the accuracy of DeepConsensus on human data and gives a small improvement on maize (Q30 yield relative to CCS rises from 189% to 193%).
- Raised the cap on base quality scores to match PBCCS, increasing the dynamic range that DeepConsensus can use to express base confidence.
- Added a docs page on model calibration; see it to better understand predicted versus actual confidence for reads and bases. A minimal sketch of that comparison follows this list.
- Thanks to Daniel Liu (@Daniel-Liu-c0deb0t) for his work on replacement of normalization layers, which resulted in significant model speed improvements.
- Thanks to Armin Töpfer (@armintoepfer), Aaron Wenger (@amwenger), and William Rowell (@williamrowell) at PacBio for advice and collaboration.
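The calibration page compares the base qualities the model predicts against the error rates actually observed when reads are aligned to a truth assembly. As a rough illustration (a minimal sketch, not the DeepConsensus implementation), a per-quality-bin comparison can be computed as follows; `predicted_q` and `is_error` are hypothetical inputs holding each base's predicted Phred quality and whether that base disagreed with the truth:

```python
# Minimal sketch (not the DeepConsensus code) of comparing predicted base quality
# against empirical quality, as discussed on the model calibration docs page.
import math
from collections import defaultdict

def empirical_quality(errors: int, total: int) -> float:
    """Phred-scaled quality from an observed error count: Q = -10 * log10(p)."""
    if total == 0:
        return float("nan")
    p = max(errors, 1) / total  # avoid log10(0) when no errors were observed
    return -10.0 * math.log10(p)

def calibration_table(predicted_q, is_error):
    """Group bases by predicted quality and report the empirical quality per bin."""
    errors, totals = defaultdict(int), defaultdict(int)
    for q, err in zip(predicted_q, is_error):
        totals[q] += 1
        errors[q] += int(err)
    return {q: empirical_quality(errors[q], totals[q]) for q in sorted(totals)}
```

A well-calibrated model produces empirical qualities close to each predicted bin; the raised quality cap in this release widens the range of predicted values such a table can contain.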
DeepConsensus 1.1.0
- DeepConsensus v1.1 introduces a new model that improves coverage of telomere regions, achieved through improved filtering of the training data using CHM13 high-confidence regions.
- Improved yield at empirical Q30 from 187.1% in v1.0 to 194.4% in v1.1, relative to the CCS baseline of 100% (see the sketch after this list). This was achieved through improvements to the attention layer in the model.
- Updated the training tutorial to cover training on TPUs; users can use it as a proof of concept to develop a training setup.
- This release evaluates performance using an updated HG002 truth assembly. We have re-evaluated previous releases with this updated dataset and updated Q30 yields accordingly.
- Thanks to Sergey Koren (@skoren) from NIH, NHGRI and the T2T consortium for invaluable feedback on the coverage of telomeric regions.
- Thanks to Daniel Liu (@Daniel-Liu-c0deb0t) for incorporating prior knowledge/sparsity in the attention layer of the model, which significantly improved the accuracy and Q30 yield.
- Thanks to Armin Töpfer (@armintoepfer), Aaron Wenger (@amwenger), and William Rowell (@williamrowell) at PacBio for advice and collaboration.
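The Q30 yield figures quoted in these notes compare the total bases in high-quality reads against the same quantity for CCS. Below is a minimal sketch of that comparison, assuming yield is counted as bases in reads whose empirical (truth-aligned) quality reaches the threshold; function names are illustrative, and the DeepConsensus metrics docs give the exact definition:

```python
# Illustrative computation of "yield at empirical Q30" relative to a CCS baseline.
def yield_at_q(read_lengths, read_qualities, q_threshold=30.0):
    """Total bases contained in reads whose empirical quality meets the threshold."""
    return sum(length for length, q in zip(read_lengths, read_qualities)
               if q >= q_threshold)

def relative_yield_pct(dc_lengths, dc_quals, ccs_lengths, ccs_quals, q_threshold=30.0):
    """DeepConsensus yield at the threshold, as a percentage of the CCS yield."""
    dc = yield_at_q(dc_lengths, dc_quals, q_threshold)
    ccs = yield_at_q(ccs_lengths, ccs_quals, q_threshold)
    return 100.0 * dc / ccs  # e.g. 194.4 means 194.4% of the CCS Q30 yield
```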
DeepConsensus 1.0.0
- DeepConsensus v1.0 introduces a new model that greatly improves the empirical Q30 yield across chemistries and the insert sizes we tested. For example, using our chem2.2_24kb dataset we observe an increase in Q30 yield from 149% to 176%.
- We reduced the size of our model (using distillation) and the size of the model inputs to lower runtime by approximately 10%, while still improving accuracy over v0.3.
- DeepConsensus can now output a BAM file. BAM output can be used to examine the effective coverage (`ec`), number of passes (`np`), or predicted average read accuracy (`rq`) of each read; a small pysam example follows this list.
- v1.0 introduces a training tutorial that users can use as a proof-of-concept to develop a training setup.
- Models introduced previously (v0.1, v0.2, v0.3) are not compatible with v1.0 and vice versa.
- `--max_passes` and `--example_width` are now defined by the model `params.json` file. Users do not need to set these flags when running inference. The `--padding` flag has been removed; padding is no longer added to model inputs.
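Below is a small example (using pysam; not part of DeepConsensus itself) of inspecting the per-read tags carried in the BAM output mentioned above: effective coverage (`ec`), number of passes (`np`), and predicted average read accuracy (`rq`).

```python
# Print the ec/np/rq tags for the first few reads of a DeepConsensus output BAM.
import pysam

def summarize_bam(path: str, max_reads: int = 10) -> None:
    # check_sq=False lets pysam open an unaligned BAM with no reference sequences.
    with pysam.AlignmentFile(path, "rb", check_sq=False) as bam:
        for i, read in enumerate(bam):
            if i >= max_reads:
                break
            ec = read.get_tag("ec") if read.has_tag("ec") else None
            np_ = read.get_tag("np") if read.has_tag("np") else None
            rq = read.get_tag("rq") if read.has_tag("rq") else None
            print(f"{read.query_name}\tec={ec}\tnp={np_}\trq={rq}")
```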
Acknowledgements
- Thanks to Armin Töpfer (@armintoepfer), Aaron Wenger (@amwenger), and William Rowell (@williamrowell) at PacBio for advice and collaboration.
- Thanks to Lucas Brambrink (@lucasbrambrink) for model experiments and analysis.
- Thanks to Daniel Liu (@Daniel-Liu-c0deb0t) for model experiments, analysis, and advice.
DeepConsensus 0.3.1
Change Log
- This patch release reverts the `--min-quality` flag to use a default value of 20.
DeepConsensus 0.3.0
Change Log
- Runtime speedup of 4.9X compared to v0.2.
- Improved yield at empirical Q30 from 141% in v0.2 to 149%, relative to the CCS baseline of 100%. This was achieved through improvements to the training data, including use of the new CHM13 T2T assembly (`chm13v2.0_noY`) and new sequencing data.
- Added a documentation page with yield metrics for 3 SMRT Cells with different read length distributions.
- Updated recommendation for ccs settings to skip very low-quality reads, saving runtime.
- Model input condenser layer added, saving runtime.
- To save significant runtime, added the `--skip_windows_above` option, which skips running the model on windows whose predicted quality from CCS is already above a threshold (default Q45); see the sketch after this list.
- Memory profiling with batch option recommendations.
- Added support for TensorFlow SavedModel for portability.
- Added base quality calibration tuned for the v0.3 model, customizable with the `--dc_calibration` option.
- The `--min-quality` flag default was changed from 20 to 0 in this version. This change was reverted in v0.3.1.
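The idea behind `--skip_windows_above` can be sketched as follows (an illustration of the concept, not the actual DeepConsensus code): if the CCS base qualities in a window already imply an average error rate better than the threshold (Q45 by default), the window keeps the CCS sequence instead of being run through the model.

```python
# Conceptual sketch of skipping high-confidence windows based on CCS base qualities.
import math

def window_predicted_quality(ccs_base_qualities) -> float:
    """Phred-scaled quality of the window's mean per-base error probability."""
    if not ccs_base_qualities:
        return 0.0
    mean_err = sum(10 ** (-q / 10) for q in ccs_base_qualities) / len(ccs_base_qualities)
    return -10.0 * math.log10(mean_err)

def should_skip_window(ccs_base_qualities, skip_windows_above: float = 45.0) -> bool:
    """True if the window's predicted quality already exceeds the skip threshold."""
    return window_predicted_quality(ccs_base_qualities) > skip_windows_above
```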
Acknowledgements
- Thanks to Armin Töpfer, Aaron Wenger, and William Rowell at PacBio for advice and collaboration.
- Thanks to Felipe Llinares for contributing a new alignment training metric.
- Thanks to Moshe Wagner for adding a multiprocessing speedup to the preprocessing stage.
- Thanks to Joel Shor for model advice and code reviews.
DeepConsensus 0.2.0
Change Log
- Substantial (>10x) speed increase relative to v0.1.
- DeepConsensus now supports GPU execution. In our tests, using an NVIDIA V100 GPU is ~3.3x faster than CPU alone.
- Reduced installation complexity by removing Nucleus and Apache Beam dependencies. Added support for newer TensorFlow versions.
- CPU and GPU pip packages are now available alongside corresponding Docker images.
- A more user-friendly command-line interface has been added and can be invoked using `deepconsensus`.
- A simplified one-step solution for running DeepConsensus has been developed and can be invoked using `deepconsensus run`.
- Small improvements to accuracy by better mapping repetitive subreads with actc, increasing Q30 yield by 31.3% relative to pbccs, compared to 30.6% for DeepConsensus v0.1.
Thanks to Armin Töpfer for actc support and Jeremy Schmutz for evaluations and feedback.
DeepConsensus 0.1.0
Initial release.
Please see: https://www.biorxiv.org/content/10.1101/2021.08.31.458403v1