Releases: sacdallago/biotrainer

v0.9.5

09 Dec 16:36

09.12.2024 - Version 0.9.5

Features

  • Added integration for huggingface datasets by @heispv in #124
  • Added per-sequence dimension reduction methods by @nadyadevani3112 in #123
  • Improving one_hot_encoding embedder with numpy functions by @SebieF

Maintenance

  • Fixing "precission" typo in classification_solver.py
  • Updating dependencies
  • Improving documentation of the config module by @heispv in #121
  • Improving compute_embeddings function to handle Dict, str and Path as input_data
  • Reducing log level of onnx and dynamo to ERROR to decrease logging output
  • Fixing first_steps documentation
  • Adding links to biocentral app, repository and biotrainer documentation

v0.9.4

01 Nov 10:05

29.10.2024 - Version 0.9.4

Bug fixes

  • Hotfix for incorrect precision mode setting by @SebieF in #116

Maintenance

  • Updating dependencies: removing python3.9 support
  • Updating CI workflow to be compatible with Windows

Known problems

  • Currently, there are compatibility problems with ONNX on some machines. Please refer to the following issue: #111

v0.9.3

14 Oct 10:03

14.10.2024 - Version 0.9.3

Features

  • Adding support for ProstT5 embedder by @SebieF in #110

Bug fixes

  • Adding improved onnx saving and inferencer fixes by @SebieF in #112

v0.9.2

26 Aug 16:54

26.08.2024 - Version 0.9.2

Features

  • Improving memory management of embedding calculation by @SebieF in #96
  • Use a strategy for sequence preprocessing by @SebieF in #99
  • Adding ONNX support by @SebieF in #101
  • Adding saprot embedder example by @SebieF in #106

Maintenance

  • BREAKING: Improving masking mechanisms in CNN and LightAttention models by @SebieF in #102
  • Improving embedder model and tokenizer class recognition by @SebieF in #105
  • Optimize Memory Handling in Embedding Computations and Refactor EmbeddingService by @heispv in #103
  • Updating dependencies

v0.9.1

14 Jul 10:09

10.07.2024 - Version 0.9.1

Maintenance

  • Fixing error in type checking for device
  • Updating dependencies
  • Updating inference examples
  • Adding hint for version mismatch in inferencer
  • Adding class weights to out.yml if they are calculated
  • Adding contributors file

Features

  • Improving fallback mechanism of embedder models. CPU mode is now exited once there is enough
    RAM again for shorter sequences.
  • Changing model storage format from .pt to .safetensors.
    Safetensors is safer for model sharing. The legacy .pt format is still supported and can be converted via:

        from biotrainer.inference import Inferencer
        inferencer, out_file = Inferencer.create_from_out_file(out_file_path="out.yml", allow_torch_pt_loading=True)
        inferencer.convert_all_checkpoints_to_safetensors()

v0.9.0

16 Jun 10:35

16.06.2024 - Version 0.9.0

Maintenance

  • Adding more extensive code documentation
  • Optimizing imports
  • Applying consistent file naming
  • Updating dependencies. Note that jupyter was removed as a direct optional dependency.
    You can always add it via poetry add jupyter.
  • Adding simple differentiation between t5 and esm tokenizer and models in embedders module

Features

  • Adding new residues_to_value protocol.
    Similar to the residues_to_class protocol, this protocol predicts a value for each sequence,
    using per-residue embeddings. In some situations, it might outperform the sequence_to_value protocol.

Bug fixes

  • For huggingface_transformer_embedder.py, all special tokens are now always deleted from the final embedding
    (e.g. first/last for esm1b, last for t5)
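
To illustrate the fix above, here is a minimal sketch of stripping special-token positions from a per-token embedding. It is not biotrainer's implementation: the function name strip_special_tokens and its n_leading / n_trailing parameters are hypothetical, and the embedding is modelled as a plain list of vectors rather than a tensor.

```python
from typing import List, Sequence


def strip_special_tokens(
    embedding: Sequence[Sequence[float]],
    n_leading: int,
    n_trailing: int,
) -> List[List[float]]:
    """Drop special-token rows from the start and end of a per-token embedding.

    Hypothetical helper for illustration only, not part of biotrainer.
    """
    end = len(embedding) - n_trailing
    return [list(row) for row in embedding[n_leading:end]]


# Toy per-token embeddings for a 3-residue sequence.
# The value 0.0 stands in for a leading special token, 9.0 for a trailing one.
esm_like = [[0.0], [1.0], [2.0], [3.0], [9.0]]  # special tokens first and last
t5_like = [[1.0], [2.0], [3.0], [9.0]]          # special token last only

esm_residues = strip_special_tokens(esm_like, n_leading=1, n_trailing=1)
t5_residues = strip_special_tokens(t5_like, n_leading=0, n_trailing=1)
```

In both toy cases, the result has exactly one row per residue, which is what downstream per-residue protocols expect.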

v0.8.4

06 Jun 13:24

04.06.2024 - Version 0.8.4

Maintenance

  • Updating dependencies
  • Adding pip-audit dependency check to CI pipeline

v0.8.3

04 May 09:07

04.05.2024 - Version 0.8.3

Maintenance

  • Updating dependencies

Features

  • Adding mps device for macOS. Use it by setting the following configuration option: device: mps.
    Note that MPS is still under development, so use it at your own risk.
  • Adding flags to the compute_embedding method of EmbeddingService:
    1. force_output_dir: Do not change the given output directory within the method
    2. force_recomputing: Always re-compute the embeddings, even if an existing file is found

These changes make the embedders module of biotrainer easier to use outside the biotrainer pipeline itself.
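
The semantics of the two flags can be sketched roughly as follows. This is an illustration under stated assumptions, not the EmbeddingService code: the function resolve_embeddings_file, the "embeddings" subdirectory, and the "embeddings.h5" file name are all hypothetical, and the actual computation is replaced by a placeholder callback.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def resolve_embeddings_file(
    output_dir: Path,
    compute: Callable[[Path], None],
    force_output_dir: bool = False,
    force_recomputing: bool = False,
) -> Path:
    """Hypothetical helper mirroring the two flags described above."""
    # Assumption: without force_output_dir, results go into a nested subdirectory.
    target_dir = output_dir if force_output_dir else output_dir / "embeddings"
    target_dir.mkdir(parents=True, exist_ok=True)
    target_file = target_dir / "embeddings.h5"  # hypothetical file name
    # Reuse an existing file unless the caller forces a re-computation.
    if force_recomputing or not target_file.exists():
        compute(target_file)
    return target_file


calls: List[Path] = []


def fake_compute(path: Path) -> None:
    """Placeholder for the real embedding computation."""
    calls.append(path)
    path.write_text("embeddings placeholder")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    first = resolve_embeddings_file(out, fake_compute, force_output_dir=True)
    # Second call finds the existing file and skips the computation.
    second = resolve_embeddings_file(out, fake_compute, force_output_dir=True)
    # Third call recomputes even though the file exists.
    third = resolve_embeddings_file(
        out, fake_compute, force_output_dir=True, force_recomputing=True
    )
    nested = resolve_embeddings_file(out, fake_compute)
    first_in_given_dir = first.parent == out
    nested_in_given_dir = nested.parent == out
```

With force_output_dir set, the file lands directly in the directory the caller passed in, which is the behaviour an external user of the embedders module would typically want.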

v0.8.2

28 Feb 09:22

Maintenance

  • Updating dependencies

Features

  • Adding option to ignore verification of files in configurator.py. This makes it possible to verify a biotrainer
    configuration independently of the provided files.
  • Adding new compute_embeddings_from_list function to embedding_service.py. This allows computing embeddings directly
    from sequence strings.

v0.8.1

12 Jan 14:39

12.01.2024 - Version 0.8.1

Maintenance

  • Updating dependencies after removing bio_embeddings, notably upgrading torch and adding accelerate
  • Updating examples, documentation, config and test files for inferencer tests to match the new compile mode
  • Replacing the exception with a warning if dropout_rate is set for a model that does not support it (e.g. LogReg)
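
The last item above (warning instead of exception for an unsupported dropout_rate) follows a common pattern that might look like the sketch below. The function check_dropout and the set MODELS_WITHOUT_DROPOUT are hypothetical names chosen for illustration. Only dropout_rate and LogReg come from the release note, and the real check in biotrainer may differ.

```python
import warnings
from typing import Any, Dict

# Hypothetical registry of models that have no dropout layer.
MODELS_WITHOUT_DROPOUT = {"LogReg"}


def check_dropout(model_name: str, config: Dict[str, Any]) -> None:
    """Warn, rather than raise, when dropout_rate is set for an unsupported model."""
    if "dropout_rate" in config and model_name in MODELS_WITHOUT_DROPOUT:
        warnings.warn(
            f"dropout_rate is set, but model {model_name} does not support dropout.",
            UserWarning,
        )


# Capture warnings so the effect can be inspected programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_dropout("LogReg", {"dropout_rate": 0.25})  # emits one warning
    check_dropout("CNN", {"dropout_rate": 0.25})     # no warning in this sketch
```

Emitting a warning keeps an otherwise valid configuration runnable while still telling the user that the option has no effect for that model.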

Features

  • Enabling pytorch compile mode. The feature has existed since torch 2.0 and is now available in biotrainer. It can be enabled via:

        disable_pytorch_compile: False