feat: Reuse indices for plasmids based on md5sum #39

Open
berntpopp opened this issue Aug 12, 2024 · 0 comments
Labels
enhancement New feature or request

@berntpopp
Member

Description:
Implement a mechanism to reuse existing plasmid indices by storing each index under a name derived from the md5sum of its input file. If a plasmid input file is unchanged (its md5sum matches that of an existing index), the existing index should be reused instead of being recomputed.
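As a minimal sketch, assuming a Python pipeline, the checksum can be computed with the standard-library `hashlib` module. The file is read in chunks, so the digest matches what the `md5sum` command-line tool prints without loading the whole file into memory. The names `md5sum` and `index_dir_for` and the `<stem>_<md5sum>` directory layout are hypothetical, chosen only for illustration.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks (same value as the `md5sum` CLI)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_dir_for(path: Path, index_root: Path) -> Path:
    """Hypothetical naming scheme: <index_root>/<input stem>_<md5sum>/."""
    return index_root / f"{path.stem}_{md5sum(path)}"
```

Keeping the file stem in the directory name is only a readability choice; the md5sum alone is what decides whether an index can be reused.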

Tasks:

  • Compute the md5sum of each plasmid input file.
  • Save indices with a name or directory structure incorporating the md5sum.
  • Before generating a new index, compute the input's md5sum and check whether an index with that md5sum already exists.
  • Update documentation to explain the md5sum-based indexing system.
  • Add tests to ensure that indices are correctly reused when input files are unchanged.
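The tasks above could fit together roughly as follows. This is a sketch under stated assumptions, not the project's implementation: `get_or_build_index`, the `build_index` callback (a stand-in for whatever indexer the pipeline actually runs), and the `.complete` marker file are all hypothetical. The index directory is named after the md5sum, and an index counts as reusable only if its completion marker exists, so an interrupted build is never mistaken for a valid one.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Tuple

DONE_MARKER = ".complete"  # hypothetical marker, written only after a successful build


def get_or_build_index(
    plasmid: Path,
    index_root: Path,
    build_index: Callable[[Path, Path], None],
) -> Tuple[Path, bool]:
    """Return (index_dir, reused); build only if no complete index exists for this md5sum.

    `build_index(plasmid, out_dir)` is a hypothetical callback wrapping the real indexer.
    Plasmid files are small, so the whole file is hashed in a single read.
    """
    checksum = hashlib.md5(plasmid.read_bytes()).hexdigest()
    index_dir = index_root / checksum
    if (index_dir / DONE_MARKER).exists():
        return index_dir, True  # input unchanged: reuse the existing index

    index_root.mkdir(parents=True, exist_ok=True)
    # Build into a scratch directory first so a failed or interrupted run
    # never leaves a half-written index under the final md5sum-named path.
    scratch = Path(tempfile.mkdtemp(dir=index_root, prefix=f".{checksum}."))
    try:
        build_index(plasmid, scratch)
        (scratch / DONE_MARKER).touch()
        if index_dir.exists():
            shutil.rmtree(index_dir)  # leftover from an earlier incomplete build
        scratch.rename(index_dir)
    except BaseException:
        shutil.rmtree(scratch, ignore_errors=True)
        raise
    return index_dir, False
```

Called twice on an unchanged file, this runs `build_index` only once, which is the behaviour the test task asks to verify; editing the file changes its md5sum and triggers a fresh build. The key covers only the input file's contents, so if the indexer version or its parameters change, one option (not specified in this issue) would be to fold them into the directory name as well. The sketch also does not guard against two processes building the same index at the same time.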

Benefits:

  • Reduces computation time by skipping index creation whenever a plasmid input file is unchanged.
  • Makes the pipeline more efficient, especially when working with large datasets.