⚡ enhance genotype conversion #114

Open
3 tasks
bunop opened this issue Feb 9, 2024 · 0 comments
Labels
performance Improve the performance or better resource management

Comments

@bunop
Member

bunop commented Feb 9, 2024

Is your feature request related to a problem? Please describe.
SNP genotype conversion is very slow. For example, if the original file is already in PLINK binary format, converting it takes too long because the conversion goes through temporary PLINK text files. Converting the whole dataset to FORWARD (see #111) generates huge temporary files. Going from binary to text is itself slow, because the data for a given sample is column-based in the binary format and row-based in the text format.
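
To make the layout mismatch concrete, here is a minimal sketch of decoding a variant-major .bed body and transposing it into per-sample rows. It assumes the default variant-major layout (the data that follows the three magic bytes), with 2 bits per genotype packed low-order bits first. The function names are hypothetical and are not part of the project's code.

```python
from typing import List, Tuple


def decode_variant_block(block: bytes, n_samples: int) -> List[int]:
    """Decode one variant's block of a variant-major .bed file.

    Each byte holds four 2-bit genotype codes, lowest bits first:
    0b00 = homozygous A1, 0b01 = missing, 0b10 = heterozygous,
    0b11 = homozygous A2. Padding bits in the last byte are dropped.
    """
    codes = []
    for byte in block:
        for shift in (0, 2, 4, 6):
            codes.append((byte >> shift) & 0b11)
    return codes[:n_samples]


def to_sample_rows(body: bytes, n_samples: int) -> List[Tuple[int, ...]]:
    """Transpose variant-major data into one tuple of codes per sample."""
    block_size = (n_samples + 3) // 4  # bytes used by each variant
    per_variant = [
        decode_variant_block(body[i:i + block_size], n_samples)
        for i in range(0, len(body), block_size)
    ]
    # A text .ped row needs every variant for one sample, so all
    # variant blocks have to be read before a single row is complete.
    return list(zip(*per_variant))


# Two variants, three samples (one byte per variant)
rows = to_sample_rows(bytes([0x38, 0x07]), n_samples=3)
```

The transposition step is what forces the whole matrix through memory or temporary files when writing sample-major text output.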

Describe the solution you'd like
Ideally, data conversion should be done on binary formats (not text). If it is possible to change only the alleles in the .bim file of a PLINK binary fileset, that is the approach to take.
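
A minimal sketch of the .bim-only idea, assuming the conversion is a pure relabelling of each allele (so the genotype codes in the .bed file, which only refer to the first and second allele of the .bim, stay valid). The `allele_map` structure and the function name are hypothetical; how the mapping would be built from the database is not shown.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical mapping: variant ID -> {allele in source coding: allele in target coding}
AlleleMap = Dict[str, Dict[str, str]]


def recode_bim_lines(lines: Iterable[str], allele_map: AlleleMap) -> Iterator[str]:
    """Yield .bim lines with the two allele columns relabelled.

    A .bim line has six whitespace-separated fields: chromosome,
    variant ID, position in centimorgans, base-pair coordinate,
    allele 1 and allele 2. Only the last two fields are changed.
    """
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        chrom, snp_id, cm, pos, a1, a2 = line.split()
        mapping = allele_map.get(snp_id, {})
        # Alleles without an entry (including the missing code "0")
        # are written back unchanged.
        a1 = mapping.get(a1, a1)
        a2 = mapping.get(a2, a2)
        yield "\t".join([chrom, snp_id, cm, pos, a1, a2]) + "\n"


example = ["1\tsnp1\t0\t1000\tA\tG\n", "1\tsnp2\t0\t2000\t0\tC\n"]
converted = list(recode_bim_lines(example, {"snp1": {"A": "T", "G": "C"}}))
```

With this approach the .bed and .fam files would be reused as they are, so the cost scales with the number of variants rather than with variants times samples.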

Describe alternatives you've considered
We have been working with text files until now, and it works. However, it is very inefficient.

Additional context

  • understand how plinkio deals with writing files
  • evaluate the PLINK transposed binary format
  • evaluate updating only the alleles in the *.bim file for the binary format

bunop added the performance label Feb 9, 2024
bunop added this to the SMARTER database v1.0.0 milestone Feb 9, 2024
bunop added this to SMARTER Feb 9, 2024
github-project-automation bot moved this to To do in SMARTER Feb 9, 2024