SRA fetch and preprocess of Illumina reads with FASTP hangs #42

Open
0karl0 opened this issue Jun 12, 2024 · 3 comments
Labels
bug Something isn't working

Comments


0karl0 commented Jun 12, 2024

This command

nextflow run fmalmeida/ngs-preprocess   -r dev -latest -profile docker --sra_ids "./input/sra_ids.txt"   --output illumina_single  --shortreads_type "single"   --fastp_additional_parameters " --trim_front1 5 --trim_tail1 5 "

hangs during processing, stuck at 0 of 1 for the FASTP process:

[54/9c682c] process > SRA_FETCH:GET_FASTQ (SRR28776895)    [100%] 1 of 1 ✔
[32/48d60d] process > SRA_FETCH:GET_METADATA (SRR28776895) [100%] 1 of 1 ✔
[-        ] process > NANOPORE:PORECHOP                    -
[-        ] process > NANOPORE:FILTER                      -
[-        ] process > NANOPORE:NANOPACK                    -
[-        ] process > PACBIO:BAM2FASTQ                     -
[-        ] process > PACBIO:NANOPACK                      -
[-        ] process > PACBIO:FILTER                        -
[17/78fe41] process > ILLUMINA:FASTP (SRR28776895)         [  0%] 0 of 1

However, three FASTQ files are produced, and running the following command after the previous one completes the preprocessing:

nextflow run fmalmeida/ngs-preprocess -r dev -latest -profile docker --shortreads "illumina_single/SRA_FETCH/FASTQ/SRR28776895_data/*.fastq.gz" \
   --output illumina_single --shortreads_type "single" --fastp_additional_parameters " --trim_front1 5 --trim_tail1 5 "
executor >  local (3)
[-        ] process > SRA_FETCH:GET_FASTQ            -
[-        ] process > SRA_FETCH:GET_METADATA         -
[-        ] process > NANOPORE:PORECHOP              -
[-        ] process > NANOPORE:FILTER                -
[-        ] process > NANOPORE:NANOPACK              -
[-        ] process > PACBIO:BAM2FASTQ               -
[-        ] process > PACBIO:NANOPACK                -
[-        ] process > PACBIO:FILTER                  -
[64/63cded] process > ILLUMINA:FASTP (SRR28776895_2) [100%] 3 of 3 ✔

My guess is that Nextflow does not automatically pick up the downloaded SRA files. Perhaps there's a flag I missed.

fmalmeida self-assigned this Jun 13, 2024
fmalmeida added the bug (Something isn't working) label Jun 13, 2024
@fmalmeida
Owner

Hi @0karl0,
Thanks for flagging this. I am going to investigate it further, but I cannot commit to a deadline.
It is good to know that you were able to get your data with a workaround.

I will update the ticket once I have news.

Cheers.

@fmalmeida
Owner

Hi @0karl0,
I figured it out: the problem is that this particular study has technical reads. As a result, three files were being downloaded instead of the two that were expected, since the study is paired-end.
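To check whether a local download has the same issue, here is a minimal shell sketch that lists every FASTQ file in a download folder other than the two mates. It assumes, without confirmation from the pipeline itself, that the mates are named `<accession>_1.fastq.gz` and `<accession>_2.fastq.gz`. The function name `list_extra_fastq` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the names of FASTQ files in a download folder that are NOT
# one of the two paired-end mates.
# ASSUMPTION: mates are named <accession>_1.fastq.gz and <accession>_2.fastq.gz;
# any other *.fastq.gz file (e.g. an extra technical-read file) is reported.
list_extra_fastq() {
  local dir="$1"
  local f base
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || continue            # glob matched nothing: skip the literal pattern
    base="$(basename "$f")"
    case "$base" in
      *_1.fastq.gz|*_2.fastq.gz) ;;    # expected mate files: ignore
      *) printf '%s\n' "$base" ;;      # anything else is unexpected
    esac
  done
}

# Usage, with the folder from the workaround command in this issue:
# list_extra_fastq "illumina_single/SRA_FETCH/FASTQ/SRR28776895_data"
```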

I could either provide a parameter to allow or skip technical reads and fix their processing, or make the pipeline always skip them.

Which would you prefer in this scenario?
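For the first option, the selection logic could look roughly like the shell sketch below. `select_fastq` and its `include_technical` argument are hypothetical illustrations, not existing ngs-preprocess parameters, and the `_1`/`_2` naming of the mates is an assumption. Technical reads are skipped unless the second argument is `true`.

```shell
#!/usr/bin/env bash
# Sketch of a toggle for technical reads.
# HYPOTHETICAL: select_fastq and include_technical are illustrations only,
# not real ngs-preprocess parameters.
# ASSUMPTION: biological mates are <accession>_1.fastq.gz / <accession>_2.fastq.gz.
select_fastq() {
  local dir="$1"
  local include_technical="${2:-false}"   # default: skip technical reads
  local f base
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || continue               # no FASTQ files in the folder
    base="$(basename "$f")"
    case "$base" in
      *_1.fastq.gz|*_2.fastq.gz)
        printf '%s\n' "$f" ;;             # always keep the mates
      *)
        if [ "$include_technical" = "true" ]; then
          printf '%s\n' "$f"              # keep extra files only when asked
        fi ;;
    esac
  done
}
```

Defaulting the toggle to `false` means the "always skip" option is simply this function left at its default, so both options could share one code path.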

@fmalmeida
Owner

I would probably vote for skipping the technical reads entirely, because they are mostly relevant for single-cell data.

And I doubt people downloading single-cell data would use an automation like this one.

But I would prefer to hear some input first.
