SRA fetch and preprocessing of Illumina reads with FASTP hangs #42

0karl0 · 2024-06-12T20:24:59Z · edited by fmalmeida

This command:

nextflow run fmalmeida/ngs-preprocess   -r dev -latest -profile docker --sra_ids "./input/sra_ids.txt"   --output illumina_single  --shortreads_type "single"   --fastp_additional_parameters " --trim_front1 5 --trim_tail1 5 "

hangs during processing at 0 of 1 for the FASTP process:

[54/9c682c] process > SRA_FETCH:GET_FASTQ (SRR28776895)    [100%] 1 of 1 ✔
[32/48d60d] process > SRA_FETCH:GET_METADATA (SRR28776895) [100%] 1 of 1 ✔
[-        ] process > NANOPORE:PORECHOP                    -
[-        ] process > NANOPORE:FILTER                      -
[-        ] process > NANOPORE:NANOPACK                    -
[-        ] process > PACBIO:BAM2FASTQ                     -
[-        ] process > PACBIO:NANOPACK                      -
[-        ] process > PACBIO:FILTER                        -
[17/78fe41] process > ILLUMINA:FASTP (SRR28776895)         [  0%] 0 of 1

However, three FASTQ files are produced, and running the following command after the previous one completes the preprocessing:

nextflow run fmalmeida/ngs-preprocess -r dev -latest -profile docker \
   --shortreads "illumina_single/SRA_FETCH/FASTQ/SRR28776895_data/*.fastq.gz" \
   --output illumina_single --shortreads_type "single" --fastp_additional_parameters " --trim_front1 5 --trim_tail1 5 "

executor >  local (3)
[-        ] process > SRA_FETCH:GET_FASTQ            -
[-        ] process > SRA_FETCH:GET_METADATA         -
[-        ] process > NANOPORE:PORECHOP              -
[-        ] process > NANOPORE:FILTER                -
[-        ] process > NANOPORE:NANOPACK              -
[-        ] process > PACBIO:BAM2FASTQ               -
[-        ] process > PACBIO:NANOPACK                -
[-        ] process > PACBIO:FILTER                  -
[64/63cded] process > ILLUMINA:FASTP (SRR28776895_2) [100%] 3 of 3 ✔

My guess is that Nextflow does not point to the downloaded SRA files automatically. Perhaps there is a flag I missed.


fmalmeida · 2024-06-13T11:11:08Z

Hi @0karl0 ,
Thanks for flagging this. I am going to investigate it further, but I cannot commit to a deadline.
It is good to know that you could get your data with a workaround.

Once I have updates, I will post them on this ticket.

Cheers.

fmalmeida · 2024-06-30T13:51:30Z

Hi @0karl0 ,
I figured it out: the problem is that this particular study has technical reads. Thus, three files were being downloaded instead of the two that were expected, since the study is paired-end.
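To check whether a fetched run is affected, one can count the FASTQ files per run in the output directory. The sketch below is a hypothetical Bash helper (`count_fastqs` is not part of ngs-preprocess); it assumes the `<output>/SRA_FETCH/FASTQ/<RUN>_data/*.fastq.gz` layout seen in the workaround command above and simply flags runs with more than two files, which may indicate that technical reads were included.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ngs-preprocess): count the FASTQ files
# fetched per SRA run, assuming the layout <output>/SRA_FETCH/FASTQ/<RUN>_data/.
count_fastqs() {
  local outdir="$1"
  local rundir run n
  for rundir in "$outdir"/SRA_FETCH/FASTQ/*_data; do
    # If nothing matched, the glob stays literal; skip it.
    [ -d "$rundir" ] || continue
    # Strip the directory and the "_data" suffix, e.g. SRR28776895.
    run=$(basename "$rundir" _data)
    # Count gzipped FASTQ files directly inside the run directory.
    n=$(find "$rundir" -maxdepth 1 -type f -name '*.fastq.gz' | wc -l | tr -d ' ')
    if [ "$n" -gt 2 ]; then
      echo "$run: $n files (more than a read pair; technical reads may be included)"
    else
      echo "$run: $n files"
    fi
  done
}

# When run as a script, use the first argument as the output directory.
if [ "$#" -gt 0 ]; then
  count_fastqs "$1"
fi
```

For the run in this thread, something like `count_fastqs illumina_single` would be expected to report three files for SRR28776895, matching the three FASTQ files mentioned above. The check only counts files; it does not tell which of them holds the technical reads.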

What I can do is either provide a parameter to allow or disallow technical reads and try to fix their processing, or make the pipeline always skip them.

What do you think would be preferable in this scenario?

fmalmeida · 2024-06-30T14:00:34Z

I would probably vote for skipping the technical reads entirely, because they are more relevant for single-cell data.

And I doubt people downloading single-cell data would use an automation like this one.

But I would prefer to hear some input first.

fmalmeida self-assigned this Jun 13, 2024

fmalmeida added the bug (Something isn't working) label Jun 13, 2024
