Added a draft for fasta_ltrretriever_lai #4984

GallVp · 2024-02-25T22:32:17Z

PR checklist

Closes #XXX

mahesh-panchal

Looks good to me.

One detail I'm not sure about, although it makes complete sense to me, is whether

 ch_ltrretriever_inputs.map { meta, fasta, ltr -> [ meta, fasta ] },
 ch_ltrretriever_inputs.map { meta, fasta, ltr -> ltr },

is the same as

ch_ltrretriever_inputs.multiMap { meta, fasta, ltr ->
    ch_fasta: [meta, fasta]
    ch_ltr: ltr
}
We normally use the second version, but I don't see why the first version shouldn't be equivalent. Channels are still queues even if they're asynchronous. I guess I'm just wondering if using two+ map operations could ever make the inputs out of sync.

GallVp · 2024-02-26T21:23:22Z

We normally use the second version, but I don't see why the first version shouldn't be equivalent. Channels are still queues even if they're asynchronous. I guess I'm just wondering if using two+ map operations could ever make the inputs out of sync.

Thank you @mahesh-panchal

The two map operations should not desynchronise the outputs, as the source channel is a FIFO queue. In my experience, desynchronisation is a problem when multiple channels are involved, but here we are explicitly joining (i.e. synchronising) them into a single channel:

ch_ltrretriever_inputs          = ch_short_ids_fasta.join(ch_ltr_candidates)

Even if ch_short_ids_fasta and ch_ltr_candidates emit their items in different orders, the join operator pairs them by a shared key (by default the first element of each tuple, here meta).

mahesh-panchal · 2024-05-14T07:48:45Z

Sorry, let's move the conversation here, where we can see the overview.
I'm confused by the suggestion. Is it to make CUSTOM_SHORTENFASTAIDS a native (exec:) process and CUSTOM_RESTOREGFFIDS a local function, or the other way around? And would the remaining process emit the files produced by the function?

GallVp · 2024-05-15T22:24:31Z

The suggestion is to replace CUSTOM_SHORTENFASTAIDS and CUSTOM_RESTOREGFFIDS with local Groovy functions inside the FASTA_LTRRETRIEVER_LAI workflow, and to emit the files produced by these local functions as outputs of FASTA_LTRRETRIEVER_LAI.

This delegates the problem of publishing the files produced by the local functions to the pipeline developer. They can decide to create a local patch of the sub-workflow and add an output directory to the collectFile operator, or use a module like cat/cat outside the sub-workflow to publish the files.
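For illustration, a minimal plain-Groovy sketch of what such local functions might look like. The names shortenId, buildIdMap, toTsv and restoreGffIds are hypothetical and not the ones used in this PR. The shortening rule (keep IDs of at most 13 word characters, otherwise use "Ctg" plus the first 10 hex digits of the ID's MD5) is taken from the playground code later in this thread, and String.md5() assumes Groovy 2.5+ or Nextflow. Wiring the results into channels and writing them with collectFile is left out.

```groovy
// Hypothetical local helpers; names and signatures are illustrative only.

// Keep IDs that are already short (<= 13 chars) and made only of word characters;
// otherwise replace them with "Ctg" + the first 10 hex digits of their MD5 digest.
String shortenId(String id) {
    if (id.size() <= 13 && id ==~ /\w+/) return id
    "Ctg${id.md5()[0..<10]}"
}

// Map each changed ID as short -> original; unchanged IDs are omitted.
Map<String, String> buildIdMap(List<String> ids) {
    ids.findAll { shortenId(it) != it }
       .collectEntries { [(shortenId(it)): it] }
}

// Render the mapping as "short<TAB>original" lines, e.g. for a *.short.ids.tsv file.
String toTsv(Map<String, String> idMap) {
    idMap.collect { shortId, originalId -> "${shortId}\t${originalId}" }.join('\n')
}

// Rewrite column 1 (seqid) of each GFF feature line back to its original ID.
// Comment/directive lines starting with '#' pass through unchanged.
List<String> restoreGffIds(List<String> gffLines, Map<String, String> shortToOriginal) {
    gffLines.collect { String line ->
        if (line.startsWith('#')) return line
        def cols = line.split('\t', -1)
        cols[0] = shortToOriginal.getOrDefault(cols[0], cols[0])
        cols.join('\t')
    }
}
```

Because the MD5 prefix is truncated to 10 hex digits, two long IDs could in principle map to the same short ID, so a real implementation may want to check that the short IDs are unique before running the tools.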

mahesh-panchal · 2024-05-16T07:27:01Z

I would say go ahead and implement it. It's easier to see what you mean when there's code. That's the wonderful thing about using git: we can always go back and check out a previous state if the idea doesn't work out.

GallVp · 2024-05-29T02:21:35Z

Hi @mahesh-panchal

I have hit another roadblock. I am not sure if we can attach meta to a file after splitting it with the splitFasta operator.

I am trying to replace CUSTOM_SHORTENFASTAIDS (#5001) with the following Groovy code:

workflow {
    Channel.of(file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true))
    | splitFasta( record: [ id: true, sequence: true ] )
    | mix(
        Channel.of(file(params.test_data['actinidia_chinensis']['genome']['genome_1_fasta_gz'], checkIfExists: true))
        | splitFasta( record: [ id: true, sequence: true ] )
    )
    // | operator( [ id: 'test' ] ) // How to insert meta
    | map { record, meta ->
        if ( record.id.size() <= 13 && record.id ==~ /\w+/ ) return [ record, meta ]

        def id_digest   = record.id.md5()[0..<10]
        record.id       = "Ctg$id_digest"

        [ record, meta ]
    }
    | collectFile( sort: { it.size() } ) {
        [ 'test.fa', it.id + '\n' + it.sequence ]
    }
    | view
}
// Execute on macOS: (export PROFILE='docker' && nextflow run playground.nf -c tests/config/nf-test.config)

mahesh-panchal · 2024-05-29T14:28:50Z

You can use the splitFasta method, rather than the channel operator.

    Channel.of( file( params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true ) )
        | map { file -> [ [ id:file.baseName ], file ] }
        | flatMap { meta, file -> file.splitFasta( record: [ id: true, sequence: true ] ).collect{ [ meta, it ] } }

Or something like that.

GallVp · 2024-06-05T02:39:33Z

Hi @mahesh-panchal

Thank you for the hint. I have used it to create a Groovy version of the module.

I have also changed the name of the sub-workflow to make the underlying operations explicit: FASTA_LTRHARVEST_LTRFINDER_LTRRETRIEVER_LAI

This has been a very complicated sub-workflow. It has given me a lot of trouble with pesky little bugs, even in the custom modules, which I tested independently. Now that everything is in a single workflow, the chances of an edge-case bug somewhere in there are quite high.

mahesh-panchal

I'll need to get back to this later. The 6th and 7th are holidays here.

mahesh-panchal · 2024-06-05T15:07:00Z

subworkflows/nf-core/fasta_ltrharvest_ltrfinder_ltrretriever_lai/main.nf

+
+    take:
+    ch_fasta                        // channel: [ val(meta), fasta ]
+    ch_monoploid_seqs               // channel: [ val(meta2), txt ]; Optional: Set to [] if not needed


Rather than using [] for nothing, I think it would be better to use Channel.empty(). This might simplify some logic downstream.

Out of date

mahesh-panchal

Sorry, I still need more time to review this.

mahesh-panchal · 2024-06-13T20:15:04Z

subworkflows/nf-core/fasta_ltrharvest_ltrfinder_ltrretriever_lai/main.nf

+    ch_versions                     = Channel.empty()
+
+    // Prepare input channels
+    ch_monoploid_seqs_plain         = ( ch_monoploid_seqs ?: Channel.empty() )


Taking Channel.empty() as input rather than [] would make this Elvis (?:) expression unnecessary.


mahesh-panchal · 2024-06-13T20:22:18Z

subworkflows/nf-core/fasta_ltrharvest_ltrfinder_ltrretriever_lai/main.nf

+                                    }
+                                    | collectFile
+                                    | map { tsv ->
+                                        [ tsv.baseName.replace('.short.ids', ''), tsv ]


Why does .short.ids get removed here rather than naming the file without the .short.ids in the first place?


GallVp added the WIP Work in progress label Feb 25, 2024

GallVp requested a review from mahesh-panchal February 25, 2024 22:32

GallVp requested a review from a team as a code owner February 25, 2024 22:32

GallVp requested review from louperelo and removed request for a team February 25, 2024 22:32

GallVp marked this pull request as draft February 25, 2024 22:41

mahesh-panchal previously approved these changes Feb 26, 2024


This was referenced Feb 26, 2024

Added custom/shortenfastaids #5001

Closed

Added custom/restoregffids #5002

Closed

GallVp force-pushed the fasta_ltrretriever_lai branch from 48013af to 78bbfde June 5, 2024 02:27

GallVp marked this pull request as ready for review June 5, 2024 02:28

GallVp requested a review from mahesh-panchal June 5, 2024 02:29

GallVp added Ready for Review and removed WIP Work in progress labels Jun 5, 2024

mahesh-panchal reviewed Jun 5, 2024


mahesh-panchal reviewed Jun 14, 2024


GallVp added 6 commits June 17, 2024 15:17

Added a draft for fasta_ltrretriever_lai

eb77b4b

Added map_monoploid_seqs_to_new_ids

ff7b34c

Copied over code for fasta_ltrretriever_lai

4128b10

Renamed to fasta_ltrharvest_ltrfinder_ltrretriever_lai

b68870f

Complete groovy impl of custom scripts

ae5bd76

Removed custom modules

f67be44

Removed do_ids_need_to_change func

f4af1c8

GallVp force-pushed the fasta_ltrretriever_lai branch from a63e0ea to f4af1c8 June 17, 2024 03:37

SPPearce added the new subworkflow label Jun 26, 2024

Merge branch 'master' into fasta_ltrretriever_lai

130d65b

GallVp mentioned this pull request Oct 24, 2024

Add Lai to genome only mode nf-core/genomeqc#43

Open

Merge branch 'master' into fasta_ltrretriever_lai

e846ece
