-
Notifications
You must be signed in to change notification settings - Fork 723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a draft for fasta_ltrretriever_lai #4984
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
One detail I'm not positive about, but absolutely makes sense to me is if
ch_ltrretriever_inputs.map { meta, fasta, ltr -> [ meta, fasta ] },
ch_ltrretriever_inputs.map { meta, fasta, ltr -> ltr },
is the same as
ch_ltrretriever_inputs.multiMap { meta, fasta, ltr ->
ch_fasta: [meta, fasta]
ch_ltr: ltr
We normally use the second version, but I don't see why the first version shouldn't be equivalent. Channels are still queues even if they're asynchronous. I guess I'm just wondering if using two+ map
operations could ever make the inputs out of sync.
Thank you @mahesh-panchal The two ch_ltrretriever_inputs = ch_short_ids_fasta.join(ch_ltr_candidates) If |
Sorry, perhaps let's move the conversation here where we can see the overview. |
The suggestion is to replace This delegates the problem of publishing the files produced by the local functions to the pipeline developer. They can decide to crate a local patch of the sub-workflow and add the |
I would say go ahead an implement it. It's easier to see what you mean when there's code. That's the wonderful thing about using |
I have hit another road block. I am not sure if we can attach I am trying to replace workflow {
Channel.of(file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true))
| splitFasta( record: [ id: true, sequence: true ] )
| mix(
Channel.of(file(params.test_data['actinidia_chinensis']['genome']['genome_1_fasta_gz'], checkIfExists: true))
| splitFasta( record: [ id: true, sequence: true ] )
)
// | operator( [ id: 'test' ] ) // How to insert meta
| map { record, meta ->
if ( record.id.size() <= 13 && record.id ==~ /\w+/ ) return [ record, meta ]
def id_digest = record.id.md5()[0..<10]
record.id = "Ctg$id_digest"
[ record, meta ]
}
| collectFile( sort: { it.size() } ) {
[ 'test.fa', it.id + '\n' + it.sequence ]
}
| view
}
// Execute on macOS: (export PROFILE='docker' && nextflow run playground.nf -c tests/config/nf-test.config) |
You can use the
Or something like that. |
48013af
to
78bbfde
Compare
Thank you for the hint. I have used it to create a groovy version of the module. I have also changed the name of the sub-workflow to make the underlying operations explicit: FASTA_LTRHARVEST_LTRFINDER_LTRRETRIEVER_LAI This has been a very complicated sub-workflow. It has given me a lot of trouble with pesky little bugs even with the custom modules which I tested independently. Now we have put everything into a single workflow and the chances that there is an edge case bug somewhere in there are quite high. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll need to get back to this later on. 6th and 7th are holidays here.
|
||
take: | ||
ch_fasta // channel: [ val(meta), fasta ] | ||
ch_monoploid_seqs // channel: [ val(meta2), txt ]; Optional: Set to [] if not needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than use []
for nothing, I think it would better to say use Channel.empty()
. This might simplify some logic downstream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I still more time to review this.
ch_versions = Channel.empty() | ||
|
||
// Prapre input channels | ||
ch_monoploid_seqs_plain = ( ch_monoploid_seqs ?: Channel.empty() ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming Channel.empty()
as input rather than []
would mean this ternary statement isn't needed.
subworkflows/nf-core/fasta_ltrharvest_ltrfinder_ltrretriever_lai/main.nf
Outdated
Show resolved
Hide resolved
} | ||
| collectFile | ||
| map { tsv -> | ||
[ tsv.baseName.replace('.short.ids', ''), tsv ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does .short.ids
get removed here rather than naming the file without the .short.ids
in the first place?
subworkflows/nf-core/fasta_ltrharvest_ltrfinder_ltrretriever_lai/main.nf
Outdated
Show resolved
Hide resolved
a63e0ea
to
f4af1c8
Compare
PR checklist
Closes #XXX
versions.yml
file.label
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda