docs: describe workflow rationale #135

Merged 23 commits on Aug 19, 2024
Changes from 13 commits
Commits
7ed9971 docs: expand workflow description (deliaBlue, Dec 22, 2023)
ecc920c docs: expand rule descriptions (deliaBlue, Dec 24, 2023)
b71d23b Merge branch 'dev' into 126-docs-describe-workflow-rationale (deliaBlue, Jan 2, 2024)
cde4393 docs: fix typos (deliaBlue, Jan 5, 2024)
8ce1821 Merge branch 'dev' into 126-docs-describe-workflow-rationale (deliaBlue, Jan 29, 2024)
ce4e5b7 docs: update rules (deliaBlue, Jan 29, 2024)
4dfe8be docs: update rules (deliaBlue, Jan 30, 2024)
b5c7200 Merge branch 'dev' into 126-docs-describe-workflow-rationale (deliaBlue, Jan 30, 2024)
9f8f99e fix: set correct wildcard (deliaBlue, Jan 30, 2024)
1e345dd docs: complete rules improvement (deliaBlue, Jan 31, 2024)
a4049b1 revert: undo refactoring (deliaBlue, Feb 23, 2024)
ed8d595 Merge branch 'dev' into 126-docs-describe-workflow-rationale (deliaBlue, Feb 23, 2024)
8f23697 Merge branch 'dev' into 126-docs-describe-workflow-rationale (uniqueg, Mar 18, 2024)
1eeb07a docs: update main README (deliaBlue, May 7, 2024)
ada71c0 docs: update pipeline documentation (deliaBlue, May 7, 2024)
f37af52 docs: update pipeline documentation (deliaBlue, May 24, 2024)
dc5028e docs: update README (deliaBlue, May 24, 2024)
c1bc82c docs: extend workflow description (deliaBlue, Jun 17, 2024)
077cf8b Merge branch 'dev' into 126-docs-describe-workflow-rationale (deliaBlue, Jul 31, 2024)
9e3c639 Merge branch 'dev' into 126-docs-describe-workflow-rationale (deliaBlue, Jul 31, 2024)
cfba29e Merge branch '126-docs-describe-workflow-rationale' of github.com:zav… (deliaBlue, Jul 31, 2024)
b57a5aa docs: complete documentation (deliaBlue, Aug 17, 2024)
7741065 docs: complete documentation (deliaBlue, Aug 17, 2024)
68 changes: 47 additions & 21 deletions README.md
@@ -159,7 +159,7 @@ tested, you can go ahead and run the workflow on your samples.
It is suggested to have all the input files for a given run (or hard links
pointing to them) inside a dedicated directory, for instance under the
_MIRFLOWZ_ root directory. This way, it is easier to keep the data together,
reproduce an analysis and set up Singularity access to them.
reproduce analysis and set up Singularity access to them.

#### 1. Prepare a sample table

@@ -265,9 +265,7 @@ intermediate files generated during the process. The final outputs comprise:
1. A SAM file containing alignments intersecting a pri-miR locus. These
alignments intersect with extended start and/or end positions specified in the
provided pri-miR annotations. Please note that they may not contribute to the
final counting and may not appear in the final table. Alignments are discarded
if their start and/or end positions differ from the ends of the provided
pri-miR annotations by more bases than the extension used.
final counting and may not appear in the final table.

2. A SAM file containing alignments intersecting a mature miRNA locus. Similar
to the previous file, these alignments intersect with extended start and/or end
@@ -325,20 +323,46 @@ snakemake \

## Workflow description

The _MIRFLOWZ_ workflow first processes and indexes the user-provided genome
resources. Afterwards, the user-provided short read small-RNA-seq libraries will
be aligned separately against the genome and transcriptome. For increased
fidelity, two separated aligners, [Segemehl][segemehl] and our in-house tool
[Oligomap][oligomap], are used. All the resulting alignments are merged such
that only the best alignments of each read are kept (smallest edit distance).
Alignments are intersected with the user-provided, pre-processed miRNA
annotation file using [BEDTools][bedtools]. Counts are tabulated separately for
reads consistent with either miRNA precursors, mature miRNA and/or isomiRs.
Finally, ASCII-style alignment pileups are optionally generated for
user-defined regions of interest.

> **NOTE:** For a detailed description of each rule, please, refer to the
> [workflow documentation](pipeline_documentation.md)
The _MIRFLOWZ_ workflow initially processes and indexes the genome resources
provided by the user. The regions corresponding to mature miRNAs are extended
on both sides to accommodate isomiR species with shifted start and/or end
positions. If necessary, pri-miR loci are similarly extended to adjust to the
new miRNA coordinates.
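
As a rough illustration of the extension step described above, the Python sketch below widens a mature miRNA interval by a fixed number of bases on each side and grows the enclosing pri-miR locus only when the extended miRNA would otherwise overhang it. The function names, the 1-based inclusive coordinates and the example values are assumptions made for this illustration; they are not taken from the _MIRFLOWZ_ code base.

```python
from typing import Tuple

# (start, end), 1-based and inclusive; this convention is an assumption.
Interval = Tuple[int, int]


def extend_mature(mirna: Interval, extension: int, chrom_len: int) -> Interval:
    """Extend a mature miRNA interval on both sides, clamped to the chromosome."""
    start, end = mirna
    return max(1, start - extension), min(chrom_len, end + extension)


def adjust_primir(primir: Interval, extended_mirna: Interval) -> Interval:
    """Grow the pri-miR locus only where the extended miRNA overhangs it."""
    return (
        min(primir[0], extended_mirna[0]),
        max(primir[1], extended_mirna[1]),
    )


# Hypothetical coordinates: the miRNA starts 5 bases into its pri-miR.
extended = extend_mature((105, 126), extension=6, chrom_len=1_000)
print(extended)                             # (99, 132)
print(adjust_primir((100, 180), extended))  # (99, 180)
```

In this example the pri-miR start moves from 100 to 99 because the extended miRNA begins one base upstream of it, while the pri-miR end is left untouched.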

Subsequently, the user-provided short-read small RNA-seq libraries undergo
quality filtering if a FASTQ file is provided. Alternatively, adapters are
directly removed. The resulting reads are independently mapped to both the
genome and the transcriptome using two distinct aligners: [Segemehl][segemehl]
and our in-house tool [Oligomap][oligomap]. After the mapping, only the best
alignments for each read, determined by the smallest edit distance, are
retained by merging and filtering the resulting alignments into a single file.
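
The "best alignments only" rule can be sketched as follows. This is a minimal illustration, assuming the alignments from both aligners have already been pooled into one list and that each record carries its edit distance (for instance, as reported in the SAM `NM` tag). The `Alignment` type and the `keep_best` function are hypothetical names, not part of the workflow.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Alignment(NamedTuple):
    read_id: str
    locus: str
    edit_distance: int  # e.g. the value of the SAM NM tag


def keep_best(alignments: Iterable[Alignment]) -> List[Alignment]:
    """Keep, per read, every alignment tied for the smallest edit distance."""
    by_read: Dict[str, List[Alignment]] = defaultdict(list)
    for aln in alignments:
        by_read[aln.read_id].append(aln)

    best: List[Alignment] = []
    for read_alns in by_read.values():
        minimum = min(a.edit_distance for a in read_alns)
        best.extend(a for a in read_alns if a.edit_distance == minimum)
    return best


# Hypothetical pooled output of two aligners (genome and transcriptome hits).
merged = [
    Alignment("read_1", "chr1:100", 1),
    Alignment("read_1", "chr5:40", 0),
    Alignment("read_1", "tx7:12", 0),
    Alignment("read_2", "chr2:9", 2),
]
for aln in keep_best(merged):
    print(aln.read_id, aln.locus, aln.edit_distance)
```

Here `read_1` keeps its two perfect matches and loses the alignment with one edit, while `read_2` keeps its only alignment even though it has two edits, because the comparison is made per read.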

The collection of resulting alignments is then reduced to contain only unique
entries. Due to the short length of the reads and the sequence similarity among
miRNAs, the number of alignments per read can be high. Therefore, reads with more
alignments than a specified threshold are discarded. To address multimapping, alignments with the
fewest indels are preserved. These alignments are subsequently intersected with
the user-provided, pre-processed miRNA annotation files using
[BEDTools][bedtools]. Note that an alignment will not contribute to the final
count if its start and/or end positions differ significantly from the provided
miRNA annotations, beyond the extension applied to the mature miRNA start
and/or end positions, or by 1 if no extension was applied. Conversely, a
retained read contributes 1/n to all the annotated miRNA species it aligns
with, where `n` is the number of genomic and/or transcriptomic loci it aligns
to.
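
The 1/n weighting can be made concrete with a small sketch. It assumes that each alignment of a read adds 1/n to the species annotated at that locus, and that loci without an annotated miRNA still count towards `n`. The input layout and the function name `fractional_counts` are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, List, Optional


def fractional_counts(
    read_loci: Dict[str, List[Optional[str]]],
) -> Dict[str, Fraction]:
    """Give each alignment of a read a weight of 1/n, n = loci hit by the read.

    `read_loci` maps a read ID to one entry per locus the read aligns to:
    the miRNA species annotated there, or None if the locus is unannotated.
    """
    counts: Dict[str, Fraction] = defaultdict(Fraction)
    for loci in read_loci.values():
        n = len(loci)
        for species in loci:
            if species is not None:
                counts[species] += Fraction(1, n)
    return dict(counts)


# Hypothetical reads; exact fractions avoid floating-point rounding.
reads = {
    "read_1": ["hsa-miR-21-5p"],
    "read_2": ["hsa-let-7a-5p", "hsa-let-7b-5p"],
    "read_3": ["hsa-let-7a-5p", None, None, None],
}
for species, count in sorted(fractional_counts(reads).items()):
    print(species, count)
# hsa-let-7a-5p 3/4
# hsa-let-7b-5p 1/2
# hsa-miR-21-5p 1
```

`read_3` aligns to four loci, only one of which is annotated, so it adds 1/4 to `hsa-let-7a-5p`; together with the 1/2 from `read_2` this gives 3/4.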

_MIRFLOWZ_ employs an unambiguous notation to classify isomiRs using the format
`miRNA_name|5p-shift|3p-shift|CIGAR|MD`, where `5p-shift` and `3p-shift`
represent the differences between the annotated mature miRNA start and end
positions and those of the alignment, respectively.
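
To show how such a label might be assembled, here is a minimal sketch. The sign convention (alignment coordinate minus annotated coordinate), the restriction to plus-strand coordinates and the plain integer formatting of the shifts are all assumptions; the text above does not specify them. `isomir_label` and `parse_isomir_label` are hypothetical helper names.

```python
from typing import Tuple


def isomir_label(
    name: str,
    annot_start: int,
    annot_end: int,
    aln_start: int,
    aln_end: int,
    cigar: str,
    md: str,
) -> str:
    """Build a `miRNA_name|5p-shift|3p-shift|CIGAR|MD` label.

    Assumes plus-strand coordinates and shift = alignment - annotation.
    """
    shift_5p = aln_start - annot_start
    shift_3p = aln_end - annot_end
    return f"{name}|{shift_5p}|{shift_3p}|{cigar}|{md}"


def parse_isomir_label(label: str) -> Tuple[str, int, int, str, str]:
    """Split a label back into its five fields."""
    name, shift_5p, shift_3p, cigar, md = label.split("|")
    return name, int(shift_5p), int(shift_3p), cigar, md


# Hypothetical 22-nt perfect match shifted one base downstream at both ends.
label = isomir_label("hsa-miR-21-5p", 100, 121, 101, 122, "22M", "22")
print(label)  # hsa-miR-21-5p|1|1|22M|22
```

Because the CIGAR and MD strings are part of the label, two reads with the same shifts but different mismatches end up under different isomiR names.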

Counts are tabulated separately for reads consistent with either
miRNA precursors, mature miRNA and/or isomiRs and all library counts are
fused into a single table. Finally, ASCII-style alignment pileups are
optionally generated for user-defined regions of interest.

> **NOTE:** For a detailed description of each rule along with some examples,
> please refer to the [workflow documentation](pipeline_documentation.md).

The schema below is a visual representation of the individual workflow steps
and how they are related:
@@ -350,16 +374,18 @@ and how they are related:
_MIRFLOWZ_ is an open-source project which relies on community contributions.
You are welcome to participate by submitting bug reports or feature requests,
taking part in discussions, or proposing fixes and other code changes. Please
refer to the [contributing guidelines](CONTRIBUTING.md) if you are interested in
contributing.
refer to the [contributing guidelines](CONTRIBUTING.md) if you are interested
in contributing.

## License

This project is covered by the [MIT License](LICENSE).

## Contact

For questions or suggestions regarding the code, please use the [issue tracker][issue-tracker]. Do not hesitate to contact us via [email][email] for any other inquiries.
For questions or suggestions regarding the code, please use the
[issue tracker][issue-tracker]. Do not hesitate to contact us via
[email][email] for any other inquiries.

© 2023 [Zavolab, Biozentrum, University of Basel][zavolab]
