Visual inspection of the alignments (using `samtools tview`) shows some regions that seem to have bad alignments. The reason seems to be that `MarkDuplicates` changes the read sequence in the aligned file. For example, before `MarkDuplicates` we have:

```
samtools view WT.cram Chr01:60000-60001 --reference Pvulgaris_442_v2.0.fa | head -n1
A01083:294:H3LHWDSXC:2:2553:27751:35556 147 Chr01 59859 60 150M = 59547 -462 CGCCGCGTCTTTTAAGAAAATAGCGGGAGAAGAAACTTCGATTTTCAATAACAATGAAGGTAAATTAAATTGATAAATTTTATATTCAATTGATAGCAATAAATCACGCAAATATGTAAATTGAAATATTTATTTTAAAGTTTCGATAAC FFF:,FFF:F:FFF,FFFFFF:,F,,,:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFFFFF:FF:FFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFF MC:Z:150M AS:i:145 XS:i:20 MD:Z:24T125 NM:i:1 RG:Z:WT
```
And after `MarkDuplicates` we have:

```
samtools view WT.md.cram Chr01:60000-60001 --reference Pvulgaris_442_v2.0.fa | head -n1
A01083:294:H3LHWDSXC:2:2553:27751:35556 147 Chr01 59859 60 150M = 59547 -462 NCNNCNCGNGGGGCCCCCCCGCCNCCCCCCCCCCCNGGNCCGGGGNCCGCCNCCGCCCCCGCCCGGCCCGGCCGCCCGGGGCGCGGNCCGGCCGCCNCCGCCCGNCNCNCCCGCGCGCCCGGCCCCGCGGGCGGGGCCCCGGGNCCGCCN FFF:,FFF:F:FFF,FFFFFF:,F,,,:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFFFFF:FF:FFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFF AS:i:145 MC:Z:150M PG:Z:MarkDuplicates XS:i:20 MD:Z:0C0G0C0C0G0C0G0T0C0T0T0T0T0A0A0G0A0A0A0A0T0A0G0C0T0G0G0A0G0A0A0G0A0A0A0C0T0T0C0G0A0T0T0T0T0C0A0A0T0A0A0C0A0A0T0G0A0A0G0G0T0A0A0A0T0T0A0A0A0T0T0G0A0T0A0A0A0T0T0T0T0A0T0A0T0T0C0A0A0T0T0G0A0T0A0G0C0A0A0T0A0A0A0T0C0A0C0G0C0A0A0A0T0A0T0G0T0A0A0A0T0T0G0A0A0A0T0A0T0T0T0A0T0T0T0T0A0A0A0G0T0T0T0C0G0A0T0A0A0C0 NM:i:150 RG:Z:WT
```
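To pinpoint which fields differ between the two records, you could parse the `samtools view` output and compare the records field by field. The following is a minimal Python sketch. It assumes tab-separated SAM text lines, which is how `samtools view` prints them. `parse_sam_line` and `diff_records` are hypothetical helper names, not part of samtools or Picard. The toy records are also made up for illustration.

```python
# Hypothetical helpers: split SAM text lines into the 11 mandatory
# columns plus optional TAG:TYPE:VALUE fields, then report differences.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line: str) -> dict:
    """Parse one tab-separated SAM line into a dict of columns and tags."""
    parts = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, parts[:11]))
    tags = {}
    for field in parts[11:]:
        # Optional fields look like NM:i:1 or PG:Z:MarkDuplicates
        name, typ, value = field.split(":", 2)
        tags[name] = int(value) if typ == "i" else value
    record["TAGS"] = tags
    return record


def diff_records(before: dict, after: dict) -> dict:
    """Return {field: (before, after)} for columns and tags that differ."""
    diffs = {f: (before[f], after[f]) for f in SAM_FIELDS if before[f] != after[f]}
    for name in sorted(set(before["TAGS"]) | set(after["TAGS"])):
        b, a = before["TAGS"].get(name), after["TAGS"].get(name)
        if b != a:
            diffs[name] = (b, a)  # None means the tag is absent in that record
    return diffs


# Toy 8 bp records (hypothetical) mimicking the before/after pattern above
before = parse_sam_line("r1\t147\tChr01\t100\t60\t8M\t=\t50\t-58\tACGTACGT\tFFFFFFFF\tMD:Z:8\tNM:i:0")
after = parse_sam_line("r1\t147\tChr01\t100\t60\t8M\t=\t50\t-58\tCATGCATG\tFFFFFFFF\t"
                       "PG:Z:MarkDuplicates\tMD:Z:0A0C0G0T0A0C0G0T0\tNM:i:8")
for field, (old, new) in diff_records(before, after).items():
    print(f"{field}: {old} -> {new}")
```

With the two real records above, the same comparison would isolate the changes to `SEQ` and the `MD`, `NM` and `PG` tags. It would also show that the position, CIGAR and quality string are untouched.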
The number of matches is identical (CIGAR `150M` in both records). However, after `MarkDuplicates` the record has 150 mismatches (`NM:i:150` instead of `NM:i:1`). The sequence in column 10 (`SEQ`) has also changed, and this altered sequence is what `samtools tview` displays. This behaviour does not affect all genome regions, and it is not clear how it affects the calling process. `MarkDuplicates` should be removed, as described in #71.
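The `NM:i:150` value can be cross-checked against the `MD` tag. For a read whose CIGAR contains only `M` operations (as with `150M` here), `MD` records the reference base at every mismatch. That means the reference can be rebuilt from `SEQ` and `MD`, and the mismatches counted. The following is a minimal Python sketch that assumes no insertions, deletions or clipping. `reference_from_md` and `count_mismatches` are hypothetical names, and the inputs below are toy sequences.

```python
import re


def reference_from_md(seq: str, md: str) -> str:
    """Rebuild the aligned reference bases for a match-only (all-M) read
    from its SEQ and MD tag. Assumes no indels or clipping."""
    if "^" in md:
        raise ValueError("deletions in MD are not handled by this sketch")
    ref, pos = [], 0
    # MD alternates: a count of matching bases, then one reference base
    for count, base in re.findall(r"(\d+)([A-Z]?)", md):
        n = int(count)
        ref.append(seq[pos:pos + n])  # matching bases are copied from SEQ
        pos += n
        if base:
            ref.append(base)          # reference base at a mismatch
            pos += 1
    if pos != len(seq):
        raise ValueError("MD length does not match SEQ length")
    return "".join(ref)


def count_mismatches(seq: str, ref: str) -> int:
    """Count positions where the read differs from the reference.
    An 'N' in the read counts as a mismatch here."""
    return sum(1 for r, q in zip(ref, seq) if r != q)


# One mismatch, in the same style as MD:Z:24T125 (toy data)
print(count_mismatches("ACGAACGT", reference_from_md("ACGAACGT", "3T4")))
# Every base mismatched, in the same style as the post-MarkDuplicates MD tag
print(count_mismatches("CATGCATG", reference_from_md("CATGCATG", "0A0C0G0T0A0C0G0T0")))
```

For a match-only read, the edit distance in `NM` equals this mismatch count. That is why an `MD` string in which every base is a mismatch (`0C0G0C...`) goes together with `NM:i:150` on a 150 bp read.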