Genes have "0" coverage in sample.coverage.tsv but definitely not 0 in sample.exon_reads.gct #61

Open
jiaan-yu opened this issue Aug 27, 2021 · 2 comments
Labels
bug Something isn't working right

Comments


jiaan-yu commented Aug 27, 2021

Hi,
I used rnaseq-qc to process a batch of targeted RNA-seq data, but I found that some genes have "0" coverage in sample.coverage.tsv even though their counts in sample.exon_reads.gct are clearly not 0. All my samples (>10) have the same issue, and I hope I can get some help debugging or understanding this.

Metrics
Sample	Seraseq
Mapping Rate	0.995594
Unique Rate of Mapped	1
Duplicate Rate of Mapped	0
Duplicate Rate of Mapped, excluding Globins	0
Base Mismatch	0.00219932
End 1 Mapping Rate	0.995782
End 2 Mapping Rate	0.995405
End 1 Mismatch Rate	0.00164327
End 2 Mismatch Rate	0.00275544
Expression Profiling Efficiency	0.693188
High Quality Rate	0.945726
Exonic Rate	0.696256
Intronic Rate	0.0614665
Intergenic Rate	0.146946
Intragenic Rate	0.757723
Ambiguous Alignment Rate	0.0953313
High Quality Exonic Rate	0.721175
High Quality Intronic Rate	0.0573186
High Quality Intergenic Rate	0.123681
High Quality Intragenic Rate	0.778493
High Quality Ambiguous Alignment Rate	0.0978252
Discard Rate	0
rRNA Rate	0
Chimeric Alignment Rate	0
End 1 Sense Rate	0.180894
End 2 Sense Rate	0.822415
Avg. Splits per Read	0.426095
Alternative Alignments	432393
Chimeric Reads	96219
Duplicate Reads	0
End 1 Antisense	1820735
End 2 Antisense	408876
End 1 Bases	211264741
End 2 Bases	211232455
End 1 Mapped Reads	2820704
End 2 Mapped Reads	2819634
End 1 Mismatches	347166
End 2 Mismatches	582039
End 1 Sense	402098
End 2 Sense	1893549
Exonic Reads	3927121
Failed Vendor QC	0
High Quality Reads	5334217
Intergenic Reads	828824
Intragenic Reads	4273813
Ambiguous Reads	537701
Intronic Reads	346692
Low Mapping Quality	286133
Low Quality Reads	306121
Mapped Duplicate Reads	0
Mapped Reads	5640338
Mapped Unique Reads	5640338
Mismatched Bases	929205
Non-Globin Reads	5640338
Non-Globin Duplicate Reads	0
Reads excluded from exon counts	0
Reads used for Intron/Exon counts	5640338
rRNA Reads	0
Total Bases	422497196
Total Mapped Pairs	2820704
Total Reads	6097695
Unique Mapping, Vendor QC Passed Reads	5665302
Unpaired Reads	0
Read Length	75
Genes Detected	325
Estimated Library Complexity	0
Genes used in 3' bias	250
Mean 3' bias	0.481574
Median 3' bias	0.466667
3' bias Std	0.253506
3' bias MAD_Std	0.244011
3' Bias, 25th Percentile	0.317972
3' Bias, 75th Percentile	0.653061
Median of Avg Transcript Coverage	40.5074
Median of Transcript Coverage Std	17.0874
Median of Transcript Coverage CV	0.577808
Median Exon CV	0.194139
Exon CV MAD	0.132782
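As a sanity check on the read-level numbers above: the region rates are consistent with each read count divided by Mapped Reads, Exonic + Intronic Reads equals Intragenic Reads, and Intragenic + Intergenic + Ambiguous Reads equals Mapped Reads. These relationships are inferred from this report's numbers, not taken from RNA-SeQC documentation. If they hold, reads are being assigned to this gene's exons, which suggests the discrepancy lies in the coverage calculation rather than in read counting. A minimal sketch that recomputes them, with values copied from the table:

```python
import math
from typing import Dict, List, Tuple

# Read counts copied from the Seraseq metrics above.
COUNTS: Dict[str, int] = {
    "Mapped Reads": 5640338,
    "Exonic Reads": 3927121,
    "Intronic Reads": 346692,
    "Intergenic Reads": 828824,
    "Intragenic Reads": 4273813,
    "Ambiguous Reads": 537701,
}

# (rate name, reported value, count it is assumed to be derived from).
# The "count / Mapped Reads" relationship is inferred from the numbers, not documented here.
RATES: List[Tuple[str, float, str]] = [
    ("Exonic Rate", 0.696256, "Exonic Reads"),
    ("Intronic Rate", 0.0614665, "Intronic Reads"),
    ("Intergenic Rate", 0.146946, "Intergenic Reads"),
    ("Intragenic Rate", 0.757723, "Intragenic Reads"),
    ("Ambiguous Alignment Rate", 0.0953313, "Ambiguous Reads"),
]


def rate_mismatches(counts: Dict[str, int], rates: List[Tuple[str, float, str]],
                    rel_tol: float = 1e-5) -> List[str]:
    """Return a message for every rate that is not count / Mapped Reads.

    rel_tol allows for the report rounding values to about six significant digits.
    """
    mapped = counts["Mapped Reads"]
    problems = []
    for name, reported, count_name in rates:
        recomputed = counts[count_name] / mapped
        if not math.isclose(recomputed, reported, rel_tol=rel_tol):
            problems.append(f"{name}: reported {reported}, recomputed {recomputed:.7f}")
    return problems


def partitions_hold(counts: Dict[str, int]) -> bool:
    """Exonic + Intronic == Intragenic, and Intragenic + Intergenic + Ambiguous == Mapped."""
    return (
        counts["Exonic Reads"] + counts["Intronic Reads"] == counts["Intragenic Reads"]
        and counts["Intragenic Reads"] + counts["Intergenic Reads"] + counts["Ambiguous Reads"]
        == counts["Mapped Reads"]
    )


if __name__ == "__main__":
    print(rate_mismatches(COUNTS, RATES))  # [] for the metrics above
    print(partitions_hold(COUNTS))         # True for the metrics above
```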

Here is an example gene and its exons:

Seraseq/Seraseq.coverage.tsv 
ENSG00000134259.3	0	0	nan
Seraseq/Seraseq.exon_reads.gct 
ENSG00000134259.3_1	NGF	205.873161
ENSG00000134259.3_2	NGF	180.986735
ENSG00000134259.3_3	NGF	299.923486
ENSG00000134259.3_4	NGF	327.935234
ENSG00000134259.3_5	NGF	211.807303
ENSG00000134259.3_6	NGF	254.474081
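To list every gene with this pattern rather than spotting them by hand, the two outputs can be joined on gene ID. A minimal sketch, assuming coverage.tsv rows are tab-separated `gene_id, mean, std, CV` (header row optional) and that exon names in exon_reads.gct follow the `<gene_id>_<n>` pattern seen in the excerpt; the function names are hypothetical helpers, not part of RNA-SeQC:

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def read_coverage(lines: Iterable[str]) -> Dict[str, float]:
    """gene_id -> mean coverage from coverage.tsv (assumed columns: gene_id, mean, std, CV).

    Rows whose second field is not numeric, such as a header row, are skipped.
    """
    coverage: Dict[str, float] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue
        try:
            coverage[row[0]] = float(row[1])
        except ValueError:
            continue
    return coverage


def read_exon_totals(lines: Iterable[str]) -> Dict[str, float]:
    """gene_id -> summed exon counts from exon_reads.gct (first sample column only).

    Assumes exon names look like '<gene_id>_<n>'; the GCT preamble
    ('#1.2', the dimensions line, the Name/Description header) is skipped.
    """
    totals: Dict[str, float] = defaultdict(float)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue
        try:
            value = float(row[2])
        except ValueError:
            continue
        gene_id = row[0].rsplit("_", 1)[0]  # strip the trailing exon index
        totals[gene_id] += value
    return dict(totals)


def zero_coverage_with_reads(coverage: Dict[str, float], exon_totals: Dict[str, float],
                             min_reads: float = 1.0) -> List[Tuple[str, float]]:
    """Genes reported with mean coverage 0 but at least min_reads exon reads."""
    return sorted(
        (gene, reads)
        for gene, reads in exon_totals.items()
        if reads >= min_reads and coverage.get(gene) == 0.0
    )


if __name__ == "__main__":
    # Rows copied from the excerpt above; with real files, pass open(path) instead.
    coverage_rows = ["ENSG00000134259.3\t0\t0\tnan"]
    exon_rows = [
        "ENSG00000134259.3_1\tNGF\t205.873161",
        "ENSG00000134259.3_2\tNGF\t180.986735",
        "ENSG00000134259.3_3\tNGF\t299.923486",
        "ENSG00000134259.3_4\tNGF\t327.935234",
        "ENSG00000134259.3_5\tNGF\t211.807303",
        "ENSG00000134259.3_6\tNGF\t254.474081",
    ]
    hits = zero_coverage_with_reads(read_coverage(coverage_rows), read_exon_totals(exon_rows))
    for gene, reads in hits:
        print(f"{gene}\tmean_coverage=0\texon_reads={reads:.1f}")  # exon_reads=1481.0 for NGF
```

Genes missing from coverage.tsv entirely are not reported by this sketch, since only an explicit 0 is treated as a hit.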

GTF records for the gene:

1       HAVANA  gene    119441651       119474455       .       -       .       gene_id "ENSG00000134259.3"; transcript_id "ENSG00000134259.3"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "NGF"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "NGF"; level 2; havana_gene "OTTHUMG00000011880.1";
1       HAVANA  transcript      119441651       119474455       .       -       .       gene_id "ENSG00000134259.3"; transcript_id "ENSG00000134259.3"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "NGF"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "NGF"; level 2; havana_gene "OTTHUMG00000011880.1";
1       HAVANA  exon    119474242       119474455       .       -       .       gene_id "ENSG00000134259.3"; transcript_id "ENSG00000134259.3"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "NGF"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "NGF"; level 2; havana_gene "OTTHUMG00000011880.1"; exon_id "ENSG00000134259.3_1; exon_number 1";
1       HAVANA  exon    119469133       119469234       .       -       .       gene_id "ENSG00000134259.3"; transcript_id "ENSG00000134259.3"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "NGF"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "NGF"; level 2; havana_gene "OTTHUMG00000011880.1"; exon_id "ENSG00000134259.3_2; exon_number 2";
1       HAVANA  exon    119467269       119467440       .       -       .       gene_id "ENSG00000134259.3"; transcript_id "ENSG00000134259.3"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "NGF"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "NGF"; level 2; havana_gene "OTTHUMG00000011880.1"; exon_id "ENSG00000134259.3_3; exon_number 3";
1       HAVANA  exon    119466059       119466226       .       -       .       gene_id "ENSG00000134259.3"; transcript_id "ENSG00000134259.3"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "NGF"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "NGF"; level 2; havana_gene "OTTHUMG00000011880.1"; exon_id "ENSG00000134259.3_4; exon_number 4";
1       HAVANA  exon    119456738       119456802       .       -       .       gene_id "ENSG00000134259.3"; transcript_id "ENSG00000134259.3"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "NGF"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "NGF"; level 2; havana_gene "OTTHUMG00000011880.1"; exon_id "ENSG00000134259.3_5; exon_number 5";
1       HAVANA  exon    119441651       119441748       .       -       .       gene_id "ENSG00000134259.3"; transcript_id "ENSG00000134259.3"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "NGF"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "NGF"; level 2; havana_gene "OTTHUMG00000011880.1"; exon_id "ENSG00000134259.3_6; exon_number 6";
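One detail in the records above: in every exon line the closing quote of `exon_id` comes after `exon_number`, e.g. `exon_id "ENSG00000134259.3_1; exon_number 1";`, so a quote-aware parser reads `exon_id` as that whole string and finds no separate `exon_number` attribute. The thread does not establish whether RNA-SeQC is affected by this. The sketch below is a generic, hypothetical GTF attribute parser (not RNA-SeQC's) that makes the quoting easy to check across a whole annotation:

```python
import re
from typing import Dict

# key, then either a double-quoted value or a bare token, then an optional ';'
ATTR_RE = re.compile(r'(\S+)\s+(?:"([^"]*)"|([^;\s]+))\s*;?')


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GTF column-9 attribute string into a dict (later duplicates win)."""
    attrs: Dict[str, str] = {}
    for m in ATTR_RE.finditer(field):
        key, quoted, bare = m.group(1), m.group(2), m.group(3)
        attrs[key] = quoted if quoted is not None else bare
    return attrs


def attributes_of(gtf_line: str) -> Dict[str, str]:
    """Split off the first eight GTF columns (tabs or spaces) and parse the ninth."""
    fields = gtf_line.rstrip("\n").split(None, 8)
    return parse_attributes(fields[8]) if len(fields) == 9 else {}


if __name__ == "__main__":
    # Shortened copy of the exon 1 record above (only a few attributes kept).
    line = ('1\tHAVANA\texon\t119474242\t119474455\t.\t-\t.\t'
            'gene_id "ENSG00000134259.3"; level 2; exon_id "ENSG00000134259.3_1; exon_number 1";')
    attrs = attributes_of(line)
    print(repr(attrs["exon_id"]))  # 'ENSG00000134259.3_1; exon_number 1'
    print("exon_number" in attrs)  # False
```

For a well-formed record such as `exon_id "G_1"; exon_number 1;` the same parser returns both attributes separately.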

I'm happy to provide more information or to share the BAM file.

Thanks!
Jiaan

agraubert added the bug label Sep 9, 2021
@agraubert (Collaborator)

Interesting. If I had to guess, this has to do with how coverage windows are generated and extra filtering that goes into alignments used for coverage statistics. I'll look into it as soon as I have time.
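The `0  0  nan` row is what one would expect if no exonic base of the gene received any depth after whatever filtering the coverage step applies: mean and standard deviation are both 0, and CV (std / mean) is undefined. The sketch below illustrates that mechanism under stated assumptions: the coverage.tsv columns are taken to be mean, std and CV; `min_mapq` is a hypothetical stand-in for the extra alignment filtering mentioned above; and spliced reads are simplified to a single span. It is not RNA-SeQC's implementation.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]   # 1-based, inclusive (GTF-style) coordinates
Read = Tuple[int, int, int]  # (start, end, mapq); spliced reads simplified to one span


def exon_depth(exons: Sequence[Interval], reads: Sequence[Read], min_mapq: int) -> List[int]:
    """Per-base depth over the gene's exonic positions, keeping reads with mapq >= min_mapq.

    min_mapq is a hypothetical stand-in for "extra filtering" applied only to coverage.
    """
    positions = [p for start, end in exons for p in range(start, end + 1)]
    index = {p: i for i, p in enumerate(positions)}
    depth = [0] * len(positions)
    for start, end, mapq in reads:
        if mapq < min_mapq:
            continue  # the read may still be counted elsewhere, but adds no coverage here
        for p in range(start, end + 1):
            i = index.get(p)
            if i is not None:  # only exonic positions contribute
                depth[i] += 1
    return depth


def coverage_stats(depth: Sequence[int]) -> Tuple[float, float, float]:
    """(mean, population std, CV = std / mean); CV is nan when mean is 0."""
    if not depth:
        return math.nan, math.nan, math.nan
    n = len(depth)
    mean = sum(depth) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in depth) / n)
    cv = std / mean if mean > 0 else math.nan  # 0 / 0 is undefined
    return mean, std, cv


if __name__ == "__main__":
    exons = [(100, 109), (200, 209)]         # toy gene, 20 exonic bases
    reads = [(100, 109, 60), (200, 205, 3)]  # two toy reads
    print(coverage_stats(exon_depth(exons, reads, min_mapq=0)))    # approximately (0.8, 0.4, 0.5)
    print(coverage_stats(exon_depth(exons, reads, min_mapq=255)))  # (0.0, 0.0, nan)
```

With every read filtered out, the toy gene reproduces the same mean/std/CV pattern as the NGF row, even though the reads themselves still exist.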

jiaan-yu (Author) commented Sep 9, 2021

Thanks for looking into this! I'm happy to provide the BAM file and any other relevant files if you need them.
Cheers
