-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inconsistent read assignment with bam file #250
Comments
Dear @airichli The strand indicated in read assignments and BAM files are two different strands and might not be equal in general. The BAM file indicates mapping strand, i.e. whether a read was reverse-complemented during the alignment. Since ONT cDNA data is non-strand-specific (correct me if you use some specific protocol), strands in BAM files are distributed arbitrary ~50/50. Basically half of the reads are sequenced from cDNA with the same strand as the reference genome, half - from the reverse one. Read assignments simple indicate the strand of the matching gene / isoform. However, in your particular case read assignments also tell "antisense". This means the read was assigned to a + stand gene, but other properties, such as polyA or splice junctions tell that the strand is negative. I'll take a look at this case, thanks for the report. Best |
Thanks Andrey for your kind assistance. I attach the testdata.zip, including
I checked the read ff4b5d98-6398-43ee-8b9b-6416e9083ee7 in IGV, and it should be derived from the known gene rather than antisense novel gene, but isoquant just say it is antisense. Thanks again and looking forward to hearing from you. |
Hi,
Thanks for developing this nice tool. I have met a problem, the read strand is different between .read_assignments.tsv and bam file, this leads to novel gene identification (antisense), but it seems to be from the known gene rather than novel gene, could you please give me some advice?
My command line is (the data is generate by ONT cDNA sequencing, isoquant version is version 3.5.0):
isoquant.py -t 1 --reference mm10.chr19.fa --genedb gencode.vM23.annotation.chr19.gtf --fastq test.fastq.gz --model_construction_strategy default_ont --report_novel_unspliced true --data_type nanopore -o test_out --prefix test_out --complete_genedb
In the below example, one is "+", the other is "16", they are inconsistent. Thanks.
In the test_out.read_assignments.tsv file:
ff4b5d98-6398-43ee-8b9b-6416e9083ee7 chr19 + ENSMUST00000113533.2 ENSMUSG00000024790.8 inconsistent intron_alternation_novel:6116267-6118343,tss_match_precise:2,correct_polya_site_left:6115998,antisense 6115998-6116266,6118344-6118561 gene_assignment=inconsistent; PolyA=True; Classification=novel_not_in_catalog;
However, in the bam file in aux folder
ff4b5d98-6398-43ee-8b9b-6416e9083ee7 16 chr19 6115998 60 106S88M4I26M1I2M1D52M1D38M1D60M2077N160M1D57M135S * 0 0 TCACAAGGTTTGAATCTTTGTCCCTCTCACTTGCCTGTCGCTCTATCTTCAGAGGAGAGTCCGCCGCCCGCAAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTTTGGAGATCATAATGAGTTCCTTTTATTGTTTGGGAATTATGAAAAAGAAGCCAGGTGCTCTGACCCAGGCCTTGTGGGAGACGAGCAAGACCCTCTCAAGCTGCAGAGCCAGGCTCGTGAATGTCACCTTCCTCAGCCATGACCACATCTTCCAGGGTGTGTCCCTGCACTTATTCCCCACTAATATGTGGTAGGCACCAGGGGGTGGAGTCCTTCCTCAGAGTATTGACCCCTCAGGAATACAACTCTGTCTTTATCCAAGGTCAGTCCATGGGTAGTTTGCAGCCGGACATGGTAGGGGTGTGGGTAGAGGACGCACACTCAGCATCGCCGGGCCAGCGGAAGCGCGGCACTGCATCCTGCCGTGGGCTCGAGGGACAGACTCGAGCCCCAGTCTGGCCAAACGCAGCCTGCCGCGGCTTTCTCATTTGCCGCTCCCGCGGCCTTTAGAAACGCCTCCCATAATCCATCGCGCCTGGCTTCCCCAAATGTGAACTCGAATGTTAATCGTAAAGCAATATCAGCACCAACAGAAAGAGAGGACAAAGGTTTCAACGCTTGTCACGGTTACGCCTTCGCCAATTGCTGGATAAAGACCGGTCTCAAGGGTTACATAGC $$&%&&'$#$$%%)&%%&&'&&%%&&%$%%$####%''&''&%&&''5799:9>=<=<=>E988+(((1*');:666:CEOSSDCDECEECCBEHHJJSLHSNFDEFIDBFIFGA>>IGGGEHGEIGLKS<<;4435***=>CBDDBDBCIHLHKSGCCBDDCDFBFEDFC==>H@C;3323237899::@9854490--.:>FJIEDECDGDMLFGGHGGCDAHK+++-<786AA??@@/.013>-,,,,CC>:78<=::;>MQSSSISGKSCDIEEC==<55;<@=>@=:<866899DHSSB:::;FHPJJGC>=>CIRSHGEDFHMGLOKNOFB@ABHK?812<>=GLIKFDCC>HDMNOPCBCCHDBAAF>B<856766;<>FHG@?@A?644335=EFCGDFJKGGIKSSHCAC??8766111142229@B?>))),*('(%%&$$$224BOGIM??>?>7311,/-,,++$%&$%%)4113-@CFSGQLSFHHFBC@?AAFCCB==JSHEE?IE9645=>=>CDB=EHJIKGLSKIJEIJSLGQLMHIGKMKIJSISGI>FH@AFEIHIHSSSSSSSSSSSSSSSGGFKSSSE;>@96--.2<==<<9788ACC?CCA@=DCD9333>?CGECB?3222333846541-,,,+)(%)+,.011-,%$ NM:i:10 ms:i:459 AS:i:423 nn:i:0 ts:A:- tp:A:P cm:i:125 s1:i:457 s2:i:0 de:f:0.0143 MD:Z:116^T52^A38^G61C158^G57 rl:i:39
The text was updated successfully, but these errors were encountered: