Comments (7)
Thanks for providing the data. I will look into it!
from ultra.
Hi @zhixue ,
I tried mapping your reads to chr1, assuming they were human, but all of them came out unaligned, so I'm guessing the reads come from another genome?
I also tried BLAT'ing your reads, but it didn't find a good match to any organism. To debug this, it would help to have e.g. chr1 and the subset of the annotation on chr1, if possible, so that I can reproduce the error.
Best,
Kristoffer
Thank you!
Here is the link to the rice genome (all.con) and the gene annotation (all.gff3): http://rice.uga.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_7.0/all.dir/
You can also download the index I have built (msu7_ultra_index.tar.gz).
My command is:
ref=all.con # rice reference (fasta format)
th=4
# ultra
uLTRA align $ref ${sample}.fq ${sample} --index ~/ricerna/ultra_bam/msu7_ultra_index --ont --t ${th} --prefix ${sample}_msu7 --use_NAM_seeds
Hi @zhixue ,
I get different output from yours. I suspect it has something to do with your annotation being a GFF3 file rather than a GTF file (which is what uLTRA expects), or with the GTF not having been converted properly.
Specifically, I downloaded the rice genome and the GFF3 annotation and made bug_reads_chr1.fa out of the reads you sent that mapped to Chr1.
I then used AGAT to convert the GFF3 to a GTF:
agat_convert_sp_gff2gtf.pl --gff all.gff3 --gtf all.gtf
Then I kept only Chr1 in the FASTA genome file and subsetted the GTF as:
grep "Chr1" all.gtf > chr1.gtf
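Note that grep "Chr1" keeps every line containing that substring, so on the MSU7 annotation it also keeps the Chr10, Chr11 and Chr12 records. A minimal sketch of exact-name subsetting, assuming a tab-separated GTF and FASTA headers of the form ">Chr1 ..."; the function names subset_fasta and subset_gtf are hypothetical, not part of uLTRA or AGAT:

```shell
# Hypothetical helpers for exact-name subsetting. Unlike grep "Chr1", which
# matches any line containing that substring (Chr10, Chr11, Chr12, ...),
# these compare the sequence name as a whole field.

# Print the FASTA record(s) whose name (first word of the header) equals $2.
subset_fasta() {
  awk -v name="$2" '
    /^>/ { split(substr($0, 2), f, " "); keep = (f[1] == name) }
    keep' "$1"
}

# Print GTF records whose seqname (column 1) equals $2, skipping comments.
subset_gtf() {
  awk -F '\t' -v name="$2" '!/^#/ && $1 == name' "$1"
}
```

Usage would be e.g. `subset_fasta all.con Chr1 > chr1.fa` and `subset_gtf all.gtf Chr1 > chr1.gtf`. For the FASTA part, `samtools faidx all.con Chr1 > chr1.fa` does the same if samtools is available.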
Finally, I ran:
./uLTRA index --disable_infer chr1.fa chr1.gtf ~/tmp/ultra_chr1_bug_rice/
./uLTRA align chr1.fa bug_reads_chr1.fa ~/tmp/ultra_chr1_bug_rice/
Output:
@SQ SN:Chr1 LN:43270923
18c45dd8-691e-4b68-8f80-9e07874c3c6c 0 Chr1 7683962 60 1=1X3=1X1=2D1=1X1=2D5=1X1=1X3=2I2=1X15=1D5=1I10=234N1=1I19=1D25=1X11=1X1=2X6=1X5=2D1=1X3=1X21=1I3=781N2I2=2I6=1D23=1I13=1I44=122N4=2X12=1I1=1X13=1X27=1X1=1X17=4D8=1X2=1D1=1X2=1D4=1X3=1X1=1X1=1X5=1X9=3D4=1I15=1D89=1D1=2X13=1D29=1D15=1X2=1X2=2I1X6=1I18=1I5=1I1=2X2=1I39=1I14=1D1X4=1I4=2I1=1X8=2D23=2I4=2I1=1D8=1D1=3X1=1X2=2D2=3D2=1D2=1X1=1X1=1X1=1I2=1X3=3D1=1X2=1I3=1X1=2X1=1X2= * 0 0 AACTTTCTGTTGGTGCTGATATTGCTGGGAGATATACCATATAAGAGGGATGCAAAGGAGTGTTTTGTGTGAAATACACTATGATGGAACCTATTTGGTTTCTAGCAGTGATGATGCAGATCTTCAGCTGTAGGTCAAAAGCTTCAGAACAGTTGGGAGGTTTACTTATCTTCCAGAGAACGCAGAAAACAAGAATACCCTAGATGCTGTCAAAGGAGCGGTATAAGCACCTTCCCGAAGTCAAGCGGATTGTGAGGCATGAGCACTTACCTAATGACAATCTACAAGGCAGCCAACTTAAGGCGAACGATGATTGAAACGGAAAACCGTAAAGAAGAAGGCGTGCACAATTGCCCCAGGAACGTACCTGTACAGCCCTTCAAGAGGAAGAATAATCAAAGAGTAGAGTAAAATGGAAGAATATGTGTGGCACAAAACCAAACGCCGCGATTTTACAGGTCTAGCTCAAGACGGGGACATGTCTAATCATCTGCAAACCATTCATCGAGAGTGACGTCTGTTGCAAACAGTGTTTGAAAGCAGCCCCAGCGGCGGCTTTGAAATCAGAAGGGTCTATGGTTCATCCAGTTCTCGAAGAGGGCTGAAATTTTTGCTTCTGCTCAATTTGTTGTATCGTTTTTGCTTGTAACGGTTATGGAACAGCCTCTAGTTATATCTCGTATTACTTTTCTGATGGAAGTATTACTGTGTAAAAAAAAGTAAAAAAAAGAAGATGAGGCGACAGGCAAGTAGGTT * XA:Z: XC:Z:ISM/NIC_known NM:i:113
de6aa4f9-3b41-4b93-b88a-c21e476a0638 0 Chr1 14019109 60 2=1X4=1I1=1I1=1X4=2D3=1D1=2I3=5I1=2I8=1X6=1X2=1X2=2D8=1X12=1X12=1I1=1X21=1I3=2D1=1I1=1X1=3I9=1D5=2D4=1X1=1X4=1I1X3=2X14=1I1X13=1X1=1D5=2D3=1I3=1X6=1D9=2X7=2X12=2D14=1X4=1I3=3I1=4D1=1X5=1X1=2X5=1X3=1I1=1X2=1D12=1X6=1D14=1I8=1D9=2X1=1D6=1X5=2D1=30S * 0 0 ACTACTCTGTTGGTACACAATGTGCTGGGGATGCATCCCTCAGATGTAACTCAGCCCCGACCATCGCCTGAACCAGCAAAGTTAAATACTCAGAACCTGGAAAAAACCCCTGATTTCATTTAACAAGAAAGTAGGTGAACACAGTTTGTGCTGTAATCCTGTGTTTGCGTAAGTAATTAGACTCCAGGACAAATGCCAGACAGGTGAAAAAGAACCACAAGAACATAAACCCAAACTACAAGGAGGAAAAACATTTCAAGCAAATTATTTATATCAAATTCAGTTCCAGTAAGCATCATAGTGCATATTCAAGCCATGTTGGGAGCTTCTTTAAGTTGTCGTATTTTATCATTTGCGCAGAAGATAGGGCAGACAGGCAAGTAGGTT * XA:Z: XC:Z:NO_SPLICE NM:i:84
8c963a72-683e-409b-b389-0962569c0026 0 Chr1 21071376 60 3=3X1=1X2=2X2=2I2=2I3=1X1=2I3=1X1=5D1=2X1=3D4=1X3=2D1=1D3=1X4=2D2=1X2=1X1=2X1=1X2=1X6=1X2=3D1X42=1X15=1D13=4D21=1I14=1D22=1D20=1I1X16=3I11=1I63=1I20=1X15=2X31=1D39=2X8=4D26=1X16=1X9=1D10=1D1=1D1X8=554N30=1X10=1D22=2D5=80N19=1I1X71=141N73=7D2=2X2=2X2=1D3=2D3=1X1=1X2=1X1=3D2=2I1= * 0 0 AACCCTACTTGCCACTGTCGCTCTATCTTCTTTTTTTTTTTTTACTTTTTTTTTTCTCTTTTTTTTTTTTTTTTTTTTGCTTGAAACCTCGGCAGTTGTCACTTCATTATCGGATCATGATGAAAAACATTCTAGTTTAAGATGAAATAAGTAGAGTTGCAGACATAAAGCATGATCTGCTCTTCATCATCACCCTGTTAGGACTTGGAGAGCATAGATTTGTACCACCAAATACTATCTTATTGTAGAGAGCTGCATAAGATGGAACAAATGTGCTCATGTACACTGGGTATAGGATTGGGCACAAATATATATGGATATCACTTATTGCATCCTTATCCTACAAGGATTAGTAACACTATGACATGCATTCCGTGCAACTTCCAATGCATCCTTCAGGAAACTCTGATGATCATGATGATCTGTCTTGCCCGCACATGACTACCCTCCCAAGTGACTTGACAGCCGAGGTACATTGCTTGCTATCCTCCATGTTTGTTCAGGGTATCTTGATCCATCATGCGGCCTGAACACTCATGGAACCAGTTGGTACATCAGCAAGGCTCCATTGCTGTTTATGGCCTGCAGGTTGTCTTGCAACAAAAAAAACATGCGCCGATGACAATGTAGCACAGGAGGAGGAGAATCCCCTTCAGGTAATGTGATGTTCCATCCTGCAGGGTGAAGGCCGTCACCAGCACCGCCATGAACAGAGAGCCGGTCTCTAGCAGCTTGAAGTCAAGATCCCCAGCAATATCAGCACCAACAGAAGGTT * XA:Z: XC:Z:ISM/NIC_known NM:i:102
83e2e624-7264-49db-ae29-df140e4f8fe7 0 Chr1 21071443 60 2=1X2=1X3=3X3=2D4=1D2=3D1=1X1=1X2=5D2=1X1=3D2=2X2=1X1=3D1X1=2X2=1X2=1D183=1D7=1X15=1I14=2D12=3I3=2X29=1D6=1D45=1X31=1X4=1X3=2D3=1I6=4D21=554N35=1X1=2D2=1D26=1X1=1D1362N4=4I3X2=1X4=1D2=3I1=2X1=1X3=34S * 0 0 AACCTACTTGCCTGTCGCTCTATCTTCTTTTTTTTTTTTTTTTTGATGAAAACATTCTAGTTTTAAGATGAAATAGAGAAGTAGAGTTGCAGACATAAACATGATCTGCTCTTACATCATCACCCTGTTAGGACTTAGGAGAGCATAGATTTGTACCGCAAATACTATCTTATTGAGAGCTGCATAGATGGAACAAATGTGCTCATGTACACTGGGTATAGGATTGGGCACAATATATGTGGATATCCTTATTGCCATCCTTATCCTATGGATTAGTAACACTGCCTCAACATGCATTCCGTGCAACTTCCAATGCATCCTTCAGAAACTCTGATGATCATGATGATCTGTCTTGCTGGCACATGAAGAACTACCCTCCCAAGTGACTTGACAGCCAAGGTGCATTTCTTTGTGCCTCCATTGTTCAGGAGACATCTTGATCCATCATGCGGCCTGAACACTCATGGAACCGGTTGTTATCAGCAAGGCTCCATTGCTGTTTGCATAGCTGCAGGTTGTCCTTGCAACGAAAAAAACATGCGCCCCCAGCAATATCAGCACCAACAGAAGGTT * XA:Z: XC:Z:Insufficient_junction_coverage_unclassified NM:i:76
5e38f5cf-3804-43dc-be3e-8f9c41b52a5e 0 Chr1 21071470 60 5=1I1=1I1X2=2D1X4=1X1=1X2=2D4=2D1X2=1D2=1X2=1D26=2I34=2I15=3I7=1D6=1D18=1X9=2I31=2D61=1I8=1I3=1I37=1D39=2X5=1I15=1X4=2I22=1D2=3D24=1I2=1X3=1D13=554N71=1362N4=7I1=1X2=2X4=1D2=1I1=2X1=1X3=34S * 0 0 AACCTACTTGCCTGTCGCCTCTATCTTCTGATGAAAACATTCTAGTTTTAAGATGAAATAGAGAGAAGTAGAGTTGCAGACATAAACATGATCTGCTCTCTTACATCATCACCCTCCTGTTAGACTTAGAGAGCATAGATTTGTACTGCAAATACTTCATCTTATTGAGAGCTGCATAGATGGAACAAATGCTCATGTACACTGGGTATAGGATTGGGCACAAATATATATGGATATCCTTATTGCATCCTTTATCCTAGTAAGGGATTAGTAACACCTTGACATGCATTCCGTGCAACTTCAATGCATCCCTTCAGGAAACTCTGATGATCATGATGATTCGTCTTGGCTGGCACATGAAGAACTACCTCCTCCCAAGTGACTTGACAGCCAGCATTGCTTGCTGTCCTCCATGTTTCTGCTCAGAGACATCTTGATCCATCATGCGGCCTGAACACTCATGGAACCGGTTGGTACATCCAGCAAGGCTCCATTGCTGTTTGCATGGCCTGCAGGTTGTCTCCTTTGCAACGAAAAACATGCGCCCCCAGCAATATCAGCACCAACAGAAGGTT * XA:Z: XC:Z:Insufficient_junction_coverage_unclassified NM:i:62
b2e68eb0-9afd-40b9-a0b5-f83638947925 0 Chr1 22120317 60 30S2=2X1=1X3=1I1=1I2=3I2=1D1=2X1=1X1=2X3=1I2=1X1=2X1=2X1=2X4=1X1=1X3=1I6=2D9=1D8=1X3=1X2=1D8=5D9=2X7=1X1=1X2=1I5=2I2=1X3=1X2=1X1=2I4=3D5=1D3=5X3=1X2=1I1=1X2=1018N5=1X1=2D4=2D4=1D2=2X1=1D1X8=2D1=1X2=2I1X2=1X2=2I2=1I10=1D6=1D5=2D1X15=1I1=1X3=2X6=2I2=1D2X5=1X1=2I8=1I16=2I1X2=1I1X4=109N6=2D4=1I11=3X2=1X1=1X3=3I7=3D2=1I7=1D8=1D4=121N3=1D1X8=1D4=2X11=1D5=3I2=1D12=3I1=1X7=1X1=3I30=4D6=997N14=2D1=1I1X3=1I3=2I14=2D4=2I10=1D1X4=1I7=1I1X1=1X1=89N22=1D10=1I11=1D9=1I6=1I4=103N8=2D7=2I8=2D1X2=1X1=2I15=2X15=2X126N6=2I15=2X6=3X16=84N4=1X9=1D10=1D6=1X11=1D3=1I3=1I2=2D8=77N3=1D3=2I6=1D7=1I1=1X8=1D25=88N8=1I25=1D1X23=1I6=1X9=1I21=1I13=1D29=4D1=1I2=1D13=75N5=3I2X19=1D12=2D27=1X84N3=1X1=5I15=2X1=2I1=1X11=1X2=1X1=2I1X1=4X11=2I11=2I23=560N7=1D2=1X1=4I6=3D11=1I8=1X4=3D11=3D2=3I7=2D6=548N8=2I5=2I6=1D18=2D2=2I11=1D1=1D1X6=1D4=1X6=3X4=1D8=2X42=2X1=2X10=1I8=1X1=1X1=1D605N24=1X13=1X1=1X1=2D7=2X14=1X14=1X33=3I8=1D84N12=1D2=1X1=1X37=1I8=1X2=2I3=1I9=1D3=1X14=1D18=1X12=1I2X11=1I2X1=2I3=3D1=1I2=1X8=1I2=1X13=2D7=1D3=2I5=1D4=1X4=2I1X16=4D4=1I3=2I1=1D3=2X1=2D7=2D2=1X6=71N3=2X15=3I2X30=1I2=6D6=1D6=2X11=1X2=1X3=2I4=90N7=2D2=1X5=1D13=1I4=1D1X21=1X14=1D9=1I9=3X13=3D1=1I3=1D5=2X9=1I13=2I1=1X1=1X9=2I2=1X1=135N3=3D1=4D17=3I4=1D3=3I2=1X10=2I3=1D8=2X9=1I5=3I1=1X6=2I1=1X5=5I2=2X9=126N5=1I4=1X13=1I11=2I22=1I1X2=1D2=1D1=1X10=1I1X11=1I8=86N4=1D5=1X5=1X2=1D4=1X12=2D7=2I1=2I15=3X13=1D12=2X1=1I3=94N5=1I1=1X18=1X2=1I10=2D20=2X12=1D4=1D15=2D6=3D2=1D10=1D14=3X11=3D1=1I12=1X14=3I15=1I21=2D7=1I1X2=2D13=3D8=1D1=1X2=1I2=1I18=1I4=2I19=1D4=1D1X3=1X1=1D1=1X8=4D2=1X6=1X9=2I3=1X2=1D4=1X1=1X1=3D1=1X1=1X7=2D1X5=2I30=1D9=1D17=5D11=1X7=2X14=1D1X19=7D2=1X1=2X2=1I1=1X5=1X1=2X3=1X * 0 0 
GTGCCACTTGTTCATTTATATTACTGAAATACAAAATTCATCCAGCGATACTACCTTCCCTATTAGTAATATTAACAGGGACGCCACGCACCTTTCCTTCTTCCTCCATCCACTCCTCCTTCCCCGCTTCCACTGCCGTCACTGCACGCCGACCCAACCACCACTTGGGAGAGGTCCCTTTAACATATCTCAAGAAATGAACCTGTTTCACTTATTTATTGATAAATTAAGCTTTGATTACAATGTTTCTGATTGGGAACTGTGAAGGAAGTTAAAATTAGGCAGATGTGGAAGCACCATATACTTTGTTGTGTCCAATAATGGGTCCACACTGAGCCAGTGGAAAGAACACTTTTTAAATCACGCCTTACTCACTTTCAATTTCGAGAATGGATCCTTCAAAGAAGGTCTAGACTACCAAGGAGTATGGATGGCGAGGCCAAACAAATATTGAACCAGCGTCCACTCTTATGAAAATGGATTTGGAAGGCACGGATGGTCGAGAGGAGAGGATGACACAGCCTTACCAAGACAAAGAGTGCACTGTTTGCTGGCTGTGTTTCAGATGTGTGTTTGATCAGCTGTGTGGTGTCATGACATTGGTAGAAACAGGCTGCAGAACAAACCTCTTTGAAGACTTGTTTTCCCAGGTTATGATGATTGTTCCCAGTCCACCAAGCTGGACATTGTTGTTTGTATTTCGTGATAAATCAAGAACACCTCACTGGAAAATCTTGAATAGATTCTGGAGGAAGATATTCAGAAGATATAGGATGGTGTCCTAAACCCCTGCCCATAAGGAAACTCCCTGAAGTAGATTTTCAATGTCAAGATTTGTGGCCTATCTGAATTATGAGGAAAGGAGGAGTTATTCAAAGAGCAGGTTGCCAGTTTTAAGAGATAGGTTCCAACAGTCTGTGCTCCTGGTGGACTTGCTGGAGAATCGGCAGGGCGTTGTCCCCTGCATCTGGTTTCTCTTTCCAGTTCACAACAATTTGGAAGGTCATTAAGGAGAATAAAGACAGTCACCAGCTCATAAAGTTATAAAATTCGCTACTGTACGCTGTGAGAAATCGGTAATGAAAATTGCCAGTTTTACAGCTGATGAAGAGGGTAAAAGCAACAATTTGAGGAGACTATTTTCAACATGATCACATTAACACAAGTTTGGGAAGAAGAGATCAGCAACAGCTTCTTGACAGATGCTTATCAGAGTATGACTGCAAGCTGGCTATTTTTGATGAAGAGTGTCAGAGCTTCAAGGCACCAACAGTCTTCTAAACTTGCAGCTTGTCAACCCCCTGCACATACCAAATATTTTGGATCATTTGGACACTAGAACTTTGCGTATTTAGGAATCCTTTATTAAGTCCTAGAAAGAGAGGGTTTTGCTGTTGCTGCTCGTGACTGTACTAAGGTTTTCCAGAGGAAGTTTGACAAAAGGATCAGGAATGCTGCTATCCAACAAGTGAAATAGGACCCATCAAAAGTGAGATAAGCAAAAGCGTGACATTGAAGCGCATGTGGCATCGGTTCGTGCCAAAAAACTATCTGAACTTTGTTCCAAGAAATACGAGGACAACTTACCAGACGCTGGCAGAACCAGTTGAAGCTCTTCTAGATTCAGCCAGGTGAAGAAGCCTATGGCCCAGCAATTGGAGGCTTCTTCAACGCGGACAAAATCTGCTGTTTCAGGTTTTGAATCTATACATGGCATCTTCCAGGAAGCAGTGGGGTGACTCAAGGAAGAATTACTTTCAGCTGGAAGTCACACGGAAGACCGTTTATTTGAATCAAAGGCAAAGAAGACTGTTCAGGAACGATTCGGGATGGACAGGTTACCAACACTATTCAGCCCAAACGATGCTGACTCAATGCCAAGAGTGTGGACCTGGGGACATAAGGCCGATACTAAAACTGGTCATTCAAAGCTTCTATGATTACTTTCACAATGGCTGCAATTTCGGTGGATGAGGATGGTGACAACAGTGAGAACACCC
TGTCCTTGCTCTGAGTTGACACCTTAAGGCCAGGGACTGAATAGAGCAATCAATCGTTCTGATCCACTTGCCTCACAAACTCATGGGAAGAAAGGTTGGAAAACTTTAATTACACACACTGTCAGTACTGGAAATCTTTATTGGGAGCAATTTAGAGCTGAAACTAGAATACACTGCTGTCACCCCCAGGCCTGCTCAGCGCTGCTCAGGAGGCAAAATCAGAGGAACAACAACCTGGCTGCCACCGCTCCATGGGCACTTGCCGCAATAAGCATCATGGATTCAACAAAGTTCATGACAGTTGTTAAAGAATCCCTTCACCTGAGATCATGTTTGTTGTTTTCTGTTGGAGGAAGAAGCCATGTGGGTACCTTTAGACATTGCCAAGAGTTCCAAAATCAAGTTTCTTCCAACCCGTCCTATCACTTTCGATGAGAATTCGTTCCCGATAATGAACATCCTGAAGGAATTAGCTGATGAGGGCGAGACCTGCAGCCCCAGAGGCGAAGATGGAACTCAACCAAAATCCACTGAGAATGGTTCACGACAACGTGACATCAGCAGGGTCATCCAGCCACATAACCTCTTCGGAGGAGTGGACCCGAATATTCAAGCGATTGCTGCAGATCAGTTCATCTGCTTTTAGTAAAGAAGCCCCAGCAGTTGTAGTTTACAACTTGGGTGAGCCTTTGGAGACCAAGCACTTACGATTTTGTAAGCACTTCCCAATCTGATGTACACGCAGTTCATTAGCTGACAGAGGTGTTATGGAGGATTTTGGGCCCAGGGAGTTTGTTTGTAAAAACGCTGTGAATATGTTTTATTTCATTTTTGTAACAACTTATATATTGGTTGTATAGAAGGTTGCGGTTTTATCATCTCAGAAGATAGAGCGAGCAAATGAGTT * XA:Z:LOC_Os01g39310.1 XC:Z:FSM NM:i:579
51b240c9-90a7-422e-9e11-0ebe73cb4efb 0 Chr1 24375913 60 1X5=2D2=1X1=5I2=3X1=2X3=6I1=1X32=1D1=5D2=1X11=1X5=1X2=1D8=1D53=2D7=1I8=1D2=1X3=1X5=4I35=3X24=2I12=1I14=1X13=2X21=194N20=1X5=2X3=1I1=316N24=2X5=3D1X9=5X3=1D21=1I20=2D2=1X2=78N1=5I5=2I1X2=1D5=1D6=1D19=1I12=2X17=144N11=1D8=2X5=1X1=1I3=183N38=1X2=2D2=1X2=1X1=1D2=1X1=2X1=1X2=2I1=2I1=1X2=1D2= * 0 0 AACCTTTCGTTGGTGCTGATATTGCTGGGGACCGCTCTCTCTCTCTCTCTCTCTCTCTCTCAGTCTTCCTCAAGAAGAAATTCATTCTTCCTCACGCGCTCCGCGTCGCCGCATTCGGGGACGCCCACCGCCGCCGGATCCGGCCAGCCTTCCCCTCCTCCTCCTTTCCTGGCTGCTCCTCCTCCTCCTCGGTAAGATCCGGCGCCGGTTCCATGTCGGTTACCGGCGAGCAGTCGCAGGCGAAGTTGCGGAGGCCGAGTGAGGCGGTGGAGCTTGTGCTGTTCCAGGCTGCTGAGTGCTACGTCTACCTGATACCTCCCAGGAAGACAGCTGCCTCTCACAGGGGCTGATGAATGGAACGTCAACAAAGGGGCTAAAGGGACTCGGGCCGTTTCAGCAAAGGAGAAGAGTGCACTCATCAAACTGGAAGATAAGCAAAGGTAATAGGAGCCGGGTCGCTAGGCATTCTCAGAGAAGACGAACCACAATCCGGTGGAACTCGTTATTGATAGCAGCAGATATTTTGTACCCGTGTTGGCGAGAACAGTAGATGGACGTCAGCGCCATGCCTTTATTGGTTTAGGCTTCAGTAATACGTAACCGAACGAAGTGATACCAG * XA:Z: XC:Z:ISM/NIC_known NM:i:106
58f8f518-0121-4e61-aa92-114986d43317 0 Chr1 26395513 60 53S10=1D1=1X3=1D58=1D9=1I2=1X4=2I12=1I18=1D3=1D1X3=1D111=1X1=2I16=1X17=1D6=3I13=1D4=2X2=4I1=1D21=1D1X1=1D9=1D2=1D1=1X20=1D14=1D9=1X16=1I46=1D9=1D23=1I1=1X4=1I22=1X28=40S * AACCGCTTTCTGTTGGTGCTGATATTGCTGGGTGTATGTTGCAATGATGGCACAGATTGAAGTCATAGAAGAAGGGCCACTCTTACTCGGAGATCATCAACGAGAGTTTGATCGAGTCTGTGGATTCCTGAACCCGATGCATGCACACGCTCGTGGGAGTGGCCTTCATGGTTGATACTATCCCAACAGCCCGGTTGGGATCAAGGAAATGGGCGCCACGCTTCGACTACATCTTGACCCAACAAGCTTTTGTGACTGTTGACAAGAATGCTCCAGTTAATCAAGACCTCATCAACATAATTTCTTCTCCGATCACGTCCATAGTGCCATTGAGTCTGTATTGCTCAGTTGAGGCCACAATTCCCAGCATTTCAGTGCCTGCTGATGCCATTTGTGCGTCAAACTGCGACAATCTTCGTGAAGTTTTGCCAACTCCAGGGCACCTCGTAAACTATGTCAGTATCCTGACGGTTTCAACTCTCAAAGCTATGTATCACTATTTGAGAACCTAATGTAAACTTTGGAGTGTAGGATTGTCGTTTAGTTTGTCCACTGGACTAGTGCTGGTGTTTAGCATGAATAAAAATAGTGGTTTAATAAAAAACACACAAAGAGATGGGAGCGACGGAGCAAGGTGGGGT * NM:i:45 ms:i:414 AS:i:411 nn:i:0 tp:A:P cm:i:79 s1:i:343 s2:i:0 de:f:0.0681 rl:i:0
8329a277-53b0-4be3-a211-9063c8db22f9 0 Chr1 37028734 60 23S20=5I4=1X35=1D38=1D6=1D17=1I4=556N5=1D35=1X29=1I43=2X2=1D1=1D16=1D19=2D51=1D711N3=2X21=1X47=255S * 0 0 AACCTTTCTGTTGGTGCTGATATTGCTGGGACAACATGAACTCTCCGGGAGGCTGATCCTTCTCATCCCAAACCTGCAGTCGGTCATCATTACCATGGCCATCTTCGGATGGCATCCTTTGTCTTTAAAACATCTGCAGACACTTTTTCTGAACCTGACTTACACATACTACACGCTCTGCAAGGTTATGAGAGTTTATCAAGTTGGCTTCTGGATCAAGAAGCTCTCCACATCTCAAGAGCATAATCTCTCTCAGCCATTGTACATATGTAACTCAAATCGTTTGCGTCCTTAGCAGTCAGGTAACTTCAAATCCTCCCAAGCAGGCCTTAGCTTTACAAAAACACTAGTATCGCGAACTCCTGATTTATGCGAGTTAGAATCGTATTCCTTTCTGGCAGCCTGATAACAGGCCGGAGAACACGCTCCTGCCAAGATGGGGTTCAATGGACTGAAGCAGCCACAGGACCGTGCTGATCTCATCGCATACCTGAAGAACGCTACAGCATGAGAGTCCCTGCTGCCATCTTCCCAAGATGAAGAGAAGCTATTGCTCGTAGCCTCATACTGACTTCTGTAACTTGTAAGGGTGAACTTTGCAACAAACTGTGTTGTAAATTTGTAATAATAAGCAGTGGCTCAAAGACAGAAAAAAAAAAAAAGAAGATAGAGCGACAGGCAAGAGGTT * NM:i:24 ms:i:344 AS:i:277 nn:i:0 ts:A:- tp:A:P cm:i:67 s1:i:324 s2:i:0 de:f:0.0458 SA:Z:Chr1,38446047,+,425S214M2D49S,60,7; rl:i:0
f967c0d5-5295-4c74-aa07-d6e793e21ea3 0 Chr1 37028706 60 1X1=1X2=3I2=1I3=1X2=1D1=1X4=8D28=4D28=1D1=1X26=1I1=1I16=1X21=556N22=1X22=1X1=2D1=1X38=4D1=1X13=1X76=1I5=1D15=1I2X3=711N9=2I4=2I1X18=1I13=1I23=1D31=1I16=1I20=2I18=1X14=2X13=1X4=1I17=2X2=1D1=1X15=1I17=1I12=1D65=3D2=2D52=1I10=1D41=1D7=1I29=317N1=1D45=1D15=1D1X2=1X34=2X16=1D1X30=1D5=1D12=1D10=1D3=1D14=2D18=1D27=1I1X35=4D4=1X6=1D2=1X18=2X1=1I1=2X6=1I2=1D2X1=1X1=1X2=2X2=1X1=1X2= * 0 0 AACCTTTCTGTTGGTGCTGATATTGCTGGGACAACATGAACTCGAGGTTGATCTCATCCCAAACCTGCAGTCGGTCATCACTACCATGGCCATCTTCGGATGGCATCTCTTTTGTCTTTAAAAACAGTCTGCAGACACTTTTTCGAACCTGACTTTACACATACTACACGTTCTGCAAGGTTATGAGAGCTTACCGCTGGCTTCTGGATCAAGAAGCCTCCACATCTCAAGAGCACCCTCTCAGCCATTGCACATACATAGACCTCAAATCGTTTGCGTCCCTTAGCAGTCAGGTAACTTCTCAAATCCTCCCAAGCAGGCCTTAGCTTTTACAAAACACTAGTATCGTTAAACCTCTGGATTTCTATGTTAGAGTTAGAATCGCATTCCTTTTCTGGCAGCCTGGATAACAGGCCGGAGAACACGCTCTGCCCACCAGAGATAGGCAGAACTTCCTCCTTTTTGGGTCCCAACGATTTCTGCCATTATCTGTAACTGGGTATCTGTGTCAATGAACCCCTTGAGCAGCTCCTCGTCCTCAATATACTGCTTGGATTTCTGCAGACATCCTCGCACCCTGACGGGGTCGTCTTTCAATATCCATCCTACGGGGAAAGCATCTGATCCGGTCCTCAAAGGACTTCATTGTGTTAGCGACAATAAGGGTTTCATCGAGATCAAATACGACAGCACCGCAAGTTCAGCATCCCAACAGATGCTGCATAGATCCCAGAGCGTACTGGAACAGCACCAAAGCATGGCACCTTCTCCACCTTGCTCGGCATTGCCACTAGTGTAGCTTCCTCATCTCCAACTACTACCACTGCGCTCTGTACTCGTTGAAGCAAGTGAGGTAGAGGCGGTGTAGGCTCGGGTGGGAGGCATCAGCTTACCCGGAGCTTGCATCGCACGGAGAAGGGCGCGATGGTTCGCAGGATGGCGAGCGGTGGCACCGCTCGCTCGTCGGCGAGAGGTGGCGATGCGATCTCGTTGCTGGGAACGGCAGCTCCCTCGGCCCCCTGTCATCGGGAACACCTCCGCTCGCCGAGGAACACGTCGCCGTGGAATTATCCGCATGGTGACCCCACCGGCGGTGGCACCGGCCCTCTGGGGACGCTGCCTGGGGCGGGGCGCGGCCCAGCAGCCATCAGCACCAACAGAAAGAAGTT * XA:Z: XC:Z:ISM/NIC_known NM:i:116
7d39ea74-6ac4-48c3-86ab-6b0477548b5f 0 Chr1 38740059 60 20S1=3D5=4D1=1X4=1X2=4D3=15D1=1D4X5=1I8=2I10=2D1=4I46=1D4=2I26=2I13=1I25=1X46=1D5=1X2=1X16=1X3=1I3=1X1=1X19=2I1=1D13=1I1=1X2=2D1=1X1=2X27=1X10=635N4=1X41=1X2=1D1X2=3X5=2I2=1X1=1X1=1X4=1I24=1I24=2X24=1X3=1D11=2X29=1I1X2=1X1=1X8=4I1=1X2=1D4=1X1=1D1X7=1I5=1X2=2D6=1D4=3D258N11=1X2=2I1=1I1X39=1D2=1D1=82N5=1D1=1X10=1X6=2X12=1X5=1D7=1D8=1X3=1X9=1D2=3I2=2I4=1D13=131N1X12=2D3X16=90N7=2I2=1X12=2I6=1D15=2X3=1D3=1X1=1D4=221N1=1I1=1X28=2D8=1X8=2D6=1D11=2I16=1D3=1X5=1X4=252N1=3I1=1X13=1X7=2D1=2D1=1X4=1X4=1X8=1I1X9=1D2=2D2=1444N1=1X3=1I2=1I1=4I3=2I2=1X2=1I2=1I1=2D1=1X2=1I1X2=1X1=1X3=2D46S * 0 0 AACCTCACTTGCTGCCTGTCGCCTCATCTTCTTTTATTTTTTTTAATAGGGTTAACTTATAGGATATTCTGGCCAGTCTAACCACTGAACTATCTGAATATTGAGATGCTGATGTTACACTGGTAAAAAAGTGAATCCTGAACAAAGTTTATCCCCTCTGTTCATACTTACCTTCATCTTGCTATGAATATCCCATTGCCTTACACAGGGACCAGGTTCATATGGTGCTGTACGGCCATACCAGCTACTGGTGCAGCAGCTACCCTTCTCTTGCTCCAGCAAACACCTGAAGCGTAAGCAATCTTGACGTGCATCACCTCCGTCTTCTGCAATCTTCTCAATAGGAGTTCATAGTCTCTCTGCGGCAGCTGCAGCACGTGGCCAAATGGTTTGTTCAATATCCGACGCACACCCTGTTCTGGCTCAATATACAAAACCTCACCACCAATGACCAAACTTTTGCTGCTCTGGGTCGTCAATACTCTTCAGAGGTTCATTCGTGTAAAACTCCTCCATGTGGCATTCAAGTGATCTAGGTACCACTTATCCTGGTTACCTCATGATGCATCCTTTTTAGCCTGTCAAACCACTCTTTGGCGCCATCCTTCCACAGTTATGAACTACTTGTCTTCTACGGTCCAATTTGTCTCCAAAATTGTTAAATGTTTCTCCAGTTACGACATCATAATCGTGTGGAATTGCTAGCTTTGGGGATTCAACACAAATATCTATATACATCTGACAATATATCTTATTTGTTATCATCCAACTATTTCTTAATGTAAAGTTGCAGTCCAGCAGCTTGTGTTGGGATTTCATCACCTCCTTTAAATGACAAATTTGAATTTAGTAACTTGATAAATCTTAAAAGAATTCCATCAATAACTCCAAACGTAGTTGTTATTGACATCATGGTTCTTGCAGCTATCTTTGATGGCCACAATGATGGTGGCCAATGCCCCACAAAGACAGAGCATGACTAGGAACAACCCAGCTATAATATTTACACTTTCGATTTTCGCATCGGTACAATGTCTACATCAGACGTCGCGTATATCTCTCTCTGAAAAGAATATGAACTGCCCCAGCAATATCAGCACCAACAGAAAGGTT * XA:Z: XC:Z:Insufficient_junction_coverage_unclassified NM:i:200
ddc44957-79db-4e0e-80cb-4fc9cd4c9dba 0 Chr1 38744222 60 2=2I3=1I2=2X2=1I1X2=980N9=1D1X7=1X1=1I2=1X3=1D2=1X4=1X2=1D2=2I1=1X1=1X1=1I2=1X2=1I2=2D2=2X1=1I1=1X2=1I3=2X1=1X3=4I3=1I1X1=2I1=2X1=3X3=1X4=2X3=2X3=1X2=3D2=1X4=4I3=1I1=1X1=11I4=1D2X3=1X1=2I1=2I1X3=2X2=5I2=1D1=1X2=2D5=2D32=2D12=1I1X4=1I1=1I42=2X41=1I2=1X43=2D10=164N21=1D31=1X33=1D3=5I1=1X2=1X1=1D1=1X1=2X2=1X5=3I1=1X3= * 0 0 AACCTACTTGCCTGTCGCCTCATCTTCTGTGAATGAATCAGCTAATGTACAGCATGGCATTTCTAACAGTAATGTATGTAGCTACCGGCGACGCAGAACACAGACGCGCGCCTTCTTTAATGCAATCACCATCAACGCATGAAGCCTCTTACTACGCTAATTAGATTATTTGCGACTAATCACTATTAATCACTACTCCTCCTGACATTGTGATATCTTTGCTGACATAAAGTCTGTGTTCCATGGTTTCACTCTGACGTTGGCATAGGCCACAAGTCGATGTGATCAGTCAGCTCATCGGCGATGCAGGATTGAATTGCCAGAAACGCAAGAAGCAACCTCCCAACTCAGAGCCTGCGCCATGTTTGCAAGGTGATAGGATATCCTTCTGCAAAATCTGCAAGCGCAAGGCAAGTAGAAACCTGAAGGAACCTCTAGTCCTCTACTTGTACCTTCATTTTCACCTCCTAATCTACACCTTTTGATCCCCAGCAATATCAGCACCAACAGAAAGGCAG * XA:Z: XC:Z:NIC_novel NM:i:126
Alignment categories:
{'ISM/NIC_known': 4, 'NO_SPLICE': 2, 'Insufficient_junction_coverage_unclassified': 3, 'FSM': 1, 'unaligned': 1, 'NIC_novel': 1})
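To cross-check the categories against the SAM records themselves, one can tally the XC:Z: tags. A minimal sketch, assuming a tab-separated SAM file; count_xc_tags is a hypothetical helper. Records without an XC:Z: tag (such as the two lines above that carry minimap2-style tags like tp:A:P) are counted as no_XC_tag, so the result is not guaranteed to match uLTRA's own summary line by line:

```shell
# Hypothetical helper: tally the XC:Z: classification tag of each alignment
# record in a SAM file; records without the tag are counted as no_XC_tag.
count_xc_tags() {
  awk -F '\t' '
    /^@/ { next }                              # skip header lines
    {
      tag = "no_XC_tag"
      for (i = 12; i <= NF; i++)               # optional fields start at 12
        if (substr($i, 1, 5) == "XC:Z:") { tag = substr($i, 6); break }
      n[tag]++
    }
    END { for (t in n) print t "\t" n[t] }' "$1" | sort
}
```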
Thank you for your detailed reply.
I found that the GTF files converted from GFF3 by different tools (gffread vs. AGAT) differ. OK, I will use AGAT instead of gffread for the analysis. But I wonder whether the error was caused by the missing "gene"/"transcript" records in the gffread GTF?
Here are the first lines of the GTF generated by gffread and by AGAT.
### gffread
# gffread all.gff3 -T -o all_gffread.gtf
$ cat all_gffread.gtf | grep 'Chr1' | head
Chr1 MSU_osa1r7 exon 2903 3268 . + . transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1 MSU_osa1r7 exon 3354 3616 . + . transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1 MSU_osa1r7 exon 4357 4455 . + . transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1 MSU_osa1r7 exon 5457 5560 . + . transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1 MSU_osa1r7 exon 7136 7944 . + . transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1 MSU_osa1r7 exon 8028 8150 . + . transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1 MSU_osa1r7 exon 8232 8320 . + . transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1 MSU_osa1r7 exon 8408 8608 . + . transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1 MSU_osa1r7 exon 9210 9617 . + . transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1 MSU_osa1r7 exon 10104 10187 . + . transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
### AGAT
# perl ~/tool/AGAT/bin/agat_convert_sp_gff2gtf.pl --gff all.gff3 --gtf all_agat.gtf
$ cat all_agat.gtf | grep 'Chr1' | head
Chr1 MSU_osa1r7 gene 2903 10817 . + . gene_id "LOC_Os01g01010"; ID "LOC_Os01g01010"; Name "LOC_Os01g01010"; Note "TBC domain containing protein, expressed";
Chr1 MSU_osa1r7 transcript 2903 10817 . + . gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1"; Name "LOC_Os01g01010.1"; Parent "LOC_Os01g01010"; original_biotype "mrna";
Chr1 MSU_osa1r7 exon 2903 3268 . + . gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_1"; Parent "LOC_Os01g01010.1";
Chr1 MSU_osa1r7 exon 3354 3616 . + . gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_2"; Parent "LOC_Os01g01010.1";
Chr1 MSU_osa1r7 exon 4357 4455 . + . gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_3"; Parent "LOC_Os01g01010.1";
Chr1 MSU_osa1r7 exon 5457 5560 . + . gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_4"; Parent "LOC_Os01g01010.1";
Chr1 MSU_osa1r7 exon 7136 7944 . + . gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_5"; Parent "LOC_Os01g01010.1";
Chr1 MSU_osa1r7 exon 8028 8150 . + . gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_6"; Parent "LOC_Os01g01010.1";
Chr1 MSU_osa1r7 exon 8232 8320 . + . gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_7"; Parent "LOC_Os01g01010.1";
Chr1 MSU_osa1r7 exon 8408 8608 . + . gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_8"; Parent "LOC_Os01g01010.1";
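A quick way to see the difference between the two conversions is to tally the feature types in column 3. A minimal sketch, assuming a tab-separated GTF; count_features is a hypothetical helper name:

```shell
# Hypothetical helper: count how many records of each feature type
# (column 3: gene, transcript, exon, CDS, ...) a GTF contains.
count_features() {
  awk -F '\t' '!/^#/ && NF >= 9 { n[$3]++ }
    END { for (f in n) print f "\t" n[f] }' "$1" | sort
}
```

Running `count_features all_gffread.gtf` and `count_features all_agat.gtf` should show whether gene and transcript lines are present in each file.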
I also used --disable_infer to build the index, and no error occurred:
$ uLTRA index all.fa all_gffread.gtf msu7_ultra_index --disable_infer
creating msu7_ultra_index
total_flanks2: 258153
total_flank_size 204193233
total_unique_segment_counter 93250745
total_segments_bad 3610100
bad 90170
total parts size: 93301502
total exons size: 113318377
min_intron: 4294967296
Number of ref seqs in gff: 236343
Number of ref seqs in fasta: 14
1 million kmers processed.
...
9816415.0 Unique kmers in reference part sequences with abundance > 1
CGGCGGCGGCGGCGGCG 3560
...
19208 19208 out of 236343 sequences has been modified in masking step.
1 million kmers processed.
...
60343 60343 out of 258153 sequences has been modified in masking step.
Here is the GTF format description I referred to:
The following feature types are required: "CDS", "start_codon", "stop_codon". The features "5UTR", "3UTR", "inter", "inter_CNS", "intron_CNS" and "exon" are optional. All other features will be ignored. The types must have the correct capitalization shown here.
I'm glad it works. It is quite possible that the error was caused by the missing gene/transcript records, though I have not verified this.
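If the missing records are indeed the cause (unverified, as said above), an alternative to re-converting with AGAT would be to derive gene and transcript lines from the exon lines of the gffread GTF. A minimal sketch, assuming a tab-separated GTF in which every exon line carries gene_id and transcript_id and all exons of a gene share seqname, source and strand; add_parent_records is a hypothetical helper, comment lines are dropped, and the derived lines carry only gene_id/transcript_id:

```shell
# Hypothetical helper: derive "gene" and "transcript" records from the exon
# lines of a GTF (e.g. gffread -T output) and print them together with the
# original records, sorted by seqname, start and record level.
add_parent_records() {
  awk -F '\t' -v OFS='\t' '
    # Return the value of attribute key in column 9, or an empty string.
    function attr(s, key,    re) {
      re = key " \"[^\"]+\""
      if (!match(s, re)) return ""
      return substr(s, RSTART + length(key) + 2, RLENGTH - length(key) - 3)
    }
    /^#/ || NF < 9 { next }          # drop comments and malformed lines
    {
      print $1, $4, 2, $0            # original record; level 2 = child
      if ($3 != "exon") next
      t = attr($9, "transcript_id"); g = attr($9, "gene_id")
      if (t == "" || g == "") next
      s = $4 + 0; e = $5 + 0         # force numeric comparison
      if (!(t in ts) || s < ts[t]) ts[t] = s
      if (!(t in te) || e > te[t]) te[t] = e
      if (!(g in gs) || s < gs[g]) gs[g] = s
      if (!(g in ge) || e > ge[g]) ge[g] = e
      tgene[t] = g; seq[g] = $1; src[g] = $2; strand[g] = $7
    }
    END {
      for (g in gs)                  # level 0 = gene
        print seq[g], gs[g], 0, seq[g], src[g], "gene", gs[g], ge[g], ".", strand[g], ".", "gene_id \"" g "\";"
      for (t in ts) {                # level 1 = transcript
        g = tgene[t]
        print seq[g], ts[t], 1, seq[g], src[g], "transcript", ts[t], te[t], ".", strand[g], ".", "gene_id \"" g "\"; transcript_id \"" t "\";"
      }
    }' "$1" |
    sort -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n |
    cut -f4-
}
```

Usage would be e.g. `add_parent_records all_gffread.gtf > all_gffread_parents.gtf`. The temporary sort key (seqname, start, level) puts each gene before its transcripts and each transcript before its exons when they start at the same position.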
I'm also happy to receive feedback on the quality of your downstream analysis (assembly with StringTie?) after your reruns.