Comments (7)

ksahlin commented on July 19, 2024

Thanks for providing the data. I will look into it!

from ultra.

ksahlin commented on July 19, 2024

Hi @zhixue ,

I tried your reads mapping to chr1 assuming it was human, but since all my reads are unaligned I'm assuming the reads come from another genome?

I also tried BLAT'ing your reads but it didn't find a good match to an organism. To debug this it would be desirable to have e.g. chr1 and the subset of annotation (annotations to chr1), if possible, so that I can reproduce the error.

Best,
Kristoffer

zhixue commented on July 19, 2024

Thank you~
The rice genome (all.con) and the gene annotation (all.gff3) are both available here: http://rice.uga.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_7.0/all.dir/

You can also download the index I have built (msu7_ultra_index.tar.gz).

My command is

ref=all.con # rice reference (fasta format)
th=4

# ultra
uLTRA align $ref ${sample}.fq ${sample} --index ~/ricerna/ultra_bam/msu7_ultra_index --ont --t ${th} --prefix ${sample}_msu7 --use_NAM_seeds

ksahlin commented on July 19, 2024

Hi @zhixue ,

My output differs from yours. I believe it could be related to your annotation being a GFF3 file rather than a GTF file (which is what uLTRA expects), or to the GFF3 not having been converted properly.

Here is what I did:

I downloaded the rice genome and GFF3 annotation and made a bug_reads_chr1.fa out of the reads you sent that were mapping to Chr1.

I used AGAT to convert the GFF3 to a GTF:

agat_convert_sp_gff2gtf.pl --gff all.gff3 --gtf all.gtf

Then I used only Chr1 from the FASTA genome file and subsetted the GTF as:

grep "Chr1" all.gtf > chr1.gtf
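Note on the subsetting step: a plain substring match on "Chr1" also keeps any line that merely contains that text, for example records on a sequence named Chr10. The sketch below shows the difference on a made-up two-line GTF and an exact match on the first column with awk. The file contents and the `workdir` variable are illustrative assumptions, not files from this thread.

```shell
#!/bin/sh
# Toy tab-separated GTF with one record on Chr1 and one on Chr10 (made-up data).
workdir=$(mktemp -d)
printf 'Chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "g1.1";\n'  > "$workdir/all.gtf"
printf 'Chr10\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g2"; transcript_id "g2.1";\n' >> "$workdir/all.gtf"

# Substring match: counts both lines, because "Chr10" contains "Chr1".
grep -c "Chr1" "$workdir/all.gtf"

# Exact match on column 1 (the sequence name): keeps only the Chr1 record.
awk -F'\t' '$1 == "Chr1"' "$workdir/all.gtf" > "$workdir/chr1.gtf"
awk 'END { print NR }' "$workdir/chr1.gtf"
```

Whether the extra records matter depends on how the downstream tool handles annotation on sequences that are absent from the FASTA file; the exact match simply avoids the question.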

Finally, I ran:

./uLTRA index --disable_infer  chr1.fa  chr1.gtf ~/tmp/ultra_chr1_bug_rice/
./uLTRA align chr1.fa bug_reads_chr1.fa ~/tmp/ultra_chr1_bug_rice/

Output:

@SQ	SN:Chr1	LN:43270923
18c45dd8-691e-4b68-8f80-9e07874c3c6c	0	Chr1	7683962	60	1=1X3=1X1=2D1=1X1=2D5=1X1=1X3=2I2=1X15=1D5=1I10=234N1=1I19=1D25=1X11=1X1=2X6=1X5=2D1=1X3=1X21=1I3=781N2I2=2I6=1D23=1I13=1I44=122N4=2X12=1I1=1X13=1X27=1X1=1X17=4D8=1X2=1D1=1X2=1D4=1X3=1X1=1X1=1X5=1X9=3D4=1I15=1D89=1D1=2X13=1D29=1D15=1X2=1X2=2I1X6=1I18=1I5=1I1=2X2=1I39=1I14=1D1X4=1I4=2I1=1X8=2D23=2I4=2I1=1D8=1D1=3X1=1X2=2D2=3D2=1D2=1X1=1X1=1X1=1I2=1X3=3D1=1X2=1I3=1X1=2X1=1X2=	*	0	0	AACTTTCTGTTGGTGCTGATATTGCTGGGAGATATACCATATAAGAGGGATGCAAAGGAGTGTTTTGTGTGAAATACACTATGATGGAACCTATTTGGTTTCTAGCAGTGATGATGCAGATCTTCAGCTGTAGGTCAAAAGCTTCAGAACAGTTGGGAGGTTTACTTATCTTCCAGAGAACGCAGAAAACAAGAATACCCTAGATGCTGTCAAAGGAGCGGTATAAGCACCTTCCCGAAGTCAAGCGGATTGTGAGGCATGAGCACTTACCTAATGACAATCTACAAGGCAGCCAACTTAAGGCGAACGATGATTGAAACGGAAAACCGTAAAGAAGAAGGCGTGCACAATTGCCCCAGGAACGTACCTGTACAGCCCTTCAAGAGGAAGAATAATCAAAGAGTAGAGTAAAATGGAAGAATATGTGTGGCACAAAACCAAACGCCGCGATTTTACAGGTCTAGCTCAAGACGGGGACATGTCTAATCATCTGCAAACCATTCATCGAGAGTGACGTCTGTTGCAAACAGTGTTTGAAAGCAGCCCCAGCGGCGGCTTTGAAATCAGAAGGGTCTATGGTTCATCCAGTTCTCGAAGAGGGCTGAAATTTTTGCTTCTGCTCAATTTGTTGTATCGTTTTTGCTTGTAACGGTTATGGAACAGCCTCTAGTTATATCTCGTATTACTTTTCTGATGGAAGTATTACTGTGTAAAAAAAAGTAAAAAAAAGAAGATGAGGCGACAGGCAAGTAGGTT	*	XA:Z:	XC:Z:ISM/NIC_known	NM:i:113
de6aa4f9-3b41-4b93-b88a-c21e476a0638	0	Chr1	14019109	60	2=1X4=1I1=1I1=1X4=2D3=1D1=2I3=5I1=2I8=1X6=1X2=1X2=2D8=1X12=1X12=1I1=1X21=1I3=2D1=1I1=1X1=3I9=1D5=2D4=1X1=1X4=1I1X3=2X14=1I1X13=1X1=1D5=2D3=1I3=1X6=1D9=2X7=2X12=2D14=1X4=1I3=3I1=4D1=1X5=1X1=2X5=1X3=1I1=1X2=1D12=1X6=1D14=1I8=1D9=2X1=1D6=1X5=2D1=30S	*	0	0	ACTACTCTGTTGGTACACAATGTGCTGGGGATGCATCCCTCAGATGTAACTCAGCCCCGACCATCGCCTGAACCAGCAAAGTTAAATACTCAGAACCTGGAAAAAACCCCTGATTTCATTTAACAAGAAAGTAGGTGAACACAGTTTGTGCTGTAATCCTGTGTTTGCGTAAGTAATTAGACTCCAGGACAAATGCCAGACAGGTGAAAAAGAACCACAAGAACATAAACCCAAACTACAAGGAGGAAAAACATTTCAAGCAAATTATTTATATCAAATTCAGTTCCAGTAAGCATCATAGTGCATATTCAAGCCATGTTGGGAGCTTCTTTAAGTTGTCGTATTTTATCATTTGCGCAGAAGATAGGGCAGACAGGCAAGTAGGTT	*	XA:Z:	XC:Z:NO_SPLICE	NM:i:84
8c963a72-683e-409b-b389-0962569c0026	0	Chr1	21071376	60	3=3X1=1X2=2X2=2I2=2I3=1X1=2I3=1X1=5D1=2X1=3D4=1X3=2D1=1D3=1X4=2D2=1X2=1X1=2X1=1X2=1X6=1X2=3D1X42=1X15=1D13=4D21=1I14=1D22=1D20=1I1X16=3I11=1I63=1I20=1X15=2X31=1D39=2X8=4D26=1X16=1X9=1D10=1D1=1D1X8=554N30=1X10=1D22=2D5=80N19=1I1X71=141N73=7D2=2X2=2X2=1D3=2D3=1X1=1X2=1X1=3D2=2I1=	*	0	0	AACCCTACTTGCCACTGTCGCTCTATCTTCTTTTTTTTTTTTTACTTTTTTTTTTCTCTTTTTTTTTTTTTTTTTTTTGCTTGAAACCTCGGCAGTTGTCACTTCATTATCGGATCATGATGAAAAACATTCTAGTTTAAGATGAAATAAGTAGAGTTGCAGACATAAAGCATGATCTGCTCTTCATCATCACCCTGTTAGGACTTGGAGAGCATAGATTTGTACCACCAAATACTATCTTATTGTAGAGAGCTGCATAAGATGGAACAAATGTGCTCATGTACACTGGGTATAGGATTGGGCACAAATATATATGGATATCACTTATTGCATCCTTATCCTACAAGGATTAGTAACACTATGACATGCATTCCGTGCAACTTCCAATGCATCCTTCAGGAAACTCTGATGATCATGATGATCTGTCTTGCCCGCACATGACTACCCTCCCAAGTGACTTGACAGCCGAGGTACATTGCTTGCTATCCTCCATGTTTGTTCAGGGTATCTTGATCCATCATGCGGCCTGAACACTCATGGAACCAGTTGGTACATCAGCAAGGCTCCATTGCTGTTTATGGCCTGCAGGTTGTCTTGCAACAAAAAAAACATGCGCCGATGACAATGTAGCACAGGAGGAGGAGAATCCCCTTCAGGTAATGTGATGTTCCATCCTGCAGGGTGAAGGCCGTCACCAGCACCGCCATGAACAGAGAGCCGGTCTCTAGCAGCTTGAAGTCAAGATCCCCAGCAATATCAGCACCAACAGAAGGTT	*	XA:Z:	XC:Z:ISM/NIC_known	NM:i:102
83e2e624-7264-49db-ae29-df140e4f8fe7	0	Chr1	21071443	60	2=1X2=1X3=3X3=2D4=1D2=3D1=1X1=1X2=5D2=1X1=3D2=2X2=1X1=3D1X1=2X2=1X2=1D183=1D7=1X15=1I14=2D12=3I3=2X29=1D6=1D45=1X31=1X4=1X3=2D3=1I6=4D21=554N35=1X1=2D2=1D26=1X1=1D1362N4=4I3X2=1X4=1D2=3I1=2X1=1X3=34S	*	0	0	AACCTACTTGCCTGTCGCTCTATCTTCTTTTTTTTTTTTTTTTTGATGAAAACATTCTAGTTTTAAGATGAAATAGAGAAGTAGAGTTGCAGACATAAACATGATCTGCTCTTACATCATCACCCTGTTAGGACTTAGGAGAGCATAGATTTGTACCGCAAATACTATCTTATTGAGAGCTGCATAGATGGAACAAATGTGCTCATGTACACTGGGTATAGGATTGGGCACAATATATGTGGATATCCTTATTGCCATCCTTATCCTATGGATTAGTAACACTGCCTCAACATGCATTCCGTGCAACTTCCAATGCATCCTTCAGAAACTCTGATGATCATGATGATCTGTCTTGCTGGCACATGAAGAACTACCCTCCCAAGTGACTTGACAGCCAAGGTGCATTTCTTTGTGCCTCCATTGTTCAGGAGACATCTTGATCCATCATGCGGCCTGAACACTCATGGAACCGGTTGTTATCAGCAAGGCTCCATTGCTGTTTGCATAGCTGCAGGTTGTCCTTGCAACGAAAAAAACATGCGCCCCCAGCAATATCAGCACCAACAGAAGGTT	*	XA:Z:	XC:Z:Insufficient_junction_coverage_unclassified	NM:i:76
5e38f5cf-3804-43dc-be3e-8f9c41b52a5e	0	Chr1	21071470	60	5=1I1=1I1X2=2D1X4=1X1=1X2=2D4=2D1X2=1D2=1X2=1D26=2I34=2I15=3I7=1D6=1D18=1X9=2I31=2D61=1I8=1I3=1I37=1D39=2X5=1I15=1X4=2I22=1D2=3D24=1I2=1X3=1D13=554N71=1362N4=7I1=1X2=2X4=1D2=1I1=2X1=1X3=34S	*	0	0	AACCTACTTGCCTGTCGCCTCTATCTTCTGATGAAAACATTCTAGTTTTAAGATGAAATAGAGAGAAGTAGAGTTGCAGACATAAACATGATCTGCTCTCTTACATCATCACCCTCCTGTTAGACTTAGAGAGCATAGATTTGTACTGCAAATACTTCATCTTATTGAGAGCTGCATAGATGGAACAAATGCTCATGTACACTGGGTATAGGATTGGGCACAAATATATATGGATATCCTTATTGCATCCTTTATCCTAGTAAGGGATTAGTAACACCTTGACATGCATTCCGTGCAACTTCAATGCATCCCTTCAGGAAACTCTGATGATCATGATGATTCGTCTTGGCTGGCACATGAAGAACTACCTCCTCCCAAGTGACTTGACAGCCAGCATTGCTTGCTGTCCTCCATGTTTCTGCTCAGAGACATCTTGATCCATCATGCGGCCTGAACACTCATGGAACCGGTTGGTACATCCAGCAAGGCTCCATTGCTGTTTGCATGGCCTGCAGGTTGTCTCCTTTGCAACGAAAAACATGCGCCCCCAGCAATATCAGCACCAACAGAAGGTT	*	XA:Z:	XC:Z:Insufficient_junction_coverage_unclassified	NM:i:62
b2e68eb0-9afd-40b9-a0b5-f83638947925	0	Chr1	22120317	60	30S2=2X1=1X3=1I1=1I2=3I2=1D1=2X1=1X1=2X3=1I2=1X1=2X1=2X1=2X4=1X1=1X3=1I6=2D9=1D8=1X3=1X2=1D8=5D9=2X7=1X1=1X2=1I5=2I2=1X3=1X2=1X1=2I4=3D5=1D3=5X3=1X2=1I1=1X2=1018N5=1X1=2D4=2D4=1D2=2X1=1D1X8=2D1=1X2=2I1X2=1X2=2I2=1I10=1D6=1D5=2D1X15=1I1=1X3=2X6=2I2=1D2X5=1X1=2I8=1I16=2I1X2=1I1X4=109N6=2D4=1I11=3X2=1X1=1X3=3I7=3D2=1I7=1D8=1D4=121N3=1D1X8=1D4=2X11=1D5=3I2=1D12=3I1=1X7=1X1=3I30=4D6=997N14=2D1=1I1X3=1I3=2I14=2D4=2I10=1D1X4=1I7=1I1X1=1X1=89N22=1D10=1I11=1D9=1I6=1I4=103N8=2D7=2I8=2D1X2=1X1=2I15=2X15=2X126N6=2I15=2X6=3X16=84N4=1X9=1D10=1D6=1X11=1D3=1I3=1I2=2D8=77N3=1D3=2I6=1D7=1I1=1X8=1D25=88N8=1I25=1D1X23=1I6=1X9=1I21=1I13=1D29=4D1=1I2=1D13=75N5=3I2X19=1D12=2D27=1X84N3=1X1=5I15=2X1=2I1=1X11=1X2=1X1=2I1X1=4X11=2I11=2I23=560N7=1D2=1X1=4I6=3D11=1I8=1X4=3D11=3D2=3I7=2D6=548N8=2I5=2I6=1D18=2D2=2I11=1D1=1D1X6=1D4=1X6=3X4=1D8=2X42=2X1=2X10=1I8=1X1=1X1=1D605N24=1X13=1X1=1X1=2D7=2X14=1X14=1X33=3I8=1D84N12=1D2=1X1=1X37=1I8=1X2=2I3=1I9=1D3=1X14=1D18=1X12=1I2X11=1I2X1=2I3=3D1=1I2=1X8=1I2=1X13=2D7=1D3=2I5=1D4=1X4=2I1X16=4D4=1I3=2I1=1D3=2X1=2D7=2D2=1X6=71N3=2X15=3I2X30=1I2=6D6=1D6=2X11=1X2=1X3=2I4=90N7=2D2=1X5=1D13=1I4=1D1X21=1X14=1D9=1I9=3X13=3D1=1I3=1D5=2X9=1I13=2I1=1X1=1X9=2I2=1X1=135N3=3D1=4D17=3I4=1D3=3I2=1X10=2I3=1D8=2X9=1I5=3I1=1X6=2I1=1X5=5I2=2X9=126N5=1I4=1X13=1I11=2I22=1I1X2=1D2=1D1=1X10=1I1X11=1I8=86N4=1D5=1X5=1X2=1D4=1X12=2D7=2I1=2I15=3X13=1D12=2X1=1I3=94N5=1I1=1X18=1X2=1I10=2D20=2X12=1D4=1D15=2D6=3D2=1D10=1D14=3X11=3D1=1I12=1X14=3I15=1I21=2D7=1I1X2=2D13=3D8=1D1=1X2=1I2=1I18=1I4=2I19=1D4=1D1X3=1X1=1D1=1X8=4D2=1X6=1X9=2I3=1X2=1D4=1X1=1X1=3D1=1X1=1X7=2D1X5=2I30=1D9=1D17=5D11=1X7=2X14=1D1X19=7D2=1X1=2X2=1I1=1X5=1X1=2X3=1X	*	0	0	
GTGCCACTTGTTCATTTATATTACTGAAATACAAAATTCATCCAGCGATACTACCTTCCCTATTAGTAATATTAACAGGGACGCCACGCACCTTTCCTTCTTCCTCCATCCACTCCTCCTTCCCCGCTTCCACTGCCGTCACTGCACGCCGACCCAACCACCACTTGGGAGAGGTCCCTTTAACATATCTCAAGAAATGAACCTGTTTCACTTATTTATTGATAAATTAAGCTTTGATTACAATGTTTCTGATTGGGAACTGTGAAGGAAGTTAAAATTAGGCAGATGTGGAAGCACCATATACTTTGTTGTGTCCAATAATGGGTCCACACTGAGCCAGTGGAAAGAACACTTTTTAAATCACGCCTTACTCACTTTCAATTTCGAGAATGGATCCTTCAAAGAAGGTCTAGACTACCAAGGAGTATGGATGGCGAGGCCAAACAAATATTGAACCAGCGTCCACTCTTATGAAAATGGATTTGGAAGGCACGGATGGTCGAGAGGAGAGGATGACACAGCCTTACCAAGACAAAGAGTGCACTGTTTGCTGGCTGTGTTTCAGATGTGTGTTTGATCAGCTGTGTGGTGTCATGACATTGGTAGAAACAGGCTGCAGAACAAACCTCTTTGAAGACTTGTTTTCCCAGGTTATGATGATTGTTCCCAGTCCACCAAGCTGGACATTGTTGTTTGTATTTCGTGATAAATCAAGAACACCTCACTGGAAAATCTTGAATAGATTCTGGAGGAAGATATTCAGAAGATATAGGATGGTGTCCTAAACCCCTGCCCATAAGGAAACTCCCTGAAGTAGATTTTCAATGTCAAGATTTGTGGCCTATCTGAATTATGAGGAAAGGAGGAGTTATTCAAAGAGCAGGTTGCCAGTTTTAAGAGATAGGTTCCAACAGTCTGTGCTCCTGGTGGACTTGCTGGAGAATCGGCAGGGCGTTGTCCCCTGCATCTGGTTTCTCTTTCCAGTTCACAACAATTTGGAAGGTCATTAAGGAGAATAAAGACAGTCACCAGCTCATAAAGTTATAAAATTCGCTACTGTACGCTGTGAGAAATCGGTAATGAAAATTGCCAGTTTTACAGCTGATGAAGAGGGTAAAAGCAACAATTTGAGGAGACTATTTTCAACATGATCACATTAACACAAGTTTGGGAAGAAGAGATCAGCAACAGCTTCTTGACAGATGCTTATCAGAGTATGACTGCAAGCTGGCTATTTTTGATGAAGAGTGTCAGAGCTTCAAGGCACCAACAGTCTTCTAAACTTGCAGCTTGTCAACCCCCTGCACATACCAAATATTTTGGATCATTTGGACACTAGAACTTTGCGTATTTAGGAATCCTTTATTAAGTCCTAGAAAGAGAGGGTTTTGCTGTTGCTGCTCGTGACTGTACTAAGGTTTTCCAGAGGAAGTTTGACAAAAGGATCAGGAATGCTGCTATCCAACAAGTGAAATAGGACCCATCAAAAGTGAGATAAGCAAAAGCGTGACATTGAAGCGCATGTGGCATCGGTTCGTGCCAAAAAACTATCTGAACTTTGTTCCAAGAAATACGAGGACAACTTACCAGACGCTGGCAGAACCAGTTGAAGCTCTTCTAGATTCAGCCAGGTGAAGAAGCCTATGGCCCAGCAATTGGAGGCTTCTTCAACGCGGACAAAATCTGCTGTTTCAGGTTTTGAATCTATACATGGCATCTTCCAGGAAGCAGTGGGGTGACTCAAGGAAGAATTACTTTCAGCTGGAAGTCACACGGAAGACCGTTTATTTGAATCAAAGGCAAAGAAGACTGTTCAGGAACGATTCGGGATGGACAGGTTACCAACACTATTCAGCCCAAACGATGCTGACTCAATGCCAAGAGTGTGGACCTGGGGACATAAGGCCGATACTAAAACTGGTCATTCAAAGCTTCTATGATTACTTTCACAATGGCTGCAATTTCGGTGGATGAGGATGGTGACAACAGTGAGAACACCC
TGTCCTTGCTCTGAGTTGACACCTTAAGGCCAGGGACTGAATAGAGCAATCAATCGTTCTGATCCACTTGCCTCACAAACTCATGGGAAGAAAGGTTGGAAAACTTTAATTACACACACTGTCAGTACTGGAAATCTTTATTGGGAGCAATTTAGAGCTGAAACTAGAATACACTGCTGTCACCCCCAGGCCTGCTCAGCGCTGCTCAGGAGGCAAAATCAGAGGAACAACAACCTGGCTGCCACCGCTCCATGGGCACTTGCCGCAATAAGCATCATGGATTCAACAAAGTTCATGACAGTTGTTAAAGAATCCCTTCACCTGAGATCATGTTTGTTGTTTTCTGTTGGAGGAAGAAGCCATGTGGGTACCTTTAGACATTGCCAAGAGTTCCAAAATCAAGTTTCTTCCAACCCGTCCTATCACTTTCGATGAGAATTCGTTCCCGATAATGAACATCCTGAAGGAATTAGCTGATGAGGGCGAGACCTGCAGCCCCAGAGGCGAAGATGGAACTCAACCAAAATCCACTGAGAATGGTTCACGACAACGTGACATCAGCAGGGTCATCCAGCCACATAACCTCTTCGGAGGAGTGGACCCGAATATTCAAGCGATTGCTGCAGATCAGTTCATCTGCTTTTAGTAAAGAAGCCCCAGCAGTTGTAGTTTACAACTTGGGTGAGCCTTTGGAGACCAAGCACTTACGATTTTGTAAGCACTTCCCAATCTGATGTACACGCAGTTCATTAGCTGACAGAGGTGTTATGGAGGATTTTGGGCCCAGGGAGTTTGTTTGTAAAAACGCTGTGAATATGTTTTATTTCATTTTTGTAACAACTTATATATTGGTTGTATAGAAGGTTGCGGTTTTATCATCTCAGAAGATAGAGCGAGCAAATGAGTT	*	XA:Z:LOC_Os01g39310.1	XC:Z:FSM	NM:i:579
51b240c9-90a7-422e-9e11-0ebe73cb4efb	0	Chr1	24375913	60	1X5=2D2=1X1=5I2=3X1=2X3=6I1=1X32=1D1=5D2=1X11=1X5=1X2=1D8=1D53=2D7=1I8=1D2=1X3=1X5=4I35=3X24=2I12=1I14=1X13=2X21=194N20=1X5=2X3=1I1=316N24=2X5=3D1X9=5X3=1D21=1I20=2D2=1X2=78N1=5I5=2I1X2=1D5=1D6=1D19=1I12=2X17=144N11=1D8=2X5=1X1=1I3=183N38=1X2=2D2=1X2=1X1=1D2=1X1=2X1=1X2=2I1=2I1=1X2=1D2=	*	0	0	AACCTTTCGTTGGTGCTGATATTGCTGGGGACCGCTCTCTCTCTCTCTCTCTCTCTCTCTCAGTCTTCCTCAAGAAGAAATTCATTCTTCCTCACGCGCTCCGCGTCGCCGCATTCGGGGACGCCCACCGCCGCCGGATCCGGCCAGCCTTCCCCTCCTCCTCCTTTCCTGGCTGCTCCTCCTCCTCCTCGGTAAGATCCGGCGCCGGTTCCATGTCGGTTACCGGCGAGCAGTCGCAGGCGAAGTTGCGGAGGCCGAGTGAGGCGGTGGAGCTTGTGCTGTTCCAGGCTGCTGAGTGCTACGTCTACCTGATACCTCCCAGGAAGACAGCTGCCTCTCACAGGGGCTGATGAATGGAACGTCAACAAAGGGGCTAAAGGGACTCGGGCCGTTTCAGCAAAGGAGAAGAGTGCACTCATCAAACTGGAAGATAAGCAAAGGTAATAGGAGCCGGGTCGCTAGGCATTCTCAGAGAAGACGAACCACAATCCGGTGGAACTCGTTATTGATAGCAGCAGATATTTTGTACCCGTGTTGGCGAGAACAGTAGATGGACGTCAGCGCCATGCCTTTATTGGTTTAGGCTTCAGTAATACGTAACCGAACGAAGTGATACCAG	*	XA:Z:	XC:Z:ISM/NIC_known	NM:i:106
58f8f518-0121-4e61-aa92-114986d43317	0	Chr1	26395513	60	53S10=1D1=1X3=1D58=1D9=1I2=1X4=2I12=1I18=1D3=1D1X3=1D111=1X1=2I16=1X17=1D6=3I13=1D4=2X2=4I1=1D21=1D1X1=1D9=1D2=1D1=1X20=1D14=1D9=1X16=1I46=1D9=1D23=1I1=1X4=1I22=1X28=40S	*	AACCGCTTTCTGTTGGTGCTGATATTGCTGGGTGTATGTTGCAATGATGGCACAGATTGAAGTCATAGAAGAAGGGCCACTCTTACTCGGAGATCATCAACGAGAGTTTGATCGAGTCTGTGGATTCCTGAACCCGATGCATGCACACGCTCGTGGGAGTGGCCTTCATGGTTGATACTATCCCAACAGCCCGGTTGGGATCAAGGAAATGGGCGCCACGCTTCGACTACATCTTGACCCAACAAGCTTTTGTGACTGTTGACAAGAATGCTCCAGTTAATCAAGACCTCATCAACATAATTTCTTCTCCGATCACGTCCATAGTGCCATTGAGTCTGTATTGCTCAGTTGAGGCCACAATTCCCAGCATTTCAGTGCCTGCTGATGCCATTTGTGCGTCAAACTGCGACAATCTTCGTGAAGTTTTGCCAACTCCAGGGCACCTCGTAAACTATGTCAGTATCCTGACGGTTTCAACTCTCAAAGCTATGTATCACTATTTGAGAACCTAATGTAAACTTTGGAGTGTAGGATTGTCGTTTAGTTTGTCCACTGGACTAGTGCTGGTGTTTAGCATGAATAAAAATAGTGGTTTAATAAAAAACACACAAAGAGATGGGAGCGACGGAGCAAGGTGGGGT	*	NM:i:45	ms:i:414	AS:i:411	nn:i:0	tp:A:P	cm:i:79	s1:i:343	s2:i:0	de:f:0.0681	rl:i:0
8329a277-53b0-4be3-a211-9063c8db22f9	0	Chr1	37028734	60	23S20=5I4=1X35=1D38=1D6=1D17=1I4=556N5=1D35=1X29=1I43=2X2=1D1=1D16=1D19=2D51=1D711N3=2X21=1X47=255S	*	0	0	AACCTTTCTGTTGGTGCTGATATTGCTGGGACAACATGAACTCTCCGGGAGGCTGATCCTTCTCATCCCAAACCTGCAGTCGGTCATCATTACCATGGCCATCTTCGGATGGCATCCTTTGTCTTTAAAACATCTGCAGACACTTTTTCTGAACCTGACTTACACATACTACACGCTCTGCAAGGTTATGAGAGTTTATCAAGTTGGCTTCTGGATCAAGAAGCTCTCCACATCTCAAGAGCATAATCTCTCTCAGCCATTGTACATATGTAACTCAAATCGTTTGCGTCCTTAGCAGTCAGGTAACTTCAAATCCTCCCAAGCAGGCCTTAGCTTTACAAAAACACTAGTATCGCGAACTCCTGATTTATGCGAGTTAGAATCGTATTCCTTTCTGGCAGCCTGATAACAGGCCGGAGAACACGCTCCTGCCAAGATGGGGTTCAATGGACTGAAGCAGCCACAGGACCGTGCTGATCTCATCGCATACCTGAAGAACGCTACAGCATGAGAGTCCCTGCTGCCATCTTCCCAAGATGAAGAGAAGCTATTGCTCGTAGCCTCATACTGACTTCTGTAACTTGTAAGGGTGAACTTTGCAACAAACTGTGTTGTAAATTTGTAATAATAAGCAGTGGCTCAAAGACAGAAAAAAAAAAAAAGAAGATAGAGCGACAGGCAAGAGGTT	*	NM:i:24	ms:i:344	AS:i:277	nn:i:0	ts:A:-	tp:A:P	cm:i:67	s1:i:324	s2:i:0	de:f:0.0458	SA:Z:Chr1,38446047,+,425S214M2D49S,60,7;	rl:i:0
f967c0d5-5295-4c74-aa07-d6e793e21ea3	0	Chr1	37028706	60	1X1=1X2=3I2=1I3=1X2=1D1=1X4=8D28=4D28=1D1=1X26=1I1=1I16=1X21=556N22=1X22=1X1=2D1=1X38=4D1=1X13=1X76=1I5=1D15=1I2X3=711N9=2I4=2I1X18=1I13=1I23=1D31=1I16=1I20=2I18=1X14=2X13=1X4=1I17=2X2=1D1=1X15=1I17=1I12=1D65=3D2=2D52=1I10=1D41=1D7=1I29=317N1=1D45=1D15=1D1X2=1X34=2X16=1D1X30=1D5=1D12=1D10=1D3=1D14=2D18=1D27=1I1X35=4D4=1X6=1D2=1X18=2X1=1I1=2X6=1I2=1D2X1=1X1=1X2=2X2=1X1=1X2=	*	0	0	AACCTTTCTGTTGGTGCTGATATTGCTGGGACAACATGAACTCGAGGTTGATCTCATCCCAAACCTGCAGTCGGTCATCACTACCATGGCCATCTTCGGATGGCATCTCTTTTGTCTTTAAAAACAGTCTGCAGACACTTTTTCGAACCTGACTTTACACATACTACACGTTCTGCAAGGTTATGAGAGCTTACCGCTGGCTTCTGGATCAAGAAGCCTCCACATCTCAAGAGCACCCTCTCAGCCATTGCACATACATAGACCTCAAATCGTTTGCGTCCCTTAGCAGTCAGGTAACTTCTCAAATCCTCCCAAGCAGGCCTTAGCTTTTACAAAACACTAGTATCGTTAAACCTCTGGATTTCTATGTTAGAGTTAGAATCGCATTCCTTTTCTGGCAGCCTGGATAACAGGCCGGAGAACACGCTCTGCCCACCAGAGATAGGCAGAACTTCCTCCTTTTTGGGTCCCAACGATTTCTGCCATTATCTGTAACTGGGTATCTGTGTCAATGAACCCCTTGAGCAGCTCCTCGTCCTCAATATACTGCTTGGATTTCTGCAGACATCCTCGCACCCTGACGGGGTCGTCTTTCAATATCCATCCTACGGGGAAAGCATCTGATCCGGTCCTCAAAGGACTTCATTGTGTTAGCGACAATAAGGGTTTCATCGAGATCAAATACGACAGCACCGCAAGTTCAGCATCCCAACAGATGCTGCATAGATCCCAGAGCGTACTGGAACAGCACCAAAGCATGGCACCTTCTCCACCTTGCTCGGCATTGCCACTAGTGTAGCTTCCTCATCTCCAACTACTACCACTGCGCTCTGTACTCGTTGAAGCAAGTGAGGTAGAGGCGGTGTAGGCTCGGGTGGGAGGCATCAGCTTACCCGGAGCTTGCATCGCACGGAGAAGGGCGCGATGGTTCGCAGGATGGCGAGCGGTGGCACCGCTCGCTCGTCGGCGAGAGGTGGCGATGCGATCTCGTTGCTGGGAACGGCAGCTCCCTCGGCCCCCTGTCATCGGGAACACCTCCGCTCGCCGAGGAACACGTCGCCGTGGAATTATCCGCATGGTGACCCCACCGGCGGTGGCACCGGCCCTCTGGGGACGCTGCCTGGGGCGGGGCGCGGCCCAGCAGCCATCAGCACCAACAGAAAGAAGTT	*	XA:Z:	XC:Z:ISM/NIC_known	NM:i:116
7d39ea74-6ac4-48c3-86ab-6b0477548b5f	0	Chr1	38740059	60	20S1=3D5=4D1=1X4=1X2=4D3=15D1=1D4X5=1I8=2I10=2D1=4I46=1D4=2I26=2I13=1I25=1X46=1D5=1X2=1X16=1X3=1I3=1X1=1X19=2I1=1D13=1I1=1X2=2D1=1X1=2X27=1X10=635N4=1X41=1X2=1D1X2=3X5=2I2=1X1=1X1=1X4=1I24=1I24=2X24=1X3=1D11=2X29=1I1X2=1X1=1X8=4I1=1X2=1D4=1X1=1D1X7=1I5=1X2=2D6=1D4=3D258N11=1X2=2I1=1I1X39=1D2=1D1=82N5=1D1=1X10=1X6=2X12=1X5=1D7=1D8=1X3=1X9=1D2=3I2=2I4=1D13=131N1X12=2D3X16=90N7=2I2=1X12=2I6=1D15=2X3=1D3=1X1=1D4=221N1=1I1=1X28=2D8=1X8=2D6=1D11=2I16=1D3=1X5=1X4=252N1=3I1=1X13=1X7=2D1=2D1=1X4=1X4=1X8=1I1X9=1D2=2D2=1444N1=1X3=1I2=1I1=4I3=2I2=1X2=1I2=1I1=2D1=1X2=1I1X2=1X1=1X3=2D46S	*	0	0	AACCTCACTTGCTGCCTGTCGCCTCATCTTCTTTTATTTTTTTTAATAGGGTTAACTTATAGGATATTCTGGCCAGTCTAACCACTGAACTATCTGAATATTGAGATGCTGATGTTACACTGGTAAAAAAGTGAATCCTGAACAAAGTTTATCCCCTCTGTTCATACTTACCTTCATCTTGCTATGAATATCCCATTGCCTTACACAGGGACCAGGTTCATATGGTGCTGTACGGCCATACCAGCTACTGGTGCAGCAGCTACCCTTCTCTTGCTCCAGCAAACACCTGAAGCGTAAGCAATCTTGACGTGCATCACCTCCGTCTTCTGCAATCTTCTCAATAGGAGTTCATAGTCTCTCTGCGGCAGCTGCAGCACGTGGCCAAATGGTTTGTTCAATATCCGACGCACACCCTGTTCTGGCTCAATATACAAAACCTCACCACCAATGACCAAACTTTTGCTGCTCTGGGTCGTCAATACTCTTCAGAGGTTCATTCGTGTAAAACTCCTCCATGTGGCATTCAAGTGATCTAGGTACCACTTATCCTGGTTACCTCATGATGCATCCTTTTTAGCCTGTCAAACCACTCTTTGGCGCCATCCTTCCACAGTTATGAACTACTTGTCTTCTACGGTCCAATTTGTCTCCAAAATTGTTAAATGTTTCTCCAGTTACGACATCATAATCGTGTGGAATTGCTAGCTTTGGGGATTCAACACAAATATCTATATACATCTGACAATATATCTTATTTGTTATCATCCAACTATTTCTTAATGTAAAGTTGCAGTCCAGCAGCTTGTGTTGGGATTTCATCACCTCCTTTAAATGACAAATTTGAATTTAGTAACTTGATAAATCTTAAAAGAATTCCATCAATAACTCCAAACGTAGTTGTTATTGACATCATGGTTCTTGCAGCTATCTTTGATGGCCACAATGATGGTGGCCAATGCCCCACAAAGACAGAGCATGACTAGGAACAACCCAGCTATAATATTTACACTTTCGATTTTCGCATCGGTACAATGTCTACATCAGACGTCGCGTATATCTCTCTCTGAAAAGAATATGAACTGCCCCAGCAATATCAGCACCAACAGAAAGGTT	*	XA:Z:	XC:Z:Insufficient_junction_coverage_unclassified	NM:i:200
ddc44957-79db-4e0e-80cb-4fc9cd4c9dba	0	Chr1	38744222	60	2=2I3=1I2=2X2=1I1X2=980N9=1D1X7=1X1=1I2=1X3=1D2=1X4=1X2=1D2=2I1=1X1=1X1=1I2=1X2=1I2=2D2=2X1=1I1=1X2=1I3=2X1=1X3=4I3=1I1X1=2I1=2X1=3X3=1X4=2X3=2X3=1X2=3D2=1X4=4I3=1I1=1X1=11I4=1D2X3=1X1=2I1=2I1X3=2X2=5I2=1D1=1X2=2D5=2D32=2D12=1I1X4=1I1=1I42=2X41=1I2=1X43=2D10=164N21=1D31=1X33=1D3=5I1=1X2=1X1=1D1=1X1=2X2=1X5=3I1=1X3=	*	0	0	AACCTACTTGCCTGTCGCCTCATCTTCTGTGAATGAATCAGCTAATGTACAGCATGGCATTTCTAACAGTAATGTATGTAGCTACCGGCGACGCAGAACACAGACGCGCGCCTTCTTTAATGCAATCACCATCAACGCATGAAGCCTCTTACTACGCTAATTAGATTATTTGCGACTAATCACTATTAATCACTACTCCTCCTGACATTGTGATATCTTTGCTGACATAAAGTCTGTGTTCCATGGTTTCACTCTGACGTTGGCATAGGCCACAAGTCGATGTGATCAGTCAGCTCATCGGCGATGCAGGATTGAATTGCCAGAAACGCAAGAAGCAACCTCCCAACTCAGAGCCTGCGCCATGTTTGCAAGGTGATAGGATATCCTTCTGCAAAATCTGCAAGCGCAAGGCAAGTAGAAACCTGAAGGAACCTCTAGTCCTCTACTTGTACCTTCATTTTCACCTCCTAATCTACACCTTTTGATCCCCAGCAATATCAGCACCAACAGAAAGGCAG	*	XA:Z:	XC:Z:NIC_novel	NM:i:126

Alignment categories:

{'ISM/NIC_known': 4, 'NO_SPLICE': 2, 'Insufficient_junction_coverage_unclassified': 3, 'FSM': 1, 'unaligned': 1, 'NIC_novel': 1})
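A summary like the one above can be cross-checked directly from a SAM file by tallying the XC:Z: tag seen in the records. This is a minimal sketch on a made-up three-record SAM file, not uLTRA's own code. Records without an XC:Z: tag (and reads that were never written to the file) are not counted, so totals may differ from the summary uLTRA prints.

```shell
#!/bin/sh
# Toy SAM: one header line and three records with shortened, made-up fields.
sam=$(mktemp)
printf '@SQ\tSN:Chr1\tLN:43270923\n' > "$sam"
printf 'r1\t0\tChr1\t100\t60\t10=\t*\t0\t0\tACGTACGTAC\t*\tXA:Z:\tXC:Z:FSM\tNM:i:0\n' >> "$sam"
printf 'r2\t0\tChr1\t200\t60\t10=\t*\t0\t0\tACGTACGTAC\t*\tXA:Z:\tXC:Z:NO_SPLICE\tNM:i:0\n' >> "$sam"
printf 'r3\t0\tChr1\t300\t60\t10=\t*\t0\t0\tACGTACGTAC\t*\tXA:Z:\tXC:Z:FSM\tNM:i:0\n' >> "$sam"

# Skip header lines, look for the XC:Z: tag among the optional fields
# (column 12 onward), and count records per tag value.
awk -F'\t' '
  /^@/ { next }
  {
    for (i = 12; i <= NF; i++)
      if ($i ~ /^XC:Z:/) count[substr($i, 6)]++
  }
  END { for (c in count) print c "\t" count[c] }
' "$sam" | sort > "$sam.counts"
cat "$sam.counts"
```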

zhixue commented on July 19, 2024

Thank you for your detailed reply.

I found that the GTF files converted from GFF3 by different tools (gffread / AGAT) are different. OK, I will use AGAT instead of gffread for the analysis. But I wonder whether the error was caused by the missing "gene/transcript" records in the gffread GTF?

Here are the first lines of the GTF files generated by gffread and AGAT.

### gffread
# gffread all.gff3 -T -o all_gffread.gtf 
$ cat all_gffread.gtf  | grep 'Chr1' | head
Chr1    MSU_osa1r7      exon    2903    3268    .       +       .       transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1    MSU_osa1r7      exon    3354    3616    .       +       .       transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1    MSU_osa1r7      exon    4357    4455    .       +       .       transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1    MSU_osa1r7      exon    5457    5560    .       +       .       transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1    MSU_osa1r7      exon    7136    7944    .       +       .       transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1    MSU_osa1r7      exon    8028    8150    .       +       .       transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1    MSU_osa1r7      exon    8232    8320    .       +       .       transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1    MSU_osa1r7      exon    8408    8608    .       +       .       transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1    MSU_osa1r7      exon    9210    9617    .       +       .       transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";
Chr1    MSU_osa1r7      exon    10104   10187   .       +       .       transcript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010"; gene_name "LOC_Os01g01010";

### AGAT
# perl ~/tool/AGAT/bin/agat_convert_sp_gff2gtf.pl --gff all.gff3 --gtf all_agat.gtf
$ cat  all_agat.gtf | grep 'Chr1' | head
Chr1    MSU_osa1r7      gene    2903    10817   .       +       .       gene_id "LOC_Os01g01010"; ID "LOC_Os01g01010"; Name "LOC_Os01g01010"; Note "TBC domain containing protein, expressed";
Chr1    MSU_osa1r7      transcript      2903    10817   .       +       .       gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1"; Name "LOC_Os01g01010.1"; Parent "LOC_Os01g01010"; original_biotype "mrna";
Chr1    MSU_osa1r7      exon    2903    3268    .       +       .       gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_1"; Parent "LOC_Os01g01010.1";
Chr1    MSU_osa1r7      exon    3354    3616    .       +       .       gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_2"; Parent "LOC_Os01g01010.1";
Chr1    MSU_osa1r7      exon    4357    4455    .       +       .       gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_3"; Parent "LOC_Os01g01010.1";
Chr1    MSU_osa1r7      exon    5457    5560    .       +       .       gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_4"; Parent "LOC_Os01g01010.1";
Chr1    MSU_osa1r7      exon    7136    7944    .       +       .       gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_5"; Parent "LOC_Os01g01010.1";
Chr1    MSU_osa1r7      exon    8028    8150    .       +       .       gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_6"; Parent "LOC_Os01g01010.1";
Chr1    MSU_osa1r7      exon    8232    8320    .       +       .       gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_7"; Parent "LOC_Os01g01010.1";
Chr1    MSU_osa1r7      exon    8408    8608    .       +       .       gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_8"; Parent "LOC_Os01g01010.1";
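A quick way to see this kind of difference across a whole file, rather than from the first ten lines, is to count records per feature type (column 3). The sketch below runs on two made-up miniature files that mimic the layouts above; the file names and the `dir` variable are illustrative assumptions.

```shell
#!/bin/sh
# Toy GTF files mimicking the two layouts (coordinates taken from the listings, attributes shortened).
dir=$(mktemp -d)
printf 'Chr1\tMSU_osa1r7\texon\t2903\t3268\t.\t+\t.\ttranscript_id "LOC_Os01g01010.1"; gene_id "LOC_Os01g01010";\n' > "$dir/gffread.gtf"
{
  printf 'Chr1\tMSU_osa1r7\tgene\t2903\t10817\t.\t+\t.\tgene_id "LOC_Os01g01010";\n'
  printf 'Chr1\tMSU_osa1r7\ttranscript\t2903\t10817\t.\t+\t.\tgene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1";\n'
  printf 'Chr1\tMSU_osa1r7\texon\t2903\t3268\t.\t+\t.\tgene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1";\n'
} > "$dir/agat.gtf"

# Count records per feature type (column 3), ignoring comment lines.
for f in "$dir/gffread.gtf" "$dir/agat.gtf"; do
  echo "== $(basename "$f")"
  grep -v '^#' "$f" | cut -f3 | sort | uniq -c
done
```

On the toy data, the gffread-style file reports only `exon`, while the AGAT-style file also reports `gene` and `transcript`.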

I also used "--disable_infer" to build the index with the gffread GTF (no error occurred).

$ uLTRA index all.fa all_gffread.gtf msu7_ultra_index --disable_infer
creating msu7_ultra_index
total_flanks2: 258153
total_flank_size 204193233
total_unique_segment_counter 93250745
total_segments_bad 3610100
bad 90170
total parts size: 93301502
total exons size: 113318377
min_intron: 4294967296
Number of ref seqs in gff: 236343
Number of ref seqs in fasta: 14
1  million kmers processed.
...
9816415.0 Unique kmers in reference part sequences with abundance > 1
CGGCGGCGGCGGCGGCG 3560
...
19208 19208 out of 236343 sequences has been modified in masking step.
1  million kmers processed.
...
60343 60343 out of 258153 sequences has been modified in masking step.

Here is the GTF format description I referred to:

The following feature types are required: "CDS", "start_codon", "stop_codon". The features "5UTR", "3UTR", "inter", "inter_CNS", "intron_CNS" and "exon" are optional. All other features will be ignored. The types must have the correct capitalization shown here.

ksahlin commented on July 19, 2024

I'm glad that it works. It is quite possible that the error occurred because the gene/transcript records were missing. I have not verified this, though.

ksahlin commented on July 19, 2024

I'm also happy to receive feedback on the quality of your downstream analysis (assembly with stringtie?) after your reruns.
