Giter VIP home page Giter VIP logo

ribotish's Issues

Ribotish on prokaryotic ribosome profiling

Hello,

I carried out ribosome profiling TI-seq on a prokaryotic organism and I am interested in seeing if I could use ribotish. I have first tried executing this command:
ribotish quality -l 10,40 -b /Volumes/Diego_2TB/ribosome_profiling_final_libs/libraries/H98_WT_ctrl/reads/density/filtering_bowtie/alignments/chr/H98_WT_ctrl.bam -g /Users/DRG/Downloads/GCF_000025685.1_ASM2568v1_genomic_update_080119.gff --geneformat gff

but then I get an error which is:
Counted reads: 0 Error: no reads found! Check read length or protein coding annotation.

I am having a hard time troubleshooting what could be the problem. From my own analysis, the peak footprint length is 27 nt, but there is a range of footprint lengths since this is MNase-treated (for a metagene plot of density periodicity and heatmap of length distribution see here. Is the problem the footprint length or is the gff annotation the problem? Or something else entirely?

Some extra details:

  1. The annotation is from NCBI
  2. The bam file was aligned with bowtie, it is single-end, and it has been coordinate sorted.

Any help/insight on what the problem could be is greatly appreciated.

Thank you.

Not working with stringtie custom GTF

Hi, I created a custom GTF by assembling with stringtie but am getting the following error.
The command works fine with gencodev46 GTF.
Is there some reason why this wouldn't work?

ribotish predict \
-b ${BAM} \
-g ${GTF} \
-f ${GENOME} \
-o ${RIBOTISHDIR}ORF.tsv \
--ribopara ${RIBOTISHDIR}para.py \
--seq \
--aaseq \
--blocks \
--altcodons CTG,GTG,ACG \
--inframecount \
--maxNH 1000 \
--minMapQ 0 \
--secondary \
-p ${N} \
--verbose \
--transprofile ${RIBOTISHDIR}transprofile


"""
Traceback (most recent call last):
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/site-packages/ribotish/run/predict.py", line 394, in _pred_gene
    ttis = ribo.Ribo(t, bamload = tismbl, compatible = compatible, mis = compatiblemis)
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/site-packages/ribotish/zbio/ribo.py", line 62, in __init__
    self.cnts = bamload.transCounts(trans, compatible = compatible, mis = mis)
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/site-packages/ribotish/zbio/bam.py", line 613, in transCounts
    d = self.data[trans.strand]
KeyError: '.'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/gstore/home/malekose/micromamba/envs/test/bin/ribotish", line 56, in <module>
    main()
  File "/gstore/home/malekose/micromamba/envs/test/bin/ribotish", line 34, in main
    commands[cmd].run(args)
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/site-packages/ribotish/run/predict.py", line 236, in run
    for result in pred_iter:
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/multiprocessing/pool.py", line 451, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
KeyError: '.'

ribotish quality: error: argument -d: expected one argument

Hi, when I wanted to use option -d to define the position range near start codon or stop codon, I met such an error:

ribotish quality: error: argument -d: expected one argument

My command line is:

ribotish quality -b D0_1.unique_mapped.sorted.bam -g Mus_musculus.GRCm39.103.chr.gtf -d -20,60

Would you please show me how to customize the position range?

Thanks a lot!

Visualization of single transcripts with a-site/p-site coverage

Hello!

First of all thank you for developing Ribo-TISH! I have found it really useful for the QC it provides and also for uORF prediction.

I have a question that I am not sure that it is relevant to Ribo-TISH itself.

In the published paper there are several snapshots of single transcripts (e.g. Fig.3a,3b and Fig.4a,4b) providing p-site (or a-site?) Ribo-seq or QTI-seq coverage. I would like to ask if Ribo-TISH has the ability to provide such plots.

If not, may I know how do you manage to end up with the .bam/.bed file containing only the counts per p-site or a-site so that can be easily visualized on a genome browser.

Thanks in advance!

Kind Regards,
Alex

Feature request: Ribo-seq counts in the results

Thank you for the great tool!
It seems to be working perfectly for me but I would like to request a feature implementation.
Would it be possible to add a column of in-frame Ribo-seq (not TI-seq) count per ORF to the result of ribotish predict, like TISCount column for TI-seq?
I believe this would be useful from the following viewpoints (I'm detecting ORFs using Ribo-seq only, without TI-seq):

  1. The current output includes ORFs with very low read counts. I understand that this is helpful for increasing sensitivity but in some cases users may want to apply filtering by CPM etc to extract highly-translated ORFs.
  2. With --framebest option, one stop codon can have multiple TIS, when RiboPvalue has ties (especially when RiboPvalue = 0). One option for filtering is taking the longest ORF but it would be more helpful if users can take P-site read abundance into consideration to pick up the ORF with highest translation activity.

I would appreciate it if you could consider implementing this feature, only if it would not cause too much trouble for you.
Many thanks,
Sotta

AttributeError: 'Exp' object has no attribute 'sq'

Hi,

I am trying to run ribotish on human samples with older genome and annotation (gtf) from 2014.

This is the command I ran:

ribotish predict \
-b file.bam \
-g Homo_sapiens.GRCh37.75_without_bs.gtf \
-f Homo_sapiens.GRCh37.75.dna.primary_assembly.fa \
--ribopara SRR1333394.sorted_indexed.bam.para.py \
--aaseq --tpth 1 --fpth 1 --fspth 1 --fsqth 1 --longest -v \
-o ribotish_pred.txt

This is the error:

Thu Nov 24 10:02:01 2022 Loading genome...
Making fasta index for Homo_sapiens.GRCh37.75.dna.primary_assembly.fa...
No input TIS data!
Thu Nov 24 10:02:42 2022 Predicting...
1
Traceback (most recent call last):
  File "/usr/local/bin/ribotish", line 56, in <module>
    main()
  File "/usr/local/bin/ribotish", line 34, in main
    commands[cmd].run(args)
  File "/usr/local/lib/python3.8/dist-packages/ribotish/run/predict.py", line 236, in run
    for result in pred_iter:
  File "/usr/local/lib/python3.8/dist-packages/ribotish/run/predict.py", line 469, in _pred_gene
    e = getResult(t, tis, o.stop, cds1, cds2, tsq, [ip, ttis.cnts[tis], tps[i], rps[i], rst[i], fsp], o.has_stop_codon)
  File "/usr/local/lib/python3.8/dist-packages/ribotish/run/predict.py", line 499, in getResult
    if aaseq : e.aa = orf.translate(e.sq)
AttributeError: 'Exp' object has no attribute 'sq'

Can you help me on this one?

Error running ribotish predict with multicore parameters

Hi,
I'm trying to use the whole pipeline of ribotish. When I run the command
ribotish predict -t unassigned_HARR_mapped_sort.bam -b unassigned_mapped_sort.bam -g ../gencode/gencode.v28.annotation_appris_princ_1.gtf -f ../gencode/GRCh38.p12.genome.fa -o unassigned_pred.txt -p 8
i get this error :
Thu Jun 27 17:21:06 2019 Estimating TIS background parameters... Traceback (most recent call last): File "/home/adminmanu/softwares/miniconda3/bin/ribotish", line 56, in <module> main() File "/home/adminmanu/softwares/miniconda3/bin/ribotish", line 34, in main commands[cmd].run(args) File "/home/adminmanu/softwares/miniconda3/lib/python3.7/site-packages/ribotish/run/predict.py", line 151, in run pool = MyPool(1) # This is for memory efficiency File "/home/adminmanu/softwares/miniconda3/lib/python3.7/multiprocessing/pool.py", line 177, in __init__ self._repopulate_pool() File "/home/adminmanu/softwares/miniconda3/lib/python3.7/multiprocessing/pool.py", line 238, in _repopulate_pool self._wrap_exception) File "/home/adminmanu/softwares/miniconda3/lib/python3.7/multiprocessing/pool.py", line 252, in _repopulate_pool_static wrap_exception) File "/home/adminmanu/softwares/miniconda3/lib/python3.7/multiprocessing/process.py", line 74, in __init__ assert group is None, 'group argument must be None for now' AssertionError: group argument must be None for now

This error did not occur when I use the same command without the -p parameters so I can continue with downstream analysis using one core.

I'm using ribotish with python 3.7

Thanks

Feature Request - Add 3' end match for RPFs

Hi. This is a great tool!

The prokaryotic community could benefit more of it if a feature doing the same for 3' end of RPFs could be added. Seems like the 3-nt periodicity for model organisms like E. coli is easier to notice when using 3' end profile of RPFs.

Thanks!

run mistakes

Traceback (most recent call last):
File "/gss1/home/zpchen/miniconda3/envs/ribotish/bin/ribotish", line 56, in
main()
File "/gss1/home/zpchen/miniconda3/envs/ribotish/bin/ribotish", line 34, in main
commands[cmd].run(args)
File "/gss1/home/zpchen/miniconda3/envs/ribotish/lib/python2.7/site-packages/ribotish/run/quality.py", line 85, in run
cdsBins = args.bins, numProc = args.numProc, verbose = args.verbose, geneformat = args.geneformat)
File "/gss1/home/zpchen/miniconda3/envs/ribotish/lib/python2.7/site-packages/ribotish/zbio/ribo.py", line 980, in lendis
for result in len_iter:
File "/gss1/home/zpchen/miniconda3/envs/ribotish/lib/python2.7/site-packages/ribotish/zbio/ribo.py", line 942, in _lendis_trans
for r in bam.transReadsIter(bamfile, t, compatible=False, maxNH=maxNH, minMapQ=minMapQ, secondary=secondary, paired=paired, flank=flank):
File "/gss1/home/zpchen/miniconda3/envs/ribotish/lib/python2.7/site-packages/ribotish/zbio/bam.py", line 498, in transReadsIter
for read in rds: #yield read
File "/gss1/home/zpchen/miniconda3/envs/ribotish/lib/python2.7/site-packages/ribotish/zbio/bam.py", line 52, in fetch_reads
if minMapQ is not None and read.mapping_quality < minMapQ : continue
AttributeError: 'csamtools.AlignedRead' object has no attribute 'mapping_quality'

Wrong CDS annotation

Hello,

I have ran:

ribotish predict -t TIS_bams_new/R3_ProAligned.sortedByCoord.out.bam -b CHX_bams_new/R3_ProAligned.sortedByCoord.out.bam -g gencode.v19.annotation.gtf -f GRCh37.p13.genome.fa -o pred.txt --framebest

The result is that (It is not finished):

No offset parameter file found for CHX_bams_new/R3_ProAligned.sortedByCoord.out.bam. Using default offset (12).
Thu Sep 22 10:43:03 2022 Estimating TIS background parameters...
Thu Sep 22 12:05:04 2022 Predicting...
Wrong CDS annotation: ENSG00000189409.8 ENST00000472264.1 56 556 556
Wrong CDS annotation: ENSG00000116649.5 ENST00000487300.1 252 611 611
Wrong CDS annotation: ENSG00000219073.3 ENST00000374666.1 3 497 497
Wrong CDS annotation: ENSG00000158062.16 ENST00000374215.1 196 938 938
Wrong CDS annotation: ENSG00000090020.6 ENST00000374084.2 289 695 695
Wrong CDS annotation: ENSG00000142765.13 ENST00000473280.1 83 312 312
Wrong CDS annotation: ENSG00000116497.13 ENST00000482212.1 267 706 706
Wrong CDS annotation: ENSG00000116922.10 ENST00000486637.1 511 852 852
Wrong CDS annotation: ENSG00000066136.15 ENST00000531464.1 344 525 525
Wrong CDS annotation: ENSG00000117385.11 ENST00000372526.2 31 654 654
Wrong CDS annotation: ENSG00000186973.6 ENST00000409396.1 29 448 448
Wrong CDS annotation: ENSG00000079277.15 ENST00000496619.1 202 738 738
Wrong CDS annotation: ENSG00000132122.7 ENST00000371841.1 69 818 818
Wrong CDS annotation: ENSG00000085831.11 ENST00000411642.2 92 931 931
Wrong CDS annotation: ENSG00000134744.9 ENST00000484723.2 191 2910 2910
Wrong CDS annotation: ENSG00000116212.10 ENST00000371368.1 261 838 838
Wrong CDS annotation: ENSG00000125703.10 ENST00000371118.1 96 665 665
Wrong CDS annotation: ENSG00000177414.9 ENST00000371077.5 424 1193 1193
Wrong CDS annotation: ENSG00000116791.9 ENST00000370870.1 158 888 888
Wrong CDS annotation: ENSG00000137944.12 ENST00000370486.1 232 1019 1019
Wrong CDS annotation: ENSG00000184371.9 ENST00000357302.4 239 597 597
Wrong CDS annotation: ENSG00000007341.14 ENST00000369664.1 174 862 862
Wrong CDS annotation: ENSG00000134262.8 ENST00000369564.1 336 1225 1225
Wrong CDS annotation: ENSG00000163349.17 ENST00000503968.1 251 570 570
Wrong CDS annotation: ENSG00000163399.11 ENST00000369494.1 264 698 698
Wrong CDS annotation: ENSG00000143452.11 ENST00000368987.1 219 791 791
Wrong CDS annotation: ENSG00000143442.17 ENST00000533351.1 167 586 586
Wrong CDS annotation: ENSG00000143569.14 ENST00000368504.1 69 1120 1120
Wrong CDS annotation: ENSG00000132676.11 ENST00000471214.1 388 1287 1287
Wrong CDS annotation: ENSG00000143320.4 ENST00000368220.1 208 456 456
Wrong CDS annotation: ENSG00000249730.1 ENST00000504970.1 0 935 935
Wrong CDS annotation: ENSG00000116191.13 ENST00000324778.5 107 906 906
Wrong CDS annotation: ENSG00000162779.16 ENST00000509175.1 310 765 765
Wrong CDS annotation: ENSG00000135837.11 ENST00000357434.2 0 392 392
Wrong CDS annotation: ENSG00000116747.8 ENST00000506303.1 488 613 613
Wrong CDS annotation: ENSG00000081237.14 ENST00000367379.1 71 517 517
Wrong CDS annotation: ENSG00000117153.11 ENST00000367258.1 87 1033 1033
Wrong CDS annotation: ENSG00000143842.10 ENST00000525442.1 365 538 538
Wrong CDS annotation: ENSG00000117625.9 ENST00000533469.1 91 540 540
Wrong CDS annotation: ENSG00000162931.7 ENST00000479800.1 123 925 925
Wrong CDS annotation: ENSG00000086619.9 ENST00000366589.1 390 605 605
Wrong CDS annotation: ENSG00000116977.14 ENST00000481485.1 480 799 799
Wrong CDS annotation: ENSG00000172059.6 ENST00000401510.1 232 614 614
Wrong CDS annotation: ENSG00000138074.10 ENST00000401463.1 290 732 732
Wrong CDS annotation: ENSG00000189350.8 ENST00000401723.1 233 720 720
Wrong CDS annotation: ENSG00000055332.12 ENST00000390013.3 257 564 564
Wrong CDS annotation: ENSG00000115828.11 ENST00000404976.1 138 992 992
Wrong CDS annotation: ENSG00000162994.11 ENST00000403506.1 290 430 430
....

I don't know why it takes so long and why that message. I think which is due to annotation file, but then what file I have to use?

thank you so much!!

Error when running ribotish tisdiff

Hi,

I've been trying to run the whole pipeline included in Ribo-seq TIS Hunter(ribotish quality, ribotish predict and ribotish tisdiff ) with four different replicates of RiboSeq data. I've had no issues so far with the first two functions, but when running tisdiff

ribotish tisdiff -1 Sample_6/test6.txt -2 Sample_7/test7.txt -a Sample_6/sample6_sort.bam -b Sample_7/sample6_sort.bam -g //index/genome.gtf -o diff2.txt -v

I've retrieved the following errors
Traceback (most recent call last): File "/home/user/RiboSeq/project/bin/ribotish", line 14, in <module> import os, sys, argparse, time File "/home/user/RiboSeq/project/bin/ribotish", line 35, in main commands[cmd].run(args) File "/home/user/RiboSeq/project/local/lib/python2.7/site-packages/ribotish/run/tisdiff.py", line 229, in run for result in pred_iter: File "/home/user/RiboSeq/project/local/lib/python2.7/site-packages/ribotish/run/tisdiff.py", line 375, in _get_tis ttis = ribo.multiRibo(t, bamlist[i], offlist[i], compatible = compatible, mis = compatiblemis, paired = paired) File "/home/user/RiboSeq/project/local/lib/python2.7/site-packages/ribotish/zbio/ribo.py", line 489, in multiRibo mribo.merge(Ribo(trans, bamfiles[i], offset = offset, offdict = offdict[i], compatible = compatible, mis = mis, paired = paired)) File "/home/user/RiboSeq/project/local/lib/python2.7/site-packages/ribotish/zbio/ribo.py", line 71, in __init__ off = offset(r, offdict) TypeError: 'list' object is not callable

Also, as I ran it in verbose mode, I retrieved the following printed- on- screen messages

No offset parameter file found for Sample_6/sample1_sort.bam. Using default offset (12). No offset parameter file found for Sample_7/sample1_sort.bam. Using default offset (12). Loading 2 TIS data... 7 TISs in Sample_6/test6.txt. 23 TISs in Sample_7/test7.txt. 24 TISs in total. Reading bams...

Could it be the error due to the fact of having a different number of predicted TIS in both Sample_6/test6.txt and Sample_7/test7.txt ? I've tried to filter both files so only the intersecting common predicted TIS are fed as an input to ribotish tisdiff, but still getting the same errors.

Would you have any idea of how to fix this issue?

The prediction files( Sample_6/test6.txt and Sample_7/test7.txt) were generated by running
ribotish predict -t /home/user/Sample_7/sample7_sort.bam -g //index/genome.gtf -f //index/genome.fa --tispara 12 --altcodons CTG,TTG --harr -p 78 -o test7.txt

Thank you very much!

Encountered error when using ribotish predict

Dear developer, I am using ribotish to predict ORF from Ribo-seq data, but I am getting some errors while running ribotish predict, here is my code:

INPUT_DIR="/scratch/lb4489/project/liver_ribo/mapping/ribotish/LC001_normal_RPFAligned.sortedByCoord.out.bam,/scratch/lb4489/project/liver_ribo/mapping/ribotish/LC001_tumor_RPFAligned.sortedByCoord.out.bam"

ribotish predict -b "$FILES" -g /scratch/lb4489/bioindex/gencode_human_plus_selNONCODE.gtf -f /scratch/lb4489/bioindex/GRCh38.p14.genome.fa --framebest -o ribotish_demo.txt

Here is the error message

No offset parameter file found for . Using default offset (12). 
No input TIS data!
Fri Mar  1 12:35:28 2024 Predicting...
Traceback (most recent call last):
  File "/home/lb4489/.conda/envs/ribotish/bin/ribotish", line 56, in <module>
    main()
  File "/home/lb4489/.conda/envs/ribotish/bin/ribotish", line 34, in main
    commands[cmd].run(args)
  File "/home/lb4489/.conda/envs/ribotish/lib/python2.7/site-packages/ribotish/run/predict.py", line 236, in run
    for result in pred_iter:
  File "/home/lb4489/.conda/envs/ribotish/lib/python2.7/site-packages/ribotish/run/predict.py", line 375, in _pred_gene
    ribombl = ribo.multiRiboGene(g, ribobampaths, offdict = riboffdict, compatible = compatible, mis = compatiblemis, paired = paired)
  File "/home/lb4489/.conda/envs/ribotish/lib/python2.7/site-packages/ribotish/zbio/ribo.py", line 504, in multiRiboGene
    mbl.merge(bam.BamLoadChr(bampaths[i], chr = gene.chr, region = regions[gene.chr], strand = gene.strand, posFunc = offunc, maxNH = maxNH, minMapQ = minMapQ, secondary = secondary, paired = paired))
  File "/home/lb4489/.conda/envs/ribotish/lib/python2.7/site-packages/ribotish/zbio/bam.py", line 583, in __init__
    bamfile = Bamfile(bampath)
  File "pysam/calignmentfile.pyx", line 318, in pysam.calignmentfile.AlignmentFile.__cinit__ (pysam/calignmentfile.c:4730)
  File "pysam/calignmentfile.pyx", line 388, in pysam.calignmentfile.AlignmentFile._open (pysam/calignmentfile.c:5652)
  File "pysam/calignmentfile.pyx", line 534, in pysam.calignmentfile.AlignmentFile._open (pysam/calignmentfile.c:7261)
IOError: file `` not found

Firstly, I generated the offset parameter file in the bam file location and also tried copying the offset parameter file to the current folder. But ribotish doesn't even seem to recognise it automatically, what should I do.
Secondly I don't quite understand how to fix the error reported, I have installed pysam version 0.8.4 using pip, can you help me see what the problem is.

Thanks,
LeeLee

ribotish quality "no reads found" for TIS reads

I'm having an issue with running ribotish quality with the -t flag set.
Running the following command works:
ribotish quality -b TI.sorted.bam -g gencode.vM25.primary_assembly.annotation.gtf --geneformat gtf

But when I add the flag for TIS enrichment I get the following error.

ribotish quality -b TI.sorted.bam -g gencode.vM25.primary_assembly.annotation.gtf --geneformat gtf -t
Error: no reads found! Check read length or protein coding annotation.

Anything I should be doing differently?
I could also run without -t, is this expected to give very different results?

Thanks for developing a great tool!

ribotish predict module run error

Hellow
when I run ribotish predict using stringtie assembly gtf which include new transcript info, it return this error
image
after removing new transcript it run successfully. How can I solve it?

This is my gtf
image

Empty sequence error

Hi, I am trying to predict ORF from ribo-seq data.

My command

ribotish predict -b sample.bam -g annotation.gtf -f genome.fasta -o pred.txt --longest --alt --verbose

Unfortunately, I am almost immediately getting an error

Tue Jan 23 15:12:11 2024 Loading genome...
Making fasta index for genome.fasta...
Traceback (most recent call last):
  File "/home/qbase/miniconda3/envs/ribotish-env/bin/ribotish", line 56, in <module>
    main()
  File "/home/qbase/miniconda3/envs/ribotish-env/bin/ribotish", line 34, in main
    commands[cmd].run(args)
  File "/home/qbase/miniconda3/envs/ribotish-env/lib/python3.12/site-packages/ribotish/run/predict.py", line 132, in run
    genome = fa.Fa(args.genomefapath, verbose = args.verbose)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/qbase/miniconda3/envs/ribotish-env/lib/python3.12/site-packages/ribotish/zbio/fa.py", line 88, in __init__
    self.make_idx(idxpath)
  File "/home/qbase/miniconda3/envs/ribotish-env/lib/python3.12/site-packages/ribotish/zbio/fa.py", line 104, in make_idx
    self.idxarr[-1].updatels(ls)
  File "/home/qbase/miniconda3/envs/ribotish-env/lib/python3.12/site-packages/ribotish/zbio/fa.py", line 63, in updatels
    if ls <= 0 : raise Exception("Empty sequence!")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I tried to create new separate environment, but it did not help.
Also, QC was performed fine, I got desired result.

Here is the list of packages in the conda environment:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2023.12.12           h06a4308_0  
contourpy                 1.2.0                    pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
expat                     2.5.0                h6a678d5_0  
fonttools                 4.47.2                   pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1  
libffi                    3.4.4                h6a678d5_0  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libstdcxx-ng              11.2.0               h1234567_1  
libuuid                   1.41.5               h5eee18b_0  
matplotlib                3.8.2                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0  
numpy                     1.26.3                   pypi_0    pypi
openssl                   3.0.12               h7f8727e_0  
packaging                 23.2                     pypi_0    pypi
pillow                    10.2.0                   pypi_0    pypi
pip                       23.3.1          py312h06a4308_0  
pyparsing                 3.1.1                    pypi_0    pypi
pysam                     0.22.0                   pypi_0    pypi
python                    3.12.1               h996f2a0_0  
python-dateutil           2.8.2                    pypi_0    pypi
readline                  8.2                  h5eee18b_0  
ribotish                  0.2.7                    pypi_0    pypi
scipy                     1.12.0                   pypi_0    pypi
setuptools                68.2.2          py312h06a4308_0  
six                       1.16.0                   pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0  
tk                        8.6.12               h1ccaba5_0  
tzdata                    2023d                h04d1e81_0  
wheel                     0.41.2          py312h06a4308_0  
xz                        5.4.5                h5eee18b_0  
zlib                      1.2.13               h5eee18b_0  

Hope for your help!
Thank you!

--altcodons ACG,CUG

Hi, I want to change the start codon to CUG.However I use the command , I get the predict results which start codons only contain AUG and ACG, I did not get the CUG start codon result ,how I set the parameter to the the CUG start codon result ?

ribotish predict -b test_1.sort.bam,test_2.sort.bam -p 8 --altcodons ACG,CUG --framebest --inframecount -g test.gtf -f test.fna -o predict_output.txt

Use of 5' mismatch reads in predict

This is just a question about the implementation of the predict function.
Say I have para.py as
offdict = {27: 11, 'm0': {27: 12}}

But after examination I decide to remove the 5' mismatch reads from the analysis. If I edit para.py to:
offdict = {27: 11}

Is that going to ignore or include the 5' mismatch reads? I don't want to include them, should I do something else?
e.g.
'm0':{}

Thank you!

The meaning and differences in TisType

Dear developer,

Good day. Thank you for your development and maintenance of this software.

I was wondering if you could explain about the definitions of different classes of TisType?

I see in README that TisType refers to the relative position of the TIS to annotated ORF of the transcript.

First, in my results, I got some predictions like 3' UTR, 5'UTR and Extended.

  • Can I understand the class Extended in a way that if an assembled transcript from RiboSeq data is aligned to the annotated CDS region and the transcript is continuous without frameshift and extends outside of the annotated CDS, it is annotated as extended.
  • While the 5'UTR and 3'UTR means that the TIS of a transcript is aligned to these untranslated regions and not assembled into the transcript of the CDS part (or not in the same frame)?

Second, I also got some Internal and Internal:CDSFrameOverlap

  • I see CDSOverlap means the ORF overlaps with annotated CDS in another transcript in the same reading frame.
  • Does Internal mean that a predicted ORF
    • locates within an annotated CDS (both ends locate within the annotated one)
    • is in different frame
  • Does internal:CDSFrameOverlap means a predicted ORF locates within an annotated but in the same frame?

In the end, I am working on a virus genome with a high coding density. What if a predicted ORF, started in the upstream gene's CDS or 3'UTR region and ends in the downstream genes' CDS region in a different frame. What will the TisType be? Is that Novel or 3'UTR?

Thank you very much in advance!!
Ruixuan

TypeError: '>=' not supported between instances of 'NoneType' and 'float'

Hi,
I ran the following command and it got through the background estimation and part of the prediction step before ending on the TypeError as seen below.

Also, my GTF is produced by cuffmerge, which only determines exons, I then inserted the CDS's from the Gencode GTF. It throws the warnings as seen below, but ran without issue when I did not include the TIS information for the predict function, so I don't think that's the problem?

ribotish predict -b ./merge.bam -g ./merged_nanopore_Gen25_primary_with_CDS.gtf \
 -t ./LTM.bam --longest --tispara ./LTM.out.bam.para.py \
-e ./LTM_TIbackround.txt -f ./GRCm38.primary_assembly.genome.fa --geneformat gtf \
--ribopara ./riboquality_merge.bam.para.py --alt --minaalen 15 --seq -v

No offset parameter file found for LTM.bam. Using default offset (12).
Sat Sep 25 08:00:11 2021 Loading genome...
Sat Sep 25 08:00:11 2021 Estimating TIS background parameters...
TIS background estimation result will be saved to LTM_TIbackround.txt
...
Group data...
[7.185980623969726, 8.281739445973116, 9.203102783507383, 10.172378067164596, 11.06840648061946, 12.107621280053845, 13.12517452951367, 15.740720641276173, 15.740720641276173, None]
Estimate NB parameters...
Sun Sep 26 04:30:41 2021 Predicting...
...
chr12
Wrong CDS annotation: XLOC_006674 TCONS_00026339 214 1028 1666
Wrong CDS annotation: XLOC_006952 TCONS_00027793 305 2682 3539
Wrong CDS annotation: XLOC_007091 TCONS_00028533 553 1535 2076
Traceback (most recent call last):
  File "/public/home/emalekos/miniconda3/bin/ribotish", line 56, in <module>
    main()
  File "/public/home/emalekos/miniconda3/bin/ribotish", line 34, in main
    commands[cmd].run(args)
  File "/public/home/emalekos/miniconda3/lib/python3.9/site-packages/ribotish/run/predict.py", line 236, in run
    for result in pred_iter:
  File "/public/home/emalekos/miniconda3/lib/python3.9/site-packages/ribotish/run/predict.py", line 398, in _pred_gene
    if score is not None: ip = ribo.pidx(score, slp)
  File "/public/home/emalekos/miniconda3/lib/python3.9/site-packages/ribotish/zbio/ribo.py", line 403, in pidx
    if s >= value : break
TypeError: '>=' not supported between instances of 'NoneType' and 'float'

No reads found!

Hi there,

I am trying to run ribotish on regular riboseq data from our lab.
Unfortunately, this already fails at the quality step. I mapped the reads to their reference (GRCm39) with a special annotation file from RNAcentral containing annotation for ncRNAs. I used the same gtf below.

I ran:

ribotish quality -b file.bam -g mus_musculus.RNAcentral.gtf --th 0.5

The error I get is:

Counted reads: 0
Error: no reads found! Check read length or protein coding annotation.

The first couple gtf entries look the following way:

1	RNAcentral	transcript	3056358	3056384	.	-	.	transcript_id "URS0000253E70_10090.0"; gene_id "URS0000253E70_10090.0";
1	RNAcentral	exon	3056358	3056384	.	-	.	transcript_id "URS0000253E70_10090.0";
1	RNAcentral	transcript	3056360	3056384	.	-	.	transcript_id "URS000042B783_10090.1"; gene_id "URS000042B783_10090.1";

Do you have any idea where this error may be coming from?

Thanks for your help!

"wrong exon strand" and "stop codon error"

Hi

I just started working on riboseq data, and came across ribo-tish which seems really useful for QC-ing. I have aligned my reads using STAR as per the guidelines and using the gtf annotation file.

Now I started running the command on one of my bam files, and I have a couple of questions.

  1. how long time should it take? The bam file is 300mb, and ribotish has been running for almost an hour, so I wonder if something is wrong.

  2. I get multiple "Wrong exon strand" messages, as well as "stop codon error". I have tried to look for a description of these errors, but I cannot find any. What does it mean, and is it a problem?

Cheers
Sjannie

TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'

Hi Zhang lab,

I have run quality.py successfully on all of my bam files, but when I try to run predict.py using the same reference, I get this error:

Command:

predict.py -t Ngn2_H2_lncRNAKB_unique.bam -b Ngn2_D2_lncRNAKB_unique.bam -g lncRNAKB_hg38_v7.gtf -f GRCh38.primary_assembly.genome.fa -o pred_rep1.txt

Output:

Mon Feb 28 13:02:03 2022 Estimating TIS background parameters...
Mon Feb 28 13:23:06 2022 Predicting...
Traceback (most recent call last):
  File "/home/eed13/env/lib/python2.7/site-packages/ribotish/run/predict.py", line 564, in <module>
    run(p.parse_args())
  File "/home/eed13/env/lib/python2.7/site-packages/ribotish/run/predict.py", line 236, in run
    for result in pred_iter:
  File "/home/eed13/env/lib/python2.7/site-packages/ribotish/run/predict.py", line 394, in _pred_gene
    ttis = ribo.Ribo(t, bamload = tismbl, compatible = compatible, mis = compatiblemis)
  File "/home/eed13/env/lib/python2.7/site-packages/ribotish/zbio/ribo.py", line 62, in __init__
    self.cnts = bamload.transCounts(trans, compatible = compatible, mis = mis)
  File "/home/eed13/env/lib/python2.7/site-packages/ribotish/zbio/bam.py", line 629, in transCounts
    i += 1
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'

I am running this command on indexed bam files located in the same folder as the output from the quality.py script.

ribotish tisdiff error?ZeroDivisionError: integer division or modulo by zero

hi,when I use the ribotish tisdiff ,some error information as follows:
my command :
ribotish tisdiff -1 pred1.txt -2 pred2.txt -a v1.sort.bam -b vi2.sort.bam -g test.gtf -o diff -v
error informations:

Loading 2 TIS data...
1 TISs in pred1.txt.
1 TISs in pred2.txt.
1 TISs in total.
Reading bams...
JN
Estimate scale factor...
Traceback (most recent call last):
File "/usr/bin/ribotish", line 4, in
import('pkg_resources').run_script('ribotish==0.2.6', 'ribotish')
File "/usr/lib/python2.7/site-packages/pkg_resources/init.py", line 654, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.7/site-packages/pkg_resources/init.py", line 1441, in run_script
exec(script_code, namespace, namespace)
File "/usr/lib/python2.7/site-packages/ribotish-0.2.6-py2.7.egg/EGG-INFO/scripts/ribotish", line 56, in

File "/usr/lib/python2.7/site-packages/ribotish-0.2.6-py2.7.egg/EGG-INFO/scripts/ribotish", line 34, in main

File "build/bdist.linux-x86_64/egg/ribotish/run/tisdiff.py", line 254, in run
File "build/bdist.linux-x86_64/egg/ribotish/zbio/exp.py", line 207, in TMM
ZeroDivisionError: integer division or modulo by zero

issues about quality

Hi,
When I run the Ribotish for quality, it present a mass of error, just as below:
"Traceback (most recent call last):
File "/usr/local/bin/ribotish", line 56, in
main()
File "/usr/local/bin/ribotish", line 34, in main
commands[cmd].run(args)
File "/usr/local/lib/python2.7/dist-packages/ribotish/run/quality.py", line 85, in run
cdsBins = args.bins, numProc = args.numProc, verbose = args.verbose, geneformat = args.genefor mat)
File "/usr/local/lib/python2.7/dist-packages/ribotish/zbio/ribo.py", line 980, in lendis
for result in len_iter:
File "/usr/local/lib/python2.7/dist-packages/ribotish/zbio/ribo.py", line 943, in _lendis_trans
if m0 : ism0 = r.is_m0()
File "/usr/local/lib/python2.7/dist-packages/ribotish/zbio/bam.py", line 415, in is_m0
if self.read.get_tag('MD')[0] == '0' : return True # mismatch at 0
File "pysam/libcalignedsegment.pyx", line 2392, in pysam.libcalignedsegment.AlignedSegment.get_t ag
File "pysam/libcalignedsegment.pyx", line 2434, in pysam.libcalignedsegment.AlignedSegment.get_t ag
KeyError: "tag 'MD' not present"

Thank you!

what the types of TisType mean?

What the types of TisType mean?When I use the ribotish predict ,I get the many types of TisType,what dose every TisType
means ?
TisType

Annotated
Truncated
Novel
Internal:CDSFrameOverlap
Internal
3'UTR
5'UTR
5'UTR:CDSFrameOverlap
Extended
3'UTR:Known
3'UTR:CDSFrameOverlap
Novel:CDSFrameOverlap
Truncated:Known

TypeError: '<' not supported between instances of 'NoneType' and 'int' when running predict module

Dear developer,

I came across this error when running ribotish with the following command:

ribotish predict -p4 --alt --framebest -b sample.bam -g ribotish_test2.gtf -f genome.fa -o ribotish_test2 -v -v

Error message:

Thu Aug  4 19:14:55 2022 Loading genome...
No input TIS data!
Thu Aug  4 19:14:55 2022 Predicting...
10
ENSMMUG00000061626	ENSMMUT00000107115	0	68
Thu Aug  4 19:14:55 2022 ENSMMUG00000061626	ENSMMUT00000107115	SHISAL1	protein_coding	10:7187171-7234333:+	TTG	804	1335	Internal	0	0	None	5.82694120373455e-12	T	5.826941203734552e-12
Traceback (most recent call last):
  File "/home/admin/mambaforge/bin/ribotish", line 56, in <module>
    main()
  File "/home/admin/mambaforge/bin/ribotish", line 34, in main
    commands[cmd].run(args)
  File "/home/admin/mambaforge/lib/python3.9/site-packages/ribotish/run/predict.py", line 250, in run
    cr = interval.cds_region_trans(t)
  File "/home/admin/mambaforge/lib/python3.9/site-packages/ribotish/zbio/interval.py", line 299, in cds_region_trans
    thick.sort()
TypeError: '<' not supported between instances of 'NoneType' and 'int'

This error seems to be caused by incorrect parsing of stop_codon position in src/zbio/gtf.py when the stop codon overlaps with a splice junction and is reprensented by two stop_codon lines in the gtf file:

cs = self.cdna_pos(sc.end5) + 3

And the stop codon position is stored as an interval rather than a list of two intervals:
elif e.type == 'stop_codon': self.stop_codon = e

Here is the full gtf file used above:

10	ensembl	transcript	7172377	7234333	.	+	.	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding";
10	ensembl	exon	7172377	7173079	.	+	.	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "1"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "ENSMMUE00000475544"; exon_version "1";
10	ensembl	CDS	7172377	7173079	.	+	0	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "1"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "ENSMMUP00000071984"; protein_version "1";
10	ensembl	exon	7183086	7183184	.	+	.	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "2"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "ENSMMUE00000523952"; exon_version "1";
10	ensembl	CDS	7183086	7183184	.	+	2	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "2"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "ENSMMUP00000071984"; protein_version "1";
10	ensembl	exon	7187170	7187383	.	+	.	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "3"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "ENSMMUE00000036524"; exon_version "2";
10	ensembl	CDS	7187170	7187383	.	+	2	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "3"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "ENSMMUP00000071984"; protein_version "1";
10	ensembl	exon	7198342	7198659	.	+	.	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "4"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "ENSMMUE00000520747"; exon_version "1";
10	ensembl	CDS	7198342	7198657	.	+	1	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "4"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "ENSMMUP00000071984"; protein_version "1";
10	ensembl	exon	7234333	7234333	.	+	.	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "5"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "ENSMMUE00000508835"; exon_version "1";
10	ensembl	stop_codon	7198658	7198659	.	+	0	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "4"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding";
10	ensembl	stop_codon	7234333	7234333	.	+	1	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "5"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding";

Here is a demo of this error:

from ribotish.zbio import io

for g in io.geneIter('ribotish_test2.gtf', 'gtf'):
     print(g)

for t in g.trans:
     print(t)

t.cds_stop(cdna=True)  # = 1337
t.cds_start(cdna=True)  # = 2
t.cdna_length()  # = 1335 (< 1337)

I am not sure how to deal with this issue. Can I get expected results by just deleting all the lines with "stop_codon" in third column of the gtf file?

ValueError: start out of range (-21) running ribotish quality

I am new to ribotish - and want to run the quality command. This may be a trivial error to fix - advice appreciated.

$ ribotish quality -b /media/mrbmk1000/data/ms-pool01/C_neoformans/ribo-seq/star_sandbox_01/rbmk1000-4-59-1_S1_L00X_R1_001_trimmed_811_final_Aligned.sortedByCoord.out.bam -g /media/mrbmk1000/data/ms-pool01/annotationdb/cneoformans/FungiDB-38_CneoformansH99.gtf
Traceback (most recent call last):
File "/home/mrbmk1000/miniconda3/bin/ribotish", line 56, in
main()
File "/home/mrbmk1000/miniconda3/bin/ribotish", line 34, in main
commands[cmd].run(args)
File "/home/mrbmk1000/miniconda3/lib/python3.7/site-packages/ribotish/run/quality.py", line 85, in run
cdsBins = args.bins, numProc = args.numProc, verbose = args.verbose, geneformat = args.geneformat)
File "/home/mrbmk1000/miniconda3/lib/python3.7/site-packages/ribotish/zbio/ribo.py", line 980, in lendis
for result in len_iter:
File "/home/mrbmk1000/miniconda3/lib/python3.7/site-packages/ribotish/zbio/ribo.py", line 942, in _lendis_trans
for r in bam.transReadsIter(bamfile, t, compatible=False, maxNH=maxNH, minMapQ=minMapQ, secondary=secondary, paired=paired, flank=flank):
File "/home/mrbmk1000/miniconda3/lib/python3.7/site-packages/ribotish/zbio/bam.py", line 496, in transReadsIter
for read in rds: #yield read
File "/home/mrbmk1000/miniconda3/lib/python3.7/site-packages/ribotish/zbio/bam.py", line 45, in fetch_reads
rds = self.fetch(reference=chr, start=start, end=stop) #, multiple_iterators=multiple_iterators)
File "pysam/libcalignmentfile.pyx", line 1081, in pysam.libcalignmentfile.AlignmentFile.fetch
File "pysam/libchtslib.pyx", line 691, in pysam.libchtslib.HTSFile.parse_region
ValueError: start out of range (-21)

Thank you for guidance.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.