Giter VIP home page Giter VIP logo

ribotish's Introduction

README for Ribo-TISH (0.2.6)

<2020-11-15 Peng Zhang>

Introduction

Translation is a critical step in gene regulation that synthesizes proteins from a given RNA template. The development of the ribosome profiling (riboseq) technique has enabled the measurement of translation at a genome-wide level. The basic idea of ribosome profiling is to perform deep-sequencing of the ribosome-protected mRNA fragment (~30 nts), termed ribosome footprints, to determine the occupancy of translating ribosomes on a given mRNA. There are several variants of the ribosome profiling technique that are based on the use of different translation inhibitors. The regular ribo-seq utilizes Cycloheximide (CHX), a translation elongation inhibitor to freeze all translating ribosomes. In contrast to CHX, the translation inhibitor lactimidomycin (LTM) and harringtonine (Harr) have a much stronger effect on initiating ribosomes. The use of these two inhibitors allows for the global mapping of translating initiating sites (TISs) when they are coupled with with ribosome profiling (TI-Seq). In addition, when LTM is used sequentially with puromycin (PMY), the TISs can be mapped quantitatively and can be compared between different conditions. we present a novel algorithm, named Ribo TIS Hunter (Ribo-TISH), for identifying translation activities using ribosome profiling data. Ribo-TISH uses statistical tests to assess the significance of translation activities. It captures significant TISs using negative binomial test, and frame biased open reading frames (ORFs) using rank sum test. Ribo-TISH can also perform differential analysis between two TI-Seq data.

Install

Please check the file 'INSTALL.rst' in the distribution.

Usage of Ribo-TISH

ribotish [-h] [--version] {quality,predict,tisdiff}
Example for quality control

ribotish quality -b ltm.bam -g gene.gtf -t

Example for prediction

ribotish predict -t ltm.bam -b chx.bam -g gene.gtf -f genome.fa -o pred.txt

Example for differential TIS

ribotish tisdiff -1 pred1.txt -2 pred2.txt -a qti1.bam -b qti2.bam -g gene.gtf -o diff.txt

There are 3 functions available as sub-commands.

quality

Quality control for riboseq bam data.

predict

Main function to predict ORF/TIS.

tisdiff

Call diffential TIS between two TIS data

The main input data is in bam file format. For best performance, reads should be trimmed (to ~ 29 nt RPF length) and aligned to genome using end-to-end mode (no soft-clip). Intron splicing is supported. Some attributes are needed such as NM, NH and MD. For STAR, `--outSAMattributes All` should be set. bam file should be sorted and indexed by samtools.

All positions or regions reported by Ribo-TISH are 0 based, half open, same as in bed format.

quality

Quality control of riboseq bam data. This function checks reads distribution around annotated protein coding regions on user provided transcripts, show frame bias and estimate P-site offset for different group of reads. Reads are grouped by read length as well as 5' end match or mismatch. 5' end mismatch ('m0') reads often have different distribution from matched reads. To turn off 5' end mismatch grouping, use `--nom0`.

There are 3 output files: a txt file recording all distribution data, a pdf figure file and a python file for P-site offset parameters.

Quick examples:

For regular riboseq :

ribotish quality -b chx.bam -g gene.gtf

For TI-Seq data :

ribotish quality -b ltm.bam -g gene.gtf -t

Options

-b RIBOBAMPATH

Riboseq bam data file. Reads should be trimmed and aligned to genome.

-g GENEPATH

Gene annotation file. Acceptable formats include gtf, gff, bed and genepred with gene names. Input file format can be auto detected or specified by `--geneformat` option

-o OUTPUT

Output all distribution data. Default: bampath[:-4]+'_qual.txt'. Quality and offset estimation is based on this distribution. User can save this file for further quick estimation trying different thresholds by `-i` option.

-t/--tis

This data is TIS enriched, for LTM and Harr. Quality will pay more attention to TIS sites.

-i INPUT

Input previous output file, do not read gene file and bam file again.

--geneformat GENEFORMAT

Gene annotation file format (gtf, bed, gpd, gff, default: auto)

--chrmap CHRMAP

Input chromosome id mapping table file if annotation chr ids are not the same as chr ids in bam/fasta files. Format:

chr_name1 chr_name2

Two column

s, tab seperated, no specific order requirement. Mappings such as 'chr1' to '1' can be automatically processed without using this option.

-f FIGPDFP ATH

```

Output pdf

figure file. Default: bampath[:-4]+'_qual.pdf'

-r PARAPAT H

`

Output off

-l LENS

set parameter file. Default: bampath+'.para.py'. This file saves P-site offsets for different reads lengths in python code dict format, and can be used in further analysis.

Range of t

-d DIS

ag length Default: 25,35. The last number (35) is not included, i.e. the longest length considered is 34.

Position r

ange near start codon or stop codon. Default: -40,20

--bins BIN S

`

Number of

--nom0

bins for cds profile. Default: 20


`

Not consid

--th TH

er reads with mismatch at position 0 (5' end mismatch) as a new group.

Threshold

--end3

for quality. Default: 0.5. Group that frame bias ratio < TH will be considered as low quality and this group of reads will not be used in further analysis. The offset for low quality groups will not be set in parameter file.

Plot RPF 3

' end profile instead of 5' end.

--colorbli nd

``

Use a colo

r style readable for color blind people ('#F00078,#00F000,#0078F0')

--colors C OLORS


User speci

-p NUMPROC

fied Matplotlib acceptable color codes for three frames (default: 'r,g,b')

Number of

processes. Default: 1

-v/--verbo se

```

Increase o

utput verbosity.

Output fil es

OUTPUT

--

OUTPUT is

Pdf figure

a txt file recording all distribution data in python format for each group of reads. These distributions are shown in pdf figure file. Quality and offset estimation is based on this distribution. User can save this file for further quick estimation trying different thresholds by `-i` option.

Pdf figure

file is plot of all the distributions and illustration of quality and P-site offset. The left part is for 5' end matched reads and the right part is for 5' end mismatch reads if `--nom0` is not set.

Upper pane

l: the length distribution of RPFs uniquely mapped to annotated protein-coding regions.

Lower pane l: different quality metrics for RPFs uniquely mapped to annotated protein-coding regions.

Each row s

hows the RPFs with different lengths.

  • Column
1: distribution of RPF 5’ end in 3 frames in all annotated codons. The percentage of the reads from the dominant reading frame is shown.
  • Column
2: the distribution of RPF 5’end count near annotated TIS. The estimate of the P site offset and TIS accuracy are also shown. The RPFs of a specific length that do not pass threshold are considered as low quality and removed.
  • Column
3: the distribution of RPF 5’end count near annotated stop codon.
  • Column

4: The RPF profile throughout the protein-coding regions in 3 frames. TIS enrich score (TIS count / CDS average) is also shown for TIS data.

Offset par ameter file


This file :

saves P-site offsets for different reads lengths in python code dict format, and can be used in further analysis. The default offset file name is bampath+'.para.py' accompanied with the input bam file. The file format is like

offdict

= {28: 12, 29: 12, 30: 12, 32: 13, 'm0': {29: 12, 30: 12, 31: 13}}

The offset

parameter file is easy to interpret and can be edited by user if auto estimated offsets are not correct. The default file name will be auto-recognized in further analysis. If the bam file is in a different directory and user do not want to create a parameter file in that directory, we recommend creating a link for the bam file in current working directory, e.g. `ln -s original/dir/ribo.bam`

Ribo-TISH

predict

does not guarantee that it can always find best P-site offset values. Users should check the quality figures and edit the parameter file if necessary.

This is th

e main function of Ribo-TISH. This function predicts ORF/TIS with riboseq bam files. This function uses negative binomial model to fit TI-Seq background and test significance of TIS sites. For regular riboseq data, Wilcoxon rank sum test between in-frame reads and out-frame reads inside the ORF is performed.

Quick exam

ples:

Combine TI :

-Seq and regular riboseq data

ribotish

predict -t ltm.bam -b chx.bam -g gene.gtf -f genome.fa -o pred.txt

For TI-Seq :

data only

ribotish

predict -t ltm.bam -g gene.gtf -f genome.fa -o pred.txt

De novo OR :

F prediction with only regular riboseq data using longest strategy

ribotish

predict -b chx.bam -g gene.gtf -f genome.fa --longest -o pred.txt

De novo OR :

F prediction with two regular riboseq data using framebest strategy

ribotish

predict -b chx1.bam,chx2.bam -g gene.gtf -f genome.fa --framebest -o pred.txt

Only test :

user provided ORF candidates with two regular riboseq data

ribotish

Options

predict -b chx1.bam,chx2.bam -g gene.gtf -f genome.fa -i cand.txt -o pred.txt



-t TISBAMP ATHS


Input TI-s

eq bam data files, comma seperated.

-b RIBOBAM PATHS


Regular ri

boseq bam data files, comma seperated.

At least o

ne bam file should be provided by either `-t or -b`.

-g GENEPAT H

`

Gene annot ation file for ORF prediction. Acceptable formats include gtf, gff, bed and genepred with gene names. Input file format can be auto detected or specified by `--geneformat` option.
If user ne ed to predict on only non-coding genes and use a different gene annotation file for known ORF annotation and background estimation, use `-a` option to provide another gene annotation for known ORF annotation.

If user pr

ovided candidates `-i` option is set, the transcript annotation for the candidates should be found in gene annotation file.

-f GENOMEF APATH


Genome fas

-o OUTPUT

ta file. The fasta file should has a .fai index file accompanied with genome fasta file (indexed) or indexable (fasta sequences have fixed length in each line). This program will index the genome file before prediction if .fai index file can not be found.

Output all

-i INPUT

possible ORF results that fit the thresholds.

Only test

input candidate ORFs, format:

======= = ==== =====
transID s tart stop

Start, stop position is 0 based, half open. Stop - start should be multiples of 3. Transcript should be found in gene annotation file.

--geneformat GENEFORMAT

Gene annotation file format (gtf, bed, gpd, gff, default: auto)

--chrmap CHRMAP

Input chromosome id mapping table file if annotation chr ids are not same as chr ids in bam/fasta files. See --chrmap option in `quality` section.

--tispara TISPARA

Input P-site offset parameter files for `-t bam files. The default parameter files are bampath+'.para.py' for each bam file, which is generated in ribotish quality` function. There's no need to specify this option if default parameter files exist. To use this option to provide other parameter files, each bam file should be provided with a file, and file names are separated with comma. If no parameter file is found, default offset 12 will apply for all reads in the bam data.

--ribopara RIBOPARA

Input P-site offset parameter files for `-b bam files. Same as --tispara` option.

--nparts NPARTS

Group transcript according to TIS reads density quantile. Default: 10.

TIS background estimation uses ORF in-frame read counts (excluding TIS codons) to estimate negative binomial parameters. Since different transcripts have different expression levels, the background is different for highly expressed and lowly expressed transcripts. Ribo-TISH groups expressed transcripts into N parts based on TIS reads density of the transcript. Each transcript group have same total number of TIS reads.

-e ESTPATH

Output TIS background estimation result. If only one bam file is provided by `-t` option, the default file name is tisbampath+'.bgest.txt'. If multiple TIS data provided, the default file name is tisBackground.txt The result file contains negative binomial parameters, group levels and thresholds for each group.

-s INESTPATH

Input background estimation result file instead of instant estimation. By default, if only one bam file is provided by `-t option, the program will first look for file name tisbampath+'.bgest.txt'. If this file exists, background parameters in this file will be used. Otherwise, TIS background estimation will run and generate a result file according to -e` option.

-a AGENEPATH

Another gene annotation file for ORF annotation in addition to `-g gene file. This option is mainly used when -g` annotation focuses on predicting ORFs in non-coding transcripts and does not have sufficient protein coding gene annotation. Protein coding gene annotation is used for TIS background estimation as well as output TIS type classification.

--alt

Use alternative start codons. If set, all codons with 1 base different from ATG will be considered as start codon in ORF finding. Affect both TIS background estimation and prediction. It does not affect `-i mode prediction. To customize alt start codons, use --altcodons`.

--altcodons ALTCODONS

Use provided alternative start codons, comma seperated, e.g. `--altcodons CTG,GTG,ACG. Turn on --alt` option. Do not need to provide 'ATG'. It does not support 'N' bases.

--tis2ribo

Add TIS bam counts to regular riboseq counts. Use TIS data also for ORF frame test. This option will be turned on automatically if `-b` is not provided.

--harr

The data is treated with harringtonine (instead of LTM). For Harr data, the reads at TIS sites are not as focus as LTM reads. Reads in flanking region (default 15 codons) of TIS will not be used for TIS background estimation. To customize flanking size, use `--harrwidth`.

--harrwidth HARRWIDTH

Flanking region for harr data, in codons. Default: 15. Turn on `--harr` option.

--longest

Only report longest possible ORF results for multiple candidate start codons in the same ORF (same stop codon). This is a TIS selection strategy when there's no `-t` TI-Seq data input.

--framebest

Only report best frame test results for multiple candidate start codons in the same ORF (same stop codon), which is TIS with the smallest frame test p-value (marked as 'T' in RiboPStatus column). This is a TIS selection strategy when there's no `-t` TI-Seq data input.

--enrichtest

Use enrich test instead of frame test. Enrich test is rank sum test between in-frame reads inside ORF and same frame reads outside ORF.

--nocompatible

Not require reads compatible with transcript splice junctions.

--minaalen MINAALEN

Minimum amino acid length of candidate ORF, Default: 6.

--genefilter GENEFILTER

Only process given genes. Comma separated.

--tpth TPTH

TIS p value threshold. Default: 0.05.

--fpth FPTH

Frame p value threshold. Default: 0.05.

--minpth MINPTH

At least one of TIS or frame p value should be lower than this threshold. Default: 1.

--fspth FSPTH

Fisher's p value threshold. Default: 0.05.

--fsqth FSQTH

Fisher's FDR q value threshold. Default: 0.05.

--allresult ALLRESULT

Write all result output without FDR q-value threshold to another file. (default: output + '_all.txt', 'off' or using `--fsqth 1` to turn off)

-p NUMPROC

Number of processes. Default: 1

-v/--verbose

Increase output verbosity.

--transprofile TRANSPROFILE

Output RPF P-site profile for each transcript. The profile data is in python dict format, recording non-zero read counts at different positions on transcript.

--inprofile INPROFILE

Input RPF P-site profile for each transcript, instead of reading bam reads. The profile file is the output file from `--transprofile` option. Save some time for re-running.

--seq

Report ORF sequences.

--aaseq

Report amino acid sequences.

--blocks

Report all exon block positions for predicted ORFs. Format: start1-stop1,start2-stop2,...startN-stopN. In chromosome direction.

--inframecount

Report the sum of all counts at the in-frame positions in the ORF.

Output files

OUTPUT

The output is a txt file all possible ORF results that fit the thresholds. Some of the columns are:

GenomePos

Genome position and strand of TIS site, 0 based, half open

Start

TIS of the ORF on transcript

Stop

3' end of stop codon on transcript

TisType

Relative position of this TIS to annotated ORF of the transcript. 'Novel' if no ORF annotation. ':Known' means the TIS is annotated in another transcript. ':CDSFrameOverlap' means the ORF overlaps with annotated CDS in another transcript in the same reading frame.

TISGroup

Group of the transcript for TIS background estimation

TISCount

Number of reads with P-site at TIS site

TISPvalue

One tailed negative binomial test p-value for TISCount (TIS test)

RiboPvalue

One tailed rank sum test p-value for regular riboseq frame bias inside ORF (frame test)

RiboPStatus

For all ORFs sharing same stop codon, 'T' means top (best) p-value, 'L' means local best p-value, 'N' means other. All 'N' in `-i or --longest` mode.

FisherPvalue

Combination of TIS and Ribo p-values using Fisher's method

TISQvalue

BH correction q-value of TIS test

RiboQvalue

BH correction q-value of frame test

FisherQvalue

BH correction q-value of Fisher's p-value

AALen

Amino acid length of the ORF

ALL

The '_all' output result is generated according to `--allresult` option, which is similar to the output but do not use FDR (q-value) cutoff. Other cutoffs are the same as output file.

tisdiff

This is the function for differential TIS identification. This function uses two different TIS test results generated by `ribotish predict` using different quantitative TI-Seq (QTI-Seq) data. The ordinary global TI-Seq (GTI-Seq) may have some biases so is not suitable for differential analysis.

First a normalization factor is estimated by Trimmed Mean of M values (TMM) method on the union of significant TIS counts in the two results. Then binomial test p-value and fold change are calculated. If RNASeq counts are provided as reference, the TI efficiency is calculated using Fisher's exact test with normalized count values.

Quick examples:

Differential TIS activity calling :

ribotish tisdiff -1 pred1.txt -2 pred2.txt -a qti1.bam -b qti2.bam -g gene.gtf -o diff.txt

Differential TIS efficiency calling with RNASeq count input :

ribotish tisdiff -1 pred1.txt -2 pred2.txt -a qti1.bam -b qti2.bam -g gene.gtf --rnaseq RNA.txt -o diff.txt

Options

-1 TIS1PATH, -2 TIS2PATH

Predict result of group 1 & 2 TIS data. Comma seperated if there are more than 1 replicates.

-a TIS1BAMPATHS, -b TIS1BAMPATHS

Group 1 & 2 TIS riboseq bam files, comma seperated.

--l1 TIS1LABELS, --l2 TIS2LABELS

Labels for each replicate.

-g GENEPATH

Gene annotation file. Acceptable formats include gtf, gff, bed and genepred with gene names. Input file format can be auto detected or specified by `--geneformat` option.

-o OUTPUT

Output result file.

--geneformat GENEFORMAT

Gene annotation file format (gtf, bed, gpd, gff, default: auto)

--tis1para TIS1PARA, --tis2para TIS2PARA

Input P-site offset parameter files for group 1 & 2 bam files. The default parameter files are bampath+'.para.py' for each bam file, which is generated in `ribotish quality` function. To use this option, each bam file should be provided with a file, and file names are separated with comma. If no parameter file is found, default offset 12 will apply for all reads in the bam data.

--nocompatible

Not require reads compatible with transcript splice junctions.

--normcomm

Use common TISs instead of union TISs for normalization.

--normanno

Use only annotated TISs for normalization.

--rnaseq RNASEQ

RNASeq count input. Format:

ID count1 count2 ...

Both gene ID and transcript ID are acceptable.

--scalefactor SCALEFACTOR

Input TIS scale factor of group 2/1 instead of auto calculate. Not log value.

--rnascale RNASCALE

Input RNASeq scale factor of group 2/1 instead of auto calculate. Not log value.

--export EXPORT

Export count table for differential analysis with other tools. Especially for replicated data.

--plotout PLOTOUT

Scatter plot output pdf file.

--figsize FIGSIZE

Scatter plot figure size. Default: 8,8.

-f FOLDCHANGE

Minimum fold change threshold. Default: 1.5.

--ipth IPTH

Input TIS p value threshold. Default: 0.05.

--iqth IQTH

Input TIS q value threshold. Default: 0.05.

--opth OPTH

Output TIS diff p value threshold. Default: 0.05.

--oqth OQTH

Output TIS diff q value threshold. Default: 0.05.

-p NUMPROC

Number of processes. Default: 1

-v/--verbose

Increase output verbosity.

Output files

OUTPUT

The output is a txt file all differential TIS results that fit the thresholds. Some of the columns are:

FoldChange

Fold change (2/1) value after normalization

DiffPvalue

Differential test p-value, two-tailed.

DiffQvalue

BH correction q-value of DiffPvalue

EXPORT

The export table is generated using `--export` option. It is also automatically generated when the input data has replicated samples. It is a txt file with raw TIS counts for each predicted TIS. The format of TIS id is 'TransID_Start_GenomePos'.

For replicated data, Ribo-TISH provided R scripts to call differential TISs using edgeR or DESeq2.

Example for edgeR: :

Rscript path_to_scripts/tisdiff_edgeR.r tisdiff_export.txt 3 4 tisdiff_edgeR_output.txt

For DESeq2: :

Rscript path_to_scripts/tisdiff_DESeq2.r tisdiff_export.txt 3 4 tisdiff_DESeq2_output.txt

3 and 4 are number of replicates in two conditions.

If `--rnaseq` is provided, the RNASeq counts of genes/transcripts for the TISs are also provided in the export table. However, the analysis for RNASeq referenced differential TIS efficiency analysis with replicate data is currently unavailable.

ribotish's People

Contributors

zhpn1024 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

ribotish's Issues

Empty sequence error

Hi, I am trying to predict ORF from ribo-seq data.

My command

ribotish predict -b sample.bam -g annotation.gtf -f genome.fasta -o pred.txt --longest --alt --verbose

Unfortunately, I am almost immediately getting an error

Tue Jan 23 15:12:11 2024 Loading genome...
Making fasta index for genome.fasta...
Traceback (most recent call last):
  File "/home/qbase/miniconda3/envs/ribotish-env/bin/ribotish", line 56, in <module>
    main()
  File "/home/qbase/miniconda3/envs/ribotish-env/bin/ribotish", line 34, in main
    commands[cmd].run(args)
  File "/home/qbase/miniconda3/envs/ribotish-env/lib/python3.12/site-packages/ribotish/run/predict.py", line 132, in run
    genome = fa.Fa(args.genomefapath, verbose = args.verbose)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/qbase/miniconda3/envs/ribotish-env/lib/python3.12/site-packages/ribotish/zbio/fa.py", line 88, in __init__
    self.make_idx(idxpath)
  File "/home/qbase/miniconda3/envs/ribotish-env/lib/python3.12/site-packages/ribotish/zbio/fa.py", line 104, in make_idx
    self.idxarr[-1].updatels(ls)
  File "/home/qbase/miniconda3/envs/ribotish-env/lib/python3.12/site-packages/ribotish/zbio/fa.py", line 63, in updatels
    if ls <= 0 : raise Exception("Empty sequence!")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I tried to create new separate environment, but it did not help.
Also, QC was performed fine, I got desired result.

Here is the list of packages in the conda environment:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2023.12.12           h06a4308_0  
contourpy                 1.2.0                    pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
expat                     2.5.0                h6a678d5_0  
fonttools                 4.47.2                   pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1  
libffi                    3.4.4                h6a678d5_0  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libstdcxx-ng              11.2.0               h1234567_1  
libuuid                   1.41.5               h5eee18b_0  
matplotlib                3.8.2                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0  
numpy                     1.26.3                   pypi_0    pypi
openssl                   3.0.12               h7f8727e_0  
packaging                 23.2                     pypi_0    pypi
pillow                    10.2.0                   pypi_0    pypi
pip                       23.3.1          py312h06a4308_0  
pyparsing                 3.1.1                    pypi_0    pypi
pysam                     0.22.0                   pypi_0    pypi
python                    3.12.1               h996f2a0_0  
python-dateutil           2.8.2                    pypi_0    pypi
readline                  8.2                  h5eee18b_0  
ribotish                  0.2.7                    pypi_0    pypi
scipy                     1.12.0                   pypi_0    pypi
setuptools                68.2.2          py312h06a4308_0  
six                       1.16.0                   pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0  
tk                        8.6.12               h1ccaba5_0  
tzdata                    2023d                h04d1e81_0  
wheel                     0.41.2          py312h06a4308_0  
xz                        5.4.5                h5eee18b_0  
zlib                      1.2.13               h5eee18b_0  

Hope for your help!
Thank you!

Visualization of single transcripts with a-site/p-site coverage

Hello!

First of all thank you for developing Ribo-TISH! I have found it really useful for the QC it provides and also for uORF prediction.

I have a question that I am not sure that it is relevant to Ribo-TISH itself.

In the published paper there are several snapshots of single transcripts (e.g. Fig.3a,3b and Fig.4a,4b) providing p-site (or a-site?) Ribo-seq or QTI-seq coverage. I would like to ask if Ribo-TISH has the ability to provide such plots.

If not, may I know how do you manage to end up with the .bam/.bed file containing only the counts per p-site or a-site so that can be easily visualized on a genome browser.

Thanks in advance!

Kind Regards,
Alex

issues about quality

Hi,
When I run the Ribotish for quality, it present a mass of error, just as below:
"Traceback (most recent call last):
File "/usr/local/bin/ribotish", line 56, in
main()
File "/usr/local/bin/ribotish", line 34, in main
commands[cmd].run(args)
File "/usr/local/lib/python2.7/dist-packages/ribotish/run/quality.py", line 85, in run
cdsBins = args.bins, numProc = args.numProc, verbose = args.verbose, geneformat = args.genefor mat)
File "/usr/local/lib/python2.7/dist-packages/ribotish/zbio/ribo.py", line 980, in lendis
for result in len_iter:
File "/usr/local/lib/python2.7/dist-packages/ribotish/zbio/ribo.py", line 943, in _lendis_trans
if m0 : ism0 = r.is_m0()
File "/usr/local/lib/python2.7/dist-packages/ribotish/zbio/bam.py", line 415, in is_m0
if self.read.get_tag('MD')[0] == '0' : return True # mismatch at 0
File "pysam/libcalignedsegment.pyx", line 2392, in pysam.libcalignedsegment.AlignedSegment.get_t ag
File "pysam/libcalignedsegment.pyx", line 2434, in pysam.libcalignedsegment.AlignedSegment.get_t ag
KeyError: "tag 'MD' not present"

Thank you!

ribotish quality "no reads found" for TIS reads

I'm having an issue with running ribotish quality with the -t flag set.
Running the following command works:
ribotish quality -b TI.sorted.bam -g gencode.vM25.primary_assembly.annotation.gtf --geneformat gtf

But when I add the flag for TIS enrichment I get the following error.

ribotish quality -b TI.sorted.bam -g gencode.vM25.primary_assembly.annotation.gtf --geneformat gtf -t
Error: no reads found! Check read length or protein coding annotation.

Anything I should be doing differently?
I could also run without -t, is this expected to give very different results?

Thanks for developing a great tool!

Feature request: Ribo-seq counts in the results

Thank you for the great tool!
It seems to be working perfectly for me but I would like to request a feature implementation.
Would it be possible to add a column of in-frame Ribo-seq (not TI-seq) count per ORF to the result of ribotish predict, like TISCount column for TI-seq?
I believe this would be useful from the following viewpoints (I'm detecting ORFs using Ribo-seq only, without TI-seq):

  1. The current output includes ORFs with very low read counts. I understand that this is helpful for increasing sensitivity but in some cases users may want to apply filtering by CPM etc to extract highly-translated ORFs.
  2. With --framebest option, one stop codon can have multiple TIS, when RiboPvalue has ties (especially when RiboPvalue = 0). One option for filtering is taking the longest ORF but it would be more helpful if users can take P-site read abundance into consideration to pick up the ORF with highest translation activity.

I would appreciate it if you could consider implementing this feature, only if it would not cause too much trouble for you.
Many thanks,
Sotta

ValueError: start out of range (-21) running ribotish quality

I am new to ribotish - and want to run the quality command. This may be a trivial error to fix - advice appreciated.

$ ribotish quality -b /media/mrbmk1000/data/ms-pool01/C_neoformans/ribo-seq/star_sandbox_01/rbmk1000-4-59-1_S1_L00X_R1_001_trimmed_811_final_Aligned.sortedByCoord.out.bam -g /media/mrbmk1000/data/ms-pool01/annotationdb/cneoformans/FungiDB-38_CneoformansH99.gtf
Traceback (most recent call last):
File "/home/mrbmk1000/miniconda3/bin/ribotish", line 56, in
main()
File "/home/mrbmk1000/miniconda3/bin/ribotish", line 34, in main
commands[cmd].run(args)
File "/home/mrbmk1000/miniconda3/lib/python3.7/site-packages/ribotish/run/quality.py", line 85, in run
cdsBins = args.bins, numProc = args.numProc, verbose = args.verbose, geneformat = args.geneformat)
File "/home/mrbmk1000/miniconda3/lib/python3.7/site-packages/ribotish/zbio/ribo.py", line 980, in lendis
for result in len_iter:
File "/home/mrbmk1000/miniconda3/lib/python3.7/site-packages/ribotish/zbio/ribo.py", line 942, in _lendis_trans
for r in bam.transReadsIter(bamfile, t, compatible=False, maxNH=maxNH, minMapQ=minMapQ, secondary=secondary, paired=paired, flank=flank):
File "/home/mrbmk1000/miniconda3/lib/python3.7/site-packages/ribotish/zbio/bam.py", line 496, in transReadsIter
for read in rds: #yield read
File "/home/mrbmk1000/miniconda3/lib/python3.7/site-packages/ribotish/zbio/bam.py", line 45, in fetch_reads
rds = self.fetch(reference=chr, start=start, end=stop) #, multiple_iterators=multiple_iterators)
File "pysam/libcalignmentfile.pyx", line 1081, in pysam.libcalignmentfile.AlignmentFile.fetch
File "pysam/libchtslib.pyx", line 691, in pysam.libchtslib.HTSFile.parse_region
ValueError: start out of range (-21)

Thank you for guidance.

Error running ribotish predict with multicore parameters

Hi,
I'm trying to use the whole pipeline of ribotish. When I run the command
ribotish predict -t unassigned_HARR_mapped_sort.bam -b unassigned_mapped_sort.bam -g ../gencode/gencode.v28.annotation_appris_princ_1.gtf -f ../gencode/GRCh38.p12.genome.fa -o unassigned_pred.txt -p 8
i get this error :
Thu Jun 27 17:21:06 2019 Estimating TIS background parameters... Traceback (most recent call last): File "/home/adminmanu/softwares/miniconda3/bin/ribotish", line 56, in <module> main() File "/home/adminmanu/softwares/miniconda3/bin/ribotish", line 34, in main commands[cmd].run(args) File "/home/adminmanu/softwares/miniconda3/lib/python3.7/site-packages/ribotish/run/predict.py", line 151, in run pool = MyPool(1) # This is for memory efficiency File "/home/adminmanu/softwares/miniconda3/lib/python3.7/multiprocessing/pool.py", line 177, in __init__ self._repopulate_pool() File "/home/adminmanu/softwares/miniconda3/lib/python3.7/multiprocessing/pool.py", line 238, in _repopulate_pool self._wrap_exception) File "/home/adminmanu/softwares/miniconda3/lib/python3.7/multiprocessing/pool.py", line 252, in _repopulate_pool_static wrap_exception) File "/home/adminmanu/softwares/miniconda3/lib/python3.7/multiprocessing/process.py", line 74, in __init__ assert group is None, 'group argument must be None for now' AssertionError: group argument must be None for now

This error did not occur when I use the same command without the -p parameters so I can continue with downstream analysis using one core.

I'm using ribotish with python 3.7

Thanks

The meaning and differences in TisType

Dear developer,

Good day. Thank you for your development and maintenance of this software.

I was wondering if you could explain about the definitions of different classes of TisType?

I see in README that TisType refers to the relative position of the TIS to annotated ORF of the transcript.

First, in my results, I got some predictions like 3' UTR, 5'UTR and Extended.

  • Can I understand the class Extended in a way that if an assembled transcript from RiboSeq data is aligned to the annotated CDS region and the transcript is continuous without frameshift and extends outside of the annotated CDS, it is annotated as extended.
  • While the 5'UTR and 3'UTR means that the TIS of a transcript is aligned to these untranslated regions and not assembled into the transcript of the CDS part (or not in the same frame)?

Second, I also got some Internal and Internal:CDSFrameOverlap

  • I see CDSOverlap means the ORF overlaps with annotated CDS in another transcript in the same reading frame.
  • Does Internal mean that a predicted ORF
    • locates within an annotated CDS (both ends locate within the annotated one)
    • is in different frame
  • Does internal:CDSFrameOverlap means a predicted ORF locates within an annotated but in the same frame?

In the end, I am working on a virus genome with a high coding density. What if a predicted ORF, started in the upstream gene's CDS or 3'UTR region and ends in the downstream genes' CDS region in a different frame. What will the TisType be? Is that Novel or 3'UTR?

Thank you very much in advance!!
Ruixuan

run mistakes

Traceback (most recent call last):
File "/gss1/home/zpchen/miniconda3/envs/ribotish/bin/ribotish", line 56, in
main()
File "/gss1/home/zpchen/miniconda3/envs/ribotish/bin/ribotish", line 34, in main
commands[cmd].run(args)
File "/gss1/home/zpchen/miniconda3/envs/ribotish/lib/python2.7/site-packages/ribotish/run/quality.py", line 85, in run
cdsBins = args.bins, numProc = args.numProc, verbose = args.verbose, geneformat = args.geneformat)
File "/gss1/home/zpchen/miniconda3/envs/ribotish/lib/python2.7/site-packages/ribotish/zbio/ribo.py", line 980, in lendis
for result in len_iter:
File "/gss1/home/zpchen/miniconda3/envs/ribotish/lib/python2.7/site-packages/ribotish/zbio/ribo.py", line 942, in _lendis_trans
for r in bam.transReadsIter(bamfile, t, compatible=False, maxNH=maxNH, minMapQ=minMapQ, secondary=secondary, paired=paired, flank=flank):
File "/gss1/home/zpchen/miniconda3/envs/ribotish/lib/python2.7/site-packages/ribotish/zbio/bam.py", line 498, in transReadsIter
for read in rds: #yield read
File "/gss1/home/zpchen/miniconda3/envs/ribotish/lib/python2.7/site-packages/ribotish/zbio/bam.py", line 52, in fetch_reads
if minMapQ is not None and read.mapping_quality < minMapQ : continue
AttributeError: 'csamtools.AlignedRead' object has no attribute 'mapping_quality'

Use of 5' mismatch reads in predict

This is just a question about the implementation of the predict function.
Say I have para.py as
offdict = {27: 11, 'm0': {27: 12}}

But after examination I decide to remove the 5' mismatch reads from the analysis. If I edit para.py to:
offdict = {27: 11}

Is that going to ignore or include the 5' mismatch reads? I don't want to include them, should I do something else?
e.g.
'm0':{}

Thank you!

TypeError: '<' not supported between instances of 'NoneType' and 'int' when running predict module

Dear developer,

I came across this error when running ribotish with the following command:

ribotish predict -p4 --alt --framebest -b sample.bam -g ribotish_test2.gtf -f genome.fa -o ribotish_test2 -v -v

Error message:

Thu Aug  4 19:14:55 2022 Loading genome...
No input TIS data!
Thu Aug  4 19:14:55 2022 Predicting...
10
ENSMMUG00000061626	ENSMMUT00000107115	0	68
Thu Aug  4 19:14:55 2022 ENSMMUG00000061626	ENSMMUT00000107115	SHISAL1	protein_coding	10:7187171-7234333:+	TTG	804	1335	Internal	0	0	None	5.82694120373455e-12	T	5.826941203734552e-12
Traceback (most recent call last):
  File "/home/admin/mambaforge/bin/ribotish", line 56, in <module>
    main()
  File "/home/admin/mambaforge/bin/ribotish", line 34, in main
    commands[cmd].run(args)
  File "/home/admin/mambaforge/lib/python3.9/site-packages/ribotish/run/predict.py", line 250, in run
    cr = interval.cds_region_trans(t)
  File "/home/admin/mambaforge/lib/python3.9/site-packages/ribotish/zbio/interval.py", line 299, in cds_region_trans
    thick.sort()
TypeError: '<' not supported between instances of 'NoneType' and 'int'

This error seems to be caused by incorrect parsing of stop_codon position in src/zbio/gtf.py when the stop codon overlaps with a splice junction and is reprensented by two stop_codon lines in the gtf file:

cs = self.cdna_pos(sc.end5) + 3

And the stop codon position is stored as an interval rather than a list of two intervals:
elif e.type == 'stop_codon': self.stop_codon = e

Here is the full gtf file used above:

10	ensembl	transcript	7172377	7234333	.	+	.	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding";
10	ensembl	exon	7172377	7173079	.	+	.	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "1"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "ENSMMUE00000475544"; exon_version "1";
10	ensembl	CDS	7172377	7173079	.	+	0	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "1"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "ENSMMUP00000071984"; protein_version "1";
10	ensembl	exon	7183086	7183184	.	+	.	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "2"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "ENSMMUE00000523952"; exon_version "1";
10	ensembl	CDS	7183086	7183184	.	+	2	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "2"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "ENSMMUP00000071984"; protein_version "1";
10	ensembl	exon	7187170	7187383	.	+	.	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "3"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "ENSMMUE00000036524"; exon_version "2";
10	ensembl	CDS	7187170	7187383	.	+	2	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "3"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "ENSMMUP00000071984"; protein_version "1";
10	ensembl	exon	7198342	7198659	.	+	.	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "4"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "ENSMMUE00000520747"; exon_version "1";
10	ensembl	CDS	7198342	7198657	.	+	1	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "4"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "ENSMMUP00000071984"; protein_version "1";
10	ensembl	exon	7234333	7234333	.	+	.	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "5"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "ENSMMUE00000508835"; exon_version "1";
10	ensembl	stop_codon	7198658	7198659	.	+	0	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "4"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding";
10	ensembl	stop_codon	7234333	7234333	.	+	1	gene_id "ENSMMUG00000061626"; gene_version "1"; transcript_id "ENSMMUT00000107115"; transcript_version "1"; exon_number "5"; gene_name "SHISAL1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "SHISAL1-205"; transcript_source "ensembl"; transcript_biotype "protein_coding";

Here is a demo of this error:

from ribotish.zbio import io

for g in io.geneIter('ribotish_test2.gtf', 'gtf'):
     print(g)

for t in g.trans:
     print(t)

t.cds_stop(cdna=True)  # = 1337
t.cds_start(cdna=True)  # = 2
t.cdna_length()  # = 1335 (< 1337)

I am not sure how to deal with this issue. Can I get expected results by just deleting all the lines with "stop_codon" in third column of the gtf file?

No reads found!

Hi there,

I am trying to run ribotish on regular riboseq data from our lab.
Unfortunately, this already fails at the quality step. I mapped the reads to their reference (GRCm39) with a special annotation file from RNAcentral containing annotation for ncRNAs. I used the same gtf below.

I ran:

ribotish quality -b file.bam -g mus_musculus.RNAcentral.gtf --th 0.5

The error I get is:

Counted reads: 0
Error: no reads found! Check read length or protein coding annotation.

The first couple gtf entries look the following way:

1	RNAcentral	transcript	3056358	3056384	.	-	.	transcript_id "URS0000253E70_10090.0"; gene_id "URS0000253E70_10090.0";
1	RNAcentral	exon	3056358	3056384	.	-	.	transcript_id "URS0000253E70_10090.0";
1	RNAcentral	transcript	3056360	3056384	.	-	.	transcript_id "URS000042B783_10090.1"; gene_id "URS000042B783_10090.1";

Do you have any idea where this error may be coming from?

Thanks for your help!

Wrong CDS annotation

Hello,

I have ran:

ribotish predict -t TIS_bams_new/R3_ProAligned.sortedByCoord.out.bam -b CHX_bams_new/R3_ProAligned.sortedByCoord.out.bam -g gencode.v19.annotation.gtf -f GRCh37.p13.genome.fa -o pred.txt --framebest

The result is that (It is not finished):

No offset parameter file found for CHX_bams_new/R3_ProAligned.sortedByCoord.out.bam. Using default offset (12).
Thu Sep 22 10:43:03 2022 Estimating TIS background parameters...
Thu Sep 22 12:05:04 2022 Predicting...
Wrong CDS annotation: ENSG00000189409.8 ENST00000472264.1 56 556 556
Wrong CDS annotation: ENSG00000116649.5 ENST00000487300.1 252 611 611
Wrong CDS annotation: ENSG00000219073.3 ENST00000374666.1 3 497 497
Wrong CDS annotation: ENSG00000158062.16 ENST00000374215.1 196 938 938
Wrong CDS annotation: ENSG00000090020.6 ENST00000374084.2 289 695 695
Wrong CDS annotation: ENSG00000142765.13 ENST00000473280.1 83 312 312
Wrong CDS annotation: ENSG00000116497.13 ENST00000482212.1 267 706 706
Wrong CDS annotation: ENSG00000116922.10 ENST00000486637.1 511 852 852
Wrong CDS annotation: ENSG00000066136.15 ENST00000531464.1 344 525 525
Wrong CDS annotation: ENSG00000117385.11 ENST00000372526.2 31 654 654
Wrong CDS annotation: ENSG00000186973.6 ENST00000409396.1 29 448 448
Wrong CDS annotation: ENSG00000079277.15 ENST00000496619.1 202 738 738
Wrong CDS annotation: ENSG00000132122.7 ENST00000371841.1 69 818 818
Wrong CDS annotation: ENSG00000085831.11 ENST00000411642.2 92 931 931
Wrong CDS annotation: ENSG00000134744.9 ENST00000484723.2 191 2910 2910
Wrong CDS annotation: ENSG00000116212.10 ENST00000371368.1 261 838 838
Wrong CDS annotation: ENSG00000125703.10 ENST00000371118.1 96 665 665
Wrong CDS annotation: ENSG00000177414.9 ENST00000371077.5 424 1193 1193
Wrong CDS annotation: ENSG00000116791.9 ENST00000370870.1 158 888 888
Wrong CDS annotation: ENSG00000137944.12 ENST00000370486.1 232 1019 1019
Wrong CDS annotation: ENSG00000184371.9 ENST00000357302.4 239 597 597
Wrong CDS annotation: ENSG00000007341.14 ENST00000369664.1 174 862 862
Wrong CDS annotation: ENSG00000134262.8 ENST00000369564.1 336 1225 1225
Wrong CDS annotation: ENSG00000163349.17 ENST00000503968.1 251 570 570
Wrong CDS annotation: ENSG00000163399.11 ENST00000369494.1 264 698 698
Wrong CDS annotation: ENSG00000143452.11 ENST00000368987.1 219 791 791
Wrong CDS annotation: ENSG00000143442.17 ENST00000533351.1 167 586 586
Wrong CDS annotation: ENSG00000143569.14 ENST00000368504.1 69 1120 1120
Wrong CDS annotation: ENSG00000132676.11 ENST00000471214.1 388 1287 1287
Wrong CDS annotation: ENSG00000143320.4 ENST00000368220.1 208 456 456
Wrong CDS annotation: ENSG00000249730.1 ENST00000504970.1 0 935 935
Wrong CDS annotation: ENSG00000116191.13 ENST00000324778.5 107 906 906
Wrong CDS annotation: ENSG00000162779.16 ENST00000509175.1 310 765 765
Wrong CDS annotation: ENSG00000135837.11 ENST00000357434.2 0 392 392
Wrong CDS annotation: ENSG00000116747.8 ENST00000506303.1 488 613 613
Wrong CDS annotation: ENSG00000081237.14 ENST00000367379.1 71 517 517
Wrong CDS annotation: ENSG00000117153.11 ENST00000367258.1 87 1033 1033
Wrong CDS annotation: ENSG00000143842.10 ENST00000525442.1 365 538 538
Wrong CDS annotation: ENSG00000117625.9 ENST00000533469.1 91 540 540
Wrong CDS annotation: ENSG00000162931.7 ENST00000479800.1 123 925 925
Wrong CDS annotation: ENSG00000086619.9 ENST00000366589.1 390 605 605
Wrong CDS annotation: ENSG00000116977.14 ENST00000481485.1 480 799 799
Wrong CDS annotation: ENSG00000172059.6 ENST00000401510.1 232 614 614
Wrong CDS annotation: ENSG00000138074.10 ENST00000401463.1 290 732 732
Wrong CDS annotation: ENSG00000189350.8 ENST00000401723.1 233 720 720
Wrong CDS annotation: ENSG00000055332.12 ENST00000390013.3 257 564 564
Wrong CDS annotation: ENSG00000115828.11 ENST00000404976.1 138 992 992
Wrong CDS annotation: ENSG00000162994.11 ENST00000403506.1 290 430 430
....

I don't know why it takes so long and why that message. I think which is due to annotation file, but then what file I have to use?

thank you so much!!

ribotish predict module run error

Hellow
when I run ribotish predict using stringtie assembly gtf which include new transcript info, it return this error
image
after removing new transcript it run successfully. How can I solve it?

This is my gtf
image

Error when running ribotish tisdiff

Hi,

I've been trying to run the whole pipeline included in Ribo-seq TIS Hunter(ribotish quality, ribotish predict and ribotish tisdiff ) with four different replicates of RiboSeq data. I've had no issues so far with the first two functions, but when running tisdiff

ribotish tisdiff -1 Sample_6/test6.txt -2 Sample_7/test7.txt -a Sample_6/sample6_sort.bam -b Sample_7/sample6_sort.bam -g //index/genome.gtf -o diff2.txt -v

I've retrieved the following errors
Traceback (most recent call last): File "/home/user/RiboSeq/project/bin/ribotish", line 14, in <module> import os, sys, argparse, time File "/home/user/RiboSeq/project/bin/ribotish", line 35, in main commands[cmd].run(args) File "/home/user/RiboSeq/project/local/lib/python2.7/site-packages/ribotish/run/tisdiff.py", line 229, in run for result in pred_iter: File "/home/user/RiboSeq/project/local/lib/python2.7/site-packages/ribotish/run/tisdiff.py", line 375, in _get_tis ttis = ribo.multiRibo(t, bamlist[i], offlist[i], compatible = compatible, mis = compatiblemis, paired = paired) File "/home/user/RiboSeq/project/local/lib/python2.7/site-packages/ribotish/zbio/ribo.py", line 489, in multiRibo mribo.merge(Ribo(trans, bamfiles[i], offset = offset, offdict = offdict[i], compatible = compatible, mis = mis, paired = paired)) File "/home/user/RiboSeq/project/local/lib/python2.7/site-packages/ribotish/zbio/ribo.py", line 71, in __init__ off = offset(r, offdict) TypeError: 'list' object is not callable

Also, as I ran it in verbose mode, I retrieved the following printed- on- screen messages

No offset parameter file found for Sample_6/sample1_sort.bam. Using default offset (12). No offset parameter file found for Sample_7/sample1_sort.bam. Using default offset (12). Loading 2 TIS data... 7 TISs in Sample_6/test6.txt. 23 TISs in Sample_7/test7.txt. 24 TISs in total. Reading bams...

Could it be the error due to the fact of having a different number of predicted TIS in both Sample_6/test6.txt and Sample_7/test7.txt ? I've tried to filter both files so only the intersecting common predicted TIS are fed as an input to ribotish tisdiff, but still getting the same errors.

Would you have any idea of how to fix this issue?

The prediction files( Sample_6/test6.txt and Sample_7/test7.txt) were generated by running
ribotish predict -t /home/user/Sample_7/sample7_sort.bam -g //index/genome.gtf -f //index/genome.fa --tispara 12 --altcodons CTG,TTG --harr -p 78 -o test7.txt

Thank you very much!

Feature Request - Add 3' end match for RPFs

Hi. This is a great tool!

The prokaryotic community could benefit more of it if a feature doing the same for 3' end of RPFs could be added. Seems like the 3-nt periodicity for model organisms like E. coli is easier to notice when using 3' end profile of RPFs.

Thanks!

Ribotish on prokaryotic ribosome profiling

Hello,

I carried out ribosome profiling TI-seq on a prokaryotic organism and I am interested in seeing if I could use ribotish. I have first tried executing this command:
ribotish quality -l 10,40 -b /Volumes/Diego_2TB/ribosome_profiling_final_libs/libraries/H98_WT_ctrl/reads/density/filtering_bowtie/alignments/chr/H98_WT_ctrl.bam -g /Users/DRG/Downloads/GCF_000025685.1_ASM2568v1_genomic_update_080119.gff --geneformat gff

but then I get an error which is:
Counted reads: 0 Error: no reads found! Check read length or protein coding annotation.

I am having a hard time troubleshooting what could be the problem. From my own analysis, the peak footprint length is 27 nt, but there is a range of footprint lengths since this is MNase-treated (for a metagene plot of density periodicity and heatmap of length distribution see here. Is the problem the footprint length or is the gff annotation the problem? Or something else entirely?

Some extra details:

  1. The annotation is from NCBI
  2. The bam file was aligned with bowtie, it is single-end, and it has been coordinate sorted.

Any help/insight on what the problem could be is greatly appreciated.

Thank you.

AttributeError: 'Exp' object has no attribute 'sq'

Hi,

I am trying to run ribotish on human samples with older genome and annotation (gtf) from 2014.

This is the command I ran:

ribotish predict \
-b file.bam \
-g Homo_sapiens.GRCh37.75_without_bs.gtf \
-f Homo_sapiens.GRCh37.75.dna.primary_assembly.fa \
--ribopara SRR1333394.sorted_indexed.bam.para.py \
--aaseq --tpth 1 --fpth 1 --fspth 1 --fsqth 1 --longest -v \
-o ribotish_pred.txt

This is the error:

Thu Nov 24 10:02:01 2022 Loading genome...
Making fasta index for Homo_sapiens.GRCh37.75.dna.primary_assembly.fa...
No input TIS data!
Thu Nov 24 10:02:42 2022 Predicting...
1
Traceback (most recent call last):
  File "/usr/local/bin/ribotish", line 56, in <module>
    main()
  File "/usr/local/bin/ribotish", line 34, in main
    commands[cmd].run(args)
  File "/usr/local/lib/python3.8/dist-packages/ribotish/run/predict.py", line 236, in run
    for result in pred_iter:
  File "/usr/local/lib/python3.8/dist-packages/ribotish/run/predict.py", line 469, in _pred_gene
    e = getResult(t, tis, o.stop, cds1, cds2, tsq, [ip, ttis.cnts[tis], tps[i], rps[i], rst[i], fsp], o.has_stop_codon)
  File "/usr/local/lib/python3.8/dist-packages/ribotish/run/predict.py", line 499, in getResult
    if aaseq : e.aa = orf.translate(e.sq)
AttributeError: 'Exp' object has no attribute 'sq'

Can you help me on this one?

TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'

Hi Zhang lab,

I have run quality.py successfully on all of my bam files, but when I try to run predict.py using the same reference, I get this error:

Command:

predict.py -t Ngn2_H2_lncRNAKB_unique.bam -b Ngn2_D2_lncRNAKB_unique.bam -g lncRNAKB_hg38_v7.gtf -f GRCh38.primary_assembly.genome.fa -o pred_rep1.txt

Output:

Mon Feb 28 13:02:03 2022 Estimating TIS background parameters...
Mon Feb 28 13:23:06 2022 Predicting...
Traceback (most recent call last):
  File "/home/eed13/env/lib/python2.7/site-packages/ribotish/run/predict.py", line 564, in <module>
    run(p.parse_args())
  File "/home/eed13/env/lib/python2.7/site-packages/ribotish/run/predict.py", line 236, in run
    for result in pred_iter:
  File "/home/eed13/env/lib/python2.7/site-packages/ribotish/run/predict.py", line 394, in _pred_gene
    ttis = ribo.Ribo(t, bamload = tismbl, compatible = compatible, mis = compatiblemis)
  File "/home/eed13/env/lib/python2.7/site-packages/ribotish/zbio/ribo.py", line 62, in __init__
    self.cnts = bamload.transCounts(trans, compatible = compatible, mis = mis)
  File "/home/eed13/env/lib/python2.7/site-packages/ribotish/zbio/bam.py", line 629, in transCounts
    i += 1
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'

I am running this command on indexed bam files located in the same folder as the output from the quality.py script.

Encountered error when using ribotish predict

Dear developer, I am using ribotish to predict ORF from Ribo-seq data, but I am getting some errors while running ribotish predict, here is my code:

INPUT_DIR="/scratch/lb4489/project/liver_ribo/mapping/ribotish/LC001_normal_RPFAligned.sortedByCoord.out.bam,/scratch/lb4489/project/liver_ribo/mapping/ribotish/LC001_tumor_RPFAligned.sortedByCoord.out.bam"

ribotish predict -b "$FILES" -g /scratch/lb4489/bioindex/gencode_human_plus_selNONCODE.gtf -f /scratch/lb4489/bioindex/GRCh38.p14.genome.fa --framebest -o ribotish_demo.txt

Here is the error message

No offset parameter file found for . Using default offset (12). 
No input TIS data!
Fri Mar  1 12:35:28 2024 Predicting...
Traceback (most recent call last):
  File "/home/lb4489/.conda/envs/ribotish/bin/ribotish", line 56, in <module>
    main()
  File "/home/lb4489/.conda/envs/ribotish/bin/ribotish", line 34, in main
    commands[cmd].run(args)
  File "/home/lb4489/.conda/envs/ribotish/lib/python2.7/site-packages/ribotish/run/predict.py", line 236, in run
    for result in pred_iter:
  File "/home/lb4489/.conda/envs/ribotish/lib/python2.7/site-packages/ribotish/run/predict.py", line 375, in _pred_gene
    ribombl = ribo.multiRiboGene(g, ribobampaths, offdict = riboffdict, compatible = compatible, mis = compatiblemis, paired = paired)
  File "/home/lb4489/.conda/envs/ribotish/lib/python2.7/site-packages/ribotish/zbio/ribo.py", line 504, in multiRiboGene
    mbl.merge(bam.BamLoadChr(bampaths[i], chr = gene.chr, region = regions[gene.chr], strand = gene.strand, posFunc = offunc, maxNH = maxNH, minMapQ = minMapQ, secondary = secondary, paired = paired))
  File "/home/lb4489/.conda/envs/ribotish/lib/python2.7/site-packages/ribotish/zbio/bam.py", line 583, in __init__
    bamfile = Bamfile(bampath)
  File "pysam/calignmentfile.pyx", line 318, in pysam.calignmentfile.AlignmentFile.__cinit__ (pysam/calignmentfile.c:4730)
  File "pysam/calignmentfile.pyx", line 388, in pysam.calignmentfile.AlignmentFile._open (pysam/calignmentfile.c:5652)
  File "pysam/calignmentfile.pyx", line 534, in pysam.calignmentfile.AlignmentFile._open (pysam/calignmentfile.c:7261)
IOError: file `` not found

Firstly, I generated the offset parameter file in the bam file location and also tried copying the offset parameter file to the current folder. But ribotish doesn't even seem to recognise it automatically, what should I do.
Secondly I don't quite understand how to fix the error reported, I have installed pysam version 0.8.4 using pip, can you help me see what the problem is.

Thanks,
LeeLee

ribotish quality: error: argument -d: expected one argument

Hi, when I wanted to use option -d to define the position range near start codon or stop codon, I met such an error:

ribotish quality: error: argument -d: expected one argument

My command line is:

ribotish quality -b D0_1.unique_mapped.sorted.bam -g Mus_musculus.GRCm39.103.chr.gtf -d -20,60

Would you please show me how to customize the position range?

Thanks a lot!

"wrong exon strand" and "stop codon error"

Hi

I just started working on riboseq data, and came across ribo-tish which seems really useful for QC-ing. I have aligned my reads using STAR as per the guidelines and using the gtf annotation file.

Now I started running the command on one of my bam files, and I have a couple of questions.

  1. how long time should it take? The bam file is 300mb, and ribotish has been running for almost an hour, so I wonder if something is wrong.

  2. I get multiple "Wrong exon strand" messages, as well as "stop codon error". I have tried to look for a description of these errors, but I cannot find any. What does it mean, and is it a problem?

Cheers
Sjannie

TypeError: '>=' not supported between instances of 'NoneType' and 'float'

Hi,
I ran the following command and it got through the background estimation and part of the prediction step before ending on the TypeError as seen below.

Also, my GTF is produced by cuffmerge, which only determines exons, I then inserted the CDS's from the Gencode GTF. It throws the warnings as seen below, but ran without issue when I did not include the TIS information for the predict function, so I don't think that's the problem?

ribotish predict -b ./merge.bam -g ./merged_nanopore_Gen25_primary_with_CDS.gtf \
 -t ./LTM.bam --longest --tispara ./LTM.out.bam.para.py \
-e ./LTM_TIbackround.txt -f ./GRCm38.primary_assembly.genome.fa --geneformat gtf \
--ribopara ./riboquality_merge.bam.para.py --alt --minaalen 15 --seq -v

No offset parameter file found for LTM.bam. Using default offset (12).
Sat Sep 25 08:00:11 2021 Loading genome...
Sat Sep 25 08:00:11 2021 Estimating TIS background parameters...
TIS background estimation result will be saved to LTM_TIbackround.txt
...
Group data...
[7.185980623969726, 8.281739445973116, 9.203102783507383, 10.172378067164596, 11.06840648061946, 12.107621280053845, 13.12517452951367, 15.740720641276173, 15.740720641276173, None]
Estimate NB parameters...
Sun Sep 26 04:30:41 2021 Predicting...
...
chr12
Wrong CDS annotation: XLOC_006674 TCONS_00026339 214 1028 1666
Wrong CDS annotation: XLOC_006952 TCONS_00027793 305 2682 3539
Wrong CDS annotation: XLOC_007091 TCONS_00028533 553 1535 2076
Traceback (most recent call last):
  File "/public/home/emalekos/miniconda3/bin/ribotish", line 56, in <module>
    main()
  File "/public/home/emalekos/miniconda3/bin/ribotish", line 34, in main
    commands[cmd].run(args)
  File "/public/home/emalekos/miniconda3/lib/python3.9/site-packages/ribotish/run/predict.py", line 236, in run
    for result in pred_iter:
  File "/public/home/emalekos/miniconda3/lib/python3.9/site-packages/ribotish/run/predict.py", line 398, in _pred_gene
    if score is not None: ip = ribo.pidx(score, slp)
  File "/public/home/emalekos/miniconda3/lib/python3.9/site-packages/ribotish/zbio/ribo.py", line 403, in pidx
    if s >= value : break
TypeError: '>=' not supported between instances of 'NoneType' and 'float'

--altcodons ACG,CUG

Hi, I want to change the start codon to CUG.However I use the command , I get the predict results which start codons only contain AUG and ACG, I did not get the CUG start codon result ,how I set the parameter to the the CUG start codon result ?

ribotish predict -b test_1.sort.bam,test_2.sort.bam -p 8 --altcodons ACG,CUG --framebest --inframecount -g test.gtf -f test.fna -o predict_output.txt

what the types of TisType mean?

What the types of TisType mean?When I use the ribotish predict ,I get the many types of TisType,what dose every TisType
means ?
TisType

Annotated
Truncated
Novel
Internal:CDSFrameOverlap
Internal
3'UTR
5'UTR
5'UTR:CDSFrameOverlap
Extended
3'UTR:Known
3'UTR:CDSFrameOverlap
Novel:CDSFrameOverlap
Truncated:Known

ribotish tisdiff error?ZeroDivisionError: integer division or modulo by zero

hi,when I use the ribotish tisdiff ,some error information as follows:
my command :
ribotish tisdiff -1 pred1.txt -2 pred2.txt -a v1.sort.bam -b vi2.sort.bam -g test.gtf -o diff -v
error informations:

Loading 2 TIS data...
1 TISs in pred1.txt.
1 TISs in pred2.txt.
1 TISs in total.
Reading bams...
JN
Estimate scale factor...
Traceback (most recent call last):
File "/usr/bin/ribotish", line 4, in
import('pkg_resources').run_script('ribotish==0.2.6', 'ribotish')
File "/usr/lib/python2.7/site-packages/pkg_resources/init.py", line 654, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.7/site-packages/pkg_resources/init.py", line 1441, in run_script
exec(script_code, namespace, namespace)
File "/usr/lib/python2.7/site-packages/ribotish-0.2.6-py2.7.egg/EGG-INFO/scripts/ribotish", line 56, in

File "/usr/lib/python2.7/site-packages/ribotish-0.2.6-py2.7.egg/EGG-INFO/scripts/ribotish", line 34, in main

File "build/bdist.linux-x86_64/egg/ribotish/run/tisdiff.py", line 254, in run
File "build/bdist.linux-x86_64/egg/ribotish/zbio/exp.py", line 207, in TMM
ZeroDivisionError: integer division or modulo by zero

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.