xryanglab / ribocode Goto Github PK

release version

License: MIT License

Python 99.35% Shell 0.65%

bioinformatics ribosome-profiling orfs peptides

ribocode's Introduction

Detect translated ORFs using ribosome-profiling data

RiboCode is a very simple but high-quality computational algorithm to identify genome-wide translated ORFs using ribosome-profiling data.

Dependencies:

pysam
pyfasta
h5py
Biopython
Numpy
Scipy
statsmodels
matplotlib
HTSeq
minepy

Installation

RiboCode can be installed like any other Python packages. Here are some popular ways:

Install via pypi:

pip install ribocode

Install via conda:

conda install -c bioconda ribocode

Install from source:

git clone https://www.github.com/xzt41/RiboCode
cd RiboCode
python setup.py install

Install from local:

pip install RiboCode-*.tar.gz

If you have not administrator permission, you need to install RiboCode locally in you own directory by adding the option --user in the above command. Then, you need to define ~/.local/bin/ in PATH variable, and ~/.local/lib/ in PYTHONPATH variable. For example, if you are using the bash shell, you should add the following lines to your ~/.bashrc file:

export PATH=$PATH:$HOME/.local/bin/
export PYTHONPATH=$HOME/.local/lib/python2.7

then, source your ~/.bashrc file using this command:

source ~/.bashrc

Users can also update or uninstall package through one of the following commands:

pip install --upgrade RiboCode # upgrade
pip uninstall RiboCode # uninstall
conda update -c bioconda ribocode # upgrade
conda remove ribocode # uninstall

Tutorial to analyze ribosome-profiling data and run RiboCode

Here, we use the HEK293 dataset as an example to illustrate the use of RiboCode and demonstrate typical workflow. Please make sure the path and file name are correct.

Required files

The genome FASTA file, GTF file for annotation can be downloaded from:

http://www.gencodegenes.org

or from:

http://asia.ensembl.org/info/data/ftp/index.html

http://useast.ensembl.org/info/data/ftp/index.html

For example, the required files in this tutorial can be downloaded from following URL:

GTF: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz

FASTA: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/GRCh37.p13.genome.fa.gz

The GTF file required by RiboCode should include three-level hierarchy annotations: genes,transcripts and exons. Some GTF files may lack the gene and transcript annotations, users can added these annotations using the "GTFupdate" command in RiboCode. Please refer to GTF_update.rst for more information.

The raw Ribo-seq FASTQ file can be downloaded using fastq-dump tool from SRA_Toolkit:
```
fastq-dump -A <SRR1630831>
```
Trimming adapter sequence for ribo-seq data

Using cutadapt program https://cutadapt.readthedocs.io/en/stable/installation.html

Example:
```
cutadapt -m 20 --match-read-wildcards -a (Adapter sequence) -o <Trimmed fastq file> <Input fastq file>
```
Here, the adapter sequences for this data had already been trimmed off, so we can skip this step.
Removing ribosomal RNA(rRNA) derived reads

Removing rRNA contamination by aligning the trimmed reads to rRNA sequences using Bowtie, then keeping the unaligned reads for the next step.

rRNA sequences are provided in rRNA.fa file.

Example:
```
bowtie-build <rRNA.fa> rRNA
bowtie -p 8 -norc --un <un_aligned.fastq> -q <SRR1630831.fastq> rRNA <HEK293_rRNA.align>
```

Aligning the clean reads to reference genome

Using STAR program: https://github.com/alexdobin/STAR

Example:

(1). Build index

STAR --runThreadN 8 --runMode genomeGenerate --genomeDir <hg19_STARindex>
--genomeFastaFiles <hg19_genome.fa> --sjdbGTFfile <gencode.v19.annotation.gtf>

(2). Alignment:

STAR --outFilterType BySJout --runThreadN 8 --outFilterMismatchNmax 2 --genomeDir <hg19_STARindex>
--readFilesIn <un_aligned.fastq>  --outFileNamePrefix <HEK293> --outSAMtype BAM
SortedByCoordinate --quantMode TranscriptomeSAM GeneCounts --outFilterMultimapNmax 1
--outFilterMatchNmin 16 --alignEndsType EndToEnd

Running RiboCode to identify translated ORFs

(1). Preparing the transcripts annotation files:
```
prepare_transcripts -g <gencode.v19.annotation.gtf> -f <hg19_genome.fa> -o <RiboCode_annot>
```
The RiboCode_annot folder is necessary for the following steps, so its location should be properly given if author moved it or changed the working directory.

(2). Selecting the length range of the RPF reads and identify the P-site locations:
```
metaplots -a <RiboCode_annot> -r <HEK293Aligned.toTranscriptome.out.bam>
```
This step will generate two files: a PDF file plots the aggregate profiles of the distance from the 5'-end of reads to the annotated start codons (or stop codons), which is used for examining the P-site periodicity of RPF reads on CDS regions. The P-site config file, which defines the read lengths with strong 3-nt periodicity and the associated P-site locations for each length. In some cases, user may have multiple bam files to predict ORFs together in next step, they can use "-i" argument to specify a text file which contains the names of these bam files ( one file per line)

(3). Detecting translated ORFs using the ribosome-profiling data:
```
RiboCode -a <RiboCode_annot> -c <config.txt> -l no -g -o <RiboCode_ORFs_result>
```
Using the config file generated by last step to specify the information of the bam file and P-site parameters, please refer to the example file config.txt in data folder. The "gtf" or "bed" format file of predicted ORFs can be obtained by adding the "-g" or "-b" argument to this command.

Explanation of final result files

The RiboCode generates two text files: The "(output file name).txt" contains the information of all predicted ORFs in each transcript. The "(output file name)_collapsed.txt" file combines the ORFs having the same stop codon in different transcript isoforms: the one harboring the most upstream in-frame ATG will be kept.

Some column names of the result file:
```
- ORF_ID: The identifier of predicated ORF.
- ORF_type: The type of predicted ORF, which is annotated according to its location to associated CDS. The following ORF categories are reported:

 "annotated" (overlapping with annotated CDS, have the same stop codon with annotated CDS)

 "uORF" (upstream of annotated CDS, not overlapping with annotated CDS)

 "dORF" (downstream of annotated CDS, not overlapping with annotated CDS)

 "Overlap_uORF" (upstream of annotated CDS and overlapping annotated with CDS)

 "Overlap_dORF" (downstream of annotated CDS and overlapping annotated CDS"

 "Internal" (internal ORF of annotated CDS, but in a different reading frame)

 "novel" (from non-coding genes or non-coding transcripts of the coding genes).

- alt_ORF_type: only shown in "_collapsed.txt" file for reporting alternative annotations of each ORF based on its relative location in those transcripts other than the longest one
- ORF_tstart, ORF_tstop: the start and end position of ORF relative to its transcript (1-based coordinate)
- ORF_gstart, ORF_gstop: the start and end position of ORF in the genome (1-based coordinate)
- pval_frame0_vs_frame1: significance levels of P-site densities of frame0 greater than of frame1
- pval_frame0_vs_frame2: significance levels of P-site densities of frame0 greater than of frame2
- pval_combined: integrated P-value by combining pval_frame0_vs_frame1 and pval_frame0_vs_frame2
- adjusted_pval: adjusted p-value for multiple testing correction.
```
All above three steps can also be easily run by a single command "RiboCode_onestep":
```
RiboCode_onestep -g <gencode.v19.annotation.gtf> -f <hg19_genome.fa> -r <HEK293Aligned.toTranscriptome.out.bam>
                 -l no -o <RiboCode_ORFs_result>
```
(4). (optional) Plotting the P-sites densities of predicted ORFs

Using the "plot_orf_density" command, for example:
```
plot_orf_density -a <RiboCode_annot> -c <config.txt> -t (transcript_id)
-s (ORF_gstart) -e (ORF_gstop)
```
The generated PDF plots can be edited by Adobe Illustrator.

(5). (optional) Counting the number of RPF reads aligned to ORFs

The number of reads aligned on each ORF can be counted by the "ORFcount" command which will call the HTSeq-count program. Only the reads of a given length will be counted. For those ORF with length longer than a specified value (set by "-e"), the RPF reads located in first few and last few codons can be excluded by adjusting the parameters "-f" and "-l". For example, the reads with length between 26-34 nt aligned on predicted ORF can be obtained by using below command:
```
ORFcount -g <RiboCode_ORFs_result.gtf> -r <ribo-seq genomic mapping file> -f 15 -l 5 -e 100 -m 26 -M 34 -o <ORF.counts>
```
The reads aligned to first 15 codons and last 5 codons of ORFs and had the length longer than 100 nt will be excluded. The "RiboCode_ORFs_result.gtf" file can be generated by RiboCode command. The "ribo-seq genomic mapping file" is the genome-wide mapping file produced by STAR mapping.

Recipes (FAQ):

I have a BAM/SAM file aligned to genome, how do I convert it to transcriptome-based mapping file ?

You can use STAR aligner to generate the transcriptome-based alignment file by specifying the "--quantMode TranscriptomeSAM" parameters, or use the "sam-xlate" command from UNC Bioinformatics Utilities .
How to use multiple BAM/SAM files to identify ORFs?

You can select the read lengths which show strong 3-nt periodicity and the corresponding P-site locations for each BAM/SAM file, then list each file and their information in config.txt file. RiboCode will combine the P-site densities at each nucleotides of these BAM/SAM files together to predict ORFs.
Generating figures with matplotlib when DISPLAY variable is undefined or invalid

When running the "metaplots" or "plot_orf_density" command, some users received errors similar to the following:

raise RuntimeError('Invalid DISPLAY variable')

_tkinter.TclError: no display name and no $DISPLAY environment variable

The main problem is that default backend of matplotlib is unavailable. The solution is to modify the backend in matplotlibrc file. A very simple solution is to set the MPLBACKEND environment variable, either for your current shell or for a single script:
```
export MPLBACKEND="module:Agg"
```
Giving below are non-interactive backends, capable of writing to a file:

Agg PS PDF SVG Cairo GDK

See also:

http://matplotlib.org/faq/usage_faq.html#what-is-a-backend

http://matplotlib.org/users/customizing.html#the-matplotlibrc-file

http://stackoverflow.com/questions/2801882/generating-a-png-with-matplotlib-when-display-is-undefined

For any questions, please contact:

Xuerui Yang (yangxuerui[at]tsinghua.edu.cn); Zhengtao Xiao (zhengtao.xiao[at]xjtu.edu.cn)

ribocode's People

Contributors

Stargazers

Watchers

Forkers

zhengtaoxiao yf0205 sunqiangzai zhaoyuehrb marcasriv kennyluo4 sitaoz silhouetteq sherkinglee huguanjing zachery001

ribocode's Issues

h5py TypeError: No conversion path for dtype: dtype('<U18')

RiboCode -a RiboCode_annot -c metaplots_pre_config.txt -l no -g -o RiboCode_ORFs_result

Loading transcripts.pickle ...
Reading bam file: /n/jobspace/bbcore/schaffer_rpf_abby_min6/two_batches/rpf/alignments_transcriptome/20210125_Index_3_JS8600_S11_R1_001_Aligned.toTranscriptome.out.bam......
Traceback (most recent call last):
File "/home/panh/miniconda3/envs/ribocode/bin/RiboCode", line 10, in
sys.exit(main())
File "/home/panh/miniconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/RiboCode.py", line 40, in main
tpsites_sum, total_psites_number = process_bam.psites_count(configIn.configList,transcript_dict,thread_num=1)
File "/home/panh/miniconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/process_bam.py", line 118, in psites_count
tpsites,psites_number = read_bam(configData)
File "/home/panh/miniconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/process_bam.py", line 80, in read_bam
write_psites(tpsites,psites_number, name + "_psites.hd5")
File "/home/panh/miniconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/process_bam.py", line 21, in write_psites
fout.create_dataset("transcript_ids",data=list(tpsites.keys()),dtype=ds)
File "/home/panh/miniconda3/envs/ribocode/lib/python3.7/site-packages/h5py/_hl/group.py", line 136, in create_dataset
dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
File "/home/panh/miniconda3/envs/ribocode/lib/python3.7/site-packages/h5py/_hl/dataset.py", line 170, in make_new_dset
dset_id.write(h5s.ALL, h5s.ALL, data)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5d.pyx", line 212, in h5py.h5d.DatasetID.write
File "h5py/h5t.pyx", line 1654, in h5py.h5t.py_create
File "h5py/h5t.pyx", line 1715, in h5py.h5t.py_create
TypeError: No conversion path for dtype: dtype('<U18')

prepare_transcripts failed

Traceback (most recent call last):
File "/home/dr/anaconda2/bin/prepare_transcripts", line 10, in
sys.exit(main())
File "/home/dr/anaconda2/lib/python2.7/site-packages/RiboCode/prepare_transcripts.py", line 388, in main
processTranscripts(args.genomeFasta,args.gtfFile,args.out_dir)
File "/home/dr/anaconda2/lib/python2.7/site-packages/RiboCode/prepare_transcripts.py", line 308, in processTranscripts
gene_dict,transcript_dict = readGTF(gtfFile)
File "/home/dr/anaconda2/lib/python2.7/site-packages/RiboCode/prepare_transcripts.py", line 187, in readGTF
gene => transcript => exon (or CDS)" % i)
RiboCode.prepare_transcripts.ParsingError: Error in line 0. The annotation in GTF file should be three-level hierarchy of gene => transcript => exon (or CDS)

I am getting this error and I am new to bioinformatics kindly resolve....

Read counts issues

Is this package only counts the unique mapped reads in the setp "ORFcount"？

can't locate attribute: psites_number

I'm trying to debug a workflow that runs RiboCode. I tried using the latest container from quay.io and this is the error I'm running into. Any guidance on what I should look into first?

Loading transcripts.pickle ...
Loading Psites from xxxxx.transcriptome.dedup......
_psites.hd5Traceback (most recent call last):
File "/usr/local/bin/RiboCode", line 10, in
sys.exit(main())
File "/usr/local/lib/python3.9/site-packages/RiboCode/RiboCode.py", line 40, in main
tpsites_sum, total_psites_number = process_bam.psites_count(configIn.configList,transcript_dict,thread_num=1)
File "/usr/local/lib/python3.9/site-packages/RiboCode/process_bam.py", line 107, in psites_count
tpsites_sum,total_psites_number = read_bam(configList[0])
File "/usr/local/lib/python3.9/site-packages/RiboCode/process_bam.py", line 45, in read_bam
tpsites,psites_number = load_psites(name + "_psites.hd5" )
File "/usr/local/lib/python3.9/site-packages/RiboCode/process_bam.py", line 31, in load_psites
psites_number = fin.attrs["psites_number"]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/usr/local/lib/python3.9/site-packages/h5py/_hl/attrs.py", line 56, in getitem
attr = h5a.open(self._id, self._e(name))
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5a.pyx", line 80, in h5py.h5a.open
KeyError: "Can't open attribute (can't locate attribute: 'psites_number')"

Error in GTFupdate

Hi, I want to use a special gtf file and want to process these files through GTFupdate, but I find that there will be some errors.
The following is the content of my gtf file：

chr1	Cufflinks	exon	11872	12227	.	+	.	gene_id "NONHSAG000001.2";gene_name "NONHSAG000001.2";transcript_id "NONHSAT000002.2"
chr1	Cufflinks	exon	12613	12721	.	+	.	gene_id "NONHSAG000001.2";gene_name "NONHSAG000001.2";transcript_id "NONHSAT000002.2"
chr1	Cufflinks	exon	13225	14412	.	+	.	gene_id "NONHSAG000001.2";gene_name "NONHSAG000001.2";transcript_id "NONHSAT000002.2"
chr1	Cufflinks	exon	11874	12227	.	+	.	gene_id "NONHSAG000001.2";gene_name "NONHSAG000001.2";transcript_id "NONHSAT000003.2"
chr1	Cufflinks	exon	12595	12721	.	+	.	gene_id "NONHSAG000001.2";gene_name "NONHSAG000001.2";transcript_id "NONHSAT000003.2"
chr1	Cufflinks	exon	13403	13655	.	+	.	gene_id "NONHSAG000001.2";gene_name "NONHSAG000001.2";transcript_id "NONHSAT000003.2"
chr1	Cufflinks	exon	13661	14409	.	+	.	gene_id "NONHSAG000001.2";gene_name "NONHSAG000001.2";transcript_id "NONHSAT000003.2"

I modified it with reference to the format of GTF_update.rst, but there are still the following errors;

Traceback (most recent call last):
  File "/home/leelee/miniconda3/envs/p3/bin/GTFupdate", line 10, in <module>
    sys.exit(main())
  File "/home/leelee/miniconda3/envs/p3/lib/python3.7/site-packages/RiboCode/GTF_update.py", line 117, in main
    gset,sourted_gset_keys = GroupGeneSubsets(args.gtfFile)
  File "/home/leelee/miniconda3/envs/p3/lib/python3.7/site-packages/RiboCode/GTF_update.py", line 34, in GroupGeneSubsets
    gid=field_dict["attr"]["gene_id"]
KeyError: 'gene_id'

How can i solve this problem？
Thanks,
LeeLee

No obviously periodicity are detected from bam file when running metaplots step

When I process metaplots, the log printed that:

No obviously periodicity are detected from bam file SRR22462570/SRR22462570.Aligned.toTranscriptome.out.bam,
it could be due to poor quality sequencing.
Please check the metagene plots and try again by lowering the value of frame0_percent

Even thought used the following parameters
metaplots -a $genecode.v44.ribo.anno -r $sample.Aligned.toTranscriptome.out.bam -o $sample.ribocode -f0_percent 0.01
The log still reported the same contents, which causes RiboCode to interrput

Empty "transcripts_cds.txt" file

Hi, the "transcripts_cds.txt" file generated is empty, how should I fix it? Thanks!

Here is GTF

A01	phytozomev13	gene	34694907	34695456	.	-	.	gene_id "Gohir.A01G126666.v2.1"; gene_name "Gohir.A01G126666.v2.1";
A01	phytozomev13	transcript	34694907	34695456	.	-	.	gene_id "Gohir.A01G126666.v2.1"; gene_name "Gohir.A01G126666.v2.1"; transcript_id "Gohir.A01G126666.1.v2.1";
A01	phytozomev13	exon	34694907	34695456	0	-	.	gene_id "Gohir.A01G126666.v2.1"; transcript_id "Gohir.A01G126666.1.v2.1"; Name "Gohir.A01G126666";
A01	phytozomev13	gene	119528395	119531897	.	+	.	gene_id "Gohir.A01G229700.v2.1"; gene_name "Gohir.A01G229700.v2.1";
A01	phytozomev13	transcript	119528395	119531897	.	+	.	gene_id "Gohir.A01G229700.v2.1"; gene_name "Gohir.A01G229700.v2.1"; transcript_id "Gohir.A01G229700.1.v2.1";
A01	phytozomev13	exon	119528395	119528785	0	+	.	gene_id "Gohir.A01G229700.v2.1"; transcript_id "Gohir.A01G229700.1.v2.1"; Name "Gohir.A01G229700";
A01	phytozomev13	exon	119528884	119529005	0	+	.	gene_id "Gohir.A01G229700.v2.1"; transcript_id "Gohir.A01G229700.1.v2.1"; Name "Gohir.A01G229700";
A01	phytozomev13	exon	119529125	119529262	0	+	.	gene_id "Gohir.A01G229700.v2.1"; transcript_id "Gohir.A01G229700.1.v2.1"; Name "Gohir.A01G229700";
A01	phytozomev13	exon	119529365	119529507	0	+	.	gene_id "Gohir.A01G229700.v2.1"; transcript_id "Gohir.A01G229700.1.v2.1"; Name "Gohir.A01G229700";
A01	phytozomev13	exon	119529601	119529736	0	+	.	gene_id "Gohir.A01G229700.v2.1"; transcript_id "Gohir.A01G229700.1.v2.1"; Name "Gohir.A01G229700";

And here is the script

(ribocode) [hugj2006@bigram2 translatome2022]$ prepare_transcripts -g td1.gtf -f ../cottonLeaf/refGenomes/TM1utx_26.fasta -o RiboCode_annot
[2022-06-08 05:12:58] Preparing annotation files ...
	Loading transcripts.pickle ...
[2022-06-08 05:13:09] The step of preparing transcript annotation has been completed.
(ribocode) [hugj2006@bigram2 translatome2022]$ 
(ribocode) [hugj2006@bigram2 translatome2022]$ ls -lh RiboCode_annot/
total 166M
-rw-rw-r--+ 1 hugj2006 domain users    0 Jun  8 04:53 transcripts_cds.txt
-rw-rw-r--+ 1 hugj2006 domain users  71M Jun  8 04:53 transcripts.pickle
-rw-rw-r--+ 1 hugj2006 domain users 186M Jun  8 04:53 transcripts_sequence.fa

Alternative Start Codons

Hi,

I was wondering how to add an alternative start codon list in the RiboCode step. Thank you for your help!

Best,
Qidi

Whether multiple mapping affect ribocode result

Sorry to bother you.
I followed your workflow use STAR arguments --quantMode TranscriptomeSAM and --outFilterMultimapNmax 1,this is the command looks like

STAR --outFilterType BySJout --runThreadN 10 --outFilterMismatchNmax 2 \
--genomeDir /reference/GRCm38/STAR \
--readFilesIn /my/rmrRNA/${sample}_trim_norrna.fq  \
--outFileNamePrefix ${sample} \
--outSAMtype BAM SortedByCoordinate \
--quantMode TranscriptomeSAM GeneCounts \
--outFilterMultimapNmax 1 --outFilterMatchNmin 16 --alignEndsType EndToEnd

the output Aligned.toTranscriptome.out.bam file still have multiple mapping sequence ( NH:i > 1 ) , because STAR wiil output all records in this toTranscriptome bam file , Does these multiple mapping records affect the Ribocode result to find uORF ?

Change readme from pip to pip3

Hey, since RiboCode requires Python 3.6, it is better to set default in readme to pip3, as this makes it more failsafe for users who have both. If there is not something I am missing ?

No R script supp

I just wanted to ask for the supplementary R file for the data analysis. I followed the the link but the document is not available anymore.

Thanks!

ValueError: Can't transform the genomic interval, please check!

when i use the command "prepare_transcripts",it show "ValueError: Can't transform the genomic interval, please check!",I use the "GTFupdate",it also show the same error,how can i slove this? thanks!

different transcripts of one gene

Hi!
I'm sorry to bother you again. I don't know how to deal with different transcripts of one gene when quantification and study 3-nt periodicity in translation. Do you have some suggestions? Thank you very much!

Error, the references in bam are different from transcriptome annotation

Hi,

I got the following error when running my script:

Error, the references in bam are different from transcriptome annotation

It seems to throw this as my reads were aligned to the transcriptome and have the transcript version ID instead of just the transcript ID. Is it possible for the output of the prepare_transcript function to include the version number?

Thanks

KeyError while reading BAM

Dear developers,

I'm having the following error message:

Traceback (most recent call last):
  File "/home/huf/.local/bin/RiboCode", line 11, in <module>
    sys.exit(main())
  File "/home/huf/.local/lib/python2.7/site-packages/RiboCode/RiboCode.py", line 40, in main
    tpsites_sum, total_psites_number = process_bam.psites_count(configIn.configList,transcript_dict,thread_num=1)
  File "/home/huf/.local/lib/python2.7/site-packages/RiboCode/process_bam.py", line 111, in psites_count
    tpsites_sum,total_psites_number = read_bam(configList[0])
  File "/home/huf/.local/lib/python2.7/site-packages/RiboCode/process_bam.py", line 77, in read_bam
    tpsites[tid][t_psite] += 1
KeyError: 'ENSMUST00000035606.8'

I even sorted the bam, but it gave me the same error for a different transcript Id. Do you know where possibly goes wrong?

Best wishes,
Fengyuan

ModuleNotFoundError: No module named 'fasta'

Hi,
I got a ModuleNotFoundError when testing the RiboCode installation with RiboCode_onestep -V
Can you help?

$ RiboCode_onestep -V
Traceback (most recent call last):
  File "/nexus/posix0/MAGE-flaski/service/projects/data/Bioinformatics/bit_pipe_ribosome_profiling/libraries/venv3/bin/RiboCode_onestep", line 5, in <module>
    from RiboCode.RiboCode_onestep import main
  File "/nexus/posix0/MAGE-flaski/service/projects/data/Bioinformatics/bit_pipe_ribosome_profiling/libraries/venv3/lib/python3.9/site-packages/RiboCode/RiboCode_onestep.py", line 16, in <module>
    from .prepare_transcripts import *
  File "/nexus/posix0/MAGE-flaski/service/projects/data/Bioinformatics/bit_pipe_ribosome_profiling/libraries/venv3/lib/python3.9/site-packages/RiboCode/prepare_transcripts.py", line 17, in <module>
    from pyfasta import Fasta
  File "/nexus/posix0/MAGE-flaski/service/projects/data/Bioinformatics/bit_pipe_ribosome_profiling/libraries/venv3/lib/python3.9/site-packages/pyfasta/__init__.py", line 3, in <module>
    from fasta import Fasta, complement, DuplicateHeaderException
ModuleNotFoundError: No module named 'fasta'

my package list

$ pip3 list
Package            Version
------------------ ---------
AGEpy              0.8.2
autopaths          1.6.0
bcrypt             3.2.0
biomart            0.9.2
biopython          1.78
certifi            2020.12.5
cffi               1.14.5
chardet            4.0.0
charset-normalizer 3.1.0
click              7.1.2
coloredlogs        15.0
colormath          3.0.0
cryptography       3.4.6
cycler             0.10.0
decorator          4.4.2
et-xmlfile         1.0.1
future             0.18.2
h5py               3.1.0
HTSeq              0.13.5
humanfriendly      9.1
idna               2.10
ipaddress          1.0.23
jdcal              1.4.1
Jinja2             2.11.3
joblib             1.0.1
kiwisolver         1.3.1
lzstring           1.0.4
Markdown           3.3.4
MarkupSafe         1.1.1
matplotlib         3.3.4
minepy             1.2.6
multiqc            1.9
networkx           2.5
numpy              1.20.1
openpyxl           3.0.6
pandas             1.2.2
paramiko           2.7.2
patsy              0.5.1
Pillow             8.1.0
pip                23.0.1
plumbing           2.11.2
py                 1.11.0
pybedtools         0.8.1
pycparser          2.20
pyfasta            0.5.2
PyNaCl             1.4.0
pyparsing          2.4.7
pysam              0.16.0.1
python-dateutil    2.8.1
pytz               2021.1
PyYAML             5.4.1
requests           2.25.1
retry              0.9.2
RiboCode           1.2.15
scikit-learn       0.24.1
scipy              1.6.1
seaborn            0.11.1
setuptools         57.5.0
sh                 2.0.3
simplejson         3.17.2
six                1.15.0
spectra            0.0.11
statsmodels        0.12.2
suds-jurko         0.6
threadpoolctl      2.1.0
tqdm               4.65.0
urllib3            1.26.3
Wand               0.6.5
wheel              0.38.4
xlrd               2.0.1
XlsxWriter         1.3.7

pickle.load() AttributeError: 'module' object has no attribute 'Gene'

Hi,

I ran your tool following the guidelines but when I try to run the Ribocode.py bit, I am getting this error:
Traceback (most recent call last):
File "", line 1, in
AttributeError: 'module' object has no attribute 'Gene'

It seems to stem from pickle.load()

Let me know if you have any suggestions.

Best

prepare_transcripts not taking multiple CDSs into account

Hi,

I'm trying to run prepare_transcripts to extract transcripts and CDSs from a GTF file. The problem is that this GTF file contains transcripts with multiple CDSs and, when checking the output file 'transcripts_cds.txt' from prepare_transcripts, it seems to contain only one CDS for each transcript. Does RiboCode support transcripts with multiple CDSs at all? Any way around this?

psites HDF5 file fails to be created

My run failed for my dataset, that has all genes set to equal length.
What happens is this (I here display the partial hdf5 file):

  group           name       otype dclass        dim
0     /        p_sites H5I_DATASET   VLEN 551 x 1000
1     / transcript_ids H5I_DATASET STRING       1000

You see the "psites_number" table is not made, because it fails at inserting the "p_sites" table properly.
I think the newest h5py with anaconda (3.6.0) version does not work now as the code is implemented,
or am I wrong here?

ERROR is:

Traceback (most recent call last):
  File "/home/roler/anaconda3/envs/ribocode_env/bin/RiboCode", line 10, in <module>
    sys.exit(main())
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/RiboCode.py", line 40, in main
    tpsites_sum, total_psites_number = process_bam.psites_count(configIn.configList,transcript_dict,thread_num=1)
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/process_bam.py", line 111, in psites_count
    tpsites_sum,total_psites_number = read_bam(configList[0])
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/process_bam.py", line 84, in read_bam
    write_psites(tpsites,psites_number, name + "_psites.hd5")
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/process_bam.py", line 26, in write_psites
    fout.create_dataset("p_sites",data=list(tpsites.values()),dtype=dt, compression="gzip")
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/h5py/_hl/group.py", line 149, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/h5py/_hl/dataset.py", line 145, in make_new_dset
    dset_id.write(h5s.ALL, h5s.ALL, data)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 232, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 145, in h5py._proxy.dset_rw
  File "h5py/_conv.pyx", line 784, in h5py._conv.ndarray2vlen
AttributeError: 'int' object has no attribute 'dtype'

I tried to make the h5 file from scratch, but it still fails with this error:

Traceback (most recent call last):
  File "/home/roler/anaconda3/envs/ribocode_env/bin/RiboCode", line 10, in <module>
    sys.exit(main())
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/RiboCode.py", line 40, in main
    tpsites_sum, total_psites_number = process_bam.psites_count(configIn.configList,transcript_dict,thread_num=1)
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/process_bam.py", line 111, in psites_count
    tpsites_sum,total_psites_number = read_bam(configList[0])
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/process_bam.py", line 49, in read_bam
    tpsites,psites_number = load_psites(name + "_psites.hd5" )
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/process_bam.py", line 35, in load_psites
    psites_number = fin["psites_number"].value
AttributeError: 'Dataset' object has no attribute 'value'

It now says there is no ".value" getter, which might be because ".value" is deprecated?
Relevant link: https://stackoverflow.com/questions/67409919/attributeerror-dataset-object-has-no-attribute-value

ImportError: cannot import name 'PPoly' from partially initialized module 'scipy.interpolate' (most likely due to a circular import) (/home/zhai2/miniconda3/lib/python3.8/site-packages/scipy/interpolate/init.py)

when i use the command : metaplots -a -r ,it show
ImportError: cannot import name 'PPoly' from partially initialized module 'scipy.interpolate' (most likely due to a circular import) (/home/zhai2/miniconda3/lib/python3.8/site-packages/scipy/interpolate/init.py)

how can i slove this ? thankyou

Error in 'RiboCode'

Hi,

When I try to run RiboCode:

RiboCode -a annot/ -c config.txt -l no -g -o result/

I got this error:

Loading transcripts.pickle ...
Reading bam file: newstarAligned.toTranscriptome.out.bam......
Finished reading bam file!
100% has finished!
Writing the results to file .....
Traceback (most recent call last):
File "/usr/local/bin/RiboCode", line 11, in
load_entry_point('RiboCode==1.2.10', 'console_scripts', 'RiboCode')()
File "/usr/local/lib/python3.6/dist-packages/RiboCode/RiboCode.py", line 63, in main
output_gtf=output_gtf, output_bed=output_bed)
File "/usr/local/lib/python3.6/dist-packages/RiboCode/detectORF.py", line 379, in main
write_result(orf_results,outname)
File "/usr/local/lib/python3.6/dist-packages/RiboCode/detectORF.py", line 156, in write_result
header = "\t".join(list(orf_results[0].keys())[:-1])
IndexError: list index out of range

I really don't know how to deal with this problem.Could you show me some advice?

error in GTFupdate

GTFupdate error， I download the annotation from NCBI. The same problem is present in maize and rice

Generating figures with matplotlib when DISPLAY variable is undefined or invalid

When running the "metaplots" or "plot_orf_density" command, some users received errors similar to the following:

"raise RuntimeError('Invalid DISPLAY variable')"
_"tkinter.TclError: no display name and no $DISPLAY environment variable"

The main problem is that default backend of matplotlib is unavailable. The solution is to modify the backend. A very simple solution is to set the MPLBACKEND environment variable, either for your current shell or for a single script:

export MPLBACKEND="module://my_backend"

Giving below are non-interactive backends, capable of writing to a file:

Agg
PS
PDF
SVG
Cairo
GDK

No obviously periodicity are detected from bam file, it could be due to poor quality sequencing. Please check the metagene plots and try again by lowering the value of frame0_percent

I am using the QC data to do my best analysis and I am not getting the following error, how can I change the p-value or how can I fix this problem？
error:No obviously periodicity are detected from bam file, it could be due to poor quality sequencing. Please check the metagene plots and try again by lowering the value of frame0_percent

thank you!

RiboCode_onestep crashes when calling detectORF

When trying to use RiboCode_onestep I come across this error below:

Finished reading bam file!
Traceback (most recent call last):
File "/usr/local/bin/RiboCode_onestep", line 10, in
sys.exit(main())
File "/usr/local/lib/python3.9/site-packages/RiboCode/RiboCode_onestep.py", line 76, in main
detectORF.main(gene_dict=gene_dict, transcript_dict=transcript_dict, annot_dir = "annot",
TypeError: main() missing 3 required positional arguments: 'dependence_test', 'stouffer_adj', and 'pval_adj'

My command line looks like this:
Singularity> RiboCode_onestep -g Homo_sapiens.GRCh38.104.gtf -f Homo_sapiens.GRCh38.dna.primary_assembly.fa -r my.transcriptome.dedup.bam -l no -o RiboCode_ORFs_result

GTF specification should include start and stop codons

I think, the input gtf to metaplots and prepare_transcripts needs start_codon and stop_codon features in order to recognize something as an annotated CDS; CDS features are not enough. Could this be added as an option in GTF_update.rst?

Importantly, could this be clarified explicitly in the gtf specification in README.md?

Error in 'prepare_transcripts'

Hi,

When I try to run prepare_transcripts with command line:

prepare_transcripts -g Arabidopsis_thaliana.TAIR10.41.gtf -f tair10.fa -o tair10_annot/

I got this error:

Preparing annotation files ...
Reading the GTF file: Arabidopsis_thaliana.TAIR10.41.gtf .......
Traceback (most recent call last):
File "/usr/local/bin/prepare_transcripts", line 11, in
load_entry_point('RiboCode==1.2.10', 'console_scripts', 'prepare_transcripts')()
File "/usr/local/lib/python3.6/dist-packages/RiboCode/prepare_transcripts.py", line 388, in main
processTranscripts(args.genomeFasta,args.gtfFile,args.out_dir)
File "/usr/local/lib/python3.6/dist-packages/RiboCode/prepare_transcripts.py", line 308, in processTranscripts
gene_dict,transcript_dict = readGTF(gtfFile)
File "/usr/local/lib/python3.6/dist-packages/RiboCode/prepare_transcripts.py", line 174, in readGTF
field_dict = parsing_line(line)
File "/usr/local/lib/python3.6/dist-packages/RiboCode/prepare_transcripts.py", line 102, in parsing_line
field_dict = {"chrom": intern(chrom),"source":source,"feature": intern(feature),"iv":iv,"attr":parsing_attr(attr)}
File "/usr/local/lib/python3.6/dist-packages/RiboCode/prepare_transcripts.py", line 74, in parsing_attr
k,v = i.strip().split(" ",1)
ValueError: not enough values to unpack (expected 2, got 1)

The gtf file was downloaded from http://plants.ensembl.org/index.html
Could you show me some advice about this error?
Have a nice day.

Question on handling multimappers

Dear RiboCode developers,

According to the documentation, it is recommended to include outFilterMultimapNmax 1 parameter in STAR alignment to exclude non-unique alignments and reduce noise for downstream analyses.

In case of default outFilterMultimapNmax 10 setting, how does RiboCode handle non-unique alignments? Are they included in P-site estimating and ORF detection? Does RiboCode differentiate between primary and secondary alignment flags when dealing with multi-mapped reads?

Thank you very much!

Question on adjusted P values reported in collapsed output

Dear developers,

It seems that the adjusted p values in the collapsed output was calculated from filtered p values. Shouldn't multiple testing correction be done before filtering p-values with the default pvalue cutoff of 0.05? To do the multiple testing correction correctly, could I set --pval-cutoff 1 and then filter the output by adjusted_pval < 0.05 manually?

> collapsed[order(-pval_combined)]
      pval_combined adjusted_pval
    1:  4.993873e-02  4.993873e-02
    2:  4.977925e-02  4.978278e-02
    3:  4.968042e-02  4.968748e-02
    4:  4.958320e-02  4.959377e-02
    5:  4.957438e-02  4.958848e-02
   ---                            
14073: 6.443817e-267 1.814192e-263
14074: 3.203795e-276 1.127496e-272
14075:  0.000000e+00  0.000000e+00
14076:  0.000000e+00  0.000000e+00
14077:  0.000000e+00  0.000000e+00

Besides, I noticed that there are two parameters on calculating combined p-values:

  --dependence_test {none,mic,pcc}
                        the method for measuring the dependence between frame1 and frame2. This test could help determine whether the combined p-values should be ajusted to account for the dependence between two test (i.e. F0 vs F1 and F0 vs F2). mic: Maximal Information
                        Coefficient; pcc: Pearson Correlation Coefficient.
  --stouffer_adj {none,nyholt,liji,gao,galwey}
                        the method for adjustment the cominbed p-values to account for the dependence between two tests (i.e. F0 vs F1 and F0 vs F2). see details at: https://search.r-project.org/CRAN/refmans/poolr/html/stouffer.html

What's the recommended way to set the two parameters?

Thanks in advance.

ValueError: unsupported pickle protocol: 5 in metaplots

when i use the metaplots function of ribocode 1.2.14， it show the error:, how can i slove it ,thank you in advance!!!

Create metaplot file and predict the P-site locations ...
Loading transcripts.pickle ...
Traceback (most recent call last):
File "/home/zuozd/miniconda3/envs/ribocode/bin/metaplots", line 10, in
sys.exit(main())
File "/home/zuozd/miniconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/metaplots.py", line 241, in main
gene_dict,transcript_dict = load_transcripts_pickle(os.path.join(args.annot_dir,"transcripts.pickle"))
File "/home/zuozd/miniconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/prepare_transcripts.py", line 293, in load_transcripts_pickle
gene_dict, transcript_dict = pickle.load(fin)
ValueError: unsupported pickle protocol: 5

Detect C-terminal extensions with RiboCode

Hi,

I was wondering if it will be possible to identify C-terminal extensions using RiboCode. I've noticed that N-terminal extensions / truncations are indeed identified by RiboCode (they fall in the category 'annotated', but they have different start codon than the main annotated ORF).

Thank you very much,

Kind regards,

Marina

error in metaplots step

When I run metaplots, it gives an error:
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

I don't know where the problem is, please help, thanks.

RiboCode nf-core module

Hi Developers!

I am one of the developers of the nf-core/riboseq pipeline that we have started work on recently. It is now recognised as 'in-development" by the nf-core community but we still have a lot of work to do. I am trying to develop a RiboCode module so that the pipeline can support ORF calling as an option.

Firstly, I wanted to let you know about this effort and secondly, I have a few suggestions regarding the usability of the tool.

GTF parsing - this could be made to work on a wider range of GTF variants. Much of this can be handled by the GTFupdate subtool. eg GTFupdate should sort the gtf file
Switch to pyfaidx - Brent Pedersen the developer of pyfasta archived it back in 2018. This should not be too arduous but unfortunately it is not a straight swap.

I am happy to help with these and will likely make PRs here for them as I need them

remove PCR duplicate

Hi!
I'm sorry to bother you. I followed Ribocoed workflow, but I don't know whether need to remove PCR duplicate after STAR alignment. Could you give me some suggestions? Thank you very much!

Error in <Preparing the transcripts annotation files> step

ribocode was installed by conda and run in python2.7 env.
when i run this command prepare_transcripts -g ~/data/reference/IRGSP/IRGSP.gtf -f ~/data/reference/IRGSP/IRGSP.fa -o annotation/

then i get fellow error:
Traceback (most recent call last):
File "/home/wuyuechao/data/bio-tools/anaconda3/envs/ribocode/bin/prepare_transcripts", line 10, in
sys.exit(main())
File "/home/wuyuechao/data/bio-tools/anaconda3/envs/ribocode/lib/python2.7/site-packages/RiboCode/prepare_transcripts.py", line 388, in main
processTranscripts(args.genomeFasta,args.gtfFile,args.out_dir)
File "/home/wuyuechao/data/bio-tools/anaconda3/envs/ribocode/lib/python2.7/site-packages/RiboCode/prepare_transcripts.py", line 311, in processTranscripts
genomic_seq = GenomeSeq(genomeFasta)
File "/home/wuyuechao/data/bio-tools/anaconda3/envs/ribocode/lib/python2.7/site-packages/RiboCode/prepare_transcripts.py", line 205, in init
self.fh = Fasta(filename, key_fn = get_chrom)
File "/home/wuyuechao/data/bio-tools/anaconda3/envs/ribocode/lib/python2.7/site-packages/pyfasta/fasta.py", line 73, in init
flatten_inplace)
File "/home/wuyuechao/data/bio-tools/anaconda3/envs/ribocode/lib/python2.7/site-packages/pyfasta/records.py", line 48, in prepare
idx = cPickle.load(fh)
ValueError: unsupported pickle protocol: 4

please help thx!

请问如何进行lncRNA的ORF鉴定

prepare_transcripts这一步用gencode.lncRNA.gtf生成的文件中，cds文件就已经是空的了，后续用metaplots直接报错no orf出不了结果了

上面是lncRNA的gtf生成的，下面是完整gtf生成的

plot_orf_density error,it show the following arguments are required: --start-codon

plot_orf_density error,it show the following arguments are required: --start-codon,but i have added the start condn:
plot_orf_density -a totalreads_out/ -c metaplots_pre_config.txt -t itf00g40750.t4 -s 65 -e 295 --start-condon ATG --plot-annotated-orf yes
how can i slove?
thank you

Error when writing transcripts to file

Hi,
RiboCode runs on my dataset from Cryptococcus neoformas, then fails during the writing to gtf. This meant that ORF calling succeeded and wrote a complete .txt file, but for the .gtf output left large gaps of entire chromosomes or half-chromosomes.

Error message was:
Errors: 88% has finished! Writing the results to file ..... error when transform the transcript interval to genomic! Traceback (most recent call last): File "/usr/local/bin/RiboCode", line 10, in <module> sys.exit(main()) File "/usr/local/lib/python3.6/dist-packages/RiboCode/RiboCode.py", line 63, in main output_gtf=output_gtf, output_bed=output_bed) File "/usr/local/lib/python3.6/dist-packages/RiboCode/detectORF.py", line 455, in main write_to_gtf(gene_dict, transcript_dict, orf_results, collapsed_orf_idx, outname) File "/usr/local/lib/python3.6/dist-packages/RiboCode/detectORF.py", line 192, in write_to_gtf exon_ivs = transcript_iv_transform(tobj, orf_iv) File "/usr/local/lib/python3.6/dist-packages/RiboCode/prepare_transcripts.py", line 285, in transcript_iv_transform exons_ivs.append(Interval_from_directional(exons_bound[i],exons_bound[i+1],strand))

I suggest three possible ways to help with this.

Report which transcript / exon has failed.
Put some exception-handling code around that failure so that remaining transcripts are still written.
Write out the ORFs in transcript co-ordinates first, which would be better for my application anyway.

I can provide access to data and annotations if that helps to debug!

Thanks
Edward

TypeError: '<' not supported between instances of 'str' and 'NoneType'

When i used metaplots -a RiboCode_annot -r HEK293Aligned.toTranscriptome.out.bam, an error occurred:
[2022-07-28 21:05:31] Create metaplot file and predict the P-site locations ...
Loading transcripts.pickle ...
Traceback (most recent call last):
File "/home/user/BGM/lit/anaconda3/envs/ribocode/bin/metaplots", line 10, in
sys.exit(main())
File "/home/user/BGM/lit/anaconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/metaplots.py", line 241, in main
meta_analysis(gene_dict,transcript_dict,args)
File "/home/user/BGM/lit/anaconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/metaplots.py", line 181, in meta_analysis
filter_tids = filter_transcript(gene_dict,transcript_dict)
File "/home/user/BGM/lit/anaconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/metaplots.py", line 48, in filter_transcript
level = list(sorted(levels))[0]
TypeError: '<' not supported between instances of 'str' and 'NoneType'
how can i solve this? thanx!!

请问关于ORF预测结果中‘Psites_frame0_RPKM’代表什么意思？

请问Psites_frame0_RPKM代表的位于0读码框架的reads计算的ORF的表达量吗，不包含读码框1和2的reads

ImportError: No module named Tkinter

The Tkinter is needed when import the matplotlib module. If the following error is returned, pls check whether the tkinter package is installed. On Linux platform, users can install this package using follow command:
apt-get install python-tk
or
yum install python-tk

Traceback (most recent call last):
File "/usr/bin/metaplots", line 7, in
from RiboCode.metaplots import main
File "/usr/lib64/python2.7/site-packages/RiboCode/metaplots.py", line 7, in
import matplotlib.pyplot as plt
File "/usr/lib64/python2.7/site-packages/matplotlib/pyplot.py", line 115, in
_backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
File "/usr/lib64/python2.7/site-packages/matplotlib/backends/init.py", line 32, in pylab_setup
globals(),locals(),[backend_name],0)
File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_tkagg.py", line 6, in
from six.moves import tkinter as Tk
File "/usr/lib/python2.7/site-packages/six.py", line 203, in load_module
mod = mod._resolve()
File "/usr/lib/python2.7/site-packages/six.py", line 115, in _resolve
return _import_module(self.mod)
File "/usr/lib/python2.7/site-packages/six.py", line 82, in _import_module
import(name)
ImportError: No module named Tkinter

psite HDF5 file is saved in the wrong location

Correct me if I am wrong, but currently, the hdf5 file with psite information is saved in working dir and not in defined output-dir, this makes it fail if output-dir is not the working directory for the call.

This was tested on the latest version through Conda.

Unknown Attribute Error

Hi,

I am getting a weird AttributeError with not much description of what's going wrong:

Traceback (most recent call last):
File "/home/bwee/.local/bin/metaplots", line 9, in
load_entry_point('RiboCode==1.2.10', 'console_scripts', 'metaplots')()
File "/home/bwee/.local/lib/python2.7/site-packages/RiboCode/metaplots.py", line 241, in main
meta_analysis(gene_dict,transcript_dict,args)
File "/home/bwee/.local/lib/python2.7/site-packages/RiboCode/metaplots.py", line 227, in meta_analysis
distancePlot(distance_to_start_count,distance_to_stop_count,pre_psite_dict,length_counter,args.outname + sampleName)
File "/home/bwee/.local/lib/python2.7/site-packages/RiboCode/metaplots.py", line 108, in distancePlot
with PdfPages(outname + ".pdf") as pdf:
AttributeError: exit

Could someone help me understand what's going wrong?

Thanks,
Brendan

Incorrect handling of hdf5 file for repeated runs of RiboCode

Ribocode does not correctly handle the hdf5 file it creates to store temperary data, resulting in identical results for multiple runs that are ran from the same folder.

For example, running this script resulted in all identical ORF libraries for each of the datasets.

while IFS= read -r folder; do
    echo "$folder"
    mkdir "$folder/GRCh38_110/ribocode/"
    RiboCode_onestep -g "$assembly/Homo_sapiens.GRCh38.110.gtf" \
                     -f "$assembly/Homo_sapiens.GRCh38.dna.primary_assembly.fa" \
                     -r "${folder}/GRCh38_110/aligned_tran.bam" \
                     -l no \
                     -o "${folder}/GRCh38_110/ribocode/ribo" \
                     -f0_percent 0.5
done < "$1"

I solved this by removing the hdf5 file created after each run

while IFS= read -r folder; do
    echo "$folder"
    mkdir "$folder/GRCh38_110/ribocode/"
    rm aligned_tran_psites.hd5                                               <-- remove file
    RiboCode_onestep -g "$assembly/Homo_sapiens.GRCh38.110.gtf" \
                     -f "$assembly/Homo_sapiens.GRCh38.dna.primary_assembly.fa" \
                     -r "${folder}/GRCh38_110/aligned_tran.bam" \
                     -l no \
                     -o "${folder}/GRCh38_110/ribocode/ribo" \
                     -f0_percent 0.5
done < "$1"

psites_number table is not generated with v1.2.15

I think I still encounter an issue that is similar to issue#32
i.e., I did not get the psites_number table in the hdf5 file.

  group           name       otype dclass   dim
0     /        p_sites H5I_DATASET   VLEN 46826
1     / transcript_ids H5I_DATASET STRING 46826

Also the dimension of the p_sites table seems to be wrong.
Otherwise the code ran okay and generated other files.

This job is running on skl-119 on Wed May 17 23:21:46 EDT 2023
0% has finished! ^M2% has finished! ^M4% has finished! ^M6% has finished! ^M9% has finished! ^M11% has finished! ^M13% has finished! ^M15% has finished! ^M17% has finished! ^M19% has finished! ^M21% has finished! ^M23% has finished! ^M26% has finished! ^M28% has finished! ^M30% has finished! ^M32% has finished! ^M34% has finished! ^M36% has finished! ^M38% has finished! ^M41% has finished! ^M43% has finished! ^M45% has finished! ^M47% has finished! ^M49% has finished! ^M51% has finished! ^M53% has finished! ^M56% has finished! ^M58% has finished! ^M60% has finished! ^M62% has finished! ^M64% has finished! ^M66% has finished! ^M68% has finished! ^M70% has finished! ^M73% has finished! ^M75% has finished! ^M77% has finished! ^M79% has finished! ^M81% has finished! ^M83% has finished! ^M85% has finished! ^M88% has finished! ^M90% has finished! ^M92% has finished! ^M94% has finished! ^M96% has finished! ^M98% has finished! ^M[2023-05-17 23:29:14] Finished!
        Loading transcripts.pickle ...
        Reading bam file: /mnt/home/larrywu/CTRL_arabidopsis/data/RiboCode_STAR/ribo_mapped/D1//star_D1_Aligned.toTranscriptome.out.bam......
Finished reading bam file!

Any suggestions for how to deal with this issue? Thanks!