RibORF: Identify translated ORFs using ribosome profiling

RibORF is a computational algorithm that identifies translated ORFs genome-wide from ribosome profiling data, distinguishing them from off-frame ORFs and from RNAs not associated with ribosomes.

Requirements: A standard R installation and the R package "e1071" for the Support Vector Machine classifier. Installations need to be in the PATH.

Steps to analyze ribosome profiling data and run RibORF:

  1. Trim adapters from ribosome profiling reads.

perl removeAdaptor.pl fastqFile adapterSequence outputFile [readLengthCutoff]

  fastqFile: raw fastq read sequences;
  adapterSequence: 5' end sequence of the adapter, 10 nt is recommended;
  outputFile: output file;
  readLengthCutoff [optional]: read length cutoff after trimming adapters, default is 15 nt.

Example: perl removeAdaptor.pl ./example.data/sample.fastq CTGTAGGCAC ./example.data/adapter.sample.fastq 15
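
To make the trimming step concrete, here is a minimal Python sketch of the idea. It is not the code in removeAdaptor.pl: the function name `trim_adapter` is hypothetical, the adapter is matched exactly, and reads without an adapter match are discarded, all of which are assumptions made for illustration. In a real FASTQ record the quality string would need to be cut at the same position.

```python
from typing import Optional


def trim_adapter(read: str, adapter: str, min_len: int = 15) -> Optional[str]:
    """Cut a read at the first exact occurrence of the adapter sequence.

    Returns the trimmed read, or None when no adapter is found or the
    remaining fragment is shorter than min_len.
    Hypothetical helper: the real script may treat mismatches, partial
    adapters, and adapter-free reads differently.
    """
    idx = read.find(adapter)
    if idx == -1:
        return None  # assumption: reads without a detectable adapter are dropped
    trimmed = read[:idx]
    return trimmed if len(trimmed) >= min_len else None


if __name__ == "__main__":
    fragment = "ACGTACGTACGTACGTACGTACGTACGT"  # 28-nt ribosome-protected fragment
    raw_read = fragment + "CTGTAGGCAC" + "AAAA"
    print(trim_adapter(raw_read, "CTGTAGGCAC"))
```

The `min_len` argument plays the role of readLengthCutoff: fragments shorter than the cutoff are too short to map reliably and are dropped.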

  2. Map trimmed reads to ribosomal RNAs, then align the non-ribosomal reads to the transcriptome and genome.

Example:
bowtie -v 1 -k 2 ribosome_RNA.index --un=norrna.adapter.sample.txt -q adapter.sample.fastq ribosome.align.adapter.sample.txt
tophat -p 2 --no-convert-bam --GTF transcripts.gtf -o outDir genome.index norrna.adapter.sample.txt

  3. Group reads by fragment length, and check the distribution of their 5' ends around the start and stop codons of canonical protein-coding ORFs.

perl readDist.pl readFile geneFile outputDir readLength [leftNum] [rightNum]

  readFile: read mapping file, SAM format;
  geneFile: canonical protein-coding ORF annotation, genePred format;
  outputDir: output directory;
  readLength: specified RPF length;
  leftNum [optional]: N nucleotides upstream of the start codon and downstream of the stop codon, default: 30;
  rightNum [optional]: N nucleotides downstream of the start codon and upstream of the stop codon, default: 50.

Example: perl readDist.pl ./example.data/sample.mapping.sam ./example.data/hg19.coding.gene.txt ./example.data 30 30 50
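
The grouping in this step can be sketched in Python as follows. This is an illustration, not the logic of readDist.pl: the function `five_prime_profile` and its inputs are hypothetical, reads are given as (5' end position, read length) pairs in transcript coordinates, and only the start codon window is handled.

```python
from collections import Counter, defaultdict


def five_prime_profile(reads, start_codon_pos, left=30, right=50):
    """Count read 5' ends by distance to the start codon, per read length.

    reads: iterable of (five_prime_pos, read_length) tuples on one transcript
           (assumed input format, chosen for illustration).
    Returns {read_length: Counter({distance: count})}, keeping only
    distances within [-left, right] of the start codon.
    """
    profile = defaultdict(Counter)
    for pos, length in reads:
        dist = pos - start_codon_pos
        if -left <= dist <= right:
            profile[length][dist] += 1
    return profile


if __name__ == "__main__":
    reads = [(88, 28), (88, 28), (87, 29), (10, 28)]
    for length, counts in sorted(five_prime_profile(reads, 100).items()):
        print(length, dict(counts))
```

A peak at a consistent distance upstream of the start codon for a given read length is what suggests the offset to use for that length in the next step.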

  4. Correct read locations based on the offset distance between read 5' ends and ribosomal A-sites. Based on the distribution of read fragments around the start and stop codons of canonical protein-coding ORFs, manually determine the offset distance between the 5' end and the ribosomal A-site for each read length. Put the correction parameters in a file, e.g. "offset.corretion.parameters.txt". Note: Different ribosome profiling experiments may have different offset correction parameters.

perl offsetCorrect.pl readFile offsetParameterFile readCorrectedFile

  readFile: read mapping file before offset correction, SAM format;
  offsetParameterFile: parameters for offset correction, 1st column: read length, 2nd column: offset distance;
  readCorrectedFile: output file after offset correction.

Example: perl offsetCorrect.pl ./example.data/sample.mapping.sam ./example.data/offset.corretion.parameters.txt ./example.data/corrected.sample.mapping.sam
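
As a rough illustration of what the parameter file encodes, the Python sketch below parses a two-column offset table and shifts a read's 5' end to an assumed A-site position. The helper names and the offset values (28 → 15, 29 → 16) are made up for the example. The sketch also assumes plus-strand, unspliced reads and that read lengths missing from the table are dropped, so it should not be read as the behavior of offsetCorrect.pl.

```python
def parse_offsets(text):
    """Parse 'read_length<whitespace>offset' lines into a dict."""
    offsets = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        offsets[int(fields[0])] = int(fields[1])
    return offsets


def a_site_position(five_prime_pos, read_length, offsets):
    """Shift a plus-strand 5' end by the offset for its read length.

    Returns None for read lengths that are not in the table
    (assumption: such reads are discarded).
    """
    offset = offsets.get(read_length)
    if offset is None:
        return None
    return five_prime_pos + offset


if __name__ == "__main__":
    table = "28\t15\n29\t16\n"  # illustrative values only
    offsets = parse_offsets(table)
    print(a_site_position(100, 28, offsets))
```

For minus-strand reads the shift would run in the opposite direction from the read's 5' end, and spliced alignments would need the shift applied along the transcript rather than along the genome.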

  5. Check the corrected read locations around the start and stop codons of canonical protein-coding ORFs. This step checks whether the read distribution after offset correction shows clear 3-nt periodicity.

perl readDist.pl readFile geneFile outputDir readLength [leftNum] [rightNum]

Example: perl readDist.pl ./example.data/corrected.sample.mapping.sam ./example.data/hg19.coding.gene.txt ./example.data 1 30 50
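
One simple way to quantify 3-nt periodicity, shown here as a hedged Python sketch, is to compute the fraction of corrected read positions that fall in each of the three reading frames of an ORF. The function `frame_fractions` is hypothetical and is not part of RibORF; positions are assumed to be in transcript coordinates with the ORF start known.

```python
def frame_fractions(a_site_positions, orf_start):
    """Return the fraction of positions in frame 0, 1, and 2 of the ORF.

    Positions upstream of orf_start are ignored. Returns zeros when no
    position falls inside the ORF.
    """
    counts = [0, 0, 0]
    for pos in a_site_positions:
        if pos >= orf_start:
            counts[(pos - orf_start) % 3] += 1
    total = sum(counts)
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [c / total for c in counts]


if __name__ == "__main__":
    positions = [100, 103, 106, 109, 101, 110]
    print(frame_fractions(positions, orf_start=100))
```

After a good offset correction, most positions within canonical ORFs are expected to fall in a single frame; a roughly even split across the three frames suggests the offsets need revisiting.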

  6. Run RibORF to identify translated ORFs.

perl ribORF.pl readCorrectedFile candidateORFFile outputDir [orfLengthCutoff] [orfReadCutoff]

  readCorrectedFile: input read mapping file after offset correction;
  candidateORFFile: candidate ORFs, genePred format;
  outputDir: output directory, with files reporting testing parameters and predicted translating probability;
  orfLengthCutoff [optional]: cutoff of ORF length (nt), default: 12;
  orfReadCutoff [optional]: cutoff of supporting read number, default: 11.

Example: perl ribORF.pl ./example.data/corrected.sample.mapping.sam ./example.data/candidate.ORF.txt ./example.data 10 10

For questions, please contact: Zhe Ji ([email protected] or [email protected])
