
RNA-seq analysis

General sequencing data analysis materials

RNA-seq specific

RNA-seq experimental design

Quality Control

  • QoRTs: a comprehensive toolset for quality control and data processing of RNA-Seq experiments
  • QUaCRS
  • RSeQC RNA-seq data QC
  • RNA-SeqQC

Normalization, quantification, and differential expression

Traditional way of RNA-seq analysis

A nice tutorial from F1000Research, "RNA-Seq workflow: gene-level exploratory analysis and differential expression", by Michael Love, an author of DESeq2.

A post from Nextgeneseek

QuickRNASeq lifts large-scale RNA-seq data analyses to the next level of automation and interactive visualization

The three papers more or less replace the earlier tools from Salzberg's group (Bowtie/TopHat, Cufflinks, and Cuffdiff). They offer a new way to go from raw RNA-seq reads to differential expression analysis:

  1. align RNA-seq reads to the genome (HISAT instead of Bowtie/TopHat, STAR),
  2. assemble transcripts and estimate expression (StringTie instead of Cufflinks), and
  3. perform differential expression analysis (Ballgown instead of Cuffdiff).
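
The first two steps of the workflow above can be sketched as a small Python helper that only builds the command lines. This is a sketch under assumptions, not the authors' published protocol: it uses `hisat2` (the successor of HISAT), `samtools sort`, and `stringtie` with commonly used options, and the sample name, file names, and index prefix are hypothetical. The Ballgown step runs in R and is not shown.

```python
import shlex


def build_pipeline_commands(sample, reads_1, reads_2, index_prefix,
                            annotation_gtf, threads=4):
    """Return command lines (as argument lists) for one paired-end sample.

    Nothing is executed here; pass each list to subprocess.run() to run it.
    All file names are hypothetical placeholders.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    out_gtf = f"ballgown/{sample}/{sample}.gtf"
    return [
        # 1. Spliced alignment to the genome; --dta reports alignments
        #    tailored for downstream transcript assemblers.
        ["hisat2", "-p", str(threads), "--dta", "-x", index_prefix,
         "-1", reads_1, "-2", reads_2, "-S", sam],
        # 2. Coordinate-sort the alignments, as StringTie expects sorted input.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # 3. Estimate expression of annotated transcripts only (-e) and
        #    write the table files that Ballgown reads (-B).
        ["stringtie", "-e", "-B", "-p", str(threads), "-G", annotation_gtf,
         "-o", out_gtf, bam],
    ]


commands = build_pipeline_commands(
    "sampleA", "sampleA_1.fastq.gz", "sampleA_2.fastq.gz",
    "genome_index", "annotation.gtf")
for cmd in commands:
    print(shlex.join(cmd))
```

Building the commands as lists rather than shell strings avoids quoting problems when sample names contain spaces or special characters.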

RapMap: A Rapid, Sensitive and Accurate Tool for Mapping RNA-seq Reads to Transcriptomes. From Sailfish group.

  • BitSeq Transcript isoform level expression and differential expression estimation for RNA-seq

For mapping-based methods, the raw reads are usually mapped to the transcriptome or the genome (the latter requires modelling the gaps introduced by exon-exon junctions), and gene- or transcript-level counts are then obtained from the alignments.
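
As a rough illustration of the counting step described above, here is a toy Python sketch that assigns aligned reads to genes by interval overlap. It is a simplification under stated assumptions: reads and genes are single half-open intervals, whereas real counters work on exons and handle spliced reads, strandedness, and multi-mapping. The function name and the `no_feature` / `ambiguous` labels are hypothetical.

```python
def count_reads_per_gene(reads, genes):
    """Count reads that overlap exactly one gene.

    reads: list of (chrom, start, end) tuples, half-open coordinates.
    genes: dict mapping gene_id -> (chrom, start, end), half-open coordinates.
    Reads overlapping no gene or more than one gene are tallied separately
    instead of being assigned.
    """
    counts = {gene_id: 0 for gene_id in genes}
    counts["no_feature"] = 0
    counts["ambiguous"] = 0
    for chrom, start, end in reads:
        hits = [
            gene_id
            for gene_id, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start < g_end and end > g_start
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["no_feature"] += 1
        else:
            counts["ambiguous"] += 1
    return counts


genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 180, 300)}
reads = [
    ("chr1", 110, 160),  # inside geneA only
    ("chr1", 120, 170),  # inside geneA only
    ("chr1", 190, 240),  # overlaps geneA and geneB
    ("chr1", 250, 300),  # inside geneB only
    ("chr2", 10, 60),    # no annotated gene on chr2
]
print(count_reads_per_gene(reads, genes))
```

The scan over all genes for every read is quadratic and only suitable for a toy example; production tools use interval indexes.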

Finally, differential expression is carried out by:

  • DESeq2

  • edgeR

  • limma voom

  • EBSeq: an R package for gene and isoform differential expression analysis of RNA-seq data

  • JunctionSeq: differential usage of exons and splice junctions in high-throughput, next-generation RNA-seq datasets. The methodology is heavily based on the DEXSeq Bioconductor package. The core advantage of JunctionSeq over other similar tools is that it provides powerful automated tools for generating readable plots and tables to facilitate the interpretation of the results. An example results report is available here.

  • MetaSeq Meta-analysis of RNA-Seq count data in multiple studies

  • derfinder Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution

  • DGEclust is a program for clustering and differential expression analysis of expression data generated by next-generation sequencing assays, such as RNA-seq, CAGE and others

  • Degust: Perform RNA-seq analysis and visualisation. Simply upload a CSV file of read counts for each replicate; then view your DGE data.

  • Vennt Dynamic Venn diagrams for Differential Gene Expression.

  • Glimma: Interactive HTML graphics for RNA-seq data
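
Count-based tools such as those listed above first normalize for sequencing depth. As a minimal sketch of one such scheme, the following Python function computes median-of-ratios size factors in the style described for DESeq. It is an illustration on made-up counts, not the package's actual implementation, and the function name and input layout are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts, with genes in
    the same order for every sample. Genes with a zero count in any sample
    are skipped, because their geometric mean is zero.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean of each usable gene across samples.
    log_geo_means = {}
    for i in range(n_genes):
        values = [counts[s][i] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means[i] = sum(math.log(v) for v in values) / len(values)

    # Size factor = median ratio of a sample's counts to the geometric means.
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][i]) - lgm
                      for i, lgm in log_geo_means.items()]
        factors[s] = math.exp(median(log_ratios))
    return factors


# Hypothetical data: sample s2 was sequenced twice as deeply as s1.
toy_counts = {"s1": [10, 20, 30, 0], "s2": [20, 40, 60, 5]}
toy_factors = size_factors(toy_counts)
print(toy_factors)
```

Dividing each sample's counts by its size factor puts the samples on a common scale. Using the median makes the estimate robust to a minority of strongly differential genes.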

Extra Notes

Benchmarking

bcbio.rnaseq
RNAseqGUI. I have used it several times; it looks good.
compcodeR
paper: Benchmark Analysis of Algorithms for Determining and Quantifying Full-length mRNA Splice Forms from RNA-Seq Data
paper: Comparative evaluation of isoform-level gene expression estimation algorithms for RNA-seq and exon-array platforms
paper: A benchmark for RNA-seq quantification pipelines

Map free

Blog posts on Kallisto

  1. Comparing unpublished RNA-Seq gene expression quantifiers
  2. Kallisto, a new ultra fast RNA-seq quantitation method from Next GEN SEEK
  3. kallisto paper summary: Near-optimal RNA-seq quantification from Next GEN SEEK
  4. Not-quite alignments: Salmon, Kallisto and Efficient Quantification of RNA-Seq data
  5. Using Kallisto for gene expression analysis of published RNAseq data
  6. How accurate is Kallisto? from Mark Ziemann
  7. ALIGNMENT FREE TRANSCRIPTOME QUANTIFICATION
  8. A sleuth for RNA-seq
  9. Using Salmon, Sailfish and Sleuth for differential expression
  10. Road-testing Kallisto

A biostar post: Do not feed rounded estimates of gene counts from kallisto into DESeq2 (please make sure you read through all the comments, and now there is a suggested workflow for feeding rounded estimates of gene counts to DESeq etc)

There is some confusion in the answers to this question that hopefully I can clarify with the three comments below:

  1. kallisto produces estimates of transcript level counts, and therefore to obtain an estimate of the number of reads from a gene the correct thing to do is to sum the estimated counts from the constituent transcripts of that gene. Of note in the language above is the word "estimate", which is necessary because in many cases reads cannot be mapped uniquely to genes. However insofar as obtaining a good estimate, the approach of kallisto (and before it Cufflinks, RSEM, eXpress and other "transcript level quantification tools") is superior to naïve "counting" approaches for estimating the number of reads originating from a gene. This point has been argued in many papers; among my own papers it is most clearly explained and demonstrated in Trapnell et al. 2013.
  2. Although estimated counts for a gene can be obtained by summing the estimated counts of the constituent transcripts from tools such as kallisto, and the resulting numbers can be rounded to produce integers that are of the correct format for tools such as DESeq, the numbers produced by such an approach do not satisfy the distributional assumptions made in DESeq and related tools. For example, in DESeq2, counts are modeled "as following a negative binomial distribution". This assumption is not valid when summing estimated counts of transcripts to obtain gene level counts, hence the justified concern of Michael Love that plugging in sums of estimated transcript counts could be problematic for DESeq2. In fact, even the estimated transcript counts themselves are not negative binomial distributed, and therefore also those are not appropriate for plugging into DESeq2. His concern is equally valid with many other "count based" differential expression tools.
  3. Fortunately there is a solution for performing valid statistical testing of differential abundance of individual transcripts, namely the method implemented in sleuth. The approach is described here. To test for differential abundance of genes, one must first address the question of what that means. E.g. is a gene differential if at least one isoform is? or if all the isoforms are? The tests of sleuth are performed at the granularity of transcripts, allowing for downstream analysis that can capture the varied questions that might make biological sense in specific contexts.

In summary, please do not plug in rounded estimates of gene counts from kallisto into DESeq2 and other tools. While it is technically possible, it is not statistically advisable. Instead, you should use tools that make valid distributional assumptions about the estimates.

However, Charlotte Soneson, Mike Love and Mark Robinson showed in an F1000Research paper, "Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences", that rounded gene-level values derived from transcript-level estimates can be fed into DESeq2 and similar tools for gene-level differential expression, and that this is valid and preferable in many ways.

Thanks Rob Patro for pointing it out!
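
The gene-level aggregation discussed above (summing the estimated counts of a gene's transcripts, then optionally rounding) comes down to simple arithmetic. The Python sketch below shows only that arithmetic, on made-up numbers with a hypothetical transcript-to-gene map. It does not settle the statistical question debated above and is not the implementation of any particular package.

```python
from collections import defaultdict


def gene_level_counts(transcript_counts, tx2gene):
    """Sum estimated transcript counts into per-gene estimated counts.

    transcript_counts: dict transcript_id -> estimated count (may be
        fractional, as produced by transcript-level quantifiers).
    tx2gene: dict transcript_id -> gene_id (hypothetical annotation map).
    """
    totals = defaultdict(float)
    for transcript_id, estimate in transcript_counts.items():
        totals[tx2gene[transcript_id]] += estimate
    return dict(totals)


# Made-up estimates for three transcripts of two genes.
est_counts = {"tx1": 10.4, "tx2": 5.3, "tx3": 7.0}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_estimates = gene_level_counts(est_counts, tx2gene)
# Rounding is what makes the values acceptable as input to tools that
# require integer counts.
rounded = {gene: round(value) for gene, value in gene_estimates.items()}
print(gene_estimates)
print(rounded)
```

Rounding after summing, rather than rounding each transcript first, keeps the rounding error per gene below one count.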

  • artemis: RNAseq analysis, from raw reads to pathways, typically in a few minutes. Mostly by wrapping Kallisto and caching everything we possibly can.
  • isolator: Rapid and robust analysis of RNA-Seq experiments.

Isolator has a particular focus on producing stable, consistent estimates. Maximum likelihood approaches produce unstable point estimates: small changes in the data can result in drastically different results, conflating downstream analysis like clustering or PCA. Isolator produces estimates that are, in general, simultaneously more stable and more accurate than other methods.

Batch effects

Tackling batch effects and bias in transcript expression, by Mike Love
paper: Tackling the widespread and critical impact of batch effects in high-throughput data, by Jeffrey T. Leek in Rafael A. Irizarry's lab.
A reanalysis of mouse ENCODE comparative gene expression data
Is it species or is it batch? They are confounded, so we can't know
Mouse / Human Transcriptomics and Batch Effects
Meta-analysis of RNA-seq expression data across species, tissues and studies: Interspecies clustering by tissue is the predominantly observed pattern among various studies under various distance metrics and normalization methods
Surrogate Variable Analysis: SVA bioconductor
Paper Summary: Systematic bias and batch effects in single-cell RNA-Seq data
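
Several of the links above turn on whether the variable of interest is confounded with batch. The Python sketch below is one simple way to check a sample sheet for the fully confounded case, where every batch contains a single condition. The sample sheets are hypothetical and not taken from any of the studies listed.

```python
from collections import defaultdict


def is_fully_confounded(samples):
    """Return True if the condition is completely determined by the batch.

    samples: list of (condition, batch) pairs, one per sample.
    When every batch holds only one condition (and more than one condition
    exists), a condition effect cannot be separated from a batch effect.
    """
    conditions_by_batch = defaultdict(set)
    for condition, batch in samples:
        conditions_by_batch[batch].add(condition)
    all_conditions = {condition for condition, _ in samples}
    return len(all_conditions) > 1 and all(
        len(conditions) == 1 for conditions in conditions_by_batch.values()
    )


# Hypothetical design 1: each species was processed in its own batch.
confounded_design = [("human", "run1"), ("human", "run1"),
                     ("mouse", "run2"), ("mouse", "run2")]
# Hypothetical design 2: both species are present in every batch.
balanced_design = [("human", "run1"), ("mouse", "run1"),
                   ("human", "run2"), ("mouse", "run2")]

print(is_fully_confounded(confounded_design))
print(is_fully_confounded(balanced_design))
```

Partial confounding, where some batches mix conditions, is not detected by this check; a cross-tabulation of condition against batch shows that case more clearly.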

Databases

This package is for searching for datasets in EMBL-EBI Expression Atlas, and downloading them into R for further analysis. Each Expression Atlas dataset is represented as a SimpleList object with one element per platform. Sequencing data is contained in a SummarizedExperiment object, while microarray data is contained in an ExpressionSet or MAList object.

Gene Set enrichment analysis

Pathway analysis

  • clusterProfiler: statistical analysis and visualization of functional profiles for genes and gene clusters (Bioconductor: http://www.bioconductor.org/packages/devel/bioc/html/clusterProfiler.html), by Guangchuang Yu from the University of Hong Kong. It can do many jobs and produces GSEA-like figures. It is very useful and I will give it a try besides GAGE.
  • DAVID: The Database for Annotation, Visualization and Integrated Discovery. Updated in 2016!

Fusion gene detection

Alternative splicing

  • SplicePlot: a tool for visualizing alternative splicing Sashimi plots
  • Multivariate Analysis of Transcript Splicing (MATS)
  • SNPlice is a software tool to find and evaluate the co-occurrence of single-nucleotide polymorphisms (SNPs) and altered splicing in next-gen mRNA sequence reads. SNPlice requires as input genome-aligned reads, exon-intron-exon junctions, and SNPs. The junctions and SNPs may be derived from the reads directly, using, for example, TopHat2 and samtools, or they may be derived from independent sources.
  • Visualizing Alternative Splicing github page
  • spladder Tool for the detection and quantification of alternative splicing events from RNA-Seq data
  • SUPPA This tool generates different Alternative Splicing (AS) events and calculates the PSI ("Percentage Spliced In") value for each event exploiting the fast quantification of transcript abundances from multiple samples
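
The PSI calculation from transcript abundances mentioned above can be sketched in a few lines of Python. This is an illustration under assumptions, not the code of any tool listed: the transcript IDs, the TPM values, and the split into "inclusion" and "total" transcript sets for an event are all made up.

```python
def psi(tpm, inclusion_transcripts, total_transcripts):
    """Proportion spliced in for one alternative splicing event.

    tpm: dict transcript_id -> abundance (e.g. TPM) in one sample.
    inclusion_transcripts: transcripts that contain the inclusion form.
    total_transcripts: all transcripts that define the event (inclusion
        plus exclusion forms).
    Returns a value between 0 and 1, or None when none of the event's
    transcripts is expressed.
    """
    total = sum(tpm[t] for t in total_transcripts)
    if total == 0:
        return None
    return sum(tpm[t] for t in inclusion_transcripts) / total


# Made-up abundances: tx1 includes the exon, tx2 skips it, and tx3 does
# not take part in the event.
sample_tpm = {"tx1": 30.0, "tx2": 10.0, "tx3": 50.0}
print(psi(sample_tpm, ["tx1"], ["tx1", "tx2"]))
```

Returning `None` for unexpressed events keeps them distinguishable from events with a genuine PSI of zero.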

microRNAs and non-coding RNAs

transcriptional pausing

Allele-specific expression

Single cell RNA-seq

single cell RNA-seq clustering

  • Geometry of the Gene Expression Space of Individual Cells
  • pcaReduce: Hierarchical Clustering of Single Cell Transcriptional Profiles.
  • Single-Cell Consensus Clustering bioconductor package
  • CountClust: Clustering and Visualizing RNA-Seq Expression Data using Grade of Membership Models. Fits grade of membership models (GoM, also known as admixture models) to cluster RNA-seq gene expression count data, identifies characteristic genes driving cluster memberships, and provides a visual summary of the cluster memberships
  • FastProject: A Tool for Low-Dimensional Analysis of Single-Cell RNA-Seq Data
  • SNN-Cliq Identification of cell types from single-cell transcriptomes using a novel clustering method
