conquer_comparison's Introduction

Bias, robustness and scalability in differential expression analysis of single-cell RNA-seq data

This repository contains all the necessary code to perform the evaluation of differential expression analysis methods in single-cell RNA-seq data described in our publication (Soneson & Robinson, Nature Methods 2018).

In this paper, we compare the performance of more than 30 approaches to differential gene expression analysis in the context of single-cell RNA-seq data. The main results can be further browsed in a shiny app.

Note: The purpose of the conquer_comparison repository is to provide a public record of the exact code that was used for our publication (Soneson & Robinson, Nature Methods 2018). In particular, it is not intended to be a software package or a general pipeline for differential expression analysis of single-cell data. As a consequence, running the code requires the same software and package versions that were used for our analyses (all versions are indicated in the paper). As the analysis involved running a large number of methods on many data sets and over an extended period of time, we cannot guarantee that it will run successfully with new releases of the software, or that exactly the same results will be obtained with newer versions of the packages. While the repository will not be updated to ensure that it runs with every new version of the used packages, the issues can be used to post questions and/or solutions as they arise.

The repository contains the following information:

  • config/ contains configuration files for all the data sets that we considered. The configuration files detail the cell populations that were compared, as well as the number of cells per group used in each comparison.
  • data/ contains some of the raw data that was used for the comparison. All data sets that were used can be downloaded as a bundle from http://imlspenticton.uzh.ch/robinson_lab/conquer_de_comparison/
  • export_results/ contains results for the final figures, in tabular format
  • scripts/ contains all R scripts used for the evaluation
  • shiny/ contains the code for a shiny app built to browse the results (http://imlspenticton.uzh.ch:3838/scrnaseq_de_evaluation)
  • unit_tests/ contains unit tests that were used to check the calculations
  • Makefile is the master script, which outlines the entire evaluation and calls all scripts in the appropriate order
  • include_filterings.mk, include_datasets.mk, include_methods.mk and plot_methods.mk are additional makefiles listing the filter settings, data sets and differential expression methods used in the comparison

Running the comparison

Assuming that all prerequisites are available, the comparison can be run by simply typing

$ make

from the top directory (note, however, that this will take a significant amount of time!). The Makefile reads the three files include_filterings.mk, include_datasets.mk and include_methods.mk and performs the evaluation using the data sets, methods and filterings defined in these. The file plot_methods.mk details the methods included in the final summary plots. For the code to execute properly, an .rds file containing a MultiAssayExperiment object for each data set must be provided in the data/ directory. Such files can be downloaded, e.g., from the conquer database. The files used for the evaluation are bundled together in an archive that can be downloaded from http://imlspenticton.uzh.ch/robinson_lab/conquer_de_comparison/
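Before starting a long run, it can help to confirm that every .rds file in data/ really holds a MultiAssayExperiment object. The following is a minimal sketch in base R and is not part of the repository; the function name check_rds_files and its arguments are hypothetical. It only compares the stored object's own class name, so it does not need the MultiAssayExperiment package to be loaded.

```r
# Hypothetical helper (not from the repository): report, for each .rds file
# in a directory, whether it can be read and has the expected class.
check_rds_files <- function(data_dir = "data",
                            expected_class = "MultiAssayExperiment") {
  files <- list.files(data_dir, pattern = "\\.rds$", full.names = TRUE)
  ok <- vapply(files, function(f) {
    # A file that cannot be read is reported as not OK instead of stopping
    obj <- tryCatch(readRDS(f), error = function(e) NULL)
    !is.null(obj) && inherits(obj, expected_class)
  }, logical(1))
  data.frame(file = basename(files), ok = unname(ok),
             stringsAsFactors = FALSE)
}
```

Calling check_rds_files("data") from the top directory returns one row per file; any row with ok equal to FALSE is worth inspecting before typing make.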

Adding a differential expression method

To add a differential expression method to the evaluation, construct a script in the form of the provided apply_*.R scripts (in the scripts/ directory), where * should be the name of the method. Then add the name of the method to include_methods.mk. To make it show up in the summary plots, add it to plot_methods.mk and assign it a color in scripts/plot_setup.R.
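As a rough illustration of what such a script might look like, here is a minimal sketch in base R. It rests on several assumptions that this README does not confirm: that the function is named run_<method>, that its input L is a list with a genes-by-cells count matrix (L$count) and a vector of group labels (L$condt), and that it returns a list with session_info, timing, res and df. The method name "ttestdemo" is made up, and the per-gene Welch t-test is only a stand-in for a real differential expression method.

```r
# Hypothetical scripts/apply_ttestdemo.R (illustration only).
# Assumed input: L$count = genes x cells count matrix,
#                L$condt = group label for each cell (two groups).
run_ttestdemo <- function(L) {
  message("ttestdemo")
  session_info <- sessionInfo()
  timing <- system.time({
    grp <- factor(L$condt)
    stopifnot(nlevels(grp) == 2, length(grp) == ncol(L$count))
    logexpr <- log2(L$count + 1)
    # Welch t-test per gene; genes where the test fails
    # (e.g. constant values) get NA
    pval <- apply(logexpr, 1, function(x) {
      tryCatch(t.test(x ~ grp)$p.value, error = function(e) NA_real_)
    })
    padj <- p.adjust(pval, method = "BH")
  })
  df <- data.frame(pval = pval, padj = padj,
                   row.names = rownames(L$count))
  # res is assumed to hold method-specific output; here it is just the table
  list(session_info = session_info, timing = timing, res = df, df = df)
}

# Usage on simulated counts (20 genes, 5 cells per group)
set.seed(1)
L <- list(
  count = matrix(rpois(200, 5), nrow = 20,
                 dimnames = list(paste0("gene", 1:20), paste0("cell", 1:10))),
  condt = rep(c("A", "B"), each = 5)
)
out <- run_ttestdemo(L)
head(out$df)
```

The provided apply_*.R scripts remain the reference for the exact input and output conventions; the sketch only shows the general shape of wrapping a method so that its timing and session information are recorded alongside the per-gene results.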

Adding a data set

To add a data set, put the .rds file containing the MultiAssayExperiment object in the data/ folder and construct a script in the form of the provided generate_config_*.R scripts (in the scripts/ directory), where * should be the name of the data set. Then add the name of the data set to the appropriate variables in include_datasets.mk. Also, add the data set to the data/dataset_type.txt file, indicating the type of values in each data set.

A note on the data sets

Most data sets in the published evaluation were obtained from the conquer repository. The RPM values for the Usoskin data set were downloaded from http://linnarssonlab.org/drg/ on December 18, 2016. The 10X data set was downloaded from https://support.10xgenomics.com/single-cell-gene-expression/datasets on September 17, 2017.

Cell cycle genes

The list of mouse cell cycle genes was obtained from http://www.sabiosciences.com/rt_pcr_product/HTML/PAMM-020A.html on March 9, 2017.

Unit tests

To run all the unit tests, start R, load the testthat package and run source("scripts/run_unit_tests.R"). Alternatively, to run just the unit tests in a given file, do e.g. test_file("unit_tests/test_trueperformance.R", reporter = "summary").
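To show what the test_file() call described above does without needing the repository, here is a small self-contained sketch. It writes a throwaway test file to a temporary directory and runs it; the file name test_demo.R and the test it contains are hypothetical and do not come from unit_tests/.

```r
library(testthat)

# Create a temporary directory holding a single throwaway test file
demo_dir <- tempfile("unit_tests")
dir.create(demo_dir)
demo_file <- file.path(demo_dir, "test_demo.R")
writeLines(c(
  'test_that("BH-adjusted p-values are never below the raw p-values", {',
  '  p <- c(0.001, 0.02, 0.5)',
  '  expect_true(all(p.adjust(p, method = "BH") >= p))',
  '})'
), demo_file)

# Run just this file with the summary reporter
results <- test_file(demo_file, reporter = "summary")

# One row per test_that() block, with counts of failures, skips and errors
summary_df <- as.data.frame(results)
```

In the repository itself, the same call would simply point at one of the files in unit_tests/.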

conquer_comparison's Issues

A quick question on constructing "L" object

This is amazing work!! I really appreciate that you made the expression data and the source codes available.

I was trying to apply your function on my data and was having a hard time constructing an "L" object to feed into any apply*.R function. Say, I would like to apply run_BPSC() on data set GSE74596, so I did the following things:

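library(MultiAssayExperiment)   # MultiAssayExperiment class, experiments()
library(SummarizedExperiment)   # assays()

# Load the data set and update it to the current class definition
GSE74596 <- readRDS("GSE74596.rds")
GSE74596 <- updateObject(GSE74596)
experiments(GSE74596)
# Extract the gene-level experiment
(GSE74596_gene <- experiments(GSE74596)[["gene"]])
# Assemble the list to pass to the run_*() function
L <- list()
L$count <- assays(GSE74596_gene)[["count"]]
L$condt <- colData(GSE74596)$characteristics_ch1.5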

Could you please let me know if this is the right way to prepare an "L" object or if I need another function to do it? Any comments would be greatly appreciated.

Please let me know if this is not the right place to post questions,

Thanks,

Gabby

Creating Data obj, L

I can't find L, a data object used in almost all the apply_* methods in the repository. It would be great if you could post it, or add a script that generates L to the repository.

Thanks.

powsim and .rds formatting

Thank you for this great work, I expect it will save people like me a lot of time on testing out differential analysis methods.

I was trying to just run this code on the provided data, but I am running into a few issues.

First of all, I can't seem to install powsim, as one of its dependencies, gu-mi/NBGOF, cannot be installed due to an ELF header error. Instead, I installed powsimR, powsim's update, which only runs on R 3.4.0 or newer. Running powsimR therefore introduces another issue: opening an .rds file, which I presume was created in R version 3.3.2, in R version 3.4.0. Using readRDS on config$mae produces an error stating that the RDS file is outdated and suggesting the updateObject function. However, using updateObject after readRDS results in the error below.

Error in validObject(.Object) :
class “MultiAssayExperiment” object: 'sampleMap' assay column not a factor

Please let me know if you know a fix to either installing powsim, or loading provided rds data in newer versions of R.

Additionally, I could not install two additional packages, DEsingle and NODES. These are two of the methods that are being tested, but I cannot find sources that have these packages.

Thank you for reading.

Difficulty in running the script

I'm trying to run the Makefile, but there are lots of issues. Could you update the scripts to R 3.5 or provide a Docker image of your working environment? Also, can you provide a test include_datasets.mk containing a minimal number of data sets to test the Makefile?

Here are some issues:

  1. Under R 3.3, it's very difficult to install some packages, for example "SingleCellExperiment". Even its old version requires R >= 3.4. Some packages reside only on GitHub and do not provide old versions compatible with R 3.3.
  2. Since your scripts were written, some packages have changed their names; for example, powsim is now powsimR. And some packages have changed their function parameters or return values.
  3. The provided data bundle does not have some of the text files required by the Makefile. And the RDS files do not work with newer versions of MultiAssayExperiment, for example because of the pData slot.

Add a new DE method

To create a new apply_*.R script, what should the return value of the method be? I see that the apply_*.R scripts return a list(session_info, timing, res, df). What should the content of res be?

Thanks a lot!

Prefiltering of lowly expressed genes

In the paper, you describe that "prefiltering of lowly expressed genes" improves the edgeR/QLF method. Could you tell me what filtering criteria you applied to improve the results?

Thank you very much.
