imb-computational-genomics-lab / ascend


21.0 6.0 7.0 262.25 MB

R package - Analysis of Single Cell Expression, Normalisation and Differential expression (ascend)

R 96.89% C++ 3.11%

scrna-seq-analysis scrnaseq r rpackages singlecell

ascend's Introduction

THIS REPOSITORY HAS NOW MOVED TO powellgenomicslab/ascend!

ascend's People

Contributors

Stargazers

Watchers

Forkers

yawarjq quanaibn vd4mmind chitrita pedriniedoardo am-9 maikam

ascend's Issues

joining multiple matrices -> NewEMSet

I have attempted to load three unaggregated Cell Ranger datasets using LoadCellranger(), extract the three matrices, join them and then create a new EMSet. I can create the new EMSet, but the issue I have noticed is that the CellInfo$batch slot does not recognise that multiple batches are present, even though the cell barcodes contain the correct suffixes ('-1', '-2', '-3').

An example of the output is here:

> tail(ci)
           cell_barcode batch
2236 TTTGCGCCATTTCACT-3     1
2237 TTTGCGCGTTGTCGCG-3     1

I can probably just correct the data in 'batch' myself, but it would be better if it were set automatically.
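Until that happens, here is a minimal base R sketch of deriving the batch from the barcode suffix. It assumes every barcode ends in '-<number>', as in the output above. batchFromBarcode() is a hypothetical helper, not an ascend function, and the barcodes in the example are made up.

```r
# Hypothetical helper: use the digits after the last "-" of each barcode as its batch.
batchFromBarcode <- function(barcodes) {
  suffix <- sub("^.*-", "", barcodes)  # greedy match strips everything up to the last "-"
  if (any(!grepl("^[0-9]+$", suffix))) {
    stop("Some barcodes lack a numeric '-N' suffix.")
  }
  as.integer(suffix)
}

# Made-up barcodes from three aggregated libraries.
ci <- data.frame(cell_barcode = c("AAACCTGA-1", "TTTGCGCC-2", "TTTGCGCG-3"),
                 stringsAsFactors = FALSE)
ci$batch <- batchFromBarcode(ci$cell_barcode)
```

The corrected data frame could then be supplied through the colInfo argument of newEMSet().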

error when running runDESeq if both condition.a and condition.b contain more than one group

Hi, after running runCORE, I got 4 clusters for the sample. I then used runDESeq to identify DE genes between clusters 1 and 3 and clusters 2 and 4 with the following command.

cluster1_3_vs_2_4 <- runDESeq(scran_normalised, group = "cluster", condition.a = c("1","3"), condition.b = c("2", "4"), ngenes = 5000, fitType = "local", method = "per-condition")

and got the following error:

Loading required package: dynamicTreeCut
Loading required package: locfit
locfit 1.5-9.1 	 2013-03-22
Loading required package: lattice
    Welcome to 'DESeq'. For improved performance, usability and
    functionality, please consider migrating to 'DESeq2'.
[1] "Identifying genes to retain..."
[1] "Running DESeq..."
  |=======                                                               |  10%
Error: BiocParallel errors
  element index: 1, 2, 3, 4, 5, 6, ...
  first error: fewer than one row in the data
library(BiocParallel)
Execution halted

After checking the code of runDESeq, I found a section that might contain a bug. I think condition.b on the third line of the code below should be changed to condition.a. Otherwise the original condition.b is overwritten, so the comparison is actually made between condition.a and itself, which causes the error.

    if (length(condition.a) > 1) {
        reformatted <- reformatCondition(condition.a, condition_list = condition_list)
        condition.b <- reformatted$condition
        condition_list <- reformatted$condition_list
    }
    else {
        replace_idx <- which(condition_list %in% condition.a)
        condition.a <- as.character(condition.a)
        condition_list[replace_idx] <- condition.a
    }
    if (length(condition.b > 1)) {
        reformatted <- reformatCondition(condition.b, condition_list = condition_list)
        condition.b <- reformatted$condition
        condition_list <- reformatted$condition_list
    }
    else {
        replace_idx <- which(condition_list %in% condition.b)
        condition.b <- as.character(condition.b)
        condition_list[replace_idx] <- condition.b
    }
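Below is a corrected version of the excerpt above, written as a minimal self-contained sketch. It applies the fix proposed in this issue: the merged label is assigned back to condition.a. It also corrects length(condition.b > 1). That expression evaluates the comparison before taking the length, so it is always TRUE for a non-empty condition.b. ascend's internal reformatCondition() is not shown, so it is replaced here by a hypothetical stub, reformatConditionStub(), which merges groups into one label. The real helper may label merged groups differently.

```r
# Hypothetical stand-in for ascend's internal reformatCondition(): merges several
# groups into one label and relabels them in condition_list. The real helper may differ.
reformatConditionStub <- function(condition, condition_list) {
  merged <- paste(condition, collapse = "_")
  condition_list[condition_list %in% condition] <- merged
  list(condition = merged, condition_list = condition_list)
}

# Corrected logic, wrapped in a function so it can run on its own.
prepareConditions <- function(condition.a, condition.b, condition_list) {
  # Converting up front means the single-group branches only coerce the condition.
  condition_list <- as.character(condition_list)
  if (length(condition.a) > 1) {
    reformatted <- reformatConditionStub(condition.a, condition_list = condition_list)
    condition.a <- reformatted$condition          # fix 1: was assigned to condition.b
    condition_list <- reformatted$condition_list
  } else {
    condition.a <- as.character(condition.a)
  }
  if (length(condition.b) > 1) {                  # fix 2: was length(condition.b > 1)
    reformatted <- reformatConditionStub(condition.b, condition_list = condition_list)
    condition.b <- reformatted$condition
    condition_list <- reformatted$condition_list
  } else {
    condition.b <- as.character(condition.b)
  }
  list(condition.a = condition.a, condition.b = condition.b,
       condition_list = condition_list)
}

res <- prepareConditions(c("1", "3"), c("2", "4"), c("1", "2", "3", "4", "1"))
```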

Vignettes Not Found By R

The vignettes are not found.

> browseVignettes("ascend")
No vignettes found by browseVignettes("ascend")

The package loads.

> library(ascend)
Loading required package: dplyr
Attaching package: ‘dplyr’
    ...        ...

For comparison, browseVignettes("scran") works as expected.

Installation problem

Installing package into ‘/home/maod/R/x86_64-pc-linux-gnu-library/3.5’
(as ‘lib’ is unspecified)

installing source package ‘ascend’ ...
** R
** data
*** moving datasets to lazyload DB
/opt/software/helix/R/3.5.1/lib64/R/bin/INSTALL: line 34: 20273 Done echo 'tools:::.install_packages()'
20274 Killed | R_DEFAULT_PACKAGES= LC_COLLATE=C "${R_HOME}/bin/R" $myArgs --slave --args ${args}
Error in i.p(...) :
(converted from warning) installation of package ‘/scratch/RtmpQuhDkV/file43bb645d3e75/ascend_0.9.9.tar.gz’ had non-zero exit status

Here's my sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

Matrix products: default
BLAS: /gpfs/ctgs0/software/helix/R/3.5.1/lib64/R/lib/libRblas.so
LAPACK: /gpfs/ctgs0/software/helix/R/3.5.1/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 ps_1.3.0 prettyunits_1.0.2 rprojroot_1.3-2
[5] digest_0.6.18 crayon_1.3.4 withr_2.1.2 assertthat_0.2.0
[9] R6_2.4.0 backports_1.1.3 magrittr_1.5 rlang_0.3.1
[13] cli_1.0.1 curl_3.3 fs_1.2.6 remotes_2.0.2
[17] callr_3.1.1 devtools_2.0.1 desc_1.2.0 tools_3.5.1
[21] glue_1.3.0 pkgload_1.0.2 compiler_3.5.1 processx_3.2.1
[25] pkgbuild_1.0.2 sessioninfo_1.1.1 memoise_1.1.0 usethis_1.4.0

"Please ensure 'counts' are in your supplied assay list"

Hello,

I am trying to create a new EMSet from a single-cell object, but I keep getting an error saying that I need to ensure 'counts' are in the supplied assay list. However, when I check the structure of the single-cell object, my assay is indeed listed as counts. I'm not sure why it isn't being recognised; I have tried many different permutations but can't get this to work.
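The error text suggests that ascend looks for an element named "counts" in the list passed to assays. Assuming that is the check, a quick base R test is whether the list you pass is a plain list carrying that name. An unnamed list(counts) fails this test even though the matrix inside is a count matrix. hasCountsAssay() is a hypothetical helper, not part of ascend, and the toy matrix is made up.

```r
# Made-up count matrix: 2 genes x 3 cells.
counts <- matrix(c(0, 3, 1, 5, 2, 0), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3")))

# Hypothetical check: TRUE only for a plain list with an element named "counts".
hasCountsAssay <- function(assays) {
  is.list(assays) && "counts" %in% names(assays)
}

hasCountsAssay(list(counts = counts))  # named list: passes
hasCountsAssay(list(counts))           # unnamed list: names() is NULL, so it fails
```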

Using an already processed and normalized dataset

Hello again!

Sorry for the frequent questions.

I have everything working for data coming from Cell Ranger, but I would also like to use some of the tools in ascend with publicly available datasets.

I have a dataset from GEO accession GSE70630 (GSE70630_OG_processed_data_v2.txt) that contains cell barcodes, gene names and normalized counts.

It has already been processed and normalized (although I believe spike-ins are present). I would like to use the differential expression, dimension reduction and cell cycle tools on this dataset, but I am running into problems because getting to PCA requires the earlier steps to be completed first.

I can get the data into EMSet format, but that's about it.

Any help would be appreciated! And sorry if this is a dumb question.

Subset Cells with Gene Label

Hello,

I'm having difficulty making use of the addGeneLabel functionality.

I was wondering whether there is a way to subset cells based on a gene "signature" using the addGeneLabel and subset functions.

For example, finding out which cells express a certain proportion of a gene list (signature) and then subsetting those cells or adding them as a condition in colInfo?

For example, given a gene set (signature) for "stressed" cells, I would work out how many cells express a certain percentage of that signature (say, a cell expressing at least some % of the signature genes is what I define as "stressed"). I would then label those cells "stressed" and the rest "Not-Stressed" as a condition, so that I can subset them or do further analysis.
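Here is a minimal base R sketch of the signature-based labelling described above. It works directly on a genes x cells count matrix rather than through addGeneLabel, whose interface is not shown here. A gene counts as expressed when its count is above zero, and the fraction is taken over the signature genes present in the matrix. labelBySignature(), the 0.5 threshold and the gene names are hypothetical.

```r
# Hypothetical helper: label each cell by the fraction of signature genes it expresses.
labelBySignature <- function(counts, signature, min.fraction = 0.5,
                             labels = c("stressed", "Not-Stressed")) {
  present <- intersect(signature, rownames(counts))
  if (length(present) == 0) stop("None of the signature genes are in the matrix.")
  # Per cell (column): fraction of the present signature genes with a count above zero.
  frac <- colMeans(counts[present, , drop = FALSE] > 0)
  ifelse(frac >= min.fraction, labels[1], labels[2])
}

# Made-up counts: 4 genes x 3 cells.
counts <- matrix(c(5, 0, 3,
                   0, 0, 1,
                   2, 4, 0,
                   0, 7, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
                                 c("cell1", "cell2", "cell3")))
stress.label <- labelBySignature(counts, c("GENE_A", "GENE_B", "GENE_C"))
```

The returned vector has one label per cell. It could be added as a column of the cell information data frame and used for subsetting.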

Volcano plots

As requested by @MichaelPeibo in #14:

Also, I really like your volcano plot, shown here:
[Image: volcano plot]
In these plots, you only label some of the genes rather than all of them. How did you do this? (I could not find any parameters for this in the plotting function.)

Also, which parameter settings do you use to plot the expression of a specific gene on a t-SNE plot? (Sorry for all the questions...)

The DE volcano plots in our publication were generated manually with ggplot2, without labels. I have tried to come up with automated settings that produce a similar result, but this turned out to be difficult and I could not find settings that worked for all results. Instead, I generated an interactive, labelled version of the ggplot2 plot to guide my manual annotation.

To make the apoptosis genes more visible, they were overplotted onto the main scatter.

I may add a tutorial for this at a later stage.
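The authors built these plots manually with ggplot2. As a hedged alternative that needs only base R graphics, the sketch below overplots a highlighted gene set on the main scatter and labels only the most significant genes. It assumes a DE results data frame with columns id, log2FoldChange and padj, like those in DESeq's nbinomTest() output. ascend's runDESeq() results may use different names. Both helper names and the thresholds are hypothetical.

```r
# Pick the most significant genes that pass both thresholds; only these get labels.
genesToLabel <- function(de, n = 10, fc.threshold = 1, padj.threshold = 0.05) {
  keep <- is.finite(de$log2FoldChange) & !is.na(de$padj) &
    de$padj < padj.threshold & abs(de$log2FoldChange) >= fc.threshold
  sig <- de[keep, ]
  head(sig$id[order(sig$padj)], n)
}

volcanoPlot <- function(de, highlight = character(0), n.labels = 10,
                        fc.threshold = 1, padj.threshold = 0.05) {
  de <- de[is.finite(de$log2FoldChange) & !is.na(de$padj), ]
  # Floor p-values at the smallest positive double so -log10() stays finite.
  y <- -log10(pmax(de$padj, .Machine$double.xmin))
  plot(de$log2FoldChange, y, pch = 20, col = "grey60",
       xlab = "log2 fold change", ylab = "-log10 adjusted p-value")
  # Overplot the highlighted gene set (e.g. apoptosis genes) on the main scatter.
  hl <- de$id %in% highlight
  points(de$log2FoldChange[hl], y[hl], pch = 20, col = "red")
  top <- genesToLabel(de, n.labels, fc.threshold, padj.threshold)
  idx <- match(top, de$id)
  if (length(idx) > 0) {
    text(de$log2FoldChange[idx], y[idx], labels = top, pos = 3, cex = 0.7)
  }
  invisible(top)
}
```

Capping the labels with n.labels keeps dense plots readable. Highlighting is kept separate from labelling, so a pathway can be coloured without every one of its genes being named.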

NewAEMSet requires the genes matrix to be in order 1: ensembl_id then 2: gene_name

If the column order is changed, there is an error: Error in `$<-.data.frame`(`*tmp*`, "control", value = TRUE) :
replacement has 1 row, data has 0
I think this could be solved by selecting columns by name rather than by position.
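Here is a minimal base R sketch of selecting annotation columns by name, so that the input column order no longer matters. orderGeneColumns() is hypothetical and not an ascend function, and the Ensembl IDs below are placeholders.

```r
# Hypothetical helper: put the required annotation columns first, selected by name.
orderGeneColumns <- function(genes, required = c("ensembl_id", "gene_name")) {
  missing <- setdiff(required, colnames(genes))
  if (length(missing) > 0) {
    stop("Missing gene annotation column(s): ", paste(missing, collapse = ", "))
  }
  genes[, c(required, setdiff(colnames(genes), required)), drop = FALSE]
}

# Columns supplied in the "wrong" order; the Ensembl IDs are placeholders.
genes <- data.frame(gene_name = c("THY1", "GAPDH"),
                    ensembl_id = c("ENSG_PLACEHOLDER_1", "ENSG_PLACEHOLDER_2"),
                    stringsAsFactors = FALSE)
genes <- orderGeneColumns(genes)
```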

Clustering options

As requested by @MichaelPeibo in #14:

Besides, does ascend have any option to cluster 'once for all' or to tune the 'cluster resolution' as Seurat does, rather than clustering repeatedly?

As of version 0.3, ascend has a resolution tuning option (nres):

clustered.set <- RunCORE(em.set, nres = 50)

This lets users set how many resolutions are tested when determining the optimal resolution; nres must be between 20 and 100.

The CORE algorithm can be considered a 'once for all' method, in that it generates a series of results and selects the best one. If you are not satisfied with the selected result, you can inspect the other results in the data frame returned by the GetRandMatrix function.
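To illustrate the general idea of generating many resolutions and keeping the most stable one, here is a base R sketch. It is not the CORE algorithm. It swaps in hclust() with cutree() at evenly spaced heights in place of dynamicTreeCut, and scores each resolution by its mean Rand index with the neighbouring resolutions. Both function names are hypothetical.

```r
# Rand index: fraction of cell pairs on which two clusterings agree
# (together in both, or apart in both).
randIndex <- function(a, b) {
  tab <- table(a, b)
  samePairs <- function(counts) sum(choose(counts, 2))
  total <- choose(length(a), 2)
  (total + 2 * samePairs(tab) - samePairs(rowSums(tab)) - samePairs(colSums(tab))) / total
}

clusterAcrossResolutions <- function(x, nres = 40) {
  stopifnot(nres >= 20, nres <= 100)  # same range as the nres limit described above
  tree <- hclust(dist(x), method = "average")
  heights <- seq(min(tree$height), max(tree$height), length.out = nres)
  clusterings <- lapply(heights, function(h) cutree(tree, h = h))
  # Agreement between each pair of consecutive resolutions.
  ri <- mapply(randIndex, clusterings[-nres], clusterings[-1])
  # Stability of a resolution: mean agreement with its neighbouring resolutions.
  stability <- vapply(seq_len(nres), function(i) {
    mean(ri[c(i - 1, i)[c(i > 1, i < nres)]])
  }, numeric(1))
  n.clusters <- vapply(clusterings, function(cl) length(unique(cl)), integer(1))
  stability[n.clusters < 2] <- -Inf  # ignore the trivial one-cluster solution
  best <- which.max(stability)
  list(clusters = clusterings[[best]], n.clusters = n.clusters[best],
       stability = stability)
}
```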

Failed to convert em.set to SCESet or SingleCellExperiment

Hi, ascend team,
I tried to convert my EMSet as described in your tutorial:

controls <- GetControls(em.set)
sce.object <- ConvertToSCE(em.set, control.list = controls)
sce.object

controls <- GetControls(em.set)
sce.set <- ConvertToSCESet(em.set, control.list = controls)

but got this error:

Error in seq_len(ncol(assay)) :
argument must be coercible to non-negative integer
In addition: Warning messages:
1: 'newSCESet' is deprecated.
Use 'SingleCellExperiment' instead.
See help("Deprecated")
2: In seq_len(ncol(assay)) : first element used of 'length.out' argument

any suggestion?

MergeExprsMtx function does not exist?

The tutorial vignette mentions this function by name but the function does not seem to exist.

There is a function called JoinMatrices, but it doesn't do what I'm looking for: my expression matrices are sparse (class dgCMatrix) and this function requires data.frames. I'd prefer not to convert my sparse matrices to data.frames.
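Here is a minimal sketch of joining sparse matrices without converting them to data.frames, using the Matrix package that ships with R. It takes the union of gene names, places each matrix's non-zero entries on that shared gene axis, and binds the columns. joinSparseMatrices() is hypothetical and is neither ascend's MergeExprsMtx nor JoinMatrices. It assumes numeric matrices with row and column names, and cell barcodes that are already unique across matrices.

```r
library(Matrix)

# Hypothetical helper: column-bind sparse count matrices over the union of their genes.
joinSparseMatrices <- function(mats) {
  genes <- Reduce(union, lapply(mats, rownames))
  aligned <- lapply(mats, function(m) {
    m <- as(m, "TsparseMatrix")  # triplet form; slots i and j are 0-based
    sparseMatrix(i = match(rownames(m), genes)[m@i + 1L],
                 j = m@j + 1L, x = m@x,
                 dims = c(length(genes), ncol(m)),
                 dimnames = list(genes, colnames(m)))
  })
  do.call(cbind, aligned)  # stays sparse (dgCMatrix)
}
```

Genes that are absent from one sample simply become implicit zeros, so the result is never densified.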

Problem in running scranNormalise

Hello,
I wanted to normalise my data with the scranNormalise function. I created the EMSet object as follows:
em <- newEMSet(assays = list(counts = counts) )

Now, when I try to run:
scran_normalised <- scranNormalise(em, quickCluster = FALSE, min.mean = 1e-05)

I get this error:
Please specify the name of the first column in colInfo as 'cell_barcode'.

However, colnames(colInfo) does include cell_barcode (from the ascend code, I can see that it is created automatically when colInfo is not specified in newEMSet).

Do you know of a possible fix for this problem?

Best,
Monika

Error: BiocParallel errors

Hi, ascend team,
After normalising with scranNormalise, I want to regress out the cell cycle effect with RegressConfoundingFactors. However, when I run this function, I get this error:

Error: BiocParallel errors
  element index: 1, 2, 3, 4, 5, 6, ...
  first error: NA/NaN/Inf in 'x'

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)

Matrix products: default
BLAS: /share/app/cluster/R-3.4.3/lib64/R/lib/libRblas.so
LAPACK: /share/app/cluster/R-3.4.3/lib64/R/lib/libRlapack.so

 version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          4.3
year           2017
month          11
day            30
svn rev        73796
language       R
version.string R version 3.4.3 (2017-11-30)
nickname       Kite-Eating Tree

I did install and configure BiocParallel as you instructed. Any suggestions? Thanks!
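R's model-fitting code (lm.fit) raises "NA/NaN/Inf in 'x'" when its input contains non-finite values. A hedged first step is therefore to check the normalised matrix for NA, NaN or Inf values, including -Inf produced by log(0). findNonFiniteGenes() is a hypothetical helper, and the matrix is made up. Any genes it reports could be dropped or fixed before retrying RegressConfoundingFactors.

```r
# Hypothetical helper: names of genes (rows) containing NA, NaN, Inf or -Inf.
findNonFiniteGenes <- function(expr) {
  bad <- rowSums(!is.finite(expr)) > 0
  rownames(expr)[bad]
}

# Made-up data: log() turns the zero into -Inf, and NA stays NA.
expr <- log(matrix(c(1, NA, 3, 0, 5, 6), nrow = 3,
                   dimnames = list(c("g1", "g2", "g3"), c("c1", "c2"))))
findNonFiniteGenes(expr)  # returns c("g1", "g2")
```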

Problem with "newEMSet"

After the most recent update to the package, I'm able to create an EMSet, but it does not calculate the parameters that it would calculate before the update.

For example, when I create the EMSet using the code below:

EMSet <- newEMSet(assays = list(counts = counts), 
                   colInfo = data.frame(cellInfo), 
                   rowInfo = data.frame(genes), 
                   controls = controls
                   )

After this, I try to plot general QC metrics using plotGeneralQC as follows:

raw.qc.plots <- plotGeneralQC(EMSet)

It fails with the following error:

[1] "Plotting library size plots..."
[1] "Plotting average count plots..."
[1] "Plotting top gene expression..."
[1] "Controls detected. Plotting control-specific plots..."
Error in `[.data.frame`(metrics_df, , c("cell_barcode", "batch", "qc_nfeaturecounts")) : 
  undefined columns selected

I'm using OSX High Sierra (10.13.6) and R version 3.5.1.

My code worked with the previous version of ascend, so the update seems to have introduced breaking changes.

Any help would be appreciated. Thank you!

addGeneLabel function is nonexistent

Hi,

I noticed that the addGeneLabel function does not exist in the current version of ascend.

There also isn't any source code for it. This would be a super handy function for me to use if it were to exist.

Any chance we can get an update on this?

runCORE removes some cells

Hi,
I am running ascend with the following commands:
em <- ascend::newEMSet(assays = list(counts = data))
em <- ascend::filterLowAbundanceGenes(em)
em <- ascend::normaliseByRLE(em)
em <- ascend::runPCA(em)
em <- ascend::runCORE(em, nres=40)
In some of my datasets, the em object contains fewer cells after runCORE is performed. It seems that ascend removes some cells during clustering. Is this expected (for example, because low-quality cells are removed), or could it be caused by an error in the clustering function?
I would be grateful for your help,
Monika

Pathway analysis

As requested by @MichaelPeibo in #14:

Another point that confused me: in your tutorial and your paper (congrats!), you mention that apoptosis pathway-related genes are enriched in cluster 2. How did you determine this? Is there any way to determine it automatically?

We used the enrichment tool provided by the Gene Ontology Consortium as part of our analysis. Evaluating the significance of the results really requires someone knowledgeable about cellular processes and pathways, so it is best done manually. Tools to run this will be available in a package we are currently working on (scGPS).
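Over-representation tools commonly test whether a gene set contains more DE genes than expected by chance, often with a hypergeometric (one-sided Fisher's exact) test. Here is a minimal base R sketch of that test for a single pathway. enrichmentP() is hypothetical, applies no multiple-testing correction, and does not replace the expert review described above.

```r
# Hypothetical helper: one-sided hypergeometric test for over-representation of a
# pathway's genes among DE genes. No multiple-testing correction is applied.
enrichmentP <- function(de.genes, pathway.genes, universe) {
  de.genes <- intersect(de.genes, universe)
  pathway.genes <- intersect(pathway.genes, universe)
  overlap <- length(intersect(de.genes, pathway.genes))  # DE genes in the pathway
  in.path <- length(pathway.genes)                       # pathway genes in the universe
  not.path <- length(universe) - in.path                 # all other genes
  # P(X >= overlap) when drawing length(de.genes) genes at random from the universe.
  phyper(overlap - 1, in.path, not.path, length(de.genes), lower.tail = FALSE)
}
```

When testing many pathways, the resulting p-values would normally be adjusted, for example with p.adjust(p, method = "BH").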

SubsetCondition

SubsetCondition is not working.

missing characters in the RGC_Tutorial.pdf file

One line in the RGC_Tutorial.pdf file (page 5) seems to be missing its last few characters:

em.set <- NewEMSet(ExpressionMatrix = matrix, GeneInformation = genes, CellInformation = barcodes, Controls = controls)

Cheers

ascend EMSet object bug

I am using ascend as described in the published paper (https://academic.oup.com/gigascience/article/8/8/giz087/5554286#140534041). The command to create the EMSet runs fine, but when I then type "EMSet" to inspect the object, or run QC or other operations, I always get the error below. Could you please tell me what is wrong and how I can fix it? I am using ascend_0.99.69.
I followed the commands listed in the paper. The command for EMSet construction is:
EMSet <- EMSet(counts, colInfo = colInfo, controls = controls)
EMSet

The error information is:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘int_colData’ for signature ‘"NULL"’

I tried passing colData to the EMSet function, but it still fails with the same error. Thanks for your attention!

error when PlotGeneralQC(em.set)

I got an error when I ran
raw.qc.plots <- PlotGeneralQC(em.set)

I skipped the 'Adding additional metadata to the EMSet' step because I did not have batch information or specific genes I wanted to look at; instead, I directly ran

raw.qc.plots <- PlotGeneralQC(em.set)

after creating the em.set object with

em.set <- NewEMSet(ExpressionMatrix = matrix, GeneInformation = genes,
CellInformation = barcodes, Controls = controls)

The error is

Error in `[.data.frame`(cell.information, , 2) :
undefined columns selected

Any suggestion on this?
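The error shows cell.information[, 2] being read, and a data frame that holds only barcodes has no second column. If that is the cause (the batch column was never added because the metadata step was skipped), one possible workaround is to add a single default batch before running QC. addDefaultBatch() is hypothetical and the barcodes are made up.

```r
# Hypothetical helper: add a constant batch column when none is present.
addDefaultBatch <- function(cell.info, batch = 1L) {
  if (!"batch" %in% colnames(cell.info)) cell.info$batch <- batch
  cell.info
}

barcodes <- data.frame(cell_barcode = c("AAACCTGA-1", "AAACCTGC-1"),
                       stringsAsFactors = FALSE)
# barcodes[, 2] would fail here with "undefined columns selected".
barcodes <- addDefaultBatch(barcodes)
```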

failed to install the packages

Hi,
I want to try your analysis pipeline, but I ran into a problem when installing the package.

install_github("IMB-Computational-Genomics-Lab/ascend")
Downloading GitHub repo IMB-Computational-Genomics-Lab/ascend@master
from URL https://api.github.com/repos/IMB-Computational-Genomics-Lab/ascend/zipball/master
Installation failed: transfer closed with outstanding read data remaining

which data type input to runCORE

Hello,

This is a question rather than an issue. The README says that ascend can be used with read counts or UMI counts. When I tried to cluster the raw data, I got an error telling me to run runPCA first; however, once I ran runPCA, it told me to normalise the dataset before using that function. If I normalise the dataset, everything works fine. My question is: what should we do if we want to skip normalisation and input counts directly into runCORE? Is this possible?

Best,
Monika

runTSNE step couldn't finish

The command I used:
scran_normalised <- runTSNE(scran_normalised, PCA = FALSE)

The dataset has ~18,000 cells and ~20,000 genes; after running overnight, it still had not finished.

A putative error in Adding additional metadata to the EMSet

Hi,
I think I found an error in the tutorial

Actually, this code sets the THY1 value in cell.info to TRUE for every cell, whereas the later code correctly handles information for specific genes.

error in scranNormalise(em.set)

I got an error from scranNormalise(em.set) while following the vignette:

norm.set <- scranNormalise(em.set)
[1] "Converting EMSet to SCESet..."
Error in seq_len(ncol(assay)) :
argument must be coercible to non-negative integer
In addition: Warning messages:
1: 'newSCESet' is deprecated.
Use 'SingleCellExperiment' instead.
See help("Deprecated")
2: In seq_len(ncol(assay)) : first element used of 'length.out' argument

This error does not occur when I do not skip the cell cycle identification step:
em.set <- ConvertGeneAnnotation(em.set, "gene_symbol", "ensembl_id")
training.data <- readRDS(system.file("exdata", "human_cycle_markers.rds", package = "scran"))
em.set <- scranCellCycle(em.set, training.data)
cell.info <- GetCellInfo(em.set)
cell.info[1:5, ]
em.set <- ConvertGeneAnnotation(em.set, "ensembl_id", "gene_symbol")

cluster 0

Just wondering: in what cases will dynamicTreeCut generate cluster 0, meaning that none of the available clusters has been assigned to a cell?

Thanks!

update AEMSet to EMSet

An object created with an older version of ASCEND doesn't seem to be readable in the new version. I guess this is due to the structural changes. If I load an old version of ASCEND and read it in, then load the new version over the top, is there a way to update the object to the new type and retain all the processing logs etc?

FindOptimalClusters() failing at stability values

The FindOptimalClusters() function is failing on a specific sample with this error:

clust <- FindOptimalClusters(reduced.pca.10)
[1] "Performing unsupervised clustering..."
[1] "Generating clusters by running dynamicTreeCut at different heights..."
[1] "Calculating rand indices..."
[1] "Calculating stability values..."
Error in if (flat.counter[i] == 1 & flat.counter[i + 1] == 1) { :
missing value where TRUE/FALSE needed
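The FindOptimalClusters() source is not shown here, so this is only a guess at the cause. "missing value where TRUE/FALSE needed" appears when if() receives NA. Reading flat.counter[i + 1] on the last iteration indexes past the end of the vector, which returns NA, and an NA inside the data has the same effect. Here is a minimal sketch of a guarded version of such a loop. countFlatRuns(), and what it counts, are hypothetical.

```r
# Hypothetical example: count adjacent positions where both values equal 1.
countFlatRuns <- function(flat.counter) {
  n <- length(flat.counter)
  runs <- 0L
  # Stop at n - 1 so flat.counter[i + 1] never reads past the end (which gives NA).
  for (i in seq_len(max(n - 1L, 0L))) {
    # && on scalars plus isTRUE() treats an NA comparison as FALSE instead of erroring.
    if (isTRUE(flat.counter[i] == 1 && flat.counter[i + 1] == 1)) runs <- runs + 1L
  }
  runs
}
```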
