alanocallaghan / scater

Clone of the Bioconductor repository for the scater package.

Home Page: https://bioconductor.org/packages/devel/bioc/html/scater.html

R 99.86% Dockerfile 0.14%

scater's Issues

Add cellular detection rate by batch plot for QC

From @davismcc on June 19, 2017 6:47

Copied from original issue: davismcc/archive-scater#116

different clustering results with rtsne() and plotTSNE() function

From @cmgraef on May 13, 2017 0:36

Hi,

Thank you so much for providing such an amazing tool! Before using scater I made my t-SNE plots with the Rtsne package. For a given single-cell data set I get the following plot:

https://ibb.co/mEna5k

Here's the code for that:

library(Rtsne)

sc_expression <- t(read.csv("~scrna expression.csv", row.names=1))

tsne_plot <- Rtsne(sc_expression, check_duplicates = FALSE, perplexity = 50)
plot(tsne_plot$Y)

I obviously verified that result several times. When I ran the same data with the scater package I got this result:

https://ibb.co/hvDhkk

Here is the code for that:

library(Rtsne)
library(scater)

sc_example_counts <- read.csv("~scrna expression.csv", header=TRUE, row.names=1)
sc_example_cell_info <- read.csv("~phenotype.csv", row.names=1)

pd <- new("AnnotatedDataFrame", data = sc_example_cell_info)
example_sceset <- newSCESet(countData = sc_example_counts, phenoData = pd)

plotTSNE(example_sceset, perplexity = 50, colour_by = "cell_type")

I don't really understand how the results can be so different between the two packages, especially since the plotTSNE() function uses the Rtsne package. Perplexity and other parameters are all the same. I would highly appreciate your advice on this!

Thanks,
Moritz

Copied from original issue: davismcc/archive-scater#111

change the point sizes in plotColData

Hi Davis,

Thank you for this amazing tool! I always use it in my daily work.
However, I've just installed the new R version and now I ended up with this small issue.
I'm trying to increase the size of points in a plot called by the function plotColData.

The exact call is: plotColData(sceset, x = "X" , y = "Y",colour_by="Z", size=2).

In the previous version, with the old function plotPhenoData, I could fix the problem in this way:

plotPhenoData(sceset, aes(x = X , y = Y , colour = Z),size=2) ,

but now it doesn't seem to work.

Thank you in advance.

Bests,

Elisabetta

Add `component` argument to `plotReducedDim`

Allow the user to choose specifically which components to plot.

Add option to use irlba in the runPCA() for speeding up PC calculation

As scRNA-seq datasets keep growing, I would suggest adding an option to use irlba for the PC calculation, as scran does.

Have the default cell filters been removed?

It looks like there are no filter_... columns in the colData slot anymore (for example, there used to be filter_on_total_features). Have they been removed?

exprs() returning NULL

Hi! After updating from 1.4.0 to 1.6.3 I noticed that exprs(sce) is returning NULL.

I create my SingleCellExperiment using readKallistoResults:

> sce <- readKallistoResults(samples = kallisto_samples, directories = kallisto_dirs)
Kallisto log not provided - assuming all runs successful
Reading results for 91 samples:
................................................................................
...........
Using log2(TPM + 1) as 'exprs' values in output.
> exprs(sce)
NULL

I can workaround the issue using assays(sce)$exprs which returns the expression values.

> assays(sce)
List of length 4
names(4): exprs counts tpm feature_effective_length
> assays(sce)$exprs
              L20105_Track-48323_R1 L20106_Track-48324_R1 L20107_Track-48325_R1 L20108_Track-48326_R1 L20109_Track-48327_R1
contig_1               0.000000e+00          0.000000e+00          0.000000e+00          0.000000e+00          0.000000e+00
contig_2               0.000000e+00          0.000000e+00          0.000000e+00          1.120067e+00          9.317398e-01
contig_3               0.000000e+00          0.000000e+00          0.000000e+00          2.480053e-01          4.600715e-01
...

I had a look at the kallisto-wrapper.R but didn't find anything unusual. Is exprs deprecated?

Let me know if you need further info to debug.

Cheers!

plotPCA fails if no logcounts

If the SingleCellExperiment does not have a logcounts assay then plotPCA will cause an error. For example:

> data("sc_example_counts")
> example_sce <- SingleCellExperiment(assays = list(counts = sc_example_counts))
> plotPCA(example_sce)
Error in assay(object, i = exprs_values) : 
  'assay(<SingleCellExperiment>, i="character", ...)' invalid subscript 'i'
'i' not in names(assays(<SingleCellExperiment>))

Setting plotPCA(example_sce, exprs_values = "counts") works, but you get the same error if exprs_values doesn't exist (plotPCA(example_sce, exprs_values = "not real")).

I think the main problem is that the error message isn't very clear. Maybe a simple check that exprs_values is in assays, with a better message, would help? I suspect the same thing might happen for some of the other plotting functions.
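
The kind of check suggested here could look something like the following. This is a minimal sketch in base R, not scater's actual code: check_exprs_values is a hypothetical helper, and it takes the available assay names as a plain character vector (for a SingleCellExperiment these would presumably come from assayNames(object)).

```r
# Hypothetical helper (not part of scater): fail early with a readable
# message when the requested assay is not available.
check_exprs_values <- function(available, exprs_values) {
  if (!is.character(exprs_values) || length(exprs_values) != 1L) {
    stop("'exprs_values' must be a single character string", call. = FALSE)
  }
  if (!(exprs_values %in% available)) {
    stop(sprintf("assay '%s' not found; available assays: %s",
                 exprs_values, paste(available, collapse = ", ")),
         call. = FALSE)
  }
  invisible(TRUE)
}

# An object holding only "counts", as in the example above.
available <- c("counts")
ok <- check_exprs_values(available, "counts")

# Requesting a missing assay now gives a message that names the problem.
msg <- tryCatch(check_exprs_values(available, "logcounts"),
                error = function(e) conditionMessage(e))
print(msg)
```

A plotting function could call such a helper before touching the assay, so that the user sees which names are valid rather than a subscript error.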

plotPlatePosition inverts rows

Hi,
I've either misinterpreted the parameters for plotPlatePosition, or there's a bug which gives plot_position a reversed 'columns' factor.
Here's my code, using the Tung test data (https://github.com/hemberg-lab/scRNA.seq.course/tree/master/tung) that replicates it.

source("https://bioconductor.org/biocLite.R")
biocLite("SingleCellExperiment")
biocLite("scater")
library(scater)
setwd("~/single_cell/training")

reads <- read.table("tung/reads.txt", sep = "\t")
anno <- read.table("tung/annotation.txt", sep = "\t", header = TRUE)
reads <- SingleCellExperiment(
  assays = list(counts = as.matrix(reads)), 
  colData = anno
)

reads <- calculateQCMetrics(reads)
plotPlatePosition(reads, plate_position = reads$well, colour_by = "well")

As you can see, the rows are in reverse order (wells from row 'A' are plotted in row 'H', wells from row 'B' in row 'G', etc.).
Is this a bug, or am I doing something wrong here?
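
One way such an inversion can arise is sketched below in base R: if row letters become a factor with alphabetical levels and are drawn on a y axis that increases upwards, row 'A' lands at the bottom. The well IDs are made up and assume a letter-plus-number format such as "A01" on a 96-well plate. This is an illustration only, not a claim about how plotPlatePosition is implemented.

```r
# Made-up well IDs in an assumed "<row letter><column number>" format.
wells <- c("A01", "B01", "H12")
row_letter <- substr(wells, 1, 1)
col_number <- as.integer(substr(wells, 2, nchar(wells)))

# Alphabetical levels: "A" gets the lowest y position (bottom of the plot).
y_alphabetical <- as.integer(factor(row_letter, levels = LETTERS[1:8]))

# Reversed levels: "A" gets the highest y position (top of the plot),
# matching the usual layout of a physical plate.
y_plate <- as.integer(factor(row_letter, levels = rev(LETTERS[1:8])))

print(data.frame(well = wells, column = col_number,
                 y_alphabetical = y_alphabetical, y_plate = y_plate))
```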

Error when plotting more than 2 components with PlotPCA

I recently started using SingleCellExperiment and the newest version of scater but I can't plot a PCA with more than two components anymore now. I get this error message:

Error in plotReducedDim(object, ncomponents = ncomponents, use_dimred = "PCA", : ncomponents to plot is larger than number of columns of reducedDimension(object)

Revise vignette

From @davismcc on August 24, 2017 17:38

Copied from original issue: davismcc/archive-scater#127

plotHighestExpr with feature_names_to_plot argument incorrectly sorts

If feature_names_to_plot is set then the order will no longer be set according to average expression value. This is most likely due to ggplot's underlying handling of factor aesthetics, which sorts levels alphabetically. You'll need to set the factor levels manually in the correct order (or use reorder) before passing the data into ggplot to get the correct ordering.
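
A small base R sketch of the two approaches mentioned, using made-up feature names and average expression values. It only shows how the factor levels end up ordered and is not taken from scater's source.

```r
# Made-up features and average expression values.
df <- data.frame(feature = c("GeneA", "GeneB", "GeneC"),
                 mean_expr = c(5, 20, 1),
                 stringsAsFactors = FALSE)

# Default factor conversion sorts the levels alphabetically.
alphabetical <- levels(factor(df$feature))

# Option 1: set the levels explicitly, highest average expression first.
by_expr <- factor(df$feature,
                  levels = df$feature[order(df$mean_expr, decreasing = TRUE)])

# Option 2: reorder() sorts levels by a summary of a second variable
# (ascending, hence the minus sign).
by_reorder <- reorder(df$feature, -df$mean_expr)

print(alphabetical)
print(levels(by_expr))
print(levels(by_reorder))
```

Either factor can then be mapped to the axis in place of the raw character column.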

plotHighestExprs with dgCMatrix

Check this error.

Colour plots according to arbitrary cell metrics

From @LTLA on April 25, 2016 17:39

Can the colour_by argument in plotPCA and friends accept vectors (factors or numeric values) with which coloration can be performed? Currently, it seems I have to do something like this:

sce$whee <- arbitrary.values
plotPCA(sce, colour_by="whee")

... rather than the simpler one-step:

plotPCA(sce, colour_by=arbitrary.values)

Copied from original issue: davismcc/archive-scater#39

Change color - plotTSNE

Hi there,

I know plotTSNE returns a ggplot object so I tried to change the color of the plot by adding:

plotTSNE(sce, colour_by="ACTB") + scale_fill_gradient2(low='red', mid ='white', high ='blue')

But I get this error:

Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the existing scale.

Any idea about how to solve this?

Thanks!

pct_dropout for cells

I've just noticed that calculateQCMetrics no longer calculates dropout percentage for cells. Would it be possible to bring that back? If not it should probably be removed from the function documentation.

Thanks

Error when trying to read 10x results

Hi, I have a few directories with 10x data in them. They each have barcodes.tsv, genes.tsv and matrix.mtx.
When I run read10xResults(data_dirs,min_mean_gene_counts = 1, min_total_cell_counts = 1), I get:
Error in .colSums(data_mat): argument "m" is missing, with no default

My R environment looks like

The data I'm using is from https://support.10xgenomics.com/single-cell-gene-expression/datasets

Cannot allocate vector of size when running scater_gui

From @ericvon11 on June 7, 2017 18:56

Hi,

Thanks for building this awesome package. Unfortunately, I'm unable to utilize the scater_gui function right now. Given my limited computational knowledge, the gui would help a lot. I've created my SCESet from read10XResults (9.2 Mb in R), ran the calculateQCMetrics, and am now trying to run the scater_gui on my SCESet, but after opening the Shiny session in Chrome, the process either hangs and does nothing (no graphs displayed in the scater page), or if I retry it, I often get the error:

Warning: Error in : cannot allocate vector of size 1.3 Gb
Stack trace (innermost first):

    112: unlist
    111: list_to_array
    110: laply
    109: plyr::aaply
    108: t
    107: plotSCESet
    106: plot
    105: plot
    104: renderPlot
     94: <reactive:plotObj>
     83: plotObj
     82: origRenderFunc
     81: output$plot
      4: <Anonymous>
      3: do.call
      2: print.shiny.appobj
      1: <Promise>

I've tried gc() to allocate more space before this step and R goes down to about 1GB used, but after running scater_gui it'll run up the RAM usage to 99% and then drop and hang/throw the error.
I'm using a computer with 16GB DDR4 and a Kaby Lake processor.
This is a dataset of about 8,500 cells at about 80k reads per cell.

Do I just not have enough memory? Or is there something else wrong? It seems odd that R would use 13-14GB to accomplish this, so I figured it was worth asking here.

Thanks,
Eric

Copied from original issue: davismcc/archive-scater#115
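
A rough back-of-the-envelope check of the numbers in this report, in base R. A dense numeric matrix costs 8 bytes per entry. The number of features is not given in the report, so the 20,000 used here is purely an assumption for illustration.

```r
n_cells <- 8500        # from the report
n_features <- 20000    # assumed; not stated in the report
bytes_per_double <- 8

# Size of one dense copy of the expression matrix, in GiB.
dense_gib <- n_cells * n_features * bytes_per_double / 1024^3
print(round(dense_gib, 2))
```

Under that assumption a single dense copy is of the same order as the 1.3 Gb allocation in the error, and reshaping steps such as those in the stack trace may create more than one copy.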

show_smooth = TRUE not working in plotExpression

Hi! After updating to v1.8.0 I noticed some of my plots were missing the fitted smooth. Is it broken?

library(scater)
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
  assays = list(counts = sc_example_counts), 
  colData = sc_example_cell_info
)
example_sce <- calculateQCMetrics(example_sce)
plotExpression(example_sce, "Gene_1000", x = "Cell_Cycle", exprs_values = "counts",
               show_violin = FALSE, show_smooth = TRUE)

Cheers

Add plotScater to plotQC options

From @davismcc on August 24, 2017 17:37

Copied from original issue: davismcc/archive-scater#125

Errors in calculateQCMetrics

Hi,
I have a problem when I run calculateQCMetrics like this:

> sce <- SingleCellExperiment(list(countData=rawF))
> is.mito <- grepl("^mt-", rownames(rawF))
> sce <- calculateQCMetrics(sce,feature_controls=list(Mt=is.mito))

Error in assay(object, i = exprs_values) : 
  'assay(<SingleCellExperiment>, i="character", ...)' invalid subscript 'i'
'i' not in names(assays(<SingleCellExperiment>))

BTW, rawF is a dgCMatrix. I'm not sure if it matters.

> class (rawF)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"

How to deal with this problem?
Thanks

plotQC dgCMatrix coercion to data.frame error

plotQC() threw this error on a 10X dataset:

> plotQC(all.data)
Error in as.data.frame.default(x[[i]], optional = TRUE) : 
  cannot coerce class "structure("dgCMatrix", package = "Matrix")" to a data.frame

Interactive/available cell labels on plots

Feature request from Ken Birnbaum.

merge multiple sce

I saw a mergeSingleCellExperiment function, but it has been commented out. Do you have plans to add this function? SingleCellExperiment has rbind and cbind, but they do not seem sufficient.

cell cycle regression and Seurat object conversion

Hi Scater team,
I am currently trying different packages to cluster my 10x genomics data;
Is scater able to regress out confounding factors such as cell cycle, as Seurat does in ScaleData()?
Also, is scater able to directly convert a SingleCellExperiment object to other formats such as Seurat, as Monocle 2 does?
Thanks!

Tucking away scater's QC stats

Currently calculateQCMetrics loads up the SingleCellExperiment with a lot of baggage:

library(scater)
example(calculateQCMetrics)
colnames(colData(example_sce)) # 37 entries!

But we can hide it away by storing the QC metrics as a DataFrame in colData!

alt_sce <- example_sce
colData(alt_sce) <- colData(alt_sce)[,1:4]
colData(alt_sce)$scater_stats <- colData(example_sce)[,5:37]
colnames(colData(alt_sce)) # 4 entries
colData(alt_sce)$scater_stats # 33 entries

Amazing. Simply amazing. We could go even crazier and separate the various QC into further nested DataFrames based on whether they were computed on counts, logcounts, etc. but let's stay calm.

Add test data and checks for read10xResults()

Make density ticks in PCA and tSNE plots optional?

Hi there,

I asked Aaron at a conference: is it possible to make the "density ticks" (ticks on the x- and y-axes that show the positions of data points in a PCA or tSNE plot) optional? I think it would be nice to have this option.

Thank you for the great package!
Chuan

using feature IDs with "." in getBMFeatureAnnos()

From @LindaDansereau on July 7, 2017 18:3

Hello,
I'm trying to get getBMFeatureAnnos() to run on a list of gene names which often contain a ".". However, it appears that anything after the "." gets stripped as the function runs, creating a number of duplicate row names in my case.

## Remove transcript ID artifacts from runKallisto (eg. ENSMUST00000201087.11 -> ENSMUST00000201087)
feature_ids <- gsub(pattern = "\\.[0-9]+", replacement = "", x = feature_ids)

Is there a way to make that an optional step? Is there another way around it?
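
To make the effect concrete, here is a small base R sketch. The first call applies the pattern quoted above to made-up IDs. The second uses a narrower, hypothetical pattern that only strips a version suffix from Ensembl-style IDs (assumed here to be "ENS", optional letters, then 11 digits). It is offered as one possible workaround, not as something scater provides.

```r
feature_ids <- c("ENSMUST00000201087.11", "2RSSE.1", "2RSSE.2", "CD4.3")

# Pattern quoted above: removes any ".<digits>" run, so distinct gene
# names collapse to the same string.
stripped_all <- gsub(pattern = "\\.[0-9]+", replacement = "", x = feature_ids)

# Hypothetical narrower pattern: only touch IDs that look like versioned
# Ensembl identifiers and leave everything else unchanged.
stripped_ensembl <- sub(pattern = "^(ENS[A-Z]*[0-9]{11})\\.[0-9]+$",
                        replacement = "\\1", x = feature_ids)

print(stripped_all)
print(anyDuplicated(stripped_all) > 0)
print(stripped_ensembl)
```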

Thank you for your help.

Linda

Example code copied from BioConductor support forum post (https://support.bioconductor.org/p/97849/)

#TestData loaded as a .csv file

TestData <- read.csv("testdata.csv", colClasses = c(list("character"), rep("numeric", 8)), row.names = 1)

TestData
#        X cell.1a cell.1b cell.1c cell.2a cell.2b cell.3a cell.3b cell.3c
#1 2RSSE.1     866    1404     898     129    1053     141      33      70
#2 2RSSE.2      58     171      65      17      70      36      11      17
#3 MTCE.23   14911   27132   10405   82033  117449   57775   11544   14426
#4 MTCE.25    1888    3615    1453    5891   40047    9144    2396    2947
#5 MTCE.31   20818   38746   12289  235235  211993  109575   19117   20580
#6   cct-6    1488    2236    1274     487    6430    1006    2311     381
#7   cct-8    1113    1679    1099     530    3727    1012    1135     130
#8   CD4.3      58      70      64      45     122      19      59      70
#9   CD4.7      34      37      27      56     400      11      53      88

sce <- newSCESet(countData = TestData)

sce <- getBMFeatureAnnos(sce, 
filters = "external_gene_name", 
attributes = c("wormbase_gene", "ensembl_gene_id","external_gene_name", "chromosome_name", "transcript_biotype", "go_id", "kegg_enzyme", "entrezgene"), 
feature_symbol = "external_gene_name", 
feature_id = "wormbase_gene", 
biomart = "ENSEMBL_MART_ENSEMBL", dataset = "celegans_gene_ensembl", host = "www.ensembl.org")

Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘2RSSE’, ‘CD4’, ‘MTCE’

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] biomaRt_2.30.0       scater_1.2.0         ggplot2_2.2.1        Biobase_2.34.0      
 [5] BiocGenerics_0.20.0  gplots_3.0.1         RColorBrewer_1.1-2   edgeR_3.16.5        
 [9] limma_3.30.13        openxlsx_4.0.17      BiocInstaller_1.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11         locfit_1.5-9.1       lattice_0.20-34      GO.db_3.4.0         
 [5] gtools_3.5.0         assertthat_0.2.0     digest_0.6.12        mime_0.5            
 [9] R6_2.2.2             plyr_1.8.4           stats4_3.3.2         RSQLite_2.0         
[13] zlibbioc_1.20.0      rlang_0.1.1          lazyeval_0.2.0       data.table_1.10.4   
[17] gdata_2.18.0         blob_1.1.0           S4Vectors_0.12.2     stringr_1.2.0       
[21] RCurl_1.95-4.8       bit_1.1-12           munsell_0.4.3        shiny_1.0.3         
[25] httpuv_1.3.5         vipor_0.4.5          pkgconfig_2.0.1      ggbeeswarm_0.5.3    
[29] htmltools_0.3.6      tximport_1.2.0       tibble_1.3.3         gridExtra_2.2.1     
[33] IRanges_2.8.2        matrixStats_0.52.2   XML_3.98-1.9         viridisLite_0.2.0   
[37] dplyr_0.7.1          bitops_1.0-6         grid_3.3.2           xtable_1.8-2        
[41] gtable_0.2.0         DBI_0.7              magrittr_1.5         scales_0.4.1        
[45] KernSmooth_2.23-15   stringi_1.1.5        reshape2_1.4.2       viridis_0.4.0       
[49] bindrcpp_0.2         org.Ce.eg.db_3.4.0   rjson_0.2.15         tools_3.3.2         
[53] bit64_0.9-7          glue_1.1.1           beeswarm_0.2.3       AnnotationDbi_1.36.2
[57] colorspace_1.3-2     rhdf5_2.18.0         caTools_1.17.1       shinydashboard_0.6.1
[61] memoise_1.1.0        bindr_0.1

Copied from original issue: davismcc/archive-scater#119

minor comment for the documentation of normalize()

From @friedue on April 19, 2017 15:6

The function normalize.SCESet() has the default parameter recompute_cpm = TRUE - the current help text does not mention that the additional prerequisites for this parameter are (i) the presence of actual values within cpm(sceSet) and (ii) exprs_values must be set to “counts”.

Since the default of the function that creates an sceSet object is to store cpm values in the exprs() slot, I was puzzled why normalize() would not re-compute the cpm values because I wasn't aware that they needed to be actually stored in the dedicated cpm() slot.

Copied from original issue: davismcc/archive-scater#107

plotRLE

I was trying to check the effect of normalisation by simply using:

plotRLE(sce, list(Raw = "counts", Norm = "logcounts"), c(FALSE, TRUE), style = "minimal", order_by_colour = FALSE)

But I get this error:

Error in x[[i]] <- value : attempt to select less than one element in OneIndex

Any idea of what is causing this?

Thank you for your help.
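
For context, a relative log expression (RLE) plot summarises, for each cell, how far its log-expression values sit from the per-feature medians across cells. The base R sketch below illustrates that computation on a made-up matrix. It is a generic illustration, not scater's plotRLE implementation, and it does not explain the error itself.

```r
# Made-up log-expression matrix: 3 features (rows) by 3 cells (columns).
logexpr <- matrix(c(1, 2, 3,
                    4, 4, 4,
                    0, 2, 10),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))

# Median of each feature across cells.
gene_medians <- apply(logexpr, 1, median)

# Subtract each feature's median from its row.
rle <- sweep(logexpr, 1, gene_medians)

# Per-cell summary of the deviations; an RLE plot draws one box per cell.
cell_medians <- apply(rle, 2, median)
print(cell_medians)
```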

plotExplanatoryVariables should automatically remove zero variances

From @LTLA on July 17, 2017 16:18

... rather than just complaining about them and throwing an error.

Copied from original issue: davismcc/archive-scater#120

tximport 1.3.5 can import .h5a kallisto abundance files

From @mikelove on December 19, 2016 18:32

see here:

https://github.com/Bioconductor-mirror/tximport/blob/master/R/tximport.R#L36

Copied from original issue: davismcc/archive-scater#90

plotHighestExprs

From @weizhiting on May 22, 2017 15:51

Hi, when I use this command, plotHighestExprs(example_sceset, exprs_values = 'exprs'),
I get the attached PNG, but the circle in the PNG does not seem to be in the right position? Thanks!

Copied from original issue: davismcc/archive-scater#113

MAGIC adaptive kernel

From @dvdijk on April 18, 2017 23:52

I saw that there is a MAGIC implementation in Scater, however it doesn't use the adaptive kernel. The adaptive kernel is important for MAGIC to work well. I haven't coded in R for a long time but I can advise on how to correctly implement MAGIC.

Copied from original issue: davismcc/archive-scater#106

No fData slot

Hi,

I am using scater 1.6.3 for scRNA-seq data normalization. After I run calculateQCMetrics, no fData slot is generated. Please see the following code and error.

sce <- SingleCellExperiment(assays = list(counts = as([email protected], "dgCMatrix")))
exprs(sce) <- log2(calculateCPM(sce, use.size.factors = FALSE) + 1)
keep_feature <- Matrix::rowSums(exprs(sce) > 0) > 10
sce <- sce[keep_feature,]
is.mito <- grepl("^mt-", rownames(sce))
sce <- calculateQCMetrics(sce, feature_controls=list(Mt=is.mito))
fData(sce)

Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘fData’ for signature ‘"SingleCellExperiment"’

Really appreciate any help available.

Thanks,

Bibaswan

Add capabilities for interactive plots

From @davismcc on August 24, 2017 17:38

e.g. with ggiraph or similar.

Copied from original issue: davismcc/archive-scater#126

Error in checkSlotAssignment(object, name, value) : assignment of an object of class “CompressedGRangesList” is not valid for slot ‘rowRanges’

Hi there,

I am trying to run tSNE + k-means clustering on an SCE object, but I'm getting the following error.

# Perform dimensional reduction
macro <- runTSNE(macro, rand_seed = 1, return_SCE = TRUE)

Error in checkSlotAssignment(object, name, value) : assignment of an object of class “CompressedGRangesList” is not valid for slot ‘rowRanges’ in an object of class “RangedSummarizedExperiment”; is(value, "GenomicRanges_OR_GRangesList") is not TRUE

Is there any way to fix this? I get this error whether I use runTSNE or plotTSNE.

Cheers,
Lucy

Unable to run plotHighestExpression with dgCMatrix stored in counts slot of a SCE object

When a dgCMatrix is stored in the counts slot of a SingleCellExperiment and I try to run plotHighestExpression, there seems to be a problem when converting a dgCMatrix to data.frame:

 Error in as.data.frame.default(x[[i]], optional = TRUE) : 
  cannot coerce class "structure("dgCMatrix", package = "Matrix")" to a data.frame 
stop(gettextf("cannot coerce class \"%s\" to a data.frame", deparse(class(x))), 
    domain = NA) 
as.data.frame.default(x[[i]], optional = TRUE) 
as.data.frame(x[[i]], optional = TRUE) 
data.frame(data) 
setNames(data.frame(data), value.name) 
melt.default(df_pct_exprs_by_cell) 
reshape2::melt(df_pct_exprs_by_cell) 
plotHighestExprs(object, ...) 
plotQC(sce.cleaned, type = "highest-expression")

Add Rtsne to imports or dependencies?

The plotTSNE function does not work without Rtsne, but Rtsne is only in suggests and not depends or imports.

exprs and cpm are different

From @galib36 on May 12, 2017 10:43

Hello Davis,
When I calculate the CPM using log2(calculateCPM(cdSceset) + 1), the result differs from the default exprs values produced when creating the object with cdSceset <- newSCESet(countData = cd, phenoData = pheno_data_fil). The count data is the raw data. Also, when I try to normalize the data using

qclust <- scran::quickCluster(cdSceset)
cdSceset <- scran::computeSumFactors(cdSceset, sizes = 15, clusters = qclust)
cdSceset <- scran::normalize(cdSceset)

it gives me NA for some of the cells. Could you please let me know what is going wrong.

Thank you.

Best Regards,
Syed

Copied from original issue: davismcc/archive-scater#110
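
For reference, a plain counts-per-million calculation can be sketched in base R as below. It assumes raw library sizes are the scaling factor. scater's calculateCPM and the default exprs values may use size factors or other offsets, which could be one reason the two results differ. The counts are made up.

```r
# Made-up counts: 3 genes (rows) by 2 cells (columns).
counts <- matrix(c(10, 0, 90,
                   5, 5, 40),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("cellA", "cellB")))

# Library size of each cell.
lib_sizes <- colSums(counts)

# Scale each column by its library size, then express per million.
cpm <- t(t(counts) / lib_sizes) * 1e6

# Log-transform with a pseudocount of 1.
log_cpm <- log2(cpm + 1)
print(round(log_cpm, 2))
```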

Plotting functions with sparse matrices

I've noticed some of the plotting functions fail if you have a SingleCellExperiment with sparse matrices like those returned by read10xResults.

Here are a couple of examples:

library(scater, quietly = TRUE)
sce10x <- read10xResults(system.file("extdata", package = "scater"))
colData(sce10x)$test <- rnorm(300)
sce10x <- normalise(sce10x)
#> Warning in .local(object, ...): using library sizes as size factors

# PlotQC
plotQC(sce10x, type = "explanatory", variables = "test")
#> Error in storage.mode(y) <- "double": no method for coercing this S4 class to a vector

# PlotRLE
plotRLE(sce10x)
#> Error in matrixStats::rowMedians(exprs_mat): Argument 'x' must be a matrix or a vector.

Input Consistency

A few functions allow a SingleCellExperiment object (e.g. calculateCPM) or a count matrix but the majority permit only a SingleCellExperiment (e.g. calculateFPKM). Are the few which accept a matrix remnants from the early days of the package and likely to be modified in a future version? calculateCPM and calculateFPKM perform a very similar task, so they ought to have consistent input formats.

Add function for feature annotation from `org.Xx.eg.db` objects

From @davismcc on April 14, 2017 15:33

Guards against future deprecation of Biomart and allows annotation without a network connection.

Copied from original issue: davismcc/archive-scater#105

Adding Ensembl IDs for human data

Hi,
When I tried to add Ensembl IDs for human cell data, I found there are some NA values in the output:

# Adding Ensembl IDs.
library(EnsDb.Hsapiens.v75)
ensembl <- mapIds(EnsDb.Hsapiens.v75, keys=rownames(sce.qc.norm), keytype="SYMBOL",column=c("ENSEMBL"))

Can anyone recommend which library I should use for mapping? I already tried some packages from AnnotationDbi, but I still get NA after mapping.

Maximum number of cells?

From @wikiselev on June 5, 2017 13:13

What is the maximum number of cells supported by newSCESet? I've tried 10^5 cells and got this error:

Error in .checkedCall(cxx_missing_exprs, exprs(object)) : 
  long vectors not supported yet: ../../../../R-devel/src/main/memory.c:3424

I then tried 99,999 cells and got the same error.

Copied from original issue: davismcc/archive-scater#114
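
The error is consistent with R's limit on ordinary (non-long) vectors, which hold at most 2^31 - 1 elements; a dense matrix is stored as one such vector. A quick base R check follows. The number of features is not given in the report, so it is solved for rather than assumed.

```r
max_elements <- .Machine$integer.max   # 2^31 - 1
n_cells <- 1e5

# Largest number of features a dense matrix with this many cells can have
# before it needs a long vector.
max_features <- floor(max_elements / n_cells)
print(max_features)

# Dropping a single cell barely moves the threshold.
max_features_fewer <- floor(max_elements / (n_cells - 1))
print(max_features_fewer)
```

If the dataset has more features than this threshold, removing one cell would not be expected to help, which would fit the second observation above.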

calcAverage colSums() error

Hi there,

I'm getting an error from calling calcAverage() like so:

library(Matrix)
library(scran)
library(scater)
library(igraph)
library(BiocParallel)

raw_counts = readMM("raw_counts.mtx")

lib.sizes = Matrix::colSums(raw_counts) #works

#make sce object
sce = SingleCellExperiment(assays = list("counts" = as(raw_counts, "dgCMatrix")))
#filter low abundance genes
sce = sce[calcAverage(sce)>0.1,] #ERROR

Specifically, I see:

Error in colSums(mat) : 'x' must be an array of at least two dimensions
Calls: render ... withVisible -> eval -> eval -> [ -> calcAverage -> colSums

I have encountered similar problems in my code, which I have fixed with Matrix::colSums(). Perhaps a similar approach may work here?

Jonny
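
The difference between the two colSums variants can be reproduced with a small made-up sparse matrix. This sketch assumes only that the Matrix package is installed. It illustrates the dispatch issue and is not a patch for calcAverage.

```r
library(Matrix)

# Made-up 3 x 2 sparse matrix; sparseMatrix() with numeric x gives a dgCMatrix.
m <- Matrix::sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(4, 6, 5),
                          dims = c(3, 2))
print(class(m))

# The base function does not know about sparse classes and stops with an error;
# the message is captured here instead of aborting the script.
base_result <- tryCatch(base::colSums(m),
                        error = function(e) conditionMessage(e))
print(base_result)

# The Matrix method works directly on the sparse object.
sparse_result <- Matrix::colSums(m)

# Converting to a dense matrix also works, at the cost of memory.
dense_result <- base::colSums(as.matrix(m))
print(sparse_result)
```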

CalculateTPM for UMI counts

Hi,

The manual mentions that if I have UMI counts I should leave effective_length = NULL, yet when I ran the function it gave me an error saying: "effective_length argument is required if computing TPM from counts". Hence, I am not sure if I should still calculate the effective length.

Thank you!

comment on the default perplexity setting for plotTSNE

From @friedue on April 19, 2017 15:9

The default setting of the plotTSNE function for perplexity is n.cells/5 - that would be very high for Drop-seq data sets of better quality with more than 1000 cells -- the original tSNE paper recommends perplexity values between 5 and 50. In my hands, perplexity values of 20-40 indeed work better for larger cell numbers. I'm fine with the default, but perhaps it should be mentioned in the documentation that depending on the number of cells this should perhaps be adjusted.

Copied from original issue: davismcc/archive-scater#108
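
The point about large datasets can be made concrete with a small base R sketch. choose_perplexity is a hypothetical helper, not a scater function: it starts from the n.cells/5 default described above and caps it at the upper end of the 5 to 50 range mentioned for the original tSNE paper.

```r
# Hypothetical helper: n_cells / 5, capped at an upper bound.
choose_perplexity <- function(n_cells, upper = 50) {
  default <- floor(n_cells / 5)
  min(default, upper)
}

small <- choose_perplexity(100)    # default is within the range, kept as is
large <- choose_perplexity(5000)   # default would be 1000, capped at 50
print(c(small = small, large = large))
```

The result could then be passed explicitly, for example as the perplexity argument of plotTSNE, instead of relying on the default.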

plotScater too slow on large datasets

I tried running plotScater() blocked by dataset on a 4 dataset object. The function ran for ~20-30 mins without generating any output, at which point I stopped it. The total object dim is 24439 by 27965 (that's after removing genes with no expression). The 4 datasets comprise 10X data from 4 experimental conditions from one experiment.
