carmonalab / scgate

marker-based purification of cell types from single-cell RNA-seq datasets

R 90.44% HTML 9.56%

filtering marker-genes scgate signatures single-cell

scgate's Issues

scGate as a multi-class classifier

I have recently been using your tool scGate, based on your paper "scGate: marker-based purification of cell types from heterogeneous single-cell RNA-seq datasets", to annotate immune cell types in tumors (mainly T cells and macrophages, minor subsets).

It is a wonderful tool. Thank you!

However, I have one question. Can you help me?

I want to use the "scGate as a multi-class classifier" tutorial code
(https://carmonalab.github.io/scGate.demo/#scgate-as-a-multi-class-classifier), and what I want to subset is:

1) cluster1: "4 genes positive + 1 gene negative" as the 1st signature, and the other 20 cluster1 marker genes found by the Seurat function FindAllMarkers as the 2nd signature;
2) cluster2: the same 5 genes, "3 genes positive + 2 genes negative", as the 1st signature, and the other 20 cluster2 marker genes found by FindAllMarkers as the 2nd signature;
3) cluster3: others.

I want to generate a custom signature. However, based on the tutorial code, I need to make a list, and for every element in this list I need to add additional information, as below:

models.DB <- scGate::get_scGateDB()
models.hs <- models.DB$human$generic
models.list <- models.hs[c("Bcell", "CD4T", "CD8T", "MoMacDC", "Plasma_cell", "NK")]

> models.list[[1]][1:3,]
  levels   use_as     name signature
1 level1 positive   Immune PTPRC;LAPTM5;SRGN;CXCR4;CD52;COL1A1-;RAMP2-
2 level1 positive Lymphoid LCK
3 level1 positive PanBcell CD79A
> table(models.list[[1]]$levels)

level1 level2 level3 level4
     7      6      4      1
> table(models.list[[1]]$use_as)

negative positive

My question is: I can deal with models.list[[1]]$use_as, but I have no idea how to generate models.list[[1]]$levels and models.list[[1]]$name.

Can you help me?
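
A minimal sketch of how such a model table might be assembled by hand. It assumes, as the printed output above suggests, that a model is a plain data frame with the columns levels, use_as, name and signature, with genes separated by ";" and a trailing "-" marking a negative gene. The helper model_row(), the signature names and the gene names (GENE_A, ...) are placeholders for illustration, and placing the second signature at level 2 is only one possible reading of the question.

```r
# Hypothetical helper: build one row of a gating-model table.
# Column names follow the printed model above; everything else is illustrative.
model_row <- function(level, name, genes, negative = FALSE) {
  data.frame(
    levels    = paste0("level", level),
    use_as    = if (negative) "negative" else "positive",
    name      = name,
    signature = paste(genes, collapse = ";"),
    stringsAsFactors = FALSE
  )
}

# cluster1: 4 positive genes + 1 negative gene (trailing "-") at level 1,
# then a second signature built from other marker genes at level 2.
cluster1_model <- rbind(
  model_row(1, "cluster1_core",
            c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E-")),
  model_row(2, "cluster1_markers", c("GENE_F", "GENE_G", "GENE_H"))
)

# A named list of such tables mirrors the structure of models.list.
my_models <- list(cluster1 = cluster1_model)
print(my_models$cluster1)
print(table(my_models$cluster1$levels))
```

The levels column is then just the string "level" plus the step number, and name is a free-text label for each signature row.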

nn.idx with pre-calculated dimensionality reductions

I'm working with a DOGMA-seq dataset, a multimodal dataset with scATAC, scRNA, and scADT.

I've conducted weighted nearest neighbour analysis as described here: https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis.html

DOGMA <- FindMultiModalNeighbors(DOGMA, reduction.list = c("pca.rna","lsi.atac", "pca.adt"), dims.list = list(1:30, 2:30,1:30),  modality.weight.name = "DOGMA.weight")
DOGMA <- RunUMAP(DOGMA, nn.name = "weighted.nn", reduction.name = "umap.wnn", reduction.key = "wnnUMAP_")
DOGMA <- FindClusters(DOGMA, graph.name = "wsnn", algorithm = 3)

Then, using scGate:

obj <- scGate(DOGMA, model = models.list, assay = "GeneActivity", reduction = "umap.wnn")

I get this error:

Error in filter_bymean(q, positive = pos.names, negative = neg.names,  : 
  trying to get slot "nn.idx" from an object of a basic class ("NULL") with no slots

The weighted.nn Neighbour object was used in the RunUMAP() command that generated the umap.wnn.

> DOGMA@reductions$umap.wnn
A dimensional reduction object with key wnnUMAP_ 
 Number of dimensions: 2 
 Projected dimensional reduction calculated:  FALSE 
 Jackstraw run: FALSE 
 Computed using assay: ADT 
> DOGMA@neighbors$weighted.nn
A Neighbor object containing the 20 nearest neighbors for 4802 cells

How do I specify the Neighbor class (https://github.com/mojaveazure/seurat-object/blob/master/R/neighbor.R) with the nn.idx slot from the pre-calculated dimensionality reduction?

Create model for macrophages

Create basic marker for macrophages.
Relates to issue #21

More automatic translation of is.pure.levelX -> label?

Hello,

We're very interested in scGate. As far as I understand it, if you give it a hierarchical gate model, you end up with a set of columns corresponding to the cell classes (e.g. Bcell_UCell, Tcell_UCell), and then a series of binary is.pure.levelX columns. Is there a helper that more automatically translates from is.pure.levelX into the cell type label that the cell received at levelX?

Thanks
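
One way such a helper could look, as a sketch in base R rather than anything scGate provides. It assumes the is.pure.levelX columns hold the strings "Pure"/"Impure" and that the caller supplies one label per level; the function name label_from_levels() and the labels used in the example are hypothetical.

```r
# Hypothetical helper: report the label of the deepest level a cell passed.
# Assumes columns named is.pure.level1, is.pure.level2, ... with "Pure"/"Impure".
label_from_levels <- function(meta, level_labels, prefix = "is.pure.level") {
  level_cols <- colnames(meta)[startsWith(colnames(meta), prefix)]
  # Order columns numerically so that level10 sorts after level2
  level_num <- as.integer(substring(level_cols, nchar(prefix) + 1))
  level_cols <- level_cols[order(level_num)]
  stopifnot(length(level_cols) > 0,
            length(level_labels) == length(level_cols))
  pure <- as.matrix(meta[, level_cols, drop = FALSE]) == "Pure"
  # Number of consecutive levels passed, counting from level 1
  n_passed <- apply(pure, 1, function(x) sum(cumprod(x)))
  setNames(c("None", level_labels)[n_passed + 1], rownames(meta))
}

# Toy metadata standing in for a Seurat object's meta.data
meta <- data.frame(
  is.pure.level1 = c("Pure", "Pure", "Impure"),
  is.pure.level2 = c("Pure", "Impure", "Pure"),
  row.names = c("c1", "c2", "c3")
)
labels <- label_from_levels(meta, level_labels = c("Immune", "Lymphoid"))
print(labels)
```

A cell that fails level 1 is labelled "None" even if a later column says "Pure", since later levels only apply to cells that passed the earlier ones.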

Prepare for upcoming Seurat v5 release

I am opening this issue as a notification because scGate is listed here as a package that relies (depends/imports/suggests) on Seurat. As you may know, we recently released Seurat v5 as a beta in March of this year, with new updates for spatial, multimodal, and massively scalable analysis. For more information on updates and improvements, check out our website https://satijalab.org/seurat/.
We are now preparing to release Seurat v5 to CRAN, and plan to submit it on October 23rd. While we have tried our best to keep things backward-compatible, it is possible that updates to Seurat and SeuratObject might break your existing functionality or tests. We wanted to reach out before the new version is on CRAN, so that there's time to report issues/incompatibilities and prepare you for any changes in your code base that might be necessary.

We apologize for any disruption or inconvenience, but hope that the improvements to Seurat v5 will benefit your users going forward.
To test the upcoming release, you can install Seurat from the seurat5 branch using the instructions available on this page: https://satijalab.org/seurat/articles/install.

Thank you!
Seurat v5 team

for merged datasets, allow for independent gating

When working with merged objects (i.e. objects containing multiple datasets/batches), one can either perform gating on the merged object in a single operation, or split the merged object and process each part separately. Because of kNN smoothing, the two approaches may give slightly different results. To make working with merged objects more robust, scGate could allow grouping by a variable (e.g. dataset) internally and gate each group separately.
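
The split-then-gate idea can be sketched in base R as follows. This is not scGate code: gate_per_group() and the toy gate_fn are hypothetical stand-ins, with a per-group median threshold playing the role of a gating step whose outcome depends on the other cells in the group.

```r
# Hypothetical sketch: apply a gating function independently within each group.
gate_per_group <- function(meta, group_col, gate_fn) {
  groups <- split(seq_len(nrow(meta)), meta[[group_col]])
  result <- character(nrow(meta))
  for (idx in groups) {
    result[idx] <- gate_fn(meta[idx, , drop = FALSE])
  }
  setNames(result, rownames(meta))
}

# Toy stand-in for gating: "Pure" if the score exceeds the group's median
toy_gate <- function(m) ifelse(m$score > median(m$score), "Pure", "Impure")

meta <- data.frame(
  dataset = rep(c("A", "B"), each = 3),
  score   = c(1, 2, 3, 10, 20, 30),
  row.names = paste0("cell", 1:6)
)

per_dataset <- gate_per_group(meta, "dataset", toy_gate)
merged      <- setNames(toy_gate(meta), rownames(meta))
print(rbind(per_dataset, merged))
```

In this toy example, gating the merged table marks every cell of dataset B as "Pure" and none of dataset A, while gating per dataset selects the top cell in each.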

scGate & performance.metrics

Hi,
I had two problems with scGate.
First, when using scGate with an SCT assay, I got the following warning: "The umi assay (RNA) is not present in the object. Cannot Compute additional residuals". I don't know where the umi assay should be in the SCT assay, and I couldn't solve the problem.
Second, when using performance metrics, I got an integer overflow because of high integer values.
When I did the same operations myself with numerics, I didn't have any problems.

Best regards,

Deniz Fettahoglu
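
The overflow described above is easy to reproduce in base R, where products of integers beyond .Machine$integer.max become NA. The sketch below uses the Matthews correlation coefficient purely as an example of a metric with large products; which metric overflowed inside scGate is not stated in the report, and the function mcc() and the counts are illustrative.

```r
# Counts stored as R integers (note the L suffix)
tp <- 50000L; fp <- 40000L; fn <- 30000L; tn <- 60000L

# Integer arithmetic: 90000L * 80000L exceeds .Machine$integer.max,
# so R returns NA (with a warning, silenced here for the demo).
overflowed <- suppressWarnings((tp + fp) * (tp + fn))
print(is.na(overflowed))

# Same computation after converting to double avoids the overflow.
mcc <- function(tp, fp, fn, tn) {
  tp <- as.numeric(tp); fp <- as.numeric(fp)
  fn <- as.numeric(fn); tn <- as.numeric(tn)
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (denom == 0) return(NA_real_)
  (tp * tn - fp * fn) / denom
}
print(mcc(tp, fp, fn, tn))
```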

Possible issue related to future / parallelism?

Hello -

We have an R module that uses scGate and has automated testing. That testing had been running fine for a long time, but we started getting test failures on GitHub Actions with the stack below. I assume this is due to the recent changes in scGate, although I don't know if it is an scGate bug per se. The test does very basic scGate::scGate() usage with a relatively small input dataset. I think the issue is that defaults might have changed in how you use futures.

My best interpretation at the moment is that the cloud agents are not that powerful, and the recent scGate changes result in all cores being used (2 in this case), meaning ScaleData is run with two workers. I didn't debug the "non-exportable reference" message, but that is a feature of futures (https://cran.r-project.org/web/packages/future/vignettes/future-4-non-exportable-objects.html), which might have stricter enforcement now, and the combination of scGate defaulting to multithreading and stricter enforcement might cause it.

Again, I'm not sure this is actually an scGate bug, but I thought I'd report it in case anyone else hits it. For the purpose of our automated testing, I set plan(sequential) and it's back to working.

2022-12-24T22:14:45.3323666Z <FutureError/error/FutureCondition/condition>
2022-12-24T22:14:45.3326221Z ##[error]Error: MultisessionFuture (future_lapply-1) failed to call grmall() on cluster RichSOCKnode #1 (PID 120178 on localhost 'localhost'). The reason reported was 'error reading from connection'. Post-mortem diagnostic: No process exists with this PID, i.e. the localhost worker is no longer alive. Detected a non-exportable reference ('externalptr') in one of the globals ('...future.FUN' of class 'function') used in the future expression. The total size of the 15 globals exported is 6.38 MiB. The three largest globals are 'object' (6.00 MiB of class 'S4'), 'split.cells' (189.89 KiB of class 'list') and 'features' (123.05 KiB of class 'character')
2022-12-24T22:14:45.3327792Z Backtrace:
2022-12-24T22:14:45.3328120Z   1. RIRA::RunScGate(seuratObj, model = "Bcell")
2022-12-24T22:14:45.3328518Z        at test-scgate.R:76:2
2022-12-24T22:14:45.3328862Z   4. scGate::scGate(...)
2022-12-24T22:14:45.3329189Z   5. scGate:::run_scGate_singlemodel(...)
2022-12-24T22:14:45.3329480Z   6. scGate:::find.nn(...)
2022-12-24T22:14:45.3329811Z   8. Seurat:::ScaleData.Seurat(q, verbose = FALSE)
2022-12-24T22:14:45.3330119Z  10. Seurat:::ScaleData.Assay(...)
2022-12-24T22:14:45.3330286Z  15. Seurat:::ScaleData.default(...)
2022-12-24T22:14:45.3330453Z  16. future.apply::future_lapply(...)
2022-12-24T22:14:45.3330622Z  17. future.apply:::future_xapply(...)
2022-12-24T22:14:45.3330761Z  18. future::future(...)
2022-12-24T22:14:45.3330914Z  20. future:::run.Future(future)
2022-12-24T22:14:45.3331085Z  22. future:::run.ClusterFuture(future)
2022-12-24T22:14:45.3331372Z  23. future:::cluster_call(cl, fun = grmall, future = future, when = "call grmall() on")
2022-12-24T22:14:45.3331503Z  24. base::tryCatch(...)
2022-12-24T22:14:45.3331745Z  25. base (local) tryCatchList(expr, classes, parentenv, handlers)
2022-12-24T22:14:45.3331994Z  26. base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
2022-12-24T22:14:45.3332120Z  27. value[[3L]](cond)
2022-12-24T22:14:45.3332130Z
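
The plan(sequential) workaround mentioned above could be wrapped so that it only affects the code under test. This is a sketch, assuming the future package is installed; with_sequential_plan() is a hypothetical helper name, and the scGate call itself is left as a comment.

```r
# Hypothetical helper: evaluate an expression under a sequential future plan,
# then restore whatever plan was active before.
with_sequential_plan <- function(expr) {
  if (!requireNamespace("future", quietly = TRUE)) {
    return(expr)  # 'future' not installed: nothing to configure
  }
  old_plan <- future::plan(future::sequential)
  on.exit(future::plan(old_plan), add = TRUE)
  expr  # lazily evaluated here, i.e. while the sequential plan is active
}

# In a test file this would wrap the call that triggers ScaleData, e.g.
# with_sequential_plan(scGate::scGate(seuratObj, model = my_model))
result <- with_sequential_plan(sum(1:10))
print(result)
```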

incomplete final line found by readTableHeader on HSPC_scGate_Model

This is a small issue, but I believe HSPC_scGate_Model.tsv is throwing a warning because it has an incomplete final line (the trailing tab is dropped). I'm fairly sure you could use "utils::read.table(fill = TRUE)" to avoid this. Again, not a huge issue, but I thought I would report it.

Warning ('test-scgate.R:101:3'): scGates runs on all
incomplete final line found by readTableHeader on '/tmp/RtmpId3W7Y/scGate_models-master/human/PBMC/HSPC_scGate_Model.tsv'
Backtrace:
    ▆
 1. └─RIRA::RunScGateWithDefaultModels(seuratObj, dropAmbiguousConsensusValues = TRUE) at test-scgate.R:101:3
 2.   └─RIRA::RunScGateForModels(...)
 3.     └─RIRA::RunScGate(...)
 4.       └─RIRA::GetScGateModel(model)
 5.         └─scGate::get_scGateDB(force_update = T, destination = modelDir)
 6.           └─scGate:::load.model.helper(model_path, verbose = verbose)
 7.             └─utils::read.table(file.path(models_path, f), sep = "\t", header = T)
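
As an alternative to the fill = TRUE suggestion above, a file without a terminating newline can be read line by line first. The sketch below is base R only; read_model_tsv() is a hypothetical helper, and the file contents are placeholders rather than the real HSPC model.

```r
# Hypothetical helper: read a TSV that may lack a final newline.
read_model_tsv <- function(path) {
  # warn = FALSE silences the "incomplete final line" message from readLines()
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]  # drop empty lines, if any
  read.table(text = lines, sep = "\t", header = TRUE,
             stringsAsFactors = FALSE)
}

# Build a small file whose last line is NOT terminated by a newline
tmp <- tempfile(fileext = ".tsv")
cat("levels\tuse_as\tname\tsignature\n",
    "level1\tpositive\tHSPC\tGENE_A;GENE_B",
    file = tmp, sep = "")

model <- read_model_tsv(tmp)
print(model)
```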

Logical operators

Hello,

Thank you for this great package. It's saving me a lot of time in annotating my subsets. The possibility of excluding some markers just adding the negative sign was a wonderful idea!

I was wondering whether logical operators could be supported in the signature argument, e.g. "CD8A&KLRG1", if I want, for example, to include all the KLRG1+ cells in the CD8 subset.

Thank you very much

Francesco

Can I run multiple scGate models and aggregate the results for cell types of interest?

Hi,

Thank you for developing and supporting scGate.

I'm running scGate across a few scRNA-seq data sets. I'm interested in T and NK cells.
Therefore I first run the Tcell scGate model and then the NK model. I don't know if you can run two models simultaneously.
Then, when I look into the results, there are a very small number of cells that are annotated as both NK and T cell.
I was wondering if there is a chance that these may represent doublets.

Did you ever come across this issue, or is it not recommended to run scGate with different models and then try to aggregate the independent results?

Thank you again.

António

new function to evaluate multiple models (list) with one command

The model param in scGate() could accept a list of scGate models (a list of data.frames). The function might 1) unlist all signatures from all models and layers; 2) conduct a consistency check to ensure that all signatures with equal names have equal gene sets; 3) calculate all UCell scores (by cell chunks); 4) evaluate each model's logic sequentially.
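
Step 2, the consistency check, could look roughly like this in base R. It assumes each model is a data frame with name and signature columns and ";"-separated genes, as in the model tables shown elsewhere on this page; the function name find_conflicting_signatures() and the example models are hypothetical.

```r
# Hypothetical sketch: find signature names that map to different gene sets
# across a list of models (each a data.frame with 'name' and 'signature').
find_conflicting_signatures <- function(models) {
  all_rows <- do.call(rbind, lapply(models, function(m) {
    data.frame(name = as.character(m$name),
               signature = as.character(m$signature),
               stringsAsFactors = FALSE)
  }))
  # Canonical form: sorted unique genes, so gene order does not matter
  canon <- vapply(strsplit(all_rows$signature, ";", fixed = TRUE),
                  function(g) paste(sort(unique(g)), collapse = ";"),
                  character(1))
  conflict <- vapply(split(canon, all_rows$name),
                     function(x) length(unique(x)) > 1, logical(1))
  names(conflict)[conflict]
}

models <- list(
  Bcell = data.frame(name = c("Immune", "PanBcell"),
                     signature = c("PTPRC;CD52", "CD79A")),
  Tcell = data.frame(name = c("Immune", "Tcell"),
                     signature = c("CD52;PTPRC", "CD3E"))
)
print(find_conflicting_signatures(models))  # no conflicts expected here

models$NK <- data.frame(name = "Immune", signature = "PTPRC")
print(find_conflicting_signatures(models))
```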

scGate() fails

Hi, I'm really glad to find a package with this kind of functionality, and have been following the documentation, but cannot get it to work. Running scGate on an integrated Seurat object (if relevant) and the following occurs regardless of what I set as the signature:

For when assay is integrated:

Running scGate on level 1...
Error: Cannot add a different number of cells than already present

For when assay is RNA:

Running scGate on level 1...
Error in irlba(A = t(x = object), nv = npcs, ...) :
max(nu, nv) must be positive
In addition: Warning messages:
1: In eval(predvars, data, env) : NaNs produced
2: In hvf.info$variance.expected[not.const] <- 10^fit$fitted :
number of items to replace is not a multiple of replacement length

Thanks in advance.

Update models for blood cells

The following models have to be updated to better work on blood immune cells:

Bcell
MoMacDC, Macrophages and Myeloid (if using more than one of those, results in "Multi" assignment)

See example pictures here
(created from: https://github.com/carmonalab/scRNAseq_data_processing/blob/master/scRNAseq_data_processing_ZhangY_2022_34653365.Rmd)

new function to obtain a multi-class variable from a set of scGate models' pure/impure output

After multiple scGate models have been evaluated (e.g. scGate_CD8T, scGate_Bcell):

               scGate_CD8T      scGate_Bcell
cell1                 Pure            Impure
cell2               Impure              Pure
cell3               Impure            Impure
cell4                 Pure              Pure

Generate a new 'multi-class' variable (from all scGate* variables, or from a user-defined list), e.g.

               scGate_multiClass     
cell1                CD8T
cell2                Bcell   
cell3                None  
cell4                Multi
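
The mapping in the two tables above can be sketched in base R as follows. This is an illustration of the requested behaviour, not the package's implementation; scgate_multiclass() is a hypothetical name, and the "scGate_" column prefix and "Pure"/"Impure" values are taken from the example.

```r
# Hypothetical sketch: collapse several Pure/Impure columns into one label.
scgate_multiclass <- function(meta, cols = NULL, prefix = "scGate_") {
  if (is.null(cols)) {
    cols <- colnames(meta)[startsWith(colnames(meta), prefix)]
  }
  stopifnot(length(cols) > 0)
  # 1 where the cell is Pure for that model, 0 otherwise
  pure <- (as.matrix(meta[, cols, drop = FALSE]) == "Pure") * 1L
  classes <- sub(prefix, "", cols, fixed = TRUE)
  n_pure <- rowSums(pure)
  label <- classes[max.col(pure, ties.method = "first")]
  label[n_pure == 0] <- "None"   # Pure for no model
  label[n_pure > 1]  <- "Multi"  # Pure for more than one model
  setNames(label, rownames(meta))
}

meta <- data.frame(
  scGate_CD8T  = c("Pure", "Impure", "Impure", "Pure"),
  scGate_Bcell = c("Impure", "Pure", "Impure", "Pure"),
  row.names = paste0("cell", 1:4)
)
multi <- scgate_multiclass(meta)
print(multi)
```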

plot_UCell_scores() fails unless scGate() is run with save.levels=T

The function should handle the case where there are no metadata columns named "is.pure.level" (e.g. when using the default save.levels=F) by using the column "is.pure" instead.
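
A sketch of the fallback logic in base R, assuming the per-level columns are named is.pure.level1, is.pure.level2, and so on. The helper purity_columns() is hypothetical and is not part of plot_UCell_scores().

```r
# Hypothetical sketch: pick per-level purity columns if present,
# otherwise fall back to the single "is.pure" column.
purity_columns <- function(meta) {
  level_cols <- grep("^is\\.pure\\.level", colnames(meta), value = TRUE)
  if (length(level_cols) > 0) {
    return(level_cols)
  }
  if ("is.pure" %in% colnames(meta)) {
    return("is.pure")
  }
  stop("no 'is.pure' or 'is.pure.level*' columns found")
}

with_levels <- data.frame(is.pure = "Pure",
                          is.pure.level1 = "Pure",
                          is.pure.level2 = "Pure")
without_levels <- data.frame(is.pure = "Pure", nFeature_RNA = 1200)

print(purity_columns(with_levels))
print(purity_columns(without_levels))
```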

Re-use existing ucell score columns in a seurat object?

Hello -

When running scGate(), the most expensive step seems to be computing the UCell score columns, once per signature. If you re-run scGate() again, even if those columns exist, they seem to be re-calculated. Would you consider an optional flag (probably defaulting to FALSE), that would make scGate re-use any UCell columns that already exist? The use case would be if we're trying to re-run and compare different gate permutations.

I can appreciate that if there are conflicting gene signatures then re-using UCell columns could be a problem; however, if this feature was opt-in, then it would be up to the user to understand their gates and pay attention for conflicts.
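
The opt-in behaviour described above could be sketched like this: before scoring, check which signatures already have a score column and only compute the rest. The function signatures_to_score() and the reuse flag are hypothetical; the "_UCell" suffix follows the column names mentioned in another issue on this page (e.g. Bcell_UCell).

```r
# Hypothetical sketch: decide which signatures still need a UCell score.
# With reuse = FALSE (the default) every signature is recomputed.
signatures_to_score <- function(meta, signature_names,
                                suffix = "_UCell", reuse = FALSE) {
  if (!reuse) {
    return(signature_names)
  }
  expected_cols <- paste0(signature_names, suffix)
  signature_names[!expected_cols %in% colnames(meta)]
}

meta <- data.frame(Bcell_UCell = c(0.4, 0.1), Tcell_UCell = c(0.0, 0.7))
wanted <- c("Bcell", "Tcell", "NK")

print(signatures_to_score(meta, wanted))                # recompute all
print(signatures_to_score(meta, wanted, reuse = TRUE))  # only the missing one
```

As the issue notes, re-use is only safe when the existing columns were computed from the same gene signatures, which this check does not verify.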

scGate parallelization

As discussed with Massimo, scGate can be further parallelized (in addition to running UCell in parallel internally):
Providing a list of objects, or providing a split.by parameter to split the input object into a list of objects (e.g. by a samples metadata column), would allow scGate to run over multiple objects in parallel, speeding up processing, as implemented in ProjecTILs.

combining model_DBs

Hi,

I have a TILs dataset including both CD4 and CD8 T cells.

Can I create a combined DB
scGate_models_DB$human$TILs = c(scGate_models_DB$human$CD4_TIL,scGate_models_DB$human$CD8_TIL)

and run scGate on it? Or should I run each model separately and combine the results?

Thanks

Ucell scores for _additional_signatures_ not exported in multi-model mode [dev]

Error: irlba(A = t(x = object), nv = npcs, ...)

Hello,

We're getting errors running scGate::scGate(), along the lines of:

Error in irlba(A = t(x = object), nv = npcs, ...) :
max(nu, nv) must be strictly less than min(nrow(A), ncol(A))

Are you familiar with anything like this? Does this indicate our input has too few cells or a rare cluster with too few cells? Thanks in advance for any help.
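
The error text itself states the constraint: the number of requested components must be at least 1 and strictly less than the smaller matrix dimension, which is consistent with the guess that the input (or a subset of it) has very few cells. The sketch below only illustrates that constraint; safe_npcs() is a hypothetical helper, and the default of 30 components is an assumption, not a documented scGate setting.

```r
# Hypothetical sketch: cap the number of components so that
# 1 <= npcs < min(n_features, n_cells), as the irlba error message requires.
safe_npcs <- function(n_features, n_cells, requested = 30) {
  upper <- min(n_features, n_cells) - 1
  if (upper < 1) {
    stop("too few cells or features for a truncated SVD")
  }
  min(requested, upper)
}

print(safe_npcs(n_features = 2000, n_cells = 500))  # enough cells
print(safe_npcs(n_features = 2000, n_cells = 12))   # capped by cell count
```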

Discrepancy in genes included in block list between Human and Mouse

The mouse block list contains Heatshock and Ifn-response genes, while the human block list does not.

ArchRproject implementation

Hi,
Thank you for this nice package. I am analyzing my scATAC-seq and matching scRNA-seq data using ArchR. I would be interested in using your package on my ArchRProject, but this is not implemented. Do you think it would be possible to implement this option? Otherwise, what data do I need to extract from my ArchRProject to run scGate?

Thanks in advance for your help

Use of pos.thr neg.thr

Hello! Thanks again for this package!

I have a question on the scGate function.
Here is the piece of code I use to gate on CD8 T cells trying to exclude CD4s and MAIT (sobj is my seurat object which are CD3+ T cells)

CD8 <- c("CD8A", "CD8B")
CD4 <- "CD4"
MAIT <- c("SLC4A10", "TRAV1-2")
mmCD8 <- scGate::gating_model(level=1, name="CD8T", signature = CD8)
mmCD8 <- scGate::gating_model(model=mmCD8, level=1, name="MAIT", signature = MAIT, negative=TRUE)
mmCD8 <- scGate::gating_model(model=mmCD8, level=1, name="CD4T", signature = CD4, negative=TRUE)
CD8sub <- scGate(sobj, model = mmCD8)
CD8sub <- subset(CD8sub, subset = `is.pure` == "Pure" )

After this if I run
FeaturePlot(CD8sub, c("CD8A", "CD4"))

There are some areas that look like CD4+ (not much) and some areas that are CD8- (I can see one cluster CD8A- and it's also CD8B-)

I guess I could try to increase the pos.thr argument, right? But how much should I increase the threshold? What are here your suggestions to get pure CD8s?

Thanks!
Francesco

Add in custom scGATE db function

Hi people from the Carmona lab,

I wanted to make a custom database for other cell types.

Please find attached the function that enables a user to add a custom scGate DB. Feel free to add this to scGate, as I think it would help others to use custom DBs.

Cheers,
Kerry

custom_db_scGATE <- function(repo_path.v) {
  # Locate the directory holding the *_scGate_Model.tsv files.
  # As in the original version, this assumes a single model directory.
  allfiles <- list.files(repo_path.v, recursive = TRUE)
  modelfiles <- grep("scGate_Model.tsv", allfiles, value = TRUE)
  uniq_dirs <- sort(unique(dirname(modelfiles)))
  model_path <- file.path(repo_path.v, uniq_dirs)
  # master_table.tsv is merged into each model by signature name
  master.table <- read.table(file.path(model_path, "master_table.tsv"), sep = "\t", header = TRUE)
  df.models.toimpute <- list()
  files.to.impute <- list.files(model_path, "_scGate_Model.tsv")
  for (f in files.to.impute) {
    model.name <- strsplit(f, "_scGate_Model.tsv")[[1]][1]
    model <- read.table(file.path(model_path, f), sep = "\t", header = TRUE)
    # Keep the first three columns of the model and attach the master table
    df.models.toimpute[[model.name]] <- merge(model[1:3], master.table, by = "name")
  }
  df.models.toimpute
}
custom_db_scGATE("inst/extdata/scGate_models-master/human/CD8_TIL")

kNN smoothing parameters

Regarding kNN smoothing, would it be possible to allow modification of the neighbors' weight for smoothing, and to perform the classification without kNN smoothing?

Also, is it possible to implement only positive smoothing?
