
cosgr's Introduction

COSG in R

Accurate and fast cell marker gene identification with COSG

COSG is a cosine similarity-based method for more accurate and scalable marker gene identification.

  • COSG is a general method for cell marker gene identification across different data modalities, e.g., scRNA-seq, scATAC-seq and spatially resolved transcriptome data.
  • Marker genes or genomic regions identified by COSG are more indicative and have greater cell-type specificity.
  • COSG is ultrafast for large-scale datasets, and is capable of identifying marker genes for one million cells in less than two minutes.
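To make the cosine similarity idea concrete, here is a minimal base R sketch on made-up data. It compares each gene's expression vector with a 0/1 indicator vector per cell group, which is one common way to describe an "ideal" marker. The toy matrix, the group labels, and the helper `cosine_sim` are assumptions for illustration only and are not the package's implementation; see Dai et al. (2022) for the actual method.

```r
# Toy expression matrix: 3 genes x 6 cells (all values are made up).
expr <- rbind(
  geneA = c(5, 4, 6, 0, 0, 0),  # expressed only in group "0"
  geneB = c(3, 3, 3, 3, 3, 3),  # expressed uniformly in all cells
  geneC = c(0, 0, 0, 2, 7, 4)   # expressed only in group "1"
)
groups <- factor(c("0", "0", "0", "1", "1", "1"))

# One indicator vector per group: 1 for cells inside the group, 0 outside.
indicator <- sapply(levels(groups), function(g) as.numeric(groups == g))

# Hypothetical helper: cosine similarity between every row of x (genes)
# and every column of y (group indicators).
cosine_sim <- function(x, y) {
  x_norm <- sqrt(rowSums(x^2))
  y_norm <- sqrt(colSums(y^2))
  (x %*% y) / outer(x_norm, y_norm)
}

sim <- cosine_sim(expr, indicator)
print(round(sim, 3))
```

In this toy example the group-specific genes score close to 1 for their own group and 0 for the other group, while the uniformly expressed gene gets the same intermediate value for both groups.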

The method and benchmarking results are described in Dai et al. (2022). The preprint is available on bioRxiv.

This is the R version of COSG; the Python version is hosted at https://github.com/genecell/COSG.

Installation

# install.packages('remotes')
remotes::install_github(repo = 'genecell/COSGR')

Usage

Please check out the vignette and the PBMC10K tutorial to get started.

suppressMessages(library(Seurat))
data('pbmc_small',package='Seurat')
# Check cell groups:
table(Idents(pbmc_small))
#> 
#>  0  1  2 
#> 36 25 19 
#######
# Run COSG:
marker_cosg <- cosg(
 pbmc_small,
 groups='all',
 assay='RNA',
 slot='data',
 mu=1,
 n_genes_user=100)
#######
# Check the marker genes:
head(marker_cosg$names)
#>       0      1     2
#> 1   CD7 S100A8 MS4A1
#> 2  CCL5   TYMP CD79A
#> 3  GNLY S100A9 TCL1A
#> 4 LAMP1  FCGRT  NT5C
#> 5  GZMA IFITM3 CD79B
#> 6   LCK   LST1 FCER2
head(marker_cosg$scores)
#>           0         1         2
#> 1 0.6391917 0.8954042 0.6922908
#> 2 0.6391267 0.8312083 0.5832425
#> 3 0.6328148 0.8120045 0.5757478
#> 4 0.6164937 0.7755955 0.5533107
#> 5 0.5846589 0.7413060 0.5163446
#> 6 0.5795238 0.7380483 0.5115180
#######
# Run COSG for selected groups, i.e., '0' and '2':
marker_cosg <- cosg(
 pbmc_small,
 groups=c('0', '2'),
 assay='RNA',
 slot='data',
 mu=1,
 n_genes_user=100)
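The printed output above suggests that the result holds two tables, `names` and `scores`, each with one column per cell group. Under that assumption, the following sketch reshapes such a result into a single long table, which may be easier to filter or export. It uses a small mock built from the first three rows shown above, so it runs without Seurat or COSG; the object names `marker_names`, `marker_scores`, and `long` are illustrative only.

```r
# Mock of the structure printed above (first three rows only).
marker_names <- data.frame(
  `0` = c("CD7", "CCL5", "GNLY"),
  `1` = c("S100A8", "TYMP", "S100A9"),
  `2` = c("MS4A1", "CD79A", "TCL1A"),
  check.names = FALSE, stringsAsFactors = FALSE
)
marker_scores <- data.frame(
  `0` = c(0.6391917, 0.6391267, 0.6328148),
  `1` = c(0.8954042, 0.8312083, 0.8120045),
  `2` = c(0.6922908, 0.5832425, 0.5757478),
  check.names = FALSE
)

# Stack the columns: one row per (group, rank) pair.
long <- data.frame(
  group = rep(colnames(marker_names), each = nrow(marker_names)),
  rank  = rep(seq_len(nrow(marker_names)), times = ncol(marker_names)),
  gene  = unlist(marker_names, use.names = FALSE),
  score = unlist(marker_scores, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```

With a real result, `marker_cosg$names` and `marker_cosg$scores` would take the place of the two mock tables, provided they have the layout assumed here.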

Tip

  1. To identify more specific marker genes, set mu to a larger value, such as mu=10 or mu=100.
  2. Set the parameter remove_lowly_expressed to TRUE to exclude genes that are expressed in only a small fraction of cells in the target cell group, and use the parameter expressed_pct to adjust that percentage threshold. For example:
# seo: a Seurat object containing a 'peaks' assay (e.g., scATAC-seq data)
marker_region <- cosg(
  seo,
  groups='all',
  assay='peaks',
  slot='data',
  mu=100,
  n_genes_user=100,
  remove_lowly_expressed=TRUE,
  expressed_pct=0.1
)
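The two tips can be illustrated with a small base R sketch. The score form used here, cos_target^3 / (cos_target^2 + mu * sum of squared cosines to the other groups), is an assumed reading of the method and should be checked against Dai et al. (2022); `toy_score` is a hypothetical helper, not a function of the package. The second part assumes that expressed_pct refers to the fraction of cells in the target group with nonzero expression.

```r
# Hypothetical helper, not part of the COSG package.
# cos_target: cosine similarity of a gene to the target group.
# cos_others: cosine similarities of the same gene to all other groups.
toy_score <- function(cos_target, cos_others, mu = 1) {
  cos_target^3 / (cos_target^2 + mu * sum(cos_others^2))
}

# Two made-up genes, described by their cosine similarities to three groups.
specific_gene <- list(target = 0.6, others = c(0, 0))
broad_gene    <- list(target = 0.9, others = c(0.3, 0.2))

# A larger mu penalizes similarity to the non-target groups more strongly.
for (mu in c(1, 10, 100)) {
  cat(sprintf(
    "mu = %g  specific: %.3f  broad: %.3f\n",
    mu,
    toy_score(specific_gene$target, specific_gene$others, mu),
    toy_score(broad_gene$target, broad_gene$others, mu)
  ))
}

# Assumed meaning of expressed_pct: fraction of target-group cells in which
# the gene has nonzero expression. Toy data: 2 genes x 10 cells.
expr_in_group <- rbind(
  geneA = c(1, 0, 2, 3, 0, 1, 1, 2, 0, 4),
  geneB = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 9)
)
expressed_pct <- 0.2
pct_expressed <- rowMeans(expr_in_group > 0)
keep <- pct_expressed >= expressed_pct
print(names(keep)[keep])
```

In this toy setting the broadly expressed gene outranks the specific one at mu = 1 but falls behind it at mu = 10 and above, and the gene detected in a single cell is dropped by the percentage filter.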

Citation

If COSG is useful for your research, please consider citing Dai, M., Pei, X., Wang, X.-J., 2022. Accurate and fast cell marker gene identification with COSG. Brief. Bioinform. bbab579.

cosgr's Issues

How to use COSGR within scATAC and Spatial?

Thank you very much for this package; it will greatly reduce the time we need to select unique genes.

But I found that there are only tutorials for Seurat and Scanpy on GitHub. How can it be used with scATAC and spatial data?

For example, how can it be integrated into the ArchR workflow? Thank you very much.

Errors

Hi,

Thank you for sharing the wonderful method. Recently, I ran into issues on my new Apple silicon computer when using cosg with a Seurat object. I could not figure out what the problem was. I would appreciate it if you could help.

marker_cosg <- COSG::cosg(seu, groups='all', assay='RNA', slot='data', mu=1, n_genes_user=2)

Error in tabulate(genexcell[, idx_i]@i + 1) :
trying to get slot "i" from an object of a basic class ("matrix") with no slots

Here is my sessionInfo:

R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] forcats_0.5.2 stringr_1.5.0 dplyr_1.0.10 purrr_1.0.1 readr_2.1.3
[6] tidyr_1.2.1 tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2 SeuratObject_4.1.3
[11] Seurat_4.3.0

loaded via a namespace (and not attached):
[1] googledrive_2.0.0 Rtsne_0.16 colorspace_2.0-3 deldir_1.0-6
[5] ellipsis_0.3.2 ggridges_0.5.4 fs_1.5.2 rstudioapi_0.14
[9] spatstat.data_3.0-0 leiden_0.4.3 listenv_0.9.0 ggrepel_0.9.2
[13] lubridate_1.9.0 fansi_1.0.3 xml2_1.3.3 codetools_0.2-18
[17] splines_4.2.2 polyclip_1.10-4 jsonlite_1.8.4 COSG_0.9.0
[21] broom_1.0.2 ica_1.0-3 cluster_2.1.4 dbplyr_2.3.0
[25] png_0.1-8 uwot_0.1.14 shiny_1.7.4 sctransform_0.3.5
[29] spatstat.sparse_3.0-0 compiler_4.2.2 httr_1.4.4 backports_1.4.1
[33] assertthat_0.2.1 Matrix_1.5-3 fastmap_1.1.0 lazyeval_0.2.2
[37] gargle_1.2.1 cli_3.6.0 later_1.3.0 htmltools_0.5.4
[41] tools_4.2.2 igraph_1.3.5 gtable_0.3.1 glue_1.6.2
[45] RANN_2.6.1 reshape2_1.4.4 Rcpp_1.0.9 scattermore_0.8
[49] cellranger_1.1.0 vctrs_0.5.1 spatstat.explore_3.0-5 nlme_3.1-160
[53] progressr_0.13.0 lmtest_0.9-40 spatstat.random_3.0-1 globals_0.16.2
[57] rvest_1.0.3 timechange_0.2.0 mime_0.12 miniUI_0.1.1.1
[61] lifecycle_1.0.3 irlba_2.3.5.1 googlesheets4_1.0.1 goftest_1.2-3
[65] future_1.30.0 MASS_7.3-58.1 zoo_1.8-11 scales_1.2.1
[69] hms_1.1.2 promises_1.2.0.1 spatstat.utils_3.0-1 parallel_4.2.2
[73] RColorBrewer_1.1-3 reticulate_1.27 pbapply_1.7-0 gridExtra_2.3
[77] stringi_1.7.12 rlang_1.0.6 pkgconfig_2.0.3 matrixStats_0.63.0
[81] lattice_0.20-45 ROCR_1.0-11 tensor_1.5 patchwork_1.1.2
[85] htmlwidgets_1.6.1 cowplot_1.1.1 tidyselect_1.2.0 parallelly_1.34.0
[89] RcppAnnoy_0.0.20 plyr_1.8.8 magrittr_2.0.3 R6_2.5.1
[93] generics_0.1.3 DBI_1.1.3 withr_2.5.0 haven_2.5.1
[97] pillar_1.8.1 proxyC_0.3.3 fitdistrplus_1.1-8 survival_3.4-0
[101] abind_1.4-5 sp_1.5-1 future.apply_1.10.0 crayon_1.5.2
[105] modelr_0.1.10 KernSmooth_2.23-20 utf8_1.2.2 spatstat.geom_3.0-3
[109] plotly_4.10.1 tzdb_0.3.0 readxl_1.4.1 grid_4.2.2
[113] data.table_1.14.6 reprex_2.0.2 digest_0.6.31 xtable_1.8-4
[117] httpuv_1.6.8 RcppParallel_5.1.6 munsell_0.5.0 viridisLite_0.4.1
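
The error message in this report indicates that the code reads the `@i` slot, which exists on column-compressed sparse matrices from the Matrix package but not on base R matrices. The sketch below only demonstrates that difference on made-up data. Whether the reporter's assay actually stored a dense matrix, and whether converting it to a sparse matrix before calling cosg resolves the issue, are untested assumptions.

```r
library(Matrix)

# Toy dense matrix: 3 genes x 3 cells (values are made up).
dense <- matrix(
  c(0, 2, 0, 1, 0, 0, 3, 0, 4),
  nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("c1", "c2", "c3"))
)

# A base matrix has no slots, so accessing @i raises an error.
dense_error <- tryCatch(dense@i, error = function(e) conditionMessage(e))
print(dense_error)

# After conversion to a column-compressed sparse matrix, @i is available.
sparse <- as(dense, "CsparseMatrix")

# @i stores zero-based row indices of the nonzero entries, so tabulate()
# counts the number of cells with nonzero expression for each gene.
nonzero_per_gene <- tabulate(sparse@i + 1, nbins = nrow(sparse))
print(nonzero_per_gene)
```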

Seurat V5 and cosg

Is there any way to solve the problem?

marker_cosg <- cosg(scRNA_object, groups='all', assay='SCT',
                    slot='data', mu=1, n_genes_user=100,
                    remove_lowly_expressed=TRUE,
                    expressed_pct=0.1)

Error in ..subscript.2ary(x, l[[1L]], l[[2L]], drop = drop[1L]) :
NA subscripts in x[i,j] not supported for 'x' inheriting from sparseMatrix
In addition: Warning message:
x or y has vectors with all zero; consider setting use_nan = TRUE to set these values to NaN or use_nan = FALSE to suppress this warning

group option fails

Hi

Thanks for the great tool. I just noticed that I can't use the groups option to run on specific identities, because in Seurat V4 subset is not in Seurat but in the SeuratObject library.

cheers
daniel

error when running COSG for selected groups

Hi!

I'm trying to run your usage sample and I am having an error when running this part:

#######
# Run COSG for selected groups, i.e., '0' and '2':
marker_cosg <- cosg(
 pbmc_small,
 groups=c('0', '2'),
 assay='RNA',
 slot='data',
 mu=1,
 n_genes_user=100)

I get the error:

Error in if (groups == "all") { : the condition has length > 1

Apparently, because groups contains more than one element, the comparison inside the if statement has length greater than one and triggers the error. How do you overcome this?

Thank you!
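
In R 4.2.0 and later, an `if` condition of length greater than one is an error, which matches the message quoted above. A check that always returns a single TRUE or FALSE avoids it. The sketch below shows one such pattern in base R; `is_all_groups` is a hypothetical helper written for illustration, not the package's code, and whether the package has adopted a similar check is not something this example can confirm.

```r
# Hypothetical helper: TRUE only when groups is exactly the string "all".
is_all_groups <- function(groups) {
  identical(groups, "all")
}

print(is_all_groups("all"))        # single string "all"
print(is_all_groups(c("0", "2")))  # vector of selected groups

# The pattern from the error message compares a length-2 vector with "all".
# Depending on the R version this raises an error or a warning, so both
# conditions are caught here and their messages returned.
old_pattern <- tryCatch(
  if (c("0", "2") == "all") "all" else "subset",
  error = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)
print(old_pattern)
```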
