
CrossICC


Overview

Unsupervised clustering of high-throughput molecular profiling data is widely adopted for discovering cancer subtypes. However, cancer subtypes derived from a single dataset are usually not applicable across multiple datasets from different platforms. We previously published an iterative clustering algorithm to address this issue (see this paper), but its use was hampered by the lack of an implementation. CrossICC is an R package implementation of this method, with many new features added to improve the performance of the algorithm. Briefly, CrossICC uses an iterative strategy to derive the optimal gene set and cluster number from the consensus similarity matrix generated by consensus clustering. CrossICC can handle multiple cross-platform datasets and requires no between-dataset normalization. The package also provides functions to help users visualize the identified subtypes and evaluate subtyping performance. In particular, many cancer-related analysis methods are embedded to facilitate the clinical translation of the identified cancer subtypes.

There are two modes for integrating clusters derived from cross-platform datasets: cluster mode and sample mode. In cluster mode, samples from each platform are clustered separately, and the centroids of the sub-clusters derived from ConsensusClusterPlus are then clustered again to generate super clusters. This avoids the need to remove batch effects across platforms. A step-by-step illustration of the algorithm can be found in our previously published paper and our recently submitted paper [coming soon]. In sample mode, sub-clusters are first derived with ConsensusClusterPlus in each platform. We then calculate the correlation coefficients between samples and cluster centroids to obtain a new feature vector for each sample. Based on this new matrix, samples are divided into new clusters.
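The sample-mode feature construction described above can be sketched in base R. This is only a toy illustration on a single simulated platform, not the package's internal code: the matrix `expr`, the labels in `sub.cluster` (standing in for ConsensusClusterPlus output), and the final hierarchical clustering step are all assumptions made for the example.

```r
# Toy illustration of the sample-mode idea (base R only, simulated data).
set.seed(1)

# Hypothetical expression matrix: 20 genes (rows) x 6 samples (columns)
expr <- matrix(rnorm(20 * 6), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("s", 1:6)))

# Hypothetical sub-cluster labels, standing in for ConsensusClusterPlus output
sub.cluster <- c(1, 1, 1, 2, 2, 2)
cluster.ids <- sort(unique(sub.cluster))

# Centroid of each sub-cluster: mean expression of every gene
centroids <- sapply(cluster.ids, function(k) {
  rowMeans(expr[, sub.cluster == k, drop = FALSE])
})
colnames(centroids) <- paste0("C", cluster.ids)

# New feature vector per sample: its correlation with every centroid
sample.features <- cor(expr, centroids)  # samples x centroids

# Re-cluster the samples on the new feature matrix (assumed method: hclust)
new.cluster <- cutree(hclust(dist(sample.features)), k = 2)
```

Because each sample is described only by its correlation with the centroids, the new feature matrix is on the same scale regardless of the platform the sample came from.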

Installation

Via GitHub (latest)

  • Important! In Bioconductor releases after 3.12, CrossICC is no longer available from Bioconductor. This is because MergeMaid, a core dependency of CrossICC, has not been maintained since that release. The steps below are the only way to install MergeMaid before installing CrossICC:
  • Step 1. Download the MergeMaid source code, either from a shell console or directly from the URL below: https://bioconductor.riken.jp/packages/3.1/bioc/src/contrib/MergeMaid_2.40.0.tar.gz
$ wget https://bioconductor.riken.jp/packages/3.1/bioc/src/contrib/MergeMaid_2.40.0.tar.gz
  • Step 2. Install MergeMaid and CrossICC from the R console.
# install MergeMaid from source (run R in the directory that contains the tarball)
install.packages("MergeMaid_2.40.0.tar.gz", repos = NULL, type = "source")
# install CrossICC from GitHub
install.packages("devtools")
devtools::install_github("bioinformatist/CrossICC")
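After both steps, a quick check from the R console can confirm that the two packages are visible to R. This is a minimal sketch using base R's `requireNamespace`; it only reports what is installed and does not install anything.

```r
# Report whether each package can be loaded in this R installation
pkgs <- c("MergeMaid", "CrossICC")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)
```

A `FALSE` entry for MergeMaid usually means Step 1 or the source installation needs to be repeated before installing CrossICC.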

Usage

CrossICC can automatically process an arbitrary number of expression datasets, regardless of which platform they came from (you can even use sequencing and microarray data together). All you need is a list of matrices in R, without any pre-processing (no filtering or normalization is required).
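As a rough sketch of what such an input could look like, the snippet below builds a list of two simulated matrices. It assumes genes in rows, samples in columns, and gene identifiers as row names; the README does not state this layout, and the object names (`platform.a`, `platform.b`, `my.platforms`) are made up for the example.

```r
set.seed(42)
genes <- paste0("gene", 1:50)

# Hypothetical microarray-like platform: 50 genes x 8 samples
platform.a <- matrix(rnorm(50 * 8, mean = 8, sd = 2), nrow = 50,
                     dimnames = list(genes, paste0("A", 1:8)))

# Hypothetical sequencing-like platform: 50 genes x 10 samples
platform.b <- matrix(rpois(50 * 10, lambda = 100), nrow = 50,
                     dimnames = list(genes, paste0("B", 1:10)))

# The input is simply a list with one matrix per platform
my.platforms <- list(array = platform.a, rnaseq = platform.b)

# Sanity checks before handing the list to CrossICC
stopifnot(all(vapply(my.platforms, is.matrix, logical(1))))
shared.genes <- Reduce(intersect, lapply(my.platforms, rownames))
length(shared.genes)
```

Checking the number of shared gene identifiers up front is a cheap way to catch mismatched annotations between platforms.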

library(CrossICC)
data(demo.platforms)
CrossICC.obj <- CrossICC(demo.platforms, skip.mfs = TRUE, max.iter = 100, 
                         cross = "cluster", fdr.cutoff = 0.1, 
                         ebayes.cutoff = 0.1, filter.cutoff = 0.1)

CrossICC will automatically iterate over your data until it reaches convergence. By default, CrossICC writes an .rds object to your home directory (~/, a.k.a. $HOME on Linux) and then opens a Shiny app in your default browser, which provides an intuitive way to view the results.

Shiny app

Our package also comes with a Shiny app. To run it:

  • Step 3 (optional): install the suggested packages if they are missing, then launch the app.
pkg.suggested <- c('ggalluvial', 'ggsci','rmarkdown', 'knitr', 'shiny', 'shinydashboard', 'shinyWidgets', "shinycssloaders", 'DT', 'ggthemes', 'ggplot2', 'pheatmap', 'RColorBrewer', 'tibble')
checkPackages <- function(pkg){
  if (!requireNamespace(pkg, quietly = TRUE)) {
    warning(paste0("Package ",pkg," needed for shiny app. Installing...."))
    install.packages(pkg)
  }
}
lapply(pkg.suggested, checkPackages)
shiny::runApp(system.file("shiny", package = "CrossICC"))

FAQ

  • Question 1: My dataset contains NA values. How should I handle them?

A: Users may encounter unexpected errors caused by NA values in the raw dataset. We therefore strongly recommend checking for NA values before loading data into CrossICC. complete.cases is a good option for finding the complete cases in a matrix. We also present an example of imputing the data, for users who do not want to remove cases from the dataset. The imputation method shown here is the KNN method from the impute package.

# for an individual matrix, impute missing values with the following R code
library(impute)
tempdata.impute <- impute.knn(as.matrix(tempdata), k = 10, rowmax = 0.5, colmax = 0.8)
normalize.Data <- as.data.frame(tempdata.impute$data)
  • Question 2: Can I install CrossICC from Bioconductor?

A: No. In Bioconductor releases after 3.12, CrossICC is no longer available from Bioconductor. This is because MergeMaid, a core dependency of CrossICC, is no longer maintained, and we did not get the license to update the package. Users can therefore only install our package directly from GitHub.
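For Question 1, the NA check itself needs only base R. The following is a minimal sketch on a made-up matrix (`tempdata` here is toy data, with genes assumed to be in rows) that counts missing values and keeps only the complete rows:

```r
# Toy matrix with missing values: 5 genes (rows) x 4 samples (columns)
tempdata <- matrix(c( 1,  2, NA,  4,
                      5,  6,  7,  8,
                     NA, NA, 11, 12,
                     13, 14, 15, 16,
                     17, 18, 19, 20),
                   nrow = 5, byrow = TRUE,
                   dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))

# Total number of missing values in the matrix
n.missing <- sum(is.na(tempdata))

# Logical vector: TRUE for rows without any NA
complete.rows <- complete.cases(tempdata)

# Keep only the complete rows (drop = FALSE preserves the matrix shape)
tempdata.complete <- tempdata[complete.rows, , drop = FALSE]
```

Dropping rows is the simplest option; when too many rows would be lost, imputation as shown in the answer to Question 1 is the alternative.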

Contribution

Qi Zhao @likelet and Yu Sun @bioinformatist implemented the package. Zhixiang Zuo @zhixiang supervised the project. Zekun Liu performed testing and helped run an example of the package. For more information or questions, please contact any of the authors above.

Citation

Zhao Qi, Yu Sun, Zekun Liu, Hongwan Zhang, Xingyang Li, Kaiyu Zhu, Ze-Xian Liu, Jian Ren, and Zhixiang Zuo (2020). CrossICC: iterative consensus clustering of cross-platform gene expression data without adjusting batch effect. Briefings in Bioinformatics, 21(5), 1818-1824.


crossicc's Issues

bug

Hello, the survival module of the Shiny app reports an error when the template file is used as input: Warning: Error in read.table: 'file' must be a character string or connection.

Could not predict the result with uploaded matrix

One of our users reported that they could not get a result from an uploaded data matrix, with the following error message:
[image: screenshot of the error message]

However, I could not reproduce this error with the dataset they provided.
A possible solution is to rerun the analysis from the R console instead of the Shiny app. Typical R code for this is shown below:

library(CrossICC)
yourdataset <- read.csv("GSE87466_for_predict.txt", header = TRUE, row.names = 1, check.names = FALSE)
# `CrossICC.object` was generated by your previous CrossICC run and is saved
#   in the home folder of your computer; you can read it with

CrossICC.object <- readRDS("~/CrossICC.object.rds")

# then perform prediction
predicted <- predictor(as.matrix(yourdataset), CrossICC.object)

Please let me know if you have any other questions.

List of Deprecated Packages for Bioc3.13

FYI, CrossICC is on this list: https://stat.ethz.ch/pipermail/bioc-devel/2020-December/017533.html

The Bioconductor Team is continuing to identify packages that will be deprecated in the next release to allow for the Bioconductor community to respond accordingly. The list will be updated monthly. This is the current list of deprecated packages for Bioc 3.13 :

Unmaintained/Nonresponsive

Software:

affyQCReport
APAlyzer
bigmemoryExtras
CrossICC
DBChIP
dexus
EasyqpcR
genoset
Imetagene
Polyfit
proFIA
RDAVIDWebService
rnaSeqMap
seqplots
SSPA

warning messages: 1: In balance.cluster(all.sig, cc = cc, cluster.cutoff = cluster.cutoff, : User defined max super cluster number is larger than real value, will refine it to (n - 1)!

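The dataset contains samples from three batches. When the list contains all three batches, CrossICC runs normally.
When the list contains only one batch, warnings appear during the run, the run stops, and no result is produced.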

CrossICC.input <- list(bt1)
test <- CrossICC(CrossICC.input)
Only one matrix detected. MergeMaid will not work. Will skip cross analysis.
Wed Apr 26 23:59:29 2023 -- Pre-processing data
Wed Apr 26 23:59:29 2023 -- Removing features with no variance
Wed Apr 26 23:59:29 2023 -- Performing MAD filtering
Wed Apr 26 23:59:29 2023 -- Scaling
No study names provided or something goes wrong with your study names. Will use auto-generated study names instead.
Wed Apr 26 23:59:29 2023 -- start iteration: 1
41 genes were engaged in this iteration.
Wed Apr 26 23:59:34 2023 -- start iteration: 2
40 genes were engaged in this iteration.
Wed Apr 26 23:59:40 2023 -- start iteration: 3
38 genes were engaged in this iteration.
Wed Apr 26 23:59:45 2023 -- start iteration: 4
38 genes were engaged in this iteration.
Error in CrossICC(CrossICC.input) : Result file already existed!
In addition: Warning messages:
1: In balance.cluster(all.sig, cc = cc, cluster.cutoff = cluster.cutoff, :
User defined max super cluster number is larger than real value, will refine it to (n - 1)!
2: In balance.cluster(all.sig, cc = cc, cluster.cutoff = cluster.cutoff, :
User defined max super cluster number is larger than real value, will refine it to (n - 1)!
3: In balance.cluster(all.sig, cc = cc, cluster.cutoff = cluster.cutoff, :
User defined max super cluster number is larger than real value, will refine it to (n - 1)!
4: In balance.cluster(all.sig, cc = cc, cluster.cutoff = cluster.cutoff, :
User defined max super cluster number is larger than real value, will refine it to (n - 1)!
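The run above stops with `Error in CrossICC(CrossICC.input) : Result file already existed!`, which suggests that a result file from an earlier run was found. As a hedged sketch only, assuming the file in question is the `~/CrossICC.object.rds` mentioned in the previous issue (the log does not confirm the path), an existing file could be moved aside before rerunning. `backup_result` is a hypothetical helper, not part of CrossICC, and the demonstration uses a throwaway file in a temporary directory.

```r
# Hypothetical helper: rename an existing result file so a new run can write its own
backup_result <- function(path) {
  if (!file.exists(path)) return(invisible(NULL))
  backup <- paste0(path, ".", format(Sys.time(), "%Y%m%d%H%M%S"), ".bak")
  if (!file.rename(path, backup)) stop("Could not rename ", path)
  invisible(backup)
}

# Demonstration on a placeholder file in a temporary directory
demo.path <- file.path(tempdir(), "CrossICC.object.rds")
saveRDS(list(note = "placeholder"), demo.path)
moved.to <- backup_result(demo.path)
```

For a real run, the call would be `backup_result("~/CrossICC.object.rds")`, again under the assumption that this is where the earlier result was written.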
