Celligner

Celligner is a computational approach for aligning tumor and cell line transcriptional profiles.

To learn more, see the paper

Remark

Celligner was originally developed as an R project, which you can find in the R/ folder.

A Python version has since been written. It performs the same computations as the R version, but results may differ slightly because of small implementation differences in the Louvain clustering and contrastive PCA (cPCA) steps.

Overview

A reference expression dataset (e.g. CCLE cell lines) should be fit using the fit() function; a target expression dataset (e.g. TCGA+ tumor samples) can then be aligned to this reference using the transform() function. See the run_celligner.py script for example usage. Celligner is unsupervised and does not require annotations to run. Annotations are therefore not used in this version of the model, but they can be added post hoc to aid interpretation of the output. See the celligner_output.ipynb notebook for an example of how to draw a UMAP of the output.

The Celligner output can be explored at: https://depmap.org/portal/celligner/

Install

For the old R package installation instructions, see the R/ folder.

Before running pip, make sure that you have R installed.
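A missing R installation is a common stumbling block, so a quick pre-flight check can help. The minimal sketch below only looks for the R and Rscript executables on your PATH using the standard-library shutil.which; the helper name r_is_on_path is hypothetical (not part of Celligner), and it does not check for any particular R packages.

```python
import shutil


def r_is_on_path() -> bool:
    """Return True if both the R and Rscript executables are found on PATH.

    Hypothetical helper, not part of Celligner.
    """
    # shutil.which returns the full path of an executable, or None if absent.
    return shutil.which("R") is not None and shutil.which("Rscript") is not None


if __name__ == "__main__":
    print("R found on PATH:", r_is_on_path())
```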

To install the latest version of Celligner in development (editable) mode, run the following. Note that Celligner requires the specific version of mnnpy that is included in the repository as a submodule:

git clone https://github.com/broadinstitute/celligner.git
cd celligner
git checkout new_dev
git submodule update --init
pip install -e .
cd mnnpy
pip install .
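After installing, you can confirm that both packages are visible to the same Python environment. This is a minimal sketch using importlib.util.find_spec, which locates a module without importing it; it assumes the import names are celligner and mnnpy, and check_installed is a hypothetical helper, not a Celligner function.

```python
import importlib.util


def check_installed(modules=("celligner", "mnnpy")) -> dict:
    """Map each top-level module name to True if Python can locate it.

    Hypothetical helper; the default import names are assumptions.
    """
    # find_spec returns None for a top-level module that cannot be found.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_installed().items():
        print(f"{name}: {'OK' if found else 'NOT FOUND'}")
```

If either entry prints NOT FOUND, the pip install most likely ran in a different environment from the one you are using.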

A Dockerfile and build script are also provided.

Using Celligner

Celligner has fit() and transform() functions in the style of scikit-learn models.

A reference expression dataset (e.g. CCLE cell line TPM expression) should first be fit:

from celligner import Celligner

my_celligner = Celligner()
my_celligner.fit(CCLE_expression)

A target expression dataset (e.g. TCGA+ tumor samples) can then be aligned to this reference using the transform function:

my_celligner.transform(TCGA_expression)
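Because fit() and transform() each take an expression matrix, it can help to check up front how many genes the reference and target actually share. The sketch below assumes pandas DataFrames with samples as rows and genes as columns (an assumption about the expected layout, not stated here); restrict_to_shared_genes is a hypothetical helper, and Celligner may do its own gene matching internally.

```python
import pandas as pd


def restrict_to_shared_genes(reference: pd.DataFrame, target: pd.DataFrame):
    """Keep only the gene columns present in both matrices, in the same order.

    Hypothetical helper. Assumes samples are rows and genes are columns.
    """
    shared = reference.columns.intersection(target.columns)
    if len(shared) == 0:
        raise ValueError("reference and target have no genes in common")
    # Report how much of each dataset survives the intersection.
    print(
        f"{len(shared)} shared genes "
        f"({len(shared) / reference.shape[1]:.1%} of reference, "
        f"{len(shared) / target.shape[1]:.1%} of target)"
    )
    # Index both frames with the same Index so their columns line up exactly.
    return reference[shared], target[shared]
```

For example, ref, tgt = restrict_to_shared_genes(CCLE_expression, TCGA_expression) before calling my_celligner.fit(ref) and my_celligner.transform(tgt). A low percentage is a hint that gene identifiers use different naming schemes in the two datasets.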

The combined transformed expression matrix can then be accessed via my_celligner.combined_output. Clusters, UMAP coordinates and tumor-model distances for all samples can be computed with my_celligner.computeMetricsForOutput(). There are also functions to save/load a fitted Celligner model as a .pkl file.
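Celligner has its own functions for saving and loading a fitted model as a .pkl file, but their names are not given here. As a generic alternative, the standard-library pickle module can serialize most Python objects. The sketch below assumes the fitted object is picklable; save_pickle and load_pickle are hypothetical helper names, not Celligner's API.

```python
import pickle
from pathlib import Path


def save_pickle(obj, path) -> None:
    """Serialize any picklable object (e.g. a fitted model) to `path`."""
    with Path(path).open("wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Load an object written by save_pickle.

    Only unpickle files you trust: loading a pickle can execute arbitrary code.
    """
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

Pickles are tied to the code that created them, so a model saved with one Celligner version may not load cleanly under another.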

Aligning the target dataset to a new reference dataset

This use case covers the scenario where you want to align the same target dataset to a new reference dataset (which might be the previous reference dataset with some new samples added). In this case you can call transform() without a target dataset to reuse the previous one and skip redoing some of the computation (see diagram below).

my_celligner.fit(new_reference_expression)
my_celligner.transform()
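The calling pattern above can be illustrated with a toy class. This is a hypothetical sketch, not Celligner's implementation: it only shows how transform() with no argument can reuse the target from an earlier call and skip the target-only preprocessing, while the alignment step still uses the newly fitted reference.

```python
class ToyAligner:
    """Hypothetical toy class, NOT Celligner's code.

    transform() with no argument reuses the target prepared in an earlier
    transform() call, so the target-only step is not repeated.
    """

    def __init__(self):
        self.reference = None
        self.target_prepared = None
        self.n_target_preprocessing = 0  # counts the expensive target-only step

    def fit(self, reference):
        self.reference = reference
        return self

    def _prepare_target(self, target):
        # Stand-in for target-only work that can be cached between calls.
        self.n_target_preprocessing += 1
        return sorted(target)

    def transform(self, target=None):
        if self.reference is None:
            raise RuntimeError("call fit() before transform()")
        if target is not None:
            self.target_prepared = self._prepare_target(target)
        elif self.target_prepared is None:
            raise ValueError("no target given and none cached from an earlier call")
        # Stand-in for the alignment, which depends on the current reference.
        return (self.reference, self.target_prepared)
```

For example, after fit("ref_v1") and transform(["t2", "t1"]), calling fit("ref_v2") followed by transform() pairs the new reference with the cached target, and the preprocessing counter stays at 1.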

Aligning a third dataset to the previous combined output

This use case covers the scenario where you have a third dataset (e.g. Met500 tumor samples) that you want to align to the previously aligned (e.g. CCLE+TCGA) dataset. This is the approach the Celligner app currently uses for multi-dataset alignment.

my_celligner.makeNewReference()
# The value of k1 should be selected based on the size of the new dataset.
# We use k1=20 for Met500 (n~850) and k1=10 for the PDX datasets (n~250-450).
my_celligner.mnn_kwargs.update({"k1": 20, "k2": 50})
my_celligner.transform(met500_TPM, compute_cPCs=False)

Diagram

This diagram provides an overview of how Celligner works, including for the different use cases described above.

Computational complexity

Depending on the dataset, Celligner can be quite memory-hungry. For TCGA, expect at least 50-60 GB of memory to be used. You may need a powerful computer, plenty of swap space, and to increase R's default maximum allowed memory.

You can also use the low_memory=True option to reduce the memory used by Celligner in the memory intensive PCA & cPCA methods.
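To judge whether a dataset will fit in memory, it helps to estimate the size of one dense copy of the expression matrix: samples x genes x bytes per value. The sketch below is a back-of-the-envelope helper (dense_matrix_gib is hypothetical, not part of Celligner). A single copy is only a lower bound, since the reference, target and combined output matrices, plus PCA/cPCA intermediates, can be held in memory at the same time.

```python
import numpy as np


def dense_matrix_gib(n_samples: int, n_genes: int, dtype=np.float64) -> float:
    """GiB needed to store one dense n_samples x n_genes array of `dtype`.

    Hypothetical helper for rough planning only.
    """
    n_bytes = n_samples * n_genes * np.dtype(dtype).itemsize
    return n_bytes / 2**30
```

For instance, dense_matrix_gib(1024, 131072) is exactly 1.0, and the same shape in float32 needs 0.5 GiB, which is why downcasting is a common first step when memory is tight.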

R Celligner

For the original R version of Celligner, please see the R/README.md file: https://github.com/broadinstitute/celligner/tree/master/R/README.md


Initial project:

Allie Warren @awarren

Initial Python version:

Jérémie Kalfon @jkobject

Current maintainer:

Barbara De Kegel @bdekegel

celligner's Issues

Installation gives error

Installing with devtools::install_github("broadinstitute/celligner/R") gives the following error:


Error: package or namespace load failed for ‘celligner’ in namespaceExport(ns, exports):
 undefined exports: .average_correction, .center_along_batch_vector, .compute_tricube_average, .tricube_weighted_correction, calc_gene_stats, calc_tumor_CL_cor, check_NAs, cluster_data, create_Seurat_object, find_differentially_expressed_genes, get_cluster_averages, load_additional_data, load_data, modified_mnnCorrect, run_Celligner, run_MNN, run_cPCA, run_cPCA_analysis, run_lm_stats_limma_group, run_multidataset_alignment
Error: loading failed
Execution halted
ERROR: loading failed

sessionInfo()

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/libf77blas.so.3.10.3
LAPACK: /company/software/lapack/3.6.0/lib/liblapack.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] ps_1.7.2          prettyunits_1.1.1 rprojroot_2.0.3   crayon_1.5.2     
 [5] withr_2.5.0       R6_2.5.1          lifecycle_1.0.3   magrittr_2.0.3   
 [9] rlang_1.0.6       cachem_1.0.6      cli_3.4.1         curl_4.3.3       
[13] remotes_2.4.2     fs_1.5.2          callr_3.7.3       ellipsis_0.3.2   
[17] devtools_2.4.3    tools_4.2.1       glue_1.6.2        purrr_0.3.5      
[21] pkgload_1.3.0     fastmap_1.1.0     compiler_4.2.1    processx_3.8.0   
[25] pkgbuild_1.3.1    sessioninfo_1.2.2 memoise_2.0.1     usethis_2.1.6    
> 

ValueError: annotations do not match X_pression

Hi,

Thanks for making the tool available.
I am trying to run it but am struggling a bit. As I do not have access to the test datasets, I do not know what the annotation files are supposed to look like. I think my expression and annotation matrices have the same size, but when I run the .fit() function I get this error. Any idea why? Thanks.
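Matching shapes alone do not guarantee matching sample IDs or row order, which is one plausible cause of this error. The sketch below assumes both objects are pandas DataFrames indexed by sample ID and that the comparison is made on the row index; check_annotation_alignment is a hypothetical helper, not Celligner's own validation, and the exact condition Celligner tests is not shown here.

```python
import pandas as pd


def check_annotation_alignment(expression: pd.DataFrame, annotations: pd.DataFrame) -> None:
    """Raise ValueError describing any mismatch between expression and annotation rows.

    Hypothetical helper. Assumes sample IDs are the row index of both DataFrames.
    """
    missing = expression.index.difference(annotations.index)
    extra = annotations.index.difference(expression.index)
    if len(missing) or len(extra):
        raise ValueError(
            f"{len(missing)} expression samples lack annotations; "
            f"{len(extra)} annotation rows have no expression data"
        )
    # Same IDs can still fail an order-sensitive comparison.
    if not expression.index.equals(annotations.index):
        raise ValueError(
            "same sample IDs, but in a different order; "
            "try annotations = annotations.loc[expression.index]"
        )
```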


__init__() got an unexpected keyword argument 'low_memory'

Hi team,

Thank you for developing this powerful tool. We are trying to use it on our own data, but we hit this error when running .transform():

reducing dimensionality...
doing differential expression analysis on the clusters..
running differential expression on 27 clusters
running limmapy on the samples
you need to have R installed with the limma library installed
3.5.3
there is 0.283 overlap between the fit and transform dataset in their most variable genes
doing cPCA..

TypeError Traceback (most recent call last)
/var/folders/7r/70gsq7hd1p5f6d8cgzf0ftb80000gr/T/ipykernel_43417/3434172114.py in
----> 1 _ = my_alligner.transform(X_val_one_percentile)

~/anaconda3/envs/local/lib/python3.9/site-packages/celligner/__init__.py in transform(self, X_pression, annotations, only_transform, _rerun, _doCPCA, recompute_contamination)
580 # TODO: try the automated version, (select the best alpha above 1?)
581 self.cpca_loadings = (
--> 582 CPCA(
583 standardize=False,
584 n_components=self.cpca_ncomp,

TypeError: __init__() got an unexpected keyword argument 'low_memory'

Looking at the code, it passes this argument when creating the CPCA instance.
https://github.com/broadinstitute/celligner/blob/master/celligner/__init__.py#L585

However, looking at the contrastive package, it seems that CPCA does not accept this argument, even in the latest version on GitHub.
https://github.com/abidlabs/contrastive/blob/master/contrastive/__init__.py#L44

Could you please advise what I should do in this case? Thank you very much in advance.
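One way to confirm the mismatch in your own environment, before calling .transform(), is to inspect the installed class's constructor signature. The sketch below uses the standard inspect module; accepts_keyword is a hypothetical helper, and the usage line assumes the contrastive package exposes CPCA at its top level, as the traceback suggests.

```python
import inspect


def accepts_keyword(func, name: str) -> bool:
    """Return True if `func` (a function or class) accepts keyword argument `name`.

    For a class, inspect.signature reports the parameters of its __init__
    (without `self`).
    """
    for param in inspect.signature(func).parameters.values():
        if param.kind == inspect.Parameter.VAR_KEYWORD:
            return True  # a **kwargs catch-all accepts any keyword
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False
```

For example, from contrastive import CPCA followed by accepts_keyword(CPCA, "low_memory") returning False would reproduce the TypeError above without running the whole pipeline.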
