broadinstitute / celligner

Tumor-cancer cell line alignment. Use it on the DepMap portal or install it with pip.

Home Page: https://cds.team/depmap/celligner/

License: The Unlicense

R 0.25% Dockerfile 0.02% Jupyter Notebook 91.17% Makefile 0.01% Python 0.12% HTML 8.43% Shell 0.01%

genomics tumor cancer alignement celligner rnaseq

celligner's Introduction

Celligner

Celligner is a computational approach for aligning tumor and cell line transcriptional profiles.

To learn more, see the paper

Remark

Celligner was originally an R project, which you can find in the R/ folder.

A Python version was made that performs the same computations as the R version, but the results may differ slightly due to small implementation differences in the Louvain clustering and contrastive PCA steps.

Overview

A reference expression dataset (e.g. CCLE cell lines) should be fit using the fit() function, and a target expression dataset (e.g. TCGA+ tumor samples) can then be aligned to this reference using the transform() function. See the run_celligner.py script for example usage. Celligner is unsupervised and does not require annotations to be run; as such they are not used in this version of the model but can be added post-hoc to aid in interpretation of the output. See the celligner_output.ipynb notebook for an example of how to draw an output UMAP.

The Celligner output can be explored at: https://depmap.org/portal/celligner/

Install

For the old R package installation instructions, see the R/ folder.

Before running pip, make sure that you have R installed.

To install the latest version of Celligner in dev mode, run the following (note that Celligner requires the specific version of mnnpy that is associated with the repository as a submodule):

git clone https://github.com/broadinstitute/celligner.git
cd celligner
git checkout new_dev
pip install -e .
cd mnnpy
pip install .

A Dockerfile and build script are also provided.

Using Celligner

Celligner has fit() and transform() functions in the style of scikit-learn models.

A reference expression dataset (e.g. CCLE cell lines TPM expression) should first be fit:

from celligner import Celligner

my_celligner = Celligner()
my_celligner.fit(CCLE_expression)

A target expression dataset (e.g. TCGA+ tumor samples) can then be aligned to this reference using the transform function:

my_celligner.transform(TCGA_expression)

The combined transformed expression matrix can then be accessed via my_celligner.combined_output. Clusters, UMAP coordinates and tumor-model distances for all samples can be computed with my_celligner.computeMetricsForOutput(). There are also functions to save/load a fitted Celligner model as a .pkl file.
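The save/load functions are not named in this README, so the sketch below does not call them. It only illustrates the general idea of persisting a fitted object to a .pkl file with Python's standard pickle module. FittedModel, save_model and load_model are hypothetical names used for illustration, not part of Celligner.

```python
import os
import pickle
import tempfile


class FittedModel:
    """Hypothetical stand-in for a fitted model (not the Celligner class)."""

    def __init__(self, gene_means):
        self.gene_means = gene_means


def save_model(model, path):
    # Serialize the whole object, including its fitted attributes.
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    # Only unpickle files you trust: pickle can execute arbitrary code.
    with open(path, "rb") as handle:
        return pickle.load(handle)


model = FittedModel(gene_means={"TP53": 5.2, "KRAS": 3.1})
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model(model, path)
restored = load_model(path)
print(restored.gene_means)
```

A pickled model generally needs to be loaded with the same package version that saved it, since the file stores object state rather than code.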

Aligning the target dataset to a new reference dataset

This use case is for the scenario where you want to align the same target dataset to a new reference dataset (which might be the same reference dataset as before with some new samples). In this case you can call transform without the target dataset to re-use the previous target dataset and skip re-doing some computation (see diagram below).

my_celligner.fit(new_reference_expression)
my_celligner.transform()

Aligning a third dataset to the previous combined output

This use case is for the scenario where you have a third dataset (e.g. Met500 tumor samples) that you want to align to the previously aligned (e.g. CCLE+TCGA) dataset. This is the current approach for multi-dataset alignment taken by the Celligner app.

my_celligner.makeNewReference()
# The value of k1 should be selected based on the size of the new dataset.
# We use k1=20 for Met500 (n=~850), and k1=10 for the PDX datasets (n=~250-450).
my_celligner.mnn_kwargs.update({"k1": 20, "k2": 50})
my_celligner.transform(met500_TPM, compute_cPCs=False)
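The update() call above is plain Python dict behaviour: it overwrites the listed keys and leaves any other entries as they were. The sketch below shows this on an ordinary dict. The starting values and the some_other_option key are placeholders for illustration, not Celligner's actual defaults.

```python
# Placeholder starting values for illustration only; not Celligner's real defaults.
mnn_kwargs = {"k1": 5, "k2": 50, "some_other_option": True}

# update() overwrites the listed keys and leaves every other key untouched.
mnn_kwargs.update({"k1": 20, "k2": 50})

print(mnn_kwargs)
```

Because the dict is modified in place, the new k1 and k2 stay in effect for later calls until they are updated again.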

Diagram

This diagram provides an overview of how Celligner works, including for the different use cases described above.

Computational complexity

Depending on the dataset, Celligner can be quite memory hungry. For TCGA, expect at least 50-60 GB of memory to be used. You might need a powerful computer and plenty of swap, and you may need to increase R's default maximum allowed memory.

You can also use the low_memory=True option to reduce the memory used by Celligner in the memory-intensive PCA and cPCA methods.
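For a rough sense of scale, the size of one dense copy of an expression matrix can be estimated from its shape. The sketch below is back-of-the-envelope arithmetic only. The shape is a hypothetical example, and the helper dense_matrix_gib is not part of Celligner. Actual peak usage depends on how many intermediate copies and derived matrices a pipeline holds at once, which this sketch does not model.

```python
def dense_matrix_gib(n_samples, n_genes, bytes_per_value=8):
    """Size in GiB of one dense samples x genes matrix (float64 by default)."""
    return n_samples * n_genes * bytes_per_value / 1024**3


# Hypothetical shape, chosen for illustration only.
n_samples, n_genes = 12_000, 19_000

one_copy = dense_matrix_gib(n_samples, n_genes)
one_copy_float32 = dense_matrix_gib(n_samples, n_genes, bytes_per_value=4)

print(f"float64: {one_copy:.2f} GiB per copy")
print(f"float32: {one_copy_float32:.2f} GiB per copy")
```

Multiplying the per-copy figure by the number of copies you expect to hold in memory gives a lower bound to compare against the available RAM.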

R Celligner

For the original R version of celligner, please check the R/README.md file here: https://github.com/broadinstitute.org/celligner/tree/master/R/README.md

Initial project:

Allie Warren @awarren

Initial python version:

Jérémie Kalfon @jkobject

Current maintainer:

Barbara De Kegel @bdekegel

celligner's Issues

Installation gives error

Installing with devtools::install_github("broadinstitute/celligner/R") gives this error:


Error: package or namespace load failed for ‘celligner’ in namespaceExport(ns, exports):
 undefined exports: .average_correction, .center_along_batch_vector, .compute_tricube_average, .tricube_weighted_correction, calc_gene_stats, calc_tumor_CL_cor, check_NAs, cluster_data, create_Seurat_object, find_differentially_expressed_genes, get_cluster_averages, load_additional_data, load_data, modified_mnnCorrect, run_Celligner, run_MNN, run_cPCA, run_cPCA_analysis, run_lm_stats_limma_group, run_multidataset_alignment
Error: loading failed
Execution halted
ERROR: loading failed

sessionInfo()

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/libf77blas.so.3.10.3
LAPACK: /company/software/lapack/3.6.0/lib/liblapack.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] ps_1.7.2          prettyunits_1.1.1 rprojroot_2.0.3   crayon_1.5.2     
 [5] withr_2.5.0       R6_2.5.1          lifecycle_1.0.3   magrittr_2.0.3   
 [9] rlang_1.0.6       cachem_1.0.6      cli_3.4.1         curl_4.3.3       
[13] remotes_2.4.2     fs_1.5.2          callr_3.7.3       ellipsis_0.3.2   
[17] devtools_2.4.3    tools_4.2.1       glue_1.6.2        purrr_0.3.5      
[21] pkgload_1.3.0     fastmap_1.1.0     compiler_4.2.1    processx_3.8.0   
[25] pkgbuild_1.3.1    sessioninfo_1.2.2 memoise_2.0.1     usethis_2.1.6    
>

ValueError: annotations do not match X_pression

Hi,

Thanks for making the tool available.
I am trying to run it but I am struggling a bit. As I do not have access to the test datasets, I do not know what the annotation files are supposed to look like. I think my expression and annotation matrices have the same size, but when I try to run the .fit() function I get this error. Any ideas why? Thanks.

__init__() got an unexpected keyword argument 'low_memory'

Hi team,

Thank you for developing this powerful tool. We are trying to use it on our own data, but I hit this error when I tried to run .transform().

reducing dimensionality...
doing differential expression analysis on the clusters..
running differential expression on 27 clusters
running limmapy on the samples
you need to have R installed with the limma library installed
3.5.3
there is 0.283 overlap between the fit and transform dataset in their most variable genes
doing cPCA..

TypeError Traceback (most recent call last)
/var/folders/7r/70gsq7hd1p5f6d8cgzf0ftb80000gr/T/ipykernel_43417/3434172114.py in <module>
----> 1 _ = my_alligner.transform(X_val_one_percentile)

~/anaconda3/envs/local/lib/python3.9/site-packages/celligner/__init__.py in transform(self, X_pression, annotations, only_transform, _rerun, _doCPCA, recompute_contamination)
580 # TODO: try the automated version, (select the best alpha above 1?)
581 self.cpca_loadings = (
--> 582 CPCA(
583 standardize=False,
584 n_components=self.cpca_ncomp,

TypeError: __init__() got an unexpected keyword argument 'low_memory'

Looking at the code, it passes this argument when creating the CPCA instance.
https://github.com/broadinstitute/celligner/blob/master/celligner/__init__.py#L585

However, checking the contrastive package, it seems it doesn't accept this argument, even in the latest version on GitHub.
https://github.com/abidlabs/contrastive/blob/master/contrastive/__init__.py#L44

Could you please advise what I should do in this case? Thank you very much in advance.
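A quick way to confirm this kind of mismatch locally is to inspect the constructor signature of the class that is actually installed. The sketch below uses a hypothetical ExampleCPCA stand-in and a hypothetical accepts_keyword helper; neither is part of Celligner or the contrastive package. With the real package you would pass the imported CPCA class instead.

```python
import inspect


class ExampleCPCA:
    """Hypothetical stand-in for a third-party class; not the contrastive API."""

    def __init__(self, n_components=2, standardize=True):
        self.n_components = n_components
        self.standardize = standardize


def accepts_keyword(cls, name):
    """Return True if cls's constructor accepts `name` as a keyword argument."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs parameter means any keyword is accepted.
    has_var_keyword = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    param = params.get(name)
    named = param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return named or has_var_keyword


print(accepts_keyword(ExampleCPCA, "n_components"))
print(accepts_keyword(ExampleCPCA, "low_memory"))
```

If the check reports that the keyword is not accepted, the installed class differs from the one the calling code expects, which usually points to a version or fork mismatch between the two packages.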