celloracle's Introduction

CellOracle

CellOracle is a Python library for in silico gene perturbation analyses using single-cell omics data and Gene Regulatory Network models.

For more information, please read our paper: Dissecting cell identity via network inference and in silico gene perturbation.

Documentation, Code, and Tutorials

CellOracle documentation is available through the link below.

Web documentation

Questions and errors

If you have a question, error, bug, or problem, please use the GitHub issue page.

Supported Species and reference genomes

  • Human: ['hg38', 'hg19']
  • Mouse: ['mm39', 'mm10', 'mm9']
  • S.cerevisiae: ["sacCer2", "sacCer3"]
  • Zebrafish: ["danRer7", "danRer10", "danRer11"]
  • Xenopus tropicalis: ["xenTro2", "xenTro3"]
  • Xenopus laevis: ["Xenopus_laevis_v10.1"]
  • Rat: ["rn4", "rn5", "rn6"]
  • Drosophila: ["dm3", "dm6"]
  • C.elegans: ["ce6", "ce10"]
  • Arabidopsis: ["TAIR10"]
  • Chicken: ["galGal4", "galGal5", "galGal6"]
  • Guinea Pig: ["Cavpor3.0"]
  • Pig: ["Sscrofa11.1"]
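
The list above can be transcribed into a small lookup table to check a genome name before starting an analysis. This is only a sketch: the dictionary and the helper name are hypothetical and not part of the CellOracle API; only the genome names are taken from the list.

```python
# Hypothetical lookup table transcribed from the list above (not a CellOracle API).
SUPPORTED_REF_GENOMES = {
    "Human": ["hg38", "hg19"],
    "Mouse": ["mm39", "mm10", "mm9"],
    "S.cerevisiae": ["sacCer2", "sacCer3"],
    "Zebrafish": ["danRer7", "danRer10", "danRer11"],
    "Xenopus tropicalis": ["xenTro2", "xenTro3"],
    "Xenopus laevis": ["Xenopus_laevis_v10.1"],
    "Rat": ["rn4", "rn5", "rn6"],
    "Drosophila": ["dm3", "dm6"],
    "C.elegans": ["ce6", "ce10"],
    "Arabidopsis": ["TAIR10"],
    "Chicken": ["galGal4", "galGal5", "galGal6"],
    "Guinea Pig": ["Cavpor3.0"],
    "Pig": ["Sscrofa11.1"],
}


def species_of(ref_genome):
    """Return the species a reference genome belongs to, or None if it is not listed."""
    for species, genomes in SUPPORTED_REF_GENOMES.items():
        if ref_genome in genomes:
            return species
    return None


print(species_of("mm10"))  # Mouse
print(species_of("hg17"))  # None
```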

Changelog

Please go to this page.

celloracle's People

Contributors

alireza-majd, cmhct7, dburkhardt, derpylz, dongzehe, iandriver, kenjikamimoto, kenjikamimoto-wustl122, sam-morris

celloracle's Issues

(doc request) How do I use Bayesian ridge regression?

I looked in help(oracle.get_links), but I only see documentation for bagging ridge regression. Is there a way I can specify Bayesian ridge regression instead? I'm asking because an example with 50x fewer cells than my actual data, 25x fewer clusters (test mode), and 4x fewer rounds of bagging (than the tutorial uses) currently takes 40-50 minutes to run (on a 2016 MacBook). I don't know if it all scales linearly, but if it does, that's 5000*45 minutes / (24*60 min/day) ≈ 156 days. And I'll need to pack up and move to grad school before then!
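
The estimate above can be reproduced in a few lines. This is a back-of-the-envelope sketch that assumes, as the post does, that runtime scales linearly with the number of cells, clusters, and bagging rounds; that assumption is untested, and 45 minutes is taken as the midpoint of the observed 40-50 minutes.

```python
# Scale factors between the test run and the full analysis, as stated in the post.
cell_factor = 50      # the full data has 50x more cells
cluster_factor = 25   # 25x more clusters than test mode
bagging_factor = 4    # 4x more bagging rounds, as in the tutorial
test_minutes = 45     # midpoint of the observed 40-50 minutes

# Assumption: cost is linear in each factor, so the factors multiply.
scale = cell_factor * cluster_factor * bagging_factor
estimated_days = scale * test_minutes / (24 * 60)

print(scale, estimated_days)  # 5000 156.25
```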

Warning messages and missing scores

Thank you for developing CellOracle. I'm trying out the program for the first time. I've been doing this by using the tutorial online here.

I see a lot of warnings when I run the code related to network analysis. I think these come from the get_links() function:

I see thousands of lines that read
"modularity is implemented for undirected"

and then, I also see several instances of:
"
Warning messages:
1: In closeness(g) :
At centrality.c:2617 :closeness centrality is not well-defined for disconnected graphs
2: In edge.betweenness.community(g) :
At community.c:460 :Membership vector will be selected based on the lowest modularity score.
3: In edge.betweenness.community(g) :
At community.c:467 :Modularity calculation with weighted edge betweenness community detection might not make sense -- modularity treats edge weights as similarities while edge betwenness treats them as distances
4: In leading.eigenvector.community(g, options = list(maxiter = 1e+06, :
At community.c:1597 :This method was developed for undirected graphs
finished

Does this indicate that something went wrong at this step?

When I proceed with the next steps and look at the figures in the networ_score_per_gene folder, they look strange. Values are missing for most of the clusters; I've attached a plot as an example.
score_dynamics_in_seurat_clusters_2000_Gli2

Plotting issues

Hi Morris Lab,

Thanks for the great GRN inference package!

I've run into a small issue when trying to plot multiple plots sequentially in the same python session.
For example, if I call

links.plot_scores_as_rank(cluster="Bcell", n_gene=30, save=f"CellOracleFigures/Bcellranked_score")

It gives me a nice plot of the TF rankings for B cells.
However, if I then call

links.plot_scores_as_rank(cluster="ImCD4", n_gene=30, save=f"ImCD4ranked_score")

I get a folder full of overlaid plots from both B cells and ImCD4.
The only workaround I've found is to exit Python and then reload my links object. Reloading the links object without restarting Python doesn't fix it.

Second question: is there a way to set the default plot as pdf instead of png?

I'm new to Python, so perhaps these are things most people know, but I couldn't find anything on google.

Thanks so much!
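
One common cause of overlaid plots is that matplotlib keeps drawing on the same "current" figure until it is closed; whether that is what happens inside plot_scores_as_rank is not confirmed here. The sketch below shows the general pattern of drawing on a fresh figure, saving it, and closing it. The helper name and the example scores are hypothetical. The output format follows the file extension, so a path ending in .pdf produces a PDF instead of a PNG.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def save_rank_plot(scores, path):
    """Draw scores on a brand-new figure, save it, then close it.

    `scores` maps gene name -> score (hypothetical data, not CellOracle output).
    The file format is inferred from the extension of `path`.
    """
    fig, ax = plt.subplots()  # new figure: nothing is inherited from earlier plots
    names = list(scores)
    ax.scatter([scores[name] for name in names], range(len(names)))
    ax.set_yticks(range(len(names)))
    ax.set_yticklabels(names)
    fig.savefig(path)
    plt.close(fig)  # release the figure so the next plot starts clean
```

For example, save_rank_plot({"Gata1": 0.9, "Spi1": 0.7}, "Bcell_ranked_score.pdf") writes a single PDF. With third-party plotting functions, calling plt.close("all") between calls is a similar workaround worth trying.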

Tutorial for adding the TF info dictionary manually

In the current tutorial, the TF_to_TG_dictionary is inverted into a TG_to_TF_dictionary and then added to the oracle object with the addTFinfo_dictionary function. This does not seem right to me; I don't think the TF_to_TG_dictionary needs to be inverted.

Thanks for this wonderful tool.
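
For reference, this is what inverting such a dictionary means in plain Python. It is a minimal sketch with made-up gene names and a hypothetical helper name; it does not settle which orientation addTFinfo_dictionary expects, which should be checked against the function's documentation.

```python
from collections import defaultdict


def invert_dictionary(tf_to_tg):
    """Turn {TF: [target genes]} into {target gene: [TFs]}."""
    tg_to_tf = defaultdict(list)
    for tf, targets in tf_to_tg.items():
        for target in targets:
            tg_to_tf[target].append(tf)
    return dict(tg_to_tf)


# Hypothetical example data.
tf_to_tg = {"Gata1": ["Klf1", "Tal1"], "Spi1": ["Tal1"]}
print(invert_dictionary(tf_to_tg))
# {'Klf1': ['Gata1'], 'Tal1': ['Gata1', 'Spi1']}
```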

Error in knn_imputation

Hello,

I encountered an error from velocyto when running

oracle.knn_imputation(n_pca_dims=n_comps, k=k, balanced=True, b_sight=k*8, b_maxl=k*4, n_jobs=4)

I also tried using the Paul et al. 2015 dataset as in the tutorial and followed the exact same steps (with adjusted annotations). I was able to reproduce all the output but still ran into the same error. Please see the error message below.

image

Thanks in advance!
Lan

error when run get_links

Hi,

I got a "No model found. Do fit first." error when using get_links. What model does it refer to? I am not sure if I missed a step.

links = oracle.get_links(cluster_name_for_GRN_unit="free_annotation", alpha=10,
... verbose_level=0, test_mode=True)
No model found. Do fit first.
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/opt/conda/lib/python3.7/site-packages/celloracle/trajectory/oracle_core.py", line 649, in get_links
verbose_level=verbose_level, test_mode=test_mode)
File "/opt/conda/lib/python3.7/site-packages/celloracle/network_analysis/network_construction.py", line 68, in get_links
alpha=alpha, bagging_number=bagging_number, verbose_level=verbose_level, test_mode=test_mode)
File "/opt/conda/lib/python3.7/site-packages/celloracle/network_analysis/network_construction.py", line 128, in fit_GRN_for_network_analysis
tn
.updateLinkList(verbose=False)
File "/opt/conda/lib/python3.7/site-packages/celloracle/network/net_core.py", line 516, in updateLinkList
linkList = pd.concat(linkList, axis=0)
File "/opt/conda/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 281, in concat
sort=sort,
File "/opt/conda/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 329, in __init__
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

Thanks,
Yuping

monocle/monocle3 compatibility issues

Hi @KenjiKamimoto-wustl122 ,

I have tried to update my Cicero-processed data, but since updating to the latest version of celloracle, when I run the following command:

input_cds <-  suppressWarnings(newCellDataSet(indata,
                            phenoData = pd,
                            featureData = fd,
                            expressionFamily=VGAM::binomialff(),
                            lowerDetectionLimit=0))

I keep having the following error:

Error in newCellDataSet(indata, phenoData = pd, featureData = fd, expressionFamily = VGAM::binomialff(), : could not find function "newCellDataSet"
Traceback:

1. suppressWarnings(newCellDataSet(indata, phenoData = pd, featureData = fd, 
 .     expressionFamily = VGAM::binomialff(), lowerDetectionLimit = 0))
2. withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning"))

I know this has to do with compatibility with monocle3 so I went and installed monocle. Then it works until I run this function:

input_cds <- preprocessCDS(input_cds, norm_method = "none")

and I get the following error:

Error in preprocessCDS(input_cds, norm_method = "none"): could not find function "preprocessCDS"

I have tried to install celloracle from scratch in a different machine and get the same error. Do you know what the issue may be?

Issue with Simulation

Hi,
Thanks for creating a great tool. I am trying to run the simulation for some of the TFs I am interested in. For two of the factors (Erbb4 and Meis2), I am getting errors. For one, it complains that the gene cannot be found in the var slot; for the other, there do not seem to be enough regulatory connections in the GRNs. I was wondering if there are ways to tweak the network construction commands so that we could see results for any gene of interest.
- Are there ways to run the simulation when the gene of interest is not a variable gene?
- Also, is it possible to adjust the cut-off score to increase the number of connections in the network?

I am attaching both the errors as screenshots.
Erbb4

Meis2

Thanks in advance

Best,
Ann

error when importing cell oracle

Hi!

I installed CellOracle, as well as the other packages. However, when importing CellOracle, I get the following error:

Error: use() got an unexpected keyword argument 'warn'

import celloracle as co
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jovyan/my-conda-envs/celloracle_env/lib/python3.6/site-packages/celloracle/__init__.py", line 8, in <module>
    from . import utility, network, network_analysis, go_analysis, data, data_conversion, oracle_utility
  File "/home/jovyan/my-conda-envs/celloracle_env/lib/python3.6/site-packages/celloracle/utility/__init__.py", line 18, in <module>
    from .load_hdf5 import load_hdf5
  File "/home/jovyan/my-conda-envs/celloracle_env/lib/python3.6/site-packages/celloracle/utility/load_hdf5.py", line 8, in <module>
    from ..motif_analysis.tfinfo_core import load_TFinfo
  File "/home/jovyan/my-conda-envs/celloracle_env/lib/python3.6/site-packages/celloracle/motif_analysis/__init__.py", line 11, in <module>
    from .motif_analysis_utility import is_genome_installed
  File "/home/jovyan/my-conda-envs/celloracle_env/lib/python3.6/site-packages/celloracle/motif_analysis/motif_analysis_utility.py", line 24, in <module>
    from gimmemotifs.scanner import Scanner
  File "/home/jovyan/my-conda-envs/celloracle_env/lib/python3.6/site-packages/gimmemotifs/scanner.py", line 29, in <module>
    from gimmemotifs.utils import parse_cutoff,as_fasta,file_checksum
  File "/home/jovyan/my-conda-envs/celloracle_env/lib/python3.6/site-packages/gimmemotifs/utils.py", line 29, in <module>
    from gimmemotifs.plot import plot_histogram
  File "/home/jovyan/my-conda-envs/celloracle_env/lib/python3.6/site-packages/gimmemotifs/plot.py", line 19, in <module>
    mpl.use("Agg", warn=False)
TypeError: use() got an unexpected keyword argument 'warn'
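
The traceback shows gimmemotifs calling mpl.use("Agg", warn=False) against a matplotlib whose use() does not take a warn argument, which points to a version mismatch between the two packages. A quick way to check whether a function accepts a given keyword is to inspect its signature. The sketch below uses two hypothetical stand-in functions rather than matplotlib itself, so it runs without any third-party package; the stand-ins only mimic the difference and are not copies of the real signatures.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    for param in inspect.signature(func).parameters.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # a **kwargs parameter accepts any keyword
        if param.name == name and param.kind in keyword_kinds:
            return True
    return False


# Hypothetical stand-ins for a use() with and without a `warn` parameter.
def use_with_warn(backend, warn=True, force=False):
    pass


def use_without_warn(backend, *, force=True):
    pass


print(accepts_keyword(use_with_warn, "warn"))     # True
print(accepts_keyword(use_without_warn, "warn"))  # False
```

In a real environment, the same helper could be pointed at matplotlib.use to see which case applies before deciding whether to change the matplotlib or the gimmemotifs version.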

Thanks!

problems with install (louvain)

Hi,

I am trying to install CellOracle on a shared server on which I don't have sudo permission.
OS: CentOS 7.8. Queuing system: Slurm.
I first tried to follow the installation webpage, but it failed.

Then I tried to install in the following steps:
ml purge
ml load Python/3.6.6-foss-2018b
virtualenv --system-site-packages /xxx/home/user/cellOracle
source /xxx/home/user/cellOracle/bin/activate
conda install gcc_linux-64 llvm
pip install numpy scipy cython numba matplotlib scikit-learn h5py click pysam
pip install velocyto
git clone https://github.com/velocyto-team/velocyto.py.git
cd software/velocyto.py/
pip install -e . #velocyto --help, check install successfully or not
pip install scanpy==1.4.4 umap-learn==0.3.10
pip install genomepy==0.5.5 gimmemotifs==0.13.1
pip install goatools pyarrow tqdm joblib jupyter
All above works.

But the final step, installing CellOracle, gets stuck:
pip install git+https://github.com/morris-lab/CellOracle.git
It seems related to louvain, which I also tried to install before installing CellOracle:
pip install python-louvain
pip install --user --upgrade python-louvain

The error log:
ERROR: Failed building wheel for louvain
Running setup.py clean for louvain
Successfully built celloracle
Failed to build louvain
Installing collected packages: louvain, fa2, celloracle
Running setup.py install for louvain ... error

Any ideas on how to fix it? thanks a lot!

best practices for multiple TFinfo

I have a set of single-cell data where each cluster has its own set of ATAC-seq data. I've already created the TF_to_TG_dictionaries, but now I'm unsure if I should:

  1. create multiple celloracle objects (with all the cells), each time loading a different TF_to_TG_dictionary (then throwing away the networks that are not matching the TF_to_TG_dictionary)
  2. create an abbreviated celloracle object for each cluster (so a subset of cells), and load the appropriate TF_to_TG_dictionary for each cluster.

I guess my real question is: does the GRN inference step for each cluster use information from the neighboring cells?

Using mouse ATAC Atlas with mm10

I'd like to compare the results I get from CellOracle with my ATAC data and with the mouse ATAC atlas. Since my data is mapped to mm10, I'd like to run liftover on the atlas data. However, is it correct to take the output from co.data.load_TFinfo_df_mm9_mouse_atac_atlas() and translate the genomic coordinates, or should I start all the way from the normalised counts?

EDIT: I think the coordinates of TFinfo_df aren't taken into account when oracle.import_TF_data is used, are they? Does this idea of translating mm9 to mm10 make sense to you at all?
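
If the coordinates are translated, the peak identifiers first have to be split into chromosome, start, and end so that a liftover tool can process them. The sketch below assumes that peak IDs look like "chr10_100015291_100017830"; that format and both helper names are assumptions for illustration, not confirmed by this thread. The liftover step itself is not shown.

```python
def peak_id_to_bed(peak_id):
    """Split an assumed 'chrom_start_end' peak ID into (chrom, start, end).

    Splitting from the right keeps chromosome names that contain
    underscores (for example unplaced scaffolds) intact.
    """
    chrom, start, end = peak_id.rsplit("_", 2)
    return chrom, int(start), int(end)


def bed_to_peak_id(chrom, start, end):
    """Rebuild a peak ID in the same assumed format after coordinates change."""
    return f"{chrom}_{start}_{end}"


print(peak_id_to_bed("chr10_100015291_100017830"))
# ('chr10', 100015291, 100017830)
```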

Thanks!

ATACseq as 'base' GRN structure.

Hi, thank you all for developing this computational tool.
I am new to bioinformatics and would like to use CellOracle to practice finding and inferring GRNs.

If I were to use human ATAC-seq data to assemble the 'base' GRN structure, do I have to process the raw data first, for example by quantifying the reads from the raw data or aligning them with bowtie2? Or do I just import them as fastq.gz files using the code in tutorial step 1 (ATAC-seq data preprocessing)?
I ask because I noticed that the mouse ATAC-seq file obtained from 10x Genomics is already in a different format.
Thanks in advance!

Regards,
Wil

Preprint fig 7: duplicated panel?

Maybe I need a better monitor, but I cannot spot any difference between the "velocity" diagrams for Zfp57 KO and OE. Is it possible you accidentally included the knockout twice?

PCA memory requirements

I'm trying CellOracle using an scRNA dataset with ~50k cells. Running PCA is taking over an hour, and currently (it's still running) uses 34G of memory. Is this expected? With SCANPY, this would be much quicker, right?

Matplotlib on macosx - python as framework

Hi,

After the conda installation following the steps documented, when importing cell oracle, I got the following error -

import celloracle
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "../CellOracle/celloracle/__init__.py", line 8, in <module>
    from . import utility, network, network_analysis, go_analysis, data, data_conversion
  File "../CellOracle/celloracle/utility/__init__.py", line 13, in <module>
    from .load_hdf5 import load_hdf5
  File "../CellOracle/celloracle/utility/load_hdf5.py", line 5, in <module>
    from ..network_analysis import load_links
  File "../CellOracle/celloracle/network_analysis/__init__.py", line 7, in <module>
    from . import gene_analysis
  File "../CellOracle/celloracle/network_analysis/gene_analysis.py", line 19, in <module>
    import matplotlib.pyplot as plt
  File "../miniconda3/envs/celloracle_env_new2/lib/python3.6/site-packages/matplotlib/pyplot.py", line 2372, in <module>
    switch_backend(rcParams["backend"])
  File "../miniconda3/envs/celloracle_env_new2/lib/python3.6/site-packages/matplotlib/pyplot.py", line 207, in switch_backend
    backend_mod = importlib.import_module(backend_name)
  File "../miniconda3/envs/celloracle_env_new2/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "../miniconda3/envs/celloracle_env_new2/lib/python3.6/site-packages/matplotlib/backends/backend_macosx.py", line 14, in <module>
from matplotlib.backends import _macosx
ImportError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are using (Ana)Conda please install python.app and replace the use of 'python' with 'pythonw'. See 'Working with Matplotlib on OSX' in the Matplotlib FAQ for more information.

matplotlib.__version__
'3.0.3'

I fixed this by adding

import sys
import matplotlib
if sys.platform == "darwin":  # macOS
    matplotlib.use('TkAgg')  # must run before matplotlib.pyplot is imported

before

import matplotlib.pyplot as plt

in all celloracle .py files that imported matplotlib.

Inconsistent plots with adata.raw.X and imputed_counts + kernel dies at get transition probability

Hi, thank you all for developing this really useful computational tool!
I am having some problems with the predictive GRNs/TF that come out of the analysis. They don’t seem to replicate what I analyzed previously with scanpy (and canonical marker genes of each cell states) and when I plot the expression levels of “imputed_count” and adata.raw.X I get very different results (see attached figure).

plot

I don’t have ATAC-seq data so I used Scenic data that I previously calculated to make a custom dictionary following the steps described here “Make TF info dictionary manually”.

Furthermore, when I get to this stage
Get transition probability:
oracle.estimate_transition_prob(n_neighbors=200, knn_random=True, sampled_fraction=0.5)
the kernel dies.
Any suggestion would be greatly appreciated!
Thank you!
vittoria

Variability score of a gene is too low. Cannot perform simulation.

Hi Kenji (@KenjiKamimoto-wustl122),

Thanks for developing such an amazing tool!
I have a couple of questions. I was testing two different conditions, for example tumour and normal. I found a gene with a high network score, and I was able to perform the knockout simulation with that gene in the tumour samples. The results were interesting. However, when I tried to use the same gene for the knockout simulation in the healthy samples, I got an error saying "Variability score of a gene is too low. Cannot perform simulation."

I was wondering if you have an explanation for that. Does it mean that this TF has a very low variability score in the normal (healthy) samples and is not involved in any cell-type transition? Both conditions (all samples) use 3000 highly variable genes and 20k cells.

My second question: would it be possible to extract the simulation results (the probability of a cell state transition based on the simulated data) for some tests, for example as a data frame?

Thank you very much!

Using Different Motifs

Is it possible to use custom motifs for the scan?

It seems the gimmemotifs package links to an older version of the CisBP, and I was wondering if I could use version 2 of CisBP.

seurat-monocle object conversion to anndata

Hi

I am trying to implement cell oracle on Seurat-monocle analyzed data. I am able to successfully convert the rds file to anndata. In the object, there are dimensionality reduction from Seurat and monocle (Fig 1). But after converting to anndata I can only find X_umap (Fig 2 and 3). Is there a way to export all reductions? I would like to use the dimensions from monocle for plotting and further analysis.

Fig1
Fig2
Fig3

Mac OS install

Hi, I just wanted to share my experience installing on Mac OS 10.14.6 (Mojave). For velocyto, I hit an error very similar to this. It was fixed by doing conda install gcc llvm before retrying conda install velocyto. With the celloracle install, I worked through some errors by doing conda install cmake and conda install -c psi4 gcc-5. Here are the details.

With the celloracle install, I first got this error:

ERROR: Command errored out with exit status 1:
     command: //anaconda/envs/celloracle_env/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-xktd3clu/xgboost/setup.py'"'"'; __file__='"'"'/private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-xktd3clu/xgboost/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-xktd3clu/xgboost/pip-egg-info
         cwd: /private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-xktd3clu/xgboost/
    Complete output (27 lines):
    ++ pwd
    + oldpath=/private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-xktd3clu/xgboost
    + cd ./xgboost/
    + mkdir -p build
    + cd build
    + cmake ..
    ./xgboost/build-python.sh: line 21: cmake: command not found
    + echo -----------------------------
    -----------------------------
    + echo 'Building multi-thread xgboost failed'
    Building multi-thread xgboost failed
    + echo 'Start to build single-thread xgboost'
    Start to build single-thread xgboost
    + cmake .. -DUSE_OPENMP=0
    ./xgboost/build-python.sh: line 27: cmake: command not found
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-xktd3clu/xgboost/setup.py", line 42, in <module>
        LIB_PATH = libpath['find_lib_path']()
      File "/private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-xktd3clu/xgboost/xgboost/libpath.py", line 50, in find_lib_path
        'List of candidates:\n' + ('\n'.join(dll_path)))
    XGBoostLibraryNotFound: Cannot find XGBoost Library in the candidate path, did you install compilers and run build.sh in root path?
    List of candidates:
    /private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-xktd3clu/xgboost/xgboost/libxgboost.dylib
    /private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-xktd3clu/xgboost/xgboost/../../lib/libxgboost.dylib
    /private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-xktd3clu/xgboost/xgboost/./lib/libxgboost.dylib
    //anaconda/envs/celloracle_env/xgboost/libxgboost.dylib
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

This error remains after conda install xgboost, but after conda install cmake, it changes to the following.

  ERROR: Command errored out with exit status 1:
     command: //anaconda/envs/celloracle_env/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-t3ki2ri1/xgboost/setup.py'"'"'; __file__='"'"'/private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-t3ki2ri1/xgboost/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-t3ki2ri1/xgboost/pip-egg-info
         cwd: /private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-t3ki2ri1/xgboost/
    Complete output (61 lines):
    ++ pwd
    + oldpath=/private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-t3ki2ri1/xgboost
    + cd ./xgboost/
    + mkdir -p build
    + cd build
    + cmake ..
    -- The CXX compiler identification is GNU 4.8.5
    -- The C compiler identification is GNU 4.8.5
    -- Checking whether CXX compiler has -isysroot
    -- Checking whether CXX compiler has -isysroot - yes
    -- Checking whether CXX compiler supports OSX deployment target flag
    -- Checking whether CXX compiler supports OSX deployment target flag - yes
    -- Check for working CXX compiler: /anaconda/envs/celloracle_env/bin/c++
    -- Check for working CXX compiler: /anaconda/envs/celloracle_env/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Checking whether C compiler has -isysroot
    -- Checking whether C compiler has -isysroot - yes
    -- Checking whether C compiler supports OSX deployment target flag
    -- Checking whether C compiler supports OSX deployment target flag - yes
    -- Check for working C compiler: /anaconda/envs/celloracle_env/bin/cc
    -- Check for working C compiler: /anaconda/envs/celloracle_env/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- CMake version 3.14.0
    CMake Error at CMakeLists.txt:14 (message):
      GCC version must be at least 5.0!


    -- Configuring incomplete, errors occurred!
    See also "/private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-t3ki2ri1/xgboost/xgboost/build/CMakeFiles/CMakeOutput.log".
    + echo -----------------------------
    -----------------------------
    + echo 'Building multi-thread xgboost failed'
    Building multi-thread xgboost failed
    + echo 'Start to build single-thread xgboost'
    Start to build single-thread xgboost
    + cmake .. -DUSE_OPENMP=0
    -- CMake version 3.14.0
    CMake Error at CMakeLists.txt:14 (message):
      GCC version must be at least 5.0!


    -- Configuring incomplete, errors occurred!
    See also "/private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-t3ki2ri1/xgboost/xgboost/build/CMakeFiles/CMakeOutput.log".
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-t3ki2ri1/xgboost/setup.py", line 42, in <module>
        LIB_PATH = libpath['find_lib_path']()
      File "/private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-t3ki2ri1/xgboost/xgboost/libpath.py", line 50, in find_lib_path
        'List of candidates:\n' + ('\n'.join(dll_path)))
    XGBoostLibraryNotFound: Cannot find XGBoost Library in the candidate path, did you install compilers and run build.sh in root path?
    List of candidates:
    /private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-t3ki2ri1/xgboost/xgboost/libxgboost.dylib
    /private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-t3ki2ri1/xgboost/xgboost/../../lib/libxgboost.dylib
    /private/var/folders/8q/3gpm_n1j5rb5_8f1qrdc_1nh0000gn/T/pip-install-t3ki2ri1/xgboost/xgboost/./lib/libxgboost.dylib
    //anaconda/envs/celloracle_env/xgboost/libxgboost.dylib
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

I then upgraded gcc with conda install -c psi4 gcc-5 and verified that gcc --version shows version 5.x. After that, the celloracle installation worked.

Error when importing AnnData

I get the error when importing AnnData object:

import_anndata_as_raw_count(adata=adata, cluster_column_name="cell_type", embedding_name="X_pca")

The error originates from _get_clustercolor_from_anndata, as there's no cell_type_colors in my data:

KeyError: 'cell_type_colors'

I use pre-defined cell types and not the clustering results, that's why there's no color information. I guess I'd have to populate that manually with some random colors?
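
Populating the colors manually could look like the sketch below, which generates one evenly spaced color per category using only the standard library. The helper name is hypothetical. The commented-out line at the end assumes that adata.obs["cell_type"] is a categorical column and that the color list is expected under adata.uns["cell_type_colors"], as the KeyError suggests.

```python
import colorsys


def make_cluster_colors(categories):
    """Return one hex color string per category, evenly spaced around the hue wheel."""
    n = len(categories)
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.65, 0.9)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors


# Hypothetical cell type labels.
print(make_cluster_colors(["B cell", "T cell", "Monocyte"]))

# Assumed usage with an AnnData object (not executed here):
# adata.uns["cell_type_colors"] = make_cluster_colors(adata.obs["cell_type"].cat.categories)
```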

Memory requirements for markov transitions

Hi again! I'm running CO on a 54k-cell dataset using ~1k variable genes. The network inference and KO simulation runs fine, but I have problems with the Markov transitions used to visualize the simulated KO. My kernel keeps dying when I run oracle.estimate_transition_prob. I am currently running it like this:

oracle.estimate_transition_prob(n_neighbors=200, knn_random=True, sampled_fraction=0.01) 

I have had problems running velocyto.R on this same dataset in the past, and I think those were caused by storing a dense 54k by 54k matrix in memory, but I don't know if this is the same issue or not. What would you suggest doing?
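
To put a number on the dense-matrix hypothesis: a quick size estimate, assuming one value per cell pair. Whether estimate_transition_prob actually allocates such a matrix is not confirmed here; the helper name is hypothetical.

```python
def dense_matrix_gib(n_rows, n_cols, bytes_per_value=8):
    """Memory needed for a dense matrix, in GiB (8 bytes per value = float64)."""
    return n_rows * n_cols * bytes_per_value / 1024**3


n_cells = 54_000
print(f"{dense_matrix_gib(n_cells, n_cells):.1f} GiB")      # 21.7 GiB as float64
print(f"{dense_matrix_gib(n_cells, n_cells, 4):.1f} GiB")   # 10.9 GiB as float32
```

A single such matrix already exceeds the memory of many workstations, and any intermediate copies would multiply that figure.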

Thanks!

Add to library path

I'm trying to run CellOracle in an HPC environment. Naturally, there's an issue with installing the dependencies. Some of the R packages aren't there, and the backlog for centralised installation is quite big.

I figured out I could create a new key in config:

config = {"R_path": "R", "n_cpu": cpu_count()}

something like:

config = {"R_path": "R", "lib_path": None, "n_cpu": cpu_count()}

I could then set it to point to my own library path where I have the packages. This config could then be provided as an argument to the Rscript:

command = f"{r_dir}Rscript {parent_path[0]}/rscripts_for_network_analysis/get_newtork_scores.R {folder}"

Then the R code can append it to .libPaths().

So I've got two questions:

  • Does that sound reasonable?
  • Would you potentially accept that into the code base as a PR?
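
A possible alternative that needs no change to the R script is to pass the library path through the R_LIBS_USER environment variable when launching Rscript. The sketch below only builds the environment; the "lib_path" key is the one proposed above, and the helper name is hypothetical. It is not existing CellOracle code.

```python
import os


def build_r_environment(config, base_env=None):
    """Return a copy of the environment with R_LIBS_USER extended by config['lib_path'].

    If 'lib_path' is missing or None, the environment is returned unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    lib_path = config.get("lib_path")
    if lib_path:
        existing = env.get("R_LIBS_USER")
        # Put the user-supplied library first so it takes precedence.
        env["R_LIBS_USER"] = lib_path if not existing else lib_path + os.pathsep + existing
    return env


config = {"R_path": "R", "lib_path": "/path/to/my/R/library", "n_cpu": 4}
env = build_r_environment(config, base_env={})
print(env["R_LIBS_USER"])  # /path/to/my/R/library

# Assumed usage when launching the script (not executed here):
# subprocess.run(command, shell=True, env=build_r_environment(config))
```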

Pointing out typos in filenames and the documentation

Hi, I'm Naoto Yamaguchi from the University of Tokyo.
I enjoyed reading your paper and watching the CellOracle workshop video on YouTube!!

I found some trivial typos in your filenames.

  • CellOracle/docs/notebooks/04_Network_analysis/Network_analysis_with_with_Paul_etal_2015_data.ipynb
    →Network_analysis_with_Paul_etal_2015_data.ipynb

  • CellOracle/docs/notebooks/05_simulation/Gata1_KO_simulation_with_with_Paul_etal_2015_data.ipynb
    →Gata1_KO_simulation_with_Paul_etal_2015_data.ipynb

  • CellOracle/celloracle/data_conversion/process_srurat_object.py
    →process_seurat_object.py


In addition, I found some format errors in the documentation.

  • indent is required in the table of content
  1. Transcription factor binding motif scan
    3.1 check reference genome installation
    https://morris-lab.github.io/CellOracle.documentation/tutorials/motifscan.html
  • There are some duplicate section numbers
  1. Single-cell RNA-seq data preprocessing
    A. scRNA-seq data preprocessing with scanpy
    (8. Check data / 8. Make annotation for cluster)
    https://morris-lab.github.io/CellOracle.documentation/tutorials/scrnaprocess.html#a-scrna-seq-data-preprocessing-with-scanpy

  2. Network analysis
    (4. Save and Load. / 4. GRN calculation)
    https://morris-lab.github.io/CellOracle.documentation/tutorials/networkanalysis.html

  • section number is missing
  1. Simulation with GRNs
    (section 5.2 is missing)
    https://morris-lab.github.io/CellOracle.documentation/tutorials/simulation.html

Problem saving a large oracle object

Hi,

I'm trying to save my oracle object using the oracle.to_hdf5 function, but I run into the following error:

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-17-b7f5462aee49> in <module>
----> 1 oracle.to_hdf5("/home/large.celloracle.oracle")

~/my-conda-envs/celloracle/lib/python3.6/site-packages/celloracle/trajectory/oracle_core.py in to_hdf5(self, file_path)
     91         dump_hdf5(obj=self, filename=file_path,
     92                   data_compression=compression_opts,  chunks=(2048, 2048),
---> 93                   noarray_compression=compression_opts, pickle_protocol=2)
     94 
     95 

~/my-conda-envs/celloracle/lib/python3.6/site-packages/celloracle/utility/hdf5_processing.py in dump_hdf5(obj, filename, data_compression, chunks, noarray_compression, pickle_protocol)
     92                                          fletcher32=False, shuffle=False)
     93             else:
---> 94                 serialized = _obj2uint(attribute, compression=noarray_compression, protocol=pickle_protocol)
     95                 _file.create_dataset(f"&{k}", data=serialized,
     96                                      chunks=tuple((min(1024, len(serialized)),)),

~/my-conda-envs/celloracle/lib/python3.6/site-packages/celloracle/utility/hdf5_processing.py in _obj2uint(obj, compression, protocol)
     23     An array encoding in bytes (uint8) the object pickled
     24     """
---> 25     zstr = zlib.compress(pickle.dumps(obj, protocol=protocol), compression)
     26     return np.frombuffer(zstr, dtype=np.uint8)
     27 

OverflowError: cannot serialize a string larger than 4GiB

The error seems to be caused by this issue.

Do you have any idea how to fix this?
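
One possible direction, sketched under assumptions: the traceback shows pickle.dumps being called with protocol=2, and pickle protocols below 4 cannot serialize a single bytes or string object larger than 4 GiB. Protocol 4 (available since Python 3.4) lifts that limit. The helper names serialize_obj and deserialize_obj are hypothetical, loosely modeled on _obj2uint from the traceback (without the NumPy conversion); this is not CellOracle's actual fix.

```python
import pickle
import zlib


def serialize_obj(obj, compression=4, protocol=4):
    """Pickle an object and compress the result.

    protocol=4 supports very large objects; with protocol=2, pickle raises
    OverflowError for a single bytes/str object larger than 4 GiB.
    """
    return zlib.compress(pickle.dumps(obj, protocol=protocol), compression)


def deserialize_obj(blob):
    """Inverse of serialize_obj."""
    return pickle.loads(zlib.decompress(blob))


# Small round-trip check on toy data (made up for illustration).
payload = {"genes": ["Gata1", "Spi1"], "values": list(range(5))}
encoded = serialize_obj(payload)
restored = deserialize_obj(encoded)
print(restored == payload)
```

Files written with a higher protocol can only be read by Python versions that support that protocol, so this trades some backward compatibility for the larger size limit.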

NotImplementedError: "intersectBed" does not appear to be installed or on the path, so this method is disabled. Please install a more recent version of BEDTools and re-import to use this method.

Hi,

I was trying to rerun celloracle (it worked previously). I encountered this error:
NotImplementedError: "intersectBed" does not appear to be installed or on the path, so this method is disabled. Please install a more recent version of BEDTools and re-import to use this method.

when running "tss_annotated = ma.get_tss_info(peak_str_list=peaks, ref_genome= "hg38")"
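
A quick check that may help narrow this down: the message says the BEDTools executable is not installed or not on the path, so one common cause is that the Python process runs in an environment where BEDTools is not visible (for example, a different conda environment than before). The sketch below uses only the standard library; the helper name find_executables is hypothetical.

```python
import shutil


def find_executables(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


# Check the names that the error message and BEDTools itself refer to.
found = find_executables(["bedtools", "intersectBed"])
for name, path in found.items():
    print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```

If both entries are reported as not found, installing BEDTools into the same environment that runs the notebook and restarting the kernel is a reasonable next step.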

problem with integrate_tss_peak_with_cicero

Hi,

I found that the output of ma.integrate_tss_peak_with_cicero is not reasonable: it only kept the connections with coaccess = 1.
The number of peaks is 116374, the number of cicero_connections is 11976636, and the number of TSS annotations is 12511.
After integration, only 12511 are left.

I wonder if my custom TSS_annotation could be the reason for the abnormal output.
I attached the data I used, and also the commands, which are from the tutorial. The cicero_connection file is too large to upload.

Attachments:
galGal6_tss_info.bed.txt
step5_celloracle_step2a_tssIntegCicero.Rmd.txt
all_peaks.csv.zip

Error with `links.plot_cartography_term`

Hi,

I'm trying to get the links.plot_cartography_term plot to run and I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-94-c684043bc52c> in <module>
----> 1 links.plot_cartography_term(goi = "MEF2A", save = f"{save_folder}/MEF2A_cartography")

~/my-conda-envs/celloracle/lib/python3.6/site-packages/celloracle/network_analysis/links_object.py in plot_cartography_term(self, goi, save)
    312                Plots will not be saved if [save=None]. Default is None.
    313         """
--> 314         plot_cartography_term(links=self, goi=goi, save=save)
    315 
    316 

~/my-conda-envs/celloracle/lib/python3.6/site-packages/celloracle/network_analysis/gene_analysis.py in plot_cartography_term(links, goi, save)
    379 
    380     #print(tt)
--> 381     tt = tt.loc[links.palette.index.values, order].fillna(0)
    382     sns.heatmap(data=tt, cmap="Blues", cbar=False)
    383     if not save is None:

~/my-conda-envs/celloracle/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1760                 except (KeyError, IndexError, AttributeError):
   1761                     pass
-> 1762             return self._getitem_tuple(key)
   1763         else:
   1764             # we by definition only have the 0th axis

~/my-conda-envs/celloracle/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1279         # ugly hack for GH #836
   1280         if self._multi_take_opportunity(tup):
-> 1281             return self._multi_take(tup)
   1282 
   1283         # no shortcut needed

~/my-conda-envs/celloracle/lib/python3.6/site-packages/pandas/core/indexing.py in _multi_take(self, tup)
   1338         d = {
   1339             axis: self._get_listlike_indexer(key, axis)
-> 1340             for (key, axis) in zip(tup, o._AXIS_ORDERS)
   1341         }
   1342         return o._reindex_with_indexers(d, copy=True, allow_dups=True)

~/my-conda-envs/celloracle/lib/python3.6/site-packages/pandas/core/indexing.py in <dictcomp>(.0)
   1338         d = {
   1339             axis: self._get_listlike_indexer(key, axis)
-> 1340             for (key, axis) in zip(tup, o._AXIS_ORDERS)
   1341         }
   1342         return o._reindex_with_indexers(d, copy=True, allow_dups=True)

~/my-conda-envs/celloracle/lib/python3.6/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1551 
   1552         self._validate_read_indexer(
-> 1553             keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
   1554         )
   1555         return keyarr, indexer

~/my-conda-envs/celloracle/lib/python3.6/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1653             if not (ax.is_categorical() or ax.is_interval()):
   1654                 raise KeyError(
-> 1655                     "Passing list-likes to .loc or [] with any missing labels "
   1656                     "is no longer supported, see "
   1657                     "https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"  # noqa:E501

KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'

Do you have a workaround for this?

Thanks!
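
A possible workaround, based on the pandas page linked in the error message: recent pandas versions raise a KeyError when .loc receives a list containing labels that are missing from the axis, and .reindex is the documented replacement. The sketch below reproduces the pattern on toy data; the frame tt and the label lists are made-up stand-ins for the objects used in plot_cartography_term, not the library's actual data.

```python
import pandas as pd

# Toy stand-in for the table built inside plot_cartography_term.
tt = pd.DataFrame(
    {"Connector Hub": [1, 0], "Peripheral": [0, 1]},
    index=["cluster_0", "cluster_1"],
)

# Labels requested by the plotting code; some are absent from tt.
wanted_rows = ["cluster_0", "cluster_1", "cluster_2"]
wanted_cols = ["Connector Hub", "Peripheral", "Provincial Hub"]

# tt.loc[wanted_rows, wanted_cols] raises KeyError on recent pandas versions
# because of the missing labels. reindex inserts NaN for them instead,
# which fillna(0) then replaces with zeros.
fixed = tt.reindex(index=wanted_rows, columns=wanted_cols).fillna(0)
print(fixed)
```

Applied to the failing line in the traceback, the equivalent change would be to replace the .loc lookup with a .reindex call on the same row and column labels, or to use a pandas version that still accepts missing labels in .loc.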

Degree distribution

Hi there,

Thanks for developing such an awesome tool! I have been running the tutorial using my own datasets. However, I've encountered a problem when calculating the network score using the links.get_score() function. The data has been log-transformed.

Error in if (get_go) { : argument is not interpretable as logical
Execution halted
Error in if (get_go) { : argument is not interpretable as logical
Execution halted
Modularity is implemented for undirected graphs only.
Modularity is implemented for undirected graphs only.

May I know if there is something wrong with my degree distribution? I was using
links.filter_links(p=0.001, weight="coef_abs", thread_number=2000)

For the KNN calculation, I'm using 10 for n_comps:
n_comps = min(n_comps, 10)
[image: pca1]

I've also attached the degree distribution for one of my clusters:
[image: cluster1]

Thank you very much for your help.

How to generate hg19_tss_info.bed myself or where could it be downloaded

I would like to run CellOracle on my human scATAC-seq dataset, but I am not sure how to get an hg19 TSS-annotated BED file like the one shown in the example. The bioRxiv paper says it came from HOMER. However, when I checked the HOMER webpage, I found that HOMER annotates where peaks are located relative to annotated gene regions; it does not provide a TSS annotation. Also, the output of HOMER is different from the example shown in CellOracle's tutorial. I also thought of taking ±500 bp around the annotated gene start site as the TSS annotation. However, the region length in the example varies gene by gene, so I am still unsure how I should get the annotated TSS regions. If the developers could explain in detail how they generated their mm9/10_tss_info.bed, I would much appreciate it. Thanks a lot.
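
For reference, the fixed-window idea mentioned above can be sketched as follows. This only illustrates taking ±500 bp around each annotated gene start in a strand-aware way; it is not how the CellOracle authors built their tss_info.bed files (those regions vary in length), and the gene records and the function name tss_window are made up.

```python
def tss_window(chrom, start, end, strand, name, flank=500):
    """Return a BED-like tuple covering +/- flank bp around the TSS.

    start and end are 0-based, half-open gene coordinates (BED convention).
    The TSS is the first base of the gene on '+' and the last base on '-'.
    """
    if strand == "+":
        tss = start
    elif strand == "-":
        tss = end - 1
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    win_start = max(0, tss - flank)  # clip at the chromosome start
    win_end = tss + flank + 1        # half-open end
    return (chrom, win_start, win_end, name, strand)


# Hypothetical gene annotations: (chrom, start, end, strand, name).
genes = [
    ("chr1", 1000, 5000, "+", "GENE_A"),
    ("chr1", 8000, 9000, "-", "GENE_B"),
    ("chr2", 100, 900, "+", "GENE_C"),
]

windows = [tss_window(*gene) for gene in genes]
for row in windows:
    print("\t".join(str(field) for field in row))
```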

In silico perturbed gene expression values

Thank you again for this awesome tool!

During in silico perturbation simulation, namely "3.2. calculate future gene expression after perturbation", are the predicted expression values for all genes in response to perturbation of a single gene saved somewhere that can be retrieved for downstream analysis such as differential gene expression analysis?

For example, after simulating GATA1 KO, can we get the predicted differentially expressed genes in a specific cluster of interest?

Thank you!

threshold for tfi.filter_motifs_by_score

Hi,

Thank you for sharing the wonderful package. I am working through the tutorials. It is unclear to me how the CellOracle score for a certain motif/locus is defined and how the threshold, which was set to 10.5 in tutorial 02, is determined.

Thank you,

James

The embedding param when importing data for network analysis

I'm following the tutorial and want to apply CellOracle to my own data. I'm not sure about the embedding_name param:

def import_anndata_as_raw_count(self, adata, cluster_column_name=None, embedding_name=None,

What kind of embedding can that be? I have a mixture of 3 cancer cell lines in one dataset, so I guess a trajectory isn't a good fit here. Can I use something like PCA or UMAP?

Thanks in advance!

Installation error

Hi,

I have been trying to install CellOracle using the instructions here. All of the dependencies installed without a problem, but when I tried to run pip install git+https://github.com/morris-lab/CellOracle.git, I got an error message that I do not know how to resolve:

$pip install git+https://github.com/morris-lab/CellOracle.git
Collecting git+https://github.com/morris-lab/CellOracle.git
  Cloning https://github.com/morris-lab/CellOracle.git to /tmp/pip-req-build-drlsr_n9
  Running command git clone -q https://github.com/morris-lab/CellOracle.git /tmp/pip-req-build-drlsr_n9
    ERROR: Command errored out with exit status 1:
     command: /data/homezvol1/smorabit/.conda/envs/celloracle_env/bin/python3.6 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-drlsr_n9/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-drlsr_n9/setup.py'"'"';
f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-19oh25yb
         cwd: /tmp/pip-req-build-drlsr_n9/
    Complete output (21 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-drlsr_n9/setup.py", line 16, in <module>
        from celloracle import __version__
      File "/tmp/pip-req-build-drlsr_n9/celloracle/__init__.py", line 8, in <module>
        from . import utility, network, network_analysis, go_analysis, data, data_conversion, oracle_utility

      File "/tmp/pip-req-build-drlsr_n9/celloracle/utility/__init__.py", line 18, in <module>
        from .load_hdf5 import load_hdf5
      File "/tmp/pip-req-build-drlsr_n9/celloracle/utility/load_hdf5.py", line 8, in <module>
        from ..motif_analysis.tfinfo_core import load_TFinfo
      File "/tmp/pip-req-build-drlsr_n9/celloracle/motif_analysis/__init__.py", line 11, in <module>
        from .motif_analysis_utility import is_genome_installed
      File "/tmp/pip-req-build-drlsr_n9/celloracle/motif_analysis/motif_analysis_utility.py", line 24, in <module>
        from gimmemotifs.scanner import Scanner
      File "/data/homezvol1/smorabit/.conda/envs/celloracle_env/lib/python3.6/site-packages/gimmemotifs/scanner.py", line 29, in <module>
        from gimmemotifs.utils import parse_cutoff,as_fasta,file_checksum
      File "/data/homezvol1/smorabit/.conda/envs/celloracle_env/lib/python3.6/site-packages/gimmemotifs/utils.py", line 29, in <module>
        from gimmemotifs.plot import plot_histogram
      File "/data/homezvol1/smorabit/.conda/envs/celloracle_env/lib/python3.6/site-packages/gimmemotifs/plot.py", line 19, in <module>
        mpl.use("Agg", warn=False)
    TypeError: use() got an unexpected keyword argument 'warn'
    ----------------------------------------
WARNING: Discarding git+https://github.com/morris-lab/CellOracle.git. Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

I would greatly appreciate your help in resolving this, as I am really interested in using CellOracle!

Thanks,
Sam
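
For context, a sketch of what the last traceback frame points to: gimmemotifs calls mpl.use("Agg", warn=False), and the installed matplotlib no longer accepts the warn keyword (to my knowledge it was removed in matplotlib 3.3), so the import fails before CellOracle's setup.py can finish. Installing a matplotlib version that still accepts the keyword, or a gimmemotifs version that no longer passes it, should avoid the error. The helper below is a hypothetical diagnostic, not part of either library; it inspects a callable's signature to see whether a keyword is accepted, and the two use_* functions are simplified stand-ins for the old and new signatures.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    param = params.get(name)
    return param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


# Simplified stand-ins for the older and newer signatures (assumptions).
def use_old(backend, warn=False, force=True):
    pass


def use_new(backend, *, force=True):
    pass


print("old-style accepts warn:", accepts_keyword(use_old, "warn"))
print("new-style accepts warn:", accepts_keyword(use_new, "warn"))

# Check the matplotlib that is actually installed, if any.
try:
    import matplotlib
    print(matplotlib.__version__, accepts_keyword(matplotlib.use, "warn"))
except ImportError:
    print("matplotlib is not installed in this environment")
```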

error when I try to convert seurat object to anndata object

Hi!

I was trying to convert my already analysed data into an anndata object, using the API command, but I get this error:

input file name: /home/jovyan/data/scRNAseq_folder/Seurat_objects/RNAseq_Seurat_UMAP_neighbours_day.31.rds
loading seurat object ...
seurat object version is 3x
processing data ...
making matrix files ...
[1] "active_assay is RNA"
Error in Matrix::writeMM(obj = SO@assays$RNA@data, file = "tmp/data.mtx") :
Cholmod error 'error reading/writing file' at file ../Check/cholmod_write.c, line 634
Calls: ->
Execution halted
Build process aborts.

run scan motifs independently, then import into cellOracle

Hi,

I wonder if it is possible to run gimmemotifs or the motif scan independently and then import the results into CellOracle for the following steps.
I am asking because the version of gimmemotifs that CellOracle requires did not work well on my system, while the latest versions of gimmemotifs and genomepy work well. It would be great if there were a way to do that.

Big thanks!

ValueError: Length of passed values is 2, index implies 70.

Hi @KenjiKamimoto-wustl122 !

I'm trying to run celloracle on a new dataset and I'm getting an error that I'm not sure how to fix. I'm following the original workflow from the tutorial, and when I reach this point:

%%time
links = oracle.get_links(cluster_name_for_GRN_unit = "scNym", alpha = 10,
                         verbose_level = 10, test_mode = False)

I get the following error:

HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=30.0), HTML(value='')))
inferring GRN for B_cells...
HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=3689.0), HTML(value='')))


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<timed exec> in <module>

~/miniconda3/lib/python3.8/site-packages/celloracle/trajectory/oracle_core.py in get_links(self, cluster_name_for_GRN_unit, alpha, bagging_number, verbose_level, test_mode)
    902 
    903         """
--> 904         links = get_links(oracle_object=self,
    905                           cluster_name_for_GRN_unit=cluster_name_for_GRN_unit,
    906                           alpha=alpha, bagging_number=bagging_number,

~/miniconda3/lib/python3.8/site-packages/celloracle/network_analysis/network_construction.py in get_links(oracle_object, cluster_name_for_GRN_unit, alpha, bagging_number, verbose_level, test_mode)
     65 
     66     # calculate GRN for each cluster
---> 67     linkLists = _fit_GRN_for_network_analysis(oracle_object, cluster_name_for_GRN_unit=cluster_name_for_GRN_unit,
     68                                   alpha=alpha, bagging_number=bagging_number,  verbose_level=verbose_level, test_mode=test_mode)
     69 

~/miniconda3/lib/python3.8/site-packages/celloracle/network_analysis/network_construction.py in _fit_GRN_for_network_analysis(oracle_object, cluster_name_for_GRN_unit, alpha, bagging_number, verbose_level, test_mode)
    121                          TFinfo_dic=oracle_object.TFdict,
    122                          verbose=False)
--> 123             tn_.fit_All_genes(bagging_number=bagging_number,
    124                               alpha=alpha, verbose=verbose)
    125 

~/miniconda3/lib/python3.8/site-packages/celloracle/network/net_core.py in fit_All_genes(self, bagging_number, scaling, model_method, command_line_mode, log, alpha, verbose)
    309             verbose (bool): Whether or not to show a progress bar.
    310         """
--> 311         self.fit_genes(target_genes=self.all_genes,
    312                        bagging_number=bagging_number,
    313                        scaling=scaling,

~/miniconda3/lib/python3.8/site-packages/celloracle/network/net_core.py in fit_genes(self, target_genes, bagging_number, scaling, model_method, save_coefs, command_line_mode, log, alpha, verbose)
    416 
    417                 for target_gene in loop:
--> 418                     coefs = _get_bagging_ridge_coefs(target_gene=target_gene,
    419                                                      gem=self.gem,
    420                                                      gem_scaled=self.gem_standerdized,

~/miniconda3/lib/python3.8/site-packages/celloracle/network/regression_models.py in get_bagging_ridge_coefs(target_gene, gem, gem_scaled, TFdict, cellstate, bagging_number, scaling, n_jobs, alpha, solver)
    128 
    129     # get results
--> 130     coefs = _get_coef_matrix(model, reg_all)
    131 
    132     # remove cellstate data from coefs

~/miniconda3/lib/python3.8/site-packages/celloracle/network/regression_models.py in _get_coef_matrix(ensemble_model, feature_names)
    143     n_estimater = len(ensemble_model.estimators_features_)
    144     coef_list = \
--> 145         [pd.Series(ensemble_model.estimators_[i].coef_,
    146                    index=feature_names[ensemble_model.estimators_features_[i]])\
    147          for i in range(n_estimater)]

~/miniconda3/lib/python3.8/site-packages/celloracle/network/regression_models.py in <listcomp>(.0)
    143     n_estimater = len(ensemble_model.estimators_features_)
    144     coef_list = \
--> 145         [pd.Series(ensemble_model.estimators_[i].coef_,
    146                    index=feature_names[ensemble_model.estimators_features_[i]])\
    147          for i in range(n_estimater)]

~/miniconda3/lib/python3.8/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    311                 try:
    312                     if len(index) != len(data):
--> 313                         raise ValueError(
    314                             f"Length of passed values is {len(data)}, "
    315                             f"index implies {len(index)}."

ValueError: Length of passed values is 2, index implies 70.

Do you have any ideas on what the issue may be?

The environment I'm using is here:

WARNING: If you miss a compact list, please try `print_header`!
-----
anndata     0.7.5
scanpy      1.6.0
sinfo       0.3.1
-----
OpenSSL             19.1.0
PIL                 7.2.0
anndata             0.7.5
appdirs             1.4.4
backcall            0.2.0
boltons             NA
bucketcache         0.12.1
cairo               1.19.1
celloracle          0.5.2
certifi             2020.06.20
cffi                1.14.0
chardet             3.0.4
colorama            0.4.3
concurrent          NA
cryptography        2.9.2
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.1
decorator           4.4.2
diskcache           5.0.3
encodings           NA
fbpca               NA
genericpath         NA
genomepy            0.8.4
geosketch           NA
get_version         2.1
gimmemotifs         0.14.4
goatools            v1.0.6
h5py                2.10.0
idna                2.7
igraph              0.7.1
ipykernel           5.3.4
ipython_genutils    0.2.0
ipyvue              1.4.0
ipyvuetify          1.5.1
ipywidgets          7.5.1
jedi                0.17.2
joblib              0.17.0
kiwisolver          1.2.0
legacy_api_wrap     1.2
llvmlite            0.34.0
logbook             1.5.3
logomaker           NA
loompy              3.0.6
louvain             0.6.1
matplotlib          3.2.0
mkl                 2.3.0
mpl_toolkits        NA
natsort             7.0.1
networkx            2.5
norns               NA
ntpath              NA
numba               0.51.2
numexpr             2.7.1
numpy               1.19.1
numpy_groupies      0.9.13
opcode              NA
packaging           20.4
pandas              1.1.3
parso               0.7.0
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
posixpath           NA
prompt_toolkit      3.0.7
psutil              5.7.2
ptyprocess          0.6.0
pyarrow             1.0.1
pybedtools          0.8.1
pycparser           2.20
pydoc_data          NA
pyexpat             NA
pyfaidx             0.5.9.1
pygments            2.7.1
pyparsing           2.4.7
pysam               0.16.0.1
pytz                2020.1
represent           1.6.0
requests            2.19.1
scanpy              1.6.0
scipy               1.5.2
seaborn             0.10.1
setuptools_scm      NA
sinfo               0.3.1
six                 1.14.0
sklearn             0.23.2
socks               1.7.1
sphinxcontrib       NA
sre_compile         NA
sre_constants       NA
sre_parse           NA
statsmodels         0.12.0
storemagic          NA
tables              3.6.1
threadpoolctl       2.1.0
tornado             6.0.4
tqdm                4.50.2
traitlets           4.3.3
typing_extensions   NA
urllib3             1.23
velocyto            0.17.17
wcwidth             0.2.5
xdg                 NA
xxhash              2.0.0
yaml                5.1.2
zmq                 19.0.2
-----
IPython             7.18.1
jupyter_client      6.1.6
jupyter_core        4.6.3
jupyterlab          2.2.8
notebook            6.1.1
-----
Python 3.8.3 (default, May 19 2020, 18:47:26) [GCC 7.3.0]
Linux-4.15.0-123-generic-x86_64-with-glibc2.10
60 logical CPU cores, x86_64
-----
Session information updated at 2020-11-23 17:03

checking_installation: rnetcarto -> NG

I already installed the rnetcarto package using install.packages("rnetcarto"), and I can run library(rnetcarto) in R. But when I check the library from Python, it gives this error:
Error in library(rnetcarto) : there is no package called ‘rnetcarto’
Execution halted
Build process aborts.
checking_installation: rnetcarto -> NG
R library, rnetcarto is unavailable. Please check installation.

I switched from R version 3.5.3 to version 3.6.2 and installed all the dependent R packages, including rnetcarto, but it still gives the same error.
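
One thing that may be worth checking, sketched below as an assumption rather than a diagnosis: the error text suggests the installation check runs R in a separate process, and that process may resolve a different R installation or library path than the interactive session where install.packages("rnetcarto") succeeded. The helper name r_package_available is hypothetical; it asks whichever Rscript is on the Python process's PATH whether it can load a package.

```python
import shutil
import subprocess


def r_package_available(package, rscript="Rscript"):
    """Check whether an R package can be loaded by the Rscript on PATH.

    Returns True or False depending on whether library(package) succeeds,
    or None if no Rscript executable is found.
    """
    exe = shutil.which(rscript)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-e", f"library({package})"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


print("Rscript found at:", shutil.which("Rscript"))
print("rnetcarto loadable:", r_package_available("rnetcarto"))
```

If the Rscript path printed here differs from the R installation used interactively, installing rnetcarto into that installation, or adjusting PATH so both point to the same R, may resolve the mismatch.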

motif_analysis.scan runtime crash: Segmentation fault: 11

While running tfi.scan I am running into the following error, which seems to be related to (https://kb.iu.edu/d/aqsj). Any ideas what could be causing this? I am using the human hg38 reference genome and 14000 peaks from ATAC-seq data. I have run it with the default motifs and the program still crashes.

Here is the output:
Checking your motifs... Motifs format looks good.

Initiating scanner...

Calculating FPR-based threshold. This step may take substantial time when you load a new ref-genome. It will be done quicker on the second time.

Convert peak info into DNA sequences ...

Scanning motifs ... It may take several hours if you proccess many peaks.

HBox(children=(HTML(value=''), FloatProgress(value=1.0, bar_style='info', layout=Layout(width='20px'), max=1.0), HTML(value='')))
Segmentation fault: 11

about the gene expression data used in the linear model fitting

Hello, thank you a lot for the great tool! In your tutorial on GRN reconstruction, you mention the following:

##In this notebook, we use raw mRNA count as an input of Oracle object.
adata.X = adata.raw.X.copy()

The scRNASeq data associated with "adata" was processed using scanpy. What you mean by "raw mRNA count" in "adata.raw.X" here is actually the normalized data before log-transformation, and adata.X keeps the log-transformed and scaled data. Am I correct?

In the paper, you mentioned that normalized data without log-transformation should be used for model fitting. Is adata.X (along with TF info) used to fit the linear model describing the regulatory relationship between target genes and TFs during the GRN reconstruction procedure?

I have a scRNASeq dataset processed using Seurat. I converted the Seurat object to AnnData using the data_conversion module in CellOracle. Let's say adata_from_seurat is the AnnData from the Seurat object. Here, adata_from_seurat.raw.X keeps the raw counts (integers from the CellRanger pipeline), not normalized data, and adata_from_seurat.X keeps the log-normalized data from the "NormalizeData" function in Seurat. Before building the GRN, should I make adata_from_seurat.X store the normalized data without log transformation (i.e., the counts for a feature in a cell / the total feature counts in that cell * scale_factor)? Your kind reply is much appreciated.
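
The formula in the question can be written out as a small worked example. The sketch below assumes Seurat's NormalizeData with its default LogNormalize method, which computes log1p(counts / total counts per cell * scale_factor); under that assumption, expm1 recovers the normalized, non-log-transformed values. The toy matrix and variable names are made up, and whether this is the right input for CellOracle is for the authors to confirm.

```python
import numpy as np

# Toy raw count matrix: rows are cells, columns are genes.
counts = np.array([[1.0, 3.0, 6.0],
                   [0.0, 5.0, 5.0]])
scale_factor = 10_000

# Normalized, not log-transformed:
# counts / total counts per cell * scale_factor.
totals = counts.sum(axis=1, keepdims=True)
normalized = counts / totals * scale_factor

# Log-normalized values, as stored after LogNormalize-style processing.
log_normalized = np.log1p(normalized)

# Undo the log transform to get back the normalized values.
recovered = np.expm1(log_normalized)
print(np.allclose(recovered, normalized))
```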

alternative method for motif scan

Hi,

I wonder if it is possible to import motif scan results from other methods, such as Homer.
I really struggled with gimmemotifs and always ended up with errors, whether I used the specific version mentioned in the installation instructions or the latest version. Homer, on the other hand, ran quite smoothly for me.

I think it would make CellOracle more flexible.

Thanks & best regards,
Menghan

Single TF of interest ChIP-seq input

Hello! Thank you for this awesome tool! In particular, I would love to be able to simulate changes in a scRNAseq data set after a single transcription factor of interest is perturbed.

From the tutorial, it appears that the GRN is inferred from ATAC-seq data and nearby TF motifs.
If I have ChIP-seq data for my single TF of interest, is it possible to use this instead of an ATACseq dataset to predict perturbation changes (or is there a way to contribute this added information?) Or is it important for CellOracle to have comprehensive information on all the TFs and their target genes based on motifs in open chromatin regions?

Thank you!
Jenny

GRNs and gene perturbation simulation

Hello,

Thank you for developing Cell Oracle. I am interested in using this package to analyze my data (specifically starting from 5. Simulation with GRNs). I am wondering whether I can use GRNs generated by pySCENIC to simulate signal propagation inside a cell with Cell Oracle? Or, do I need to follow your pipeline entirely and generate the GRNs as described here? I want to use Cell Oracle because I am interested in using it to estimate the effect of gene perturbation (either KO or over expression) and its effect on cell fate.

Thank you for your time!

Reference genome for new species

Hi Kenji,

Thank you and the team for this fantastic package! We performed in silico gene over-expression in mouse heart scRNA-seq and the result matched well to the real data generated in the lab! For broader usage of this great tool, I'm wondering if you could add Guinea pig reference genome for TSS annotation in the future, or let us know if there is a way to incorporate custom genomes.

Best,
Irene

question: Seurat label transfer

Hi,

I have already processed my scRNAseq and scATACseq data using Seurat and transferred labels from the scRNAseq data to my scATACseq data. At the moment, I am running CellOracle using peaks from all cells in the same experiment. I was wondering if it would make sense to separate peaks by annotated cluster and then calculate everything (Cicero, TF scan) on a per-cluster basis. I am asking because this would limit the number of potential regulatory TFs to those relevant to each cluster.

Best wishes,
Ruxandra

add chicken genome to refgenomes

Hi,

I wonder if it is possible to add the chicken genome galGal6 to the reference genomes?
Although the latest version on Ensembl is now version 100, I used version 97, as we would like to stay consistent with other data from the lab.

It would also be helpful if there were some guidelines that would make it possible for me to build reference genomes myself.

Thank you.
