jaydu1 / vitae

Joint Trajectory Inference for Single-cell Genomics Using Deep Learning with a Mixture Prior

Home Page: https://jaydu1.github.io/VITAE/

License: MIT License

Python 0.63% Jupyter Notebook 99.28% R 0.09% Dockerfile 0.01%

tensorflow trajectory-inference python single-cell-sequencing

vitae's Introduction

Joint Trajectory Inference for Single-cell Genomics Using Deep Learning with a Mixture Prior

This is a Python package, VITAE, to perform trajectory inference for single-cell RNA sequencing (scRNA-seq). VITAE is a probabilistic method combining a latent hierarchical mixture model with variational autoencoders to infer trajectories from posterior approximations. VITAE is computationally scalable and can adjust for confounding covariates to learn a shared trajectory from multiple datasets. VITAE also provides uncertainty quantification of the inferred trajectory and cell positions and can find differentially expressed genes along the trajectory. For more information, please check out our manuscript on bioRxiv.

Tutorials

We provide some example notebooks. You could start working with VITAE on tutorial_dentate.

notebook	system	details	reference
tutorial_dentate	neurons	3585 cells and 2182 genes, 10x Genomics	Hochgerner et al. (2018)
tutorial_mouse_brain	neurons	16651 cells and 14707 genes	Yuzwa et al. (2017), Ruan et al. (2021)

In case GitHub rendering stops working, NbViewer is an alternative online tool to render Jupyter Notebooks.

Datasets and Documents are available.

Dependency

Our Python package is available on conda-forge and PyPI and the user can install the CPU version with the following command:

# using conda with conda-forge channel
>>> conda install -c conda-forge pyvitae

# or using PyPI
>>> pip install pyvitae

To enable GPU for TensorFlow, one should install CUDA dependencies and the tensorflow-gpu package. We also recommend using conda, miniconda, or virtualenv to manage the Python environment and install the package in a new environment. After installing all required packages, one can open the Jupyter Notebook via the terminal:

>>> jupyter notebook

The required TensorFlow versions are:

Package	Version
tensorflow	>=2.3.0
tensorflow_probability	>=0.11.0
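
The minimum versions in the table can be checked before opening a notebook. The following is a minimal sketch using only the standard library; the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are illustrative and not part of VITAE. It assumes the packages are registered under the distribution names listed in the table, so a GPU-specific build installed under a different distribution name would be reported as missing.

```python
from importlib import metadata

# Minimum versions copied from the table above.
REQUIRED = {"tensorflow": "2.3.0", "tensorflow_probability": "0.11.0"}


def parse_version(version):
    """Convert '2.3.0' or '2.4.0rc1' into a 3-tuple of integers.

    Only the leading digits of each component are kept, so pre-release
    suffixes such as 'rc1' are ignored (a simplification for this sketch).
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_requirements(required=REQUIRED):
    """Map each package name to True, False, or None (distribution not found)."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
            continue
        report[name] = meets_minimum(installed, minimum)
    return report


for name, ok in check_requirements().items():
    print(name, "not found" if ok is None else ("ok" if ok else "too old"))
```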

License

This project is licensed under the terms of the MIT license.

vitae's Issues

Implement in scvi-tools

Hello,

I found your manuscript interesting, and I'm wondering whether you have any interest in implementing a version that takes a pre-trained scvi-tools model (e.g., scVI) as input. I think this would get a lot of usage in our package!

Issue in model.pre_train when setting processed=True in model.preprocess_data

Hello,

I have an issue at the step where the autoencoder is pretrained, but only when I pass a preprocessed AnnData object (it works if the adata object is not preprocessed beforehand):

Preprocess data step:

# fit in data
model.get_data(adata=data,                          # count or expression matrix, (dense or sparse) numpy array
               labels=data.obs['cluster_label'],    # (optional) labels, which will be converted to string
               gene_names=data.var['features'],     # (optional) gene names, which will be converted to string
               cell_names=data.obs['sample_name']   # (optional) cell names, which will be converted to string
               )


# preprocess data
model.preprocess_data(gene_num=2000,         # (optional) maximum number of influential genes to keep (the default is 2000)
                      data_type='Gaussian',  # (optional) data_type can be 'UMI', 'non-UMI' or 'Gaussian' (the default is 'UMI')
                      npc=64,                # (optional) number of PCs to keep if data_type='Gaussian' (the default is 64)
                      processed=True)

Pretrain step:


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-2da55840b803> in <module>
      3                 batch_size=256,              # (Optional) the batch size for pre-training (the default is 32).
      4                 alpha=0.10,                  # (Optional) the value of alpha in [0,1] to encourage covariate adjustment. Not used if there is no covariates.
----> 5                 num_epoch = 300,             # (Optional) the maximum number of epoches (the default is 300).
      6                 ) 

~/anaconda3/lib/python3.7/site-packages/VITAE/VITAE.py in pre_train(self, stratify, test_size, random_state, learning_rate, batch_size, L, alpha, num_epoch, num_step_per_epoch, early_stopping_patience, early_stopping_tolerance, path_to_weights)
    274                                                 batch_size,
    275                                                 self.X[id_train].astype(tf.keras.backend.floatx()),
--> 276                                                 self.scale_factor[id_train].astype(tf.keras.backend.floatx()))
    277         self.test_dataset = train.warp_dataset(self.X_normalized[id_test], 
    278                                                 None if self.c_score is None else self.c_score[id_test].astype(tf.keras.backend.floatx()),

TypeError: 'NoneType' object is not subscriptable

Thank you in advance.

Best regards.
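
The traceback above fails at line 276 when indexing self.scale_factor, while line 278 of the same function guards self.c_score with a `None if ... is None else ...` expression. Below is a minimal sketch of that guard written as a reusable helper. The names `subset_or_none` and `require` are hypothetical, and this is not the project's actual fix; whether a missing scale factor is acceptable downstream when processed=True is not stated in the report.

```python
def subset_or_none(array, index):
    """Return array[index], or None when the array was never set.

    Mirrors the guard applied to c_score on line 278 of the traceback.
    """
    return None if array is None else array[index]


def require(array, name):
    """Fail early with a descriptive message instead of a bare TypeError."""
    if array is None:
        raise ValueError(f"{name} is None; it was not set during preprocessing")
    return array


# Hypothetical stand-ins for the attributes seen in the traceback.
scale_factor = None
id_train = slice(0, 2)

print(subset_or_none(scale_factor, id_train))  # None is passed through
print(subset_or_none([1.0, 2.0, 3.0], id_train))
```

Using `require` instead would turn the opaque "'NoneType' object is not subscriptable" into an error that names the missing attribute.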

Running model.init_inference in GPU version failed

Hello,
model.init_inference is very slow with the CPU version (though it does run), but I cannot get it to run at all with the GPU version.

I get the following error:

# initialize inference
model.init_inference(batch_size=128, 
                     L=150,            # L is the number of MC samples
                     dimred='umap',    # dimension reduction methods
                     #**kwargs         # extra key-value arguments for dimension reduction algorithms.    
                     random_state=seed
                    ) 
# after initialization, we can access some variables by model.pc_x, model.w, model.w_tilde, etc..

Computing posterior estimations over mini-batches.

---------------------------------------------------------------------------

ResourceExhaustedError                    Traceback (most recent call last)

<ipython-input-27-91c48b13b6e4> in <module>()
      4                      dimred='umap',    # dimension reduction methods
      5                      #**kwargs         # extra key-value arguments for dimension reduction algorithms.
----> 6                      random_state=seed
      7                     ) 
      8 # after initialization, we can access some variables by model.pc_x, model.w, model.w_tilde, etc..

10 frames

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

ResourceExhaustedError:  OOM when allocating tensor with shape[128,150,1653,57] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node Tile_1 (defined at /usr/local/lib/python3.7/dist-packages/VITAE/model.py:367) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference__get_inference_4681280]

Function call stack:
_get_inference

I tried reducing the batch size to 64, 32, 16, and 8, but all of them failed. I am not running out of memory.

This is due to the size of the input data. When I reduce the number of cells in my data, it works.

Thank you in advance.

Best regards
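
For scale, the size of the tensor named in the error message can be worked out directly. This is a minimal sketch: the shape and the double (8-byte) type come from the error above, whose first two dimensions appear to match batch_size=128 and L=150 in the call. The helper name `tensor_bytes` is illustrative, and the smaller batch sizes assume the other three dimensions stay unchanged.

```python
from math import prod


def tensor_bytes(shape, itemsize=8):
    """Bytes needed for a dense tensor; itemsize=8 corresponds to double."""
    return prod(shape) * itemsize


# Shape reported in the ResourceExhaustedError above.
reported = (128, 150, 1653, 57)
total = tensor_bytes(reported)
print(f"{reported}: {total:,} bytes = {total / 2**30:.2f} GiB")

# The same tensor at the smaller batch sizes mentioned in the report,
# assuming the remaining dimensions do not change.
for batch_size in (64, 32, 16, 8):
    size = tensor_bytes((batch_size,) + reported[1:])
    print(f"batch_size={batch_size}: {size / 2**30:.2f} GiB")
```

This single allocation is about 13.5 GiB at batch_size=128 and still roughly 0.84 GiB at batch_size=8, before counting any other tensors held on the device. The size also scales linearly with L.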

Error in model.preprocess_data if an AnnData object is given as input in model.get_data

Hello,

Thanks for developing VITAE.

I tried to use VITAE, but I have an issue with model.preprocess_data when I pass an AnnData object as input to the model.get_data function.

The preprocessing should be done by scanpy, which is installed, but I get this error:

# fit in data
model.get_data(adata=data,                          # count or expression matrix, (dense or sparse) numpy array
               labels=data.obs['cluster_label'],    # (optional) labels, which will be converted to string
               gene_names=data.var['features'],     # (optional) gene names, which will be converted to string
               cell_names=data.obs['sample_name']   # (optional) cell names, which will be converted to string
               )

# preprocess data
model.preprocess_data(gene_num=2000,     # (optional) maximum number of influential genes to keep (the default is 2000)
                      data_type='UMI',   # (optional) data_type can be 'UMI', 'non-UMI' or 'Gaussian' (the default is 'UMI')
                      npc=64             # (optional) number of PCs to keep if data_type='Gaussian' (the default is 64)
                      )
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-7-aa66b286b1ac> in <module>()
     35 model.preprocess_data(gene_num = 2000,        # (optional) maximum number of influential genes to keep (the default is 2000)
     36                       data_type = 'UMI', # (optional) data_type can be 'UMI', 'non-UMI' or 'Gaussian' (the default is 'UMI')
---> 37                       npc = 64              # (optional) number of PCs to keep if data_type='Gaussian' (the default is 64)#)
     38                       )

2 frames
/usr/local/lib/python3.7/dist-packages/VITAE/preprocess.py in _recipe_seurat(adata, gene_num)
    238     This uses a particular preprocessing
    239     """
--> 240     cell_mask = sc.pp.filter_cells(adata, min_genes=200, inplace=False)[0]
    241     adata = adata[cell_mask,:]
    242     gene_mask = sc.pp.filter_genes(adata, min_cells=3, inplace=False)[0]

NameError: name 'sc' is not defined

I do not understand what the issue is, since you import scanpy as sc inside the function you defined?

Thank you in advance.

Best regards.
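
One way such a NameError can arise even when the package is installed is Python's scoping: an import placed inside one function binds the alias only within that function. The sketch below reproduces the effect with the standard-library json module standing in for scanpy; the function names are hypothetical. Whether this is the actual cause in preprocess.py is an assumption, since the traceback shows only that `sc` is undefined inside `_recipe_seurat`.

```python
def load_helper():
    # Local import: the alias `js` exists only inside this function body.
    import json as js
    return js.dumps({"ok": True})


def use_helper():
    # `js` was never bound at module level, so this lookup fails.
    return js.loads('{"ok": true}')


load_helper()  # works, but does not make `js` visible elsewhere

try:
    use_helper()
except NameError as err:
    message = str(err)
    print(message)
```

Moving the import to module level, or repeating it inside every function that uses the alias, removes the error in this sketch.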
