vitae's Introduction

Joint Trajectory Inference for Single-cell Genomics Using Deep Learning with a Mixture Prior

VITAE is a Python package for trajectory inference on single-cell RNA sequencing (scRNA-seq) data. It is a probabilistic method that combines a latent hierarchical mixture model with variational autoencoders to infer trajectories from posterior approximations. VITAE is computationally scalable and can adjust for confounding covariates to learn a shared trajectory from multiple datasets. It also quantifies the uncertainty of the inferred trajectory and cell positions, and can find differentially expressed genes along the trajectory. For more information, please check out our manuscript on bioRxiv.

Tutorials

We provide example notebooks. A good starting point is tutorial_dentate.

notebook | system | details | reference
tutorial_dentate | neurons | 3585 cells and 2182 genes, 10x Genomics | Hochgerner et al. (2018)
tutorial_mouse_brain | neurons | 16651 cells and 14707 genes | Yuzwa et al. (2017), Ruan et al. (2021)

In case GitHub rendering stops working, NbViewer is an alternative online tool to render Jupyter Notebooks.

Datasets and documentation are available.

Dependency

Our Python package is available on conda-forge and PyPI and the user can install the CPU version with the following command:

# using conda with conda-forge channel
>>> conda install -c conda-forge pyvitae

# or using PyPI
>>> pip install pyvitae

To enable GPU for TensorFlow, one should install CUDA dependencies and the tensorflow-gpu package. We also recommend using conda, miniconda, or virtualenv to manage the Python environment and install the package in a new environment. After installing all required packages, one can open the Jupyter Notebook via the terminal:

>>> jupyter notebook

The required TensorFlow versions are:

Package | Version
tensorflow | >=2.3.0
tensorflow_probability | >=0.11.0
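
To confirm that an environment meets these minimums before opening a notebook, a small check along the following lines may help. This is a minimal sketch, not part of VITAE: the helper names (`parse_version`, `meets_minimum`, `installed_version`) are hypothetical, and it assumes both packages expose a `__version__` attribute and use plain dotted version numbers.

```python
import importlib

# Minimum versions copied from the table above.
MINIMUMS = {"tensorflow": "2.3.0", "tensorflow_probability": "0.11.0"}


def parse_version(text):
    """Turn '2.3.0' into [2, 3, 0]; stops at the first piece with no leading digits."""
    numbers = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
    return numbers


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically, padding with zeros."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    have += [0] * (width - len(have))
    need += [0] * (width - len(need))
    return have >= need


def installed_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    for name, minimum in MINIMUMS.items():
        found = installed_version(name)
        if found is None:
            print(f"{name}: not importable in this environment")
        else:
            status = "ok" if meets_minimum(found, minimum) else "too old"
            print(f"{name}: {found} (needs >= {minimum}) {status}")
```

Importing the modules rather than querying package metadata keeps the check independent of which distribution name was used to install TensorFlow.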

License

This project is licensed under the terms of the MIT license.

vitae's People

Contributors

jaydu1, jingshuw, minggao97, tianyucodings

vitae's Issues

Implement in scvi-tools

Hello,

I found your manuscript interesting, and I'm wondering whether you have any interest in implementing a version that takes a pre-trained scvi-tools model (e.g., scVI) as input. I think this would get a lot of usage in our package!

Issue in model.pre_train when setting processed=True in model.preprocess_data

Hello,

The autoencoder pre-training step fails, but only when I pass an already preprocessed AnnData object (it works if the AnnData object has not been preprocessed beforehand):

  • Preprocess data step:
# fit in data
model.get_data(adata=data,                   # count or expression matrix, (dense or sparse) numpy array 
               labels = data.obs['cluster_label'],       # (optional) labels, which will be converted to string
               gene_names = data.var['features'], # (optional) gene names, which will be converted to string
               cell_names = data.obs['sample_name']    # (optional) cell names, which will be converted to string
              )


# preprocess data
model.preprocess_data(gene_num = 2000,        # (optional) maximum number of influential genes to keep (the default is 2000)
                      data_type = 'Gaussian', # (optional) data_type can be 'UMI', 'non-UMI' or 'Gaussian' (the default is 'UMI')
                      npc = 64,               # (optional) number of PCs to keep if data_type='Gaussian' (the default is 64)
                      processed=True)

  • Pretrain step:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-2da55840b803> in <module>
      3                 batch_size=256,              # (Optional) the batch size for pre-training (the default is 32).
      4                 alpha=0.10,                  # (Optional) the value of alpha in [0,1] to encourage covariate adjustment. Not used if there is no covariates.
----> 5                 num_epoch = 300,             # (Optional) the maximum number of epoches (the default is 300).
      6                 ) 

~/anaconda3/lib/python3.7/site-packages/VITAE/VITAE.py in pre_train(self, stratify, test_size, random_state, learning_rate, batch_size, L, alpha, num_epoch, num_step_per_epoch, early_stopping_patience, early_stopping_tolerance, path_to_weights)
    274                                                 batch_size,
    275                                                 self.X[id_train].astype(tf.keras.backend.floatx()),
--> 276                                                 self.scale_factor[id_train].astype(tf.keras.backend.floatx()))
    277         self.test_dataset = train.warp_dataset(self.X_normalized[id_test], 
    278                                                 None if self.c_score is None else self.c_score[id_test].astype(tf.keras.backend.floatx()),

TypeError: 'NoneType' object is not subscriptable

Thank you in advance.

Best regards.
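
A minimal diagnostic sketch for the traceback above: the failing statement in `pre_train` indexes `self.X`, `self.X_normalized` and `self.scale_factor`, so one of them is presumably `None` after `preprocess_data(processed=True)`. The helper below is hypothetical (it is not part of VITAE) and only reports which of those attributes are unset. It is demonstrated on a stand-in object, because which attribute is actually `None` cannot be determined from the report alone.

```python
from types import SimpleNamespace

# Attribute names copied from the traceback; c_score is left out because
# the traceback shows it is already guarded against None.
PRETRAIN_INPUTS = ["X", "X_normalized", "scale_factor"]


def find_none_attributes(obj, names):
    """Return the names that are missing or set to None on obj."""
    return [name for name in names if getattr(obj, name, None) is None]


# Stand-in for a model whose scale_factor was never filled in (an assumption
# made purely for illustration).
fake_model = SimpleNamespace(X=[[1.0]], X_normalized=[[0.5]], scale_factor=None)
print(find_none_attributes(fake_model, PRETRAIN_INPUTS))
```

Calling `find_none_attributes(model, PRETRAIN_INPUTS)` on the real model right before `model.pre_train(...)` would narrow down which input the preprocessing step skipped.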

Running model.init_inference in GPU version failed

Hello,
model.init_inference runs with the CPU version, although very slowly, but I cannot get it to run with the GPU version.

I get the following error:

# initialize inference
model.init_inference(batch_size=128, 
                     L=150,            # L is the number of MC samples
                     dimred='umap',    # dimension reduction methods
                     #**kwargs         # extra key-value arguments for dimension reduction algorithms.    
                     random_state=seed
                    ) 
# after initialization, we can access some variables by model.pc_x, model.w, model.w_tilde, etc..

Computing posterior estimations over mini-batches.

---------------------------------------------------------------------------

ResourceExhaustedError                    Traceback (most recent call last)

<ipython-input-27-91c48b13b6e4> in <module>()
      4                      dimred='umap',    # dimension reduction methods
      5                      #**kwargs         # extra key-value arguments for dimension reduction algorithms.
----> 6                      random_state=seed
      7                     ) 
      8 # after initialization, we can access some variables by model.pc_x, model.w, model.w_tilde, etc..

10 frames

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

ResourceExhaustedError:  OOM when allocating tensor with shape[128,150,1653,57] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node Tile_1 (defined at /usr/local/lib/python3.7/dist-packages/VITAE/model.py:367) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference__get_inference_4681280]

Function call stack:
_get_inference

I tried reducing the batch size to 64, 32, 16 and 8, but every run failed. As far as I can tell, I am not running out of memory.

This is due to the size of the input data: when I reduce the number of cells in my data, it works.

Thank you in advance.

Best regards
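
As a rough worked example of the allocation named in the error message: a tensor of shape [128, 150, 1653, 57] with type double (8 bytes per element) needs about 13.5 GiB on its own. The sketch below does that arithmetic. It assumes the first dimension is `batch_size` and the second is `L`, which matches the call above but is not confirmed from the VITAE source, and the helper name `tensor_bytes` is hypothetical.

```python
def tensor_bytes(shape, bytes_per_element):
    """Memory needed for one dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


GIB = 1024 ** 3
DOUBLE = 8  # bytes per float64 element

# Shape reported in the ResourceExhaustedError above.
reported = tensor_bytes((128, 150, 1653, 57), DOUBLE)
print(f"reported tensor: {reported} bytes, {reported / GIB:.2f} GiB")

# Same tensor if only the first dimension (assumed to be batch_size) shrinks.
for batch_size in (64, 32, 16, 8):
    size = tensor_bytes((batch_size, 150, 1653, 57), DOUBLE)
    print(f"batch_size={batch_size}: {size / GIB:.2f} GiB")
```

Under these assumptions the tensor scales linearly with both `batch_size` and `L`, so lowering `L` is another lever besides the batch size; several tensors of this size may also be live at once.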

Error in model.preprocess_data if an AnnData object is given as input in model.get_data

Hello,

Thanks for developing VITAE.

I tried to use VITAE, but model.preprocess_data fails when I pass an AnnData object to the model.get_data function.

The preprocessing should be done by scanpy, which is installed, but I get this error:

# fit in data
model.get_data(adata=data,                   # count or expression matrix, (dense or sparse) numpy array 
               labels = data.obs['cluster_label'],       # (optional) labels, which will be converted to string
               gene_names = data.var['features'], # (optional) gene names, which will be converted to string
               cell_names = data.obs['sample_name']    # (optional) cell names, which will be converted to string
              )

# preprocess data
model.preprocess_data(gene_num = 2000,        # (optional) maximum number of influential genes to keep (the default is 2000)
                      data_type = 'UMI', # (optional) data_type can be 'UMI', 'non-UMI' or 'Gaussian' (the default is 'UMI')
                      npc = 64              # (optional) number of PCs to keep if data_type='Gaussian' (the default is 64)#)
                      )
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-7-aa66b286b1ac> in <module>()
     35 model.preprocess_data(gene_num = 2000,        # (optional) maximum number of influential genes to keep (the default is 2000)
     36                       data_type = 'UMI', # (optional) data_type can be 'UMI', 'non-UMI' or 'Gaussian' (the default is 'UMI')
---> 37                       npc = 64              # (optional) number of PCs to keep if data_type='Gaussian' (the default is 64)#)
     38                       )

2 frames
/usr/local/lib/python3.7/dist-packages/VITAE/preprocess.py in _recipe_seurat(adata, gene_num)
    238     This uses a particular preprocessing
    239     """
--> 240     cell_mask = sc.pp.filter_cells(adata, min_genes=200, inplace=False)[0]
    241     adata = adata[cell_mask,:]
    242     gene_mask = sc.pp.filter_genes(adata, min_cells=3, inplace=False)[0]

NameError: name 'sc' is not defined

I do not understand what the issue is, since scanpy appears to be imported as sc in the function you define.

Thank you in advance.

Best regards.
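
One thing worth ruling out for a `NameError` like this is that scanpy is installed in a different environment from the one the notebook kernel runs in. The sketch below checks importability from the current interpreter. It is only a diagnostic under that assumption (the helper name `is_importable` is hypothetical), and it does not establish how `VITAE/preprocess.py` imports scanpy.

```python
import importlib.util
import sys


def is_importable(module_name):
    """True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this in the same notebook kernel that raises the NameError.
    print("interpreter:", sys.executable)
    print("scanpy importable:", is_importable("scanpy"))
```

If this prints `False` inside the notebook while `pip show scanpy` succeeds in a terminal, the two are using different Python environments.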
