Giter VIP home page Giter VIP logo

scvi.jl's Introduction

README

A Julia package for fitting VAEs to single-cell data using count distributions. Based on the Python implementation in the scvi-tools package.

Introduction

The scVI model was first proposed in Lopez R, Regier J, Cole MB et al. Deep generative modeling for single-cell transcriptomics. Nat Methods 15, 1053-1058 (2018).

More on the much more extensive Python package ecosystem scvi-tools can be found on the website and in the corresponding paper Gayoso A, Lopez R, Xing G. et al. A Python library for probabilistic analysis of single-cell omics data. Nat Biotechnol 40, 163โ€“166 (2022).

This is the documentation for the Julia version implementing basic functionality, including the following (non-exhausive list):

  • standard and linearly decoded VAE models
  • support for negative binomial generative distribution with and without zero-inflation
  • different ways of specifying the dispersion parameter
  • store data in a (very basic) Julia version of the Python AnnData objects
  • several built-in datasets
  • training routines supporting a wide range of customisable hyperparameters

Getting started

The package can be downloaded from this Github repo and added with the Julia package manager via:

using Pkg 
Pkg.add(url="https://github.com/maren-ha/scVI.jl")
using scVI 

Built-in data

There are currently three datasets for which the package supplies built-in convenience functions for loading, processing and creating corresponding AnnData objects. They can be downloaded from this Google Drive data folder. The folder contains all three datasets, namely

After downloading the data folder from Google Drive, the functions load_cortex, load_tasic and load_pbmc can be used to get preprocessed adata objects from each of these datasets, respectively (check the docs for more details).

Demo usage

The following illustrate a quick demo usage of the package.

We load the cortex data (one of the built-in datasets of scvi-tools). If the data subfolder has been downloaded and the Julia process is started in the parent directory of that folder, the dataset will be loaded directly from the one supplied in data, if not, it will be downloaded from the original authors.

We subset the dataset to the 1200 most highly variable genes and calculate the library size. Then, we initialise a scVAE model and train for 50 epochs using default arguments. We visualise the results by running UMAP on the latent representation and plotting.

# load cortex data
adata = load_cortex(; verbose=true)

# subset to highly variable genes 
subset_to_hvg!(adata, n_top_genes=1200, verbose=true)

# calculate library size 
library_log_means, library_log_vars = init_library_size(adata)

# initialise scVAE model 
m = scVAE(size(adata.countmatrix,2);
        library_log_means=library_log_means,
        library_log_vars=library_log_vars
)

# train model
training_args = TrainingArgs(
    max_epochs=50, # 50 for 10-dim 
    weight_decay=Float32(1e-6),
)
train_model!(m, adata, training_args)

# plot UMAP of latent representation 
plot_umap_on_latent(m, adata; save_plot=true)

Docs

Current version of docs.

This is the manual deployment of the local build. Automatic deployment with GithubActions/TravisCI is work in progress and will (hopefully) be available soon.

As with every Julia package, you can access the docstrings of exported function by typing ? into the REPL, followed by the function name. E.g., ?normalize_counts prints the following to the REPL:

help?> normalize_counts
search: normalize_counts normalize_counts!

  normalize_counts(mat::Abstractmatrix)

  Normalizes the countdata in mat by dividing it by the size factors calculated with estimate_size_factors. Assumes a countmatrix mat in cell x gene format as
  input, returns the normalized matrix.

To reproduce the current version of the documentation or include your own extensions of the code, you can run julia make.jl inside the docs subfolder of the scVI folder. Then, you can access the documentation in the HTML file at docs/build/index.html.

Testing

Runtests can be executed via

Pkg.test("scVI")

This checks whether basic functionality works: loads the PBMC dataset, initialises a scVAE model and start training (just 2 epochs for the sake of checking and faster runtime). Further tests to be added.


TODO

  • fix data loading --> from GoogleDrive
  • document loss_registry field and option during training!
  • implement and export functions for prior and posterior sampling
  • deploy docs
  • integration of Muon.jl for data handling
  • support gene_batch and gene_label dispersion

Contributions, reporting of bugs and unexpected behaviour, missing functionalities, etc. are all very welcome, please do get in touch!

scvi.jl's People

Contributors

maren-ha avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.