cellij's Introduction

Multi-Omics Factor Analysis

MOFA is a factor analysis model that provides a general framework for the integration of multi-omic data sets in an unsupervised fashion.

Please visit our website for installation instructions, tutorials, and much more!

cellij's Issues

Port MSc thesis repo to official cellij repo

The repo got quite dirty during the final push for the MSc thesis, so I have to partially clean it up before we can push it over.

  • Fix GH actions
  • Create a release tag for the MSc thesis state
  • Port it over as a PR to main
  • Replace all occurrences of mfmf with cellij

2023-03-31: ToDos

Prioritized

  • Switch get_w/get_z to pull from pyro param storage @timtreis
  • Provide save and load functionality @timtreis
  • Fix CI @timtreis
  • Skip missings in obs during inference @timtreis
  • na_strategy: add impute with means @timtreis
  • Change Black formatter to only work on merge @timtreis
  • Logging
  • Add OrderedDict for moments in mofa model @martinrohbeck
  • Log number of missings when new dataset is added
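
As a rough illustration of the "impute with means" and "log number of missings" items above, here is a minimal sketch. The function name `impute_with_means` and the logger name are hypothetical and not part of cellij; it assumes the data arrives as a dense samples x features NumPy array with NaN marking missing values.

```python
import logging

import numpy as np

logger = logging.getLogger("cellij_sketch")  # hypothetical logger name


def impute_with_means(x: np.ndarray) -> np.ndarray:
    """Return a copy of x in which NaNs are replaced by the feature-wise mean."""
    missing = np.isnan(x)
    # Report how many entries are missing before touching the data.
    logger.info("Found %d missing values in %d entries", int(missing.sum()), x.size)
    # Column means computed over the observed entries only.
    col_means = np.nanmean(x, axis=0)
    # col_means broadcasts along the sample axis.
    return np.where(missing, col_means, x)


x = np.array([[1.0, np.nan], [3.0, 4.0]])
x_imputed = impute_with_means(x)
```

A feature that is missing for every sample has no mean to impute from and stays NaN here, so such features would need separate handling.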

Done

Unclear

  • scale all modalities according to features and likelihoods?

Recreate original object incl. data after loading model

Currently we strip the ._data attribute of the FA model before saving. I think that makes sense, to save memory.
However, when reloading the model, we should be able to re-add the data so that we can still run downstream analyses that use properties from self._data. Otherwise, we have to make sure that the important downstream functionality works w/o self._data.
This currently does not work, because add_data() throws an error. @timtreis, as our data expert, maybe you can have a look at this?

Feel free to use the notebook in the current PR from branch feature/issue-x/example-notebook as an example. I already added the cell, but commented it out.
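
The intended round trip could look roughly like the sketch below. `TinyModel` is a hypothetical stand-in and not the cellij class; only the names `_data` and `add_data()` are taken from the description above, and the use of pickle is an assumption made for illustration.

```python
import pickle


class TinyModel:
    """Hypothetical stand-in for the FA model; only the data handling is shown."""

    def __init__(self):
        self._data = None
        self.params = {}

    def add_data(self, data):
        self._data = data

    def save(self, path):
        # Persist the learned state only; the (large) data is left out on purpose.
        with open(path, "wb") as fh:
            pickle.dump({"params": self.params}, fh)

    @classmethod
    def load(cls, path, data=None):
        with open(path, "rb") as fh:
            state = pickle.load(fh)
        model = cls()
        model.params = state["params"]
        # Optionally re-attach the data so downstream analysis can use it again.
        if data is not None:
            model.add_data(data)
        return model
```

Saving only the state leaves the in-memory model untouched, and `load(path, data=...)` gives callers one place to re-attach the data.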

Implement GPs

Going to keep some notes here for reference

Notes

Fix Data related Warning

Running

# Afterwards, we need to add the data
model.add_data(data=mdata)

raises
/home/m015k/code/cellij/cellij/core/_factormodel.py:251: FutureWarning: Passing 'suffixes' which cause duplicate columns {'T6_x', 'T5_x', 'treatedAfter_x', 'Gender_x', 'IGHV_x', 'Diagnosis_x', 'ConsClust_x', 'died_x', 'IC50beforeTreatment_x', 'Age4Main_x'} in the result is deprecated and will raise a MergeError in a future version.

The warning is caused by the suffixes that pandas appends to overlapping column names during the following merge:

anndata_object.obs = anndata_object.obs.merge(
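
One plausible cause, stated here as an assumption rather than a diagnosis, is that the metadata columns already exist in `.obs` when the merge runs again, so the default `_x`/`_y` suffixes collide with columns from an earlier merge. A minimal sketch of a merge that only adds columns not yet present is shown below; `merge_new_columns` is a hypothetical helper, and it assumes both frames are indexed by sample name.

```python
import pandas as pd


def merge_new_columns(obs: pd.DataFrame, extra: pd.DataFrame) -> pd.DataFrame:
    """Left-join only those columns of `extra` that `obs` does not have yet."""
    new_cols = extra.columns.difference(obs.columns)
    if new_cols.empty:
        # Nothing to add, so no merge and no suffixes are needed.
        return obs.copy()
    return obs.merge(extra[new_cols], how="left", left_index=True, right_index=True)


obs = pd.DataFrame({"Gender": ["m", "f"], "Age4Main": [60, 71]}, index=["s1", "s2"])
extra = pd.DataFrame({"Gender": ["m", "f"], "IGHV": ["U", "M"]}, index=["s1", "s2"])

merged = merge_new_columns(obs, extra)
merged = merge_new_columns(merged, extra)  # a repeated call adds nothing
```

Because no overlapping columns reach `merge`, no suffixes are generated and the column names stay unique.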

Implement the necessary data structures for training

Convert the preprocessed and clean MuData into a pytorch.Dataset wrapped into a pytorch.DataLoader to facilitate training during inference, e.g. when introducing mini-batching for SVI. Keep in mind sample-/feature-wise metadata stored in .obs and .var fields.
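
A minimal sketch of such a wrapper is given below. It assumes each modality has already been extracted from the MuData object into a dense samples x features tensor with samples aligned across modalities; `MultiViewDataset` is a hypothetical name, and the handling of `.obs`/`.var` metadata is left out.

```python
from typing import Dict

import torch
from torch.utils.data import DataLoader, Dataset


class MultiViewDataset(Dataset):
    """Serves one sample at a time across all views (modalities)."""

    def __init__(self, views: Dict[str, torch.Tensor]):
        n_obs = {x.shape[0] for x in views.values()}
        if len(n_obs) != 1:
            raise ValueError("All views must contain the same number of samples.")
        self.views = views
        self.n_obs = n_obs.pop()

    def __len__(self) -> int:
        return self.n_obs

    def __getitem__(self, idx: int):
        # Return the sample index as well, so local latent variables
        # can be subsampled consistently during mini-batch SVI.
        item = {name: x[idx] for name, x in self.views.items()}
        item["idx"] = idx
        return item


views = {"rna": torch.randn(10, 5), "protein": torch.randn(10, 3)}
dataset = MultiViewDataset(views)
loader = DataLoader(dataset, batch_size=4, shuffle=False)
```

The default collate function stacks the per-sample dictionaries, so each batch is again a dictionary with one tensor per view plus the batch indices.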

ToDos Deadline

Everybody:

  • Read draft, make suggestions, draw conclusions from Figures

Tim:

  • #68
  • Run GP on some test data as a POC
  • Fill in Table in Appendix with established methods
  • Make Plots Features (x-axis) vs Factor Norms (y-axis) for Non-negativity vs. HS & SNS & Laplace for non-negative DGP
  • 1st plot: 2 UMAPs of latent space z (one colored by time, one colored by diff stage). Inference w/o any covariates
  • 2nd plot: 2 UMAPs of latent space z (one colored by time, one colored by diff stage). Inference only with time in GP
  • 3rd plot: 2 UMAPs of latent space z (one colored by time, one colored by diff stage). Inference with time and diff stage in GP
  • Gridsearch over lengthscales [0.001, 0.01, 0.1, 1, 2, 5, 10]
  • Predict only 1 or 2 factors from the GP, estimate other factors separately
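
To get a feeling for the lengthscale grid above, the following sketch evaluates a squared-exponential (RBF) kernel on a few time points. The choice of kernel and the time points are assumptions made for illustration; the kernel used in the actual GP may differ.

```python
import numpy as np

LENGTHSCALES = [0.001, 0.01, 0.1, 1, 2, 5, 10]


def rbf_kernel(t: np.ndarray, lengthscale: float) -> np.ndarray:
    """Squared-exponential kernel matrix for 1D inputs t."""
    diff = t[:, None] - t[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


t = np.linspace(0.0, 1.0, 5)  # hypothetical time covariate, spacing 0.25
# Correlation between neighbouring time points for each grid value.
neighbour_corr = [rbf_kernel(t, ls)[0, 1] for ls in LENGTHSCALES]
```

For lengthscales far below the spacing of the covariate, neighbouring samples are effectively uncorrelated; for lengthscales far above it, the factor becomes almost constant over time.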

Arber:

  • Refactor generative models with multiple plate combinations (SnS etc...)
  • Write subsection about DGP of synthetic data
  • Add benchmark results on sparsity and recon error as a table (prec, rec, f1, rmse)
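
The metrics for the benchmark table could be computed along the lines of the sketch below. It assumes that estimated factors have already been matched to the ground truth (permutation and sign), and that a weight counts as active when its absolute value exceeds a threshold; the function names and the default threshold are hypothetical.

```python
import numpy as np


def sparsity_metrics(w_true: np.ndarray, w_est: np.ndarray, threshold: float = 0.1) -> dict:
    """Precision, recall and F1 for recovering the non-zero pattern of the weights."""
    true_active = np.abs(w_true) > 0
    est_active = np.abs(w_est) > threshold
    tp = int(np.sum(true_active & est_active))
    fp = int(np.sum(~true_active & est_active))
    fn = int(np.sum(true_active & ~est_active))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def rmse(x_true: np.ndarray, x_hat: np.ndarray) -> float:
    """Root mean squared reconstruction error, ignoring missing (NaN) entries."""
    return float(np.sqrt(np.nanmean((x_true - x_hat) ** 2)))


w_true = np.array([[1.0, 0.0], [0.0, 2.0]])
w_est = np.array([[0.9, 0.3], [0.0, 1.8]])
scores = sparsity_metrics(w_true, w_est)
```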

Martin:

  • Make Plots Features (x-axis) vs R2 Reconstruction (y-axis) for different Sparsity Priors
  • CLL Data
  • Depending on GPU, repeat all plots but with samples/views/missings on y-axis
  • Make Heatmaps: Features x Samples vs Time until Convergence for different Priors

Not assigned (up for grabs):

Road to MVP

The following boxes should be checked for an MVP:

  • Implement base factor analysis model (consisting of #9, #10)
  • Implement priors
    • Horseshoe
    • ARD
    • Spike&Slab
  • Reproduce ground truth synthetic data
    • Create synthetic (bio-inspired) data (#7)
  • Reproduce results from MOFA CLL analysis
  • Readme with guidelines on environment setup (#3) and minimal example
  • Implement MOFA(+) model wrapper
  • Implement tests
  • Documentation for all the parts involved

Each checkbox will have its own issue. This issue serves solely as an overview, feel free to edit according to your thoughts.

Priority Task List

Priority High

  • Make sure get_factors() and get_weights() work for all sparsity priors (currently not the case due to sampling procedures)
  • Add a Jeffreys prior for the variance in the SnSLasso
  • (Testcase) Develop standalone pyro distributions for priors (#20 )
  • Provide save and load functionality @timtreis
  • Skip missings in obs during inference @timtreis
  • na_strategy: add impute with means @timtreis
  • Change Black formatter to only work on merge @timtreis
  • Add proper Logging @timtreis
  • Add CUDA support (#56 ) @arberqoku
  • Implement MOFA+ (#44 )

Priority Medium

  • Implement and Experiment with Custom Guide class (#10 )
  • Implement tests for benchmarking metrics
  • Add OrderedDict for moments in mofa model @martinrohbeck
  • Log number of missings when new dataset is added @timtreis

Priority Low

  • (unclear) scale all modalities according to features and likelihoods?

Implement MOFA+

To get this up and running, we need to allow multiple groups in the factor analysis and also allow sparsity priors for these groups.

Refactor base FactorModel

Update the interface of the base FactorModel. Make sure to accommodate beginner and experienced users.
Simplest use case:

model = FactorModel(n_factors)
model.add_data(...)
# implicitly create model with normal priors and normal likelihoods...
model.fit()

Advanced use case:

model = FactorModel(n_factors)
model.add_data(...)
model.set_data_options(...)
model.set_model_options(...)
model.set_training_options(...)
model.fit()
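
One way to serve both use cases is to give every option a default, so that the setters become optional. The sketch below only illustrates that idea: the class `FactorModelSketch`, the option fields and their default values are all hypothetical, and `fit()` is a placeholder that does no inference.

```python
from dataclasses import dataclass, replace


@dataclass
class ModelOptions:
    prior: str = "Normal"
    likelihood: str = "Normal"


@dataclass
class TrainingOptions:
    epochs: int = 1000
    learning_rate: float = 1e-3


class FactorModelSketch:
    def __init__(self, n_factors: int):
        self.n_factors = n_factors
        self.data = None
        # Beginners never touch these; the defaults apply implicitly.
        self.model_options = ModelOptions()
        self.training_options = TrainingOptions()

    def add_data(self, data):
        self.data = data

    def set_model_options(self, **kwargs):
        # replace() raises a TypeError for unknown option names.
        self.model_options = replace(self.model_options, **kwargs)

    def set_training_options(self, **kwargs):
        self.training_options = replace(self.training_options, **kwargs)

    def fit(self):
        if self.data is None:
            raise ValueError("Call add_data() before fit().")
        # Placeholder: a real implementation would build the model and run inference here.
        return self


simple = FactorModelSketch(n_factors=5)
simple.add_data([[0.0, 1.0]])
simple.fit()

advanced = FactorModelSketch(n_factors=5)
advanced.add_data([[0.0, 1.0]])
advanced.set_model_options(prior="Horseshoe")
advanced.set_training_options(epochs=50)
advanced.fit()
```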

Data Generator for Synthetic Data

We need code to create synthetic ground truth data to benchmark the framework.

Options should include:

  • number of factors, features, samples, datasets
  • likelihoods
  • sparsity levels
  • noise levels
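
A minimal sketch of such a generator is shown below. It covers Gaussian likelihoods only and assumes a linear model with standard normal factors and element-wise sparse weights; the function name, the argument names and the defaults are hypothetical.

```python
import numpy as np


def generate_synthetic_data(
    n_samples=100, n_features=(50, 30), n_factors=5, sparsity=0.7, noise_std=0.5, seed=0
):
    """Simulate one matrix per dataset from shared factors z and sparse weights w."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_samples, n_factors))  # shared latent factors
    weights, datasets = [], []
    for d in n_features:
        w = rng.normal(size=(n_factors, d))
        # Each weight is set to zero with probability `sparsity`.
        mask = rng.random(size=(n_factors, d)) >= sparsity
        w = w * mask
        # Gaussian likelihood: linear signal plus isotropic noise.
        y = z @ w + rng.normal(scale=noise_std, size=(n_samples, d))
        weights.append(w)
        datasets.append(y)
    return {"z": z, "w": weights, "y": datasets}


sim = generate_synthetic_data()
```

Returning the true `z` and `w` alongside the data is what makes the output usable as ground truth for benchmarking.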
