MOFA is a factor analysis model that provides a general framework for the integration of multi-omic data sets in an unsupervised fashion.
Please visit our website for installation instructions, tutorials, and much more!
Implementation of a Modular Multi-Omics Factor Model Framework
License: BSD 3-Clause "New" or "Revised" License
The repo got quite dirty during the final push for the MSc thesis, so I have to partially clean it up before we can push it over.
mfmf
with cellij
As discussed in Slack
Prioritized
get_w/get_z to pull from Pyro param storage @timtreis (merge)
@timtreis: Done
Unclear
The Guide needs to execute the Generative model once to get the site names and shapes for initialization. However, storing the Generative model object as a submodule leads to an inheritance error.
See: https://github.com/pyro-ppl/pyro/blob/dev/pyro/infer/autoguide/guides.py#L69
Title says it all.
Write an argparse- or click-based Python script to generate synthetic data, run a training instance, and save the output.
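A minimal argparse sketch of such a script (a click version would look similar). The flag names, the simple Gaussian generator, and the train function are assumptions: train uses a truncated SVD purely as a placeholder for a real cellij training run, and the output format (.npz) is our choice.

```python
import argparse
from pathlib import Path

import numpy as np


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate synthetic data, train, save output.")
    parser.add_argument("--n-samples", type=int, default=100)
    parser.add_argument("--n-features", type=int, default=50)
    parser.add_argument("--n-factors", type=int, default=5)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--out", type=Path, default=Path("results.npz"))
    return parser


def generate(n_samples, n_features, n_factors, rng):
    # Y = Z @ W + Gaussian noise, with ground truth Z and W returned for benchmarking.
    z = rng.normal(size=(n_samples, n_factors))
    w = rng.normal(size=(n_factors, n_features))
    y = z @ w + 0.1 * rng.normal(size=(n_samples, n_features))
    return y, z, w


def train(y, n_factors):
    # Placeholder for a real training run: truncated SVD as a stand-in factor model.
    u, s, vt = np.linalg.svd(y, full_matrices=False)
    return u[:, :n_factors] * s[:n_factors], vt[:n_factors]


def main(argv=None) -> Path:
    args = build_parser().parse_args(argv)
    rng = np.random.default_rng(args.seed)
    y, z_true, w_true = generate(args.n_samples, args.n_features, args.n_factors, rng)
    z_hat, w_hat = train(y, args.n_factors)
    np.savez(args.out, y=y, z_true=z_true, w_true=w_true, z_hat=z_hat, w_hat=w_hat)
    return args.out


if __name__ == "__main__":
    main()
```

Usage: python run_experiment.py --n-factors 3 --out results.npz (the script name is hypothetical).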
If the MuData object contains only one modality, we don't need to merge the metadata.
cellij/cellij/core/factormodel.py
Line 562 in 5844535
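A minimal sketch of the short-circuit, assuming the per-modality sample metadata is available as a dict of pandas DataFrames (standing in for the .obs of each MuData modality). The helper merge_obs is hypothetical, and combine_first is one possible merge strategy in which values from earlier modalities win when columns overlap.

```python
from functools import reduce

import pandas as pd


def merge_obs(obs_by_modality: dict) -> pd.DataFrame:
    """Hypothetical helper: merge per-modality sample metadata."""
    frames = list(obs_by_modality.values())
    if len(frames) == 1:
        # Only one modality: nothing to merge, return a copy of its metadata.
        return frames[0].copy()
    # Union of samples and columns; earlier modalities take precedence on overlap,
    # so no "_x"/"_y" suffixed columns are created.
    return reduce(lambda left, right: left.combine_first(right), frames)
```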
Notes:
Resources:
We should give this optimized einsum for CUDA a try: https://pypi.org/project/opt-einsum-torch/
Users are very likely to bring large matrices, so this could improve performance; without something like it, we might even run into memory issues.
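Why contraction order matters, sketched with NumPy's einsum_path as a CPU stand-in (it uses the same kind of path search that opt_einsum performs; opt-einsum-torch itself is not shown). The matrix sizes are illustrative assumptions. We believe recent PyTorch releases can also use the opt_einsum package for path optimization when it is installed (see torch.backends.opt_einsum).

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(1000, 10))   # e.g. samples x factors
b = rng.normal(size=(10, 1000))   # e.g. factors x features
c = rng.normal(size=(1000, 5))

# Contracting b and c first avoids materializing the 1000 x 1000 product a @ b.
path, report = np.einsum_path("ij,jk,kl->il", a, b, c, optimize="optimal")
print(report)  # estimated FLOPs and the chosen pairwise contractions

fast = np.einsum("ij,jk,kl->il", a, b, c, optimize=path)
```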
Currently we are stripping the ._data attribute of the FA model before saving. I think that makes sense, to save memory.
However, when reloading the model, we should be able to re-add the data so that we can still run downstream analyses that use properties from self._data. Otherwise, we have to make sure that the important downstream functionality works without self._data.
This currently does not work, because add_data() throws an error. @timtreis, as our data expert, maybe you can have a look at this?
Feel free to use the notebook in the current PR from branch feature/issue-x/example-notebook as an example. I already added the cell, but commented it out.
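A minimal sketch of the intended save/reload contract, assuming pickling is used for saving. The class FactorModelSketch is hypothetical and not cellij's FactorModel: __getstate__ drops _data from the pickled state only, and add_data must work on a freshly loaded object.

```python
import pickle


class FactorModelSketch:
    """Hypothetical model that drops its data when pickled."""

    def __init__(self, n_factors):
        self.n_factors = n_factors
        self._data = None

    def add_data(self, data):
        # Must also work on a reloaded model whose _data was stripped.
        self._data = data

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_data"] = None  # strip the (potentially large) data to save memory
        return state


model = FactorModelSketch(n_factors=3)
model.add_data([[1.0, 2.0], [3.0, 4.0]])
restored = pickle.loads(pickle.dumps(model))
restored.add_data([[1.0, 2.0], [3.0, 4.0]])  # re-attach data for downstream analysis
```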
Going to keep some notes here for reference
Running
# Afterwards, we need to add the data
model.add_data(data=mdata)
raises the following warning:
/home/m015k/code/cellij/cellij/core/_factormodel.py:251: FutureWarning: Passing 'suffixes' which cause duplicate columns {'T6_x', 'T5_x', 'treatedAfter_x', 'Gender_x', 'IGHV_x', 'Diagnosis_x', 'ConsClust_x', 'died_x', 'IC50beforeTreatment_x', 'Age4Main_x'} in the result is deprecated and will raise a MergeError in a future version.
The warning is caused by the merge suffixes; see the code at
cellij/cellij/core/_factormodel.py
Line 251 in a0385d7
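One way to avoid suffix collisions altogether, sketched under the assumption that the per-modality .obs tables are pandas DataFrames indexed by sample: prefix every column with its modality name before joining, mirroring the modality:column naming that MuData uses for its global .obs. Duplicate suffixed names like the ones in the warning typically appear when several pairwise merges reuse the default _x/_y suffixes. The helper name stack_obs_with_prefix is hypothetical.

```python
import pandas as pd


def stack_obs_with_prefix(obs_by_modality: dict) -> pd.DataFrame:
    """Hypothetical helper: join per-modality metadata with modality-prefixed columns."""
    prefixed = [obs.add_prefix(f"{mod}:") for mod, obs in obs_by_modality.items()]
    # Outer join on the sample index; unique prefixes mean no merge suffixes are needed.
    return pd.concat(prefixed, axis=1, join="outer")
```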
Convert the preprocessed and cleaned MuData into a torch.utils.data.Dataset wrapped in a torch.utils.data.DataLoader to facilitate training, e.g. when introducing mini-batching for SVI. Keep in mind the sample- and feature-wise metadata stored in the .obs and .var fields.
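A minimal sketch of such a Dataset, assuming each modality is already a dense (n_samples, n_features) array with rows aligned by sample; handling missing samples per modality is out of scope here. MultiViewDataset is a hypothetical name. Each item also returns the sample index so that the matching .obs rows can be looked up per batch.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class MultiViewDataset(Dataset):
    """Hypothetical Dataset over samples shared by several modalities."""

    def __init__(self, views: dict):
        # views: modality name -> (n_samples, n_features_m) array, rows aligned by sample
        sizes = {v.shape[0] for v in views.values()}
        if len(sizes) != 1:
            raise ValueError("all modalities must have the same number of samples")
        self.views = {k: torch.as_tensor(v, dtype=torch.float32) for k, v in views.items()}
        self.n_samples = sizes.pop()

    def __len__(self):
        return self.n_samples

    def __getitem__(self, idx):
        # The default collate_fn turns this into (index tensor, dict of stacked tensors).
        return idx, {k: v[idx] for k, v in self.views.items()}


dataset = MultiViewDataset({"rna": np.zeros((10, 4)), "prot": np.ones((10, 2))})
loader = DataLoader(dataset, batch_size=4, shuffle=False)
```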
Implement a version of MOFA with structured sparsity in the factor loadings.
The CellijModel class describes the generative model and hence implements Pyro's model() function.
The CellijGuide class describes the variational distribution and hence implements Pyro's guide() function.
Make use of DataLoaders to implement efficient minibatch training that can handle larger datasets.
Everybody:
Tim:
Arber:
Martin:
Not assigned (for grabs):
Focusing on two main objectives: the reconstruction loss (RMSE, R2) and the recovery of structured sparsity (precision, recall, F1).
There is less boilerplate and the tests are cleaner, among other features like support for unittest-style tests, tox, etc.
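A minimal NumPy sketch of these metrics. Two choices here are assumptions: R2 uses a single global mean over all matrix entries, and an entry of the loadings counts as "active" when its absolute value exceeds a hypothetical threshold parameter.

```python
import numpy as np


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    # Global-mean R2 over all entries (per-feature R2 would be another option).
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def sparsity_scores(w_true, w_pred, threshold=1e-3):
    """Precision, recall, and F1 for recovering the non-zero pattern of the loadings."""
    true_nz = np.abs(w_true) > threshold
    pred_nz = np.abs(w_pred) > threshold
    tp = np.sum(true_nz & pred_nz)
    precision = tp / max(np.sum(pred_nz), 1)
    recall = tp / max(np.sum(true_nz), 1)
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return float(precision), float(recall), float(f1)
```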
The following boxes should be checked for an MVP:
Each checkbox will have its own issue. This issue serves solely as an overview; feel free to edit it according to your thoughts.
Priority High
get_factors() and get_weights() work for all sparsity priors (currently not the case due to sampling procedures) (merge) @timtreis
Priority Medium
Priority Low
Given different likelihoods, we should provide the correct conjugate prior, see here.
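A minimal sketch of what "the correct conjugate prior" means in practice, since the linked reference is not preserved here. The lookup table covers a few textbook pairs, and gamma_poisson_posterior (a hypothetical helper) shows the closed-form update for a Poisson rate under a Gamma(alpha, beta) prior in the rate parametrization: alpha' = alpha + sum(x), beta' = beta + n.

```python
import numpy as np

# Textbook likelihood -> conjugate prior pairs (parameter in brackets).
CONJUGATE_PRIORS = {
    "normal (mean, known variance)": "normal",
    "normal (precision, known mean)": "gamma",
    "poisson (rate)": "gamma",
    "bernoulli (probability)": "beta",
}


def gamma_poisson_posterior(alpha, beta, counts):
    """Posterior Gamma(alpha', beta') of a Poisson rate under a Gamma(alpha, beta) prior."""
    counts = np.asarray(counts)
    return alpha + counts.sum(), beta + counts.size
```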
To get this up and running, we need to allow multiple groups in the factor analysis and also allow sparsity priors for these groups.
Update the interface of the base FactorModel. Make sure to accommodate both beginner and experienced users.
Simplest use case:
model = FactorModel(n_factors)
model.add_data(...)
# implicitly create model with normal priors and normal likelihoods...
model.fit()
Advanced use case:
model = FactorModel(n_factors)
model.add_data(...)
model.set_data_options(...)
model.set_model_options(...)
model.set_training_options(...)
model.fit()
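A minimal sketch of how the two use cases could share one class, assuming each set_*_options method overrides a dict of defaults. FactorModelStub is a hypothetical stub, not cellij's actual API, and all default values except the normal prior/likelihood mentioned above are placeholders. Unknown option names raise an error so typos fail loudly.

```python
from dataclasses import dataclass, field


def _update(options: dict, overrides: dict) -> None:
    # Reject unknown keys so typos fail loudly instead of being silently ignored.
    unknown = set(overrides) - set(options)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    options.update(overrides)


@dataclass
class FactorModelStub:
    """Hypothetical stub of the proposed interface; default values are placeholders."""

    n_factors: int
    data: object = None
    data_options: dict = field(default_factory=lambda: {"center": True})
    model_options: dict = field(default_factory=lambda: {"prior": "normal", "likelihood": "normal"})
    training_options: dict = field(default_factory=lambda: {"epochs": 1000, "lr": 0.01})

    def add_data(self, data):
        self.data = data

    def set_data_options(self, **kwargs):
        _update(self.data_options, kwargs)

    def set_model_options(self, **kwargs):
        _update(self.model_options, kwargs)

    def set_training_options(self, **kwargs):
        _update(self.training_options, kwargs)

    def fit(self):
        if self.data is None:
            raise ValueError("call add_data() before fit()")
        # A real implementation would build and train the model here.
        return {**self.data_options, **self.model_options, **self.training_options}
```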
Probably related to #20: Implement the horseshoe prior as a standalone Pyro distribution.
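For reference, the horseshoe prior's generative process can be sketched in NumPy: w_ij ~ Normal(0, (lambda_ij * tau)^2) with local scales lambda_ij ~ HalfCauchy(1) and a global scale tau ~ HalfCauchy(tau0). Using a single global tau (rather than, say, one per factor) and the parameter name tau0 are assumptions. This samples the prior and is not a Pyro Distribution class.

```python
import numpy as np


def sample_horseshoe_weights(n_factors, n_features, tau0=1.0, rng=None):
    rng = np.random.default_rng(rng)
    tau = tau0 * np.abs(rng.standard_cauchy())                        # global shrinkage
    lam = np.abs(rng.standard_cauchy(size=(n_factors, n_features)))  # local shrinkage
    # Heavy-tailed local scales let a few weights escape strong global shrinkage.
    return rng.normal(0.0, 1.0, size=(n_factors, n_features)) * lam * tau
```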
We need code to create synthetic ground truth data to benchmark the framework.
Options should include:
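The original option list is not preserved here. As a starting point, a minimal generator could expose the number of samples, views, factors, element-wise sparsity of the loadings, and noise level; these parameters and the name make_synthetic are assumptions. It returns the ground truth Z and W alongside the data so that reconstruction and sparsity recovery can be scored.

```python
import numpy as np


def make_synthetic(n_samples=100, features_per_view=(50, 30), n_factors=4,
                   sparsity=0.8, noise_std=0.1, seed=0):
    """Hypothetical multi-view generator: Y_m = Z @ W_m + noise, with sparse W_m."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_samples, n_factors))
    views, weights = [], []
    for n_feat in features_per_view:
        w = rng.normal(size=(n_factors, n_feat))
        w[rng.random(size=w.shape) < sparsity] = 0.0  # zero out a fraction of loadings
        views.append(z @ w + noise_std * rng.normal(size=(n_samples, n_feat)))
        weights.append(w)
    return views, z, weights
```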
Currently, the user can define the device on which to train the models, but the device is not set:
cellij/cellij/core/_factormodel.py
Line 86 in fafb6a1
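A minimal sketch of actually applying the requested device, assuming the fix is to resolve it once and then create or move every tensor and module onto it. resolve_device is a hypothetical helper that falls back to CPU when CUDA is unavailable; in Pyro, parameters created with pyro.param live on the device of their initial tensor.

```python
import torch


def resolve_device(requested: str = "cuda") -> torch.device:
    """Return the requested device, falling back to CPU if CUDA is unavailable."""
    if requested.startswith("cuda") and not torch.cuda.is_available():
        return torch.device("cpu")
    return torch.device(requested)


device = resolve_device("cuda")
module = torch.nn.Linear(4, 2).to(device)  # parameters moved to `device`
x = torch.randn(8, 4, device=device)       # data created directly on `device`
out = module(x)
```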
Resources: