
mofapy2's Introduction

Multi-Omics Factor Analysis


MOFA is a factor analysis model that provides a general framework for the integration of multi-omic data sets in an unsupervised fashion.
This repository contains the source code of the mofapy2 Python library.

Please visit our website for details, tutorials, and much more.

Installation

Install the stable version from the Python Package Index:

pip install mofapy2

Or install the latest development version from the repository:

pip install git+https://github.com/bioFAM/mofapy2@dev --force-reinstall --no-deps
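
For a quick sanity check after installation, here is a minimal usage sketch of the Python API. It simply mirrors the entry_point calls that appear in the issues below (with default model options assumed); see the tutorials on the website for the full workflow.

import numpy as np
from mofapy2.run.entry_point import entry_point

# Toy example: one group, one view with 100 samples and 50 features
data = np.random.randn(100, 50)

ent = entry_point()
ent.set_data_options(scale_groups=False, scale_views=False)
ent.set_data_matrix([[data]], likelihoods=["gaussian"])
ent.set_model_options(factors=5)
ent.set_train_options(iter=100, convergence_mode="fast", seed=1)

ent.build()
ent.run()
ent.save(outfile="model.hdf5")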

mofapy2's People

Contributors

bv2, cheny36, damienarnol, dawe, gtca, martinrohbeck, rargelaguet, tvetterli-fn


mofapy2's Issues

`AttributeError: Module 'scipy' has no attribute 'shape'`

Dear authors,

I encountered this error when running mofapy2 (v0.7.0) via muon.

It reached this point:

- Automatic Relevance Determination prior on the factors: True
- Automatic Relevance Determination prior on the weights: True
- Spike-and-slab prior on the factors: False
- Spike-and-slab prior on the weights: True
Likelihoods:
- View 0 (ADT): gaussian
- View 1 (RNA): gaussian



GPU mode is activated

But then the error kicks in.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/scipy/__init__.py:137, in __getattr__(name)
    136 try:
--> 137     return globals()[name]
    138 except KeyError:

KeyError: 'shape'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Cell In[84], line 1
----> 1 mu.tl.mofa(mdata, gpu_mode=True, use_var="highly_variable")

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/muon/_core/tools.py:586, in mofa(data, groups_label, use_raw, use_layer, use_var, use_obs, likelihoods, n_factors, scale_views, scale_groups, center_groups, ard_weights, ard_factors, spikeslab_weights, spikeslab_factors, n_iterations, convergence_mode, use_float32, gpu_mode, gpu_device, svi_mode, svi_batch_size, svi_learning_rate, svi_forgetting_rate, svi_start_stochastic, smooth_covariate, smooth_warping, smooth_kwargs, save_parameters, save_data, save_metadata, seed, outfile, expectations, save_interrupted, verbose, quiet, copy)
    570     ent.set_smooth_options(
    571         scale_cov=smooth_kwargs["scale_cov"],
    572         start_opt=smooth_kwargs["start_opt"],
   (...)
    582         frac_inducing=smooth_kwargs["frac_inducing"],
    583     )
    585 logging.info(f"[{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}] Building the model...")
--> 586 ent.build()
    587 logging.info(f"[{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}] Running the model...")
    588 ent.run()

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/mofapy2/run/entry_point.py:1405, in entry_point.build(self)
   1397 else:
   1398     tmp = buildBiofam(
   1399         self.data,
   1400         self.dimensionalities,
   (...)
   1403         self.train_opts,
   1404     )
-> 1405 tmp.main()
   1407 # Create BayesNet class
   1408 if self.train_opts["stochastic"]:

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/mofapy2/build_model/build_model.py:56, in buildBiofam.main(self)
     47 self.init_model = initModel(
     48     dim=self.dim,
     49     data=self.data,
   (...)
     52     seed=self.train_opts["seed"],
     53 )
     55 # Build all nodes
---> 56 self.build_nodes()
     58 # Define markov blankets
     59 self.createMarkovBlankets()

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/mofapy2/build_model/build_model.py:65, in buildBiofam.build_nodes(self)
     62 """Method to build all nodes"""
     64 # Build general nodes
---> 65 self.build_Z()
     66 self.build_W()
     67 self.build_Tau()

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/mofapy2/build_model/build_model.py:104, in buildBiofam.build_Z(self)
     95     self.init_model.initSZ(
     96         qmean_T1="pca",
     97         Y=self.data,
     98         impute=True,
     99         weight_views=self.train_opts["weight_views"],
    100     )
    101 else:
    102     # self.init_model.initZ(qmean=0)
    103     # self.init_model.initZ(qmean="random")
--> 104     self.init_model.initZ(
    105         qmean="pca",
    106         Y=self.data,
    107         impute=True,
    108         weight_views=self.train_opts["weight_views"],
    109     )

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/mofapy2/build_model/init_model.py:145, in initModel.initZ(self, pmean, pvar, qmean, qvar, qE, qE2, Y, impute, weight_views)
    142         exit()
    144 # Initialise the node
--> 145 self.nodes["Z"] = Z_Node(
    146     dim=(self.N, self.K),
    147     pmean=pmean,
    148     pvar=pvar,
    149     qmean=qmean,
    150     qvar=qvar,
    151     qE=qE,
    152     qE2=qE2,
    153     weight_views=weight_views,
    154 )

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/mofapy2/core/nodes/Z_nodes.py:20, in Z_Node.__init__(self, dim, pmean, pvar, qmean, qvar, qE, qE2, weight_views)
     17 def __init__(
     18     self, dim, pmean, pvar, qmean, qvar, qE=None, qE2=None, weight_views=False
     19 ):
---> 20     super().__init__(
     21         dim=dim, pmean=pmean, pvar=pvar, qmean=qmean, qvar=qvar, qE=qE, qE2=qE2
     22     )
     24     self.mini_batch = None
     25     self.factors_axis = 1

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/mofapy2/core/nodes/variational_nodes.py:148, in UnivariateGaussian_Unobserved_Variational_Node.__init__(self, dim, pmean, pvar, qmean, qvar, qE, qE2)
    146 Unobserved_Variational_Node.__init__(self, dim)
    147 # Initialise the P and Q distributions
--> 148 self.P = UnivariateGaussian(dim=dim, mean=pmean, var=pvar)
    149 self.Q = UnivariateGaussian(dim=dim, mean=qmean, var=qvar, E=qE, E2=qE2)

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/mofapy2/core/distributions/univariate_gaussian.py:46, in UnivariateGaussian.__init__(self, dim, mean, var, E, E2)
     43     self.to_float32()
     45 # Check that dimensionalities match
---> 46 self.CheckDimensionalities()

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/mofapy2/core/distributions/basic_distributions.py:68, in Distribution.CheckDimensionalities(self)
     66 """General method to do a sanity check on the dimensionalities"""
     67 # p_dim = set(map(s.shape, self.params.values()))
---> 68 e_dim = set(map(s.shape, self.expectations.values()))
     69 # assert len(p_dim) == 1, "Parameters have different dimensionalities"
     70 assert len(e_dim) == 1, "Expectations have different dimensionalities"

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/scipy/__init__.py:139, in __getattr__(name)
    137     return globals()[name]
    138 except KeyError:
--> 139     raise AttributeError(
    140         f"Module 'scipy' has no attribute '{name}'"
    141     )

Do you know how to solve this? My scipy version is 1.12.0. Thanks in advance.

Regards,
Mikhael
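
A likely cause, stated as an assumption: SciPy 1.12 removed the long-deprecated NumPy aliases it used to re-export (scipy.shape, scipy.where, ...), while the mofapy2 release above still calls them through the scipy module (see the s.shape call in basic_distributions.py in the trace). Until the library is updated, pinning SciPy below 1.12 is a possible workaround:

pip install "scipy<1.12"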

AttributeError: 'BayesNet' object has no attribute 'train_stats'

When running MOFA with muon, an error can occur where the model cannot find the training stats when it tries to save the results. This is outlined in this issue in the R repository (linking it here because it seems more relevant to this repository).

Output

Loaded view='rna' group='group1' with N=5412 samples and D=23519 features...
Loaded view='atac' group='group1' with N=5412 samples and D=135866 features...


Warning: 1 features(s) in view 0 have zero variance, consider removing them before training the model...

Model options:
- Automatic Relevance Determination prior on the factors: True
- Automatic Relevance Determination prior on the weights: True
- Spike-and-slab prior on the factors: False
- Spike-and-slab prior on the weights: True
Likelihoods:
- View 0 (rna): poisson
- View 1 (atac): bernoulli



GPU mode is activated



######################################
## Training the model with seed 1 ##
######################################


Attempting to save the model at the current iteration...
Saving model in /tmp/mofa_20240522-102240_interrupted.hdf5...
Note: the model to be saved is not trained.


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File ~/.local/lib/python3.8/site-packages/mofapy2/run/entry_point.py:26, in keyboardinterrupt_saver.<locals>.saver(self, *args, **kwargs)
     25 try:
---> 26     func(self, *args, **kwargs)
     27 # Internal methods will raise TypeError when interrupted

File ~/.local/lib/python3.8/site-packages/mofapy2/run/entry_point.py:1400, in entry_point.run(self)
   1399 # Train the model
-> 1400 train_model(self.model)

File ~/.local/lib/python3.8/site-packages/mofapy2/build_model/train_model.py:26, in train_model(model)
     24 print("\n")
---> 26 model.iterate()
     28 print("\n")

File ~/.local/lib/python3.8/site-packages/mofapy2/core/BayesNet.py:267, in BayesNet.iterate(self)
    266 convergence_token = 1
--> 267 elbo.iloc[0] = self.precompute()
    268 number_factors[0] = self.dim["K"]

File ~/.local/lib/python3.8/site-packages/mofapy2/core/BayesNet.py:225, in BayesNet.precompute(self)
    224 for n in self.nodes:
--> 225     self.nodes[n].precompute(self.options)
    227 # Precompute ELBO

File ~/.local/lib/python3.8/site-packages/mofapy2/core/nodes/multiview_nodes.py:115, in Multiview_Node.precompute(self, options)
    114 for m in self.activeM:
--> 115     self.nodes[m].precompute(options)

File ~/.local/lib/python3.8/site-packages/mofapy2/core/nodes/nongaussian_nodes.py:206, in Poisson_PseudoY.precompute(self, options)
    205 self.updateParameters()
--> 206 self.updateExpectations()

File ~/.local/lib/python3.8/site-packages/mofapy2/core/nodes/nongaussian_nodes.py:222, in Poisson_PseudoY.updateExpectations(self)
    218 tau = self.markov_blanket["Tau"].getValue()
    219 self.E = (
    220     self.params["zeta"]
    221     - sigmoid(self.params["zeta"])
--> 222     * (1 - self.obs / self.ratefn(self.params["zeta"]))
    223     / tau
    224 )
    225 self.E[self.mask] = 0.0

File cupy/_core/core.pyx:1697, in cupy._core.core._ndarray_base.__array_ufunc__()

File cupy/_core/_kernel.pyx:1283, in cupy._core._kernel.ufunc.__call__()

File cupy/_core/_kernel.pyx:159, in cupy._core._kernel._preprocess_args()

File cupy/_core/_kernel.pyx:145, in cupy._core._kernel._preprocess_arg()

TypeError: Unsupported type <class 'numpy.ndarray'>

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Cell In [4], line 1
----> 1 mu.tl.mofa(mdata, use_layer = "raw", gpu_mode=True)

File ~/.local/lib/python3.8/site-packages/muon/_core/tools.py:586, in mofa(data, groups_label, use_raw, use_layer, use_var, use_obs, likelihoods, n_factors, scale_views, scale_groups, center_groups, ard_weights, ard_factors, spikeslab_weights, spikeslab_factors, n_iterations, convergence_mode, use_float32, gpu_mode, gpu_device, svi_mode, svi_batch_size, svi_learning_rate, svi_forgetting_rate, svi_start_stochastic, smooth_covariate, smooth_warping, smooth_kwargs, save_parameters, save_data, save_metadata, seed, outfile, expectations, save_interrupted, verbose, quiet, copy)
    584 ent.build()
    585 logging.info(f"[{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}] Running the model...")
--> 586 ent.run()
    588 if (
    589     smooth_kwargs is not None
    590     and "new_values" in smooth_kwargs
    591     and smooth_kwargs["new_values"]
    592     and smooth_covariate
    593 ):
    594     logging.info(f"[{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}] Interpolating factors...")

File ~/.local/lib/python3.8/site-packages/mofapy2/run/entry_point.py:39, in keyboardinterrupt_saver.<locals>.saver(self, *args, **kwargs)
     34     else:
     35         tmp_file = os.path.join(
     36             "/tmp",
     37             "mofa_{}_interrupted.hdf5".format(strftime("%Y%m%d-%H%M%S")),
     38         )
---> 39     self.save(outfile=tmp_file)
     40     print(
     41         "Saved partially trained model in {}. Exiting now.".format(tmp_file)
     42     )
     43 else:

File ~/.local/lib/python3.8/site-packages/mofapy2/run/entry_point.py:1743, in entry_point.save(self, outfile, save_data, save_parameters, expectations)
   1740     tmp.saveSmoothOptions(self.smooth_opts)
   1742 # Save training statistics
-> 1743 tmp.saveTrainingStats()
   1745 # Save variance explained values
   1746 tmp.saveVarianceExplained()

File ~/.local/lib/python3.8/site-packages/mofapy2/build_model/save_model.py:741, in saveModel.saveTrainingStats(self)
    738 """Method to save the training statistics"""
    740 # Get training statistics
--> 741 stats = self.model.getTrainingStats()
    743 # Create HDF5 group
    744 stats_grp = self.hdf5.create_group("training_stats")

File ~/.local/lib/python3.8/site-packages/mofapy2/core/BayesNet.py:502, in BayesNet.getTrainingStats(self)
    500 def getTrainingStats(self):
    501     """Method to return training statistics"""
--> 502     return self.train_stats

AttributeError: 'BayesNet' object has no attribute 'train_stats'
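
The train_stats error is only the secondary failure here: the interrupt handler tries to save a model whose training never started. The primary failure is the cupy TypeError above, raised on the GPU code path of the non-Gaussian (poisson/bernoulli) pseudo-data nodes. As an untested workaround sketch, running the same call on the CPU avoids that code path:

mu.tl.mofa(mdata, use_layer="raw", gpu_mode=False)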

Pip freeze:

using Python: /app/software/Python/3.8.2-GCCcore-9.3.0/bin/python3
absl-py==1.2.0
alembic==1.8.1
anndata==0.9.2
asttokens==2.0.8
autopage==0.5.1
backcall==0.2.0
blosc2==2.0.0
cachetools==5.2.0
cliff==4.0.0
cmaes==0.8.2
cmd2==2.4.2
colorlog==6.7.0
contourpy==1.0.5
cupy-cuda12x==12.3.0
cycler==0.11.0
debugpy==1.6.3
entrypoints==0.4
executing==1.1.1
fastrlock==0.8.2
fcsparser==0.2.8
fonttools==4.37.4
greenlet==1.1.3.post0
h5py==3.7.0
igraph==0.10.8
importlib-resources==5.10.0
ipykernel==6.16.0
ipython==8.5.0
jedi==0.18.1
jupyter-client==8.6.1
jupyter-core==5.7.2
kiwisolver==1.4.4
leidenalg==0.10.1
lisa2==2.3.0
llvmlite==0.41.1
MACS2==2.2.7.1
Mako==1.2.3
Markdown==3.4.1
matplotlib==3.6.3
matplotlib-inline==0.1.6
mellon==1.4.1
mira-multiome==1.0.4
ml-dtypes==0.2.0
mofapy2==0.7.1
MOODS-python==1.9.4.1
msgpack==1.0.8
mudata==0.2.3
muon==0.1.6
natsort==8.2.0
nest-asyncio==1.5.6
networkx==2.8.7
numba==0.58.1
numexpr==2.8.6
numpy==1.24.4
opt-einsum==3.3.0
optuna==2.10.1
pandas==1.5.3
parso==0.8.3
patsy==0.5.3
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.2.0
platformdirs==4.2.2
prettytable==3.4.1
prompt-toolkit==3.0.31
protobuf==5.26.1
ptyprocess==0.7.0
pure-eval==0.2.2
py-cpuinfo==9.0.0
pyasn1-modules==0.2.8
pynndescent==0.5.7
pyperclip==1.8.2
pyro-api==0.1.2
pyro-ppl==1.8.2
pyzmq==24.0.1
requests-oauthlib==1.3.1
rsa==4.9
scanpy==1.8.2
scikit-learn==1.3.2
scipy==1.10.1
seaborn==0.12.0
sinfo==0.3.4
SQLAlchemy==1.4.42
stack-data==0.5.1
statsmodels==0.13.2
stdlib-list==0.8.0
stevedore==4.0.0
tables==3.8.0
tensorboard-plugin-wit==1.8.1
threadpoolctl==3.1.0
torch==1.12.1
tornado==6.2
tqdm==4.64.1
traitlets==5.14.3
typing-extensions==4.4.0
tzdata==2024.1
umap-learn==0.5.3
xgboost==2.0.3

Model with one latent factor less than specified

Hi,

It seems to me like the model is built with one factor less than specified in set_model_options():

from mofapy2.run.entry_point import entry_point

ent = entry_point()

ent.set_data_options(
    scale_groups = False, 
    scale_views = False
    )

ent.set_data_matrix([[data]], likelihoods = ["gaussian"])

ent.set_model_options(
    factors = 3, 
    spikeslab_weights = False, 
    ard_factors = False,
    ard_weights = False
    )

ent.set_train_options(
    iter = 1000, 
    convergence_mode = "slow", 
    startELBO = 1, 
    freqELBO = 1, 
    dropR2 = 0.000, 
    gpu_mode = False, 
    verbose = False, 
    seed = 1
    )

ent.build()
ent.run()

This should result in a model with 3 factors, but from the first iteration on I only get 2 factors:

######################################
## Training the model with seed 1 ##
######################################


ELBO before training: -160245.23 

Iteration 1: time=0.01, ELBO=-38881.59, deltaELBO=121363.642 (75.73619616%), Factors=2
Iteration 2: time=0.03, ELBO=-35202.00, deltaELBO=3679.586 (2.29622203%), Factors=2
Iteration 3: time=0.02, ELBO=-34646.27, deltaELBO=555.731 (0.34680037%), Factors=2
...
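
As a debugging aid, the number of active factors can be inspected on the built model before and after training. This is a sketch that assumes entry_point stores the BayesNet as ent.model and that the factor count lives in dim["K"], as suggested by the tracebacks elsewhere on this page:

ent.build()
print(ent.model.dim["K"])  # factors right after building (expected: 3)
ent.run()
print(ent.model.dim["K"])  # factors remaining after training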

Model fails to train with nongaussian likelihood

Hello,

I have been able to run mofa using GPU mode only when using the Gaussian likelihood. If I try to use Poisson, I consistently get an error similar to what others have raised in different issues:

Attempting to save the model at the current iteration...
Saving model in /tmp/mofa_20240725-203015_interrupted.hdf5...
Note: the model to be saved is not trained.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/mofapy2/run/entry_point.py", line 49, in saver
    func(self, *args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/mofapy2/run/entry_point.py", line 1020, in run
    train_model(self.model)
  File "/usr/local/lib/python3.11/dist-packages/mofapy2/build_model/train_model.py", line 28, in train_model
    model.iterate()
  File "/usr/local/lib/python3.11/dist-packages/mofapy2/core/BayesNet.py", line 224, in iterate
    elbo.iloc[0] = self.precompute()
                   ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/mofapy2/core/BayesNet.py", line 193, in precompute
    self.nodes[n].precompute(self.options)
  File "/usr/local/lib/python3.11/dist-packages/mofapy2/core/nodes/multiview_nodes.py", line 105, in precompute
    self.nodes[m].precompute(options)
  File "/usr/local/lib/python3.11/dist-packages/mofapy2/core/nodes/nongaussian_nodes.py", line 189, in precompute
    self.updateExpectations()
  File "/usr/local/lib/python3.11/dist-packages/mofapy2/core/nodes/nongaussian_nodes.py", line 202, in updateExpectations
    self.E = self.params["zeta"] - sigmoid(self.params["zeta"]) * (1 - self.obs / self.ratefn(self.params["zeta"])) / tau
  File "cupy/_core/core.pyx", line 1697, in cupy._core.core._ndarray_base.__array_ufunc__
  File "cupy/_core/_kernel.pyx", line 1286, in cupy._core._kernel.ufunc.__call__
  File "cupy/_core/_kernel.pyx", line 159, in cupy._core._kernel._preprocess_args
  File "cupy/_core/_kernel.pyx", line 145, in cupy._core._kernel._preprocess_arg
TypeError: Unsupported type <class 'numpy.ndarray'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_mofa.py", line 57, in <module>
    ent.run()
  File "/usr/local/lib/python3.11/dist-packages/mofapy2/run/entry_point.py", line 59, in saver
    self.save(outfile=tmp_file)
  File "/usr/local/lib/python3.11/dist-packages/mofapy2/run/entry_point.py", line 1273, in save
    tmp.saveTrainingStats()
  File "/usr/local/lib/python3.11/dist-packages/mofapy2/build_model/save_model.py", line 533, in saveTrainingStats
    stats = self.model.getTrainingStats()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/mofapy2/core/BayesNet.py", line 387, in getTrainingStats
    return self.train_stats
           ^^^^^^^^^^^^^^^^
AttributeError: 'BayesNet' object has no attribute 'train_stats'
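
For context, the TypeError above is cupy's generic complaint about mixing a device (cupy) array with a host (numpy) array in a single ufunc. A minimal standalone reproduction, purely as an illustration and not the library's code:

import numpy as np
import cupy as cp

zeta = cp.zeros((4, 3))   # device array, as on the GPU code path
obs = np.zeros((4, 3))    # host array, as the pseudo-data observations appear to be

try:
    _ = zeta * obs        # cupy refuses to mix device and host operands
except TypeError as err:
    print(err)            # -> Unsupported type <class 'numpy.ndarray'>

_ = zeta * cp.asarray(obs)  # moving the observations to the device first works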

Change default GPU device

Hey there,

Since many machines have multiple GPUs these days, it would be great to be able to specify the GPU device when running MOFA. Currently the code always runs on GPU device 0 when the user sets gpu_mode=True.

The easiest way to change this is by running

import cupy as cp
cp.cuda.Device(gpu_device).use()

I'd say it is most intuitive to provide an extra argument, e.g. gpu_device: int, to the set_train_options() function, and to execute the command above right after the cupy import.

If you think this is helpful, I'm happy to provide a PR; if you prefer another way to integrate this (e.g. using environment variables), let me know.
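
Until such an argument exists in the installed version, a workaround sketch along the lines of the snippet above is to select the device before building the model. Whether this is sufficient depends on where mofapy2 initialises cupy, so treat it as an assumption; the device index 1 below is just an example. (The muon mofa() signature quoted earlier on this page already lists a gpu_device parameter, so newer versions may expose this directly.)

import cupy as cp

cp.cuda.Device(1).use()  # make GPU 1 the current device for subsequent cupy allocations

ent.set_train_options(gpu_mode=True)  # then proceed as usual
ent.build()
ent.run()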

not training

Hi, I have an issue with training:

######################################
## Training the model with seed 1 ##
######################################


Attempting to save the model at the current iteration...
Saving model in /mofa_train_interrupted.hdf5...
Note: the model to be saved is not trained.

followed by AttributeError: 'BayesNet' object has no attribute 'train_stats'.

Prediction on held-out data

Is it possible to make predictions using a fitted MEFISTO model on held-out test data that were not used in training?

AnnData version 0.9

Can you bump the requirements and test against AnnData 0.9? The 0.8 requirement prevents installation alongside recent AnnData versions.

Module 'scipy' has no attribute 'where'

Dear all,

I was trying to run a MOFA model in python but I faced the following error during the training:

Module 'scipy' has no attribute 'where'

My data consists of two views with a different number of patients per view.

Below is the code:

D = [len(df_blood_RNA_melted) + len(df_biopsy_RNA_melted),len(df_blood_EPICs_melted) + len(df_biopsy_EPICs_melted)]
M = len(D)
K = 5
N = [len(df[df["group"] == "Mild"]),len(df[df["group"] == "Moderate"]),len(df[df["group"] == "Severe"])]
G = len(N)

ent.set_data_df(df, likelihoods = ["gaussian","gaussian"])

ent.set_model_options(
    factors = 10,
    spikeslab_weights = True,
    ard_weights = True,
    ard_factors = True
    )

ent.set_train_options(
    convergence_mode = "fast",
    dropR2 = 0.001,
    gpu_mode = True,
    seed = 123
    )

ent.build()

ent.run()

Thanks a lot,
Enrique
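
This looks like the same root cause as the scipy.shape issue above (again an assumption): SciPy 1.12 removed deprecated NumPy aliases such as scipy.where. A quick check before digging further:

import scipy
print(scipy.__version__)  # 1.12 or newer -> see the scipy<1.12 pin suggested above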

Crash depending on normalization

I have been experimenting some alternative ways to normalize the data, but, for some reason, a few of them crash the model when training. The crash snippet is below:

ELBO before training: -11480018.67 

Iteration 1: time=170.32, ELBO=-1483968.21, deltaELBO=9996050.457 (87.07346866%), Factors=9
Iteration 2: time=155.26, ELBO=-1402406.25, deltaELBO=81561.958 (0.71046886%), Factors=8
Iteration 3: time=140.18, ELBO=-1401338.20, deltaELBO=1068.051 (0.00930356%), Factors=7
Iteration 4: time=125.85, ELBO=-1400270.60, deltaELBO=1067.606 (0.00929969%), Factors=6
Iteration 5: time=112.09, ELBO=-1399202.99, deltaELBO=1067.605 (0.00929968%), Factors=5
Iteration 6: time=98.70, ELBO=-1398135.39, deltaELBO=1067.605 (0.00929968%), Factors=4
Iteration 7: time=84.30, ELBO=-1397067.78, deltaELBO=1067.604 (0.00929967%), Factors=3
Iteration 8: time=70.65, ELBO=-1396000.18, deltaELBO=1067.604 (0.00929967%), Factors=2
Iteration 9: time=54.47, ELBO=-1394932.58, deltaELBO=1067.603 (0.00929966%), Factors=1
All factors shut down, no structure found in the data.
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[32], line 1
----> 1 ent.run()

File ~/.venv/lib64/python3.9/site-packages/mofapy2/run/entry_point.py:57, in keyboardinterrupt_saver.<locals>.saver(self, *args, **kwargs)
     54 @wraps(func)
     55 def saver(self, *args, **kwargs):
     56     try:
---> 57         func(self, *args, **kwargs)
     58     # Internal methods will raise TypeError when interrupted
     59     except (KeyboardInterrupt, TypeError):

File ~/.venv/lib64/python3.9/site-packages/mofapy2/run/entry_point.py:1434, in entry_point.run(self)
   1431 self.model.setTrainOptions(self.train_opts)
   1433 # Train the model
-> 1434 train_model(self.model)

File ~/.venv/lib64/python3.9/site-packages/mofapy2/build_model/train_model.py:27, in train_model(model)
     24 print("#" * 38)
     25 print("\n")
---> 27 model.iterate()
     29 print("\n")
     30 print("#" * 23)
...
File ~/.venv/lib64/python3.9/site-packages/mofapy2/core/BayesNet.py:216
--> 216     exit()
    218 if return_idx:
    219     return drop

NameError: name 'exit' is not defined

This error occurs when I use ent.set_data_options(scale_views=True), but also when I normalize the data outside of the package. I realize the message may simply be reporting the result (all factors were dropped), but I am leaving the log here in case it helps you improve the error reporting.
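
The NameError itself may be worth fixing separately: exit() is injected by the interactive site module and is not guaranteed to exist in every runtime, so the portable call is sys.exit(). A hypothetical one-line sketch of what the BayesNet code could do at that point (not the current implementation):

import sys

# hypothetical replacement for the bare exit() call once all factors are dropped
sys.exit("All factors shut down, no structure found in the data.")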

Memory skyrockets

Hello! I have been following your tutorial and got it to work with my data, but only after first filtering the features; otherwise memory usage skyrockets. My dataset has a little over 1,000 samples but more than 50k features. On an instance with 124 GB of RAM I can only train the model if I keep the top 1,000 most variable features. Is there a way to run this on all the data? If not, is there a recommended way to run it in batches?

Thank you for your time!
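
If you are calling the model through muon, two knobs visible in the mu.tl.mofa() signature quoted earlier on this page may reduce the footprint. This is a sketch under assumed semantics (svi_batch_size as a fraction of samples), not a confirmed fix, and stochastic training mainly reduces per-iteration cost rather than the size of the loaded data:

mu.tl.mofa(
    mdata,
    use_var="highly_variable",  # restrict to pre-flagged variable features
    use_float32=True,           # store dense matrices in single precision
    svi_mode=True,              # stochastic variational inference on mini-batches
    svi_batch_size=0.1,         # assumed: fraction of samples per batch
)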

Faulty construction of ThetaZ node

Hi,

I was running MOFA2 (R package) on some simple test data with three views and two groups and wanted to try out the spike-and-slab prior on factors. So I set model_opts$spikeslab_factors <- TRUE and built a model as usual. However, I ran into a bizarre error in run_mofa:

Error: numpy.core._exceptions._UFuncNoLoopError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('float64'), dtype('<U6')) -> None

Following the Python stack trace, I pinpointed it to

self.init_model.initThetaZ(
    self.data_opts["samples_groups"],
    qa=initTheta_a,
    qb=initTheta_b,
    qE=initTheta_qE,
)

on the v0.7.0 branch. It still looks the same on master though.

Comparing with the call signature in question

def initThetaZ(self, pa=1.0, pb=1.0, qa=1.0, qb=1.0, qE=None):

I assume that this is old code that was not updated. To my understanding, self.data_opts["samples_groups"] is a list of strings but is being passed as pa, which is supposed to be a scalar.
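
For reference, a hedged sketch of what a corrected call might look like given the signature above; this is an assumption, since the intended group-wise behaviour may require a different fix upstream:

self.init_model.initThetaZ(
    qa=initTheta_a,
    qb=initTheta_b,
    qE=initTheta_qE,
)
# i.e. drop the positional samples_groups argument, which otherwise lands in pa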
