geomscale / dingo

A python library for metabolic networks sampling and analysis

License: GNU Lesser General Public License v3.0

Python 57.39% C++ 13.37% MATLAB 0.21% Cython 5.71% Jupyter Notebook 23.32%

sampling metabolic-network polytope random-walk systems-biology hacktoberfest

dingo's Introduction

dingo is a Python package that analyzes metabolic networks. It relies on high-dimensional sampling with Markov Chain Monte Carlo (MCMC) methods and fast optimization methods to analyze the possible states of a metabolic network. To perform MCMC sampling, dingo relies on the C++ library volesti, which provides several algorithms for sampling convex polytopes. dingo also implements two standard methods for analyzing the flux space of a metabolic network, namely Flux Balance Analysis (FBA) and Flux Variability Analysis (FVA).

dingo is part of the GeomScale project.

Installation

Note: Python version should be 3.8.x. You can check this by running the following command in your terminal:

python --version
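The same check can be scripted from Python. This is a minimal sketch: the helper name `is_supported_python` is hypothetical and not part of dingo, and the 3.8 requirement is taken from the note above.

```python
import sys

REQUIRED = (3, 8)  # (major, minor) required by the installation note


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter is a 3.8.x release."""
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    if not is_supported_python():
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Python 3.8.x is required, found {found}")
```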

If you have a different version of Python installed, you will need to install Python 3.8 and make it the default interpreter, for example with update-alternatives.

Note: If you are using GitHub Codespaces, set the Python version to 3.8.x first. Once your Python version is 3.8.x, you can follow the instructions below.

To load the submodules that dingo uses, run

git submodule update --init

You will need to download and unzip the Boost library:

wget -O boost_1_76_0.tar.bz2 https://boostorg.jfrog.io/artifactory/main/release/1.76.0/source/boost_1_76_0.tar.bz2
tar xjf boost_1_76_0.tar.bz2
rm boost_1_76_0.tar.bz2

You will also need to download and unzip the lpsolve library:

wget https://sourceforge.net/projects/lpsolve/files/lpsolve/5.5.2.11/lp_solve_5.5.2.11_source.tar.gz
tar xzvf lp_solve_5.5.2.11_source.tar.gz
rm lp_solve_5.5.2.11_source.tar.gz

Then, you need to install the dependencies for the PySPQR library; for Debian/Ubuntu Linux, run

sudo apt-get update -y
sudo apt-get install -y libsuitesparse-dev

dingo uses Poetry to install its Python dependencies:

curl -sSL https://install.python-poetry.org | python3 - --version 1.3.2
poetry shell
poetry install

To exploit the fast implementations of dingo, you have to install the Gurobi solver. Run

pip3 install -i https://pypi.gurobi.com gurobipy

Then, you will need a license. For more information, we refer to the Gurobi download center.

Unit tests

Now, you can run the unit tests by the following commands:

python3 tests/fba.py
python3 tests/full_dimensional.py
python3 tests/max_ball.py
python3 tests/scaling.py
python3 tests/rounding.py
python3 tests/sampling.py

If you have installed Gurobi successfully, then run

python3 tests/fast_implementation_test.py

Tutorial

You can have a look at our Google Colab notebook on how to use dingo.

Documentation

It is quite simple to use dingo in your code. In general, dingo provides two classes:

MetabolicNetwork represents a metabolic network
PolytopeSampler can be used to sample from the flux space of a metabolic network or from a general convex polytope.

The following script shows how you could sample steady states of a metabolic network with dingo. To initialize a metabolic network object, provide the path to a JSON file such as those in the BiGG dataset, or to a .mat file (first use the MATLAB wrapper in the folder /ext_data to convert a standard .mat model, such as those in the BiGG dataset):

from dingo import MetabolicNetwork, PolytopeSampler

model = MetabolicNetwork.from_json('path/to/model_file.json')
sampler = PolytopeSampler(model)
steady_states = sampler.generate_steady_states()

dingo can also load a model given in .sbml format using the following command,

model = MetabolicNetwork.from_sbml('path/to/model_file.sbml')

The output variable steady_states is a numpy array that contains the steady states of the model column-wise. You can ask the sampler for stronger statistical guarantees on the sample,

steady_states = sampler.generate_steady_states(ess=2000, psrf = True)

The ess stands for the effective sample size (ESS) (default value is 1000) and the psrf is a flag to request an upper bound equal to 1.1 for the value of the potential scale reduction factor of each marginal flux (default option is False).
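To make the PSRF threshold concrete, here is a minimal sketch of the textbook Gelman-Rubin statistic for a single marginal, computed from several chains. It is an illustration only: the function `psrf` is hypothetical, and dingo (through volesti) may compute the statistic differently, for instance from segments of a single chain.

```python
import numpy as np


def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for one marginal.

    chains: array of shape (m, n), i.e. m chains with n draws each.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()      # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)   # B: variance of the chain means
    pooled = (n - 1) / n * within + between / n     # pooled variance estimate
    return float(np.sqrt(pooled / within))


rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000))               # four chains from the same target
stuck = mixed + 3.0 * np.arange(4)[:, None]      # chains stuck in different regions

print(psrf(mixed) < 1.1)
print(psrf(stuck) < 1.1)
```

A value close to 1 indicates that the chains agree with each other; values above the 1.1 bound suggest that more samples are needed.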

You could also ask for the parallel MMCS algorithm,

steady_states = sampler.generate_steady_states(ess=2000, psrf = True,
                                               parallel_mmcs = True, num_threads = 2)

The default option is to run the sequential Multiphase Monte Carlo Sampling (MMCS) algorithm.

Tip: After the first run of the MMCS algorithm, the polytope stored in the sampler object is usually more rounded than the initial one. Thus, the function generate_steady_states() becomes more efficient from run to run.

Rounding the polytope

dingo provides three methods to round a polytope: (i) Bring the polytope to John position by applying to it the transformation that maps the largest inscribed ellipsoid of the polytope to the unit ball, (ii) Bring the polytope to near-isotropic position by using uniform sampling with Billiard Walk, (iii) Apply to the polytope the transformation that maps the smallest enclosing ellipsoid of a uniform sample from the interior of the polytope to the unit ball.

from dingo import MetabolicNetwork, PolytopeSampler

model = MetabolicNetwork.from_json('path/to/model_file.json')
sampler = PolytopeSampler(model)
A, b, N, N_shift = sampler.get_polytope()

A_rounded, b_rounded, Tr, Tr_shift = sampler.round_polytope(A, b, method="john_position")
A_rounded, b_rounded, Tr, Tr_shift = sampler.round_polytope(A, b, method="isotropic_position")
A_rounded, b_rounded, Tr, Tr_shift = sampler.round_polytope(A, b, method="min_ellipsoid")

Then, to sample from the rounded polytope, call the following static method of the PolytopeSampler class,

samples = PolytopeSampler.sample_from_polytope(A_rounded, b_rounded)

Lastly, you can map the samples back to steady states,

from dingo import map_samples_to_steady_states

steady_states = map_samples_to_steady_states(samples, N, N_shift, Tr, Tr_shift)
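The idea behind this last step can be sketched with plain numpy. The sketch assumes that both steps are affine maps: the rounding is undone with x = Tr y + Tr_shift, and the result is lifted to the flux space with v = N x + N_shift. The helper `map_back` is hypothetical and is not dingo's implementation.

```python
import numpy as np


def map_back(samples, N, N_shift, Tr, Tr_shift):
    """Undo the rounding map, then lift the points to the flux space.

    samples: (d, k) points of the rounded polytope, one per column.
    """
    x = Tr @ samples + Tr_shift[:, None]   # back to the unrounded polytope
    return N @ x + N_shift[:, None]        # back to the flux space


# Toy stoichiometric matrix: one metabolite, three reactions.
S = np.array([[1.0, -1.0, -1.0]])
# Basis of the null space of S: rows of Vt beyond the rank of S.
_, _, Vt = np.linalg.svd(S)
N = Vt[1:].T                               # shape (3, 2)
N_shift = np.zeros(3)
Tr = np.array([[2.0, 0.0], [0.0, 0.5]])    # made-up rounding transformation
Tr_shift = np.array([0.1, -0.2])

samples = np.random.default_rng(1).uniform(-1.0, 1.0, size=(2, 5))
steady_states = map_back(samples, N, N_shift, Tr, Tr_shift)
```

Because the columns of N span the null space of S, every column of the result satisfies S v = 0, which is the steady-state condition.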

Other MCMC sampling methods

To use any other MCMC sampling method that dingo provides you can use the following piece of code:

sampler = PolytopeSampler(model)
steady_states = sampler.generate_steady_states_no_multiphase() #default parameters (method = 'billiard_walk', n=1000, burn_in=0, thinning=1)

The MCMC methods that dingo (through volesti library) provides are the following: (i) 'cdhr': Coordinate Directions Hit-and-Run, (ii) 'rdhr': Random Directions Hit-and-Run, (iii) 'billiard_walk', (iv) 'ball_walk', (v) 'dikin_walk', (vi) 'john_walk', (vii) 'vaidya_walk'.
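As an illustration of how one of these walks operates, the following is a minimal numpy sketch of Coordinate Directions Hit-and-Run on a bounded polytope {x : A x <= b}. It is written from the textbook description of the walk, not taken from volesti, and the function name `cdhr` is hypothetical.

```python
import numpy as np


def cdhr(A, b, x0, n_samples, rng):
    """Coordinate Directions Hit-and-Run on the polytope {x : A x <= b}.

    x0 must lie strictly inside the polytope, which must be bounded.
    Returns an array of shape (d, n_samples), one sample per column.
    """
    x = np.array(x0, dtype=float)
    d = x.size
    out = np.empty((d, n_samples))
    for k in range(n_samples):
        i = rng.integers(d)          # choose a coordinate direction e_i
        slack = b - A @ x            # remaining slack of every constraint
        coef = A[:, i]
        # The line x + t * e_i meets facet j at t = slack[j] / coef[j].
        pos, neg = coef > 0, coef < 0
        t_max = np.min(slack[pos] / coef[pos])
        t_min = np.max(slack[neg] / coef[neg])
        x[i] += rng.uniform(t_min, t_max)   # uniform point on the chord
        out[:, k] = x
    return out


# The square [-1, 1]^2 written as A x <= b.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
samples = cdhr(A, b, x0=np.zeros(2), n_samples=1000,
               rng=np.random.default_rng(0))
```

Each step only updates one coordinate, which is why this walk is cheap per step compared with walks that move along arbitrary directions.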

Fast and slow mode

If you have successfully installed the gurobi library, dingo uses the fast mode by default. To set a specific mode, use the following member functions,

sampler = PolytopeSampler(model)

#set fast mode to use gurobi library
sampler.set_fast_mode()
#set slow mode to use scipy functions
sampler.set_slow_mode()

Apply FBA and FVA methods

To apply the FVA and FBA methods, use the MetabolicNetwork class,

from dingo import MetabolicNetwork

model = MetabolicNetwork.from_json('path/to/model_file.json')
fva_output = model.fva()

min_fluxes = fva_output[0]
max_fluxes = fva_output[1]
max_biomass_flux_vector = fva_output[2]
max_biomass_objective = fva_output[3]

The output of the FVA method is a tuple that contains numpy arrays. The vectors min_fluxes and max_fluxes contain the minimum and the maximum values of each flux. The vector max_biomass_flux_vector is the optimal flux vector according to the biomass objective function and max_biomass_objective is the value of that optimal solution.

To apply the FBA method,

fba_output = model.fba()

max_biomass_flux_vector = fba_output[0]
max_biomass_objective = fba_output[1]

The output vectors are the same as in the previous example.

Set the restriction in the flux space

FVA and FBA restrict the flux space to the set of flux vectors whose objective value equals the optimal value of the objective function. dingo also offers a more relaxed option, where you ask for flux vectors whose objective value is at least a given percentage of the optimal value,

model.set_opt_percentage(90)
fva_output = model.fva()

# the same restriction in the flux space holds for the sampler
sampler = PolytopeSampler(model)
steady_states = sampler.generate_steady_states()

The default percentage is 100%.
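What this restriction means can be shown on a toy network with one metabolite and three reactions. The sketch below uses scipy.optimize.linprog as a stand-in solver; the functions `toy_fba` and `toy_fva` are hypothetical and only mirror the idea, not dingo's implementation.

```python
import numpy as np
from scipy.optimize import linprog


def toy_fba(S, lb, ub, c):
    """Maximize c @ v subject to S @ v = 0 and lb <= v <= ub."""
    res = linprog(-c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=list(zip(lb, ub)), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x, -res.fun


def toy_fva(S, lb, ub, c, opt_percentage=100.0):
    """Flux ranges over the vectors reaching opt_percentage of the optimum."""
    _, optimum = toy_fba(S, lb, ub, c)
    # c @ v >= p/100 * optimum, rewritten in the A_ub @ v <= b_ub form.
    A_ub = -c.reshape(1, -1)
    b_ub = np.array([-(opt_percentage / 100.0) * optimum])
    n = S.shape[1]
    min_fluxes, max_fluxes = np.empty(n), np.empty(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0
        for sign, out in ((1.0, min_fluxes), (-1.0, max_fluxes)):
            res = linprog(sign * e, A_ub=A_ub, b_ub=b_ub, A_eq=S,
                          b_eq=np.zeros(S.shape[0]),
                          bounds=list(zip(lb, ub)), method="highs")
            if res.status != 0:
                raise RuntimeError(res.message)
            out[i] = sign * res.fun
    return min_fluxes, max_fluxes


# Reactions: uptake (-> A), biomass (A ->), by-product (A ->).
S = np.array([[1.0, -1.0, -1.0]])
lb = np.zeros(3)
ub = np.array([10.0, 1000.0, 1000.0])
c = np.array([0.0, 1.0, 0.0])          # maximize the biomass flux

flux, optimum = toy_fba(S, lb, ub, c)
min_fluxes, max_fluxes = toy_fva(S, lb, ub, c, opt_percentage=90)
```

With the percentage set to 90, the biomass flux may drop to 90% of its optimum, so the by-product reaction is allowed a nonzero flux that it could not carry at 100%.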

Change the objective function

You could also set an alternative objective function. For example, to maximize the 1st reaction of the model,

import numpy as np

n = model.num_of_reactions()
obj_fun = np.zeros(n)
obj_fun[0] = 1
model.objective_function(obj_fun)

# apply FVA using the new objective function
fva_output = model.fva()
# sample from the flux space by restricting
# the fluxes according to the new objective function
sampler = PolytopeSampler(model)
steady_states = sampler.generate_steady_states()

Plot flux marginals

The generated steady states can be used to estimate the marginal density function of each flux. You can plot the histogram using the samples,

from dingo import plot_histogram

model = MetabolicNetwork.from_json('path/to/e_coli_core.json')
sampler = PolytopeSampler(model)
steady_states = sampler.generate_steady_states(ess = 3000)

# plot the histogram for the 14th reaction in e-coli (ACONTa)
reactions = model.reactions
plot_histogram(
        steady_states[13],
        reactions[13],
        n_bins = 60,
        )

The default number of bins is 60. dingo uses the package matplotlib for plotting.
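The indexing used above follows from the column-wise layout of the samples: each row of steady_states holds all sampled values of one reaction. The following sketch shows this on synthetic data with numpy only; the array is a made-up stand-in for real sampler output.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in: 20 reactions (rows) and 3000 samples (columns).
steady_states = rng.normal(loc=np.arange(20)[:, None], scale=1.0,
                           size=(20, 3000))

flux_14 = steady_states[13]              # all sampled values of the 14th reaction
counts, edges = np.histogram(flux_14, bins=60, density=True)

mean_flux = steady_states.mean(axis=1)   # one mean value per reaction
```

With density=True the bar heights integrate to one, so the histogram is an estimate of the marginal density of that flux.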

Plot a copula between two fluxes

The generated steady states can be used to estimate and plot the copula between two fluxes. You can plot the copula using the samples,

from dingo import plot_copula

model = MetabolicNetwork.from_json('path/to/e_coli_core.json')
sampler = PolytopeSampler(model)
steady_states = sampler.generate_steady_states(ess = 3000)

# plot the copula between the 13th (PPC) and the 14th (ACONTa) reaction in e-coli
reactions = model.reactions

data_flux2=[steady_states[12],reactions[12]]
data_flux1=[steady_states[13],reactions[13]]

plot_copula(data_flux1, data_flux2, n=10)

The default number of cells is 5x5=25. dingo uses the package plotly for plotting.
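One common way to estimate a copula on an n x n grid is to replace each flux by its normalized rank and count the samples per cell. The sketch below follows that recipe with numpy; the function `empirical_copula` is hypothetical, and plot_copula may compute its estimate differently.

```python
import numpy as np


def empirical_copula(flux1, flux2, n=5):
    """Estimate the copula of two fluxes on an n x n grid of cells.

    Each flux is replaced by its normalized rank, so both marginals become
    uniform on [0, 1]. The result holds the probability mass of each cell.
    """
    flux1 = np.asarray(flux1, dtype=float)
    flux2 = np.asarray(flux2, dtype=float)
    size = flux1.size
    u = (np.argsort(np.argsort(flux1)) + 0.5) / size
    v = (np.argsort(np.argsort(flux2)) + 0.5) / size
    cells, _, _ = np.histogram2d(u, v, bins=n, range=[[0, 1], [0, 1]])
    return cells / size


rng = np.random.default_rng(0)
x = rng.normal(size=2000)
independent = empirical_copula(x, rng.normal(size=2000))
dependent = empirical_copula(x, 2.0 * x + 1.0)
```

For independent fluxes the mass spreads roughly evenly over the cells, while for strongly positively dependent fluxes it concentrates along the diagonal.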

dingo's Issues

Redundant methods in MetabolicNetwork

Starting from here https://github.com/GeomScale/dingo/blob/develop/dingo/MetabolicNetwork.py#L226 it seems that there are a number of methods that are not used anywhere in the code:

medium(self, medium: Dict[str, float])
set_active_bound(reaction: str, reac_index: int, bound: float)
shut_down_reaction(self, index_val)

Is this intentional? Should we remove them, or are they going to be used in the future?

Read `.xml` models

To address the challenges that come along with the metabolic network reconstruction process,
the metabolic modeling community has adopted the Systems Biology Markup Language (SBML)
to a great extent.

Therefore, most metabolic models are in a .xml format and this is the reason that supporting
this format would benefit dingo the most.

To this end, dingo could make use of the libsbml library via its Python interface.

Modify model's medium

Modify both MetabolicNetwork and PolytopeSampler classes to change the bounds of the medium reactions

non-uniqueness issue in dynamic FBA

In dynamic FBA (dFBA) [doi: 10.1016/S0006-3495(02)73903-9] we have an issue when going from one step to the next, because FBA solutions are not unique. The DFBAlab approach addresses this challenge using lexicographic LP [doi: 10.1186/s12859-014-0409-8].

In dingo we could give it a shot with a dFBA module where the non-uniqueness challenge is addressed by sampling at each cycle and picking the sample that is closest to the flux distribution of the previous cycle.
The boundaries and the biomass are then updated as in all the dFBA implementations.
For the first cycle, we could get the mean of the flux samples.
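The selection step of this proposal can be sketched with numpy. Both helper names are hypothetical, the samples are assumed to be stored column-wise, and Euclidean distance is an assumption, since the issue does not fix a metric.

```python
import numpy as np


def pick_closest_sample(samples, previous_flux):
    """Return the column of `samples` nearest (Euclidean) to `previous_flux`."""
    samples = np.asarray(samples, dtype=float)
    previous_flux = np.asarray(previous_flux, dtype=float)
    distances = np.linalg.norm(samples - previous_flux[:, None], axis=0)
    return samples[:, np.argmin(distances)]


def first_cycle_flux(samples):
    """Mean of the flux samples, used when there is no previous cycle."""
    return np.asarray(samples, dtype=float).mean(axis=1)


# Two reactions (rows), three samples (columns).
samples = np.array([[0.0, 1.0, 2.0],
                    [0.0, 1.0, 2.0]])
start = first_cycle_flux(samples)
chosen = pick_closest_sample(samples, previous_flux=[0.9, 1.2])
```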

You may also have a look at the COMETS papers for applications and alternative implementations of the dFBA

COMETS (2014): DOI: 10.1016/j.celrep.2014.03.070
COMETS (2021): DOI: 10.1038/s41596-021-00593-3

Error importing MetabolicNetwork class from dingo package in Google Colab (GSoC 2023)

Describe the bug
When running the cell provided in the README of the "https://github.com/GeomScale/dingo" project on Google Colab, an error message appears stating that the name 'MetabolicNetwork' cannot be imported from 'dingo' at an unknown location.

To Reproduce
Steps to reproduce the behavior:

Go to the following link: "https://colab.research.google.com/github/GeomScale/dingo/blob/develop/tutorials/dingo_tutorial.ipynb"
Run all cells until the point where it is written "from dingo import MetabolicNetwork".

Expected behavior
The code should be able to import the MetabolicNetwork class from the dingo package without any errors.

Screenshots
N/A

Desktop (please complete the following information):

OS: Windows
Browser: Google Chrome
Version: 110.0.5481.178 (Official Build) (64-bit)

Additional context
I am working on task 1 for applying to GSoC on this project and I need assistance in resolving this issue so that I can complete my task.

Read `.mat` models

Metabolic models are commonly available in .mat format.

At the moment, dingo can read a .mat model only after someone
runs matlab_model_wrapper.m on it.
That makes MATLAB a prerequisite in a number of cases.

It would benefit the dingo library to support .mat files without this step.

Add Contribution guidelines

Add a detailed description of how other developers can contribute to dingo.

Example: https://github.com/GeomScale/volume_approximation/blob/develop/CONTRIBUTING.md

include volume approximation

Is your feature request related to a problem? Please describe.
I am slicing a polytope into orthants and need to approximate the volume of each orthant to correct a likelihood computed for flux samples drawn from each orthant.

Describe the solution you'd like
It would be massively helpful if dingo would include volume computation

sampler does not converge with iLJ478 model

The following piece of code

import unittest
import os
import scipy
import numpy as np
from dingo import MetabolicNetwork, PolytopeSampler
from dingo.gurobi_based_implementations import fast_inner_ball


current_directory = os.getcwd()
input_file_json = current_directory + "/ext_data/iLJ478.json"

model = MetabolicNetwork.from_json(input_file_json)
model.set_fast_mode()

sampler = PolytopeSampler(model)
sampler.set_fast_mode()

steady_states = sampler.generate_steady_states()

returns

phase 1: number of correlated samples = 1200, effective sample size = 8, ratio of the maximum singilar value over the minimum singular value = 4470.89
phase 2: number of correlated samples = 1200, effective sample size = 8, ratio of the maximum singilar value over the minimum singular value = 2341.2
phase 3: number of correlated samples = 1200, effective sample size = 8, ratio of the maximum singilar value over the minimum singular value = 2240.94
phase 4: number of correlated samples = 1200, effective sample size = 8, ratio of the maximum singilar value over the minimum singular value = 738.163
phase 5: number of correlated samples = 1200, effective sample size = 9, ratio of the maximum singilar value over the minimum singular value = 30.622
phase 6: number of correlated samples = 1200, effective sample size = 38, ratio of the maximum singilar value over the minimum singular value = 17.2763
phase 7: number of correlated samples = 1200, effective sample size = 277, ratio of the maximum singilar value over the minimum singular value = 3.276
phase 8: number of correlated samples = 1200, effective sample size = 7, ratio of the maximum singilar value over the minimum singular value = 27.1693
phase 9: number of correlated samples = 1200, effective sample size = 9, ratio of the maximum singilar value over the minimum singular value = 612.636
phase 10: number of correlated samples = 1200, effective sample size = 9, ratio of the maximum singilar value over the minimum singular value = 8242.79
phase 11: number of correlated samples = 1200, effective sample size = 9, ratio of the maximum singilar value over the minimum singular value = 3117.12
phase 12: number of correlated samples = 1200, effective sample size = 8, ratio of the maximum singilar value over the minimum singular value = 28234.6
phase 13: number of correlated samples = 1200, effective sample size = 8, ratio of the maximum singilar value over the minimum singular value = 6980.29
Segmentation fault (core dumped)

iLJ478.json is downloaded from http://bigg.ucsd.edu/

Desktop:

OS: Ubuntu 20.04.2 LTS
python: Python 3.8.5 [GCC 9.3.0]

alternative suffix for sbml models

Some models might have a suffix of .sbml and not .xml

Add this option in the read_sbml_model() function.
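One possible way to handle both suffixes is a small lookup on the file extension. This is a sketch only: the mapping and the helper `detect_model_format` are hypothetical and do not describe the existing read_sbml_model() function.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to model format.
SUFFIX_TO_FORMAT = {
    ".json": "json",
    ".mat": "mat",
    ".xml": "sbml",
    ".sbml": "sbml",
}


def detect_model_format(path):
    """Return the model format implied by the file suffix of `path`."""
    suffix = Path(path).suffix.lower()
    try:
        return SUFFIX_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"unsupported model file suffix: {suffix!r}") from None
```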

Support ellipsoid sampling methods

Make ellipsoid walks such as Dikin, John, Vaidya available to dingo. Those methods are implemented in volesti.

Update README installation instructions to be more user friendly

I encountered difficulties setting up the Dingo project completely for the first time and discussed the issues on Gitter. After repeatedly configuring the entire project over the past few weeks, particularly while addressing issue #83, I am convinced that making minor modifications to the README file can greatly streamline the setup process, ultimately saving considerable time and effort for new users

Challenges Faced:

Version incompatibility due to Python and project-dependent libraries.
Unclear instructions on the Poetry version for setup.

Proposed Change in instructions:

Make sure the Python version is 3.8 (see workflows)
Install a specific version of poetry (see workflows)

curl -sSL https://install.python-poetry.org | python3 - --version 1.3.2

Install the dependencies for the PySPQR library using

sudo apt-get update -y
sudo apt-get install -y libsuitesparse-dev

Proposal for trial
For a quick and hassle-free Dingo project trial, consider using GitHub Codespaces. It provides an instant cloud-based development environment with limited free usage every month. Regardless of your operating system, you can set up and run experiments in under 10 minutes (depending on internet speed), thanks to its streamlined installation process. Additionally, consider using another tab to include screenshots of the installation steps and examples, which are mostly done.

Error with biomass index handling after parsing json files

The following piece of code

import unittest
import os
import scipy
import numpy as np
from dingo import MetabolicNetwork, PolytopeSampler
from dingo.gurobi_based_implementations import fast_inner_ball


current_directory = os.getcwd()
input_file_json = current_directory + "/ext_data/iAB_RBC_283.json"

model = MetabolicNetwork.from_json(input_file_json)
model.set_fast_mode()

sampler = PolytopeSampler(model)
sampler.set_fast_mode()

steady_states = sampler.generate_steady_states()

returns

  File "tests/parallel.py", line 20, in <module>
    model = MetabolicNetwork.from_json(input_file_json)
  File "/home/workspace/dingo/dingo/MetabolicNetwork.py", line 74, in from_json
    return cls(tuple_args)
  File "/home/workspace/dingo/dingo/MetabolicNetwork.py", line 56, in __init__
    or (self._biomass_index < 0)
TypeError: '<' not supported between instances of 'NoneType' and 'int'

iAB_RBC_283.json is downloaded from http://bigg.ucsd.edu/
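The traceback shows a None value being compared with an integer. A defensive check along the following lines would turn that into a clearer error. This is a sketch only: the helper `check_biomass_index` is hypothetical, and whether dingo should raise or fall back to a default objective is a design decision for the maintainers.

```python
def check_biomass_index(biomass_index, num_reactions):
    """Validate a biomass index before it is compared with integers."""
    if biomass_index is None:
        raise ValueError("the model does not define a biomass reaction")
    if not 0 <= biomass_index < num_reactions:
        raise ValueError(
            f"biomass index {biomass_index} is outside [0, {num_reactions})"
        )
    return biomass_index
```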

Desktop:

OS: Ubuntu 20.04.2 LTS
python: Python 3.8.5 [GCC 9.3.0]

fast_inner_ball computes negative radius

Describe the bug
The following code:

import unittest
import os
import scipy
import numpy as np
from dingo import MetabolicNetwork, PolytopeSampler
from dingo.gurobi_based_implementations import fast_inner_ball


current_directory = os.getcwd()
input_file_json = current_directory + "/ext_data/iSB619.json"

model = MetabolicNetwork.from_json(input_file_json)
model.set_fast_mode()

sampler = PolytopeSampler(model)
sampler.set_fast_mode()

steady_states = sampler.generate_steady_states()

returns

The radius calculated has negative value. The polytope is infeasible or something went wrong with the solver
Traceback (most recent call last):
  File "tests/parallel.py", line 27, in <module>
    steady_states = sampler.generate_steady_states()
  File "/home/vissarion/workspace/dingo/dingo/PolytopeSampler.py", line 136, in generate_steady_states
    self._A, self._b, Tr, Tr_shift, samples = P.fast_mmcs(
  File "dingo/volestipy.pyx", line 219, in volestipy.HPolytope.fast_mmcs
    temp_center, radius = fast_inner_ball(self._A, self._b)
TypeError: 'NoneType' object is not iterable

Desktop:

OS: Ubuntu 20.04.2 LTS
python: Python 3.8.5 [GCC 9.3.0]

Refactor mmcs algorithm

The mmcs (sampling + rounding) algorithm is mainly implemented in the bindings of dingo (mainly in https://github.com/GeomScale/dingo/blob/develop/dingo/bindings/bindings.cpp#L207)

A more natural approach would be to implement it in volesti and import it in dingo with a binding function. Alternatively, it could be another sampling method that can be called by apply_sampling https://github.com/GeomScale/dingo/blob/develop/dingo/bindings/bindings.cpp#L86

loopless sampling

Thermodynamic constraints are rather important in metabolic modeling.
If they are not considered, the flux samples may contain thermodynamically infeasible or implausible flux loops.

The loopless flux space, however, is non-convex.
Approaches such as the LooplessFluxSampler could be used to obtain samples that are more valid from a biological point of view.

Further literature:

Sampling on the flux space of multiple metabolic networks

Proposed roadmap (from @hariszaf in #18):

implement approach from cobra toolbox to build merged model
sample by asking biomass of species A to be 0 and biomass of species B to be max
sample with the vice versa conditions
sample asking for both biomass functions to be max

For Beginner

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Error in installation (incompatible versions of Python and numpy)

We tried to install dingo on MacBook Pro with M1 Max processor, running macOS Ventura 13.5.

Python version 3.11.4

We followed the installation steps adapted for the Mac.

brew install suite-sparse

poetry installation was successful.

curl -sSL https://install.python-poetry.org | python3 -

The errors came from the step poetry install.

dingo_error_m1_mac.txt

Error using fva after sampling

The following code

import dingo
dingo_model = dingo.MetabolicNetwork.from_sbml("ext_data/e_coli_core.xml")
sampler = dingo.PolytopeSampler(dingo_model)
samples = sampler.generate_steady_states()
fva_output = dingo_model.fva()

returns an error

Traceback (most recent call last):
  File "workspace/dingo_vfisikop/script.py", line 16, in <module>
    fva_output = dingo_model.fva()
  File "workspace/dingo_vfisikop/dingo/MetabolicNetwork.py", line 111, in fva
    return fast_fva(
  File "workspace/dingo_vfisikop/dingo/gurobi_based_implementations.py", line 162, in fast_fva
    max_biomass_flux_vector, max_biomass_objective = fast_fba(lb, ub, S, c)
  File "workspace/dingo_vfisikop/dingo/gurobi_based_implementations.py", line 112, in fast_fba
    optimum_sol.append(v[i].x)
UnboundLocalError: local variable 'v' referenced before assignment

sample asking for a min/max value of the objective function

One might need to sample not with a fixed value for the objective function, but by asking its value to be at least or at most $c$.

It would be really useful for the flux sampling function to provide this option.

Compartments

To model a community, we will need to track down the reactions and their corresponding metabolites, that
take place in the extracellular space.

It is my belief that it would benefit our efforts the most, to use models in the Systems Biology Markup Language (SBML)
format to this end.

This means both implementing a function for reading such files (the libSBML library will probably be of help for this) and
adding an attribute to our model, e.g. model.extracellular, to keep track of these metabolites.

@TolisChal @vissarion what are your thoughts?

Consider a docker image

To avoid installation issues across operating systems, consider coming up with a Docker image for dingo.

There is a Docker Image of gurobi that could probably be used as the base-image.

code coverage for dingo

Add code coverage for dingo.

One could use codecov https://about.codecov.io

`NameError` when using the `sample_from_polytope` function

For numpy arrays A and b, the sample_from_polytope function of dingo's PolytopeSampler class
raises a NameError: name 'self' is not defined.

Here is an example case that will lead you to this error:

from dingo import PolytopeSampler
import numpy as np

# Build a random matrix A and a random vector b to define a polytope
A = np.random.random([32,21])
b = np.random.random([32,])

# Now, try to sample using the sample_from_polytope method of the PolytopeSampler class
samples = PolytopeSampler.sample_from_polytope(A,b)

And here is the full error message returned:

   173         P = HPolytope(A, b)
    174 
--> 175         if self._parameters["fast_computations"]:
    176             A, b, Tr, Tr_shift, samples = P.fast_mmcs(
    177                 ess, psrf, parallel_mmcs, num_threads

We probably need to add a check for gurobi as in line 52.
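The failure pattern can be reproduced without dingo: a static method receives no instance, so any reference to self inside it raises a NameError. The two classes below are hypothetical stand-ins, and passing the mode as an explicit argument is only one possible fix, alongside the gurobi check suggested above.

```python
class BrokenSampler:
    def __init__(self):
        self._parameters = {"fast_computations": False}

    @staticmethod
    def sample_from_polytope(A, b):
        # No instance is passed to a static method, so `self` is undefined.
        if self._parameters["fast_computations"]:
            return "fast"
        return "slow"


class FixedSampler:
    @staticmethod
    def sample_from_polytope(A, b, fast_computations=False):
        # The mode is an explicit argument instead of instance state.
        return "fast" if fast_computations else "slow"


try:
    BrokenSampler.sample_from_polytope(None, None)
except NameError as err:
    message = str(err)
```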

rounding.py is excluded from github actions

The test file sampling.py is not included in the github actions.
However, when I added it, it failed.

segmentation fault when ESS low

In some cases, when asking for a rather low ESS (<10), dingo crashes with a segmentation fault.

Add volume computation support and motivation

This is a feature request for volume computation (implemented in volesti) to be included in dingo.

Apart from being a fundamental computation, there is recent motivation from the area of metabolic networks [1], where they compute the volume of a V-polytope (a polytope given as the convex hull of its vertices). There are also computations of the intersection and union of V-polytopes.

[1] Régimbeau et al. - Contribution of genome-scale metabolic modelling to niche theory

Add examples and use-cases

dingo currently has a limited number of examples of use, mainly in tests.

Create an example directory and add analysis for networks from public datasets e.g. http://bigg.ucsd.edu

Other use cases using FVA, FBA or sampling for metabolic models are welcome to be added there too.

Modernize LP solver interface

This is a feature request about how dingo handles and solves LPs. Currently it uses the lp-solve library by default and the Gurobi solver optionally.

The interface can be more modern and modular if a package like optlang is used. Many different solvers can be then used through optlang's interface (such as cplex, gurobipy, scipy).

dingo crashes with various metabolic models

Describe the bug
The following piece of code

import unittest
import os
import scipy
import numpy as np
from dingo import MetabolicNetwork, PolytopeSampler
from dingo.gurobi_based_implementations import fast_inner_ball


current_directory = os.getcwd()
input_file_json = current_directory + "/ext_data/iAB_RBC_283.json"

model = MetabolicNetwork.from_json(input_file_json)
model.set_fast_mode()

sampler = PolytopeSampler(model)
sampler.set_fast_mode()

steady_states = sampler.generate_steady_states()

returns

  File "tests/parallel.py", line 20, in <module>
    model = MetabolicNetwork.from_json(input_file_json)
  File "/home/workspace/dingo/dingo/MetabolicNetwork.py", line 74, in from_json
    return cls(tuple_args)
  File "/home/workspace/dingo/dingo/MetabolicNetwork.py", line 56, in __init__
    or (self._biomass_index < 0)
TypeError: '<' not supported between instances of 'NoneType' and 'int'

iAB_RBC_283.json is downloaded from http://bigg.ucsd.edu/

Desktop (please complete the following information):

OS: Ubuntu 20.04.2 LTS
python: Python 3.8.5 [GCC 9.3.0]

`generate_steady_states` fails when you edit a model's optimal percentage twice

model = dingo.MetabolicNetwork.from_json("ext_data/e_coli_core.json")

model.set_opt_percentage(90)
sampler = dingo.PolytopeSampler(model)
sampler.generate_steady_states()

model.set_opt_percentage(20)
sampler = dingo.PolytopeSampler(model)
sampler.generate_steady_states()

would return:

phase 1: number of correlated samples = 500, effective sample size = 3, ratio of the maximum singilar value over the minimum singular value = 4548.22
phase 2: number of correlated samples = 500, effective sample size = 123, ratio of the maximum singilar value over the minimum singular value = 3.07681
phase 3: number of correlated samples = 500, effective sample size = 148, ratio of the maximum singilar value over the minimum singular value = 2.8596
phase 4: number of correlated samples = 1900, effective sample size = 754
[5]total ess 1028: number of correlated samples = 3400


[5]maximum marginal PSRF: 1.03027


UnboundLocalError                         Traceback (most recent call last)
[/home/luna.kuleuven.be/u0156635/github_repos/GeomScale/dingo/dev.ipynb](https://file+.vscode-resource.vscode-cdn.net/home/luna.kuleuven.be/u0156635/github_repos/GeomScale/dingo/dev.ipynb) Cell 26 line 8
      [6](vscode-notebook-cell:/home/luna.kuleuven.be/u0156635/github_repos/GeomScale/dingo/dev.ipynb#X36sZmlsZQ%3D%3D?line=5) model.set_opt_percentage(20)
      7 sampler = dingo.PolytopeSampler(model)
----> 8 sampler.generate_steady_states()

File ~/github_repos/GeomScale/dingo/dingo/PolytopeSampler.py:161, in PolytopeSampler.generate_steady_states(self, ess, psrf, parallel_mmcs, num_threads)
    149 def generate_steady_states(
    150     self, ess=1000, psrf=False, parallel_mmcs=False, num_threads=1
    151 ):
    152     """A member function to sample steady states.
    153 
    154     Keyword arguments:
   (...)
    158     num_threads -- the number of threads to use for parallel mmcs
    159     """
--> 161     self.get_polytope()
    163     P = HPolytope(self._A, self._b)
    165     if self._parameters["fast_computations"]:

File ~/github_repos/GeomScale/dingo/dingo/PolytopeSampler.py:82, in PolytopeSampler.get_polytope(self)
     66 """A member function to derive the corresponding full dimensional polytope
     67 and a isometric linear transformation that maps the latter to the initial space.
     68 """
     70 if (
     71     self._A == []
     72     or self._b == []
   (...)
     76     or self._T_shift == []
     77 ):
     79     (
     80         max_biomass_flux_vector,
     81         max_biomass_objective,
---> 82     ) = self._metabolic_network.fba()
     84     if (
     85         self._parameters["fast_computations"]
     86         and self._parameters["remove_redundant_facets"]
     87     ):
     89         A, b, Aeq, beq = fast_remove_redundant_facets(
     90             self._metabolic_network.lb,
     91             self._metabolic_network.ub,
   (...)
     94             self._parameters["opt_percentage"],
     95         )

File ~/github_repos/GeomScale/dingo/dingo/MetabolicNetwork.py:117, in MetabolicNetwork.fba(self)
    114 """A member function to apply the FBA method on the metabolic network."""
    116 if self._parameters["fast_computations"]:
--> 117     return fast_fba(self._lb, self._ub, self._S, self._biomass_function)
    118 else:
    119     return slow_fba(self._lb, self._ub, self._S, self._biomass_function)

File ~/github_repos/GeomScale/dingo/dingo/gurobi_based_implementations.py:112, in fast_fba(lb, ub, S, c)
    109     v = model.getVars()
    111 for i in range(n):
--> 112     optimum_sol.append(v[i].x)
    114 optimum_sol = np.asarray(optimum_sol)
    116 return optimum_sol, optimum_value

UnboundLocalError: local variable 'v' referenced before assignment

The error seems to be related to https://github.com/GeomScale/dingo/blob/aaae4ae63f432da45eb0f4363a92767ba537a074/dingo/gurobi_based_implementations.py#L109:
`v` appears to be assigned there only when `status == GRB.OPTIMAL`, which is not the case after the second edit,
where the status is INF_OR_UNBD for some reason (see https://www.gurobi.com/documentation/current/refman/optimization_status_codes.html).
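
A minimal sketch of the failure pattern, written without Gurobi so it runs anywhere. The function names and the string status constants are hypothetical stand-ins, not dingo or gurobipy API; the first function only mimics the shape suggested by the traceback (a variable bound inside a status check and read outside it), and the second shows one possible guard.

```python
# Hypothetical stand-ins for solver status codes (not the gurobipy constants).
OPTIMAL = "OPTIMAL"
INF_OR_UNBD = "INF_OR_UNBD"


def extract_solution_unsafe(status, values):
    """Mimic the failing pattern: `v` is bound only on the optimal branch."""
    n = len(values)
    optimum_sol = []
    if status == OPTIMAL:
        v = list(values)
    for i in range(n):
        # Raises UnboundLocalError when status != OPTIMAL, because `v`
        # was never assigned in this call.
        optimum_sol.append(v[i])
    return optimum_sol


def extract_solution_checked(status, values):
    """Fail early with an explicit message instead of a confusing error."""
    if status != OPTIMAL:
        raise ValueError(
            f"solver finished with status {status}; no solution to read"
        )
    return list(values)
```

With the explicit check, a model that ends up infeasible or unbounded would report the solver status directly rather than an `UnboundLocalError` several frames away from the cause.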

Consider renaming `biomass_function` routine

Should we rename this to objective function?
That way, there would be no misunderstanding when the objective is something other than the biomass function.

Sampling using the maximum entropy probability distribution

So far, dingo supports sampling from the uniform, the multivariate exponential, and the multivariate Gaussian distributions.

An interesting addition from a biologist's point of view would be to enable sampling from the maximum entropy probability distribution.

This study explains why that would be useful:

"In addition to accounting for fluctuations, the maximum entropy construction provides a principled interpolation between two extremal regimes of metabolic network function. In the “uniform” (no-optimization) limit, no control is exerted over metabolic fluxes: they are selected at random as long as they are permitted by stoichiometry, resulting in broad yet non-trivial flux distributions that support a small, non-zero growth rate. In the FBA limit, fluxes are controlled precisely to maximize the growth rate, with zero fluctuations."
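
As a rough illustration of the interpolation described in the quote, here is a one-dimensional toy sketch; it is not dingo code. It assumes the usual maximum entropy form under a constraint on the mean growth rate, i.e. a density proportional to exp(beta * growth), restricted here to a single "growth" coordinate in [0, 1]. The names `beta`, `mean_growth`, and `sample_growth` are hypothetical.

```python
import math
import random


def mean_growth(beta):
    """Analytic mean of the density proportional to exp(beta * x) on [0, 1]."""
    if beta == 0:
        return 0.5  # uniform limit: no preference for higher growth
    return 1.0 / (1.0 - math.exp(-beta)) - 1.0 / beta


def sample_growth(beta, n, seed=0):
    """Draw n samples from that density by inverting its CDF."""
    rng = random.Random(seed)
    if beta == 0:
        return [rng.random() for _ in range(n)]
    scale = math.expm1(beta)  # exp(beta) - 1
    # CDF: F(x) = (exp(beta * x) - 1) / (exp(beta) - 1); solve F(x) = u for x.
    return [math.log1p(rng.random() * scale) / beta for _ in range(n)]


# Compare the empirical and analytic means for increasing beta.
for beta in (0.0, 1.0, 10.0, 100.0):
    xs = sample_growth(beta, 20000)
    print(beta, round(sum(xs) / len(xs), 3), round(mean_growth(beta), 3))
```

In this toy setting, beta = 0 recovers uniform sampling, while increasing beta concentrates the samples near the maximum growth value, which is the one-dimensional analogue of moving towards the FBA limit.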

For more on the maximum entropy distribution:
https://journals.aps.org/pr/abstract/10.1103/PhysRev.106.620
