bandframework / taweret
Python package for Bayesian Model Mixing
Home Page: https://bandframework.github.io/Taweret/
License: MIT License
As pointed out by Kevin in PR #92, the test & deploy action in that branch runs many tests that each build and test the Taweret wheel on a different setup. However, since Taweret publishes to PyPI only a source distribution and a pure Python universal wheel, this is certainly overkill. In addition, the action builds a wheel, doesn't test it, and then publishes it.
The idea of having the sanity checks in the action run over all possible installations seems good. For example, significant time could pass between the last PR testing and publishing to PyPI. If something changed in the setups between the PR and publishing, we want to test on the latest setups before publishing.
Therefore, in this issue we should study the possibility of redoing the action so that it
One possible difficulty with this is storing and propagating the built distributions across jobs.
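Because the wheel that Taweret publishes is meant to be pure Python and platform independent, a test job could first confirm that the artifact it received from the build job really is such a wheel before installing and testing it. A minimal sketch, assuming wheel filenames follow the standard `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl` convention; `is_pure_universal_wheel` is a hypothetical helper, not part of Taweret.

```python
from pathlib import Path


def is_pure_universal_wheel(filename: str) -> bool:
    """Return True if a wheel filename carries pure-Python, platform-independent tags.

    Wheel names follow ``{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl``,
    so the last three dash-separated fields are the compatibility tags.
    """
    name = Path(filename).name
    if not name.endswith(".whl"):
        return False
    parts = name[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        return False
    python_tag, abi_tag, platform_tag = parts[-3:]
    # Pure-Python wheels use generic "py" tags (e.g. py3, py2.py3), no ABI,
    # and the "any" platform; compiled wheels use tags such as cp310 / manylinux.
    return (
        all(tag.startswith("py") for tag in python_tag.split("."))
        and abi_tag == "none"
        and platform_tag == "any"
    )
```

For instance, a hypothetical `taweret-1.0.0-py3-none-any.whl` passes, while a compiled `...-cp310-cp310-macosx_11_0_arm64.whl` or a `.tar.gz` sdist does not.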
Some of the GitHub actions have failed.
Hi! I'm reviewing your paper for JOSS and am working my way through assessing testing.
I cannot get the tests in test_trees.py to pass. It is a mix of error modes, but here is my summary:
test_mixing and test_predict fail when making draws in _read_in_preds with ValueError: need at least one array to concatenate.
test_predict_wts fails with the same ValueError in the _read_in_wts function.
test_sigma fails with an AttributeError when returning the hidden _posterior attribute: AttributeError: 'Trees' object has no attribute '_posterior'. Did you mean 'posterior'?
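The ValueError above is exactly what NumPy raises when np.concatenate receives an empty list, which suggests (an inference, not something confirmed in this report) that no posterior-draw files were produced, for example because the OpenBT executables never ran. A minimal sketch of a guard that turns this into a clearer message; `read_draws` and its default file pattern are hypothetical and do not reproduce Taweret's actual `_read_in_preds`.

```python
import glob
import os

import numpy as np


def read_draws(folder: str, pattern: str = "*.txt") -> np.ndarray:
    """Stack the draw files in ``folder`` that match ``pattern`` (hypothetical layout)."""
    files = sorted(glob.glob(os.path.join(folder, pattern)))
    if not files:
        # np.concatenate([]) would raise "need at least one array to concatenate";
        # fail earlier with a message that points at the likely cause instead.
        raise FileNotFoundError(
            f"No draw files matching {pattern!r} in {folder!r}; "
            "did the OpenBT executables run successfully?"
        )
    # atleast_2d keeps single-row files stackable along axis 0.
    return np.concatenate([np.atleast_2d(np.loadtxt(f)) for f in files], axis=0)
```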
I've cloned Taweret as described in the repository installation instructions, including installing open-mpi and OpenBT as described in the dependencies. I am running Python 3.10.9 on macOS-14.1.1-arm64-arm-64bit. Here is a complete report of pytest on my machine: taweret_report.txt
Finally, it does not look like you are using any CI/CD on the Taweret main branch, although having it would be ideal. Is there a reason you are not using CI/CD such as Travis or GitHub Actions? If there is a specific reason (which I bet there is), I'm totally fine with that. I do think, however, that you must disclose in the documentation how often you run tests during development. This is not only important for reliability and reproducibility, but will also help other developers who want to contribute to Taweret.
Hi! I'm reviewing your paper for JOSS and am working my way through assessing functionality. I cannot rerun the notebooks on my system following the install instructions on the GitHub repository.
Thus far, I've run into two issues:
The sampler ptemcee is needed for Taweret to work, but is not included in the requirements.txt file.
Training to find the posterior fails with the following traceback: coleman_BMM_traceback.txt. The error seems to be in ptemcee, which is a bit concerning to me as that package is no longer maintained.
Running the notebook through Binder seems to solve the problem, but it launches a version of Taweret that is not the one under review: it launches (and presumably executes) Taweret from danOSU/Taweret, as opposed to the one hosted at bandframework/Taweret.
I suspect this is on my end and not an issue with Taweret. Any suggestions on where to dig? I would like to be able to run through the example notebooks in the documentation so I can play around with the software and get to the more fun parts of the review.
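Until ptemcee is declared as a dependency, a notebook could check for it up front and stop with an actionable message instead of failing deep inside training. A minimal standard-library sketch; `missing_modules` is a hypothetical helper, and checking for ptemcee specifically is an assumption based on this report.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in ``names`` that cannot be found.

    importlib.util.find_spec returns None for an uninstalled top-level module,
    so nothing is actually imported here.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, calling `missing_modules(["ptemcee"])` at the top of a notebook returns `["ptemcee"]` when the sampler is absent, so the notebook can halt with an install hint.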
It looks like your codecov reporting only shows that pytest is running the scripts in test.py, and does not report the actual code coverage over Taweret/*/*. You may have Taweret blocked in your codecov settings yml.
The simplest means for users to install Taweret would be to issue pip install Taweret, with that command installing openbtmixing automatically from PyPI. However, we need to determine whether such a scheme is feasible given openbtmixing's dependence on MPI. As part of this, we can try to determine all the ways in which users can install and use both of these packages.
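Whatever scheme is chosen, the Python side will likely need a cheap way to tell whether an MPI launcher is visible before it tries to run the OpenBT tools. A minimal sketch, assuming the launcher is named `mpirun` or `mpiexec` (the names used by Open MPI and MPICH); `find_mpi_launcher` is hypothetical and not part of Taweret or openbtmixing.

```python
import shutil
from typing import Optional


def find_mpi_launcher(candidates=("mpirun", "mpiexec"), path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the first MPI launcher found, or None.

    ``path`` follows shutil.which: None means search the PATH environment variable.
    """
    for name in candidates:
        found = shutil.which(name, path=path)
        if found is not None:
            return found
    return None
```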
openbtmixing's build system shall allow users to use the software on laptops, desktops, nodes, clusters, or supercomputers. The build system shall be capable of building, installing, and using the software on macOS, *nix, Windows/Ubuntu, and Windows/PowerShell.
openbtmixing's build system shall allow for CI testing across the Cartesian product of common setups.
openbtmixing command line tools (CLT) and libraries shall be made available through at least one compatible package manager (e.g., Homebrew, apt-get, Spack).
openbtmixing C++ should be installed by itself, with the Python package then just looking for it or finding it in the path. Is this what we want? Should the Python package build its own internal version of the tools as it presently does?
Should openbtmixing be installable with and without integrated CLT?
openbtmixing with the compiler suite and MPI implementation of their choice. This includes using suites and implementations installed by experts and that are optimized for the associated platform.
openbtmixing shall be pip installable. Will this be a source-only distribution with automatic building integrated? Can we distribute prebuilt wheels and satisfy our MPI-based requirements?
Taweret shall be pip installable. If MPI is isolated in openbtmixing, then Taweret is a pure Python package and users shall be able to pip install it from PyPI via a source distribution or a universal binary wheel. Should we account for users who want to run from a git clone? Should we allow users/developers to install from a clone in editable/developer mode (i.e., pip install -e)?
The openbtmixing Python package shall be listed as an external dependency of Taweret.
pip install Taweret would try to install openbtmixing from PyPI, and therefore with the default openbtmixing install scheme. Feasible? Good idea?
If the openbtmixing distribution includes an MPI implementation, then the integration of that implementation in the package shall be such that other MPI implementations on a user's system cannot accidentally be used with openbtmixing at execution, and such that our MPI implementation cannot accidentally insert itself into a different software's stack.
openbtmixing and Taweret in a regular Python installation or in an anaconda installation
openbtmixing
openbtmixing Python package know what compilers to use to build the CLT/libraries and where to find the MPI implementation?
Taweret on macOS in a general Python installation is
brew install open-mpi (or mpich; and also a particular version built with which compiler suite?)
brew install eigen (header only, so no compilation here)
brew install openbtmixing (C++ CLT and libraries built with your Homebrew MPI installation and matching compiler suite, using your eigen)
pip install Taweret (this should automatically install the openbtmixing Python package based on your Homebrew openbtmixing/MPI)
In the BivariateLinear.predict function there is a bug:
Perhaps a better way to organize the loop is
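for i in range(len(n_args_for_models)):
    # Model i's parameters start after the first self.n_mix entries and the
    # parameters of models 0..i-1 (the sum of an empty slice is 0 when i == 0).
    start = self.n_mix + sum(n_args_for_models[:i])
    stop = start + n_args_for_models[i]
    model_params.append(sample[start:stop])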
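To check the indexing in isolation, the loop can be lifted into a standalone function. A minimal sketch, assuming (as the snippet implies) that each flat posterior sample stores the n_mix mixing parameters first, followed by each model's parameters in order; `split_sample` is a hypothetical name, not a Taweret function.

```python
def split_sample(sample, n_mix, n_args_for_models):
    """Split one flat posterior sample into mixing and per-model parameter slices."""
    mixing_params = sample[:n_mix]
    model_params = []
    for i in range(len(n_args_for_models)):
        # Offset = mixing block + all earlier models' blocks.
        start = n_mix + sum(n_args_for_models[:i])
        stop = start + n_args_for_models[i]
        model_params.append(sample[start:stop])
    return mixing_params, model_params
```

With two mixing parameters and models taking 2 and 3 parameters, `split_sample([10, 11, 1, 2, 3, 4, 5], 2, [2, 3])` returns `([10, 11], [[1, 2], [3, 4, 5]])`.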
Minor language comments:
The package currently declares many external Python package dependencies, and indeed many packages are installed with Taweret. However, some of these are likely not true dependencies (i.e., not needed for a minimal, functional, correct installation).
For example, GitHub Actions need to measure test coverage, but users don't, so there is no need to install pytest-cov for them. Similarly, GitHub Actions need to generate the sphinx documentation, but most users won't. If they do, I think it's reasonable that they install all the sphinx packages manually. We still need to figure out what to do for notebook-based examples.
My main concern is the ease of installation of the software. I strongly recommend distributing it on PyPI and using PDM (or similar) and a pyproject.toml to manage dependencies, rather than a setup.py and requirements.txt. This is important to ensure sustainable dependency management into the future.
Other non-Python dependencies that are currently described in the installation section of the online documentation would be better if they were automatically installable through a Python function included with the package.
When I try to build the sphinx documentation in HTML using tox -e html, I get many warnings and two errors.
Determine whether these are due to a poor tox specification or are genuine. Some tools have a flag that treats warnings as errors. It might be good to set up all documentation GH actions with such a flag when possible to ensure that documentation quality does not drift over time. This implies that we first need to get rid of all warnings and errors.
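Sphinx itself has such a flag: sphinx-build -W turns warnings into errors, and --keep-going reports every warning instead of stopping at the first one. A minimal sketch of how a docs job might invoke this from Python; the docs/source and docs/build/html paths are assumptions about the repository layout, and `sphinx_html_command` and `build_docs` are hypothetical helpers.

```python
import subprocess
import sys


def sphinx_html_command(source_dir: str, build_dir: str, strict: bool = True) -> list:
    """Return the argument list for an HTML build via ``python -m sphinx``."""
    cmd = [sys.executable, "-m", "sphinx", "-b", "html"]
    if strict:
        # -W: treat warnings as errors; --keep-going: report all of them first.
        cmd += ["-W", "--keep-going"]
    return cmd + [source_dir, build_dir]


def build_docs(source_dir: str = "docs/source", build_dir: str = "docs/build/html") -> None:
    """Run the strict build; check=True makes a non-zero exit raise CalledProcessError."""
    subprocess.run(sphinx_html_command(source_dir, build_dir), check=True)
```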
I see that the notebooks are excluded from the sphinx build. It sounds like these notebooks are examples that might be pulled out of the sphinx-based documentation. In such a case, it would be good to pull that folder out of docs.
As part of this, the docs action could be rewritten to use tox -e html.
I am in the process of studying Taweret's build system and how it is built into a Python package for use with Taweret. I will likely need to explore changes to that interface in this repo, so I am opening this issue to keep track of that work.
Hi! I'm reviewing your paper for JOSS and am working my way through assessing installation. I've run into some problems and comments, which I'm listing below:
The website documentation has different installation instructions than the GitHub repository. The former directs installation from git via danOSU/Taweret, which is not the repository tagged for review in JOSS. Your website documentation installation page should be updated to point to the correct repository bandframework/Taweret.
The installation instructions on the GitHub repository list the repository address as TaweretOrg/Taweret, not bandframework/Taweret. I know it properly redirects, but I think it would be prudent to keep naming consistent.
If it is not installable via pip (why not?), please remove the line that states pip install Taweret from the website documentation.
What is the reason you have not hosted Taweret on a commonly used package manager (e.g. conda or pip)? I presume you have a good reason for this, but cloning the repository directly and adding it to PATH will decrease the number of people who can easily use your software.
I will add this as a to-do here for everyone contributing: I think we should make sure our function docstrings all describe what the function is expected to return.
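For reference, the posterior property in Taweret/mix/trees.py (quoted in the pytest log) already documents its return value with reST :returns: and :rtype: fields. A minimal sketch of a docstring following the same pattern; `weighted_mean` is a hypothetical function used only to illustrate the format.

```python
import numpy as np


def weighted_mean(predictions: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """
    Combine model predictions using per-model weights.

    :param predictions: Array of shape (n_points, n_models) with each model's prediction.
    :param weights: Array of shape (n_points, n_models) with the mixing weights.
    :returns: The weighted prediction at each input point.
    :rtype: np.ndarray of shape (n_points,)
    """
    return np.sum(predictions * weights, axis=1)
```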
I get some failing tests with pdm run pytest on macOS:
====================================================================== test session starts =======================================================================
platform darwin -- Python 3.10.3, pytest-8.1.1, pluggy-1.4.0
configfile: pyproject.toml
plugins: cov-5.0.0
collected 13 items
test_bivariate_linear.py ... [ 23%]
test_gaussian.py ..... [ 61%]
test_trees.py .FFFF [100%]
============================================================================ FAILURES ============================================================================
__________________________________________________________________________ test_mixing ___________________________________________________________________________
def test_mixing():
x_train = np.loadtxt(
taweret_wd + 'test/bart_bmm_test_data/2d_x_train.txt').reshape(80, 2)
x_train = x_train.reshape(2, 80).transpose()
y_train = np.loadtxt(
taweret_wd + 'test/bart_bmm_test_data/2d_y_train.txt').reshape(80, 1)
# Set prior information
mix.set_prior(
k=2.5,
ntree=30,
overallnu=5,
overallsd=0.01,
inform_prior=False)
# Check tuning & hyper parameters
assert mix.k == 2.5, "class object k is not set."
assert mix.ntree == 30, "class object ntree is not set."
assert mix.overallnu == 5, "class object nu is not set."
assert mix.overallsd == 0.01, "class object overallsd is not set."
assert mix.overalllambda == 0.01**2, "class object overalllambda is not set."
assert mix.inform_prior == False, "class object inform_prior is not set."
# Train the model
> fit = mix.train(
X=x_train,
y=y_train,
ndpost=10000,
nadapt=2000,
nskip=2000,
adaptevery=500,
minnumbot=4)
test_trees.py:73:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../Taweret/mix/trees.py:410: in train
self._run_model(cmd)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Taweret.mix.trees.Trees object at 0x12b6626e0>, cmd = 'openbtcli'
def _run_model(self, cmd="openbtcli"):
"""
Private function, run the cpp program via the command line using
a subprocess.
"""
# Check to see if executable is installed via debian
sh = shutil.which(cmd)
# Check to see if installed via wheel
pyinstall = False
if sh is None:
pywhl_path = os.popen("pip show openbtmixing").read()
pywhl_path = pywhl_path.split("Location: ")
if len(pywhl_path)>1:
pywhl_path = pywhl_path[1].split("\n")[0] + "/openbtmixing"
sh = shutil.which(cmd, path=pywhl_path)
pyinstall = True
# Execute the subprocess, changing directory when needed
if sh is None:
# openbt exe were not found in the current directory -- try the
# local directory passed in
sh = shutil.which(cmd, path=self.local_openbt_path)
if sh is None:
> raise FileNotFoundError(
"Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.")
E FileNotFoundError: Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.
../Taweret/mix/trees.py:857: FileNotFoundError
---------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------
Results stored in temporary path: /var/folders/02/lkk5tj_96tg1q_5p7dlc_1mw0000gn/T/openbtmixing_fz3x2yjx
Running model...
---------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------
WARNING: Package(s) not found: openbtmixing
__________________________________________________________________________ test_predict __________________________________________________________________________
def test_predict():
# Get test data
n_test = 30
x1_test = np.outer(np.linspace(-3, 3, n_test), np.ones(n_test))
x2_test = x1_test.copy().transpose()
f0_test = (np.sin(x1_test) + np.cos(x2_test))
x_test = np.array([x1_test.reshape(x1_test.size,),
x2_test.reshape(x1_test.size,)]).transpose()
# Read in test results
pmean_test = np.loadtxt(
taweret_wd +
'test/bart_bmm_test_data/2d_pmean.txt')
eps = 0.10
# Get predictions
> ppost, pmean, pci, pstd = mix.predict(X=x_test, ci=0.95)
test_trees.py:107:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../Taweret/mix/trees.py:521: in predict
self._run_model(cmd)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Taweret.mix.trees.Trees object at 0x12b6626e0>, cmd = 'openbtpred'
def _run_model(self, cmd="openbtcli"):
"""
Private function, run the cpp program via the command line using
a subprocess.
"""
# Check to see if executable is installed via debian
sh = shutil.which(cmd)
# Check to see if installed via wheel
pyinstall = False
if sh is None:
pywhl_path = os.popen("pip show openbtmixing").read()
pywhl_path = pywhl_path.split("Location: ")
if len(pywhl_path)>1:
pywhl_path = pywhl_path[1].split("\n")[0] + "/openbtmixing"
sh = shutil.which(cmd, path=pywhl_path)
pyinstall = True
# Execute the subprocess, changing directory when needed
if sh is None:
# openbt exe were not found in the current directory -- try the
# local directory passed in
sh = shutil.which(cmd, path=self.local_openbt_path)
if sh is None:
> raise FileNotFoundError(
"Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.")
E FileNotFoundError: Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.
../Taweret/mix/trees.py:857: FileNotFoundError
---------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------
WARNING: Package(s) not found: openbtmixing
________________________________________________________________________ test_predict_wts ________________________________________________________________________
def test_predict_wts():
# Get weights
n_test = 30
x1_test = np.outer(np.linspace(-3, 3, n_test), np.ones(n_test))
x2_test = x1_test.copy().transpose()
x_test = np.array([x1_test.reshape(x1_test.size,),
x2_test.reshape(x1_test.size,)]).transpose()
> wpost, wmean, wci, wstd = mix.predict_weights(X=x_test, ci=0.95)
test_trees.py:123:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../Taweret/mix/trees.py:604: in predict_weights
self._run_model(cmd)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Taweret.mix.trees.Trees object at 0x12b6626e0>, cmd = 'openbtmixingwts'
def _run_model(self, cmd="openbtcli"):
"""
Private function, run the cpp program via the command line using
a subprocess.
"""
# Check to see if executable is installed via debian
sh = shutil.which(cmd)
# Check to see if installed via wheel
pyinstall = False
if sh is None:
pywhl_path = os.popen("pip show openbtmixing").read()
pywhl_path = pywhl_path.split("Location: ")
if len(pywhl_path)>1:
pywhl_path = pywhl_path[1].split("\n")[0] + "/openbtmixing"
sh = shutil.which(cmd, path=pywhl_path)
pyinstall = True
# Execute the subprocess, changing directory when needed
if sh is None:
# openbt exe were not found in the current directory -- try the
# local directory passed in
sh = shutil.which(cmd, path=self.local_openbt_path)
if sh is None:
> raise FileNotFoundError(
"Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.")
E FileNotFoundError: Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.
../Taweret/mix/trees.py:857: FileNotFoundError
---------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------
WARNING: Package(s) not found: openbtmixing
___________________________________________________________________________ test_sigma ___________________________________________________________________________
def test_sigma():
sig_eps = 0.05
assert np.abs((np.mean(mix.posterior) - 0.1)
> ) < sig_eps, "Inaccurate sigma calculation."
test_trees.py:140:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Taweret.mix.trees.Trees object at 0x12b6626e0>
@property
def posterior(self):
'''
Returns the posterior distribution of the error standard deviation,
which is learned during the training process.
Parameters:
------------
:param: None.
Returns:
---------
:returns: The posterior of the error standard deviation .
:rtype: np.ndarray
'''
> return self._posterior
E AttributeError: 'Trees' object has no attribute '_posterior'. Did you mean: 'posterior'?
../Taweret/mix/trees.py:180: AttributeError
==================================================================== short test summary info =====================================================================
FAILED test_trees.py::test_mixing - FileNotFoundError: Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.
FAILED test_trees.py::test_predict - FileNotFoundError: Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.
FAILED test_trees.py::test_predict_wts - FileNotFoundError: Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.
FAILED test_trees.py::test_sigma - AttributeError: 'Trees' object has no attribute '_posterior'. Did you mean: 'posterior'?
================================================================== 4 failed, 9 passed in 3.28s ===================================================================
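All three FileNotFoundError failures come from _run_model not finding the OpenBT executables, and the captured stderr shows that pip show openbtmixing found no package. Shelling out to pip show also depends on which pip happens to be first on PATH. A minimal sketch of an alternative lookup that asks the running interpreter where openbtmixing lives via importlib.util.find_spec; it assumes, as the current code does, that the executables sit directly inside the installed package directory, and `find_openbt_executable` is a hypothetical name.

```python
import importlib.util
import os
import shutil
from typing import Optional


def find_openbt_executable(cmd: str, local_openbt_path: Optional[str] = None) -> Optional[str]:
    """Locate an OpenBT command-line tool such as ``openbtcli``; return None if absent."""
    # 1. Anywhere on PATH (e.g. a system or Homebrew installation).
    found = shutil.which(cmd)
    if found is not None:
        return found
    # 2. Inside the openbtmixing package installed for *this* interpreter,
    #    without spawning pip. find_spec returns None if it is not installed.
    spec = importlib.util.find_spec("openbtmixing")
    if spec is not None and spec.submodule_search_locations:
        for pkg_dir in spec.submodule_search_locations:
            found = shutil.which(cmd, path=pkg_dir)
            if found is not None:
                return found
    # 3. A directory supplied explicitly by the user.
    if local_openbt_path is not None:
        found = shutil.which(cmd, path=os.fspath(local_openbt_path))
    return found
```

Separately, test_sigma fails only because _posterior is never set when training aborts, so it is a downstream symptom of the same missing executables.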
Some minor things I noticed when trying to run "sh run_to_rebuild_tawret_rst.sh" (these could be Taweret issues, or it could just be me):
If the user has jinja2 version 3.1 or higher, then they will get an error with sphinx. This is a jinja2 issue, which I solved by downgrading to 3.0.3. We could specify this in the requirements file.
We may also need to include the sphinx-related libraries in the requirements.txt file. I had to install each one manually, which isn't a problem, but installation could be made easier for the user by installing them automatically.
I received the error "Taweret/doces/source/contents.rst not found." - Should I have a contents.rst file? I rebased last night so everything should be up to date.
I am not using conda to edit the documentation. It might be good to include the pip commands in the documentation in addition to conda (?)
On branch 85_FixCI, I have added the ability to run flake8 on all Python code in the repo using the tools script check_python_code.sh.
@asemposki @ominusliticus @jcyannotty
Do you all want to keep this facility? If so, then I suppose that each developer will eventually need to clean up their code on a separate branch so that it passes flake8. Once the code passes, I can set up a CI action that runs the script on PRs.
Hi! I'm reviewing your paper for JOSS. This issue summarizes my (minor) comments on the paper, which I find to be well written and informative.
Taweret that can also calibrate the models while mixing." What does "calibrate" mean in this context? Are you doing formal simulation-based calibration of the Bayesian models, or something more similar to Bayesian Validation Metrics? Finding out more about this calibration in the docs is also difficult, but it seems to be an important point.
It would be excellent if we could get the C++ debian package completely set up and ready to go once somebody downloads Taweret onto either their machine or into a codespace. This would entail adding some kind of criterion into the setup of this package so that everything builds without user input. Kyle Godbey has ideas for this that we should look into.
Also, unit testing via GitHub Actions or some other way.