
Python package for Bayesian Model Mixing

Home Page: https://bandframework.github.io/Taweret/

License: MIT License

Python 100.00%

bayesian-model-mixing bayesian-statistics regression-trees

taweret's Introduction

Taweret

Welcome to the GitHub repo for Taweret, the state-of-the-art Python package for applying Bayesian Model Mixing!

About

Taweret is a new, generalized package, developed by members of the BAND collaboration, for applying Bayesian model mixing (BMM) methods to a wide variety of problems in physics.

Features

At present, this package provides the following BMM methods:

Linear model mixing (with simultaneous model mixing and calibration)
Multivariate BMM
Bayesian Trees

Documentation

See Taweret's documentation at https://bandframework.github.io/Taweret/.

Cloning

This repository uses submodules. To clone this repository and automatically check out all the submodules, use:

git clone --recursive https://github.com/bandframework/Taweret.git

If you want to limit the size of the repository (this or the submodules), you can use the depth flag

git clone --depth=1 https://github.com/bandframework/Taweret.git

Inside the directory containing the cloned repository, you then run

git submodule update --init --depth=1

Prerequisites

The Trees module depends on OpenMPI. Please ensure OpenMPI is installed with shared/built libraries prior to using the Trees module.
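
One quick sanity check before using the Trees module is to look for an MPI launcher on the PATH. The sketch below is not part of Taweret: the helper name `find_mpi_launcher` and the candidate launcher names `mpirun` and `mpiexec` are assumptions. Finding a launcher only suggests that OpenMPI is installed; it does not confirm that shared libraries were built.

```python
import shutil


def find_mpi_launcher(candidates=("mpirun", "mpiexec")):
    """Return the path of the first MPI launcher found on PATH, or None.

    Hypothetical helper for illustration; not part of Taweret.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    launcher = find_mpi_launcher()
    if launcher is None:
        print("No MPI launcher found; install OpenMPI before using the Trees module.")
    else:
        print(f"Found MPI launcher at {launcher}")
```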

Testing

The test suite requires the pytest package and is run from the test/ directory. To test the current BMM methods, first install the required packages and then run the tests.

To install the requirements, navigate to the Taweret directory; the requirements.txt file is located in its root. From there, execute the following command in the terminal:

pip install -e .

Once installation is complete, navigate to the test/ directory and execute the following three commands:

pytest test_bivariate_linear.py
pytest test_gaussian.py
pytest test_trees.py

Windows Users:

Taweret also depends on the OpenBT Mixing package to run the trees module. Because this package is built with OpenMPI, Windows users can work with the trees module through Windows Subsystem for Linux (WSL). Installation instructions are shown below.

OpenBT will run within the Windows 10 Windows Subsystem for Linux (WSL) environment. For instructions on installing WSL, please see (https://ubuntu.com/wsl). We recommend installing the Ubuntu 20.04 WSL build. There are also instructions here on keeping your Ubuntu WSL up to date, or installing additional features like X support. Once you have installed the WSL Ubuntu layer, start the WSL Ubuntu shell from the start menu and then you can begin working with Taweret.

Citing Taweret

If you have benefited from Taweret, please cite our software using the following format:

@inproceedings{Taweret,
    author = "Liyanage, Dan and Semposki, Alexandra and Yannotty, John and Ingles, Kevin",
    title  = "{{Taweret: A Python Package for Bayesian Model Mixing}}",
    year   = "2023",
    url    = {https://github.com/bandframework/Taweret}
}

and our explanatory paper:

@article{Ingles:2023nha,
    author = "Ingles, Kevin and Liyanage, Dananjaya and Semposki, Alexandra C. and Yannotty, John C.",
    title = "{Taweret: a Python package for Bayesian model mixing}",
    eprint = "2310.20549",
    archivePrefix = "arXiv",
    primaryClass = "nucl-th",
    month = "10",
    year = "2023"
}

Please also cite the BAND collaboration software suite using the format here.

BAND SDK compliance

Check out our SDK form here.

Contact

To contact the Taweret team, please submit an issue through the Issues page.

Authors: Kevin Ingles, Dan Liyanage, Alexandra Semposki, and John Yannotty.

taweret's People

jcyannotty ominusliticus asemposki daocalendar sudhanvalalit kylegodbey danosu

taweret's Issues

JOSS Review -- Failing `test_trees`

Hi! I'm reviewing your paper for JOSS and am working my way through assessing testing.

I cannot get the tests to pass for test_trees.py. It is a mix of error modes, but here is my summary:

test_mixing and test_predict fail when making draws in _read_in_preds with ValueError: need at least one array to concatenate.
test_predict_wts fails with the same ValueError, in the _read_in_wts function.
test_sigma fails with an AttributeError when returning the hidden _posterior attribute: AttributeError: 'Trees' object has no attribute '_posterior'. Did you mean 'posterior'?

I've cloned Taweret as described on the repository installation instructions, including installing open-mpi and OpenBT as described in the dependencies. I am running Python 3.10.9 on macOS-14.1.1-arm64-arm-64bit. Here is a complete report of pytest on my machine: taweret_report.txt

Finally, it does not look like you are using any CI/CD on the Taweret main branch, which would be ideal. Is there a reason you are not using CI/CD such as Travis or through GitHub Actions? If there is a specific reason (which I bet there is), I'm totally fine with that. I do think, however, that you must disclose in the documentation how often you run tests during development. This is not only important for reliability and reproducibility, but will help other developers who want to contribute to Taweret.

Bug in adding model parameters

In the BivariateLinear.predict function there is a bug:

https://github.com/danOSU/Taweret/blob/51fb7abc2d093c7c647383246617c5b78bb20793/Taweret/mix/bivariate_linear.py#L189

Perhaps a better way to organize the loop is

for i in range(len(n_args_for_models)):
    # Skip the mixing parameters and every earlier model's parameters.
    # sum() of an empty slice is 0, so i == 0 needs no special case.
    start = self.n_mix + sum(n_args_for_models[:i])
    stop = start + n_args_for_models[i]
    model_params.append(sample[start:stop])
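
To see the slicing in isolation, here is a self-contained sketch of the same idea outside the class. It assumes a flat `sample` whose first `n_mix` entries are mixing parameters, followed by each model's parameters in order. The function name `split_model_params` is hypothetical and is not Taweret's API.

```python
from itertools import accumulate


def split_model_params(sample, n_mix, n_args_for_models):
    """Split a flat sample into per-model parameter blocks.

    Hypothetical helper: the first n_mix entries are assumed to be mixing
    parameters, followed by n_args_for_models[i] entries for model i.
    """
    # Offsets of each model block, measured from the end of the mixing
    # block: [0, n0, n0 + n1, ...]
    offsets = [0, *accumulate(n_args_for_models)]
    model_params = []
    for i, n_args in enumerate(n_args_for_models):
        start = n_mix + offsets[i]
        model_params.append(sample[start:start + n_args])
    return model_params


sample = [0.1, 0.2, 10, 11, 20, 30, 31, 32]
print(split_model_params(sample, n_mix=2, n_args_for_models=[2, 1, 3]))
# prints [[10, 11], [20], [30, 31, 32]]
```

Precomputing the running offsets avoids re-summing the list on every iteration, although for a handful of models either form is fine.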

Weird zero rows in the output of both data and sigma. They probably come from the return structure of the TrueModel evaluation inside the Data class.

https://github.com/danOSU/Taweret/blob/02897a27264ca5948a0a2098af76f600c67ef438/Taweret/models/samba_models.py#L474C5-L474C5

Codecov only assessing coverage in `tests.py`

It looks like your Codecov reporting only shows that pytest is running the scripts in test.py, without reporting the actual code coverage over Taweret/*/*. You may have Taweret blocked in your Codecov settings YAML.

Better Documentation

I will add this as a to-do here for everyone contributing: I think we should make sure our function docstrings all describe what the function is expected to return.
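
One way to track this to-do is a small script that flags functions whose docstrings never mention a return value. The sketch below is an assumption about how such a check could look, not an existing Taweret tool: `functions_missing_returns` and the `Demo` class are hypothetical, and the test (the word "return" appearing anywhere in the docstring) is deliberately crude.

```python
import inspect


def functions_missing_returns(namespace):
    """List public functions whose docstrings never mention a return.

    Hypothetical checker for illustration; not part of Taweret.

    :param namespace: a module or class to inspect
    :returns: sorted names of public functions lacking return documentation
    :rtype: list[str]
    """
    missing = []
    for name, obj in inspect.getmembers(namespace, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        doc = (inspect.getdoc(obj) or "").lower()
        if "return" not in doc:
            missing.append(name)
    return sorted(missing)


class Demo:
    def documented(self):
        """Compute the answer.

        :returns: the answer
        :rtype: int
        """
        return 42

    def undocumented(self):
        """Compute something."""
        return 0


print(functions_missing_returns(Demo))
# prints ['undocumented']
```

When pointed at a module instead of a class, `inspect.getmembers` also picks up functions imported into that module, so the output may need filtering.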

JOSS Paper review - language comments

Minor language comments:

Footnote 1: Recommend writing "hippopotamus" in full
Line 66: Should read "data"
Line 70: The sentence seems incomplete
Line 104: sum-to-one should not be hyphenated
Lines 157-158: did you mean "for individual contributions" or "for contributing individuals"?

JOSS Review -- Incorrect Installation Instructions

Hi! I'm reviewing your paper for JOSS and am working my way through assessing installation. I've run into some problems and comments, which I'm listing below:

The website documentation has different installation instructions than the GitHub repository. The former directs installation from git via danOSU/Taweret, which is not the repository tagged for review in JOSS. Your website documentation installation page should be updated to point to the correct repository bandframework/Taweret.
The installation instructions on the GitHub repository lists the repository address as TaweretOrg/Taweret, not bandframework/Taweret. I know it properly redirects, but I think it would be prudent to keep naming consistent.
If it is not installable via pip (why not?), please remove the line which states pip install Taweret on the website documentation.
What is the reason you have not hosted Taweret on a commonly used package manager (e.g. conda or pip)? I presume you have a good reason for this, but cloning the repository directly and adding it to PATH will decrease the number of people who can easily use your software.

JOSS Review -- Paper Comments

Hi! I'm reviewing your paper for JOSS. This issue summarizes my (minor) comments on the paper, which I find to be well written and informative.

Wording and Clarity

In the abstract (line 11), you say "[...] such that each model's best qualities are preserved in the final result." The term "best" is highly subjective and qualitative whereas I believe you mean the most quantitatively predictive features of the model are preserved. Can you be more concrete with what you mean here?
Caption of Table 1, the parenthetical "(e.g. in heavy-ion collisions, this is the centrality bin)" is very, very field specific to nuclear physics and sows more confusion than clarity for out-of-field specialists (such as myself) who are still proficient in Bayesian statistics. Can you either i) give another example outside of nuclear physics or ii) explain what "number of inputs" means in more generic language. I would prefer the latter.
Lines 64-65. "Currently, this is the only mixing method in Taweret that can also calibrate the models while mixing." What does "calibrate" mean in this context? Are you doing formal simulation-based calibration of the Bayesian models, or something more similar to Bayesian Validation Metrics? Finding out more about this calibration is also difficult in the docs, but it seems to be an important point.
Lines 150-151. Why have you not yet integrated CI/CD on this package? I mentioned this in my other issue (#52), but I would hope for a better explanation on why this is not integrated currently if you aim to fix it in further development.

Suggested Additions

I think a schematic figure giving an intuition for the utility of BMM would really help the accessibility of the paper. I realize that whoever is reading this is already (probably) in the weeds and looking for a way to implement ideas, but I think an additional figure outlining how BMM can help weight models in describing experimental data would significantly augment the pedagogy of the paper.
I love that all authors on the paper are given equal contribution and really celebrate the collaborative effort on the work. However, just looking through the Taweret commit history, it's hard to really identify who contributed where and how much contribution occurred through pen-and-paper theory and discussion. I think the paper would benefit from a short "contribution" section which lists who was involved with what. If all authors are involved in everything, that's great! I'd love to see that listed.
"Taweret" is a great name for the package, but its significance is not explained anywhere in the paper. I would appreciate a sentence added to the paper (like you have in the docs) that explains why you chose "Taweret", even though having a hippopotamus-goddess as a mascot should be self-explanatory.

JOSS Review -- `ptemcee` error while reproducing Coleman Bivariate Linear BMM models.

Hi! I'm reviewing your paper for JOSS and am working my way through assessing functionality. I cannot rerun the notebooks on my system following the install instructions on the GitHub repository.

Thus far, I've run into two issues:

The sampler ptemcee is needed for Taweret to work, but is not included in the requirements.txt file.
Training to find the posterior fails with the following traceback: coleman_BMM_traceback.txt. The error seems to be in ptemcee, which is a bit concerning to me as that package is no longer maintained.
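
A missing dependency like this can be caught before running a notebook by asking Python whether each required module is importable. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical, and the module list in the usage line is an example rather than Taweret's actual requirements.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be imported here.

    Hypothetical helper for illustration; not part of Taweret.
    """
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# 'ptemcee' is the sampler named in this issue; 'numpy' is only an example.
print(missing_modules(["numpy", "ptemcee"]))
```

An empty list means every listed module was found in the current environment; it says nothing about whether the installed versions are compatible.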

Running the notebook through binder seems to solve the problem, but that is launching a version of Taweret that is not the one under review. This launches (and presumably executes) Taweret via danOSU/Taweret, as opposed to that hosted by bandframework/Taweret.

I suspect this is on my end and not an issue with Taweret. Any suggestions on where to dig into this? I would like to be able to run through the example notebooks in the documentation so I can play around with the software and get to the more fun parts of the review.

JOSS Review - failing tests

I get some failing tests with pdm run pytest on MacOS:

====================================================================== test session starts =======================================================================
platform darwin -- Python 3.10.3, pytest-8.1.1, pluggy-1.4.0
configfile: pyproject.toml
plugins: cov-5.0.0
collected 13 items                                                                                                                                               

test_bivariate_linear.py ...                                                                                                                               [ 23%]
test_gaussian.py .....                                                                                                                                     [ 61%]
test_trees.py .FFFF                                                                                                                                        [100%]

============================================================================ FAILURES ============================================================================
__________________________________________________________________________ test_mixing ___________________________________________________________________________

    def test_mixing():
        x_train = np.loadtxt(
            taweret_wd + 'test/bart_bmm_test_data/2d_x_train.txt').reshape(80, 2)
        x_train = x_train.reshape(2, 80).transpose()
    
        y_train = np.loadtxt(
            taweret_wd + 'test/bart_bmm_test_data/2d_y_train.txt').reshape(80, 1)
    
        # Set prior information
        mix.set_prior(
            k=2.5,
            ntree=30,
            overallnu=5,
            overallsd=0.01,
            inform_prior=False)
    
        # Check tuning & hyper parameters
        assert mix.k == 2.5, "class object k is not set."
        assert mix.ntree == 30, "class object ntree is not set."
        assert mix.overallnu == 5, "class object nu is not set."
        assert mix.overallsd == 0.01, "class object overallsd is not set."
        assert mix.overalllambda == 0.01**2, "class object overalllambda is not set."
        assert mix.inform_prior == False, "class object inform_prior is not set."
    
        # Train the model
>       fit = mix.train(
            X=x_train,
            y=y_train,
            ndpost=10000,
            nadapt=2000,
            nskip=2000,
            adaptevery=500,
            minnumbot=4)

test_trees.py:73: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../Taweret/mix/trees.py:410: in train
    self._run_model(cmd)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Taweret.mix.trees.Trees object at 0x12b6626e0>, cmd = 'openbtcli'

    def _run_model(self, cmd="openbtcli"):
        """
        Private function, run the cpp program via the command line using
        a subprocess.
        """
        # Check to see if executable is installed via debian
        sh = shutil.which(cmd)
    
        # Check to see if installed via wheel
        pyinstall = False
        if sh is None:
            pywhl_path = os.popen("pip show openbtmixing").read()
            pywhl_path = pywhl_path.split("Location: ")
            if len(pywhl_path)>1:
                pywhl_path = pywhl_path[1].split("\n")[0] + "/openbtmixing"
                sh = shutil.which(cmd, path=pywhl_path)
                pyinstall = True
    
        # Execute the subprocess, changing directory when needed
        if sh is None:
            # openbt exe were not found in the current directory -- try the
            # local directory passed in
            sh = shutil.which(cmd, path=self.local_openbt_path)
            if sh is None:
>               raise FileNotFoundError(
                    "Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.")
E               FileNotFoundError: Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.

../Taweret/mix/trees.py:857: FileNotFoundError
---------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------
Results stored in temporary path: /var/folders/02/lkk5tj_96tg1q_5p7dlc_1mw0000gn/T/openbtmixing_fz3x2yjx
Running model...
---------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------
WARNING: Package(s) not found: openbtmixing
__________________________________________________________________________ test_predict __________________________________________________________________________

    def test_predict():
        # Get test data
        n_test = 30
        x1_test = np.outer(np.linspace(-3, 3, n_test), np.ones(n_test))
        x2_test = x1_test.copy().transpose()
        f0_test = (np.sin(x1_test) + np.cos(x2_test))
        x_test = np.array([x1_test.reshape(x1_test.size,),
                          x2_test.reshape(x1_test.size,)]).transpose()
    
        # Read in test results
        pmean_test = np.loadtxt(
            taweret_wd +
            'test/bart_bmm_test_data/2d_pmean.txt')
        eps = 0.10
    
        # Get predictions
>       ppost, pmean, pci, pstd = mix.predict(X=x_test, ci=0.95)

test_trees.py:107: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../Taweret/mix/trees.py:521: in predict
    self._run_model(cmd)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Taweret.mix.trees.Trees object at 0x12b6626e0>, cmd = 'openbtpred'

    def _run_model(self, cmd="openbtcli"):
        """
        Private function, run the cpp program via the command line using
        a subprocess.
        """
        # Check to see if executable is installed via debian
        sh = shutil.which(cmd)
    
        # Check to see if installed via wheel
        pyinstall = False
        if sh is None:
            pywhl_path = os.popen("pip show openbtmixing").read()
            pywhl_path = pywhl_path.split("Location: ")
            if len(pywhl_path)>1:
                pywhl_path = pywhl_path[1].split("\n")[0] + "/openbtmixing"
                sh = shutil.which(cmd, path=pywhl_path)
                pyinstall = True
    
        # Execute the subprocess, changing directory when needed
        if sh is None:
            # openbt exe were not found in the current directory -- try the
            # local directory passed in
            sh = shutil.which(cmd, path=self.local_openbt_path)
            if sh is None:
>               raise FileNotFoundError(
                    "Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.")
E               FileNotFoundError: Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.

../Taweret/mix/trees.py:857: FileNotFoundError
---------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------
WARNING: Package(s) not found: openbtmixing
________________________________________________________________________ test_predict_wts ________________________________________________________________________

    def test_predict_wts():
        # Get weights
        n_test = 30
        x1_test = np.outer(np.linspace(-3, 3, n_test), np.ones(n_test))
        x2_test = x1_test.copy().transpose()
        x_test = np.array([x1_test.reshape(x1_test.size,),
                          x2_test.reshape(x1_test.size,)]).transpose()
    
>       wpost, wmean, wci, wstd = mix.predict_weights(X=x_test, ci=0.95)

test_trees.py:123: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../Taweret/mix/trees.py:604: in predict_weights
    self._run_model(cmd)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Taweret.mix.trees.Trees object at 0x12b6626e0>, cmd = 'openbtmixingwts'

    def _run_model(self, cmd="openbtcli"):
        """
        Private function, run the cpp program via the command line using
        a subprocess.
        """
        # Check to see if executable is installed via debian
        sh = shutil.which(cmd)
    
        # Check to see if installed via wheel
        pyinstall = False
        if sh is None:
            pywhl_path = os.popen("pip show openbtmixing").read()
            pywhl_path = pywhl_path.split("Location: ")
            if len(pywhl_path)>1:
                pywhl_path = pywhl_path[1].split("\n")[0] + "/openbtmixing"
                sh = shutil.which(cmd, path=pywhl_path)
                pyinstall = True
    
        # Execute the subprocess, changing directory when needed
        if sh is None:
            # openbt exe were not found in the current directory -- try the
            # local directory passed in
            sh = shutil.which(cmd, path=self.local_openbt_path)
            if sh is None:
>               raise FileNotFoundError(
                    "Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.")
E               FileNotFoundError: Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.

../Taweret/mix/trees.py:857: FileNotFoundError
---------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------
WARNING: Package(s) not found: openbtmixing
___________________________________________________________________________ test_sigma ___________________________________________________________________________

    def test_sigma():
        sig_eps = 0.05
        assert np.abs((np.mean(mix.posterior) - 0.1)
>                     ) < sig_eps, "Inaccurate sigma calculation."

test_trees.py:140: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Taweret.mix.trees.Trees object at 0x12b6626e0>

    @property
    def posterior(self):
        '''
        Returns the posterior distribution of the error standard deviation,
        which is learned during the training process.
    
        Parameters:
        ------------
        :param: None.
    
        Returns:
        ---------
        :returns: The posterior of the error standard deviation .
        :rtype: np.ndarray
    
        '''
>       return self._posterior
E       AttributeError: 'Trees' object has no attribute '_posterior'. Did you mean: 'posterior'?

../Taweret/mix/trees.py:180: AttributeError
==================================================================== short test summary info =====================================================================
FAILED test_trees.py::test_mixing - FileNotFoundError: Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.
FAILED test_trees.py::test_predict - FileNotFoundError: Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.
FAILED test_trees.py::test_predict_wts - FileNotFoundError: Cannot find openbt executables. Please specify the path using the argument local_openbt_path in the constructor.
FAILED test_trees.py::test_sigma - AttributeError: 'Trees' object has no attribute '_posterior'. Did you mean: 'posterior'?
================================================================== 4 failed, 9 passed in 3.28s ===================================================================
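
The last failure in this log looks like a downstream symptom rather than a separate bug: training raised before finishing, so the private attribute behind `posterior` was never set. A minimal sketch of a guard that reports this more directly is shown below. `TinyMixer` is a hypothetical stand-in written for illustration; it is not Taweret's `Trees` class, and its `train` method only stores the draws it is given.

```python
class TinyMixer:
    """Hypothetical stand-in for a mixing class; not Taweret's Trees."""

    def __init__(self):
        # Define the private attribute up front so the property never
        # raises AttributeError on an untrained object.
        self._posterior = None

    def train(self, draws):
        # Stand-in for real training: just store the supplied draws.
        self._posterior = list(draws)
        return self._posterior

    @property
    def posterior(self):
        if self._posterior is None:
            raise RuntimeError(
                "posterior is unavailable: train() has not completed.")
        return self._posterior


mixer = TinyMixer()
try:
    mixer.posterior
except RuntimeError as err:
    print(err)

mixer.train([0.09, 0.11])
print(mixer.posterior)
```

With a guard like this, a test that reads the posterior after a failed training run would point at the training step instead of at a missing attribute.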

Documentation Issues

Some minor things I noticed when trying to run "sh run_to_rebuild_tawret_rst.sh" - these could be Taweret issues OR it could just be me:

If the user has jinja2 version 3.1 or higher, then they will get an error with sphinx. This is a jinja2 issue and was solved by downgrading my version to 3.0.3. We could specify this in the requirements file.
We may need to also include the sphinx related libraries in the requirement txt file. I had to install each one manually, which isn't a problem, but could be made easier on the user by automatically installing.
I received the error "Taweret/doces/source/contents.rst not found." - Should I have a contents.rst file? I rebased last night so everything should be up to date.
I am not using conda to edit the documentation. It might be good to include the pip commands in the documentation in addition to conda (?)
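
Until a version pin is added to the requirements file, the jinja2 problem from the first item can be detected with a short standard-library check. This is a sketch under assumptions: the threshold of 3.1 comes from the report above, the helper names `major_minor` and `jinja2_too_new` are hypothetical, and only the leading major.minor digits of the version string are compared.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_minor(version_string):
    """Parse the leading 'X.Y' of a version string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def jinja2_too_new(installed, limit=(3, 1)):
    """True if the version is at or above the limit reported in this issue."""
    return major_minor(installed) >= limit


try:
    installed = version("jinja2")
    status = "too new for this docs build" if jinja2_too_new(installed) else "ok"
    print(f"jinja2 {installed}: {status}")
except PackageNotFoundError:
    print("jinja2 is not installed")
```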

Compatibility with Windows machines

It would be excellent if we could get the C++ debian package to be completely set up and ready to go once somebody downloads Taweret onto either their machine or into a codespace. This would entail adding some kind of criterion into the setup of this package so that everything builds without user input. Kyle Godbey has ideas for this that we should look into.

We should also set up unit testing via GitHub Actions or some other way.

JOSS Paper review - code comments

My main concern is the ease of installation of the software. I strongly recommend distributing it on PyPI and using PDM (or similar) and a pyproject.toml to manage dependencies, rather than a setup.py and requirements.txt. This is important to ensure sustainable dependency management into the future.

Other non-Python dependencies that are currently described in the installation section of the online documentation would be better if automatically installable through a Python function included with the package.