An implementation of the model described in Generic Inference in Latent Gaussian Process Models (currently under review by an ML journal) along with the framework to run the experiments described in the paper.
[Running experiments](#running-experiments)
[Experiment results](#experiment-results)
[Using the model](#using-the-model)
We require the following packages, which can be installed directly via pip:
- GPy >= 1.0.7
- matplotlib >= 1.5.1
- numpy >= 1.11.1
- scikit-learn >= 0.17.1
- scipy >= 0.17.0
- pandas >= 0.18.0
We also require Theano; installation instructions can be found on the Theano website. We recommend configuring Theano to use a GPU, which makes the code run an order of magnitude faster.
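For reference, GPU use is typically enabled through a `~/.theanorc` file. The sketch below shows the minimal settings; the exact values depend on your Theano version and hardware, so consult the Theano configuration documentation for your setup:

```ini
[global]
device = gpu
floatX = float32
```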
The script run_experiment.py allows us to run the experiments described in the paper. run_experiment.py makes use of various flags to specify the configuration of the experiment. For example,

```
./run_experiment.py -e mnist -m full -s 0.04 -o -p 500
```

makes predictions on the mnist dataset with a full covariance posterior, a sparsity factor of 0.04, and stochastic optimization with a minibatch of size 500. For full details on each flag, run

```
./run_experiment.py -h
```
We also support launching multiple experiments at once, although we recommend doing so only for small-scale experiments. To run multiple experiments at once, create a json file which contains the following attributes:
- num_processes: The number of experiments to run at once.
- experiment_names: A list of names of the datasets to use.
- methods: A list of methods to use for the experiment.
- run_ids: A list of ids of the dataset partitions to use.
Then run

```
./run_experiment.py -f JSON_FILE_NAME
```

The script will then run all possible combinations of the configurations found in the json file. An example json file can be found in ./experiment_configs.json.
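For illustration, such a file might look like the sketch below. The method name "diag" and the run ids are placeholders, not values taken from the repo; consult ./experiment_configs.json for a real example:

```json
{
  "num_processes": 2,
  "experiment_names": ["mnist"],
  "methods": ["full", "diag"],
  "run_ids": [1, 2]
}
```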
Each experiment that gets run generates a new directory in ../results titled EXPERIMENT_NAME_DAY-MONTH-YEARhMINmSECs_PID. The directory contains the following files:
- config.csv: The configuration information for the experiment.
- EXPERIMENT_NAME.log: Logging data during the optimization process.
- model.dump: A regularly updated snapshot of the model that can be reloaded into memory.
- predictions.csv: The predictions the model made on the test data for the experiment.
- train.csv: A file containing training data used in the experiment.
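These files can be analyzed with standard tools. Below is a sketch of loading a predictions file with pandas and scoring it; the column names here are hypothetical stand-ins (the real schema depends on the experiment), and an in-memory string replaces the actual file:

```python
import io

import pandas as pd

# Stand-in for ../results/<EXPERIMENT_DIR>/predictions.csv; the columns
# "true_label" and "predicted_label" are illustrative, not the real schema.
csv_text = """true_label,predicted_label
3,3
7,1
2,2
"""

# In practice: pd.read_csv("../results/<EXPERIMENT_DIR>/predictions.csv")
predictions = pd.read_csv(io.StringIO(csv_text))

# Fraction of rows where the prediction matches the ground truth.
accuracy = (predictions["true_label"] == predictions["predicted_label"]).mean()
print(accuracy)
```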
The model can be used directly without the experiment framework. An example can be found in src/example.py. In-depth documentation for the model is available in comments found in src/savigp.py.
We give a quick summary of the design of the code. We can split the design into two sections: the experiment framework, and the Gaussian process model.
The experiment framework provides the means to load datasets into memory, optimizers, result graphing, and scripts to specify the configuration of experiments. The files in the experiment framework consist of:
- data_source.py: Loads datasets held in ./data into memory.
- data_transformation.py: Contains various utility functions to pre-process the data.
- model_logging.py: Logs experiment information to ../results.
- optimizer.py: Various functions to optimize the model.
- run_experiment.py: The script to allow users to run the experiment.
- run_model.py: Sets up the model, calls into the optimizer, makes predictions, and ensures all experiment info is logged to disk.
- setup_experiment.py: Configures experiments with information provided by
run_experiment.py
alongside extra details not provided by the user.
The model consists of a generic Gaussian process superclass (gaussian_process.py), along with two subclasses, one for a full covariance posterior (full_gaussian_process.py) and one for a diagonal posterior (diagonal_gaussian_process.py). The model also consists of classes to represent the posterior distribution (gaussian_mixture.py, full_gaussian_mixture.py, and diagonal_gaussian_mixture.py), a kernel function implementation (kernel.py), and various likelihood models (likelihood.py).
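The split between the full and diagonal posteriors can be pictured with a minimal, self-contained sketch. The class and method names below are illustrative only, not the actual classes in the repo; the point is the trade-off each subclass makes when parameterizing the posterior covariance:

```python
import numpy as np


class GaussianProcessBase:
    """Shared machinery; subclasses differ in how the posterior
    covariance is parameterized (full matrix vs. diagonal)."""

    def __init__(self, num_points):
        self.num_points = num_points

    def posterior_log_det(self):
        raise NotImplementedError


class FullGaussianProcess(GaussianProcessBase):
    """Full covariance: O(n^2) parameters, stored via a Cholesky factor."""

    def __init__(self, chol):
        super().__init__(chol.shape[0])
        self.chol = chol  # lower-triangular Cholesky factor of the covariance

    def posterior_log_det(self):
        # log det(L L^T) = 2 * sum(log(diag(L)))
        return 2.0 * float(np.sum(np.log(np.diag(self.chol))))


class DiagonalGaussianProcess(GaussianProcessBase):
    """Diagonal covariance: O(n) parameters, cheaper but less expressive."""

    def __init__(self, variances):
        super().__init__(len(variances))
        self.variances = variances  # one variance per point

    def posterior_log_det(self):
        return float(np.sum(np.log(self.variances)))
```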
A thin wrapper around the model and various functions found in optimizer.py can be found in src/savigp.py. The wrapper aims to provide a scikit-learn-like API and should be used for any purposes outside of the experiment framework. We also hold various utility functions in util.py.