jamesrobertlloyd / gpss-research

Kernel structure discovery research code - likely to be unstable

License: MIT License
À la Bayesian Data Analysis 3
For speed, but then compute the NLL using the full data - randomising the subset should also guard against an unlucky subset being chosen at the beginning of the search
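A minimal sketch of this, assuming hypothetical `optimise_hypers` and `nll` callables in place of the repo's actual optimisation and evaluation routines:

```python
import numpy as np

def score_kernel(kernel, X, y, optimise_hypers, nll, subset_size=250, rng=None):
    """Optimise hyperparameters on a random subset for speed, then
    score with the negative log likelihood on the full data."""
    rng = rng or np.random.default_rng()
    n = X.shape[0]
    if n > subset_size:
        # A fresh random subset per call guards against one unlucky
        # subset steering the whole search.
        idx = rng.choice(n, size=subset_size, replace=False)
        hypers = optimise_hypers(kernel, X[idx], y[idx])
    else:
        hypers = optimise_hypers(kernel, X, y)
    return nll(kernel, hypers, X, y)
```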
Does the search multiply by Lin again?
Is the jitter size correct? (Too big and we lose optimised values; too small and spurious Lins will appear again)
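A sketch of the jittering step in question; the 0.1 scale is an assumption, not the repo's value:

```python
import numpy as np

def jitter_hypers(hypers, scale=0.1, rng=None):
    """Perturb optimised (log-space) hyperparameters before a restart.
    Too large a scale destroys the optimised values; too small and the
    optimiser falls back into the same spurious optima (e.g. Lins)."""
    rng = rng or np.random.default_rng()
    return np.asarray(hypers) + scale * rng.standard_normal(len(hypers))
```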
e.g. when data is known to be on a regular grid but is sparse
Only the constant kernel should - fits better with symbolic regression grammars
Need to look at gradient - can't just re-use SE*something logic
Before computing stats, Python can work out where a kernel applies - potentially even for a sum of kernels
e.g. should consider the expansion A + B + C -> (A + B) * D + C
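A sketch of generating such expansions, with kernels represented as nested tuples ('sum', ...) / ('prod', ...) purely for illustration:

```python
from itertools import combinations

def subset_product_expansions(summands, new_kernels):
    """From A + B + C, propose expansions such as (A + B) * D + C by
    multiplying each subset of two or more summands by a new kernel.
    (Size-one subsets are already covered by the usual S -> S * B move.)"""
    n = len(summands)
    for r in range(2, n):
        for subset in combinations(range(n), r):
            chosen = tuple(summands[i] for i in subset)
            rest = tuple(summands[i] for i in range(n) if i not in subset)
            for d in new_kernels:
                yield ('sum', ('prod', ('sum',) + chosen, d)) + rest

# list(subset_product_expansions(('A', 'B', 'C'), ('D',))) includes
# ('sum', ('prod', ('sum', 'A', 'B'), 'D'), 'C'), i.e. (A + B) * D + C
```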
This will likely tidy up some duplicate code, since we know that all operators have operands, etc.
Also - if we record properties like commutativity / distributivity etc., we can abstract their behaviour.
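A sketch of what that abstraction could look like (class names are assumptions, not the repo's actual types):

```python
class KernelOperator:
    """Every operator carries its operands; algebraic properties are
    recorded as class attributes so canonicalisation, simplification
    and hashing can treat all operators uniformly."""
    commutative = False
    distributes_over = ()

    def __init__(self, operands):
        self.operands = list(operands)

    def canonical(self):
        # Sorting the operands of a commutative operator yields a
        # canonical form, which also makes duplicate detection reliable.
        ops = [o.canonical() if isinstance(o, KernelOperator) else o
               for o in self.operands]
        if self.commutative:
            ops = sorted(ops, key=repr)
        return type(self)(ops)

class SumOperator(KernelOperator):
    commutative = True

class ProductOperator(KernelOperator):
    commutative = True
    distributes_over = (SumOperator,)
```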
Use bsxfun and generally optimise the formulae
Earthquakes
EEG
Changepoint papers?
Fault detection papers?
Multiresolution paper?
Fix some of the current data sets that were subsampled.
e.g. product kernel incorrectly uses output_variance
Whoops!
Bunch all the other components together when demonstrating a decomposition
Not variance - change the text accordingly
Changepoints etc. should select a dimension to act upon (but should pass all data shape and variables downstream)
The 10 fold cross validation needs to be updated
The new data shape parameters need to behave correctly
Should sometimes be very large, e.g. twice the data range - this is the neutral value (i.e. infinity)
Most aspects of the algorithm can scale appropriately, but it is hard to control everything - should we just standardise the data before running the search?
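A sketch of that preprocessing, under the assumption that the search operates on the standardised data and predictions are transformed back afterwards:

```python
import numpy as np

def standardise(X, y):
    """Standardise inputs and outputs before the search so that default
    hyperparameter scales are sensible for any data set; return the
    statistics needed to undo the transform on predictions."""
    X_mean, X_std = X.mean(axis=0), X.std(axis=0)
    y_mean, y_std = y.mean(), y.std()
    return ((X - X_mean) / X_std,
            (y - y_mean) / y_std,
            (X_mean, X_std, y_mean, y_std))
```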
e.g. SE * Per - the posterior mean tends to zero, but this is just due to uncertainty about the period
In general, SE*Per should talk about a range of plausible periods
Periodic components seem the most difficult to find - probably requiring good initial values of hyperparameters
Might help for parsimony but does not feel like the right way forward
% signs!
A teaser for learning output warping or a demonstration of deficiency?
How would we compare marginal likelihoods? Check out the warped GP paper.
Rather than the broad mixture that is RQ - maybe try a tighter mixture e.g. a Gaussian centred on a particular lengthscale?
If the sum of two kernel components dramatically reduces uncertainty (at points where the uncertainty is greater than zero, e.g. blackouts / changepoints), then these components probably belong together, e.g. A + A + B -> 2A + B
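A heuristic sketch of that test; the threshold is an assumed cutoff on the relative variance reduction:

```python
import numpy as np

def should_merge(var_i, var_j, cov_ij, threshold=0.5):
    """If two components are strongly anticorrelated, the pointwise
    posterior variance of their sum, Var(f_i + f_j), is far below
    Var(f_i) + Var(f_j), and they probably belong together."""
    var_sum = var_i + var_j + 2.0 * cov_ij   # Var(f_i + f_j)
    naive = var_i + var_j                    # if the two were independent
    mask = naive > 1e-8                      # only where uncertainty > 0
    reduction = 1.0 - var_sum[mask] / naive[mask]
    return np.mean(reduction) > threshold
```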
Reparametrise it
It is optimised along with the other parameters and should be treated similarly (e.g. jitter)
e.g. for the Const kernel, which does not depend on dimension
Alternatively - we should not always use masks - only when appropriate
One of these solutions is needed to make hashing in multi-d correct
Location should place mass outside of data range since this is what results in linearly increasing / decreasing variance
e.g. multiplying by const, SE*SE
Safest to do when in additive form since it won't affect the search
Derivative w.r.t. location can be inf * 0 - this can be fixed either by changing the order of calculation or by thresholding quantities at realmax
Another (fiddly) way would be to use signed log transforms
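A sketch of the realmax-thresholding option (the blunt fix; reordering the calculation is cleaner):

```python
import numpy as np

REALMAX = np.finfo(float).max

def safe_product(a, b):
    """Clamp potentially infinite factors to realmax so that inf * 0
    becomes realmax * 0 = 0 instead of NaN."""
    return np.clip(a, -REALMAX, REALMAX) * np.clip(b, -REALMAX, REALMAX)
```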
Need to restrict to relevant regions - plug in a constant to the changepoints to see where the variance should be measured?
/scratch/home/Research/GPs/gpss-research/experiments/2013-09-26.py
( M(0, SE(ell=0.3, sf=5.5)) + ( M(0, FT(ell=-1.9, p=-0.0, sf=3.2)) x M(0, LN(off=-0.8, ell=1.3, loc=1950.7)) ) )
yielded as much as
( M(0, SE(ell=0.3, sf=5.5)) + M(0, FT(ell=-1.9, p=-0.0, sf=3.2)) )
Seems wrong - but I might have been mistaken
Search operators
Whether or not to include the MAE-best kernel in the search as well as the marginal-likelihood best
Will increase the need for anticorrelated component detection (and combination), since Laplace will recognise this as being OK
Is there a bug in the change window expansion code?
Better to think of it as variable phase - comment about the period in terms of Fourier transforms
Or only when a few full data iters are done as well?
The lengthscale parameter is scale invariant (relative to the period), so a default value of zero makes sense - also check that no out-of-bounds checks are made
It is also numerically unstable - past a certain lengthscale the function might as well just call CovCos
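A sketch of that fallback for a centred periodic kernel; the cutoff and the grid approximation of the per-period mean are assumptions:

```python
import numpy as np

ELL_MAX = 5.0  # assumed cutoff; past this the exact formula loses precision

def centred_periodic(tau, ell, period):
    """Centred (zero-mean over one period) periodic covariance. In the
    large-lengthscale limit (k - m) / (1 - m) -> cos(2*pi*tau/period),
    but computing it directly suffers catastrophic cancellation, so
    past ELL_MAX we just call the cosine."""
    if ell > ELL_MAX:
        return np.cos(2.0 * np.pi * tau / period)
    k = np.exp(-2.0 * np.sin(np.pi * tau / period) ** 2 / ell ** 2)
    grid = np.linspace(0.0, period, 200, endpoint=False)
    m = np.mean(np.exp(-2.0 * np.sin(np.pi * grid / period) ** 2 / ell ** 2))
    return (k - m) / (1.0 - m)
```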