Giter VIP home page Giter VIP logo

gptree's Introduction

DOI

zkdtree

GPTree (zkdtree) is a library for quickly interpolating points in a multidimensional space.

At the moment, it uses simple Gaussian Processes with an RBF kernel.

The library allows to perform training and inference efficiently by using spatial data structures and suitable nearest neighbour approximations. Specifically inference scales logarithmically with the number of training samples, rather than cubically as in the naive implementation; the cubic scaling is instead with the much smaller number of nearest neighbours. Contrary to the approach in [Shen, Ng, Seeger 2005], the algorithm adopted here results in a stable inversion of the covariance matrix irrespective of the correlation length.

Summary of the grid construction algorithm

The input data consists on a N x M matrix corresponding the M coordinates of the N data points at which the target function is evaluated and y, the vector of values of the function at each data point.

The data is firstly structured in a k-d tree. The implementation is inspired by that in scikit-learn, but has some added simplifications and optimizations for this use case. For example the data is reordered physically in order to avoid an extra level of indirection and improve cache locality.

A file (grid) containing the tree and hyperparameters is exported using the cereal library.

Inference for a point x_test is then achieved by first finding the nneighbours (which is itself an hyperparmeter) closest point closest points to x_test, (X_nearest, y_nearest)

y_test = K(x_test, X_nearest)@inverse(K(X_nearest, X_rearest) + noise@Identity)@y_nearest

where K is the two point Kernel function and noise is the noise hyperparmeter. Currently the inversion of the nneighbours x nneighbours linear system is done on the fly for each point, since this is efficient enough for the problems this code has been used for, when using the appropriate LAPACK routine.

An estimate of the uncertainty at the test point can also be provided by the library.

Hyperparameter estimation can be achieved using a Leave One Out cross validation based method, for example with the help of the Python interface. More facilities will be provided in the future.

What is in here

  • A C++ shared library with the definition of the KDTree and tools to saving the tree to disk.
  • A C interface to the C++ code.
  • A Python interface to the C interface, using Cython.
  • Executables to build a grid file from a text file and to query basic properties of the grid.

While C++ is used for development, C headers, which are much easier to integrate with third party code are sufficient for embedding the grids for the purposes of making predictions.

Building

A binary conda package with all the dependencies can be obtained with conda-build by running

conda build conda-recipe

Manual builds can be accomplished by satisfying the relevant dependencies below and by following the steps described in conda-recipe/build.sh:

mkdir build
cd build
meson -Dbuildtype=release -Dprefix=${PREFIX}
ninja install

Dependencies

Build system

The project is build and tested using

Build dependencies

  • A modern C++ compiler, capable of handling C++17.
  • A LAPACK provider.

Runtime dependencies

Dependencies for the Python interface

Testing

See conda-recipe/run_test.sh; essentially one can run

meson test

inside the build directory.

Integrating grid predictions

The C interface is the recommended way of integrating grid predictions into third party software. In C itself, it would look like

#include <stdio.h>
#include <stdlib.h>
#include <zkdtree/capi.h>

int main(){
	char *err;
	zkdtree* t = zkdtree_load("r.tree", &err);
	if(!t){
		printf("Error: %s", err);
		free(err);
		return 1;
	}
	double x[1] = {0};
	printf("Result at x=0 is %f", zkdtree_interpolate(t, x));
	zkdtree_delete(t);
	return 0;

}

A pkg-config file specifying the required build options is generated as a part of the installation. This allows required compiler flags to be easily queried with build systems such as CMake or Meson. The following Meson configuration suffices to build the file above

project('zkdtreetest', 'c')
deps = [dependency('zkdtree')]
exe = executable('main', 'main.c', dependencies:deps)

gptree's People

Contributors

zaharid avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.