GPBoost: Combining Tree-Boosting with Gaussian Process and Mixed Effects Models

Table of Contents

  1. Get Started
  2. Modeling Background
  3. News
  4. Open Issues - Contribute
  5. References
  6. License

Get started

GPBoost is a software library for combining tree-boosting with Gaussian process and mixed effects models. It can also be used for tree-boosting on its own, or for inference and prediction with Gaussian process and mixed effects models on their own.

The GPBoost library is written in C++ and has a C API. Both a Python package and an R package are available.

For more information, you may want to have a look at:

Modeling Background

Both tree-boosting and Gaussian processes are techniques that achieve state-of-the-art predictive accuracy. Besides this, tree-boosting has the following advantages:

  • Automatic modeling of non-linearities, discontinuities, and complex high-order interactions
  • Robustness to outliers in, and multicollinearity among, predictor variables
  • Scale-invariance to monotone transformations of the predictor variables
  • Automatic handling of missing values in predictor variables
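The scale-invariance property can be checked with a small sketch. It assumes a single least-squares regression stump (a depth-one tree) found by exhaustive search; the function name `best_stump_partition` is hypothetical and not part of the GPBoost or LightGBM API. Because a strictly increasing transformation preserves the ordering of the predictor values, it leaves the set of candidate partitions, and hence the chosen split, unchanged.

```python
import numpy as np


def best_stump_partition(x, y):
    """Return the mask (x <= u) of the least-squares best single split on x.

    Hypothetical helper for illustration only.
    """
    best_sse, best_mask = np.inf, None
    # Candidate thresholds: every distinct value except the largest,
    # so that both sides of each split are non-empty.
    for u in np.unique(x)[:-1]:
        left = x <= u
        sse = ((y[left] - y[left].mean()) ** 2).sum() + (
            (y[~left] - y[~left].mean()) ** 2
        ).sum()
        if sse < best_sse:
            best_sse, best_mask = sse, left
    return best_mask


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, size=200)
y = np.where(x > 1.2, 2.0, 0.0) + rng.normal(scale=0.1, size=200)

# Same response, predictor passed through strictly increasing transformations.
mask_raw = best_stump_partition(x, y)
mask_exp = best_stump_partition(np.exp(x), y)
mask_log = best_stump_partition(np.log1p(x), y)

# The chosen split separates exactly the same observations in all three cases.
print(np.array_equal(mask_raw, mask_exp), np.array_equal(mask_raw, mask_log))
# → True True
```

Only the threshold value changes under the transformation; the split it induces on the observations does not, which is why no rescaling of predictors is needed.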

Gaussian process and mixed effects models have the following advantages:

  • Probabilistic predictions, which allow for uncertainty quantification
  • Modeling of dependency, which, among other things, can allow for more efficient learning of the fixed effects / regression function
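As an illustration of probabilistic predictions, the following sketch computes the Gaussian process predictive mean and variance with NumPy. It assumes a zero prior mean, an exponential covariance function, and fixed (not estimated) parameters `sigma2`, `rho`, and `noise`; these names and values are illustrative assumptions, not GPBoost defaults. The predictive variance is small near observed locations and returns to the prior variance far from the data, which is what makes uncertainty quantification possible.

```python
import numpy as np


def exp_cov(a, b, sigma2=1.0, rho=0.3):
    """Exponential covariance sigma2 * exp(-|a - b| / rho) between 1-D locations."""
    return sigma2 * np.exp(-np.abs(a[:, None] - b[None, :]) / rho)


def gp_predict(x_obs, y_obs, x_new, sigma2=1.0, rho=0.3, noise=0.05):
    """Predictive mean and variance of the latent GP at x_new (zero prior mean)."""
    K = exp_cov(x_obs, x_obs, sigma2, rho) + noise * np.eye(len(x_obs))
    K_star = exp_cov(x_new, x_obs, sigma2, rho)               # shape (m, n)
    L = np.linalg.cholesky(K)                                  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))    # K^{-1} y
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)                           # L^{-1} K_star^T, shape (n, m)
    var = sigma2 - np.sum(v ** 2, axis=0)                      # prior variance minus explained part
    return mean, var


x_obs = np.linspace(0.0, 1.0, 20)
y_obs = np.sin(2.0 * np.pi * x_obs)
x_new = np.array([0.5, 5.0])  # one location inside the data, one far outside

mean, var = gp_predict(x_obs, y_obs, x_new)
print("mean:", mean)
print("var: ", var)  # small at 0.5, close to sigma2 = 1.0 at 5.0
```

The Cholesky factorization is used instead of an explicit inverse because it is cheaper and numerically more stable for symmetric positive definite covariance matrices.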

For the GPBoost algorithm, it is assumed that the response variable (label) is the sum of a non-linear mean function and so-called random effects. The random effects can consist of:

  • Gaussian processes (including random coefficient processes)
  • Grouped random effects (including nested, crossed, and random coefficient effects)
  • A sum of the above
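One way to write this in formulas, assuming Gaussian data with an additive error term, is sketched below. Here Z is a known incidence (design) matrix, b are the random effects with covariance matrix Σ(θ), θ collects the covariance parameters, and n is the number of observations; a sum of several random effects corresponds to stacking their incidence matrices in Z and their effects in b. The notation is for orientation only; see Sigrist (2020) for the exact formulation.

```latex
\begin{aligned}
y &= F(X) + Z b + \varepsilon, \qquad
  b \sim \mathcal{N}\big(0, \Sigma(\theta)\big), \qquad
  \varepsilon \sim \mathcal{N}\big(0, \sigma^2 I_n\big), \\
y &\sim \mathcal{N}\big(F(X), \Psi\big), \qquad
  \Psi = Z \Sigma(\theta) Z^\top + \sigma^2 I_n, \\
L(y, F, \theta) &= \tfrac{1}{2} (y - F)^\top \Psi^{-1} (y - F)
  + \tfrac{1}{2} \log\det \Psi + \tfrac{n}{2} \log(2\pi).
\end{aligned}
```

For a single grouped random effect, Z maps each observation to its group and Σ(θ) = σ_b² I; for a Gaussian process, the entries of Σ(θ) are given by a covariance function evaluated at the observation locations. The negative gradient of L with respect to F is Ψ^{-1}(y - F), and its Hessian with respect to F is Ψ^{-1}.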

The model is trained using the GPBoost algorithm, where training means learning the covariance parameters of the random effects as well as the mean function F(X), which is represented by a tree ensemble. In brief, the GPBoost algorithm is a boosting algorithm that iteratively learns the covariance parameters and adds a tree to the ensemble using a gradient and/or a Newton boosting step. In the GPBoost library, covariance parameters can be learned using (Nesterov accelerated) gradient descent or Fisher scoring, and trees are learned using the LightGBM library. See Sigrist (2020) for more details.
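The alternating structure described above can be illustrated with a deliberately simplified NumPy sketch for a single grouped random effect. It is not the library's implementation: it assumes Gaussian data, replaces LightGBM trees with least-squares stumps, performs one Fisher scoring step for the two variance parameters (error variance σ² and group variance σ_b²) per iteration, and uses a plain gradient boosting step that fits the stump to the negative gradient Ψ^{-1}(y - F) of the Gaussian negative log-likelihood. All function names are hypothetical, and the returned F is only the in-sample fit.

```python
import numpy as np


def fit_stump(X, g):
    """In-sample predictions of the least-squares stump (depth-1 tree) fitted to g."""
    best_sse, best_pred = np.inf, np.full(len(g), g.mean())
    for j in range(X.shape[1]):
        for u in np.unique(X[:, j])[:-1]:  # both sides of the split stay non-empty
            left = X[:, j] <= u
            pred = np.where(left, g[left].mean(), g[~left].mean())
            sse = ((g - pred) ** 2).sum()
            if sse < best_sse:
                best_sse, best_pred = sse, pred
    return best_pred


def neg_log_lik(y, F, Psi):
    """Gaussian negative log-likelihood of y ~ N(F, Psi)."""
    r = y - F
    _, logdet = np.linalg.slogdet(Psi)
    return 0.5 * (r @ np.linalg.solve(Psi, r) + logdet + len(y) * np.log(2.0 * np.pi))


def fisher_scoring_step(y, F, theta, Psi_parts):
    """One Fisher scoring update of theta, where Psi = sum_k theta[k] * Psi_parts[k]."""
    Psi = sum(t * P for t, P in zip(theta, Psi_parts))
    Psi_inv = np.linalg.inv(Psi)
    r = Psi_inv @ (y - F)
    A = [Psi_inv @ P for P in Psi_parts]
    # Score: 0.5 * (r' P_k r - tr(Psi^{-1} P_k)); Fisher information: 0.5 * tr(A_k A_l).
    score = np.array([0.5 * (r @ P @ r - np.trace(Ak)) for P, Ak in zip(Psi_parts, A)])
    fisher = np.array([[0.5 * np.trace(Ak @ Al) for Al in A] for Ak in A])
    theta_new = theta + np.linalg.solve(fisher, score)
    return np.maximum(theta_new, 1e-6)  # keep variance parameters positive


def gpboost_sketch(X, y, group, n_iter=50, nu=0.1):
    n = len(y)
    Z = (group[:, None] == np.unique(group)[None, :]).astype(float)  # incidence matrix
    Psi_parts = [np.eye(n), Z @ Z.T]  # Psi = sigma2 * I + sigma2_b * Z Z'
    theta = np.array([1.0, 1.0])      # initial (sigma2, sigma2_b)
    F = np.full(n, y.mean())          # constant initial mean function
    for _ in range(n_iter):
        theta = fisher_scoring_step(y, F, theta, Psi_parts)  # covariance parameter step
        Psi = theta[0] * Psi_parts[0] + theta[1] * Psi_parts[1]
        neg_grad = np.linalg.solve(Psi, y - F)               # Psi^{-1} (y - F)
        F = F + nu * fit_stump(X, neg_grad)                  # gradient boosting step
    return F, theta, Psi


# Simulated data: step function in X[:, 0], linear in X[:, 1], plus group effects.
rng = np.random.default_rng(1)
n, n_groups = 200, 10
X = rng.uniform(size=(n, 2))
group = rng.integers(0, n_groups, size=n)
b = rng.normal(size=n_groups)
y = 2.0 * (X[:, 0] > 0.5) + X[:, 1] + b[group] + rng.normal(scale=0.5, size=n)

ZZt = (group[:, None] == group[None, :]).astype(float)
nll_start = neg_log_lik(y, np.full(n, y.mean()), np.eye(n) + ZZt)  # theta = (1, 1)
F, theta, Psi = gpboost_sketch(X, y, group)
nll_end = neg_log_lik(y, F, Psi)
print("estimated (sigma2, sigma2_b):", theta)
print("neg. log-likelihood: start", nll_start, "end", nll_end)
```

Fitting the stump to Ψ^{-1}(y - F) rather than to the raw residual is what distinguishes this gradient step from ordinary L2 boosting: residual variation shared within a group is down-weighted, so it is left to the random effect instead of being absorbed by the trees. A Newton boosting step would additionally use the Hessian of the loss with respect to F, which in this Gaussian case equals Ψ^{-1}.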

News

Open Issues - Contribute

Software issues

  • Add possibility to save gp_model to file
  • Add Python tests for gp_model (see corresponding R tests)
  • Set up Travis CI for GPBoost

Computational issues

  • Add GPU support for Gaussian processes

Methodological issues

  • Add a spatio-temporal Gaussian process model (e.g. a separable one)
  • Add possibility to predict latent Gaussian processes and random effects (e.g. random coefficients)
  • Add a safeguard against overly large steps when applying Nesterov acceleration for covariance parameter estimation

References

Fabio Sigrist. "Gaussian Process Boosting". Preprint (2020).

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree". Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 3149-3157.

License

This project is licensed under the terms of the Apache License 2.0. See LICENSE for additional details.
