closest-git / litemort

A memory efficient GBDT on adaptive distributions. Much faster than LightGBM with higher accuracy. Implicit merge operation.

License: MIT License

CMake 0.39% Python 27.29% C++ 69.83% C 2.49%
gbdt gradient-boosting machine-learning data-mining-algorithms binary-classification regression-algorithms high-performance-computing

litemort's Introduction

Gradient boosting is one of the most interesting and most overlooked algorithms in machine learning. There is a large gap between the simple theoretical formula and practical implementations, especially in the histogram technique. The histogram-based feature representation not only greatly improves speed but also improves accuracy. In some sense, the histogram is a sparse embedding technique that maps a noisy feature to a more compact and more robust space, and more can be gained along this direction. Building on a deep understanding of feature embedding, we present LiteMORT, which uses much less memory than other GBDT libraries and achieves higher accuracy on some datasets. LiteMORT shows that the GBDT algorithm has much more potential than most people would expect.
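To make the histogram idea concrete, here is a minimal NumPy sketch of quantile binning followed by a per-bin gradient sum. It is an illustration only, not LiteMORT's actual implementation: the helper names (`build_bins`, `to_bin_index`, `gradient_histogram`), the bin count, and the random data are all assumptions made for the example.

```python
import numpy as np

def build_bins(feature, max_bins=16):
    """Quantile-based bin edges for one feature (hypothetical helper)."""
    # Interior quantiles only: max_bins bins need max_bins - 1 edges.
    quantiles = np.linspace(0.0, 1.0, max_bins + 1)[1:-1]
    return np.unique(np.quantile(feature, quantiles))

def to_bin_index(feature, edges):
    """Map each raw value to a small integer bin index."""
    return np.searchsorted(edges, feature, side="right").astype(np.uint8)

def gradient_histogram(bin_index, gradients, n_bins):
    """Sum the gradients that fall into each bin."""
    return np.bincount(bin_index, weights=gradients, minlength=n_bins)

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)      # one noisy continuous feature
grad = rng.normal(size=10_000)   # stand-in for per-sample gradients

edges = build_bins(x, max_bins=16)
bins = to_bin_index(x, edges)    # one byte per sample instead of a float
hist = gradient_histogram(bins, grad, n_bins=len(edges) + 1)
```

After binning, a split search only has to scan the 16 histogram entries rather than all 10,000 raw values, and each sample is stored as a single byte.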

Some key features of LiteMORT

1. Faster than LightGBM with higher accuracy

For example, in the recent Kaggle IEEE-CIS Fraud Detection competition (a binary classification problem):

1) LiteMORT is much faster than LightGBM: it needs only a quarter of LightGBM's time.

2) LiteMORT has a higher AUC than LightGBM.

[Figure: auc_8_fold]

[Figure: time_8_fold]

For a detailed comparison on this competition, please see https://github.com/closest-git/ieee_fraud.

2. Uses much less memory than other GBDT libraries

  1. Shares memory with the data source (pandas DataFrame, NumPy ndarray, list, vector…)

LiteMORT does not allocate extra memory for features that are already stored in contiguous memory. During gradient boosting, nearly all data access goes through a pointer plus some offsets.
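The "pointer plus offsets" idea can be illustrated with plain NumPy. This is a sketch of zero-copy access in general, not of LiteMORT's internals; the array shape and dtype are arbitrary choices for the example.

```python
import numpy as np

# A feature matrix stored in one contiguous block of memory.
X = np.arange(12, dtype=np.float32).reshape(4, 3)
assert X.flags["C_CONTIGUOUS"]

# np.asarray reuses the existing buffer when dtype and layout already match.
view = np.asarray(X, dtype=np.float32)
shares_view = np.shares_memory(X, view)

# Converting to another dtype forces a new allocation.
copy = X.astype(np.float64)
shares_copy = np.shares_memory(X, copy)

# A single feature column is just the base pointer plus an offset and a stride.
col = X[:, 1]
base_address = X.__array_interface__["data"][0]
col_offset_bytes = col.__array_interface__["data"][0] - base_address
col_stride_bytes = col.strides[0]
```

Here `view` and `col` read the original buffer directly, while `copy` doubles the memory footprint. A library that accepts the first form can train without duplicating the caller's data.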

  2. Implicit merging for the “merge overflow problem”

In real applications, data is usually not stored in one big table but spread across many smaller ones. Data analysis and machine learning tasks, however, need access to all of the data, so the small tables have to be merged into huge datasets that are too large for many classical machine learning algorithms to process. We call this the “merge overflow problem”. LiteMORT handles it with an implicit merging technique: just pass all the small datasets to LiteMORT, and it generates the histograms for each merged feature. All later training operations work on these histograms, so there is no need to build the huge merged dataset as the classical method or other GBDT libraries (LightGBM, XGBoost, ...) do.
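The following NumPy sketch shows one way a histogram for a merged feature could be built without materialising the merged column. It is an assumption-laden illustration, not LiteMORT's algorithm: the table layout (a main table holding a foreign key, a side table holding one feature value per key), the sizes, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Large main table: one row per sample with a foreign key and a gradient.
n_samples, n_keys = 100_000, 50
key = rng.integers(0, n_keys, size=n_samples)
grad = rng.normal(size=n_samples)

# Small side table: one feature value per key.
side_feature = rng.normal(size=n_keys)

# Bin the feature on the side table only (n_keys values, not n_samples).
n_bins = 8
edges = np.quantile(side_feature, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
side_bin = np.searchsorted(edges, side_feature, side="right")

# Implicit merge: sum gradients per key, then fold the keys into bins.
grad_per_key = np.bincount(key, weights=grad, minlength=n_keys)
hist_implicit = np.bincount(side_bin, weights=grad_per_key, minlength=n_bins)

# Explicit merge for comparison: materialise the merged column first.
merged_feature = side_feature[key]  # n_samples floats of extra memory
merged_bin = np.searchsorted(edges, merged_feature, side="right")
hist_explicit = np.bincount(merged_bin, weights=grad, minlength=n_bins)
```

Both paths produce the same histogram up to floating-point rounding, but the implicit path never allocates the `n_samples`-long merged column.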

3. sklearn-like API

from litemort import *
model = LiteMORT(params).fit(train_x, train_y, eval_set=[(eval_x, eval_y)])
pred_val = model.predict(eval_x)
pred_raw = model.predict_raw(eval_x)

4. Just one line to switch from LightGBM to LiteMORT

LiteMORT supports the parameters of LightGBM.

As shown below, switching from LightGBM to LiteMORT takes just one more line.

import lightgbm as lgb
from litemort import *

if model_type == 'mort':
    model = LiteMORT(params).fit_1(X_train, y_train, eval_set=[(X_valid, y_valid)])
elif model_type == 'lgb':
    model = lgb.LGBMRegressor(**params, n_jobs=-1)
    model.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_valid, y_valid)])
pred_test = model.predict(X_test)

Citation

Please use the following bibtex entry:

[1] Chen, Yingshi. "LiteMORT: A memory efficient gradient boosting tree system on adaptive compact distributions." arXiv preprint arXiv:2001.09419 (2020).

Author

LiteMORT was written by Yingshi Chen ([email protected])

litemort's People

Contributors

closest-git


litemort's Issues

Assertion `nThread>0 && nThread<32' failed

Hi, I have an error when I run your code:
python: /home/cys/LiteMORT/src/tree/GBRT.cpp:32: Grusoft::GBRT::GBRT(Grusoft::FeatsOnFold*, Grusoft::FeatsOnFold*, double, Grusoft::BoostingForest::MODEL, int, int): Assertion `nThread>0 && nThread<32' failed.
Aborted (core dumped)
Could you help check it?
