
InfiniteBoost

Code for the paper
InfiniteBoost: building infinite ensembles with gradient descent (arXiv:1706.01109)
by A. Rogozhnikov, T. Likhomanenko

Description

InfiniteBoost is an approach to building ensembles that combines the best sides of random forests and gradient boosting.

Trees in the ensemble are fitted to the mistakes made by previous trees (as in gradient boosting), but thanks to a modified scheme of combining their contributions the ensemble converges to a limit, thus avoiding overfitting (just as a random forest does).

Left: InfiniteBoost with automated capacity search vs. gradient boosting with different learning rates (shrinkages); right: random forest vs. InfiniteBoost with small capacities.

More comparison plots can be found in the research notebooks and in the research/plots directory.
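The core idea above can be sketched in a few lines of NumPy. This is an illustrative toy only (squared loss, a trivial constant "tree"), not the package's implementation: the ensemble prediction is the capacity times a running average of trees, each fitted to the current negative gradient, so adding more trees refines the average instead of growing an unbounded sum.

```python
import numpy as np

class MeanTree:
    """Trivial stand-in for a regression tree: predicts the mean of its targets."""
    def __init__(self, targets):
        self.value = float(np.mean(targets))

    def predict(self, X):
        return np.full(len(X), self.value)

def infiniteboost_sketch(X, y, n_trees=200, capacity=2.0):
    """Toy InfiniteBoost-style loop for squared loss (illustrative only)."""
    tree_sum = np.zeros(len(y))
    preds = np.zeros(len(y))
    for m in range(1, n_trees + 1):
        residual = y - preds                 # negative gradient of squared loss
        tree = MeanTree(residual)            # "fit" a tree to the gradient
        tree_sum += tree.predict(X)
        preds = capacity * tree_sum / m      # capacity times running average
    return preds
```

With constant targets y = 1 and capacity c = 2, this toy converges to c/(c+1) = 2/3 rather than diverging as trees are added, which illustrates the limiting behavior (the constant "trees" cap the achievable fit, like a capacity would).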

Reproducing research

Research is performed in Jupyter notebooks (if you're not familiar with them, read why Jupyter notebooks are awesome).

You can use the docker image arogozhnikov/pmle:0.01 from Docker Hub. The Dockerfile is stored in this repository (Ubuntu 16.04 + a basic scikit-learn stack).

To run the environment (sudo is needed on Linux):

sudo docker run -it --rm -v /YourMountedDirectory:/notebooks -p 8890:8890 arogozhnikov/pmle:0.01

(then open localhost:8890 in your browser).

InfiniteBoost package

A self-written, minimalistic implementation of trees, as used in the boosting experiments.

This specific implementation was used for the comparison with random forest and is based on the trees from the scikit-learn package.

The code is written in Python 2 (expected to work with Python 3, but not tested), with some performance-critical functions in Fortran, so you need gfortran + OpenMP installed before installing the package (or simply use the docker image).

pip install numpy
pip install .
# testing (optional)
cd tests && nosetests .

You can use the tree implementation from the package for your own experiments; in that case, please cite the InfiniteBoost paper.

infiniteboost's People

Contributors

arogozhnikov, bonext, naereen


infiniteboost's Issues

Compare with other algorithms

Is it a deliberate decision not to compare this algorithm with popular implementations such as XGBoost and LightGBM? If this is fundamental research, I can imagine it is (not) yet at the same level. Giving some numbers for comparison would give the reader a clearer view of the purpose of the paper :)

Making predictions for a classification task with InfiniteBoost

I am trying to test InfiniteBoost with the Titanic dataset from Kaggle.

import pandas as pd
# import paths assumed; adjust to the actual package layout
from infiniteboost import BinTransformer, InfiniteBoosting, LogisticLoss

titanic_df = pd.read_csv("train_cleaned")
y = titanic_df["Survived"].values
X = titanic_df.drop("Survived", axis=1).values
X = BinTransformer().fit_transform(X)

clf = InfiniteBoosting(loss=LogisticLoss(), n_estimators=100)
clf.fit(X, y)
ypred = clf.staged_decision_function(X)   # scores after each stage
y_last_pred = clf.decision_function(X)    # final raw scores

It is a classification problem; how can I know that InfiniteBoost will treat it as one? (The target variable is y, taking integer values 0 or 1.)
When I used decision_function to make predictions, the output looked like neither probabilities nor class labels.
So how does InfiniteBoost work on classification tasks, and how can I use it to predict probabilities?
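Editor's note: with a logistic loss, a decision function conventionally returns raw log-odds scores rather than probabilities. Under that (assumed) convention, a sigmoid maps scores to probabilities and thresholding gives class labels. The helper names below are illustrative, not part of the package:

```python
import numpy as np

def scores_to_proba(scores):
    """Map raw decision-function scores (log-odds) to P(y = 1).

    Assumes the usual logistic-loss convention: a score of 0 means p = 0.5.
    """
    return 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))

def scores_to_labels(scores, threshold=0.5):
    """Turn raw scores into hard 0/1 labels by thresholding the probability."""
    return (scores_to_proba(scores) >= threshold).astype(int)
```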

Question: InfiniteBoost vs XGBoost ?

This is more a question than an issue (I can close it at any time): did you compare your approach with the XGBoost implementation? It could be interesting to compare, especially regarding overfitting.

A small typo here: InfiniteBost -> InfiniteBoost

Thanks

Why should I use InfiniteBoost?

I read the paper (thanks), but I am still puzzled: I don't see any ground-breaking improvements in precision or performance over RF or GB. What is the big benefit?

Thanks

[CONFUSED] -fopenmp -O3" failed with exit status 1 ??

Hi. I don't understand why running pip install . throws this error:

Running setup.py install for infiniteboost ... error
Complete output from command /home/lemma/miniconda2/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-Q3sQ_y-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-RqOC6j-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
build_src
building extension "infiniteboost.fortranfunctions" sources
f2py options: []
adding 'build/src.linux-x86_64-2.7/fortranobject.c' to sources.
adding 'build/src.linux-x86_64-2.7' to include_dirs.
adding 'build/src.linux-x86_64-2.7/infiniteboost/fortranfunctions-f2pywrappers2.f90' to sources.
build_src: building npy-pkg config files
running build_py
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/infiniteboost
copying infiniteboost/researchlosses.py -> build/lib.linux-x86_64-2.7/infiniteboost
copying infiniteboost/researchboosting.py -> build/lib.linux-x86_64-2.7/infiniteboost
copying infiniteboost/__init__.py -> build/lib.linux-x86_64-2.7/infiniteboost
copying infiniteboost/researchtree.py -> build/lib.linux-x86_64-2.7/infiniteboost
running build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
customize Gnu95FCompiler
Found executable /usr/bin/gfortran
customize Gnu95FCompiler
customize Gnu95FCompiler using build_ext
building 'infiniteboost.fortranfunctions' extension
compiling C sources
C compiler: gcc -pthread -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fopenmp -O2 -march=core2 -ftree-vectorize -fPIC

creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/build
creating build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/infiniteboost
compile options: '-Ibuild/src.linux-x86_64-2.7 -I/home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include -I/home/lemma/miniconda2/include/python2.7 -c'
gcc: build/src.linux-x86_64-2.7/fortranobject.c
In file included from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1777:0,
                 from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
                 from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from build/src.linux-x86_64-2.7/fortranobject.h:13,
                 from build/src.linux-x86_64-2.7/fortranobject.c:2:
/home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it by " \
  ^
gcc: build/src.linux-x86_64-2.7/infiniteboost/fortranfunctionsmodule.c
In file included from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1777:0,
                 from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
                 from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from build/src.linux-x86_64-2.7/fortranobject.h:13,
                 from build/src.linux-x86_64-2.7/infiniteboost/fortranfunctionsmodule.c:19:
/home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it by " \
  ^
build/src.linux-x86_64-2.7/infiniteboost/fortranfunctionsmodule.c: In function ‘initfortranfunctions’:
build/src.linux-x86_64-2.7/infiniteboost/fortranfunctionsmodule.c:778:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   Py_TYPE(&PyFortran_Type) = &PyType_Type;
   ^
compiling Fortran 90 module sources
creating build/temp.linux-x86_64-2.7/infiniteboost
Fortran f77 compiler: /usr/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -fPIC -O3 -funroll-loops
Fortran f90 compiler: /usr/bin/gfortran -Wall -g -fno-second-underscore -fPIC -O3 -funroll-loops
Fortran fix compiler: /usr/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -Wall -g -fno-second-underscore -fPIC -O3 -funroll-loops
compile options: '-Ibuild/src.linux-x86_64-2.7 -I/home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include -I/home/lemma/miniconda2/include/python2.7 -c'
extra options: '-Jbuild/temp.linux-x86_64-2.7/infiniteboost -Ibuild/temp.linux-x86_64-2.7/infiniteboost'
extra f90 options: '-fopenmp -O3'
gfortran:f90: infiniteboost/fortranfunctions.f90
infiniteboost/fortranfunctions.f90:124.14:

        !$OMP SIMD
              1
Error: Unclassifiable OpenMP directive at (1)
infiniteboost/fortranfunctions.f90:124.14:

        !$OMP SIMD
              1
Error: Unclassifiable OpenMP directive at (1)
error: Command "/usr/bin/gfortran -Wall -g -fno-second-underscore -fPIC -O3 -funroll-loops -Ibuild/src.linux-x86_64-2.7 -I/home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include -I/home/lemma/miniconda2/include/python2.7 -c -c infiniteboost/fortranfunctions.f90 -o build/temp.linux-x86_64-2.7/infiniteboost/fortranfunctions.o -Jbuild/temp.linux-x86_64-2.7/infiniteboost -Ibuild/temp.linux-x86_64-2.7/infiniteboost -fopenmp -O3" failed with exit status 1

----------------------------------------

Command "/home/lemma/miniconda2/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-Q3sQ_y-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-RqOC6j-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-Q3sQ_y-build/

...even though I have the GNU toolchain (gcc, g++, gfortran) installed on my machine. I have an Intel i5 with 4 cores.
