fastbdt's Introduction

FastBDT

Stochastic Gradient Boosted Decision Trees, usable standalone and via a Python interface.

FastBDT: A speed-optimized and cache-friendly implementation of stochastic gradient-boosted decision trees for multivariate classification

Stochastic gradient-boosted decision trees are widely employed for multivariate classification and regression tasks. This paper presents a speed-optimized and cache-friendly implementation for multivariate classification called FastBDT. FastBDT is one order of magnitude faster during the fitting-phase and application-phase, in comparison with popular implementations in software frameworks like TMVA, scikit-learn and XGBoost. The concepts used to optimize the execution time and performance studies are discussed in detail in this paper. The key ideas include: An equal-frequency binning on the input data, which allows replacing expensive floating-point with integer operations, while at the same time increasing the quality of the classification; a cache-friendly linear access pattern to the input data, in contrast to usual implementations, which exhibit a random access pattern. FastBDT provides interfaces to C/C++ and Python. It is extensively used in the field of high energy physics by the Belle II experiment.
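The equal-frequency binning idea mentioned above can be illustrated with a minimal sketch. This is not FastBDT's actual implementation; the helper names `equal_frequency_edges` and `to_bin` and the toy data are assumptions made for illustration only. The sketch picks bin edges at sample quantiles so that every bin holds about the same number of samples, then shows that a cut on the float value can be replaced by an integer comparison on the bin index.

```python
import bisect
import random


def equal_frequency_edges(values, n_bins):
    """Interior bin edges chosen so each bin holds about len(values) / n_bins samples."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def to_bin(value, edges):
    """Integer bin index in [0, len(edges)] for a single float value."""
    return bisect.bisect_right(edges, value)


# Toy feature with a skewed distribution (hypothetical data).
rng = random.Random(0)
feature = [rng.expovariate(1.0) for _ in range(10000)]

edges = equal_frequency_edges(feature, n_bins=4)
bins = [to_bin(v, edges) for v in feature]
counts = [bins.count(b) for b in range(4)]

# A cut on the raw float and the matching cut on the integer bin index
# select exactly the same samples.
float_cut = [v < edges[1] for v in feature]
int_cut = [b < 2 for b in bins]
```

Once the features are binned, candidate cuts during tree building only need to compare small integers, which is the property the abstract refers to when it speaks of replacing floating-point with integer operations.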

Installation

  • cmake .
  • make
  • make install
  • make package (optional to build rpm, deb packages)
  • python3 setup.py install (optional to install the python package)

Usage

Before you do anything else, you should run the unit tests:

  • ./unittest

But usually it should be more convenient to use FastBDT as a library and integrate FastBDT directly into your application using

  • the C++ shared/static library (see example/CPPExample.cxx),
  • the C shared library,
  • or the Python3 library python/FastBDT.py (see example/PythonExample.py).

Further reading

This work is mostly based on the papers by Jerome H. Friedman

FastBDT also implements the uGB techniques to boost to flatness:

fastbdt's People

Contributors

pseyfert, sroecker, thomaskeck

fastbdt's Issues

Error importing from PyFastBDT into a Jupyter Notebook

When I run from PyFastBDT import FastBDT in a Jupyter Notebook, I get the following error:

OSError                                   Traceback (most recent call last)
<ipython-input-8-fa8af2d7e581> in <module>()
----> 1 from PyFastBDT import FastBDT

/usr/local/lib/python3.5/dist-packages/PyFastBDT/FastBDT.py in <module>()
     16 #FastBDT_library = ctypes.cdll.LoadLibrary(FastBDT_library_path)
     17 
---> 18 FastBDT_library =  ctypes.cdll.LoadLibrary(os.getcwd() + '/libFastBDT_CInterface.so')
     19 print('Loaded ', FastBDT_library)
     20 

/usr/lib/python3.5/ctypes/__init__.py in LoadLibrary(self, name)
    423 
    424     def LoadLibrary(self, name):
--> 425         return self._dlltype(name)
    426 
    427 cdll = LibraryLoader(CDLL)

/usr/lib/python3.5/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
    345 
    346         if handle is None:
--> 347             self._handle = _dlopen(self._name, mode)
    348         else:
    349             self._handle = handle

OSError: /home/josh/Documents/Python Scripts/libFastBDT_CInterface.so: cannot open shared object file: No such file or directory

Is there something that can be run before starting the Jupyter notebook, or in the notebook itself, that will allow this import to work?
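Line 18 of the traceback shows that this installed copy of FastBDT.py loads libFastBDT_CInterface.so from os.getcwd(), so the import can only succeed when the notebook's working directory contains that file. A minimal sketch of a more forgiving lookup follows. The helpers `find_library` and `load_library` are hypothetical and not part of PyFastBDT, and the directories to search are an assumption the caller has to supply.

```python
import ctypes
import os

# File name taken from the traceback above.
LIBRARY_NAME = "libFastBDT_CInterface.so"


def find_library(search_dirs, name=LIBRARY_NAME):
    """Return the first existing path to `name` among search_dirs, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def load_library(search_dirs, name=LIBRARY_NAME):
    """Load the shared library via ctypes, with a clearer error if it is missing."""
    path = find_library(search_dirs, name)
    if path is None:
        raise OSError("%s not found in: %s" % (name, ", ".join(search_dirs)))
    return ctypes.cdll.LoadLibrary(path)
```

Given how line 18 builds the path, calling os.chdir() in the notebook to switch to the directory that actually holds the library before running the import would presumably also work, without modifying the installed module.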

OSError: [WinError 126] The specified module could not be found

Hi,
Sorry if it is a simple error, but I need some help. I am trying to import FastBDT in my file, but it doesn't work. I am on Windows 7 with Python 3.6.

  • I downloaded cmake, then ran 'cmake .' in the terminal and got some new files, as shown in the image below. (I don't know if this was even necessary if I only want to use the library in Python.) I also got a small error saying that it could not find GTest.

[screenshot: p1]

  • Then, in the same directory, I ran the command 'python3 setup.py.in install' in the terminal.
  • Finally, when I try to run the example code, I get an error (screenshot below).

[screenshot: p2]

Does anyone know how to solve this problem please?
Thank you

PS: it's my first issue post, sorry if it is not well written

I have an issue, can you help me?

FileNotFoundError: Could not find module 'E:\360MoveData\Users\Administrator\Desktop\FastBDT-master\PyFastBDT*libFastBDT_CInterface.so'* (or one of its dependencies). Try using the full path with constructor syntax.

Small fix to compile on Fedora 36

My first attempt to compile got errors like this:
Building CXX object CMakeFiles/FastBDT_static.dir/src/FastBDT.cxx.o
In file included from /home/msevior/Dropbox/FastBDT_github/FastBDT/src/FastBDT.cxx:5:
/home/msevior/Dropbox/FastBDT_github/FastBDT/include/FastBDT.h: In member function ‘Value FastBDT::FeatureBinning::BinToValue(unsigned int) const’:
/home/msevior/Dropbox/FastBDT_github/FastBDT/include/FastBDT.h:215:28: error: ‘numeric_limits’ is not a member of ‘std’
215 | return -std::numeric_limits::infinity();
....
This was very simply fixed by adding

#include <limits>

to FastBDT.h.

Now it compiles with 0 errors.

Please make this minor change.

Definition of AUC?

When calculating the AUC of a ROC curve, most people use the false positive rate and the true positive rate as the axes. From my understanding, efficiency corresponds to the true positive rate, but purity does not correspond to 1 - false positive rate.

purity = true signals that passed the cut / events that passed the cut
1 - false positive rate = true backgrounds that failed the cut / true backgrounds

The consequence is that the ROC curve of efficiency and purity does not start and end at the diagonal points. Is my understanding correct?
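The difference between the two definitions in the question can be checked numerically. The following is a minimal sketch on made-up scores and labels; the function name `rates_at_cut` and the data are assumptions for illustration and say nothing about how FastBDT itself computes its AUC.

```python
def rates_at_cut(scores, labels, cut):
    """Efficiency (true positive rate), purity and false positive rate for score >= cut."""
    passed = [label for score, label in zip(scores, labels) if score >= cut]
    n_signal = sum(labels)
    n_background = len(labels) - n_signal
    true_pos = sum(passed)
    false_pos = len(passed) - true_pos
    efficiency = true_pos / n_signal
    # Purity is undefined when nothing passes the cut.
    purity = true_pos / len(passed) if passed else float("nan")
    false_positive_rate = false_pos / n_background
    return efficiency, purity, false_positive_rate


# Hypothetical toy sample: 2 signal events (label 1), 6 background events (label 0).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 0, 1, 0, 1]

loosest = rates_at_cut(scores, labels, cut=0.0)   # every event passes
tightest = rates_at_cut(scores, labels, cut=0.9)  # only the top event passes
```

With the loosest cut, efficiency is 1 and 1 - false positive rate is 0, but purity equals the signal fraction of the sample (2/8 = 0.25 here) rather than 0. The efficiency-purity curve therefore ends at (1, signal fraction), which is consistent with the understanding described in the question.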
