Giter VIP home page Giter VIP logo

Comments (3)

guitargeek avatar guitargeek commented on August 22, 2024

Hi @kpedro88, thanks for checking out the library and opening the issue!

Indeed, updating the benchmarks was long overdue. I have now added a commit where I updated the benchmark code and results. While doing that, I noticed some performance regression in fastforest in the last year and also fixed that (see the second-to-last commit).

I hope you can reproduce the benchmarks now and that they will make sense to you! The reason why xgboost appeared so fast was that newer versions use multithreading by default, so I needed to explicitly set it to one thread now. Treelike will probably be slower than fastforest, at least that's what I observed. Fastforest and m2cgen are very similar now apparently, in your numbers FastForest was faster and for me it was m2cgen. A few years ago m2cgen was still slower, but I suppose it caught up thanks to improved compilers.

Let me know if these issue is resolved for you or if you want me to follow up on something :) I'm also very curious about your own reader!

from xgboost-fastforest.

kpedro88 avatar kpedro88 commented on August 22, 2024

Thanks @guitargeek, I'm able to run all the benchmarks now and I get similar, though not exactly the same, behavior on my Intel CPU. I also observe that the results vary more than expected based on the randomly-generated BDT and input data.

Versions:

  • Compiler: GCC 11.1.0
  • Python: 3.9.6 (older)
  • xgboost: 1.5.2
  • m2cgen: 0.9.0
  • ROOT: 6.24.06 (newer)

Results:
FastForest: 1.34 s
treelite: 2.24 s
m2cgen: 1.41 s
xgboost: 1.33 s
TMVA: 9.2 s

from xgboost-fastforest.

kpedro88 avatar kpedro88 commented on August 22, 2024

I'm just going to summarize the details of my custom reader here, since it turns out not to be faster. (At one time, I had observed it to be faster than the CMS GBRTree, but this was long ago with gcc ~5.)

GBRTree/FastForest stores the trees in separate vectors:

        std::vector<int> rootIndices_;
        std::vector<CutIndexType> cutIndices_;
        std::vector<FeatureType> cutValues_;
        std::vector<int> leftIndices_;
        std::vector<int> rightIndices_;
        std::vector<TreeResponseType> responses_;

and uses them as (inside of the index-updating while loop):

            auto r = rightIndices_[index];
            auto l = leftIndices_[index];
            index = array[cutIndices_[index]] > cutValues_[index] ? r : l;

My idea was to store all elements in a single vector to improve memory locality, originally using a struct, but it can be simplified even further:

	std::vector<int> rootIndices_;
	//"node" layout: index, cut, left, right
	std::vector<int> nodes_;
	std::vector<float> responses_;

The inside of the index-updating while loop then becomes:

				node = &nodes_[index];
				index = features[*node] <= (float&)(*++node) ? *++node : *(node+2);

Surprisingly, this is actually slower than FastForest. It took godbolt, llvm-mca, callgrind, and Vincenzo to figure out why: gcc emits branchless code for the FastForest version, but switches to conditional branching for my version. It's then slowed down by branch prediction misses.

It's not entirely clear why gcc does this, but I eventually found a modification that forces it to be branchless:

				node = &nodes_[index];
				index = *(node+2+(features[*node] > (float&)(*(node+1))));

which results in a similar speed to FastForest (still a bit slower, suffering from a few extra instructions).

Interestingly, clang (12.0.0) emits branchless instructions for both versions and they perform very similarly. (And gcc with -O3 emits branching instructions for both and they slow down...)

The demo code is in https://github.com/kpedro88/XGBoost-FastForest/blob/aos3/benchmark/benchmark-01-aos.cpp and a script to run everything is https://gist.github.com/kpedro88/4f60741a1d3b8b2830d4447f239f13a2#file-run_bdt-sh. Godbolt links:
FastForest: https://godbolt.org/z/3EosczYEf
Mine: https://godbolt.org/z/nddbMWq4h
Mine (branchless): https://godbolt.org/z/E5xrcYdMG

from xgboost-fastforest.

Related Issues (17)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.