
Comments (4)

mayer79 commented on May 20, 2024

Hello, and sorry you ran into a problem.

I probably cannot solve the error coming from parallel computing on Windows. But maybe I can help you with the single-threaded problem (no movement in error bar):

  1. SHAP analyses are usually done by decomposing 200-2000 predictions. Maybe you can start with an X consisting of 500 randomly sampled rows.
  2. What is the size of your background data set? Scott Lundberg proposes using small sets of only a few rows (up to 100). Here, too, I'd suggest starting small, with a bg_X of 20 rows. If the speed is acceptable, you can increase it to 100.
  3. I guess you are not using exact = TRUE?
  4. Are you using the current CRAN version?
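
To make points 1 and 2 concrete, here is a minimal, language-agnostic sketch of the subsampling idea in Python. It is not kernelshap code: `subsample_rows`, `full_data`, and the seeds are hypothetical names chosen for illustration, and the 5000-row table is a made-up stand-in for the real data.

```python
import random


def subsample_rows(rows, n_rows, seed):
    """Draw n_rows rows without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    n_rows = min(n_rows, len(rows))  # never ask for more rows than exist
    return rng.sample(rows, n_rows)


# Made-up stand-in for the full feature table: 5000 rows, 2 columns.
full_data = [[i, i * 0.5] for i in range(5000)]

# Rows whose predictions get decomposed (point 1: start with 500).
X_explain = subsample_rows(full_data, 500, seed=1)

# Small background set (point 2: start with 20, increase if fast enough).
bg_X = subsample_rows(full_data, 20, seed=2)

print(len(X_explain), len(bg_X))
```

The two samples are drawn independently, so the background set does not need to be a subset of the explained rows.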

What do you observe?

from kernelshap.

OlexiyPukhov commented on May 20, 2024

Thank you for the swift answer. I was able to solve the problem thanks to your ideas and some tinkering. Initially, my bg_X and X were the same data set, 27k x 140. I then set X to a subset of 2000 rows and bg_X to 500 rows, with parallel processing enabled on 12 threads and exact = FALSE. Processing finished after about 5 minutes.
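
A rough back-of-the-envelope sketch of why the reduction helps so much, assuming that for every evaluated on-off vector Kernel SHAP predicts once for each pair of an explained row and a background row. The helper name `n_predictions_per_coalition` is made up for illustration, and real runtimes also depend on the model, the number of on-off vectors, and parallelism.

```python
def n_predictions_per_coalition(n_explain, n_background):
    """Model calls per on-off vector under the assumed cost model:
    every explained row is combined with every background row."""
    return n_explain * n_background


full = n_predictions_per_coalition(27_000, 27_000)  # X = bg_X = full data
reduced = n_predictions_per_coalition(2_000, 500)   # the subsampled setup

print(full)            # 729000000
print(reduced)         # 1000000
print(full / reduced)  # 729.0
```

Under this assumption, the original setup needs roughly 729 times as many predictions as the reduced one.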

Will the fact that I am now using a smaller dataset for x and bg_x change my results relative to using my full dataset for x and bg_x? I am able to use the full dataset for x and bg_x with treeSHAP.


mayer79 commented on May 20, 2024

Sweet! Thanks for testing. With 500 rows, your background data is still very large, but 5 minutes is quite acceptable. Using even larger X and bg_X will change the result, but only slightly. I usually decompose between 1000 and 2000 predictions, even with TreeSHAP.

TreeSHAP is indeed orders of magnitude faster than KernelSHAP, but it can only be used for tree-based models, while KernelSHAP works for all model classes. For trees, I would not use KernelSHAP in practice.

By default, in your case, exact = FALSE, so you don't need to specify it explicitly.
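
As a small illustration of why exact mode is out of reach here: exact Kernel SHAP enumerates every non-trivial on-off vector, and with p features there are 2^p - 2 of them (all combinations except "all off" and "all on"). The function name below is hypothetical, and the sketch only counts vectors; it says nothing about how kernelshap chooses its default.

```python
def n_on_off_vectors(p):
    """Number of non-trivial on-off vectors for p features: 2^p - 2."""
    return 2**p - 2


small = n_on_off_vectors(8)    # a handful of features: easy to enumerate
large = n_on_off_vectors(140)  # the 140 features from this thread

print(small)           # 254
print(large > 10**42)  # True
```

With 140 features, enumeration is impossible in practice, so sampling on-off vectors is the only option.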


OlexiyPukhov commented on May 20, 2024

Since I'm doing research, a longer processing time is acceptable, but I suppose TreeSHAP would be preferred for production. Thanks again!

