
Comments (3)

reiinakano commented on August 17, 2024

Hi, xcessiv doesn't manage this at all. It is your responsibility to run only as many workers as will fit in memory. For big datasets, that is usually 1.

I don't see this as a problem, since if you were doing hyperparameter optimization on a big dataset in Jupyter, you wouldn't run multiple sklearn processes either.
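As a back-of-the-envelope sketch of "as many workers as will fit in memory", the helper below assumes each worker keeps its own private copy of the dataset plus scratch space, estimated by a hypothetical `overhead_factor`. The function name and the rule of thumb are illustrative assumptions, not part of xcessiv.

```python
def max_workers(available_bytes, dataset_bytes, overhead_factor=2.0, cpu_count=None):
    """Estimate how many worker processes fit in memory.

    Assumption (hypothetical rule of thumb): each worker holds its own copy of
    the dataset plus scratch space, about overhead_factor * dataset_bytes.
    """
    per_worker = overhead_factor * dataset_bytes
    n = int(available_bytes // per_worker)
    if cpu_count is not None:
        n = min(n, cpu_count)  # no point exceeding the number of cores
    # Never return less than 1; if even one worker doesn't fit, no worker count helps.
    return max(n, 1)


if __name__ == "__main__":
    GiB = 2**30
    print(max_workers(16 * GiB, 3 * GiB))   # 2
    print(max_workers(16 * GiB, 10 * GiB))  # 1, the "big dataset" case
```

Because memory grows linearly with the worker count when every worker loads its own copy, a large dataset quickly pushes this estimate down to 1.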

from xcessiv.

alvarouc commented on August 17, 2024

Thanks for the quick answer. I should note that a tool like this ought to make efficient use of resources, given how computationally expensive hyperparameter search is. Ideally, all processes would share a single copy of the data, unless the cost of communicating from each node to a central node is too high, which is not the case in a single-node configuration.

Some model implementations in sklearn do not exploit parallelism (I have NumPy built with OpenBLAS, but they still use only one core; please let me know if I am wrong). This is why I find myself building pipelines in parallel, and sometimes I can't afford more than one copy of the data in memory.

I will put an example up as soon as I get the time.
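A minimal stdlib sketch of the "one copy of the data, shared by every worker" idea, using Python's `multiprocessing.shared_memory` (Python 3.8+). The helper names `publish` and `worker_mean` are hypothetical and not part of xcessiv or sklearn; for brevity the worker attaches by name inside the same process, whereas a real pool would send only `shm.name` to each worker process.

```python
import struct
from multiprocessing import shared_memory

ITEM = struct.calcsize("d")  # bytes per double (8 on common platforms)


def publish(values):
    """Copy a list of floats into a new shared-memory block, exactly once."""
    shm = shared_memory.SharedMemory(create=True, size=len(values) * ITEM)
    struct.pack_into(f"{len(values)}d", shm.buf, 0, *values)
    return shm


def worker_mean(name, n):
    """What each worker would run: attach by name and read n doubles in place."""
    shm = shared_memory.SharedMemory(name=name)
    view = shm.buf[: n * ITEM].cast("d")  # zero-copy view over the shared bytes
    try:
        return sum(view) / n
    finally:
        view.release()  # derived views must be released before close()
        shm.close()     # detach only; the creator is responsible for unlink()


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    shm = publish(data)
    try:
        # A real pool would send only shm.name (a short string) to each worker.
        print(worker_mean(shm.name, len(data)))  # 2.5
    finally:
        shm.close()
        shm.unlink()
```

Only the block's name crosses process boundaries, so memory use stays at one copy of the data regardless of how many workers attach to it.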


reiinakano commented on August 17, 2024

You are correct, but this is not really a tool to replace sklearn. If sklearn's implementation does not exploit parallelism, running it in xcessiv won't make it parallel either (unless you write your own algorithms, which xcessiv allows).

Any special memory-management handling must be safe with the arbitrary code users run to load data and fit estimators, and I'm not sure how feasible that is. There may be cases where an algorithm must modify the data in place, which would not play well if other processes were using the same data.
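To illustrate the in-place hazard, here is a sketch in which a hypothetical preprocessing step (`center_in_place`) mutates a shared block, and a second reader attaching afterward sees the modified values. It uses the stdlib `multiprocessing.shared_memory` module; the function names are illustrative only, and both "workers" run in one process for brevity.

```python
import struct
from multiprocessing import shared_memory

ITEM = struct.calcsize("d")  # bytes per double


def read_all(name, n):
    """Attach to the block by name and return a private copy of its n doubles."""
    shm = shared_memory.SharedMemory(name=name)
    view = shm.buf[: n * ITEM].cast("d")
    try:
        return list(view)
    finally:
        view.release()
        shm.close()


def center_in_place(name, n):
    """Hypothetical user preprocessing that subtracts the mean *in place*."""
    shm = shared_memory.SharedMemory(name=name)
    view = shm.buf[: n * ITEM].cast("d")
    try:
        mean = sum(view) / n
        for i in range(n):
            view[i] -= mean  # writes go straight into the shared block
    finally:
        view.release()
        shm.close()


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    shm = shared_memory.SharedMemory(create=True, size=len(data) * ITEM)
    struct.pack_into(f"{len(data)}d", shm.buf, 0, *data)
    try:
        print("before:", read_all(shm.name, len(data)))  # [1.0, 2.0, 3.0]
        center_in_place(shm.name, len(data))  # "worker A" mutates the shared data
        print("after: ", read_all(shm.name, len(data)))  # [-1.0, 0.0, 1.0], what "worker B" now sees
    finally:
        shm.close()
        shm.unlink()
```

Copying the data into private memory before mutating it avoids the problem, but that reintroduces exactly the per-worker copy that sharing was meant to eliminate.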

