
Comments (4)

Eight1911 commented on June 15, 2024

I see how classification trees can be compactified using the data structure from NodeMeta. However, the case seems a little harder for regression trees where there may be as many labels as there are data points.

One property that may help in building more compact regression trees is that, for node::NodeMeta, indX[node.region] already gives the index of every sample that falls into node. Since Y[indX[node.region]] == Y[indX][node.region], we could store a single array tree.labels = Y[indX] at the top level and only node.region at the node level. We can then recover the labels in each node as tree.labels[node.region]. Because only one array is needed, this may cut the overhead of having one array per Leaf.
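To make the indexing identity concrete, here is a minimal sketch in Python rather than Julia (so indices are 0-based). The Tree and Leaf classes, the leaf_labels helper, and the toy data are all hypothetical stand-ins for illustration, not DecisionTree.jl types.

```python
# Illustration only: hypothetical names, not DecisionTree.jl code.
from dataclasses import dataclass


@dataclass
class Leaf:
    region: range  # contiguous run of positions in the permuted index array


@dataclass
class Tree:
    labels: list  # Y[indX], stored once at the top level
    leaves: list

    def leaf_labels(self, leaf):
        # Recover a leaf's labels from the shared array via its region.
        return [self.labels[i] for i in leaf.region]


# Toy regression targets and a permutation of sample indices, arranged so
# that each leaf owns a contiguous run of indX (assumed, as with NodeMeta).
Y = [2.5, 0.1, 3.7, 1.2, 0.9, 4.4]
indX = [1, 4, 3, 0, 2, 5]

tree = Tree(
    labels=[Y[i] for i in indX],
    leaves=[Leaf(range(0, 3)), Leaf(range(3, 6))],
)

# Check the identity Y[indX[region]] == Y[indX][region] for every leaf.
for leaf in tree.leaves:
    direct = [Y[indX[i]] for i in leaf.region]
    assert direct == tree.leaf_labels(leaf)
```

Each leaf then carries only a range, while the label values live in one shared array.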

from decisiontree.jl.

bensadeghi commented on June 15, 2024

Though I do like the approach mentioned, my main concern is that it would potentially make the already heavy Leaf (and the tree in general) even heavier. Maybe a tuple would be lighter than a Dict. I would say that the top issue to be resolved is #44, where a simple tree takes up GBs (!!) of space on disk, using JLD.jl or BSON.jl.
I've been meaning to test how well trees made of the NodeMeta type would write to disk, as they employ compact counts of the labels. If they do write well, then we should consider modifying the current Leaf and Node types, or dropping them altogether and adopting NodeMeta instead.
It would be good to experiment with the approaches you mentioned and see which better reduces the size on disk.
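As a rough idea of what such an experiment could look like, here is a Python sketch that compares the serialized size of a leaf holding every label against one holding only label counts. Python's pickle is used purely as a stand-in for JLD.jl or BSON.jl, and the dict layouts and labels are made up for illustration, so the absolute numbers would not carry over to the Julia formats.

```python
# Illustration only: pickle stands in for JLD/BSON; leaf layouts are hypothetical.
import pickle
from collections import Counter

# Made-up classification labels that fall into one leaf.
labels_in_leaf = ["a"] * 900 + ["b"] * 100

# Leaf that keeps every label vs. leaf that keeps compact counts.
full_leaf = {"majority": "a", "values": labels_in_leaf}
count_leaf = {"majority": "a", "counts": dict(Counter(labels_in_leaf))}

full_size = len(pickle.dumps(full_leaf))
count_size = len(pickle.dumps(count_leaf))

print("full labels:", full_size, "bytes")
print("label counts:", count_size, "bytes")
```

The same comparison could be repeated with the actual Leaf and NodeMeta types and the real serializers to see which layout shrinks the file most.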

bensadeghi commented on June 15, 2024

Sounds good.
Would love to see how well it writes to disk using JLD and BSON.

baggepinnen commented on June 15, 2024

Has there been any progress on adding weights to samples? It would be an awesome feature to have :)
I can see weights appearing in the code but there is no interface for the user to specify them. They seem to be used internally to build boosting stumps? Would it be possible to expose an API where the user can supply a vector of weights when building a tree or a forest?

Edit: I'm working on a PR to add support for this
