Comments (4)
I see how classification trees can be compactified using the data structure from `NodeMeta`. However, the case seems a little harder for regression trees, where there may be as many labels as there are data points.
One property that may help in creating more compact regression trees is that, for a `node::NodeMeta`, `indX[node.region]` already gives the index of every sample that falls into `node`. Since `Y[indX[node.region]] == Y[indX][node.region]`, we could store just the single array `tree.labels = Y[indX]` at the top level, and only `node.region` at the node level. The labels for each node are then recovered as `tree.labels[node.region]`. Since we only need a single array, this cuts the overhead of having one array per `Leaf`.
from decisiontree.jl.
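The indexing identity the comment relies on can be sketched in Python, with plain lists standing in for Julia arrays (the values of `Y`, `indX`, and `region` here are illustrative, not taken from the package):

```python
# Labels for all training samples (regression targets).
Y = [3.5, 1.2, 9.9, 4.4, 7.1, 0.3]

# indX: a permutation of sample indices maintained during tree building,
# arranged so each node owns a contiguous slice of it.
indX = [4, 1, 5, 0, 2, 3]

# A node's region is a contiguous range into indX.
region = range(1, 4)  # this node owns indX[1:4]

# Per-node storage: gather labels directly via indX[region].
per_node = [Y[indX[i]] for i in region]

# Shared storage: permute Y once at the top level ...
tree_labels = [Y[j] for j in indX]  # tree.labels = Y[indX]
# ... then each node only needs its range.
recovered = [tree_labels[i] for i in region]

assert per_node == recovered  # Y[indX[region]] == Y[indX][region]
print(recovered)  # [1.2, 0.3, 3.5]
```

Because every node's region is contiguous in `indX`, a single shared permuted array plus one range per node reproduces exactly what a per-leaf label array would hold.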
Though I do like the approach mentioned, my main concern is that it would potentially make the already heavy `Leaf` (and, generally, the tree) even heavier. Maybe a tuple would be lighter than a Dict. I would say the top issue to be resolved is #44, where a simple tree takes up GBs (!!) of space on disk, using JLD.jl or BSON.jl.
I've been meaning to test how well trees made of the `NodeMeta` type write to disk, as they employ compact counts of the labels. If they do write well, then we should consider modifying the current `Leaf` and `Node` types, or dropping them altogether and adopting `NodeMeta` instead.
It would be good to experiment with the approaches you mentioned and see which better reduces the size on disk.
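As a rough stand-in for that experiment, the two storage layouts can be compared with any serializer. A minimal Python sketch using `pickle` (illustrative sizes only; the real test would serialize the Julia types with JLD.jl or BSON.jl):

```python
import pickle

# Illustrative data: 10_000 labels split into 100 leaves of 100 samples each.
Y = [float(i) for i in range(10_000)]

# Layout A: one label array stored per leaf (current Leaf-style storage).
per_leaf = [Y[i:i + 100] for i in range(0, len(Y), 100)]

# Layout B: one shared label array plus a (start, stop) range per leaf.
shared = (Y, [(i, i + 100) for i in range(0, len(Y), 100)])

size_a = len(pickle.dumps(per_leaf))
size_b = len(pickle.dumps(shared))

# Which layout wins depends on the serializer's per-array overhead;
# formats that tag every array with type metadata (as in #44) make
# the single shared array of layout B much more attractive.
print(f"per-leaf arrays: {size_a} bytes, shared array + ranges: {size_b} bytes")
```

The interesting number is not the pickle result itself but the per-array overhead of the target format: with JLD/BSON, each stored array carries its own type metadata, so collapsing many per-leaf arrays into one shared array should matter far more than it does here.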
Sounds good.
Would love to see how well it writes to disk using JLD and BSON.
Has there been any progress on adding weights to samples? It would be an awesome feature to have :)
I can see weights appearing in the code, but there is no interface for the user to specify them; they seem to be used internally to build boosting stumps. Would it be possible to expose an API where the user can supply a vector of weights when building a tree or a forest?
Edit: I'm working on a PR to add support for this.
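A user-facing weights API would presumably thread a weight vector down to the leaves. As a minimal, hypothetical sketch of what the leaves would then compute (the function names `weighted_leaf_value` and `weighted_majority` are illustrative, not DecisionTree.jl's API):

```python
from collections import defaultdict

def weighted_leaf_value(labels, weights):
    """Regression leaf: weighted mean of the labels that reach the leaf."""
    total = sum(weights)
    return sum(y * w for y, w in zip(labels, weights)) / total

def weighted_majority(labels, weights):
    """Classification leaf: label with the largest total weight."""
    tally = defaultdict(float)
    for y, w in zip(labels, weights):
        tally[y] += w
    return max(tally, key=tally.get)

print(weighted_leaf_value([1.0, 2.0, 4.0], [1.0, 1.0, 2.0]))  # 2.75
print(weighted_majority(["a", "a", "b"], [1.0, 1.0, 3.0]))    # b
```

Split criteria would need the same treatment: impurity and variance computed from summed weights rather than sample counts, which is presumably where the existing internal boosting-stump weights could be reused.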