Comments (4)
I see how classification trees can be compactified using the data structure from `NodeMeta`. However, the case seems a little harder for regression trees, where there may be as many labels as there are data points.
One property that may help in creating more compact regression trees is that, for a `node::NodeMeta`, `indX[node.region]` already gives the index of every sample that falls into `node`. Since `Y[indX[node.region]] == Y[indX][node.region]`, we could store just the single array `tree.labels = Y[indX]` at the top level, and only `node.region` at the node level. The labels for each node are then recovered as `tree.labels[node.region]`. Since we only need a single array, this cuts the overhead of having one array per `Leaf`.
from decisiontree.jl.
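The indexing identity the comment relies on can be sketched in Python, with plain lists standing in for Julia arrays (the values of `Y`, `indX`, and `region` here are illustrative, not taken from the package):

```python
# Labels for all training samples (regression targets).
Y = [3.5, 1.2, 9.9, 4.4, 7.1, 0.3]

# indX: a permutation of sample indices maintained during tree building,
# arranged so each node owns a contiguous slice of it.
indX = [4, 1, 5, 0, 2, 3]

# A node's region is a contiguous range into indX.
region = range(1, 4)  # this node owns indX[1:4]

# Per-node storage: gather labels directly via indX[region].
per_node = [Y[indX[i]] for i in region]

# Shared storage: permute Y once at the top level ...
tree_labels = [Y[j] for j in indX]  # tree.labels = Y[indX]
# ... then each node only needs its range.
recovered = [tree_labels[i] for i in region]

assert per_node == recovered  # Y[indX[region]] == Y[indX][region]
print(recovered)  # [1.2, 0.3, 3.5]
```

Because every node's region is contiguous in `indX`, a single shared permuted array plus one range per node reproduces exactly what a per-leaf label array would hold.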
Though I do like the approach mentioned, my main concern is that it would potentially make the already heavy `Leaf` (and, generally, the tree) even heavier. Maybe a tuple would be lighter than a Dict. I would say the top issue to be resolved is #44, where a simple tree takes up GBs (!!) of space on disk, using JLD.jl or BSON.jl.
I've been meaning to test how well trees made of the `NodeMeta` type write to disk, as they employ compact counts of the labels. If they do write well, then we should consider modifying the current `Leaf` and `Node` types, or dropping them altogether and adopting `NodeMeta` instead.
It would be good to experiment with the approaches you mentioned and see which better reduces the size on disk.
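As a rough stand-in for that experiment, the two storage layouts can be compared with any serializer. A minimal Python sketch using `pickle` (illustrative sizes only; the real test would serialize the Julia types with JLD.jl or BSON.jl):

```python
import pickle

# Illustrative data: 10_000 labels split into 100 leaves of 100 samples each.
Y = [float(i) for i in range(10_000)]

# Layout A: one label array stored per leaf (current Leaf-style storage).
per_leaf = [Y[i:i + 100] for i in range(0, len(Y), 100)]

# Layout B: one shared label array plus a (start, stop) range per leaf.
shared = (Y, [(i, i + 100) for i in range(0, len(Y), 100)])

size_a = len(pickle.dumps(per_leaf))
size_b = len(pickle.dumps(shared))

# Which layout wins depends on the serializer's per-array overhead;
# formats that tag every array with type metadata (as in #44) make
# the single shared array of layout B much more attractive.
print(f"per-leaf arrays: {size_a} bytes, shared array + ranges: {size_b} bytes")
```

The interesting number is not the pickle result itself but the per-array overhead of the target format: with JLD/BSON, each stored array carries its own type metadata, so collapsing many per-leaf arrays into one shared array should matter far more than it does here.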
Sounds good.
Would love to see how well it writes to disk using JLD and BSON.
Has there been any progress on adding weights to samples? It would be an awesome feature to have :)
I can see weights appearing in the code, but there is no interface for the user to specify them; they seem to be used internally to build boosting stumps. Would it be possible to expose an API where the user can supply a vector of weights when building a tree or a forest?
Edit: I'm working on a PR to add support for this.
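A user-facing weights API would presumably thread a weight vector down to the leaves. As a minimal, hypothetical sketch of what the leaves would then compute (the function names `weighted_leaf_value` and `weighted_majority` are illustrative, not DecisionTree.jl's API):

```python
from collections import defaultdict

def weighted_leaf_value(labels, weights):
    """Regression leaf: weighted mean of the labels that reach the leaf."""
    total = sum(weights)
    return sum(y * w for y, w in zip(labels, weights)) / total

def weighted_majority(labels, weights):
    """Classification leaf: label with the largest total weight."""
    tally = defaultdict(float)
    for y, w in zip(labels, weights):
        tally[y] += w
    return max(tally, key=tally.get)

print(weighted_leaf_value([1.0, 2.0, 4.0], [1.0, 1.0, 2.0]))  # 2.75
print(weighted_majority(["a", "a", "b"], [1.0, 1.0, 3.0]))    # b
```

Split criteria would need the same treatment: impurity and variance computed from summed weights rather than sample counts, which is presumably where the existing internal boosting-stump weights could be reused.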