Comments (21)
@aldanor any updates?
from lightgbm.
Any thoughts on this?
I would also be very interested in seeing this feature implemented in LightGBM. As @aldanor stated above, the pseudo-code suggested earlier is correct and is how XGBoost implements monotonic constraints.
As such, this feature should be fairly trivial to implement for someone with intimate knowledge of the codebase.
From practical perspective (outside kaggle-world!), this feature would be extremely helpful in many applications where reasonable model behavior is relevant.
It seems that MC (monotonic constraints) are additive: if models A and B are both MC, then A+B is also MC.
So we only need to enforce MC in the decision-tree learning.
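The additivity claim can be illustrated with a quick sketch, where toy step functions stand in for monotone non-decreasing trees (everything here is illustrative, not LightGBM code):

```python
xs = [0.0, 1.0, 2.0, 3.0]

def tree_a(x):  # hypothetical monotone non-decreasing "tree" A
    return 0.0 if x < 1.5 else 2.0

def tree_b(x):  # hypothetical monotone non-decreasing "tree" B
    return 1.0 if x < 2.5 else 3.0

# The boosted ensemble is the sum of the trees; a sum of non-decreasing
# functions is itself non-decreasing, so per-tree constraints suffice.
ensemble = [tree_a(x) + tree_b(x) for x in xs]
assert all(a <= b for a, b in zip(ensemble, ensemble[1:]))
```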
Combining @chivee's pseudo-code and @AbdealiJK's suggestion, I think the algorithm is:
min_value = node.min_value
max_value = node.max_value
check(min_value <= split.left_output)
check(min_value <= split.right_output)
check(max_value >= split.left_output)
check(max_value >= split.right_output)
mid = (split.left_output + split.right_output) / 2;
if (split.feature is monotonic increasing) {
  check(split.left_output <= split.right_output)
  node.left_child.set_max_value(mid)
  node.right_child.set_min_value(mid)
}
if (split.feature is monotonic decreasing) {
  check(split.left_output >= split.right_output)
  node.left_child.set_min_value(mid)
  node.right_child.set_max_value(mid)
}
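A minimal runnable sketch of this check, assuming a failed `check` rejects the split and each node inherits `[min_value, max_value]` bounds from its ancestors (all names are illustrative, not LightGBM's actual API):

```python
def try_split(node_min, node_max, left_output, right_output, monotone):
    """Return (ok, left_bounds, right_bounds); monotone is +1/-1/0.

    Hypothetical helper mirroring the pseudo-code above, not LightGBM code.
    """
    # Both child outputs must respect the bounds inherited from ancestors.
    for out in (left_output, right_output):
        if not (node_min <= out <= node_max):
            return False, None, None
    mid = (left_output + right_output) / 2.0
    if monotone == +1:
        if left_output > right_output:
            return False, None, None
        # Left subtree is capped at mid; right subtree is floored at mid.
        return True, (node_min, mid), (mid, node_max)
    if monotone == -1:
        if left_output < right_output:
            return False, None, None
        return True, (mid, node_max), (node_min, mid)
    # Unconstrained feature: children simply inherit the node's bounds.
    return True, (node_min, node_max), (node_min, node_max)
```

For an increasing constraint with outputs 1.0 and 2.0, this accepts the split and tightens the children's bounds around the midpoint 1.5.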
I'm pasting the snippets for the monotonic constraints here:
IF (split is on a continuous, monotonic variable)
THEN take the averages of the left and right child nodes if the current split is used
IF monotonic increasing THEN CHECK left average <= right average
IF monotonic decreasing THEN CHECK left average >= right average
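The intuition behind comparing child averages: under squared loss a leaf's output is the mean of its targets, so the check above reduces to comparing child means. A toy sketch (all data illustrative):

```python
left_y = [1.0, 2.0]    # targets routed to the left child
right_y = [3.0, 5.0]   # targets routed to the right child

# Under squared loss, each child's output is the mean of its targets.
left_avg = sum(left_y) / len(left_y)
right_avg = sum(right_y) / len(right_y)

monotone = +1  # +1: increasing, -1: decreasing
ok = (left_avg <= right_avg) if monotone == +1 else (left_avg >= right_avg)
assert ok  # this split respects an increasing constraint
```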
@alexvorobiev, do you have any papers we could reference for this feature?
@chivee I only have the reference to the R GBM package https://cran.r-project.org/package=gbm
@alexvorobiev, thanks for sharing. I'm trying to understand the idea behind this method.
Note that the given pseudo-code only ensures that each split is in the correct order, not that the whole model is monotonic, since a later split could make the model non-monotonic.
@guolinke Would you be able to advise how to approach this and whether it's feasible? I.e., where should it belong? Would it be sufficient to implement it just somewhere in feature_histogram.hpp? I guess FeatureMetainfo could then just contain the -1/0/1 constraint.
Here's the meat of the implementation in XGBoost, for reference: https://github.com/dmlc/xgboost/blob/master/src/tree/param.h#L422 -- pretty much all of it is contained in CalcSplitGain(), plus CalcWeight(). Where would stuff like this go in LightGBM?
@aldanor
I don't know the details of monotonic constraints. What is the idea, and why is it needed?
The following may be useful:
The split-gain calculation: https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L291-L297
The leaf-output calculation: https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L305-L308
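For orientation, the linked lines compute the standard second-order boosting quantities. A rough sketch with L1 smoothing omitted for brevity (function names are mine, not LightGBM's):

```python
def leaf_split_gain(sum_gradients, sum_hessians, lambda_l2):
    # Gain contribution of one leaf: G^2 / (H + lambda_l2).
    return (sum_gradients ** 2) / (sum_hessians + lambda_l2)

def leaf_output(sum_gradients, sum_hessians, lambda_l2):
    # Optimal leaf value under the second-order loss approximation:
    # -G / (H + lambda_l2).
    return -sum_gradients / (sum_hessians + lambda_l2)
```

For example, with G = -2, H = 4, and lambda_l2 = 0, the leaf output is 0.5 and its gain contribution is 1.0.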
@guolinke Let me add some links here about the implementation in XGBoost:
https://xgboost.readthedocs.io/en/latest//tutorials/monotonic.html
dmlc/xgboost#1514
dmlc/xgboost#1516
@aldanor
I don't know the details about the monotonic constraints.
What is the idea? And why it is needed?
@guolinke Monotonic constraints can be a very important requirement for the resulting models, for many reasons: as noted above, there may be domain knowledge that must be respected, e.g. in insurance and risk-management problems.
How about we all cooperate and make this work?
@aldanor very cool, would like to work together with it.
@aldanor would you like to create a PR first ? I can provide my help in the PR.
@guolinke I will give it a try, yep. Your suggested algorithm in the snippet above looks fine; that's roughly what xgboost does (in exact mode though, not histogram; do you think there would be any complications here because of binning?)
Where would this code belong then, treelearner/feature_histogram.hpp? (I still have to read through most of the code.)
Edit: what do you mean by check(...) here? E.g., if (!(...)) { return; }?
@aldanor
The check means: return a gain of -inf if the condition is not met; as a result, that split will not be chosen.
I don't think there is any difference for MC in the binned algorithm.
We need to update the gain calculation: https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L354-L357 and https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L415-L418 .
We may need to wrap these in a new function and implement both the unconstrained and the MC versions.
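A minimal sketch of what such a wrapper might look like, assuming the -inf convention described above (names are illustrative, not LightGBM's internals):

```python
NEG_INF = float("-inf")

def constrained_split_gain(raw_gain, left_output, right_output, monotone):
    # Return -inf for a constraint-violating split so the split-selection
    # loop, which keeps the maximum gain, never chooses it.
    if monotone == +1 and left_output > right_output:
        return NEG_INF
    if monotone == -1 and left_output < right_output:
        return NEG_INF
    return raw_gain
```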
< removed due to irrelevance>
@j-mark-hou
There is one bug in your code; refer to @AbdealiJK's comment and my algorithm below.
got it, I'll wait for someone with a better understanding of the codebase to implement this then.
you can try #1314