Comments (11)
We can get the feature importance for XGBRegressor with the following code:
regr.booster().get_score(importance_type='weight')
This should be added as a method of XGBRegressor.
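A minimal sketch of how the returned scores could be ranked, assuming `get_score` gives back a plain dict mapping feature names to scores. The helper name `rank_importance` and the sample numbers are hypothetical, made up for illustration. Note that in recent xgboost releases the accessor is `get_booster()` rather than `booster()`.

```python
def rank_importance(scores):
    """Return (feature, share) pairs sorted by decreasing share of the total score."""
    total = sum(scores.values())
    if total == 0:
        # Nothing to normalise; report every feature with a zero share.
        return [(name, 0.0) for name in sorted(scores)]
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, value / total) for name, value in ranked]


# Hypothetical scores, shaped like the dict that
# get_score(importance_type='weight') is assumed to return.
example_scores = {"f0": 6, "f1": 1, "f2": 3}
ranking = rank_importance(example_scores)

for name, share in ranking:
    print(f"{name}: {share:.2f}")
```

Features that are never used in a split are typically absent from the dict, so a missing key can be read as zero importance.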
from xgboost.
Hi, it seems the same code as in feature_importances also works for XGBRegressor, though it is not a method of XGBRegressor.
Could you help add this function to XGBRegressor?
The algorithm automatically selects the features that improve the loss most, so
there is no need to specify feature importance...
On Friday, May 30, 2014, Damien Lefortier [email protected] wrote:
Hi, could you please show me how to determine the importance of
each feature? I'm using the Python interface and I could not find anything
in the docs. Thanks!
Reply to this email directly or view it on GitHub
https://github.com/tqchen/xgboost/issues/11.
I understand what the algorithm does; my aim is different: by looking at feature importances, I could learn which features work and which do not.
In the current Python module, you can use the text dump to dump out the trees and
see which features are used.
There is an undocumented extra option in console mode: you can dump with the
xgboost console tool using the option dump_stats=1. This dumps additional
statistics (gain, the loss reduction from that split, and cover, the sum of weights in
the branch) that can help you determine the importance of each feature.
Hope this answers your question.
Sincerely,
Tianqi Chen
Computer Science & Engineering, University of Washington
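The post-processing of such a dump can be sketched as follows. This is only an illustration under assumptions: it assumes each split line looks like `0:[f2<2.45] yes=1,no=2,missing=1,gain=100.5,cover=50` and that leaf lines contain no bracketed condition. The sample trees and the helper name `summarize_dump` are hypothetical. In the current Python API, a list of per-tree strings in this style can be obtained with `Booster.get_dump(with_stats=True)`.

```python
import re
from collections import defaultdict

# Matches a split line such as:
#   0:[f2<2.45] yes=1,no=2,missing=1,gain=100.5,cover=50
# The exact layout is an assumption about the dump format.
SPLIT_RE = re.compile(
    r"\[(?P<feature>[^<\]]+)<[^\]]*\].*?"
    r"gain=(?P<gain>[-+0-9.eE]+),cover=(?P<cover>[-+0-9.eE]+)"
)


def summarize_dump(tree_dumps):
    """Aggregate split count, total gain and total cover per feature."""
    stats = defaultdict(lambda: {"splits": 0, "gain": 0.0, "cover": 0.0})
    for tree in tree_dumps:
        for line in tree.splitlines():
            match = SPLIT_RE.search(line)
            if match is None:
                continue  # leaf line or blank line: no split to count
            entry = stats[match.group("feature")]
            entry["splits"] += 1
            entry["gain"] += float(match.group("gain"))
            entry["cover"] += float(match.group("cover"))
    return dict(stats)


# Hypothetical two-tree dump, written by hand for illustration.
sample_dumps = [
    "0:[f2<2.45] yes=1,no=2,missing=1,gain=100.5,cover=50\n"
    "\t1:leaf=0.4,cover=20\n"
    "\t2:[f0<1.5] yes=3,no=4,missing=3,gain=20.25,cover=30\n"
    "\t\t3:leaf=-0.1,cover=10\n"
    "\t\t4:leaf=0.2,cover=20\n",
    "0:[f0<0.5] yes=1,no=2,missing=1,gain=4.75,cover=50\n"
    "\t1:leaf=0.05,cover=25\n"
    "\t2:leaf=-0.05,cover=25\n",
]

summary = summarize_dump(sample_dumps)
for feature, entry in sorted(summary.items(), key=lambda kv: -kv[1]["gain"]):
    print(feature, entry)
```

Which of the three aggregates to rank by is a judgement call: split counts favour features that are used often, while total gain favours features whose splits reduce the loss most.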
Nice! Thanks, that works indeed, although it requires some post-processing :).
I think that a function such as dump_feature_strength(), which would return the weighted contribution of each feature to the model, would be nice. See e.g. Section 10.13 of the book "The Elements of Statistical Learning", where such weights are defined.
Just an idea ;).
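For reference, the relative importance measure from that section can be written roughly as below. The notation is recalled from the book and should be checked against the text: $T$ is a single tree with $J-1$ internal nodes, $v(t)$ is the variable split on at node $t$, $\hat{\imath}_t^2$ is the improvement in squared error from that split, and $M$ is the number of trees.

```latex
% Importance of variable \ell in a single tree T
\mathcal{I}_\ell^2(T) = \sum_{t=1}^{J-1} \hat{\imath}_t^{\,2} \, I\big(v(t) = \ell\big)

% Averaged over the M trees of the boosted ensemble
\mathcal{I}_\ell^2 = \frac{1}{M} \sum_{m=1}^{M} \mathcal{I}_\ell^2(T_m)
```

Summing the per-split gain for each feature across all trees gives a quantity in the same spirit, although the gain xgboost reports is defined in terms of its own objective rather than squared error.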
Thanks for the suggestion. It is really hard to define a "strength score"
for a feature in GBM that everyone agrees on, because each feature's
contribution in a tree ensemble is complex.
So we let users dump out the model and make their own judgement. But
you are right, this part could be improved.
This is supported in the newest version.
https://github.com/tqchen/xgboost/blob/master/wrapper/xgboost.py#L400
Hi, maybe I am asking a lot, but would it be possible to have the same function in the R implementation? If it represents too much work, R people may take the time to learn Python :-)
Kind regards
Michael
Yes, @FeiYeYe and @hetong007 are working on this at issue #114
regr.booster().get_fscore()
works as well