Comments (10)
This is going to occur when the number of training data points is not equal to the number of labels. For each data point (row) you need to have a corresponding label. Can you make sure that is the case for your problem?
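A quick way to check this before calling the selector is to compare the number of rows in the feature dataframe with the number of labels. This is a minimal sketch assuming pandas; `check_alignment` and the toy data are hypothetical names for illustration and are not part of feature-selector:

```python
import pandas as pd


def check_alignment(features: pd.DataFrame, labels) -> None:
    """Raise if there is not exactly one label per row of `features`."""
    n_rows = features.shape[0]
    n_labels = len(labels)
    if n_rows != n_labels:
        raise ValueError(
            f"{n_rows} observations but {n_labels} labels; "
            "each row needs exactly one label"
        )


# Toy data: 3 observations (rows), 2 features (columns).
features = pd.DataFrame({"f1": [0.1, 0.2, 0.3], "f2": [1, 0, 1]})

check_alignment(features, ["a", "b", "a"])  # passes: 3 rows, 3 labels

try:
    # Passing the column names gives 2 labels for 3 rows.
    check_alignment(features, features.columns)
except ValueError as err:
    print(err)
```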
from feature-selector.
Thanks for the reply. Just for my understanding, if I pass in a standard dataframe, is it identifying rows or columns that have zero importance? I have tried passing labels in for both and I either get the above error (passing `df.columns` in) or `ValueError: y contains new labels:` (passing `df.index` in). Each row and column has a label. I understand that these errors are being thrown by sklearn, but any advice would be appreciated.
In machine learning, features are in the columns with observations in the rows. As we want to identify features with zero importance, we check the columns. You should be passing in the entire dataframe (with observations in the rows and features in the columns) along with the labels for identifying zero importance features. You need to have the same number of observations in the dataframe and in the labels.
Hi Will, thanks for the reply. I know you are not here to teach ignorami like me, but I do appreciate your advice. My dataframe has that shape: features as columns and observations as rows. Is the label argument the label of the feature or the observation? I have tried both and get errors either way: the error reported at the start of this thread when passing in the feature names as labels, and `ValueError: y contains new labels: [A list of observation names]` when passing in the observation names as labels. My dataset is wide rather than long (around 250 features and 150 observations); could this be the source of the errors? I have checked that the labels and the index have the same length.
I have also just done some more evaluation, and it appears that when I pass the observation names as labels I get the `contains new labels` error. It lists 23 (15.333%) of the labels as new, and these change each time I attempt `identify_zero_importance`. Is this something to do with the test/train split?
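One possible explanation, not confirmed anywhere in this thread: if early stopping holds out a random validation split of about 15% of the rows, and the labels are the index (one unique value per row), then every held-out label is by construction absent from the training labels. With 150 rows, a 15% hold-out rounded up is 23 rows, which matches the count in the error, and a fresh random split would list different labels on each run. The sketch below uses only the standard library to illustrate the idea; the 15% fraction, the rounding, and all names are assumptions and are not taken from feature-selector's code:

```python
import math
import random

n_rows = 150
valid_fraction = 0.15  # assumed hold-out fraction, not confirmed by the thread

# Unique labels, one per row, as when df.index is passed as the labels.
labels = [f"obs_{i}" for i in range(n_rows)]

rng = random.Random(0)
n_valid = math.ceil(valid_fraction * n_rows)  # assumed rounding up
valid_idx = set(rng.sample(range(n_rows), n_valid))

train_labels = {lab for i, lab in enumerate(labels) if i not in valid_idx}
valid_labels = [lab for i, lab in enumerate(labels) if i in valid_idx]

# Labels in the validation split that never appear in the training split.
unseen = [lab for lab in valid_labels if lab not in train_labels]

print(n_valid, len(unseen))                  # 23 23
print(f"{100 * len(unseen) / n_rows:.3f}%")  # 15.333%
```

If this is what is happening, the fix is not a different split but labels that repeat across rows, that is, one class per observation rather than one unique name per observation.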
Could you share the code that is giving you errors?
There is a lot of wrangling to get the dataframe in shape, then I call
fs = FeatureSelector(data=df, labels=df.index)
fs.identify_zero_importance(task='classification', eval_metric='auc',
                            n_iterations=10, early_stopping=True)
Which leads to the error:
ValueError: y contains new labels: [`a list of 23 (15.333%) of the items in the index`]
The labels should be in a separate array, not in the dataframe itself. What kind of labels do you have: binary, multiclass, or continuous?
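As a rough sketch of that layout, assuming the class of each observation lives in a column (here a hypothetical `region` column, with made-up feature names and values), the labels can be pulled out into their own array so that the dataframe passed as `data` holds only features:

```python
import pandas as pd

# Hypothetical toy frame: rows are observations, `region` holds the class.
df = pd.DataFrame(
    {
        "f1": [0.2, 0.4, 0.1, 0.9],
        "f2": [3, 1, 4, 1],
        "region": ["north", "south", "north", "south"],
    },
    index=["obs_a", "obs_b", "obs_c", "obs_d"],
)

labels = df["region"].to_numpy()        # one class label per row
features = df.drop(columns=["region"])  # features only

print(features.shape)  # (4, 2)
print(labels.shape)    # (4,)
```

Here several rows share each class, unlike `df.index`, where every value belongs to exactly one row.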
Hi Will,
They are multiclass labels - strings of geographical area names. I have tried passing the index values in as a list rather than a direct call to the df, but it is giving the same error as before.
FYI: I found this write-up describing the problem. It helped me get past it:
https://datascience.stackexchange.com/questions/20199/train-test-split-error-found-input-variables-with-inconsistent-numbers-of-sam