Comments (4)
I noticed that X is not preprocessed, so I preprocessed it before calling shap.Explainer(). Now I get the error below, and I suspect that preprocessing X was not the right thing to do:
Traceback (most recent call last):
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 600, in _run_script
exec(code, module.__dict__)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/streamlit_app/app.py", line 95, in <module>
shap_values = explainer(single_employer_processed)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/explainers/_exact.py", line 76, in __call__
return super().__call__(
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/explainers/_explainer.py", line 264, in __call__
row_result = self.explain_row(
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/explainers/_exact.py", line 120, in explain_row
outputs = fm(extended_delta_indexes, zero_index=0, batch_size=batch_size)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/utils/_masked_model.py", line 59, in __call__
return self._delta_masking_call(masks, zero_index=zero_index, batch_size=batch_size)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/utils/_masked_model.py", line 205, in _delta_masking_call
outputs = self.model(*subset_masked_inputs)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/models/_model.py", line 28, in __call__
out = self.inner_model(*args)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/pipeline.py", line 584, in predict_proba
Xt = transform.transform(Xt)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/compose/_column_transformer.py", line 827, in transform
Xs = self._fit_transform(
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/compose/_column_transformer.py", line 681, in _fit_transform
return Parallel(n_jobs=self.n_jobs)(
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/parallel.py", line 65, in __call__
return super().__call__(iterable_with_config)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/joblib/parallel.py", line 1918, in __call__
return output if self.return_generator else list(output)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
res = func(*args, **kwargs)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/parallel.py", line 127, in __call__
return self.function(*args, **kwargs)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/pipeline.py", line 940, in _transform_one
res = transformer.transform(X)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py", line 1586, in transform
X_int, X_mask = self._transform(
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py", line 192, in _transform
diff, valid_mask = _check_unknown(Xi, self.categories_[i], return_mask=True)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/_encode.py", line 304, in _check_unknown
if np.isnan(known_values).any():
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
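The final frame calls np.isnan on the encoder's known category values. A minimal sketch of why that can fail, assuming (this is not confirmed in the thread) that the encoder was fitted on string categories and therefore stores them as an object-dtype array; the category values below are placeholders:

```python
import numpy as np

# Assumption: categories learned from a string column are kept as an
# object-dtype array (placeholder values, not taken from the real model).
known_values = np.array(["Female", "Male"], dtype=object)

try:
    np.isnan(known_values)  # isnan is only defined for numeric dtypes
    raised = False
except TypeError as exc:
    raised = True
    message = str(exc)
    print(message)

# On a numeric array the same check works as intended.
numeric_values = np.array([0.0, np.nan])
nan_found = bool(np.isnan(numeric_values).any())
```

If that assumption holds, the error would point to numeric values reaching an OrdinalEncoder that was fitted on strings, which is what happens when already-encoded data is sent through the pipeline a second time.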
Hey, it would help a lot if you could provide a complete example that we can copy and paste to reproduce the issue. At a minimum, a couple of sample rows that trigger the error would be great.
Thanks @CloseChoice,
When I leave the OrdinalEncoder() out of the Pipeline, everything runs correctly, but I would rather not rely on that workaround, so I hope this problem can be solved.
Here is the sample code:
import shap
import pandas as pd
from utils.model import ChurnModel
import matplotlib.pyplot as plt
import numpy as np
churn_model = ChurnModel()
model_trained = churn_model.load_latest_model(artifacts_dir="./utils/model_artifact/")
df = churn_model.get_dataset(size=200)
X, y = df.drop(columns=["Exited"]), df["Exited"]
print(X.info())
print(X.head())
print(X.isna().sum())
data = {'CreditScore': ["43743"],
'Geography': ["Spain"],
'Gender': ["Male"],
'Age': ["34"],
'Tenure': ["13"],
'Balance': ["342"],
'NumOfProducts': ["4"],
'HasCrCard': ["1"],
'IsActiveMember': ["1"],
'EstimatedSalary': ["384972.0"]
}
features = pd.DataFrame(data)
categ_lst, numerical_cols = churn_model.get_categ_features(), churn_model.get_numerical_features()
features[categ_lst] = features[categ_lst].astype("string")
features[numerical_cols] = features[numerical_cols].astype("float")
print(features.head())
print(f"The prediction of this sample is: {model_trained.predict(features)}")
explainer = shap.Explainer(model_trained.predict_proba, X)
transformed = model_trained["preprocessor"].transform(features)
transformed = pd.DataFrame(transformed, columns=df.drop(columns=["Exited"]).columns, dtype=float)
print(transformed)
shap_values = explainer(transformed)
shap.plots.waterfall(shap_values[0,:, 1], max_display = 10)
plt.show()
The dataset is available at: https://www.kaggle.com/datasets/shubhammeshram579/bank-customer-churn-prediction?resource=download
To read the dataframe:
df_ = pd.read_csv(self.dataset_path, sep=',', on_bad_lines='skip', index_col=False, dtype='unicode')
df = df_.drop(columns=["RowNumber", "CustomerId", "Surname"])
The sklearn pipeline I use is:
preprocessor = ColumnTransformer(
transformers=[
('cat', OrdinalEncoder(), categ_lst),
('num', StandardScaler(), numerical_cols)
]
)
self.model = Pipeline([
('preprocessor', preprocessor),
('classifier', LogisticRegression(random_state=42))
])
The lists of categorical and numerical columns are:
categ_lst = ["Gender", "Geography"]
numerical_cols = list(set(df.columns) - set(["Exited", "Gender", "Geography"]))
Sorry, but your example is still not reproducible. I tried the following, but it throws a different error:
import shap
import pandas as pd
# from utils.model import ChurnModel
import matplotlib.pyplot as plt
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
categ_lst = ["Geography", "Gender", "Age", "HasCrCard", "IsActiveMember"]
numerical_cols = ["CreditScore", "Age", "Tenure", "Balance", "NumOfProducts", "EstimatedSalary"]
preprocessor = ColumnTransformer(
transformers=[
('cat', OrdinalEncoder(), categ_lst),
('num', StandardScaler(), numerical_cols)
]
)
model = Pipeline([
('preprocessor', preprocessor),
('classifier', LogisticRegression(random_state=42))
])
df = pd.read_csv('bugs/data/Churn_Modelling.csv')
df = df.loc[df.notnull().all(1), :]
X, y = df.drop(columns=["Exited"]), df["Exited"]
model_trained = model.fit(X, y)
print(X.info())
print(X.head())
print(X.isna().sum())
# Ignored for now since it does not work as expected: it always raises an error saying that unexpected category values were found
data = {'CreditScore': [43743],
'Geography': ["Spain"],
'Gender': ["Male"],
'Age': [34.],
'Tenure': [13.],
'Balance': [342.],
'NumOfProducts': [4.],
'HasCrCard': [1.],
'IsActiveMember': [1.],
'EstimatedSalary': [384972.0]
}
features = X.iloc[0, :] # pd.DataFrame(data)
features[categ_lst] = features[categ_lst].astype("string")
features[numerical_cols] = features[numerical_cols].astype("float")
print(features.head())
# print(f"The prediction of this sample is: {model_trained.predict(features)}")
explainer = shap.Explainer(model_trained.predict_proba, X)
transformed = model_trained["preprocessor"].transform(features)
transformed = pd.DataFrame(transformed, columns=df.drop(columns=["Exited"]).columns, dtype=float)
print(transformed)
shap_values = explainer(transformed)
shap.plots.waterfall(shap_values[0,:, 1], max_display = 10)
plt.show()
It would be great if you could help make this reproducible so that we can start working on a solution. Since you seem interested in fixing this, we would need a reproducible example for the tests either way ;)
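As a possible starting point for a reproducible example, here is a sketch that uses only scikit-learn and a small synthetic frame (the data, the reduced column set, and the variable names p_raw, p_split and double_preprocessing_failed are made up for illustration; this is not the real churn dataset). It shows two consistent pairings of model and input, and what may go wrong when already-transformed data is sent through the full pipeline again, as happens when an explainer built on the pipeline's predict_proba receives preprocessed rows:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Synthetic stand-in for the churn data (values are made up).
rng = np.random.default_rng(0)
n = 60
X = pd.DataFrame({
    "Geography": rng.choice(["Spain", "France", "Germany"], size=n),
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Age": rng.integers(18, 80, size=n).astype(float),
    "Balance": rng.uniform(0, 1e5, size=n),
})
y = np.tile([0, 1], n // 2)  # alternating labels so both classes are present

categ_lst = ["Geography", "Gender"]
numerical_cols = ["Age", "Balance"]

preprocessor = ColumnTransformer([
    ("cat", OrdinalEncoder(), categ_lst),
    ("num", StandardScaler(), numerical_cols),
])
model = Pipeline([
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression(random_state=42)),
])
model.fit(X, y)

raw_row = X.iloc[[0]]  # double brackets keep a one-row DataFrame
# ColumnTransformer emits columns in transformer order: categorical, then numerical.
transformed = pd.DataFrame(
    model["preprocessor"].transform(raw_row),
    columns=categ_lst + numerical_cols,
)

# Pairing 1: full pipeline with raw features.
p_raw = model.predict_proba(raw_row)
# Pairing 2: bare classifier with already-transformed features.
p_split = model["classifier"].predict_proba(transformed.to_numpy())

# Mixed pairing: full pipeline with transformed features, so the
# OrdinalEncoder receives floats although it was fitted on strings.
try:
    model.predict_proba(transformed)
    double_preprocessing_failed = False
except (TypeError, ValueError) as exc:
    double_preprocessing_failed = True
    print(type(exc).__name__, exc)
```

Under these assumptions, either pairing could be handed to shap.Explainer as long as the background data and the explained rows are in the same representation as the function's expected input. Note also that the transformed columns are labelled here in ColumnTransformer output order, which generally differs from the original column order of the frame.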