
BUG: TypeError: ufunc 'isfinite' not supported for the input types

Jeremy98-alt commented on June 25, 2024

I noticed that X was not preprocessed, so I preprocessed it before calling shap.Explainer(). Now I get the error below (I suspect that preprocessing X was not the right thing to do):

Traceback (most recent call last):
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 600, in _run_script
exec(code, module.__dict__)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/streamlit_app/app.py", line 95, in <module>
shap_values = explainer(single_employer_processed)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/explainers/_exact.py", line 76, in __call__
return super().__call__(
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/explainers/_explainer.py", line 264, in __call__
row_result = self.explain_row(
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/explainers/_exact.py", line 120, in explain_row
outputs = fm(extended_delta_indexes, zero_index=0, batch_size=batch_size)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/utils/_masked_model.py", line 59, in __call__
return self._delta_masking_call(masks, zero_index=zero_index, batch_size=batch_size)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/utils/_masked_model.py", line 205, in _delta_masking_call
outputs = self.model(*subset_masked_inputs)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/models/_model.py", line 28, in __call__
out = self.inner_model(*args)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/pipeline.py", line 584, in predict_proba
Xt = transform.transform(Xt)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/compose/_column_transformer.py", line 827, in transform
Xs = self._fit_transform(
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/compose/_column_transformer.py", line 681, in _fit_transform
return Parallel(n_jobs=self.n_jobs)(
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/parallel.py", line 65, in __call__
return super().__call__(iterable_with_config)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/joblib/parallel.py", line 1918, in __call__
return output if self.return_generator else list(output)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
res = func(*args, **kwargs)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/parallel.py", line 127, in __call__
return self.function(*args, **kwargs)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/pipeline.py", line 940, in _transform_one
res = transformer.transform(X)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py", line 1586, in transform
X_int, X_mask = self._transform(
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py", line 192, in _transform
diff, valid_mask = _check_unknown(Xi, self.categories_[i], return_mask=True)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/_encode.py", line 304, in _check_unknown
if np.isnan(known_values).any():
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
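
For context, the final TypeError in this traceback can be reproduced in isolation. The sketch below is only an illustration, not code from the app: it assumes the encoder's learned categories are held in an object array of strings, and the array contents are made up.

```python
import numpy as np

# Hypothetical stand-in for categories an encoder learned from string columns.
known_values = np.array(["Female", "Male"], dtype=object)

try:
    np.isnan(known_values)  # isnan has no loop for object arrays of strings
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# The same ufunc is fine on floating-point data.
float_check = bool(np.isnan(np.array([0.0, 1.0])).any())
print(float_check)
```

This suggests the failure comes from a dtype mismatch between the values being encoded and the string categories seen at fit time, rather than from missing values in the data.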

CloseChoice commented on June 25, 2024

Hey, it would help a lot if you could provide a complete example that we can copy and paste to reproduce the issue. It would be amazing if you could at least provide a couple of sample rows that reproduce the issue.

Jeremy98-alt commented on June 25, 2024

Thanks @CloseChoice,
I tried removing the OrdinalEncoder() from the Pipeline and everything runs correctly, but I would rather not drop the encoder, so I hope to solve this problem.
Here is the sample code:

    import shap
    import pandas as pd
    from utils.model import ChurnModel 
    import matplotlib.pyplot as plt
    import numpy as np
    
    churn_model = ChurnModel()
    
    model_trained = churn_model.load_latest_model(artifacts_dir="./utils/model_artifact/")
    df = churn_model.get_dataset(size=200)
    X, y = df.drop(columns=["Exited"]), df["Exited"]
    
    print(X.info())
    print(X.head())
    print(X.isna().sum())
    
    data = {'CreditScore': ["43743"],
            'Geography': ["Spain"],
            'Gender': ["Male"],
            'Age': ["34"],
            'Tenure': ["13"],
            'Balance': ["342"],
            'NumOfProducts': ["4"],
            'HasCrCard': ["1"],
            'IsActiveMember': ["1"],
            'EstimatedSalary': ["384972.0"]
    }
    
    features = pd.DataFrame(data)
    categ_lst, numerical_cols = churn_model.get_categ_features(), churn_model.get_numerical_features()
    features[categ_lst] = features[categ_lst].astype("string")
    features[numerical_cols] = features[numerical_cols].astype("float")
    
    print(features.head())
    print(f"The prediction of this sample is: {model_trained.predict(features)}")
    
    explainer = shap.Explainer(model_trained.predict_proba, X)
    transformed = model_trained["preprocessor"].transform(features)
    transformed = pd.DataFrame(transformed, columns=df.drop(columns=["Exited"]).columns, dtype=float)
    
    print(transformed)
    shap_values = explainer(transformed)
    shap.plots.waterfall(shap_values[0,:, 1], max_display = 10)
    plt.show()

Now, the link for the dataset is: https://www.kaggle.com/datasets/shubhammeshram579/bank-customer-churn-prediction?resource=download

To read the dataframe:

    df_ = pd.read_csv(self.dataset_path, sep=',', on_bad_lines='skip', index_col=False, dtype='unicode')
    df = df_.drop(columns=["RowNumber", "CustomerId", "Surname"])
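
Reading every column as text means the numeric columns have to be cast back before scaling. A minimal sketch of that step, assuming `dtype=str` as a stand-in for the `dtype='unicode'` used above and a few made-up CSV rows:

```python
import io
import pandas as pd

# Made-up rows that only mimic a few columns of the churn dataset.
csv_text = "CreditScore,Geography,Age,Exited\n619,France,42,1\n608,Spain,41,0\n"

# dtype=str keeps every value as text, including the numeric-looking ones.
raw = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Cast only the numeric columns; the categorical column stays as text.
numeric_cols = ["CreditScore", "Age"]
typed = raw.astype({col: float for col in numeric_cols})

print(typed.dtypes)
```

Whatever casting is applied to the training frame needs to be applied identically to any row passed to the model or the explainer later.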

The sklearn pipeline applied is:

    preprocessor = ColumnTransformer(
        transformers=[
            ('cat', OrdinalEncoder(), categ_lst),
            ('num', StandardScaler(), numerical_cols)
        ]
    )

    self.model = Pipeline([
        ('preprocessor', preprocessor),
        ('classifier', LogisticRegression(random_state=42))
    ])

The lists of categorical and numerical columns:

    categ_lst = ["Gender", "Geography"]
    numerical_cols = list(set(df.columns) - set(["Exited", "Gender", "Geography"]))
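
One plausible reading of the traceback, not confirmed anywhere in this thread, is that the explainer wraps the full pipeline's `predict_proba`, so a row that was already run through the preprocessor gets encoded a second time. The sketch below illustrates that reading with scikit-learn only. The data is made up, only a subset of the columns is used, and an order-preserving list comprehension replaces the `set` difference so the column order is deterministic.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Made-up rows that only mimic the column layout of the churn dataset.
X = pd.DataFrame({
    "Geography": ["Spain", "France", "Germany", "Spain", "France", "Germany"],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Age": [34.0, 51.0, 42.0, 29.0, 60.0, 38.0],
    "Balance": [342.0, 0.0, 1200.5, 88.0, 7600.0, 15.0],
})
y = np.array([0, 1, 0, 0, 1, 1])

categ_lst = ["Gender", "Geography"]
numerical_cols = [c for c in X.columns if c not in categ_lst]

preprocessor = ColumnTransformer(transformers=[
    ("cat", OrdinalEncoder(), categ_lst),
    ("num", StandardScaler(), numerical_cols),
])
model = Pipeline([
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression(random_state=42)),
])
model.fit(X, y)

raw_row = X.iloc[[0]]  # one-row DataFrame with the raw, unencoded values

# Raw input: the pipeline encodes and scales it internally.
proba_raw = model.predict_proba(raw_row)

# Already-encoded input, relabelled with the raw column names as in the post.
encoded = pd.DataFrame(
    model["preprocessor"].transform(raw_row), columns=X.columns, dtype=float
)

try:
    model.predict_proba(encoded)  # the encoder now sees floats instead of strings
    second_pass_error = None
except (TypeError, ValueError) as exc:
    second_pass_error = type(exc).__name__

print(proba_raw.shape, second_pass_error)
```

If this reading holds, passing the raw `features` frame to an explainer built on `model_trained.predict_proba` would avoid the double encoding while keeping the OrdinalEncoder in the pipeline.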

CloseChoice commented on June 25, 2024

Sorry, but your example is still not reproducible. I tried the following but this throws a different error:

import shap
import pandas as pd
# from utils.model import ChurnModel 
import matplotlib.pyplot as plt
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

categ_lst = ["Geography", "Gender", "Age", "HasCrCard", "IsActiveMember"]

numerical_cols = ["CreditScore", "Age", "Tenure", "Balance", "NumOfProducts", "EstimatedSalary"]

preprocessor = ColumnTransformer(
            transformers=[
                ('cat', OrdinalEncoder(), categ_lst),
                ('num', StandardScaler(), numerical_cols)
            ]
        )

model = Pipeline([
            ('preprocessor', preprocessor),
            ('classifier', LogisticRegression(random_state=42))
        ])



df = pd.read_csv('bugs/data/Churn_Modelling.csv')

df = df.loc[df.notnull().all(1), :]
X, y = df.drop(columns=["Exited"]), df["Exited"]

model_trained = model.fit(X, y)

print(X.info())
print(X.head())
print(X.isna().sum())

# I ignore this for now since it does not work as expected. Always throws an error that some unexpected category values was found
data = {'CreditScore': [43743],
        'Geography': ["Spain"],
        'Gender': ["Male"],
        'Age': [34.],
        'Tenure': [13.],
        'Balance': [342.],
        'NumOfProducts': [4.],
        'HasCrCard': [1.],
        'IsActiveMember': [1.],
        'EstimatedSalary': [384972.0]
}

features = X.iloc[0, :] # pd.DataFrame(data)
features[categ_lst] = features[categ_lst].astype("string")
features[numerical_cols] = features[numerical_cols].astype("float")

print(features.head())
# print(f"The prediction of this sample is: {model_trained.predict(features)}")

explainer = shap.Explainer(model_trained.predict_proba, X)
transformed = model_trained["preprocessor"].transform(features)
transformed = pd.DataFrame(transformed, columns=df.drop(columns=["Exited"]).columns, dtype=float)

print(transformed)
shap_values = explainer(transformed)
shap.plots.waterfall(shap_values[0,:, 1], max_display = 10)
plt.show()

It would be great if you could help make this reproducible so that we can start working on a solution. Since you seem interested in fixing this, we would need a reproducible example for the tests either way ;)
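
One detail in the snippet above that may contribute to the different error (a guess, not confirmed in this thread): `X.iloc[0, :]` returns a one-dimensional Series, while scikit-learn transformers expect two-dimensional input. A small sketch with made-up data showing the difference:

```python
import pandas as pd

# Made-up frame with two of the dataset's column names.
X = pd.DataFrame({
    "Geography": ["Spain", "France"],
    "Age": [34.0, 51.0],
})

as_series = X.iloc[0, :]  # 1-D Series indexed by column name
as_frame = X.iloc[[0]]    # one-row DataFrame that keeps the columns

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

Selecting the row with a list, as in `X.iloc[[0]]`, keeps the column labels that a ColumnTransformer uses to pick its inputs.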
