
Comments (4)

Jeremy98-alt commented on June 25, 2024

I noticed that X was not preprocessed, so I preprocessed it before calling shap.Explainer(). Now I get the error below, and I suspect that preprocessing X was not the right thing to do:

Traceback (most recent call last):
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 600, in _run_script
exec(code, module.__dict__)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/streamlit_app/app.py", line 95, in <module>
shap_values = explainer(single_employer_processed)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/explainers/_exact.py", line 76, in __call__
return super().__call__(
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/explainers/_explainer.py", line 264, in __call__
row_result = self.explain_row(
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/explainers/_exact.py", line 120, in explain_row
outputs = fm(extended_delta_indexes, zero_index=0, batch_size=batch_size)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/utils/_masked_model.py", line 59, in __call__
return self._delta_masking_call(masks, zero_index=zero_index, batch_size=batch_size)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/utils/_masked_model.py", line 205, in _delta_masking_call
outputs = self.model(*subset_masked_inputs)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/shap/models/_model.py", line 28, in __call__
out = self.inner_model(*args)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/pipeline.py", line 584, in predict_proba
Xt = transform.transform(Xt)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/compose/_column_transformer.py", line 827, in transform
Xs = self._fit_transform(
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/compose/_column_transformer.py", line 681, in _fit_transform
return Parallel(n_jobs=self.n_jobs)(
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/parallel.py", line 65, in __call__
return super().__call__(iterable_with_config)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/joblib/parallel.py", line 1918, in __call__
return output if self.return_generator else list(output)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
res = func(*args, **kwargs)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/parallel.py", line 127, in __call__
return self.function(*args, **kwargs)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/pipeline.py", line 940, in _transform_one
res = transformer.transform(X)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py", line 1586, in transform
X_int, X_mask = self._transform(
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py", line 192, in _transform
diff, valid_mask = _check_unknown(Xi, self.categories_[i], return_mask=True)
File "/mnt/c/Users/j.sapienza/OneDrive - Reply/Desktop/Demo IP 20240509/demo-streamlit-xai/.venv/lib/python3.8/site-packages/sklearn/utils/_encode.py", line 304, in _check_unknown
if np.isnan(known_values).any():
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
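
The last frame of the traceback calls `np.isnan` on the encoder's known categories. As a minimal sketch of that final step in isolation (assuming the categories are stored as an object array of strings, which is a guess based on the string-typed categorical columns and not something the traceback confirms), the same `TypeError` can be triggered with NumPy alone:

```python
import numpy as np

# Assumption: categories learned from string columns live in an object array.
known_values = np.array(["Female", "Male"], dtype=object)

try:
    np.isnan(known_values)
    isnan_error = None
except TypeError as exc:
    # isnan has no implementation for object/string inputs.
    isnan_error = exc

print(type(isnan_error).__name__)

# For comparison, the same check is fine on numeric categories.
numeric_ok = bool(np.isnan(np.array([0.0, 1.0, np.nan])).any())
print(numeric_ok)
```

If this is indeed what happens, it would point to a mismatch between the numeric values being passed in and the string categories the encoder was fitted on.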

CloseChoice commented on June 25, 2024

Hey, it would help a lot if you could provide a complete example that we can copy and paste to reproduce the issue. It would be great if you could at least provide a couple of sample rows that trigger it.

Jeremy98-alt commented on June 25, 2024

Thanks @CloseChoice,
I tried removing the OrdinalEncoder() from the Pipeline and everything then runs correctly, but I would rather not rely on that workaround, so I hope to solve the underlying problem.
Here is the sample code:

    import shap
    import pandas as pd
    from utils.model import ChurnModel 
    import matplotlib.pyplot as plt
    import numpy as np
    
    churn_model = ChurnModel()
    
    model_trained = churn_model.load_latest_model(artifacts_dir="./utils/model_artifact/")
    df = churn_model.get_dataset(size=200)
    X, y = df.drop(columns=["Exited"]), df["Exited"]
    
    print(X.info())
    print(X.head())
    print(X.isna().sum())
    
    data = {'CreditScore': ["43743"],
            'Geography': ["Spain"],
            'Gender': ["Male"],
            'Age': ["34"],
            'Tenure': ["13"],
            'Balance': ["342"],
            'NumOfProducts': ["4"],
            'HasCrCard': ["1"],
            'IsActiveMember': ["1"],
            'EstimatedSalary': ["384972.0"]
    }
    
    features = pd.DataFrame(data)
    categ_lst, numerical_cols = churn_model.get_categ_features(), churn_model.get_numerical_features()
    features[categ_lst] = features[categ_lst].astype("string")
    features[numerical_cols] = features[numerical_cols].astype("float")
    
    print(features.head())
    print(f"The prediction of this sample is: {model_trained.predict(features)}")
    
    explainer = shap.Explainer(model_trained.predict_proba, X)
    transformed = model_trained["preprocessor"].transform(features)
    transformed = pd.DataFrame(transformed, columns=df.drop(columns=["Exited"]).columns, dtype=float)
    
    print(transformed)
    shap_values = explainer(transformed)
    shap.plots.waterfall(shap_values[0,:, 1], max_display = 10)
    plt.show()

The dataset is available at: https://www.kaggle.com/datasets/shubhammeshram579/bank-customer-churn-prediction?resource=download

To read the dataframe:

df_ = pd.read_csv(self.dataset_path, sep=',', on_bad_lines='skip', index_col=False, dtype='unicode')
df = df_.drop(columns=["RowNumber", "CustomerId", "Surname"])

The sklearn pipeline I apply is:

preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OrdinalEncoder(), categ_lst),
        ('num', StandardScaler(), numerical_cols)
    ]
)

self.model = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression(random_state=42))
])

The lists of categorical and numerical columns:

categ_lst = ["Gender", "Geography"]
numerical_cols = list(set(df.columns) - set(["Exited", "Gender", "Geography"]))
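
One possible reading of the snippets above is that the explainer wraps `model_trained.predict_proba`, which already runs the preprocessor, so passing `transformed` means the data goes through the `OrdinalEncoder` twice. The sketch below illustrates that idea on a tiny made-up DataFrame (the rows, the reduced column set, and the variable names `raw_row`, `encoded`, and `double_error` are illustrative assumptions, not taken from the original project). It does not call shap at all; it only compares calling the pipeline on raw versus already-encoded rows.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Tiny synthetic stand-in for the churn data (values are made up).
X = pd.DataFrame({
    "Geography": ["Spain", "France", "Germany", "Spain", "France", "Germany"],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Age": [34.0, 45.0, 29.0, 51.0, 38.0, 42.0],
    "Balance": [342.0, 0.0, 1200.5, 830.0, 15.0, 990.0],
})
y = [0, 1, 0, 1, 0, 1]

categ_lst = ["Gender", "Geography"]
numerical_cols = ["Age", "Balance"]

model = Pipeline([
    ("preprocessor", ColumnTransformer(transformers=[
        ("cat", OrdinalEncoder(), categ_lst),
        ("num", StandardScaler(), numerical_cols),
    ])),
    ("classifier", LogisticRegression(random_state=42)),
]).fit(X, y)

# Raw row: the pipeline encodes it internally, so this works.
raw_row = X.iloc[[0]]
raw_proba = model.predict_proba(raw_row)

# Already-encoded row: the encoder now sees floats instead of strings.
# The ColumnTransformer emits the 'cat' columns first, then the 'num' columns.
encoded = pd.DataFrame(
    model["preprocessor"].transform(raw_row),
    columns=categ_lst + numerical_cols,
)
try:
    model.predict_proba(encoded)
    double_error = None
except Exception as exc:  # the exact exception type may vary by version
    double_error = exc

print(raw_proba.shape)
print(type(double_error).__name__)
```

Under this reading, the rows handed to the explainer would need to be in the same raw form as the background data `X`, leaving the encoding to the pipeline itself.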

CloseChoice commented on June 25, 2024

Sorry, but your example is still not reproducible. I tried the following, but it throws a different error:

import shap
import pandas as pd
# from utils.model import ChurnModel 
import matplotlib.pyplot as plt
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

categ_lst = ["Geography", "Gender", "Age", "HasCrCard", "IsActiveMember"]

numerical_cols = ["CreditScore", "Age", "Tenure", "Balance", "NumOfProducts", "EstimatedSalary"]

preprocessor = ColumnTransformer(
            transformers=[
                ('cat', OrdinalEncoder(), categ_lst),
                ('num', StandardScaler(), numerical_cols)
            ]
        )

model = Pipeline([
            ('preprocessor', preprocessor),
            ('classifier', LogisticRegression(random_state=42))
        ])



df = pd.read_csv('bugs/data/Churn_Modelling.csv')

df = df.loc[df.notnull().all(1), :]
X, y = df.drop(columns=["Exited"]), df["Exited"]

model_trained = model.fit(X, y)

print(X.info())
print(X.head())
print(X.isna().sum())

# Ignored for now since it does not work as expected: it always throws an error that unexpected category values were found
data = {'CreditScore': [43743],
        'Geography': ["Spain"],
        'Gender': ["Male"],
        'Age': [34.],
        'Tenure': [13.],
        'Balance': [342.],
        'NumOfProducts': [4.],
        'HasCrCard': [1.],
        'IsActiveMember': [1.],
        'EstimatedSalary': [384972.0]
}

features = X.iloc[0, :] # pd.DataFrame(data)
features[categ_lst] = features[categ_lst].astype("string")
features[numerical_cols] = features[numerical_cols].astype("float")

print(features.head())
# print(f"The prediction of this sample is: {model_trained.predict(features)}")

explainer = shap.Explainer(model_trained.predict_proba, X)
transformed = model_trained["preprocessor"].transform(features)
transformed = pd.DataFrame(transformed, columns=df.drop(columns=["Exited"]).columns, dtype=float)

print(transformed)
shap_values = explainer(transformed)
shap.plots.waterfall(shap_values[0,:, 1], max_display = 10)
plt.show()

It would be great if you could help make this reproducible so that we can start working on a solution. Since you seem interested in fixing this, we would need a reproducible example for the tests either way ;)
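
One detail in the reproduction attempt that may contribute to the different error (this is a guess, not something confirmed in the thread) is that `X.iloc[0, :]` returns a one-dimensional `Series`, while a `ColumnTransformer` that selects columns by name expects a two-dimensional DataFrame. A small pandas-only sketch of the difference, using a made-up two-column frame:

```python
import pandas as pd

# Made-up data; only the shapes and types matter here.
X = pd.DataFrame({"Geography": ["Spain", "France"], "Age": [34.0, 45.0]})

as_series = X.iloc[0, :]   # 1-D Series, mixed columns collapse to object dtype
as_frame = X.iloc[[0]]     # 2-D DataFrame with one row, per-column dtypes kept

print(type(as_series).__name__, as_series.shape, as_series.dtype)
print(type(as_frame).__name__, as_frame.shape)
print(as_frame.dtypes["Age"])
```

Selecting with a list, as in `X.iloc[[0]]`, keeps the column names and dtypes, which is usually what a fitted pipeline expects for a single sample.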
