
shparkley's Introduction

Shparkley: Scaling Shapley Values with Spark

Shparkley is a PySpark implementation of Shapley values that uses a Monte Carlo approximation algorithm.

Given a dataset and a machine learning model, Shparkley can compute Shapley values for every feature of a given feature vector. Shparkley also supports training weights and is model-agnostic.
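To illustrate the idea behind the Monte Carlo approximation (this is a single-machine sketch of the general algorithm, not the package's distributed implementation): sample a random background row and a random feature permutation, then switch features one by one from the background value to the investigated row's value, accumulating each feature's marginal contribution to the prediction.

```python
import random

def monte_carlo_shapley(predict, x, background, num_samples=200, seed=0):
    """Approximate Shapley values for one feature vector `x` (a dict).

    `predict` maps a feature dict to a score; `background` is a list of
    feature dicts sampled from the training data.
    """
    rng = random.Random(seed)
    features = list(x)
    shapley = {f: 0.0 for f in features}
    for _ in range(num_samples):
        z = rng.choice(background)   # random background row
        order = features[:]
        rng.shuffle(order)           # random feature permutation
        hybrid = dict(z)
        prev = predict(hybrid)
        for f in order:
            hybrid[f] = x[f]         # switch feature f to x's value
            curr = predict(hybrid)
            shapley[f] += curr - prev  # marginal contribution of f
            prev = curr
    return {f: v / num_samples for f, v in shapley.items()}

# Toy linear model: for a linear model, each feature's Shapley value is
# its weight times (x - background), regardless of permutation order.
predict = lambda row: 2.0 * row["a"] + 1.0 * row["b"]
background = [{"a": 0.0, "b": 0.0}]
values = monte_carlo_shapley(predict, {"a": 1.0, "b": 1.0}, background)
```

Shparkley distributes exactly this kind of sampling work across a Spark cluster, which is what makes it tractable for large background datasets.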

pip install shparkley

You must have Apache Spark installed on your machine/cluster.

from collections import OrderedDict
from typing import List

import pandas as pd
from sklearn.base import ClassifierMixin

from affirm.model_interpretation.shparkley.estimator_interface import OrderedSet, ShparkleyModel
from affirm.model_interpretation.shparkley.spark_shapley import compute_shapley_for_sample


class MyShparkleyModel(ShparkleyModel):
    """
    You need to wrap your model with this interface (by subclassing ShparkleyModel)
    """
    def __init__(self, model: ClassifierMixin, required_features: OrderedSet):
        self._model = model
        self._required_features = required_features

    def predict(self, feature_matrix: List[OrderedDict]) -> List[float]:
        """
        Generates one prediction per row, taking in a list of ordered dictionaries (one per row).
        """
        pd_df = pd.DataFrame.from_dict(feature_matrix)
        preds = self._model.predict_proba(pd_df)[:, 1]
        return preds

    def _get_required_features(self) -> OrderedSet:
        """
        An ordered set of feature column names
        """
        return self._required_features

row = dataset.filter(dataset.row_id == 'xxxx').rdd.first()
shparkley_wrapped_model = MyShparkleyModel(my_model, my_feature_names)  # my_feature_names: OrderedSet of feature column names

# You need to sample your dataset based on convergence criteria.
# More samples result in more accurate Shapley values.
# Repartitioning and caching the sampled dataframe will speed up computation.
sampled_df = training_df.sample(0.1, True).repartition(75).cache()

shapley_scores_by_feature = compute_shapley_for_sample(
    df=sampled_df,
    model=shparkley_wrapped_model,
    row_to_investigate=row,
    weight_col_name='training_weight_column_name'
)
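The result, `shapley_scores_by_feature`, associates each feature name with its estimated Shapley value (assuming a mapping-like return, as the variable name suggests). A common next step is to rank features by absolute contribution; a sketch with made-up scores:

```python
# Hypothetical output of compute_shapley_for_sample: feature -> Shapley value.
shapley_scores_by_feature = {
    "fico": -0.08,
    "loan_amount": 0.03,
    "number_of_delinquencies": 0.12,
}

# Rank features by the magnitude of their contribution to the prediction.
ranked = sorted(shapley_scores_by_feature.items(),
                key=lambda kv: abs(kv[1]), reverse=True)
for feature, value in ranked:
    print(f"{feature}: {value:+.3f}")
```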

shparkley's People

Contributors

niloygupta, ijoseph


shparkley's Issues

Can Shparkley package generate shap values for an entire validation dataset?

As shown in the simple.ipynb file, the Shparkley package generates Shapley values for a single data point. If we pass several rows to be investigated, does Shparkley provide Shapley values for all of them?

Current:
query_row = Row(fico=600, loan_amount=300, number_of_delinquencies=1, repaid_all_previous_affirm_loans=0)
shapley_values_shparkley = compute_shapley_for_sample(
    df=train_spark_df,
    model=model_with_shparkley_interface,
    row_to_investigate=query_row,
)

Expected:
query_rows = [
    Row(fico=600, loan_amount=300, number_of_delinquencies=1, repaid_all_previous_affirm_loans=0),
    Row(fico=700, loan_amount=350, number_of_delinquencies=0, repaid_all_previous_affirm_loans=0),
    Row(fico=680, loan_amount=370, number_of_delinquencies=1, repaid_all_previous_affirm_loans=1),
]
shapley_values_shparkley = compute_shapley_for_sample(
    df=train_spark_df,
    model=model_with_shparkley_interface,
    row_to_investigate=query_rows,
)
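Since compute_shapley_for_sample investigates one row at a time, a straightforward way to handle several rows is to call it once per row. A sketch of that loop; the real `compute_shapley_for_sample`, dataframe, and wrapped model come from the Shparkley setup above, and trivial placeholders stand in for them here so the loop itself runs standalone:

```python
def compute_shapley_for_sample(df, model, row_to_investigate):
    # Placeholder standing in for Shparkley's real function, which returns
    # per-feature Shapley values for the investigated row.
    return {feature: 0.0 for feature in row_to_investigate}

sampled_df, wrapped_model = None, None  # placeholders for the Spark objects

query_rows = [
    {"fico": 600, "loan_amount": 300, "number_of_delinquencies": 1},
    {"fico": 700, "loan_amount": 350, "number_of_delinquencies": 0},
]

# One compute_shapley_for_sample call per row to investigate.
scores_per_row = [
    compute_shapley_for_sample(df=sampled_df, model=wrapped_model,
                               row_to_investigate=row)
    for row in query_rows
]
```

Note that each call re-scans the sampled dataframe, so caching it (as in the README example) matters more when investigating many rows.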

PicklingError: Could not serialize object: TypeError: can't pickle _abc_data objects

I wanted to try out this package because it implements a PySpark version of Shapley value generation. I copy-pasted the simple.ipynb file into my environment to check that the basics work, but the code breaks at input cell [32]. Screenshots are attached; could anyone please look into them?
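This error is commonly seen when an abc-based class (such as a ShparkleyModel subclass) is serialized by an older cloudpickle, bundled with older PySpark versions, on Python 3.7. The usual workarounds are upgrading PySpark or defining the subclass in an importable module rather than a notebook cell. A quick local sanity check (with a hypothetical wrapper class) is to round-trip the object through pickle before submitting the Spark job:

```python
import pickle

class WrappedModel:
    # Defined at module top level so pickle can serialize instances
    # by referencing the class rather than its internals.
    def __init__(self, threshold):
        self.threshold = threshold

wrapped = WrappedModel(0.5)
restored = pickle.loads(pickle.dumps(wrapped))
```

If this local round-trip fails with the same error, the problem is in the serialization environment rather than in Spark itself.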
