
safe's People

Contributors

kant, olagacek, pbiecek, plubon


safe's Issues

Structure of the transformer

How can I print out the structure of a fitted safe_transformer in a human-readable way?
I would like to know which variables were transformed and how.
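As an illustration of the kind of human-readable report being requested, here is how scikit-learn's KBinsDiscretizer, a comparable discretizing transformer, exposes its learned cut points after fitting. The attribute names below are scikit-learn's, not SafeTransformer's, so this is only a sketch of the desired output format:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import KBinsDiscretizer

X, _ = load_iris(return_X_y=True)
disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile").fit(X)

# print which variable was transformed and how (its learned cut points)
for name, edges in zip(load_iris().feature_names, disc.bin_edges_):
    print(f"{name}: split at {edges[1:-1].round(2)}")
```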

Alternative for partial dependence plot

Dear Gosiewska,

I am currently using your SAFE code, together with your SAFE ML paper, as part of my master's thesis for the Econometrics master in Quantitative Finance (in the Netherlands).
I love your approach and I am very curious about the results in my research. However, there is a small adjustment I would like to make in the code: I want to use the SHAP dependence plot as an alternative to the partial dependence plot, inspired by the paper of Lundberg (Lundberg, Scott M., Gabriel G. Erion, and Su-In Lee. "Consistent individualized feature attribution for tree ensembles." arXiv preprint arXiv:1802.03888 (2018)). Lundberg states that SHAP values are the only consistent feature attribution values, which makes the SHAP dependence plot a rich alternative to the partial dependence plot.

However, programming is not my strong suit, and so far I have failed to adjust this part of the code myself. Please let me know your thoughts on this adjustment and whether you would like to implement it.

With kind regards,

Jeroen van den Boogaard

Performance issues

Hello!

The idea behind this approach is awesome! I'm currently testing your solution and I've run into some performance issues. I'm trying to use SafeTransformer on a dataset about 60 columns wide and 11k rows long, and the computations take quite a long time. Do you have any tips on how to make the computations faster, or are you planning to speed up this algorithm? Also, I'd love to see some speed tests if possible :)

Best regards,
Patryk
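A common workaround for slow per-feature fitting is to estimate the transformation on a row subsample and then apply it to the full dataset. The sketch below uses scikit-learn's KBinsDiscretizer as a stand-in for SafeTransformer (whose internals this issue is about), on data of roughly the size described:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(0)
X = rng.normal(size=(11_000, 60))  # roughly the dataset size above

# fit on a 2k-row subsample, then transform the full data
idx = rng.choice(len(X), size=2_000, replace=False)
disc = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="quantile")
disc.fit(X[idx])
X_t = disc.transform(X)
```

Whether subsampling preserves the quality of SAFE's detected changepoints would need to be checked empirically.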

reproducible README

X_train and y_train are undefined in README.md in the line

surrogate_model = surrogate_model.fit(X_train, y_train)
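One way to make the snippet self-contained would be to define the missing names first. The dataset and model below are placeholders, since the README does not show which surrogate model it intends:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# hypothetical setup so X_train / y_train exist before the README line runs
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

surrogate_model = GradientBoostingRegressor()
surrogate_model = surrogate_model.fit(X_train, y_train)
```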

Changing sklearn install requirement

Hey there. Great package!

Just wanted to point something out that I've seen across a few packages. The install was fine for me because I already had scikit-learn installed, but I think you should remove sklearn from install_requires in setup.py:

    install_requires=[
          'numpy',
          'ruptures',
          'sklearn',
          'pandas',
          'scipy',
          'kneed'
      ],

and replace it with "scikit-learn":

    install_requires=[
          'numpy',
          'ruptures',
          'scikit-learn',
          'pandas',
          'scipy',
          'kneed'
      ],

Depending on sklearn installs the wrong package, and it could confuse users who have not installed scikit-learn before, because it would not pull in the real package for them. https://pypi.org/project/sklearn/ itself says not to use that name as an install requirement.

Problem when there's no changepoint.

SafeTransformer returns an empty data frame when no transformation is applied. Below is an example from https://github.com/ModelOriented/SAFE/blob/master/examples/SafeTransformerTests_Classification.ipynb

I've changed the penalty value from 1 to 1000. SafeTransformer then does not discretize any variable and, as a result, returns an empty data frame. This is because SafeTransformer does not return features that were not transformed.

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from SafeTransformer import SafeTransformer

data = load_iris()
X = pd.DataFrame(data.data)
y = pd.Series(data.target)

X_train, X_test, y_train, y_test = train_test_split(X, y)
surrogate_model = XGBClassifier().fit(X_train, y_train)
base_model = LogisticRegression().fit(X_train, y_train)
base_predictions = base_model.predict(X_test)

pen = 1000  # here is the difference (large penalty)
safe_transformer = SafeTransformer(model=surrogate_model, penalty=pen)
safe_transformer = safe_transformer.fit(X_train)

X_train_transformed = safe_transformer.transform(X_train)
X_train_transformed

How about adding a parameter that defines how to deal with features for which no changepoint was found?
One option would be removing the variable when the dataset is transformed (this is already implemented); a second option would be fixing a changepoint at the median.
I think a changepoint fixed at the median should be the default, because it would prevent situations where the transformation returns an empty data frame.
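The proposed fallback can be sketched in a few lines. The function name and the changepoints dict are hypothetical, standing in for whatever SafeTransformer stores internally; the point is that a column with no detected changepoint gets a single split at its median rather than being dropped:

```python
import numpy as np
import pandas as pd

def transform_with_median_fallback(X, changepoints):
    """Discretize each column at its changepoints; if none were found,
    fall back to a single split at the median (proposed default)."""
    out = {}
    for col in X.columns:
        cps = changepoints.get(col, [])
        if not cps:                  # no changepoint detected
            cps = [X[col].median()]  # median fallback keeps the column
        out[col] = np.digitize(X[col], bins=sorted(cps))
    return pd.DataFrame(out, index=X.index)

X = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 10.0, 10.0, 10.0]})
transformed = transform_with_median_fallback(X, {"a": [2.5]})
# every input column survives, so the result is never an empty frame
```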

Verbosity of fit and predict

How can I turn off the messages printed during fit and predict?
It would be good to have three levels: no messages, the current messages, and more detailed messages with information about the extracted transformations.
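The three-level scheme could map a verbose parameter onto Python's standard logging levels. Everything below (the parameter, the logger name) is a hypothetical sketch of the requested API, not existing SafeTransformer behavior:

```python
import logging

# 0 = silent, 1 = current progress messages, 2 = per-feature details
_LEVELS = {0: logging.WARNING, 1: logging.INFO, 2: logging.DEBUG}

def make_logger(verbose=1):
    logger = logging.getLogger("safe_transformer")
    logger.setLevel(_LEVELS[verbose])
    return logger

log = make_logger(verbose=0)
log.info("fitting feature %s", "age")                  # hidden at verbose=0
log.debug("changepoints for %s: %s", "age", [30, 55])  # shown only at verbose=2
```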
