
safe's People

Contributors

kant, olagacek, pbiecek, plubon


safe's Issues

Structure of the transformer

How can I print out the structure of a fitted safe_transformer in a human-readable way?
I would like to know which variables were transformed and how.
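As an illustration of the kind of human-readable report being requested, here is how scikit-learn's KBinsDiscretizer, a comparable discretizing transformer, exposes its learned cut points after fitting. The attribute names below are scikit-learn's, not SafeTransformer's, so this is only a sketch of the desired output format:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import KBinsDiscretizer

X, _ = load_iris(return_X_y=True)
disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile").fit(X)

# print which variable was transformed and how (its learned cut points)
for name, edges in zip(load_iris().feature_names, disc.bin_edges_):
    print(f"{name}: split at {edges[1:-1].round(2)}")
```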

Alternative for partial dependence plot

Dear Gosiewska,

I am currently using your SAFE code, together with your SAFE ML paper, as part of my master's thesis for the Econometrics master in Quantitative Finance (in the Netherlands).
I love your approach and I am very curious about the results in my research. However, there is a small adjustment I would like to make in the code: I want to use the SHAP dependence plot as an alternative to the partial dependence plot, inspired by the paper of Lundberg (Lundberg, Scott M., Gabriel G. Erion, and Su-In Lee. "Consistent individualized feature attribution for tree ensembles." arXiv preprint arXiv:1802.03888 (2018)). Lundberg states that SHAP values are the only consistent feature attribution values, which makes the SHAP dependence plot a rich alternative to the partial dependence plot.

However, programming is not my strong suit, and so far I have failed to adjust this part of the code myself. Please let me know your thoughts on this adjustment and whether you would like to implement it.

With kind regards,

Jeroen van den Boogaard

Performance issues

Hello!

The idea behind this approach is awesome! I'm currently testing your solution and I've run into some performance issues. I'm trying to use SafeTransformer on a dataset about 60 columns wide and 11k rows long, and the computations take quite a long time. Do you have any tips on how to make the computations faster, or are you planning to speed up this algorithm? Also, I'd love to see some speed tests if possible :)

Best regards,
Patryk
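A common workaround for slow per-feature fitting is to estimate the transformation on a row subsample and then apply it to the full dataset. The sketch below uses scikit-learn's KBinsDiscretizer as a stand-in for SafeTransformer (whose internals this issue is about), on data of roughly the size described:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(0)
X = rng.normal(size=(11_000, 60))  # roughly the dataset size above

# fit on a 2k-row subsample, then transform the full data
idx = rng.choice(len(X), size=2_000, replace=False)
disc = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="quantile")
disc.fit(X[idx])
X_t = disc.transform(X)
```

Whether subsampling preserves the quality of SAFE's detected changepoints would need to be checked empirically.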

reproducible README

X_train and y_train are undefined in README.md in the line

surrogate_model = surrogate_model.fit(X_train, y_train)
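One way to make the snippet self-contained would be to define the missing names first. The dataset and model below are placeholders, since the README does not show which surrogate model it intends:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# hypothetical setup so X_train / y_train exist before the README line runs
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

surrogate_model = GradientBoostingRegressor()
surrogate_model = surrogate_model.fit(X_train, y_train)
```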

Changing sklearn install requirement

Hey there. Great package!

Just wanted to point something out that I've seen across a few packages. The install was fine for me because I already had scikit-learn installed, but I think you should remove sklearn from install_requires in setup.py:

    install_requires=[
          'numpy',
          'ruptures',
          'sklearn',
          'pandas',
          'scipy',
          'kneed'
      ],

and replace it with "scikit-learn":

    install_requires=[
          'numpy',
          'ruptures',
          'scikit-learn',
          'pandas',
          'scipy',
          'kneed'
      ],

Depending on sklearn installs the wrong package, and it could confuse users who have not installed scikit-learn before, because it would not pull in the real package for them. https://pypi.org/project/sklearn/ itself says not to use that name as an install requirement.

Problem when there's no changepoint.

SafeTransformer returns an empty data frame when no transformation is applied. Below is an example from https://github.com/ModelOriented/SAFE/blob/master/examples/SafeTransformerTests_Classification.ipynb

I've changed the penalty value from 1 to 1000. SafeTransformer then does not discretize any variable and, as a result, returns an empty data frame. This is because SafeTransformer does not return features that were not transformed.

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from SafeTransformer import SafeTransformer

data = load_iris()
X = pd.DataFrame(data.data)
y = pd.Series(data.target)

X_train, X_test, y_train, y_test = train_test_split(X, y)
surrogate_model = XGBClassifier().fit(X_train, y_train)
base_model = LogisticRegression().fit(X_train, y_train)
base_predictions = base_model.predict(X_test)

pen = 1000  # here is the difference (large penalty)
safe_transformer = SafeTransformer(model=surrogate_model, penalty=pen)
safe_transformer = safe_transformer.fit(X_train)

X_train_transformed = safe_transformer.transform(X_train)
X_train_transformed

How about adding a parameter that defines how to deal with features for which no changepoint was found?
One option would be removing the variable when the dataset is transformed (this is already implemented); a second option would be fixing a changepoint at the median.
I think a changepoint fixed at the median should be the default, because it would prevent situations where the transformation returns an empty data frame.
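The proposed fallback can be sketched in a few lines. The function name and the changepoints dict are hypothetical, standing in for whatever SafeTransformer stores internally; the point is that a column with no detected changepoint gets a single split at its median rather than being dropped:

```python
import numpy as np
import pandas as pd

def transform_with_median_fallback(X, changepoints):
    """Discretize each column at its changepoints; if none were found,
    fall back to a single split at the median (proposed default)."""
    out = {}
    for col in X.columns:
        cps = changepoints.get(col, [])
        if not cps:                  # no changepoint detected
            cps = [X[col].median()]  # median fallback keeps the column
        out[col] = np.digitize(X[col], bins=sorted(cps))
    return pd.DataFrame(out, index=X.index)

X = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 10.0, 10.0, 10.0]})
transformed = transform_with_median_fallback(X, {"a": [2.5]})
# every input column survives, so the result is never an empty frame
```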

Verbosity of fit and predict

How can I turn off the messages printed during fit and predict?
It would be good to have three levels: no messages, the current messages, and more detailed messages with information about the extracted transformations.
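The three-level scheme could map a verbose parameter onto Python's standard logging levels. Everything below (the parameter, the logger name) is a hypothetical sketch of the requested API, not existing SafeTransformer behavior:

```python
import logging

# 0 = silent, 1 = current progress messages, 2 = per-feature details
_LEVELS = {0: logging.WARNING, 1: logging.INFO, 2: logging.DEBUG}

def make_logger(verbose=1):
    logger = logging.getLogger("safe_transformer")
    logger.setLevel(_LEVELS[verbose])
    return logger

log = make_logger(verbose=0)
log.info("fitting feature %s", "age")                  # hidden at verbose=0
log.debug("changepoints for %s: %s", "age", [30, 55])  # shown only at verbose=2
```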
