modeloriented / safe
Surrogate Assisted Feature Extraction
License: MIT License
How can I print out the structure of the safe_transformer
in a human-readable way?
I would like to know which variables were transformed and how.
Dear Gosiewska,
I am currently using your SAFE code together with your SAFE ML paper as part of my master's thesis for my Econometrics master in Quantitative Finance (in the Netherlands).
I love your approach and I am very curious about the results in my research. However, there is a small adjustment that I would like to make to the code. Namely, I want to use the SHAP dependence plot as an alternative to the partial dependence plot (inspired by the paper of Lundberg: Lundberg, Scott M., Gabriel G. Erion, and Su-In Lee. "Consistent individualized feature attribution for tree ensembles." arXiv preprint arXiv:1802.03888 (2018)). Lundberg states that SHAP values are the only consistent feature attribution values, which makes the SHAP dependence plot a rich alternative to the partial dependence plot.
However, I am not particularly skilled in programming, so I have failed so far to adjust this part of the code. Please let me know your thoughts on this adjustment and whether you would like to implement it.
With kind regards,
Jeroen van den Boogaard
Hello!
The idea behind this approach is awesome! I'm currently testing your solution and I've got some issues regarding performance. I'm trying to use SafeTransformer on a dataset of roughly 60 columns and 11k rows, and the computations take quite a long time. Have you got any tips on how to make the computations faster, or are you planning to speed up this algorithm? Also, I'd love to see some speed tests if possible :)
Best regards,
Patryk
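Until the package ships official speed tests, one way to get the rough numbers asked for above is to time the fit on growing row subsets and see how the runtime scales. This is a minimal, hypothetical sketch: `time_fit` is not part of SAFE, and the lambda below merely stands in for a real `SafeTransformer(...).fit` call.

```python
import time

def time_fit(fit_fn, X, sizes):
    """Time fit_fn on growing row subsets to see how runtime scales."""
    timings = []
    for n in sizes:
        subset = X[:n]
        start = time.perf_counter()
        fit_fn(subset)  # stand-in for e.g. SafeTransformer(...).fit(subset)
        timings.append((n, time.perf_counter() - start))
    return timings

# Dummy fit function standing in for the real (expensive) one:
rows = list(range(11000))
results = time_fit(lambda X: sum(X), rows, sizes=[1000, 4000, 11000])
```

Plotting the resulting (size, seconds) pairs makes it easy to tell whether the cost grows roughly linearly in the number of rows or worse.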
Hi, I am preparing examples of SAFE. It seems to me that summary()
for SafeTransformer doesn't work properly.
At the end of this Jupyter notebook it seems that there are no variable levels for the variable Sex.
X_train and y_train are undefined in the README.md in the line
surrogate_model = surrogate_model.fit(X_train, y_train)
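One way the README could define the missing names is with a standard train/test split. This is only a sketch: it assumes the iris data used elsewhere in the project's example notebooks as a placeholder, not whatever dataset the README actually intends.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Placeholder dataset; the README's real data would go here instead.
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
```

With these definitions in place, the README's `surrogate_model.fit(X_train, y_train)` line runs as written.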
Hey there. Great package!
Just wanted to point something out that I've seen across a few packages. The install was fine for me because I already had scikit-learn installed, but I think you should remove 'sklearn' from install_requires in setup.py:
install_requires=[
'numpy',
'ruptures',
'sklearn',
'pandas',
'scipy',
'kneed'
],
and replace it with "scikit-learn".
install_requires=[
'numpy',
'ruptures',
'scikit-learn',
'pandas',
'scipy',
'kneed'
],
Specifying 'sklearn' installs the wrong package and could confuse users who have never installed scikit-learn before, because it would not pull in the real package for them. https://pypi.org/project/sklearn/ itself says not to use that name as an install requirement.
SafeTransformer returns an empty data frame when no transformation is applied. Below is an example based on https://github.com/ModelOriented/SAFE/blob/master/examples/SafeTransformerTests_Classification.ipynb
I've changed the penalty value from 1 to 1000. With this penalty, SafeTransformer does not discretize any variable and, as a result, returns an empty data frame. This happens because SafeTransformer does not return features that were not transformed.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

from SafeTransformer import SafeTransformer

data = load_iris()
X = pd.DataFrame(data.data)
y = pd.Series(data.target)
X_train, X_test, y_train, y_test = train_test_split(X, y)

surrogate_model = XGBClassifier().fit(X_train, y_train)
base_model = LogisticRegression().fit(X_train, y_train)
base_predictions = base_model.predict(X_test)

pen = 1000  # here is the difference (large penalty)
safe_transformer = SafeTransformer(model=surrogate_model, penalty=pen)
safe_transformer = safe_transformer.fit(X_train)
X_train_transformed = safe_transformer.transform(X_train)
X_train_transformed  # empty data frame
How about adding a parameter that defines how to deal with features for which no changepoint was found?
One option would be to remove the variable when the dataset is transformed (this is already implemented); a second option would be to fix a changepoint at the median.
I think a changepoint fixed at the median should be the default, because it would prevent situations where the transformation returns an empty data frame.
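The median fallback could look something like the helper below. This is only a sketch of the proposal, not SAFE's actual API: `changepoints_with_fallback` is a hypothetical function name, and the real implementation would sit inside SafeTransformer's per-feature fitting loop.

```python
import pandas as pd

def changepoints_with_fallback(column, changepoints, fallback="median"):
    """Return the detected changepoints; if none were found, optionally
    fall back to a single split at the column's median so the feature
    is kept instead of being dropped."""
    if changepoints:
        return list(changepoints)
    if fallback == "median":
        return [column.median()]
    return []  # fallback=None reproduces the current drop-the-feature behaviour

col = pd.Series([1.0, 2.0, 3.0, 4.0])
changepoints_with_fallback(col, [])                 # -> [2.5]
changepoints_with_fallback(col, [], fallback=None)  # -> []
```

With `fallback="median"` as the default, every feature yields at least one interval boundary, so `transform` can never return an empty data frame.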
How can I turn off messages in fit and predict?
It would be good to have three levels: no messages, the current messages, and more detailed messages with information about the extracted transformations.
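The three levels could be driven by a single `verbose` parameter. This is a hypothetical sketch of the idea, not code from the package: `log` and its `level`/`verbose` arguments are assumptions about how the feature might be wired in.

```python
def log(message, level, verbose):
    """Print message only when its level does not exceed the requested
    verbosity: 0 = silent, 1 = progress messages (current behaviour),
    2 = detailed per-feature transformation info.
    Returns the message when printed, None when suppressed."""
    if verbose >= level:
        print(message)
        return message
    return None

log("Fitting feature 'age'", level=1, verbose=0)          # suppressed
log("Fitting feature 'age'", level=1, verbose=1)          # printed
log("  changepoints: [30.5, 47.0]", level=2, verbose=1)   # suppressed
```

SafeTransformer's fit/transform would then accept `verbose=0/1/2` and route all its current print statements through such a helper.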