
risk-slim's People

Contributors

halleewong · jnirschl · llja0112 · nathandxn · ryanhammonds · ustunb


risk-slim's Issues

Validation metrics when doing CV

Hi Berk, I am wondering if there is any way to get the model's performance metrics when using the cv_indices flag (I am doing 10-fold CV). I am mainly interested in accuracy, balanced accuracy, precision, recall, AUROC, and AUPR for each fold.

I have written an independent script that computes overall performance metrics, but since I am using the same data to fit the coefficient vector and to validate it, the model overfits. I have also noticed that the model outputs a single coefficient vector for all of the folds, instead of one per fold.

Since you mention in your paper that you do nested CV, I was wondering if you save the metrics for each fold somewhere.

Thanks for the help, and congrats on the great library!
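A per-fold evaluation loop along these lines would avoid scoring on the training data. Note that `train_model` below is a placeholder for whatever call fits RiskSLIM on one fold, not part of risk-slim's actual API, and labels are assumed to be in {-1, +1}:

```python
# Sketch: compute validation metrics on each held-out fold, assuming a
# hypothetical `train_model(X, y)` that returns a coefficient vector rho.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             precision_score, recall_score,
                             roc_auc_score, average_precision_score)

def cv_metrics(X, y, train_model, n_splits=10, seed=0):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rows = []
    for train_idx, test_idx in skf.split(X, y):
        rho = train_model(X[train_idx], y[train_idx])   # one model per fold
        scores = X[test_idx] @ rho                      # linear risk score
        prob = 1.0 / (1.0 + np.exp(-scores))            # logistic link
        pred = np.where(prob >= 0.5, 1, -1)             # labels in {-1, +1}
        y_true = y[test_idx]
        rows.append({
            'acc': accuracy_score(y_true, pred),
            'balanced_acc': balanced_accuracy_score(y_true, pred),
            'precision': precision_score(y_true, pred, pos_label=1),
            'recall': recall_score(y_true, pred, pos_label=1),
            'auroc': roc_auc_score(y_true, prob),
            'aupr': average_precision_score(y_true, prob),
        })
    return rows
```

Each entry of `rows` then holds the metrics for one fold, which you can average or report per fold.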

Tests on the breastcancer_data.csv

Hey,

I have a small doubt regarding the output of the second example (ex_02_advanced_options.py). After running it, we get a table containing risk scores for the features selected by the optimizer. In the breast cancer dataset, the values under each feature (the columns of the CSV) are not binary. How do we proceed in this case? Simply multiplying the point values by the feature values could produce very high final scores, which in turn yields implausible values for P(Y = 1 | x) (almost 99% for all samples).

Here is a truncated output generated by the code

Pr(Y = +1) = 1/(1 + exp(-17 - score))
+-----------------------------------------+------------------+-----------+
| ClumpThickness                          |         1 points |   + ..... |
| MarginalAdhesion                        |         1 points |   + ..... |
| BareNuclei                              |         1 points |   + ..... |
| BlandChromatin                          |         1 points |   + ..... |
| Mitoses                                 |         1 points |   + ..... |
+-----------------------------------------+------------------+-----------+
| ADD POINTS FROM ROWS 1 to 5             |            SCORE |   = ..... |
+-----------------------------------------+------------------+-----------+

How do I interpret this table when the data is not binary?
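For what it's worth, a common workaround when features are integer-valued is to binarize them into threshold indicators before fitting, so that every column is 0/1 and each row's points are either added once or not at all. A minimal sketch (the column names and thresholds here are illustrative, not prescribed by risk-slim):

```python
# Sketch: expand each integer-valued feature into 0/1 threshold indicators,
# e.g. ClumpThickness -> ClumpThickness>=3, ClumpThickness>=5, ...
import pandas as pd

def binarize(df, feature_cols, thresholds=(3, 5, 7)):
    """Expand each integer feature into 0/1 indicator columns."""
    out = {}
    for col in feature_cols:
        for t in thresholds:
            out[f'{col}>={t}'] = (df[col] >= t).astype(int)
    return pd.DataFrame(out)
```

The resulting frame can then be fed to the solver in place of the raw integer columns, and the score table reads as usual: each satisfied condition contributes its points exactly once.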

Also, in the same ex_02...py file, P is never defined. Judging from the ex_03_constraints.py file, I think this line should be added just after the data is loaded:

N, P = data['X'].shape

Nested 5-fold Cross-validation and performance metric?

Hi Berk,

I am trying to run the RiskSLIM model on my own dataset and want to do a nested k-fold cross-validation (the same as in the paper). However, I wasn't able to find the code where nested cross-validation is handled. The only related thing I found is the fold_csv_file in utils.py, but that looks like regular k-fold cross-validation rather than nested validation. I am also thinking of selecting models based on different performance metrics (accuracy, AUC, etc.) and am not sure where I could change that.

Thanks!
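In case it helps, nested CV can be sketched as an outer loop for performance estimation and an inner loop for choosing the regularization weight c0. `train_model(X, y, c0)` below is a stand-in for whatever call trains RiskSLIM, not the library's API, and AUC is used as the inner selection metric only as an example:

```python
# Sketch of nested cross-validation: the outer loop estimates generalization
# performance; the inner loop picks c0 using only the outer training fold.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

def nested_cv_auc(X, y, train_model, c0_grid=(1e-3, 1e-2, 1e-1),
                  outer_k=5, inner_k=5, seed=0):
    outer = StratifiedKFold(outer_k, shuffle=True, random_state=seed)
    outer_scores = []
    for tr, te in outer.split(X, y):
        # inner loop: choose c0 on the training portion only
        inner = StratifiedKFold(inner_k, shuffle=True, random_state=seed)
        best_c0, best_auc = None, -np.inf
        for c0 in c0_grid:
            aucs = []
            for itr, ite in inner.split(X[tr], y[tr]):
                rho = train_model(X[tr][itr], y[tr][itr], c0)
                aucs.append(roc_auc_score(y[tr][ite], X[tr][ite] @ rho))
            if np.mean(aucs) > best_auc:
                best_c0, best_auc = c0, np.mean(aucs)
        # refit on the full outer training fold with the chosen c0
        rho = train_model(X[tr], y[tr], best_c0)
        outer_scores.append(roc_auc_score(y[te], X[te] @ rho))
    return outer_scores
```

Swapping in a different inner metric (accuracy, AUPR, ...) only changes the scoring call inside the inner loop.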

open-source solver

Any updates on the open-source solver development? It has been two years... perhaps this is abandonware now?

error running example files

Hi,
Thanks for sharing your code. I tried to run the ex_01_quickstart.py file, but it gave me the following error.
(screenshot of the error attached)

Do you know of any fix for this? I have tried to dig into the code but found nothing that could work.
Thanks.

Output table from riskSLIM

Hi there,

I am currently running riskSLIM with some additive stumps (e.g., age >= 0, age >= 10, age >= 20) as input features, and I got an output table like this:

Pr(Y = +1) = 1.0/(1.0 + exp(-(-5 + score)))
+----------------------------------------------+------------------+-----------+
| Gender Male                                  |         5 points |   + ..... |
| Age >= 0                                     |         4 points |   + ..... |
| Days >= 10                                   |         1 points |   + ..... |
+----------------------------------------------+------------------+-----------+
| ADD POINTS FROM ROWS 1 to 3                  |            SCORE |   = ..... |
+----------------------------------------------+------------------+-----------+

where conditions like Age >= 0 are always true for every observation (and hence redundant). Do you have any idea why this happens? The regularization term in the objective function should take care of it: for example, the table below would reach the same accuracy/AUC but would have a lower objective value (because of the regularization term). I tried different values of c (the weight on the regularization term in the objective) but the issue remains.

Pr(Y = +1) = 1.0/(1.0 + exp(-(-1 + score)))
+----------------------------------------------+------------------+-----------+
| Gender Male                                  |         5 points |   + ..... |
| Days >= 10                                   |         1 points |   + ..... |
+----------------------------------------------+------------------+-----------+
| ADD POINTS FROM ROWS 1 to 2                  |            SCORE |   = ..... |
+----------------------------------------------+------------------+-----------+

Thank you very much for your time. I really appreciate your help!
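One pragmatic workaround, independent of why the optimizer keeps the column: a stump that is true for every observation (like Age >= 0) is a constant column, so it carries no information beyond the intercept and can be dropped before fitting. A minimal sketch:

```python
# Sketch: remove columns that take a single value across all rows, since a
# constant 0/1 column is redundant with the intercept term.
import numpy as np

def drop_constant_columns(X, names):
    """Return (X, names) with constant columns removed."""
    keep = [j for j in range(X.shape[1]) if np.unique(X[:, j]).size > 1]
    return X[:, keep], [names[j] for j in keep]
```

Applying this to the stump matrix before calling the solver guarantees that no always-true condition can appear in the score table.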

To-Do

  • __repr__ and __str__ for RiskSLIMClassifier
  • template files: don't overwrite existing ones
  • update tests
  • mushroom examples
  • finalize docsite

Operational constraints not working

Hi there,

I'm currently applying riskSLIM to a problem of mine and am trying to implement operational constraints in the form of 'At least one of A or B must be selected'. However, I've noticed that riskSLIM seems to only accommodate constraints of the type 'At most one of A or B can be selected'. To address this, I attempted to add the 'at least one' constraint with the following three versions of code:

# version 1: exactly one selected
cons.add(
    lin_expr=[SparsePair(ind=get_alpha_ind(constraint), val=[1.0] * len(constraint))],
    senses="E",
    rhs=[1.0],
)

# version 2: at least one selected
cons.add(
    lin_expr=[SparsePair(ind=get_alpha_ind(constraint), val=[1.0] * len(constraint))],
    senses="G",
    rhs=[1.0],
)

# version 3: at least one selected, written as a negated upper bound
cons.add(
    lin_expr=[SparsePair(ind=get_alpha_ind(constraint), val=[-1.0] * len(constraint))],
    senses="L",
    rhs=[-1.0],
)

but none of them worked. Although these constraints are successfully added to the CPLEX model (I printed the entire model to confirm), the resulting scoring tables consistently violate them. I've spent quite a bit of time examining the code, yet I'm unable to determine why this isn't working. Is this a limitation of the model itself, or is there something I'm overlooking?

Any help would be greatly appreciated. Thanks!

'm.dll' does not exist on Windows 10 platforms

When trying to install risk-slim from source, I discovered that it throws an ImportError because of m.dll. Changing this to msvcrt.dll in setup.py makes it work on Windows; I do not know whether this error also occurs on other platforms.
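For reference, a more portable approach than hard-coding the library name is to look it up with `ctypes.util.find_library`, which searches platform-appropriate locations. This is a sketch of the idea, not the project's actual setup.py logic:

```python
# Sketch: resolve the C math library portably instead of hard-coding 'm.dll'.
import sys
from ctypes.util import find_library

def math_library():
    """Return a loadable math-library name, with a Windows fallback."""
    lib = find_library('m')            # 'libm.so.6' on Linux, 'libm.dylib' on macOS
    if lib is None and sys.platform == 'win32':
        lib = find_library('msvcrt')   # math symbols live in the MS C runtime
    return lib
```

On Linux/macOS this resolves libm directly; on Windows it falls back to the C runtime, which is where the math functions live.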

Error to pickle.load(results_files)

Hello,
Thanks for sharing your code. I successfully ran "bash batch/job_template.sh" as instructed and got the "breastcancer_fold_0_results.p" file, but I failed to run "results = pickle.load(infile)" and got the following error.
(screenshot of the TypeError attached)

I wonder how to deal with this error; looking forward to your reply! Thanks!
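Without seeing the screenshot this is only a guess, but a common cause of a TypeError in `pickle.load` is opening the file in text mode; pickle needs a binary file handle. A sketch (the path is illustrative):

```python
# Sketch: open the results file in binary mode ('rb'), not text mode ('r').
import pickle

def load_results(path):
    with open(path, 'rb') as infile:   # 'rb' is required for pickle
        return pickle.load(infile)
```

If the file was pickled under a different Python version, a separate UnicodeDecodeError can also appear, in which case `pickle.load(infile, encoding='latin1')` is a commonly suggested workaround.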

Trying to understand output

I'm trying to understand the output I am getting from risk_slim.
In the results dictionary there is a key called solution, which holds an array with the same length as the number of columns in my input data set.

What's confusing me is: shouldn't my solution have one fewer entry than my input has columns, given that the first column of the inputs is the label I am training on?
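A plausible explanation (hedged: I have not verified this against the source) is that the loader drops the label column but prepends an '(Intercept)' column, which would leave the coefficient vector exactly as long as the original CSV has columns. The arithmetic:

```python
# Sketch of the likely accounting for the solution vector's length.
n_csv_columns = 10                # e.g. 1 label column + 9 feature columns
n_features = n_csv_columns - 1    # label column dropped
n_coefficients = n_features + 1   # '(Intercept)' column added back
assert n_coefficients == n_csv_columns
```

So the first entry of the solution array would be the intercept, and the remaining entries line up with the feature columns in order.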
