
risk-slim's People

Contributors

halleewong · jnirschl · llja0112 · nathandxn · ryanhammonds · ustunb


risk-slim's Issues

Validation metrics when doing CV

Hi Berk, I am wondering if there is any way to get the model's performance metrics when using the cv_indices flag (I am doing 10-fold CV). I am mainly interested in accuracy, balanced accuracy, precision, recall, AUROC, and AUPR for each fold.

I have written an independent script that computes overall performance metrics, but since I am using the same data to fit the coefficient vector and to validate it, the model overfits. I have also noticed that the model outputs a single coefficient vector for all of the folds, instead of one per fold.

Since you mention in your paper that you do nested CV, I was wondering if you save the metrics for each fold somewhere.

Thanks for the help, and congrats on the great library!
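A per-fold evaluation loop along these lines would avoid scoring on the training data. Note that `train_model` below is a placeholder for whatever call fits RiskSLIM on one fold, not part of risk-slim's actual API, and labels are assumed to be in {-1, +1}:

```python
# Sketch: compute validation metrics on each held-out fold, assuming a
# hypothetical `train_model(X, y)` that returns a coefficient vector rho.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             precision_score, recall_score,
                             roc_auc_score, average_precision_score)

def cv_metrics(X, y, train_model, n_splits=10, seed=0):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rows = []
    for train_idx, test_idx in skf.split(X, y):
        rho = train_model(X[train_idx], y[train_idx])   # one model per fold
        scores = X[test_idx] @ rho                      # linear risk score
        prob = 1.0 / (1.0 + np.exp(-scores))            # logistic link
        pred = np.where(prob >= 0.5, 1, -1)             # labels in {-1, +1}
        y_true = y[test_idx]
        rows.append({
            'acc': accuracy_score(y_true, pred),
            'balanced_acc': balanced_accuracy_score(y_true, pred),
            'precision': precision_score(y_true, pred, pos_label=1),
            'recall': recall_score(y_true, pred, pos_label=1),
            'auroc': roc_auc_score(y_true, prob),
            'aupr': average_precision_score(y_true, prob),
        })
    return rows
```

Each entry of `rows` then holds the metrics for one fold, which you can average or report per fold.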

Tests on the breastcancer_data.csv

Hey,

I have a small doubt regarding the output of the second example (ex_02_advanced_options.py). After running it, we get a table containing risk scores for the features selected by the optimizer. In the breast cancer dataset, the values under each feature (the columns of the CSV) are not binary. How do we proceed in this case? Simply multiplying the point values by the feature values could produce very high final scores, which in turn yields implausible values for P(Y = 1 | x) (almost 99% for all samples).

Here is a truncated output generated by the code

Pr(Y = +1) = 1/(1 + exp(-17 - score))
+-----------------------------------------+------------------+-----------+
| ClumpThickness                          |         1 points |   + ..... |
| MarginalAdhesion                        |         1 points |   + ..... |
| BareNuclei                              |         1 points |   + ..... |
| BlandChromatin                          |         1 points |   + ..... |
| Mitoses                                 |         1 points |   + ..... |
+-----------------------------------------+------------------+-----------+
| ADD POINTS FROM ROWS 1 to 5             |            SCORE |   = ..... |
+-----------------------------------------+------------------+-----------+

How do I interpret this table when the data is not binary?
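For what it's worth, a common workaround when features are integer-valued is to binarize them into threshold indicators before fitting, so that every column is 0/1 and each row's points are either added once or not at all. A minimal sketch (the column names and thresholds here are illustrative, not prescribed by risk-slim):

```python
# Sketch: expand each integer-valued feature into 0/1 threshold indicators,
# e.g. ClumpThickness -> ClumpThickness>=3, ClumpThickness>=5, ...
import pandas as pd

def binarize(df, feature_cols, thresholds=(3, 5, 7)):
    """Expand each integer feature into 0/1 indicator columns."""
    out = {}
    for col in feature_cols:
        for t in thresholds:
            out[f'{col}>={t}'] = (df[col] >= t).astype(int)
    return pd.DataFrame(out)
```

The resulting frame can then be fed to the solver in place of the raw integer columns, and the score table reads as usual: each satisfied condition contributes its points exactly once.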

Also, in the same ex_02...py file, P is never defined. Judging from the ex_03_constraints.py file, I think this line should be added just after the data is loaded:

N, P = data['X'].shape

Nested 5-fold Cross-validation and performance metric?

Hi Berk,

I am trying to run the RiskSLIM model on my own dataset and want to do a nested k-fold cross-validation (the same as in the paper). However, I wasn't able to find the code where nested cross-validation is handled. The only related thing I found is the fold_csv_file in utils.py, but that looks like regular k-fold cross-validation rather than nested validation. I am also thinking of selecting models based on different performance metrics (accuracy, AUC, etc.) and am not sure where I could change that.

Thanks!
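In case it helps, nested CV can be sketched as an outer loop for performance estimation and an inner loop for choosing the regularization weight c0. `train_model(X, y, c0)` below is a stand-in for whatever call trains RiskSLIM, not the library's API, and AUC is used as the inner selection metric only as an example:

```python
# Sketch of nested cross-validation: the outer loop estimates generalization
# performance; the inner loop picks c0 using only the outer training fold.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

def nested_cv_auc(X, y, train_model, c0_grid=(1e-3, 1e-2, 1e-1),
                  outer_k=5, inner_k=5, seed=0):
    outer = StratifiedKFold(outer_k, shuffle=True, random_state=seed)
    outer_scores = []
    for tr, te in outer.split(X, y):
        # inner loop: choose c0 on the training portion only
        inner = StratifiedKFold(inner_k, shuffle=True, random_state=seed)
        best_c0, best_auc = None, -np.inf
        for c0 in c0_grid:
            aucs = []
            for itr, ite in inner.split(X[tr], y[tr]):
                rho = train_model(X[tr][itr], y[tr][itr], c0)
                aucs.append(roc_auc_score(y[tr][ite], X[tr][ite] @ rho))
            if np.mean(aucs) > best_auc:
                best_c0, best_auc = c0, np.mean(aucs)
        # refit on the full outer training fold with the chosen c0
        rho = train_model(X[tr], y[tr], best_c0)
        outer_scores.append(roc_auc_score(y[te], X[te] @ rho))
    return outer_scores
```

Swapping in a different inner metric (accuracy, AUPR, ...) only changes the scoring call inside the inner loop.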

open-source solver

Any updates on the open-source solver development? It has been two years... perhaps this is abandonware now?

error running example files

Hi,
Thanks for sharing your code. I tried to run the ex_01_quickstart.py file, but it gave me the following error.
(screenshot of the error attached)

Do you know of any fix for this? I have tried to dig into the code but found nothing that could work.
Thanks.

Output table from riskSLIM

Hi there,

I am currently running riskSLIM with some additive stumps (e.g., age >= 0, age >= 10, age >= 20) as input features, and I got an output table like this:

Pr(Y = +1) = 1.0/(1.0 + exp(-(-5 + score)))
+----------------------------------------------+------------------+-----------+
| Gender Male                                  |         5 points |   + ..... |
| Age >= 0                                     |         4 points |   + ..... |
| Days >= 10                                   |         1 points |   + ..... |
+----------------------------------------------+------------------+-----------+
| ADD POINTS FROM ROWS 1 to 3                  |            SCORE |   = ..... |
+----------------------------------------------+------------------+-----------+

where conditions like Age >= 0 are always true for every observation (and hence redundant). Do you have any idea why this happens? The regularization term in the objective function should take care of it: for example, the table below would reach the same accuracy/AUC but would have a lower objective value (because of the regularization term). I tried different values of c (the weight on the regularization term in the objective) but the issue remains.

Pr(Y = +1) = 1.0/(1.0 + exp(-(-1 + score)))
+----------------------------------------------+------------------+-----------+
| Gender Male                                  |         5 points |   + ..... |
| Days >= 10                                   |         1 points |   + ..... |
+----------------------------------------------+------------------+-----------+
| ADD POINTS FROM ROWS 1 to 2                  |            SCORE |   = ..... |
+----------------------------------------------+------------------+-----------+

Thank you very much for your time. I really appreciate your help!
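One pragmatic workaround, independent of why the optimizer keeps the column: a stump that is true for every observation (like Age >= 0) is a constant column, so it carries no information beyond the intercept and can be dropped before fitting. A minimal sketch:

```python
# Sketch: remove columns that take a single value across all rows, since a
# constant 0/1 column is redundant with the intercept term.
import numpy as np

def drop_constant_columns(X, names):
    """Return (X, names) with constant columns removed."""
    keep = [j for j in range(X.shape[1]) if np.unique(X[:, j]).size > 1]
    return X[:, keep], [names[j] for j in keep]
```

Applying this to the stump matrix before calling the solver guarantees that no always-true condition can appear in the score table.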

To-Do

  • __repr__ and __str__ for RiskSLIMClassifier
  • template files: don't overwrite existing ones
  • update tests
  • mushroom examples
  • finalize docsite

Operational constraints not working

Hi there,

I'm currently applying riskSLIM to a problem of mine and am trying to implement operational constraints in the form of 'At least one of A or B must be selected'. However, I've noticed that riskSLIM seems to only accommodate constraints of the type 'At most one of A or B can be selected'. To address this, I attempted to add the 'at least one' constraint with the following three versions of code:

# version 1: exactly one selected
cons.add(
    lin_expr=[SparsePair(ind=get_alpha_ind(constraint), val=[1.0] * len(constraint))],
    senses="E",
    rhs=[1.0],
)

# version 2: at least one selected
cons.add(
    lin_expr=[SparsePair(ind=get_alpha_ind(constraint), val=[1.0] * len(constraint))],
    senses="G",
    rhs=[1.0],
)

# version 3: at least one selected, written as a negated upper bound
cons.add(
    lin_expr=[SparsePair(ind=get_alpha_ind(constraint), val=[-1.0] * len(constraint))],
    senses="L",
    rhs=[-1.0],
)

but none of them worked. Although these constraints are successfully added to the CPLEX model (I printed the entire model to confirm), the resulting scoring tables consistently violate them. I've spent quite a bit of time examining the code, yet I'm unable to determine why this isn't working. Is this a limitation of the model itself, or is there something I'm overlooking?

Any help would be greatly appreciated. Thanks!

'm.dll' does not exist on Windows 10 platforms

When trying to install risk-slim from source, I discovered that it throws an ImportError because of m.dll. Changing this to msvcrt.dll in setup.py makes it work on Windows; I do not know whether this error also occurs on other platforms.
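For reference, a more portable approach than hard-coding the library name is to look it up with `ctypes.util.find_library`, which searches platform-appropriate locations. This is a sketch of the idea, not the project's actual setup.py logic:

```python
# Sketch: resolve the C math library portably instead of hard-coding 'm.dll'.
import sys
from ctypes.util import find_library

def math_library():
    """Return a loadable math-library name, with a Windows fallback."""
    lib = find_library('m')            # 'libm.so.6' on Linux, 'libm.dylib' on macOS
    if lib is None and sys.platform == 'win32':
        lib = find_library('msvcrt')   # math symbols live in the MS C runtime
    return lib
```

On Linux/macOS this resolves libm directly; on Windows it falls back to the C runtime, which is where the math functions live.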

Error to pickle.load(results_files)

Hello,
Thanks for sharing your code. I successfully ran "bash batch/job_template.sh" as instructed and got the "breastcancer_fold_0_results.p" file, but I failed to run "results = pickle.load(infile)" and got the following error.
(screenshot of the TypeError attached)

I wonder how to deal with this error; looking forward to your reply! Thanks!
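Without seeing the screenshot this is only a guess, but a common cause of a TypeError in `pickle.load` is opening the file in text mode; pickle needs a binary file handle. A sketch (the path is illustrative):

```python
# Sketch: open the results file in binary mode ('rb'), not text mode ('r').
import pickle

def load_results(path):
    with open(path, 'rb') as infile:   # 'rb' is required for pickle
        return pickle.load(infile)
```

If the file was pickled under a different Python version, a separate UnicodeDecodeError can also appear, in which case `pickle.load(infile, encoding='latin1')` is a commonly suggested workaround.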

Trying to understand output

I'm trying to understand the output I am getting from risk_slim.
In the results dictionary there is a key called solution, which holds an array with the same length as the number of columns in my input data set.

What's confusing me is: shouldn't my solution have one fewer entry than my input has columns, given that the first column of the inputs is the label I am training on?
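A plausible explanation (hedged: I have not verified this against the source) is that the loader drops the label column but prepends an '(Intercept)' column, which would leave the coefficient vector exactly as long as the original CSV has columns. The arithmetic:

```python
# Sketch of the likely accounting for the solution vector's length.
n_csv_columns = 10                # e.g. 1 label column + 9 feature columns
n_features = n_csv_columns - 1    # label column dropped
n_coefficients = n_features + 1   # '(Intercept)' column added back
assert n_coefficients == n_csv_columns
```

So the first entry of the solution array would be the intercept, and the remaining entries line up with the feature columns in order.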
