Giter VIP home page Giter VIP logo

firthlogist's Introduction

firthlogist

PyPI PyPI - Downloads GitHub

A Python implementation of Logistic Regression with Firth's bias reduction.

Installation

pip install firthlogist

Usage

firthlogist is sklearn compatible and follows the sklearn API.

>>> from firthlogist import FirthLogisticRegression, load_sex2
>>> fl = FirthLogisticRegression()
>>> X, y, feature_names = load_sex2()
>>> fl.fit(X, y)
FirthLogisticRegression()
>>> fl.summary(xname=feature_names)
                 coef    std err     [0.025      0.975]      p-value
---------  ----------  ---------  ---------  ----------  -----------
age        -1.10598     0.42366   -1.97379   -0.307427   0.00611139
oc         -0.0688167   0.443793  -0.941436   0.789202   0.826365
vic         2.26887     0.548416   1.27304    3.43543    1.67219e-06
vicl       -2.11141     0.543082  -3.26086   -1.11774    1.23618e-05
vis        -0.788317    0.417368  -1.60809    0.0151846  0.0534899
dia         3.09601     1.67501    0.774568   8.03028    0.00484687
Intercept   0.120254    0.485542  -0.818559   1.07315    0.766584

Log-Likelihood: -132.5394
Newton-Raphson iterations: 8

Parameters

max_iter: int, default=25

 The maximum number of Newton-Raphson iterations.

max_halfstep: int, default=25

 The maximum number of step-halvings in one Newton-Raphson iteration.

max_stepsize: int, default=5

 The maximum step size - for each coefficient, the step size is forced to be less than max_stepsize.

pl_max_iter: int, default=100

 The maximum number of Newton-Raphson iterations for finding profile likelihood confidence intervals.

pl_max_halfstep: int, default=25

 The maximum number of step-halvings in one iteration for finding profile likelihood confidence intervals.

pl_max_stepsize: int, default=5

 The maximum step size while finding PL confidence intervals - for each coefficient, the step size is forced to be less than max_stepsize.

tol: float, default=0.0001

 Convergence tolerance for stopping.

fit_intercept: bool, default=True

 Specifies if intercept should be added.

skip_pvals: bool, default=False

 If True, p-values will not be calculated. Calculating the p-values can be expensive if wald=False since the fitting procedure is repeated for each coefficient.

skip_ci: bool, default=False

 If True, confidence intervals will not be calculated. Calculating the confidence intervals via profile likelihoood is time-consuming.

alpha: float, default=0.05

 Significance level (confidence interval = 1-alpha). 0.05 as default for 95% CI.

wald: bool, default=False

 If True, uses Wald method to calculate p-values and confidence intervals.

test_vars: Union[int, List[int]], default=None

 Index or list of indices of the variables for which to calculate confidence intervals and p-values. If None, calculate for all variables. This option has no effect if wald=True.

Attributes

bse_

 Standard errors of the coefficients.

classes_

 A list of the class labels.

ci_

  The fitted profile likelihood confidence intervals.

coef_

 The coefficients of the features.

intercept_

 Fitted intercept. If fit_intercept = False, the intercept is set to zero.

loglik_

 Fitted penalized log-likelihood.

n_iter_

 Number of Newton-Raphson iterations performed.

pvals_

 p-values calculated by penalized likelihood ratio tests.

References

Firth, D (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.

Heinze G, Schemper M (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine 21: 2409-2419.

firthlogist's People

Contributors

jzluo avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

firthlogist's Issues

add a result summary method

A summary method to present in a prettier format would be nice. I don't think it needs to be as detailed as the summary in statsmodels. This is what summary in logistf looks like:

                   coef  se(coef) lower 0.95  upper 0.95       Chisq            p method
(Intercept)  0.12025405 0.4763429 -0.8185591  1.07315122  0.06286298 8.020268e-01      2
age         -1.10598131 0.4149021 -1.9737884 -0.30742514  7.50773092 6.143472e-03      2
oc          -0.06881673 0.4344026 -0.9414363  0.78920202  0.02467044 8.751911e-01      2
vic          2.26887464 0.5384872  1.2730216  3.43543273 22.93139022 1.678877e-06      2
vicl        -2.11140817 0.5320395 -3.2608611 -1.11773495 19.10407252 1.237805e-05      2
vis         -0.78831694 0.4089620 -1.6080879  0.01518468  3.69740975 5.449701e-02      2
dia          3.09601166 1.5052197  0.7745682  8.03029352  7.89693139 4.951873e-03      2

Method: 1-Wald, 2-Profile penalized log-likelihood, 3-None

Likelihood ratio test=49.09064 on 6 df, p=7.15089e-09, n=239
Wald test = 31.96835 on 6 df, p = 1.654713e-05

Should be called like:

firth = FirthLogisticRegression()
firth.fit(X, y)
firth.summary()

Maybe an optional var_names parameter in .summary() so that the feature names can be printed in the summary as well, otherwise x0, x1, etc.

Probably the most straightforward way is to just stick everything in a Pandas dataframe and print (and perhaps return?) that. Pretty printing with tabulate can be considered, maybe with tabulate as just an optional dependency?

Python 3.11 support

Hi @jzluo ! I tried to install it through pip but I am already on 3.11, since fedora and a few other distros are already on 3.11, what is missing to get 3.11 support? A fast scan over your code seems everything is ok at least syntax wise.

Implement FLIC and FLAC modifications

Paper for FLIC and FLAC: https://onlinelibrary.wiley.com/doi/10.1002/sim.7273

Whereas it reduces the bias in maximum likelihood estimates of coefficients, bias towards one-half is introduced
in the predicted probabilities. The stronger the imbalance of the outcome, the more severe is the bias in the
predicted probabilities. We propose two simple modifications of Firth’s logistic regression resulting in unbiased
predicted probabilities.

Interaction terms

Hi, I would like to test the significance of a variable with an interaction term as covariate:
trait ~ sex:age + variable_to_test
Can I do this with your library @jzluo?

Cannot fit data: Newton-Raphson does not converge

Hi @jzluo, I found a sample dataset which fails fitting.
Consider the example data I attached here:
sample.zip

#! unzip sample.zip
with open(r"X.pickle", "rb") as fd:
    X = pickle.load(fd)
with open(r"y.pickle", "rb") as fd:
    y = pickle.load(fd)

from firthlogist import FirthLogisticRegression
FirthLogisticRegression().fit(X, y)

This fails after the second round of optimization because all coefficients get NaN.

Detailed logs + stack trace:

/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/sklearn/utils/validation.py:993: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:153: ConvergenceWarning: Firth logistic regression failed to converge. Try increasing max_iter.
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:322: RuntimeWarning: invalid value encountered in log
/opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/numpy/linalg/linalg.py:2154: RuntimeWarning: invalid value encountered in det

---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
Cell In[164], line 2
      1 from firthlogist import FirthLogisticRegression
----> 2 FirthLogisticRegression().fit(X, y)

File /opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:161, in FirthLogisticRegression.fit(self, X, y)
    159 if not self.skip_ci:
    160     if not self.wald:
--> 161         self.ci_ = _profile_likelihood_ci(
    162             X=X,
    163             y=y,
    164             fitted_coef=self.coef_,
    165             full_loglik=self.loglik_,
    166             max_iter=self.pl_max_iter,
    167             max_stepsize=self.pl_max_stepsize,
    168             max_halfstep=self.pl_max_halfstep,
    169             tol=self.tol,
    170             alpha=self.alpha,
    171             test_vars=self.test_vars,
    172         )
    173     else:
    174         self.ci_ = _wald_ci(self.coef_, self.bse_, self.alpha)

File /opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/firthlogist/firthlogist.py:440, in _profile_likelihood_ci(X, y, fitted_coef, full_loglik, max_iter, max_stepsize, max_halfstep, tol, alpha, test_vars)
    438 U_star = np.matmul(X.T, y - preds + np.multiply(hat, 0.5 - preds))
    439 # https://github.com/georgheinze/logistf/blob/master/src/logistf.c#L780-L781
--> 440 inv_fisher = np.linalg.pinv(fisher_info_mtx)
    441 tmp1x1 = U_star @ np.negative(inv_fisher) @ U_star
    442 underRoot = (
    443     -2
    444     * ((LL0 - loglike) + 0.5 * tmp1x1)
    445     / (inv_fisher[coef_idx, coef_idx])
    446 )

File <__array_function__ internals>:180, in pinv(*args, **kwargs)

File /opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/numpy/linalg/linalg.py:1998, in pinv(a, rcond, hermitian)
   1996     return wrap(res)
   1997 a = a.conjugate()
-> 1998 u, s, vt = svd(a, full_matrices=False, hermitian=hermitian)
   2000 # discard small singular values
   2001 cutoff = rcond[..., newaxis] * amax(s, axis=-1, keepdims=True)

File <__array_function__ internals>:180, in svd(*args, **kwargs)

File /opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/numpy/linalg/linalg.py:1657, in svd(a, full_matrices, compute_uv, hermitian)
   1654         gufunc = _umath_linalg.svd_n_s
   1656 signature = 'D->DdD' if isComplexType(t) else 'd->ddd'
-> 1657 u, s, vh = gufunc(a, signature=signature, extobj=extobj)
   1658 u = u.astype(result_t, copy=False)
   1659 s = s.astype(_realType(result_t), copy=False)

File /opt/anaconda/envs/firthlogist/lib/python3.9/site-packages/numpy/linalg/linalg.py:98, in _raise_linalgerror_svd_nonconvergence(err, flag)
     97 def _raise_linalgerror_svd_nonconvergence(err, flag):
---> 98     raise LinAlgError("SVD did not converge")

LinAlgError: SVD did not converge

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.