
py-glm's Introduction

py-glm: Generalized Linear Models in Python

py-glm is a library for fitting, inspecting, and evaluating Generalized Linear Models in Python.

Installation

The py-glm library can be installed directly from GitHub.

pip install git+https://github.com/madrury/py-glm.git

Features

Model Fitting

py-glm supports models from various exponential families:

from glm.glm import GLM
from glm.families import Gaussian, Bernoulli, Poisson, Exponential

linear_model = GLM(family=Gaussian())
logistic_model = GLM(family=Bernoulli())
poisson_model = GLM(family=Poisson())
exponential_model = GLM(family=Exponential())

Models with dispersion parameters are also supported. The dispersion parameters in these models are estimated using the deviance.

from glm.families import QuasiPoisson, Gamma

quasi_poisson_model = GLM(family=QuasiPoisson())
gamma_model = GLM(family=Gamma())
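A common deviance-based estimate of the dispersion divides the deviance by the residual degrees of freedom, phi = D / (n - p). The sketch below computes it for a quasi-Poisson fit from observed counts y and fitted means mu. The helper names are hypothetical, and whether py-glm uses exactly the n - p denominator is an assumption.

```python
import numpy as np

def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum(y * log(y / mu) - (y - mu)), with 0 * log(0) = 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    safe_y = np.where(y > 0, y, 1.0)   # avoid log(0); those terms are zeroed below
    ylog = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return 2.0 * np.sum(ylog - (y - mu))

def deviance_dispersion(y, mu, n_params):
    """Hypothetical helper: dispersion estimate D / (n - p)."""
    return poisson_deviance(y, mu) / (len(y) - n_params)
```

A dispersion well above 1 signals overdispersion relative to a plain Poisson model.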

Fitting a model follows the sklearn style and uses the Fisher scoring algorithm:

logistic_model.fit(X, y_logistic)
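Fisher scoring repeatedly updates the coefficients as beta_new = beta + I(beta)^{-1} U(beta), where U is the score (the gradient of the log-likelihood) and I is the Fisher information. The sketch below applies this to a Poisson model with the canonical log link, where U = X^T (y - mu) and I = X^T diag(mu) X. It is a minimal NumPy illustration, not py-glm's implementation; the function name, starting point, and stopping rule are assumptions.

```python
import numpy as np

def fisher_scoring_poisson(X, y, max_iter=50, tol=1e-10):
    """Hypothetical sketch: Fisher scoring for Poisson regression with a log link."""
    coef = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = np.exp(X @ coef)              # inverse link applied to the linear predictor
        score = X.T @ (y - mu)             # gradient of the log-likelihood
        info = (X.T * mu) @ X              # Fisher information X^T diag(mu) X
        step = np.linalg.solve(info, score)
        coef = coef + step
        if np.max(np.abs(step)) < tol:     # stop once the update is negligible
            break
    return coef
```

For the canonical link, the Fisher information equals the negative Hessian, so this update coincides with Newton-Raphson.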

If your data resides in a pandas.DataFrame, you can pass the DataFrame to fit along with a model formula:

logistic_model.fit(X, formula="y ~ Moshi + SwimSwim")
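A formula such as "y ~ Moshi + SwimSwim" names the response on the left of ~ and the predictors on the right, conventionally with an implicit intercept column. The hypothetical helper below builds such a design matrix from any mapping of column names to arrays, which includes a pandas.DataFrame. It understands only '+'-separated column names and is not py-glm's formula parser.

```python
import numpy as np

def design_from_formula(data, formula):
    """Hypothetical helper: handles only 'y ~ a + b + ...' with an implicit intercept."""
    lhs, rhs = (side.strip() for side in formula.split("~"))
    names = [term.strip() for term in rhs.split("+")]
    y = np.asarray(data[lhs], dtype=float)
    # Intercept column of ones, followed by one column per named predictor.
    columns = [np.ones(len(y))] + [np.asarray(data[name], dtype=float) for name in names]
    return np.column_stack(columns), ["Intercept"] + names, y
```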

Offsets and sample weights are supported when fitting:

import numpy as np

linear_model.fit(X, y_linear, sample_weights=sample_weights)
poisson_model.fit(X, y_poisson, offset=np.log(expos))
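An offset enters the linear predictor with its coefficient fixed at 1, so offset=np.log(expos) in a Poisson model gives E[y] = expos * exp(X beta), a rate model. A sample weight of 2 on an observation affects the estimates the same way as duplicating that row. The sketch below shows both ideas for the Gaussian family with the identity link, using closed-form weighted least squares; it is an illustration, not py-glm's code, and the function name is hypothetical.

```python
import numpy as np

def weighted_least_squares(X, y, sample_weights=None, offset=None):
    """Hypothetical sketch: Gaussian GLM (identity link) with sample weights and an offset."""
    n = X.shape[0]
    y = np.asarray(y, dtype=float)
    w = np.ones(n) if sample_weights is None else np.asarray(sample_weights, dtype=float)
    off = np.zeros(n) if offset is None else np.asarray(offset, dtype=float)
    XtW = X.T * w                              # X^T W: scale each observation by its weight
    # With an identity link, the offset is simply subtracted from the response.
    return np.linalg.solve(XtW @ X, XtW @ (y - off))
```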

Predictions are also made in sklearn style:

logistic_model.predict(X)

Note: There is one major place where we deviate from the sklearn interface. The predict method on a GLM object always returns an estimate of the conditional expectation E[y | X]. This contrasts with sklearn's classification models, whose predict method returns a class assignment. We make this choice so that predict means the same thing for every family in py-glm. Users who want class assignments from a model need to threshold the probability returned by predict manually.
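Thresholding can be a one-line NumPy comparison. The helper below is hypothetical, and the default cutoff of 0.5 is an assumption; the right cutoff depends on the relative costs of false positives and false negatives.

```python
import numpy as np

def to_class_labels(probabilities, threshold=0.5):
    """Hypothetical helper: turn estimated P(y = 1 | X) into 0/1 class labels."""
    return (np.asarray(probabilities) >= threshold).astype(int)
```

For example, to_class_labels(logistic_model.predict(X)) would return one label per row of X.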

Inference

Once the model is fit, parameter estimates, parameter covariance estimates, and p-values from a standard z-test are available:

logistic_model.coef_
logistic_model.coef_covariance_matrix_
logistic_model.coef_standard_error_
logistic_model.p_values_
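A standard z-test (Wald test) divides each coefficient by its standard error, the square root of the matching diagonal entry of the covariance matrix, and compares the ratio with a standard normal distribution. The sketch below reproduces the textbook calculation using the identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)) for the two-sided p-value. The function name is hypothetical, and it is not claimed to be py-glm's internal code.

```python
import math
import numpy as np

def wald_z_test(coef, coef_covariance):
    """Hypothetical sketch: standard errors, z-statistics, and two-sided p-values."""
    se = np.sqrt(np.diag(np.asarray(coef_covariance, dtype=float)))
    z = np.asarray(coef, dtype=float) / se
    # Two-sided p-value from the standard normal tail: erfc(|z| / sqrt(2)).
    p = np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
    return se, z, p
```

Applied to a fitted model, the inputs would be logistic_model.coef_ and logistic_model.coef_covariance_matrix_.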

To get a quick summary, use the summary method:

logistic_model.summary()

Binomial GLM Model Summary.
===============================================
Name         Parameter Estimate  Standard Error
-----------------------------------------------
Intercept                  1.02            0.01
Moshi                     -2.00            0.02
SwimSwim                   1.00            0.02

The simulation subpackage also supports resampling methods, namely the parametric and non-parametric bootstraps:

from glm.simulation import Simulation

sim = Simulation(logistic_model)
sim.parametric_bootstrap(X, n_sim=1000)
sim.non_parametric_bootstrap(X, n_sim=1000)
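The non-parametric bootstrap resamples rows with replacement, refits the model on each resample, and uses the spread of the refitted coefficients as an estimate of their sampling variability; the parametric bootstrap instead simulates new responses from the fitted model. The sketch below shows the non-parametric version with ordinary least squares standing in for the model fit. Both function names and the seed argument are hypothetical, not py-glm's Simulation API.

```python
import numpy as np

def nonparametric_bootstrap(X, y, fit, n_sim=1000, seed=0):
    """Hypothetical sketch: refit `fit(X, y)` on row-resampled data n_sim times."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coefs = []
    for _ in range(n_sim):
        idx = rng.integers(0, n, size=n)   # draw n row indices with replacement
        coefs.append(fit(X[idx], y[idx]))
    return np.array(coefs)

def ols(X, y):
    """Ordinary least squares, used here as a stand-in model fit."""
    return np.linalg.lstsq(X, y, rcond=None)[0]
```

The column-wise standard deviation of the result, nonparametric_bootstrap(X, y, ols).std(axis=0), gives bootstrap standard errors.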

Regularization

Ridge (L2) regularization is supported for every model. Note that the regularization parameter is called alpha instead of lambda, because lambda is a reserved word in Python:

logistic_model.fit(X, y_logistic, alpha=1.0)
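For the Gaussian family, the ridge estimate has a closed form, beta = (X^T X + alpha I)^{-1} X^T y, which makes the shrinkage effect of alpha easy to see. The sketch below penalizes every column, including an intercept column if X has one; whether py-glm excludes the intercept, and how its alpha is scaled, are assumptions not settled here. For non-Gaussian families there is no closed form, and the penalty is typically folded into each Fisher scoring step.

```python
import numpy as np

def ridge_gaussian(X, y, alpha):
    """Hypothetical sketch: closed-form ridge estimate for a Gaussian model."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
```

With alpha = 0 this reduces to ordinary least squares, and the norm of the coefficient vector shrinks as alpha grows.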


Warning

The glmnet code included in glm.glmnet is experimental. Please use at your own risk.

py-glm's People

Contributors

madcowd, madrury


py-glm's Issues

Add negative binomial family

I wouldn't normally feign the authority to make requests of you, but you asked me to do this, so here we go. Please consider adding the negative binomial family.

Thank you! This repo is badass.

Deviance Bernoulli distribution

Isn't there a mistake in the deviance of the Bernoulli distribution?

The deviance should be ... log(y_i/mu_i) ... log((1-y_i) / (1 - mu_i))
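For reference, the textbook Bernoulli deviance is D = 2 * sum[ y_i * log(y_i / mu_i) + (1 - y_i) * log((1 - y_i) / (1 - mu_i)) ], with the convention 0 * log(0) = 0. The sketch below computes it in NumPy; the helper names are hypothetical and this is not the code in py-glm's families module.

```python
import numpy as np

def _ylog_ratio(a, b):
    """a * log(a / b), defined as 0 wherever a == 0."""
    safe_a = np.where(a > 0, a, 1.0)       # avoid log(0); those terms are zeroed below
    return np.where(a > 0, a * np.log(safe_a / b), 0.0)

def bernoulli_deviance(y, mu):
    """Hypothetical sketch of the Bernoulli deviance."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return 2.0 * np.sum(_ylog_ratio(y, mu) + _ylog_ratio(1.0 - y, 1.0 - mu))
```

When every y_i is 0 or 1, this reduces to -2 times the log-likelihood.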

Import Error - from glm.glm import GLM

Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\IPython\core\interactiveshell.py", line 2878, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in
    from glm.glm import GLM
  File "C:\Users\Ben.p2\pool\plugins\org.python.pydev.core_7.0.3.201811082356\pysrc_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Python27\lib\site-packages\py_glm-0.0.1-py2.7.egg\glm\glm.py", line 104
    def fit(self, X, y=None, formula=None, *,
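The last frame points at the bare * in the fit signature. A bare * marks the parameters after it as keyword-only (PEP 3102), which is Python 3 syntax, so the Python 2.7 interpreter shown in the traceback cannot parse glm.py. The snippet below is a hypothetical function with the same shape, showing how keyword-only parameters behave under Python 3; it is not py-glm's fit.

```python
# Hypothetical function mirroring the signature in the traceback.
# Everything after the bare "*" must be passed by keyword.
def fit(X, y=None, formula=None, *, offset=None):
    return offset

fit([[1.0]], [0.0], offset=2.0)      # accepted: offset passed by keyword
# fit([[1.0]], [0.0], None, 2.0)     # TypeError: offset cannot be passed positionally
```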

NaN coefficients

Hi,

I fit many models using your library, and in one run I got back NaN coefficients.
I worked around it by checking in glm.py whether the updated values are NaN:

# line 219 py-glm/glm/glm.py
diff = coef - np.linalg.solve(ddbeta, dbeta)
if np.isnan(diff).any():
    break

I was wondering whether this is a valid solution, or whether we should use a different check based on a tolerance.

I also checked the Julia package, which checks that the values are finite.

What are your thoughts on that?
Could we integrate that change into the package?
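One way to combine both ideas is to guard each update with np.isfinite, which rejects inf as well as NaN, and to stop separately once the update falls below a tolerance. The sketch below does this in a Newton loop for logistic regression, where Newton-Raphson coincides with Fisher scoring. The names coef, dbeta, and ddbeta mirror the snippet in the issue; everything else, including the function name and default tolerance, is an assumption and not py-glm's code.

```python
import numpy as np

def fit_logistic_newton(X, y, max_iter=100, tol=1e-8):
    """Hypothetical sketch: Newton-Raphson for logistic regression with a finiteness guard."""
    coef = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ coef)))
        dbeta = X.T @ (y - mu)                     # gradient of the log-likelihood
        ddbeta = -(X.T * (mu * (1.0 - mu))) @ X    # Hessian of the log-likelihood
        new_coef = coef - np.linalg.solve(ddbeta, dbeta)
        if not np.all(np.isfinite(new_coef)):
            break                                  # keep the last finite estimate
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef
```

Note that np.linalg.solve raises LinAlgError on an exactly singular Hessian, which a NaN check alone does not cover.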

failing comparison of elastic net with scikit-learn

I tried to compare pyglmnet.GLM with ElasticNet from scikit-learn and could not get it to work. Code to reproduce:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from pyglmnet import GLM

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

X, Y, coef_ = make_regression(
    n_samples=1000, n_features=1000,
    noise=0.1, n_informative=10, coef=True,
    random_state=42)

alpha = 0.1
l1_ratio = 0.5

# scikit-learn: alpha is the penalty strength, l1_ratio the L1/L2 mix
sk = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, tol=1e-5).fit(X, Y)
# pyglmnet: reg_lambda is the penalty strength, alpha the L1/L2 mix
pg = GLM(distr='gaussian', alpha=l1_ratio, reg_lambda=alpha, solver='cdfast', tol=1e-5).fit(X, Y)
print('in-sample rmse sklearn = {}, pyglmnet = {}'.format(rmse(Y, sk.predict(X)), rmse(Y, pg.predict(X))))

Result:
in-sample rmse sklearn = 12.756054997216442, pyglmnet = 161.03496460055504
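One way to narrow down a gap like this is to score both fitted solutions on the same objective. scikit-learn documents the ElasticNet objective as 1 / (2 * n_samples) * ||y - Xw - b||^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2, with the intercept b unpenalized. The hypothetical helper below evaluates it. If the pyglmnet solution scores much worse, the solver likely stopped early or optimizes a differently scaled objective; that diagnosis is an assumption, not a verified explanation of this issue.

```python
import numpy as np

def elastic_net_objective(X, y, coef, intercept, alpha, l1_ratio):
    """Hypothetical helper: scikit-learn's ElasticNet objective for given coefficients."""
    n = X.shape[0]
    resid = y - X @ coef - intercept
    return (resid @ resid / (2.0 * n)
            + alpha * l1_ratio * np.sum(np.abs(coef))            # L1 part
            + 0.5 * alpha * (1.0 - l1_ratio) * (coef @ coef))    # L2 part
```

For the scikit-learn fit, the inputs would be sk.coef_ and sk.intercept_; for pyglmnet, its fitted coefficient vector and intercept.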
