py-glm: Generalized Linear Models in Python
py-glm
is a library for fitting, inspecting, and evaluating Generalized Linear Models in python.
Installation
The py-glm
library can be installed directly from github.
pip install git+https://github.com/madrury/py-glm.git
Features
Model Fitting
py-glm
supports models from various exponential families:
from glm.glm import GLM
from glm.families import Gaussian, Bernoulli, Poisson, Exponential
linear_model = GLM(family=Gaussian())
logistic_model = GLM(family=Bernoulli())
poisson_model = GLM(family=Poisson())
exponential_model = GLM(family=Exponential())
Models with dispersion parameters are also supported. The dispersion parameters in these models are estimated using the deviance.
from glm.families import QuasiPoisson, Gamma
quasi_poisson_model = GLM(family=QuasiPoisson())
gamma_model = GLM(family=Gamma())
Fitting a model proceeds in sklearn
style, and uses the Fisher scoring algorithm:
logistic_model.fit(X, y_logistic)
If your data resides in a pandas.DataFrame
, you can pass this to fit
along with a model formula.
logistic_model.fit(X, formula="y ~ Moshi + SwimSwim")
Offsets and sample weights are supported when fitting:
linear_model.fit(X, y_linear, sample_weights=sample_weights)
poisson_nmodel.fit(X, y_poisson, offset=np.log(expos))
Predictions are also made in sklearn style:
logistic_model.predict(X)
Note: There is one major place we deviate from the sklearn interface. The predict
method on a GLM
object always returns an estimate of the conditional expectation E[y | X]
. This is in contrast to sklearn behavior for classification models, where it returns a class assignment. We make this choice so that the py-glm
library is consistent with its use of predict
. If the user would like class assignments from a model, they will need to threshold the probability returned by predict
manually.
Inference
Once the model is fit, parameter estimates, parameter covariance estimates, and p-values from a standard z-test are available:
logistic_model.coef_
logistic_model.coef_covariance_matrix_
logistic_model.coef_standard_error_
logistic_model.p_values_
To get a quick summary, use the summary
method:
logistic_model.summary()
Binomial GLM Model Summary.
===============================================
Name Parameter Estimate Standard Error
-----------------------------------------------
Intercept 1.02 0.01
Moshi -2.00 0.02
SwimSwim 1.00 0.02
Re-sampling methods are also supported in the simulation
subpackage: the
parametric and non-parametric bootstraps:
from glm.simulation import Simulation
sim = Simulation(logistic_model)
sim.parametric_bootstrap(X, n_sim=1000)
sim.non_parametric_bootstrap(X, n_sim=1000)
Regularization
Ridge regression is supported for each model (note, the regularization parameter is called alpha
instead of lambda
due to lambda
being a reserved word in python):
logistic_model.fit(X, y_logistic, alpha=1.0)
References
- Marlene Müller (2004). Generalized Linear Models.
Warning
The glmnet
code included in glm.glmnet
is experimental. Please use at your own risk.