• Stars: 124
• Rank: 287,130 (Top 6%)
• Language: Python
• License: BSD 3-Clause
• Created: about 7 years ago
• Updated: over 4 years ago

Repository Details

Generalized Linear Models in Sklearn Style

py-glm: Generalized Linear Models in Python

py-glm is a library for fitting, inspecting, and evaluating Generalized Linear Models in Python.

Installation

The py-glm library can be installed directly from GitHub.

pip install git+https://github.com/madrury/py-glm.git

Features

Model Fitting

py-glm supports models from various exponential families:

from glm.glm import GLM
from glm.families import Gaussian, Bernoulli, Poisson, Exponential

linear_model = GLM(family=Gaussian())
logistic_model = GLM(family=Bernoulli())
poisson_model = GLM(family=Poisson())
exponential_model = GLM(family=Exponential())

Models with dispersion parameters are also supported. The dispersion parameters in these models are estimated using the deviance.

from glm.families import QuasiPoisson, Gamma

quasi_poisson_model = GLM(family=QuasiPoisson())
gamma_model = GLM(family=Gamma())
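
For context, the deviance-based dispersion estimate is the model deviance divided by the residual degrees of freedom, phi_hat = D / (n - p). A minimal numpy sketch for the Poisson deviance case, written independently of py-glm's internals (the function names here are illustrative, not part of the library):

import numpy as np

def poisson_deviance(y, mu):
    # Sum of unit deviances; the y * log(y / mu) term is taken as zero when y == 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(y > 0, y * np.log(y / mu), 0.0)
    return 2.0 * np.sum(term - (y - mu))

def deviance_dispersion(y, mu, n_params):
    # phi_hat = D / (n - p): deviance over residual degrees of freedom.
    return poisson_deviance(y, mu) / (len(y) - n_params)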

Fitting a model proceeds in sklearn style and uses the Fisher scoring algorithm:

logistic_model.fit(X, y_logistic)
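
Fisher scoring for a Bernoulli model is equivalent to iteratively reweighted least squares. A rough, self-contained sketch of the algorithm (an illustration only, not py-glm's actual implementation):

import numpy as np

def fisher_scoring_logistic(X, y, n_iter=25, tol=1e-8):
    # Iteratively reweighted least squares for the Bernoulli family with a logit link.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))            # mean function (inverse logit)
        w = np.clip(mu * (1.0 - mu), 1e-10, None)  # Fisher information weights
        z = eta + (y - mu) / w                     # working response
        xtw = X.T * w
        beta_new = np.linalg.solve(xtw @ X, xtw @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta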

If your data resides in a pandas.DataFrame, you can pass it to fit along with a model formula.

logistic_model.fit(X, formula="y ~ Moshi + SwimSwim")
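
For example, with a hypothetical DataFrame containing the columns named in the formula (here the DataFrame takes the place of X, and the data below are simulated purely for illustration):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Moshi": rng.normal(size=100),
    "SwimSwim": rng.integers(0, 2, size=100),
})
# Simulate a binary response from a known linear predictor.
eta = 1.0 - 2.0 * df["Moshi"] + 1.0 * df["SwimSwim"]
df["y"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

logistic_model.fit(df, formula="y ~ Moshi + SwimSwim")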

Offsets and sample weights are supported when fitting; for a Poisson model with a log link, passing log exposure as an offset fits event rates per unit of exposure:

linear_model.fit(X, y_linear, sample_weights=sample_weights)
poisson_model.fit(X, y_poisson, offset=np.log(expos))

Predictions are also made in sklearn style:

logistic_model.predict(X)

Note: There is one major place we deviate from the sklearn interface. The predict method on a GLM object always returns an estimate of the conditional expectation E[y | X]. This is in contrast to sklearn's behavior for classification models, where predict returns a class assignment. We make this choice so that predict has a consistent meaning across every family in py-glm. If you would like class assignments from a model, you will need to threshold the probability returned by predict manually.
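
For example, hard class assignments at a 0.5 cutoff can be obtained by thresholding the predicted probabilities (the cutoff here is a modeling choice, not something the library fixes):

probabilities = logistic_model.predict(X)
class_assignments = (probabilities > 0.5).astype(int)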

Inference

Once the model is fit, parameter estimates, parameter covariance estimates, and p-values from a standard z-test are available:

logistic_model.coef_
logistic_model.coef_covariance_matrix_
logistic_model.coef_standard_error_
logistic_model.p_values_
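
The p-values correspond to a two-sided z-test of each coefficient against zero, so they can be reproduced (up to numerical details) from the attributes above:

import numpy as np
from scipy.stats import norm

z_scores = logistic_model.coef_ / logistic_model.coef_standard_error_
p_values = 2.0 * norm.sf(np.abs(z_scores))   # two-sided tail probability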

To get a quick summary, use the summary method:

logistic_model.summary()

Binomial GLM Model Summary.
===============================================
Name         Parameter Estimate  Standard Error
-----------------------------------------------
Intercept                  1.02            0.01
Moshi                     -2.00            0.02
SwimSwim                   1.00            0.02

Re-sampling methods, both the parametric and the non-parametric bootstrap, are supported in the simulation subpackage:

from glm.simulation import Simulation

sim = Simulation(logistic_model)
sim.parametric_bootstrap(X, n_sim=1000)
sim.non_parametric_bootstrap(X, n_sim=1000)
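
For intuition, the non-parametric bootstrap resamples rows with replacement and refits the model, using the spread of the refit coefficients to approximate the sampling distribution of coef_. A rough sketch of the idea, written against the public fit/coef_ interface rather than the Simulation internals (note that it refits the passed model in place):

import numpy as np

def nonparametric_bootstrap_coefs(model, X, y, n_sim=1000, seed=0):
    # Each refit coefficient vector is one draw from the approximate
    # sampling distribution of coef_. Assumes X and y are numpy arrays.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    draws = []
    for _ in range(n_sim):
        idx = rng.integers(0, n, size=n)
        model.fit(X[idx], y[idx])
        draws.append(model.coef_.copy())
    return np.vstack(draws)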

Regularization

Ridge regression is supported for each model (note: the regularization parameter is called alpha instead of lambda, because lambda is a reserved word in Python):

logistic_model.fit(X, y_logistic, alpha=1.0)
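
Since alpha is just a keyword argument to fit, a small grid of regularization strengths can be explored directly against the interface above (the grid values here are arbitrary):

import numpy as np

alphas = [0.01, 0.1, 1.0, 10.0]
coef_path = []
for alpha in alphas:
    logistic_model.fit(X, y_logistic, alpha=alpha)
    coef_path.append(logistic_model.coef_.copy())
coef_path = np.vstack(coef_path)   # one row of coefficients per alpha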

Warning

The glmnet code included in glm.glmnet is experimental. Please use at your own risk.

More Repositories

1. basis-expansions: Basis expansion transformers in sklearn style. (Python, 81 stars)
2. linalg: A linear algebra library in C. For fun. (C, 41 stars)
3. mtg-draftbot: Algorithmic Drafting for Magic the Gathering. (Jupyter Notebook, 26 stars)
4. smoothers: Visualizations of various one-dimensional smoothers using the d3 javascript library. (JavaScript, 17 stars)
5. boosting-presentation: Presentation on Gradient Boosting. (Jupyter Notebook, 13 stars)
6. poisson-boosting: Demo of Gradient Boosted Poisson Regression. (Jupyter Notebook, 9 stars)
7. roguelike: A Basic Roguelike Game. (Python, 9 stars)
8. csvtools: Fast tools for working with delimited files. (Shell, 6 stars)
9. xgboost-monotonic-testing: Some testing of xgboost's monotonic constraints. (Jupyter Notebook, 6 stars)
10. regression-tools: Miscellaneous tools for specifying and introspecting regression models in Python. (Python, 5 stars)
11. np2latex: Write LaTeX markup given a numpy array. (Python, 5 stars)
12. Thesis: The lonely PhD thesis I never defended. (TeX, 4 stars)
13. diff-priv-presentation: Presentation of Differential Privacy. (3 stars)
14. pycmath: Math functions written as a Python C extension module. (C, 3 stars)
15. madrury.github.io (JavaScript, 3 stars)
16. ridge-regression-and-noisy-predictors: A proof that adding noise to regression predictors results in ridge regression. (Jupyter Notebook, 2 stars)
17. fp-in-scala: Exercises from Functional Programming in Scala. (Scala, 2 stars)
18. generalized-eigenvalues: A short example of solving the generalized eigenvalue problem. (Jupyter Notebook, 1 star)
19. reinforcement-learning: Code from reinforcement learning reading group. (Jupyter Notebook, 1 star)
20. into-to-multilevel-models: A Gentle Introduction to Multilevel Models. (HTML, 1 star)
21. multi-armed-bandits: Simulations of Multi Armed Bandits in Cython. (Python, 1 star)
22. rusty-rogue: Learning Rust with the Roguelike Tutorial. (Rust, 1 star)
23. regression-helpers: Helper Functions for Regression Using Sklearn. (Python, 1 star)
24. table-tools: Pure Python API for manipulating table data structures. (Python, 1 star)
25. hacker-rank: Python HackerRank problems. (Python, 1 star)
26. Cookies-Autoencoder: Autoencoder fit to cookie recipe data set from reddit. (Jupyter Notebook, 1 star)
27. advent-of-code-2019: It's cold outside, let's do some programming. (Python, 1 star)
28. qr-algorithm: Code for a demo of eigen-solving algorithms. (Jupyter Notebook, 1 star)