  • Stars: 157
  • Rank: 238,399 (Top 5%)
  • Language: Python
  • License: BSD 3-Clause
  • Created: over 2 years ago
  • Updated: 4 months ago


Repository Details

Fast and modular sklearn replacement for generalized linear models

A fast ⚡ and modular ⚒️ scikit-learn replacement for sparse GLMs


skglm is a Python package that offers fast estimators for sparse Generalized Linear Models (GLMs) that are 100% compatible with scikit-learn. It is highly flexible and supports a wide range of GLMs. You can choose from skglm's ready-made estimators or customize your own by combining the available datafits and penalties.

Excited to take a tour? Check out the skglm documentation.

Why skglm?

skglm is specifically designed to solve sparse GLMs. It supports many models that are missing from scikit-learn and delivers high performance. There are several reasons to opt for skglm, among them:

  • Speed: fast solvers able to tackle large datasets, either dense or sparse, with millions of features, up to 100 times faster than scikit-learn.
  • Modularity: a user-friendly API that enables composing custom estimators with any combination of its existing datafits and penalties.
  • Extensibility: a flexible design that makes it simple and easy to implement new datafits and penalties, a matter of a few lines of code.
  • Compatibility: estimators fully compatible with the scikit-learn API and drop-in replacements for its GLM estimators (see the sketch below).
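
As an illustration of the compatibility point, an skglm estimator can be dropped directly into standard scikit-learn tooling. The snippet below is a minimal sketch that assumes skglm.estimators exposes a Lasso estimator with an alpha parameter, as described in the documentation; it tunes that parameter with scikit-learn's GridSearchCV.

# tune an skglm estimator with scikit-learn's GridSearchCV
from sklearn.model_selection import GridSearchCV

from skglm.estimators import Lasso  # assumed import path, see the skglm API docs
from skglm.utils.data import make_correlated_data

# generate a small correlated toy dataset
X, y, _ = make_correlated_data(n_samples=100, n_features=40)

# grid-search the regularization strength as with any scikit-learn estimator
search = GridSearchCV(Lasso(), param_grid={"alpha": [1e-1, 1e-2, 1e-3]}, cv=3)
search.fit(X, y)
print(search.best_params_)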

Get started with skglm

Installing skglm

skglm is available on PyPI. Run the following command to get the latest version of the package:

pip install -U skglm

It is also available on conda-forge and can be installed using, for instance:

conda install -c conda-forge skglm
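
Either way, you can quickly check that the installation succeeded and see which version was installed (assuming skglm exposes the standard __version__ attribute):

python -c "import skglm; print(skglm.__version__)"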

First steps with skglm

Once you have installed skglm, you can run the following code snippet to fit an MCP Regression model on a toy dataset:

# import model to fit
from skglm.estimators import MCPRegression
# import util to create a toy dataset
from skglm.utils.data import make_correlated_data

# generate a toy dataset
X, y, _ = make_correlated_data(n_samples=10, n_features=100)

# init and fit estimator
estimator = MCPRegression()
estimator.fit(X, y)

# print the R² score on the training data
print(estimator.score(X, y))
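
Because MCPRegression follows the scikit-learn estimator API, the fitted coefficients are exposed through the usual coef_ attribute. As a minimal follow-up, you can check how sparse the fitted model is (the exact number of non-zero coefficients depends on the default regularization):

import numpy as np

# count the non-zero coefficients of the fitted estimator
print(np.sum(estimator.coef_ != 0), "non-zero coefficients out of", estimator.coef_.size)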

You can refer to the documentation to explore the list of skglm's ready-made estimators.

Didn't find one that suits you? You can still compose your own. Here is a code snippet that fits an MCP-regularized problem with a Huber loss.

# import datafit, penalty and GLM estimator
from skglm.datafits import Huber
from skglm.penalties import MCPenalty
from skglm.estimators import GeneralizedLinearEstimator

from skglm.utils.data import make_correlated_data
from skglm.solvers import AndersonCD

X, y, _ = make_correlated_data(n_samples=10, n_features=100)
# create and fit GLM estimator with Huber loss and MCP penalty
estimator = GeneralizedLinearEstimator(
    datafit=Huber(delta=1.),
    penalty=MCPenalty(alpha=1e-2, gamma=3),
    solver=AndersonCD()
)
estimator.fit(X, y)
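
Since the composed estimator is designed to behave like a regular scikit-learn regressor, it can be plugged into any scikit-learn utility. For instance, a minimal sketch that cross-validates the Huber + MCP estimator defined above:

from sklearn.model_selection import cross_val_score

# 3-fold cross-validated R² scores of the composed estimator
print(cross_val_score(estimator, X, y, cv=3))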

You will find a detailed description of the supported datafits and penalties, and how to combine them, in the API section of the documentation. You can also take our tutorial to learn how to create your own datafit and penalty.

Contribute to skglm

skglm is a continuous endeavour that relies on community efforts to last and evolve. Your contribution is welcome and highly valuable. It can be:

  • bug report: you may encounter a bug while using skglm. Don't hesitate to report it in the issues section.
  • feature request: you may want to extend or add new features to skglm. You can use the issues section to make suggestions.
  • pull request: you may have fixed a bug, added a feature, or even fixed a small typo in the documentation; you can submit a pull request and we will reach out to you as soon as possible.

Cite

skglm is the result of a persistent research effort. It is licensed under BSD 3-Clause. You are free to use it, and if you do so, please cite:

@inproceedings{skglm,
    title     = {Beyond L1: Faster and better sparse models with skglm},
    author    = {Q. Bertrand and Q. Klopfenstein and P.-A. Bannier and G. Gidel and M. Massias},
    booktitle = {NeurIPS},
    year      = {2022},
}


More Repositories

1. imbalanced-learn: A Python package to tackle the curse of imbalanced datasets in machine learning (Python, 6,549 stars)
2. sklearn-pandas: Pandas integration with sklearn (Python, 2,803 stars)
3. hdbscan: A high performance implementation of HDBSCAN clustering (Jupyter Notebook, 2,795 stars)
4. category_encoders: A library of sklearn compatible categorical variable encoders (Python, 2,405 stars)
5. lightning: Large-scale linear classification, regression and ranking in Python (Python, 1,716 stars)
6. boruta_py: Python implementations of the Boruta all-relevant feature selection method (Python, 1,474 stars)
7. metric-learn: Metric learning algorithms in Python (Python, 1,346 stars)
8. MAPIE: A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions (Jupyter Notebook, 1,285 stars)
9. skope-rules: Machine learning with logical rules in Python (Jupyter Notebook, 541 stars)
10. DESlib: A Python library for dynamic classifier and ensemble selection (Python, 479 stars)
11. py-earth: A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines (Python, 444 stars)
12. scikit-learn-contrib: scikit-learn compatible projects (400 stars)
13. project-template: A template for scikit-learn extensions (Python, 316 stars)
14. forest-confidence-interval: Confidence intervals for scikit-learn forest algorithms (HTML, 282 stars)
15. polylearn: A library for factorization machines and polynomial networks for classification and regression in Python (Python, 245 stars)
16. stability-selection: scikit-learn compatible implementation of stability selection (Python, 195 stars)
17. scikit-learn-extra: scikit-learn contrib estimators (Python, 155 stars)
18. qolmat: A scikit-learn-compatible module for comparing imputation methods (Python, 134 stars)
19. hiclass: A Python library for hierarchical classification compatible with scikit-learn (Python, 113 stars)
20. scikit-dimension: A Python package for intrinsic dimension estimation (Python, 78 stars)
21. scikit-matter: A collection of scikit-learn compatible utilities that implement methods born out of the materials science and chemistry communities (Python, 76 stars)
22. skdag: A more flexible alternative to scikit-learn Pipelines (Python, 29 stars)
23. denmune-clustering-algorithm: DenMune is a clustering algorithm that can find clusters of arbitrary size, shape and density in two dimensions; higher dimensions are first reduced to 2-D using t-SNE. It relies on a single parameter K, the number of nearest neighbors (Jupyter Notebook, 29 stars)
24. mimic: mimic calibration (Python, 21 stars)
25. sklearn-ann: Integration with (approximate) nearest neighbors libraries for scikit-learn, plus clustering based on kNN-graphs (Python, 14 stars)
26. scikit-learn-contrib.github.io: Project webpage (HTML, 4 stars)