

Group Lasso implementation following the scikit-learn API

Group Lasso

The group lasso [1] regulariser is a well-known method to achieve structured sparsity in machine learning and statistics. The idea is to create non-overlapping groups of covariates, and recover regression weights in which only a sparse set of these covariate groups have non-zero components.
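To make the penalty concrete, here is a minimal sketch that evaluates it for a weight vector. It assumes the weighting used by Yuan and Lin, where each group's Euclidean norm is scaled by the square root of the group size; the function name `group_lasso_penalty` and the `reg` parameter are illustrative and not part of this library's API.

```python
import math
from collections import defaultdict


def group_lasso_penalty(weights, groups, reg=1.0):
    """Return reg * sum_g sqrt(size of g) * ||weights in g||_2.

    `groups[i]` is the group label of `weights[i]` (hypothetical layout).
    """
    squared_norms = defaultdict(float)
    sizes = defaultdict(int)
    for weight, group in zip(weights, groups):
        squared_norms[group] += weight * weight
        sizes[group] += 1
    return reg * sum(
        math.sqrt(sizes[g]) * math.sqrt(squared_norms[g]) for g in sizes
    )


weights = [3.0, 4.0, 0.0, 0.0, 1.0]
groups = [0, 0, 1, 1, 2]
# Group 0 contributes sqrt(2) * 5, group 1 contributes 0, group 2 contributes 1.
penalty = group_lasso_penalty(weights, groups)
```

Because the norm of a group is not squared, the penalty has a kink at zero for the whole group, which is what drives entire groups to exactly zero.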

There are several reasons why this might be a good idea. Say, for example, that we have a set of sensors and each of these sensors generates five measurements. We don't want to maintain an unnecessary number of sensors. If we try normal LASSO regression, then we will get sparse components. However, these sparse components might not correspond to a sparse set of sensors, since each sensor generates five measurements. If we instead use group LASSO with measurements grouped by the sensor that produced them, then we will get a sparse set of sensors.
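The sensor argument can be illustrated with a small sketch. The layout (four sensors, five measurements each, columns ordered sensor by sensor) and the two coefficient vectors are made up for illustration; they are not output from any solver.

```python
# Hypothetical layout: 4 sensors, 5 measurements each, columns ordered by sensor.
n_sensors, n_measurements = 4, 5
groups = [sensor for sensor in range(n_sensors) for _ in range(n_measurements)]


def sensors_in_use(coefficients, groups):
    """Return the sorted sensor ids with at least one non-zero coefficient."""
    return sorted({g for c, g in zip(coefficients, groups) if c != 0.0})


# Element-wise sparse (lasso-like): only 4 non-zeros, but one in every sensor.
lasso_like = [0.0] * 20
for column in (0, 6, 12, 18):
    lasso_like[column] = 1.0

# Group-wise sparse (group-lasso-like): 5 non-zeros, all belonging to sensor 2.
group_like = [0.0] * 20
for column in range(10, 15):
    group_like[column] = 1.0

lasso_sensors = sensors_in_use(lasso_like, groups)
group_sensors = sensors_in_use(group_like, groups)
```

The first vector has fewer non-zero coefficients yet still needs all four sensors, while the second needs only one.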

An extension of the group lasso regulariser is the sparse group lasso regulariser [2], which imposes both group-wise sparsity and coefficient-wise sparsity. This is done by combining the group lasso penalty with the traditional lasso penalty. In this library, I have implemented an efficient sparse group lasso solver that is fully compliant with the scikit-learn API.
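As a rough illustration of how the two penalties interact, the sketch below applies the proximal operator of the combined penalty to one weight vector: element-wise soft-thresholding (the lasso part) followed by group-wise shrinkage (the group lasso part). This composition is a standard result for the sparse group lasso, but the function names and the absence of any group-size scaling are assumptions made here; this is not the library's own code.

```python
import numpy as np


def soft_threshold(w, threshold):
    """Shrink every coefficient towards zero by `threshold` (lasso prox)."""
    return np.sign(w) * np.maximum(np.abs(w) - threshold, 0.0)


def group_soft_threshold(w, threshold):
    """Shrink the whole vector's norm by `threshold` (group lasso prox)."""
    norm = np.linalg.norm(w)
    if norm <= threshold:
        return np.zeros_like(w)
    return (1.0 - threshold / norm) * w


def sparse_group_lasso_prox(w, groups, l1_reg, group_reg):
    """Apply the lasso prox, then the group prox, separately within each group."""
    w = np.asarray(w, dtype=float)
    groups = np.asarray(groups)
    out = np.zeros_like(w)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = group_soft_threshold(soft_threshold(w[mask], l1_reg), group_reg)
    return out


w = np.array([3.0, 0.5, 4.0, 0.2, -0.3])
groups = np.array([0, 0, 0, 1, 1])
shrunk = sparse_group_lasso_prox(w, groups, l1_reg=1.0, group_reg=1.0)
```

In this toy input the second group is removed entirely (group-wise sparsity), and the small middle coefficient of the first group is zeroed while its neighbours survive (coefficient-wise sparsity).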

About this project

This project is developed by Yngve Mardal Moe and released under an MIT licence.

Installation guide

Group-lasso requires Python 3.5+, numpy and scikit-learn. To install group-lasso via pip, simply run the command:

pip install group-lasso

Alternatively, you can manually pull this repository and run the setup.py file:

git clone https://github.com/yngvem/group-lasso.git
cd group-lasso
python setup.py install

Documentation

You can read the full documentation on readthedocs.

Examples

There are several examples that show usage of the library in the documentation.

Further work

  1. Fully test with sparse arrays and make examples
  2. Make it easier to work with categorical data
  3. Poisson regression

Implementation details

The problem is solved using the FISTA optimiser [3] with a gradient-based adaptive restarting scheme [4]. No line search is currently implemented, but I hope to look at that later.
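The following is a minimal sketch of FISTA with a gradient-based restart, written for a least-squares loss with a plain group lasso penalty and a fixed step size (no line search). It is not the library's implementation: the function names, the loss scaling by the number of rows, and the unweighted penalty are all assumptions chosen to keep the example short.

```python
import numpy as np


def group_prox(w, groups, threshold):
    """Group-wise soft-thresholding: shrink each group's norm by `threshold`."""
    out = np.zeros_like(w)
    for g in np.unique(groups):
        mask = groups == g
        norm = np.linalg.norm(w[mask])
        if norm > threshold:
            out[mask] = (1.0 - threshold / norm) * w[mask]
    return out


def fista_group_lasso(X, y, groups, reg, n_iter=500):
    """Minimise 1/(2n) * ||Xw - y||^2 + reg * sum_g ||w_g||_2 (assumed objective)."""
    groups = np.asarray(groups)
    n = X.shape[0]
    # Fixed step 1/L, where L = sigma_max(X)^2 / n bounds the gradient's Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    v = w.copy()  # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ v - y) / n
        w_new = group_prox(v - step * grad, groups, step * reg)
        if (v - w_new) @ (w_new - w) > 0:
            # Gradient-based restart: the momentum points uphill, so drop it.
            t = 1.0
            v = w_new.copy()
        else:
            # Usual Nesterov/FISTA momentum update.
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            v = w_new + ((t - 1.0) / t_new) * (w_new - w)
            t = t_new
        w = w_new
    return w


# Synthetic data: only the first group of three columns carries signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 0.0, 0.0])
y = X @ w_true + 0.01 * rng.standard_normal(200)
w_hat = fista_group_lasso(X, y, groups=[0, 0, 0, 1, 1, 1], reg=0.1)
```

The restart test uses the generalised gradient `v - w_new` in place of the plain gradient, which is the natural adaptation when a proximal step is involved.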

Although fast, the FISTA optimiser does not achieve as low loss values as the significantly slower second-order interior point methods. This might, at first glance, seem like a problem. However, it does recover the sparsity patterns of the data, which can be used to train a new model with the given subset of the features.
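A minimal sketch of that second step, assuming a coefficient vector from any sparse solver is already available: keep the columns with non-zero coefficients and refit an unpenalised least-squares model on them. The helper `refit_on_support`, the toy data, and the omission of an intercept are illustrative choices, not part of this library.

```python
import numpy as np


def refit_on_support(X, y, coefficients, tol=0.0):
    """Refit ordinary least squares using only columns with |coefficient| > tol."""
    coefficients = np.asarray(coefficients, dtype=float)
    support = np.abs(coefficients) > tol
    refit = np.zeros_like(coefficients)
    if support.any():
        solution = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
        refit[support] = solution
    return refit, support


X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [2.0, 0.0, 1.0]])
y = X @ np.array([2.0, 0.0, -1.0])

# Pretend these shrunken coefficients came from a sparse solver.
shrunken = np.array([1.5, 0.0, -0.6])
refit, support = refit_on_support(X, y, shrunken)
```

The refit removes the shrinkage bias on the selected features while leaving the discarded ones at exactly zero.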

Also, even though the FISTA optimiser is not meant for stochastic optimisation, in my experience it has not suffered a large drop in performance when the mini-batch was large enough. I have therefore implemented mini-batch optimisation using FISTA, and have thus been able to fit models based on data with ~500 columns and 10 000 000 rows on my moderately priced laptop.
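The core of the mini-batch idea can be sketched as replacing the full gradient with one computed on a random subset of rows. The helper below assumes a least-squares loss and sampling without replacement; how the library actually draws its subsamples is not described here, so treat this purely as an illustration.

```python
import numpy as np


def minibatch_gradient(X, y, w, batch_size, rng):
    """Least-squares gradient estimated from `batch_size` randomly chosen rows."""
    rows = rng.choice(X.shape[0], size=batch_size, replace=False)
    residual = X[rows] @ w - y[rows]
    return X[rows].T @ residual / batch_size


rng = np.random.default_rng(0)
X_demo = rng.standard_normal((50, 4))
y_demo = rng.standard_normal(50)
w_demo = rng.standard_normal(4)

full_gradient = X_demo.T @ (X_demo @ w_demo - y_demo) / 50
# A batch containing every row reproduces the full gradient.
all_rows_gradient = minibatch_gradient(X_demo, y_demo, w_demo, 50, rng)
# A smaller batch gives a cheaper, noisier estimate of the same quantity.
small_batch_gradient = minibatch_gradient(X_demo, y_demo, w_demo, 10, rng)
```

Each iteration then only touches `batch_size` rows, which is what makes very tall data sets tractable on modest hardware.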

Finally, we note that since FISTA uses Nesterov acceleration, it is not a descent algorithm. We can therefore not expect the loss to decrease monotonically.

References


  1. Yuan, M. and Lin, Y. (2006), Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68: 49-67. doi:10.1111/j.1467-9868.2005.00532.x

  2. Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2013). A sparse-group lasso. Journal of Computational and Graphical Statistics, 22(2), 231-245.

  3. Beck, A. and Teboulle, M. (2009), A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences 2009 2:1, 183-202. doi:10.1137/080716542

  4. O’Donoghue, B. & Candès, E. (2015), Adaptive Restart for Accelerated Gradient Schemes. Found Comput Math 15: 715. doi:10.1007/s10208-013-9150-
