  • Stars: 444
  • Rank 98,300 (Top 2 %)
  • Language: Python
  • License: BSD 3-Clause "New...
  • Created almost 12 years ago
  • Updated over 1 year ago

Repository Details

A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines

py-earth

A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines (MARS) algorithm, in the style of scikit-learn. The py-earth package is implemented using Cython and provides an interface compatible with scikit-learn's Estimator, Predictor, Transformer, and Model interfaces. For more information about Multivariate Adaptive Regression Splines, see the references below.
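
For readers new to the method: a Multivariate Adaptive Regression Splines (MARS) model is a weighted sum of basis functions built from hinge functions of the form max(0, x - t) and max(0, t - x), where t is a knot. The sketch below evaluates such a model by hand with numpy. It does not use py-earth; the names hinge and predict_mars, the layout of the terms list, and the knot and coefficient values are illustrative assumptions, not part of the package's API.

```python
import numpy

def hinge(x, knot, direction=1):
    """Return max(0, x - knot) for direction +1, or max(0, knot - x) for -1."""
    return numpy.maximum(0.0, direction * (x - knot))

def predict_mars(x, intercept, terms):
    """Evaluate a one-variable MARS-style model.

    terms is a list of (coefficient, knot, direction) tuples
    (a hypothetical layout chosen for this sketch).
    """
    x = numpy.asarray(x, dtype=float)
    y = numpy.full_like(x, intercept)
    for coef, knot, direction in terms:
        y += coef * hinge(x, knot, direction)
    return y

# A mirrored pair of hinges at knot 4 reproduces |x - 4| exactly.
terms = [(1.0, 4.0, +1), (1.0, 4.0, -1)]
x = numpy.array([-2.0, 4.0, 10.0])
y = predict_mars(x, 0.0, terms)
print(y)  # [6. 0. 6.]
```

A fitted Earth model has the same general shape, except that the knots, the variables involved, and the coefficients are chosen by the algorithm rather than written by hand.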

Now With Missing Data Support!

The py-earth package now supports missing values in its predictors. Just set allow_missing=True when constructing an Earth object.
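
A minimal sketch of what that might look like, assuming missing predictor values are encoded as NaN in a numpy array. The helper names make_data_with_missing and fit_earth_with_missing, the missing rate, and the data itself are made up for illustration; only Earth and allow_missing=True come from the package. The import of pyearth is placed inside the fitting function so that the data helper can be run on its own.

```python
import numpy

def make_data_with_missing(m=200, n=3, missing_rate=0.1, seed=0):
    """Build toy data, then blank out a random share of the predictors."""
    rng = numpy.random.RandomState(seed)
    X = rng.uniform(-40, 40, size=(m, n))
    # The response is computed before any values are removed.
    y = numpy.abs(X[:, 0] - 4.0) + rng.normal(size=m)
    mask = rng.uniform(size=X.shape) < missing_rate
    X[mask] = numpy.nan
    return X, y

def fit_earth_with_missing(X, y):
    """Fit an Earth model on predictors that contain NaN entries."""
    from pyearth import Earth  # requires py-earth to be installed
    model = Earth(allow_missing=True)
    model.fit(X, y)
    return model

X, y = make_data_with_missing()
print("missing entries:", int(numpy.isnan(X).sum()), "of", X.size)
```

With py-earth installed, model = fit_earth_with_missing(X, y) followed by print(model.summary()) should show the fitted terms.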

Requesting Feedback

If there are other features or improvements you'd like to see in py-earth, please send me an email, or open or comment on an issue. In particular, please let me know if any of the following are important to you:

  1. Improved speed
  2. Exporting models to additional formats
  3. Support for shared memory multiprocessing during fitting
  4. Support for cyclic predictors (such as time of day)
  5. Better support for categorical predictors
  6. Better support for large data sets
  7. Iterative reweighting during fitting

Installation

Make sure you have numpy and scikit-learn installed. Then do the following:

git clone https://github.com/scikit-learn-contrib/py-earth.git
cd py-earth
sudo python setup.py install

Usage

import numpy
from pyearth import Earth
from matplotlib import pyplot

# Create some fake data: only column 6 of X influences y
numpy.random.seed(0)
m = 1000
n = 10
X = 80 * numpy.random.uniform(size=(m, n)) - 40
y = numpy.abs(X[:, 6] - 4.0) + numpy.random.normal(size=m)

# Fit an Earth model
model = Earth()
model.fit(X, y)

# Print the fitting trace and the model summary
print(model.trace())
print(model.summary())

# Plot the data (red) and the model's predictions (blue) against x_6
y_hat = model.predict(X)
pyplot.figure()
pyplot.plot(X[:, 6], y, 'r.')
pyplot.plot(X[:, 6], y_hat, 'b.')
pyplot.xlabel('x_6')
pyplot.ylabel('y')
pyplot.title('Simple Earth Example')
pyplot.show()
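
To see why this data suits a MARS model, note that |x - 4| equals max(0, x - 4) + max(0, 4 - x). The sketch below rebuilds the same fake data and fits only the linear step by ordinary least squares, with the knot fixed at 4.0 by assumption. py-earth searches for knots and variables itself, so this is not how the package works internally. It uses numpy only, and the variable names are illustrative.

```python
import numpy

# Same synthetic data as the example above (only column 6 carries signal).
numpy.random.seed(0)
m, n = 1000, 10
X = 80 * numpy.random.uniform(size=(m, n)) - 40
y = numpy.abs(X[:, 6] - 4.0) + numpy.random.normal(size=m)

# Design matrix: intercept plus a mirrored hinge pair at an assumed knot.
knot = 4.0
x6 = X[:, 6]
B = numpy.column_stack([
    numpy.ones(m),
    numpy.maximum(0.0, x6 - knot),
    numpy.maximum(0.0, knot - x6),
])

# Ordinary least squares for the three coefficients.
coef, _, _, _ = numpy.linalg.lstsq(B, y, rcond=None)
y_hat = B.dot(coef)
mse = numpy.mean((y - y_hat) ** 2)
print("coefficients:", coef)
print("mean squared error:", mse)
```

Both hinge coefficients should come out close to 1 and the mean squared error close to 1, the variance of the added noise.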

Other Implementations

I am aware of the following implementations of Multivariate Adaptive Regression Splines:

  1. The R package earth (coded in C by Stephen Milborrow): http://cran.r-project.org/web/packages/earth/index.html
  2. The R package mda (coded in Fortran by Trevor Hastie and Robert Tibshirani): http://cran.r-project.org/web/packages/mda/index.html
  3. The Orange data mining library for Python (uses the C code from 1): http://orange.biolab.si/
  4. The xtal package (uses Fortran code written in 1991 by Jerome Friedman): http://www.ece.umn.edu/users/cherkass/ee4389/xtalpackage.html
  5. MARSplines by StatSoft: http://www.statsoft.com/textbook/multivariate-adaptive-regression-splines/
  6. MARS by Salford Systems (also uses Friedman's code): http://www.salford-systems.com/products/mars
  7. ARESLab (written in Matlab by Gints Jekabsons): http://www.cs.rtu.lv/jekabsons/regression.html

The R package earth was the most useful to me in understanding the algorithm, particularly because of Stephen Milborrow's thorough and easy-to-read vignette (http://www.milbo.org/doc/earth-notes.pdf).

References

  1. Friedman, J. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19(1), 1–67. http://www.jstor.org/stable/10.2307/2241837
  2. Milborrow, S. (2012). earth: Multivariate Adaptive Regression Spline Models. Derived from mda:mars by Trevor Hastie and Rob Tibshirani. R package version 3.2-3. http://CRAN.R-project.org/package=earth
  3. Friedman, J. (1993). Fast MARS. Stanford University Department of Statistics, Technical Report No 110. https://statistics.stanford.edu/sites/default/files/LCS%20110.pdf
  4. Friedman, J. (1991). Estimating functions of mixed ordinal and categorical variables using adaptive splines. Stanford University Department of Statistics, Technical Report No 108. http://media.salford-systems.com/library/MARS_V2_JHF_LCS-108.pdf
  5. Stewart, G.W. (1998). Matrix Algorithms, Volume 1: Basic Decompositions. Society for Industrial and Applied Mathematics.
  6. Bjorck, A. (1996). Numerical Methods for Least Squares Problems. Society for Industrial and Applied Mathematics.
  7. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd Edition). Springer Series in Statistics.
  8. Golub, G., & Van Loan, C. (1996). Matrix Computations (3rd Edition). Johns Hopkins University Press.

References 7, 2, 1, 3, and 4 contain discussions likely to be useful to users of py-earth. References 1, 2, 6, 5, 8, 3, and 4 were useful during the implementation process.
