Python implementation of the rulefit algorithm

Note: This package is no longer actively maintained. If you are interested in taking over maintenance, please reach out via a GitHub issue!

RuleFit

Implementation of a rule-based prediction algorithm, based on the RuleFit algorithm from Friedman and Popescu (PDF)

The algorithm can be used to predict an output vector y from an input matrix X. In the first step, a tree ensemble is generated with gradient boosting. The trees are then used to form rules: the path to each node in each tree forms one rule. A rule is a binary decision about whether an observation falls into a given node, which depends on the input features used in the splits. The ensemble of rules, together with the original input features, is then fed into an L1-regularized linear model (the Lasso), which estimates the effect of each rule on the output target while shrinking many of those effects to exactly zero.
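To make the two stages concrete, here is a minimal conceptual sketch on synthetic data. This is not the package's internals; it simply uses scikit-learn's decision_path indicators as rule features to illustrate the idea:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3))
y = 2 * (X[:, 0] > 0.5) + X[:, 1] + rng.normal(scale=0.1, size=200)

# Stage 1: generate a small tree ensemble with gradient boosting.
gb = GradientBoostingRegressor(n_estimators=10, max_depth=2).fit(X, y)

# Each node defines one rule: "observation falls into this node".
# decision_path() yields a binary indicator per node, used here directly
# as the rule features.
rule_features = np.hstack([
    est[0].decision_path(X).toarray() for est in gb.estimators_
])

# Stage 2: L1-regularized linear model over rules + original features;
# the L1 penalty shrinks most rule coefficients exactly to zero.
lasso = Lasso(alpha=0.01).fit(np.hstack([X, rule_features]), y)
print("non-zero coefficients:", np.count_nonzero(lasso.coef_))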

You can use rulefit to predict a numeric response (categorical responses are not yet implemented). The input has to be a numpy matrix with only numeric values.

Installation

The latest version can be installed from the master branch using pip:

pip install git+https://github.com/christophM/rulefit.git

Alternatively, clone the repository and install it with python setup.py install or python setup.py develop.

Usage

Train your model:

import numpy as np
import pandas as pd

from rulefit import RuleFit

boston_data = pd.read_csv("boston.csv", index_col=0)

y = boston_data.medv.values
X = boston_data.drop("medv", axis=1)
features = X.columns
X = X.to_numpy()  # as_matrix() was removed in pandas 1.0

rf = RuleFit()
rf.fit(X, y, feature_names=features)

If you want to influence the tree generator, you can pass a generator as an argument:

from sklearn.ensemble import GradientBoostingRegressor
gb = GradientBoostingRegressor(n_estimators=500, max_depth=10, learning_rate=0.01)
rf = RuleFit(gb)

rf.fit(X, y, feature_names=features)

Predict:

rf.predict(X)
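As a quick sanity check (purely illustrative, since it evaluates on the training data), the predictions can be compared against the observed response:

from sklearn.metrics import mean_squared_error

y_pred = rf.predict(X)
print("in-sample MSE:", mean_squared_error(y, y_pred))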

Inspect rules:

rules = rf.get_rules()

rules = rules[rules.coef != 0].sort_values("support", ascending=False)

print(rules)
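The coef and support columns are used above; assuming the returned DataFrame also carries rule and importance columns (an assumption about recent versions of the package), the strongest rules can be listed like this:

# assumes get_rules() returns "rule" and "importance" columns alongside coef/support
top_rules = rules.sort_values("importance", ascending=False).head(10)
print(top_rules[["rule", "coef", "support", "importance"]])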

Notes

  • In contrast to the original paper, the generated trees are always fitted with the same maximum depth. In the original implementation, the maximum depth of the trees is drawn from a distribution each time (see the sketch after this list).
  • This implementation is a work in progress. If you find a bug, don't hesitate to contact me.
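A minimal sketch of the paper's scheme, assuming Friedman and Popescu's rule of drawing the number of terminal nodes as t_m = 2 + floor(gamma) with gamma exponentially distributed; this illustrates the idea and is not this package's code (L_bar and the loop are hypothetical):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(100, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=100)

L_bar = 4  # target mean number of terminal nodes per tree (assumption)
for _ in range(5):
    # t_m = 2 + floor(gamma), gamma ~ Exponential(mean = L_bar - 2)
    n_leaves = 2 + int(rng.exponential(scale=L_bar - 2))
    tree = DecisionTreeRegressor(max_leaf_nodes=n_leaves).fit(X, y)
    print(tree.get_n_leaves())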

Changelog

All notable changes to this project will be documented here.

[v0.3] - IN PROGRESS

  • Set default of exclude_zero_coef to False in get_rules()
  • Syntax fix (Issue 21)

[v0.2] - 2017-11-24

  • Introduces classification for RuleFit
  • Adds scaling of variables (Friedscale)
  • Allows random size trees for creating rules

[v0.1] - 2016-06-18

  • Start changelog and versions
