• Stars: 587
• Rank: 76,145 (top 2%)
• Language: Python
• License: Other
• Created: over 7 years ago
• Updated: about 6 years ago

Repository Details

Tuning hyperparams fast with Hyperband

hyperband

Code for tuning hyperparams with Hyperband, adapted from Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization.

• defs/ - functions and search space definitions for various classifiers
• defs_regression/ - the same for regression models
• common_defs.py - imports and definitions shared by defs files
• hyperband.py - the Hyperband implementation (from hyperband import Hyperband)

• load_data.py - classification defs import data from this file
• load_data_regression.py - regression defs import data from this file

• main.py - a complete example for classification
• main_regression.py - the same, for regression
• main_simple.py - a simple, bare-bones example

The goal is to provide a fully functional implementation of Hyperband, along with ready-to-use search space definitions for a number of models (classifiers and regressors). Currently these include four from scikit-learn and four others:

  • gradient boosting (GB)
  • random forest (RF)
  • extremely randomized trees (XT)
  • linear SGD
  • factorization machines from polylearn
  • polynomial networks from polylearn
  • a multilayer perceptron from Keras
  • gradient boosting from XGBoost (classification only)

Meta-classifier/regressor

Use defs.meta/defs_regression.meta to try many models in one Hyperband run, as shown in the sketch below. This is an automatic alternative to constructing multi-model search spaces (like defs.rf_xt or defs.polylearn_fm_pn) by hand.
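
A hedged sketch of the idea (an illustration of the approach, not defs.meta's exact code; the family list and parameter ranges here are made up): sample a model family first, then that family's hyperparameters, and dispatch on the family in try_params.

import random

from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

# the four arrays are assumed to come from load_data.py, as in the repo
from load_data import x_train, y_train, x_test, y_test

def get_params():
    # pick a model family first, then sample its hyperparameters
    return {
        'model': random.choice(['rf', 'xt']),
        'trees_per_iteration': random.randint(1, 10),
        'min_samples_split': random.randint(2, 12),
    }

def try_params(n_iterations, params):
    # dispatch on the sampled family; the budget scales the ensemble size
    cls = RandomForestClassifier if params['model'] == 'rf' else ExtraTreesClassifier
    clf = cls(n_estimators=int(round(n_iterations)) * params['trees_per_iteration'],
              min_samples_split=params['min_samples_split'])
    clf.fit(x_train, y_train)
    return {'loss': 1.0 - clf.score(x_test, y_test)}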

Loading data

Definition files in defs/ and defs_regression/ import data from load_data.py and load_data_regression.py, respectively.

Edit these files, or a definitions file directly, to make your data available for tuning.

Regression defs use the kin8nm dataset in data/kin8nm. There is no attached data for classification.

For the provided models, the data format follows scikit-learn conventions: the loaders expose x_train, y_train, x_test and y_test NumPy arrays.
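
A minimal load_data.py sketch, assuming a CSV with the label in the last column (the file path and the 80/20 split are placeholders; only the four array names are what the defs files expect):

import numpy as np
from sklearn.model_selection import train_test_split

# placeholder path - point this at your own dataset
data = np.loadtxt('data/my_data.csv', delimiter=',', skiprows=1)
x, y = data[:, :-1], data[:, -1]

# the defs files import these four module-level arrays
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0)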

Usage

Run main.py (with your own data) or main_regression.py. The essence of it is:

from hyperband import Hyperband
from defs.gb import get_params, try_params

# get_params samples a random configuration;
# try_params trains it with a given budget and reports the resulting loss
hb = Hyperband( get_params, try_params )
results = hb.run()
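
The two callables form a simple contract: get_params() returns a sampled configuration, and try_params(n_iterations, params) trains with that budget and returns a dict containing at least 'loss' (lower is better). Here is a hedged sketch of that interface for gradient boosting; it illustrates the shape of a defs file, not the repo's exact defs.gb code:

import random

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss

# the four arrays are assumed to come from load_data.py, as in the repo
from load_data import x_train, y_train, x_test, y_test

def get_params():
    # sample a random configuration from the search space
    return {
        'learning_rate': 10 ** random.uniform(-3, 0),
        'max_depth': random.randint(2, 10),
        'subsample': random.uniform(0.5, 1.0),
    }

def try_params(n_iterations, params):
    # more Hyperband iterations translate to more boosting stages here
    clf = GradientBoostingClassifier(
        n_estimators=int(round(n_iterations)) * 10, **params)
    clf.fit(x_train, y_train)
    p = clf.predict_proba(x_test)
    return {'loss': log_loss(y_test, p)}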

Here's a sample output from a run (three configurations tested) using defs.xt:

3 | Tue Feb 28 15:39:54 2017 | best so far: 0.5777 (run 2)

n_estimators: 5
{'bootstrap': False,
'class_weight': 'balanced',
'criterion': 'entropy',
'max_depth': 5,
'max_features': 'sqrt',
'min_samples_leaf': 5,
'min_samples_split': 6}

# training | log loss: 62.21%, AUC: 75.25%, accuracy: 67.20%
# testing  | log loss: 62.64%, AUC: 74.81%, accuracy: 66.78%

7 seconds.

4 | Tue Feb 28 15:40:01 2017 | best so far: 0.5777 (run 2)

n_estimators: 5
{'bootstrap': False,
'class_weight': None,
'criterion': 'gini',
'max_depth': 5,
'max_features': 'sqrt',
'min_samples_leaf': 1,
'min_samples_split': 2}

# training | log loss: 53.39%, AUC: 75.69%, accuracy: 72.37%
# testing  | log loss: 53.96%, AUC: 75.29%, accuracy: 71.89%

7 seconds.

5 | Tue Feb 28 15:40:07 2017 | best so far: 0.5396 (run 4)

n_estimators: 5
{'bootstrap': True,
'class_weight': None,
'criterion': 'gini',
'max_depth': 3,
'max_features': None,
'min_samples_leaf': 7,
'min_samples_split': 8}

# training | log loss: 50.20%, AUC: 77.04%, accuracy: 75.39%
# testing  | log loss: 50.67%, AUC: 76.77%, accuracy: 75.12%

8 seconds.

Early stopping

Some models may use early stopping (as the Keras MLP example does). If a configuration stopped early, it doesn't make sense to run it with more iterations (duh). To indicate this, make try_params()

return { 'loss': loss, 'early_stop': True }

This way, Hyperband will know not to select that configuration for any further runs.
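
As an example, a Keras-based try_params can detect that the EarlyStopping callback fired and pass that on. This is a sketch under assumed names: the tiny model, the loss and the budget-to-epochs mapping are placeholders, not the repo's MLP defs:

from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping

from load_data import x_train, y_train, x_test, y_test

def try_params(n_iterations, params):
    model = Sequential([
        Dense(params['hidden_units'], activation='relu',
              input_dim=x_train.shape[1]),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(loss='binary_crossentropy', optimizer='adam')

    epochs = int(round(n_iterations))  # assume the budget maps to epochs
    early = EarlyStopping(monitor='val_loss', patience=5)
    history = model.fit(x_train, y_train,
                        validation_data=(x_test, y_test),
                        epochs=epochs, callbacks=[early], verbose=0)

    # if training stopped before using the full budget, tell Hyperband
    stopped_early = len(history.history['loss']) < epochs
    return {'loss': min(history.history['val_loss']),
            'early_stop': stopped_early}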

Moar

See http://fastml.com/tuning-hyperparams-fast-with-hyperband/ for a detailed description.

More Repositories

1. goodbooks-10k - Ten thousand books, six million ratings (Jupyter Notebook, 788 stars)
2. phraug - A set of simple Python scripts for pre-processing large files (Python, 271 stars)
3. phraug2 - A new version of phraug, which is a set of simple Python scripts for pre-processing large files (Python, 206 stars)
4. numer.ai - Validation and prediction code for numer.ai (Python, 150 stars)
5. kaggle-blackbox - Deep learning made easy (MATLAB, 115 stars)
6. classifying-text - Classifying text with bag-of-words (Python, 114 stars)
7. adversarial-validation - Creating a better validation set when test examples differ from training examples (Python, 100 stars)
8. evaluating-recommenders - Compute and plot NDCG for a recommender system (Python, 95 stars)
9. time-series-classification - Classifying time series using feature extraction (Python, 86 stars)
10. classifier-calibration - Reliability diagrams, Platt's scaling, isotonic regression (Python, 71 stars)
11. kaggle-advertised-salaries - Predicting job salaries from ads - a Kaggle competition (Python, 55 stars)
12. the-secret-of-the-big-guys - k-means + a linear model = good results (Python, 55 stars)
13. pointer-networks-experiments - Sorting numbers with pointer networks (Python, 55 stars)
14. kaggle-cats-and-dogs - Classifying images with OverFeat (Python, 46 stars)
15. kaggle-stackoverflow - Predicting closed questions on Stack Overflow (Python, 46 stars)
16. gaussrank - Preparing continuous features for neural networks with GaussRank (Python, 45 stars)
17. kaggle-happiness - Predicting happiness from demographics and poll answers (Python, 45 stars)
18. kaggle-cifar - Code for the CIFAR-10 competition at Kaggle, uses cuda-convnet (Python, 44 stars)
19. sofia-ml-mod - sofia-kmeans with sparse RBF cluster mapping (C++, 42 stars)
20. pylearn2-practice - Pylearn2 in practice (Python, 41 stars)
21. kaggle-burn-cpu - Code for the "Burn CPU, burn" competition at Kaggle; uses Extreme Learning Machines and hyperopt (Python, 33 stars)
22. kaggle-amazon - Amazon access control challenge (Python, 25 stars)
23. pybrain-practice - A regression example for PyBrain (Python, 25 stars)
24. wine-quality - Predicting wine quality (R, 25 stars)
25. dimensionality-reduction-for-sparse-binary-data - Convert a lot of zeros and ones to fewer real numbers (Python, 23 stars)
26. cubert - How to make those 3D data visualizations (JavaScript, 22 stars)
27. kaggle-gender - A Kaggle competition: discriminate gender based on handwriting (Python, 21 stars)
28. msda-denoising - Using a very fast denoising autoencoder (MATLAB, 17 stars)
29. kaggle-solar - Code for Solar Energy Prediction Contest at Kaggle (Python, 17 stars)
30. nonlinear-vowpal-wabbit - How to use automatic polynomial features and neural network mode in VW (Python, 17 stars)
31. metric-learning-for-regression - Applying metric learning to kin8nm (MATLAB, 16 stars)
32. kaggle-avito - Code for the Avito competition (Python, 16 stars)
33. kaggle-rossmann - Predicting sales with Pandas (Python, 15 stars)
34. spearmint - Tuning hyperparams automatically with spearmint (R, 15 stars)
35. kaggle-accelerometer - Code for Accelerometer Biometric Competition at Kaggle (Python, 15 stars)
36. large-scale-linear-learners - VW, Liblinear and StreamSVM compared on webspam (Python, 14 stars)
37. r-libsvm-format-read-write - R code for reading and writing files in libsvm format (R, 14 stars)
38. stardose - A recommender system for GitHub repositories (Python, 13 stars)
39. running-external-programs-from-python (Python, 11 stars)
40. feature-selection - Selecting features for classification with MRMR (R, 11 stars)
41. kaggle-merck - Merck challenge at Kaggle (Python, 10 stars)
42. kaggle-stumbleupon - Bag of words + sparsenn (Python, 10 stars)
43. project-rhubarb - Predicting mortality in England using air quality data (Python, 9 stars)
44. kaggle-bestbuy_big - Code for the Best Buy competition at Kaggle (Python, 8 stars)
45. kaggle-digits - Some code for the Digits competition at Kaggle, incl. pylearn2's maxout (MATLAB, 8 stars)
46. misc - misc (Jupyter Notebook, 7 stars)
47. kaggle-poker-hands - Code for the Poker Rule Induction competition (Python, 7 stars)
48. kaggle-bestbuy_small (Python, 6 stars)
49. AlpacaGPT - How to train your own ChatGPT, Alpaca style (Python, 3 stars)
50. kaggle-jobs - Some auxiliary code for Kaggle job recommendation challenge (Python, 2 stars)