Automatic architecture search and hyperparameter optimization for PyTorch

Auto-PyTorch

Copyright (C) 2021 AutoML Groups Freiburg and Hannover

While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring the best of these two worlds together, we developed Auto-PyTorch, which jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL).

Auto-PyTorch is mainly developed to support tabular data (classification, regression) and time series data (forecasting). The newest features in Auto-PyTorch for tabular data are described in the paper "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL" (see below for the BibTeX reference). Details about Auto-PyTorch for multi-horizon time series forecasting tasks can be found in the paper "Efficient Automated Deep Learning for Time Series Forecasting" (also see below for the BibTeX reference).

The documentation is also available online.

Starting with v0.1.0, Auto-PyTorch uses SMAC as the underlying optimization package and has a restructured codebase, which improves usability, robustness and efficiency. As a result, moving from v0.0.2 to v0.1.0 breaks compatibility. If you would like to use the old API, you can find it on the master_old branch.

Workflow

The following figure gives a rough overview of the Auto-PyTorch workflow.

[Figure: Auto-PyTorch workflow]

In the figure, Data is provided by the user, and Portfolio is a set of neural network configurations that work well on diverse datasets. The current version only supports the greedy portfolio described in the paper "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL". This portfolio is used to warm-start the optimization with SMAC; in other words, the portfolio configurations are evaluated on the provided data as the initial configurations. The API then runs the following procedures:

  1. Validate input data: Process each data type, e.g. encode categorical data, so that Auto-PyTorch can handle it.
  2. Create dataset: Create a dataset that the API can handle, using either cross-validation or holdout splits.
  3. Evaluate baselines:
    • Tabular dataset *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, as well as a dummy model from sklearn.dummy that represents the worst possible performance.
    • Time series forecasting dataset: Train a dummy predictor that repeats the last observed value of each series.
  4. Search by SMAC:
    a. Determine the budget and cut-off rules by Hyperband
    b. Sample a pipeline hyperparameter configuration *2 with SMAC
    c. Update the observations with the obtained results
    d. Repeat a. to c. until the budget runs out
  5. Select models from the observations and build the best ensemble for the provided dataset.

*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machines, that solve the regression or classification task on the provided dataset.

*2: A pipeline hyperparameter configuration specifies the choice of components in each step (e.g. the target algorithm or the shape of the neural network) and their corresponding hyperparameters.
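Step 4a relies on Hyperband to decide how many configurations are evaluated at which budget. The following is a minimal, self-contained sketch of the standard Hyperband bracket schedule (Li et al.), assuming a minimum and maximum budget (e.g. epochs) and a reduction factor eta. The function name and its parameters are illustrative and are not part of the Auto-PyTorch API; the schedule actually used through SMAC may differ in its details.

```python
from typing import List, Tuple


def hyperband_brackets(
    min_budget: float, max_budget: float, eta: int = 3
) -> List[List[Tuple[int, float]]]:
    """Return the Hyperband schedule as a list of brackets.

    Each bracket is a list of successive-halving rungs, and each rung is a
    (number of configurations, budget per configuration) pair.
    """
    if eta < 2 or not 0 < min_budget <= max_budget:
        raise ValueError("need eta >= 2 and 0 < min_budget <= max_budget")

    # s_max = floor(log_eta(max_budget / min_budget)), computed with a loop
    # instead of math.log to avoid floating-point rounding at exact powers.
    s_max = 0
    while min_budget * eta ** (s_max + 1) <= max_budget:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # n = ceil((s_max + 1) * eta^s / (s + 1)) configurations start this bracket
        n = ((s_max + 1) * eta ** s + s) // (s + 1)
        rungs = []
        for i in range(s + 1):
            n_i = n // eta ** i                # keep the best 1/eta after each rung
            r_i = max_budget / eta ** (s - i)  # and give the survivors eta times the budget
            rungs.append((n_i, r_i))
        brackets.append(rungs)
    return brackets


for bracket in hyperband_brackets(min_budget=1, max_budget=27, eta=3):
    print(bracket)
```

For min_budget=1, max_budget=27 and eta=3 this yields four brackets: the most aggressive one starts 27 configurations at budget 1 and keeps a third of them at each rung, while the most conservative one trains 4 configurations at the full budget.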

Installation

PyPI Installation

pip install autoPyTorch

Auto-PyTorch for Time Series Forecasting requires additional dependencies:

pip install autoPyTorch[forecasting]

Manual Installation

For development, we recommend using Anaconda as follows:

# Following commands assume the user is in a cloned directory of Auto-Pytorch

# We also need to initialize the automl_common repository as follows
# You can find more information about this here:
# https://github.com/automl/automl_common/
git submodule update --init --recursive

# Create the environment
conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
python setup.py install

Similarly, to install all the dependencies for Auto-PyTorch-TimeSeriesForecasting:

git submodule update --init --recursive

conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
pip install -e ".[forecasting]"

Examples

In a nutshell:

from autoPyTorch.api.tabular_classification import TabularClassificationTask

# data and metric imports
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

# initialise Auto-PyTorch api
api = TabularClassificationTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    optimize_metric='accuracy',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50
)

# Calculate test accuracy
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print("Accuracy score", score)
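Step 2 of the workflow creates validation splits (holdout or cross-validation) from the training data passed to search, so that sampled configurations can be compared on validation data. The sketch below only illustrates the two strategies with plain index lists; it is not Auto-PyTorch's splitting code, and the function names holdout_split and k_fold_splits are hypothetical.

```python
import random
from typing import List, Tuple


def holdout_split(n_samples: int, val_fraction: float = 0.33, seed: int = 1) -> Tuple[List[int], List[int]]:
    """Shuffle the sample indices once and hold out a fraction for validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]  # (train indices, validation indices)


def k_fold_splits(n_samples: int, k: int = 5, seed: int = 1) -> List[Tuple[List[int], List[int]]]:
    """Shuffle the indices, deal them into k folds, and use each fold once for validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, val_fold in enumerate(folds):
        # every other fold contributes to the training part of this split
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val_fold))
    return splits


train_idx, val_idx = holdout_split(10, val_fraction=0.3)
print(len(train_idx), len(val_idx))  # 7 3
```

Holdout evaluates each configuration once and is cheaper; k-fold cross-validation trains k models per configuration but gives a less noisy validation estimate.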

For Time Series Forecasting Tasks

from autoPyTorch.api.time_series_forecasting import TimeSeriesForecastingTask

# data and metric imports
from sktime.datasets import load_longley
targets, features = load_longley()

# define the forecasting horizon
forecasting_horizon = 3

# The dataset optimized by APT-TS can be either a list of np.ndarray/pd.DataFrame, where each element of the list is
# one series, or a single pd.DataFrame that records the series index information, i.e. to which series each time step
# belongs. This id can be stored as the DataFrame's index or as a separate column.
# Within each series, the last forecasting_horizon values are used as test targets and the values before them as
# training targets. Normally, the values to be forecasted should directly follow the training values.
y_train = [targets[: -forecasting_horizon]]
y_test = [targets[-forecasting_horizon:]]

# Same for the features. For univariate models, X_train and X_test can be omitted and set to None
X_train = [features[: -forecasting_horizon]]
# Here X_test holds the 'known future features', i.e. features whose future values are known in advance. Features
# that are unknown can be replaced with NaN or zeros (they will not be used by our networks). If no feature is known
# beforehand, X_test can also be omitted
known_future_features = list(features.columns)
X_test = [features[-forecasting_horizon:]]

start_times = [targets.index.to_timestamp()[0]]
freq = '1Y'

# initialise Auto-PyTorch api
api = TimeSeriesForecastingTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    optimize_metric='mean_MAPE_forecasting',
    n_prediction_steps=forecasting_horizon,
    memory_limit=16 * 1024,  # currently, forecasting models use much more memory
    freq=freq,
    start_times=start_times,
    func_eval_time_limit_secs=50,
    total_walltime_limit=60,
    min_num_test_instances=1000,  # proxy validation sets; only takes effect for tasks with more than 1000 series
    known_future_features=known_future_features,
)

# the dataset can directly generate the test sequences
test_sets = api.dataset.generate_test_seqs()

# Calculate the test score
y_pred = api.predict(test_sets)
score = api.score(y_pred, y_test)
print("Forecasting score", score)
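The comments in the example above describe how each series is split: the last forecasting_horizon values become test targets and everything before them training targets. The pure-Python sketch below mirrors that split on plain lists and adds the repeat-last-value dummy predictor from the workflow's baseline step, scored with a simple mean absolute percentage error averaged over series. The function names are hypothetical, and this MAPE is not guaranteed to match Auto-PyTorch's mean_MAPE_forecasting exactly.

```python
from typing import List, Sequence, Tuple


def split_series(series: Sequence[Sequence[float]], horizon: int) -> Tuple[List[List[float]], List[List[float]]]:
    """Use the last `horizon` values of every series as test targets."""
    y_train = [list(s[:-horizon]) for s in series]
    y_test = [list(s[-horizon:]) for s in series]
    return y_train, y_test


def last_value_forecast(y_train: Sequence[Sequence[float]], horizon: int) -> List[List[float]]:
    """Dummy baseline: repeat the last observed value of each series."""
    return [[s[-1]] * horizon for s in y_train]


def mean_mape(y_true: Sequence[Sequence[float]], y_pred: Sequence[Sequence[float]]) -> float:
    """MAPE of each series, averaged over all series (targets must be non-zero)."""
    per_series = [
        sum(abs(t - p) / abs(t) for t, p in zip(ts, ps)) / len(ts)
        for ts, ps in zip(y_true, y_pred)
    ]
    return sum(per_series) / len(per_series)


series = [[1, 2, 3, 4, 5], [10, 20, 30, 40]]
y_train, y_test = split_series(series, horizon=2)
y_pred = last_value_forecast(y_train, horizon=2)
print(y_pred)  # [[3, 3], [20, 20]]
print(mean_mape(y_test, y_pred))
```

A searched model is only worth keeping if it beats this kind of naive baseline on the same split.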

For more examples, including customising the search space and parallelising the code, check out the examples folder:

$ cd examples/

Code for the Auto-PyTorch Tabular paper is available under examples/ensemble in the TPAMI.2021.3067763 branch.

Contributing

If you want to contribute to Auto-PyTorch, clone the repository and check out our current development branch:

$ git checkout development

License

This program is free software: you can redistribute it and/or modify it under the terms of the Apache license 2.0 (please see the LICENSE file).

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

You should have received a copy of the Apache license 2.0 along with this program (see LICENSE file).

Reference

Please refer to the branch TPAMI.2021.3067763 to reproduce the results of the paper "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL".

@article{zimmer-tpami21a,
  author    = {Lucas Zimmer and Marius Lindauer and Frank Hutter},
  title     = {Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL},
  journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year      = {2021},
  note      = {also available under https://arxiv.org/abs/2006.13799},
  pages     = {3079--3090}
}
@incollection{mendoza-automlbook18a,
  author    = {Hector Mendoza and Aaron Klein and Matthias Feurer and Jost Tobias Springenberg and Matthias Urban and Michael Burkart and Max Dippel and Marius Lindauer and Frank Hutter},
  title     = {Towards Automatically-Tuned Deep Neural Networks},
  year      = {2018},
  month     = dec,
  editor    = {Hutter, Frank and Kotthoff, Lars and Vanschoren, Joaquin},
  booktitle = {AutoML: Methods, Systems, Challenges},
  publisher = {Springer},
  chapter   = {7},
  pages     = {141--156}
}
@inproceedings{deng-ecml22,
  author    = {Difan Deng and Florian Karl and Frank Hutter and Bernd Bischl and Marius Lindauer},
  title     = {Efficient Automated Deep Learning for Time Series Forecasting},
  year      = {2022},
  booktitle = {Machine Learning and Knowledge Discovery in Databases. Research Track
               - European Conference, {ECML} {PKDD} 2022},
  url       = {https://doi.org/10.48550/arXiv.2205.05511}
}

Contact

Auto-PyTorch is developed by the AutoML Groups of the University of Freiburg and Hannover.
