Automatic architecture search and hyperparameter optimization for PyTorch

Auto-PyTorch

Copyright (C) 2021 AutoML Groups Freiburg and Hannover

While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring the best of these two worlds together, we developed Auto-PyTorch, which jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL).

Auto-PyTorch is mainly developed to support tabular data (classification, regression) and time series data (forecasting). The newest features in Auto-PyTorch for tabular data are described in the paper "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL" (see below for the BibTeX reference). Details about Auto-PyTorch for multi-horizon time series forecasting tasks can be found in the paper "Efficient Automated Deep Learning for Time Series Forecasting" (also see below for the BibTeX reference).

Also, find the documentation here.

Starting with v0.1.0, Auto-PyTorch has been updated to further improve usability, robustness and efficiency: it now uses SMAC as the underlying optimization package, and the code structure has changed. As a result, moving from v0.0.2 to v0.1.0 breaks compatibility. If you would like to use the old API, you can find it at master_old.

Workflow

The following figure gives a rough overview of the Auto-PyTorch workflow.

AutoPyTorch Workflow

In the figure, Data is provided by the user, and Portfolio is a set of neural network configurations that work well on diverse datasets. The current version only supports the greedy portfolio described in the paper "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL". This portfolio is used to warm-start the SMAC optimization: we first evaluate the portfolio configurations on the provided data as initial configurations. The API then runs the following procedure:

  1. Validate input data: Process each data type, e.g. encode categorical data, so that Auto-PyTorch can handle it.
  2. Create dataset: Create a dataset that this API can handle, using either cross-validation or holdout splits.
  3. Evaluate baselines
    • Tabular dataset *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, as well as a dummy model from sklearn.dummy that represents the worst possible performance.
    • Time Series Forecasting dataset: Train a dummy predictor that repeats the last observed value of each series.
  4. Search by SMAC:
    a. Determine the budget and cut-off rules by Hyperband
    b. Sample a pipeline hyperparameter configuration *2 by SMAC
    c. Update the observations with the obtained results
    d. Repeat a. to c. until the budget runs out
  5. Build the best ensemble for the provided dataset from the observations, selecting which models enter the ensemble.

*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machines, used to solve either the regression or the classification task on the provided dataset.

*2: A pipeline hyperparameter configuration specifies the choice of components in each step, e.g. the target algorithm or the shape of the neural network, and their corresponding hyperparameters.
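To make steps 4a to 4d (and the portfolio warm start) more concrete, here is a minimal, self-contained sketch of a Hyperband-driven search loop. It is an illustration under stated assumptions, not Auto-PyTorch's code: in Auto-PyTorch this loop is delegated to SMAC, whereas here uniform random sampling stands in for SMAC's model-based proposals. The function names (`hyperband_brackets`, `search`), the toy objective, and the choice to evaluate portfolio configurations once at the maximum budget are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def hyperband_brackets(min_budget: int, max_budget: int, eta: int = 3) -> List[List[Tuple[int, int]]]:
    """For each Hyperband bracket, list the (num_configs, budget) pairs of its successive-halving rounds."""
    # s_max = floor(log_eta(max_budget / min_budget)), computed with integers to avoid float rounding
    s_max = 0
    while min_budget * eta ** (s_max + 1) <= max_budget:
        s_max += 1
    brackets = []
    for s in range(s_max, -1, -1):
        # ceil((s_max + 1) / (s + 1) * eta**s) configurations enter this bracket
        n = ((s_max + 1) * eta ** s + s) // (s + 1)
        brackets.append([(n // eta ** i, max_budget // eta ** (s - i)) for i in range(s + 1)])
    return brackets


def search(objective: Callable[[Config, int], float],
           sample_config: Callable[[random.Random], Config],
           portfolio: List[Config],
           min_budget: int, max_budget: int, eta: int = 3, seed: int = 0):
    """Hypothetical search loop: portfolio warm start, then Hyperband with random sampling."""
    rng = random.Random(seed)
    observations: List[Tuple[Config, int, float]] = []
    # Warm start (assumption): evaluate each portfolio configuration at the maximum budget
    for cfg in portfolio:
        observations.append((cfg, max_budget, objective(cfg, max_budget)))
    for rounds in hyperband_brackets(min_budget, max_budget, eta):
        # Stand-in for SMAC: sample the bracket's configurations uniformly at random
        configs = [sample_config(rng) for _ in range(rounds[0][0])]
        for i, (_, budget) in enumerate(rounds):
            losses = [objective(cfg, budget) for cfg in configs]
            # update the observations with the obtained results
            observations.extend((cfg, budget, loss) for cfg, loss in zip(configs, losses))
            if i + 1 < len(rounds):
                # successive halving: keep only the best configurations for the next, larger budget
                n_keep = rounds[i + 1][0]
                order = sorted(range(len(configs)), key=lambda j: losses[j])
                configs = [configs[j] for j in order[:n_keep]]
    best = min(observations, key=lambda obs: obs[2])
    return best, observations


def toy_objective(cfg: Config, budget: int) -> float:
    # hypothetical loss: minimal at lr = 1e-2, and smaller budgets give worse results
    return (math.log10(cfg["lr"]) + 2) ** 2 + 1.0 / budget


def sample_lr(rng: random.Random) -> Config:
    # log-uniform learning rate in [1e-5, 1]
    return {"lr": 10 ** rng.uniform(-5, 0)}


best, obs = search(toy_objective, sample_lr, portfolio=[{"lr": 1e-3}], min_budget=1, max_budget=27)
print(best)
```

With `min_budget=1`, `max_budget=27` and `eta=3` there are four brackets; the most aggressive one starts 27 configurations at budget 1 and promotes one configuration to budget 27. The budget here could be, for example, a number of epochs.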

Installation

PyPI Installation

pip install autoPyTorch

Auto-PyTorch for Time Series Forecasting requires additional dependencies:

pip install autoPyTorch[forecasting]

Manual Installation

For development, we recommend using Anaconda as follows:

# Following commands assume the user is in a cloned directory of Auto-Pytorch

# We also need to initialize the automl_common repository as follows
# You can find more information about this here:
# https://github.com/automl/automl_common/
git submodule update --init --recursive

# Create the environment
conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
python setup.py install

Similarly, to install all the dependencies for Auto-PyTorch-TimeSeriesForecasting:

git submodule update --init --recursive

conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
pip install -e ".[forecasting]"

Examples

In a nutshell:

from autoPyTorch.api.tabular_classification import TabularClassificationTask

# data and metric imports
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

# initialise Auto-PyTorch api
api = TabularClassificationTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    optimize_metric='accuracy',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50
)

# Calculate test accuracy
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print("Accuracy score", score)
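The `search` call above looks for an ensemble rather than a single model (step 5 of the workflow). One common way to build such an ensemble is greedy ensemble selection with replacement (Caruana et al., 2004): repeatedly add the model whose inclusion most improves the validation score of the averaged predictions. The plain-Python sketch below assumes the validation-set class probabilities of every evaluated model are available; the function names and data are hypothetical, and whether this matches Auto-PyTorch's internal ensemble builder in every detail is an assumption.

```python
from typing import List

Probs = List[List[float]]  # one row of class probabilities per validation sample


def average_probs(model_probs: List[Probs], members: List[int]) -> Probs:
    """Average the predicted class probabilities of the selected ensemble members."""
    n_samples, n_classes = len(model_probs[0]), len(model_probs[0][0])
    avg = [[0.0] * n_classes for _ in range(n_samples)]
    for m in members:
        for i in range(n_samples):
            for c in range(n_classes):
                avg[i][c] += model_probs[m][i][c] / len(members)
    return avg


def accuracy(probs: Probs, y: List[int]) -> float:
    """Fraction of samples whose arg-max class equals the true label."""
    hits = sum(1 for row, label in zip(probs, y) if row.index(max(row)) == label)
    return hits / len(y)


def ensemble_selection(model_probs: List[Probs], y: List[int], ensemble_size: int) -> List[float]:
    """Greedy ensemble selection with replacement; returns one weight per model."""
    members: List[int] = []
    for _ in range(ensemble_size):
        # add the model (possibly again) that maximizes validation accuracy of the averaged ensemble
        best_model, best_score = 0, -1.0
        for m in range(len(model_probs)):
            score = accuracy(average_probs(model_probs, members + [m]), y)
            if score > best_score:
                best_model, best_score = m, score
        members.append(best_model)
    # a model's weight is the fraction of times it was picked
    return [members.count(m) / len(members) for m in range(len(model_probs))]


# Hypothetical validation predictions of three models on three samples (two classes)
val_probs = [
    [[0.9, 0.1], [0.4, 0.6], [0.6, 0.4]],  # model 0: right on samples 0 and 1
    [[0.4, 0.6], [0.3, 0.7], [0.2, 0.8]],  # model 1: right on samples 1 and 2
    [[0.2, 0.8], [0.8, 0.2], [0.9, 0.1]],  # model 2: always wrong
]
y_val = [0, 1, 1]
weights = ensemble_selection(val_probs, y_val, ensemble_size=2)
print(weights)  # [0.5, 0.5, 0.0]
```

Neither model 0 nor model 1 is perfect on its own, but their average classifies all three validation samples correctly, which is why the greedy procedure picks both. Selecting with replacement lets strong models receive larger weights.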

For Time Series Forecasting Tasks

from autoPyTorch.api.time_series_forecasting import TimeSeriesForecastingTask

# data and metric imports
from sktime.datasets import load_longley
targets, features = load_longley()

# define the forecasting horizon
forecasting_horizon = 3

# The dataset passed to APT-TS can be a list of np.ndarray / pd.DataFrame, where each element of the list
# is one series, or a single pd.DataFrame that records, for every time step, the series it belongs to.
# This series id can be stored as the DataFrame's index or as a separate column.
# Within each series, we take the last forecasting_horizon steps as test targets and the steps before
# that as training targets. Normally, the values to be forecast directly follow the training set.
y_train = [targets[: -forecasting_horizon]]
y_test = [targets[-forecasting_horizon:]]

# Same for features. For univariate models, X_train and X_test can be omitted (set to None)
X_train = [features[: -forecasting_horizon]]
# Here X_test holds the 'known future features', i.e. features whose future values are known in advance.
# Features that are unknown can be replaced with NaN or zeros (they will not be used by our networks).
# If no feature is known beforehand, X_test can also be omitted
known_future_features = list(features.columns)
X_test = [features[-forecasting_horizon:]]

start_times = [targets.index.to_timestamp()[0]]
freq = '1Y'

# initialise Auto-PyTorch api
api = TimeSeriesForecastingTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test, 
    optimize_metric='mean_MAPE_forecasting',
    n_prediction_steps=forecasting_horizon,
    memory_limit=16 * 1024,  # Currently, forecasting models use much more memory
    freq=freq,
    start_times=start_times,
    func_eval_time_limit_secs=50,
    total_walltime_limit=60,
    min_num_test_instances=1000,  # proxy validation sets. This only works for tasks with more than 1000 series
    known_future_features=known_future_features,
)

# The dataset can directly generate the test sequences used for prediction
test_sets = api.dataset.generate_test_seqs()

# Calculate the test score
y_pred = api.predict(test_sets)
score = api.score(y_pred, y_test)
print("Forecasting score", score)
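The example above holds out the last `forecasting_horizon` steps of each series as test targets and optimizes `mean_MAPE_forecasting`. The plain-Python sketch below illustrates the same split together with the last-value dummy baseline from step 3 of the workflow. It assumes MAPE is the mean of |y - ŷ| / |y| over the horizon, averaged across series; the helper names are hypothetical, and Auto-PyTorch's exact metric implementation (e.g. percentage scaling or handling of zero targets) may differ.

```python
from typing import List, Sequence, Tuple


def split_series(series: Sequence[float], horizon: int) -> Tuple[List[float], List[float]]:
    """Use the last `horizon` values as test targets and everything before as training targets."""
    return list(series[:-horizon]), list(series[-horizon:])


def last_value_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Dummy baseline: repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (assumes no zero targets)."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_mape_of_baseline(all_series: List[Sequence[float]], horizon: int) -> float:
    """Average the per-series MAPE of the last-value baseline over all series."""
    scores = []
    for series in all_series:
        train, test = split_series(series, horizon)
        scores.append(mape(test, last_value_forecast(train, horizon)))
    return sum(scores) / len(scores)


# Two hypothetical series and a forecasting horizon of 2
series_list = [[10.0, 20.0, 30.0, 40.0, 50.0], [5.0, 5.0, 5.0, 5.0]]
print(mean_mape_of_baseline(series_list, horizon=2))
```

For the first series the baseline predicts 30 for both held-out steps, giving errors of 10/40 and 20/50; the constant second series is predicted perfectly. A searched model should beat this baseline to be worth keeping.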

For more examples, including customising the search space and parallelising the code, check out the examples folder:

$ cd examples/

Code for the paper is available under examples/ensemble in the TPAMI.2021.3067763 branch.

Contributing

If you want to contribute to Auto-PyTorch, clone the repository and check out our current development branch:

$ git checkout development

License

This program is free software: you can redistribute it and/or modify it under the terms of the Apache License 2.0 (please see the LICENSE file).

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

You should have received a copy of the Apache License 2.0 along with this program (see LICENSE file).

Reference

Please refer to the branch TPAMI.2021.3067763 to reproduce the paper Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL.

@article{zimmer-tpami21a,
  author    = {Lucas Zimmer and Marius Lindauer and Frank Hutter},
  title     = {Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL},
  journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year      = {2021},
  note      = {also available under https://arxiv.org/abs/2006.13799},
  pages     = {3079--3090}
}
@incollection{mendoza-automlbook18a,
  author    = {Hector Mendoza and Aaron Klein and Matthias Feurer and Jost Tobias Springenberg and Matthias Urban and Michael Burkart and Max Dippel and Marius Lindauer and Frank Hutter},
  title     = {Towards Automatically-Tuned Deep Neural Networks},
  year      = {2018},
  month     = dec,
  editor    = {Hutter, Frank and Kotthoff, Lars and Vanschoren, Joaquin},
  booktitle = {AutoML: Methods, Systems, Challenges},
  publisher = {Springer},
  chapter   = {7},
  pages     = {141--156}
}
@inproceedings{deng-ecml22,
  author    = {Difan Deng and Florian Karl and Frank Hutter and Bernd Bischl and Marius Lindauer},
  title     = {Efficient Automated Deep Learning for Time Series Forecasting},
  year      = {2022},
  booktitle = {Machine Learning and Knowledge Discovery in Databases. Research Track
               - European Conference, {ECML} {PKDD} 2022},
  url       = {https://doi.org/10.48550/arXiv.2205.05511},
}

Contact

Auto-PyTorch is developed by the AutoML Groups of the University of Freiburg and Hannover.
