Survey on End-To-End Machine Learning Automation
In this repository, we present the references mentioned in a comprehensive survey of state-of-the-art efforts to automate machine learning (AutoML), whether by fully automating the role of the data scientist or by providing aiding tools that minimize the role of the human in the loop. First, we focus on the Combined Algorithm Selection and Hyperparameter Tuning (CASH) problem. In addition, we highlight research on automating the other steps of the full, complex machine learning pipeline, from data understanding to model deployment. Furthermore, we provide comprehensive coverage of the various tools and frameworks that have been introduced in this domain.
Table of Contents & Organization:
This repository is organized into six sections:
- Meta-Learning Techniques for AutoML search problem
- Neural Architecture Search Problem
- Hyper-Parameter Optimization
- AutoML Tools and Frameworks
- Pre-Modeling and Post-Modeling Aiding Tools
- AutoML Competitions
Meta-Learning Techniques for AutoML search problem:
Meta-learning can be described as the process of learning from the experience gained by applying various learning algorithms to different kinds of data, thereby reducing the time needed to learn new tasks.
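As a rough illustration of learning from task properties, the sketch below computes a few simple meta-features for a new dataset, finds the most similar previously seen dataset, and reuses that dataset's best-known configuration as a warm start. The three meta-features, the Euclidean nearest-neighbour rule, and the `history` records are hypothetical choices made for this example; real systems typically rely on many more meta-features and stored performance records.

```python
import math
from collections import Counter


def meta_features(X, y):
    """Simple meta-features: log #rows, log #columns, class entropy (bits).

    Assumes X is a non-empty list of equal-length rows and y holds one label per row.
    """
    n_rows = len(X)
    n_cols = len(X[0])
    counts = Counter(y)
    entropy = -sum((c / n_rows) * math.log2(c / n_rows) for c in counts.values())
    return (math.log(n_rows), math.log(n_cols), entropy)


def recommend(new_mf, history):
    """Return the best-known configuration of the nearest past dataset (Euclidean distance)."""
    nearest = min(history, key=lambda rec: math.dist(new_mf, rec["meta_features"]))
    return nearest["best_config"]


# Hypothetical knowledge base of past experiments (meta-features -> best configuration).
history = [
    {"meta_features": (math.log(100), math.log(4), 1.0),
     "best_config": {"algorithm": "knn", "n_neighbors": 5}},
    {"meta_features": (math.log(100_000), math.log(50), 1.0),
     "best_config": {"algorithm": "random_forest", "n_estimators": 500}},
]

# A new, small, balanced binary dataset with 4 columns.
X_new = [[0.1, 0.2, 0.3, 0.4] for _ in range(120)]
y_new = [0, 1] * 60
print(recommend(meta_features(X_new, y_new), history))  # → {'algorithm': 'knn', 'n_neighbors': 5}
```

The recommended configuration is only a starting point: a meta-learning-based tool would then refine it with a hyper-parameter search on the new dataset.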
- 2018 | Meta-Learning: A Survey. | Vanschoren | CoRR |
PDF
- 2008 | Metalearning: Applications to data mining | Brazdil et al. | Springer Science & Business Media |
PDF
Learning From Model Evaluation
Surrogate Models
- 2018 | Scalable Gaussian process-based transfer surrogates for hyperparameter optimization. | Wistuba et al. | Journal of ML |
PDF
Warm-Started Multi-task Learning
- 2017 | Multiple adaptive Bayesian linear regression for scalable Bayesian optimization with warm start. | Perrone et al. |
PDF
Relative Landmarks
- 2001 | An evaluation of landmarking variants. | Furnkranz and Petrak | ECML/PKDD |
PDF
Learning From Task Properties
Using Meta-Features
- 2019 | SmartML: A Meta Learning-Based Framework for Automated Selection and Hyperparameter Tuning for Machine Learning Algorithms. | Maher and Sakr | EDBT |
PDF
- 2017 | On the predictive power of meta-features in OpenML. | Bilalli et al. | IJAMC |
PDF
- 2013 | Collaborative hyperparameter tuning. | Bardenet et al. | ICML |
PDF
Using Meta-Models
- 2018 | Predicting hyperparameters from meta-features in binary classification problems. | Nisioti et al. | ICML |
PDF
- 2014 | Automatic classifier selection for non-experts. | Reif et al. | Pattern Analysis and Applications |
PDF
- 2012 | Imagenet classification with deep convolutional neural networks. | Krizhevsky et al. | NIPS |
PDF
- 2008 | Predicting the performance of learning algorithms using support vector machines as meta-regressors. | Guerra et al. | ICANN |
PDF
- 2008 | Metalearning-a tutorial. | Giraud-Carrier | ICMLA |
PDF
- 2004 | Metalearning: Applications to data mining. | Soares et al. | Springer Science & Business Media |
PDF
- 2004 | Selection of time series forecasting models based on performance information. | dos Santos et al. | HIS |
PDF
- 2003 | Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results. | Brazdil et al. | Journal of ML |
PDF
- 2002 | Combination of task description strategies and case base properties for meta-learning. | Kopf and Iglezakis |
PDF
Learning From Prior Models
Transfer Learning
- 2014 | How transferable are features in deep neural networks? | Yosinski et al. | NIPS |
PDF
- 2014 | CNN features off-the-shelf: an astounding baseline for recognition. | Sharif Razavian et al. | IEEE CVPR |
PDF
- 2014 | DeCAF: A deep convolutional activation feature for generic visual recognition. | Donahue et al. | ICML |
PDF
- 2012 | Imagenet classification with deep convolutional neural networks. | Krizhevsky et al. | NIPS |
PDF
- 2012 | Deep learning of representations for unsupervised and transfer learning. | Bengio | ICML |
PDF
- 2010 | A survey on transfer learning. | Pan and Yang | IEEE TKDE |
PDF
- 1995 | Learning many related tasks at the same time with backpropagation. | Caruana | NIPS |
PDF
- 1995 | Learning internal representations. | Baxter |
PDF
Few-Shot Learning
Neural Architecture Search Problem
Neural Architecture Search (NAS) is a fundamental step in automating the machine learning process and has been successfully used to design the model architecture for image and language tasks.
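Random search over a discrete architecture space is a common NAS baseline. The sketch below samples small multilayer architectures and keeps the best-scoring one. The search space and `proxy_score` are hypothetical: in practice, each candidate would be trained and evaluated on validation data, and that training step is what makes NAS expensive.

```python
import math
import random

# Hypothetical search space: depth, width of each layer, and activation.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "units": [32, 64, 128, 256],
    "activation": ["relu", "tanh"],
}


def sample_architecture(rng):
    """Draw one architecture as a list of (units, activation) layer specs."""
    depth = rng.choice(SEARCH_SPACE["num_layers"])
    return [(rng.choice(SEARCH_SPACE["units"]), rng.choice(SEARCH_SPACE["activation"]))
            for _ in range(depth)]


def proxy_score(arch):
    """Toy stand-in for 'train, then measure validation accuracy' (higher is better).

    It simply prefers 3 layers of 128 ReLU units; a real NAS loop would train each model.
    """
    depth_penalty = abs(len(arch) - 3)
    width_penalty = sum(abs(math.log2(units) - 7) for units, _ in arch) / len(arch)
    relu_bonus = 0.1 * sum(act == "relu" for _, act in arch) / len(arch)
    return -depth_penalty - width_penalty + relu_bonus


def random_search_nas(n_trials, seed=0):
    """Evaluate n_trials random architectures and return the best (arch, score)."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = proxy_score(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


arch, score = random_search_nas(n_trials=200)
print(arch, round(score, 3))
```

The reinforcement learning, evolutionary, gradient-based, and Bayesian optimization approaches listed in this section replace the uniform sampler with a strategy that uses earlier evaluations to propose better candidates.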
- 2018 | Progressive neural architecture search. | Liu et al. | ECCV |
PDF
- 2018 | Efficient architecture search by network transformation. | Cai et al. | AAAI |
PDF
- 2018 | Learning transferable architectures for scalable image recognition. | Zoph et al. | IEEE CVPR |
PDF
- 2017 | Hierarchical representations for efficient architecture search. | Liu et al. |
PDF
- 2016 | Neural architecture search with reinforcement learning. | Zoph and Le |
PDF
- 2009 | Learning deep architectures for AI. | Bengio et al. |
PDF
Random Search
Reinforcement Learning
Evolutionary Methods
- 2019 | Evolutionary Neural AutoML for Deep Learning. | Liang et al. |
PDF
- 2019 | Evolving deep neural networks. | Miikkulainen et al. |
PDF
- 2018 | A multi-objective genetic algorithm for neural architecture search. | Lu et al. |
PDF
- 2018 | Efficient multi-objective neural architecture search via Lamarckian evolution. | Elsken et al. |
PDF
- 2018 | Regularized evolution for image classifier architecture search. | Real et al. |
PDF
- 2017 | Large-scale evolution of image classifiers | Real et al. | ICML |
PDF
- 2017 | Hierarchical representations for efficient architecture search. | Liu et al. |
PDF
- 2009 | A hypercube-based encoding for evolving large-scale neural networks. | Stanley et al. | Artificial Life |
PDF
- 2002 | Evolving neural networks through augmenting topologies. | Stanley and Miikkulainen | Evolutionary Computation |
PDF
Gradient Based Methods
Bayesian Optimization
- 2018 | Towards reproducible neural architecture and hyperparameter search. | Klein et al. |
PDF
- 2018 | Neural Architecture Search with Bayesian Optimisation and Optimal Transport | Kandasamy et al. | NIPS |
PDF
- 2016 | Towards automatically-tuned neural networks. | Mendoza et al. | PMLR |
PDF
- 2015 | Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. | Domhan et al. | IJCAI |
PDF
- 2014 | Raiders of the lost architecture: Kernels for Bayesian optimization in conditional parameter spaces. | Swersky et al. |
PDF
- 2013 | Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. | Bergstra et al. |
PDF
Github (Hyperopt)
- 2011 | Algorithms for hyper-parameter optimization. | Bergstra et al. | NIPS |
PDF
Hyper-Parameter Optimization
After choosing the model pipeline algorithm(s) with the highest potential for achieving top performance on the input dataset, the next step is to tune the hyper-parameters of the chosen model to further optimize its performance. It is worth mentioning that some tools have discretized the space of learning algorithms into a finite set of model pipelines. In that case, model selection itself can be treated as a categorical hyper-parameter that is tuned first, before the hyper-parameters of the selected algorithm are adjusted.
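Treating model selection as a categorical parameter can be sketched as a hierarchical (conditional) search space: the algorithm is sampled first, and only that algorithm's own hyper-parameters are sampled afterwards. The sketch below runs plain random search over such a space. The three algorithms, their ranges, and `toy_loss` are hypothetical placeholders; a real tool would compute a cross-validated error instead of `toy_loss` and might use, for example, Bayesian optimization instead of random sampling.

```python
import math
import random

# Hypothetical CASH space: algorithm -> {hyper-parameter: (low, high, scale)}.
SPACE = {
    "svm": {"C": (1e-3, 1e3, "log"), "gamma": (1e-4, 1e1, "log")},
    "random_forest": {"n_estimators": (10, 500, "int"), "max_depth": (2, 20, "int")},
    "knn": {"n_neighbors": (1, 50, "int")},
}


def sample_config(rng):
    """Pick the algorithm first, then sample only its conditional hyper-parameters."""
    algorithm = rng.choice(sorted(SPACE))
    config = {"algorithm": algorithm}
    for name, (low, high, scale) in SPACE[algorithm].items():
        if scale == "log":
            config[name] = 10 ** rng.uniform(math.log10(low), math.log10(high))
        else:
            config[name] = rng.randint(low, high)
    return config


def toy_loss(config):
    """Hypothetical validation loss; a real tool would train and cross-validate the model."""
    if config["algorithm"] == "svm":
        return 0.15 + abs(math.log10(config["C"]) - 1) / 10
    if config["algorithm"] == "random_forest":
        return 0.10 + abs(config["max_depth"] - 10) / 100
    return 0.20 + abs(config["n_neighbors"] - 7) / 50


def cash_random_search(loss_fn, n_trials, seed=0):
    """Jointly search over algorithms and their hyper-parameters; return (config, loss)."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        loss = loss_fn(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


best, loss = cash_random_search(toy_loss, n_trials=100)
print(best, round(loss, 3))
```

Sampling the hyper-parameters conditionally keeps inactive parameters (for example, `gamma` when `knn` is chosen) out of each configuration.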
Black Box Optimization
Grid and Random Search
Bayesian Optimization
- 2018 | BOHB: Robust and efficient hyperparameter optimization at scale. | Falkner et al. | ICML |
PDF
- 2017 | On the state of the art of evaluation in neural language models. | Melis et al. |
PDF
- 2015 | Automating model search for large scale machine learning. | Sparks et al. | ACM-SCC |
PDF
- 2015 | Scalable bayesian optimization using deep neural networks. | Snoek et al. | ICML |
PDF
- 2014 | Bayesopt: A bayesian optimization library for nonlinear optimization, experimental design and bandits. | Martinez-Cantin | JMLR |
PDF
- 2013 | Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. | Bergstra et al. |
PDF
- 2013 | Towards an empirical foundation for assessing bayesian optimization of hyperparameters. | Eggensperger et al. | NIPS |
PDF
- 2013 | Improving deep neural networks for LVCSR using rectified linear units and dropout. | Dahl et al. | IEEE-ICASSP |
PDF
- 2012 | Practical bayesian optimization of machine learning algorithms. | Snoek et al. | NIPS |
PDF
Github (Spearmint)
- 2011 | Sequential model-based optimization for general algorithm configuration. | Hutter et al. | LION |
PDF
Github
- 2011 | Algorithms for hyper-parameter optimization. | Bergstra et al. | NIPS |
PDF
- 1998 | Efficient global optimization of expensive black-box functions. | Jones et al. |
PDF
- 1978 | Adaptive control processes: a guided tour. | Mockus et al. |
PDF
- 1975 | Single-step Bayesian search method for an extremum of functions of a single variable. | Zhilinskas |
PDF
- 1964 | A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. | Kushner |
PDF
Simulated Annealing
- 1983 | Optimization by simulated annealing. | Kirkpatrick et al. | Science |
PDF
Genetic Algorithms
- 1992 | Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. | Holland et al. |
PDF
Multi-Fidelity Optimization
- 2019 | Practical Multi-fidelity Bayesian Optimization for Hyperparameter Tuning. | Wu et al. |
PDF
- 2019 | Multi-Fidelity Automatic Hyper-Parameter Tuning via Transfer Series Expansion. | Hu et al. |
PDF
- 2016 | Review of multi-fidelity models. | Fernandez-Godino |
PDF
- 2012 | Provably convergent multifidelity optimization algorithm not requiring high-fidelity derivatives. | March and Willcox | AIAA |
PDF
Modeling Learning Curve
- 2017 | Learning curve prediction with Bayesian neural networks. | Klein et al. | ICLR |
PDF
- 2015 | Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. | Domhan et al. | IJCAI |
PDF
- 1998 | Efficient global optimization of expensive black-box functions. | Jones et al. | JGO |
PDF
Bandit Based
- 2018 | Massively parallel hyperparameter tuning. | Li et al. | AISTATS |
PDF
- 2016 | Non-stochastic Best Arm Identification and Hyperparameter Optimization. | Jamieson and Talwalkar | AISTATS |
PDF
- 2016 | Hyperband: A novel bandit-based approach to hyperparameter optimization. | Li et al. | JMLR |
PDF
Github
Github (Distributed Hyperband - BOHB)
AutoML Tools and Frameworks
Centralized Frameworks
Tool | Date | Language | Training Framework | Optimization Method | ML Tasks | Meta-Learning | UI | Open Source
---|---|---|---|---|---|---|---|---
AutoWeka | 2013 | Java | Weka | Bayesian Optimization | Single-label classification, regression | × | ✓ | Github 'Tool'
HyperOpt-Sklearn | 2014 | Python | Scikit-Learn | Bayesian Optimization, Simulated Annealing, and Random Search | Single-label classification, regression | × | × | Github
AutoSklearn | 2015 | Python | Scikit-Learn | Bayesian Optimization | Single-label classification, regression | ✓ | × | Github 'Tool'
TPOT | 2016 | Python | Scikit-Learn | Genetic Algorithm | Single-label classification, regression | × | × | Github
Recipe | 2017 | Python | Scikit-Learn | Grammar-Based Genetic Algorithm | Single-label classification | ✓ | × | Github
Auto-Meka | 2018 | Java | Meka | Grammar-Based Genetic Algorithm | Multi-label classification | ✓ | × | Github
ML-Plan | 2018 | Java | Weka / Scikit-Learn | Hierarchical Task Planning | Single-label classification | × | × | Github
AutoStacker | 2018 | - | - | Genetic Algorithm | Single-label classification | × | × | ×
PMF | 2018 | Python | Scikit-Learn | Collaborative Filtering and Bayesian Optimization | Single-label classification | ✓ | × | Github
AlphaD3M | 2018 | - | - | Reinforcement Learning | Single-label classification, regression | ✓ | × | ×
SmartML | 2019 | R | Different R Packages | Bayesian Optimization | Single-label classification | ✓ | ✓ | Github
VDS | 2019 | - | - | Cost-Based Multi-Armed Bandits and Bayesian Optimization | Single-label classification, regression, image classification, audio classification, graph matching | ✓ | ✓ | ×
OBOE | 2019 | Python | Scikit-Learn | Collaborative Filtering | Single-label classification | ✓ | × | Github
Auptimizer | 2019 |  |  | Random, Grid, Hyperband, Hyperopt, Spearmint | Single-label classification | × | × | Github
iSmartML | 2019 | Python | Scikit-Learn | Bayesian Optimization | Single-label classification, regression | ✓ | ✓ | Github 'Tool'
Distributed Frameworks
Tool | Date | Language | Training Framework | Optimization Method | Meta-Learning | UI | Open Source | Paper
---|---|---|---|---|---|---|---|---
MLBase | 2013 | Scala | Spark MLlib | Cost-based Multi-Armed Bandits | × | × | × Website | PDF
ATM | 2017 | Python | Scikit-Learn | Hybrid Bayesian and Multi-armed bandits Optimization | ✓ | × | Github | PDF
MLBox | 2017 | Python | Scikit-Learn, Keras | Distributed Random Search and Tree-Parzen Estimators | × | × | Github | ×
Rafiki | 2018 | Python | Scikit-Learn, TensorFlow | Distributed Random Search, Bayesian Optimization | × | ✓ | Github | PDF
TransmogrifAI | 2018 | Scala | SparkML | Bayesian Optimization and Random Search | × | × | Github Website | ×
ATMSeer | 2019 | Python | Scikit-Learn on top of ATM | Hybrid Bayesian and Multi-armed bandits Optimization | ✓ | ✓ | Github | PDF
D-SmartML | 2019 | Scala | Spark MLlib | Grid Search, Random Search, Hyperband | ✓ | × | Github | ×
Databricks | 2019 | Python | Spark MLlib | Hyperopt | × | ✓ | × Website | ×
Neural Architecture Search Frameworks
Tool | Date | Supported Architectures | Optimization Method | Supported Frameworks | UI | Open Source | Paper
---|---|---|---|---|---|---|---
AutoNet | 2016 | FCN | SMAC | PyTorch | × | Github | PDF
Auto-Keras | 2018 | No Restrictions | Network Morphism | Keras | ✓ | Github | PDF
enas | 2018 | CNN, RNN | Reinforcement Learning | TensorFlow | × | Github | PDF
NAO | 2018 | CNN, RNN | Gradient based optimization | TensorFlow, PyTorch | × | Github | PDF
DARTS | 2019 | No Restrictions | Gradient based optimization | PyTorch | × | Github | PDF
NNI | 2019 | No Restrictions | Random and Grid Search, Different Bayesian Optimizations, Annealing, Network Morphism, Hyperband, Naive Evolution | PyTorch, TensorFlow, Keras, Caffe2, CNTK, Chainer, Theano | ✓ | Github | ×
Pre-Modeling and Post-Modeling Aiding Tools
While current AutoML tools and frameworks have minimized the role of the data scientist in the modeling stage and saved much effort, several aspects still need human intervention and interpretability in order to make the correct decisions that can enhance and affect the modeling steps. These aspects belong to two main building blocks of the machine learning production pipeline: Pre-Modeling and Post-Modeling.
These two building blocks can help cover what is missing in current AutoML tools and help data scientists do their job in an easier, more organized, and more informative way.
Pre-Modeling
Data Understanding
Sanity Checking
- 2017 | Controlling False Discoveries During Interactive Data Exploration. | Zhao et al. | SIGMOD |
PDF
- 2016 | Data Exploration with Zenvisage: An Expressive and Interactive Visual Analytics System. | Siddiqui et al. | VLDB |
PDF | TOOL
- 2015 | SeeDB: Efficient Data-Driven Visualization Recommendations to Support Visual Analytics. | Vartak et al. | PVLDB |
PDF | TOOL
Feature Based Analysis
Data Life-Cycle Analysis
Data Validation
Automatic Correction
Automatic Alerting
Data Preparation
Feature Addition
- 2018 | Google Search Engine for Datasets |
URL
- 2014 | DataHub: Collaborative Data Science & Dataset Version Management at Scale. | Bhardwaj et al. | CoRR |
PDF | URL
- 2013 | OpenML: Networked Science in Machine Learning. | Vanschoren et al. | SIGKDD |
PDF | URL
- 2007 | UCI: Machine Learning Repository. | Dua, D. and Graff, C. |
URL
Feature Synthesis
Post-Modeling
AutoML Challenges
- 2019 | Third AutoML Challenge |
URL
- 2018 | Second AutoML Challenge |
URL
- 2017 | First AutoML Challenge |
URL
Contribute:
To add more references to our repository, follow these steps:
- Create a branch in git and make your changes.
- Push the branch to GitHub and open a pull request (PR).
- Discuss the pull request.
- We will review the request and merge it into the repository.
Citation:
For more details, please refer to our Survey Paper PDF
Radwa El-Shawi, Mohamed Maher, Sherif Sakr. Automated Machine Learning: State-of-The-Art and Open Challenges (2019).