  • Stars: 609
  • Rank: 71,131 (Top 2%)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created over 1 year ago
  • Updated 11 months ago

Repository Details

An implementation and tutorial of Automated Machine Learning (AutoML) methods for static/batch and online/continual learning

AutoML-Implementation-for-Static-and-Dynamic-Data-Analytics

This code provides an Automated Machine Learning (AutoML) implementation for static and dynamic data analytics problems. It includes a case study of IoT anomaly detection that applies several ML algorithms together with optimization/AutoML methods, which automate the selection and tuning of those algorithms. It can also serve as a tutorial that helps machine learning researchers automatically obtain optimized models for a specific task.

  • Batch/Static Learning: Batch learning is the traditional machine learning and data analytics process. Batch learning methods analyze static IoT data in batches and often need access to the entire dataset prior to model training.
  • Online/Continual learning: Online learning or continual learning techniques are able to train models using continuously incoming online data streams in dynamic IoT environments and address concept drift issues (data distribution changes).
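
The difference between the two settings can be illustrated with a small, self-contained sketch. It is not taken from the repository: the toy nearest-centroid classifier, the synthetic stream, and all names (`OnlineCentroid`, `make_stream`, `run_experiment`) are assumptions made for illustration. A batch model is fitted once on the first samples and then frozen, while an online model is evaluated test-then-train on every sample, so it can follow a change in the data distribution.

```python
import random


class OnlineCentroid:
    """Toy 1-D nearest-centroid classifier with exponentially weighted class means."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha   # forgetting factor: larger values adapt faster
        self.means = {}      # class label -> running mean of x

    def predict(self, x):
        if not self.means:
            return 0         # arbitrary default before any training
        return min(self.means, key=lambda label: abs(x - self.means[label]))

    def learn_one(self, x, y):
        if y not in self.means:
            self.means[y] = x
        else:
            self.means[y] += self.alpha * (x - self.means[y])


def make_stream(n=2000, drift_at=1000, seed=0):
    """Synthetic stream whose decision threshold jumps from 0.3 to 0.7 (concept drift)."""
    rng = random.Random(seed)
    stream = []
    for i in range(n):
        x = rng.random()
        threshold = 0.3 if i < drift_at else 0.7
        stream.append((x, int(x > threshold)))
    return stream


def run_experiment(n=2000, drift_at=1000, train_size=500):
    stream = make_stream(n, drift_at)

    # Batch learning: class means are computed once on the first samples, then frozen.
    sums, counts = {}, {}
    for x, y in stream[:train_size]:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    batch_means = {y: sums[y] / counts[y] for y in sums}

    # Online learning: predict first, then update on the same sample (test-then-train).
    online = OnlineCentroid()
    batch_hits = online_hits = 0
    for i, (x, y) in enumerate(stream):
        if i >= drift_at:  # score both models only on post-drift data
            batch_pred = min(batch_means, key=lambda c: abs(x - batch_means[c]))
            batch_hits += int(batch_pred == y)
            online_hits += int(online.predict(x) == y)
        online.learn_one(x, y)

    n_eval = n - drift_at
    return batch_hits / n_eval, online_hits / n_eval


if __name__ == "__main__":
    batch_acc, online_acc = run_experiment()
    print(f"post-drift accuracy  batch: {batch_acc:.3f}  online: {online_acc:.3f}")
```

On this toy stream the online model should score higher after the drift point, because its class means keep moving with the data while the batch model keeps its pre-drift decision boundary.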

This code is also the implementation of a review paper published in Engineering Applications of Artificial Intelligence (IF: 7.8):
L. Yang and A. Shami, “IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective,” Engineering Applications of Artificial Intelligence, vol. 116, pp. 1-33, 2022, doi: https://doi.org/10.1016/j.engappai.2022.105366.

This paper and code will help industrial users, data analysts, and researchers to better develop machine learning models using automation technology.

  • A comprehensive hyperparameter optimization (automatically tuning the hyperparameters of machine learning algorithms to achieve optimal performance) tutorial code can be found in: Hyperparameter-Optimization-of-Machine-Learning-Algorithms
    • 1,000+ GitHub stars
    • 700+ citations by journal & conference papers

Paper Link

IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective
One-column version: arXiv
Two-column version: Elsevier

AutoML Pipeline and Procedures

  1. Automated Data Pre-Processing
  2. Automated Feature Engineering
  3. Automated Model Selection
  4. Hyper-Parameter Optimization
  5. Automated Model Updating (for addressing concept drift, and only for online learning and data stream analytics)
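
Steps 1 to 4 above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the pipeline specified in the paper: the synthetic data, the class-mean-gap feature ranking, the two toy models, and every function name are hypothetical. Step 5 (model updating) is omitted because it only applies to the online setting.

```python
import random
import statistics


def make_data(n=200, seed=0):
    """Toy data: features 0 and 1 carry the class signal, features 2 and 3 are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        y = i % 2
        rows.append([rng.gauss(3.0 * y, 1.0), rng.gauss(3.0 * y, 1.0),
                     rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        labels.append(y)
    return rows, labels


def min_max_scale(rows):
    """Step 1 (pre-processing): rescale every feature column to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - lo) or 1.0 for c, lo in zip(cols, lows)]
    return [[(v - lo) / s for v, lo, s in zip(r, lows, spans)] for r in rows]


def select_features(rows, labels, keep=2):
    """Step 2 (feature engineering): keep the features whose class means differ most."""
    cols = list(zip(*rows))

    def gap(col):
        pos = [v for v, y in zip(col, labels) if y == 1]
        neg = [v for v, y in zip(col, labels) if y == 0]
        return abs(statistics.mean(pos) - statistics.mean(neg))

    ranked = sorted(range(len(cols)), key=lambda j: gap(cols[j]), reverse=True)
    chosen = sorted(ranked[:keep])
    return [[r[j] for j in chosen] for r in rows], chosen


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training points (binary labels 0/1)."""
    neighbours = sorted(zip(train_x, train_y), key=lambda pair: sq_dist(pair[0], x))[:k]
    return int(2 * sum(y for _, y in neighbours) > k)


def centroid_predict(train_x, train_y, x):
    """Assign x to the class with the nearest mean vector."""
    centroids = {}
    for label in set(train_y):
        members = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [statistics.mean(c) for c in zip(*members)]
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


MODELS = {"knn": knn_predict, "centroid": centroid_predict}


def search(train_x, train_y, val_x, val_y):
    """Steps 3 and 4: pick the model and hyperparameters with the best hold-out accuracy."""
    candidates = [("knn", {"k": k}) for k in (1, 3, 5)] + [("centroid", {})]
    best = None
    for name, params in candidates:
        predict = MODELS[name]
        hits = sum(predict(train_x, train_y, x, **params) == y
                   for x, y in zip(val_x, val_y))
        acc = hits / len(val_y)
        if best is None or acc > best[2]:
            best = (name, params, acc)
    return best


def run_pipeline():
    rows, labels = make_data()
    # For brevity, steps 1 and 2 use the whole toy dataset; a real pipeline
    # would fit them on the training split only to avoid information leakage.
    rows = min_max_scale(rows)
    rows, chosen = select_features(rows, labels, keep=2)
    split = len(rows) * 7 // 10
    name, params, acc = search(rows[:split], labels[:split], rows[split:], labels[split:])
    return chosen, name, params, acc


if __name__ == "__main__":
    print(run_pipeline())
```

Searching over model type and hyperparameters in one loop is a simple exhaustive form of the combined selection and tuning problem; the optimization methods listed under Implementation replace that loop with smarter search strategies.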

Quick Navigation of The Paper

Section 2: IoT data analytics overview
Section 3: Model learning (introduces the common machine learning algorithms)
Section 4: AutoML overview & optimization techniques (introduces AutoML and its techniques)
Section 5: Automated data pre-processing
Section 6: Automated feature engineering
Section 7: Automated model updating by handling concept drift
Section 8: Selection of evaluation metrics and validation methods
Section 9: AutoML Tools and libraries
Section 10: Case study (Experimental results, sample code in "AutoML_Batch_Learning_CIC.ipynb")
Section 11: Open challenges and future research directions
Summary tables for Section 3: Tables 1 & 2: A comprehensive overview of common ML models, their hyperparameters, their advantages and limitations, and suitable IoT tasks
Summary table for Section 4: Table 3: The comparison of common optimization methods for CASH and HPO problems
Summary table for Section 7: Table 5: The comparison of concept drift methods for automated model updating
Summary table for Section 10: Table 6: The specifications of the proposed AutoML pipeline
Summary table for Section 11: Table 12: The challenges and research directions of applying AutoML to IoT data analytics

Implementation

Static Machine Learning & Deep Learning Algorithms

  • Random forest (RF)
  • LightGBM
  • K-nearest neighbor (KNN)
  • Naive Bayes (NB)
  • Artificial Neural Networks (ANN)

Dynamic/Online Learning Algorithms

  • Hoeffding Tree (HT)
  • Leveraging Bagging (LB)
  • Adaptive Random Forest (ARF)
  • Streaming Random Patches (SRP)

Optimization/AutoML Algorithms

  • Grid search
  • Bayesian Optimization with Tree-structured Parzen Estimator (BO-TPE)
  • Particle Swarm Optimization (PSO)
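
To make the contrast between grid search and PSO concrete, here is a minimal sketch on a made-up objective. Nothing in it comes from the repository: `objective` is a hypothetical validation-error surface over two hyperparameters (`lr`, `depth`), and the PSO constants (`w`, `c1`, `c2`) are common textbook choices rather than the authors' settings. BO-TPE is not shown, since a faithful version needs a dedicated library.

```python
import itertools
import random


def objective(lr, depth):
    """Hypothetical validation error with its minimum at lr=0.1, depth=6."""
    return (lr - 0.1) ** 2 * 100 + (depth - 6.0) ** 2 * 0.05


def grid_search(lrs, depths):
    """Evaluate every combination on a fixed grid and return the best one."""
    return min(itertools.product(lrs, depths), key=lambda p: objective(*p))


def pso(bounds, n_particles=15, n_iters=40, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic particle swarm optimization over a continuous box."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # personal best positions
    best_val = [objective(*p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]     # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(*pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


if __name__ == "__main__":
    grid_best = grid_search([0.01, 0.05, 0.2, 0.5], [2, 4, 6, 8])
    print("grid:", grid_best, objective(*grid_best))
    print("pso: ", pso([(0.001, 0.5), (1.0, 12.0)]))
```

The grid here deliberately omits lr=0.1, which shows the main limitation of grid search: it can only return points it was told to try, whereas PSO explores the continuous range between them.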

Datasets

  1. CICIDS2017 dataset, a popular network traffic dataset for intrusion detection problems

  2. IoTID20 dataset, a novel IoT botnet dataset

Requirements

Contact-Info

Please feel free to contact me with any questions or cooperation opportunities. I'd be happy to help.

Citation

If you find this repository useful in your research, please cite this article as:

L. Yang and A. Shami, “IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective,” Engineering Applications of Artificial Intelligence, vol. 116, pp. 1-33, 2022, doi: https://doi.org/10.1016/j.engappai.2022.105366.

@article{YANG2022105366,
title = "IoT data analytics in dynamic environments: From an automated machine learning perspective",
author = "Li Yang and Abdallah Shami",
journal = "Engineering Applications of Artificial Intelligence",
volume = {116},
pages = {1-33},
year = "2022",
doi = "https://doi.org/10.1016/j.engappai.2022.105366",
url = "https://www.sciencedirect.com/science/article/pii/S0952197622003803"
}
