  • Language
    Jupyter Notebook
  • License
    MIT License


Repository Details

Code for IDS-ML: intrusion detection system development using machine learning algorithms (decision tree, random forest, extra trees, XGBoost, stacking, k-means, Bayesian optimization, etc.)

Intrusion-Detection-System-Using-Machine-Learning

This repository contains the code for the project "IDS-ML: Intrusion Detection System Development Using Machine Learning". The code and the proposed Intrusion Detection Systems (IDSs) are general models that can also be applied to other IDS and anomaly detection applications. Three papers have been published from this project; their abstracts and citations are given below.

An introduction to the code in this repository is publicly available at:

This repository proposes three intrusion detection systems built on a range of machine learning algorithms, including tree-based algorithms (decision tree, random forest, XGBoost, LightGBM, CatBoost, etc.), unsupervised learning algorithms (k-means), ensemble learning algorithms (stacking and the proposed LCCDE), and hyperparameter optimization techniques (Bayesian optimization).

Paper Abstracts

Paper 1: Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles

  The use of autonomous vehicles (AVs) is a promising technology in Intelligent Transportation Systems (ITSs) to improve safety and driving efficiency. Vehicle-to-everything (V2X) technology enables communication among vehicles and other infrastructure. However, AVs and the Internet of Vehicles (IoV) are vulnerable to different types of cyber-attacks, such as denial of service, spoofing, and sniffing attacks. In this paper, an intelligent IDS is proposed for network attack detection that can be applied not only to the Controller Area Network (CAN) bus of AVs but also to general IoVs. The proposed IDS utilizes tree-based ML algorithms, including decision tree (DT), random forest (RF), extra trees (ET), and Extreme Gradient Boosting (XGBoost). The results of implementing the proposed IDS on standard data sets indicate that the system can identify various cyber-attacks in AV networks. Furthermore, the proposed ensemble learning and feature selection approaches enable the system to achieve a high detection rate and a low computational cost simultaneously.

Figure 1: The overview of the tree-based IDS model.
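
The abstract attributes the low computational cost partly to feature selection. One common way to combine several tree models for this purpose is to average their per-feature importance scores (for example, the `feature_importances_` attribute exposed by scikit-learn's tree-based estimators) and keep the top features until a cumulative-importance threshold is reached. The sketch below assumes exactly that; the helper name, the 0.9 threshold, and the toy scores are illustrative assumptions, not values taken from the paper or the notebook.

```python
from typing import Dict, List, Sequence


def select_by_average_importance(
    importances: Sequence[Dict[str, float]], threshold: float = 0.9
) -> List[str]:
    """Hypothetical helper: keep the most important features until their share
    of the total averaged importance reaches `threshold`."""
    n_models = len(importances)
    # Average each feature's importance score across all tree-based models.
    avg = {f: sum(imp[f] for imp in importances) / n_models for f in importances[0]}
    total = sum(avg.values())
    selected: List[str] = []
    cumulative = 0.0
    # Walk the features from most to least important.
    for feature, score in sorted(avg.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(feature)
        cumulative += score / total
        if cumulative >= threshold:
            break
    return selected


# Toy importance scores for four features from three models (made-up numbers).
dt = {"a": 0.50, "b": 0.30, "c": 0.15, "d": 0.05}
rf = {"a": 0.40, "b": 0.40, "c": 0.10, "d": 0.10}
et = {"a": 0.60, "b": 0.20, "c": 0.20, "d": 0.00}
print(select_by_average_importance([dt, rf, et]))  # ['a', 'b', 'c']
```

Averaging across models makes the ranking less dependent on any single model, and the threshold trades retained information against the number of features kept.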

Paper 2: MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles

  Modern vehicles, including connected vehicles and autonomous vehicles, nowadays involve many electronic control units connected through intra-vehicle networks to implement various functionalities and perform actions. Modern vehicles are also connected to external networks through vehicle-to-everything technologies, enabling their communications with other vehicles, infrastructures, and smart devices. However, the improving functionality and connectivity of modern vehicles also increase their vulnerabilities to cyber-attacks targeting both intra-vehicle and external networks due to the large attack surfaces. To secure vehicular networks, many researchers have focused on developing intrusion detection systems (IDSs) that capitalize on machine learning methods to detect malicious cyber-attacks. In this paper, the vulnerabilities of intra-vehicle and external networks are discussed, and a multi-tiered hybrid IDS that incorporates a signature-based IDS and an anomaly-based IDS is proposed to detect both known and unknown attacks on vehicular networks. Experimental results illustrate that the proposed system can accurately detect various types of known attacks on the CAN-intrusion-dataset representing the intra-vehicle network data and the CICIDS2017 dataset illustrating the external vehicular network data.
  The proposed MTH-IDS framework consists of two traditional ML stages (data pre-processing and feature engineering) and four tiers of learning models:

  1. Four tree-based supervised learners — decision tree (DT), random forest (RF), extra trees (ET), and extreme gradient boosting (XGBoost) — used as multi-class classifiers for known attack detection;
  2. A stacking ensemble model and a Bayesian optimization with tree-structured Parzen estimator (BO-TPE) method for supervised learner optimization;
  3. A cluster labeling (CL) k-means used as an unsupervised learner for zero-day attack detection;
  4. Two biased classifiers and a Bayesian optimization with Gaussian process (BO-GP) method for unsupervised learner optimization.

Figure 2: The overview of the MTH-IDS model.
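
Tier 3 can be illustrated with a small plain-Python sketch of cluster labeling. It rests on our assumption that CL-k-means partitions the training data into more clusters than there are classes, labels each cluster with the majority label of its members, and gives a new sample the label of its nearest centroid. The function names, the first-k initialization, and the toy data are hypothetical; a practical version would more likely use scikit-learn's `KMeans`.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    # Index of the centroid closest to p.
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[Point]:
    # Deterministic initialization for reproducibility: the first k points.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(col) / len(g) for col in zip(*g))
    return centroids


def label_clusters(points: Sequence[Point], labels: Sequence[str],
                   centroids: Sequence[Point]) -> List[str]:
    # Cluster labeling: each cluster takes the majority label of its members.
    votes = [Counter() for _ in centroids]
    for p, y in zip(points, labels):
        votes[nearest(p, centroids)][y] += 1
    return [v.most_common(1)[0][0] if v else "unknown" for v in votes]


def predict(p: Point, centroids: Sequence[Point], cluster_labels: Sequence[str]) -> str:
    return cluster_labels[nearest(p, centroids)]


# Toy 2-D data: "normal" traffic near (0, 0), "attack" traffic near (10, 10).
X = [(0, 0), (10, 10), (0.5, 0.2), (9.5, 10.2), (0.1, 0.6), (10.3, 9.8)]
y = ["normal", "attack", "normal", "attack", "normal", "attack"]
centroids = kmeans(X, k=4)  # more clusters than classes
cluster_labels = label_clusters(X, y, centroids)
print(predict((9.0, 9.0), centroids, cluster_labels))  # attack
```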

Paper 3: LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles

  Modern vehicles, including autonomous vehicles and connected vehicles, have adopted an increasing variety of functionalities through connections and communications with other vehicles, smart devices, and infrastructures. However, the growing connectivity of the Internet of Vehicles (IoV) also increases the vulnerabilities to network attacks. To protect IoV systems against cyber threats, Intrusion Detection Systems (IDSs) that can identify malicious cyber-attacks have been developed using Machine Learning (ML) approaches. To accurately detect various types of attacks in IoV networks, we propose a novel ensemble IDS framework named Leader Class and Confidence Decision Ensemble (LCCDE). It is constructed by determining the best-performing ML model among three advanced ML algorithms (XGBoost, LightGBM, and CatBoost) for every class or type of attack. The class leader models with their prediction confidence values are then utilized to make accurate decisions regarding the detection of various types of cyber-attacks. Experiments on two public IoV security datasets (Car-Hacking and CICIDS2017 datasets) demonstrate the effectiveness of the proposed LCCDE for intrusion detection on both intra-vehicle and external networks.

Figure 3: The overview of the LCCDE IDS model.
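
The two LCCDE steps described in the abstract (pick a leader model for each class, then combine the leaders' predictions with their confidences) can be sketched in plain Python. This is a simplified, hypothetical illustration, not the code in `LCCDE_IDS_GlobeCom22.ipynb`. It assumes that the leader of a class is the model with the best validation F1 score on that class, and that each model is a callable returning a `(predicted_class, confidence)` pair. The tie-breaking rules reflect our reading of the method and should be checked against the notebook.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model interface: a callable returning (predicted_class, confidence).
Model = Callable[[Sequence[float]], Tuple[int, float]]


def class_f1(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def find_leaders(y_val: Sequence[int], val_preds: Dict[str, List[int]],
                 classes: Sequence[int]) -> Dict[int, str]:
    # Leader model of a class = the model with the best validation F1 on that class.
    return {c: max(val_preds, key=lambda m: class_f1(y_val, val_preds[m], c))
            for c in classes}


def lccde_predict(x: Sequence[float], models: Dict[str, Model],
                  leaders: Dict[int, str]) -> int:
    outputs = {name: model(x) for name, model in models.items()}
    top_cls, top_count = Counter(c for c, _ in outputs.values()).most_common(1)[0]
    if top_count == len(models):  # all models agree
        return top_cls
    if top_count > 1:  # partial agreement: ask the leader of the majority class
        return models[leaders[top_cls]](x)[0]
    # All predictions differ: trust models that lead the class they predicted.
    trusted = [(conf, c) for name, (c, conf) in outputs.items() if leaders.get(c) == name]
    if len(trusted) == 1:
        return trusted[0][1]
    # Zero or several trusted models: fall back to the highest confidence.
    candidates = trusted or [(conf, c) for c, conf in outputs.values()]
    return max(candidates)[1]


# Toy usage with stub models that ignore their input.
models = {"xgb": lambda x: (0, 0.9), "lgbm": lambda x: (1, 0.6), "cat": lambda x: (2, 0.7)}
leaders = {0: "lgbm", 1: "lgbm", 2: "cat"}
print(lccde_predict([0.0], models, leaders))  # 2
```

In the toy call, all three models disagree; `lgbm` and `cat` each lead the class they predicted, so the more confident of the two (`cat`, 0.7) decides.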

Implementation

Dataset

CICIDS2017 dataset, a popular network traffic dataset for intrusion detection problems

  • Publicly available at: https://www.unb.ca/cic/datasets/ids-2017.html
  • To display the experimental results in the Jupyter notebooks, the sample code uses sampled subsets of CICIDS2017. The subsets are in the "data" folder.

CAN-intrusion dataset, a benchmark network security dataset for intra-vehicle intrusion detection

Code

  • Tree-based_IDS_GlobeCom19.ipynb: code for the paper "Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles"
  • MTH_IDS_IoTJ.ipynb: code for the paper "MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles"
  • LCCDE_IDS_GlobeCom22.ipynb: code for the paper "LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles"

Machine Learning Algorithms

  • Decision tree (DT)
  • Random forest (RF)
  • Extra trees (ET)
  • XGBoost
  • LightGBM
  • CatBoost
  • Stacking
  • K-means
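
Stacking is a two-level ensemble: base learners produce predictions, and a meta-learner is trained on those predictions. The dependency-free sketch below shows the general technique, not the repository's implementation. The two toy base learners (a nearest-centroid rule and a one-feature threshold rule) and the lookup-table meta-learner are hypothetical stand-ins; in MTH-IDS, the stacking tier builds on the tree-based learners of tier 1. The base-learner outputs used to train the meta-learner are generated out-of-fold, so no sample is scored by a model that was trained on it.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Sequence[float]
Predictor = Callable[[Sample], int]
Learner = Callable[[List[Sample], List[int]], Predictor]


def nearest_centroid(X: List[Sample], y: List[int]) -> Predictor:
    """Toy base learner: predict the class whose mean vector is closest."""
    groups: Dict[int, List[Sample]] = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    means = {c: [sum(col) / len(g) for col in zip(*g)] for c, g in groups.items()}
    return lambda x: min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))


def stump_on(j: int) -> Learner:
    """Toy base learner: majority class on each side of feature j's mean."""
    def fit(X: List[Sample], y: List[int]) -> Predictor:
        thr = sum(x[j] for x in X) / len(X)
        default = Counter(y).most_common(1)[0][0]
        left = Counter(yi for xi, yi in zip(X, y) if xi[j] <= thr)
        right = Counter(yi for xi, yi in zip(X, y) if xi[j] > thr)
        lo = left.most_common(1)[0][0] if left else default
        hi = right.most_common(1)[0][0] if right else default
        return lambda x: lo if x[j] <= thr else hi
    return fit


def fit_stacking(X: List[Sample], y: List[int], learners: List[Learner], k: int = 3) -> Predictor:
    n = len(X)
    meta_X: List[Tuple[int, ...]] = [()] * n
    # Level 0: out-of-fold predictions from models that never saw the scored sample.
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in learners]
        for i in range(fold, n, k):
            meta_X[i] = tuple(m(X[i]) for m in models)
    # Level 1 meta-learner (a simple stand-in): majority label per prediction pattern.
    table: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for pattern, yi in zip(meta_X, y):
        table[pattern][yi] += 1
    default = Counter(y).most_common(1)[0][0]
    # Refit the base learners on all data for use at prediction time.
    final = [fit(X, y) for fit in learners]

    def predict(x: Sample) -> int:
        pattern = tuple(m(x) for m in final)
        return table[pattern].most_common(1)[0][0] if pattern in table else default

    return predict


X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6),
     (6, 5), (6, 6), (0.5, 0.5), (5.5, 5.5), (0.2, 0.8), (5.8, 5.2)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1]
clf = fit_stacking(X, y, [nearest_centroid, stump_on(0), stump_on(1)])
print(clf((0.3, 0.3)), clf((5.2, 5.9)))  # 0 1
```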

Hyperparameter Optimization Methods

  • Bayesian Optimization with Gaussian Processes (BO-GP)
  • Bayesian Optimization with Tree-structured Parzen Estimator (BO-TPE)

If you are interested in hyperparameter tuning of machine learning algorithms, please see the code in the following link:
https://github.com/LiYangHart/Hyperparameter-Optimization-of-Machine-Learning-Algorithms
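
To illustrate the idea behind BO-TPE, here is a deliberately simplified one-dimensional sketch in plain Python. Past trials are split into a "good" group (the best `gamma` fraction) and a "bad" group, each group is modeled with a Gaussian Parzen density, l(x) and g(x), and the next trial is the candidate with the largest ratio l(x)/g(x). The fixed bandwidth, `gamma`, the candidate count, and the function names are illustrative assumptions; full implementations are available in libraries such as Hyperopt (TPE) and scikit-optimize (`gp_minimize` for BO-GP).

```python
import math
import random
from typing import Callable, List, Tuple


def parzen(x: float, samples: List[float], bandwidth: float) -> float:
    # Parzen (kernel) density estimate: an average of Gaussians centred on the samples.
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


def tpe_minimize(objective: Callable[[float], float], low: float, high: float,
                 n_trials: int = 40, n_startup: int = 10, gamma: float = 0.25,
                 n_candidates: int = 24, seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    bandwidth = (high - low) / 10  # fixed bandwidth (a simplification)
    history: List[Tuple[float, float]] = []  # (loss, x) pairs
    for t in range(n_trials):
        if t < n_startup:
            x = rng.uniform(low, high)  # random warm-up trials
        else:
            ranked = sorted(history)
            n_good = max(1, int(gamma * len(ranked)))
            good = [v for _, v in ranked[:n_good]]  # modeled by l(x)
            bad = [v for _, v in ranked[n_good:]]   # modeled by g(x)
            # Sample candidates around good points, clipped to [low, high].
            cands = [min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                     for _ in range(n_candidates)]
            # Choose the candidate with the largest l(x) / g(x).
            x = max(cands, key=lambda c: parzen(c, good, bandwidth)
                    / (parzen(c, bad, bandwidth) + 1e-12))
        history.append((objective(x), x))
    best_loss, best_x = min(history)
    return best_x, best_loss


best_x, best_loss = tpe_minimize(lambda x: (x - 2) ** 2, low=-5.0, high=5.0)
print(f"best x = {best_x:.3f}, loss = {best_loss:.4f}")
```

Maximizing l(x)/g(x) is how TPE maximizes expected improvement; BO-GP instead fits a Gaussian process surrogate to the observed losses and optimizes an acquisition function over it.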

Requirements & Libraries

Contact-Info

Please feel free to contact us for any questions or cooperation opportunities. We will be happy to help.

Citation

If you find this repository useful in your research, please cite one of the following three articles:

L. Yang, A. Moubayed, I. Hamieh and A. Shami, "Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles," 2019 IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1-6, doi: 10.1109/GLOBECOM38437.2019.9013892.

@INPROCEEDINGS{9013892,
  author={Yang, Li and Moubayed, Abdallah and Hamieh, Ismail and Shami, Abdallah},
  booktitle={2019 IEEE Global Communications Conference (GLOBECOM)}, 
  title={Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles}, 
  year={2019},
  pages={1-6},
  doi={10.1109/GLOBECOM38437.2019.9013892}
  }

L. Yang, A. Moubayed, and A. Shami, “MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles,” IEEE Internet of Things Journal, vol. 9, no. 1, pp. 616-632, Jan. 1, 2022, doi: 10.1109/JIOT.2021.3084796.

@ARTICLE{9443234,
  author={Yang, Li and Moubayed, Abdallah and Shami, Abdallah},
  journal={IEEE Internet of Things Journal}, 
  title={MTH-IDS: A Multitiered Hybrid Intrusion Detection System for Internet of Vehicles}, 
  year={2022},
  volume={9},
  number={1},
  pages={616-632},
  doi={10.1109/JIOT.2021.3084796}}

L. Yang, A. Shami, G. Stevens, and S. DeRusett, “LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles,” in 2022 IEEE Global Communications Conference (GLOBECOM), 2022, pp. 3545-3550, doi: 10.1109/GLOBECOM48099.2022.10001280.

@INPROCEEDINGS{10001280,
  author={Yang, Li and Shami, Abdallah and Stevens, Gary and de Rusett, Stephen},
  booktitle={GLOBECOM 2022 - 2022 IEEE Global Communications Conference}, 
  title={LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles}, 
  year={2022},
  pages={3545-3550},
  doi={10.1109/GLOBECOM48099.2022.10001280}}
