  • Language: Python
  • License: Apache License 2.0
  • Created about 7 years ago
  • Updated about 1 year ago

Repository Details

Machine learning plugins for network traffic

Device Functional Role ID via Machine Learning and Network Traffic Analysis

Overview

NetworkML is the machine learning portion of our Poseidon project. It classifies each device into a functional role using machine learning models trained on features derived from network traffic. "Functional role" refers to the authorized administrative purpose of the device on the network and includes roles such as printer, mail server, and others typically found in an IT environment. Our internal analysis suggests networkML can achieve accuracy, precision, recall, and F1 scores in the high 90s (percent) when trained on devices from your own network. Whether this performance transfers from one IT environment to another is an active area of our research.

NetworkML can be used in a "standalone" mode from the command line interface. For more background and context on the macro project, please check out the Poseidon project page on our website. This repository specifically covers the output, inputs, data processing, and machine learning models we deploy in networkML.

While this repository and the resulting Docker container can be used completely independently, the code was written to support the IQT Labs Poseidon project.

This repository contains the components necessary to build a docker container that can be used for training a number of ML models using network packet captures (PCAPs). The repository includes scripts necessary to do training, testing, and evaluation. These can be run from a shell once networkml is installed as a package or run in a Docker container using the networkml script.

Feel free to use, discuss, and contribute!

Model Output

NetworkML predicts the functional role of network-connected devices via network traffic analysis and machine learning.

Admittedly subjective, the term "role" refers to the authorized administrative purpose of the device on the network. NetworkML in its default configuration has twelve roles, including active directory controller, administrator server, administrator workstation, confluence server, developer workstation, distributed file share, exchange server, graphics processing unit (GPU) laptop, github server, public key infrastructure (PKI) server, and printer. This typology reflects the network-connected devices in the data we used to train the model. Other networks will lack some of these roles and might include others. Consequently, organizations that wish to use networkML might have to adapt the model outputs to their own environment.
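One simple way to adapt the outputs is a lookup table that translates predicted roles into an organization's own categories. The sketch below is only an illustration: the label strings, the target categories, and the adapt_role helper are hypothetical, and the labels a trained model actually emits may be spelled differently.

```python
# Hypothetical mapping from default role labels to an organization's own categories.
# The exact label strings emitted by a trained model are an assumption here.
ROLE_MAP = {
    "printer": "peripheral",
    "exchange server": "mail server",
    "active directory controller": "identity infrastructure",
    "developer workstation": "workstation",
    "administrator workstation": "workstation",
}


def adapt_role(predicted_role: str, role_map: dict = ROLE_MAP, default: str = "unknown") -> str:
    """Translate a predicted role into a local category, falling back to a default."""
    return role_map.get(predicted_role.strip().lower(), default)
```

Roles with no local equivalent fall through to the default, which makes gaps in the mapping easy to spot.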

Model Inputs

NetworkML's key input is the network traffic for a single device. By network traffic for a single device, we mean all packets sent and received by that device over a given time period. For reliable results, we recommend at least fifteen minutes of network traffic. Poseidon, the larger project of which networkML is only a part, performs the necessary packet pre-processing to produce pcaps containing all network traffic to and from a single device. If you are using networkML in a standalone manner, the pcap files must all follow a strict naming convention: DeviceName-deviceID-time-duration-flags.pcap. For example, ActiveDirectoryController-labs-Fri0036-n00.pcap refers to a pcap from an active directory controller taken from a user named labs on a Friday at 00:36. The flags field currently has no significance.
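To check file names before running networkML in standalone mode, the naming convention can be split on hyphens. This is a minimal sketch, not networkML's own parser: PcapName and parse_pcap_name are hypothetical names. Because the example file name above has fewer hyphen-separated fields than the stated convention, the sketch interprets only the first three fields and keeps whatever follows as uninterpreted tokens.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PcapName:
    role: str        # labeled device role, e.g. "ActiveDirectoryController"
    device_id: str   # device or user identifier, e.g. "labs"
    time: str        # capture time token, e.g. "Fri0036"
    extra: list      # remaining tokens (duration and/or flags), left uninterpreted


def parse_pcap_name(filename: str) -> PcapName:
    """Split a standalone-mode pcap file name into its hyphen-separated fields."""
    path = Path(filename)
    if path.suffix != ".pcap":
        raise ValueError(f"expected a .pcap file, got {filename!r}")
    tokens = path.stem.split("-")
    # Require at least role, device ID, and time, and reject empty fields.
    if len(tokens) < 3 or not all(tokens):
        raise ValueError(f"expected at least role-deviceID-time in {filename!r}")
    role, device_id, time, *extra = tokens
    return PcapName(role, device_id, time, extra)
```

Note that this approach assumes no field contains a hyphen of its own; a device name such as "lab-printer" would be split incorrectly.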

It is worth noting that networkML uses only packet header data in its models. NetworkML does not use data from the packet payload. Relying only on packet header data enables networkML to avoid some privacy-related issues associated with using payload data and to create (hopefully) more generalizable and more performant models.
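To make the header/payload distinction concrete, the sketch below reads a few IPv4 header fields from a raw Ethernet frame using only the Python standard library and never looks at the bytes after the header. It is an illustration of header-only processing, not networkML's actual feature extraction; ipv4_header_fields and the choice of fields are assumptions, and VLAN-tagged or non-IPv4 frames are rejected rather than handled.

```python
import socket
import struct

ETH_HEADER_LEN = 14   # destination MAC (6) + source MAC (6) + EtherType (2)
IPV4_MIN_HEADER = 20  # fixed part of the IPv4 header, without options
ETHERTYPE_IPV4 = 0x0800


def ipv4_header_fields(frame: bytes) -> dict:
    """Return selected IPv4 header fields from an Ethernet frame, ignoring the payload."""
    if len(frame) < ETH_HEADER_LEN + IPV4_MIN_HEADER:
        raise ValueError("frame too short for Ethernet + IPv4 headers")
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an IPv4 frame")
    header = frame[ETH_HEADER_LEN:ETH_HEADER_LEN + IPV4_MIN_HEADER]
    (ver_ihl, _tos, total_length, _ident, _frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", header)
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,                   # e.g. 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }
```

Everything past the fixed 20-byte header is left untouched, so application data never enters the returned dictionary.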

Data Processing

Algorithms

NetworkML uses a feedforward neural network from the scikit-learn package. The model is trained using 5-fold cross validation in combination with a simple grid-search of the hyper-parameter space.
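The combination of a scikit-learn feedforward network, 5-fold cross-validation, and grid search can be sketched as follows. This is not the project's training script: the synthetic data, the feature scaling step, the hyper-parameter grid, the macro-F1 scoring choice, and the tune_mlp helper are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def tune_mlp(X, y) -> GridSearchCV:
    """Grid-search a small MLP with 5-fold cross-validation and return the fitted search."""
    pipeline = make_pipeline(
        StandardScaler(),                               # assumed preprocessing step
        MLPClassifier(max_iter=500, random_state=0),
    )
    # Illustrative grid; the real search space is not given in this document.
    param_grid = {
        "mlpclassifier__hidden_layer_sizes": [(16,), (32, 16)],
        "mlpclassifier__alpha": [1e-4, 1e-2],
    }
    search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1_macro")
    search.fit(X, y)
    return search


# Synthetic stand-in for per-device feature vectors and role labels.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=6, n_classes=3, random_state=0
)
search = tune_mlp(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Each of the four grid points is fitted on five train/validation splits, and the best setting is then refit on all of the data.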

Installation/Run

Our models can be executed via Docker and in a standalone manner on a Linux host. We recommend deployment via Poseidon if you are running an SDN (software-defined network). Otherwise, we recommend using Docker.

See the README file included in the networkml/trained_models folder for specific instructions on deployment.

Develop/Standalone Installation

Note: This project uses absolute paths for imports, so you'll need to set your PYTHONPATH to something like this from the project directory:

export PYTHONPATH=$PWD/networkml:$PYTHONPATH

Alternatively, running pip3 install . from the project directory after making changes reinstalls the package so that you can test or debug against your changes.

This package is set up for anaconda/miniconda to be used for package and environment management if desired. Assuming you have the latest install (as of this writing, we have been using conda 4.5.12), set up the environment by performing the following:

  1. Ensure that the CONDA_EXE environment variable has been set. If echo $CONDA_EXE returns empty, resolve this by running export CONDA_EXE=$_CONDA_EXE in your bash shell.
  2. Run make dev to set up the environment.
  3. Run conda activate posml-dev to begin.

You can remove the dev environment via standard conda commands:

  1. Run conda deactivate.
  2. Run conda env remove -y -n posml-dev.

For more information about using conda, please refer to their user documentation.
