LogBERT: Log Anomaly Detection via BERT
Paper: arXiv
This repository provides the implementation of LogBERT for log anomaly detection. The pipeline downloads the raw data, parses the logs into structured data, creates log sequences, and finally trains the models and runs prediction.
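The log-sequence step can be pictured with a small sketch. This is a minimal illustration under stated assumptions, not the repository's data_process.py: it assumes each HDFS line has already been parsed into an event ID (log key), and that HDFS block identifiers match the pattern `blk_-?\d+`. Lines are then grouped into one event sequence per block. The function name `group_by_block` and the sample lines are hypothetical.

```python
import re

# Assumption: HDFS block identifiers look like "blk_" followed by an optional minus and digits.
BLOCK_ID = re.compile(r"blk_-?\d+")


def group_by_block(parsed_lines):
    """Group (content, event_id) pairs into one event-ID sequence per block.

    parsed_lines: iterable of (content, event_id) tuples, assumed to come from a
    log parser that has already mapped each raw line to a template / event ID.
    Returns a dict mapping block ID -> list of event IDs in log order.
    """
    sequences = {}
    for content, event_id in parsed_lines:
        # A line may mention several blocks; dict.fromkeys drops duplicate
        # mentions while keeping their order, so each block gets the event once.
        for block in dict.fromkeys(BLOCK_ID.findall(content)):
            sequences.setdefault(block, []).append(event_id)
    return sequences


# Illustrative, hand-written sample lines (not taken from the dataset files).
lines = [
    ("Receiving block blk_-1608999687919862906 src: /10.250.19.102", "E5"),
    ("Receiving block blk_7503483334202473044 src: /10.251.215.16", "E5"),
    ("PacketResponder 1 for block blk_-1608999687919862906 terminating", "E11"),
]

for block, seq in group_by_block(lines).items():
    print(block, seq)
```

With these sample lines, block `blk_-1608999687919862906` gets the sequence `['E5', 'E11']` and `blk_7503483334202473044` gets `['E5']`; each such sequence is one input to the models.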
Configuration
- Ubuntu 20.04
- NVIDIA driver 460.73.01
- CUDA 11.2
- Python 3.8
- PyTorch 1.9.0
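To compare a local setup with the configuration listed above, a small check script can help. This is a hypothetical helper, not part of the repository. It uses only `platform`, `importlib.util.find_spec`, and the standard PyTorch attributes `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and it still runs when PyTorch is not installed.

```python
import importlib.util
import platform
import sys


def describe_environment():
    """Report Python, PyTorch and CUDA details for comparison with the listed configuration."""
    info = {
        "python": platform.python_version(),
        "torch": None,          # stays None if PyTorch is not installed
        "torch_cuda": None,     # CUDA version PyTorch was built with
        "cuda_available": False,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda  # None for CPU-only builds
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    env = describe_environment()
    for key, value in env.items():
        print(f"{key}: {value}")
    if sys.version_info[:2] != (3, 8):
        print("note: the listed configuration uses Python 3.8")
```

Note that `torch_cuda` reports the CUDA version PyTorch was compiled against, which can differ from the system CUDA toolkit version.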
Installation
This code requires the packages listed in requirements.txt. A virtual environment is recommended for running it.
On macOS and Linux:
python3 -m pip install --user virtualenv
python3 -m venv env
source env/bin/activate
pip install -r ./environment/requirements.txt
deactivate
Reference: https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/
An alternative is to create a conda environment:
conda env create -f ./environment/environment.yml
conda activate logbert
Reference: https://docs.conda.io/en/latest/miniconda.html
Experiment
LogBERT and the baseline models are run on the HDFS, BGL, and Thunderbird datasets.
HDFS example
cd HDFS
sh init.sh
# process data
python data_process.py
# run LogBERT
python logbert.py vocab
python logbert.py train
python logbert.py predict
# run DeepLog
python deeplog.py vocab
# in deeplog.py, set options["vocab_size"] to the vocabulary size printed by the vocab step above
python deeplog.py train
python deeplog.py predict
# run LogAnomaly
python loganomaly.py vocab
# in loganomaly.py, set options["vocab_size"] to the vocabulary size printed by the vocab step above
python loganomaly.py train
python loganomaly.py predict
# run the remaining baselines in the Jupyter notebook
jupyter notebook baselines.ipynb
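The DeepLog and LogAnomaly steps above ask you to copy the vocabulary size reported by the `vocab` step into `options["vocab_size"]`. The sketch below only illustrates what that number counts. It assumes the vocabulary consists of the distinct log keys (event IDs) in the training sequences plus a few special tokens. The token names, `build_vocab`, and the `min_freq` parameter are hypothetical, and the repository's own vocab step may count differently, so always use the value it prints.

```python
from collections import Counter

# Hypothetical special tokens; the repository's vocabulary may use different ones.
SPECIAL_TOKENS = ["<pad>", "<unk>", "<mask>"]


def build_vocab(sequences, min_freq=1):
    """Map each log key seen at least min_freq times, plus the special tokens, to an index."""
    counts = Counter(key for seq in sequences for key in seq)
    keys = sorted(k for k, c in counts.items() if c >= min_freq)
    return {token: idx for idx, token in enumerate(SPECIAL_TOKENS + keys)}


# Toy training sequences of event IDs (illustrative only).
train_sequences = [["E5", "E5", "E11"], ["E5", "E22", "E11"]]
vocab = build_vocab(train_sequences)

# Hypothetical options dict mirroring the setting edited in deeplog.py / loganomaly.py.
options = {"vocab_size": len(vocab)}
print(options)  # {'vocab_size': 6}
```

Here the three distinct keys (E11, E22, E5) plus three special tokens give a size of 6. A model's embedding table must have at least this many rows, which is why the value has to match the vocabulary built from the data.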
Folders created during execution
~/.dataset: stores the original datasets after downloading
project/output: stores intermediate files and final results during execution