• Stars
    star
    161
  • Rank 233,470 (Top 5 %)
  • Language Roff
  • Created over 4 years ago
  • Updated about 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A distributed representation method for online logs.

Paper

Our paper is published on The 29th International Conference on Computer Communications and Networks (ICCCN 2020,). The information can be found here:

  • Weibin Meng, Ying Liu, Yuheng Huang, Shenglin Zhang, Federico Zaiter, Bingjin Chen, Dan Pei. A Semantic-aware Representation Framework for Online Log Analysis. ICCCN 2020. August 3 - August 6, 2020, Honolulu, Hawaii, USA.

Dependency

1. nltk, nltk.download("wordnet")
2. spacy, spacy.load("en_core_web_md")
3. progressbar
4. dynet (python3)

Quick Start

cd code/LRWE/src/ 
make clean
make 

# prepare the middle results 
python pipeline.py -i data/HDFS.log -t HDFS -o results/
  -i input rawlog
  -t name of logs
  -o output path

# do experiments for log2vec
python log2vec.py -i results -t HDFS
  -i input path
  -t name of logs

Directory Structure

.
|-- code
|   |-- get_syn_ant.py
|   |-- get_triplet.py
|   |-- getTempLogs.py
|   |-- Log2Vec.py
|   |-- utils.py
|   |-- kmeans.py
|   |-- LRWE/
|   |-- mimick/
|   |-- preprocessing.py
|
|-- log2vec.py
|-- pipeline.py
|-- statistics.py
|-- sample.py
|-- data
|   |-- HDFS.log #sample data

File Descriptions

preprocessing.py

#Filter variables in the logs
python code/preprocessing.py -rawlog ./data/BGL.log

  -rawlog:raw logs

Antonyms&Synonyms Extraction

#Extract antonyms and synonyms 
python code/get_syn_ant.py -logs ./data/BGL_without_variables.log -ant_file ./middle/ants.txt -syn_file ./middle/syns.txt

  -logs: logs
  -ant_file: antonyms
  -syn_file: synonyms

Relation Triple Extraction

python code/get_triplet.py data/BGL_without_variables.log middle/bgl_triplet.txt

  data/BGL_without_variables.log: logs
  middle/bgl_triples.txt: triples
#If -s is added, temporary saving will be enabled. By default, every 10000 pieces will be saved, named "temp\_" + output\_file
python code/get_triplet.py input_file output_file -s
#If another parameter is added after -s, the number of bars saved per time is modified
python code/get_triplet.py input_file output_file -s 50000 

Semantic Word Embedding

#Convert log file to single line for training
python code/getTempLogs.py -input data/BGL_without_variables.log -output middle/BGL_without_variables_for_training.log
cd code/LRWE/src/ 
make clean
make #make before you run

#The input file for training is the file obtained in the previous step
./lrcwe -train ../../middle/BGL_without_variables_for_training.log  -synonym ../../middle/syns.txt  -antonym ../../middle/ants.txt -output ../../middle/bgl_words.model -save-vocab ../../middle/bgl.vocab -belta-rel 0.8 - alpha-rel 0.01  -alpha-ant 0.3 -size 32 -min-count 1 -triplet ../../middle/bgl_triplet.txt

Handle OOV Words

#Read the original vector file
python code/mimick/make_dataset.py --vectors middle/bgl_words.model --w2v-format --output middle/bgl_words.pkl

  --vectors:Results of w2v, the first row is the number of rows and dimensions (can be omitted), the format of each subsequent row is word + word vector: word d1 d2... d32
#Train the new embedding according to oov
python code/mimick/model.py --dataset middle/bgl_words.pkl  --vocab middle/testvocab.txt --output middle/oov.vector

  --dataset:Output of the first step
  --vocab:New words, you can write multiple words in batches, one word per line
  --output:Embedding file for new words

Generate vector for logs

python code/Log2Vec.py -logs ./data/BGL_without_variables.log -word_model ./middle/bgl_words.model -log_vector_file ./middle/bgl_log.vector -dimension 32

This code was completed by @Weibin Meng, Yuheng Huang and Bingjin Chen in cooperation.

More Repositories

1

OmniAnomaly

KDD 2019: Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network
Python
729
star
2

donut

WWW 2018: Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications
Python
463
star
3

TraceAnomaly

ISSRE'20: Unsupervised Detection of Microservice Trace Anomalies through Service-Level Deep Bayesian Networks
Python
313
star
4

LogParse

An adaptive log template extraction toolkit.
Python
218
star
5

LogClass

IEEE-TNSM 2021: Anomalous Log Identification and Classification with Partial Labels
Python
167
star
6

Squeeze

ISSRE 2019: Generic and Robust Localization of Multi-Dimensional Root Cause
Python
96
star
7

KPI-Anomaly-Detection

2018AIOps: The 1st match for AIOps
80
star
8

TraceRCA

Practical Root Cause Localization for Microservice Systems via Trace Analysis. IWQoS 2021
Python
74
star
9

CIRCA

Causal Inference-based Root Cause Analysis
Python
73
star
10

DejaVu

Code and datasets for FSE'22 paper "Actionable and Interpretable Fault Localization for Recurring Failures in Online Service Systems"
Jupyter Notebook
73
star
11

AIOps-Challenge-2020-Data

The published dataset of AIOps Challenge 2020
62
star
12

Bagel

IPCCC 2018: Robust and Unsupervised KPI Anomaly Detection Based on Conditional Variational Autoencoder
Python
50
star
13

JumpStarter

Python
44
star
14

PSqueeze

Python
30
star
15

MultiDimension-Localization

2019AIOps: The 2nd match for AIOps
23
star
16

TraceVAE

The source code for "Unsupervised Anomaly Detection on Microservice Traces through Graph VAE" in WWW2023.
Python
18
star
17

OpsEval-Datasets

Datasets for OpsEval
Python
15
star
18

CTF_data

Data of paper "CTF: Anomaly Detection in High-Dimensional Time Series with Coarse-to-Fine Model Transfer"
13
star
19

DOMI_code

code for DOMI
Python
11
star
20

kontrast

Python
9
star
21

aiops2020-judge

AIOps2020评测脚本
Python
7
star
22

CMDiagnostor

Python
7
star
23

DOMI_dataset

DOMI dataset
7
star
24

AutoKAD

Python
6
star
25

RC-LIR

Python
5
star
26

GTrace

Source code for GTrace (ESEC/FSE'23 industry track).
Python
4
star
27

KAD-Disformer

Python
3
star
28

AnoTuner

Python
3
star
29

PreFix

SIGMETRICS 2018: PreFix: Switch Failure Prediction in Datacenter Networks
2
star
30

course.aiops.org

HTML
2
star
31

aiops-2022-judge

2022挑战赛评测脚本
Python
2
star
32

Chain-of-Event

Python
2
star
33

DejaVu-Omni

Code and datasets for TOSEM paper "DejaVu-Omni: Actionable, Robust and Interpretable Fault Localization for Recurring Failures in Online Service Systems"
Jupyter Notebook
1
star
34

AlertRCA

Python
1
star
35

OpenCompass-OpsQA

Python
1
star