• Stars
    star
    160
  • Rank 233,336 (Top 5 %)
  • Language Roff
  • Created over 4 years ago
  • Updated almost 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A distributed representation method for online logs.

Paper

Our paper is published on The 29th International Conference on Computer Communications and Networks (ICCCN 2020,). The information can be found here:

  • Weibin Meng, Ying Liu, Yuheng Huang, Shenglin Zhang, Federico Zaiter, Bingjin Chen, Dan Pei. A Semantic-aware Representation Framework for Online Log Analysis. ICCCN 2020. August 3 - August 6, 2020, Honolulu, Hawaii, USA.

Dependency

1. nltk, nltk.download("wordnet")
2. spacy, spacy.load("en_core_web_md")
3. progressbar
4. dynet (python3)

Quick Start

cd code/LRWE/src/ 
make clean
make 

# prepare the middle results 
python pipeline.py -i data/HDFS.log -t HDFS -o results/
  -i input rawlog
  -t name of logs
  -o output path

# do experiments for log2vec
python log2vec.py -i results -t HDFS
  -i input path
  -t name of logs

Directory Structure

.
|-- code
|   |-- get_syn_ant.py
|   |-- get_triplet.py
|   |-- getTempLogs.py
|   |-- Log2Vec.py
|   |-- utils.py
|   |-- kmeans.py
|   |-- LRWE/
|   |-- mimick/
|   |-- preprocessing.py
|
|-- log2vec.py
|-- pipeline.py
|-- statistics.py
|-- sample.py
|-- data
|   |-- HDFS.log #sample data

File Descriptions

preprocessing.py

#Filter variables in the logs
python code/preprocessing.py -rawlog ./data/BGL.log

  -rawlog:raw logs

Antonyms&Synonyms Extraction

#Extract antonyms and synonyms 
python code/get_syn_ant.py -logs ./data/BGL_without_variables.log -ant_file ./middle/ants.txt -syn_file ./middle/syns.txt

  -logs: logs
  -ant_file: antonyms
  -syn_file: synonyms

Relation Triple Extraction

python code/get_triplet.py data/BGL_without_variables.log middle/bgl_triplet.txt

  data/BGL_without_variables.log: logs
  middle/bgl_triples.txt: triples
#If -s is added, temporary saving will be enabled. By default, every 10000 pieces will be saved, named "temp\_" + output\_file
python code/get_triplet.py input_file output_file -s
#If another parameter is added after -s, the number of bars saved per time is modified
python code/get_triplet.py input_file output_file -s 50000 

Semantic Word Embedding

#Convert log file to single line for training
python code/getTempLogs.py -input data/BGL_without_variables.log -output middle/BGL_without_variables_for_training.log
cd code/LRWE/src/ 
make clean
make #make before you run

#The input file for training is the file obtained in the previous step
./lrcwe -train ../../middle/BGL_without_variables_for_training.log  -synonym ../../middle/syns.txt  -antonym ../../middle/ants.txt -output ../../middle/bgl_words.model -save-vocab ../../middle/bgl.vocab -belta-rel 0.8 - alpha-rel 0.01  -alpha-ant 0.3 -size 32 -min-count 1 -triplet ../../middle/bgl_triplet.txt

Handle OOV Words

#Read the original vector file
python code/mimick/make_dataset.py --vectors middle/bgl_words.model --w2v-format --output middle/bgl_words.pkl

  --vectors:Results of w2v, the first row is the number of rows and dimensions (can be omitted), the format of each subsequent row is word + word vector: word d1 d2... d32
#Train the new embedding according to oov
python code/mimick/model.py --dataset middle/bgl_words.pkl  --vocab middle/testvocab.txt --output middle/oov.vector

  --dataset:Output of the first step
  --vocab:New words, you can write multiple words in batches, one word per line
  --output:Embedding file for new words

Generate vector for logs

python code/Log2Vec.py -logs ./data/BGL_without_variables.log -word_model ./middle/bgl_words.model -log_vector_file ./middle/bgl_log.vector -dimension 32

This code was completed by @Weibin Meng, Yuheng Huang and Bingjin Chen in cooperation.

More Repositories

1

OmniAnomaly

KDD 2019: Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network
Python
716
star
2

donut

WWW 2018: Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications
Python
461
star
3

TraceAnomaly

ISSRE'20: Unsupervised Detection of Microservice Trace Anomalies through Service-Level Deep Bayesian Networks
Python
311
star
4

LogParse

An adaptive log template extraction toolkit.
Python
217
star
5

LogClass

IEEE-TNSM 2021: Anomalous Log Identification and Classification with Partial Labels
Python
166
star
6

Squeeze

ISSRE 2019: Generic and Robust Localization of Multi-Dimensional Root Cause
Python
91
star
7

KPI-Anomaly-Detection

2018AIOps: The 1st match for AIOps
77
star
8

DejaVu

Code and datasets for FSE'22 paper "Actionable and Interpretable Fault Localization for Recurring Failures in Online Service Systems"
Jupyter Notebook
72
star
9

TraceRCA

Practical Root Cause Localization for Microservice Systems via Trace Analysis. IWQoS 2021
Python
69
star
10

CIRCA

Causal Inference-based Root Cause Analysis
Python
67
star
11

AIOps-Challenge-2020-Data

The published dataset of AIOps Challenge 2020
60
star
12

Bagel

IPCCC 2018: Robust and Unsupervised KPI Anomaly Detection Based on Conditional Variational Autoencoder
Python
50
star
13

JumpStarter

Python
42
star
14

PSqueeze

Python
28
star
15

MultiDimension-Localization

2019AIOps: The 2nd match for AIOps
23
star
16

TraceVAE

The source code for "Unsupervised Anomaly Detection on Microservice Traces through Graph VAE" in WWW2023.
Python
17
star
17

CTF_data

Data of paper "CTF: Anomaly Detection in High-Dimensional Time Series with Coarse-to-Fine Model Transfer"
13
star
18

OpsEval-Datasets

Datasets for OpsEval
Python
12
star
19

DOMI_code

code for DOMI
Python
11
star
20

kontrast

Python
9
star
21

aiops2020-judge

AIOps2020评测脚本
Python
7
star
22

CMDiagnostor

Python
7
star
23

DOMI_dataset

DOMI dataset
7
star
24

RC-LIR

Python
5
star
25

AutoKAD

Python
5
star
26

GTrace

Source code for GTrace (ESEC/FSE'23 industry track).
Python
4
star
27

KAD-Disformer

Python
3
star
28

AnoTuner

Python
3
star
29

PreFix

SIGMETRICS 2018: PreFix: Switch Failure Prediction in Datacenter Networks
2
star
30

course.aiops.org

HTML
2
star
31

aiops-2022-judge

2022挑战赛评测脚本
Python
2
star
32

DejaVu-Omni

Code and datasets for TOSEM paper "DejaVu-Omni: Actionable, Robust and Interpretable Fault Localization for Recurring Failures in Online Service Systems"
Jupyter Notebook
1
star
33

AlertRCA

Python
1
star
34

OpenCompass-OpsQA

Python
1
star