  • Language
    Python
  • License
    MIT License
  • Created about 5 years ago
  • Updated almost 3 years ago

logmine - a log pattern analyzer CLI

A command-line tool to help you quickly inspect your log files and identify patterns.

Install

pip install logmine

Usage

cat sample/Apache_2k.log | logmine

logmine groups the log lines into clusters that share a common pattern and reports the number of messages in each cluster.

You can get more granular clusters by lowering the -m value: the lower the value, the more detail you get.

cat sample/Apache_2k.log | logmine -m0.2

The text in red is a placeholder for the multiple values that fit that position in the pattern. You can replace it with your own placeholder.

cat sample/Apache_2k.log | logmine -m0.2 -p'---'
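To illustrate what a placeholder stands for, here is a minimal sketch that merges two similar lines into one pattern. It is a simplification for illustration only, not logmine's actual pattern generation: the function name merge_pattern is hypothetical, and it assumes both lines have the same number of fields.

```python
def merge_pattern(fields_a, fields_b, placeholder="---"):
    """Keep fields that agree; replace differing fields with the placeholder.

    Simplifying assumption: both lines have the same number of fields.
    """
    return [a if a == b else placeholder for a, b in zip(fields_a, fields_b)]


line_a = "[notice] child 2337 started".split()
line_b = "[notice] child 2338 started".split()
print(" ".join(merge_pattern(line_a, line_b)))
# prints: [notice] child --- started
```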

You can define variables to reduce the number of unnecessary patterns and get fewer clusters. For example, the command below replaces all time texts with the <time> variable.

cat sample/Apache_2k.log | logmine -m0.2 -p'---' -v "<time>:/\\d{2}:\\d{2}:\\d{2}/"
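The effect of a variable can be sketched in a few lines of Python. This is only an illustration of the "name:/regex/" format, not logmine's internal code; the helper names parse_variable and apply_variables are hypothetical.

```python
import re


def parse_variable(spec):
    """Split a "name:/regex/" string into (name, compiled regex)."""
    name, sep, rest = spec.partition(":/")
    if not sep or not rest.endswith("/"):
        raise ValueError(f"expected name:/regex/, got {spec!r}")
    return name, re.compile(rest[:-1])


def apply_variables(line, specs):
    """Replace every match of each variable's regex with the variable name."""
    for spec in specs:
        name, regex = parse_variable(spec)
        # A function replacement avoids interpreting backslashes in the name.
        line = regex.sub(lambda _match: name, line)
    return line


specs = [r"<time>:/\d{2}:\d{2}:\d{2}/"]
print(apply_variables("Sun Dec 04 04:47:44 2005 error", specs))
# prints: Sun Dec 04 <time> 2005 error
```

After this substitution, lines that differ only in their timestamps contain the same field at that position, so they no longer count as different there.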

See all available options in the CLI options section.

How it works

LogMine is an implementation of the paper of the same name, LogMine: Fast Pattern Recognition for Log Analytics. The idea is to use a distance function to calculate the distance between two log lines and group similar lines into clusters.

The distance function is designed to work well on log datasets, where all log messages from the same application are generated by a finite set of formats.
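A field-by-field distance of this kind can be sketched as follows. This is a minimal sketch based on my reading of the paper, not logmine's actual code: each pair of equal fields scores k1, and the total is normalized by the length of the longer line. Variable handling (the k2 weight) is left out, and the function name distance is an assumption.

```python
import re


def distance(line_a, line_b, k1=1.0, delimiter=r"\s+"):
    """Return a distance in [0, 1]: 0 for identical lines, 1 for no shared fields."""
    fields_a = re.split(delimiter, line_a.strip())
    fields_b = re.split(delimiter, line_b.strip())
    longest = max(len(fields_a), len(fields_b))
    # Compare fields position by position; equal fields contribute k1.
    score = sum(k1 for a, b in zip(fields_a, fields_b) if a == b)
    return 1.0 - score / longest


print(distance("jk2_init() Found child 6725", "jk2_init() Found child 6726"))
# prints: 0.25
```

Lines produced by the same format string share most of their fields, so their distance is small; lines from different formats rarely align and end up close to 1.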

The Max Distance variable (max_dist or the -m option) represents the maximum distance allowed between log messages in the same cluster. The smaller max_dist is, the more clusters are generated. This is useful for analyzing a set of log messages at multiple levels of detail.
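The effect of max_dist can be seen in a small one-pass clustering sketch. It is an illustration under stated assumptions, not logmine's implementation: each line is compared against the first member of every existing cluster (used here as the representative), and the simple field-matching distance below is a stand-in for the real one.

```python
def distance(line_a, line_b):
    """Fraction of positions where the whitespace-separated fields differ."""
    fields_a, fields_b = line_a.split(), line_b.split()
    same = sum(1 for a, b in zip(fields_a, fields_b) if a == b)
    return 1.0 - same / max(len(fields_a), len(fields_b), 1)


def cluster(lines, max_dist):
    """Assign each line to the first cluster within max_dist, else start a new one."""
    clusters = []  # each entry is [representative line, member count]
    for line in lines:
        for entry in clusters:
            if distance(entry[0], line) <= max_dist:
                entry[1] += 1
                break
        else:
            clusters.append([line, 1])
    return clusters


logs = [
    "workerEnv.init() ok /etc/httpd/conf/workers2.properties",
    "mod_jk child workerEnv in error state 6",
    "mod_jk child workerEnv in error state 7",
    "jk2_init() Found child 6725 in scoreboard slot 10",
    "jk2_init() Found child 6726 in scoreboard slot 8",
]
for max_dist in (0.6, 0.1):
    print(max_dist, len(cluster(logs, max_dist)))
# prints: 0.6 3
# prints: 0.1 5
```

With max_dist at 0.6 the near-identical lines merge into shared clusters; at 0.1 even a single differing field out of seven is too far, so every line ends up in its own cluster.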

More details on the clustering algorithm and pattern generation are available in the paper.

Features

  • Customizable max_dist and many other variables
  • Parallel processing on multiple cores
  • Colorful output
  • Support pipe/redirect
  • No dependencies
  • Tail mode: watch the clusters on a continuous input stream (TODO)
  • Sampling to reduce processing time on a large dataset (TODO)

Contribute / Development

  • All contributions are welcome

  • Install virtualenv (and optionally twine if you intend to publish):

      python3 -m pip install virtualenv twine
    
  • Create the virtualenv (if it does not exist yet):

      python3 -m virtualenv -p $(which python3) .v
    
  • Activate the virtualenv:

      source ./.v/bin/activate
    
  • Run tests:

      ./test.sh
    
  • Run the dev version:

      ./logmine sample/Apache_2k.log
    
  • Publish:

    • Update the version value in setup.py following semver.
    • Run ./publish.sh

CLI options

usage: logmine [-h] [-m MAX_DIST] [-v [VARIABLES [VARIABLES ...]]]
               [-d DELIMETERS] [-i MIN_MEMBERS] [-k1 K1] [-k2 K2]
               [-s {desc,asc}] [-da] [-p PATTERN_PLACEHOLDER] [-dhp] [-dm]
               [-dhv] [-c]
               [file [file ...]]

LogMine: a log pattern analyzer

positional arguments:
  file                  Filenames or glob pattern to analyze. Default: stdin

optional arguments:
  -h, --help            show this help message and exit
  -m MAX_DIST, --max-dist MAX_DIST
                        This parameter control how the granularity of the
                        clustering algorithm. Lower the value will provide
                        more granular clusters (more clusters generated).
                        Default: 0.6
  -v [VARIABLES [VARIABLES ...]], --variables [VARIABLES [VARIABLES ...]]
                        List of variables to replace before process the log
                        file. A variable is a pair of name and a regex
                        pattern. Format: "name:/regex/". During processing
                        time, LogMine will consider all texts that match
                        varible regexes to be the same value. This is useful
                        to reduce the number of unnecessary cluster generated,
                        with trade off of processing time. Default: None
  -d DELIMETERS, --delimeters DELIMETERS
                        A regex pattern used to split a line into multiple
                        fields. Default: "\s+"
  -i MIN_MEMBERS, --min-members MIN_MEMBERS
                        Minimum number of members in a cluster to show in the
                        result. Default: 2
  -k1 K1, --fixed-value-weight K1
                        Internal weighting variable. This value will be used
                        as the weight value when two fields have the same
                        value. This is used in the score function to calculate
                        the distance between two lines. Default: 1
  -k2 K2, --variable-weight K2
                        Similar to k1 but for comparing variables. Two
                        variable is considering the same if they have same
                        name. Default: 1
  -s {desc,asc}, --sorted {desc,asc}
                        Sort the clusters by number of members. Default: desc
  -da, --disable-number-align
                        Disable number align in output. Default: True
  -p PATTERN_PLACEHOLDER, --pattern-placeholder PATTERN_PLACEHOLDER
                        Use a string as placeholder for patterns in output.
                        Default: None
  -dhp, --disable-highlight-patterns
                        Disable highlighting for patterns in output. Default:
                        True
  -dm, --disable-mask-variables
                        Disable masks for variables in output. When disabled
                        variables will be shown as the actual value. Default:
                        True
  -dhv, --disable-highlight-variables
                        Disable highlighting for variables in output. Default:
                        True
  -c, --single-core     Force LogMine to only run on 1 core. This will
                        increase the processing time. Note: the result output
                        can be different compare to when run with multicores,
                        this is expected. Default: False
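
As a rough illustration of what the -d option controls, the sketch below splits a line into fields with a regex delimiter. The helper split_fields is hypothetical and only mirrors the documented default of "\s+"; it is not taken from logmine's source.

```python
import re


def split_fields(line, delimiters=r"\s+"):
    """Split a log line into fields using a regex delimiter, dropping empty fields."""
    return [field for field in re.split(delimiters, line.strip()) if field]


print(split_fields("[Sun Dec 04] [error] mod_jk child"))
# prints: ['[Sun', 'Dec', '04]', '[error]', 'mod_jk', 'child']
print(split_fields("a=1;b=2,c=3", delimiters=r"[;,]"))
# prints: ['a=1', 'b=2', 'c=3']
```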

Capturing the analysis result in a buffer

By default, logmine writes the analysis results to stdout. To capture this output, pass a file-like object to the set_output_file() method, as in the example below:

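import io

# Import LogMine from the installed logmine package first;
# the import path is not shown in this example.

buffer = io.StringIO()
lm = LogMine()  # pass the usual parameters
lm.output.set_output_file(file=buffer)
lm.run()
# The captured output can be accessed in the buffer.
print(buffer.getvalue())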
