• Stars
    star
    103
  • Rank 333,046 (Top 7 %)
  • Language
    Ruby
  • License
    Other
  • Created over 10 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Fluentd's Grok parser

Grok Parser for Fluentd

Testing on Ubuntu Testing on macOS

This is a Fluentd plugin to enable Logstash's Grok-like parsing logic.

Requirements

fluent-plugin-grok-parser fluentd ruby
>= 2.0.0 >= v0.14.0 >= 2.1
< 2.0.0 >= v0.12.0 >= 1.9

What's Grok?

Grok is a macro to simplify and reuse regexes, originally developed by Jordan Sissel.

This is a partial implementation of Grok's grammer that should meet most of the needs.

How It Works

You can use it wherever you used the format parameter to parse texts. In the following example, it extracts the first IP address that matches in the log.

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    grok_pattern %{IP:ip_address}
  </parse>
</source>

If you want to try multiple grok patterns and use the first matched one, you can use the following syntax:

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    <grok>
      pattern %{HTTPD_COMBINEDLOG}
      time_format "%d/%b/%Y:%H:%M:%S %z"
    </grok>
    <grok>
      pattern %{IP:ip_address}
    </grok>
    <grok>
      pattern %{GREEDYDATA:message}
    </grok>
  </parse>
</source>

Multiline support

You can parse multiple line text.

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type multiline_grok
    grok_pattern %{IP:ip_address}%{GREEDYDATA:message}
    multiline_start_regexp /^[^\s]/
  </parse>
</source>

You can use multiple grok patterns to parse your data.

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type multiline_grok
    <grok>
      pattern Started %{WORD:verb} "%{URIPATH:pathinfo}" for %{IP:ip} at %{TIMESTAMP_ISO8601:timestamp}\nProcessing by %{WORD:controller}#%{WORD:action} as %{WORD:format}%{DATA:message}Completed %{NUMBER:response} %{WORD} in %{NUMBER:elapsed} (%{DATA:elapsed_details})
    </grok>
  </parse>
</source>

Fluentd accumulates data in the buffer forever to parse complete data when no pattern matches.

You can use this parser without multiline_start_regexp when you know your data structure perfectly.

Configurations

  • See also: Config: Parse Section - Fluentd

  • time_format (string) (optional): The format of the time field.

  • grok_pattern (string) (optional): The pattern of grok. You cannot specify multiple grok pattern with this.

  • custom_pattern_path (string) (optional): Path to the file that includes custom grok patterns

  • grok_failure_key (string) (optional): The key has grok failure reason.

  • grok_name_key (string) (optional): The key name to store grok section's name

  • multi_line_start_regexp (string) (optional): The regexp to match beginning of multiline. This is only for "multiline_grok".

  • grok_pattern_series (enum) (optional): Specify grok pattern series set.

    • Default value: legacy.

<grok> section (optional) (multiple)

  • name (string) (optional): The name of this grok section
  • pattern (string) (required): The pattern of grok
  • keep_time_key (bool) (optional): If true, keep time field in the record.
  • time_key (string) (optional): Specify time field for event time. If the event doesn't have this field, current time is used.
    • Default value: time.
  • time_format (string) (optional): Process value using specified format. This is available only when time_type is string
  • timezone (string) (optional): Use specified timezone. one can parse/format the time value in the specified timezone.

Examples

Using grok_failure_key

<source>
  @type dummy
  @label @dummy
  dummy [
    { "message1": "no grok pattern matched!", "prog": "foo" },
    { "message1": "/", "prog": "bar" }
  ]
  tag dummy.log
</source>

<label @dummy>
  <filter>
    @type parser
    key_name message1
    reserve_data true
    reserve_time true
    <parse>
      @type grok
      grok_failure_key grokfailure
      <grok>
        pattern %{PATH:path}
      </grok>
    </parse>
  </filter>
  <match dummy.log>
    @type stdout
  </match>
</label>

This generates following events:

2016-11-28 13:07:08.009131727 +0900 dummy.log: {"message1":"no grok pattern matched!","prog":"foo","message":"no grok pattern matched!","grokfailure":"No grok pattern matched"}
2016-11-28 13:07:09.010400923 +0900 dummy.log: {"message1":"/","prog":"bar","path":"/"}

Using grok_name_key

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    grok_name_key grok_name
    grok_failure_key grokfailure
    <grok>
      name apache_log
      pattern %{HTTPD_COMBINEDLOG}
      time_format "%d/%b/%Y:%H:%M:%S %z"
    </grok>
    <grok>
      name ip_address
      pattern %{IP:ip_address}
    </grok>
    <grok>
      name rest_message
      pattern %{GREEDYDATA:message}
    </grok>
  </parse>
</source>

This will add keys like following:

  • Add grok_name: "apache_log" if the record matches HTTPD_COMBINEDLOG
  • Add grok_name: "ip_address" if the record matches IP
  • Add grok_name: "rest_message" if the record matches GREEDYDATA

Add grokfailure key to the record if the record does not match any grok pattern. See also test code for more details.

How to parse time value using specific timezone

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    <grok>
      name mylog-without-timezone
      pattern %{DATESTAMP:time} %{GREEDYDATE:message}
      timezone Asia/Tokyo
    </grok>
  </parse>
</source>

This will parse the time value as "Asia/Tokyo" timezone.

See Config: Parse Section - Fluentd for more details about timezone.

How to write Grok patterns

Grok patterns look like %{PATTERN_NAME:name} where ":name" is optional. If "name" is provided, then it becomes a named capture. So, for example, if you have the grok pattern

%{IP} %{HOST:host}

it matches

127.0.0.1 foo.example

but only extracts "foo.example" as {"host": "foo.example"}

Please see patterns/* for the patterns that are supported out of the box.

How to add your own Grok pattern

You can add your own Grok patterns by creating your own Grok file and telling the plugin to read it. This is what the custom_pattern_path parameter is for.

<source>
  @type tail
  path /path/to/log
  <parse>
    @type grok
    grok_pattern %{MY_SUPER_PATTERN}
    custom_pattern_path /path/to/my_pattern
  </parse>
</source>

custom_pattern_path can be either a directory or file. If it's a directory, it reads all the files in it.

FAQs

1. How can I convert types of the matched patterns like Logstash's Grok?

Although every parsed field has type string by default, you can specify other types. This is useful when filtering particular fields numerically or storing data with sensible type information.

The syntax is

grok_pattern %{GROK_PATTERN:NAME:TYPE}...

e.g.,

grok_pattern %{INT:foo:integer}

Unspecified fields are parsed at the default string type.

The list of supported types are shown below:

  • string
  • bool
  • integer ("int" would NOT work!)
  • float
  • time
  • array

For the time and array types, there is an optional 4th field after the type name. For the "time" type, you can specify a time format like you would in time_format.

For the "array" type, the third field specifies the delimiter (the default is ","). For example, if a field called "item_ids" contains the value "3,4,5", types item_ids:array parses it as ["3", "4", "5"]. Alternatively, if the value is "Adam|Alice|Bob", types item_ids:array:| parses it as ["Adam", "Alice", "Bob"].

Here is a sample config using the Grok parser with in_tail and the types parameter:

<source>
  @type tail
  path /path/to/log
  format grok
  grok_pattern %{INT:user_id:integer} paid %{NUMBER:paid_amount:float}
  tag payment
</source>

Notice

If you want to use this plugin with Fluentd v0.12.x or earlier, you can use this plugin version v1.x.

See also: Plugin Management | Fluentd

License

Apache 2.0 License

More Repositories

1

fluentd

Fluentd: Unified Logging Layer (project under CNCF)
Ruby
12,329
star
2

fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
C
5,812
star
3

fluentd-kubernetes-daemonset

Fluentd daemonset for Kubernetes and it Docker image
Ruby
1,210
star
4

fluentd-ui

Web UI for Fluentd
Ruby
596
star
5

fluent-operator

Operate Fluent Bit and Fluentd in the Kubernetes way - Previously known as FluentBit Operator
Go
580
star
6

fluent-bit-kubernetes-logging

Fluent Bit Kubernetes Daemonset
466
star
7

fluentd-docker-image

Docker image for Fluentd
Dockerfile
452
star
8

fluent-logger-python

A structured logger for Fluentd (Python)
Python
424
star
9

fluent-logger-golang

A structured logger for Fluentd (Golang)
Go
380
star
10

helm-charts

Helm Charts for Fluentd and Fluent Bit
Mustache
376
star
11

fluent-plugin-s3

Amazon S3 input and output plugin for Fluentd
Ruby
308
star
12

fluent-plugin-kafka

Kafka input and output plugin for Fluentd
Ruby
298
star
13

fluentd-forwarder

Fluentd Forwarder: Lightweight Data Collector in Golang
Go
283
star
14

fluent-logger-node

A structured logger for Fluentd (Node.js)
JavaScript
257
star
15

fluent-plugin-prometheus

A fluent plugin that collects metrics and exposes for Prometheus.
Ruby
253
star
16

fluent-logger-ruby

A structured logger for Fluentd (Ruby)
Ruby
251
star
17

fluent-logger-php

A structured logger for Fluentd (PHP)
PHP
216
star
18

fluent-logger-java

A structured logger for Fluentd (Java)
Java
205
star
19

sigdump

Use signal to show stacktrace of a Ruby process without restarting it
Ruby
183
star
20

fluent-bit-go

Fluent Bit Golang package to build plugins
Go
173
star
21

fluent-plugin-mongo

MongoDB input and output plugin for Fluentd
Ruby
171
star
22

fluent-plugin-rewrite-tag-filter

Fluentd Output filter plugin to rewrite tags that matches specified attribute.
Ruby
168
star
23

fluent-bit-docs

Fluent Bit - Official Documentation
Shell
119
star
24

fluent-plugin-sql

SQL input/output plugin for Fluentd
Ruby
102
star
25

nginx-fluentd-module

Nginx module for Fluentd data collector
C
85
star
26

fluent-bit-docker-image

Docker image for Fluent Bit
Shell
67
star
27

fluent-plugin-webhdfs

Hadoop WebHDFS output plugin for Fluentd
Ruby
59
star
28

fluent-plugin-opensearch

OpenSearch Plugin for Fluentd
Ruby
49
star
29

fluentd-docs

This repository is deprecated. Go to fluentd-docs-gitbook repository.
Ruby
49
star
30

fluentd-benchmark

Benchmark collection of fluentd use cases
Shell
47
star
31

fluent-logger-scala

A structured logger implementation in Scala.
Shell
45
star
32

NLog.Targets.Fluentd

C#
44
star
33

fluent-logger-perl

A structured logger for Fluentd (Perl)
Perl
43
star
34

fluent-plugin-multiprocess

Multiprocess agent plugin for Fluentd
Ruby
42
star
35

fluentd-docs-gitbook

Fluentd documentation project in Gitbook format
JavaScript
41
star
36

fluent-plugin-splunk

Fluentd Plugin for Splunk
Ruby
38
star
37

fluent-plugin-windows-eventlog

Fluentd plugin to collect windows event logs
Ruby
33
star
38

fluent-plugin-parser-cri

CRI log parser for Fluentd
Ruby
32
star
39

fluent-bit-perf

Fluent Bit Performance Tools
C
31
star
40

fluent-plugin-flume

Flume input and output plugin for Fluentd
Ruby
23
star
41

kafka-connect-fluentd

Kafka Connect for Fluentd
Java
23
star
42

chunkio

Simple library to manage chunks of data in memory and file system
C
21
star
43

fluent-package-builder

td-agent (Fluentd) Building and Packaging System
Shell
21
star
44

fluent-plugin-scribe

Scribe input/output plugin for Fluentd data collector
Ruby
20
star
45

fluent-plugins

18
star
46

cmetrics

A standalone library to create and manipulate metrics in C
C
15
star
47

website

http://fluentd.org/
CSS
14
star
48

fluent-plugin-sanitizer

Ruby
14
star
49

fluent-bit-plugin

Fluent Bit Dynamic Plugin Development
C
13
star
50

fluent-bit-packaging

Fluent Bit Linux Packaging environment using Docker
Dockerfile
12
star
51

fluent-logger-forward-node

A fluent forward protocol implementation for Node.js
TypeScript
11
star
52

fluentd-website

For fluentd.org
CSS
10
star
53

fluent-logger-erlang

A structured logger for Fluentd (Erlang)
Erlang
10
star
54

fluent-bit-ci

CI/CD for Fluent-bit
Shell
8
star
55

fluent-plugin-msgpack-rpc

MessagePack-RPC input plugin for Fluentd data collector
Ruby
8
star
56

fluent-logger-ocaml

A structured logger for Fluentd (OCaml)
OCaml
7
star
57

fluent-plugin-hoop

Hoop (HDFS over HTTP) Plugin for Fluentd data collector
Ruby
6
star
58

data-collection

Data Collection with Fluentd
6
star
59

fluent-logger-d

A structured logger for Fluentd (D)
JavaScript
6
star
60

diagtool

Bringing productivity of trouble shooting to the next level by automating collection of Fluentd configurations, settings and OS parameters as well as masking sensitive information in logs and configurations.
Ruby
5
star
61

fluent-bit-tutorials

Fluent Bit Tutorials, custom articles to get started
5
star
62

m3-workshop-fluentcon

Shell
4
star
63

fluentbit-website-v3

CSS
4
star
64

fluent.github.com

website
JavaScript
4
star
65

fluentd-aggregator-docker-image

A Fluentd container image to be used for log aggregation and based on the official Fluentd Docker image.
Dockerfile
4
star
66

fluent-bit-observability-demo

JavaScript
3
star
67

fluent-bit-docs-stream-processing

Fluent Bit Stream Processing Guide
3
star
68

onigmo

Onigmo library with security and stable patches on top by Fluent maintainers
C
3
star
69

fluent-bit-website

Fluent Bit Website (work in process)
HTML
3
star
70

fluent-plugin-sd-dns

DNS based service discovery plugin for Fluentd
Ruby
2
star
71

fluent-bit-test

Testing infrastructure for Fluent Bit
2
star
72

fluent-bit-labs

Fluent Bit Dev Labs
2
star
73

fluent-bit-website-old

Fluent Bit website
CSS
2
star
74

fluentbit-website-v2

Fluent Bit Website v2
CSS
2
star
75

fluent-plugin-buffer-chunkio

Ruby
2
star
76

fluent-bit-chatops-demo

Demo of using Fluent Bit for ChatOps - created for Cloud Native Rejekts EU 2024 talk
Java
2
star
77

fluent-bit-infra

Automation related to fluent-bit infrastructure
HCL
2
star
78

fluent-plugin-parser-winevt_xml

Fluentd Parser plugin to parse XML rendered windows event log.
Ruby
1
star
79

cfl

Tiny library for data structures management, call it c:\ floppy
C
1
star
80

fluentd-docs-kubernetes

Fluentd DaemonSet Documentation for Kubernetes
1
star
81

fluent-bit-sandbox

A repository to covering the setup and configuration of the Fluent Bit Sandbox.
Shell
1
star
82

fluent-plugin-prometheus_pushgateway

Ruby
1
star
83

fluentd-website-hugo

SCSS
1
star
84

ctraces

Library to create and manipulate traces in C
C
1
star