Loggie (loggie-io/loggie)

Loggie is a lightweight, high-performance, cloud-native agent and aggregator written in Go.

Supports multiple pipelines and pluggable components, including data transfer, filtering, parsing, and alerting.
Uses native Kubernetes CRDs for operation and management.
Offers a range of observability, reliability, and automation features suitable for production environments.

Loggie can serve as the foundation for a scalable, cloud-native log data platform.

Features

Next-generation cloud-native log collection and transmission

Building pipelines with CRDs

Loggie includes the LogConfig, ClusterLogConfig, Interceptor, and Sink CRDs, so pipelines for collecting, transferring, processing, and sending data can be created by writing simple YAML files.

For example:

apiVersion: loggie.io/v1beta1
kind: LogConfig
metadata:
  name: tomcat
  namespace: default
spec:
  selector:
    type: pod
    labelSelector:
      app: tomcat
  pipeline:
    sources: |
      - type: file
        name: common
        paths:
          - stdout
          - /usr/local/tomcat/logs/*.log
    sinkRef: default
    interceptorRef: default
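
The selector in this LogConfig picks pods by label. As a rough illustration of what equality-based label matching means (this follows the general Kubernetes convention and is not Loggie's actual code; the `matches` helper and the sample pods are hypothetical), a minimal Python sketch:

```python
def matches(label_selector: dict, pod_labels: dict) -> bool:
    """Return True when every selector key/value pair is present on the pod."""
    return all(pod_labels.get(k) == v for k, v in label_selector.items())

selector = {"app": "tomcat"}  # mirrors spec.selector.labelSelector
pods = {  # hypothetical pods and their labels
    "tomcat-0": {"app": "tomcat", "tier": "web"},
    "nginx-0": {"app": "nginx"},
}
selected = [name for name, labels in pods.items() if matches(selector, labels)]
print(selected)  # ['tomcat-0']
```

Extra labels on a pod do not prevent a match; only the keys named in the selector are compared.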

Multiple architectures

Agent: Deployed via DaemonSet, Loggie can collect log files without the need for containers to mount volumes.
Sidecar: Supports non-intrusive auto-injection of Loggie sidecars, without the need to manually add them to the Deployment/StatefulSet templates.
Aggregator: Supports deployment as an independent intermediate layer that receives aggregated data sent by Loggie Agents and can also consume and process various other data sources.

Regardless of the deployment architecture, Loggie keeps a simple and intuitive internal design.

High Performance

Benchmark

Configure Filebeat and Loggie to collect logs and send them to a Kafka topic with 3 partitions, without client-side compression.

With the Agent given sufficient resources, vary the number of files collected and the concurrency of the sending client (Filebeat worker and Loggie parallelism), then observe the CPU usage, memory usage, and pod network throughput of each.

Agent	File Size	File Count	Sink Concurrency	CPU (cores)	Memory (RSS)	Throughput
Filebeat	3.2G	1	3	7.5~8.5	63.8MiB	75.9MiB/s
Filebeat	3.2G	1	8	10	65MiB	70MiB/s
Filebeat	3.2G	10	8	11	65MiB	80MiB/s
Loggie	3.2G	1	3	2.1	60MiB	120MiB/s
Loggie	3.2G	1	8	2.4	68.7MiB	120MiB/s
Loggie	3.2G	10	8	3.5	70MiB	210MiB/s
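
One way to read the table is throughput per CPU core. The short Python sketch below does that arithmetic on the figures above; the only assumption is that the first Filebeat row, reported as a range of 7.5 to 8.5 cores, is represented by its midpoint of 8.0.

```python
# (agent, file count, sink concurrency, CPU cores, MiB/s), copied from the table.
# Assumption: 7.5~8.5 cores is replaced by the midpoint 8.0.
rows = [
    ("Filebeat", 1, 3, 8.0, 75.9),
    ("Filebeat", 1, 8, 10.0, 70.0),
    ("Filebeat", 10, 8, 11.0, 80.0),
    ("Loggie", 1, 3, 2.1, 120.0),
    ("Loggie", 1, 8, 2.4, 120.0),
    ("Loggie", 10, 8, 3.5, 210.0),
]

# Throughput divided by CPU usage, rounded to one decimal place.
per_core = [round(rate / cpu, 1) for _, _, _, cpu, rate in rows]

for (agent, files, conc, _, _), value in zip(rows, per_core):
    print(f"{agent:8} files={files:<2} concurrency={conc}: {value} MiB/s per core")
```

On these numbers Filebeat moves roughly 7 to 9.5 MiB/s per core, while Loggie moves roughly 50 to 60 MiB/s per core.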

Adaptive Sink Concurrency

With sink concurrency configuration enabled, Loggie can:

Automatically adjust the parallelism of downstream sending based on how the downstream actually responds, making full use of the downstream server's capacity without overloading it.
Adjust the downstream sending speed when upstream data collection is blocked, in order to relieve the upstream blocking.
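
The source does not describe the control algorithm, so the following is only a sketch of one common approach to this kind of feedback loop: additive increase, multiplicative decrease (AIMD). The class name, the latency threshold, and the limits are all hypothetical and are not taken from Loggie.

```python
from dataclasses import dataclass

@dataclass
class AdaptiveConcurrency:
    """AIMD controller: grow by one while the sink is healthy, halve when it struggles."""
    current: int = 1
    minimum: int = 1
    maximum: int = 16
    slow_ms: float = 200.0  # hypothetical "sink is getting slow" threshold

    def observe(self, latency_ms: float, ok: bool) -> int:
        """Feed one send result back and return the new parallelism."""
        if ok and latency_ms < self.slow_ms:
            # Healthy response: probe for more capacity (additive increase).
            self.current = min(self.current + 1, self.maximum)
        else:
            # Slow or failed response: back off quickly (multiplicative decrease).
            self.current = max(self.current // 2, self.minimum)
        return self.current

ctl = AdaptiveConcurrency()
samples = [(50, True), (60, True), (55, True), (400, True), (70, True), (0, False)]
history = [ctl.observe(latency, ok) for latency, ok in samples]
print(history)  # [2, 3, 4, 2, 3, 1]
```

The parallelism climbs while responses are fast, drops after the 400 ms response, and drops again after the failed send.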

Lightweight Streaming Data Analysis and Monitoring

Logs are a universal data type, independent of any platform or system. Making better use of this data is the core capability that Loggie focuses on.

Real-time parsing and transformation

By configuring transformer interceptors with functional actions, Loggie supports:

Parsing of various data formats (json, grok, regex, split, etc.)
Conversion of various fields (add, copy, move, set, del, fmt, etc.)
Support for conditional judgment and processing logic (if, else, return, dropEvent, ignoreError, etc.)

For example:

interceptors:
  - type: transformer
    actions:
      - action: regex(body)
        pattern: (?<ip>\S+) (?<id>\S+) (?<u>\S+) (?<time>\[.*?\]) (?<url>\".*?\") (?<status>\S+) (?<size>\S+)
      - if: equal(status, 404)
        then:
          - action: add(topic, not_found)
          - action: return()
      - if: equal(status, 500)
        then:
          - action: dropEvent()
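
To make the flow of this configuration concrete, here is a Python sketch that mimics the same steps on a single event. It is an illustration, not Loggie's implementation: the `transform` function is hypothetical, Python requires the `(?P<name>...)` spelling for named groups, and passing unparsed events through unchanged is an assumption.

```python
import re
from typing import Optional

# Same pattern as the configuration, written with Python's named-group syntax.
PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<id>\S+) (?P<u>\S+) (?P<time>\[.*?\]) '
    r'(?P<url>".*?") (?P<status>\S+) (?P<size>\S+)'
)

def transform(event: dict) -> Optional[dict]:
    """Return the transformed event, or None when the event is dropped."""
    m = PATTERN.search(event["body"])
    if m is None:
        return event  # assumption: events that do not parse pass through unchanged
    event.update(m.groupdict())          # regex(body): extract named fields
    if event["status"] == "404":
        event["topic"] = "not_found"     # add(topic, not_found)
        return event                     # return(): skip the remaining actions
    if event["status"] == "500":
        return None                      # dropEvent()
    return event

line = '10.0.0.1 - - [01/Jan/2024:00:00:00 +0000] "GET /missing HTTP/1.1" 404 153'
out = transform({"body": line})
print(out["status"], out["topic"])  # 404 not_found
```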

Detection, recognition, and alerting

Helps you quickly detect potential problems and anomalies in the data and issue timely alerts. Supports custom webhooks for connecting to various alert channels.

Supports matching methods such as:

No data: no log data generated within the configured time period.
Fuzzy matching
Regular expression matching
Conditional judgment
- Field comparison: equal/less/greater...
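
As a rough sketch of what these matching methods amount to, the Python functions below implement each one in its simplest form. All function names are hypothetical, and treating fuzzy matching as a case-insensitive substring check is an assumption; the source does not define it.

```python
import operator
import re

def no_data(last_event_ts: float, now: float, window_s: float) -> bool:
    """True when no log event arrived within the configured time window."""
    return now - last_event_ts > window_s

def fuzzy_match(body: str, keyword: str) -> bool:
    """Assumption: fuzzy matching approximated as case-insensitive containment."""
    return keyword.lower() in body.lower()

def regex_match(body: str, pattern: str) -> bool:
    """True when the regular expression matches anywhere in the log body."""
    return re.search(pattern, body) is not None

# Field comparison operators named in the list above.
COMPARATORS = {"equal": operator.eq, "less": operator.lt, "greater": operator.gt}

def field_compare(event: dict, field: str, op: str, value) -> bool:
    """Compare one event field against a threshold with the named operator."""
    return COMPARATORS[op](event[field], value)

print(field_compare({"latency_ms": 950}, "latency_ms", "greater", 500))  # True
```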

Log data aggregation and monitoring

Metrics do not come only from Prometheus exporters; log data itself can also be a source of metrics. For example, from the access logs of a gateway you can count the number of 5xx or 4xx status codes within a given time interval, aggregate the QPS of a particular interface, and calculate the total size of body data.

For example:

- type: aggregator
  interval: 1m
  select:
    # operator: COUNT/COUNT-DISTINCT/SUM/AVG/MAX/MIN
    - {key: amount, operator: SUM, as: amount_total}
    - {key: quantity, operator: SUM, as: qty_total}
  groupBy: ["city"]
  calculate:
    - {expression: " ${amount_total} / ${qty_total} ", as: avg_amount}
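
The semantics of this aggregation can be sketched in a few lines of Python: group events by `city`, sum `amount` and `quantity` within each group, then divide the two totals. This is an illustration of the arithmetic only, not Loggie's implementation; the sample events are made up, and the one-minute `interval` windowing is left out.

```python
from collections import defaultdict

# Hypothetical events that would fall into a single one-minute window.
events = [
    {"city": "Hangzhou", "amount": 30.0, "quantity": 3},
    {"city": "Hangzhou", "amount": 10.0, "quantity": 1},
    {"city": "Shanghai", "amount": 24.0, "quantity": 4},
]

def aggregate(events: list) -> dict:
    """Group by city, SUM amount and quantity, then compute avg_amount."""
    totals = defaultdict(lambda: {"amount_total": 0.0, "qty_total": 0})
    for e in events:
        group = totals[e["city"]]            # groupBy: ["city"]
        group["amount_total"] += e["amount"]  # SUM(amount) as amount_total
        group["qty_total"] += e["quantity"]   # SUM(quantity) as qty_total
    return {
        city: {**g, "avg_amount": g["amount_total"] / g["qty_total"]}
        for city, g in totals.items()
    }

result = aggregate(events)
print(result["Hangzhou"]["avg_amount"], result["Shanghai"]["avg_amount"])  # 10.0 6.0
```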

Observability and fast troubleshooting

Loggie provides rich, configurable metrics, along with dashboards that can be imported into Grafana with one click.
The Loggie terminal helps you quickly troubleshoot Loggie itself and any problems in data transmission.

FAQs

Loggie vs Filebeat/Fluentd/Logstash/Flume

	Loggie	Filebeat	Fluentd	Logstash	Flume
Language	Golang	Golang	Ruby	JRuby	Java
Multiple Pipelines	✓	single queue	single queue	✓	✓
Multiple output	✓	one output	copy	✓	✓
Aggregator	✓	✓	✓	✓	✓
Log Alarm	✓
Kubernetes container log collection	supports container stdout and log files inside containers	stdout	stdout
Configuration delivery	through CRD	manual	manual	manual	manual
Monitoring	supports Prometheus metrics; can also be configured to write metrics to separate log files, send metrics, etc.		Prometheus metrics	needs exporter	needs exporter
Resource Usage	low	low	average	high	high