
Repository Details

A high-performance, reliable and extensible logging agent for uploading data to Kafka, Pulsar, etc.

Singer

High-performance, reliable and extensible logging agent

Singer is a high-performance logging agent for uploading logs to Kafka. It can also be extended to support writing to other message transporters or storage systems.

Singer runs as a standalone process on the service boxes. It monitors the log directories by listening to file system events, and uploads data once it detects new data. Singer guarantees at-least-once delivery of log messages.

Key Features:

  • Support thrift log format and text log format out of the box: Thrift log format provides better throughput and efficiency. We highly recommend using thrift log format if your logs will not be consumed directly by humans. To facilitate thrift log format usage, we built a set of client libraries in Python, Java, and Go for converting text log messages into JSON and thrift formats.

  • At-least-once message delivery to Kafka: Singer will retry when it fails to upload a batch of messages. For each log stream, Singer uses a watermark file to track its progress. When Singer restarts, it processes messages from the watermark position.

  • Support logging in Kubernetes as a sidecar service or as a DaemonSet: Singer can monitor and upload logs from the log directories of multiple Kubernetes pods.

  • High throughput writes to Kafka: Singer uses multiple layers of thread pools to achieve maximum parallelism. Using thrift log format, Singer can achieve >100MB/second writing throughput to Kafka from one host. Singer can process text logs at 20MB/second.

  • Low latency logging: Singer supports configurable processing latency and batch sizes, and it can achieve <5ms log uploading latency.

  • Flexible partitioning: Singer provides multiple partitioners for writing data to Kafka, including locality-aware partitioners that can avoid producer traffic across availability zones and reduce data transfer costs. Singer also supports custom partitioners.

  • Heartbeat: Singer supports sending heartbeats to a Kafka topic periodically based on the configuration. This allows users to set up central monitoring of Singer instances across fleets.

  • Write auditing: Singer can write an audit message to another topic for each batch of messages that it writes to Kafka. This allows users to audit Singer's Kafka writes.

  • Extensible design: Singer can be easily extended to support data uploading to custom destinations.
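The at-least-once delivery feature above can be illustrated with a minimal Python sketch. This is not Singer's implementation (Singer is written in Java); the function names, the plain-text watermark file holding a single byte offset, and the `send` callback are all assumptions made for illustration. The point it shows is the ordering: the watermark advances only after a batch has been sent successfully, so a failure or restart replays the batch instead of losing it.

```python
import os


def read_watermark(path):
    """Return the saved byte offset, or 0 if no watermark exists yet."""
    try:
        with open(path) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def write_watermark(path, offset):
    """Persist the offset atomically: write a temp file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(offset))
    os.replace(tmp, path)


def upload_once(log_path, watermark_path, send, batch_size=2, max_retries=3):
    """Send complete lines found after the watermark (hypothetical helper).

    `send` is any callable that takes a list of byte strings and raises
    IOError on failure; it stands in for a Kafka producer.
    """
    offset = read_watermark(watermark_path)
    with open(log_path, "rb") as log:
        log.seek(offset)
        while True:
            batch = []
            for _ in range(batch_size):
                line = log.readline()
                if not line.endswith(b"\n"):  # EOF or a partially written line
                    break
                batch.append(line)
            if not batch:
                return
            for attempt in range(max_retries):
                try:
                    send(batch)
                    break
                except IOError:
                    if attempt == max_retries - 1:
                        raise  # watermark still points at the unsent batch
            # Advance the watermark only after the batch was accepted.
            offset += sum(len(line) for line in batch)
            write_watermark(watermark_path, offset)
            log.seek(offset)
```

If the process dies after `send` succeeds but before `write_watermark` runs, the same batch is sent again on the next start. That duplicate is what makes the guarantee at-least-once and not exactly-once.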

Detailed design

Please see docs/DESIGN.md for details on Singer's design.

Build

Get Singer code

git clone [git-repo-url] singer
cd singer

Build Singer binary

mvn clean package -pl singer -am -DskipTests

Because the JDK has no native support for file system event monitoring on macOS, some tests that run fine in the Linux environment may fail intermittently on macOS. Please use the -DskipTests flag if you want to build Singer on macOS.

Build thrift-logger client library

mvn clean package -pl thrift-logger -am

Testing

Singer has a set of unit tests that can be run through mvn test package -pl singer -am.

An end-to-end integration test can be run through:

mvn clean package -pl singer -am 
singer/src/main/scripts/run_singer_tests.sh

Quick Start

The tutorial directory contains a demo that shows how to run Singer. Please see tutorial/README.md for details.

Usage

Use Singer client library to log data to local disk

Singer uses file inode + offset as the watermark position to track its progress, and writes the watermark info to disk after it writes a batch of messages to Kafka. It resumes from the last watermark position after restarting. Because of this, Singer requires that a log stream be a sequence of append-only log files that use file renaming for log rotation.

Singer does not handle log streams that use file copy and truncation for log rotation, because Singer cannot use file inode + offset to uniquely identify log messages when a log file is copied and truncated.

For example, before rotation we have:

ls -li 
  1001    service.log      # service.log with inode 1001

After rotation:

 ls -li 
 
   1001   service.log.2018-11-30   # service.log.2018-11-30 with inode 1001 (was renamed from the old service.log)
   1002   service.log              # (this was newly generated service.log)
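The listing above can be reproduced with a short Python sketch, assuming a POSIX file system (Linux or macOS). The helper names are hypothetical and are not part of Singer. It shows that renaming a log file keeps its inode, so a reader that remembers inode + offset can still locate the rotated file under its new name.

```python
import os


def inode_of(path):
    """Return the inode number, which identifies the file regardless of its name."""
    return os.stat(path).st_ino


def find_by_inode(directory, inode):
    """Return the current path of the file with this inode, or None if it is gone."""
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False) and entry.inode() == inode:
            return entry.path
    return None


def rotate_by_rename(directory, name, suffix):
    """Rotate by renaming the old file, then creating a fresh file under the old name."""
    current = os.path.join(directory, name)
    os.rename(current, current + "." + suffix)
    open(current, "a").close()  # new file, new inode
```

With copy-and-truncate rotation, by contrast, the original inode stays attached to service.log while its contents are reset, so a saved offset no longer points at the same data.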

For logged data in plaintext format, you can directly configure Singer to upload those logs. Singer also supports high-throughput logging using thrift format. You can write data to local disk using the thrift-logger library that Singer provides. Currently Singer has thrift_logger libraries in Python, Java, Go, and C++.

Samples on using thrift_logger libraries:

Config Singer to upload data from local disk to Kafka

Singer uploads data based on configuration settings. Singer configuration is composed of two parts:

  • singer.properties, which configures global Singer settings, e.g. size of thread pools, daily restart settings, heartbeat settings, etc.

  • Log stream configuration: for each set of log streams, Singer needs one log stream configuration to define log stream related settings.

Please see tutorial/etc/singer for Singer configurations. docs/configuration_samples/sample_kubernetes has an example of a Singer configuration for Kubernetes.

Run Singer

java -server  -cp $singer_home:$singer_home/lib/*:$singer_home/singer-$version.jar  \
     -Dlog4j.configuration=log4j.prod.properties -Dsinger.config.dir=$config_dir \
     com.pinterest.singer.SingerMain

Package Singer as a Debian package

tar xzvf singer-${VERSION}-bin.tar.gz --directory $SINGER_DIR
cd $BUILD_DIR

fpm -s dir -t deb -n singer -v $VERSION --deb-upstart ../singer.upstart  \
    --deb-default ../singer.default -- .

Singer Metrics

Singer exposes metrics using the Twitter Ostrich framework. Singer stats can be checked using the following command, where 2047 is the Ostrich port that you define in the singer.ostrichPort configuration.

curl -s localhost:2047/stats.txt
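For scripted checks, the same request can be sketched in Python. The snippet below is an illustration only: the helper names are hypothetical, and it assumes that singer.ostrichPort appears as a plain `key = value` line in singer.properties. It reads the port from the properties text, builds the stats URL, and fetches it.

```python
import urllib.request


def parse_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def stats_url(properties_text, host="localhost"):
    """Build the stats.txt URL from the singer.ostrichPort setting."""
    port = int(parse_properties(properties_text)["singer.ostrichPort"])
    return "http://%s:%d/stats.txt" % (host, port)


def fetch_stats(url, timeout=5):
    """Return the raw stats text; requires a running Singer instance."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch_stats(stats_url(open("singer.properties").read()))` would return the same text as the curl command, provided Singer is running on the local host.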

License

Singer is distributed under the Apache License, Version 2.0.
