• Stars
    star
    132
  • Rank 274,205 (Top 6 %)
  • Language
    Go
  • License
    MIT License
  • Created over 7 years ago
  • Updated 8 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A parser for MySQL binlog files that creates JSON messages. Useful for creating a stream of database events that can be stored and analyzed.

binlog-parser

Build Status

A tool for parsing a MySQL binlog file to JSON. Reads a binlog input file, queries a database for field names, writes JSON to stdout. The output looks like this:

{
    "Header": {
        "Schema": "test_db",
        "Table": "buildings",
        "BinlogMessageTime": "2017-04-13T06:34:30Z",
        "BinlogPosition": 397,
        "XId": 9
    },
    "Type": "Insert",
    "Data": {
        "Row": {
            "address": "3950 North 1st Street CA 95134",
            "building_name": "ACME Headquaters",
            "building_no": 1
        },
        "MappingNotice": ""
    }
}
...

Installation

Requires Go version 1.7 or higher.

$ git clone https://github.com/zalora/binlog-parser.git
$ cd binlog-parser
$ git submodule update --init
$ make
$ ./bin/binlog-parser -h

Assumptions

  • It is assumed that MySQL row-based binlog format is used (or mixed, but be aware, that then only the row-formatted data in mixed binlogs can be extracted)
  • This tool is written with MySQL 5.6 in mind, although it should also work for MariaDB when GTIDs are not used

Usage

Run binlog-parser -h to get the list of available options:

Usage:	binlog-parser [options ...] binlog

Options are:

  -alsologtostderr
    	log to standard error as well as files
  -include_schemas string
    	comma-separated list of schemas to include
  -include_tables string
    	comma-separated list of tables to include
  -log_backtrace_at value
    	when logging hits line file:N, emit a stack trace
  -log_dir string
    	If non-empty, write log files in this directory
  -logtostderr
    	log to standard error instead of files
  -prettyprint
    	Pretty print json
  -stderrthreshold value
    	logs at or above this threshold go to stderr
  -v value
    	log level for V logs
  -vmodule value
    	comma-separated list of pattern=N settings for file-filtered logging

Required environment variables:

DB_DSN	 Database connection string, needs read access to information_schema

Example usage

Using dbuser and no password, connecting to information_schema database on localhost, parsing the binlog file /some/binlog.bin:

DB_DSN=dbuser@/information_schema ./binlog-parser /some/binlog.bin

Matching field names and data

The mysql binlog format doesn't include the fieldnames for row events (INSERT/UPDATE/DELETE). As the goal of the parser is to output usable JSON, it connects to a running MySQL instance and queries the information_schema database for information on field names in the table.

The database connection is creatd by using the environment variable DB_DSN, which should contain the database credentials in the form of user:password@/dbname - the format that the Go MySQL driver uses.

Effect of schema changes

As this tool doesn't keep an internal representation of the database schema, it is very well possible that the database schema and the schema used in the queries in the binlog file already have diverged (e. g. parsing a binlog file from a few days ago, but the schema on the main database already changed by dropping or adding columns).

The parser will NOT make an attempt to map data to fields in a table if the information schema retuns more or too less columns compared to the format found in the binlog. The field names will be mapped as "unknown":

{
    "Header": {
        "Schema": "test_db",
        "Table": "employees",
        "BinlogMessageTime": "2017-04-13T08:02:04Z",
        "BinlogPosition": 635,
        "XId": 8
    },
    "Type": "Insert",
    "Data": {
        "Row": {
            "(unknown_0)": 1,
            "(unknown_1)": "2017-04-13",
            "(unknown_2)": "Max",
            "(unknown_3)": "Mustermann"
        },
        "MappingNotice": "column names array is missing field(s), will map them as unknown_*"
    }
}

More complex case

Changing the order of fields in a table can lead to unexpected parser results. Consider an example binlog file A.bin. A query like INSERT INTO db.foo SET field_1 = 10, field_2 = 20 will look in the binlog like this:

...
### INSERT INTO `db`.`foo`
### SET
###   @1=20 /* ... */
###   @2=20 /* ... */
...

The parser queries information_schema for the field names of the db.foo table:

+-------------+-----+
| Field       | ... |
+-------------+-----+
| field_1     | ... |
| field_2     | ... |
+-------------+-----+

The fields will be mapped by the parser in the order as specified in the table and the JSON will look like this:

{
    ...
    "Type": "Insert",
    "Data": {
        "Row": {
            "field_1": 10,
            "field_2": 20
        }
    }
}
...

Now if a schema change happened after some time, db.foo fields might look now like this (the order of the fiels changed):

+-------------+-----+
| Field       | ... |
+-------------+-----+
| field_2     | ... |
| field_1     | ... |
+-------------+-----+

If you parse the same binlog file A.bin now again, but against the new schema of db.foo (in which the fields changed position), the resulting JSON will look like that:

{
    ...
    "Type": "Insert",
    "Data": {
        "Row": {
            "field_2": 10,
            "field_1": 20
        }
    }
}
...

This means you have to be very careful when parsing old binlog files, as the db schema can have evolved since the binlog was generated and the parser has no way of knowing of these changes.

If this limitation is not acceptable, some tools like Maxwell's Daemon by Zendesk can work around that issue at the cost of greater complexity.

Releases

How to do a release:

git tag -a X.X.X -m "... release note ..."
git push --follow-tags

Travis CI will attach a statically built binary to the release tag on GitHub.

More Repositories

1

upcast

Abandonware
Nix
90
star
2

microgram

ABANDONED
Nix
50
star
3

sproxy

See http://hackage.haskell.org/package/sproxy2
Haskell
35
star
4

redsift

Web interface to Amazon Redshift for exploring its data
Haskell
19
star
5

deployix

zalora nix expressions library
Nix
17
star
6

nixsap

Nix
10
star
7

zerobin

Share secrets via 0bin sites running https://github.com/sametmax/0bin
Haskell
9
star
8

aws-ec2

Now maintained by: See https://github.com/memcachier/aws-ec2
Haskell
8
star
9

zalora-capture

Reporting bugs from Chrome browser
JavaScript
7
star
10

postgresql-user-manager

A simple command-line tool for managing User Privileges in PostgreSQL Database.
Haskell
6
star
11

jenkins4nix

Obsolete. See https://github.com/zalora/jenkins4nix/issues/1
Haskell
5
star
12

global-scss

This repo holds the ZALORA global scss definitions which are the base styling for every web project
CSS
5
star
13

punyan

PHP implementation of the Bunyan logger
PHP
5
star
14

styleguide

Elm
4
star
15

replicator

ABANDONED. MySQL/MariaDB slaves made easy
Haskell
4
star
16

kraken

Haskell
4
star
17

Webpack-Workshop-WWC

Women Who Code workshop on Webpack
JavaScript
4
star
18

normalize-imports

Sort and align Haskell import statements
Haskell
3
star
19

ares

(Deprecated) application server for Microgram
Haskell
3
star
20

myrapi

myracloud api client
Haskell
2
star
21

taurus-docker

Custom AWS CodeBuild Docker Container for Taurus
HTML
2
star
22

configserver

Spring Cloud Server on ElasticBeanstalk
Java
2
star
23

gctuner

GC log goes in, suggested tuning options go out
Haskell
2
star
24

journal-mailer

daemon that sends out email notifications on error messages in systemd's journal
Haskell
2
star
25

mywatch

Haskell
2
star
26

go-mq

A message queue library in Go.
Go
1
star
27

octopus

authenticated, supervised, serialized and throttled access to shell commands
Haskell
1
star
28

json-selector-lang

A tiny language for selecting JSON values.
TypeScript
1
star
29

juandelacosa

HTTP server for managing MariaDB users
Haskell
1
star
30

zcast

Added server side compression to Hazelcast
Java
1
star
31

sproxy-web

Web frontend to the sproxy service
Haskell
1
star
32

cron-syslog-viewer

script to turn syslog files into a visualization of cronjob start and end times
Haskell
1
star
33

heavy-sync

Synchronize huge cloud buckets with ease
Python
1
star