avro-cli-examples
Examples of how to use the command line tools in Avro Tools to read and write Avro files.
See my original article Reading and Writing Avro Files From the Command Line for more information on using Avro Tools.
Table of Contents
- Getting Avro Tools
- File overview
- JSON to binary Avro
- Binary Avro to JSON
- Retrieve Avro schema from binary Avro
- Related tools
Getting Avro Tools
You can get a copy of the latest stable Avro Tools jar file from the Avro Releases page. The actual file is in the java subdirectory of a given Avro release version.
Here is a direct link to avro-tools-1.11.0.jar (55 MB) on the US Apache mirror site.
# Download the Avro Tools jar to the current local directory.
# The examples below assume the jar is in the current directory.
$ curl -O -J https://dlcdn.apache.org/avro/avro-1.11.0/java/avro-tools-1.11.0.jar
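Before running the examples, it can help to confirm that a Java runtime and the jar are actually available. The following is a minimal sketch and not part of Avro Tools itself; the function name check_avro_tools and the variable AVRO_TOOLS_JAR are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: verify the prerequisites for the examples in this README.
# AVRO_TOOLS_JAR is an assumed variable name; it defaults to the jar downloaded above.
AVRO_TOOLS_JAR="${AVRO_TOOLS_JAR:-avro-tools-1.11.0.jar}"

check_avro_tools() {
  # A Java runtime must be reachable on the PATH.
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found on PATH" >&2
    return 1
  fi
  # The jar must exist and must not be empty (e.g. after a failed download).
  if [ ! -s "$AVRO_TOOLS_JAR" ]; then
    echo "missing or empty jar: $AVRO_TOOLS_JAR" >&2
    return 1
  fi
  echo "ok: $AVRO_TOOLS_JAR"
}
```

Call check_avro_tools once before the other commands. Running the jar without any arguments should also print the list of available tools, which is a quick way to see that the download is intact.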
File overview
- twitter.avro — data records in uncompressed binary Avro format
- twitter.snappy.avro — data records in Snappy-compressed binary Avro format
- twitter.avsc — Avro schema of the example data
- twitter.json — data records in plain-text JSON format
- twitter.pretty.json — data records in pretty-printed JSON format
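If you want to experiment without touching the files above, you could create small scratch inputs by hand. This is only a sketch: the file names sample.avsc and sample.json are hypothetical, the schema is a trimmed-down version of the example schema with the doc attributes omitted for brevity, and the two records are the ones shown in the example output in this README.

```shell
# Write a small schema to a hypothetical scratch file (sample.avsc).
# The doc attributes of the example schema are left out here for brevity.
cat > sample.avsc <<'EOF'
{
  "type": "record",
  "name": "twitter_schema",
  "namespace": "com.miguno.avro",
  "fields": [
    {"name": "username", "type": "string"},
    {"name": "tweet", "type": "string"},
    {"name": "timestamp", "type": "long"}
  ]
}
EOF

# One JSON record per line; field names and types must match the schema.
cat > sample.json <<'EOF'
{"username":"miguno","tweet":"Rock: Nerf paper, scissors is fine.","timestamp":1366150681}
{"username":"BlizzardCS","tweet":"Works as intended. Terran is IMBA.","timestamp":1366154481}
EOF
```

These two files can then stand in for twitter.avsc and twitter.json in the commands that follow.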
JSON to binary Avro
Without compression:
$ java -jar avro-tools-1.11.0.jar fromjson --schema-file twitter.avsc twitter.json > twitter.avro
With Snappy compression:
$ java -jar avro-tools-1.11.0.jar fromjson --codec snappy --schema-file twitter.avsc twitter.json > twitter.snappy.avro
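The two variants differ only in the --codec option, so they can be folded into one small wrapper. The sketch below assumes a hypothetical function name json_to_avro and a hypothetical AVRO_TOOLS_JAR variable; it only uses the fromjson options already shown above.

```shell
# AVRO_TOOLS_JAR is an assumed variable name pointing at the downloaded jar.
AVRO_TOOLS_JAR="${AVRO_TOOLS_JAR:-avro-tools-1.11.0.jar}"

# Hypothetical wrapper around fromjson.
# Usage: json_to_avro SCHEMA.avsc INPUT.json OUTPUT.avro [CODEC]
json_to_avro() {
  if [ "$#" -lt 3 ]; then
    echo "usage: json_to_avro SCHEMA INPUT OUTPUT [CODEC]" >&2
    return 2
  fi
  if [ -n "${4:-}" ]; then
    # Codec given: pass it through, e.g. "snappy".
    java -jar "$AVRO_TOOLS_JAR" fromjson --codec "$4" --schema-file "$1" "$2" > "$3"
  else
    # No codec given: write uncompressed Avro.
    java -jar "$AVRO_TOOLS_JAR" fromjson --schema-file "$1" "$2" > "$3"
  fi
}
```

For example, json_to_avro twitter.avsc twitter.json twitter.snappy.avro snappy corresponds to the Snappy command above. Note that the output file is created by the shell redirection even if the conversion fails, so check the exit status before using it.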
Binary Avro to JSON
The same command works on both uncompressed and compressed data.
$ java -jar avro-tools-1.11.0.jar tojson twitter.avro > twitter.json
$ java -jar avro-tools-1.11.0.jar tojson twitter.snappy.avro > twitter.json
Example:
$ java -jar avro-tools-1.11.0.jar tojson twitter.avro
returns
{"username":"miguno","tweet":"Rock: Nerf paper, scissors is fine.","timestamp": 1366150681 }
{"username":"BlizzardCS","tweet":"Works as intended. Terran is IMBA.","timestamp": 1366154481 }
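Because tojson writes one JSON record per line by default, as in the output above, counting lines is a simple way to count the records in an Avro file. A minimal sketch, with the function name avro_count_records and the AVRO_TOOLS_JAR variable as hypothetical names:

```shell
# AVRO_TOOLS_JAR is an assumed variable name pointing at the downloaded jar.
AVRO_TOOLS_JAR="${AVRO_TOOLS_JAR:-avro-tools-1.11.0.jar}"

# Hypothetical helper: print the number of records in an Avro file.
# Usage: avro_count_records FILE.avro
avro_count_records() {
  # Without -pretty, each record is a single line of JSON,
  # so the line count equals the record count.
  # tr strips the padding that some wc implementations add.
  java -jar "$AVRO_TOOLS_JAR" tojson "$1" | wc -l | tr -d ' '
}
```

This approach decodes the whole file, so it is only meant for small to medium inputs. It does not work with pretty-printed output, where a record spans several lines.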
You can also pretty-print the JSON output with the -pretty parameter:
$ java -jar avro-tools-1.11.0.jar tojson -pretty twitter.avro > twitter.pretty.json
$ java -jar avro-tools-1.11.0.jar tojson -pretty twitter.snappy.avro > twitter.pretty.json
Example:
$ java -jar avro-tools-1.11.0.jar tojson -pretty twitter.avro
returns
{
"username" : "miguno",
"tweet" : "Rock: Nerf paper, scissors is fine.",
"timestamp" : 1366150681
}
{
"username" : "BlizzardCS",
"tweet" : "Works as intended. Terran is IMBA.",
"timestamp" : 1366154481
}
Retrieve Avro schema from binary Avro
The same command works on both uncompressed and compressed data.
$ java -jar avro-tools-1.11.0.jar getschema twitter.avro > twitter.avsc
$ java -jar avro-tools-1.11.0.jar getschema twitter.snappy.avro > twitter.avsc
Example:
$ java -jar avro-tools-1.11.0.jar getschema twitter.avro
returns
{
"type" : "record",
"name" : "twitter_schema",
"namespace" : "com.miguno.avro",
"fields" : [ {
"name" : "username",
"type" : "string",
"doc" : "Name of the user account on Twitter.com"
}, {
"name" : "tweet",
"type" : "string",
"doc" : "The content of the user's Twitter message"
}, {
"name" : "timestamp",
"type" : "long",
"doc" : "Unix epoch time in seconds"
} ],
"doc:" : "A basic schema for storing Twitter messages"
}
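Since getschema works on both uncompressed and compressed files, it can be used to check that two Avro files carry the same schema. The sketch below compares the schema text only, not semantic equivalence; the function name same_schema and the AVRO_TOOLS_JAR variable are hypothetical.

```shell
# AVRO_TOOLS_JAR is an assumed variable name pointing at the downloaded jar.
AVRO_TOOLS_JAR="${AVRO_TOOLS_JAR:-avro-tools-1.11.0.jar}"

# Hypothetical helper: compare the schemas embedded in two Avro files.
# Usage: same_schema A.avro B.avro
# Returns 0 if the schema text is identical, 1 if it differs,
# and 2 if getschema fails on either file.
same_schema() {
  schema_a=$(java -jar "$AVRO_TOOLS_JAR" getschema "$1") || return 2
  schema_b=$(java -jar "$AVRO_TOOLS_JAR" getschema "$2") || return 2
  [ "$schema_a" = "$schema_b" ]
}
```

For example, same_schema twitter.avro twitter.snappy.avro && echo "schemas match" prints the message only when both files report the same schema.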
Related tools
You can also take a look at the CLI tools avrocat, avromod, and avropipe, which ship with the C implementation of Avro. You must build these tools yourself by following their respective INSTALL instructions.