
Ruby DSL to create Avro schemas

Avro::Builder


Avro::Builder provides a Ruby DSL to create Apache Avro Schemas.

This DSL was created because:

  • The Avro IDL is not supported in Ruby.
  • The Avro IDL can only be used to define Protocols.
  • Schemas can be extracted as JSON from an IDL Protocol but support for imports is still limited.

Additional background on why we developed avro-builder is provided here.

Features

  • The syntax is designed for ease-of-use.
  • Definitions can be imported by name. This includes auto-loading from a configured set of paths, which allows definitions to be split across files and even reused between projects.
  • Record definitions can inherit from other record definitions.
  • Schema Store to load files written in the DSL and return Avro::Schema objects.

Limitations

  • Only Avro Schemas, not Protocols, are supported.
  • See Issues for functionality that has yet to be implemented.
  • This is beta quality code. There may be breaking changes until version 1.0 is released.

Installation

Add this line to your application's Gemfile:

gem 'avro-builder'

And then execute:

$ bundle

Or install it yourself as:

$ gem install avro-builder

Railtie

When included in a Rails project, #{Rails.root}/avro/dsl is configured as a load path for the DSL.

A rake task is also defined for generating Avro JSON schemas from the DSL.

Usage

To use Avro::Builder, define a schema:

namespace 'com.example'

fixed :password, 8

enum :user_type, :ADMIN, :REGULAR

record :user do
  required :id, :long
  required :user_name, :string
  required :type, :user_type, default: :REGULAR
  required :pw, :password
  optional :full_name, :string
  required :nicknames, :array, items: :string
  required :permissions, :map, values: :bytes
end

The schema definition may be passed as a string or a block to Avro::Builder.build.

This generates the following Avro JSON schema:

{
  "type": "record",
  "name": "user",
  "namespace": "com.example",
  "fields": [
    {
      "name": "id",
      "type": "long"
    },
    {
      "name": "user_name",
      "type": "string"
    },
    {
      "name": "type",
      "type": {
        "name": "user_type",
        "type": "enum",
        "symbols": [
          "ADMIN",
          "REGULAR"
        ],
        "namespace": "com.example"
      },
      "default": "REGULAR"
    },
    {
      "name": "pw",
      "type": {
        "name": "password",
        "type": "fixed",
        "size": 8,
        "namespace": "com.example"
      }
    },
    {
      "name": "full_name",
      "type": [
        "null",
        "string"
      ],
      "default": null
    },
    {
      "name": "nicknames",
      "type": {
        "type": "array",
        "items": "string"
      }
    },
    {
      "name": "permissions",
      "type": {
        "type": "map",
        "values": "bytes"
      }
    }
  ]
}
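Accepting either a string or a block is a common pattern for Ruby DSLs built on instance_eval. The sketch below uses a hypothetical MiniSchemaBuilder class, which is not part of avro-builder (whose internals may differ), to show how both forms can be evaluated in the same builder context.

```ruby
# Hypothetical stand-in for a DSL entry point; NOT avro-builder's code.
class MiniSchemaBuilder
  attr_reader :namespace_name, :fields

  # Accepts either a source string or a block, mirroring the calling
  # convention described for Avro::Builder.build.
  def self.build(source = nil, &block)
    raise ArgumentError, 'pass a string or a block' unless source || block

    builder = new
    if source
      # A string is evaluated as Ruby code with self set to the builder.
      builder.instance_eval(source)
    else
      # A block is evaluated with self set to the builder.
      builder.instance_eval(&block)
    end
    builder
  end

  def initialize
    @fields = []
  end

  # DSL method: sets the namespace.
  def namespace(name)
    @namespace_name = name
  end

  # DSL method: records a required field as a [name, type] pair.
  def required(name, type)
    @fields << [name, type]
  end
end

from_string = MiniSchemaBuilder.build("namespace 'com.example'\nrequired :id, :long")

from_block = MiniSchemaBuilder.build do
  namespace 'com.example'
  required :id, :long
end
```

Both calls produce the same builder state; the string form is what makes it possible to evaluate DSL files read from disk.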

Required and Optional

Fields for a record are specified as required or optional. Optional fields are implemented as a union in Avro, where null is the first type in the union and the field has a default value of null.
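The rule above determines the JSON an optional field expands to. The sketch below uses a hypothetical optional_field helper in plain Ruby (independent of avro-builder) and reproduces the full_name field from the generated schema above. Putting null first matters because the Avro spec requires a union field's default to match the first type in the union.

```ruby
require 'json'

# Hypothetical helper (not part of avro-builder): builds the Avro JSON for an
# optional field as a union with "null" first and a null default.
def optional_field(name, type)
  { 'name' => name.to_s, 'type' => ['null', type.to_s], 'default' => nil }
end

field = optional_field(:full_name, :string)
puts JSON.pretty_generate(field)
```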

Named Types

fixed and enum fields may be specified inline as part of a record or as standalone named types.

# Either syntax is supported for specifying the size
fixed :f, 4
fixed :g, size: 8

# Either syntax is supported for specifying symbols
enum :e, :X, :Y, :Z
enum :d, symbols: [:A, :B]

# Defaults can be set for enums with Ruby Avro v1.10.0 or later
enum :c, symbols: [:A, :B], default: :A

record :my_record_with_named do
  required :f_ref, :f
  required :fixed_inline, :fixed, size: 9
  required :e_ref, :e
  required :enum_inline, :enum, symbols: [:P, :Q]
end
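To illustrate how the two enum syntaxes can describe the same schema, the sketch below uses a hypothetical enum_schema helper (not avro-builder's code) that accepts symbols either positionally or via symbols:, and emits an Avro enum schema hash. The default check follows the Avro spec, which requires an enum default to be one of its symbols.

```ruby
# Hypothetical helper (not avro-builder's code): normalizes the positional and
# keyword enum syntaxes into an Avro enum schema hash.
def enum_schema(name, *positional, symbols: nil, default: nil)
  syms = (symbols || positional).map(&:to_s)
  schema = { 'type' => 'enum', 'name' => name.to_s, 'symbols' => syms }
  unless default.nil?
    # The Avro spec requires an enum default to be one of the enum's symbols.
    raise ArgumentError, "default #{default} not in #{syms}" unless syms.include?(default.to_s)

    schema['default'] = default.to_s
  end
  schema
end

positional_enum = enum_schema(:e, :X, :Y, :Z)
keyword_enum = enum_schema(:d, symbols: [:A, :B])
defaulted_enum = enum_schema(:c, symbols: [:A, :B], default: :A)
```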

Complex Types

Arrays, maps, and unions can each be embedded within another complex type using methods that match the type name:

record :complex_types do
  required :array_of_unions, :array, items: union(:int, :string)
  required :array_or_map, :union, types: [array(:int), map(:int)]
end

Methods may also be used for complex types instead of separately specifying the type name and options:

record :complex_types do
  required :array_of_unions, array(union(:int, :string))
  required :array_or_map, union(array(:int), map(:int))
end
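The nesting works because each method returns a type structure that another method can accept. The sketch below is a hypothetical plain-Ruby ComplexTypeSketch module (not avro-builder's implementation) whose array, map, and union methods return Avro JSON-compatible values, evaluated with instance_eval to mimic the DSL.

```ruby
# Hypothetical helpers (not avro-builder's code) mirroring the array/map/union
# methods above; each returns an Avro JSON-compatible structure.
module ComplexTypeSketch
  module_function

  # Symbols name primitive types; already-built structures pass through.
  def avro_type(type)
    type.is_a?(Symbol) ? type.to_s : type
  end

  def array(items)
    { 'type' => 'array', 'items' => avro_type(items) }
  end

  def map(values)
    { 'type' => 'map', 'values' => avro_type(values) }
  end

  # Avro represents a union as a JSON array of its member types.
  def union(*types)
    types.map { |type| avro_type(type) }
  end
end

array_of_unions = ComplexTypeSketch.instance_eval { array(union(:int, :string)) }
array_or_map = ComplexTypeSketch.instance_eval { union(array(:int), map(:int)) }
```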

For more on unions see below.

Nested Records

Nested records may be created by referring to the name of the previously defined record or using the field type :record.

record :sub_rec do
  required :i, :int
end

record :top_rec do
  required :sub, :sub_rec
end

Defining a subrecord inline:

record :my_rec do
  required :nested, :record do
    required :s, :string
  end
end

Nested record types defined without an explicit name are given a generated name based on the name of the field and record that they are nested within. In the example above, the nested record type would have the generated name __my_rec_nested_record:

{
  "type": "record",
  "name": "my_rec",
  "fields": [
    {
      "name": "nested",
      "type": {
        "type": "record",
        "name": "__my_rec_nested_record",
        "fields": [
          {
            "name": "s",
            "type": "string"
          }
        ]
      }
    }
  ]
}
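The naming pattern in the example can be expressed as a one-line function. This is a hypothetical helper inferred only from the single example above (field nested in record my_rec yields __my_rec_nested_record); avro-builder's exact rules for deeper nesting or namespaces may differ. The double-underscore prefix presumably keeps generated names from colliding with hand-written ones.

```ruby
# Hypothetical helper inferred from the example above; not avro-builder's code.
def generated_record_name(record_name, field_name)
  "__#{record_name}_#{field_name}_record"
end

generated_record_name(:my_rec, :nested) # => "__my_rec_nested_record"
```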

Unions

A union may be specified within a record using required and optional with the :union type:

record :my_record_with_unions do
  required :req_union, :union, types: [:string, :int]
  optional :opt_union, :union, types: [:float, :long]
end

For an optional union, null is automatically added as the first type for the union and the field defaults to null.

Unions may also be defined using the union method instead of specifying the :union type and member types separately:

record :my_record_with_unions do
  required :req_union, union(:string, :int)
  optional :opt_union, union(:float, :long)
end
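For an optional union, null is added to the existing member list rather than wrapping the union, since the Avro spec does not allow a union to directly contain another union. The sketch below uses hypothetical helpers (not avro-builder's code) to show the resulting JSON for both fields above; it also drops a duplicate null, because Avro unions may not contain two schemas of the same unnamed type.

```ruby
# Hypothetical helpers (not avro-builder's code) showing the Avro JSON for
# required and optional union fields.
def required_union_field(name, types)
  { 'name' => name.to_s, 'type' => types.map(&:to_s) }
end

def optional_union_field(name, types)
  # Prepend "null" to the flat member list instead of nesting unions, and
  # avoid a duplicate "null" member.
  members = types.map(&:to_s).reject { |type| type == 'null' }
  { 'name' => name.to_s, 'type' => ['null', *members], 'default' => nil }
end

req = required_union_field(:req_union, [:string, :int])
opt = optional_union_field(:opt_union, [:float, :long])
```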

Logical Types

The DSL supports setting a logical type on any type except a union. The Avro spec lists the logical types that are currently defined. Note: avro-builder is more permissive and any logical type can be specified on a type.

A logical type can be specified for a field using the logical_type attribute:

record :with_timestamp do
  required :created_at, :long, logical_type: 'timestamp-micros'
end

Primitive types with a logical type can also be embedded within complex types using either the generic type method:

record :with_date_array do
  required :date_array, :array, items: type(:int, logical_type: 'date')
end

Or using a primitive type specific method:

record :with_date_array do
  required :date_array, :array, items: int(logical_type: 'date')
end
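In the generated Avro JSON, a logical type is a logicalType attribute on the underlying type. The sketch below uses a hypothetical with_logical_type helper (not avro-builder's code), limited to primitive types, to show the JSON that the two examples above correspond to, including the restriction that unions cannot carry a logical type.

```ruby
# Hypothetical helper (not avro-builder's code) for primitive types: the
# logicalType attribute annotates the underlying type.
def with_logical_type(type, logical_type)
  # Unions are JSON arrays and cannot take a logical type.
  raise ArgumentError, 'unions cannot have a logical type' if type.is_a?(Array)

  { 'type' => type.to_s, 'logicalType' => logical_type.to_s }
end

created_at = with_logical_type(:long, 'timestamp-micros')
date_array = { 'type' => 'array', 'items' => with_logical_type(:int, 'date') }
```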

Decimal Logical Types

The decimal logical type, for bytes and fixed types, is currently the only logical type that requires additional attributes. For decimals, precision must be specified and scale may optionally be specified. avro-builder supports both of these attributes for bytes and fixed decimals. See the Avro spec for more details.
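The constraints the Avro spec places on these attributes can be checked mechanically: precision must be a positive integer, scale must be between 0 and precision (defaulting to 0), and a fixed decimal of n bytes can hold at most floor(log10(2^(8n-1) - 1)) digits. The sketch below is a hypothetical decimal_schema validator (not avro-builder's code, and not its DSL syntax) that emits the corresponding Avro JSON hash.

```ruby
# Hypothetical validator (not avro-builder's code) for decimal attributes,
# following the constraints in the Avro specification.
def decimal_schema(base, precision:, scale: 0, size: nil, name: nil)
  unless precision.is_a?(Integer) && precision.positive?
    raise ArgumentError, 'precision must be a positive integer'
  end
  unless scale.is_a?(Integer) && scale.between?(0, precision)
    raise ArgumentError, 'scale must be between 0 and precision'
  end

  schema = { 'type' => base.to_s, 'logicalType' => 'decimal',
             'precision' => precision, 'scale' => scale }
  return schema unless base.to_s == 'fixed'

  raise ArgumentError, 'fixed decimals need a name and a size' unless name && size

  # Largest digit count a signed two's-complement value of `size` bytes holds.
  max_precision = Math.log10(2**(8 * size - 1) - 1).floor
  if precision > max_precision
    raise ArgumentError, "precision #{precision} exceeds #{max_precision} for size #{size}"
  end
  schema.merge('name' => name.to_s, 'size' => size)
end

amount = decimal_schema(:bytes, precision: 9, scale: 2)
price = decimal_schema(:fixed, name: :price, size: 8, precision: 18, scale: 4)
```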

Abstract Types

Types can be declared as abstract in the DSL. Declaring a type as abstract prevents the rake task from generating an Avro JSON schema for the type.

A type can be declared as abstract using either an option or a method in the DSL when defining the type:

record :unique_id, abstract: true do
  required :uuid, :fixed, size: 38
end

enum :status do
  symbols %w(valid invalid)
  abstract true
end

Type Macros

avro-builder allows type macros to be defined that expand to types that cannot normally be named in Avro schemas. These macro names are not retained in generated schemas but allow definitions to be reused across DSL files:

type_macro :timestamp, long(logical_type: 'timestamp-millis')

record :user do
  required :created_at, :timestamp
  required :updated_at, :timestamp
end

Type macros inherit the namespace from the context where they are defined or an explicit namespace option may be specified:

type_macro :timestamp, long(logical_type: 'timestamp-millis'),
           namespace: 'com.my_company'

Type macros are always marked as abstract and do not generate an Avro JSON schema file when using the rake task.
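Conceptually, a type macro is a named substitution: references to the macro are replaced by its full definition, so the macro name never reaches the generated schema. The sketch below uses a hypothetical TYPE_MACROS registry with type_macro and resolve_type helpers (not avro-builder's code) and ignores namespaces for brevity.

```ruby
# Hypothetical registry (not avro-builder's code) mapping macro names to
# complete Avro type definitions.
TYPE_MACROS = {}

def type_macro(name, definition)
  TYPE_MACROS[name] = definition
end

# Macro references are inlined; other symbols are treated as type names.
def resolve_type(type)
  TYPE_MACROS.fetch(type) { type.is_a?(Symbol) ? type.to_s : type }
end

type_macro :timestamp, { 'type' => 'long', 'logicalType' => 'timestamp-millis' }

fields = [:created_at, :updated_at].map do |name|
  { 'name' => name.to_s, 'type' => resolve_type(:timestamp) }
end
```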

Auto-loading and Imports

Specify paths to search for definitions:

Avro::Builder.add_load_path('/path/to/dsl/files')

Undefined references are automatically loaded from a file with the same name: the load paths are searched for a .rb file whose name matches the reference.

Files may also be explicitly imported using import <filename>.
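The lookup step can be sketched with Dir.glob. This is a hypothetical find_definition_file helper, not avro-builder's implementation; it assumes definition files may live in subdirectories of a load path (so each path is searched recursively) and, as an additional assumption, treats multiple matches as an error.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical lookup (not avro-builder's code): search each load path
# recursively for "<name>.rb".
def find_definition_file(name, load_paths)
  matches = load_paths.flat_map do |path|
    Dir.glob(File.join(path, '**', "#{name}.rb"))
  end
  raise ArgumentError, "no file found for #{name}" if matches.empty?
  raise ArgumentError, "ambiguous files for #{name}: #{matches.join(', ')}" if matches.size > 1

  matches.first
end

found = nil
Dir.mktmpdir do |dir|
  nested = File.join(dir, 'com', 'example')
  FileUtils.mkdir_p(nested)
  File.write(File.join(nested, 'user.rb'), "record :user do\nend\n")
  found = find_definition_file(:user, [dir])
end
```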

Extends

A previously defined record may be referenced in the definition of another record using extends <record_name>. This adds all of the fields from the referenced record to the current record. The current record may override fields in the record that it extends.

record :original do
  required :first, :string
  required :second, :int
end

record :extended do
  extends :original
  optional :first, :string
end
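The merge that extends performs can be sketched over plain field hashes. The extend_fields helper below is hypothetical (not avro-builder's code); it assumes an overriding field keeps the parent field's position and that new child fields are appended, which the text above does not specify. Here the child makes first optional, as in the example.

```ruby
# Hypothetical sketch (not avro-builder's code): copy the parent's fields,
# then replace any with the same name as a child field; append the rest.
def extend_fields(parent_fields, child_fields)
  merged = parent_fields.map(&:dup)
  child_fields.each do |field|
    index = merged.index { |f| f['name'] == field['name'] }
    if index
      merged[index] = field
    else
      merged << field
    end
  end
  merged
end

original = [
  { 'name' => 'first', 'type' => 'string' },
  { 'name' => 'second', 'type' => 'int' }
]
extended = extend_fields(original, [{ 'name' => 'first', 'type' => %w[null string], 'default' => nil }])
```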

Additionally, a namespace can be passed to extends when needed to remove ambiguity.

namespace 'com.newbie'

record :original, namespace: 'com.og' do
  required :first, :string
  required :second, :int
end

record :original do
  required :first, :string
  required :second, :int
end

record :extended do
  extends :original, namespace: 'com.og'
  optional :first, :string
end

Schema Store

The Avro::Builder::SchemaStore can be used to load DSL files and return cached Avro::Schema objects. This schema store can be used as the schema store for avromatic to generate models directly from schemas defined using the DSL.

The schema store must be initialized with the path where DSL files are located:

schema_store = Avro::Builder::SchemaStore.new(path: '/path/to/dsl/files')
schema_store.find('schema_name', 'my_namespace')
#=> Avro::Schema (for file at '/path/to/dsl/files/my_namespace/schema_name.rb')
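The example implies a mapping from name and namespace to a file path, plus caching of loaded schemas. The hypothetical SketchSchemaStore class below (not avro-builder's code) illustrates both; its loader block stands in for evaluating the DSL file, and turning dots in a namespace into directory separators is an assumption, since only a single-segment namespace is shown above.

```ruby
# Hypothetical sketch (not avro-builder's code) of a caching schema store.
class SketchSchemaStore
  def initialize(path:, &loader)
    @path = path
    @loader = loader # stands in for evaluating a DSL file
    @cache = {}
  end

  # Assumption: namespace segments map to nested directories.
  def file_for(name, namespace = nil)
    parts = [@path]
    parts.concat(namespace.split('.')) if namespace
    File.join(*parts, "#{name}.rb")
  end

  # Loads each schema once; later lookups return the cached object.
  def find(name, namespace = nil)
    key = [name, namespace]
    @cache.fetch(key) { @cache[key] = @loader.call(file_for(name, namespace)) }
  end
end

loads = 0
store = SketchSchemaStore.new(path: '/path/to/dsl/files') do |file|
  loads += 1
  "schema from #{file}"
end
store.find('schema_name', 'my_namespace')
store.find('schema_name', 'my_namespace')
```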

To configure Avromatic to use this schema store and its Messaging API:

Avromatic.configure do |config|
  config.schema_store = Avro::Builder::SchemaStore.new(path: 'avro/dsl')
  config.registry_url = 'https://builder:[email protected]'
  config.build_messaging!
end

Avro Generate Rake Task

There is a rake task that can be used to generate Avro schemas from all DSL files.

A rake task is automatically defined via a Railtie for Rails projects that uses #{Rails.root}/avro/dsl as the root for Avro DSL files.

Custom rake tasks can also be defined:

require 'avro/builder/rake/avro_generate_task'
Avro::Builder::Rake::AvroGenerateTask.new(name: :custom_gen,
                                          dependencies: [:load_app]) do |task|
  task.filetype = 'avsc' # default option
  task.root = '/path/to/dsl/files'
  task.load_paths << '/additional/dsl/files'
end

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Issues and pull requests are welcome on GitHub at https://github.com/salsify/avro-builder.

License

The gem is available as open source under the terms of the MIT License.
