  • Language
    Scala
  • License
    Apache License 2.0
  • Created almost 10 years ago
  • Updated 4 months ago

Generate Scala case class definitions from Avro schemas

avrohugger


Schema-to-case-class code generation for working with Avro in Scala.

  • avrohugger-core: Generate source code at runtime for evaluation at a later step.
  • avrohugger-filesorter: Sort schema files for proper compilation order.
  • avrohugger-tools: Generate source code at the command line with the avrohugger-tools jar.

Alternative Distributions:

  • sbt: sbt-avrohugger - Generate source code at compile time with an sbt plugin.
  • Maven: avrohugger-maven-plugin - Generate source code at compile time with a maven plugin.
  • Mill: mill-avro - Generate source code at compile time with a Mill plugin.
  • Gradle: gradle-avrohugger-plugin - Generate source code at compile time with a gradle plugin.
  • mu-rpc: mu-scala - Generate rpc models, messages, clients, and servers.


Generates Scala case classes in various formats:

  • Standard: Vanilla case classes (for use with Apache Avro's GenericRecord API, etc.).

  • SpecificRecord: Case classes that implement SpecificRecordBase and therefore have mutable var fields (for use with the Avro Specific API - Scalding, Spark, Avro, etc.).

  • Scavro (@deprecated since avrohugger v1.5.0): Case classes with immutable fields, intended to wrap Java generated Avro classes (for use with the Scavro runtime; Java classes are provided separately, see Scavro Plugin or sbt-avro).

Supports generating case classes with arbitrary fields of the following datatypes:

Avro | Standard | SpecificRecord | Notes
INT | Int | Int | See Logical Types: date
LONG | Long | Long | See Logical Types: timestamp-millis
FLOAT | Float | Float |
DOUBLE | Double | Double |
STRING | String | String |
BOOLEAN | Boolean | Boolean |
NULL | Null | Null |
MAP | Map | Map |
ENUM | scala.Enumeration, Scala case object, Java Enum, EnumAsScalaString | Java Enum, EnumAsScalaString | See Customizable Type Mapping
BYTES | Array[Byte], BigDecimal | Array[Byte], BigDecimal | See Logical Types: decimal
FIXED | case class, case class + schema | case class extending SpecificFixed | See Logical Types: decimal
ARRAY | Seq, List, Array, Vector | Seq, List, Array, Vector | See Customizable Type Mapping
UNION | Option, Either, Shapeless Coproduct | Option, Either, Shapeless Coproduct | See Customizable Type Mapping
RECORD | case class, case class + schema | case class extending SpecificRecordBase | See Customizable Type Mapping
PROTOCOL | No Type, Scala ADT | RPC trait, Scala ADT | See Customizable Type Mapping
Date | java.time.LocalDate, java.sql.Date | java.time.LocalDate, java.sql.Date | See Customizable Type Mapping
TimestampMillis | java.time.Instant, java.sql.Timestamp | java.time.Instant, java.sql.Timestamp | See Customizable Type Mapping
UUID | java.util.UUID | java.util.UUID | See Customizable Type Mapping
Decimal | BigDecimal | BigDecimal | See Customizable Type Mapping

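To make the Standard column more concrete, here is a hand-written sketch of the kind of definitions the table implies for a small record. The schema, the names Color and User, and the choice of Seq, Option, and scala.Enumeration (one of the listed options in each cell) are assumptions for illustration only; real generator output may differ in detail.

```scala
// Hypothetical Avro record "User" with fields:
//   name: string, age: int, nickname: union { null, string },
//   tags: array<string>, favorite: enum Color { RED, GREEN }

// ENUM mapped to scala.Enumeration (one of the Standard options).
object Color extends Enumeration {
  type Color = Value
  val RED, GREEN = Value
}

// RECORD mapped to a plain case class with immutable fields.
case class User(
  name: String,              // STRING -> String
  age: Int,                  // INT -> Int
  nickname: Option[String],  // UNION with null -> Option
  tags: Seq[String],         // ARRAY -> Seq
  favorite: Color.Value      // ENUM -> scala.Enumeration value
)
```

Because the fields are immutable, updates go through copy, e.g. user.copy(age = 37).
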
Logical Types Support:

NOTE: Currently logical types are only supported for Standard and SpecificRecord formats

  • date: Annotates Avro int schemas to generate java.time.LocalDate or java.sql.Date (See Customizable Type Mapping). Examples: avdl, avsc.
  • decimal: Annotates Avro bytes and fixed schemas to generate BigDecimal. Examples: avdl, avsc.
  • timestamp-millis: Annotates Avro long schemas to generate java.time.Instant or java.sql.Timestamp (See Customizable Type Mapping). Examples: avdl, avsc.
  • uuid: Annotates Avro string schemas and idls to generate java.util.UUID (See Customizable Type Mapping). Example: avsc.
  • time-millis: Annotates Avro int schemas to generate java.time.LocalTime or java.sql.Time.

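As a rough illustration of what these annotations mean on the Scala side, the sketch below converts the raw Avro representations (as defined by the Avro specification) into the JVM types named in the list. The object LogicalTypeSketch and its method names are hypothetical helpers, not part of avrohugger.

```scala
import java.time.{Instant, LocalDate, LocalTime}
import java.util.UUID

object LogicalTypeSketch {
  // date: an Avro int counting days since 1970-01-01.
  def date(days: Int): LocalDate = LocalDate.ofEpochDay(days.toLong)

  // timestamp-millis: an Avro long counting milliseconds since the epoch (UTC).
  def timestampMillis(millis: Long): Instant = Instant.ofEpochMilli(millis)

  // time-millis: an Avro int counting milliseconds after midnight.
  def timeMillis(millis: Int): LocalTime =
    LocalTime.ofNanoOfDay(millis.toLong * 1000000L)

  // uuid: an Avro string holding the canonical UUID text form.
  def uuid(s: String): UUID = UUID.fromString(s)

  // decimal: Avro bytes holding a big-endian two's-complement unscaled value;
  // the scale comes from the schema, not from the data.
  def decimal(bytes: Array[Byte], scale: Int): BigDecimal =
    BigDecimal(BigInt(bytes), scale)
}
```
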
Protocol Support:
  • The records defined in .avdl, .avpr, and json protocol strings can be generated as ADTs if the protocols define more than one Scala definition (note: message definitions are ignored when this setting is used). See Customizable Type Mapping.

  • For SpecificRecord, if the protocol contains messages then an RPC trait is generated (instead of generating an ADT, or ignoring the message definitions).
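
For readers unfamiliar with the term, an ADT here means a closed family of case classes under one sealed parent. The sketch below is hand-written; the protocol name Shapes, its records, and the idea that the parent is named after the protocol are all assumptions, and the generated code may be shaped differently.

```scala
// Hypothetical protocol "Shapes" defining two records, Circle and Square.
sealed trait Shapes extends Product with Serializable
final case class Circle(radius: Double) extends Shapes
final case class Square(side: Double) extends Shapes

object ShapesDemo {
  // A sealed parent lets the compiler check that a match covers every record.
  def area(shape: Shapes): Double = shape match {
    case Circle(r) => math.Pi * r * r
    case Square(s) => s * s
  }
}
```
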

Doc Support:
  • .avdl: Comments that begin with /** are used as the documentation string for the type or field definition that follows the comment.

  • .avsc, .avpr, and .avro: Docs in Avro schemas are used to define a case class's ScalaDoc.

  • .scala: ScalaDocs of case class definitions are used to define record and field docs.

Note: Currently Treehugger appears to generate Javadoc style docs (thus compatible with ScalaDoc style).

Usage

  • Library for Scala 2.12 and 2.13
  • Parses schemas and IDLs with Avro version 1.11
  • Generates code compatible with Scala 2.12 and 2.13

avrohugger-core

Get the dependency with:
"com.julianpeeters" %% "avrohugger-core" % "1.5.1"
Description:

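Instantiate a Generator with Standard, Scavro, or SpecificRecord source formats. Then use

<T>ToFile(input: T, outputDir: String): Unit

or

<T>ToStrings(input: T): List[String]

where T can be File, Schema, or String, giving method names such as fileToFile, schemaToFile, and stringToFile, and their counterparts fileToStrings, schemaToStrings, and stringToStrings.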

Example
import avrohugger.Generator
import avrohugger.format.SpecificRecord
import java.io.File

val schemaFile = new File("path/to/schema")
val generator = new Generator(SpecificRecord)
generator.fileToFile(schemaFile, "optional/path/to/output") // default output path = "target/generated-sources"

where an input File can be .avro, .avsc, .avpr, or .avdl,

and where an input String can be the string representation of an Avro schema, protocol, IDL, or a set of case classes that you'd like to have implement SpecificRecordBase.
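
For the String input case, a minimal sketch might look like the following. The schema and the object name GenerateFromString are made up for illustration, and stringToStrings is assumed to follow the same naming pattern as the fileToFile call in the example above.

```scala
import avrohugger.Generator
import avrohugger.format.Standard

object GenerateFromString {
  // A small, hypothetical schema used only for this illustration.
  val schemaJson: String =
    """{
      |  "type": "record",
      |  "name": "User",
      |  "namespace": "example",
      |  "fields": [
      |    {"name": "name", "type": "string"},
      |    {"name": "age", "type": "int"}
      |  ]
      |}""".stripMargin

  def generate(): List[String] = {
    val generator = new Generator(Standard)
    // String in, generated source code out (one string per compilation unit).
    generator.stringToStrings(schemaJson)
  }

  def main(args: Array[String]): Unit =
    generate().foreach(println)
}
```

Returning strings instead of writing files is convenient when the generated code is to be inspected or evaluated in a later step.
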

Customizable Type Mapping:

To reassign Scala types to Avro types, use the following (e.g. for customizing Specific):

import avrohugger.Generator
import avrohugger.format.SpecificRecord
import avrohugger.types.ScalaVector

// Start from the format's default mapping and override only the array type
val myScalaTypes = Some(SpecificRecord.defaultTypes.copy(array = ScalaVector))
val generator = new Generator(SpecificRecord, avroScalaCustomTypes = myScalaTypes)

  • record can be assigned to ScalaCaseClass and ScalaCaseClassWithSchema (with schema in a companion object)
  • array can be assigned to ScalaSeq, ScalaArray, ScalaList, and ScalaVector
  • enum can be assigned to JavaEnum, ScalaCaseObjectEnum, EnumAsScalaString, and ScalaEnumeration
  • fixed can be assigned to ScalaCaseClassWrapper and ScalaCaseClassWrapperWithSchema (with schema in a companion object)
  • union can be assigned to OptionShapelessCoproduct, OptionEitherShapelessCoproduct, or OptionalShapelessCoproduct
  • int, long, float, double can be assigned to ScalaInt, ScalaLong, ScalaFloat, ScalaDouble
  • protocol can be assigned to ScalaADT and NoTypeGenerated
  • decimal can be assigned to e.g. ScalaBigDecimal(Some(BigDecimal.RoundingMode.HALF_EVEN)) and ScalaBigDecimalWithPrecision(None) (via Shapeless Tagged Types)

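Several of these assignments can be combined in a single copy call. The sketch below does so for the Standard format; it assumes that Standard exposes defaultTypes in the same way as SpecificRecord does, and that the copy parameters are named after the bullets above (array, protocol). The object name CustomTypesSketch is hypothetical.

```scala
import avrohugger.Generator
import avrohugger.format.Standard
import avrohugger.types.{ScalaADT, ScalaList}

object CustomTypesSketch {
  // Start from the format's defaults and override only what should change.
  val customTypes = Standard.defaultTypes.copy(
    array = ScalaList,   // Avro ARRAY -> List
    protocol = ScalaADT  // protocol records -> ADT
  )

  val generator = new Generator(Standard, avroScalaCustomTypes = Some(customTypes))
}
```
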
Customizable Namespace Mapping:

Namespaces can be reassigned by instantiating a Generator with a custom namespace map (please see warnings below):

val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("oldnamespace"->"newnamespace"))

Wildcarding the beginning of a namespace is permitted: place a single asterisk after the prefix that you want to map, and any matching schema will have its namespace rewritten. Multiple conflicting wildcards are not permitted.

val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("example.*"->"example.newnamespace"))

avrohugger-filesorter

Get the dependency with:
"com.julianpeeters" %% "avrohugger-filesorter" % "1.5.1"
Description:

To ensure dependent schemas are compiled in the proper order (thus avoiding org.apache.avro.SchemaParseException: Undefined name: "com.example.MyRecord" parser errors), sort avsc and avdl files with the sortSchemaFiles method on AvscFileSorter and AvdlFileSorter, respectively.

Example:
import avrohugger.filesorter.AvscFileSorter
import java.io.File

// srcDir is assumed to be a directory in an sbt build; ** is sbt's file-glob operator
val sorted: List[File] = AvscFileSorter.sortSchemaFiles((srcDir ** "*.avsc"))
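
Outside of sbt, the sorter can be combined with a Generator roughly as follows. This is a sketch under assumptions: the object SortThenGenerate, its helper avscFiles, and the directory arguments are hypothetical, and it assumes a single Generator instance remembers types parsed from earlier files, which is why the order matters.

```scala
import avrohugger.Generator
import avrohugger.filesorter.AvscFileSorter
import avrohugger.format.Standard
import java.io.File

object SortThenGenerate {
  // Collect the .avsc files directly under a directory (non-recursive).
  // listFiles returns null for a missing directory, hence the Option wrapper.
  def avscFiles(dir: File): List[File] =
    Option(dir.listFiles).map(_.toList).getOrElse(Nil)
      .filter(f => f.isFile && f.getName.endsWith(".avsc"))

  def run(schemaDir: String, outputDir: String): Unit = {
    val generator = new Generator(Standard)
    // Sorting puts schemas that other schemas depend on first.
    val sorted = AvscFileSorter.sortSchemaFiles(avscFiles(new File(schemaDir)))
    sorted.foreach(file => generator.fileToFile(file, outputDir))
  }
}
```
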

avrohugger-tools

Download the avrohugger-tools jar for Scala 2.12 or Scala 2.13 (>30MB!) and use it like the avro-tools jar. Usage: [-string] (schema|protocol|datafile) input... outputdir

  • generate generates Scala case class definitions:

java -jar /path/to/avrohugger-tools_2.12-1.5.1-assembly.jar generate schema user.avsc .

  • generate-specific generates definitions that extend Avro's SpecificRecordBase:

java -jar /path/to/avrohugger-tools_2.12-1.5.1-assembly.jar generate-specific schema user.avsc .

  • generate-scavro (@deprecated since avrohugger v1.5.0) generates definitions that extend Scavro's AvroSerializable:

java -jar /path/to/avrohugger-tools_2.12-1.5.1-assembly.jar generate-scavro schema user.avsc .

Warnings

  1. If your framework relies on reflection to get the Schema, it will fail because Scala fields are private. Preempt this by passing a Schema to DatumReaders and DatumWriters (e.g. val sdw = new SpecificDatumWriter[MyRecord](schema)).

  2. For the SpecificRecord format, generated case class fields must be mutable (var) in order to be compatible with the SpecificRecord API. Note: If your framework allows GenericRecord, avro4s provides a type class that converts to and from immutable case classes cleanly.

  3. SpecificRecord requires that enum be represented as JavaEnum.
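
The first warning can be sketched with plain Avro APIs: pass the schema explicitly when constructing the writer and reader instead of letting the framework discover it reflectively. The object SpecificIO and its method names are hypothetical helpers; T stands for any generated SpecificRecord case class.

```scala
import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.io.{DecoderFactory, EncoderFactory}
import org.apache.avro.specific.{SpecificDatumReader, SpecificDatumWriter, SpecificRecordBase}

object SpecificIO {
  // Serialize a record using the schema it carries, avoiding reflective lookup.
  def toBytes[T <: SpecificRecordBase](record: T): Array[Byte] = {
    val writer = new SpecificDatumWriter[T](record.getSchema)
    val out = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    writer.write(record, encoder)
    encoder.flush()
    out.toByteArray
  }

  // Deserialize with an explicitly supplied schema.
  def fromBytes[T <: SpecificRecordBase](bytes: Array[Byte], schema: Schema): T = {
    val reader = new SpecificDatumReader[T](schema)
    val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
    reader.read(null.asInstanceOf[T], decoder)
  }
}
```
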

Testing

To test for regressions, please run sbt:avrohugger> + test.

To test that generated code can be de/serialized as expected, please run:

  1. sbt:avrohugger> + publishLocal
  2. then clone sbt-avrohugger and update its avrohugger dependency to the locally published version
  3. finally run sbt:sbt-avrohugger> scripted avrohugger/*, or, e.g., scripted avrohugger/GenericSerializationTests

Credits

Depends on Avro and Treehugger. avrohugger-tools is based on avro-tools.

Contributors:

Marius Soutier
Brian London
alancnet
Matt Coffin
Ryan Koval
Simonas Gelazevicius
Paul Snively
Marco Stefani
Andrew Gustafson
Kostya Golikov
Plínio Pantaleão
Sietse de Kaper
Martin Mauch
Konstantin
Adam Drakeford
Carlos Silva
ismail Benammar
Paul Pearcy
Matt Allen
C-zito
Tim Chan
Saket
Daniel Davis
Zach Cox
Diego E. Alonso Blas
Fede Fernández
Rob Landers
Simon Petty
Andreas Drobisch
natefitzgerald
Timo Schmid
mcenkar
Luca Tronchin
LydiaSkuse
Stefano Galarraga
Lars Albertsson
Eugene Platonov
Jerome Wacongne
Jon Morra
Raúl Raja Martínez
Kaur Matas
Chris Albright
Francisco Díaz
Bobby Rauchenberg
Leonard Ehrenfried
François Sarradin
niqdev
Julien BENOIT
Algimantas Milašius
Massimo Siani
Criticism is appreciated.
Fork away, just make sure the tests pass before sending a pull request.
