  • Language
    Java
  • License
    Apache License 2.0

Algebraic data types in Java.


DataEnum

DataEnum allows you to work with algebraic data types in Java.

You can think of it as an enum where every individual value can have different data associated with it.

What problem does it solve?

The idea of algebraic data types is not new and already exists in many other programming languages, for example Scala and Kotlin.

It is possible to represent such algebraic data types using subclasses: the parent class is the "enumeration" type, and each child class represents a case of the enumeration with its associated parameters. This will, however, require you either to spread out your business logic across all the subclasses, or to cast the child class manually to access the parameters, taking great care to cast only when you know for sure that the class is of the right type.

The goal of DataEnum is to help you generate all these classes and give you a fluent API for easily accessing their data in a type-safe manner.

The primary use-case we had when designing DataEnum was to execute different business logic depending on an incoming message. And as mentioned above, we wanted to keep all that business logic in one place, and not spread it out in different classes. With plain Java, you'd have to write something like this:

if (message instanceof Login) {
    Login login = (Login) message;
    // login logic here
} else if (message instanceof Logout) {
    Logout logout = (Logout) message;
    // logout logic here
}

There are a number of things here that developers tend to not like: repeated if-else statements, manual instanceof checks and safe-but-noisy typecasting. On top of that it doesn't look very idiomatic and there's a high risk that mistakes get introduced over time. If you use DataEnum, you can instead write the same expression like this:

message.match(
   login -> { /* login logic; the 'login' parameter is 'message' but cast to the type Login. */ },
   logout -> { /* logout logic; the 'logout' parameter is 'message' but cast to the type Logout. */ }
);

In this example only one of the two lambdas will be executed depending on the message type, just like with the if-statements. match is just a method that takes functions as arguments, but if you write expressions with linebreaks like in the example above it looks quite similar to a switch-statement, a match-expression in Scala, or a when-expression in Kotlin. DataEnum makes use of this similarity to make match-statements look and feel like a language construct.
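
To make the point that match is just a method taking functions more concrete, here is a minimal hand-written sketch of the idea. It is not the code DataEnum generates: the Message class, its userName field, and the MatchDemo helper are hypothetical names chosen for illustration, and the sketch uses java.util.function.Consumer from Java 8 for brevity rather than DataEnum's own functional interfaces.

```java
import java.util.function.Consumer;

// Hand-written illustration only; NOT the code that DataEnum generates.
abstract class Message {
    // Exactly one of the two consumers is invoked, depending on the runtime case.
    abstract void match(Consumer<Login> onLogin, Consumer<Logout> onLogout);

    static final class Login extends Message {
        final String userName;

        Login(String userName) {
            this.userName = userName;
        }

        @Override
        void match(Consumer<Login> onLogin, Consumer<Logout> onLogout) {
            onLogin.accept(this); // 'this' is already statically typed as Login
        }
    }

    static final class Logout extends Message {
        @Override
        void match(Consumer<Login> onLogin, Consumer<Logout> onLogout) {
            onLogout.accept(this);
        }
    }
}

// Hypothetical caller: the lambdas receive the value already typed as the right case.
final class MatchDemo {
    static String describe(Message message) {
        StringBuilder out = new StringBuilder();
        message.match(
            login -> out.append("login by ").append(login.userName),
            logout -> out.append("logout"));
        return out.toString();
    }
}
```

Because each case class picks its own argument, the caller never writes an instanceof check or a cast; the dispatch happens through an ordinary virtual method call.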

There are many compelling use-cases for using an algebraic data type to represent values. To name a few:

  • Create a vocabulary of possible actions. List all the actions that can be performed in a certain part of your application, for example on a login/logout page. Each action can have different data associated with it, for example the login action would have a username and password, while a logout action doesn't have any data.

  • Represent the states of a state machine. This allows you to only keep the data that actually is available in each state, making it impossible to even reference data that isn't available in a particular state.

  • Rich, type-safe error handling. Instead of having just an error code as a result of a network request, you can have different types for different errors, each with relevant information attached: ConnectivityLost, NoRouteToHost(String host), TooManyRetries(int retryCount).

  • Metadata in RxJava streams. It is often useful to wrap data in RxJava in order to provide metadata about what's happening. One common example is to represent different kinds of success and failure: InProgress(T placeholder), Success(T data), Error(String reason).
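
As an illustration of the error-handling use case above, the following sketch models the three error cases by hand and turns each into a message with a map-style method. It is a plain-Java stand-in written under assumptions, not DataEnum output: the NetworkError and ErrorMessages names, the field access style, and the use of java.util.function.Function are all choices made for this example.

```java
import java.util.function.Function;

// Hand-written stand-in for an error type with three cases; not generated by DataEnum.
abstract class NetworkError {
    // Returns the result of whichever function corresponds to the runtime case.
    abstract <R> R map(
        Function<ConnectivityLost, R> connectivityLost,
        Function<NoRouteToHost, R> noRouteToHost,
        Function<TooManyRetries, R> tooManyRetries);

    static final class ConnectivityLost extends NetworkError {
        @Override
        <R> R map(Function<ConnectivityLost, R> a,
                  Function<NoRouteToHost, R> b,
                  Function<TooManyRetries, R> c) {
            return a.apply(this);
        }
    }

    static final class NoRouteToHost extends NetworkError {
        final String host; // only this case carries a host

        NoRouteToHost(String host) {
            this.host = host;
        }

        @Override
        <R> R map(Function<ConnectivityLost, R> a,
                  Function<NoRouteToHost, R> b,
                  Function<TooManyRetries, R> c) {
            return b.apply(this);
        }
    }

    static final class TooManyRetries extends NetworkError {
        final int retryCount; // only this case carries a retry count

        TooManyRetries(int retryCount) {
            this.retryCount = retryCount;
        }

        @Override
        <R> R map(Function<ConnectivityLost, R> a,
                  Function<NoRouteToHost, R> b,
                  Function<TooManyRetries, R> c) {
            return c.apply(this);
        }
    }
}

// Hypothetical consumer: each branch can only see the data of its own case.
final class ErrorMessages {
    static String describe(NetworkError error) {
        return error.map(
            lost -> "connection lost",
            noRoute -> "cannot reach " + noRoute.host,
            retries -> "gave up after " + retries.retryCount + " attempts");
    }
}
```

The compiler rejects an attempt to read host in the retries branch, which is the kind of safety a bare error code cannot offer.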

Status

DataEnum is in Beta status, meaning it is used in production in Spotify Android applications, but we may keep making changes relatively quickly.

It is currently built for Java 7 (because Android doesn't support Java 8 well yet), hence the duplication of some concepts defined in java.util.function (Consumer, Function, Supplier).

Using it in your project

The latest version of DataEnum is available through Maven Central (replace LATEST_RELEASE below with the latest released version):

Gradle

implementation 'com.spotify.dataenum:dataenum:LATEST_RELEASE'                
annotationProcessor 'com.spotify.dataenum:dataenum-processor:LATEST_RELEASE' 

Maven

<dependencies>
  <dependency>
    <groupId>com.spotify.dataenum</groupId>
    <artifactId>dataenum</artifactId>
    <version>LATEST_RELEASE</version>
  </dependency>
  <dependency>
    <groupId>com.spotify.dataenum</groupId>
    <artifactId>dataenum-processor</artifactId>
    <version>LATEST_RELEASE</version>
    <scope>provided</scope>
  </dependency>
</dependencies>

It may be an option to use the annotationProcessorPaths configuration option of the maven-compiler-plugin rather than a provided-scope dependency on dataenum-processor.

How do I create a DataEnum type?

First, you define all the cases and their parameters in an interface like this:

@DataEnum
interface MyMessages_dataenum {
    dataenum_case Login(String userName, String password);
    dataenum_case Logout();
    dataenum_case ResetPassword(String userName);
}

Then, you apply the dataenum-processor annotation processor to that code, and your DataEnum case classes will be generated for you.

Some things to note:

  • We use a Java interface for the specification. The rationale is that it allows the IDE to help you find and import types correctly. We deliberately made it look weird, so nobody would think it's a normal class. This is abusing Java a bit, but we're OK with that.

  • The interface will never be used for anything other than code generation, so you should normally make the interface package-private. The one exception is when one _dataenum spec needs to reference another as described below.

  • The interface name has to end with _dataenum. This is to make the interface stick out and make it easier to filter out from artifacts and exclude from static analysis.

  • The methods in the interface have to be declared as returning a dataenum_case. Each method corresponds to one of the possible cases of the enum, and the parameters of the method become the member fields of that case. Note that the method names from the interface will be used as class names for the cases, so you'll want to name them using CamelCase as in the example above. The methods in the _dataenum interface will never be implemented, and there is no way to create a dataenum_case instance. The type is only used as a marker.

  • The prefix of the @DataEnum annotated interface will be used as the name of a generated super-class (MyMessages in the example above). This class will have factory methods for all the cases.

  • For each method in the interface, an inner class will be generated (in this example MyMessages.Login, MyMessages.Logout and MyMessages.ResetPassword). These classes will extend the outer class MyMessages.
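
To picture the structure described in the last two points, here is a rough hand-written approximation of an outer class with factory methods and nested case classes. It is a sketch under assumptions, not the processor's actual output: the class is deliberately named MyMessagesSketch, only two of the three cases are shown, and details such as null checks, equals/hashCode/toString, and match/map are left out.

```java
// Hand-written approximation of the described structure; real generated code differs in detail.
abstract class MyMessagesSketch {
    MyMessagesSketch() {}

    // One static factory method per case, each returning the super type.
    static MyMessagesSketch login(String userName, String password) {
        return new Login(userName, password);
    }

    static MyMessagesSketch logout() {
        return new Logout();
    }

    // is-/as-methods stand in for manual instanceof checks and casts.
    final boolean isLogin() {
        return this instanceof Login;
    }

    final boolean isLogout() {
        return this instanceof Logout;
    }

    final Login asLogin() {
        return (Login) this;
    }

    final Logout asLogout() {
        return (Logout) this;
    }

    // Each case is a nested class extending the outer class.
    static final class Login extends MyMessagesSketch {
        private final String userName;
        private final String password;

        Login(String userName, String password) {
            this.userName = userName;
            this.password = password;
        }

        // Getters only; there are no setters because case values are immutable.
        String userName() {
            return userName;
        }

        String password() {
            return password;
        }
    }

    static final class Logout extends MyMessagesSketch {
        Logout() {}
    }
}
```

With this shape, MyMessagesSketch.login("petter", "s3cr3t") yields a value of the super type, and asLogin().userName() reads the case data once the case is known.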

Using the generated DataEnum class

Some usage examples, based on the @DataEnum specification above:

// Instantiate by passing in the required parameters.
// You'll get something that is of the super type - this is to help Java's
// not-always-great type inference do the right thing in many common cases.
MyMessages message = MyMessages.login("petter", "s3cr3t");

// If you actually needed the subtype you can easily cast it using the as-methods.
Logout logout = MyMessages.logout().asLogout();

// For every as-method there is also an is-method to check the type of the message.
assertThat(message.isLogin(), is(true));

// Apply different business logic to different message types. Note how getters are generated (but not
// setters, DataEnum case types should be considered immutable).
message.match(
    login -> Logger.debug("got a login request from user: {}", login.userName()),
    logout -> Logger.debug("user logged out"),
    resetPassword -> Logger.debug("password reset requested for user: {}", resetPassword.userName())
);

// So far we've been looking at 'match', but there is also the very useful 'map' which is used to
// transform values. When using 'map' you define how the message should be transformed in each case.
int passwordLength = message.map(
    login -> login.password().length(),
    logout -> 0,
    resetPassword -> -1);

// There are some utility methods provided that allow you to deal with unimplemented or illegal cases:
int passwordLength = message.map(
    login -> login.password().length(),
    logout -> Cases.illegal("logout message does not contain a password"), // throws IllegalStateException
    resetPassword -> Cases.todo()); // throws UnsupportedOperationException

// Sometimes, only a minority of cases are handled differently, in which case a 'map' or 'match'
// can lead to duplication:
int passwordLength = message.map(
    login -> handleLogin(login),
    logout -> Cases.illegal("only login is allowed"),
    resetPassword -> Cases.illegal("only login is allowed")
    // This could really get bad if there are many cases here
);

// For those scenarios you can just use regular language control structures (like if-else):
if (message.isLogin()) {
  return handleLogin(message.asLogin()); // Technically just a cast but easier to read than manual casting.
} else {
  throw new IllegalStateException("only login is allowed");
}

Features

  • Case types are immutable. All generated classes are value types and cannot be modified after being created. Of course this assumes that all the parameters of your cases are immutable too, since an object is only immutable if all its fields are immutable as well.

  • Everything is non-null by default. Passing in a null will cause an exception to be thrown unless you explicitly annotate the parameters as @Nullable. Any annotation with the name 'Nullable' can be used.

  • toString, hashCode, and equals are generated for all case classes.

  • isFoo/asFoo methods are provided, as a more high level alternative to manually doing instanceof and casting.

  • Generic type support. The DataEnum interfaces can be type parameterized, which makes it possible to create reusable data types.

  • Recursive data type support. A generated DataEnum type may refer to itself recursively, even with type parameters. When doing so you must use the _dataenum-suffixed name to avoid any chicken-and-egg problems with the generated classes.

    The recursive data type support allows you to do things like this:

    @DataEnum
    interface Tree_dataenum<T> {
      dataenum_case Branch(Tree_dataenum<T> left, Tree_dataenum<T> right);
      dataenum_case Leaf(T value);
    }

  • Sometimes, you want to reference a dataenum from another one. You can do that using this slightly clunky syntax:

    @DataEnum
    interface First_dataenum {
      dataenum_case SomeCase();
    }

    @DataEnum
    interface Second_dataenum {
      dataenum_case NeedsFirst(First_dataenum first);
    }

    The generated NeedsFirst class will have a member field of the type First. Again, the First class doesn't exist until the annotation processor has run, so the Second_dataenum spec must reference the First_dataenum spec. If First_dataenum is in a different package than Second_dataenum, it must of course be public.

  • If you have sensitive information in a field and don't want the generated toString method to print that information, you can use the @Redacted annotation:

    dataenum_case UserInfo(String name, @Redacted String password);

    We provide an annotation in the runtime dependencies, but any annotation named Redacted will work.
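
Returning to the recursive Tree spec above, the sketch below shows how a recursive, type-parameterized data type can be consumed with a map-style method. It is hand-written and only approximates what a generated type might offer: TreeSketch, TreeSums, the factory methods, and the direct field access are assumptions for this example. Unlike a _dataenum spec, hand-written code can refer to the class itself, since there is no code generation step to wait for.

```java
import java.util.function.Function;

// Hand-written recursive type for illustration; not the output of the DataEnum processor.
abstract class TreeSketch<T> {
    abstract <R> R map(Function<Branch<T>, R> branch, Function<Leaf<T>, R> leaf);

    static <T> TreeSketch<T> branch(TreeSketch<T> left, TreeSketch<T> right) {
        return new Branch<>(left, right);
    }

    static <T> TreeSketch<T> leaf(T value) {
        return new Leaf<>(value);
    }

    static final class Branch<T> extends TreeSketch<T> {
        final TreeSketch<T> left;  // the type refers to itself recursively
        final TreeSketch<T> right;

        Branch(TreeSketch<T> left, TreeSketch<T> right) {
            this.left = left;
            this.right = right;
        }

        @Override
        <R> R map(Function<Branch<T>, R> branch, Function<Leaf<T>, R> leaf) {
            return branch.apply(this);
        }
    }

    static final class Leaf<T> extends TreeSketch<T> {
        final T value;

        Leaf(T value) {
            this.value = value;
        }

        @Override
        <R> R map(Function<Branch<T>, R> branch, Function<Leaf<T>, R> leaf) {
            return leaf.apply(this);
        }
    }
}

// Hypothetical consumer: a recursive fold expressed with map.
final class TreeSums {
    static int sum(TreeSketch<Integer> tree) {
        return tree.map(
            branch -> sum(branch.left) + sum(branch.right),
            leaf -> leaf.value);
    }
}
```

The recursion in sum mirrors the recursion in the data type: the branch case recurses into both children and the leaf case ends it.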

Configuration

DataEnum currently has a single configurable setting determining the visibility of constructors in generated code. Generally speaking, private is best as it ensures there is a single way of creating case instances (the generated static factory methods like MyMessages.login(String, String) above). However, for Android development, you want to keep the method count down to a minimum, and private constructors lead to synthetic constructors being generated, increasing the method count. Since that is an important use case for us, we've chosen package-private as the default. This is configurable by adding a @ConstructorAccess annotation to a package-info.java file. See the javadocs for more information.

Known weaknesses of DataEnum

  • While the generated classes are immutable, they do not enforce that parameters are immutable. It is up to users of DataEnum to, e.g., use ImmutableList for lists instead of List.

  • The names of the arguments to the lambdas when using match/map only indicate the type of the object by convention, so some discipline is required to make sure you manually update lambda argument names if a case is renamed.

  • Renaming cases of a dataenum can be painful since the generated class doesn't have a connection to the interface.

  • Reordering cases can be dangerous if you only use lambdas with type-inference. If you swap the order of two cases with the same parameter names then usages of map/match will still compile even though they are now incorrect. This can be mitigated using method references instead of lambdas, lambdas with explicit type parameters, and good test coverage of code using DataEnum.

  • The _dataenum-suffixed interface is only used as an input to code generation, and it breaks certain conventions around naming. You might need to suppress some static analysis when you use DataEnum, and you probably want to strip the _dataenum classes from artifacts.
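
The reordering hazard and its mitigations can be illustrated with a small hand-written example. Everything here is an assumption made for the sketch (the Request type, its two cases sharing a userName field, and the RequestHandlers helper); it only mimics the shape of a map method and is not DataEnum's generated code.

```java
import java.util.function.Function;

// Hand-written two-case type whose cases share a field name; for illustration only.
abstract class Request {
    abstract <R> R map(Function<Login, R> login, Function<ResetPassword, R> resetPassword);

    static final class Login extends Request {
        final String userName;

        Login(String userName) {
            this.userName = userName;
        }

        @Override
        <R> R map(Function<Login, R> login, Function<ResetPassword, R> resetPassword) {
            return login.apply(this);
        }
    }

    static final class ResetPassword extends Request {
        final String userName;

        ResetPassword(String userName) {
            this.userName = userName;
        }

        @Override
        <R> R map(Function<Login, R> login, Function<ResetPassword, R> resetPassword) {
            return resetPassword.apply(this);
        }
    }
}

final class RequestHandlers {
    static String onLogin(Request.Login login) {
        return "login: " + login.userName;
    }

    static String onResetPassword(Request.ResetPassword reset) {
        return "reset: " + reset.userName;
    }

    // Fragile: parameter types come from argument position, so if the two cases
    // swapped places in map's signature this would still compile but mislabel them.
    static String inferred(Request request) {
        return request.map(
            login -> "login: " + login.userName,
            reset -> "reset: " + reset.userName);
    }

    // Explicit parameter types: a swap in map's signature becomes a compile error.
    static String explicit(Request request) {
        return request.map(
            (Request.Login login) -> "login: " + login.userName,
            (Request.ResetPassword reset) -> "reset: " + reset.userName);
    }

    // Method references are tied to declared parameter types and give the same protection.
    static String withMethodRefs(Request request) {
        return request.map(RequestHandlers::onLogin, RequestHandlers::onResetPassword);
    }
}
```

All three variants behave identically today; the difference only shows up when the order of cases changes, which is why the typed variants are the safer habit for cases with similar fields.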

Alternatives

An alternative implementation of algebraic data types for Java is ADT4J. We feel DataEnum has the advantage of being less verbose than ADT4J, although ADT4J is more flexible in terms of customising your generated types.

Features that might be added in the future

  • Generating builders for case types with many parameters.
  • Generating mutator functions for case types to create modified versions of them.
  • Support for writing extensions, e.g. to allow adding support for serialization.
  • IntelliJ plugin for refactoring and for generating map/match statements.

Why is it called DataEnum?

The name 'DataEnum' comes from the fact that it's used similarly to an enum, but you can easily and type-safely have different data attached to each enum value.

Code of Conduct

This project adheres to the Open Code of Conduct. By participating, you are expected to honor this code.
