ply

Painless polymorphism

ply is an experimental compile-to-Go language. Its syntax and semantics are basically identical to Go's, but with more builtin functions for manipulating generic containers (slices, arrays, maps). This is accomplished by forking Go's type-checker, running it on the .ply file, and using the resolved types to generate specific versions of the generic function. For example, given the following Ply code:

m1 := map[int]int{1: 1}
m2 := map[int]int{2: 2}
m3 := merge(m1, m2)

merge is a generic function. After type-checking, the Ply compiler knows the types of m1 and m2, so it can generate a specific function for these types:

func mergeintint(m1, m2 map[int]int) map[int]int {
	m3 := make(map[int]int)
	for k, v := range m1 {
		m3[k] = v
	}
	for k, v := range m2 {
		m3[k] = v
	}
	return m3
}

mergeintint is then substituted for merge in the relevant expression, and the modified source can then be passed to the Go compiler.
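
Concretely, the call site from the example above becomes:

m3 := mergeintint(m1, m2)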

A similar approach is used to implement generic methods:

xs := []int{1, 2, 3, 4, 6, 20}
b := xs.filter(func(x int) bool { return x > 3 }).
        morph(func(x int) bool { return x % 2 == 0 }).
        fold(func(x, y bool) bool { return x && y })

In the above, b is true because all the integers in xs greater than 3 are even. To compile this, xs is wrapped in a new type that has a filter method. Then, that call is wrapped in a new type that has a morph method, and so on.
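
A minimal sketch of that wrapping, with illustrative identifiers (the names the Ply compiler actually generates may differ):

type filterintslice []int

// filter is the generated specialization of the generic filter method for []int.
func (xs filterintslice) filter(pred func(int) bool) []int {
	var filtered []int
	for _, x := range xs {
		if pred(x) {
			filtered = append(filtered, x)
		}
	}
	return filtered
}

The call xs.filter(...) is then rewritten to filterintslice(xs).filter(...), just as merge was rewritten to mergeintint above.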

Note that in most cases, Ply can combine these method chains into a single "pipeline" that does not allocate any intermediate slices. Without pipelining, filter would allocate a slice and pass it to morph, which would allocate another slice and pass it to fold. But Ply is able to merge these methods into a single transformation that does not require allocations, the same way a (good) human programmer would write it.

Usage

First, install the Ply compiler:

go get github.com/lukechampine/ply

The ply command behaves similarly to the go command. In fact, you can run any go subcommand through ply, including build, run, install, and even test.

When you run ply run test.ply, ply parses test.ply and generates a ply-impls.go file containing the specific implementations of any generics used in test.ply. It then rewrites test.ply as a standard Go file, ply-test.go, that calls those implementations. Finally, go run is invoked on ply-test.go and ply-impls.go.

Supported Functions and Methods

Builtins: enum, max, merge, min, not, zip

  • Planned: repeat, compose

Methods: all, any, contains, drop, dropWhile, elems, filter, fold, foreach, keys, morph, reverse, sort, take, takeWhile, tee, toMap, toSet, uniq

  • Planned: join, replace, split

All functions and methods are documented in the ply pseudo-package.

Supported Optimizations

In many cases we can reduce allocations when using Ply functions and methods. The Ply compiler will automatically apply these optimizations when it is safe to do so. However, all optimizations have trade-offs. If performance is important, you should always read the docstring of each method in order to understand what optimizations may be applied. Depending on your use case, it may be necessary to write your own implementation to squeeze out maximum performance.

Pipelining:

Pipelining means chaining together multiple Ply functions and/or methods. Currently only method chaining is supported. For example:

xs := []int{1, 2, 3, 4, 6, 20}
b := xs.filter(func(x int) bool { return x > 3 }).
        morph(func(x int) bool { return x % 2 == 0 }).
        fold(func(acc, x bool) bool { return acc && x })

As written, this chain requires allocating a new slice for the filter and a new slice for the morph. But if we were writing this transformation by hand, we could optimize it like so:

b := true
for _, x := range xs {
	if x > 3 {
		b = b && (x % 2 == 0)
	}
}

(A good rule of thumb is that, for most chains, only the allocations in the final method are required. fold doesn't require any allocations, but if the chain stopped at morph, then of course we would still need to allocate memory in order to return the morphed slice.)

Ply is able to perform the above optimization automatically. The bodies of filter, morph, and fold are combined into a single method, pipe, and the callsite is rewritten to supply the arguments of each chained function:

xs := []int{1, 2, 3, 4, 6, 20}
b := filtermorphfold(xs).pipe(
		func(x int) bool { return x > 3 },
		func(x int) bool { return x % 2 == 0 },
		func(x, y bool) bool { return x && y })
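
A rough sketch of what the fused pipe implementation could look like (illustrative, not the compiler's exact output):

type filtermorphfold []int

// pipe applies the filter predicate, the morph function, and the fold function
// in a single pass over the slice, with no intermediate allocations.
func (xs filtermorphfold) pipe(pred func(int) bool, morph func(int) bool, fold func(bool, bool) bool) bool {
	acc := true // seeded to match the hand-written version above; the real seeding may differ
	for _, x := range xs {
		if pred(x) {
			acc = fold(acc, morph(x))
		}
	}
	return acc
}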

However, not all methods can be pipelined. reverse is a good example. If reverse is the first method in the chain, then we can eliminate an allocation by reversing the order in which we iterate through the slice. We can also eliminate an allocation if reverse is the last method in the chain, since we can reverse the result in-place. But what do we do if reverse is in the middle? Consider this chain:

xs.takeWhile(even).reverse().morph(square)

Since we don't know what takeWhile will return, there is no way to pass its reversed elements to morph without allocating an intermediate slice. So we resort to a less-efficient form, splitting the chain into takeWhile(even) and reverse().morph(square), each of which will perform an allocation.

Fortunately, it is usually possible to reorder the chain such that reverse is the first or last method. In the above, we know that morph doesn't affect the length or order of the slice, so we can move reverse to the end and the result will be the same. Ply can't perform this reordering automatically though: methods may have side effects that the programmer is relying upon.
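
For example, because morph preserves the length and order of the slice, the chain above can be rewritten by hand as:

xs.takeWhile(even).morph(square).reverse()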

Side effects are also problematic because pipelining can change the number of times a function is called. For example, in this expression:

[]int{1, 2, 3, 4, 6, 20}.morph(fn).take(3)

Without pipelining, fn is called on every element of the slice. But with pipelining, it is only called 3 times. So the best practice is to avoid side effects in functions passed to morph, filter, etc.
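
To make the difference concrete, here is a sketch in which fn closes over a counter:

calls := 0
fn := func(x int) int { calls++; return x * x }
ys := []int{1, 2, 3, 4, 6, 20}.morph(fn).take(3)
// Without pipelining: calls == 6. With pipelining: calls == 3.
// If incrementing calls were a side effect the program relied on, the two
// compilations would behave observably differently.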

Lastly, it's worth pointing out that pipelining cannot eliminate any allocations performed inside function arguments. For example, in this chain:

myEnum := func(n int) []int {
	r := make([]int, n)
	for i := range r {
		r[i] = i
	}
	return r
}
concat := func(x, y []int) []int { return append(x, y...) }
list := xs.morph(myEnum).fold(concat)

A handwritten version of this chain could eliminate the allocations performed by myEnum, but there is no way to do so programmatically.
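
For reference, a hand-written version might fuse myEnum's loop into the result directly, avoiding the per-element allocations:

var list []int
for _, x := range xs {
	// append the enumeration 0..x-1 directly, with no intermediate slice per element
	for i := 0; i < x; i++ {
		list = append(list, i)
	}
}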

Parallelization (planned):

Functor operations like morph can be trivially parallelized, but this optimization should not be applied automatically. For small lists, the overhead is probably not worth it. More importantly, if the function has side effects, parallelizing may cause a race condition. So this optimization must be specifically requested by the caller via separate identifiers, e.g. pmorph, pfilter, etc.
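
To sketch the idea, a generated pmorph specialization for []int could look something like this (hypothetical; pmorph is planned, not implemented, and a real implementation would likely spread elements across a fixed number of goroutines rather than spawn one per element):

// Requires "sync". Only safe if fn is free of unsynchronized side effects.
func pmorphintint(xs []int, fn func(int) int) []int {
	out := make([]int, len(xs))
	var wg sync.WaitGroup
	for i, x := range xs {
		wg.Add(1)
		go func(i, x int) {
			defer wg.Done()
			out[i] = fn(x)
		}(i, x)
	}
	wg.Wait()
	return out
}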

Reassignment (planned):

It is a common pattern to reassign the result of a transformation to the original variable, for example when filtering or reversing a slice. In such cases, we would like to reuse the existing slice's memory instead of allocating a new one. At one time, Ply did this automatically (by detecting reassignment), but the feature was later removed because it is not provably safe. If the underlying slice memory is referenced by a different variable, then silently performing this optimization would affect that memory as well, which is surprising behavior.
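
For reference, the hand-written form of this optimization for xs = xs.filter(pred) reuses the slice's backing array (pred here is an assumed func(int) bool):

filtered := xs[:0]
for _, x := range xs {
	if pred(x) {
		filtered = append(filtered, x)
	}
}
xs = filtered
// Note: any other slice sharing xs's backing array now sees the filtered data,
// which is exactly the surprise described above.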

However, this optimization remains important. It is directly in line with Ply's goal of generating code that is as good as the hand-written version. We just need a different approach, probably a more explicit one. This could take the form of separate identifiers (e.g. rfilter), similar to parallelization. But this leads to an unfortunate bifurcation: what if you want both reassignment and parallelization? We would then need four different forms: standard, parallel, reassigned, and parallel reassigned, each with its own identifier. More identifiers means more burden on the programmer, so I'm hesitant to implement this approach until I've given it more thought.

Compile-time evaluation:

A few functions (currently just max and min) can be evaluated at compile time if their arguments are also known at compile time. This is similar to how the builtin len and cap functions work:

len([3]int{}) // known at compile time; compiles to 3

max(1, min(2, 3)) // known at compile time; compiles to 2

In theory, it is also possible to perform compile-time evaluation on certain literals. For example:

[]int{1, 2, 3}.contains(3) // compile to true?

We could even go further and support arbitrary compile-time execution. But that seems a little dangerous. At best, it's useful for things like computing a large table instead of including it in the source. But I don't think that single case warrants such a powerful feature.

Function hoisting (planned):

not currently returns a function that wraps its argument. Instead, not could generate a new top-level function definition, and replace the callsite wholesale. For example, given these definitions:

even := func(i int) bool { return i % 2 == 0 }
odd := not(even)

The compiled code currently looks like this:

func not_int(fn func(int) bool) func(int) bool {
	return func(i int) bool {
		return !fn(i)
	}
}

even := func(i int) bool { return i % 2 == 0 }
odd := not_int(even)

But we could improve upon this by generating a top-level not_even function:

func not_even(i int) bool {
	return !even(i)
}

even := func(i int) bool { return i % 2 == 0 }
odd := not_even

This is non-trivial, though, because even is not in the top-level scope; we would need to hoist its definition into the function body of not_even. Alternatively, we could simply not consider local functions for this optimization -- but we'd still need a way to distinguish global functions from local functions.

The motivation for this optimization is that the Go compiler is more likely to inline top-level functions (AFAIK). Eliminating the overhead of a function call could be significant when, say, filtering a large slice. Benchmarks are needed to confirm that this would actually result in a significant speedup.

FAQ

Why wouldn't you just use [existing generics solution]?

There are basically two options: runtime generics (via reflection) and compile-time generics (via codegen). They both suck for different reasons: reflection is slow, and codegen is cumbersome. Ply is an attempt at making codegen suck a bit less. You don't need to grapple with magic annotations or custom types; you can just start using filter and fold as though Go had always supported them.

What are the downsides of this approach?

The most obvious is that it's less flexible; you can only use the functions and methods that Ply provides. Another annoyance is that since they behave like builtins, you can't pass them around as first-class values. Fortunately this is a pretty rare thing to do, and it's possible to work around it in most cases. (For example, you can wrap the call in a func.)
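
For example, to pass filter around as a value, wrap it in an ordinary function (filterInts is just an illustrative name):

filterInts := func(xs []int, pred func(int) bool) []int {
	return xs.filter(pred)
}
// filterInts is now a first-class value that can be stored or passed to other functions.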

Generating a specific implementation of every generic function call produces very fast code, at the cost of slower compilation, larger binaries, and less helpful error messages. Your build process will also be more complicated, though hopefully not as complicated as writing template code and using go generate. The fact of the matter is that there is no silver bullet: every implementation of generics has its downsides. Do your research before deciding whether Ply is the right approach for your project.

What if I want to define my own generic functions?

Sorry, that's not in the cards. The purpose of Ply is to make polymorphism as painless as possible. Supporting custom generics would mean defining some kind of template syntax, and that adds a lot of complexity to the language. Restricting the set of generic functions also allows the Ply compiler to apply deep optimizations, such as pipelining.

I understand that this is a controversial position, and Ply's set of functions may not suit everyone's needs. My rationale is that by adding a small set of new functions, Go can be made much more productive without becoming any harder to parse (by computer or by human). If you have suggestions for new functions, open an issue and I'll consider adding them.

What about generic data structures?

Go seems pretty productive without them. Slices and maps are sufficient for the vast majority of programs. Adding new generic data structures would complicate Go's syntax (do we overload make for our new RedBlackTree type?) and I really want to avoid that. Go's simplicity is one of its biggest strengths.

How does Ply interact with the existing Go toolchain?

One nice thing about Ply is that because it has the same syntax as Go, many tools built for Go will "just work" with Ply. For example, you can run gofmt and golint on .ply files. Other tools (like go vet) are pickier about their input filenames ending in .go, but will work if you rename your .ply files. Lastly, tools that require type information will fail, because Go's type-checker does not understand Ply builtins.

One current deficiency is that Ply will not automatically compile imported .ply files. So you can't write pure-Ply packages (yet).

Will you add support for feature X?

Open an issue and I will gladly consider it.
