A POSIX-compliant AWK interpreter written in Go, with CSV support

GoAWK: an AWK interpreter with CSV support

AWK is a fascinating text-processing language, and after reading the delightfully terse The AWK Programming Language, I was somehow inspired to write an interpreter for it in Go. So here it is, feature-complete and tested against "the one true AWK" and GNU AWK test suites.

GoAWK is a POSIX-compatible version of AWK, and additionally has a CSV mode for reading and writing CSV and TSV files. This feature was sponsored by the library of the University of Antwerp. Read the CSV documentation.

You can also read one of the articles I've written about GoAWK:

Basic usage

To use the command-line version, simply use go install to install it, and then run it using goawk (assuming ~/go/bin is in your PATH):

$ go install github.com/benhoyt/goawk@latest

$ goawk 'BEGIN { print "foo", 42 }'
foo 42

$ echo 1 2 3 | goawk '{ print $1 + $3 }'
4

# Or use GoAWK's CSV and @"named-field" support:
$ echo -e 'name,amount\nBob,17.50\nJill,20\n"Boba Fett",100.00' | \
  goawk -i csv -H '{ total += @"amount" } END { print total }'
137.5

To use it in your Go programs, you can call interp.Exec() directly for simple needs:

input := strings.NewReader("foo bar\n\nbaz buz")
err := interp.Exec("$0 { print $1 }", " ", input, nil)
if err != nil {
    fmt.Println(err)
    return
}
// Output:
// foo
// baz

Or you can use the parser package and then call interp.ExecProgram() to control execution, set variables, and so on:

src := "{ print NR, tolower($0) }"
input := "A\naB\nAbC"

prog, err := parser.ParseProgram([]byte(src), nil)
if err != nil {
    fmt.Println(err)
    return
}
config := &interp.Config{
    Stdin: strings.NewReader(input),
    Vars:  []string{"OFS", ":"},
}
_, err = interp.ExecProgram(prog, config)
if err != nil {
    fmt.Println(err)
    return
}
// Output:
// 1:a
// 2:ab
// 3:abc

If you need to repeat execution of the same program on different inputs, you can call interp.New once, and then call the returned object's Execute method as many times as you need.

Read the package documentation for more details.

Differences from AWK

The intention is for GoAWK to conform to awk's behavior and to the POSIX AWK spec, but this section describes some areas where it's different.

Additional features GoAWK has over AWK:

  • It has proper support for CSV and TSV files (read the documentation).
  • It's the only AWK implementation we know of with a code coverage feature (read the documentation).
  • It supports negative field indexes to access fields from the right, for example, $-1 refers to the last field.
  • It's embeddable in your Go programs! You can even call custom Go functions from your AWK scripts.
  • It runs most AWK scripts faster than awk and on a par with gawk, though usually slower than mawk. (See recent benchmarks.)
  • The parser supports 'single-quoted strings' in addition to "double-quoted strings", primarily to make Windows one-liners easier when using the cmd.exe shell (which uses " as the quote character).
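To make the negative-index rule concrete, here is a small Go sketch of AWK-style field lookup on a whitespace-split record, where $-1 maps to $NF, $-2 to $(NF-1), and so on. The field helper is hypothetical and is not how GoAWK implements this; returning "" for an index past -NF is this sketch's own choice, not documented GoAWK behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// field is a hypothetical helper mimicking AWK-style field access on a
// whitespace-separated record: 0 is the whole record, 1..NF count from the
// left, and -1..-NF count from the right. Out-of-range indexes return ""
// (for positive indexes past NF this matches AWK).
func field(record string, i int) string {
	if i == 0 {
		return record
	}
	fields := strings.Fields(record)
	nf := len(fields)
	if i < 0 {
		i = nf + i + 1 // $-1 -> $NF, $-2 -> $(NF-1), ...
	}
	if i < 1 || i > nf {
		return ""
	}
	return fields[i-1]
}

func main() {
	rec := "alpha beta gamma"
	fmt.Println(field(rec, 1))  // alpha
	fmt.Println(field(rec, -1)) // gamma
	fmt.Println(field(rec, -3)) // alpha
}
```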

Things AWK has over GoAWK:

  • Scripts that use regular expressions are slower than other implementations (unfortunately Go's regexp package is relatively slow).
  • AWK is written by Alfred Aho, Peter Weinberger, and Brian Kernighan.

Stability

This project has a good suite of tests, which include my own interpreter tests, the original AWK test suite, and the relevant tests from the Gawk test suite. I've used it a bunch personally, and it's used in the Benthos stream processor as well as by the software team at the library of the University of Antwerp. However, to err == human, so please use GoAWK at your own risk. I intend not to change the Go API in a breaking way in any v1.x.y version.

AWKGo

The GoAWK repository also includes the creatively-named AWKGo, an AWK-to-Go compiler. This is experimental and is not subject to the stability requirements of GoAWK itself. You can read more about AWKGo or browse the code on the awkgo branch.

License

GoAWK is licensed under an open source MIT license.

The end

Have fun, and please contact me if you're using GoAWK or have any feedback!
