mona
mona
is a Javascript library for easily writing reusable, composable parsers.
It makes parsing complex grammars easy and fun!
With mona
, you simply write some Javascript functions that parse small pieces
of text and return any Javascript value, and then you glue them together into
big, intricate parsers using combinators
to... combine them! No custom syntax
or separate files or separate command line tools to run: you can integrate this
into your regular JS app.
It even makes it really really easy to give excellent error messages, including line and column numbers, and messages with what was expected, with little to no effort.
New parsers are hella easy to write -- give it a shot! And if you're familiar with Parsec, then you've come to the right place. :)
Table of Contents
- Install
- Examples
- API
- Gentle Introduction to Monadic Parser Combinators
Install
$ npm install mona
You can directly require mona
through your module loader of choice, or you can
use the prebuilt UMD versions found in the browser/
directory:
- Node.js/CommonJS -
var mona = require('mona')
- ES6 Modules/Babel -
import mona from 'mona'
- AMD -
define(['node_modules/mona/browser/mona'], function (mona) { ... })
- Global -
<script src=/js/node_modules/mona/browser/mona.min.js></script>
Examples
Parse a series of ints separated by commas
function commaInts () {
return mona.split(mona.integer(), mona.string(','))
}
mona.parse(commaInts(), '1,2,3,49829,49,139')
// => [1, 2, 3, 49829, 49, 139]
A simple, readable CSV parser in ~25 lines
function parseCSV (text) {
return mona.parse(csv(), text)
}
function csv () {
return mona.splitEnd(line(), mona.eol())
}
function line () {
return mona.split(cell(), mona.string(','))
}
function cell () {
return mona.or(quotedCell(),
mona.text(mona.noneOf(',\n\r')))
}
function quotedCell () {
return mona.between(mona.string('"'),
mona.string('"'),
mona.text(quotedChar()))
}
function quotedChar () {
return mona.or(mona.noneOf('"'),
mona.and(mona.string('""'),
mona.value('"')))
}
parseCSV('foo,"bar"\n"b""az",quux\n')
// => [['foo', 'bar'], ['b"az', 'quux']]
API
mona
is a package composed of multiple other packages, re-exported through a
single module. You have the option of installing mona
from npm directly, or
installing any of the subpackages and using those independently.
This API section is organized such that each parser or function is listed under the subpackage it belongs to, along with the name of the npm package you can find it in.
@mona/parse
This module or one of its siblings is needed in order to actually execute defined parsers. Currently, it exports only a single function: a synchronous parser runner.
> parse(parser, string[, opts]) -> T
Synchronously executes a parser on a given string, and returns the resulting value.
{Parser<T>} parser
- The parser to execute.{String} string
- String to parse.{Opts} [opts]
- Options object.{Boolean} [opts.throwOnError=true]
- If truthy, throws a ParserError if the parser fails and returns ParserState instead of its value.{String} [opts.fileName]
- filename to use for error messages.
parse(token(), 'a') // => 'a'
parse(integer(), '123') // => 123
@mona/parse-async
This module exports only a single function: an asynchronous parser runner. You need this module or something similar in order to actually execute your parsers.
> parseAsync(parser, callback[, opts]) -> Handle
Executes a parser asynchronously, returning an object that can be used to manage the parser state.
You can feed new data into the parsing process by calling the returned handle's
#data()
method. Unless the parser given tries to match eof()
, parsing will
continue until the handle's #done()
method is called.
{Function} parser
- The parser to execute.{AsyncParserCallback} callback
- node-style 2-arg callback executed once per successful application ofparser
.{Object} [opts]
- Options object.{String} [opts.fileName]
- filename to use for error messages.
var handle = parseAsync(token(), function(tok) {
console.log('Got a token: ', tok)
})
handle.data('foo')
// logs:
// > Got a token: f
// > Got a token: o
// > Got a token: o
@mona/core
The core parser package contains essential and dev-utility parsers that are
intended to be the core of the rest of the parser libraries. Some of these are
very low level, such as bind()
. Others are not necessarily meant to be used in
production, but can help with debugging, such as log()
.
> value(val) -> Parser<T>
Always succeeds with val
as its value, without consuming any input.
{T} val
- value to use as this parser's value.
parse(value('foo'), '') // => 'foo'
> bind(parser, fun) -> Parser<U>
Calls fun
on the value from parser
. Fails without executing fun
if
parser
fails.
-
{Parser<T>} parser
- The parser to execute. -
{Function(Parser<T>) -> Parser<U>} fun
- Function called with the resulting value ofparser
.
parse(bind(token(), function (x) {
return value(x + '!')
}), 'a') // => 'a!'
> fail([msg[, type]]) -> Parser<Fail>
Always fails without consuming input. Automatically includes the line and column
positions in the final ParserError
.
{String} [msg='parser error']
- Message to report with the failure.{String} [type='failure']
- A type to apply to the ParserError.
> label(parser, msg) -> Parser<T>
Label a parser
failure by replacing its error messages with msg
.
{Parser<T>} parser
- Parser whose errors to replace.{String} msg
- Error message to replace errors with.
parse(token(), '') // => unexpected eof
parse(label(token(), 'thing'), '') // => expected thing
> token([count]) -> Parser<String>
Consumes a single item from the input, or fails with an unexpected eof error if there is no input left.
{Integer} [count=1]
- number of tokens to consume. Must be > 0.
parse(token(), 'a') // => 'a'
> eof() -> Parser<true>
Succeeds with a value of true
if there is no more input to consume.
parse(eof(), '') // => true
> delay(constructor, ...args) -> Parser<T>
Delays calling of a parser constructor function until parse-time. Useful for recursive parsers that would otherwise blow the stack at construction time.
{Function(...T) -> Parser<T>} constructor
- A function that returns a Parser.{...T} args
- Arguments to apply to the constructor.
// The following would usually result in an infinite loop:
function foo() {
return or(x(), foo())
}
// But you can use delay() to remedy this...
function foo() {
return or(x(), delay(foo))
}
>log(parser, label[, level]) -> Parser<T>
Logs the ParserState
resulting from parser
with a label
.
{Parser<T>} parser
- Parser to wrap.{String} tag
- Tag to use when logging messages.{String} [level='log']
- 'log', 'info', 'debug', 'warn', 'error'.
>map(fun, parser) -> Parser<T>
Transforms the resulting value of a successful application of its given parser.
This function is a lot like bind
, except it always succeeds if its parser
succeeds, and is expected to return a transformed value, instead of another
parser.
{Function(U) -> T} transformer
- Function called onparser
's value. Its return value will be used as themap
parser's value.{Parser<U>} parser
- Parser that will yield the input value.
parse(map(parseFloat, text()), '1234.5') // => 1234.5
>tag(parser, tag) -> Parser<Object<T>>
Results in an object with a single key whose value is the result of the given parser. This can be useful for when you want to build ASTs or otherwise do some tagged tree structure.
{Parser<T>} parser
- Parser whose value will be tagged.{String} tag
- String to use as the object's key.
parse(tag(token(), 'myToken'), 'a') // => {myToken: 'a'}
>lookAhead(parser) -> Parser<T>
Runs a given parser without consuming input, while still returning a success or failure.
{Parser<T>} parser
- Parser to execute.
parse(and(lookAhead(token()), token()), 'a') // => 'a'
>is(predicate[, parser]) -> Parser<T>
Succeeds if predicate
returns a truthy value when called on parser
's result.
{Function(T) -> Boolean} predicate
- Tests a parser's result.{Parser<T>} [parser=token()]
- Parser to run.
parse(is(function (x) { return x === 'a' }), 'a') // => 'a'
> isNot(predicate[, parser]) -> Parser<T>
Succeeds if predicate
returns a falsy value when called on parser
's result.
{Function(T) -> Boolean} predicate
- Tests a parser's result.{Parser<T>} [parser=token()]
- Parser to run.
parse(isNot(function (x) { return x === 'a' }), 'b') // => 'b'
@mona/combinators
Parser combinators are at the very core of what makes something like mona shine: They are, themselves, parsers, but they are intended to accept other parsers as arguments, that they will then use to do whatever job they're doing.
Combinators do just that: They combine parsers. They act as the glue that lets you take all those individual parsers that you wrote, and combine them into increasingly more intricate parsers.
This package contains things like collect()
, split()
, and the or()
/and()
pair.
> and(...parsers, lastParser) -> Parser<T>
Succeeds if all the parsers given to it succeed, using the value of the last executed parser as its return value.
{...Parser<*>} parsers
- Parsers to execute.{Parser<T>} lastParser
- Parser whose result is returned.
parse(and(token(), token()), 'ab') // => 'b'
> or(...parsers[, label]) -> Parser<T>
Succeeds if one of the parsers given to it succeeds, using the value of the first successful parser as its result.
{...Parser<T,*>} parsers
- Parsers to execute.{String} [label]
- Label to replace the full message with.
parse(or(string('foo'), string('bar')), 'bar') // => 'bar'
> maybe(parser) -> Parser<T> | Parser<undefined>
Returns the result of parser
if it succeeds, otherwise succeeds with a value
of undefined
without consuming any input.
{Parser<T>} parser
- Parser to try.
parse(maybe(token()), 'a') // => 'a'
parse(maybe(token()), '') // => undefined
> not(parser) -> Parser<undefined>
Succeeds if parser
fails. Does not consume.
{Parser<*>} parser
- parser to test.
parse(and(not(string('a')), token()), 'b') // => 'b'
> unless(notParser, ...moreParsers, lastParser) -> Parser<T>
Works like and
, but fails if the first parser given to it succeeds. Like
and
, it returns the value of the last successful parser.
{Parser<*>} notParser
- If this parser succeeds,unless
will fail.{...Parser} moreParsers
- Rest of the parses to test.{Parser<T>} lastParser
- Parser whose value to return.
parse(unless(string('a'), token()), 'b') // => 'b'
> sequence(fun) -> Parser<T>
Put simply, this parser provides a way to write complex parsers while letting
your code look like regular procedural code. You just wrap your parsers with
s()
, and the rest of your code can be sequential. If the description seems
confusing, see the example.
This parser executes fun
while handling the parserState
internally, allowing
the body of fun
to be written sequentially. The purpose of this parser is to
simulate do
notation and prevent the need for heavily-nested bind
calls.
The fun
callback will receive a function s
which should be called with
each parser that will be executed, which will update the internal
parserState. The return value of the callback must be a parser.
If any of the parsers fail, sequence will exit immediately, and the entire sequence will fail with that parser's reason.
{Function -> Parser<T>} fun
- A sequence callback function to execute.
mona.sequence(function (s) {
var x = s(mona.token())
var y = s(mona.string('b'))
return mona.value(x + y)
})
> join(...parsers) -> Parser<Array<T>>
Succeeds if all the parsers given to it succeed, and results in an array of all the resulting values, in order.
{...Parser<T>} parsers
- One or more parsers to execute.
parse(join(alpha(), integer()), 'a1') // => ['a', 1]
> followedBy(parser, ...moreParsers) -> Parser<T>
Returns the result of its first parser if it succeeds, but fails if any of the following parsers fail.
-
{Parser<T>} parser
- The value of this parser is returned if it succeeds. -
{...Parser<*>} moreParsers
- These parsers must succeed in order forfollowedBy
to succeed.
parse(followedBy(string('a'), string('b'), string('c')), 'abc') // => 'a'
parse(followedBy(string('a'), string('a')), 'abc') // => expected {a}
> split(parser, separator[, opts]) -> Parser<Array<T>>
Results in an array of successful results of parser
, divided by the
separator
parser.
{Parser<T>} parser
- Parser for matching and collecting results.{Parser<U>} separator
- Parser for the separator{Opts} [opts]
- Optional options for controlling min/max.{Integer} [opts.min=0]
- Minimum length of the resulting array.{Integer} [opts.max=Infinity]
- Maximum length of the resulting array.
parse(split(token(), space()), 'a b c d') // => ['a','b','c','d']
> splitEnd(parser, separator[, opts]) -> Parser<Array<T>>
Results in an array of results that have been successfully parsed by parser
,
separated and ended by separator
.
{Parser<T>} parser
- Parser for matching and collecting results.{Parser<U>} separator
- Parser for the separator{Integer} [opts.enforceEnd=true]
- If true,separator
must be at the end of the parse.{Integer} [opts.min=0]
- Minimum length of the resulting array.{Integer} [opts.max=Infinity]
- Maximum length of the resulting array.
parse(splitEnd(token(), space()), 'a b c ') // => ['a', 'b', 'c']
> collect(parser[, opts]) -> Parser<Array<T>>
Results in an array of min
to max
number of matches of parser
{Parser<T>} parser
- Parser to match.{Integer} [opts.min=0]
- Minimum number of matches.{Integer} [opts.max=Infinity]
- Maximum number of matches.
parse(collect(token()), 'abcd') // => ['a', 'b', 'c', 'd']
> exactly(parser, n) -> Parser<Array<T>>
Results in an array of exactly n
results for parser
.
{Parser<T>} parser
- The parser to collect results for.{Integer} n
- exact number of results to collect.
parse(exactly(token(), 4), 'abcd') // => ['a', 'b', 'c', 'd']
> between(open, close, parser) -> Parser<V>
Results in a value between an opening and closing parser.
{Parser<T>} open
- Opening parser.{Parser<U>} close
- Closing parser.{Parser<V>} parser
- Parser to return the value of.
parse(between(string('('), string(')'), token()), '(a)') // => 'a'
> skip(parser) -> Parser<undefined>
Skips input until parser
stops matching.
{Parser<T>} parser
- Determines whether to continue skipping.
parse(and(skip(string('a')), token()), 'aaaab') // => 'b'
> range(start, end[, parser[, predicate]]) -> Parser<T>
Accepts a parser if its result is within range of start
and end
.
{*} start
- lower bound of the range to accept.{*} end
- higher bound of the range to accept.{Parser<T>} [parser=token()]
- parser whose results to test{Function(T) -> Boolean} [predicate=function(x,y){return x<=y }]
- Tests range
parse(range('a', 'z'), 'd') // => 'd'
@mona/strings
This package is intended as a collection of string-related parsers. That is, parsers that specifically return string-related data or somehow match and manipulate strings themselves.
Here, you'll find the likes of string()
(the exact-string matching parser),
spaces()
, and trim()
.
> stringOf(parser) -> Parser<String>
Results in a string containing the concatenated results of applying parser
.
parser
must be a combinator that returns an array of string parse results.
{Parser<Array<String>>} parser
- Parser whose result to concatenate.
parse(stringOf(collect(token())), 'aaa') // => 'aaa'
> oneOf(matches[, caseSensitive]) -> Parser<String>
Succeeds if the next token or string matches one of the given inputs.
{String|Array<String>} matches
- Characters or strings to match. If this argument is a string, it will be treated as if matches.split('') were passed in.{Boolean} [caseSensitive=true]
- Whether to match char case exactly.
parse(oneOf('abcd'), 'c') // => 'c'
parse(oneOf(['foo', 'bar', 'baz']), 'bar') // => 'bar'
> noneOf(matches[, caseSensitive[, other]]) -> Parser<T>
Fails if the next token or string matches one of the given inputs. If the third
parser
argument is given, that parser will be used to collect the actual value
of noneOf
.
{String|Array} matches
- Characters or strings to match. If this argument is a string, it will be treated as if matches.split('') were passed in.{Boolean} [caseSensitive=true]
- Whether to match char case exactly.{Parser<T>} [other=token()]
- What to actually parse if none of the given matches succeed.
parse(noneOf('abc'), 'd') // => 'd'
parse(noneOf(['foo', 'bar', 'baz']), 'frob') // => 'f'
parse(noneOf(['foo', 'bar', 'baz'], true, text()), 'frob') // => 'frob'
> string(str[, caseSensitive]) -> Parser<String>
Succeeds if str
matches the next str.length
inputs,
consuming the string and returning it as a value.
{String} str
- String to match against.{Boolean} [caseSensitive=true]
- Whether to match char case exactly.
parse(string('foo'), 'foo') // => 'foo'
> alphaUpper() -> Parser<String>
Matches a single non-unicode uppercase alphabetical character.
parse(alphaUpper(), 'D') // => 'D'
> alphaLower() -> Parser<String>
Matches a single non-unicode lowercase alphabetical character.
parse(alphaLower(), 'd') // => 'd'
> alpha() -> Parser<String>
Matches a single non-unicode alphabetical character.
parse(alpha(), 'd') // => 'd'
parse(alpha(), 'D') // => 'D'
> digit(base) -> Parser<String>
Parses a single digit character token from the input.
{Integer} [base=10]
- Optional base for the digit.
parse(digit(), '5') // => '5'
> alphanum(base) -> Parser<String>
Matches an alphanumeric character.
{Integer} [base=10]
- Optional base for numeric parsing.
parse(alphanum(), '1') // => '1'
parse(alphanum(), 'a') // => 'a'
parse(alphanum(), 'A') // => 'A'
> space() -> Parser<String>
Matches one whitespace character.
parse(space(), '\r') // => '\r'
> spaces() -> Parser<String>
Matches one or more whitespace characters. Returns a single space character as its result, regardless of which whitespace characters and how many were matched.
parse(spaces(), ' \r\n\t \r \n') // => ' '
> text([parser[, opts]]) -> Parser<String>
Collects between min
and max
number of matches for parser
. The result is
returned as a single string. This parser is essentially collect()
for strings.
{Parser<String>} [parser=token()]
- Parser to use to collect the results.{Object} [opts]
- Options to control match count.{Integer} [opts.min=0]
- Minimum number of matches.{Integer} [opts.max=Infinity]
- Maximum number of matches.
* parse(text(), 'abcde') // => 'abcde'
* parse(text(noneOf('a')), 'bcde') // => 'bcde'
> trim(parser) -> Parser<T>
Trims any whitespace surrounding parser
, and returns parser
's result.
{Parser<T>} parser
- Parser to match after cleaning up whitespace.
parse(trim(token()), ' \r\n a \t') // => 'a'
> trimLeft(parser) -> Parser<T>
Trims any leading whitespace before parser
, and returns parser
's result.
{Parser<T>} parser
- Parser to match after cleaning up whitespace.
parse(trimLeft(token()), ' \r\n a') // => 'a'
> trimRight(parser) -> Parser<T>
Trims any trailing whitespace before parser
, and returns parser
's result.
{Parser} parser
- Parser to match after cleaning up whitespace.
parse(trimRight(token()), 'a \r\n') // => 'a'
> eol() -> Parser<String>
Parses the end of a line.
parse(eol(), '\n') // => '\n'
@mona/numbers
If you ever need a parser that will take strings and turn them into the numbers
you want the to be, this is the place to look. Parsers in this package include
integer()
, float()
, and ordinal()
(which parses English ordinals (first
,
second
, third
) into numbers).
> natural(base) -> Parser<Integer>
Matches a natural number. That is, a number without a positive/negative sign or decimal places, and returns a positive integer.
{Integer} [base=10]
- Base to use when parsing the number.
* parse(natural(), '1234') // => 1234
> integer(base) -> Parser<Integer>
Matches an integer, with an optional + or - sign.
{Integer} [base=10]
- Base to use when parsing the integer.
parse(integer(), '-1234') // => -1234
> real() -> Parser<Float>
Parses a floating point number.
parse(real(), '-1234e-10') // => -1.234e-7
> cardinal() -> Parser<Integer>
Parses english cardinal numbers into their numerical counterparts
parse(cardinal(), 'two thousand') // => 2000
> ordinal() -> Parser<Integer>
Parses English ordinal numbers into their numerical counterparts.
parse(ordinal(), 'one-hundred thousand and fifth') // 100005
> shortOrdinal() -> Parser<Integer>
Parses shorthand english ordinal numbers into their numerical counterparts. Optionally allows you to remove correct suffix checks and allow any apparent ordinal to get through.
{Boolean} [strict=true]
- Whether to accept only appropriate suffixes for each number. (if false,2th
parses to2
)
parse(shortOrdinal(), '5th') // 5
Gentle Intro to Monadic Parser Combinators
mona
works by composing functions called parsers
. These functions are
created by so-called parser constructors
. Most of the mona
API exposes these
constructors.
Primitive parsers
There are three primitive parsers in mona: value()
, fail()
, and
token()
.
value()
- results in its single argument, without consuming input.fail()
- fails unconditionally, without consuming input.token()
- consumes a single token, or character, from the input. Fails if there's nothing left to consume.
Simply creating a parser is not enough to execute a parser, though. We need to
use the parse
function, to actually execute the parser on an input string:
mona.parse(mona.value('foo'), '') // => 'foo'
mona.parse(mona.fail(), '') // => throws an exception
mona.parse(mona.token(), 'a') // => 'a'
mona.parse(mona.token(), '') // => error, unexpected eof
The primitive combinator
These three parsers do not seem to get us much of anywhere, so we introduce our
first combinator: bind()
. bind()
accepts a parser as its first argument,
and a function as its second argument. The function will be called with the
parser's result value only if the parser succeeds. The function must then
return another parser, which will be used to determine bind()
's value:
mona.parse(mona.bind(mona.token(), function (character) {
if (character === 'a') {
return mona.value('found an "a"!')
} else {
return mona.fail()
}
}), 'a') // => 'found an "a"!'
Basic utility combinators
bind()
, of course, is just the beginning. Now that we know we can combine
parsers, we can play with some of mona
's fancier parsers and combinators. For
example, the or
combinator resolves to the first parser that succeeds, in the
order they were provided, or fails if none of those parsers succeeded:
mona.parse(mona.or(mona.fail('nope'),
mona.fail('nope again'),
mona.value('this one!')),
'')
// => 'this one!'
mona.parse(mona.or(mona.fail('nope'),
mona.value('this one!'),
mona.value('but not this one')),
'')
// => 'this one!'
and()
is another basic combinator. It succeeds only if all its parsers
succeed, and resolves to the value of the last parser. Otherwise, it fails with
the first failed parser's error.
mona.parse(mona.and(mona.value('foo'),
mona.value('bar')),
'')
// => 'bar'
Finally, there's the not()
combinator. It's important to note that, regardless
of its argument's result, not()
will not consume input... it must be combined
with something that does.
mona.parse(mona.and(mona.not(mona.token()), mona.value('end of input')), '')
// => 'end of input'
Matching strings
The string()
parser might come in handy: It results in a string matching a given
string:
mona.parse(mona.string('foo'), 'foo')
// => 'foo'
And can of course be combined with some combinator to provide an alternative value:
monap.parse(mona.and(mona.string('foo'), mona.value('got a foo!')), 'foo')
// => 'got a foo!'
The is()
parser can also be used to succeed or fail depending on whether the
next token matches a particular predicate:
mona.parse(mona.is(function (x) { return x === 'a' }), 'a')
// => 'a'
Sequential syntax
Writing parsers by composing functions is perfectly fine and natural, and you
might get quite a feel for it, but sometimes it's nice to have something that
feels a bit more procedural. For situations like that, you can use sequence
:
function parenthesized () {
return mona.sequence(function (s) {
// The s() function passed into `sequence()`'s callback
// must be used to execute any parsers within the sequence.
var open = s(mona.string('('))
// open === '(' if the `string()` parser succeeds.
var data = s(mona.token())
var close = s(mona.string(')'))
// The `sequence()` callback must return another parser, just like `bind()`.
// Also like `bind()`, it can `return fail()` to fail the parser.
return mona.value(data)
})
}
mona.parse(parenthesized(), '(a)')
// => 'a'
We can generalize this parser into a combinator by accepting an arbitrary parser as an input:
function parenthesized (parser) {
return mona.sequence(function (s) {
var open = s(mona.string('('))
var data = s(parser) // Use the parser here!
var close = s(mona.string(')'))
return mona.value(data)
})
}
mona.parse(parenthesized(mona.string('foo!')), '(foo!)')
// => 'foo!'
Note that if the given parser consumes closing parentheses, this will fail:
mona.parse(parenthesized(mona.string('something)'), '(something)')
// => error, unexpected EOF
The Rest of It
Once you've got the basics down, you can explore mona
's API for more
interesting parsers. A variety of useful parsers are available for use, such as
collect()
, which collects the results of a parser into an array until the
parser fails, or float()
, which parses a floating-point number and returns the
actual number. For more examples on how to use mona
to create parsers for
actual formats, take a look in the examples/
directory included with the
project, which includes examples for json
and csv
.