• Stars: 224
• Rank: 171,882 (Top 4%)
• Language: Swift
• License: MIT License
• Created about 6 years ago
• Updated over 3 years ago

Mac and iOS library for parsing structured text

Introduction

What?

Consumer is a library for Mac and iOS for parsing structured text such as a configuration file, or a programming language source file.

The primary interface is the Consumer type, which is used to programmatically build up a parsing grammar.

Using that grammar, you can then parse String input into an AST (Abstract Syntax Tree), which can then be transformed into application-specific data.

Why?

There are many situations where it is useful to be able to parse structured data. Most popular file formats have some kind of parser, typically either written by hand or by using code generation.

Writing a parser is a time-consuming and error-prone process. Many tools exist in the C world for generating parsers, but relatively few such tools exist for Swift.

Swift's strong typing and sophisticated enum types make it well-suited for creating parsers, and Consumer takes advantage of these features.

How?

Consumer uses an approach called recursive descent to parse input. Each Consumer instance consists of a tree of sub-consumers, with the leaves of the tree matching individual strings or characters in the input.

You build up a consumer by starting with simple rules that match individual words or values (known as "tokens") in your language or file format. You then compose these into more complex rules that match sequences of tokens, and so on until you have a single consumer that describes an entire document in the language you are trying to parse.
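The following standalone sketch illustrates the idea in plain Swift, without the Consumer library. Here a parser is modelled as a function that takes an input position and returns the position just after its match. Small leaf rules are then combined into larger ones. The names MiniParser, char(in:), sequenceOf, anyOf, zeroOrMore and matchesWhole are hypothetical and only loosely echo Consumer's cases. Consumer itself builds a tree of enum values rather than closures.

```swift
// A parser maps (input, start index) to the index just past its match, or nil on failure.
typealias MiniParser = (_ input: [Character], _ start: Int) -> Int?

// Leaf rule: matches exactly one character from a set.
func char(in set: Set<Character>) -> MiniParser {
    return { input, start in
        start < input.count && set.contains(input[start]) ? start + 1 : nil
    }
}

// Composite rule: runs each part in order, failing if any part fails.
func sequenceOf(_ parts: [MiniParser]) -> MiniParser {
    return { input, start in
        var index = start
        for part in parts {
            guard let next = part(input, index) else { return nil }
            index = next
        }
        return index
    }
}

// Composite rule: tries each option in order and returns the first success.
func anyOf(_ options: [MiniParser]) -> MiniParser {
    return { input, start in
        for option in options {
            if let end = option(input, start) { return end }
        }
        return nil
    }
}

// Composite rule: applies a part repeatedly until it stops matching (or stops advancing).
func zeroOrMore(_ part: @escaping MiniParser) -> MiniParser {
    return { input, start in
        var index = start
        while let next = part(input, index), next > index { index = next }
        return index
    }
}

// Token rules first, then a rule composed from them: a letter followed by letters or digits.
let letter = char(in: Set("abcdefghijklmnopqrstuvwxyz"))
let digit = char(in: Set("0123456789"))
let identifier = sequenceOf([letter, zeroOrMore(anyOf([letter, digit]))])

// True only if the parser consumes the entire input.
func matchesWhole(_ parser: MiniParser, _ text: String) -> Bool {
    let chars = Array(text)
    return parser(chars, 0) == chars.count
}

print(matchesWhole(identifier, "abc123")) // true
print(matchesWhole(identifier, "1abc"))   // false
```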

Usage

Installation

The Consumer type and its dependencies are encapsulated in a single file, and everything public is prefixed or name-spaced, so you can just drag Consumer.swift into your project to use it.

If you prefer, there's a framework for Mac and iOS that you can import which includes the Consumer type. You can install this manually, or by using CocoaPods, Carthage, or Swift Package Manager.

To install Consumer using CocoaPods, add the following to your Podfile:

pod 'Consumer', '~> 0.3'

To install using Carthage, add this to your Cartfile:

github "nicklockwood/Consumer" ~> 0.3

To install using Swift Package Manager, add this to the dependencies: section in your Package.swift file:

.package(url: "https://github.com/nicklockwood/Consumer.git", .upToNextMinor(from: "0.3.0")),

Parsing

The Consumer type is an enum, so you can create a consumer by assigning one of its possible values to a variable. For example, here is a consumer that matches the string "foo":

let foo: Consumer<String> = .string("foo")

To parse a string with this consumer, call the match() function:

do {
    let match = try foo.match("foo")
    print(match) // Prints the AST
} catch {
    print(error)
}

In the simple example above, the match will always succeed. If tested against arbitrary input, the match may fail, in which case an Error will be thrown. The Error will be of type Consumer.Error, which includes information about the error type and the location in the input string where it occurred.

The example above is not very useful - there are much simpler ways to detect string equality! Let's try a slightly more advanced example. The following consumer matches an unsigned integer:

let integer: Consumer<String> = .oneOrMore(.character(in: "0" ... "9"))

The top-level consumer in this case is of type oneOrMore, meaning that it matches one or more instances of the nested .character(in: "0" ... "9") consumer. In other words, it will match any sequence of characters in the range "0" - "9".

There's a slight problem with this implementation though: An arbitrary sequence of digits might include leading zeros, e.g. "01234", which could be mistaken for an octal number in some programming languages, or even just be treated as a syntax error. How can we modify the integer consumer to reject leading zeros?

We need to treat the first character differently from the subsequent ones, which means we need two different parsing rules to be applied in sequence. For that, we use a sequence consumer:

let integer: Consumer<String> = .sequence([
    .character(in: "1" ... "9"),
    .zeroOrMore(.character(in: "0" ... "9")),
])

So instead of oneOrMore digits in the range 0 - 9, we're now looking for a single digit in the range 1 - 9, followed by zeroOrMore digits in the range 0 - 9. That means that a zero preceding a nonzero digit will not be matched.

do {
    _ = try integer.match("0123")
} catch {
    print(error) // Unexpected token "0123" at 0
}

We've introduced another bug though - although leading zeros are correctly rejected, "0" on its own will now also be rejected, since it doesn't start with 1 - 9. We need to accept either zero on its own, or the sequence we just defined. For that, we can use any:

let integer: Consumer<String> = .any([
    .character("0"),
    .sequence([
        .character(in: "1" ... "9"),
        .zeroOrMore(.character(in: "0" ... "9")),
    ]),
])

That will do what we want, but it's quite a bit more complex. To make it more readable, we could break it up into separate variables:

let zero: Consumer<String> = .character("0")
let oneToNine: Consumer<String> = .character(in: "1" ... "9")
let zeroToNine: Consumer<String> = .character(in: "0" ... "9")

let nonzeroInteger: Consumer<String> = .sequence([
    oneToNine, .zeroOrMore(zeroToNine),
])

let integer: Consumer<String> = .any([
    zero, nonzeroInteger,
])

We can then further extend this with extra rules, e.g.

let sign: Consumer<String> = .any(["+", "-"])

let signedInteger: Consumer<String> = .sequence([
    .optional(sign), integer,
])
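To sanity-check what signedInteger accepts, here is a hand-written Swift function that implements the same rules directly. It checks for an optional sign, then accepts either a lone "0" or a nonzero digit followed by any number of digits. isSignedInteger(_:) is hypothetical and not part of Consumer.

```swift
// Hand-written equivalent of: optional("+" | "-") followed by "0" | [1-9][0-9]*
func isSignedInteger(_ text: String) -> Bool {
    var chars = Substring(text)
    if let first = chars.first, first == "+" || first == "-" {
        chars = chars.dropFirst()                   // .optional(sign)
    }
    guard let head = chars.first else { return false } // at least one digit is required
    let isDigit: (Character) -> Bool = { $0.isASCII && $0.isNumber }
    if head == "0" {
        return chars.count == 1                     // "0" must stand alone (no leading zeros)
    }
    return isDigit(head) && chars.dropFirst().allSatisfy(isDigit) // [1-9][0-9]*
}

print(isSignedInteger("-42"))  // true
print(isSignedInteger("0123")) // false
```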

Character Sets

The basic consumer type is charset(Charset) which matches a single character in a specified set. The Charset type is opaque, and cannot be constructed directly - instead, you should use the character(...) family of convenience constructors, which accept either a range of UnicodeScalars or a Foundation CharacterSet.

For example, to define a consumer that matches the digits 0 - 9, you can use a range:

let range: Consumer<String> = .character(in: "0" ... "9")

You could also use the predefined decimalDigits CharacterSet provided by Foundation, though you should note that this includes numerals from other languages such as Arabic, and so may not be what you want when parsing a data format like JSON, or a programming language which only expects ASCII digits.

let range: Consumer<String> = .character(in: .decimalDigits)

These two expressions are actually equivalent to the following, but thanks to the magic of type inference and function overloading, you can use the more concise syntax above:

let range: Consumer<String> = Consumer<String>.character(in: CharacterSet(charactersIn: "0" ... "9"))
let range: Consumer<String> = Consumer<String>.character(in: CharacterSet.decimalDigits)
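You can check the difference between the two sets using only Foundation. This sketch relies solely on Foundation's CharacterSet. It shows that the ASCII range rejects ARABIC-INDIC DIGIT THREE (U+0663), while decimalDigits accepts it.

```swift
import Foundation

// The set behind .character(in: "0" ... "9")
let asciiDigits = CharacterSet(charactersIn: "0" ... "9")
// The set behind .character(in: .decimalDigits), which covers all Unicode decimal digits
let allDecimalDigits = CharacterSet.decimalDigits

let three: Unicode.Scalar = "3"
let arabicIndicThree: Unicode.Scalar = "\u{0663}" // ARABIC-INDIC DIGIT THREE

print(asciiDigits.contains(three))                 // true
print(asciiDigits.contains(arabicIndicThree))      // false
print(allDecimalDigits.contains(arabicIndicThree)) // true
```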

You can create an inverse character set by using the anyCharacter(except: ...) constructor. This is useful if you want to match any character except a particular set. In the following example, we use this feature to parse a quoted string literal by matching a double quote followed by a sequence of any characters except a double quote, followed by a final closing double quote:

let string: Consumer<String> = .sequence([
    .character("\""),
    .zeroOrMore(.anyCharacter(except: "\"")),
    .character("\""),
])

The .anyCharacter(except: "\"") constructor is functionally equivalent to:

 .character(in: CharacterSet(charactersIn: "\"").inverted)

But the former produces a more helpful error message if matching fails, since it retains the concept of being "every character except X", whereas the latter will be displayed as a range containing all Unicode characters except the ones specified.
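To illustrate, the same rule can be checked by hand using the inverted set. matchQuoted(_:) is a hypothetical helper that uses only Foundation. It requires an opening and a closing quote, accepts only non-quote characters in between, and returns the text inside the quotes.

```swift
import Foundation

// The set that .anyCharacter(except: "\"") is described as equivalent to
let notQuote = CharacterSet(charactersIn: "\"").inverted

// Mirrors the string consumer: a quote, zero or more non-quote characters,
// then a closing quote that ends the input.
func matchQuoted(_ text: String) -> String? {
    let scalars = Array(text.unicodeScalars)
    guard scalars.count >= 2, scalars.first == "\"", scalars.last == "\"" else { return nil }
    let body = scalars[1 ..< scalars.count - 1]
    // Every character between the quotes must come from the inverted set.
    guard body.allSatisfy({ notQuote.contains($0) }) else { return nil }
    var result = String.UnicodeScalarView()
    result.append(contentsOf: body)
    return String(result)
}

print(matchQuoted("\"hello\"") ?? "no match") // hello
print(matchQuoted("\"a\"b\"") ?? "no match")  // no match
```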

Transforming

In the previous section we wrote a consumer that can match an integer number. But what do we get when we apply that to some input? Here is the matching code:

let match = try integer.match("1234")
print(match)

And here is the output:

(
    '1'
    '2'
    '3'
    '4'
)

That's ... odd. You were probably hoping for a String containing "1234", or at least something a bit simpler to work with.

If we dig in a bit deeper and look at the structure of the Match value returned, we'll find it's something like this (omitting namespaces and other metadata for clarity):

Match.node(nil, [
    Match.token("1", 0 ..< 1),
    Match.token("2", 1 ..< 2),
    Match.token("3", 2 ..< 3),
    Match.token("4", 3 ..< 4),
])

Because each digit in the number was matched individually, the result has been returned as an array of tokens, rather than a single token representing the entire number. This level of detail is potentially useful for some applications, but we don't need it right now - we just want to get the value. To do that, we need to transform the output.

The Match type has a method called transform() for doing exactly that. The transform() method takes a closure argument of type Transform, which has the signature (_ name: Label, _ values: [Any]) throws -> Any?. The closure is applied recursively to all matched values in order to convert them to whatever form your application needs.

Unlike parsing, which is done from the top down, transforming is done from the bottom up. That means that the child nodes of each Match will be transformed before their parents, so that all the values passed to the transform closure should have already been converted to the expected types.

So the transform function takes an array of values and collapses them into a single value (or nil) - pretty straightforward - but you're probably wondering about the Label argument. If you look at the definition of the Consumer type, you'll notice that it also takes a generic argument of type Label. In the examples so far we've been passing String as the label type, but we've not actually used it yet.

The Label type is used in conjunction with the label consumer. This allows you to assign a name to a given consumer rule, which can be used to refer to it later. Since you can store consumers in variables and refer to them that way, it's not immediately obvious why this is useful, but it has two purposes:

The first purpose is to allow forward references, which are explained below.

The second purpose is for use when transforming, to identify the type of node to be transformed. Labels assigned to consumer rules are preserved in the Match node after parsing, making it possible to identify which rule was matched to create a particular type of value. Matched values that are not labelled cannot be individually transformed; instead, they will be passed as the values for the first labelled parent node.
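The following simplified model makes two things concrete: the bottom-up order, and the way unlabelled values pass through to their parent. MiniMatch and transformValues(_:_:) are hypothetical stand-ins. They are not Consumer's Match type or its transform() method, and they only mimic the behaviour described above.

```swift
// Simplified stand-in for a match tree; not Consumer's actual Match type.
indirect enum MiniMatch {
    case token(String)
    case node(String?, [MiniMatch]) // optional label plus child matches
}

typealias MiniTransform = (_ label: String, _ values: [Any]) throws -> Any?

// Returns the values that a match contributes to its parent.
func transformValues(_ match: MiniMatch, _ transform: MiniTransform) throws -> [Any] {
    switch match {
    case let .token(text):
        return [text]
    case let .node(label, children):
        // Bottom-up: every child is transformed before this node is handled.
        var values: [Any] = []
        for child in children {
            values += try transformValues(child, transform)
        }
        // An unlabelled node can't be transformed; its values pass through to the parent.
        guard let label = label else { return values }
        // A nil result drops the node from the output.
        if let result = try transform(label, values) { return [result] }
        return []
    }
}

// "1" and "2" sit inside an unlabelled group, as a nested sequence rule might produce.
let tree = MiniMatch.node("integer", [
    .node(nil, [.token("1"), .token("2")]),
    .token("3"),
])

let output = try! transformValues(tree) { label, values in
    switch label {
    case "integer":
        guard let int = Int((values as! [String]).joined()) else { return nil }
        return int
    default:
        return nil
    }
}
print(output) // [123]
```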

So, to transform the integer result, we must first give it a label, by using the label consumer type:

let integer: Consumer<String> = .label("integer", .any([
    .character("0"),
    .sequence([
        .character(in: "1" ... "9"),
        .zeroOrMore(.character(in: "0" ... "9")),
    ]),
]))

We can then transform the match using the following code:

let result = try integer.match("1234").transform { label, values in
    switch label {
    case "integer":
        return (values as! [String]).joined()
    default:
        preconditionFailure("unhandled rule: \(label)")
    }
}
print(result ?? "")

We know that the integer consumer will always return an array of string tokens, so we can safely use as! in this case to cast values to [String]. This is not especially elegant, but it's the nature of dealing with dynamic data in Swift. Safety purists might prefer to use as? and throw an Error if the value is not a [String], but that situation could only arise in the event of a programming error - no input data matched by the integer consumer we've defined above will ever return anything else.

With the addition of this function, the array of character tokens is transformed into a single string value. The printed result is now simply '1234'. That's much better, but it's still a String, and we may well want it to be an actual Int if we're going to use the value. Since the transform function returns Any?, we can return any type we want, so let's modify it to return an Int instead:

switch label {
case "integer":
    let string = (values as! [String]).joined()
    guard let int = Int(string) else {
        throw MyError(message: "Invalid integer literal '\(string)'")
    }
    return int
default:
    preconditionFailure("unhandled rule: \(label)")
}

The Int(_ string: String) initializer returns an Optional in case the argument cannot be converted to an Int. Since we've already pre-determined that the string only contains digits, you might think we could safely force unwrap this, but it is still possible for the initializer to fail - the matched integer might have too many digits to fit into 64 bits, for example.

We could just return the result of Int(string) directly, since the return type for the transform function is Any?, but this would be a mistake because that would silently omit the number from the output if the conversion failed, and we actually want to treat it as an error instead.

We've used an imaginary error type called MyError here, but you can use whatever type you like. Consumer will wrap the error you throw in a Consumer.Error before returning it, which will annotate it with the source input offset and other useful metadata preserved from the parsing process.
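The overflow case mentioned above is easy to reproduce. In this sketch, InvalidIntegerLiteral is a hypothetical error type standing in for MyError, and integerValue(of:) plays the role of the "integer" branch of the transform closure. The sketch assumes a 64-bit Int.

```swift
// Hypothetical error type, playing the same role as MyError above.
struct InvalidIntegerLiteral: Error, CustomStringConvertible {
    let literal: String
    var description: String { return "Invalid integer literal '\(literal)'" }
}

// The grammar guarantees the string contains only digits,
// but Int(_:) still returns nil if the value doesn't fit.
func integerValue(of digits: String) throws -> Int {
    guard let int = Int(digits) else { throw InvalidIntegerLiteral(literal: digits) }
    return int
}

do {
    print(try integerValue(of: "1234"))                 // 1234
    print(try integerValue(of: "99999999999999999999")) // too large for a 64-bit Int
} catch {
    print(error) // Invalid integer literal '99999999999999999999'
}
```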

Common Transforms

Certain types of transform are very common. In addition to the Array -> String conversion we've just done, other examples include discarding a value (equivalent to returning nil from the transform function), or substituting a given string for a different one (e.g. replace "\n" with a newline character, or vice-versa).

For these common operations, rather than applying a label to the consumer and having to write a transform function, you can use one of the built-in consumer transforms:

  • flatten - flattens a node tree into a single string token
  • discard - removes a matched string token or node tree from the results
  • replace - replaces a matched node tree or string token with a different string token

Note that these transforms are applied during the parsing phase, before the Match is returned or the regular transform() function can be applied.
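One way to picture what these three transforms do to matched tokens is the following model. Piece and flattenPieces(_:) are hypothetical and do not reflect how Consumer implements the transforms internally. The example decodes the tokens of a quoted literal that contains a \n escape.

```swift
// Hypothetical record of matched tokens, tagged with what a built-in transform would do to each.
enum Piece {
    case keep(String)            // ordinary token, kept as-is
    case discard(String)         // like .discard: matched, then dropped
    case replace(String, String) // like .replace: matched text, replacement text
}

// Like .flatten: joins whatever survives into a single string token.
func flattenPieces(_ pieces: [Piece]) -> String {
    var output = ""
    for piece in pieces {
        switch piece {
        case let .keep(text): output += text
        case .discard: break
        case let .replace(_, replacement): output += replacement
        }
    }
    return output
}

// Tokens for the source literal "a\nb", where \n is a two-character escape sequence.
let literal: [Piece] = [
    .discard("\""), .keep("a"), .replace("\\n", "\n"), .keep("b"), .discard("\""),
]
print(flattenPieces(literal) == "a\nb") // true
```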

Using the flatten consumer, we can simplify our integer transform a bit:

let integer: Consumer<String> = .label("integer", .flatten(.any([
    .character("0"),
    .sequence([
        .character(in: "1" ... "9"),
        .zeroOrMore(.character(in: "0" ... "9")),
    ]),
])))

let result = try integer.match("1234").transform { label, values in
    switch label {
    case "integer":
        let string = values[0] as! String // matched value is now always a string
        guard let int = Int(string) else {
            throw MyError(message: "Invalid integer literal '\(string)'")
        }
        return int
    default:
        preconditionFailure("unhandled rule: \(label)")
    }
}

Typed Labels

Besides the need for force-unwrapping, another inelegance in our transform function is the need for the default: clause in the switch statement. Swift is trying to be helpful here by insisting that we handle all possible label values, but we know that "integer" is the only possible label in this code, so the default: is redundant.

Fortunately, Swift's type system can help here. Remember that the label value is not actually a String but a generic type Label. This allows us to use any type we want for the label (provided it conforms to Hashable), and a really good approach is to create an enum for the Label type:

enum MyLabel: String {
    case integer
}

If we now change our code to use this MyLabel enum instead of String, we avoid error-prone copying and pasting of string literals, and we eliminate the need for the default: clause in the transform function, since Swift can now determine statically that integer is the only possible value. Another nice benefit is that if we add other label types in future, the compiler will report an error if we forget to implement transforms for them.
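Here is a minimal sketch of that benefit. It assumes a hypothetical second label called identifier has been added. Because the switch has no default: clause, deleting either case becomes a compile-time error rather than a runtime preconditionFailure. applyTransform(_:_:) is an illustrative free function with the same shape as a transform closure, and is not part of Consumer.

```swift
struct MyError: Error {
    let message: String
}

enum MyLabel: String {
    case integer
    case identifier // hypothetical second rule, added to show exhaustiveness checking
}

func applyTransform(_ label: MyLabel, _ values: [Any]) throws -> Any? {
    let text = (values as! [String]).joined()
    // No default: clause. Removing either case below will not compile.
    switch label {
    case .integer:
        guard let int = Int(text) else {
            throw MyError(message: "Invalid integer literal '\(text)'")
        }
        return int
    case .identifier:
        return text
    }
}
```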

The complete, updated code for the integer consumer is shown below:

enum MyLabel: String {
    case integer
}

let integer: Consumer<MyLabel> = .label(.integer, .flatten(.any([
    .character("0"),
    .sequence([
        .character(in: "1" ... "9"),
        .zeroOrMore(.character(in: "0" ... "9")),
    ]),
])))

struct MyError: Error {
    let message: String
}

let result = try integer.match("1234").transform { label, values in
    switch label {
    case .integer:
        let string = values[0] as! String
        guard let int = Int(string) else {
            throw MyError(message: "Invalid integer literal '\(string)'")
        }
        return int
    }
}
print(result ?? "")

Forward References

More complex parsing grammars (e.g. for a programming language or a structured data file) may require circular references between rules. For example, here is an abridged version of the grammar for parsing JSON:

let null: Consumer<String> = .string("null")
let bool: Consumer<String> = ...
let number: Consumer<String> = ...
let string: Consumer<String> = ...
let object: Consumer<String> = ...

let array: Consumer<String> = .sequence([
    .string("["),
    .optional(.interleaved(json, ",")),
    .string("]"),
])

let json: Consumer<String> = .any([null, bool, number, string, object, array])

The array consumer contains a comma-delimited sequence of json values, and the json consumer can match any other type, including array itself.

Do you see the problem? The array consumer references the json consumer before it has been declared. This is known as a forward reference. You might think we could solve this by predeclaring the json variable before we assign its value, but this won't work - Consumer is a value type, so every reference to it is actually a copy, which means it needs to be defined up front.

In order to implement this, we need to make use of the label and reference features. First, we must give the json consumer a label so that it can be referenced before it is declared:

let json: Consumer<String> = .label("json", .any([null, bool, number, string, object, array]))

Then we replace json inside the array consumer with .reference("json"):

let array: Consumer<String> = .sequence([
    .string("["),
    .optional(.interleaved(.reference("json"), ",")),
    .string("]"),
])

Note: You must be careful when using references like this, not just to ensure that the named consumer actually exists, but that it is included in a non-reference form somewhere in your root consumer (the one which you actually try to match against the input).

In this case, json is the root consumer, so we know it exists. But what if we had defined the reference the other way around?

let json: Consumer<String> = .any([null, bool, number, string, object, .reference("array")])

let array: Consumer<String> = .label("array", .sequence([
    .string("["),
    .optional(.interleaved(json, ",")),
    .string("]"),
]))

So now we've switched things up so that json is defined first, and has a forward reference to array. It seems like this should work, but it won't. The problem is that when we go to match json against an input string, there's no copy of the actual array consumer anywhere in the json consumer. It's referenced by name only.

You can avoid this problem if you ensure that references only point from child nodes to their parents, and that parent consumers reference their children directly, rather than by name.
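For comparison, plain Swift functions don't have this problem, because functions can refer to each other before they are declared. The hand-written recursive descent parser below is hypothetical and not built with Consumer. It covers a tiny JSON subset made of single digits and nested arrays. Its parseValue() and parseArray() methods call each other, which is the same relationship that .reference("json") expresses for Consumer values.

```swift
// Tiny JSON subset: value := digit | array, array := "[" (value ("," value)*)? "]"
struct MiniJSONParser {
    let chars: [Character]
    var index = 0

    init(_ input: String) { chars = Array(input) }

    mutating func parseValue() -> Bool {
        guard index < chars.count else { return false }
        if chars[index] == "[" { return parseArray() } // value refers to array...
        if chars[index].isASCII && chars[index].isNumber {
            index += 1
            return true
        }
        return false
    }

    mutating func parseArray() -> Bool {
        index += 1 // consume "["
        if index < chars.count && chars[index] == "]" { // empty array
            index += 1
            return true
        }
        guard parseValue() else { return false }        // ...and array refers back to value
        while index < chars.count && chars[index] == "," { // interleaved with ","
            index += 1
            guard parseValue() else { return false }
        }
        guard index < chars.count && chars[index] == "]" else { return false }
        index += 1
        return true
    }
}

// True only if the whole input is a single valid value.
func isMiniJSON(_ input: String) -> Bool {
    var parser = MiniJSONParser(input)
    return parser.parseValue() && parser.index == parser.chars.count
}

print(isMiniJSON("[1,[2,3],[]]")) // true
print(isMiniJSON("[1,]"))         // false
```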

Syntax Sugar

Consumer deliberately doesn't go overboard with custom operators, because they can make code inscrutable to other Swift developers. However, there are a few syntax extensions that can help to make your parser code a bit more readable:

The Consumer type conforms to ExpressibleByStringLiteral as shorthand for the .string() case, which means that instead of writing:

let foo: Consumer<String> = .string("foo")
let foobar: Consumer<String> = .sequence([.string("foo"), .string("bar")])

You can actually just write:

let foo: Consumer<String> = "foo"
let foobar: Consumer<String> = .sequence(["foo", "bar"])

Additionally, Consumer conforms to ExpressibleByArrayLiteral as a shorthand for .sequence(), so instead of:

let foobar: Consumer<String> = .sequence(["foo", "bar"])

You can just write:

let foobar: Consumer<String> = ["foo", "bar"]

The OR operator | is also overloaded for Consumer as an alternative to using .any(), so instead of:

let fooOrbar: Consumer<String> = .any(["foo", "bar"])

You can write:

let fooOrbar: Consumer<String> = "foo" | "bar"

Be careful when using the | operator for very complex expressions however, as it can cause Swift's compile time to go up exponentially due to the complexity of type inference. It's best to only use | for a small number of cases. If it's more than 4 or 5, or if it's deeply nested inside a complex expression, you should probably use any() instead.
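This sugar is built from ordinary Swift protocol conformances and operator overloading. The sketch below applies the same three techniques to a hypothetical Rule enum. It is not Consumer's source, and the third case is named alternatives rather than any.

```swift
indirect enum Rule: ExpressibleByStringLiteral, ExpressibleByArrayLiteral, CustomStringConvertible {
    case string(String)
    case sequence([Rule])
    case alternatives([Rule])

    // "foo" becomes .string("foo")
    init(stringLiteral value: String) { self = .string(value) }

    // ["foo", "bar"] becomes .sequence([.string("foo"), .string("bar")])
    init(arrayLiteral elements: Rule...) { self = .sequence(elements) }

    // "foo" | "bar" becomes .alternatives([.string("foo"), .string("bar")])
    static func | (lhs: Rule, rhs: Rule) -> Rule { return .alternatives([lhs, rhs]) }

    var description: String {
        switch self {
        case let .string(text):
            return "\"\(text)\""
        case let .sequence(parts):
            return "[" + parts.map { $0.description }.joined(separator: ", ") + "]"
        case let .alternatives(options):
            return options.map { $0.description }.joined(separator: " | ")
        }
    }
}

let foobar: Rule = ["foo", "bar"]
let fooOrBar: Rule = "foo" | "bar"
print(foobar)   // ["foo", "bar"]
print(fooOrBar) // "foo" | "bar"
```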

White Space

Consumer makes no assumptions about the nature of the text that you are parsing, so it does not have any built-in distinction between meaningful content and white space (spaces or linebreaks between tokens).

In practice, many programming languages and structured data files have a policy of ignoring (or mostly ignoring) white space between tokens, so what's the best way to do that?

First, define the grammar for your language, excluding any consideration of white space. For example, here is a simple consumer that matches a comma-delimited list of integers:

let integer: Consumer<MyLabel> = .flatten("0" | [.character(in: "1" ... "9"), .zeroOrMore(.character(in: "0" ... "9"))])
let list: Consumer<MyLabel> = .interleaved(integer, .discard(","))

Currently, this will match a number sequence like "12,0,5,78", but if we include spaces between the numbers it will fail. So next we need to define a consumer for matching white space:

let space: Consumer<MyLabel> = .discard(.zeroOrMore(.character(in: " \t\r\n")))

This consumer will match (and discard) any sequence of space, tab, carriage return or linefeed characters. Using the space rule, we could manually modify our list pattern to ignore spaces as follows:

let list: Consumer<MyLabel> = [space, .interleaved(integer, .discard([space, ",", space])), space]

This should work, but manually inserting spaces between every rule in the grammar like this is pretty tedious. It also makes the grammar harder to follow, and it's easy to miss a space accidentally.

To simplify dealing with white space, Consumer has a convenience constructor called ignore() that allows you to automatically ignore a given pattern when matching. We can use ignore() to combine our original list rule with the space rule as follows:

let list: Consumer<MyLabel> = .ignore(space, in: .interleaved(integer, .discard(",")))

This results in a consumer that is functionally equivalent to the manually spaced list that we created above, but with much less code.

The ignore() constructor is powerful, but because it is applied recursively to the entire consumer hierarchy, you need to be careful not to ignore white space in places that you don't want to allow it. For example, we wouldn't want to allow white space inside an individual token, such as the integer literal in our example.

Individual tokens in a grammar are typically returned as a single string value by using the flatten transform. The ignore() constructor won't modify consumers inside flatten, so the integer token from our example is actually not affected.
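Here is a hand-written sketch of ignoring white space between tokens but not inside them. parseIntegerList(_:) is a hypothetical function. It skips spaces, tabs and line breaks around the integers and commas, but still rejects a space inside a number, and it reuses the no-leading-zeros rule from earlier. Note that Swift treats "\r\n" as a single Character, which is why it appears in the set alongside "\r" and "\n".

```swift
func parseIntegerList(_ input: String) -> [Int]? {
    let chars = Array(input)
    let whitespace: Set<Character> = [" ", "\t", "\r", "\n", "\r\n"]
    var i = 0

    // Skips white space between tokens (the effect of ignore(space, ...)).
    func skipSpace() {
        while i < chars.count, whitespace.contains(chars[i]) { i += 1 }
    }

    // Reads one integer token; white space is never skipped inside it.
    func integer() -> Int? {
        let start = i
        while i < chars.count, chars[i].isASCII, chars[i].isNumber { i += 1 }
        guard i > start else { return nil }
        let digits = String(chars[start ..< i])
        if digits.count > 1 && digits.first == "0" { return nil } // no leading zeros
        return Int(digits)
    }

    var result: [Int] = []
    skipSpace()
    guard let first = integer() else { return nil }
    result.append(first)
    skipSpace()
    while i < chars.count, chars[i] == "," { // commas are matched but not kept
        i += 1
        skipSpace()
        guard let next = integer() else { return nil }
        result.append(next)
        skipSpace()
    }
    return i == chars.count ? result : nil
}

print(parseIntegerList(" 12 , 0,5 ,78 ") ?? []) // [12, 0, 5, 78]
```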

For more complex grammars, you may not be able to use ignore(), or may only be able to use it on certain sub-trees of the overall consumer, instead of the entire thing. For example, in the JSON example included with the Consumer library, string literals can contain escaped unicode character literals that must be transformed using the transform function. That means that JSON string literals can't be flattened, which also means that the JSON grammar can't use ignore() for handling white space, otherwise the grammar would ignore white space inside strings, which would mess up the parsing.

Note: ignore() can be used to ignore any kind of input, not just white space. Also, while the input is ignored from the point of view of the grammar, it doesn't have to be discarded in the output. If you were writing a code linter or formatter you might want to preserve the white space from the original source. To do that, you would remove the discard() clause from inside your white space rule:

let space: Consumer<MyLabel> = .zeroOrMore(.character(in: " \t\r\n"))

In some languages, such as Swift or JavaScript, spaces are mostly ignored but linebreaks have semantic significance. For such cases you might ignore only spaces but not linebreaks, or you might ignore both but only discard the spaces, so you can process the linebreaks manually in your transform function:

let space: Consumer<MyLabel> = .zeroOrMore(.discard(.character(in: " \t")) | .character(in: "\r\n"))

It's also common in programming languages to allow comments, which typically have no semantic meaning and can appear anywhere that white space is permitted. You can ignore comments in the same way you'd ignore spaces:

let space: Consumer<MyLabel> = .character(in: " \t\r\n")
let comment: Consumer<MyLabel> = ["/*", .zeroOrMore([.not("*/"), .anyCharacter()]), "*/"]
let spaceOrComment: Consumer<MyLabel> = .discard(.zeroOrMore(space | comment))

let program: Consumer<MyLabel> = .ignore(spaceOrComment, in: ...)
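A hand-written equivalent of the spaceOrComment rule might look like the following. skipSpaceAndComments(_:from:) is hypothetical. It skips white space and /* ... */ comments, then returns the index of the next meaningful character, or nil if a comment is never closed.

```swift
func skipSpaceAndComments(_ chars: [Character], from start: Int) -> Int? {
    let whitespace: Set<Character> = [" ", "\t", "\r", "\n", "\r\n"]
    var i = start
    while i < chars.count {
        if whitespace.contains(chars[i]) {
            i += 1
        } else if i + 1 < chars.count, chars[i] == "/", chars[i + 1] == "*" {
            // Scan for the closing "*/", like [.not("*/"), .anyCharacter()] repeated.
            i += 2
            while i + 1 < chars.count, !(chars[i] == "*" && chars[i + 1] == "/") { i += 1 }
            guard i + 1 < chars.count else { return nil } // unterminated comment
            i += 2
        } else {
            break // a meaningful character
        }
    }
    return i
}

let source = Array("  /* a */ /* b*/x")
if let end = skipSpaceAndComments(source, from: 0) {
    print(source[end]) // x
}
```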

Error Handling

There are two types of error that can occur in Consumer: parsing errors and transform errors.

Parsing errors are generated automatically by the Consumer framework when it encounters input that doesn't match the specified grammar. When this happens, Consumer will generate a Consumer.Error value that contains the kind of error that occurred, and the location of the error in the original source input.

Source locations are specified as a Consumer.Location value, which contains the character range of the error, and can lazily compute the line number and column at which that range occurs.
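Computing a line and column from a character offset takes only one pass over the preceding text, so it is reasonable to defer it until it is needed. This sketch is hypothetical and is not Consumer.Location's implementation. It assumes that offsets count Swift Characters and that line and column numbers start at 1.

```swift
// Converts a character offset into a 1-based (line, column) pair.
func lineAndColumn(of offset: Int, in input: String) -> (line: Int, column: Int) {
    var line = 1
    var column = 1
    for character in input.prefix(offset) {
        if character.isNewline {
            line += 1   // a line break starts a new line...
            column = 1  // ...at the first column
        } else {
            column += 1
        }
    }
    return (line, column)
}

let position = lineAndColumn(of: 5, in: "ab\ncde") // offset 5 is the "e"
print("\(position.line):\(position.column)")       // 2:3
```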

Transform errors are generated after the initial parsing pass by throwing an error inside the Consumer.Transform function. Any error thrown will be wrapped in a Consumer.Error so that it can be annotated with the source location.

Consumer's errors conform to CustomStringConvertible, and can be directly displayed to the user (although the message is not localized), but how useful this message is depends partly on how you write your consumer implementation.

When Consumer encounters an unexpected token, the error message will include a description of what was actually expected. Built-in consumer types like string and charset are automatically assigned meaningful descriptions. Labelled consumers will be displayed using the Label description:

let integer: Consumer<String> = .label("integer", "0" | [
    .character(in: "1" ... "9"),
    .zeroOrMore(.character(in: "0" ... "9")),
])

_ = try integer.match("foo") // will throw 'Unexpected token 'foo' at 1:1 (expected integer)'

If you are using String as your Label type then the description will be the literal string value. If you are using an enum (as recommended) then by default the rawValue of the label enum will be displayed.

The naming of your enum cases may not be optimal for user display. To fix this, you can change the label string, as follows:

enum JSONLabel: String {
    case string = "a string"
    case array = "an array"
    case json = "a json value"
}

This will improve the error message, but it's not localizable, and it may not be desirable to tie JSONLabel values to user-readable strings in case we want to serialize them, or make breaking changes in future. A better option is to make your Label type conform to CustomStringConvertible, then implement a custom description:

enum JSONLabel: String, CustomStringConvertible {
    case string
    case array
    case json
    
    var description: String {
        switch self {
        case .string: return "a string"
        case .array: return "an array"
        case .json: return "a json value"
        }
    }
}

Now the user-friendly label descriptions are independent of the actual values. This approach also makes localization easier, as you could use the rawValue to index a strings file instead of a hard-coded switch statement:

var description: String {
    return NSLocalizedString(self.rawValue, comment: "")
}

Similarly, when throwing custom errors during the transform phase, it's a good idea to implement CustomStringConvertible for your custom error type:

enum JSONError: Error, CustomStringConvertible {
    case invalidNumber(String)
    case invalidCodePoint(String)
    
    var description: String {
        switch self {
        case let .invalidNumber(string):
            return "invalid numeric literal '\(string)'"
        case let .invalidCodePoint(string):
            return "invalid unicode code point '\(string)'"
        }
    }
}
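To show how these descriptions surface, here is a hypothetical wrapper that annotates an error with a position. It is loosely modelled on the way Consumer.Error wraps transform errors. LocatedError and its "at line:column" format are assumptions for illustration, and Consumer's actual message format may differ.

```swift
enum JSONError: Error, CustomStringConvertible {
    case invalidNumber(String)
    case invalidCodePoint(String)

    var description: String {
        switch self {
        case let .invalidNumber(string):
            return "invalid numeric literal '\(string)'"
        case let .invalidCodePoint(string):
            return "invalid unicode code point '\(string)'"
        }
    }
}

// Hypothetical wrapper that adds a source position to any underlying error.
struct LocatedError: Error, CustomStringConvertible {
    let underlying: Error
    let line: Int
    let column: Int

    // String interpolation picks up the underlying error's CustomStringConvertible description.
    var description: String {
        return "\(underlying) at \(line):\(column)"
    }
}

let located = LocatedError(underlying: JSONError.invalidNumber("1.2.3"), line: 1, column: 5)
print(located) // invalid numeric literal '1.2.3' at 1:5
```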

Performance

The performance of a Consumer parser can be greatly affected by the way that your rules are structured. This section includes some tips for getting the best possible parsing speed.

Note: As with any performance tuning, it's important that you measure the performance of your parser before and after making changes, otherwise you may waste time optimizing something that's already fast enough, or even inadvertently make it slower.

Backtracking

The best way to get good parsing performance from your Consumer grammar is to try to avoid backtracking.

Backtracking is when the parser has to throw away partially matched results and parse them again. It occurs when multiple consumers in a given any group begin with the same token or sequence of tokens.

For example, here is an inefficient pattern:

let foobarOrFoobaz: Consumer<String> = .any([
    .sequence(["foo", "bar"]),
    .sequence(["foo", "baz"]),
])

When the parser encounters the input "foobaz", it will first match "foo", then try to match "bar". When that fails it will backtrack right back to the beginning and try the second sequence of "foo" followed by "baz". This will make parsing slower than it needs to be.

We could instead rewrite this as:

let foobarOrFoobaz: Consumer<String> = .sequence([
    "foo", .any(["bar", "baz"])
])

This consumer matches exactly the same input as the previous one, but after successfully matching "foo", if it fails to match "bar" it will try "baz" immediately, instead of going back and matching "foo" again. We have eliminated the backtracking.
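The cost difference shows up in a small standalone model that counts character comparisons. matchLiteral, matchBacktracking and matchFactored are hypothetical helpers, not Consumer APIs. Comparison counts are only a rough proxy for real parsing time.

```swift
// Matches `literal` at `start`, counting each character comparison. Returns the end index or nil.
func matchLiteral(_ literal: String, in input: [Character], at start: Int, comparisons: inout Int) -> Int? {
    var index = start
    for expected in literal {
        comparisons += 1
        guard index < input.count, input[index] == expected else { return nil }
        index += 1
    }
    return index
}

// Shape 1: .any([["foo", "bar"], ["foo", "baz"]]); each alternative re-matches "foo".
func matchBacktracking(_ text: String, comparisons: inout Int) -> Bool {
    let input = Array(text)
    for suffix in ["bar", "baz"] {
        if let mid = matchLiteral("foo", in: input, at: 0, comparisons: &comparisons),
           let end = matchLiteral(suffix, in: input, at: mid, comparisons: &comparisons),
           end == input.count {
            return true
        }
    }
    return false
}

// Shape 2: ["foo", .any(["bar", "baz"])]; the shared prefix is matched only once.
func matchFactored(_ text: String, comparisons: inout Int) -> Bool {
    let input = Array(text)
    guard let mid = matchLiteral("foo", in: input, at: 0, comparisons: &comparisons) else { return false }
    for suffix in ["bar", "baz"] {
        if let end = matchLiteral(suffix, in: input, at: mid, comparisons: &comparisons),
           end == input.count {
            return true
        }
    }
    return false
}

var slow = 0
let slowMatched = matchBacktracking("foobaz", comparisons: &slow)
var fast = 0
let fastMatched = matchFactored("foobaz", comparisons: &fast)
print(slowMatched, slow) // true 12
print(fastMatched, fast) // true 9
```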

Character Sequences

The following consumer example matches a quoted string literal containing escaped quotes. It matches zero or more instances of either an escaped quote \" or any other character besides " or \.

let string: Consumer<String> = .flatten(.sequence([
    .discard("\""),
    .zeroOrMore(.any([
        .replace("\\\"", "\""), // Escaped "
        .anyCharacter(except: "\"", "\\"),
    ])),
    .discard("\""),
]))

The above implementation works as expected, but it is not as efficient as it could be. For each character encountered, it must first check for an escaped quote, and then check if it's any other character. That's quite an expensive check to perform, and it can't (currently) be optimized by the Consumer framework.

Consumer has optimized code paths for matching .zeroOrMore(.character(...)) or .oneOrMore(.character(...)) rules, and we can rewrite the string consumer to take advantage of this optimization as follows:

let string: Consumer<String> = .flatten(.sequence([
    .discard("\""),
    .zeroOrMore(.any([
        .replace("\\\"", "\""), // Escaped "
        .oneOrMore(.anyCharacter(except: "\"", "\\")),
    ])),
    .discard("\""),
]))

Since most characters in a typical string are not \ or ", this will run much faster because it can efficiently consume a long run of non-escape characters between each escape sequence.
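The same idea can be seen outside Consumer in a hand-written scanner. The Swift sketch below is hypothetical (parseQuotedString is an illustrative name, not part of the library) and assumes, like the grammar above, that the only escape sequence is \" and that the literal must span the whole input. Rather than testing each character against every alternative, it branches only on " and \ and copies each run of ordinary characters in one inner loop, which is the role .oneOrMore(.anyCharacter(except: "\"", "\\")) plays in the rewritten consumer.

```swift
/// Parses a double-quoted string literal whose only escape sequence is \".
/// Returns the unescaped contents, or nil if the whole input is not such a literal.
/// Hypothetical hand-written scanner for illustration; not part of Consumer.
func parseQuotedString(_ text: String) -> String? {
    let chars = Array(text)
    guard chars.first == "\"" else { return nil }
    var index = 1
    var result = ""
    while index < chars.count {
        let char = chars[index]
        if char == "\"" {
            // The closing quote must be the final character of the input.
            return index == chars.count - 1 ? result : nil
        } else if char == "\\" {
            // Only \" is accepted, mirroring the escaped-quote rule in the grammar.
            guard index + 1 < chars.count, chars[index + 1] == "\"" else { return nil }
            result.append("\"")
            index += 2
        } else {
            // Consume the whole run of ordinary characters in one tight loop,
            // the job done by the oneOrMore(anyCharacter(except:)) rule.
            let start = index
            while index < chars.count, chars[index] != "\"", chars[index] != "\\" {
                index += 1
            }
            result.append(contentsOf: chars[start..<index])
        }
    }
    return nil // Reached the end without a closing quote.
}

print(parseQuotedString("\"say \\\"hi\\\"\"") ?? "invalid") // say "hi"
```

Like the consumer, this sketch rejects any other backslash escape such as \n. Supporting more escapes would mean adding branches to the backslash case, while the run loop for ordinary characters stays the same.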

Flatten and Discard

We mentioned the flatten and discard transforms in the Common Transforms section above, as a convenient way to omit redundant information from the parsing results prior to applying a custom transform.

But using flatten and discard can also improve performance by simplifying the parsing process and avoiding the need to gather and propagate unnecessary information such as source offsets.

If you intend to eventually flatten a given node of your matched results, it's much better to do this within the consumer itself by using the flatten rule than by using Array.joined() in your transform function. The only time you won't be able to do this is when some of the child consumers need custom transforms, because flattening the node tree removes the labels needed to reference those nodes in your transform.

Similarly, for unneeded match results (e.g. commas, brackets and other punctuation that isn't required after parsing) you should always use discard to remove the node or token from the match results before applying a transform.

Note: Transform rules are applied hierarchically, so if a parent consumer already has flatten applied, there is no further performance benefit to be gained from applying it individually to the children of that consumer.
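As a rough illustration of what these two transforms do to a result tree, here is a toy model in Swift. The Node enum and the flatten and discarding functions are hypothetical simplifications written for this example, not Consumer's actual match types or implementation; the sketch shows only the shape of the output.

```swift
/// Hypothetical, simplified match tree for illustration only;
/// Consumer's real result types are different.
enum Node: Equatable {
    case token(String)
    case list([Node])
}

/// Collapses a subtree into a single token, which is what flatten produces.
func flatten(_ node: Node) -> Node {
    guard case .list(let children) = node else { return node }
    var text = ""
    for child in children {
        if case .token(let value) = flatten(child) {
            text += value
        }
    }
    return .token(text)
}

/// Drops unwanted punctuation leaves, the effect of wrapping them in discard.
func discarding(_ punctuation: Set<String>, from node: Node) -> Node? {
    switch node {
    case .token(let value):
        return punctuation.contains(value) ? nil : node
    case .list(let children):
        return .list(children.compactMap { discarding(punctuation, from: $0) })
    }
}

// A raw result for the input "[12,3]" with every character kept as a token.
let raw: Node = .list([
    .token("["),
    .list([.token("1"), .token("2")]),
    .token(","),
    .list([.token("3")]),
    .token("]"),
])

// Discard the punctuation, then flatten each number's digits into one token.
if case .list(let items)? = discarding(["[", ",", "]"], from: raw) {
    let numbers = items.map { item -> String in
        if case .token(let text) = flatten(item) { return text }
        return ""
    }
    print(numbers) // ["12", "3"]
}
```

Once the brackets and comma are gone and the digits are flattened, a transform only has to convert "12" and "3" into numbers, instead of walking a tree of single-character tokens and punctuation.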

Example Projects

Consumer includes a number of example projects to demonstrate the framework:

JSON

The JSON example project implements a JSON parser, along with a transform function to convert the parsed result into Swift data.

REPL

The REPL (Read Evaluate Print Loop) example is a Mac command-line tool for evaluating expressions. The REPL can handle numbers, booleans and string values, but currently only supports basic math operations.

Each line you type into the REPL is evaluated independently and the result is printed in the console. To share values between expressions, you can define variables using an identifier name followed by = and then an expression, e.g.:

foo = (5 + 6) + 7

The named variable ("foo", in this case) is then available to use in subsequent expressions.

This example demonstrates a number of advanced techniques such as mutually recursive consumer rules, operator precedence, and negative lookahead using not().
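Negative lookahead succeeds only when a pattern does not match at the current position, and it never consumes input. The plain Swift sketch below illustrates the idea without Consumer; lookingAt and assignmentOperator are hypothetical helpers, not code taken from the REPL example. It accepts a single "=" only when it is not immediately followed by another "=", a common situation where a grammar needs something like not().

```swift
/// Returns true if `pattern` occurs at `index` in `input`, without consuming anything.
func lookingAt(_ pattern: String, in input: [Character], at index: Int) -> Bool {
    let expected = Array(pattern)
    guard index + expected.count <= input.count else { return false }
    return Array(input[index..<index + expected.count]) == expected
}

/// Hypothetical rule: "=" followed by a negative lookahead for "=".
/// Returns the index just past the "=", or nil if the rule does not match.
func assignmentOperator(in input: [Character], at index: Int) -> Int? {
    guard lookingAt("=", in: input, at: index) else { return nil }
    // Negative lookahead: peek at the next position but never consume it.
    guard !lookingAt("=", in: input, at: index + 1) else { return nil }
    return index + 1
}

let line = Array("foo = (5 + 6) + 7")
if let next = assignmentOperator(in: line, at: 4) {
    print("assignment matched; parsing resumes at index \(next)") // index 5
}
```

Because the lookahead only peeks, a failed check leaves the position unchanged, so another rule (for example one for "==") can still try to match at the same index.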
