• This repository has been archived on 06/Feb/2023


Regex

Open source regex engine.

Warning: not meant to be used in production; created for learning purposes.
See the Let's Build a Regex Engine series to learn how this project came to be.

Usage

Create a Regex object by providing a pattern and an optional set of options (Regex.Options):

let regex = try Regex(#"<\/?[\w\s]*>|<.+[\W]>"#)

The pattern is parsed and compiled into a special internal representation. If the pattern contains an error, the initializer throws a detailed error with the index of the failing token and an error message.

Use the isMatch(_:) method to check whether the regular expression pattern occurs in the input text:

regex.isMatch("<h1>Title</h1>")

Retrieve all occurrences of text that match the regular expression by calling the matches(in:) method. Each match contains a range in the input string.

for match in regex.matches(in: "<h1>Title</h1>\n<p>Text</p>") {
    print(match.value)
    // Prints "<h1>", "</h1>", "<p>", "</p>" (one match per iteration)
}

If you just want a single match, use regex.firstMatch(in:).
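For comparison, the sketch below runs the same pattern through Foundation's NSRegularExpression, a separate ICU-based engine that is not part of this project. It assumes Foundation is available and only illustrates which substrings the pattern selects; it does not use this library's API.

```swift
import Foundation

// Same pattern as above, written as a raw string so the backslashes stay literal.
let pattern = #"<\/?[\w\s]*>|<.+[\W]>"#
let input = "<h1>Title</h1>\n<p>Text</p>"

// NSRegularExpression is Foundation's engine, used here only as a stand-in.
let nsRegex = try! NSRegularExpression(pattern: pattern)
let fullRange = NSRange(input.startIndex..., in: input)

// Convert each NSRange result back into a Swift substring.
let values: [String] = nsRegex.matches(in: input, options: [], range: fullRange).compactMap { match in
    Range(match.range, in: input).map { String(input[$0]) }
}
print(values) // ["<h1>", "</h1>", "<p>", "</p>"]
```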

Regex is fully thread-safe.

Features

Character Classes

A character class matches any one of a set of characters.

  • [character_group] – matches any single character in character_group, e.g. [ae]
  • [^character_group] – negation, matches any single character that is not in character_group, e.g. [^ae]
  • [first-last] – character range, matches any single character in the given range from first to last, e.g. [a-z]
  • . – wildcard, matches any single character except \n
  • \w – matches any word character (negation: \W)
  • \s – matches any whitespace character (negation: \S)
  • \d – matches any decimal digit (negation: \D)
  • \z – matches the end of the string (an anchor rather than a character class; see "Anchors" for it and the related \Z)
  • \p{name} – matches characters from the given Unicode category, e.g. \p{P} for punctuation characters (supported categories: P, Lt, Ll, N, S) (negation: \P)
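To make the list above concrete, the following sketch models a character group as a set of predicates over Swift Character values. The types ClassItem and CharacterGroup are hypothetical, the definition of \w (letters, digits, underscore) is an assumption, and \d is approximated by ASCII 0-9; this is not how the project represents classes internally.

```swift
// Hypothetical building blocks of a character group such as [a-z_\d].
enum ClassItem {
    case literal(Character)          // e.g. "a" in [ae]
    case range(Character, Character) // e.g. a-z in [a-z]
    case word                        // \w (assumed: letters, digits, underscore)
    case digit                       // \d (ASCII-only approximation)
    case whitespace                  // \s

    func matches(_ c: Character) -> Bool {
        switch self {
        case .literal(let expected): return c == expected
        case .range(let first, let last): return first <= c && c <= last
        case .word: return c.isLetter || c.isNumber || c == "_"
        case .digit: return ("0"..."9").contains(c)
        case .whitespace: return c.isWhitespace
        }
    }
}

// A group matches if any item matches; [^...] inverts the result.
struct CharacterGroup {
    var items: [ClassItem]
    var isNegated = false

    func matches(_ c: Character) -> Bool {
        let hit = items.contains { $0.matches(c) }
        return isNegated ? !hit : hit
    }
}

let vowels = CharacterGroup(items: [.literal("a"), .literal("e")])       // [ae]
let notVowels = CharacterGroup(items: vowels.items, isNegated: true)    // [^ae]
print(vowels.matches("e"), notVowels.matches("e")) // true false
```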

Characters consisting of multiple Unicode scalars (extended grapheme clusters) are interpreted as single characters, e.g. the pattern "🇺🇸+" matches "🇺🇸" and "🇺🇸🇺🇸" but not "🇸🇸". Inside a character group, however, each Unicode scalar is interpreted separately, e.g. the pattern "[🇺🇸]" matches both "🇺🇸" and "🇸🇸", which are made of the same scalars.
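The distinction between characters and scalars mirrors Swift's own String model, in which a Character is an extended grapheme cluster. The snippet below uses only the standard library to show why the flag counts as one character but two scalars.

```swift
// "🇺🇸" is one Character made of two regional indicator scalars (U+1F1FA, U+1F1F8).
let flag = "🇺🇸"
print(flag.count)                // 1
print(flag.unicodeScalars.count) // 2

// "🇸🇸" reuses the S scalar, so its scalars are a subset of the flag's scalars...
let other = "🇸🇸"
let flagScalars = Set(flag.unicodeScalars)
let otherScalars = Set(other.unicodeScalars)
print(otherScalars.isSubset(of: flagScalars)) // true

// ...but at the Character level the two are different, which is why "🇺🇸+" rejects "🇸🇸".
print(Character(flag) == Character(other)) // false
print("🇺🇸🇺🇸".count)                      // 2
```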

Character Escapes

The backslash (\) either indicates that the character that follows is a special character or that the keyword should be interpreted literally.

  • \keyword – interprets the keyword literally, e.g. \{ matches the opening brace
  • \special_character – interprets the special character, e.g. \b matches a word boundary (more info in "Anchors")
  • \u{nnnn} – matches a UTF-16 code unit, e.g. \u0020 matches a space (Swift-specific feature)
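As a quick check of the \u0020 example, the snippet below uses Swift's own string-literal escape (not the regex syntax) to confirm that U+0020 is the space character and occupies a single UTF-16 code unit.

```swift
// Swift string-literal escape for U+0020 SPACE.
let space: Character = "\u{0020}"
print(space == " ")       // true

// The space is stored as the single UTF-16 code unit 32 (0x20).
let units = Array("a b".utf16)
print(units)              // [97, 32, 98]
```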

Anchors

Anchors specify a position in the string where a match must occur.

  • ^ – matches the beginning of the string (or the beginning of the line when the .multiline option is enabled)
  • $ – matches the end of the string or \n at the end of the string (the end of the line in .multiline mode)
  • \A – matches the beginning of the string (ignores the .multiline option)
  • \Z – matches the end of the string or \n at the end of the string (ignores the .multiline option)
  • \z – matches the end of the string (ignores the .multiline option)
  • \G – the match must occur at the point where the previous match ended
  • \b – the match must occur on a boundary between a word character and a non-word character (negation: \B)
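The \b rule can be expressed as a small predicate over string positions. This sketch assumes that word characters are letters, digits, and the underscore; the exact \w definition used by this engine may differ, and the helper names are hypothetical.

```swift
// Assumed definition of a word character.
func isWordCharacter(_ c: Character) -> Bool {
    c.isLetter || c.isNumber || c == "_"
}

// \b holds at `index` when exactly one neighbor is a word character.
// Positions before the start and after the end count as non-word.
func isWordBoundary(_ s: String, at index: String.Index) -> Bool {
    let before = index > s.startIndex ? isWordCharacter(s[s.index(before: index)]) : false
    let after = index < s.endIndex ? isWordCharacter(s[index]) : false
    return before != after
}

let text = "hi, you"
// Check every position, including the one past the last character.
let positions = Array(text.indices) + [text.endIndex]
let offsets = positions
    .filter { isWordBoundary(text, at: $0) }
    .map { text.distance(from: text.startIndex, to: $0) }
print(offsets) // [0, 2, 4, 7]
```

\B is simply the negation of this predicate.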

Grouping Constructs

Grouping constructs delineate the subexpressions of a regular expression and capture the substrings of an input string.

  • (subexpression) – captures a subexpression in a group
  • (?:subexpression) – non-capturing group

Backreferences

Backreferences provide a convenient way to identify a repeated character or substring within a string.

  • \number – matches the same text that was captured by the group at the given ordinal position, e.g. \4 matches the content of the fourth group

If the referenced group can't be found in the pattern, an error is thrown.
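The following sketch illustrates capturing groups, non-capturing groups, and backreferences with Foundation's NSRegularExpression, which follows the same conventions for these constructs. It is a separate engine and is used here only to demonstrate the semantics.

```swift
import Foundation

// (\w)\1 finds a word character immediately followed by the same character.
let doubled = try! NSRegularExpression(pattern: #"(\w)\1"#)
let word = "bookkeeper"
let found = doubled.matches(in: word, options: [], range: NSRange(word.startIndex..., in: word))
    .compactMap { Range($0.range, in: word).map { String(word[$0]) } }
print(found) // ["oo", "kk", "ee"]

// Only (subexpression) is numbered; (?:subexpression) is skipped, so \2 refers to (c).
let grouped = try! NSRegularExpression(pattern: #"(a)(?:b)(c)\2"#)
print(grouped.numberOfCaptureGroups) // 2
let hit = grouped.firstMatch(in: "abcc", options: [], range: NSRange(location: 0, length: 4))
print(hit != nil) // true
```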

Quantifiers

Quantifiers specify how many instances of a character, group, or character class must be present in the input for a match to be found.

  • * – match zero or more times
  • + – match one or more times
  • ? – match zero or one time
  • {n} – match exactly n times
  • {n,} – match at least n times
  • {n,m} – match from n to m times, closed range, e.g. a{3,4}

All quantifiers are greedy by default: they try to match as many occurrences of the pattern as possible. Append the ? character to a quantifier to make it lazy so that it matches as few occurrences as possible, e.g. a+?.
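The difference between greedy and lazy quantifiers is easiest to see in the first match each produces. The sketch below uses Foundation's NSRegularExpression as a stand-in engine; the helper firstMatchText(_:in:) is hypothetical and unrelated to this library's firstMatch(in:).

```swift
import Foundation

// Returns the text of the first match of `pattern` in `input`, or nil.
func firstMatchText(_ pattern: String, in input: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: input, options: [], range: NSRange(input.startIndex..., in: input)),
          let range = Range(match.range, in: input) else { return nil }
    return String(input[range])
}

let html = "<b>bold</b>"
print(firstMatchText("<.+>", in: html) ?? "-")      // <b>bold</b> (greedy: runs to the last ">")
print(firstMatchText("<.+?>", in: html) ?? "-")     // <b> (lazy: stops at the first ">")
print(firstMatchText("a{2,3}", in: "aaaa") ?? "-")  // aaa
print(firstMatchText("a{2,3}?", in: "aaaa") ?? "-") // aa
```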

Warning: lazy quantifiers can be used to control which groups and matches are captured, but they shouldn't be used to optimize matcher performance; the matcher already uses an algorithm that can handle even nested greedy quantifiers.
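One well-known way to handle nested greedy quantifiers without exponential backtracking is to simulate the NFA as a set of active states (Thompson-style simulation). The sketch below hand-compiles (a+)+b into such an NFA; it illustrates the general technique and is not necessarily the algorithm this project's matcher uses. The State and NFA types are hypothetical.

```swift
// Each state either consumes one character or splits into two epsilon transitions.
enum State {
    case char(Character, Int) // consume the character, then move to the given state
    case split(Int, Int)      // move to both states without consuming input
    case match                // accepting state
}

struct NFA {
    let states: [State]

    // Adds `index` plus every state reachable from it through split (epsilon) edges.
    private func addState(_ index: Int, to set: inout Set<Int>) {
        guard set.insert(index).inserted else { return }
        if case let .split(first, second) = states[index] {
            addState(first, to: &set)
            addState(second, to: &set)
        }
    }

    // Whole-input match. Each step visits every state at most once, so the running time
    // is O(input.count * states.count) and nothing is ever backtracked.
    func fullMatch(_ input: String) -> Bool {
        var current = Set<Int>()
        addState(0, to: &current)
        for c in input {
            var next = Set<Int>()
            for index in current {
                if case let .char(expected, target) = states[index], expected == c {
                    addState(target, to: &next)
                }
            }
            if next.isEmpty { return false }
            current = next
        }
        return current.contains { index in
            if case .match = states[index] { return true }
            return false
        }
    }
}

// (a+)+b: the kind of nested quantifier that makes naive backtracking exponential.
let nfa = NFA(states: [
    .char("a", 1), // 0: consume "a"
    .split(0, 2),  // 1: inner +, repeat "a" or leave the group
    .split(0, 3),  // 2: outer +, repeat the group or continue
    .char("b", 4), // 3: consume "b"
    .match         // 4: accept
])

print(nfa.fullMatch("aaab"))                                       // true
print(nfa.fullMatch(String(repeating: "a", count: 10_000) + "c"))  // false
```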

Alternation

  • | – match either the left side or the right side

Options

Regex can be initialized with a set of options (Regex.Options).

  • .caseInsensitive – match letters in the pattern independent of case.
  • .multiline – control the behavior of the ^ and $ anchors. By default, they match at the start and end of the input text; if this flag is set, they match at the start and end of each line within the input text.
  • .dotMatchesLineSeparators – allow . to match any character, including line separators.
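The options can be pictured as a Swift OptionSet consulted at the few places where they change behavior. MatchOptions and the three helper functions are hypothetical and treat only \n as a line separator; the real Regex.Options type may be declared differently.

```swift
// Hypothetical mirror of the documented options.
struct MatchOptions: OptionSet {
    let rawValue: Int
    static let caseInsensitive = MatchOptions(rawValue: 1 << 0)
    static let multiline = MatchOptions(rawValue: 1 << 1)
    static let dotMatchesLineSeparators = MatchOptions(rawValue: 1 << 2)
}

// "." rejects "\n" unless .dotMatchesLineSeparators is set.
func dotMatches(_ c: Character, options: MatchOptions) -> Bool {
    c != "\n" || options.contains(.dotMatchesLineSeparators)
}

// "^" matches at the start of the input, and also right after "\n" in .multiline mode.
func caretMatches(_ s: String, at index: String.Index, options: MatchOptions) -> Bool {
    if index == s.startIndex { return true }
    return options.contains(.multiline) && s[s.index(before: index)] == "\n"
}

// Character comparison that folds case when .caseInsensitive is set.
func charactersEqual(_ a: Character, _ b: Character, options: MatchOptions) -> Bool {
    options.contains(.caseInsensitive) ? a.lowercased() == b.lowercased() : a == b
}

let lines = "a\nb"
let secondLine = lines.index(lines.startIndex, offsetBy: 2)
print(caretMatches(lines, at: secondLine, options: []))         // false
print(caretMatches(lines, at: secondLine, options: .multiline)) // true
```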

Unsupported Features

  • Most Unicode categories, e.g. \p{Sc} (currency symbols)
  • Character class subtraction, e.g. [a-z-[b-f]]
  • Named blocks, e.g. \p{IsGreek}

Grammar

See Grammar.ebnf for a formal description of the language in EBNF notation. See Grammar.xhtml for a visualization (railroad diagram) of the grammar, generated with https://www.bottlecaps.de/rr/ui.

License

Regex is available under the MIT license. See the LICENSE file for more info.
