Kaleidoscope
Statically-checked inline matching on regular expressions
Kaleidoscope is a small library which provides pattern matching using regular
expressions, and extraction of capturing groups into values, which are typed
according to the repetition of the group. Patterns can be written inline,
directly in a case pattern, and do not need to be predefined.
Features
- pattern match strings against regular expressions
- regular expressions can be written inline in patterns
- extraction of capturing groups in patterns
- typed extraction (into Lists or Options) of variable-length capturing groups
- static verification of regular expression syntax
- simpler "glob" syntax is also provided
Availability
Kaleidoscope has not yet been published as a binary.
Getting Started
To use Kaleidoscope, first import its package,
import kaleidoscope.*
and you can then use a Kaleidoscope regular expression (a string prefixed with
the letter r) anywhere you can use a pattern in Scala. For example,
path match
  case r"/images/.*" => println("image")
  case r"/styles/.*" => println("stylesheet")
  case _ => println("something else")
or,
email match
  case r"^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,6}$$" => Some(email)
  case _ => None
Such patterns will either match or not; should they match, however, it is
possible to extract parts of the matched string using capturing groups. The
pattern syntax is exactly as described in the Java Standard Library, with the
exception that a capturing group (enclosed within ( and )) may be bound to an
identifier by placing it, like an interpolated string substitution, immediately
prior to the capturing group, as $identifier or ${identifier}.
Here is an example:
path match
  case r"/images/${img}(.*)" => Image(img)
  case r"/styles/$styles(.*)" => Stylesheet(styles)
Alternatively, this can be extracted directly in a val definition, like so:
val r"^[a-z0-9._%+-]+@$domain([a-z0-9.-]+\.$tld([a-z]{2,6}))$$" = "test@example.com"
> domain: String = "example.com"
> tld: String = "com"
In addition, the syntax of the regular expression will be checked at compile-time, and any issues will be reported then.
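As a point of contrast, a pattern built with the Java standard library is only validated when it is compiled at runtime. The sketch below uses java.util.regex.Pattern and scala.util.Try, not Kaleidoscope itself, and isValidRegex is a hypothetical helper name chosen for illustration.

```scala
import java.util.regex.Pattern
import scala.util.Try

// Hypothetical helper (not part of Kaleidoscope): reports whether a pattern
// string is syntactically valid. With the standard library this can only be
// discovered at runtime, when Pattern.compile throws PatternSyntaxException.
def isValidRegex(pattern: String): Boolean =
  Try(Pattern.compile(pattern)).isSuccess

// A well-formed pattern, and one with an unclosed capturing group.
val wellFormed: Boolean = isValidRegex("/images/(.*)")
val malformed: Boolean = isValidRegex("/images/(.*")
```

Here the unclosed group is discovered only once the program runs, whereas the same mistake inside an r"..." pattern would be reported by the compiler.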
Repeated and optional capture groups
A normal, unitary capturing group will extract into a Text value. But if a
capturing group has a repetition suffix, such as * or +, then the extracted type
will be a List[Text]. This also applies to repetition ranges, such as {3}, {2,}
or {1,9}. Note that {1} will still extract a Text value.
A capture group may be marked as optional with a ? suffix, meaning it can appear either zero or one times. This
will extract a value with the type Option[Text].
For example, see how init is extracted as a List[Text], below:
"parsley, sage, rosemary, and thyme" match
  case r"$only([a-z]+)" => List(only)
  case r"$first([a-z]+) and $second([a-z]+)" => List(first, second)
  case r"$init([a-z]+, )*and $last([a-z]+)" => init.map(_.drop(2, Rtl)) :+ last
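For comparison, java.util.regex retains only the final iteration of a repeated capturing group, so collecting every repetition by hand takes a second pass over the matched text. The following sketch uses only the standard library and is not how Kaleidoscope is implemented; HerbList and herbs are illustrative names.

```scala
import java.util.regex.Pattern

// The repeated part is wrapped in a single outer capturing group, because a
// repeated capturing group in java.util.regex keeps only its last iteration.
val HerbList: Pattern = Pattern.compile("((?:[a-z]+, )*)and ([a-z]+)")

// Hypothetical helper: returns every listed word, or None if the input does
// not have the shape "a, b, c, and d".
def herbs(input: String): Option[List[String]] =
  val matcher = HerbList.matcher(input)
  if matcher.matches() then
    // Second pass: pull each word out of the captured "a, b, c, " prefix.
    val init = "[a-z]+".r.findAllIn(matcher.group(1)).toList
    Some(init :+ matcher.group(2))
  else None
```

Calling herbs("parsley, sage, rosemary, and thyme") yields Some(List("parsley", "sage", "rosemary", "thyme")), which is the list the Kaleidoscope pattern produces directly from its init and last bindings.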
Escaping
Note that inside an extractor pattern string, whether it is single- (r"...")
or triple-quoted (r"""..."""), special characters, notably \, do not need
to be escaped, with the exception of $ which should be written as $$. It is
still necessary, however, to follow the regular expression escaping rules; for
example, an extractor matching a single opening parenthesis would be written as
r"\(" or r"""\(""".
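The rule can be compared with ordinary Scala string literals passed to java.util.regex.Pattern, where a single-quoted literal needs the backslash doubled and only a triple-quoted literal can be written without string-level escaping. This sketch uses only the standard library, and the value names are illustrative.

```scala
import java.util.regex.Pattern

// Ordinary string literal: the backslash must itself be escaped, so the
// regular expression \( is written as "\\(".
val ordinary: Pattern = Pattern.compile("\\(")

// Triple-quoted literal: no string-level escaping, so \( is written as-is.
val tripleQuoted: Pattern = Pattern.compile("""\(""")

// Both literals denote the same regular expression, which matches a single
// opening parenthesis.
val samePattern: Boolean = ordinary.pattern() == tripleQuoted.pattern()
val matchesParen: Boolean = tripleQuoted.matcher("(").matches()
```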
Status
Kaleidoscope is classified as maturescent. For reference, Scala One projects are categorized into one of the following five stability levels:
- embryonic: for experimental or demonstrative purposes only, without any guarantees of longevity
- fledgling: of proven utility, seeking contributions, but liable to significant redesigns
- maturescent: major design decisions broadly settled, seeking probatory adoption and refinement
- dependable: production-ready, subject to controlled ongoing maintenance and enhancement; tagged as version 1.0.0 or later
- adamantine: proven, reliable and production-ready, with no further breaking changes ever anticipated
Projects at any stability level, even embryonic projects, are still ready to be used, but caution should be taken if there is a mismatch between the project's stability level and the importance of your own project.
Kaleidoscope is designed to be small. Its entire source code currently consists of 509 lines of code.
Building
Kaleidoscope can be built on Linux or Mac OS with Fury; however, the approach to building is currently in a state of flux and is likely to change.
Contributing
Contributors to Kaleidoscope are welcome and encouraged. New contributors may like to look for issues marked beginner.
We suggest that all contributors read the Contributing Guide to make the process of contributing to Kaleidoscope easier.
Please do not contact project maintainers privately with questions unless there is a good reason to keep them private. While it can be tempting to respond to such questions, private answers cannot be shared with a wider audience, and it can result in duplication of effort.
Author
Kaleidoscope was designed and developed by Jon Pretty, and commercial support and training are available from Propensive OÜ.
Name
Kaleidoscope is named after the optical instrument which shows pretty patterns to its user, while the library also works closely with patterns.
In general, Scala One project names are always chosen with some rationale, though it is usually frivolous. Each name is chosen more for its uniqueness and intrigue than its concision or catchiness, and there is no bias towards names with positive or "nice" meanings, since many of the libraries perform some quite unpleasant tasks.
Names should be English words, though many are obscure or archaic, and it should be noted how willingly English adopts foreign words. Names are generally of Greek or Latin origin, and have often arrived in English via a Romance language.
License
Kaleidoscope is copyright © 2023 Jon Pretty & Propensive OÜ, and is made available under the Apache 2.0 License.