Regal

Royally reified regular expressions

tl;dr

Regal lets you manipulate regular expressions as data, by providing a Hiccup-like regex syntax, and ways to convert between this Hiccup syntax (Regal syntax), compiled regex patterns, and test.check generators. It also helps with writing cross-platform code by providing consistent semantics across JS/Java runtimes, and it can convert a JavaScript regex into a semantically equivalent Java regex (useful for e.g. dealing with JSON Schema in Clojure).

The slightly longer version

Regal provides a syntax for writing regular expressions using plain Clojure data: vectors, keywords, strings. This is known as Regal notation.

Once you have a Regal form you can either compile it to a regex object (java.util.regex.Pattern or JavaScript RegExp), or you can use it to create a Generator (see test.check) for generating values that conform to the given pattern.

It is also possible to parse regular expression patterns back to Regal forms.

Regal is Clojure and ClojureScript compatible, and has fixed semantics across platforms. Write your forms once and run them anywhere! It also allows manipulating multiple regex flavors regardless of the current platform, so you can do things like converting a JavaScript regex pattern to one that is suitable for Java's regex engine.

Support Lambda Island Open Source

Regal is part of a growing collection of quality Clojure libraries and tools released on the Lambda Island label. If you find value in our work, please consider becoming a backer on Open Collective.

Project status

Regal is alpha-level software. This does not mean it is of low quality or not fit for use; it does mean that future breakage of the API is still possible.

The following aspects of the library are generally well tested and developed, and we intend to retain compatibility as much as practically possible.

  • Regal syntax as described in this README
  • Generating regex patterns from regal forms
  • Parsing regex patterns to regal forms

The following aspects have known issues or are otherwise untested or incomplete, and you can expect them to change significantly as we further develop them:

  • Creating test.check generators from regal forms
  • clojure.spec-alpha integration
  • Malli integration

Installation

deps.edn

lambdaisland/regal {:mvn/version "0.0.143"}

project.clj

[lambdaisland/regal "0.0.143"]

An example

(require '[lambdaisland.regal :as regal]
         '[lambdaisland.regal.generator :as regal-gen])

;; Regal expression, like Hiccup but for Regex
(def r [:cat
        [:+ [:class [\a \z]]]
        "="
        [:+ [:not \=]]])

;; Convert to host-specific regex
(regal/regex r)
;;=> #"[a-z]+\Q=\E[^=]+"

;; Match strings
(re-matches (regal/regex r) "foo=bar")
;;=> "foo=bar"

;; ... And generate them
(regal-gen/gen r)
;;=> #clojure.test.check.generators.Generator{...}

(regal-gen/sample r)
;;=> ("t=�" "d=5Ë" "zja=·" "uatt=ß¾" "lqyk=É" "xkj=q\f��" "gxupw=æ" "pkadbgmc=¯²" "f=Ã�J" "d=ç")

A swiss army knife

Regal can convert between three different representations of regular expressions: Regal forms, patterns (i.e. strings), and regex objects. Here is an overview of how to get from one to the other.

↓From / To→ | Form                                   | Pattern                          | Regex
Form        | identity                               | lambdaisland.regal/pattern       | lambdaisland.regal/regex
Pattern     | lambdaisland.regal.parse/parse-pattern | identity                         | lambdaisland.regal/compile
Regex       | lambdaisland.regal.parse/parse         | lambdaisland.regal/regex-pattern | identity
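
As a rough illustration of the table, the sketch below walks one form through the three representations using only the functions named above. The form itself and the var names (form, pat, rx) are made up for this example, and it assumes the parse namespace and anything it needs are available on the classpath.

```clojure
(require '[lambdaisland.regal :as regal]
         '[lambdaisland.regal.parse :as parse])

;; Illustrative form: one or more digits, a dash, one or more digits.
(def form [:cat [:+ :digit] "-" [:+ :digit]])

;; Form -> Pattern (a plain string)
(def pat (regal/pattern form))

;; Pattern -> Regex (a platform-specific regex object)
(def rx (regal/compile pat))

;; Regex -> Pattern gives back the string the regex was compiled from
(= pat (regal/regex-pattern rx))
;;=> true

;; Pattern -> Form and Regex -> Form go through the parser
(parse/parse-pattern pat)
(parse/parse rx)

;; The regex object is what the host regex functions expect
(re-matches rx "12-34")
;;=> "12-34"
```

The parsed forms are left without an expected result here, since the parser may return an equivalent form rather than the exact one that was written.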

Regal forms

Forms consist of vectors, keywords, strings, character literals, and in some cases integers. For example:

[:cat [:alt [:char 11] [:char 13]] \J [:repeat "hello" 2 3]]

Forms have platform-independent semantics. The same regal form will match the same strings both in Clojure and ClojureScript, even though Java and JavaScript (and even different versions of Java or JavaScript) have different regex "flavors". In other words, we generate the regex that is right for the target platform.

;; Clojure
(regal/regex :vertical-whitespace) ;;=> #"\v"

;; ClojureScript
(regal/regex :vertical-whitespace) ;;=> #"[\n\x0B\f\r\x85\u2028\u2029]"

Regal currently knows about three "flavors":

  • :java8 Java 1.8 (earlier versions are not supported)
  • :java9 Java 9 or later
  • :ecma ECMAScript (JavaScript)

By default it takes the flavor that is best suited for the platform, but you can override that with lambdaisland.regal/with-flavor:

(regal/with-flavor :ecma
  (regal/pattern ...))

Note that using regal/regex with a flavor that does not correspond to the flavor of the platform may yield unexpected results. When dealing with "foreign" regex flavors, always stick to string representations (i.e. patterns).
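
A minimal sketch of that advice: generate the pattern string for a foreign flavor with regal/pattern, and only compile patterns produced for the host's own flavor. The var names ecma-pattern and java-pattern are illustrative.

```clojure
(require '[lambdaisland.regal :as regal])

;; Pattern string aimed at a JavaScript engine. It stays a string and is
;; never compiled on the JVM.
(def ecma-pattern
  (regal/with-flavor :ecma
    (regal/pattern :vertical-whitespace)))

;; Pattern string aimed at Java 9 or later.
(def java-pattern
  (regal/with-flavor :java9
    (regal/pattern :vertical-whitespace)))

;; Both are plain strings, so they can be printed, stored, or sent to
;; another runtime as-is.
(println ecma-pattern)
(println java-pattern)
```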

Pattern

The second regex representation regal knows about is the pattern, i.e. the regex pattern in string form.

(regal/regex-pattern #"\u000B\v") ;; => "\\u000B\\v"

Depending on the situation there are several reasons why you might want to use this pattern representation over the compiled regex object.

  • simple strings, so easy to (de-)serialize
  • value semantics (can be compared)
  • allows manipulating regex patterns of flavors other than the one supported by the current runtime
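
The first two points can be seen in a short sketch, assuming a JVM Clojure REPL. The form is made up for the example, and clojure.edn is used here only as one possible way to serialize a string.

```clojure
(require '[clojure.edn :as edn]
         '[lambdaisland.regal :as regal])

(def form [:+ [:class [\a \z]]])

;; Two regex objects built from the same form are distinct objects on the
;; JVM, so they do not compare as equal ...
(= (regal/regex form) (regal/regex form))
;;=> false

;; ... while the pattern strings do.
(= (regal/pattern form) (regal/pattern form))
;;=> true

;; A pattern survives a print/read round trip and can be compiled afterwards.
(def restored (edn/read-string (pr-str (regal/pattern form))))

(re-matches (regal/compile restored) "hello")
;;=> "hello"
```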

Note that in Clojure the syntax available in regex patterns differs from the syntax available in strings, in particular when it comes to notations starting with a backslash. For example, #"\xFF" is a valid regex, while "\xFF" is not a valid string. We encode regex patterns in strings, which practically speaking means that backslashes are escaped (doubled).

(regal/regex-pattern #"\xFF") ;;=> "\\xFF"
(regal/compile "\\xFF")       ;;=> #"\xFF"

Regex

To use the regex engine provided by the runtime (e.g. through re-find or re-seq) you need a platform-specific regex object. This is what lambdaisland.regal/regex gives you.
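
For instance, a regex built from a form can be handed straight to re-find and re-seq. This is a small sketch on JVM Clojure; the form and the sample input are invented for the example.

```clojure
(require '[lambdaisland.regal :as regal])

;; key=value pairs with a lowercase key and a numeric value
(def key-value
  (regal/regex [:cat
                [:capture [:+ [:class [\a \z]]]]
                "="
                [:capture [:+ :digit]]]))

;; First match, with its capture groups
(re-find key-value "width=120; height=80")
;;=> ["width=120" "width" "120"]

;; All matches
(re-seq key-value "width=120; height=80")
;;=> (["width=120" "width" "120"] ["height=80" "height" "80"])
```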

Grammar

  • Strings and characters match literally. They are escaped, so . matches a period, not any character, ^ matches a caret, etc.
  • A few keywords have special meaning.
    • :any : match any character, like . in a regex. Does not match newlines.
    • :start : match the start of the input
    • :end : match the end of the input
    • :digit : match any digit (0-9)
    • :non-digit : match non-digits (not 0-9)
    • :word : match word characters (A-Za-z0-9_)
    • :non-word : match non-word characters (not A-Za-z0-9_)
    • :newline : Match \n
    • :return : Match \r
    • :tab : Match \t
    • :form-feed : Match \f
    • :line-break : Match \n, \r, \r\n, or other unicode newline characters
    • :alert : match \a (U+0007)
    • :escape : match \e (U+001B)
    • :whitespace : match any whitespace character. Uses \s on JavaScript, and a character range of whitespace characters on Java with equivalent semantics as JavaScript \s, since \s in Java only matches ASCII whitespace.
    • :non-whitespace : match non-whitespace
    • :vertical-whitespace : match vertical whitespace, including newlines and vertical tabs, i.e. #"[\n\x0B\f\r\x85\u2028\u2029]"
    • :vertical-tab : match a vertical tab \v (U+000B)
    • :null : match a NULL byte/char
  • All other forms are vectors, with the first element being a keyword
    • [:cat forms...] : concatenation, match the given Regal expressions in order
    • [:alt forms...] : alternatives, match one of the given options, like (foo|bar|baz)
    • [:* form] : match the given form zero or more times
    • [:+ form] : match the given form one or more times
    • [:? form] : match the given form zero or one time
    • [:*? form] : lazily match the given form zero or more times
    • [:+? form] : lazily match the given form one or more times
    • [:?? form] : lazily match the given form zero or one time
    • [:class entries...] : match any of the given characters or ranges, with ranges given as two element vectors. E.g. [:class [\a \z] [\A \Z] "_" "-"] is equivalent to [a-zA-Z_-]
    • [:not entries...] : like :class, but negates the result, equivalent to [^...]
    • [:repeat form num] : repeat a form a fixed number of times, like {5}
    • [:repeat form min max] : repeat a form between min and max times, like {2,5}
    • [:lazy-repeat form num] : lazily repeat a form a fixed number of times, like {5}?
    • [:lazy-repeat form min max] : lazily repeat a form between min and max times, like {2,5}?
    • [:capture forms...] : capturing group with implicit concatenation of the given forms
    • [:char number] : a single character, denoted by its unicode codepoint
    • [:ctrl char] : a control character, e.g. [:ctrl \A] => ^A => #"\cA"
    • [:lookahead ...] : match if followed by pattern, without consuming input
    • [:negative-lookahead ...] : match if not followed by pattern
    • [:lookbehind ...] : match if preceded by pattern
    • [:negative-lookbehind ...] : match if not preceded by pattern
    • [:atomic ...] : match without backtracking (atomic group)
  • A clojure.spec.alpha definition of the grammar can be made available as :lambdaisland.regal/form by explicitly requiring lambdaisland.regal.spec-alpha
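
To make a few of these rules concrete, here is a small sketch that combines keywords, literal strings, :capture and :repeat. The forms (iso-date, dotted) are invented for illustration, and the results shown assume JVM Clojure.

```clojure
(require '[lambdaisland.regal :as regal])

;; Four digits, a dash, two digits, a dash, two digits, anchored at both ends.
(def iso-date
  [:cat :start
   [:capture [:repeat :digit 4]]
   "-"
   [:capture [:repeat :digit 2]]
   "-"
   [:capture [:repeat :digit 2]]
   :end])

(re-find (regal/regex iso-date) "2020-03-14")
;;=> ["2020-03-14" "2020" "03" "14"]

;; :start anchors the match, so leading text prevents it
(re-find (regal/regex iso-date) "on 2020-03-14")
;;=> nil

;; Strings match literally: the period below is a period, not "any character"
(def dotted (regal/regex [:cat "a.b"]))

(re-find dotted "axb")
;;=> nil

(re-find dotted "a.b")
;;=> "a.b"
```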

You can add your own extensions (custom tokens) by providing a :registry option mapping namespaced keywords to Regal expressions.
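
A possible shape for this is sketched below. It assumes the options map is accepted as a second argument to regal/regex, which the text above does not spell out, so check the API docs before relying on it. The :example/year and :example/month tokens are hypothetical names used only here.

```clojure
(require '[lambdaisland.regal :as regal])

;; Hypothetical custom tokens: namespaced keywords mapped to Regal forms.
(def registry
  {:example/year  [:repeat :digit 4]
   :example/month [:repeat :digit 2]})

;; Assumption: the :registry option is passed in an options map as the
;; second argument.
(def year-month
  (regal/regex [:cat :example/year "-" :example/month]
               {:registry registry}))

(re-matches year-month "2020-03")
```

Keeping the registry in its own var makes it easy to share the same tokens between several forms.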

Use with spec.alpha

(require '[lambdaisland.regal.spec-alpha :as regal-spec]
         '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

(s/def ::x-then-y (regal-spec/spec [:cat [:+ "x"] "-" [:+ "y"]]))

(s/def ::xy-with-stars (regal-spec/spec [:cat "*" ::x-then-y "*"]))

(s/valid? ::xy-with-stars "*xxx-yy*")
;; => true

(gen/sample (s/gen ::xy-with-stars))
;; => ("*x-y*"
;;     "*xx-y*"
;;     "*x-y*"
;;     "*xxxx-y*"
;;     "*xxx-yyyy*"
;;     "*xxxx-yyy*"
;;     "*xxxxxxx-yyyyy*"
;;     "*xx-yyy*"
;;     "*xxxxx-y*"
;;     "*xxx-yyyy*")

Use with Malli

The lambdaisland.regal.malli namespace is no longer compatible with the latest Malli, so we don't offer a custom Regal Malli schema, but you can use Malli's regex schema instead (:re), passing it the results from Regal.

(require '[malli.core :as m]
         '[malli.error :as me]
         '[malli.generator :as mg]
         '[lambdaisland.regal :as regal]
         '[lambdaisland.regal.generator :as regal-gen])

(def form [:+ "y"])

(def schema [:re (regal/regex form)])

(m/form schema)
;; => [:re #"y+"]

(m/type schema)
;; => :re

(m/validate schema "yyy")
;; => true

(me/humanize (m/explain schema "xxx"))
;; => ["should match regex"]

(me/humanize (m/explain schema "xxx") {:errors {:re {:error/message {:en "Pattern does not match"}}}})
;; => ["Pattern does not match"]

(mg/sample [:re {:gen/gen (regal-gen/gen form)} (regal/regex form)])
;; => ("y" "y" "y" "y" "yy" "yy" "yyyyy" "yyyyy" "yyyyy" "yyyy")

BYO test.check / spec-alpha

Regal does not declare any dependencies. This lets people who only want Regal expressions as a replacement for normal regexes require lambdaisland.regal without pulling in extra dependencies.

If you want to use lambdaisland.regal.generator you will need org.clojure/test.check. For lambdaisland.regal.spec-alpha you will additionally need org.clojure/spec.alpha.
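
As a sketch, a deps.edn that enables the generator namespace might look like the fragment below. The test.check version is only illustrative; use whichever version fits your project.

```clojure
;; deps.edn
{:deps {lambdaisland/regal     {:mvn/version "0.0.143"}
        ;; only needed when requiring lambdaisland.regal.generator
        org.clojure/test.check {:mvn/version "1.1.1"}}}
```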

Contributing

Everyone has a right to submit patches to this project, and thus become a contributor.

Contributors MUST

  • adhere to the LambdaIsland Clojure Style Guide
  • write patches that solve a problem. Start by stating the problem, then supply a minimal solution. *
  • agree to license their contributions as MPLv2.
  • not break the contract with downstream consumers. **
  • not break the tests.

Contributors SHOULD

  • update the CHANGELOG and README.
  • add tests for new functionality.

If you submit a pull request that adheres to these rules, then it will almost certainly be merged immediately. However, some things may require more consideration. If you add new dependencies, or significantly increase the API surface, then we need to decide if these changes are in line with the project's goals. In this case you can start by writing a pitch, and collecting feedback on it.

* This goes for features too: a feature needs to solve a problem. State the problem it solves, then supply a minimal solution.

** As long as this project has not seen a public release (i.e. is not on Clojars) we may still consider making breaking changes, if there is consensus that the changes are justified.

Prior Art

License

Copyright © 2020 Arne Brasseur

Licensed under the terms of the Mozilla Public License 2.0, see LICENSE.
