  • Language: Clojure
  • License: MIT License

A DSL that translates to regex

luna

No more regrets. Wield the power of regex with the readability of English using luna.

About

luna is a readable domain-specific language (DSL) that translates into a java.util.regex.Pattern object. luna is still in beta, but don't let that discourage you from using it: it has a good test suite, and bug reports are key to improving it.

Why?

Readable code can be hard to maintain. Unreadable code can be impossible to maintain.

Installing and Using

Add this to your project.clj file:

:dependencies [[org.clojars.abhinav/luna "0.1.0-SNAPSHOT"]]

luna has one function, pre, that does the heavy lifting.

;; import it
(ns user.core
  (:require [luna.core :as luna]))

(luna/pre [:match ["x" :digits :atleast 4 :times] :when :at-start])
;;=> #"^x\d{4,}"
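
Because the result is an ordinary regex pattern, it can be passed to Clojure's built-in regex functions. The sketch below is only an illustration: it writes the pattern shown above out by hand instead of calling luna, so it runs without luna on the classpath, and the names x-then-digits and starts-with-x-digits? are made up for this example.

```clojure
;; Hand-written copy of the pattern shown above for the luna form
;; [:match ["x" :digits :atleast 4 :times] :when :at-start].
(def x-then-digits #"^x\d{4,}")

(defn starts-with-x-digits?
  "True when s begins with x followed by four or more digits."
  [s]
  (boolean (re-find x-then-digits s)))

(starts-with-x-digits? "x12345 and more") ;=> true
(starts-with-x-digits? "x123")            ;=> false (only three digits)
(starts-with-x-digits? "ax12345")         ;=> false (x is not at the start)
```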

Contributing

I welcome contributions, even from first-timers. Feedback and suggestions are welcome too.

If you'd like to contribute but don't know how, here are some places to start.

Test cases

The easiest way to contribute is to write a test case. This project can never have too many test cases.

Documentation

Documentation is very important, more so than the code in the project, so I value these contributions highly. There will be some parts of the documentation (hopefully not many) that don't make sense, are wrong, or could be worded better.

Refactoring

I welcome refactors like

  • Variable and function renaming.
  • Extracting functions.
  • Moving things around to make more sense.

Grammar/language

The pre function parses the DSL and returns a regex Pattern object.

The arguments to pre can be plain strings, vectors, or Pattern objects.

=> (pre "xy")
#"xy"
; pre can take multiple args
=> (pre "a" #"b" [:match "c" :when :at-start])
#"ab^c"

The first element in the vector determines how the rest is processed. The two main, commonly used keywords that are valid first elements are :match (or :m) and :capture (or :c).

=> (pre [:match "xy"])
#"xy"
=> (pre [:capture "xy"])
#"(xy)"

The next element is either a string or a vector, containing character classes. The valid syntax of the vector depends on whether :match or :capture was used.

Character class syntax for :match

It is used as follows:

[:match ["xy"]]

;;      ----char-class vector----
[:match ["x" :when :at-start "y"]]

I will omit [:match ...] for brevity.

Examples of valid char-class vectors:

;; by default, the elements in the char-class vector are evaluated to strings and separated by | in :match
["xy"] => #"xy"
["x" "y"] => #"x|y"
["x" "y" "z"] => #"x|y|z"

;; if you would prefer to concatenate them, then use a nested vector
[["x" "y" "z"]] => #"xyz"

;; using ranges in character classes
[[1 :to 7]] => #"[1-7]"
[1 [2 :to 5]] => #"[1[2-5]]"

;; using anchors inside vector
["x" :when :at-start] => #"^x"
["x" :when :at-start "y"] => #"^xy"

;; using quantifiers inside vector
["x" :atleast 5 :times "y"] => #"x{5,}y"

;; the :times can be omitted but helps with readability
["x" :atleast 5 "y"] => #"x{5,}y"

;;combining anchors and quantifiers
["x" :atleast 5 :times :when :at-start "y"] => #"^x{5,}y"
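
To make the difference between these forms concrete, here is a small sketch that tries some of the resulting patterns with Clojure's built-in regex functions. The patterns are copied by hand from the examples above and luna itself is not called; the var names are invented for this illustration.

```clojure
;; Hand-written copies of patterns listed above.
(def alternation   #"x|y|z")     ; from ["x" "y" "z"]
(def concatenation #"xyz")       ; from [["x" "y" "z"]]
(def nested-range  #"[1[2-5]]")  ; from [1 [2 :to 5]]
(def anchored-run  #"^x{5,}y")   ; anchor combined with a quantifier

(re-matches alternation "y")        ;=> "y"
(re-matches concatenation "y")      ;=> nil
(re-matches concatenation "xyz")    ;=> "xyz"

;; A nested class is a union in Java regex: 1, or anything in 2-5.
(re-seq nested-range "0123456")     ;=> ("1" "2" "3" "4" "5")

(re-find anchored-run "xxxxxxy")    ;=> "xxxxxxy"
(re-find anchored-run "xxxxy")      ;=> nil (only four x's)
```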

After the character class vector come the modifiers.

;;             -modifiers-
[:match ["xy"] :atleast 2] => #"xy{2,}"
;;             ---modifiers---
[:match ["xy"] :when :at-start] => #"^xy"

Note: if you use quantifiers and/or anchors both inside the character class vector and outside it, the result will be a "match everything enclosed" group. Here's an example:

[:match ["x" :atleast 5 "y"] :atleast 2] => #"(?:x{5,}y){2,}"

:match-enc

:match-enc stands for "match enclosed". By default, [:match ["x" :atleast 5 "y"]] yields #"x{5,}|y". If you want #"(?:x{5,}|y)" instead, use :match-enc.

[:match-enc ["x" :atleast 5 "y"]] => #"(?:x{5,}y)"

Sets

If you wish to use set constructs like negation [^abc] or intersection [abc&&[ab]], you can use Clojure's literal set notation.

; negation
[:match #{:not "abc"}] => #"[^abc]"
[:match #{:not "abc" :upper [1 :to 5]}] => #"[^abcA-Z1-5]"

;; intersection
[:match #{:and "abc" "ab"}] => #"[abc&&[ab]]"
[:match #{:and "abc" :upper [1 :to 4]}] => #"[abc&&[A-Z]&&[1-4]]"

; combining both
[:match #{:and "abc" #{:not "ab" :digits}}] => #"[abc&&[^ab1-9]]"
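
As a quick check of what negation and intersection mean at match time, the sketch below runs a few of the resulting character classes through re-seq. The patterns are typed in by hand from the examples above and luna is not called; the var names are made up for the example.

```clojure
;; Hand-written copies of character classes shown above.
(def not-abc      #"[^abc]")        ; negation
(def abc-and-ab   #"[abc&&[ab]]")   ; intersection
(def not-abc-etc  #"[^abcA-Z1-5]")  ; negation over several ranges

(re-seq not-abc "abcd1")        ;=> ("d" "1")
(re-seq abc-and-ab "abcabc")    ;=> ("a" "b" "a" "b")
(re-seq not-abc-etc "aZ3x9")    ;=> ("x" "9")
```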

Capture

The syntax of :capture is similar to that of :match.

Anchors

Match

[:match "x" :when :at-start] ; #"^x"
[:match "xy" :when :at-start] ; #"^xy"
[:match ["xy"] :when :at-start] ; #"^xy"
[:match ["x" :or "y"] :when :at-start] ; #"^x|y"
[:match ["x" :when :at-start :or "y"]] ; #"^x|y"

[:match [:digits] :when :at-end] ; #"\d$"
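
One thing to keep in mind when reading #"^x|y": in regex, alternation has the lowest precedence, so the ^ anchor belongs to the x branch only. The sketch below shows this with plain Clojure regex literals. The grouped pattern is a hand-written contrast case, not something the README claims luna emits, and the var names are hypothetical.

```clojure
;; Pattern copied by hand from the examples above.
(def start-x-or-y #"^x|y")

;; Hypothetical contrast case: the anchor applied to both branches.
(def start-x-or-y-grouped #"^(?:x|y)")

(re-find start-x-or-y "zzy")          ;=> "y" (the y branch is not anchored)
(re-find start-x-or-y-grouped "zzy")  ;=> nil
(re-find start-x-or-y-grouped "yzz")  ;=> "y"

;; The :at-end example: a digit at the end of the input.
(re-find #"\d$" "abc7")               ;=> "7"
(re-find #"\d$" "7abc")               ;=> nil
```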

Capture

[:capture "x" :when :at-start] ; #"^(x)"
[:capture "xy" :when :at-start] ; #"^(xy)"
[:capture ["x" "y"] :when :at-start] ; #"^(xy)"
[:capture [["x" "y"]] :when :at-start] ; #"^([xy])"

[:capture :digits :when :at-end] ; #"(\d)$"
[:capture "x" :when :at-word-start] ; #"\b(x)"
[:capture "x" :when :not :at-word-start] ; #"\B(x)"

Group constructs

Assertions

;; lookahead positive and negative
[:match "x" :when :before "y"] ; #"x(?=y)"
[:match "x" :when :not-before "y"] ; #"x(?!y)"

;; lookbehind positive and negative
[:match "x" :when :after "y"] ; #"(?<=y)x"
[:match "x" :when :not-after "y"] ; #"(?<!y)x"

;; both can be combined
[:match "y" :between "x" :and "z"]  ; #"(?<=x)y(?=z)"
[:match "y" :between "x" :and :not "z"]  ; #"(?<=x)y(?!z)"
[:match "y" :between :not "x" :and :not "z"]  ; #"(?<!x)y(?!z)"
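
Lookarounds check the surrounding text without consuming it, so only the matched part is returned. A minimal sketch using the patterns listed above, written out by hand as regex literals (luna is not called here):

```clojure
;; Lookahead: x only when followed (or not followed) by y.
(re-find #"x(?=y)" "xy")   ;=> "x"
(re-find #"x(?!y)" "xy")   ;=> nil
(re-find #"x(?!y)" "xz")   ;=> "x"

;; Lookbehind: x only when preceded (or not preceded) by y.
(re-find #"(?<=y)x" "yx")  ;=> "x"
(re-find #"(?<!y)x" "yx")  ;=> nil
(re-find #"(?<!y)x" "ax")  ;=> "x"

;; Both combined: y between x and z. Only "y" is part of the match.
(re-find #"(?<=x)y(?=z)" "xyz")  ;=> "y"
(re-find #"(?<=x)y(?!z)" "xyz")  ;=> nil
```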

Quantifiers

Note: :times can be omitted, but it helps with readability.

[:match "xyz" :lazily] ; #"xyz*?"
[:match ["xyz"] :lazily-1] ; #"xyz+?"

[:match ["xyz"] :greedily] ; #"xyz*"
[:match ["xyz"] :greedily-1] ; #"xyz+"

[:match ["xyz"] :possessively] ; #"xyz*+"

[:match ["x"] :atleast 3 :times] ; #"x{3,}"
[:match ["x"] :atleast 3] ; #"x{3,}"

[:match ["x"] :atmost 3 :times] ; #"x{0,3}"
[:match ["x"] :exactly 3 :times] ; #"x{3}"
;; Combining both
[:match ["x"] :atleast 3 :atmost 5 :times] ; #"x{3,5}"
[:match ["x"] :between 3 :to 5 :times] ; #"x{3,5}"
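
The difference between the lazy, greedy, and possessive forms only shows up at match time. The sketch below uses the patterns listed above as hand-written regex literals (luna is not called). Note that in patterns such as #"xyz*" the quantifier applies to the final z only. The last two patterns append an extra z purely to demonstrate backtracking and are not taken from the list above.

```clojure
(def input "xyzzz")

(re-find #"xyz*?" input)   ;=> "xy"    (lazy: as few z's as possible)
(re-find #"xyz+?" input)   ;=> "xyz"   (lazy, but at least one z)
(re-find #"xyz*"  input)   ;=> "xyzzz" (greedy: as many z's as possible)
(re-find #"xyz+"  input)   ;=> "xyzzz"

;; Possessive quantifiers never give characters back. With a trailing z
;; added for illustration, the greedy form backtracks and still matches,
;; while the possessive form has already consumed every z and fails.
(re-find #"xyz*z"  input)  ;=> "xyzzz"
(re-find #"xyz*+z" input)  ;=> nil

(re-matches #"x{3,5}" "xxxx")    ;=> "xxxx"
(re-matches #"x{3,5}" "xxxxxx")  ;=> nil
```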

A few practical examples

Each example shows the target regex first, followed by the luna form for it.

#"^\S+@\S+$"

(pre [:m [:!spaces :greedily-1 :when :at-start
          "@" :!spaces :greedily-1 :when :at-end]])

#"^[A-Z0-9+_.-]+@[A-Z0-9.-]+$"

(pre [:m [[["A" :to "Z"] [0 :to 9] "+_.-"]] :greedily-1 :when :at-start]
     "@"
     [:m [[["A" :to "Z"] [0 :to 9] ".-"]] :greedily-1 :when :at-end])

#"^([0-9]{4})-(1[0-2]|0[1-9])"


(pre [:c [[0 :to 9] :exactly 4] :when :at-start]
     "-"
     [:c [1 [0 :to 2] :or 0 [1 :to 9]]])

#"^([0-9]{4})-W(5[0-3]|[1-4][0-9]|0[1-9])$"

(pre [:c [[0 :to 9] :exactly 4] :when :at-start]
     "-W"
     [:c [5 [0 :to 3] :or [1 :to 4] [0 :to 9] :or 0 [1 :to 9]] :when :at-end])

#"(?<=>)[\s\S]*?(?=<)"
(pre
  [:match [:everything] :lazily :between ">" :and "<"])


#"^[A-Z]{1,2}[0-9R][0-9A-Z]?[0-9][ABD-HJLNP-UW-Z]{2}$"
(pre [:m :upper :between 1 :to 2 :times :when :at-start]
     [:m [[[0 :to 9] "R"]]]
     [:m [[[0 :to 9] :upper]] :0-or-1]
     [:m [[[0 :to 9]]]]
     [:m [["AB" ["D" :to "H"] "JLN" ["P" :to "U"] ["W" :to "Z"]]]
      :exactly 2 :when :at-end])
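
Once built, these patterns are used like any other Clojure regex. As a rough sketch, the ISO week pattern from above can drive a small parser. The regex is written out by hand here so the snippet runs without luna, and the names iso-week and parse-iso-week are hypothetical, chosen only for this illustration.

```clojure
;; Hand-written copy of the ISO week pattern shown above.
(def iso-week #"^([0-9]{4})-W(5[0-3]|[1-4][0-9]|0[1-9])$")

(defn parse-iso-week
  "Returns {:year y :week w} with integer values, or nil when s is not
  of the form YYYY-Www with a week number from 01 to 53."
  [s]
  (when-let [[_ year week] (re-matches iso-week s)]
    {:year (Integer/parseInt year)
     :week (Integer/parseInt week)}))

(parse-iso-week "2024-W07") ;=> {:year 2024, :week 7}
(parse-iso-week "2024-W54") ;=> nil (week out of range)
(parse-iso-week "2024-W00") ;=> nil

;; The simple email pattern from the first example.
(some? (re-matches #"^\S+@\S+$" "someone@example.com")) ;=> true
(some? (re-matches #"^\S+@\S+$" "no at sign"))          ;=> false
```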