Regular Expression Construction

Complex regular expressions are hard to construct and even harder to
read. The Re library allows users to construct complex regular
expressions from simpler expressions. For example, consider the
following regular expression that will parse dates:

   /\A((?:19|20)[0-9]{2})[\- \/.](0[1-9]|1[012])[\- \/.](0[1-9]|[12][0-9]|3[01])\z/

Using the Re library, that regular expression can be built
incrementally from smaller, easier-to-understand expressions.
Perhaps something like this:

  require 're'

  include Re

  delim                = re.any("- /.")
  century_prefix       = re("19") | re("20")
  under_ten            = re("0") + re.any("1-9")
  ten_to_twelve        = re("1") + re.any("012")
  ten_and_under_thirty = re.any("12") + re.any("0-9")
  thirties             = re("3") + re.any("01")

  year = (century_prefix + re.digit.repeat(2)).capture(:year)
  month = (under_ten | ten_to_twelve).capture(:month)
  day = (under_ten | ten_and_under_thirty | thirties).capture(:day)

  date = (year + delim + month + delim + day).all

Although it is more code, the individual pieces are smaller and
easier to independently verify. As an additional bonus, the capture
groups can be retrieved by name:

  result = date.match("2009-01-23")
  result[:year]      # => "2009"
  result[:month]     # => "01"
  result[:day]       # => "23"
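
For comparison, the same decomposition can be approximated in plain
Ruby, without Re, by interpolating small Regexp fragments into a
larger one and using Ruby's built-in named groups. This is only a
sketch of the idea; the constant names are illustrative and are not
part of the Re API:

```ruby
# Plain-Ruby sketch (no Re): build the date pattern from small fragments.
DELIM = /[- \/.]/
YEAR  = /(?<year>(?:19|20)[0-9]{2})/
MONTH = /(?<month>0[1-9]|1[012])/
DAY   = /(?<day>0[1-9]|[12][0-9]|3[01])/

# Interpolating a Regexp embeds it as a self-contained group, so the
# fragments can be combined without worrying about precedence.
DATE = /\A#{YEAR}#{DELIM}#{MONTH}#{DELIM}#{DAY}\z/

m = DATE.match("2009-01-23")
puts m[:year]    # prints 2009
puts m[:month]   # prints 01
puts m[:day]     # prints 23
```

The fragments stay individually testable, but the composition
operators (+, |, repeat, capture) that Re provides have to be
written by hand as regular expression syntax.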

Version

This document describes Re version 0.0.6.

Usage

  include Re

  number = re.any("0-9").all
  if number =~ string
    puts "Matches!"
  else
    puts "No Match"
  end

Examples

Simple Examples

  re("a")                -- matches "a"
  re("a") + re("b")      -- matches "ab"
  re("a") | re("b")      -- matches "a" or "b"
  re("a").many           -- matches "", "a", "aaaaaa"
  re("a").one_or_more    -- matches "a", "aaaaaa", but not ""
  re("a").optional       -- matches "" or "a"
  re("a").all            -- matches "a", but not "xab"

See Re::Rexp for a complete list of expressions.

Using re without an argument allows access to a number of common
regular expression constants. For example:

  re.space / re.spaces  -- matches " ", "\n" or "\t"
  re.digit / re.digits  -- matches a digit / sequence of digits

In addition, re without arguments can be used to construct character
classes:

  re.any                -- Matches any character
  re.any("abc")         -- Matches "a", "b", or "c"
  re.any("0-9")         -- Matches the digits 0 through 9
  re.any("A-Z", "a-z", "0-9", "_")
                        -- Matches alphanumeric or an underscore

See Re::ConstructionMethods for a complete list of common constants
and character class functions.

See Re.re, Re::Rexp, and Re::ConstructionMethods for details.

regexml Example

Regexml is an XML based language to express regular expressions.
Here is their example for matching URLs.

    <regexml xmlns="http://schemas.regexml.org/expressions">
        <expression id="url">
            <start/>
            <match equals="[A-Za-z]" max="*" capture="true"/> <!-- scheme (e.g., http) -->
            <match equals=":"/>
            <match equals="//" min="0"/> <!-- mailto: and news: URLs do not require forward slashes -->
            <match equals="[0-9.\-A-Za-z@]" max="*" capture="true"/> <!-- domain (e.g., www.regexml.org) -->
            <group min="0">
                <match equals=":"/>
                <match equals="\d" max="5" capture="true"/> <!-- port number -->
            </group>
            <group min="0" capture="true"> <!-- resource (e.g., /sample/resource) -->
                <match equals="/"/>
                <match except="[?#]" max="*"/>
            </group>
            <group min="0">
                <match equals="?"/>
                <match except="#" min="0" max="*" capture="true"/> <!-- query string -->
            </group>
            <group min="0">
                <match equals="#"/>
                <match equals="." min="0" max="*" capture="true"/> <!-- anchor tag -->
            </group>
            <end/>
        </expression>
    </regexml>

Here is the Re expression to match URLs:

    URL_PATTERN =
      re.any("A-Z", "a-z").one_or_more.capture(:scheme) +
      re(":") +
      re("//").optional +
      re.any("0-9", "A-Z", "a-z", "-@.").one_or_more.capture(:host) +
      (re(":") + re.digit.repeat(1,5).capture(:port)).optional +
      (re("/") + re.none("?#").many).capture(:path).optional +
      (re("?") + re.none("#").many.capture(:query)).optional +
      (re("#") + re.any.many.capture(:anchor)).optional

    URL_RE = URL_PATTERN.all
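
As a reference point when checking the expression above, here is a
hand-written plain Ruby Regexp that is intended to accept the same
URLs. It is a sketch, not the regular expression that Re actually
generates; the constant names are illustrative only:

```ruby
# Plain-Ruby sketch (no Re) of the URL pattern, one fragment per part.
SCHEME = /(?<scheme>[A-Za-z]+)/
HOST   = /(?<host>[0-9A-Za-z\-@.]+)/
PORT   = /(?::(?<port>\d{1,5}))?/        # optional ":1234"
PATH   = /(?<path>\/[^?\#]*)?/           # optional "/sample/resource"
QUERY  = /(?:\?(?<query>[^\#]*))?/       # optional "?a=b"
ANCHOR = /(?:\#(?<anchor>.*))?/          # optional "#top"

URL = /\A#{SCHEME}:(?:\/\/)?#{HOST}#{PORT}#{PATH}#{QUERY}#{ANCHOR}\z/

m = URL.match("http://www.regexml.org:8080/sample/resource?x=1#top")
puts m[:scheme]  # prints http
puts m[:host]    # prints www.regexml.org
puts m[:port]    # prints 8080
puts m[:path]    # prints /sample/resource
puts m[:query]   # prints x=1
puts m[:anchor]  # prints top
```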

Performance

We should say a word or two about performance.

First of all, building regular expressions using Re is slow. If you
use Re to build regular expressions, you are encouraged to build the
regular expression once and reuse it as needed. This means you
won’t do a lot of inline expressions using Re, but rather assign the
generated Re regular expression to a constant. For example:

  PHONE_RE = re.digit.repeat(3).capture(:area) +
               re("-") +
               re.digit.repeat(3).capture(:exchange) +
               re("-") +
               re.digit.repeat(4).capture(:subscriber)

Alternatively, you can arrange for the regular expression to be
constructed only when actually needed. Something like:

  def phone_re
    @phone_re ||= re.digit.repeat(3).capture(:area) +
                    re("-") +
                    re.digit.repeat(3).capture(:exchange) +
                    re("-") +
                    re.digit.repeat(4).capture(:subscriber)
  end

That method constructs the phone number regular expression once and
returns a cached value thereafter. Just make sure you put the
method in an object that is instantiated once (e.g. a class method).
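
A minimal sketch of that advice: the cached value lives in an
instance variable of the class itself, so it is built at most once
per process regardless of how many instances exist. A plain Regexp
stands in for the Re expression here, and PhonePatterns and
build_count are hypothetical names used only to make the caching
visible:

```ruby
# Sketch: memoize an expensive-to-build pattern at the class level.
class PhonePatterns
  @build_count = 0

  class << self
    attr_reader :build_count

    def phone_re
      # The block runs only on the first call; later calls return
      # the cached object.
      @phone_re ||= begin
        @build_count += 1
        /\A(\d{3})-(\d{3})-(\d{4})\z/
      end
    end
  end
end

3.times { PhonePatterns.phone_re }
puts PhonePatterns.build_count   # prints 1
```

Had phone_re been an instance method on an object created per
request, each new object would rebuild the expression on its first
call.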

When used in matching, Re regular expressions perform fairly well
compared to native regular expressions. The overhead is a small
number of extra method calls and the creation of a Re::Result object
to return the match results.

If regular expression performance is at a premium in your application,
then you can still use Re to construct the regular expression and
extract the raw Ruby Regexp object to be used for the actual
matching. You lose the ability to use named capture groups easily,
but you get raw Ruby regular expression matching performance.

For example, if you wanted to use the raw regular expression from
PHONE_RE defined above, you could extract the regular expression
like this:

  PHONE_REGEXP = PHONE_RE.regexp

And then use it directly:

  if PHONE_REGEXP =~ string
    # blah blah blah
  end

The above match runs at full Ruby matching speed. If you still
want named capture groups, you can do something like this:

  match_data = PHONE_REGEXP.match(string)
  area_code = match_data[PHONE_RE.name_map[:area]]
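
The idea behind such a lookup can be illustrated in plain Ruby,
assuming the name map is a hash from capture names to group
numbers. The hash below is written by hand for illustration; it is
not produced by Re, and PHONE and NAME_MAP are hypothetical names:

```ruby
# Sketch: a raw Regexp with numbered groups plus a separate map from
# symbolic names to group numbers.
PHONE    = /\A(\d{3})-(\d{3})-(\d{4})\z/
NAME_MAP = { area: 1, exchange: 2, subscriber: 3 }.freeze

match_data = PHONE.match("555-123-4567")
area_code  = match_data[NAME_MAP[:area]]
puts area_code   # prints 555
```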

License and Copyright

Copyright 2009 by Jim Weirich ([email protected]).
All rights reserved.

Re is provided under the MIT open source license (see MIT-LICENSE).

Links:

Documentation :: http://re-lib.rubyforge.org
Source :: http://github.com/jimweirich/re
GemCutter :: http://gemcutter.org/gems/re
Download :: http://rubyforge.org/frs/?group_id=9329
Bug Tracker :: http://www.pivotaltracker.com/projects/47758
Continuous Integration :: http://travis-ci.org/#!/jimweirich/re
Author :: [email protected]
