Repository Details

Pure OCaml regular expressions, with support for Perl and POSIX-style strings

Description

Re is a regular expression library for OCaml.

Contact

This library was written by Jerome Vouillon. It can be downloaded from https://github.com/ocaml/ocaml-re

Bug reports, suggestions and contributions are welcome.

Features

The following styles of regular expressions are supported:

  • Perl-style regular expressions (module Re.Perl);
  • Posix extended regular expressions (module Re.Posix);
  • Emacs-style regular expressions (module Re.Emacs);
  • Shell-style file globbing (module Re.Glob).
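As a rough illustration of the front ends listed above, the sketch below parses patterns with Re.Perl, Re.Posix and Re.Glob and compiles each with Re.compile. It assumes the "re" library is installed and linked (for example `(libraries re)` in a dune file); the value names are made up for the example.

```ocaml
(* Assumes the "re" library is linked, e.g. (libraries re) in dune. *)

(* The same check, "is this a run of digits?", in two textual syntaxes. *)
let perl_digits = Re.compile (Re.Perl.re "^[0-9]+$")
let posix_digits = Re.compile (Re.Posix.re "^[0-9]+$")

(* A shell glob; ~anchored:true makes it match the whole name rather
   than any substring. *)
let ml_file = Re.compile (Re.Glob.glob ~anchored:true "*.ml")

let () =
  assert (Re.execp perl_digits "2024");
  assert (not (Re.execp posix_digits "20x4"));
  assert (Re.execp ml_file "main.ml");
  assert (not (Re.execp ml_file "main.mli"))
```

Each front end returns an ordinary Re.t value, so the results can be mixed freely with one another before compilation.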

It is also possible to build regular expressions by combining simpler regular expressions (module Re).
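A minimal sketch of the combinator style, assuming the "re" library is linked. The `key_value` expression and the `parse_pair` helper are hypothetical names introduced only for this example.

```ocaml
(* Assumes the "re" library is linked, e.g. (libraries re) in dune. *)

(* Build "letters '=' digits" from smaller pieces instead of a string.
   bos/eos anchor the match to the whole input. *)
let key_value =
  Re.(seq [ bos; group (rep1 alpha); char '='; group (rep1 digit); eos ])
  |> Re.compile

(* Hypothetical helper: return the two captured groups, if any. *)
let parse_pair s =
  match Re.exec_opt key_value s with
  | None -> None
  | Some g -> Some (Re.Group.get g 1, Re.Group.get g 2)

let () =
  assert (parse_pair "width=80" = Some ("width", "80"));
  assert (parse_pair "width:80" = None)
```

Because the pieces are plain OCaml values, sub-expressions can be named, reused and tested independently, which is harder to do with pattern strings.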

The most notable missing features are back-references and look-ahead/look-behind assertions.

There is also a subset of the PCRE interface available in the Re.Pcre module. This makes it easier to port code from that library to Re with minimal changes.
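The following sketch shows what code written against this subset might look like, assuming the "re" library is linked. It only uses `regexp`, `exec`, `get_substring` and `pmatch`; which other PCRE functions are covered should be checked against the Re.Pcre interface. The `setting` and `parse_setting` names are illustrative.

```ocaml
(* Assumes the "re" library is linked, e.g. (libraries re) in dune. *)

(* Compile a pattern through the PCRE-like interface. *)
let setting = Re.Pcre.regexp "([a-z]+)=([0-9]+)"

(* Hypothetical helper: extract key and value from the first match. *)
let parse_setting s =
  if Re.Pcre.pmatch ~rex:setting s then begin
    let groups = Re.Pcre.exec ~rex:setting s in
    Some (Re.Pcre.get_substring groups 1, Re.Pcre.get_substring groups 2)
  end else None

let () =
  assert (parse_setting "set depth=3 now" = Some ("depth", "3"));
  assert (parse_setting "nothing here" = None)
```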

Performance

The matches are performed by lazily building a DFA (deterministic finite automaton) from the regular expression. As a consequence, matching takes linear time in the length of the matched string.

The compilation of patterns is slower than with libraries using back-tracking, such as PCRE. But, once a large enough part of the DFA is built, matching is extremely fast.

Of course, for some combinations of regular expression and string, the part of the DFA that needs to be built is so large that this point is never reached, and matching will be slow. This is not expected to happen often in practice, and in fact many expressions that behave badly with a backtracking implementation are very efficient with this implementation.
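Since compilation is the expensive step and the DFA is built lazily during matching, one practical consequence is to compile an expression once and reuse it for many inputs. The sketch below assumes the "re" library is linked and that the compiled value keeps the DFA states it has already built, as the description above implies; `ignored` and `count_ignored` are hypothetical names.

```ocaml
(* Assumes the "re" library is linked, e.g. (libraries re) in dune. *)

(* Compile once, at module initialisation, not inside the loop. *)
let ignored =
  Re.compile
    (Re.alt
       [ Re.Glob.glob ~anchored:true "*.o";
         Re.Glob.glob ~anchored:true "*.cmi" ])

(* Reuse the same compiled expression for every file name. *)
let count_ignored names =
  List.length (List.filter (fun name -> Re.execp ignored name) names)

let () =
  assert (count_ignored [ "a.o"; "b.ml"; "c.cmi" ] = 2)
```

Calling Re.compile inside the filter function instead would pay the compilation cost again for every name.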

The library is at the moment entirely written in OCaml. As a consequence, regular expression matching is much slower when the library is compiled to bytecode than when it is compiled to native code.

Here are some timing results (Pentium III 500 MHz):

  • Scanning a 1 MB string containing only the character a, except for the last character, which is b, searching for the pattern aa?b (repeated 100 times):

    • RE: 2.6s
    • PCRE: 68s
  • Regular expression example from http://www.bagley.org/~doug/shootout/ [1]

    • RE: 0.43s
    • PCRE: 3.68s

    [1] this page is no longer up but is available via the Internet Archive http://web.archive.org/web/20010429190941/http://www.bagley.org/~doug/shootout/bench/regexmatch/

  • The large regular expression (about 2000 characters long) that Unison uses with my preference file to decide whether a file should be ignored or not. This expression is matched against a filename about 20000 times.

    • RE: 0.31s
    • PCRE: 3.7s

    However, RE is only faster than PCRE when there are more than about 300 filenames.
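A sketch of how the first measurement above could be reproduced for RE today, assuming the "re" library is linked. This is not the original benchmark code: the `make_input` and `time_scans` helpers are invented for the example, timing uses `Sys.time` (processor time), and the numbers will differ from those listed on any modern machine.

```ocaml
(* Assumes the "re" library is linked, e.g. (libraries re) in dune. *)

(* A string of [n - 1] 'a' characters followed by a single 'b'. *)
let make_input n = String.make (n - 1) 'a' ^ "b"

let pattern = Re.compile (Re.Perl.re "aa?b")

(* Search [s] for the pattern [repeat] times; return the elapsed
   processor time in seconds. Every search is expected to succeed. *)
let time_scans ~repeat s =
  let start = Sys.time () in
  for _i = 1 to repeat do
    assert (Re.execp pattern s)
  done;
  Sys.time () -. start

let () =
  let s = make_input 1_000_000 in
  Printf.printf "RE, 100 scans: %.2fs\n" (time_scans ~repeat:100 s)
```

For the comparison to be meaningful the program should be compiled to native code, since the bytecode build is much slower.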
