• Stars
    star
    105
  • Rank 318,214 (Top 7 %)
  • Language
    HTML
  • Created over 13 years ago
  • Updated over 7 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Booyer-Moore-Horspool string search algorithm implementation in C++

Introduction

This repository contains various C++ implementations of the Boyer-Moore string search algorithm and derivative algorithms. These family of algorithms allow fast searching of substrings, much faster than strstr() and memmem(). The longer the substring, the faster the algorithms work. The implementations are written to be both efficient and minimalist so that you can easily incorporate them in your own code.

This blog post has more information: http://blog.phusion.nl/2010/12/06/efficient-substring-searching/

Files

Horspool.cpp

Implements Boyer-Moore-Horspool. HorspoolTest.cpp is the unit test file.

BoyerMooreAndTurbo.cpp

Implements Boyer-Moore and Turbo Boyer-Moore. No special test files, but they're used in the benchmark program which serves as a basic sanity test.

StreamBoyerMooreHorspool.h

A special Boyer-Moore-Horspool implementation that supports "streaming" input. Instead of supplying the entire haystack at once, you can supply the haystack piece-by-piece. This makes it especially suitable for parsing data that you may receive over the network. This implementation also contains various memory and CPU optimizations, allowing it to be slightly faster and to use less memory than Horspool.cpp. See the file for detailed documentation.

Unit tests are in StreamTest.cpp.

benchmark.cpp

Benchmark program. Used in combination with the run_benchmark Rake task.

TestMain.cpp

Unit test runner program.

Testing and benchmarking

You need Ruby and Rake. To compile the unit tests:

rake test
./test

To run the benchmarks:

rake run_benchmark

Which algorithm to use?

I've found that, on average, Boyer-Moore-Horspool performs best thanks to its simple inner loop which can be heavily optimized. It has pretty bad worst-case performance but the worst case (or even bad cases) almost never occur in practice.

More Repositories

1

default_value_for

Provides a way to specify default values for ActiveRecord models
Ruby
736
star
2

debian-packaging-for-the-modern-developer

Debian packaging tutorials for the modern developer
Shell
387
star
3

daemon_controller

A library for implementing daemon management capabilities.
Ruby
212
star
4

heap_dumper_visualizer

ptmalloc2 heap dumper and visualizer
Ruby
132
star
5

multipart-parser

A C++ multipart MIME parser that isn't bloated with unnecessary stuff and doesn't depend on huge external libraries
C++
94
star
6

rubyenterpriseedition

Ruby Enterprise Edition based on MRI 1.8.6
Ruby
75
star
7

crash-watch

Monitor processes and display useful information when they crash
Ruby
71
star
8

mizuho

Documentation formatting tool. Converts Asciidoc input into nicely formatted HTML.
Python
70
star
9

encrypted_cookie_store

EncryptedCookieStore for Ruby on Rails 2.3
Ruby
48
star
10

matchhostfsowner

Solves the Docker host filesystem owner matching problem
Rust
45
star
11

rubyenterpriseedition187

Ruby Enterprise Edition based on MRI 1.8.7-p174
Ruby
42
star
12

rubyenterpriseedition187-330

Ruby Enterprise Edition based on Ruby 1.8.7-p330 and later
Ruby
25
star
13

paypal

Paypal IPN handling library
Ruby
22
star
14

rubyenterpriseedition187-248

Ruby Enterprise Edition based on MRI 1.8.7-p248
Ruby
22
star
15

better

A collection of better replacements for Ruby standard libaries
Ruby
17
star
16

gedit-class-browser-plugin

Gedit class browser plugin
Python
16
star
17

distributed-lock-google-cloud-storage-ruby

Ruby implementation of a distributed lock based on Google Cloud Storage
Ruby
16
star
18

streaming_find

Streams ActiveRecord query results instead of loading the entire result set into memory.
Ruby
14
star
19

work_batcher

Small library for batching work
Ruby
13
star
20

gedit-snapopen-plugin

Plugin for making it convenient to open related source files by typing in their name. Inspired by the similar TextMate feature.
Python
9
star
21

mikuru

Mikuru static image gallery generator
Ruby
8
star
22

yuumius_comments

Quickly and easily add a comments area to your website
Ruby
7
star
23

gedit-advanced-bookmarks-plugin

Advanced bookmarks plugin for Gedit
Python
6
star
24

p2

An implementation of the P^2 algorithm by Jain et al
C++
6
star
25

rubygems

RubyGems
Ruby
5
star
26

auto_redirection

Rails plugin for easily implementing correct, nested redirections
Ruby
5
star
27

clean-google-search-url

Get rid of Google search result link redirections
JavaScript
4
star
28

waph

Web Application Packaging Helper
Ruby
4
star
29

ruby-debug

enchanced ruby-debug
Emacs Lisp
4
star
30

platform_info

Various system autodetection code, extracted from Phusion Passenger
Ruby
3
star
31

sandbox

2
star
32

threadtest

A research prototype for investigating ways to implement the fastest possible multi-core-utilizing Phusion Passenger application pool
C++
2
star
33

authlogic_rails3_rspec2_test

Ruby
2
star
34

revelation

Revelation password manager
2
star
35

pulljoy

Make external pull requests CI-able again
Ruby
1
star
36

queryexplainer

Ruby
1
star
37

BulkRecorder

Bulk recording audio files on OS X
Objective-C
1
star