• Stars
    star
    127
  • Rank 282,790 (Top 6 %)
  • Language
    JavaScript
  • License
    MIT License
  • Created about 7 years ago
  • Updated 2 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Ukkonen's Approximate String Matching algorithm

Ukkonen - Approximate String Matching

npm version Checks

This project implements the Approximate String Matching algorithm by Esko Ukkonen extended with ideas from An Extension of Ukkonen's Enhanced Dynamic Programming ASM Algorith by Hal Berghel and David Roach.

Ukkonen's algorithm is very competitive with the Levenshtein distance and for longer strings it is much more performant than Levenshtein distance.

In addition to being a competitive alternative to Levenshtein distance, Ukkonen's algorithm also allows you to provide a threshold for the distance which increases the performance even more for texts that are longer than the threshold.

HTML diffing using Levenshtein HTML diffing using Ukkonen's algorithm

Above you can see the different of using Levenshtein distance and Ukkonen's algorithm for matching sub-trees when diffing HTML.

Install

npm install --save ukkonen

Usage

You can find the distance between the strings Ukkonen and Levenshtein the following way:

var ukkonen = require("ukkonen");

assert.equal(ukkonen("Ukkonen", "Levenshtein"), 8);

If you want to limit the distance by a given threshold:

var ukkonen = require("ukkonen");

assert.equal(ukkonen("Ukkonen", "Levenshtein", 6), 6);
assert.equal(ukkonen("Ukkonen", "Levenshtein", 10), 8);

Platform support

The library is ES6 and will work with any JavaScript bundler in the browser as well as Node versions with ES6 support.

Benchmark

I have benchmarked the library against the fastest Levenshtein distance implementation on NPM.

Running benchmarks with 1000 iterations

# ukkonen: Edit distance one word (14 examples)
ok ~18 ms (0 s + 17993165 ns)

# leven: Edit distance one word (14 examples)
ok ~13 ms (0 s + 13155407 ns)

# ukkonen: Edit distance on sentence with small differences
ok ~1.66 ms (0 s + 1656841 ns)

# leven: Edit distance on sentence with small differences
ok ~7.23 ms (0 s + 7233814 ns)

# ukkonen: Edit distance on paragraphs with small differences
ok ~5.37 ms (0 s + 5367561 ns)

# leven: Edit distance on paragraphs with small differences
ok ~416 ms (0 s + 416468504 ns)

# ukkonen: Edit distance on longer texts with small differences
ok ~10 ms (0 s + 10305586 ns)

# leven: Edit distance on longer texts with small differences
ok ~1.7 s (1 s + 703731130 ns)

# ukkonen: Edit distance on longer texts with many differences
ok ~3.28 s (3 s + 280166305 ns)

# leven: Edit distance on longer texts with many differences
ok ~2.52 s (2 s + 519432479 ns)

# ukkonen: Edit distance on longer texts with small differences and a threshold of 20
ok ~9.69 ms (0 s + 9691021 ns)

# leven: Edit distance on longer texts with small differences and a threshold of 20
ok ~1.61 s (1 s + 610079082 ns)

# ukkonen: Edit distance on longer texts with many differences and a threshold of 40
ok ~15 ms (0 s + 15225792 ns)

# leven: Edit distance on longer texts with many differences and a threshold of 40
ok ~2.54 s (2 s + 539519721 ns)

Acknowledgements

Obviously the authors of the papers describing the algorithm Esko Ukkonen, Hal Berghel and David Roach.

I stole a lot of ideas from Sindre Sorhus's leven library and I also used it to test my implementation against.

License

MIT Β© Sune Simonsen

More Repositories

1

react-dom-testing

A tiny wrapper around react-dom/test-utils to make it more convenient.
JavaScript
37
star
2

stylewars

A tiny CSS in JS library that requires no tooling
JavaScript
31
star
3

magicpen

Styled output in both consoles and browsers
JavaScript
19
star
4

nano-router

A tiny modern router
JavaScript
10
star
5

offline-github-changelog

A changelog generator for Github projects that only uses the Git history
JavaScript
9
star
6

chance-generators

Random generators based on changejs
JavaScript
9
star
7

less-border-layout

An example showing how to do
CSS
7
star
8

transformation

Transformation pipelines
JavaScript
7
star
9

generative-testing-example

Generative testing example
JavaScript
6
star
10

depository

A reactive storage engine
JavaScript
6
star
11

fdb-blobs

A blob store build on top of FoundationDB
Go
6
star
12

graphql-fakester

Create stub data from a GraphQL schema
JavaScript
4
star
13

eslint-config-pretty-standard

A shareable eslint+prettier config based on standard
JavaScript
4
star
14

htmx-hackernews

Hackernews made with htmx and Go
Go
3
star
15

workspace-cache

A workspace cache for Yarn workspaces and Lerna
JavaScript
3
star
16

ido-bookmark-jump

Emacs plugin to fuzzy search for bookmarks
Emacs Lisp
3
star
17

emacs-runtests

Run any kind of tests from Emacs
Emacs Lisp
3
star
18

changeless

An immutable collection library for Java inspired by Clojure
Java
3
star
19

less-example

JavaScript
3
star
20

factory.js

A library for creating test-data factories
JavaScript
2
star
21

dependable-state

Observables and computeds for state management
JavaScript
2
star
22

append-styles

Append CSS in a given order
JavaScript
1
star
23

evil-config

Emacs evil-mode config
Emacs Lisp
1
star
24

unexpected-presentation

A presentation about Unexpected and testing in general
JavaScript
1
star
25

7W17732

A Twitter application with the purpose of showing the capabilities of jQuery
JavaScript
1
star
26

collect-js

A JavaScript library for collecting data from tree data structures
JavaScript
1
star
27

dotfiles

Unix configuration files
Emacs Lisp
1
star
28

WhoOwesYou

A Rails app to settle debt between multiple parties
Ruby
1
star
29

chewbacca

A benchmark UI for Mocha that will run the body of each test multiple times.
JavaScript
1
star
30

reactive-variable

An implementation of the Apollo Reactive Variables that can be used as stepping stone for upgrading from v2 to v3.
JavaScript
1
star
31

Emacs.d

My emacs.d folder
Emacs Lisp
1
star
32

haunted-web-components

Experiment with haunted web components
JavaScript
1
star
33

fdb-streams

An event streaming library for FoundationDB that supports reactive consumers
Go
1
star