Go-edlib : Edit distance and string comparison library

A Go library of string comparison and edit distance algorithms, featuring Levenshtein, LCS, Hamming, Damerau-Levenshtein (OSA and adjacent-transposition variants), Jaro-Winkler, Cosine, and more.


Requirements

  • Go (v1.13+)

Introduction

An open-source Go library that includes most (and soon all) edit distance and string comparison algorithms, plus some extras!
Designed to be fully compatible with Unicode characters!
This library is 100% test covered 😁
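
To illustrate why Unicode compatibility matters for edit distances, here is a minimal standard-library sketch (not taken from go-edlib; the helper name runeLen is hypothetical). It shows that the byte length and the character count of a Go string can differ, so an algorithm that compares bytes would count one accented letter as several edits.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeLen returns the number of Unicode code points in s.
// This can be smaller than len(s), which counts bytes.
func runeLen(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	s := "h\u00e9llo" // "héllo": the é takes two bytes in UTF-8
	fmt.Println("bytes:", len(s))
	fmt.Println("runes:", runeLen(s))
}
```

Converting a string to []rune before indexing is the usual way to compare strings character by character in Go.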

Features

Benchmarks

An interactive Google chart with a few benchmark cases for all similarity algorithms in this library, run through the StringsSimilarity function, is available here

If you want or need more details, you can also view the raw benchmark output here, which includes memory allocations and test case output (similarity results and errors).

If you are on Linux and want to run the benchmarks on your own setup, run the ./tests/benchmark.sh script.

Installation

Open a shell in your project folder and run:

go get github.com/hbollon/go-edlib

And import it into your project:

import (
	"github.com/hbollon/go-edlib"
)

Run tests

If you are on Linux and want to run all unit tests, run the ./tests/tests.sh script.

Windows users can run:

go test ./... # Add desired parameters to this command if you want

Documentation

The full documentation is available here: Documentation

Examples

Calculate the string similarity index between two strings

You can use the StringsSimilarity(str1, str2, algorithm) function. The algorithm parameter must be one of the following constants:

// Algorithm identifiers
const (
	Levenshtein Algorithm = iota
	DamerauLevenshtein
	OSADamerauLevenshtein
	Lcs
	Hamming
	Jaro
	JaroWinkler
	Cosine
)

Example with Levenshtein:

res, err := edlib.StringsSimilarity("string1", "string2", edlib.Levenshtein)
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("Similarity: %f", res)
}
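
As a rough illustration of how an edit distance can be turned into a similarity index between 0 and 1, here is a minimal sketch. The helper name similarity and the normalization 1 - distance/maxLen are assumptions made for this example; they are not presented as the library's exact formula.

```go
package main

import "fmt"

// similarity converts an edit distance into a score in [0, 1].
// Assumption: score = 1 - distance/maxLen, where maxLen is the rune length
// of the longer string. Illustrative only, not go-edlib's implementation.
func similarity(distance, lenA, lenB int) float64 {
	maxLen := lenA
	if lenB > maxLen {
		maxLen = lenB
	}
	if maxLen == 0 {
		return 1 // two empty strings are identical
	}
	return 1 - float64(distance)/float64(maxLen)
}

func main() {
	// "kitten" (6 runes) and "sitting" (7 runes) have a Levenshtein distance of 3.
	fmt.Printf("Similarity: %.3f\n", similarity(3, 6, 7))
}
```

With this normalization, identical strings score 1 and strings with nothing in common score 0.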

Execute fuzzy search based on string similarity algorithm

1. Most matching unique result without threshold

You can use the FuzzySearch(str, strList, algorithm) function.

strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearch("testnig", strList, edlib.Levenshtein)
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("Result: %s", res)
}
Result: testing 

2. Most matching unique result with threshold

You can use the FuzzySearchThreshold(str, strList, minSimilarity, algorithm) function.

strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchThreshold("testnig", strList, 0.7, edlib.Levenshtein)
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("Result for 'testnig': %s", res)
}

res, err = edlib.FuzzySearchThreshold("hello", strList, 0.7, edlib.Levenshtein)
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("Result for 'hello': %s", res)
}
Result for 'testnig': testing
Result for 'hello':

3. Most matching result set without threshold

You can use the FuzzySearchSet(str, strList, resultQuantity, algorithm) function.

strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchSet("testnig", strList, 3, edlib.Levenshtein)
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("Results: %s", strings.Join(res, ", "))
}
Results: testing, test, tester 

4. Most matching result set with threshold

You can use the FuzzySearchSetThreshold(str, strList, resultQuantity, minSimilarity, algorithm) function.

strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchSetThreshold("testnig", strList, 3, 0.5, edlib.Levenshtein)
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("Result for 'testnig' with '0.5' threshold: %s", strings.Join(res, " "))
}

res, err = edlib.FuzzySearchSetThreshold("testnig", strList, 3, 0.7, edlib.Levenshtein)
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("Result for 'testnig' with '0.7' threshold: %s", strings.Join(res, " "))
}
Result for 'testnig' with '0.5' threshold: testing test tester
Result for 'testnig' with '0.7' threshold: testing

Get raw edit distance (Levenshtein, LCS, Damerau–Levenshtein, Hamming)

Each of these algorithms has a dedicated function that returns the edit distance between two strings.

Example with Levenshtein distance:

res := edlib.LevenshteinDistance("kitten", "sitting")
fmt.Printf("Result: %d", res)
Result: 3

LCS, LCS Backtrack and LCS Diff

1. Compute the LCS (Longest Common Subsequence) between two strings

You can use the LCS(str1, str2) function.

lcs := edlib.LCS("ABCD", "ACBAD")
fmt.Printf("Length of their LCS: %d", lcs)
Length of their LCS: 3
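
For readers curious about what such a computation involves, here is a minimal standard-library sketch of the classic dynamic programming approach to the LCS length. The function name lcsLength is hypothetical and this is not presented as the library's code.

```go
package main

import "fmt"

// lcsLength returns the length of the longest common subsequence of a and b.
func lcsLength(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	// table[i][j] holds the LCS length of ra[:i] and rb[:j].
	table := make([][]int, len(ra)+1)
	for i := range table {
		table[i] = make([]int, len(rb)+1)
	}
	for i := 1; i <= len(ra); i++ {
		for j := 1; j <= len(rb); j++ {
			switch {
			case ra[i-1] == rb[j-1]:
				// Matching runes extend the common subsequence by one.
				table[i][j] = table[i-1][j-1] + 1
			case table[i-1][j] >= table[i][j-1]:
				table[i][j] = table[i-1][j]
			default:
				table[i][j] = table[i][j-1]
			}
		}
	}
	return table[len(ra)][len(rb)]
}

func main() {
	fmt.Printf("Length of their LCS: %d\n", lcsLength("ABCD", "ACBAD"))
}
```

Walking the same table backwards from the bottom-right cell is the usual way to recover one or all of the subsequences themselves.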

2. Backtrack their LCS

You can use the LCSBacktrack(str1, str2) function.

res, err := edlib.LCSBacktrack("ABCD", "ACBAD")
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("LCS: %s", res)
}
LCS: ABD

3. Backtrack all their LCS

You can use the LCSBacktrackAll(str1, str2) function.

res, err := edlib.LCSBacktrackAll("ABCD", "ACBAD")
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("LCS: %s", strings.Join(res, ", "))
}
LCS: ABD, ACD

4. Get LCS Diff between two strings

You can use the LCSDiff(str1, str2) function.

res, err := edlib.LCSDiff("computer", "houseboat")
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("LCS Diff: \n%s\n%s", res[0], res[1])
}
LCS Diff: 
 h c o m p u s e b o a t e r
 + -   - -   + + + + +   - -

Author

👤 Hugo Bollon

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check the issues page.

Show your support

Give a ⭐️ if this project helped you!

📝 License

Copyright © 2020 Hugo Bollon.
This project is licensed under the MIT License.
