  • Stars: 2,813
  • Rank: 16,178 (Top 0.4%)
  • Language: Python
  • License: MIT License
  • Created over 3 years ago
  • Updated 9 months ago

Fuzzy String Matching in Python

TheFuzz

Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.
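To make the term concrete, here is a minimal pure-Python sketch of Levenshtein distance, the number of single-character insertions, deletions, and substitutions needed to turn one string into another. The function name levenshtein is hypothetical and is not part of thefuzz; the package relies on its own backend rather than this code.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the prefix of a processed so far
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning the first i characters of a into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory grows with the length of b rather than with the product of both lengths.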

Requirements

For testing

  • pycodestyle
  • hypothesis
  • pytest

Installation

Using PIP via PyPI

pip install thefuzz

Using PIP via GitHub

pip install git+https://github.com/seatgeek/thefuzz.git@<version>#egg=thefuzz

Adding to your requirements.txt file (run pip install -r requirements.txt afterwards)

git+ssh://git@github.com/seatgeek/thefuzz.git@<version>#egg=thefuzz

Manually via Git

git clone https://github.com/seatgeek/thefuzz.git thefuzz
cd thefuzz
python setup.py install

Usage

>>> from thefuzz import fuzz
>>> from thefuzz import process

Simple Ratio

>>> fuzz.ratio("this is a test", "this is a test!")
    97
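One way to see where a score like 97 can come from is the standard library's difflib, which computes 2 * M / T, where M is the number of matched characters and T is the combined length of both strings. The helper name ratio_sketch below is hypothetical, and this is only a stand-in for illustration; the library's own backend may compute the score differently.

```python
from difflib import SequenceMatcher


def ratio_sketch(a: str, b: str) -> int:
    """Similarity of a and b as an integer from 0 to 100."""
    # SequenceMatcher.ratio() returns 2 * M / T as a float in [0, 1].
    return round(100 * SequenceMatcher(None, a, b).ratio())


# 14 matched characters, 14 + 15 characters in total: 2 * 14 / 29 = 0.9655...
print(ratio_sketch("this is a test", "this is a test!"))  # 97
```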

Partial Ratio

>>> fuzz.partial_ratio("this is a test", "this is a test!")
    100
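The idea behind a partial ratio can be sketched as a brute-force sliding window: compare the shorter string against every same-length slice of the longer one and keep the best score. The names ratio_sketch and partial_ratio_sketch are hypothetical, the scoring uses difflib as a stand-in, and the library may choose candidate windows more cleverly than this.

```python
from difflib import SequenceMatcher


def ratio_sketch(a: str, b: str) -> int:
    """Similarity of a and b as an integer from 0 to 100 (difflib stand-in)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio_sketch(a: str, b: str) -> int:
    """Best score of the shorter string against any same-length slice of the longer."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0  # nothing to align against
    window = len(shorter)
    return max(
        ratio_sketch(shorter, longer[start:start + window])
        for start in range(len(longer) - window + 1)
    )


# The first 14 characters of the longer string equal the shorter string.
print(partial_ratio_sketch("this is a test", "this is a test!"))  # 100
```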

Token Sort Ratio

>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    91
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100
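The jump from 91 to 100 comes from ignoring word order. A minimal sketch of that idea, assuming tokens are split on whitespace and lowercased (the library's own preprocessing may differ), with difflib as a stand-in scorer and hypothetical helper names:

```python
from difflib import SequenceMatcher


def sort_tokens(text: str) -> str:
    """Lowercase, split on whitespace, and rejoin the words in sorted order."""
    return " ".join(sorted(text.lower().split()))


def token_sort_ratio_sketch(a: str, b: str) -> int:
    """Score two strings after sorting their words, so word order is ignored."""
    return round(100 * SequenceMatcher(None, sort_tokens(a), sort_tokens(b)).ratio())


# Both inputs normalize to "a bear fuzzy was wuzzy".
print(token_sort_ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
```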

Token Set Ratio

>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    84
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    100
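A token set comparison additionally ignores repeated words. One common way to describe it, sketched below under the assumption of whitespace tokenization and lowercasing, is to build three strings (the shared words, and the shared words plus each side's leftovers) and take the best pairwise score. The function names are hypothetical and difflib stands in for the library's scorer.

```python
from difflib import SequenceMatcher


def ratio_sketch(a: str, b: str) -> int:
    """Similarity of a and b as an integer from 0 to 100 (difflib stand-in)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_ratio_sketch(a: str, b: str) -> int:
    """Score two strings by their sets of words, ignoring order and duplicates."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    common = " ".join(sorted(tokens_a & tokens_b))
    only_a = " ".join(sorted(tokens_a - tokens_b))
    only_b = " ".join(sorted(tokens_b - tokens_a))
    combined_a = f"{common} {only_a}".strip()
    combined_b = f"{common} {only_b}".strip()
    return max(
        ratio_sketch(common, combined_a),
        ratio_sketch(common, combined_b),
        ratio_sketch(combined_a, combined_b),
    )


# The duplicate "fuzzy" collapses, so both sides reduce to the same word set.
print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
```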

Partial Token Sort Ratio

>>> fuzz.token_sort_ratio("fuzzy was a bear", "wuzzy fuzzy was a bear")
    84
>>> fuzz.partial_token_sort_ratio("fuzzy was a bear", "wuzzy fuzzy was a bear")
    100

Process

>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
>>> process.extract("new york jets", choices, limit=2)
    [('New York Jets', 100), ('New York Giants', 78)]
>>> process.extractOne("cowboys", choices)
    ('Dallas Cowboys', 90)

You can also pass additional parameters to the extractOne method to make it use a specific scorer. A typical use case is matching file paths (songs below is assumed to be a list of file path strings):

>>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
    ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
>>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
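Conceptually, extraction scores every choice against the query and ranks the results, and the scorer argument swaps the scoring function. The sketch below illustrates that flow with standard-library code only. The names extract_sketch, extract_one_sketch, and ratio_sketch are hypothetical, lowercasing is an assumed preprocessing step, and the scores will not necessarily match what thefuzz returns.

```python
from difflib import SequenceMatcher


def ratio_sketch(a: str, b: str) -> int:
    """Case-insensitive similarity from 0 to 100 (difflib stand-in)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def extract_sketch(query, choices, scorer=ratio_sketch, limit=5):
    """Return up to `limit` (choice, score) pairs, best score first."""
    scored = [(choice, scorer(query, choice)) for choice in choices]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


def extract_one_sketch(query, choices, scorer=ratio_sketch):
    """Return the single best (choice, score) pair, or None if choices is empty."""
    best = extract_sketch(query, choices, scorer=scorer, limit=1)
    return best[0] if best else None


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract_sketch("new york jets", choices, limit=2))
print(extract_one_sketch("cowboys", choices))
```

Because the scorer is just a callable taking two strings and returning a number, any of the ratio variants can be passed in its place.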
