  • Language
    Python
  • License
    Apache License 2.0
  • Created over 9 years ago
  • Updated almost 9 years ago

Repository Details

Diff, Match and Patch Library (original at http://google.com/p/google-diff-match-patch)

Diff, Match and Patch

This is a mirror/fork of the Diff, Match and Patch Library by Neil Fraser.

Diff, Match and Patch Library

http://code.google.com/p/google-diff-match-patch/ Neil Fraser

Online demo: http://GerHobbelt.github.io/google-diff-match-patch/

License and installing the software

The software is licensed under the Apache License Version 2.0.

To install the library please use bower or simply clone this repository.

bower install google-diff-match-patch-js

Available languages / ports

This library is currently available in eight different ports, all using the same API. Every version includes a full set of unit tests.

C++:

  • Ported by Mike Slemmer.
  • Currently requires the Qt library.

C#:

  • Ported by Matthaeus G. Chajdas.

Dart:

  • The Dart language is still growing and evolving, so this port is only as stable as the underlying language.

Java:

  • Both the source and a Maven package are included.

JavaScript:

  • diff_match_patch_uncompressed.js is the human-readable version. Users of node.js should 'require' this uncompressed version since the compressed version is not guaranteed to work outside of a web browser.
  • diff_match_patch.js has been compressed using Google's internal JavaScript compressor. Non-Google hackers who wish to recompress the source can use: http://dean.edwards.name/packer/

Lua:

  • Ported by Duncan Cross.
  • Does not support line-mode speedup.

Objective C:

  • Ported by Jan Weiss.
  • Includes speed test (this is a separate bundle for other languages).

Python:

  • Two versions, one for Python 2.x, the other for Python 3.x.
  • Runs 10x faster under PyPy than CPython.

Demos:

  • Separate demos for Diff, Match and Patch in JavaScript.

Introduction

This library is available in multiple languages. Regardless of the language used, the interface for using it is the same. This page describes the API for the public functions. For further examples, see the relevant test harness.

Initialization

The first step is to create a new diff_match_patch object. This object contains various properties which set the behaviour of the algorithms, as well as the following methods/functions:

diff_main(text1, text2) => diffs

An array of differences is computed which describes the transformation of text1 into text2. Each difference is an array (JavaScript, Lua), a tuple (Python) or a Diff object (C++, C#, Objective C, Java). The first element specifies whether it is an insertion (1), a deletion (-1) or an equality (0). The second element specifies the affected text.

diff_main("Good dog", "Bad dog") => [(-1, "Goo"), (1, "Ba"), (0, "d dog")]

Despite the large number of optimisations used in this function, a diff can take a while to compute. The diff_match_patch.Diff_Timeout property sets how many seconds any diff's exploration phase may take. The default value is 1.0. A value of 0 disables the timeout and lets diff run until completion. Should diff time out, the return value will still be a valid difference, though probably non-optimal.
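As a minimal sketch of how such a list can be read, the snippet below rebuilds both texts from the example diff using plain Python tuples. The helper name rebuild_texts and the constant names are hypothetical and are not part of the library.

```python
# Operation codes as described above: -1 = deletion, 1 = insertion, 0 = equality.
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0


def rebuild_texts(diffs):
    """Rebuild (text1, text2) from a list of (op, text) tuples.

    Hypothetical helper for illustration; not part of the library.
    """
    text1_parts = []
    text2_parts = []
    for op, text in diffs:
        if op != DIFF_INSERT:   # deletions and equalities come from text1
            text1_parts.append(text)
        if op != DIFF_DELETE:   # insertions and equalities end up in text2
            text2_parts.append(text)
    return "".join(text1_parts), "".join(text2_parts)


example_diffs = [(-1, "Goo"), (1, "Ba"), (0, "d dog")]
text1, text2 = rebuild_texts(example_diffs)
print(text1)  # Good dog
print(text2)  # Bad dog
```

Every valid diff of the same two texts, optimal or not, rebuilds to the same pair of strings, which makes a check like this a convenient sanity test.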

diff_cleanupSemantic(diffs) => null

A diff of two unrelated texts can be filled with coincidental matches. For example, the diff of "mouse" and "sofas" is [(-1, "m"), (1, "s"), (0, "o"), (-1, "u"), (1, "fa"), (0, "s"), (-1, "e")]. While this is the optimum diff, it is difficult for humans to understand. Semantic cleanup rewrites the diff, expanding it into a more intelligible format. The above example would become: [(-1, "mouse"), (1, "sofas")]. If a diff is to be human-readable, it should be passed to diff_cleanupSemantic.

diff_cleanupEfficiency(diffs) => null

This function is similar to diff_cleanupSemantic, except that instead of optimising a diff to be human-readable, it optimises the diff to be efficient for machine processing. The results of both cleanup types are often the same.
The efficiency cleanup is based on the observation that a diff made up of a large number of small edits may take longer to process (in downstream applications) or take more capacity to store or transmit than a smaller number of larger edits. The diff_match_patch.Diff_EditCost property sets what the cost of handling a new edit is in terms of handling extra characters in an existing edit. The default value is 4, which means if expanding the length of a diff by three characters can eliminate one edit, then that optimisation will reduce the total costs.
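The trade-off can be illustrated with a toy cost model. This is an assumption made for illustration, not the library's internal algorithm: each edit is charged a fixed edit cost plus one unit per edited character, and equalities are free. The function name toy_cost is hypothetical.

```python
EDIT_COST = 4  # mirrors the documented default of Diff_EditCost


def toy_cost(diffs, edit_cost=EDIT_COST):
    """Toy cost: a fixed charge per edit plus one unit per edited character.

    Equalities (op == 0) are treated as free. This is an illustrative
    model only, not the library's internal algorithm.
    """
    edits = [(op, text) for op, text in diffs if op != 0]
    return edit_cost * len(edits) + sum(len(text) for _, text in edits)


# Four small edits around a three-character equality.
before = [(-1, "ab"), (1, "12"), (0, "xyz"), (-1, "cd"), (1, "34")]
# The same change expressed as two larger edits that absorb "xyz".
after = [(-1, "abxyzcd"), (1, "12xyz34")]

print(toy_cost(before))  # 4 edits * 4 + 8 characters = 24
print(toy_cost(after))   # 2 edits * 4 + 14 characters = 22
```

Under this model each eliminated edit is paid for with three extra characters, which is below the edit cost of 4, so the merged form is cheaper.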

diff_levenshtein(diffs) => int

Given a diff, measure its Levenshtein distance in terms of the number of inserted, deleted or substituted characters. The minimum distance is 0, which means equality; the maximum distance is the length of the longer string.
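One way to derive such a number from a diff is sketched below. It assumes that within each run of adjacent insertions and deletions, overlapping characters count as substitutions, so the run contributes the larger of the two lengths. The function name levenshtein_from_diffs is hypothetical, and the sketch is not guaranteed to match any port line for line.

```python
def levenshtein_from_diffs(diffs):
    """Count inserted, deleted or substituted characters in a diff.

    diffs is a list of (op, text) tuples with op in {-1, 0, 1}.
    Hypothetical helper for illustration only.
    """
    distance = 0
    insertions = 0
    deletions = 0
    for op, text in diffs:
        if op == 1:
            insertions += len(text)
        elif op == -1:
            deletions += len(text)
        else:
            # An equality closes the current run of edits.
            distance += max(insertions, deletions)
            insertions = 0
            deletions = 0
    # Account for a trailing run that is not followed by an equality.
    return distance + max(insertions, deletions)


print(levenshtein_from_diffs([(-1, "Goo"), (1, "Ba"), (0, "d dog")]))  # 3
```

Note that the result depends on the diff supplied: a diff that has been through a cleanup pass can report a larger distance than the optimal diff of the same texts.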

diff_prettyHtml(diffs) => html

Takes a diff array and returns a pretty HTML sequence. This function is mainly intended as an example from which to write one's own display functions.
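A custom display function can be quite small. The sketch below is an independent illustration, not the library's implementation: it wraps insertions in ins tags, deletions in del tags and equalities in span tags, and escapes the text with the standard html module. The name diffs_to_html and the choice of tags are assumptions.

```python
import html


def diffs_to_html(diffs):
    """Render a list of (op, text) tuples as an HTML fragment.

    Hypothetical display helper; tag choices are illustrative.
    """
    parts = []
    for op, text in diffs:
        # Escape markup characters, then make line breaks visible.
        escaped = html.escape(text).replace("\n", "<br>")
        if op == 1:
            parts.append("<ins>" + escaped + "</ins>")
        elif op == -1:
            parts.append("<del>" + escaped + "</del>")
        else:
            parts.append("<span>" + escaped + "</span>")
    return "".join(parts)


print(diffs_to_html([(-1, "Goo"), (1, "Ba"), (0, "d dog")]))
# <del>Goo</del><ins>Ba</ins><span>d dog</span>
```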

match_main(text, pattern, loc) => location

Given a text to search, a pattern to search for and an expected location in the text near which to find the pattern, return the location which matches closest. The function will search for the best match based on both the number of character errors between the pattern and the potential match, as well as the distance between the expected location and the potential match.
The following example is a classic dilemma. There are two potential matches: one is close to the expected location but contains a one-character error, the other is far from the expected location but is exactly the pattern sought:

match_main("abc12345678901234567890abbc", "abc", 26)

Which result is returned (0 or 24) is determined by the diff_match_patch.Match_Distance property. An exact letter match which is 'distance' characters away from the fuzzy location would score as a complete mismatch. For example, a distance of '0' requires the match be at the exact location specified, whereas a distance of '1000' would require a perfect match to be within 800 characters of the expected location to be found using a 0.8 threshold (see below). The larger Match_Distance is, the longer match_main() may take to compute. This variable defaults to 1000.

Another property is diff_match_patch.Match_Threshold, which determines the cut-off value for a valid match. If Match_Threshold is closer to 0, the requirements for accuracy increase. If Match_Threshold is closer to 1, then it is more likely that a match will be found. The larger Match_Threshold is, the longer match_main() may take to compute. This variable defaults to 0.5. If no match is found, the function returns -1.
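The interplay of errors, distance and threshold can be sketched with a simple scoring rule that is consistent with the description above: the score is the fraction of pattern characters in error plus the offset from the expected location divided by Match_Distance, and lower is better. The formula, the function name match_score and the two candidate tuples are assumptions made for illustration; the library's internals may differ.

```python
def match_score(errors, location, expected_loc, pattern_len, match_distance):
    """Score a candidate match; 0.0 is perfect, higher is worse.

    Assumed formula for illustration: error ratio plus the offset from
    the expected location, scaled by match_distance.
    """
    accuracy = errors / pattern_len
    proximity = abs(expected_loc - location)
    if match_distance == 0:
        # Any offset from the expected location counts as a full mismatch.
        return 1.0 if proximity else accuracy
    return accuracy + proximity / match_distance


# The two candidates from the dilemma above: (location, character errors).
exact_far = (0, 0)
fuzzy_near = (24, 1)

for distance in (1000, 40):
    scores = {
        loc: match_score(errors, loc, 26, 3, distance)
        for loc, errors in (exact_far, fuzzy_near)
    }
    best = min(scores, key=scores.get)
    print(distance, best)
# 1000 0
# 40 24
```

With the default distance of 1000 the exact match at 0 scores best. With a much smaller distance the offset is penalised heavily and the nearby fuzzy match wins, provided its score still falls under Match_Threshold.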

patch_make(text1, text2) => patches

patch_make(diffs) => patches

patch_make(text1, diffs) => patches

Given two texts, or an already computed list of differences, return an array of patch objects. The third form (text1, diffs) is preferred; use it if you happen to have that data available. Otherwise this function will compute the missing pieces.

patch_toText(patches) => text

Reduces an array of patch objects to a block of text which looks extremely similar to the standard GNU diff/patch format. This text may be stored or transmitted.

patch_fromText(text) => patches

Parses a block of text (which was presumably created by the patch_toText function) and returns an array of patch objects.

patch_apply(patches, text1) => [text2, results]

Applies a list of patches to text1. The first element of the return value is the newly patched text. The second element is an array of true/false values indicating which of the patches were successfully applied. [Note that this second element is not too useful since large patches may get broken up internally, resulting in a longer results list than the input with no way to figure out which patch succeeded or failed. A more informative API is in development.]
The previously mentioned Match_Distance and Match_Threshold properties are used to evaluate patch application on text which does not match exactly. In addition, the diff_match_patch.Patch_DeleteThreshold property determines how closely the text within a major (~64 character) delete needs to match the expected text. If Patch_DeleteThreshold is closer to 0, then the deleted text must match the expected text more closely. If Patch_DeleteThreshold is closer to 1, then the deleted text may contain anything. In most use cases Patch_DeleteThreshold should just be set to the same value as Match_Threshold.
