• Stars
    star
    5
  • Rank 2,766,665 (Top 57 %)
  • Language
    Scala
  • License
    MIT License
  • Created over 3 years ago
  • Updated almost 3 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Myers diff algorithm in Scala

Maven Central

app.tulz.diff

Myers diff algorithm in Scala.

"app.tulz" %%% "stringdiff" % "0.3.4" 

Overview

The core algorithm is a Scala translation of the Python implementation described here: https://blog.robertelder.org/diff-algorithm/

Additionally, the following is implemented on top of it:

  • interpretation of the algorithm's output
  • customizable formatting of the interpreted result into a string (with a few provided out of the box formatters)
  • additional transformations for when the input is expected to be a sequence of tokens (TokenDiff)

Interpretation

The core algorithm outputs a set of instructions to get from SeqA to SeqB, something like this:

in order to get from s1="bcdefgzio" to s2="abcxyfgi":

Insert a from s2 before position 0 into s1.
Delete d from s1 at position 2 in s1.
Insert x from s2 before position 3 into s1.
Insert y from s2 before position 3 into s1.
Delete e from s1 at position 3 in s1.
Delete z from s1 at position 6 in s1.
Delete o from s1 at position 8 in s1.

It's hard to work with it. Also, it is somewhat tricky at first to understand how to follow these instructions (try it! :) ).

MyersInterpret object parses the raw output into a List[DiffElement]:

trait DiffElement[Repr]

object DiffElement {
  final case class InBoth[Repr](v: Repr)        extends DiffElement[Repr]
  final case class InFirst[Repr](v: Repr)       extends DiffElement[Repr]
  final case class InSecond[Repr](v: Repr)      extends DiffElement[Repr]
  final case class Diff[Repr](x: Repr, y: Repr) extends DiffElement[Repr]
}

The result of the interpretation (without any transformations applied) for the same example:

StringDiff
  .raw(
    "bcdefgzio",
    "abcxyfgi",
    collapse = false
  ).mkString("[\n  ", "\n  ", "\n]")
[
  InSecond(a)
  InBoth(bc)
  InFirst(d)
  InSecond(x)
  InSecond(y)
  InFirst(e)
  InBoth(fg)
  InFirst(z)
  InBoth(i)
  InFirst(o)
]

Collapsing

By default, diff functions will collapse the diff:

StringDiff
  .raw(
    "bcdefgzio",
    "abcxyfgi",
    collapse = true // default is true
  ).mkString("[\n  ", "\n  ", "\n]")
[
  InSecond(a)
  InBoth(bc)
  Diff(de,xy)
  InBoth(fg)
  InFirst(z)
  InBoth(i)
  InFirst(o)
]

Here, the following list of DiffElements:

[
  InFirst(d)
  InSecond(x)
  InSecond(y)
  InFirst(e)
]

got collapsed into a single one:

[
  Diff(de,xy)
]

In a nutshell, collapsing removes empty elements and joins same or otherwise "join-able" subsequent DiffElements.

Examples:

  • any InFirst, InLast, Diff or InBoth gets removed if the element is empty
  • Diff becomes InFirst or InSecond if one the elements is empty
  • InFirst(a)+InFirst(b) -> InFirst(ab)
  • InFirst(a)+InSecond(b) -> Diff(a,b)
  • InSecond(a)+InDiff(b,c) -> Diff(ab,c)
  • etc

Diff'ing sequences:

SeqDiff
  .seq(
    Seq(1, 2, 3, 4, 5).toIndexedSeq,
    Seq(1, 2, 8, 3, 8, 4, 5, 0).toIndexedSeq
  ).mkString("[\n  ", "\n  ", "\n]")
[
  InBoth(Vector(1, 2))
  InSecond(Vector(8))
  InBoth(Vector(3))
  InSecond(Vector(8))
  InBoth(Vector(4, 5))
  InSecond(Vector(0))
]

Diff'ing strings:

Raw diff AST:
  println(
    StringDiff.diff(
      "bcdefgzio",
      "abcxyfgi"
    )
  )
[
  InSecond(a)
  InBoth(bc)
  Diff(de,xy)
  InBoth(fg)
  InFirst(z)
  InBoth(i)
  InFirst(o)
]  
Text output:
  println(
    StringDiff.text(
      "bcdefgzio",
      "abcxyfgi"
    )
  )
[โˆ…|a]]bc][de|xy]]fg][z|โˆ…]]i][o|โˆ…]
ANSI color output:
  println(
    StringDiff.ansi(
      "bcdefgzio",
      "abcxyfgi"
    )
  )

screenshot1

(default formatters highlight missing text with yellow, extraneous text โ€” with red, and matching text is underlined)

Diff'ing tokens

When the inputs are strings that are expected to contain whitespace-separated tokens, TokenDiff will try to make the diff more comprehensible in terms of tokens (while preserving the accuracy).

  println(
    TokenDiff.ansi(
      "match-1 match-2 diff-1 diff-2 match-3 match-4 diff-1 diff-2 match-1 match-2 diff-1 match-3 match-4 diff-1 match-1 match-2 diff-1 match-3 match-4 suffix-1",
      "prefix-1 match-1 match-2 diff-3 match-3 match-4 match-1 match-2 diff-2 diff-3 match-3 match-4 diff-2 match-1 match-2 match-3 match-4"
    )    
  )

screenshot3

With a StringDiff the output would look like the following:

  println(
    StringDiff.ansi(
      "match-1 match-2 diff-1 diff-2 match-3 match-4 diff-1 diff-2 match-1 match-2 diff-1 match-3 match-4 diff-1 match-1 match-2 diff-1 match-3 match-4 suffix-1",
      "prefix-1 match-1 match-2 diff-3 match-3 match-4 match-1 match-2 diff-2 diff-3 match-3 match-4 diff-2 match-1 match-2 match-3 match-4"
    )    
  )

screenshot4

Inline diffs for both strings
  println(
    TokenDiff.ansiBoth(
      "match-1 match-2 diff-1 diff-2 match-3 match-4 diff-1 diff-2 match-1 match-2 diff-1 match-3 match-4 diff-1 match-1 match-2 diff-1 match-3 match-4 suffix-1",
      "prefix-1 match-1 match-2 diff-3 match-3 match-4 match-1 match-2 diff-2 diff-3 match-3 match-4 diff-2 match-1 match-2 match-3 match-4"
    )    
  )

screenshot3

Usage

Diff'ing Seqs:
SeqDiff.seq(
  Seq(1, 2, 3),
  Seq(2, 3, 4)
)
Diff'ing Stringss:
StringDiff.ansi("abc", "acb")
// OR
StringDiff("abc", "acb")

StringDiff.ansiBoth("abc", "acb")
StringDiff.text("abc", "acb")
StringDiff.diff("abc", "acb")
StringDiff.raw("abc", "acb")
Diff'ing Stringss with tokens:
TokenDiff.ansi("abc", "acb")
// OR
TokenDiff("abc", "acb")

TokenDiff.ansiBoth("abc", "acb")
TokenDiff.text("abc", "acb")
TokenDiff.diff("abc", "acb")
TokenDiff.raw("abc", "acb")

Custom formatters

Formatters are instances of the DiffFormat[Out] trait:

trait DiffFormat[Out] {

  def apply(diff: List[DiffElement[String]]): Out

}
object MyFormat extends DiffFormat[MyDiffOutput] { ... }

val diff: MyDiffOutput = MyFormat(StringDiff("abc", "acb"))

For example, the TextDiffFormat is implemented like this:

object TextDiffFormat extends DiffFormat[String] {

  import DiffElement._

  def apply(diff: List[DiffElement[String]]): String = {
    val sb = new StringBuilder
    diff.foreach {
      case InBoth(both) =>
        sb.append("]")
        sb.appendAll(both)
        sb.append("]")
      case InSecond(second) =>
        sb.append("[โˆ…|")
        sb.appendAll(second)
        sb.append("]")
      case InFirst(first) =>
        sb.append("[")
        sb.appendAll(first)
        sb.append("|โˆ…]")
      case Diff(first, second) =>
        sb.append("[")
        sb.appendAll(first)
        sb.append("|")
        sb.appendAll(second)
        sb.append("]")
      case _ =>
    }
    sb.toString()
  }

Author

Iurii Malchenko โ€“ @yurique

License

stringdiff is provided under the MIT license.