• Stars
    star
    425
  • Rank 102,094 (Top 3 %)
  • Language
    Python
  • License
    Other
  • Created about 14 years ago
  • Updated about 4 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Tree edit distance using the Zhang Shasha algorithm

Zhang-Shasha: Tree edit distance in Python

https://travis-ci.org/timtadh/zhang-shasha.svg?branch=master

The zss module provides a function (zss.distance) that computes the edit distance between the two given trees, as well as a small set of utilities to make its use convenient.

If you'd like to learn more about how it works, see References below.

Brought to you by Tim Henderson ([email protected]).

Read the full documentation for more information.

Installation

You can get zss and its soft requirements ( editdist and numpy >= 1.7) from PyPI:

pip install zss

Both modules are optional. editdist uses string edit distance to compare node labels rather than a simple equal/not-equal check, and numpy significantly speeds up the library. The only reason version 1.7 of numpy is required is that earlier versions have trouble installing on current versions of Mac OS X.

You can install zss from the source code without dependencies in the usual way:

python setup.py install

If you want to build the docs, you'll need to install Sphinx >= 1.0.

Usage

To compare the distance between two trees, you need:

  1. A tree.
  2. Another tree.
  3. A node-node distance function. By default, zss compares the edit distance between the nodes' labels. zss currently only knows how to handle nodes with string labels.
  4. Functions to let zss.distance traverse your tree.

Here is an example using the library's built-in default node structure and edit distance function

from zss import simple_distance, Node

A = (
    Node("f")
        .addkid(Node("a")
            .addkid(Node("h"))
            .addkid(Node("c")
                .addkid(Node("l"))))
        .addkid(Node("e"))
    )
B = (
    Node("f")
        .addkid(Node("a")
            .addkid(Node("d"))
            .addkid(Node("c")
                .addkid(Node("b"))))
        .addkid(Node("e"))
    )
assert simple_distance(A, B) == 2

Specifying Custom Tree Formats

Specifying custom tree formats and distance metrics is easy. The zss.simple_distance function takes 3 extra parameters besides the two tree to compare:

  1. get_children - a function to retrieve a list of children from a node.
  2. get_label - a function to retrieve the label object from a node.
  3. label_dist - a function to compute the non-negative integer distance between two node labels.

Example

#!/usr/bin/env python

import zss

try:
    from editdist import distance as strdist
except ImportError:
    def strdist(a, b):
        if a == b:
            return 0
        else:
            return 1

def weird_dist(A, B):
    return 10*strdist(A, B)

class WeirdNode(object):

    def __init__(self, label):
        self.my_label = label
        self.my_children = list()

    @staticmethod
    def get_children(node):
        return node.my_children

    @staticmethod
    def get_label(node):
        return node.my_label

    def addkid(self, node, before=False):
        if before:  self.my_children.insert(0, node)
        else:   self.my_children.append(node)
        return self

A = (
WeirdNode("f")
    .addkid(WeirdNode("d")
    .addkid(WeirdNode("a"))
    .addkid(WeirdNode("c")
        .addkid(WeirdNode("b"))
    )
    )
    .addkid(WeirdNode("e"))
)
B = (
WeirdNode("f")
    .addkid(WeirdNode("c")
    .addkid(WeirdNode("d")
        .addkid(WeirdNode("a"))
        .addkid(WeirdNode("b"))
    )
    )
    .addkid(WeirdNode("e"))
)

dist = zss.simple_distance(
    A, B, WeirdNode.get_children, WeirdNode.get_label, weird_dist)

print dist
assert dist == 20

References

The algorithm used by zss is taken directly from the original paper by Zhang and Shasha. If you would like to discuss the paper, or the the tree edit distance problem (we have implemented a few other algorithms as well) please email the authors.

approxlib by Dr. Nikolaus Augstent contains a good Java implementation of Zhang-Shasha as well as a number of other useful tree distance algorithms.

Kaizhong Zhang and Dennis Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal of Computing, 18:1245–1262, 1989. (the original paper)

Slide deck overview of Zhang-Shasha

Another paper describing Zhang-Shasha

More Repositories

1

data-structures

Go datastructures.
Go
416
star
2

lexmachine

Lex machinary for go.
Go
401
star
3

fs2

B+ Tree - List - File Structures 2 - Memory Mapped File Structures for Go
Go
400
star
4

file-structures

File Structures (B+Tree, BTree) for Go
Go
70
star
5

parsemis

The ParSeMiS project (Parallel and Sequential Graph Mining Suite) searches for frequent, interesting substructures in graph databases. https://www2.cs.fau.de/EN/research/zold/ParSeMiS/index.html
Java
43
star
6

pyflwor

A Query Langauge and System for Python Objects
Python
26
star
7

jpdg

generate program dependence graphs for java programs.
Java
25
star
8

dynagrok

Go
23
star
9

getline

A cross platform, BSD Licensed, Python library for getting keyboard input from the console. Support for history and line editing.
Python
21
star
10

netutils

A little library for turning TCP connections into go channels.
Go
11
star
11

dot

Stream parsing of the graphviz dot language for go.
Go
11
star
12

fuzzbuzz

An Attribute Grammar Fuzzer
Python
10
star
13

slang

Simple Language : A very simple computer language for teaching purposes.
Python
10
star
14

dot_tools

A library to parse and generate the graphviz dot langauge.
Python
10
star
15

goiso

A wrapper around libbliss for graph isomorphism testing and canonical labeling.
C++
8
star
16

goplay

Random Small Go Programs
Go
8
star
17

regex-machines

Regular Expression Processing Virtual Machines.
Go
7
star
18

pyflow

A compiler written in python for dataflow analysis experimentation.
Python
7
star
19

swork

A shell enviroment manager. Easily manage the shell enviroment (variables etc.) for working on various projects. Stands for start work.
Python
7
star
20

gobuild-fork

An obsolete build to for the Go Language (obsoleted by Go 1, the first stable go release).
Go
7
star
21

graple

Go
7
star
22

tcel

A type checked functional language.
Go
7
star
23

getopt

Yet Another getopt Library for Go. This one works like the python module getopt.
Go
6
star
24

GoAST

Generate a serialized AST of a Go Package as an ordered labeled tree.
Go
6
star
25

tdpp

Top Down Predictive Parser Written in Go
Go
6
star
26

peano

A evaluator for Peano Arithmetic as given in "The Incompleteness Phnomenon" by M. Goldstern and H. Judah. ISBN 1-56881-093-8.
Python
6
star
27

queued

Queued is a simple queue deamon; allows clients to have persistent TCP connections.
Go
5
star
28

combos

Parser combinators for Go. Easy parsing of LL(k) grammars.
Go
5
star
29

betterast

An AST for Python.
Python
4
star
30

jist

Python
4
star
31

diplomacy

Python
4
star
32

tst

Ternary Search Trie
Python
4
star
33

optutils

Small, composable utilities for writing great command line tools.
Python
4
star
34

gramstat

Produce statistics about the operational distribution of a grammar.
Python
3
star
35

crypt_framework

A generic encrypted communication framework for python.
Python
3
star
36

gitfutz

Futzing with data from git using pygit2
Python
2
star
37

pytdpp

A top down predictive parser implementation for Python. Can be used with any LL(1) grammar.
Python
2
star
38

ops-example

A small, very simple example of how to use Ansible and Fabric.
Nginx
2
star
39

twik

Tim's Web Interaction Kit
Python
2
star
40

GoIR

Produces a MIR (Medium level Intermediate Representation) for Go based on the AST produced by github.com/timtadh/GoAST. The MIR is intended to be useful starting point for program analysis/optimization.
Go
2
star
41

regrax

Go
2
star
42

passmash

The Site Specific Password Munger
Python
1
star
43

prettyme

Takes [markdown|html] file (singular) adds optional [CSS, title] and outputs an html page.
Python
1
star
44

hackthology

repo which generates timtadh.github.io
HTML
1
star
45

gopkgr

A packaging and installation system for Go.
Go
1
star
46

lex

A lexer written in Go.
Go
1
star
47

msbench

Python
1
star
48

lambda

A lambda calculus interpreter
Python
1
star
49

timtadh.github.io

website
HTML
1
star
50

warc-extractor

extract a random sample of HTML files from WARCs
Java
1
star
51

imposterous

a simple markdown posterous poster
Python
1
star
52

expr-caculator

A simple calculator
Go
1
star
53

PyOhio2011

PyOhio 2011 Example Code for "Python, Parsing, and You"
Python
1
star
54

calloc

An unsafe Go interface to the C memory allocator.
Go
1
star