align and compare tables


daff: data diff

This is a library for comparing tables, producing a summary of their differences, and using such a summary as a patch file. It is optimized for comparing tables that share a common origin, in other words multiple versions of the "same" table.

For a live demo, see:

https://paulfitz.github.io/daff/

Install the library for your favorite language:

npm install daff -g  # node/javascript
pip install daff     # python
gem install daff     # ruby
composer require paulfitz/daff-php  # php
install.packages('daff') # R wrapper by Edwin de Jonge
bower install daff   # web/javascript

Other translations are available here:

https://github.com/paulfitz/daff/releases

Or use the library to view CSV diffs on GitHub via a Chrome extension:

https://github.com/theodi/csvhub

The diff format used by daff is specified here:

http://paulfitz.github.io/daff-doc/spec.html

This library is a stripped-down version of the coopy toolbox (see http://share.find.coop). To compare tables from different origins, with automatically generated IDs, or with other complications, check out the coopy toolbox.

The program

You can run daff/daff.py/daff.rb as a utility program:

$ daff
daff can produce and apply tabular diffs.
Call as:
  daff a.csv b.csv
  daff [--color] [--no-color] [--output OUTPUT.csv] a.csv b.csv
  daff [--output OUTPUT.html] a.csv b.csv
  daff [--www] a.csv b.csv
  daff parent.csv a.csv b.csv
  daff --input-format sqlite a.db b.db
  daff patch [--inplace] a.csv patch.csv
  daff merge [--inplace] parent.csv a.csv b.csv
  daff trim [--output OUTPUT.csv] source.csv
  daff render [--output OUTPUT.html] diff.csv
  daff copy in.csv out.tsv
  daff in.csv
  daff git
  daff version

The --inplace option makes patch and merge modify a.csv in place.

If you need more control, here is the full list of flags:
  daff diff [--output OUTPUT.csv] [--context NUM] [--all] [--act ACT] a.csv b.csv
     --act ACT:     show only a certain kind of change (update, insert, delete, column)
     --all:         do not prune unchanged rows or columns
     --all-rows:    do not prune unchanged rows
     --all-columns: do not prune unchanged columns
     --color:       highlight changes with terminal colors (default in terminals)
     --context NUM: show NUM rows of context (0=none)
     --context-columns NUM: show NUM columns of context (0=none)
     --fail-if-diff: return status is 0 if equal, 1 if different, 2 if problem
     --id:          specify column to use as primary key (repeat for multi-column key)
     --ignore:      specify column to ignore completely (can repeat)
     --index:       include row/columns numbers from original tables
     --input-format [csv|tsv|ssv|psv|json|sqlite]: set format to expect for input
     --eol [crlf|lf|cr|auto]: separator between rows of csv output.
     --no-color:    make sure terminal colors are not used
     --ordered:     assume row order is meaningful (default for CSV)
     --output-format [csv|tsv|ssv|psv|json|copy|html]: set format for output
     --padding [dense|sparse|smart]: set padding method for aligning columns
     --table NAME:  compare the named table, used with SQL sources. If name changes, use 'n1:n2'
     --unordered:   assume row order is meaningless (default for json formats)
     -w / --ignore-whitespace: ignore changes in leading/trailing whitespace
     -i / --ignore-case: ignore differences in case

  daff render [--output OUTPUT.html] [--css CSS.css] [--fragment] [--plain] diff.csv
     --css CSS.css: generate a suitable css file to go with the html
     --fragment:    generate just a html fragment rather than a page
     --plain:       do not use fancy utf8 characters to make arrows prettier
     --unquote:     do not quote html characters in html diffs
     --www:         send output to a browser
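When daff is scripted, the exit status from the --fail-if-diff flag of daff diff is the simplest signal to act on. The sketch below assumes a daff executable is on your PATH (for example after npm install daff -g), runs it from Node with child_process.spawnSync, and maps the documented statuses (0 equal, 1 different, 2 problem) to labels. The helper names describeDiffStatus and compareCsvFiles are hypothetical.

```javascript
// Sketch: run the daff CLI from Node and interpret the exit status
// documented for --fail-if-diff. Assumes `daff` is installed and on PATH.
const { spawnSync } = require('child_process');

// Hypothetical helper: map a --fail-if-diff exit status to a readable label.
function describeDiffStatus(status) {
  switch (status) {
    case 0: return 'equal';
    case 1: return 'different';
    case 2: return 'problem';
    default: return 'unknown'; // e.g. null when the process was killed
  }
}

// Hypothetical helper: compare two CSV files and return the label plus
// whatever daff printed to stdout.
function compareCsvFiles(a, b) {
  const result = spawnSync('daff', ['diff', '--fail-if-diff', a, b], { encoding: 'utf8' });
  if (result.error) {
    // Raised when the executable could not be started (ENOENT if not installed).
    return { status: 'problem', output: String(result.error) };
  }
  return { status: describeDiffStatus(result.status), output: result.stdout };
}
```

Keeping the status mapping in its own function makes it easy to test without daff installed.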

Formats supported are CSV, TSV, SQLite (with --input-format sqlite or the .sqlite extension), and ndjson.

Using with git

Run daff git csv to install daff as a diff and merge handler for *.csv files in your repository. Run daff git for instructions on doing this manually. Your CSV diffs and merges will get smarter, since git will then understand rows and columns, not just lines:

Example CSV diff

The library

You can use daff as a library from any supported language. Here we use JavaScript as the example. To use daff on a webpage, first include daff.js:

<script src="daff.js"></script>

Or, if using Node outside the browser:

var daff = require('daff');

For concreteness, assume we have two versions of a table, data1 and data2:

var data1 = [
    ['Country','Capital'],
    ['Ireland','Dublin'],
    ['France','Paris'],
    ['Spain','Barcelona']
];
var data2 = [
    ['Country','Code','Capital'],
    ['Ireland','ie','Dublin'],
    ['France','fr','Paris'],
    ['Spain','es','Madrid'],
    ['Germany','de','Berlin']
];

To make those tables accessible to the library, we wrap them in daff.TableView:

var table1 = new daff.TableView(data1);
var table2 = new daff.TableView(data2);

We can now compute the alignment between the rows and columns in the two tables:

var alignment = daff.compareTables(table1,table2).align();
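To get a feel for what an alignment records, here is a deliberately simplified sketch that pairs columns by header name and rows by the value of a key column. It is not how daff aligns tables (daff does not require a key and handles harder cases), and alignByKey is a hypothetical helper; -1 marks a row or column that exists on only one side.

```javascript
// Simplified alignment sketch, NOT daff's algorithm: columns are matched by
// header name, rows by the value in a chosen key column (assumed unique).
// Tables are arrays of row arrays whose first row is the header.
function alignByKey(rowsA, rowsB, keyName) {
  const headA = rowsA[0];
  const headB = rowsB[0];

  // Column pairs [indexInA, indexInB]; -1 where a column is missing.
  const columns = [];
  headA.forEach((name, i) => columns.push([i, headB.indexOf(name)]));
  headB.forEach((name, j) => {
    if (!headA.includes(name)) columns.push([-1, j]);
  });

  const keyA = headA.indexOf(keyName);
  const keyB = headB.indexOf(keyName);
  if (keyA < 0 || keyB < 0) throw new Error('key column missing: ' + keyName);

  // Index the data rows of B by key (row 0 is the header).
  const rowOfB = new Map();
  for (let j = 1; j < rowsB.length; j++) rowOfB.set(rowsB[j][keyB], j);

  // Row pairs [rowInA, rowInB]; unmatched rows of B are appended at the end.
  const rows = [];
  const matchedB = new Set();
  for (let i = 1; i < rowsA.length; i++) {
    const key = rowsA[i][keyA];
    const j = rowOfB.has(key) ? rowOfB.get(key) : -1;
    if (j >= 0) matchedB.add(j);
    rows.push([i, j]);
  }
  for (let j = 1; j < rowsB.length; j++) {
    if (!matchedB.has(j)) rows.push([-1, j]);
  }
  return { columns, rows };
}
```

For data1 and data2 with key 'Country', this pairs Ireland, France and Spain with their counterparts, reports Germany as [-1, 4], and reports the new Code column as [-1, 1].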

To produce a diff from the alignment, we first need a table for the output:

var data_diff = [];
var table_diff = new daff.TableView(data_diff);

Using default options for the diff:

var flags = new daff.CompareFlags();
var highlighter = new daff.TableDiff(alignment,flags);
highlighter.hilite(table_diff);

The diff is now in data_diff in highlighter format; see the specification here:

http://paulfitz.github.io/daff-doc/spec.html

[ [ '!', '', '+++', '' ],
  [ '@@', 'Country', 'Code', 'Capital' ],
  [ '+', 'Ireland', 'ie', 'Dublin' ],
  [ '+', 'France', 'fr', 'Paris' ],
  [ '->', 'Spain', 'es', 'Barcelona->Madrid' ],
  [ '+++', 'Germany', 'de', 'Berlin' ] ]
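Because the diff is itself just a table, ordinary code can inspect it. This minimal sketch, using the hypothetical helper summarizeHighlighterDiff, counts rows by the tag in their first column and pulls old and new values out of cells on '->' rows. It assumes no cell value itself contains the text '->'.

```javascript
// Sketch: summarize a diff in highlighter format (array of row arrays).
// Assumes the first column holds the row tag, the '@@' row holds column
// names, and updated cells are written as 'old->new'.
function summarizeHighlighterDiff(rows) {
  const counts = {};  // tag -> number of rows carrying it
  const updates = []; // { column, before, after } for each changed cell
  let header = null;
  for (const row of rows) {
    const tag = row[0];
    counts[tag] = (counts[tag] || 0) + 1;
    if (tag === '@@') header = row;
    if (tag !== '->') continue;
    for (let c = 1; c < row.length; c++) {
      const cell = String(row[c]);
      const at = cell.indexOf('->');
      if (at < 0) continue; // unchanged cell on an updated row
      updates.push({
        column: header ? header[c] : c,
        before: cell.slice(0, at),
        after: cell.slice(at + 2),
      });
    }
  }
  return { counts, updates };
}
```

Applied to the data_diff above, it should report one '->' row and a single update of Capital from Barcelona to Madrid.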

For visualization, you may want to convert this to an HTML table with appropriate classes on cells so you can color-code inserts, deletes, updates, etc. You can do this with:

var diff2html = new daff.DiffRender();
diff2html.render(table_diff);
var table_diff_html = diff2html.html();
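To illustrate the idea behind DiffRender without relying on its markup, here is a toy renderer that turns highlighter rows into an HTML string and attaches a CSS class based on each row's tag. The function renderToyHtml and the class names are made up for this sketch; for real output, use daff.DiffRender as shown above.

```javascript
// Toy renderer, not daff.DiffRender: one <tr> per diff row, with a CSS
// class chosen from the row tag. Class names are invented for this sketch.
const ROW_CLASS = {
  '+++': 'row-add',
  '---': 'row-remove',
  '->': 'row-modify',
  '@@': 'row-header',
};

// Escape characters that are special in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function renderToyHtml(rows) {
  const body = rows.map((row) => {
    const cls = ROW_CLASS[row[0]];
    const attr = cls ? ` class="${cls}"` : '';
    const cells = row.map((cell) => `<td>${escapeHtml(cell)}</td>`).join('');
    return `<tr${attr}>${cells}</tr>`;
  });
  return `<table>\n${body.join('\n')}\n</table>`;
}
```

Escaping matters here because the '->' marker in updated cells contains '>'.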

For 3-way differences (that is, comparing two tables given knowledge of a common ancestor), use daff.compareTables3, passing the ancestor table as the first argument.
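The value of a common ancestor is easiest to see cell by cell. This sketch shows the textbook three-way rule: if only one side changed a cell relative to the ancestor, take that change; if both sides changed it differently, report a conflict. It is a generic illustration with a hypothetical mergeCell helper, not daff's merge code.

```javascript
// Generic three-way merge of a single cell, given the common ancestor
// value (parent) and the two descendant values a and b.
function mergeCell(parent, a, b) {
  if (a === b) return { value: a, conflict: false };      // both sides agree
  if (a === parent) return { value: b, conflict: false }; // only b changed
  if (b === parent) return { value: a, conflict: false }; // only a changed
  return { value: null, conflict: true, parent, a, b };   // both changed, differently
}
```

Without the ancestor, a cell that differs between a and b is ambiguous; with it, most such cells resolve automatically.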

Here is how to apply that difference as a patch:

var patcher = new daff.HighlightPatch(table1,table_diff);
patcher.apply();
// table1 should now equal table2
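A toy version of patching helps show what the highlighter rows mean. The hypothetical applyToyPatch below handles only '+++', '---' and '->' rows, and assumes the column layout is unchanged (no '!' row) and that the first column is a unique key; daff.HighlightPatch is more general than this.

```javascript
// Toy patcher, not daff.HighlightPatch: applies row inserts ('+++'),
// deletes ('---') and updates ('->') to an array-of-arrays table whose
// first row is the header. Assumes identical column order in table and
// diff, and a unique key in the first column.
function applyToyPatch(table, diffRows) {
  const out = table.map((row) => row.slice()); // leave the input untouched
  // Find a data row (never the header at index 0) by its key.
  const findRow = (key) => out.findIndex((row, i) => i > 0 && row[0] === key);

  for (const diffRow of diffRows) {
    const tag = diffRow[0];
    const cells = diffRow.slice(1);
    if (tag === '+++') {
      out.push(cells);
    } else if (tag === '---') {
      const i = findRow(cells[0]);
      if (i > 0) out.splice(i, 1);
    } else if (tag === '->') {
      // Locate the row by the old value of the key cell.
      const key = String(cells[0]).split('->')[0];
      const i = findRow(key);
      if (i < 0) throw new Error('row not found: ' + key);
      // Keep unchanged cells; take the new side of 'old->new' cells.
      out[i] = cells.map((cell) => {
        const s = String(cell);
        const at = s.indexOf('->');
        return at < 0 ? cell : s.slice(at + 2);
      });
    }
    // Other tags ('@@' header, context rows, gaps) need no action here.
  }
  return out;
}
```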

For other languages, you should find sample code in the packages on the Releases page.

Supported languages

The daff library is written in Haxe, which can be translated reasonably well into at least the following languages: JavaScript, PHP, Python, Java, C# and C++.

Some translations are done for you on the Releases page. To make another translation, or to compile from source, first follow the Haxe language introduction for the language you care about. At the time of writing, if you are on OS X, you should install Haxe using brew install haxe. Then do one of:

make js
make php
make py
make java
make cs
make cpp

For each language, the daff library expects to be handed an interface to tables you create, rather than creating them itself. This is to avoid inefficient copies from one format to another. You'll find a SimpleTable class you can use if you find this awkward.
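As an illustration of the adapter approach, here is a minimal wrapper over an array of rows that shares the caller's data instead of copying it. The class RowArrayTable and its method names are hypothetical and do not reproduce daff's table interface; consult the API documentation for that.

```javascript
// Hypothetical adapter: exposes an existing array of row arrays through a
// small table-like interface without copying the data.
class RowArrayTable {
  constructor(rows) {
    this.rows = rows; // shared with the caller, not copied
  }
  get height() {
    return this.rows.length;
  }
  get width() {
    return this.rows.length > 0 ? this.rows[0].length : 0;
  }
  getCell(x, y) {
    return this.rows[y][x]; // x = column, y = row
  }
  setCell(x, y, value) {
    this.rows[y][x] = value; // writes through to the caller's array
  }
}
```

Because writes go straight to the caller's array, no conversion step is needed before or after the library works on the table.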

Other possibilities:

API documentation

Sponsors


The Data Commons Co-op, "perhaps the geekiest of all cooperative organizations on the planet," has given great moral support during the development of `daff`. Donate a multiple of `42.42` in your currency to let them know you care: https://datacommons.coop/donate/.

Reading material

License

daff is distributed under the MIT License.
