  • Stars: 8,154
  • Rank 4,553 (Top 0.09%)
  • Language: Go
  • License: Other
  • Created over 9 years ago
  • Updated about 1 year ago


Repository Details

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

What is Miller?

Miller is like awk, sed, cut, join, and sort for data formats such as CSV, TSV, JSON, JSON Lines, and positionally-indexed.

What can Miller do for me?

With Miller, you get to use named fields without needing to count positional indices, using familiar formats such as CSV, TSV, JSON, JSON Lines, and positionally-indexed. Then, on the fly, you can add new fields which are functions of existing fields, drop fields, sort, aggregate statistically, pretty-print, and more.

cover-art

  • Miller operates on key-value-pair data while the familiar Unix tools operate on integer-indexed fields: if the natural data structure for the latter is the array, then Miller's natural data structure is the insertion-ordered hash map.

  • Miller handles a variety of data formats, including but not limited to the familiar CSV, TSV, and JSON/JSON Lines. (Miller can handle positionally-indexed data too!)

In the above image you can see how Miller embraces the common themes of key-value-pair data in a variety of data formats.
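Here is a minimal Go sketch of the insertion-ordered hash map idea. A record pairs a map, used for lookup by field name, with a slice of keys, used for output order. The type and its methods (OrderedRecord, Put, Get, Remove) are hypothetical names for illustration, not Miller's internal API.

```go
package main

import (
	"fmt"
	"strings"
)

// OrderedRecord is a hypothetical insertion-ordered map from field name to value:
// the map gives fast lookup, and the keys slice remembers insertion order.
type OrderedRecord struct {
	keys   []string
	values map[string]string
}

// NewOrderedRecord returns an empty record.
func NewOrderedRecord() *OrderedRecord {
	return &OrderedRecord{values: make(map[string]string)}
}

// Put sets a field. A new key goes at the end; an existing key keeps its position.
func (r *OrderedRecord) Put(key, value string) {
	if _, ok := r.values[key]; !ok {
		r.keys = append(r.keys, key)
	}
	r.values[key] = value
}

// Get returns the value for key and whether it is present.
func (r *OrderedRecord) Get(key string) (string, bool) {
	v, ok := r.values[key]
	return v, ok
}

// Remove deletes a field, keeping the remaining fields in their original order.
func (r *OrderedRecord) Remove(key string) {
	if _, ok := r.values[key]; !ok {
		return
	}
	delete(r.values, key)
	for i, k := range r.keys {
		if k == key {
			r.keys = append(r.keys[:i], r.keys[i+1:]...)
			break
		}
	}
}

// String renders the record as key=value pairs in insertion order.
func (r *OrderedRecord) String() string {
	parts := make([]string, 0, len(r.keys))
	for _, k := range r.keys {
		parts = append(parts, k+"="+r.values[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	rec := NewOrderedRecord()
	rec.Put("color", "red")
	rec.Put("shape", "square")
	rec.Put("quantity", "7")
	rec.Put("color", "blue") // overwriting keeps "color" in first position
	rec.Remove("shape")
	fmt.Println(rec) // color=blue,quantity=7
}
```

A plain Go map alone would not be enough, because Go leaves map iteration order unspecified. The keys slice is what keeps fields in the order they were added. Removal here is linear in the number of fields, which is a reasonable trade-off for records with a modest number of columns.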

Getting started


Installing

There's a good chance you can get Miller pre-built for your system:

Packages are listed for Ubuntu (including Ubuntu 16.04 LTS), Fedora, Debian, Gentoo, Pro-Linux, Arch Linux, NetBSD, FreeBSD, Anaconda, Homebrew (macOS), MacPorts (macOS), Chocolatey, and WinGet.

  • Linux:
    • yum install miller
    • apt-get install miller
  • Mac:
    • brew install miller
    • port install miller
  • Windows:
    • choco install miller
    • winget install Miller.Miller

See also README-versions.md for a full list of package versions. Note that long-term-support (LTS) releases will likely ship older versions.

See also building from source.


Building from source

  • First:
    • cd /where/you/want/to/put/the/source
    • git clone https://github.com/johnkerl/miller
    • cd miller
  • With make:
    • To build: make. This takes just a few seconds and produces the Miller executable, which is ./mlr (or .\mlr.exe on Windows).
    • To run tests: make check.
    • To install: make install. This installs the executable /usr/local/bin/mlr and manual page /usr/local/share/man/man1/mlr.1 (so you can do man mlr).
    • You can do ./configure --prefix=/some/install/path before make install if you want to install somewhere other than /usr/local.
  • Without make:
    • To build: go build github.com/johnkerl/miller/cmd/mlr.
    • To run tests: go test github.com/johnkerl/miller/pkg/... and mlr regtest.
    • To install: go install github.com/johnkerl/miller/cmd/mlr will install to GOPATH/bin/mlr.
  • See also the doc page on building from source.
  • For more developer information please see README-dev.md.


License

License: BSD2

Features

  • Miller is multi-purpose: it's useful for data cleaning, data reduction, statistical reporting, devops, system administration, log-file processing, format conversion, and database-query post-processing.

  • You can use Miller to snarf and munge log-file data, including selecting out relevant substreams, then produce CSV format and load that into all-in-memory/data-frame utilities for further statistical and/or graphical processing.

  • Miller complements data-analysis tools such as R, pandas, etc.: you can use Miller to clean and prepare your data. While you can do basic statistics entirely in Miller, its streaming-data feature and single-pass algorithms enable you to reduce very large data sets.

  • Miller complements SQL databases: you can slice, dice, and reformat data on the client side on its way into or out of a database. You can also reap some of the benefits of databases for quick, setup-free one-off tasks when you just need to query some data in disk files in a hurry.

  • Miller also goes beyond the classic Unix tools by stepping fully into our modern, NoSQL world: its essential record-heterogeneity property allows Miller to operate on data where records with different schemas (field names) are interleaved.

  • Miller is streaming: most operations need only a single record in memory at a time, rather than ingesting all input before producing any output. For those operations which require deeper retention (sort, tac, stats1), Miller retains only as much data as needed. This means that whenever functionally possible, you can operate on files which are larger than your system's available RAM, and you can use Miller in tail -f contexts.

  • Miller is pipe-friendly and interoperates with the Unix toolkit.

  • Miller's I/O formats include tabular pretty-printing, positionally indexed (Unix-toolkit style), CSV, TSV, JSON, JSON Lines, and others.

  • Miller does conversion between formats.

  • Miller's processing is format-aware: e.g. CSV sort and tac keep header lines first.

  • Miller has high-throughput performance on par with the Unix toolkit.

  • Miller is written in portable, modern Go, with zero runtime dependencies. You can download or compile a single binary, scp it to a faraway machine, and expect it to work.

What people are saying about Miller

Today I discovered Miller - it's like jq but for CSV: https://t.co/pn5Ni241KM

Also, "Miller complements data-analysis tools such as R, pandas, etc.: you can use Miller to clean and prepare your data." @GreatBlueC @nfmcclure

- Adrien Trouillaud (@adrienjt) September 24, 2020

Underappreciated swiss-army command-line chainsaw.

"Miller is like awk, sed, cut, join, and sort for [...] CSV, TSV, and [...] JSON." https://t.co/TrQqSUK3KK

- Dirk Eddelbuettel (@eddelbuettel) February 28, 2017

Miller looks like a great command line tool for working with CSV data. Sed, awk, cut, join all rolled into one: http://t.co/9BBb6VCZ6Y

- Mike Loukides (@mikeloukides) August 16, 2015

Miller is like sed, awk, cut, join, and sort for name-indexed data such as CSV: http://t.co/1zPbfg6B2W - handy tool!

- Ilya Grigorik (@igrigorik) August 22, 2015

Btw, I think Miller is the best CLI tool to deal with CSV. I used to use this when I need to preprocess too big CSVs to load into R (now we have vroom, so such cases might be rare, though...) https://t.co/kUjrSSGJoT

- Hiroaki Yutani (@yutannihilat_en) April 21, 2020

Miller: a *format-aware* data munging tool By @__jo_ker__ to overcome limitations with *line-aware* workhorses like awk, sed et al https://t.co/LCyPkhYvt9

The project website is a fantastic example of good software documentation!!

- Donny Daniel (@dnnydnl) September 9, 2018

Holy holly data swiss army knife batman! How did no one suggest Miller https://t.co/JGQpmRAZLv for solving database cleaning / ETL issues to me before

Congrats to @__jo_ker__ for amazingly intuitive tool for critical data management tasks!#DataScienceandLaw #ComputationalLaw

- James Miller (@japanlawprof) June 12, 2018

🤯 @__jo_ker__'s Miller easily reads, transforms, + writes all sorts of tabular data. It's standalone, fast, and built for streaming data (operating on one line at a time, so you can work on files larger than memory).

And the docs are dream. I've been reading them all morning! https://t.co/Be2pGPZK6t

- Benjamin Wolfe (he/him) (@BenjaminWolfe) September 9, 2021

Contributors ✨

Thanks to all the fine people who help make Miller better (emoji key):


  • Andrea Borruso 🤔 🎨
  • Shaun Jackman 🤔
  • Fred Trotter 🤔 🎨
  • komosa 🤔
  • jungle-boogie 🤔
  • Thomas Klausner 🚇
  • Stephen Kitt 📦
  • Leah Neukirchen 🤔
  • Luigi Baldoni 📦
  • Hiroaki Yutani 🤔
  • Daniel M. Drucker 🤔
  • Nikos Alexandris 🤔
  • kundeng 📦
  • Victor Sergienko 📦
  • Adrian Ho 🎨
  • zachp 📦
  • David Selassie 🤔
  • Joel Parker Henderson 🤔
  • Michel Ace 🤔
  • Matus Goljer 🤔
  • Richard Patel 📦
  • Jakub Podlaha 🎨
  • Miodrag Milić 📦
  • Derek Mahar 🤔
  • spmundi 🤔
  • Peter Körner 🛡️
  • rubyFeedback 🤔
  • rbolsius 📦
  • awildturtok 🤔
  • agguser 🤔
  • jganong 🤔
  • Fulvio Scapin 🤔
  • Jordan Torbiak 🤔
  • Andreas Weber 🤔
  • vapniks 📦
  • Zombo 📦
  • Brian Fulton-Howard 📦
  • ChCyrill 🤔
  • Jauder Ho 💻
  • Paweł Sacawa 🐛
  • schragge 📖
  • Jordi 📖 🤔

This project follows the all-contributors specification. Contributions of any kind are welcome!
