Numeric Fu for the command line

nfu: Numeric Fu for your shell

NOTE: nfu is unlikely to receive any more major updates, as I'm currently working on its successor ni.

nfu is a text data hub and transformation tool with a large set of composable functions and source/sink adapters. For example, if you wanted to do a map-side inner join between a PostgreSQL table, a CSV from the Internet, and stuff on HDFS and gather the results into a sorted/uniqued text file:

$ nfu sql:P@:'%*mytable' \
      -i0 @[ http://data.com/csv -F , ] \
      -H@::H. [ -i0 hdfsjoin:/path/to/hdfs/data ] ^gcf1. \
      -g \
  > output

# equivalent long version
$ nfu sql:Pdbname:'select * from mytable' \
      --index 0 @[ http://data.com/csv --fieldsplit , ] \
      --hadoop /tmp/temp-resharded-upload-path [ ] [ ] \
      --hadoop . [ --index 0 hdfsjoin:/path/to/hdfs/data ] \
                 [ --group --count --fields 1. ] \
      --group \
  > output

Then if you wanted to plot a cumulative histogram of the .metadata.size JSON field from the third column values, binned to the nearest 100:

$ nfu output -m 'jd(%2).metadata.size' -q100ocOs1f10p %l

# equivalent long version
$ nfu output --map 'json_decode($_[2]).metadata.size' \
             --quant 100 --order --count --rorder \
             --sum 1 --fields 10 --plot 'with lines'
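
The idea behind that pipeline (quantize, count per bin, then accumulate) can be sketched with standard tools. This is only an approximation for illustration: the helper name `cumulative_hist` is hypothetical, it assumes one non-negative number per line, and its column order and tie-rounding behavior are not claimed to match nfu's.

```shell
# cumulative_hist BIN: read one number per line on stdin and print
# "bin<TAB>cumulative count" in ascending bin order.
cumulative_hist() {
  awk -v bin="$1" '{ print int($1 / bin + 0.5) * bin }' |  # round to nearest bin
    sort -n |                                              # ascending numeric order
    uniq -c |                                              # count rows per bin
    awk -v OFS='\t' '{ total += $1; print $2, total }'     # running sum of the counts
}
```

For example, `printf '120\n180\n40\n' | cumulative_hist 100` produces the bins 0, 100 and 200 with cumulative counts 1, 2 and 3.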

Documentation

Contributors

MIT license as usual.

Options and stuff

If you invoke nfu with no arguments, it will give you the following summary:

usage: nfu [prefix-commands...] [input-files...] commands...
where each command is one of the following:

  -A|--aggregate  (1) <aggregator fn>
     --append     (1) <pseudofile; appends its contents to current stream>
  -a|--average    (0) -- window size (0 for full average) -- running average
  -b|--branch     (1) <branch (takes a pattern map)>
  -R|--buffer     (1) <creates a pseudofile from the data stream>
  -c|--count      (0) -- counts by first column value; like uniq -c
  -S|--delta      (0) -- value -> difference from last value
  -D|--drop       (0) -- number of records to drop
     --duplicate  (2) <two shell commands as separate arguments>
  -e|--each       (1) <template; executes with {} set to each value>
     --entropy    (0) -- running entropy of relative probabilities/frequencies
  -E|--every      (1) <n (returns every nth row)>
  -L|--exp        (0) -- optional base (default e)
  -f|--fields     (0) -- string of digits, each a zero-indexed column selector
  -F|--fieldsplit (1) <regexp to use for splitting>
     --fold       (1) <function that returns true when line should be folded>
  -g|--group      (0) -- sorts ascending, takes optional column list
  -H|--hadoop     (3) <hadoop streaming: outpath|.|@, mapper|:, reducer|:|_>
     --http       (1) <HTTP adapter for TCP server output>
  -i|--index      (2) <field index, unsorted pseudofile to join against>
  -I|--indexouter (2) <field index, unsorted pseudofile to join against>
  -z|--intify     (0) -- convert column to dense integers (linear space)
  -j|--join       (2) <field index, sorted pseudofile to join against>
  -J|--joinouter  (2) <field index, sorted pseudofile to join against>
  -k|--keep       (1) <row filter fn>
  -l|--log        (0) -- optional base (default e)
  -m|--map        (1) <row map fn>
     --mplot      (1) <gnuplot arguments per column, separated by ;>
  -N|--ntiles     (1) <takes N, produces ntiles of numbers>
  -n|--number     (0) -- prepends line number to each line
     --octave     (1) <pipe through octave; vector is called xs>
  -o|--order      (0) -- sorts ascending by general numeric value
     --partition  (2) <partition id fn, shell command (using {})>
     --pipe       (1) <shell command to pipe through>
  -p|--plot       (1) <gnuplot arguments>
  -M|--pmap       (1) <row map fn (executed multiple times in parallel)>
  -P|--poll       (2) <interval in seconds, command whose output to collect>
     --prepend    (1) <pseudofile; prepends its contents to current stream>
     --preview    (0) 
  -q|--quant      (1) <number to round to>
  -r|--read       (0) -- reads pseudofiles from the data stream
  -K|--remove     (1) <inverted row filter fn>
     --repeat     (2) <repeat count, pseudofile to repeat>
  -G|--rgroup     (0) -- sorts descending, takes optional column list
  -O|--rorder     (0) -- sorts descending by general numeric value
     --sample     (1) <row selection probability in [0, 1]>
     --sd         (0) -- running standard deviation
     --splot      (1) <gnuplot arguments>
  -Q|--sql        (3) <create/query SQL table: db[:[+]table], schema|_, query|_>
  -s|--sum        (0) -- value -> total += value
  -T|--take       (0) -- n to take first n, +n to take last n
     --tcp        (1) <TCP server (emits fifo filenames)>
     --tee        (1) <shell command; duplicates data to stdin of command>
  -C|--uncount    (0) -- the opposite of --count; repeats each row N times
  -V|--variance   (0) -- running variance
  -w|--with       (1) <pseudofile to join column-wise onto input>
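
Several of the entries above describe running (row-by-row) transformations. A minimal awk sketch of three of them, assuming single-column numeric input, may make the one-line descriptions more concrete. The helper names are hypothetical, and treating the value before the first row as 0 in the delta case is an assumption rather than documented behavior.

```shell
# Each helper reads one number per line on stdin.
running_sum()   { awk '{ total += $1; print total }'; }        # like --sum: value -> total += value
running_delta() { awk '{ print $1 - last; last = $1 }'; }      # like --delta: first row compared against 0 (assumption)
running_avg()   { awk '{ total += $1; print total / NR }'; }   # like --average with window size 0 (full average)
```

Under that assumption the delta is the inverse of the sum: piping `running_sum` into `running_delta` returns the original values.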

and prefix commands are:

  documentation (not used with normal commands):
    --explain           <other-options>
    --expand-pseudofile <filename>
    --expand-code       <code>
    --expand-gnuplot    <gnuplot options>
    --expand-sql        <sql>

  pipeline modifiers:
    --quote     -- quotes args: eval $(nfu --quote ...)
    --use       <file.pl>
    --run       <perl code>

argument bracket preprocessing:

  ^stuff -> [ -stuff ]

   [ ]    nfu as function: [ -gc ]     == "$(nfu --quote -gc)"
  @[ ]    nfu as data:    @[ -gc foo ] == sh:"$(nfu --quote -gc foo)"
  q[ ]    quote things:   q[ foo bar ] == "foo bar"
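
The `^stuff` rule is a purely textual rewrite of a single argument. A small sketch of that one rule follows; the helper name `expand_caret` is hypothetical, and it prints the rewritten form instead of re-parsing it the way nfu would.

```shell
# expand_caret ARG: show how a ^-prefixed argument would be rewritten.
expand_caret() {
  case "$1" in
    ^*) printf '[ -%s ]\n' "${1#^}" ;;   # ^stuff -> [ -stuff ]
    *)  printf '%s\n' "$1" ;;            # anything else is left alone
  esac
}
```

With this, `expand_caret '^gcf1.'` prints `[ -gcf1. ]`, which corresponds to the `[ --group --count --fields 1. ]` block in the long form of the first example.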

pseudofile patterns:

  file.bz2       decompress file with bzip2 -dc
  file.gz        decompress file with gzip -dc
  file.lzo       decompress file with lzop -dc
  file.xz        decompress file with xz -dc
  hdfs:path      read HDFS file(s) with hadoop fs -text
  hdfsjoin:path  mapside join pseudofile (a subset of hdfs:path)
  http[s]://url  retrieve url with curl
  id:X           verbatim text X
  n:number       numbers from 1 to n, inclusive
  perl:expr      perl -e 'print "$_\n" for (expr)'
  s3://url       access S3 using s3cmd
  sh:stuff       run sh -c "stuff", take stdout
  sql:db:query   results of query as TSV
  user@host:x    remote data access (x can be a pseudofile)
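
To illustrate how a pseudofile name selects a data source, here is a sketch of a dispatcher covering a few of the patterns above. The helper name `open_pseudofile` is hypothetical; the commands are the ones the table names, while the matching order, the trailing newline after `id:` text, and the plain-file fallback are assumptions.

```shell
# open_pseudofile NAME: print the data that NAME stands for.
open_pseudofile() {
  case "$1" in
    id:*)  printf '%s\n' "${1#id:}" ;;   # verbatim text X
    n:*)   seq 1 "${1#n:}" ;;            # numbers from 1 to n, inclusive
    sh:*)  sh -c "${1#sh:}" ;;           # stdout of a shell command
    *.gz)  gzip -dc "$1" ;;              # compressed files are decompressed on read
    *.bz2) bzip2 -dc "$1" ;;
    *.xz)  xz -dc "$1" ;;
    *)     cat "$1" ;;                   # fallback: treat NAME as a plain file
  esac
}
```

For instance, `open_pseudofile n:3` prints 1, 2 and 3 on separate lines, and `open_pseudofile 'sh:echo hi'` prints `hi`.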

gnuplot expansions:

  %d -> ' with dots'
  %i -> ' with impulses'
  %l -> ' with lines'
  %p -> ' lc palette '
  %t -> ' title '
  %u -> ' using '
  %v -> ' with vectors '

SQL expansions:

  %* -> ' select * from '
  %c -> ' select count(1) from '
  %d -> ' select distinct * from '
  %g -> ' group by '
  %j -> ' inner join '
  %l -> ' outer left join '
  %r -> ' outer right join '
  %w -> ' where '
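
The SQL shorthands behave like simple text substitutions, which is how `'%*mytable'` in the first example becomes `select * from mytable`. A sketch with sed, covering a subset of the table, is shown below; the helper name `expand_sql` is hypothetical, and applying the rules as independent global substitutions is an assumption about the mechanism.

```shell
# expand_sql TEXT: rewrite a few % shorthands into SQL fragments.
expand_sql() {
  printf '%s\n' "$1" | sed \
    -e 's/%\*/ select * from /g' \
    -e 's/%c/ select count(1) from /g' \
    -e 's/%d/ select distinct * from /g' \
    -e 's/%g/ group by /g' \
    -e 's/%j/ inner join /g' \
    -e 's/%w/ where /g'
}
```

Each replacement carries its own surrounding spaces, so shorthands can be written directly against table names, as in `expand_sql '%cmytable%wx > 1'`.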

database prefixes:

  P = PostgreSQL
  S = SQLite 3

environment variables:

  NFU_ALWAYS_VERBOSE    if set, nfu will be verbose all the time
  NFU_HADOOP_COMMAND    hadoop executable; e.g. hadoop jar, hadoop fs -ls
  NFU_HADOOP_OPTIONS    -D options for hadoop streaming jobs
  NFU_HADOOP_STREAMING  absolute location of hadoop-streaming.jar
  NFU_HADOOP_TMPDIR     default /tmp; temp dir for hadoop uploads
  NFU_MAX_FILEHANDLES   default 64; maximum #subprocesses for --partition
  NFU_NO_PAGER          if set, nfu will not use "less" to preview stdout
  NFU_PMAP_PARALLELISM  number of subprocesses for -M
  NFU_SORT_BUFFER       default 256M; size of in-memory sort for -g and -o
  NFU_SORT_COMPRESS     default none; compression program for sort tempfiles
  NFU_SORT_PARALLEL     default 4; number of concurrent sorts to run

see https://github.com/spencertipping/nfu for documentation
