  • Language: R
  • License: MIT License

Manipulate CSV files on the command line using dplyr

dplyr-cli

dplyr-cli uses the Rscript executable to run dplyr commands on CSV files in the terminal.

dplyr-cli makes use of the terminal pipe | instead of the magrittr pipe (%>%) to run sequences of commands. The example below assumes the shell aliases from Example 3 have been set up.

cat mtcars.csv | group_by cyl | summarise "mpg = mean(mpg)" | kable
#> | cyl|      mpg|
#> |---:|--------:|
#> |   4| 26.66364|
#> |   6| 19.74286|
#> |   8| 15.10000|

Motivation

I wanted to be able to do quick hacks on CSV files on the command line using dplyr syntax, but without actually starting a proper R session.

What dplyr commands are supported?

Any command of the form:

  • dplyr::verb(.data, code)
  • dplyr::*_join(.data, .rhs)

Currently two extra commands are supported which are not part of dplyr.

  • csv performs no dplyr command, but only outputs the input data as CSV to stdout
  • kable performs no dplyr command, but only outputs the input data as a knitr::kable() formatted string to stdout

Limitations

  • Only tested under ‘bash’ on OSX. YMMV.
  • Every command runs in a separate R session.
  • When using special shell characters such as (), you’ll have to quote your code arguments. Some shells will require more quoting than others.
  • “joins” (such as left_join) do not currently let you specify the by argument, so there must be columns in common to both datasets
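
To illustrate the quoting limitation above, here is a minimal sketch of how the shell splits arguments before dplyr-cli ever sees them. show_args is a hypothetical helper written for this illustration, not part of dplyr-cli; it only prints each argument it receives. The behaviour shown is for bash and POSIX sh; other shells may treat some characters differently.

```shell
# Hypothetical helper (not part of dplyr-cli): print each received argument
# on its own line, wrapped in brackets, to show how the shell split them.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Quoted: the whole expression reaches the program as a single argument.
show_args summarise "mpg = mean(mpg)"
#  [summarise]
#  [mpg = mean(mpg)]

# Unquoted, no special characters: the code is split on whitespace
# into three separate arguments (after the command name).
show_args filter cyl == 8
#  [filter]
#  [cyl]
#  [==]
#  [8]
```

In bash, an unquoted < would be treated as an input redirection and unquoted parentheses as a syntax error, so expressions such as "mpg < 11" or "mpg = mean(mpg)" need quoting. Quoting every code argument is the simplest habit.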

Usage

dplyr --help
#  dplyr-cli
#  
#  Usage:
#      dplyr <command> [--file=fn] [--csv | -c] [--verbose | -v] [<code>...]
#      dplyr -h | --help
#  
#  Options:
#      -h --help            show this help text
#      -f FILE --file=FILE  input CSV or RDS filename. If reading from stdin, assumes CSV [default: stdin]
#      -c --csv             write output to stdout in CSV format (instead of default RDS file)
#      -v --verbose         be verbose

History

v0.1.0 2020-04-20

  • Initial release

v0.1.1 2020-04-21

  • Switch to ‘Rscript’ for easier install for users
  • rename ‘dplyr.sh’ to just ‘dplyr’

v0.1.2 2020-04-21

  • Support for joins e.g. left_join

v0.1.3 2020-04-22

  • More robust tmpdir handling

v0.1.4 2022-01-23

  • Fix handling for latest read_csv(). Fixes #9

Installation

Because this script straddles a great divide between R and the shell, you need to ensure both are set up correctly for this to work.

  1. Install R packages
  2. Clone this repo and put dplyr in your path

Install R packages - within R

dplyr-cli is run from the shell, but every invocation starts a new R session in which the following packages are expected to be installed:

install.packages('readr')    # read in CSV data
install.packages('dplyr')    # data manipulation
install.packages('docopt')   # CLI description language

Installing packages from the command line

To do this from the command line on a Linux-like system, install r-base (sudo apt -y install r-base) and then run:

sudo su - -c "R -e \"install.packages('readr', repos='http://cran.rstudio.com/')\""
sudo su - -c "R -e \"install.packages('dplyr', repos='http://cran.rstudio.com/')\""
sudo su - -c "R -e \"install.packages('docopt', repos='http://cran.rstudio.com/')\""

Clone this repo and put dplyr in your path

You’ll then need to download the shell script from this repository and put dplyr somewhere in your path.

git clone https://github.com/coolbutuseless/dplyr-cli
cp dplyr-cli/dplyr ./somewhere/in/your/search/path
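
As a slightly more concrete sketch of the copy step above: install_script is a hypothetical helper, and the choice of $HOME/bin as the destination is an assumption. Any directory on your PATH will do.

```shell
# Hypothetical helper (not part of dplyr-cli): copy a script into a
# directory, creating the directory if needed, and mark it executable.
install_script() {
  is_src=$1
  is_dest_dir=$2
  mkdir -p "$is_dest_dir" &&
    cp "$is_src" "$is_dest_dir/" &&
    chmod +x "$is_dest_dir/$(basename "$is_src")"
}

# Example usage, assuming the repo was cloned into ./dplyr-cli and that
# $HOME/bin is the directory you want on your PATH:
#   install_script dplyr-cli/dplyr "$HOME/bin"
#   export PATH="$HOME/bin:$PATH"   # add to your shell profile to persist
#   command -v dplyr                # prints the path if the shell can find it
```

If command -v dplyr prints nothing, the destination directory is not on your PATH in the current shell.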

Example data

Put an example CSV file on the filesystem. Note: this file is now included in this git repository as mtcars.csv, as is a second CSV file for demonstrating joins, cyl.csv. To regenerate mtcars.csv yourself, run the following within R:

write.csv(mtcars, "mtcars.csv", row.names = FALSE)

Example 1 - Basic Usage

# cat contents of input CSV into dplyr-cli.  
# Use '-c' to output CSV if this is the final step
cat mtcars.csv | dplyr filter -c "mpg == 21"
#  "mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
#  21,6,160,110,3.9,2.62,16.46,0,1,4,4
#  21,6,160,110,3.9,2.875,17.02,0,1,4,4

# Put quotes around any commands which contain special characters like <>()
cat mtcars.csv | dplyr filter -c "mpg < 11"
#  "mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
#  10.4,8,472,205,2.93,5.25,17.98,0,0,3,4
#  10.4,8,460,215,3,5.424,17.82,0,0,3,4

# Combine dplyr commands with shell 'head' command
dplyr select --file mtcars.csv -c cyl | head -n 6
#  "cyl"
#  6
#  6
#  4
#  6
#  8

Example 2 - Simple piping of commands (with shell pipe, not magrittr pipe)

cat mtcars.csv | \
   dplyr mutate "cyl2 = 2 * cyl"  | \
   dplyr filter "cyl == 8" | \
   dplyr kable
#  |  mpg| cyl|  disp|  hp| drat|    wt|  qsec| vs| am| gear| carb| cyl2|
#  |----:|---:|-----:|---:|----:|-----:|-----:|--:|--:|----:|----:|----:|
#  | 18.7|   8| 360.0| 175| 3.15| 3.440| 17.02|  0|  0|    3|    2|   16|
#  | 14.3|   8| 360.0| 245| 3.21| 3.570| 15.84|  0|  0|    3|    4|   16|
#  | 16.4|   8| 275.8| 180| 3.07| 4.070| 17.40|  0|  0|    3|    3|   16|
#  | 17.3|   8| 275.8| 180| 3.07| 3.730| 17.60|  0|  0|    3|    3|   16|
#  | 15.2|   8| 275.8| 180| 3.07| 3.780| 18.00|  0|  0|    3|    3|   16|
#  | 10.4|   8| 472.0| 205| 2.93| 5.250| 17.98|  0|  0|    3|    4|   16|
#  | 10.4|   8| 460.0| 215| 3.00| 5.424| 17.82|  0|  0|    3|    4|   16|
#  | 14.7|   8| 440.0| 230| 3.23| 5.345| 17.42|  0|  0|    3|    4|   16|
#  | 15.5|   8| 318.0| 150| 2.76| 3.520| 16.87|  0|  0|    3|    2|   16|
#  | 15.2|   8| 304.0| 150| 3.15| 3.435| 17.30|  0|  0|    3|    2|   16|
#  | 13.3|   8| 350.0| 245| 3.73| 3.840| 15.41|  0|  0|    3|    4|   16|
#  | 19.2|   8| 400.0| 175| 3.08| 3.845| 17.05|  0|  0|    3|    2|   16|
#  | 15.8|   8| 351.0| 264| 4.22| 3.170| 14.50|  0|  1|    5|    4|   16|
#  | 15.0|   8| 301.0| 335| 3.54| 3.570| 14.60|  0|  1|    5|    8|   16|

Example 3 - set up some aliases for convenience

alias mutate="dplyr mutate"
alias filter="dplyr filter"
alias select="dplyr select"
alias summarise="dplyr summarise"
alias group_by="dplyr group_by"
alias ungroup="dplyr ungroup"
alias count="dplyr count"
alias arrange="dplyr arrange"
alias kable="dplyr kable"


cat mtcars.csv | group_by cyl | summarise "mpg = mean(mpg)" | kable
#  | cyl|      mpg|
#  |---:|--------:|
#  |   4| 26.66364|
#  |   6| 19.74286|
#  |   8| 15.10000|

Example 4 - joins

Limitations:

  • first argument after a join command must be an existing file (either CSV or RDS)
  • You can’t yet specify a by argument for a join, so there must be a column in common to join by

cat cyl.csv
#  cyl,description
#  4,four
#  6,six

cat mtcars.csv | dplyr inner_join cyl.csv | dplyr kable
#  |  mpg| cyl|  disp|  hp| drat|    wt|  qsec| vs| am| gear| carb|description |
#  |----:|---:|-----:|---:|----:|-----:|-----:|--:|--:|----:|----:|:-----------|
#  | 21.0|   6| 160.0| 110| 3.90| 2.620| 16.46|  0|  1|    4|    4|six         |
#  | 21.0|   6| 160.0| 110| 3.90| 2.875| 17.02|  0|  1|    4|    4|six         |
#  | 22.8|   4| 108.0|  93| 3.85| 2.320| 18.61|  1|  1|    4|    1|four        |
#  | 21.4|   6| 258.0| 110| 3.08| 3.215| 19.44|  1|  0|    3|    1|six         |
#  | 18.1|   6| 225.0| 105| 2.76| 3.460| 20.22|  1|  0|    3|    1|six         |
#  | 24.4|   4| 146.7|  62| 3.69| 3.190| 20.00|  1|  0|    4|    2|four        |
#  | 22.8|   4| 140.8|  95| 3.92| 3.150| 22.90|  1|  0|    4|    2|four        |
#  | 19.2|   6| 167.6| 123| 3.92| 3.440| 18.30|  1|  0|    4|    4|six         |
#  | 17.8|   6| 167.6| 123| 3.92| 3.440| 18.90|  1|  0|    4|    4|six         |
#  | 32.4|   4|  78.7|  66| 4.08| 2.200| 19.47|  1|  1|    4|    1|four        |
#  | 30.4|   4|  75.7|  52| 4.93| 1.615| 18.52|  1|  1|    4|    2|four        |
#  | 33.9|   4|  71.1|  65| 4.22| 1.835| 19.90|  1|  1|    4|    1|four        |
#  | 21.5|   4| 120.1|  97| 3.70| 2.465| 20.01|  1|  0|    3|    1|four        |
#  | 27.3|   4|  79.0|  66| 4.08| 1.935| 18.90|  1|  1|    4|    1|four        |
#  | 26.0|   4| 120.3|  91| 4.43| 2.140| 16.70|  0|  1|    5|    2|four        |
#  | 30.4|   4|  95.1| 113| 3.77| 1.513| 16.90|  1|  1|    5|    2|four        |
#  | 19.7|   6| 145.0| 175| 3.62| 2.770| 15.50|  0|  1|    5|    6|six         |
#  | 21.4|   4| 121.0| 109| 4.11| 2.780| 18.60|  1|  1|    4|    2|four        |

Security warning

dplyr-cli uses eval(parse(text = ...)) on user input. Do not expose this program to the internet or random users under any circumstances.

Inspirations

  • xsv - a fast CSV command line toolkit written in Rust
  • jq - a command line JSON processor.
  • miller
