  • Stars
    1,385
  • Rank 32,597 (Top 0.7 %)
  • Language
    JavaScript
  • License
    MIT License
  • Created almost 10 years ago
  • Updated 3 months ago

Repository Details

Streaming csv parser inspired by binary-csv that aims to be faster than everyone else

csv-parser

Streaming CSV parser that aims for maximum speed as well as compatibility with the csv-spectrum CSV acid test suite.

csv-parser can convert CSV into JSON at a rate of around 90,000 rows per second. Performance varies with the data used; try bin/bench.js <your file> to benchmark your data.

csv-parser can be used in the browser with browserify.

neat-csv can be used if a Promise based interface to csv-parser is needed.

Note: This module requires Node v8.16.0 or higher.

Benchmarks

⚡️ csv-parser is greased-lightning fast

→ npm run bench

  Filename                 Rows Parsed  Duration
  backtick.csv                       2     3.5ms
  bad-data.csv                       3    0.55ms
  basic.csv                          1    0.26ms
  comma-in-quote.csv                 1    0.29ms
  comment.csv                        2    0.40ms
  empty-columns.csv                  1    0.40ms
  escape-quotes.csv                  3    0.38ms
  geojson.csv                        3    0.46ms
  large-dataset.csv               7268      73ms
  newlines.csv                       3    0.35ms
  no-headers.csv                     3    0.26ms
  option-comment.csv                 2    0.24ms
  option-escape.csv                  3    0.25ms
  option-maxRowBytes.csv          4577      39ms
  option-newline.csv                 0    0.47ms
  option-quote-escape.csv            3    0.33ms
  option-quote-many.csv              3    0.38ms
  option-quote.csv                   2    0.22ms
  quotes+newlines.csv                3    0.20ms
  strict.csv                         3    0.22ms
  latin.csv                          2    0.38ms
  mac-newlines.csv                   2    0.28ms
  utf16-big.csv                      2    0.33ms
  utf16.csv                          2    0.26ms
  utf8.csv                           2    0.24ms

Install

Using npm:

$ npm install csv-parser

Using yarn:

$ yarn add csv-parser

Usage

To use the module, create a readable stream to a desired CSV file, instantiate csv, and pipe the stream to csv.

Suppose you have a CSV file data.csv which contains the data:

NAME,AGE
Daffy Duck,24
Bugs Bunny,22

It could then be parsed, and results shown like so:

const csv = require('csv-parser')
const fs = require('fs')
const results = [];

fs.createReadStream('data.csv')
  .pipe(csv())
  .on('data', (data) => results.push(data))
  .on('end', () => {
    console.log(results);
    // [
    //   { NAME: 'Daffy Duck', AGE: '24' },
    //   { NAME: 'Bugs Bunny', AGE: '22' }
    // ]
  });

To specify options for csv, pass an object argument to the function. For example:

csv({ separator: '\t' });

API

csv([options | headers])

Returns: Transform stream (emits one Object per parsed row)

options

Type: Object

As an alternative to passing an options object, you may pass an Array[String] which specifies the headers to use. For example:

csv(['Name', 'Age']);

If you need to specify both options and headers, use the object notation with the headers property described below.

escape

Type: String
Default: "

A single-character string used to specify the character used to escape strings in a CSV row.

headers

Type: Array[String] | Boolean

Specifies the headers to use. Headers define the property key for each value in a CSV row. If no headers option is provided, csv-parser will use the first line in a CSV file as the header specification.

If false, specifies that the first row in a data file does not contain headers, and instructs the parser to use the column index as the key for each column. Using headers: false with the same data.csv example from above would yield:

[
  { '0': 'Daffy Duck', '1': '24' },
  { '0': 'Bugs Bunny', '1': '22' }
]

Note: If using the headers option on a file which contains headers on the first line, specify skipLines: 1 to skip over that row, or the header row will appear as normal row data. Alternatively, use the mapHeaders option to manipulate the existing headers in that scenario.
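As a minimal sketch of the note above, the options below replace the headers of the data.csv example with custom names. The names 'name' and 'age' are placeholders chosen for illustration, not anything defined by csv-parser.

```javascript
// Replace the headers found in the file with our own names.
// 'name' and 'age' are placeholder names chosen for this example.
const options = {
  headers: ['name', 'age'],
  skipLines: 1 // skip the original NAME,AGE line so it is not parsed as data
}

// Usage, assuming csv-parser is installed and required as `csv`:
// fs.createReadStream('data.csv').pipe(csv(options))
```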

mapHeaders

Type: Function

A function that can be used to modify the values of each header. Return a String to modify the header. Return null to remove the header, and its column, from the results.

csv({
  mapHeaders: ({ header, index }) => header.toLowerCase()
})

Parameters

header  String  The current column header.
index   Number  The current column index.
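A slightly fuller mapper might normalize header names and drop unwanted columns in one pass. The sketch below is only an illustration: the droppedColumns set and the 'notes' column name are assumptions made for this example.

```javascript
// Hypothetical set of (normalized) column names to remove from the results.
const droppedColumns = new Set(['notes'])

// Trims the header, lowercases it, and replaces runs of whitespace with "_".
// Returning null removes the header and its column.
function mapHeaders ({ header, index }) {
  const name = header.trim().toLowerCase().replace(/\s+/g, '_')
  return droppedColumns.has(name) ? null : name
}
```

Passing csv({ mapHeaders }) would then apply the function to every header.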

mapValues

Type: Function

A function that can be used to modify the content of each column. The return value will replace the current column content.

csv({
  mapValues: ({ header, index, value }) => value.toLowerCase()
})

Parameters

header  String  The current column header.
index   Number  The current column index.
value   String  The current column value (or content).
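Since parsed values arrive as strings, mapValues is a natural place for type conversion. The following is a sketch under assumptions: the numericColumns set and the 'AGE' column name are hypothetical and come from the data.csv example, not from csv-parser itself.

```javascript
// Hypothetical list of columns whose values should become numbers.
const numericColumns = new Set(['AGE'])

// Converts values in numeric columns to Number and leaves everything else,
// including empty or non-numeric content, untouched.
function mapValues ({ header, index, value }) {
  if (!numericColumns.has(header)) return value
  if (value.trim() === '') return value // Number('') would be 0, so keep it as-is
  const parsed = Number(value)
  return Number.isNaN(parsed) ? value : parsed
}
```

Passing csv({ mapValues }) would then apply the conversion to every column value.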

newline

Type: String
Default: \n

Specifies a single-character string to denote the end of a line in a CSV file.

quote

Type: String
Default: "

Specifies a single-character string to denote a quoted string.

raw

Type: Boolean

If true, instructs the parser not to decode UTF-8 strings.

separator

Type: String
Default: ,

Specifies a single-character string to use as the column separator for each row.

skipComments

Type: Boolean | String
Default: false

Instructs the parser to ignore lines which represent comments in a CSV file. Since there is no specification that dictates what a CSV comment looks like, comments should be considered non-standard. The "most common" character used to signify a comment in a CSV file is "#". If this option is set to true, lines which begin with # will be skipped. If a custom character is needed to denote a commented line, this option may be set to a string which represents the leading character(s) signifying a comment line.

skipLines

Type: Number
Default: 0

Specifies the number of lines at the beginning of a data file that the parser should skip over, prior to parsing headers.

maxRowBytes

Type: Number
Default: Number.MAX_SAFE_INTEGER

Maximum number of bytes per row. An error is thrown if a line exceeds this value. The default value is roughly 8 pebibytes (2^53 - 1 bytes).

strict

Type: Boolean
Default: false

If true, the number of columns in each row must match the number of headers specified; otherwise the parser throws an exception. If false, headers are mapped to columns by index, with two consequences:

  less columns: any missing column in the middle will result in a wrong property mapping!
  more columns: each additional column creates a "_" + index property, e.g. "_10": "value"
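The index-based mapping used when strict is false can be illustrated with a small standalone function. This is a sketch of the behavior described above, not csv-parser's actual code; mapCellsToRow is a hypothetical name.

```javascript
// Illustration only: maps cells to headers purely by position.
// Cells beyond the last header get a "_" + index key.
function mapCellsToRow (headers, cells) {
  const row = {}
  cells.forEach((cell, index) => {
    const key = index < headers.length ? headers[index] : `_${index}`
    row[key] = cell
  })
  return row
}

// Fewer columns: if the middle column is missing from the data,
// the value meant for the third header lands under the second one.
const shortRow = mapCellsToRow(['a', 'b', 'c'], ['1', '3'])

// More columns: the extra cell is stored under "_2".
const longRow = mapCellsToRow(['a', 'b'], ['1', '2', '3'])
```

With strict: true, both of these rows would be rejected instead of silently mapped.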

Events

The following events are emitted during parsing:

data

Emitted for each row of data parsed with the notable exception of the header row. Please see Usage for an example.

headers

Emitted after the header row is parsed. The first parameter of the event callback is an Array[String] containing the header names.

fs.createReadStream('data.csv')
  .pipe(csv())
  .on('headers', (headers) => {
    console.log(`First header: ${headers[0]}`)
  })

Readable Stream Events

Events available on Node built-in Readable Streams are also emitted. The end event should be used to detect the end of parsing.
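The end and error events can be combined into a Promise when all rows are needed at once. The sketch below is one possible shape, assuming the csv-parser factory is passed in by the caller; readRows is a hypothetical helper and not part of the module (neat-csv, mentioned earlier, is the ready-made option).

```javascript
const fs = require('fs')

// Resolves with all parsed rows once 'end' fires; rejects on any stream error.
// `csv` is the csv-parser factory (require('csv-parser')), injected by the caller.
function readRows (csv, file, options) {
  return new Promise((resolve, reject) => {
    const rows = []
    fs.createReadStream(file)
      .on('error', reject) // pipe() does not forward errors from the file stream
      .pipe(csv(options))
      .on('error', reject) // errors raised while parsing
      .on('data', (row) => rows.push(row))
      .on('end', () => resolve(rows))
  })
}
```

A call might look like readRows(require('csv-parser'), 'data.csv', { strict: true }).then(console.log).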

CLI

This module also provides a CLI which will convert CSV to newline-delimited JSON. The following CLI flags can be used to control how input is parsed:

Usage: csv-parser [filename?] [options]

  --escape,-e         Set the escape character (defaults to quote value)
  --headers,-h        Explicitly specify csv headers as a comma separated list
  --help              Show this help
  --output,-o         Set output file. Defaults to stdout
  --quote,-q          Set the quote character ('"' by default)
  --remove            Remove columns from output by header name
  --separator,-s      Set the separator character ("," by default)
  --skipComments,-c   Skip CSV comments that begin with '#'. Set a value to change the comment character.
  --skipLines,-l      Set the number of lines to skip to before parsing headers
  --strict            Require column length match headers length
  --version,-v        Print out the installed version

For example, to parse a TSV file:

cat data.tsv | csv-parser -s $'\t'

Encoding

Users may encounter issues with the encoding of a CSV file. Transcoding the source stream can be done neatly with a transcoding module, or with native iconv if part of a pipeline.
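Where adding a dependency is not desirable, a small Transform built on Node's TextDecoder may be enough for encodings that the running Node build supports. This is a sketch, not a replacement for a full transcoding module; transcodeToUtf8 is a hypothetical helper and not part of csv-parser.

```javascript
const { Transform } = require('stream')
const { TextDecoder } = require('util')

// Returns a Transform stream that decodes incoming bytes from `encoding`
// and re-emits them as UTF-8.
function transcodeToUtf8 (encoding) {
  const decoder = new TextDecoder(encoding)
  return new Transform({
    transform (chunk, _encoding, callback) {
      // stream: true buffers incomplete multi-byte sequences between chunks
      const text = decoder.decode(chunk, { stream: true })
      callback(null, Buffer.from(text, 'utf8'))
    },
    flush (callback) {
      // Emit whatever the decoder still holds at the end of input
      callback(null, Buffer.from(decoder.decode(), 'utf8'))
    }
  })
}
```

It would sit between the file stream and the parser, for example fs.createReadStream('utf16.csv').pipe(transcodeToUtf8('utf-16le')).pipe(csv()).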

Byte Order Marks

Some CSV files may be generated with, or contain a leading Byte Order Mark. This may cause issues parsing headers and/or data from your file. From Wikipedia:

The Unicode Standard permits the BOM in UTF-8, but does not require nor recommend its use. Byte order has no meaning in UTF-8.

To use this module with a file containing a BOM, please use a module like strip-bom-stream in your pipeline:

const fs = require('fs');

const csv = require('csv-parser');
const stripBom = require('strip-bom-stream');

fs.createReadStream('data.csv')
  .pipe(stripBom())
  .pipe(csv())
  ...

When using the CLI, the BOM can be removed by first running:

$ sed $'s/\xEF\xBB\xBF//g' data.csv

Meta

CONTRIBUTING

LICENSE (MIT)
