• Stars
    star
    642
  • Rank 68,463 (Top 2 %)
  • Language
    JavaScript
  • License
    MIT License
  • Created about 1 year ago
  • Updated 3 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A faster CSV parser in 5KB (min)

𝌠 μDSV

A faster CSV parser in 5KB (min) (MIT Licensed)


Introduction

uDSV is a fast JS library for parsing well-formed CSV strings, either from memory or incrementally from disk or network. It is mostly RFC 4180 compliant, with support for quoted values containing commas, escaped quotes, and line breaksΒΉ. The aim of this project is to handle the 99.5% use-case without adding complexity and performance trade-offs to support the remaining 0.5%.

ΒΉ Line breaks (\n,\r,\r\n) within quoted values must match the row separator.


Features

What does uDSV pack into 5KB?

  • RFC 4180 compliant
  • Incremental or full parsing, with optional accumulation
  • Auto-detection and customization of delimiters (rows, columns, quotes, escapes)
  • Schema inference and value typing: string, number, boolean, date, json
  • Defined handling of '', 'null', 'NaN'
  • Whitespace trimming of values & skipping empty lines
  • Multi-row header skipping and column renaming
  • Multiple outputs: arrays (tuples), objects, nested objects, columnar arrays

Of course, most of these are table stakes for CSV parsers :)


Performance

Is it Lightning Fastβ„’ or Blazing Fastβ„’?

No, those are too slow! uDSV has Ludicrous Speedβ„’; it's faster than the parsers you recognize and faster than those you've never heard of.

Most CSV parsers have one happy/fast path -- the one without quoted values, without value typing, and only when using the default settings & output format. Once you're off that path, you can generally throw any self-promoting benchmarks in the trash. In contrast, uDSV remains fast with any datasets and all options; its happy path is every path.

On a Ryzen 7 ThinkPad, Linux v6.4.11, and NodeJS v20.6.0, a diverse set of benchmarks show a 1x-5x performance boost relative to the popular, proven-fast, Papa Parse.

For way too many synthetic and real-world benchmarks, head over to /bench...and don't forget your coffee!

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ uszips.csv (6 MB, 18 cols x 34K rows)                                                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Name                   β”‚ Rows/s β”‚ Throughput (MiB/s)                                          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ uDSV                   β”‚ 782K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 140 β”‚
β”‚ csv-simple-parser      β”‚ 682K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 122        β”‚
β”‚ achilles-csv-parser    β”‚ 469K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 83.8                      β”‚
β”‚ d3-dsv                 β”‚ 433K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 77.4                        β”‚
β”‚ csv-rex                β”‚ 346K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 61.9                              β”‚
β”‚ PapaParse              β”‚ 305K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 54.5                                 β”‚
β”‚ csv42                  β”‚ 296K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 52.9                                  β”‚
β”‚ csv-js                 β”‚ 285K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 50.9                                  β”‚
β”‚ comma-separated-values β”‚ 258K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 46.1                                    β”‚
β”‚ dekkai                 β”‚ 248K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 44.3                                     β”‚
β”‚ CSVtoJSON              β”‚ 245K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 43.8                                     β”‚
β”‚ csv-parser (neat-csv)  β”‚ 218K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 39                                         β”‚
β”‚ ACsv                   β”‚ 218K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 39                                         β”‚
β”‚ SheetJS                β”‚ 208K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 37.1                                        β”‚
β”‚ @vanillaes/csv         β”‚ 200K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 35.8                                        β”‚
β”‚ node-csvtojson         β”‚ 165K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 29.4                                           β”‚
β”‚ csv-parse/sync         β”‚ 125K   β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ 22.4                                              β”‚
β”‚ @fast-csv/parse        β”‚ 78.2K  β”‚ β–‘β–‘β–‘β–‘β–‘β–‘ 14                                                   β”‚
β”‚ jquery-csv             β”‚ 55.1K  β”‚ β–‘β–‘β–‘β–‘ 9.85                                                   β”‚
β”‚ but-csv                β”‚ ---    β”‚ Wrong row count! Expected: 33790, Actual: 1                 β”‚
β”‚ @gregoranders/csv      β”‚ ---    β”‚ Invalid CSV at 1:109                                        β”‚
β”‚ utils-dsv-base-parse   β”‚ ---    β”‚ unexpected error. Encountered an invalid record. Field 17 o β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Installation

npm i udsv

or

<script src="./dist/uDSV.iife.min.js"></script>

API

A 150 LoC uDSV.d.ts TypeScript def.


Basic Usage

import { inferSchema, initParser } from 'udsv';

let csvStr = 'a,b,c\n1,2,3\n4,5,6';

let schema = inferSchema(csvStr);
let parser = initParser(schema);

// native format (fastest)
let stringArrs = parser.stringArrs(csvStr); // [ ['1','2','3'], ['4','5','6'] ]

// typed formats (internally converted from native)
let typedArrs  = parser.typedArrs(csvStr);  // [ [1, 2, 3], [4, 5, 6] ]
let typedObjs  = parser.typedObjs(csvStr);  // [ {a: 1, b: 2, c: 3}, {a: 4, b: 5, c: 6} ]
let typedCols  = parser.typedCols(csvStr);  // [ [1, 4], [2, 5], [3, 6] ]

Nested/deep objects can be re-constructed from column naming via .typedDeep():

// deep/nested objects (from column naming)
let csvStr2 = `
_type,name,description,location.city,location.street,location.geo[0],location.geo[1],speed,heading,size[0],size[1],size[2]
item,Item 0,Item 0 description in text,Rotterdam,Main street,51.9280712,4.4207888,5.4,128.3,3.4,5.1,0.9
`.trim();

let schema2 = inferSchema(csvStr2);
let parser2 = initParser(schema2);

let typedDeep = parser2.typedDeep(csvStr2);

/*
[
  {
    _type: 'item',
    name: 'Item 0',
    description: 'Item 0 description in text',
    location: {
      city: 'Rotterdam',
      street: 'Main street',
      geo: [ 51.9280712, 4.4207888 ]
    },
    speed: 5.4,
    heading: 128.3,
    size: [ 3.4, 5.1, 0.9 ],
  }
]
*/

CSP Note:

uDSV uses dynamically-generated functions (via new Function()) for its .typed*() methods. These functions are lazy-generated and use JSON.stringify() code-injection guards, so the risk should be minimal. Nevertheless, if you have strict CSP headers without unsafe-eval, you won't be able to take advantage of the typed methods and will have to do the type conversion from the string tuples yourself.


Incremental / Streaming

uDSV has no inherent knowledge of streams. Instead, it exposes a generic incremental parsing API to which you can pass sequential chunks. These chunks can come from various sources, such as a Web Stream or Node stream via fetch() or fs, a WebSocket, etc.

Here's what it looks like with Node's fs.createReadStream():

let stream = fs.createReadStream(filePath);

let parser = null;
let result = null;

stream.on('data', (chunk) => {
  // convert from Buffer
  let strChunk = chunk.toString();
  // on first chunk, infer schema and init parser
  parser ??= initParser(inferSchema(strChunk));
  // incremental parse to string arrays
  parser.chunk(strChunk, parser.stringArrs);
});

stream.on('end', () => {
  result = parser.end();
});

...and Web streams in Node, or Fetch's Response.body:

let stream = fs.createReadStream(filePath);

let webStream = Stream.Readable.toWeb(stream);
let textStream = webStream.pipeThrough(new TextDecoderStream());

let parser = null;

for await (const strChunk of textStream) {
  parser ??= initParser(inferSchema(strChunk));
  parser.chunk(strChunk, parser.stringArrs);
}

let result = parser.end();

The above examples show accumulating parsers -- they will buffer the full result into memory. This may not be something you want (or need), for example with huge datasets where you're looking to get the sum of a single column, or want to filter only a small subset of rows. To bypass this auto-accumulation behavior, simply pass your own handler as the third argument to parser.chunk():

// ...same as above

let sum = 0;

let reducer = (rows) => {
  for (let i = 0; i < rows.length; i++) {
    sum += rows[i][3]; // sum fourth column
  }
};

for await (const strChunk of textStream) {
  parser ??= initParser(inferSchema(strChunk));
  parser.chunk(strChunk, parser.typedArrs, reducer); // typedArrs + reducer
}

parser.end();

TODO?

  • handle #comment rows
  • emit empty-row and #comment events?

More Repositories

1

uPlot

πŸ“ˆ A small, fast chart for time series, lines, areas, ohlc & bars
JavaScript
8,520
star
2

uFuzzy

A tiny, efficient fuzzy search that doesn't suck
JavaScript
2,525
star
3

dropcss

An exceptionally fast, thorough and tiny unused-CSS cleaner
HTML
2,127
star
4

RgbQuant.js

color quantization lib
JavaScript
409
star
5

reMarked.js

client-side HTML > markdown
JavaScript
395
star
6

dump_r.php

a cleaner, leaner mix of print_r() and var_dump()
PHP
118
star
7

pXY.js

pixel analysis for HTML5 Canvas
JavaScript
86
star
8

preCode.js

pain killer for <pre><code> & <textarea>
JavaScript
79
star
9

transformation-matrix-js

An implementation of a 2D transformation matrix for JavaScript.
CSS
55
star
10

GIFter.js

<canvas> to GIF recorder
JavaScript
44
star
11

Route66.php

PHP micro-router
PHP
15
star
12

notyet

Lazy image & media loader
JavaScript
14
star
13

npa-nxx-miner.php

a stupid-simple NPA-NXX database > CSV miner
PHP
14
star
14

uTable

A tiny, fast UI for viewing, sorting, and filtering CSVs
TypeScript
13
star
15

uExpr

A conditional expression compiler
JavaScript
11
star
16

jquery.scanner.js

a nifty barcode scanning framework
JavaScript
8
star
17

NestedSet.php

unobtrusive MPTT / nested set and tree manip
PHP
5
star
18

handlebar.js

JS implementation of an expanded & modified (clearer?) syntax of the Mustache templating framework
JavaScript
5
star
19

domvm-widgets

Official domvm UI components
3
star
20

flecks

A lightweight CSS flexbox grid
JavaScript
2
star
21

domvm-router

JavaScript
2
star
22

matreeshka

JavaScript
2
star
23

uplot-react

TypeScript
2
star
24

what-i-like

1
star
25

PGAL.php

Payment Gateway Abstraction Layer (php5+)
1
star
26

sweetNav

more useful GPS route navigation
1
star
27

ModBoss.php

MODBUS protocol framework
1
star
28

interactive-floorplan

interactive floorplan generator for web desginers
1
star
29

Trove.php

powerful sorting, filtering and grouping for collections of objects
1
star
30

sweatshop.js

parallel processing via web workers
JavaScript
1
star
31

formation.js

flexible form generation + design framework
1
star
32

myRules

language-agnostic conditional data validation logic
1
star