  • Language
    JavaScript
  • License
    ISC License

A parser and formatter for delimiter-separated values, such as CSV and TSV.

d3-dsv

This module provides a parser and formatter for delimiter-separated values, most commonly comma- (CSV) or tab-separated values (TSV). These tabular formats are popular with spreadsheet programs such as Microsoft Excel, and are often more space-efficient than JSON. This implementation is based on RFC 4180.

Comma (CSV) and tab (TSV) delimiters are built-in. For example, to parse:

d3.csvParse("foo,bar\n1,2"); // [{foo: "1", bar: "2"}, columns: ["foo", "bar"]]
d3.tsvParse("foo\tbar\n1\t2"); // [{foo: "1", bar: "2"}, columns: ["foo", "bar"]]

Or to format:

d3.csvFormat([{foo: "1", bar: "2"}]); // "foo,bar\n1,2"
d3.tsvFormat([{foo: "1", bar: "2"}]); // "foo\tbar\n1\t2"

To use a different delimiter, such as “|” for pipe-separated values, use d3.dsvFormat:

const psv = d3.dsvFormat("|");

console.log(psv.parse("foo|bar\n1|2")); // [{foo: "1", bar: "2"}, columns: ["foo", "bar"]]

For easy loading of DSV files in a browser, see d3-fetch’s d3.csv, d3.tsv and d3.dsv methods.

Installing

If you use npm, npm install d3-dsv. You can also download the latest release on GitHub. For vanilla HTML in modern browsers, import d3-dsv from Skypack:

<script type="module">

import {csvParse} from "https://cdn.skypack.dev/d3-dsv@3";

const data = csvParse(string);

</script>

For legacy environments, you can load d3-dsv’s UMD bundle from an npm-based CDN such as jsDelivr; a d3 global is exported:

<script src="https://cdn.jsdelivr.net/npm/d3-dsv@3"></script>
<script>

const data = d3.csvParse(string);

</script>

API Reference

# d3.csvParse(string[, row]) <>

Equivalent to dsvFormat(",").parse. Note: requires unsafe-eval content security policy.

# d3.csvParseRows(string[, row]) <>

Equivalent to dsvFormat(",").parseRows.

# d3.csvFormat(rows[, columns]) <>

Equivalent to dsvFormat(",").format.

# d3.csvFormatBody(rows[, columns]) <>

Equivalent to dsvFormat(",").formatBody.

# d3.csvFormatRows(rows) <>

Equivalent to dsvFormat(",").formatRows.

# d3.csvFormatRow(row) <>

Equivalent to dsvFormat(",").formatRow.

# d3.csvFormatValue(value) <>

Equivalent to dsvFormat(",").formatValue.

# d3.tsvParse(string[, row]) <>

Equivalent to dsvFormat("\t").parse. Note: requires unsafe-eval content security policy.

# d3.tsvParseRows(string[, row]) <>

Equivalent to dsvFormat("\t").parseRows.

# d3.tsvFormat(rows[, columns]) <>

Equivalent to dsvFormat("\t").format.

# d3.tsvFormatBody(rows[, columns]) <>

Equivalent to dsvFormat("\t").formatBody.

# d3.tsvFormatRows(rows) <>

Equivalent to dsvFormat("\t").formatRows.

# d3.tsvFormatRow(row) <>

Equivalent to dsvFormat("\t").formatRow.

# d3.tsvFormatValue(value) <>

Equivalent to dsvFormat("\t").formatValue.

# d3.dsvFormat(delimiter) <>

Constructs a new DSV parser and formatter for the specified delimiter. The delimiter must be a single character (i.e., a single 16-bit code unit); so, ASCII delimiters are fine, but emoji delimiters are not.
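
The single-code-unit rule can be checked with a string's length, which counts UTF-16 code units. The sketch below uses a hypothetical helper, isSingleCodeUnit, which is not part of d3-dsv; it only illustrates why an emoji does not qualify.

```javascript
// Hypothetical helper (not part of d3-dsv): true when the delimiter
// is exactly one 16-bit (UTF-16) code unit.
function isSingleCodeUnit(delimiter) {
  return typeof delimiter === "string" && delimiter.length === 1;
}

console.log(isSingleCodeUnit("|"));  // true
console.log(isSingleCodeUnit("\t")); // true
console.log(isSingleCodeUnit("😀")); // false: this emoji takes two code units
console.log(isSingleCodeUnit("||")); // false
```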

# dsv.parse(string[, row]) <>

Parses the specified string, which must be in the delimiter-separated values format with the appropriate delimiter, returning an array of objects representing the parsed rows.

Unlike dsv.parseRows, this method requires that the first line of the DSV content contains a delimiter-separated list of column names; these column names become the attributes on the returned objects. For example, consider the following CSV file:

Year,Make,Model,Length
1997,Ford,E350,2.34
2000,Mercury,Cougar,2.38

The resulting JavaScript array is:

[
  {"Year": "1997", "Make": "Ford", "Model": "E350", "Length": "2.34"},
  {"Year": "2000", "Make": "Mercury", "Model": "Cougar", "Length": "2.38"}
]

The returned array also exposes a columns property containing the column names in input order (in contrast to Object.keys, whose iteration order is arbitrary). For example:

data.columns; // ["Year", "Make", "Model", "Length"]
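
Because columns is a property on the array rather than an element, it is easy to lose when the array is copied or serialized. The following sketch builds an array of the same shape by hand (no parsing involved) to show the behavior; the variable names are illustrative only.

```javascript
// Build an array shaped like the result described above: row objects
// plus a "columns" property on the array itself.
const data = Object.assign(
  [{Year: "1997", Make: "Ford"}, {Year: "2000", Make: "Mercury"}],
  {columns: ["Year", "Make"]}
);

console.log(data.length);  // 2 (columns is not counted as an element)
console.log(data.columns); // ["Year", "Make"]

// JSON.stringify serializes only the array elements, so the columns
// property is dropped; copy it explicitly if you need to keep it.
const json = JSON.stringify(data);
const withColumns = JSON.stringify({columns: data.columns, rows: [...data]});
```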

If the column names are not unique, only the last value is returned for each name; to access all values, use dsv.parseRows instead (see example).
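
One way to keep every value under a repeated column name is to start from rows as arrays, the shape dsv.parseRows returns, and group values by header yourself. This is a minimal sketch; groupByColumn is a hypothetical helper and the input rows are written out literally rather than parsed.

```javascript
// Rows in the shape dsv.parseRows produces for "id,tag,tag\n1,red,blue":
// the header is simply the first row.
const rows = [["id", "tag", "tag"], ["1", "red", "blue"]];
const [header, ...body] = rows;

// Hypothetical helper: collect every value under its column name,
// so repeated names keep all of their values.
function groupByColumn(header, row) {
  const result = Object.create(null);
  header.forEach((name, j) => {
    if (!(name in result)) result[name] = [];
    result[name].push(row[j]);
  });
  return result;
}

const records = body.map((row) => groupByColumn(header, row));
console.log(records[0].tag); // ["red", "blue"]
```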

If a row conversion function is not specified, field values are strings. For safety, there is no automatic conversion to numbers, dates, or other types. In some cases, JavaScript may coerce strings to numbers for you automatically (for example, using the + operator), but it is better to specify a row conversion function. See d3.autoType for a convenient row conversion function that infers and coerces common types like numbers and strings.

If a row conversion function is specified, the specified function is invoked for each row, being passed an object representing the current row (d), the index (i) starting at zero for the first non-header row, and the array of column names. If the returned value is null or undefined, the row is skipped and will be omitted from the array returned by dsv.parse; otherwise, the returned value defines the corresponding row object. For example:

const data = d3.csvParse(string, (d) => {
  return {
    year: new Date(+d.Year, 0, 1), // lowercase and convert "Year" to Date
    make: d.Make, // lowercase
    model: d.Model, // lowercase
    length: +d.Length // lowercase and convert "Length" to number
  };
});

Note: using + rather than parseInt or parseFloat is typically faster, though more restrictive. For example, "30px" when coerced using + returns NaN, while parseInt and parseFloat return 30.
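
The difference is easy to see in plain JavaScript. Note also that an empty string coerces to 0 rather than NaN under +, which matters for empty fields.

```javascript
console.log(+"30px");              // NaN
console.log(parseInt("30px", 10)); // 30
console.log(parseFloat("30px"));   // 30
console.log(+"2.34");              // 2.34
console.log(+"");                  // 0 (an empty field coerces to zero, not NaN)
```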

Note: requires unsafe-eval content security policy.

# dsv.parseRows(string[, row]) <>

Parses the specified string, which must be in the delimiter-separated values format with the appropriate delimiter, returning an array of arrays representing the parsed rows.

Unlike dsv.parse, this method treats the header line as a standard row, and should be used whenever DSV content does not contain a header. Each row is represented as an array rather than an object. Rows may have variable length. For example, consider the following CSV file, which notably lacks a header line:

1997,Ford,E350,2.34
2000,Mercury,Cougar,2.38

The resulting JavaScript array is:

[
  ["1997", "Ford", "E350", "2.34"],
  ["2000", "Mercury", "Cougar", "2.38"]
]

If a row conversion function is not specified, field values are strings. For safety, there is no automatic conversion to numbers, dates, or other types. In some cases, JavaScript may coerce strings to numbers for you automatically (for example, using the + operator), but it is better to specify a row conversion function. See d3.autoType for a convenient row conversion function that infers and coerces common types like numbers and strings.

If a row conversion function is specified, the specified function is invoked for each row, being passed an array representing the current row (d) and the index (i) starting at zero for the first row. If the returned value is null or undefined, the row is skipped and will be omitted from the array returned by dsv.parseRows; otherwise, the returned value defines the corresponding row object. For example:

const data = d3.csvParseRows(string, (d, i) => {
  return {
    year: new Date(+d[0], 0, 1), // convert first column to Date
    make: d[1],
    model: d[2],
    length: +d[3] // convert fourth column to number
  };
});

In effect, row is similar to applying a map and filter operator to the returned rows.
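
The map-and-filter analogy can be sketched on an already-parsed array of arrays. The helper applyRow below is hypothetical and is not how the library is implemented; it only shows the observable effect of returning null from a row function.

```javascript
// Hypothetical stand-in for how a row function behaves: map each row
// through it, then drop rows for which it returned null or undefined.
function applyRow(rows, row) {
  return rows
    .map((d, i) => row(d, i))
    .filter((d) => d != null);
}

const rows = [["1997", "Ford"], ["", ""], ["2000", "Mercury"]];

// Skip rows whose first field is empty; convert the rest.
const result = applyRow(rows, (d) =>
  d[0] === "" ? null : {year: +d[0], make: d[1]}
);

console.log(result.length); // 2
```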

# dsv.format(rows[, columns]) <>

Formats the specified array of object rows as delimiter-separated values, returning a string. This operation is the inverse of dsv.parse. Each row will be separated by a newline (\n), and each column within each row will be separated by the delimiter (such as a comma, ,). Values that contain either the delimiter, a double-quote (") or a newline will be escaped using double-quotes.

If columns is not specified, the list of column names that forms the header row is determined by the union of all properties on all objects in rows; the order of columns is nondeterministic. If columns is specified, it is an array of strings representing the column names. For example:

const string = d3.csvFormat(data, ["year", "make", "model", "length"]);
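
To see what "the union of all properties on all objects" means in practice, the sketch below computes such a union in plain JavaScript. unionOfKeys is a hypothetical helper, not the library's code; passing columns explicitly, as above, remains the way to pin the header order.

```javascript
// Hypothetical helper: union of property names across all rows,
// in first-seen order.
function unionOfKeys(rows) {
  const seen = new Set();
  for (const row of rows) {
    for (const key of Object.keys(row)) seen.add(key);
  }
  return [...seen];
}

const rows = [
  {year: 1997, make: "Ford"},
  {make: "Mercury", model: "Cougar"} // no year, extra model
];

console.log(unionOfKeys(rows)); // ["year", "make", "model"]
```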

All fields on each row object will be coerced to strings. If the field value is null or undefined, the empty string is used. If the field value is a Date, the ECMAScript date-time string format (a subset of ISO 8601) is used: for example, dates at UTC midnight are formatted as YYYY-MM-DD. For more control over which fields are formatted and how, first map rows to an array of arrays of strings, and then use dsv.formatRows.

# dsv.formatBody(rows[, columns]) <>

Equivalent to dsv.format, but omits the header row. This is useful, for example, when appending rows to an existing file.

# dsv.formatRows(rows) <>

Formats the specified array of arrays of strings as delimiter-separated values, returning a string. This operation is the reverse of dsv.parseRows. Each row will be separated by a newline (\n), and each column within each row will be separated by the delimiter (such as a comma, ,). Values that contain either the delimiter, a double-quote (") or a newline will be escaped using double-quotes.

To convert an array of objects to an array of arrays while explicitly specifying the columns, use array.map. For example:

const string = d3.csvFormatRows(data.map((d, i) => {
  return [
    d.year.getFullYear(), // Assuming d.year is a Date object.
    d.make,
    d.model,
    d.length
  ];
}));

If you like, you can also array.concat this result with an array of column names to generate the first row:

const string = d3.csvFormatRows([[
    "year",
    "make",
    "model",
    "length"
  ]].concat(data.map((d, i) => {
  return [
    d.year.getFullYear(), // Assuming d.year is a Date object.
    d.make,
    d.model,
    d.length
  ];
})));

# dsv.formatRow(row) <>

Formats a single array row of strings as delimiter-separated values, returning a string. Each column within the row will be separated by the delimiter (such as a comma, ,). Values that contain either the delimiter, a double-quote (") or a newline will be escaped using double-quotes.

# dsv.formatValue(value) <>

Formats a single value or string as a delimiter-separated value, returning a string. A value that contains either the delimiter, a double-quote (") or a newline will be escaped using double-quotes.
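
The escaping rule can be illustrated with a small stand-alone function. quoteValue is a hypothetical helper written for this sketch in the style of RFC 4180 (wrap the value in double-quotes and double any embedded double-quote); treating a carriage return like a newline is an assumption of the sketch.

```javascript
// Hypothetical helper mirroring the quoting rule described above:
// quote the value and double any embedded double-quotes.
function quoteValue(value, delimiter) {
  const text = String(value);
  const needsQuotes =
    text.includes(delimiter) ||
    text.includes('"') ||
    text.includes("\n") ||
    text.includes("\r"); // assumption: treat \r like a newline
  return needsQuotes ? '"' + text.replace(/"/g, '""') + '"' : text;
}

console.log(quoteValue("plain", ","));    // plain
console.log(quoteValue("a,b", ","));      // "a,b"
console.log(quoteValue('say "hi"', ",")); // "say ""hi"""
console.log(quoteValue("a,b", "\t"));     // a,b (comma is not the delimiter here)
```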

# d3.autoType(object) <>

Given an object (or array) representing a parsed row, infers the types of values on the object and coerces them accordingly, returning the mutated object. This function is intended to be used as a row accessor function in conjunction with dsv.parse and dsv.parseRows. For example, consider the following CSV file:

Year,Make,Model,Length
1997,Ford,E350,2.34
2000,Mercury,Cougar,2.38

When used with d3.csvParse,

d3.csvParse(string, d3.autoType)

the resulting JavaScript array is:

[
  {"Year": 1997, "Make": "Ford", "Model": "E350", "Length": 2.34},
  {"Year": 2000, "Make": "Mercury", "Model": "Cougar", "Length": 2.38}
]

Type inference works as follows. For each value in the given object, the trimmed value is computed; the value is then re-assigned as follows:

  1. If empty, then null.
  2. If exactly "true", then true.
  3. If exactly "false", then false.
  4. If exactly "NaN", then NaN.
  5. Otherwise, if coercible to a number, then a number.
  6. Otherwise, if a date-only or date-time string, then a Date.
  7. Otherwise, a string (the original untrimmed value).

Values with leading zeroes may be coerced to numbers; for example "08904" coerces to 8904. However, extra characters such as commas or units (e.g., "$1.00", "(123)", "1,234" or "32px") will prevent number coercion, resulting in a string.

Date strings must be in ECMAScript’s subset of the ISO 8601 format. When a date-only string such as YYYY-MM-DD is specified, the inferred time is midnight UTC; however, if a date-time string such as YYYY-MM-DDTHH:MM is specified without a time zone, it is assumed to be local time.
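
The inference steps can be sketched for a single string as follows. inferValue is a hypothetical helper, not d3.autoType itself, and the date pattern is a simplified assumption that only accepts full YYYY-MM-DD dates with an optional time and zone.

```javascript
// Simplified date pattern (an assumption for this sketch): YYYY-MM-DD,
// optionally followed by THH:MM[:SS[.sss]] and an optional zone.
const isoDate =
  /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d{3})?)?(Z|[-+]\d{2}:\d{2})?)?$/;

// Hypothetical helper applying the seven steps to one string.
function inferValue(original) {
  const value = original.trim();
  if (value === "") return null;                   // step 1
  if (value === "true") return true;               // step 2
  if (value === "false") return false;             // step 3
  if (value === "NaN") return NaN;                 // step 4
  const number = +value;
  if (!Number.isNaN(number)) return number;        // step 5
  if (isoDate.test(value)) return new Date(value); // step 6
  return original;                                 // step 7 (untrimmed)
}

console.log(inferValue("08904")); // 8904
console.log(inferValue("$1.00")); // $1.00 (stays a string)
console.log(inferValue(" 2.34 ")); // 2.34
console.log(inferValue("2000-01-01") instanceof Date); // true
```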

Automatic type inference is primarily intended to provide safe, predictable behavior in conjunction with dsv.format and dsv.formatRows for common JavaScript types. If you need different behavior, you should implement your own row accessor function.

For more, see the d3.autoType notebook.

Content Security Policy

If a content security policy is in place, note that dsv.parse requires unsafe-eval in the script-src directive, due to the (safe) use of dynamic code generation for fast parsing. (See source.) Alternatively, use dsv.parseRows.

Byte-Order Marks

DSV files sometimes begin with a byte order mark (BOM); saving a spreadsheet in CSV UTF-8 format from Microsoft Excel, for example, will include a BOM. On the web this is not usually a problem because the UTF-8 decode algorithm specified in the Encoding standard removes the BOM. Node.js, on the other hand, does not remove the BOM when decoding UTF-8.

If the BOM is not removed, the first character of the text is a zero-width non-breaking space. So if a CSV file with a BOM is parsed by d3.csvParse, the first column’s name will begin with a zero-width non-breaking space. This can be hard to spot since this character is usually invisible when printed.

To remove the BOM before parsing, consider using strip-bom.
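
If you would rather not add a dependency, the check is small enough to write by hand. The sketch below uses a hypothetical stripBom helper on an already-decoded string; it is not the strip-bom package.

```javascript
// Hypothetical helper: drop a leading U+FEFF (the BOM, which decodes
// to a zero-width non-breaking space) if present.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

const raw = "\uFEFFfoo,bar\n1,2";

console.log(raw.split("\n")[0].length);           // 8: the invisible BOM is counted
console.log(stripBom(raw).split("\n")[0].length); // 7
```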

Command Line Reference

dsv2dsv

# dsv2dsv [options…] [file]

Converts the specified DSV input file to DSV (typically with a different delimiter or encoding). If file is not specified, defaults to reading from stdin. For example, to convert CSV to TSV:

csv2tsv < example.csv > example.tsv

To convert windows-1252 CSV to utf-8 CSV:

dsv2dsv --input-encoding windows-1252 < latin1.csv > utf8.csv

# dsv2dsv -h
# dsv2dsv --help

Output usage information.

# dsv2dsv -V
# dsv2dsv --version

Output the version number.

# dsv2dsv -o file
# dsv2dsv --out file

Specify the output file name. Defaults to “-” for stdout.

# dsv2dsv -r delimiter
# dsv2dsv --input-delimiter delimiter

Specify the input delimiter character. Defaults to “,” for reading CSV. (You can enter a tab on the command line by typing ⌃V.)

# dsv2dsv --input-encoding encoding

Specify the input character encoding. Defaults to “utf8”.

# dsv2dsv -w delimiter
# dsv2dsv --output-delimiter delimiter

Specify the output delimiter character. Defaults to “,” for writing CSV. (You can enter a tab on the command line by typing ⌃V.)

# dsv2dsv --output-encoding encoding

Specify the output character encoding. Defaults to “utf8”.

# csv2tsv [options…] [file]

Equivalent to dsv2dsv, but the output delimiter defaults to the tab character (\t).

# tsv2csv [options…] [file]

Equivalent to dsv2dsv, but the input delimiter defaults to the tab character (\t).

dsv2json

# dsv2json [options…] [file]

Converts the specified DSV input file to JSON. If file is not specified, defaults to reading from stdin. For example, to convert CSV to JSON:

csv2json < example.csv > example.json

Or to convert CSV to a newline-delimited JSON stream:

csv2json -n < example.csv > example.ndjson

# dsv2json -h
# dsv2json --help

Output usage information.

# dsv2json -V
# dsv2json --version

Output the version number.

# dsv2json -o file
# dsv2json --out file

Specify the output file name. Defaults to “-” for stdout.

# dsv2json -a
# dsv2json --auto-type

Use type inference when parsing rows. See d3.autoType for how it works.

# dsv2json -r delimiter
# dsv2json --input-delimiter delimiter

Specify the input delimiter character. Defaults to “,” for reading CSV. (You can enter a tab on the command line by typing ⌃V.)

# dsv2json --input-encoding encoding

Specify the input character encoding. Defaults to “utf8”.

# dsv2json --output-encoding encoding

Specify the output character encoding. Defaults to “utf8”.

# dsv2json -n
# dsv2json --newline-delimited

Output newline-delimited JSON instead of a single JSON array.
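
The difference between the two output shapes can be seen in plain JavaScript. This sketch only illustrates the formats; it makes no claim about the tool's exact whitespace or trailing newline.

```javascript
const rows = [{foo: "1", bar: "2"}, {foo: "3", bar: "4"}];

// A single JSON array: one value spanning the whole output.
const asArray = JSON.stringify(rows);

// Newline-delimited JSON: one JSON object per line.
const asNdjson = rows.map((d) => JSON.stringify(d)).join("\n");

console.log(asArray);
// [{"foo":"1","bar":"2"},{"foo":"3","bar":"4"}]
console.log(asNdjson);
// {"foo":"1","bar":"2"}
// {"foo":"3","bar":"4"}
```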

# csv2json [options…] [file]

Equivalent to dsv2json.

# tsv2json [options…] [file]

Equivalent to dsv2json, but the input delimiter defaults to the tab character (\t).

json2dsv

# json2dsv [options…] [file]

Converts the specified JSON input file to DSV. If file is not specified, defaults to reading from stdin. For example, to convert JSON to CSV:

json2csv < example.json > example.csv

Or to convert a newline-delimited JSON stream to CSV:

json2csv -n < example.ndjson > example.csv

# json2dsv -h
# json2dsv --help

Output usage information.

# json2dsv -V
# json2dsv --version

Output the version number.

# json2dsv -o file
# json2dsv --out file

Specify the output file name. Defaults to “-” for stdout.

# json2dsv --input-encoding encoding

Specify the input character encoding. Defaults to “utf8”.

# json2dsv -w delimiter
# json2dsv --output-delimiter delimiter

Specify the output delimiter character. Defaults to “,” for writing CSV. (You can enter a tab on the command line by typing ⌃V.)

# json2dsv --output-encoding encoding

Specify the output character encoding. Defaults to “utf8”.

# json2dsv -n
# json2dsv --newline-delimited

Read newline-delimited JSON instead of a single JSON array.
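
Reading newline-delimited input amounts to parsing each line as its own JSON value. The following is a minimal sketch of that idea in plain JavaScript, with blank lines skipped as an assumption; it is not the tool's implementation.

```javascript
const ndjson = '{"foo":"1","bar":"2"}\n{"foo":"3","bar":"4"}\n';

// Parse each non-empty line as its own JSON value.
const parsed = ndjson
  .split("\n")
  .filter((line) => line.trim() !== "")
  .map((line) => JSON.parse(line));

console.log(parsed.length); // 2
console.log(parsed[1].foo); // 3
```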

# json2csv [options…] [file]

Equivalent to json2dsv.

# json2tsv [options…] [file]

Equivalent to json2dsv, but the output delimiter defaults to the tab character (\t).
