Fast CSV parser and writer for Modern C++

csv2

CSV Reader

#include <csv2/reader.hpp>

int main() {
  csv2::Reader<csv2::delimiter<','>, 
               csv2::quote_character<'"'>, 
               csv2::first_row_is_header<true>,
               csv2::trim_policy::trim_whitespace> csv;
               
  if (csv.mmap("foo.csv")) {
    const auto header = csv.header();
    for (const auto row: csv) {
      for (const auto cell: row) {
        // Do something with cell value
        // std::string value;
        // cell.read_value(value);
      }
    }
  }
}

Performance Benchmark

This benchmark measures the average execution time (of 5 runs after 3 warmup runs) for csv2 to memory-map the input CSV file and iterate over every cell in the CSV. See benchmark/main.cpp for more details.

cd benchmark
g++ -I../include -O3 -std=c++11 -o main main.cpp
./main <csv_file>
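
The measurement scheme described above (a few untimed warmup runs followed by an average over timed runs) can be sketched roughly as follows. This is only an illustration and not the code in benchmark/main.cpp; the helper name `average_seconds` and its parameters are assumptions made for this example.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper (not part of csv2): calls `work` `warmup_runs` times
// without timing it, then returns the mean wall-clock time in seconds
// over `measured_runs` timed calls.
double average_seconds(const std::function<void()>& work,
                       std::size_t warmup_runs = 3,
                       std::size_t measured_runs = 5) {
  if (measured_runs == 0) return 0.0;  // nothing to average

  // Warmup: lets caches and the page cache settle before timing.
  for (std::size_t i = 0; i < warmup_runs; ++i) work();

  using clock = std::chrono::steady_clock;
  double total = 0.0;
  for (std::size_t i = 0; i < measured_runs; ++i) {
    const auto start = clock::now();
    work();
    const auto stop = clock::now();
    total += std::chrono::duration<double>(stop - start).count();
  }
  return total / static_cast<double>(measured_runs);
}
```

In a benchmark like the one described here, `work` would be a lambda that memory-maps the file and iterates over every cell.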

System Details

Type           Value
Processor      11th Gen Intel(R) Core(TM) i9-11900KF @ 3.50GHz 3.50 GHz
Installed RAM  32.0 GB (31.9 GB usable)
SSD            ADATA SX8200PNP
OS             Ubuntu 20.04 LTS running on WSL in Windows 11
C++ Compiler   g++ (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0

Results (as of 23 SEP 2022)

Dataset                                File Size  Rows         Cols  Time
Denver Crime Data                      111 MB     479,100      19    0.102s
AirBnb Paris Listings                  196 MB     141,730      96    0.170s
2015 Flight Delays and Cancellations   574 MB     5,819,079    31    0.603s
StackLite: Stack Overflow questions    870 MB     17,203,824   7     0.911s
Used Cars Dataset                      1.4 GB     539,768      25    0.947s
Title-Based Semantic Subject Indexing  3.7 GB     12,834,026   4     2.867s
Bitcoin tweets - 16M tweets            4 GB       47,478,748   9     3.290s
DDoS Balanced Dataset                  6.3 GB     12,794,627   85    6.963s
Seattle Checkouts by Title             7.1 GB     34,892,623   11    7.698s
SHA-1 password hash dump               11 GB      262,974,241  2     10.775s
DOHUI NOH scaled_data                  16 GB      496,782      3213  16.553s

Reader API

Here is the public API available to you:

template <class delimiter = delimiter<','>, 
          class quote_character = quote_character<'"'>,
          class first_row_is_header = first_row_is_header<true>,
          class trim_policy = trim_policy::trim_whitespace>
class Reader {
public:
  
  // Use this if you'd like to mmap and read from file
  bool mmap(string_type filename);

  // Use this if you have the CSV contents in std::string already
  bool parse(string_type contents);

  // Shape
  size_t rows() const;
  size_t cols() const;
  
  // Row iterator
  // If first_row_is_header, row iteration will start
  // from the second row
  RowIterator begin() const;
  RowIterator end() const;

  // Access the first row of the CSV
  Row header() const;
};

Here's the Row class:

// Row class
class Row {
public:
  // Get raw contents of the row
  void read_raw_value(Container& value) const;
  
  // Cell iterator
  CellIterator begin() const;
  CellIterator end() const;
};

and here's the Cell class:

// Cell class
class Cell {
public:
  // Get raw contents of the cell
  void read_raw_value(Container& value) const;
  
  // Get converted contents of the cell
  // Handles escaped content, e.g., 
  // """foo""" => ""foo""
  void read_value(Container& value) const;
};
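
To make the difference between the raw and converted cell contents concrete, here is a minimal sketch of one way escaped quotes can be collapsed. It is not csv2's implementation, and the function name `collapse_doubled_quotes` is an assumption made for this example. It simply turns every pair of consecutive quote characters into a single one, which reproduces the `"""foo""" => ""foo""` example from the comment above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not csv2's implementation): copies `raw`, turning each
// pair of consecutive quote characters into a single quote character.
std::string collapse_doubled_quotes(const std::string& raw, char quote = '"') {
  std::string out;
  out.reserve(raw.size());
  for (std::size_t i = 0; i < raw.size(); ++i) {
    out.push_back(raw[i]);
    // If this character starts a doubled pair, skip its second half.
    if (raw[i] == quote && i + 1 < raw.size() && raw[i + 1] == quote) ++i;
  }
  return out;
}
```

Text without any quote characters passes through unchanged, so the helper can be applied to every cell regardless of whether it was quoted.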

CSV Writer

This library also provides a basic csv2::Writer class that can be used to write CSV rows to a file. Here's a basic usage example:

#include <csv2/writer.hpp>
#include <fstream> // std::ofstream
#include <vector>
#include <string>
using namespace csv2;

int main() {
    std::ofstream stream("foo.csv");
    Writer<delimiter<','>> writer(stream);

    std::vector<std::vector<std::string>> rows = 
        {
            {"a", "b", "c"},
            {"1", "2", "3"},
            {"4", "5", "6"}
        };

    writer.write_rows(rows);
    stream.close();
}

Writer API

Here is the public API available to you:

template <class delimiter = delimiter<','>>
class Writer {
public:
  
  // Construct using an std::ofstream
  Writer(output_file_stream stream);

  // Use this to write a single row to file
  void write_row(container_of_strings row);

  // Use this to write a list of rows to file
  void write_rows(container_of_rows rows);
};

Compiling Tests

mkdir build && cd build
cmake -DCSV2_BUILD_TESTS=ON ..
make
cd test
./csv2_test

Generating Single Header

python3 utils/amalgamate/amalgamate.py -c single_include.json -s .

Contributing

Contributions are welcome; have a look at the CONTRIBUTING.md document for more information.

License

The project is available under the MIT license.
