• Stars: 1,311
• Rank: 35,893 (Top 0.8%)
• Language: Go
• License: MIT License
• Created: over 9 years ago
• Updated: about 4 years ago

Repository Details

Import CSV and JSON into PostgreSQL the easy way

pgfutter

Import CSV and line-delimited JSON into PostgreSQL the easy way. This small tool abstracts away all the hassle and swearing you normally have to deal with when you just want to dump some data into the database.

Features:

  • Generated import tables (pgfutter csv <file> and you're done)
  • Good performance using the COPY streaming protocol
  • Easy deployment
  • Dealing with import errors
  • Import over the network
  • Only supports UTF8 encoding

Check out pgclimb for exporting data from PostgreSQL into different data formats.

Install

You can download a single binary for Linux, OSX or Windows.

OSX

wget -O pgfutter https://github.com/lukasmartinelli/pgfutter/releases/download/v1.2/pgfutter_darwin_amd64
chmod +x pgfutter

./pgfutter --help

Linux

wget -O pgfutter https://github.com/lukasmartinelli/pgfutter/releases/download/v1.2/pgfutter_linux_amd64
chmod +x pgfutter

./pgfutter --help

Install from source

go get github.com/lukasmartinelli/pgfutter

If you are using Windows or 32-bit architectures you need to download the appropriate binary yourself.

Import CSV

pgfutter will deal with CSV files conforming to RFC 4180.

Create friends.csv.

name,age,friends
Jacob,26,"Anthony"
Anthony,25,""
Emma,28,"Jacob,Anthony"

Import the CSV file.

pgfutter csv friends.csv

Because a header row is already provided, pgfutter will create the appropriate table and copy the rows.

name     age  friends
Jacob    26   Anthony
Anthony  25
Emma     28   Jacob,Anthony
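
The generated table lives in the import schema by default (see Database Connection below). As a rough sketch, assuming pgfutter creates a plain TEXT column for every header field, the created table looks like this:

CREATE TABLE import.friends (
    name TEXT,
    age TEXT,
    friends TEXT
);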

pgfutter will only help you to get the data into the database. After that SQL is a great language to sanitize and normalize the data according to your desired database schema.

CREATE TABLE public.person (
    name VARCHAR(200) PRIMARY KEY,
    age INTEGER
)

CREATE TABLE public.friendship (
    person VARCHAR(200) REFERENCES public.person(name),
    friend VARCHAR(200) REFERENCES public.person(name)
)

INSERT INTO public.person
SELECT name, age::int
FROM import.friends

WITH friends AS (
    SELECT name AS person, regexp_split_to_table(friends, E'\\,') AS friend
    FROM import.friends
)
INSERT INTO public.friendship
SELECT * FROM friends WHERE friend <> ''

Import JSON

A lot of event logs contain JSON objects nowadays (e.g. GitHub Archive). pgfutter expects each line to contain a valid JSON object. Importing JSON is only supported on PostgreSQL 9.3 and newer because it relies on the JSON type.

Create friends.json.

{"name": "Jacob", "age": 26, "friends": ["Anthony"]}
{"name": "Anthony", "age": 25, "friends": []}
{"name": "Emma", "age": 28, "friends": ["Jacob", "Anthony"]}

Import the JSON file.

pgfutter json friends.json

Your JSON objects will be stored in a single JSON column called data.

data
{"name": "Jacob", "age": 26, "friends": ["Anthony"]}
{"name": "Anthony", "age": 25, "friends": []}
{"name": "Emma", "age": 28, "friends": ["Jacob", "Anthony"]}

PostgreSQL has excellent JSON support which means you can then start normalizing your data.

CREATE TABLE public.person (
    name VARCHAR(200) PRIMARY KEY,
    age INTEGER
)

CREATE TABLE public.friendship (
    person VARCHAR(200) REFERENCES public.person(name),
    friend VARCHAR(200) REFERENCES public.person(name)
)

INSERT INTO public.person
SELECT data->>'name' as name, (data->>'age')::int as age
FROM import.friends

INSERT INTO public.friendship
SELECT data->>'name' as person, json_array_elements_text(data->'friends')
FROM import.friends

Database Connection

Database connection details can be provided via environment variables or as separate flags.

name       default     description
DB_NAME    postgres    database name
DB_HOST    localhost   host name
DB_PORT    5432        port
DB_SCHEMA  import      schema to create tables for
DB_USER    postgres    database user
DB_PASS                password (or empty if none)
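
For example, to import into a different database and schema you can export the variables before running pgfutter. This is only a sketch; the host, database and credentials below are placeholders for your own setup:

export DB_HOST=db.example.com
export DB_NAME=analytics
export DB_SCHEMA=staging
export DB_USER=analyst
export DB_PASS=secret
pgfutter csv friends.csv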

Advanced Use Cases

Custom delimiter

Quite often you want to specify a custom delimiter (default: ,).

pgfutter csv -d "\t" traffic_violations.csv

You have to use " as a quoting character and \ as escape character. You might omit the quoting character if it is not necessary.

Using TAB as delimiter

If you want to use a tab as the delimiter you need to pass $'\t' so that your shell expands it to an actual tab character instead of the literal string \t.

pgfutter csv -d $'\t' traffic_violations.csv

Custom header fields

If you want to specify the field names explicitly you can skip the header row and pass a comma-separated list of field names.

pgfutter csv --skip-header --fields "name,state,year" traffic_violations.csv

If a document has no header row at all you should specify the field names as well.

pgfutter csv --fields "name,state,year" traffic_violations.csv

Encoding

All CSV files need to be UTF-8 encoded; no other encoding is supported. Encoding is a nasty topic and you should deal with it before the data enters the database.
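
If a file arrives in another encoding, convert it before importing, for example with iconv. This is a sketch assuming the source file is Latin-1; substitute the actual source encoding:

iconv -f ISO-8859-1 -t UTF-8 violations_latin1.csv > traffic_violations.csv
pgfutter csv traffic_violations.csv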

Dealing with invalid input

A lot of CSV files don't conform to proper CSV standards. If you want to ignore errors you can pass the --ignore-errors flag, which will commit the transaction even if some rows cannot be imported. The failed rows are written to stderr so you can clean them up with other tools.

pgfutter --ignore-errors csv traffic_violations.csv 2> traffic_violations_errors.csv

This works the same for invalid JSON objects.
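
For example, with a hypothetical events.json file:

pgfutter --ignore-errors json events.json 2> events_errors.json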

Custom Table

pgfutter will take the sanitized filename as the table name. If you want to specify a custom table name or import into your predefined table schema you can specify the table explicitly.

pgfutter --table violations csv traffic_violations.csv

Alternatives

For more sophisticated needs you should take a look at pgloader.

Regression Tests

The program is tested with open data sets from around the world.

Download all samples into the folder samples.

./download-samples.sh

Run import regression tests against the samples.

./test.sh

Cross-compiling

We use gox to create distributable binaries for Windows, OSX and Linux.

docker run --rm -v "$(pwd)":/usr/src/pgfutter -w /usr/src/pgfutter tcnksm/gox:1.9

More Repositories

1. py14 (C++, 573 stars): Python to C++ 14 transpiler
2. pipecat (Go, 435 stars): Connect UNIX pipes and message queues
3. pgclimb (Go, 386 stars): Export data from PostgreSQL into different data formats
4. nigit (Go, 383 stars): Web server that wraps around programs and shell scripts and exposes them as an API
5. redis-pipe (Go, 283 stars): Treat Redis Lists like Unix Pipes
6. postgis-editor (JavaScript, 192 stars): An accessible PostGIS query editor and visualizer
7. mapbox-gl-inspect (JavaScript, 142 stars): Inspection plugin for Mapbox GL JS
8. naturalearthtiles (PLpgSQL, 89 stars): Natural Earth vector tiles (MVT) and raster tiles, free and ready to use
9. hwk (Haskell, 67 stars): A Haskell-based awk and sed alternative
10. osm-noise-pollution (Shell, 66 stars): Approximate global noise pollution with OSM data and a very simple noise model
11. php-dos-attack (PHP, 35 stars): Exploit json_decode vulnerability of PHP
12. osm-activity (JavaScript, 30 stars): Show global OpenStreetMap activity on a map
13. swissdem (Shell, 17 stars): Digital Elevation Model for Switzerland from SRTM (1 arc second / 25 m) as download
14. osm-lakelines (Shell, 15 stars): Calculate nice centered linestrings for labelling OpenStreetMap lakes
15. px-to-csv (JavaScript, 14 stars): Convert PC-Axis files to CSV
16. mbtoolbox (Python, 13 stars): MBTiles tools for optimizing and verifying MBTiles files
17. push-it (Shell, 13 stars): Plays an encouraging sound when you do a git push
18. ghrr (JavaScript, 12 stars): Create realtime apps on top of GitHub
19. sharpen (Python, 11 stars): Solve algorithmic Python challenges to sharpen the tools
20. detectivegit (JavaScript, 9 stars): Detective git takes a look at your repo and shows the hotspots and possible bugs
21. osm-qa-filter (JavaScript, 9 stars): Extract GeoJSON features from OSM QA tiles
22. osm-simple-features (JavaScript, 7 stars): Defines an opinionated mapping from OSM to simple GeoJSON features with multiple layers and defined schemas
23. swissnames (Shell, 5 stars): Curated extracts from the free swissNAMES3D data set from swisstopo
24. delptr (JavaScript, 5 stars): Informs the world about people who still use naked pointers in C++
25. location-history-to-geojson (JavaScript, 3 stars): Turn your Google Location History into a GeoJSON feature collection
26. lukasmartinelli.github.io (HTML, 2 stars): Personal blog and portfolio
27. battle-of-britain-map (CSS, 2 stars): A Mapbox GL map showcasing the air battle of Britain
28. biketour (HTML, 1 star): Tracking my bike tour from Switzerland to Greece