  • Stars: 118
  • Rank: 298,174 (Top 6%)
  • Language: JavaScript
  • License: MIT License
  • Created over 11 years ago
  • Updated over 1 year ago

Data Pipes for CSV

⚠️ Deprecation notice

The datapipes website is now archived and read-only. The gh-pages branch hosts the content of the static version of the website, which is now available at datapipes.datopian.com.

datapipes

A Node library, command-line tool and web app that provides "pipe-able", Unix-style data transformations on row-based data such as CSV files.

DataPipes offers Unix-style cut, grep and sed operations on row-based data such as CSVs, in a streaming, connectable, "pipe-like" manner.
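The sketch below gives a rough illustration of the "pipe-like" idea. It models cut, grep and sed as small functions over rows that can be chained in order. This is not the DataPipes API or implementation. `parseCsv`, `cut`, `grep`, `sed` and `pipe` are hypothetical helpers. The parser assumes simple CSV with no quoted fields or embedded commas. For brevity, rows are held in memory rather than streamed.

```javascript
// Hypothetical helpers sketching Unix-style operations on CSV rows.
// This is NOT the DataPipes API; it assumes simple CSV with no quoted
// fields, embedded commas or embedded newlines.

// Parse CSV text into rows, each row an array of cell strings.
function parseCsv(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .map((line) => line.split(','));
}

// cut: keep only the columns at the given zero-based indices.
const cut = (...indices) => (rows) =>
  rows.map((row) => indices.map((i) => row[i]));

// grep: keep rows in which at least one cell matches the regex.
// Use a regex without the `g` flag: `test` on a global regex is stateful.
const grep = (regex) => (rows) =>
  rows.filter((row) => row.some((cell) => regex.test(cell)));

// sed: apply a substitution to every cell of every row.
const sed = (pattern, replacement) => (rows) =>
  rows.map((row) => row.map((cell) => cell.replace(pattern, replacement)));

// pipe: feed the output of each operation into the next, like `a | b | c`.
const pipe = (...ops) => (rows) => ops.reduce((acc, op) => op(acc), rows);

const csv = 'city,country\nLondon,UK\nParis,France\nlondon,Canada\n';
const transform = pipe(grep(/london/i), sed(/^london$/, 'London'), cut(0));
console.log(transform(parseCsv(csv))); // [ [ 'London' ], [ 'London' ] ]
```

Each operation takes rows and returns rows, so you can add, remove or reorder stages without changing the others. This is the property that makes Unix pipelines composable.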

DataPipes can be used as a web app, from the command line, or as a Node library.

Install

npm install -g datapipes

Usage - Command line

Once installed, datapipes will be available on the command line:

datapipes -h

See the help for usage instructions, but to give a quick taster:

# head (first 10 rows) of this file
datapipes https://raw.githubusercontent.com/datasets/browser-stats/c2709fe7/data.csv head

# search for occurrences of London (ignore case) and show first 10 results
datapipes https://raw.githubusercontent.com/rgrp/dataset-gla/75b56891/data/all.csv "grep -i london" head
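The second command chains two operations the way a shell pipeline does: rows stream through `grep -i london` and then through `head`. The sketch below imitates that behaviour with Node's built-in `stream` module. It uses `Transform` streams in object mode, joined by the `stream/promises` `pipeline` (Node 15 or later). It illustrates the idea under stated assumptions and is not DataPipes' implementation:

- `grepStream`, `headStream` and `collect` are hypothetical helpers.
- The input is a small array of already-parsed rows rather than a fetched CSV.
- `head` keeps 2 rows instead of 10 so the output stays short.

```javascript
const { Readable, Transform, Writable } = require('stream');
const { pipeline } = require('stream/promises');

// grep -i PATTERN (hypothetical helper): pass through only rows in which
// at least one cell matches PATTERN, ignoring case.
function grepStream(pattern) {
  const regex = new RegExp(pattern, 'i');
  return new Transform({
    objectMode: true,
    transform(row, _encoding, callback) {
      if (row.some((cell) => regex.test(cell))) callback(null, row);
      else callback();
    },
  });
}

// head (hypothetical helper): pass through the first n rows (default 10)
// and silently drop the rest. A fuller version might also stop reading
// upstream once n rows have been emitted.
function headStream(n = 10) {
  let seen = 0;
  return new Transform({
    objectMode: true,
    transform(row, _encoding, callback) {
      seen += 1;
      if (seen <= n) callback(null, row);
      else callback();
    },
  });
}

// Sink that collects every row reaching the end of the pipeline.
function collect(into) {
  return new Writable({
    objectMode: true,
    write(row, _encoding, callback) {
      into.push(row);
      callback();
    },
  });
}

// Already-parsed rows stand in for a CSV fetched from a URL.
const rows = [
  ['id', 'city'],
  ['1', 'London'],
  ['2', 'Leeds'],
  ['3', 'LONDON'],
  ['4', 'london'],
];

// Rows flow grep -> head -> sink one at a time, like a shell pipeline.
const results = [];
const done = pipeline(
  Readable.from(rows),
  grepStream('london'),
  headStream(2),
  collect(results),
).then(() => {
  console.log(results); // [ [ '1', 'London' ], [ '3', 'LONDON' ] ]
  return results;
});
```

Each stage handles one row at a time, so the same chain could process input too large to hold in memory. Only the final `collect` sink buffers its results.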

Usage - Library

See the Developer Docs.


Developers

Installation

This is a Node Express application. To install it, do the following:

  1. Clone this repo
  2. Change into the repository base directory
  3. Run:
$ npm install

Testing

Once installed, you can run the tests locally with:

$ npm test

Running

To start the app locally, run:

$ node app.js

You can then access it from http://localhost:5000/

Deployment

For deployment we use Heroku.

The primary app is called datapipes on Heroku. To add it as a git remote named datapipes, run:

$ heroku git:remote -a datapipes -r datapipes

Then to deploy:

$ git push datapipes

Inspirations and Related

  • https://github.com/substack/dnode: dnode is an asynchronous RPC system for Node.js that lets you call remote functions. You can pass callbacks to remote functions, and the remote end can call the functions you passed in with callbacks of its own, and so on. It's callbacks all the way down!

Copyright and License

Copyright 2013-2014 Open Knowledge Foundation and Contributors.

Licensed under the MIT license:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
