  • Stars: 600
  • Rank: 72,284 (Top 2 %)
  • Language: Python
  • License: Apache License 2.0
  • Created over 5 years ago
  • Updated 4 months ago

Repository Details

Data Pipeline Framework using the singer.io spec

PipelineWise

PipelineWise is a Data Pipeline Framework using the Singer.io specification to ingest and replicate data from various sources to various destinations. Documentation is available at https://transferwise.github.io/pipelinewise/

Features

  • Built with ELT in mind: PipelineWise fits into the ELT landscape and is not a traditional ETL tool. PipelineWise aims to reproduce the data from the source to an Analytics-Data-Store in as close to the original format as possible. Some minor load time transformations are supported but complex mapping and joins have to be done in the Analytics-Data-Store to extract meaning.

  • Managed Schema Changes: When the source schema changes, PipelineWise detects the change and alters the schema in your Analytics-Data-Store automatically.

  • Load time transformations: Ideal place to obfuscate, mask or filter sensitive data that should never be replicated in the Data Warehouse

  • YAML based configuration: Data pipelines are defined as YAML files, ensuring that the entire configuration is kept under version control

  • Lightweight: No daemons or database setup are required

  • Extensible: PipelineWise uses Singer.io compatible tap and target connectors. New connectors can be added to PipelineWise with relatively little effort.
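
To make the load time transformation idea from the list above more concrete, here is a minimal Python sketch that hashes or nulls sensitive fields in a record before it would be loaded. The function name `mask_record`, the rule names `"hash"` and `"set_null"`, and the sample record are hypothetical; this is not PipelineWise's own API or configuration syntax.

```python
import hashlib


def mask_record(record, rules):
    """Return a copy of `record` with the given masking rules applied.

    `rules` maps a field name to a hypothetical rule name:
    "hash" replaces the value with its SHA-256 hex digest,
    "set_null" replaces the value with None.
    """
    masked = dict(record)
    for field, rule in rules.items():
        if field not in masked or masked[field] is None:
            continue  # nothing to mask for this field
        if rule == "hash":
            raw = str(masked[field]).encode("utf-8")
            masked[field] = hashlib.sha256(raw).hexdigest()
        elif rule == "set_null":
            masked[field] = None
        else:
            raise ValueError(f"Unknown masking rule: {rule}")
    return masked


if __name__ == "__main__":
    # Hypothetical source row; only the masked copy would reach the warehouse.
    row = {"id": 1, "email": "jane@example.com", "phone": "555-0100"}
    print(mask_record(row, {"email": "hash", "phone": "set_null"}))
```

Because the function returns a copy, the original record stays untouched, which makes it easy to compare the two in a test.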

Official docker images

PipelineWise images are published to Docker Hub.

Pull image with:

docker pull transferwiseworkspace/pipelinewise:{tag}

Connectors

A tap extracts data from any source and writes it to a standard stream in a JSON-based format. A target consumes data from taps and does something with it, such as loading it into a file, API or database.
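
The tap and target roles described above can be sketched in a few lines of Python. This is a simplified illustration of Singer-style messages (one JSON object per line, with `SCHEMA`, `RECORD` and `STATE` types), not the code of any real connector. The function names `run_tap` and `run_target`, the `users` stream and the in-memory buffer standing in for a pipe are all assumptions made for the example.

```python
import io
import json


def run_tap(rows, out):
    """Write Singer-style SCHEMA, RECORD and STATE messages, one JSON object per line."""
    schema = {
        "type": "SCHEMA",
        "stream": "users",  # hypothetical stream name
        "schema": {"properties": {"id": {"type": "integer"}, "name": {"type": "string"}}},
        "key_properties": ["id"],
    }
    out.write(json.dumps(schema) + "\n")
    for row in rows:
        out.write(json.dumps({"type": "RECORD", "stream": "users", "record": row}) + "\n")
    if rows:
        # Remember the last extracted id so a later run could resume from it.
        out.write(json.dumps({"type": "STATE", "value": {"users": rows[-1]["id"]}}) + "\n")


def run_target(lines):
    """Consume messages and collect records per stream (a stand-in for a database load)."""
    loaded = {}
    state = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        message = json.loads(line)
        if message["type"] == "RECORD":
            loaded.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
    return loaded, state


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for the pipe between tap and target
    run_tap([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}], buffer)
    buffer.seek(0)
    print(run_target(buffer))
```

In a real pipeline the tap and target are separate processes connected through standard output and standard input; the buffer here only keeps the example self-contained.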

| Type | Name | Extra | Description |
|------|------|-------|-------------|
| Tap | Postgres | | Extracts data from PostgreSQL databases. Supporting Log-Based, Key-Based Incremental and Full Table replications |
| Tap | MySQL | | Extracts data from MySQL databases. Supporting Log-Based, Key-Based Incremental and Full Table replications |
| Tap | Kafka | | Extracts data from Kafka topics |
| Tap | S3 CSV | | Extracts data from S3 csv files (currently a fork of tap-s3-csv because we wanted to use our own auth method) |
| Tap | Zendesk | | Extracts data from Zendesk using OAuth and Key-Based incremental replications |
| Tap | Snowflake | | Extracts data from Snowflake databases. Supporting Key-Based Incremental and Full Table replications |
| Tap | Salesforce | | Extracts data from Salesforce database using BULK and REST extraction API with Key-Based incremental replications |
| Tap | Jira | | Extracts data from Atlassian Jira using Base auth or OAuth credentials |
| Tap | MongoDB | | Extracts data from MongoDB databases. Supporting Log-Based and Full Table replications |
| Tap | Google Analytics | Extra | Extracts data from Google Analytics |
| Tap | Oracle | Extra | Extracts data from Oracle databases. Supporting Log-Based, Key-Based Incremental and Full Table replications |
| Tap | Zuora | Extra | Extracts data from Zuora database using AQAA and REST extraction API with Key-Based incremental replications |
| Tap | GitHub | | Extracts data from GitHub API using Personal Access Token and Key-Based incremental replications |
| Tap | Shopify | Extra | Extracts data from Shopify API using Personal App API Password and date based incremental replications |
| Tap | Slack | | Extracts data from a Slack API using Bot User Token and Key-Based incremental replications |
| Tap | Mixpanel | | Extracts data from the Mixpanel API. |
| Tap | Twilio | | Extracts data from the Twilio API using OAuth and Key-Based incremental replications. |
| Target | Postgres | | Loads data from any tap into PostgreSQL database |
| Target | Redshift | | Loads data from any tap into Amazon Redshift Data Warehouse |
| Target | Snowflake | | Loads data from any tap into Snowflake Data Warehouse |
| Target | S3 CSV | | Uploads data from any tap to S3 in CSV format |
| Transform | Field | | Transforms fields from any tap and sends the results to any target. Recommended for data masking/obfuscation |

Note: Extra connectors are experimental and written by community contributors. They are not maintained regularly and are not installed by default. To install the extra packages, use the --connectors=all option when installing PipelineWise.
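
Several taps listed above mention Key-Based Incremental replication alongside Full Table replication. The sketch below illustrates the general idea in Python under simplifying assumptions: rows are plain dictionaries, the replication key is a comparable column, and rows at or beyond the saved bookmark are re-read on the next run. The function name `key_based_incremental` and the `updated_at` column are hypothetical; the actual taps implement this against their own sources.

```python
def key_based_incremental(rows, replication_key, bookmark=None):
    """Select rows at or beyond `bookmark` and return them with the new bookmark.

    With no bookmark, every row is selected, which behaves like a full table sync.
    """
    selected = [
        row for row in rows
        if bookmark is None or row[replication_key] >= bookmark
    ]
    selected.sort(key=lambda row: row[replication_key])
    # Keep the old bookmark when nothing new was found.
    new_bookmark = selected[-1][replication_key] if selected else bookmark
    return selected, new_bookmark


if __name__ == "__main__":
    table = [
        {"id": 1, "updated_at": 10},
        {"id": 2, "updated_at": 20},
        {"id": 3, "updated_at": 30},
    ]
    first_batch, bookmark = key_based_incremental(table, "updated_at")
    table.append({"id": 4, "updated_at": 40})
    second_batch, bookmark = key_based_incremental(table, "updated_at", bookmark)
    print(len(first_batch), len(second_batch), bookmark)
```

Using `>=` rather than `>` means the row at the bookmark is extracted again, so the loading side has to tolerate duplicates, for example by upserting on the primary key.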

Running from docker

If you have Docker installed, running PipelineWise from Docker is the recommended and easiest way to get started.

Use official image

PipelineWise images are built on each release and are available on Docker Hub.

```sh
$ docker pull transferwiseworkspace/pipelinewise
```

Build your own docker image

  1. Build an executable docker image that has every required dependency and is isolated from your host system.

By default, the image is built with all connectors. To keep the image size small, we strongly recommend restricting it to just the connectors you need by supplying the --build-arg option:

```sh
$ docker build --build-arg connectors=tap-mysql,target-snowflake -t pipelinewise:latest .
```
  2. Once the image is ready, create an alias to the docker wrapper script:

    $ alias pipelinewise="$(pwd)/bin/pipelinewise-docker"
  3. Check if the installation was successful by running the pipelinewise status command:

    $ pipelinewise status
    
    Tap ID    Tap Type      Target ID     Target Type      Enabled    Status    Last Sync    Last Sync Result
    --------  ------------  ------------  ---------------  ---------  --------  -----------  ------------------
    0 pipeline(s)

You can run any pipelinewise command at this point. Tutorials to create and run pipelines can be found here: creating pipelines.

Running tests:

Building from source

  1. Make sure that all dependencies are installed on your system:

    • Python 3.x
    • python3-dev
    • python3-venv
    • mongo-tools
    • mbuffer
  2. Run the Makefile that installs the PipelineWise CLI and all supported singer connectors into separate virtual environments:

    $ make pipelinewise all_connectors

    Press Y to accept the license agreement of the required singer components. To automate the installation and accept every license agreement run:

    $ make pipelinewise all_connectors -e pw_acceptlicenses=y

    And to install only a specific list of singer connectors:

    $ make connectors -e pw_connector=<connector_1>,<connector_2>

    Run make or make -h to see the help for Makefile and all options.

  3. To start the CLI, activate the CLI virtual environment and set the PIPELINEWISE_HOME environment variable:

    $ source {ACTUAL_ABSOLUTE_PATH}/.virtualenvs/pipelinewise/bin/activate
    $ export PIPELINEWISE_HOME={ACTUAL_ABSOLUTE_PATH}

    (The ACTUAL_ABSOLUTE_PATH differs on every system; running make -h prints the correct commands for the CLI.)

  4. Check if the installation was successful by running the pipelinewise status command:

    $ pipelinewise status
    
    Tap ID    Tap Type      Target ID     Target Type      Enabled    Status    Last Sync    Last Sync Result
    --------  ------------  ------------  ---------------  ---------  --------  -----------  ------------------
    0 pipeline(s)

You can run any pipelinewise command at this point. Tutorials to create and run pipelines can be found here: creating pipelines.

To run unit tests:

$ pytest --ignore tests/end_to_end

To run unit tests and generate code coverage:

$ coverage run -m pytest --ignore tests/end_to_end && coverage report

To generate a code coverage HTML report:

$ coverage run -m pytest --ignore tests/end_to_end && coverage html -d coverage_html

Note: The HTML report will be generated in coverage_html/index.html

To run integration and end-to-end tests:

To run integration and end-to-end tests you need to use the Docker Development Environment. This spins up a pre-configured PipelineWise project with pre-configured source and target databases in several docker containers, which are required for the end-to-end test cases.

Developing with Docker

If you have Docker and Docker Compose installed, you can create a local development environment that includes not only the PipelineWise executables but also a pre-configured development project with some databases as source and targets for a more convenient development experience and to run integration and end-to-end tests.

For further instructions about setting up a local development environment, go to Test Project for Docker Development Environment.

Contribution

To add new taps and targets follow the instructions on

Links

License

Apache License Version 2.0

See LICENSE for the full text.

Important Note:

PipelineWise as standalone software is licensed under Apache License Version 2.0, but bundled components can use different licenses and may override the terms and conditions detailed in Apache License Version 2.0. You can customise which connectors you want to include in the final PipelineWise build, and the final license of your build depends on the included connectors. For further details please check the Licenses section in the documentation.