BigFlow
Documentation
- What is BigFlow?
- Getting started
- Installing BigFlow
- Help me
- BigFlow tutorial
- CLI
- Configuration
- Project structure and build
- Deployment
- Workflow & Job
- Starter
- Technologies
- Development
Cookbook
What is BigFlow?
BigFlow is a Python framework for data processing pipelines on GCP.
The main features are:
- Dockerized deployment environment
- Powerful CLI
- Automated build, deployment, versioning and configuration
- Unified project structure
- Support for GCP data processing technologies — Dataflow (Apache Beam) and BigQuery
- Project starter
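For a first impression of the programming model, here is a minimal sketch of a workflow with a single job, modelled on the pattern used in the BigFlow tutorial. The class and workflow names are placeholders, and the exact base classes and argument names may differ between BigFlow versions, so treat the tutorial as the authoritative reference.

```python
import bigflow


class HelloWorldJob(bigflow.Job):
    # A job has an id and an execute() method; BigFlow calls execute()
    # with a context that carries the workflow runtime.
    id = 'hello_world'

    def execute(self, context: bigflow.JobContext):
        print(f'Hello world on {context.runtime}!')


# A workflow is an ordered list of jobs identified by a workflow_id;
# this is the unit you build, run, and deploy with the BigFlow CLI.
hello_world_workflow = bigflow.Workflow(
    workflow_id='hello_world_workflow',
    definition=[HelloWorldJob()],
)
```

Such a workflow lives inside a BigFlow project and is built, run, and deployed with the BigFlow CLI mentioned below.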
Getting started
Start by installing BigFlow on your local machine. Then, go through the BigFlow tutorial.
Installing BigFlow
Prerequisites. Before you start, make sure you have the following software installed:
- Python 3
- Google Cloud SDK (the gcloud command)
- Docker
You can install the bigflow package globally, but we recommend installing it locally with venv, in your project's folder:
python -m venv .bigflow_env
source .bigflow_env/bin/activate
Install the bigflow pip package:
pip install bigflow[bigquery,dataflow]
Test it:
bigflow -h
Read more about BigFlow CLI.
To interact with GCP you need to set a default project and log in:
gcloud config set project <your-gcp-project-id>
gcloud auth application-default login
Finally, check if your Docker is running:
docker info
Help me
You can ask questions on our Gitter channel or on Stack Overflow.