  • Language: Python
  • License: MIT License


Repository Details

Airflow DAGs for exporting, loading, and parsing Ethereum blockchain data. See also the article "How to get any Ethereum smart contract into BigQuery": https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee

Ethereum ETL Airflow

Read this article: https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-how-we-built-dataset

Setting up Airflow DAGs using Google Cloud Composer

Create BigQuery Datasets

Create Google Cloud Storage bucket
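The two steps above can also be done from the command line. The following is a minimal sketch, not the maintainers' procedure: the project ID, bucket name, dataset names, and the US location are all assumptions, so confirm the names the DAGs expect in dags/ethereumetl_airflow/variables.py. The `run` helper is a hypothetical wrapper that only prints each command unless DRY_RUN=0 is set.

```bash
#!/usr/bin/env bash
# Placeholders (assumptions): replace with your own project, bucket and datasets.
PROJECT_ID="my-gcp-project"
BUCKET="my-ethereum-etl-bucket"
DATASETS=("crypto_ethereum" "crypto_ethereum_raw" "crypto_ethereum_temp")

# Dry-run by default so the commands can be reviewed before anything is created.
DRY_RUN="${DRY_RUN:-1}"

# run: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One BigQuery dataset per name in the list.
for dataset in "${DATASETS[@]}"; do
    run bq --location=US mk --dataset "${PROJECT_ID}:${dataset}"
done

# The bucket that will hold the exported files.
run gcloud storage buckets create "gs://${BUCKET}" --location=US
```

Run it once as-is to inspect the printed commands, then again with DRY_RUN=0 to execute them.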

Create Google Cloud Composer (version 2) environment

Create a new Cloud Composer environment:

export ENVIRONMENT_NAME=ethereum-etl-0

AIRFLOW_CONFIGS_ARR=(
    "celery-worker_concurrency=8"
    "scheduler-dag_dir_list_interval=300"
    "scheduler-min_file_process_interval=120"
)
export AIRFLOW_CONFIGS=$(IFS=, ; echo "${AIRFLOW_CONFIGS_ARR[*]}")

gcloud composer environments create \
    $ENVIRONMENT_NAME \
    --location=us-central1 \
    --image-version=composer-2.1.14-airflow-2.5.1 \
    --environment-size=medium \
    --scheduler-cpu=2 \
    --scheduler-memory=13 \
    --scheduler-storage=1 \
    --scheduler-count=1 \
    --web-server-cpu=1 \
    --web-server-memory=2 \
    --web-server-storage=512MB \
    --worker-cpu=2 \
    --worker-memory=13 \
    --worker-storage=10 \
    --min-workers=1 \
    --max-workers=8 \
    --airflow-configs=$AIRFLOW_CONFIGS

gcloud composer environments update \
    $ENVIRONMENT_NAME \
    --location=us-central1 \
    --update-pypi-packages-from-file=requirements_airflow.txt
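The AIRFLOW_CONFIGS line in the script above relies on a bash idiom that is easy to misread, so here it is in isolation. Inside the command substitution, IFS is set to a comma and "${AIRFLOW_CONFIGS_ARR[*]}" expands to a single word joined by the first character of IFS. Because the assignment happens in a subshell, IFS in the calling shell is left untouched.

```bash
#!/usr/bin/env bash
AIRFLOW_CONFIGS_ARR=(
    "celery-worker_concurrency=8"
    "scheduler-dag_dir_list_interval=300"
    "scheduler-min_file_process_interval=120"
)

# Remember the caller's IFS to show that the join does not change it.
ORIGINAL_IFS="$IFS"

# Join the array elements with commas inside a subshell.
AIRFLOW_CONFIGS=$(IFS=, ; echo "${AIRFLOW_CONFIGS_ARR[*]}")

echo "$AIRFLOW_CONFIGS"
# prints: celery-worker_concurrency=8,scheduler-dag_dir_list_interval=300,scheduler-min_file_process_interval=120
```

Adding another override is then a matter of appending one more "section-property=value" entry to the array.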

Create variables in Airflow (Admin > Variables in the UI):

| Variable | Description |
|---|---|
| ethereum_output_bucket | GCS bucket to store exported files |
| ethereum_provider_uris | Comma separated URIs of Ethereum nodes |
| ethereum_destination_dataset_project_id | Project ID of BigQuery datasets |
| notification_emails | email for notifications |

Check other variables in dags/ethereumetl_airflow/variables.py.
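Instead of typing each variable into the UI, the four variables listed above can be collected in a JSON file and imported in one go. This is a sketch under assumptions: every value below is a placeholder, and the file is written to a temporary path rather than to any file in this repository.

```bash
#!/usr/bin/env bash
# All values are placeholders (assumptions); substitute your own.
VARIABLES_FILE="$(mktemp)"

cat > "$VARIABLES_FILE" <<'EOF'
{
  "ethereum_output_bucket": "my-ethereum-etl-bucket",
  "ethereum_provider_uris": "https://node-1.example.com,https://node-2.example.com",
  "ethereum_destination_dataset_project_id": "my-gcp-project",
  "notification_emails": "alerts@example.com"
}
EOF

echo "Wrote Airflow variables to $VARIABLES_FILE"
```

A flat JSON object of this shape is what Airflow 2 accepts for variable import, either through the import option on the Admin > Variables page or with `airflow variables import <file>` where the Airflow CLI is available.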

Updating package requirements

Suggested package requirements for Composer are stored in requirements_airflow.txt.

You can update the Composer environment using the following script:

ENVIRONMENT_NAME="ethereum-etl-0"
LOCAL_REQUIREMENTS_PATH="$(mktemp)"

# Keep only the start of each line, up to the first '#', '|' or space.
# This drops comment lines, inline comments and blank lines:
grep -o '^[^#| ]*' "./requirements_airflow.txt" > "$LOCAL_REQUIREMENTS_PATH"

gcloud composer environments update \
  "$ENVIRONMENT_NAME" \
  --location="us-central1" \
  --update-pypi-packages-from-file="$LOCAL_REQUIREMENTS_PATH"
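To see what the grep filter in the script above passes on to Composer, it helps to run it against a small sample. The package names and versions below are made up for illustration and are not taken from requirements_airflow.txt.

```bash
#!/usr/bin/env bash
# Hypothetical requirements file covering the cases the filter has to handle.
SAMPLE="$(mktemp)"
cat > "$SAMPLE" <<'EOF'
# a full-line comment
package-a==1.2.3  # an inline comment

package-b>=2.0
package-c >= 3.0
EOF

# Same pattern as in the update script: keep the leading run of characters
# that are not '#', '|' or a space. Lines where that run is empty are skipped.
FILTERED="$(grep -o '^[^#| ]*' "$SAMPLE")"

echo "$FILTERED"
# prints:
# package-a==1.2.3
# package-b>=2.0
# package-c
```

The last line shows the one caveat: a specifier written with spaces loses its version constraint, so requirements are best written without spaces (for example `package-c>=3.0`).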

Note: Composer can be very pedantic about conflicts in additional packages. You may have to fix dependency conflicts that never showed up when testing locally, because Composer does something "cleverer" than just pip install -r requirements.txt when updating dependencies. This is why eth-hash is currently pinned in requirements_airflow.txt. We have typically found that pinning eth-hash and/or eth-rlp makes things work, though your mileage may vary.

See this issue for further ideas on how to unblock problems you may encounter.

Upload DAGs

> ./upload_dags.sh <airflow_bucket>

Running Tests

pip install \
    -r requirements_test.txt \
    -r requirements_local.txt \
    -r requirements_airflow.txt
pytest -vv -s

Running locally

A docker compose definition has been provided to easily spin up a local Airflow instance.

To build the required image:

docker compose build

To start Airflow:

docker compose up airflow

The instance requires the CLOUDSDK_CORE_PROJECT environment variable to be set in most cases. Airflow Variables can be defined in variables.json.
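Since a missing CLOUDSDK_CORE_PROJECT tends to surface only after the containers are up, a small guard can catch it first. This is a sketch: `require_project` is a hypothetical helper, not part of this repository, and it assumes the variable is read from the shell that launches docker compose.

```bash
#!/usr/bin/env bash
# require_project: succeed only if CLOUDSDK_CORE_PROJECT is set and non-empty.
require_project() {
    if [ -z "${CLOUDSDK_CORE_PROJECT:-}" ]; then
        echo "CLOUDSDK_CORE_PROJECT is not set; export it before starting Airflow." >&2
        return 1
    fi
    echo "Using project: $CLOUDSDK_CORE_PROJECT"
}

# Example usage (placeholder project ID):
#   export CLOUDSDK_CORE_PROJECT="my-gcp-project"
#   require_project && docker compose up airflow
```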

Creating Table Definition Files for Parsing Events and Function Calls

Read this article: https://medium.com/@medvedev1088/query-ens-and-0x-events-with-sql-in-google-bigquery-4d197206e644

More Information

You can follow the instructions for the Polygon DAGs here: https://github.com/blockchain-etl/polygon-etl. The architecture there is very similar to the Ethereum one, so in most cases substituting polygon for ethereum will work. Contributions to this README that port documentation from Polygon to Ethereum are welcome.
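The substitution described above can be applied mechanically when adapting a snippet from the Polygon documentation. The two variable names below are assumed examples of Polygon-style names, not quotes from that repository, and the result should still be checked by hand because the substitution does not always apply.

```bash
#!/usr/bin/env bash
# Hypothetical lines in the style of the Polygon documentation.
POLYGON_DOC='polygon_output_bucket
polygon_provider_uris'

# Replace every occurrence of "polygon" with "ethereum".
ETHEREUM_DOC="$(printf '%s\n' "$POLYGON_DOC" | sed 's/polygon/ethereum/g')"

echo "$ETHEREUM_DOC"
# prints:
# ethereum_output_bucket
# ethereum_provider_uris
```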
