• Stars
    star
    137
  • Rank 266,121 (Top 6 %)
  • Language
    Shell
  • License
    MIT License
  • Created almost 5 years ago
  • Updated about 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

ETL for moving Ethereum data to PostgreSQL database

Ethereum ETL for PostgreSQL

The steps below will allow you to bootstrap a PostgreSQL database in GCP with full historical and real-time Ethereum data: blocks, transactions, logs, token_transfers, and traces.

The whole process will take between 24 and 72 hours.

Prerequisites:

1. Export Ethereum data from BigQuery to CSV files in GCS

  • Install gcloud and run gcloud auth login
  • Run
pip install -r requirements.txt
export BUCKET=<your_gcs_bucket>
bash ethereum_bigquery_to_gcs.sh $BUCKET

Optionally provide start and end dates: bash ethereum_bigquery_to_gcs.sh $BUCKET 2020-01-01 2020-01-31

Exporting to CSV files is going to take about 10 minutes.

2. Import data from CSV files to PostgreSQL database in Cloud SQL

  • Create a new Cloud SQL instance
export CLOUD_SQL_INSTANCE_ID=ethereum-0
export ROOT_PASSWORD=<your_password>
gcloud sql instances create $CLOUD_SQL_INSTANCE_ID --database-version=POSTGRES_11 --root-password=$ROOT_PASSWORD \
    --storage-type=SSD --storage-size=100 --cpu=4 --memory=6 \
    --database-flags=temp_file_limit=2147483647

Notice the storage size is set to 100 GB. It will scale up automatically to around 1.5 TB when we load in the data.

  • Add Cloud SQL service account to GCS bucket as objectViewer. Run gcloud sql instances describe $CLOUD_SQL_INSTANCE_ID, then copy serviceAccountEmailAddress from the output and add it to the bucket.

  • Create the database and the tables:

gcloud sql databases create ethereum --instance=$CLOUD_SQL_INSTANCE_ID

# Install Cloud SQL Proxy following the instructions here https://cloud.google.com/sql/docs/mysql/sql-proxy#install
./cloud_sql_proxy -instances=myProject:us-central1:${CLOUD_SQL_INSTANCE_ID}=tcp:5433

cat schema/*.sql | psql -U postgres -d ethereum -h 127.0.0.1  --port 5433 -a
  • Run import from GCS to Cloud SQL:
bash ethereum_gcs_to_cloud_sql.sh $BUCKET $CLOUD_SQL_INSTANCE_ID

Importing to Cloud SQL is going to take between 12 and 24 hours.

A few performance optimization tips for initial loading of the data:

3. Apply indexes to the tables

NOTE: indexes won't work for the contracts table due to the issue described here #11 (comment)

  • Run:
cat indexes/*.sql | psql -U postgres -d ethereum -h 127.0.0.1  --port 5433 -a

Creating indexes is going to take between 12 and 24 hours. Depending on the queries you're going to run you may need to create more indexes or partition the tables.

Cloud SQL instance will cost you between $200 and $500 per month depending on whether you use HDD or SSD and on the machine type.

4. Streaming

Use ethereumetl stream command to continually pull data from an Ethereum node and insert it to Postgres tables: https://github.com/blockchain-etl/ethereum-etl/tree/develop/docs/commands.md#stream.

Follow the instructions here to deploy it to Kubernetes: https://github.com/blockchain-etl/blockchain-etl-streaming.

More Repositories

1

ethereum-etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Python
2,936
star
2

bitcoin-etl

ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Python
405
star
3

ethereum-etl-airflow

Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee
Python
405
star
4

awesome-bigquery-views

Useful SQL queries for Blockchain ETL datasets in BigQuery.
205
star
5

public-datasets

The list of public blockchain datasets in BigQuery
187
star
6

polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Python
100
star
7

blockchain-etl-streaming

Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Python
77
star
8

ethereum2-etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum 2.0 beacon blocks, attestations, deposits, slashings, validators, committees. Data is available in Google BigQuery
Python
68
star
9

blockchain-etl-architecture

Blockchain ETL Architecture
44
star
10

ethers.js-bigquery

ethers.js library, compiled for use in Google BigQuery
JavaScript
39
star
11

solana-etl-airflow

ETL for Solana. Contributions are welcome. Join the Telegram channel https://t.me/joinchat/GsMpbA3mv1OJ6YMp3T5ORQ
Python
32
star
12

bitcoin-etl-airflow

Airflow DAGs for https://github.com/blockchain-etl/bitcoin-etl
Python
31
star
13

blockchain-kubernetes

Kubernetes manifests for running blockchain nodes
Smarty
26
star
14

ethereum-etl-neo4j

ETL for moving Ethereum data to Neo4j database
Shell
20
star
15

bigquery-to-pubsub

A tool for streaming time series data from a BigQuery table to a Pub/Sub topic
Python
16
star
16

bitcoin-etl-airflow-neo4j

Airflow DAGs for ingesting Bitcoin blockchain data to Neo4j
Python
14
star
17

tezos-etl

Python scripts for ETL (extract, transform and load) jobs for Tezos blocks, balance updates, and operations
Python
13
star
18

blockchain-etl-table-definition-cli

CLI for generating table definitions for https://github.com/blockchain-etl/ethereum-etl-airflow
Python
12
star
19

hedera-etl

ETL scripts for Hedera Hashgraph
Java
11
star
20

blockchain-streaming-analytics

Blockchain streaming analytics
Java
9
star
21

eos-etl

ETL scripts for EOS.
Python
9
star
22

ethereum-export-pipeline

UNMAINTAINED! AWS CloudFormation scripts for Ethereum ETL export pipeline
Python
8
star
23

blockchain-etl-dataflow

Dataflow pipelines for Blockchain ETL. Connects Pub/Sub topics with BigQuery tables.
Java
8
star
24

data-studio-connectors

Connect Google BigQuery crypto public datasets to Google Data Studio
JavaScript
7
star
25

abi-functions

7
star
26

blockchain-terraform-deployment

Template repository for deploying https://github.com/blockchain-etl/blockchain-terraform
HCL
6
star
27

icon-etl

Python scripts for ETL (extract, transform and load) jobs for ICON blocks, transactions, receipts, and logs.
Python
6
star
28

ethereum2-etl-airflow

Airflow DAGs for exporting Ethereum 2.0 blockchain data to Google BigQuery
Python
5
star
29

blockchain-terraform

Terraform configuration files for running blockchain nodes
HCL
5
star
30

blockchain-etl-common

Common utils for blockchain-etl
Python
5
star
31

abi

EVM public good - pull requests welcome for any ABI from any EVM
5
star
32

band-etl

ETL (extract, transform and load) tools for ingesting Band Protocol blockchain data to Google BigQuery and Pub/Sub
Python
5
star
33

tezos-kubernetes

Kubernetes manifests for running Tezos node
Shell
5
star
34

abi-parser

Web app which parses smart contracts and outputs queries and tables for Ethereum-ETL
JavaScript
5
star
35

iotex-etl

ETL (extract, transform and load) tools for ingesting IoTeX blockchain data to Google BigQuery and Pub/Sub
Python
5
star
36

ordinals-etl

Python
4
star
37

anomalous-transactions-detector-dataflow

Dataflow pipeline for detecting anomalous transactions on the Ethereum and Bitcoin blockchains
Java
4
star
38

solana-etl

Rust
4
star
39

etl-rust

Rust
4
star
40

twitter-bot-cloud-function

Google Cloud Function for tweeting Blockchain ETL alerts
JavaScript
3
star
41

zilliqa-etl

Python scripts for ETL (extract, transform and load) jobs for Zilliqa blockchain data
Python
3
star
42

pubsub-to-firestore-dataflow

Dataflow pipeline that pulls messages from a Pub/Sub topic and saves them in a Firestore collection
Java
2
star
43

eos-etl-airflow

Airflow DAGs for https://github.com/blockchain-etl/eos-etl
Python
2
star
44

bitcoin-rpc

Bitcoin JSON RPC client in Python
2
star
45

icon-etl-airflow

Airflow DAGs for exporting, loading, and parsing the ICON blockchain data.
Python
2
star
46

tezos-etl-airflow

Airflow DAGs for exporting and loading the Tezos blockchain data to Google BigQuery
Python
2
star
47

throttle-pubsub-cloud-function

Google Cloud Function that can throttle messages in a Pub/Sub topic
JavaScript
1
star
48

theta-etl

Python
1
star
49

iotex-kubernetes

Helm charts for running IoTeX node
Shell
1
star