  • Stars: 396
  • Rank: 108,162 (Top 3 %)
  • Language: Python
  • License: MIT License
  • Created about 6 years ago
  • Updated 6 months ago

Repository Details

ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ

Bitcoin ETL

Install Bitcoin ETL:

pip install bitcoin-etl

Export blocks and transactions (Schema, Reference):

> bitcoinetl export_blocks_and_transactions --start-block 0 --end-block 500000 \
--provider-uri http://user:pass@localhost:8332 --chain bitcoin \
 --blocks-output blocks.json --transactions-output transactions.json

Supported chains:

  • bitcoin
  • bitcoin_cash
  • bitcoin_gold
  • dogecoin
  • litecoin
  • dash
  • zcash

Stream blockchain data continually to console (Reference):

> pip install bitcoin-etl[streaming]
> bitcoinetl stream -p http://user:pass@localhost:8332 --start-block 500000

Stream blockchain data continually to Google Pub/Sub (Reference):

> export GOOGLE_APPLICATION_CREDENTIALS=/path_to_credentials_file.json
> bitcoinetl stream -p http://user:pass@localhost:8332 --start-block 500000 --output projects/your-project/topics/crypto_bitcoin

For the latest version, check out the repo and call

> pip install -e .[streaming]
> python bitcoinetl.py

Schema

blocks.json

Field Type
hash hex_string
size bigint
stripped_size bigint
weight bigint
number bigint
version bigint
merkle_root hex_string
timestamp bigint
nonce hex_string
bits hex_string
coinbase_param hex_string
transaction_count bigint

transactions.json

Field Type
hash hex_string
size bigint
virtual_size bigint
version bigint
lock_time bigint
block_number bigint
block_hash hex_string
block_timestamp bigint
is_coinbase boolean
index bigint
inputs []transaction_input
outputs []transaction_output
input_count bigint
output_count bigint
input_value bigint
output_value bigint
fee bigint

transaction_input

Field Type
index bigint
spent_transaction_hash hex_string
spent_output_index bigint
script_asm string
script_hex hex_string
sequence bigint
required_signatures bigint
type string
addresses []string
value bigint

transaction_output

Field Type
index bigint
script_asm string
script_hex hex_string
required_signatures bigint
type string
addresses []string
value bigint

You can find column descriptions in schemas
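
As a rough illustration of how the field tables above can be used, the sketch below checks one exported block record against the blocks.json field list. It assumes that each line of blocks.json holds a single JSON object, that hex_string values are exported as JSON strings, and that bigint values are exported as JSON integers. The names BLOCK_SCHEMA and check_block are hypothetical and are not part of Bitcoin ETL.

```python
import json

# Field names and types copied from the blocks.json table above.
# Assumption: hex_string -> JSON string, bigint -> JSON integer.
BLOCK_SCHEMA = {
    "hash": str,
    "size": int,
    "stripped_size": int,
    "weight": int,
    "number": int,
    "version": int,
    "merkle_root": str,
    "timestamp": int,
    "nonce": str,
    "bits": str,
    "coinbase_param": str,
    "transaction_count": int,
}


def check_block(line):
    """Return a list of problems found in one JSON-encoded block record."""
    record = json.loads(line)
    problems = []
    for field, expected_type in BLOCK_SCHEMA.items():
        if field not in record:
            problems.append("missing field: " + field)
        elif record[field] is not None and not isinstance(record[field], expected_type):
            problems.append("unexpected type for field: " + field)
    return problems
```

Calling check_block on each line of an exported file and collecting the non-empty results is a cheap sanity check before loading the data elsewhere.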

Notes:

  1. Output values returned by the Dogecoin API lost precision in clients prior to version 1.14. This is caused by the issue dogecoin/dogecoin#1558. Explorers that used older versions to export the data may show incorrect address balances and transaction amounts.

  2. For Zcash, the vjoinsplit and valueBalance fields are converted to inputs and outputs with type 'shielded'. See https://zcash-rpc.github.io/getrawtransaction.html and https://zcash.readthedocs.io/en/latest/rtd_pages/zips/zip-0243.html

Exporting the Blockchain

  1. Install Python 3.5.3+ https://www.python.org/downloads/

  2. Install Bitcoin node https://hackernoon.com/a-complete-beginners-guide-to-installing-a-bitcoin-full-node-on-linux-2018-edition-cb8e384479ea

  3. Start Bitcoin. Make sure it has downloaded the blocks that you need by executing $ bitcoin-cli getblockchaininfo in the terminal. You can export any block below the blocks value reported by that command; there is no need to wait for the full sync.

  4. Install Bitcoin ETL:

    > pip install bitcoin-etl
  5. Export blocks & transactions:

    > bitcoinetl export_all --start 0 --end 499999  \
    --partition-batch-size 100 \
    --provider-uri http://user:pass@localhost:8332 --chain bitcoin

    The result will be in the output subdirectory, partitioned in Hive style:

    output/blocks/start_block=00000000/end_block=00000099/blocks_00000000_00000099.csv
    output/blocks/start_block=00000100/end_block=00000199/blocks_00000100_00000199.csv
    ...
    output/transactions/start_block=00000000/end_block=00000099/transactions_00000000_00000099.csv
    ...

    If the bitcoinetl command is not available in PATH, use python -m bitcoinetl instead.
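
The Hive-style layout shown in step 5 can be reproduced with a few lines of Python, which may help when locating the file that holds a given block. This is only a sketch inferred from the example paths above: the function names partition_path and partitions are hypothetical, and the assumption that the last partition is simply cut off at the end block is not confirmed by the tool's documentation.

```python
def partition_path(entity, start_block, end_block, output_dir="output"):
    """Build the path of one partition file, mirroring the listing above."""
    # Block numbers are zero-padded to eight digits in the example paths.
    start = "{:08d}".format(start_block)
    end = "{:08d}".format(end_block)
    return "{out}/{entity}/start_block={start}/end_block={end}/{entity}_{start}_{end}.csv".format(
        out=output_dir, entity=entity, start=start, end=end
    )


def partitions(start_block, end_block, partition_batch_size):
    """Yield (start, end) pairs that cover start_block..end_block inclusive."""
    for start in range(start_block, end_block + 1, partition_batch_size):
        # Assumption: the final partition is truncated at end_block.
        yield start, min(start + partition_batch_size - 1, end_block)
```

For example, iterating over partitions(0, 499999, 100) and passing each pair to partition_path("blocks", start, end) yields the expected block file names for the export command in step 5.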

Running in Docker

  1. Install Docker https://docs.docker.com/install/

  2. Build a Docker image

    > docker build --platform linux/x86_64 -t bitcoin-etl:latest .
    > docker image ls
  3. Run a container out of the image

    > docker run --platform linux/x86_64 -v $HOME/output:/bitcoin-etl/output bitcoin-etl:latest export_blocks_and_transactions --start-block 0 --end-block 500000 \
        --provider-uri http://user:pass@localhost:8332 --blocks-output output/blocks.json --transactions-output output/transactions.json
  4. Run streaming to console or Pub/Sub

    > docker build --platform linux/x86_64 -t bitcoin-etl:latest-streaming -f Dockerfile_with_streaming .
    > echo "Stream to console"
    > docker run --platform linux/x86_64 bitcoin-etl:latest-streaming stream -p http://user:pass@localhost:8332 --start-block 500000
    > echo "Stream to Pub/Sub"
    > docker run --platform linux/x86_64 -v /path_to_credentials_file/:/bitcoin-etl/ --env GOOGLE_APPLICATION_CREDENTIALS=/bitcoin-etl/credentials_file.json bitcoin-etl:latest-streaming stream -p http://user:pass@localhost:8332 --start-block 500000 --output projects/your-project/topics/crypto_bitcoin
  5. Refer to https://github.com/blockchain-etl/bitcoin-etl-streaming for deploying the streaming app to Google Kubernetes Engine.

Command Reference

All the commands accept the -h parameter for help, e.g.:

> bitcoinetl export_blocks_and_transactions --help
Usage: bitcoinetl.py export_blocks_and_transactions [OPTIONS]

  Export blocks and transactions.

Options:
  -s, --start-block INTEGER   Start block
  -e, --end-block INTEGER     End block  [required]
  -b, --batch-size INTEGER    The number of blocks to export at a time.
  -p, --provider-uri TEXT     The URI of the remote Bitcoin node
  -w, --max-workers INTEGER   The maximum number of workers.
  --blocks-output TEXT        The output file for blocks. If not provided
                              blocks will not be exported. Use "-" for stdout
  --transactions-output TEXT  The output file for transactions. If not
                              provided transactions will not be exported. Use
                              "-" for stdout
  --help                      Show this message and exit.

For the --output parameters the supported type is json. The format type is inferred from the output file name.

export_blocks_and_transactions

> bitcoinetl export_blocks_and_transactions --start-block 0 --end-block 500000 \
  --provider-uri http://user:pass@localhost:8332 \
  --blocks-output blocks.json --transactions-output transactions.json

Omit --blocks-output to export only transactions, or omit --transactions-output to export only blocks.

You can tune --batch-size, --max-workers for performance.

Note that the required_signatures, type, addresses, and value fields will be empty in transaction inputs. Use enrich_transactions to populate those fields.

enrich_transactions

You need to run the Bitcoin daemon with the option txindex=1 for this command to work.

> bitcoinetl enrich_transactions  \
  --provider-uri http://user:pass@localhost:8332 \
  --transactions-input transactions.json --transactions-output enriched_transactions.json

You can tune --batch-size, --max-workers for performance.
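
Conceptually, enriching a transaction means copying the required_signatures, type, addresses, and value fields from the output that each input spends. The sketch below shows that idea with an in-memory lookup table. It is an assumption about the general approach, not the tool's actual implementation, which talks to the node instead. The function name enrich_inputs and the outputs_by_key mapping are hypothetical.

```python
# Fields that are empty on exported inputs and filled in by enrichment.
ENRICHED_FIELDS = ("required_signatures", "type", "addresses", "value")


def enrich_inputs(transaction, outputs_by_key):
    """Copy fields from each spent output onto the input that spends it.

    outputs_by_key maps (transaction hash, output index) to an output record.
    """
    for tx_input in transaction["inputs"]:
        key = (tx_input.get("spent_transaction_hash"), tx_input.get("spent_output_index"))
        spent_output = outputs_by_key.get(key)
        if spent_output is None:
            # No known spent output, e.g. a coinbase input: leave it unchanged.
            continue
        for field in ENRICHED_FIELDS:
            tx_input[field] = spent_output[field]
    return transaction
```

Because an input may spend an output from any earlier block, a lookup like this needs access to arbitrary past transactions, which is a plausible reason the command requires txindex=1.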

get_block_range_for_date

> bitcoinetl get_block_range_for_date --provider-uri http://user:pass@localhost:8332 --date=2017-03-01

This command is guaranteed to return a block range that covers all blocks with block.time on the specified date. However, the returned range may also contain blocks outside that date, because block times are not monotonic (https://twitter.com/EvgeMedvedev/status/1073844856009576448). You can filter blocks.json/transactions.json with the command below:

> bitcoinetl filter_items -i blocks.json -o blocks_filtered.json \
-p "datetime.datetime.fromtimestamp(item['timestamp']).astimezone(datetime.timezone.utc).strftime('%Y-%m-%d') == '2017-03-01'"
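
The same date check can be written as a standalone Python function, which may be easier to test than the inline predicate. This is a minimal sketch that assumes item['timestamp'] holds Unix seconds, as the predicate above implies. The function name block_is_on_date is hypothetical.

```python
import datetime


def block_is_on_date(item, date_string):
    """Return True if the item's timestamp falls on date_string (YYYY-MM-DD) in UTC."""
    # Interpret the Unix timestamp directly in UTC so the result does not
    # depend on the local time zone of the machine running the filter.
    moment = datetime.datetime.fromtimestamp(item["timestamp"], tz=datetime.timezone.utc)
    return moment.strftime("%Y-%m-%d") == date_string
```

A list of parsed records can then be filtered with a comprehension such as [b for b in blocks if block_is_on_date(b, '2017-03-01')].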

export_all

> bitcoinetl export_all --provider-uri http://user:pass@localhost:8332 --start 2018-01-01 --end 2018-01-02

You can tune --export-batch-size, --max-workers for performance.

stream

> bitcoinetl stream --provider-uri http://user:pass@localhost:8332 --start-block 500000
  • This command outputs blocks and transactions to the console by default.
  • Use --output option to specify the Google Pub/Sub topic where to publish blockchain data, e.g. projects/your-project/topics/crypto_bitcoin. Blocks and transactions will be pushed to projects/your-project/topics/crypto_bitcoin.blocks and projects/your-project/topics/crypto_bitcoin.transactions topics.
  • The command saves its state to the last_synced_block.txt file, where the last synced block number is written periodically.
  • Specify either the --start-block or the --last-synced-block-file option. --last-synced-block-file should point to the file that stores the block number from which to start streaming the blockchain data.
  • Use the --lag option to specify how many blocks to lag behind the head of the blockchain. It's the simplest way to handle chain reorganizations: they are less likely the further a block is from the head.
  • Use the --chain option to specify the type of the chain, e.g. bitcoin, litecoin, dash, zcash, etc.
  • You can tune --period-seconds, --batch-size, --max-workers for performance.
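
The interaction between the last synced block, the chain head, and --lag can be sketched as follows. This is an illustration of the behaviour described in the bullets above, not the tool's actual code: the function names next_block_range and pubsub_topics are hypothetical, and the assumption that at most batch_size blocks are taken per step is mine.

```python
def next_block_range(last_synced_block, head_block, lag=0, batch_size=1):
    """Return (start, end) of the next blocks to stream, or None if none are due."""
    # Blocks closer to the head than `lag` are skipped for now, because they
    # are the ones most likely to be replaced in a chain reorganization.
    target_block = head_block - lag
    start = last_synced_block + 1
    if start > target_block:
        return None  # nothing to do until the chain advances
    end = min(target_block, start + batch_size - 1)
    return start, end


def pubsub_topics(output):
    """Derive the per-entity topic names from the --output value."""
    return {
        "blocks": output + ".blocks",
        "transactions": output + ".transactions",
    }
```

With a lag of 6 and a head at block 500010, for instance, the newest block this sketch would stream is 500004.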

Running Tests

> pip install -e .[dev]
> echo "The below variables are optional"
> export BITCOINETL_BITCOIN_PROVIDER_URI=http://user:pass@localhost:8332
> export BITCOINETL_LITECOIN_PROVIDER_URI=http://user:pass@localhost:8331
> export BITCOINETL_DOGECOIN_PROVIDER_URI=http://user:pass@localhost:8330
> export BITCOINETL_BITCOIN_CASH_PROVIDER_URI=http://user:pass@localhost:8329
> export BITCOINETL_DASH_PROVIDER_URI=http://user:pass@localhost:8328
> export BITCOINETL_ZCASH_PROVIDER_URI=http://user:pass@localhost:8327
> pytest -vv

Running Tox Tests

> pip install tox
> tox

Public Datasets in BigQuery

https://cloud.google.com/blog/products/data-analytics/introducing-six-new-cryptocurrencies-in-bigquery-public-datasets-and-how-to-analyze-them
