βοΈ π§ cryo π§ βοΈ
cryo
is the easiest way to extract blockchain data to parquet, csv, json, or a python dataframe.
cryo
is also extremely flexible, with many different options to control how data is extracted + filtered + formatted
cryo
is an early WIP, please report bugs + feedback to the issue tracker
note that cryo
's default settings will slam a node too hard for use with 3rd party RPC providers. Instead, --requests-per-second
and --max-concurrent-requests
should be used to impose ratelimits. Such settings will be handled automatically in a future release.
Example Usage
use as cryo <dataset> [OPTIONS]
Example | Command |
---|---|
Extract all logs from block 16,000,000 to block 17,000,000 | cryo logs -b 16M:17M |
Extract blocks, logs, or traces missing from current directory | cryo blocks txs traces |
Extract to csv instead of parquet | cryo blocks txs traces --csv |
Extract only certain columns | cryo blocks --include number timestamp |
Dry run to view output schemas or expected work | cryo storage_diffs --dry |
Extract all USDC events | cryo logs --contract 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48 |
cryo
uses ETH_RPC_URL
env var as the data source unless --rpc <url>
is given
Datasets
cryo can extract the following datasets from EVM nodes:
blocks
transactions
(alias =txs
)logs
(alias =events
)traces
(alias =call_traces
)state_diffs
(alias forstorage_diffs
+balance_diff
+nonce_diffs
+code_diffs
)balance_diffs
code_diffs
storage_diffs
nonce_diffs
vm_traces
(alias =opcode_traces
)
Installation
Method 1: install from source
git clone https://github.com/paradigmxyz/cryo
cd cryo
cargo install --path ./crates/cli
This method requires having rust installed. See rustup for instructions.
Method 2: install from crates.io
cargo install cryo_cli
This method requires having rust installed. See rustup for instructions.
Make sure that ~/.cargo/bin
is on your PATH
. One way to do this is by adding the line export PATH="$HOME/.cargo/bin:$PATH"
to your ~/.bashrc
or ~/.profile
.
cryo_python
from pypi
Installing (make sure rust is installed first, see rustup)
pip install maturin
pip install cryo_python
cryo_python
from source
Installing pip install maturin
git clone https://github.com/paradigmxyz/cryo
cd cryo/crates/python
maturin build --release
pip install <OUTPUT_OF_MATURIN_BUILD>.whl
Data Schema
Many cryo
cli options will affect output schemas by adding/removing columns or changing column datatypes.
cryo
will always print out data schemas before collecting any data. To view these schemas without collecting data, use --dry
to perform a dry run.
JSON-RPC
cryo
currently obtains all of its data using the JSON-RPC protocol standard.
dataset | blocks per request | results per block | method |
---|---|---|---|
Blocks | 1 | 1 | eth_getBlockByNumber |
Transactions | 1 | multiple | eth_getBlockByNumber |
Logs | multiple | multiple | eth_getLogs |
Traces | 1 | multiple | trace_block |
State Diffs | 1 | multiple | trace_replayBlockTransactions |
Vm Traces | 1 | multiple | trace_replayBlockTransactions |
cryo
use ethers.rs to perform JSON-RPC requests, so it can be used any chain that ethers-rs is compatible with. This includes Ethereum, Optimism, Arbitrum, Polygon, BNB, and Avalanche.
A future version of cryo
will be able to bypass JSON-RPC and query node data directly.
CLI Options
output of cryo --help
:
cryo extracts blockchain data to parquet, csv, or json
Usage: cryo [OPTIONS] <DATATYPE>...
Arguments:
<DATATYPE>... datatype(s) to collect, one or more of:
- blocks
- transactions (alias = txs)
- logs (alias = events)
- traces (alias = call_traces)
- state_diffs (= balance + code + nonce + storage diffs)
- balance_diffs
- code_diffs
- nonce_diffs
- storage_diffs
- vm_traces (alias = opcode_traces)
Options:
-h, --help Print help
-V, --version Print version
Content Options:
-b, --blocks <BLOCKS> Block numbers, see syntax below [default: 0:latest]
-a, --align Align block chunk boundaries to regular intervals
e.g. (1000, 2000, 3000) instead of (1106, 2106, 3106)
--reorg-buffer <N_BLOCKS> Reorg buffer, save blocks only when they are this old,
can be a number of blocks [default: 0]
-i, --include-columns [<COLS>...] Columns to include alongside the default output
-e, --exclude-columns [<COLS>...] Columns to exclude from the default output
--columns [<COLS>...] Use these columns instead of the default
--hex Use hex string encoding for binary columns
-s, --sort [<SORT>...] Columns(s) to sort by
Source Options:
-r, --rpc <RPC> RPC url [default: ETH_RPC_URL env var]
--network-name <NETWORK_NAME> Network name [default: use name of eth_getChainId]
Acquisition Options:
-l, --requests-per-second <limit> Ratelimit on requests per second
--max-concurrent-requests <M> Global number of concurrent requests
--max-concurrent-chunks <M> Number of chunks processed concurrently
--max-concurrent-blocks <M> Number blocks within a chunk processed concurrently
-d, --dry Dry run, collect no data
Output Options:
-c, --chunk-size <CHUNK_SIZE> Number of blocks per file [default: 1000]
--n-chunks <N_CHUNKS> Number of files (alternative to --chunk-size)
-o, --output-dir <OUTPUT_DIR> Directory for output files [default: .]
--file-suffix <FILE_SUFFIX> Suffix to attach to end of each filename
--overwrite Overwrite existing files instead of skipping them
--csv Save as csv instead of parquet
--json Save as json instead of parquet
--row-group-size <GROUP_SIZE> Number of rows per row group in parquet file
--n-row-groups <N_ROW_GROUPS> Number of rows groups in parquet file
--no-stats Do not write statistics to parquet files
--compression <NAME [#]>... Set compression algorithm and level [default: lz4]
Dataset-specific Options:
--contract <CONTRACT> [logs] filter logs by contract address
--topic0 <TOPIC0> [logs] filter logs by topic0 [aliases: event]
--topic1 <TOPIC1> [logs] filter logs by topic1
--topic2 <TOPIC2> [logs] filter logs by topic2
--topic3 <TOPIC3> [logs] filter logs by topic3
--log-request-size <N_BLOCKS> [logs] Number of blocks per log request [default: 1]
Block specification syntax
- can use numbers --blocks 5000
- can use numbers list (use "") --blocks "5000 6000 7000"
- can use ranges --blocks 12M:13M 15M:16M
- numbers can contain { _ . K M B } 5_000 5K 15M 15.5M
- omiting range end means latest 15.5M: == 15.5M:latest
- omitting range start means 0 :700 == 0:700
- minus on start means minus end -1000:7000 == 6000:7000
- plus sign on end means plus start 15M:+1000 == 15M:15.001K
- mix formats "15M:+1 1000:1002 -3:1b 2000"