  • Stars: 1,137
  • Rank: 40,945 (Top 0.9%)
  • Language: Rust
  • License: Apache License 2.0
  • Created over 1 year ago
  • Updated 4 months ago

Repository Details

cryo is the easiest way to extract blockchain data to parquet, csv, json, or python dataframes

❄️🧊 cryo 🧊❄️

cryo is the easiest way to extract blockchain data to parquet, csv, json, or a python dataframe.

cryo is also extremely flexible, with many different options to control how data is extracted + filtered + formatted.

cryo is an early WIP; please report bugs + feedback to the issue tracker.

Note that cryo's default settings will slam a node too hard for use with 3rd party RPC providers. Instead, use --requests-per-second and --max-concurrent-requests to impose rate limits. Such settings will be handled automatically in a future release.

Example Usage

use as cryo <dataset> [OPTIONS]

Example                                                                 Command
Extract all logs from block 16,000,000 to block 17,000,000              cryo logs -b 16M:17M
Extract blocks, transactions, or traces missing from current directory  cryo blocks txs traces
Extract to csv instead of parquet                                       cryo blocks txs traces --csv
Extract only certain columns                                            cryo blocks --columns number timestamp
Dry run to view output schemas or expected work                         cryo storage_diffs --dry
Extract all USDC events                                                 cryo logs --contract 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48

cryo uses ETH_RPC_URL env var as the data source unless --rpc <url> is given
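When cryo is driven from a script, the rate-limit flags and the optional --rpc override can be assembled programmatically. The following is a minimal sketch: `build_cryo_command` is a hypothetical helper (not part of cryo), and the limits of 50 requests per second and 5 concurrent requests are placeholder values, not recommendations.

```python
def build_cryo_command(datasets, blocks=None, rpc=None,
                       requests_per_second=None, max_concurrent_requests=None):
    """Assemble an argv list for the cryo CLI (hypothetical helper, not part of cryo)."""
    cmd = ["cryo", *datasets]
    if blocks is not None:
        cmd += ["--blocks", *blocks]
    if rpc is not None:
        # Without --rpc, cryo falls back to the ETH_RPC_URL env var.
        cmd += ["--rpc", rpc]
    if requests_per_second is not None:
        cmd += ["--requests-per-second", str(requests_per_second)]
    if max_concurrent_requests is not None:
        cmd += ["--max-concurrent-requests", str(max_concurrent_requests)]
    return cmd


cmd = build_cryo_command(
    ["logs"],
    blocks=["16M:17M"],
    requests_per_second=50,     # assumed limit; check your provider's quota
    max_concurrent_requests=5,  # assumed limit
)
print(" ".join(cmd))
```

Once cryo is installed, the resulting list can be passed to `subprocess.run(cmd, check=True)`.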

Datasets

cryo can extract the following datasets from EVM nodes:

  • blocks
  • transactions (alias = txs)
  • logs (alias = events)
  • traces (alias = call_traces)
  • state_diffs (alias for storage_diffs + balance_diffs + nonce_diffs + code_diffs)
  • balance_diffs
  • code_diffs
  • storage_diffs
  • nonce_diffs
  • vm_traces (alias = opcode_traces)

Installation

Method 1: install from source

git clone https://github.com/paradigmxyz/cryo
cd cryo
cargo install --path ./crates/cli

This method requires having rust installed. See rustup for instructions.

Method 2: install from crates.io

cargo install cryo_cli

This method requires having rust installed. See rustup for instructions.

Make sure that ~/.cargo/bin is on your PATH. One way to do this is by adding the line export PATH="$HOME/.cargo/bin:$PATH" to your ~/.bashrc or ~/.profile.
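If the cryo command is not found after installation, a quick check of PATH can help. This is a small sketch using only the Python standard library; `cargo_bin_on_path` is a hypothetical helper name, and it only compares path entries literally (it does not resolve symlinks).

```python
import os
from pathlib import Path


def cargo_bin_on_path(path_env, home):
    """Return True if <home>/.cargo/bin appears in a PATH-style string."""
    target = Path(home) / ".cargo" / "bin"
    entries = [Path(p) for p in path_env.split(os.pathsep) if p]
    return target in entries


# Check the current environment.
print(cargo_bin_on_path(os.environ.get("PATH", ""), Path.home()))
```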

Installing cryo_python from pypi

(make sure rust is installed first, see rustup)

pip install maturin
pip install cryo_python

Installing cryo_python from source

pip install maturin
git clone https://github.com/paradigmxyz/cryo
cd cryo/crates/python
maturin build --release
pip install <OUTPUT_OF_MATURIN_BUILD>.whl

Data Schema

Many cryo cli options will affect output schemas by adding/removing columns or changing column datatypes.

cryo will always print out data schemas before collecting any data. To view these schemas without collecting data, use --dry to perform a dry run.

JSON-RPC

cryo currently obtains all of its data using the JSON-RPC protocol standard.

dataset       blocks per request  results per block  method
Blocks        1                   1                  eth_getBlockByNumber
Transactions  1                   multiple           eth_getBlockByNumber
Logs          multiple            multiple           eth_getLogs
Traces        1                   multiple           trace_block
State Diffs   1                   multiple           trace_replayBlockTransactions
Vm Traces     1                   multiple           trace_replayBlockTransactions
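To make the table concrete, the sketch below builds JSON-RPC 2.0 request bodies for some of the methods listed. It illustrates the wire format only; the helper names are hypothetical, and the exact parameters cryo sends (for example, whether it requests full transaction objects) are an assumption here, not something the README states. The trace_* methods belong to a node-specific trace namespace, so not every node or provider exposes them.

```python
import json


def rpc_payload(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request body as a dict."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def block_request(block_number, full_transactions=False):
    # eth_getBlockByNumber takes a hex block number and a flag that selects
    # full transaction objects (True) or only transaction hashes (False).
    return rpc_payload("eth_getBlockByNumber", [hex(block_number), full_transactions])


def logs_request(from_block, to_block, address=None, topic0=None):
    # eth_getLogs takes a single filter object and can span several blocks.
    log_filter = {"fromBlock": hex(from_block), "toBlock": hex(to_block)}
    if address is not None:
        log_filter["address"] = address
    if topic0 is not None:
        log_filter["topics"] = [topic0]
    return rpc_payload("eth_getLogs", [log_filter])


def replay_request(block_number, trace_types):
    # trace_replayBlockTransactions takes a block and a list of trace types,
    # e.g. ["stateDiff"] or ["vmTrace"].
    return rpc_payload("trace_replayBlockTransactions", [hex(block_number), trace_types])


USDC = "0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48"
print(json.dumps(logs_request(16_000_000, 16_000_000, address=USDC), indent=2))
```

One block-level request yields one block row but potentially many transaction, trace, or diff rows, which matches the "results per block" column.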

cryo uses ethers-rs to perform JSON-RPC requests, so it can be used with any chain that ethers-rs is compatible with. This includes Ethereum, Optimism, Arbitrum, Polygon, BNB, and Avalanche.

A future version of cryo will be able to bypass JSON-RPC and query node data directly.

CLI Options

output of cryo --help:

cryo extracts blockchain data to parquet, csv, or json

Usage: cryo [OPTIONS] <DATATYPE>...

Arguments:
  <DATATYPE>...  datatype(s) to collect, one or more of:
                 - blocks
                 - transactions  (alias = txs)
                 - logs          (alias = events)
                 - traces        (alias = call_traces)
                 - state_diffs   (= balance + code + nonce + storage diffs)
                 - balance_diffs
                 - code_diffs
                 - nonce_diffs
                 - storage_diffs
                 - vm_traces     (alias = opcode_traces)

Options:
  -h, --help     Print help
  -V, --version  Print version

Content Options:
  -b, --blocks <BLOCKS>              Block numbers, see syntax below [default: 0:latest]
  -a, --align                        Align block chunk boundaries to regular intervals
                                     e.g. (1000, 2000, 3000) instead of (1106, 2106, 3106)
      --reorg-buffer <N_BLOCKS>      Reorg buffer, save blocks only when they are this old,
                                     can be a number of blocks [default: 0]
  -i, --include-columns [<COLS>...]  Columns to include alongside the default output
  -e, --exclude-columns [<COLS>...]  Columns to exclude from the default output
      --columns [<COLS>...]          Use these columns instead of the default
      --hex                          Use hex string encoding for binary columns
  -s, --sort [<SORT>...]             Columns(s) to sort by

Source Options:
  -r, --rpc <RPC>                    RPC url [default: ETH_RPC_URL env var]
      --network-name <NETWORK_NAME>  Network name [default: use name of eth_getChainId]

Acquisition Options:
  -l, --requests-per-second <limit>  Ratelimit on requests per second
      --max-concurrent-requests <M>  Global number of concurrent requests
      --max-concurrent-chunks <M>    Number of chunks processed concurrently
      --max-concurrent-blocks <M>    Number blocks within a chunk processed concurrently
  -d, --dry                          Dry run, collect no data

Output Options:
  -c, --chunk-size <CHUNK_SIZE>      Number of blocks per file [default: 1000]
      --n-chunks <N_CHUNKS>          Number of files (alternative to --chunk-size)
  -o, --output-dir <OUTPUT_DIR>      Directory for output files [default: .]
      --file-suffix <FILE_SUFFIX>    Suffix to attach to end of each filename
      --overwrite                    Overwrite existing files instead of skipping them
      --csv                          Save as csv instead of parquet
      --json                         Save as json instead of parquet
      --row-group-size <GROUP_SIZE>  Number of rows per row group in parquet file
      --n-row-groups <N_ROW_GROUPS>  Number of rows groups in parquet file
      --no-stats                     Do not write statistics to parquet files
      --compression <NAME [#]>...    Set compression algorithm and level [default: lz4]

Dataset-specific Options:
      --contract <CONTRACT>          [logs] filter logs by contract address
      --topic0 <TOPIC0>              [logs] filter logs by topic0 [aliases: event]
      --topic1 <TOPIC1>              [logs] filter logs by topic1
      --topic2 <TOPIC2>              [logs] filter logs by topic2
      --topic3 <TOPIC3>              [logs] filter logs by topic3
      --log-request-size <N_BLOCKS>  [logs] Number of blocks per log request [default: 1]


Block specification syntax
- can use numbers                    --blocks 5000
- can use numbers list (use "")      --blocks "5000 6000 7000"
- can use ranges                     --blocks 12M:13M 15M:16M
- numbers can contain { _ . K M B }  5_000 5K 15M 15.5M
- omitting range end means latest    15.5M: == 15.5M:latest
- omitting range start means 0       :700 == 0:700
- minus on start means minus end     -1000:7000 == 6000:7000
- plus sign on end means plus start  15M:+1000 == 15M:15.001M
- mix formats                        "15M:+1 1000:1002 -3:1b 2000"
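
The block specification rules above can be sketched as a small parser. This is an illustrative re-implementation in Python, not cryo's actual parser: the function names are hypothetical, `latest` must be supplied by the caller in place of a real chain-tip lookup, and the sketch only resolves range endpoints without deciding whether the end block is inclusive.

```python
from decimal import Decimal

SUFFIXES = {"K": 10**3, "M": 10**6, "B": 10**9}


def parse_number(text):
    """Parse '5000', '5_000', '5K', '15.5M' or '1b' into an int."""
    text = text.strip().replace("_", "")
    multiplier = 1
    if text and text[-1].upper() in SUFFIXES:
        multiplier = SUFFIXES[text[-1].upper()]
        text = text[:-1]
    value = Decimal(text) * multiplier
    if value != value.to_integral_value():
        raise ValueError(f"not a whole block number: {text!r}")
    return int(value)


def parse_range(text, latest):
    """Resolve 'start:end' into (start, end); `latest` stands in for the chain tip."""
    start_text, end_text = text.split(":")
    relative_start = start_text.startswith("-")
    relative_end = end_text.startswith("+")
    if relative_start and relative_end:
        raise ValueError("start and end cannot both be relative")
    if relative_end:
        # '15M:+1000' means 1000 blocks counted from the start.
        start = parse_number(start_text) if start_text else 0
        return start, start + parse_number(end_text[1:])
    end = latest if end_text in ("", "latest") else parse_number(end_text)
    if relative_start:
        # '-1000:7000' means 1000 blocks counted back from the end.
        return end - parse_number(start_text[1:]), end
    start = parse_number(start_text) if start_text else 0
    return start, end


def parse_blocks(spec, latest):
    """Parse a whitespace-separated mix of single numbers and ranges."""
    return [parse_range(t, latest) if ":" in t else parse_number(t)
            for t in spec.split()]


# 18_000_000 is an arbitrary stand-in for the latest block.
print(parse_blocks("15M:+1 1000:1002 -3:1b 2000", latest=18_000_000))
```

Using `Decimal` rather than `float` keeps values such as 15.5M exact before they are converted to integers.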
