• Stars
    star
    206
  • Rank 190,504 (Top 4 %)
  • Language
    Go
  • License
    MIT License
  • Created over 1 year ago
  • Updated 8 months ago

Dump all the mempool transactions 🗑️ ♻️ (in Parquet + CSV)

Mempool Dumpster 🗑️♻️

Archiving mempool transactions in Parquet and CSV format.

The data is freely available at https://mempool-dumpster.flashbots.net

Available mempool transaction sources

  1. Generic EL nodes - go-ethereum, Infura, etc. (Websockets, using newPendingTransactions)
  2. Alchemy (Websockets, using alchemy_pendingTransactions, warning - burns a lot of credits)
  3. bloXroute (Websockets and gRPC)
  4. Chainbound Fiber (gRPC)
  5. Eden (Websockets and gRPC)

Note: Some sources send transactions that are already included on-chain, which are discarded (not added to archive or summary)


Output files

Daily files uploaded by mempool-dumpster (e.g. for September 2023):

  1. Parquet file with transaction metadata and raw transactions (~800MB/day, e.g. 2023-09-08.parquet)
  2. CSV file with only the transaction metadata (~100MB/day zipped, e.g. 2023-09-08.csv.zip)
  3. CSV file with details about when each transaction was received by any source (~100MB/day zipped, e.g. 2023-09-08_sourcelog.csv.zip)
  4. Summary in text format (~2kB, e.g. 2023-09-08_summary.txt)

FAQ

  • When is the data uploaded? ... The data for the previous day is uploaded daily between 4:00 and 4:30 UTC.
  • What are exclusive transactions? ... A transaction that was seen from only a single source. These transactions might include recycled transactions (which were seen long ago, never included, and later resent by a transaction source).
  • What does "XOF" stand for? ... XOF stands for "exclusive orderflow" (i.e. exclusive transactions).
  • What is a-pool? ... A-Pool is a regular geth node with some optimized peering settings, which the collector subscribes to over the network.
  • gRPC vs Websockets? ... bloXroute and Chainbound are connected with gRPC; all other sources are connected with Websockets (note that gRPC has a lower latency than Websockets).
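
To make the "exclusive transaction" definition concrete, here is a minimal Go sketch that counts exclusive transactions per source. It assumes each transaction carries a list of source names, like the sources column in the Parquet file; the helper names and the sample source labels other than bloxroute are made up for illustration.

```go
package main

import "fmt"

// isExclusive reports whether a transaction was seen from exactly one source.
func isExclusive(sources []string) bool {
	return len(sources) == 1
}

// countExclusiveBySource tallies exclusive transactions per source name.
func countExclusiveBySource(txSources [][]string) map[string]int {
	counts := make(map[string]int)
	for _, sources := range txSources {
		if isExclusive(sources) {
			counts[sources[0]]++
		}
	}
	return counts
}

func main() {
	// Hypothetical sample: one entry per transaction, listing its sources.
	txs := [][]string{
		{"bloxroute"},
		{"bloxroute", "local"},
		{"chainbound"},
		{"bloxroute"},
	}
	fmt.Println(countExclusiveBySource(txs))
}
```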

Working with Parquet

Apache Parquet is a column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.

We recommend ClickHouse local (or DuckDB) for working with Parquet files. It makes it easy to run queries like:

# show the schema
$ clickhouse local -q "DESCRIBE TABLE 'transactions.parquet';"
timestamp               Nullable(DateTime64(3))
hash                    Nullable(String)
chainId                 Nullable(String)
from                    Nullable(String)
to                      Nullable(String)
value                   Nullable(String)
nonce                   Nullable(String)
gas                     Nullable(String)
gasPrice                Nullable(String)
gasTipCap               Nullable(String)
gasFeeCap               Nullable(String)
dataSize                Nullable(Int64)
data4Bytes              Nullable(String)
sources                 Array(Nullable(String))
includedAtBlockHeight   Nullable(Int64)
includedBlockTimestamp  Nullable(DateTime64(3))
inclusionDelayMs        Nullable(Int64)
rawTx                   Nullable(String)

# count rows
$ clickhouse local -q "SELECT count(*) FROM 'transactions.parquet' LIMIT 1;"

# get the first hash+rawTx
$ clickhouse local -q "SELECT hash,hex(rawTx) FROM 'transactions.parquet' LIMIT 1;"

# get details of a particular hash
$ clickhouse local -q "SELECT timestamp,hash,from,to,hex(rawTx) FROM 'transactions.parquet' WHERE hash='0x152065ad73bcf63f68572f478e2dc6e826f1f434cb488b993e5956e6b7425eed';"

# get exclusive transactions from bloxroute
$ clickhouse local -q "SELECT COUNT(*) FROM 'transactions.parquet' WHERE length(sources) == 1 AND sources[1] == 'bloxroute';"

# get count of landed vs not-landed exclusive transactions, by source
$ clickhouse local -q "WITH includedBlockTimestamp!=0 as included SELECT sources[1], included, count(included) FROM 'transactions.parquet' WHERE length(sources) == 1 GROUP BY sources[1], included;"

# get uniswap v2 transactions
$ clickhouse local -q "SELECT COUNT(*) FROM 'transactions.parquet' WHERE to='0x7a250d5630b4cf539739df2c5dacb4c659f2488d';"

# get uniswap v2 transactions and separate by included/not-included
$ clickhouse local -q "WITH includedBlockTimestamp!=0 as included SELECT included, COUNT(included) FROM 'transactions.parquet' WHERE to='0x7a250d5630b4cf539739df2c5dacb4c659f2488d' GROUP BY included;"

# get inclusion delay for uniswap v2 transactions (time between receiving and being included on-chain)
$ clickhouse local -q "WITH inclusionDelayMs/1000 as incdelay SELECT quantiles(0.5, 0.9, 0.99)(incdelay), avg(incdelay) as avg FROM 'transactions.parquet' WHERE to='0x7a250d5630b4cf539739df2c5dacb4c659f2488d' AND includedBlockTimestamp!=0;"

# count uniswap v2 contract methods
$ clickhouse local -q "SELECT data4Bytes, COUNT(data4Bytes) FROM 'transactions.parquet' WHERE to='0x7a250d5630b4cf539739df2c5dacb4c659f2488d' GROUP BY data4Bytes;"

See this post for more details: https://collective.flashbots.net/t/mempool-dumpster-a-free-mempool-transaction-archive/2401
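
The data4Bytes column used in the last query groups transactions by the first four bytes of their calldata. The Go sketch below shows one way such a value could be derived and counted. It assumes a 0x-prefixed lowercase hex representation, which is a guess about the column's formatting; selector is a hypothetical helper and the sample calldata is made up.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// selector returns the 0x-prefixed first four bytes of the calldata,
// or "" when the calldata is shorter than four bytes.
func selector(calldataHex string) (string, error) {
	raw, err := hex.DecodeString(strings.TrimPrefix(calldataHex, "0x"))
	if err != nil {
		return "", err
	}
	if len(raw) < 4 {
		return "", nil
	}
	return "0x" + hex.EncodeToString(raw[:4]), nil
}

func main() {
	// Made-up calldata samples.
	calldata := []string{"0xa9059cbb00000001", "0xa9059cbb", "0x"}

	counts := make(map[string]int)
	for _, d := range calldata {
		sel, err := selector(d)
		if err != nil {
			fmt.Println("skipping invalid calldata:", err)
			continue
		}
		counts[sel]++
	}
	fmt.Println(counts)
}
```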


Running the analyzer

You can easily run the included analyzer to create summaries like 2023-09-22_summary.txt:

  1. First, download the parquet and sourcelog files from https://mempool-dumpster.flashbots.net/ethereum/mainnet/2023-09
  2. Then run the analyzer:
go run cmd/analyze/* \
    --out summary.txt \
    --input-parquet /mnt/data/mempool-dumpster/2023-09-22/2023-09-22.parquet \
    --input-sourcelog /mnt/data/mempool-dumpster/2023-09-22/2023-09-22_sourcelog.csv.zip

To speed things up, you can use the MAX environment variable to set a maximum number of transactions to process:

MAX=10000 go run cmd/analyze/* \
    --out summary.txt \
    --input-parquet /mnt/data/mempool-dumpster/2023-09-22/2023-09-22.parquet \
    --input-sourcelog /mnt/data/mempool-dumpster/2023-09-22/2023-09-22_sourcelog.csv.zip
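
For readers curious how an environment-variable cap like MAX is typically applied, here is a small Go sketch. It is not the analyzer's actual code; maxFromEnv and limit are hypothetical helpers, and treating an unset or zero value as "no limit" is an assumption of this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// maxFromEnv parses the value of MAX; 0 means "no limit" (assumed here).
func maxFromEnv(value string) (int, error) {
	if value == "" {
		return 0, nil
	}
	n, err := strconv.Atoi(value)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid MAX value %q", value)
	}
	return n, nil
}

// limit truncates items to at most n entries when n is positive.
func limit(items []string, n int) []string {
	if n > 0 && len(items) > n {
		return items[:n]
	}
	return items
}

func main() {
	n, err := maxFromEnv(os.Getenv("MAX"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	hashes := []string{"0x01", "0x02", "0x03"} // placeholder transaction hashes
	fmt.Println(limit(hashes, n))
}
```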

Interesting analyses

  • Something interesting with inclusionDelay?
  • Trash transactions (invalid nonce, not enough sender funds)

Feel free to continue the conversation in the Flashbots Forum!


System architecture

  1. Collector: Connects to EL nodes and writes new mempool transactions and sourcelog to hourly CSV files. Multiple collector instances can run without colliding.
  2. Merger: Takes collector CSV files as input, de-duplicates, checks transaction inclusion status, sorts by timestamp and writes output files (Parquet, CSV and Summary).
  3. Analyzer: Analyzes sourcelog CSV files and produces summary report.
  4. Website: Website dev-mode as well as build + upload.

system diagram (https://excalidraw.com/#json=Jj2VXHWIN9TZqNOOVJiAk,UgZ_ui_aLZlnYUy6nBH5mw)


Getting started

Mempool Collector

  1. Subscribes to new pending transactions at various data sources
  2. Writes 3 files:
    1. Transactions CSV: timestamp_ms, hash, raw_tx (one file per hour by default)
    2. Sourcelog CSV: timestamp_ms, hash, source (one entry for every single transaction received by any source)
    3. Trash CSV: timestamp_ms, hash, source, reason, note (trash transactions received by any source; these are not added to the transactions CSV. Currently, a transaction is only treated as trash if it was already included in a previous block)
  3. Note: the collector can store transactions repeatedly, and only the merger will properly deduplicate them later

Default filenames:

Transactions

  • Schema: <out_dir>/<date>/transactions/txs_<date>_<uid>.csv
  • Example: out/2023-08-07/transactions/txs_2023-08-07-10-00_collector1.csv

Sourcelog

  • Schema: <out_dir>/<date>/sourcelog/src_<date>_<uid>.csv
  • Example: out/2023-08-07/sourcelog/src_2023-08-07-10-00_collector1.csv

Trash

  • Schema: <out_dir>/<date>/trash/trash_<date>_<uid>.csv
  • Example: out/2023-08-07/trash/trash_2023-08-07-10-00_collector1.csv

Running the mempool collector:

# print help
go run cmd/collect/main.go -help

# Connect to ws://localhost:8546 and write CSVs into ./out
go run cmd/collect/main.go -out ./out

# Connect to multiple nodes
go run cmd/collect/main.go -out ./out -nodes ws://server1.com:8546,ws://server2.com:8546
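
The -out and -nodes flags shown above could be parsed roughly as in the following Go sketch, which uses only the standard flag package. The real collector may use a different CLI library and different defaults; the ws://localhost:8546 default is taken from the comment above, and rejecting an empty -out is a choice made for this example only.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// config holds the two options used in the examples above.
type config struct {
	OutDir string
	Nodes  []string
}

// parseArgs parses -out and a comma-separated -nodes list.
func parseArgs(args []string) (config, error) {
	fs := flag.NewFlagSet("collect", flag.ContinueOnError)
	out := fs.String("out", "", "output directory for CSV files")
	nodes := fs.String("nodes", "ws://localhost:8546", "comma-separated EL node URLs")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	if *out == "" {
		return config{}, fmt.Errorf("-out is required")
	}
	var list []string
	for _, n := range strings.Split(*nodes, ",") {
		if n = strings.TrimSpace(n); n != "" {
			list = append(list, n)
		}
	}
	return config{OutDir: *out, Nodes: list}, nil
}

func main() {
	// Fixed example arguments, mirroring the multi-node command above.
	args := []string{"-out", "./out", "-nodes", "ws://server1.com:8546,ws://server2.com:8546"}
	cfg, err := parseArgs(args)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("writing to %s from %d node(s): %v\n", cfg.OutDir, len(cfg.Nodes), cfg.Nodes)
}
```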

Merger

  • Iterates over collector output directory / CSV files
  • Deduplicates transactions, sorts them by timestamp

# print help
go run cmd/merge/main.go -h
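
The deduplicate-then-sort step of the merger can be sketched in Go as follows. This is not the merger's actual code: the tx type and mergeTxs are hypothetical, and keeping the earliest timestamp when several collectors report the same hash is an assumption of this example.

```go
package main

import (
	"fmt"
	"sort"
)

// tx is a minimal stand-in for one collector CSV row.
type tx struct {
	TimestampMs int64
	Hash        string
}

// mergeTxs keeps one record per hash (the earliest seen, by assumption)
// and returns the result sorted by timestamp, then by hash for stability.
func mergeTxs(batches ...[]tx) []tx {
	earliest := make(map[string]tx)
	for _, batch := range batches {
		for _, t := range batch {
			if prev, ok := earliest[t.Hash]; !ok || t.TimestampMs < prev.TimestampMs {
				earliest[t.Hash] = t
			}
		}
	}
	out := make([]tx, 0, len(earliest))
	for _, t := range earliest {
		out = append(out, t)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].TimestampMs != out[j].TimestampMs {
			return out[i].TimestampMs < out[j].TimestampMs
		}
		return out[i].Hash < out[j].Hash
	})
	return out
}

func main() {
	// Two hypothetical collector outputs with one overlapping hash.
	collector1 := []tx{{TimestampMs: 300, Hash: "0xc"}, {TimestampMs: 100, Hash: "0xa"}}
	collector2 := []tx{{TimestampMs: 90, Hash: "0xa"}, {TimestampMs: 200, Hash: "0xb"}}
	for _, t := range mergeTxs(collector1, collector2) {
		fmt.Println(t.TimestampMs, t.Hash)
	}
}
```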

Architecture

General design goals

  • Keep it simple and stupid
  • Vendor-agnostic (main flow should work on any server, independent of a cloud provider)
  • Downtime-resilience to minimize any gaps in the archive
  • Multiple collector instances can run concurrently, without getting in each other's way
  • Merger produces the final archive (based on the input of multiple collector outputs)
  • The final archive:
    • Includes (1) parquet file with transaction metadata, and (2) compressed file of raw transaction CSV files
    • Compatible with ClickHouse and S3 Select (Parquet using gzip compression)
    • Easily distributable as torrent

Collector

  • NodeConnection
    • One for each EL connection
    • New pending transactions are sent to TxProcessor via a channel
  • TxProcessor
    • Checks whether it has already processed the transaction
    • Stores it in the output directory
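
The NodeConnection and TxProcessor flow described above can be sketched with a Go channel and a seen-set. The type and function names here are hypothetical and the node feeds are hard-coded; a real collector would receive transactions from live subscriptions and write them to CSV files rather than to a slice.

```go
package main

import (
	"fmt"
	"sync"
)

// pendingTx is what a node connection hands to the processor.
type pendingTx struct {
	Hash   string
	Source string
}

// txProcessor drains the channel, skips hashes it has already seen,
// and returns the stored hashes in first-seen order.
func txProcessor(in <-chan pendingTx) []string {
	seen := make(map[string]struct{})
	var stored []string
	for tx := range in {
		if _, dup := seen[tx.Hash]; dup {
			continue // already processed
		}
		seen[tx.Hash] = struct{}{}
		stored = append(stored, tx.Hash) // stand-in for writing to the output directory
	}
	return stored
}

func main() {
	ch := make(chan pendingTx)
	// Hypothetical feeds: one goroutine per node connection.
	feeds := map[string][]string{
		"node1": {"0xa", "0xb"},
		"node2": {"0xb", "0xc"},
	}
	var wg sync.WaitGroup
	for source, hashes := range feeds {
		wg.Add(1)
		go func(source string, hashes []string) {
			defer wg.Done()
			for _, h := range hashes {
				ch <- pendingTx{Hash: h, Source: source}
			}
		}(source, hashes)
	}
	// Close the channel once every node connection is done.
	go func() {
		wg.Wait()
		close(ch)
	}()
	fmt.Println("unique transactions stored:", len(txProcessor(ch)))
}
```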

Merger

Transaction RLP format

Stats libraries


Contributing

Install dependencies

go install mvdan.cc/gofumpt@latest
go install honnef.co/go/tools/cmd/staticcheck@latest
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
go install github.com/daixiang0/gci@latest

Lint, test, format

make lint
make test
make fmt
