• Stars
    star
    1,142
  • Rank 40,813 (Top 0.9 %)
  • Language
    Jupyter Notebook
  • Created over 3 years ago
  • Updated over 3 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Pure Python from-scratch zero-dependency implementation of Bitcoin for educational purposes

cryptos

Just me developing a pure Python from-scratch zero-dependency implementation of Bitcoin for educational purposes, including all of the under the hood crypto primitives such as SHA-256 and elliptic curves over finite fields math.

SHA-256

My pure Python SHA-256 implementation closely following the NIST FIPS 180-4 spec, in cryptos/sha256.py. Since this is a from scratch pure Python implementation it is slow and obviously not to be used anywhere except for educational purposes. Example usage:

$ echo "some test file lol" > testfile.txt
$ shasum -a 256 testfile.txt
4a79aed64097a0cd9e87f1e88e9ad771ddb5c5d762b3c3bbf02adf3112d5d375
$ python -m cryptos.sha256 testfile.txt
4a79aed64097a0cd9e87f1e88e9ad771ddb5c5d762b3c3bbf02adf3112d5d375

Keys

getnewaddress.py is a cli entryway to generate a new Bitcoin secret/public key pair and the corresponding (base58check compressed) address:

$ python getnewaddress.py
generated secret key:
0xc322622e6a0033bb93ff666753f77cc8b819d274d9edea007b7e4b2af4caf025
corresponding public key:
x: 5B9D87FE091D52EA4CD49EA5CEFDD8C099DF7E6CCF510A9A94C763DE38C575D5
y: 6049637B3683076C5568EC723CF7D38FD603B88447180829BBB508C554EEA413
compressed bitcoin address (b58check format):
1DBGfUXnwTS2PRu8h3JefU9uYwYnyaTd2z

Digital Signatures

Elliptic Curve Digital Signature Algorithm (ECDSA) implemented in cryptos/ecdsa.py, example usage:

>>> from cryptos.keys import gen_key_pair
>>> from cryptos.ecdsa import sign, verify
>>> sk1, pk1 = gen_key_pair()
>>> sk2, pk2 = gen_key_pair()
>>> message = ('pk1 wants to pay pk2 1 BTC').encode('ascii')
>>> sig = sign(sk1, message)
>>> verify(pk1, message, sig)
True
>>> verify(pk2, message, sig)
False

Transactions

Bitcoin transaction objects (both legacy or segwit) can be instantiated and parsed from raw bytes. An example of parsing a legacy type transaction:

>>> from cryptos.transaction import Tx
>>> from io import BytesIO

>>> # example transaction in Programming Bitcoing Chapter 5
>>> raw = bytes.fromhex('0100000001813f79011acb80925dfe69b3def355fe914bd1d96a3f5f71bf8303c6a989c7d1000000006b483045022100ed81ff192e75a3fd2304004dcadb746fa5e24c5031ccfcf21320b0277457c98f02207a986d955c6e0cb35d446a89d3f56100f4d7f67801c31967743a9c8e10615bed01210349fc4e631e3624a545de3f89f5d8684c7b8138bd94bdd531d2e213bf016b278afeffffff02a135ef01000000001976a914bc3b654dca7e56b04dca18f2566cdaf02e8d9ada88ac99c39800000000001976a9141c4bc762dd5423e332166702cb75f40df79fea1288ac19430600')
>>> tx = Tx.parse(BytesIO(raw))
>>> # we get back a Transaction object with parsed fields
>>> tx

Tx(version=1, tx_ins=[TxIn(prev_tx=b'\xd1\xc7\x89\xa9\xc6\x03\x83\xbfq_?j\xd9\xd1K\x91\xfeU\xf3\xde\xb3i\xfe]\x92\x80\xcb\x1a\x01y?\x81', prev_index=0, script_sig=3045022100ed81ff192e75a3fd2304004dcadb746fa5e24c5031ccfcf21320b0277457c98f02207a986d955c6e0cb35d446a89d3f56100f4d7f67801c31967743a9c8e10615bed01 0349fc4e631e3624a545de3f89f5d8684c7b8138bd94bdd531d2e213bf016b278a, sequence=4294967294, witness=None)], tx_outs=[TxOut(amount=32454049, script_pubkey=OP_DUP OP_HASH160 bc3b654dca7e56b04dca18f2566cdaf02e8d9ada OP_EQUALVERIFY OP_CHECKSIG), TxOut(amount=10011545, script_pubkey=OP_DUP OP_HASH160 1c4bc762dd5423e332166702cb75f40df79fea12 OP_EQUALVERIFY OP_CHECKSIG)], locktime=410393, segwit=False)

And we can verify that the transaction is Bitcoin law-abiding and cryptographically authentic:

>>> tx.validate()
True

This isn't exactly a complete verification as a Bitcoin full node would do and e.g. skips verification of double spends, script sizing limits, etc., and also it only supports the (simpler) p2pkh transactions. Notably, this does not include the "modern" segwit versions used predominantly in today's Bitcoin traffic since the soft fork of BIP141 around July 2017.

Blocks

See cryptos/block.py for Block class, functions and utilities.

Lightweight Node

A lightweight Bitcoin Node that speaks a subset of the Bitcoin protocol is in cryptos/network.py. This node connects to other nodes using Python's socket, performs version handshake and then can request block headers. E.g. we can walk the first 40,000 blocks (in batches of 2,000) and partially validate them. A Bitcoin full node would fetch the full block (not just headers) with all transactions and also validate those, etc. But a partial validation would look like:

from io import BytesIO
from cryptos.block import Block, GENESIS_BLOCK, calculate_new_bits
from cryptos.network import SimpleNode
from cryptos.network import (
    GetHeadersMessage,
    HeadersMessage,
)

# connect to a node and pretty please ask for
# 20 block headers starting with the genesis block

# Start with the genesis block
# https://en.bitcoin.it/wiki/Genesis_block
# class Block:
#     version: int        # 4 bytes little endian
#     prev_block: bytes   # 32 bytes, little endian
#     merkle_root: bytes  # 32 bytes, little endian
#     timestamp: int      # uint32, seconds since 1970-01-01T00:00 UTC
#     bits: bytes         # 4 bytes, current target in compact format
#     nonce: bytes        # 4 bytes, searched over in pow
previous = Block.decode(BytesIO(GENESIS_BLOCK['main']))

# okay now let's crawl the blockchain block headers
node = SimpleNode(
    host='mainnet.programmingbitcoin.com',
    net='main',
)
node.handshake()

blocks = [previous]
for _ in range(20):

    # request next batch of 2,000 headers
    getheaders = GetHeadersMessage(start_block=bytes.fromhex(previous.id()))
    node.send(getheaders)
    headers = node.wait_for(HeadersMessage)

    # extend our chain of block headers
    blocks.extend(headers.blocks)

    previous = headers.blocks[-1]
    print(f"received another batch of blocks, now have {len(blocks)}")

node.close()
# we now have 40,001 blocks total, 80 bytes each in raw, so total of ~3.2MB of data

# now (partially) validate blockchain integrity
for i, block in enumerate(blocks):

    # validate proof of work on this block
    assert block.validate()

    # validate pointer to the previous node matches
    prev = blocks[i - 1]
    expected_prev_block = b'\x00'*32 if i == 0 else bytes.fromhex(prev.id())
    assert block.prev_block == expected_prev_block

    # validate the proof of work target calculation on the block was correct
    if i % 2016 == 0:
        if i == 0:
            # genesis block had hardcoded value for bits
            expected_bits = bytes.fromhex('ffff001d')
        else:
            # recalculate the target at every epoch (2016 blocks), approx 2 week period
            # note that Satoshi had an off-by-one bug in this calculation because we are
            # looking at timestamp difference between first and last block in an epoch,
            # so these are only 2015 blocks apart instead of 2016 blocks apart Β―\_(ツ)_/Β―
            prev_epoch = blocks[i - 2016]
            time_diff = prev.timestamp - prev_epoch.timestamp
            expected_bits = calculate_new_bits(prev.bits, time_diff)
    assert block.bits == expected_bits

    if i % 1000 == 0:
        print(f"on block {i+1}/{len(blocks)}")

It feels very nice to independently at least partially verify the integrity of the block chain :)

Unit tests

$ pytest

License

MIT

More Repositories

1

nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
Python
22,607
star
2

minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Python
15,735
star
3

char-rnn

Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch
Lua
11,228
star
4

convnetjs

Deep Learning in Javascript. Train Convolutional Neural Networks (or ordinary ones) in your browser.
JavaScript
10,642
star
5

nn-zero-to-hero

Neural Networks: Zero to Hero
Jupyter Notebook
8,476
star
6

micrograd

A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
Jupyter Notebook
5,613
star
7

neuraltalk2

Efficient Image Captioning code in Torch, runs on GPU
Jupyter Notebook
5,426
star
8

neuraltalk

NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences.
Python
5,352
star
9

arxiv-sanity-preserver

Web interface for browsing, search and filtering recent arxiv submissions
Python
4,943
star
10

ng-video-lecture

Python
2,074
star
11

reinforcejs

Reinforcement Learning Agents in Javascript (Dynamic Programming, Temporal Difference, Deep Q-Learning, Stochastic/Deterministic Policy Gradients)
HTML
1,273
star
12

makemore

An autoregressive character-level language model for making more things
Python
1,217
star
13

randomfun

Notebooks and various random fun
Jupyter Notebook
996
star
14

ulogme

Automatically collect and visualize usage statistics in Ubuntu/OSX environments.
Python
941
star
15

recurrentjs

Deep Recurrent Neural Networks and LSTMs in Javascript. More generally also arbitrary expression graphs with automatic differentiation.
HTML
918
star
16

arxiv-sanity-lite

arxiv-sanity lite: tag arxiv papers of interest get recommendations of similar papers in a nice UI using SVMs over tfidf feature vectors based on paper abstracts.
Python
864
star
17

tsnejs

Implementation of t-SNE visualization algorithm in Javascript.
JavaScript
815
star
18

pytorch-normalizing-flows

Normalizing flows in PyTorch. Current intended use is education not production.
Jupyter Notebook
790
star
19

paper-notes

Random notes on papers, likely a short-term repo.
660
star
20

svmjs

Support Vector Machine in Javascript (SMO algorithm, supports arbitrary kernels) + GUI demo
JavaScript
636
star
21

pytorch-made

MADE (Masked Autoencoder Density Estimation) implementation in PyTorch
Python
510
star
22

karpathy.github.io

my blog
CSS
472
star
23

lecun1989-repro

Reproducing Yann LeCun 1989 paper "Backpropagation Applied to Handwritten Zip Code Recognition", to my knowledge the earliest real-world application of a neural net trained with backpropagation.
Jupyter Notebook
425
star
24

deep-vector-quantization

VQVAEs, GumbelSoftmaxes and friends
Jupyter Notebook
422
star
25

covid-sanity

Aspires to help the influx of bioRxiv / medRxiv papers on COVID-19
Python
351
star
26

find-birds

Find people you should follow on Twitter based on who the people you follow follow
Python
305
star
27

forestjs

Random Forest implementation for JavaScript. Supports arbitrary weak learners. Includes interactive demo.
JavaScript
284
star
28

researchlei

An Academic Papers Management and Discovery System
Python
194
star
29

Random-Forest-Matlab

A Random Forest implementation for MATLAB. Supports arbitrary weak learners that you can define.
MATLAB
172
star
30

researchpooler

Automating research publications discovery and analysis. For example, ever wish your computer could automatically open papers that are most similar to a paper at an arbitrary url? How about finding all papers that report results on some dataset? Let's re-imagine literature review.
Python
167
star
31

nipspreview

Scripts that generate .html to more easily see NIPS papers
Python
147
star
32

ttmik

Talk to me in Korean Anki cards and related scripts
Python
103
star
33

tf-agent

tensorflow reinforcement learning agents for OpenAI gym environments
Python
99
star
34

gitstats

A lightweight/pretty visualizer for recent work on a git code base in multiple branches. Helps stay up to date with teams working on one git repo in many branches.
HTML
85
star
35

EigenLibSVM

A wrapper for LibSVM that lets you train SVM's directly on Eigen library matrices in C++
C++
74
star
36

MatlabWrapper

C++ convenience class to communicate with a Matlab instance. Send matrices back and forth, execute arbitrary Matlab commands, or drop into interactive Matlab session right in the middle of your C++ code.
C++
52
star
37

twoolpy

useful scripts to work with Twitter + Python. Requires the tweepy library.
Python
50
star
38

notpygamejs

Game making library for using Canvas element
JavaScript
41
star
39

scholaroctopus

A set of tools/pages that help explore academic literature
33
star
40

karpathy

root repo
19
star