
NaturalProofs: Mathematical Theorem Proving in Natural Language (NeurIPS 2021 Datasets & Benchmarks)

Sean Welleck, Jiacheng Liu, Ronan Le Bras, Hannaneh Hajishirzi, Yejin Choi, Kyunghyun Cho

This repo contains:

  • The NaturalProofs Dataset
  • Tokenized task data for mathematical reference retrieval and generation
  • Preprocessing scripts for NaturalProofs and the task data
  • Training and evaluation code for mathematical reference retrieval and generation
  • Pretrained models for mathematical reference retrieval and generation

Please cite our work if you find the resources in this repository useful:

@inproceedings{welleck2021naturalproofs,
  title={NaturalProofs: Mathematical Theorem Proving in Natural Language},
  author={Sean Welleck and Jiacheng Liu and Ronan Le Bras and Hannaneh Hajishirzi and Yejin Choi and Kyunghyun Cho},
  booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
  year={2021},
  url={https://openreview.net/forum?id=Jvxa8adr3iY}
}

Quick download

To download and unpack NaturalProofs, use:

pip install gdown
python download.py --naturalproofs --savedir /path/to/savedir

To download and unpack all files that we describe below, use:

python download.py --naturalproofs --tokenized --checkpoint --other --savedir /path/to/savedir

This creates the following file structure:

{savedir}/data   # contains NaturalProofs base data (.json files) and tokenized task data (.pkl files)
{savedir}/ckpt   # contains pretrained model checkpoints
{savedir}/other  # contains precomputed files for evaluation (ref encodings, etc.)
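As a quick sanity check after downloading, a small helper (our own sketch, not part of the repo) can verify that the expected directories were created:

```python
from pathlib import Path

def expected_layout(savedir):
    """Directories the download script is expected to create."""
    root = Path(savedir)
    return [root / "data", root / "ckpt", root / "other"]

def missing_dirs(savedir):
    """Expected directories that do not exist yet under savedir."""
    return [p for p in expected_layout(savedir) if not p.is_dir()]
```

If `missing_dirs(savedir)` is non-empty, re-run the download with the corresponding flags.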

NaturalProofs Dataset

We provide the NaturalProofs Dataset (one JSON file per domain), hosted on Zenodo:

File                                Domain
naturalproofs_proofwiki.json        ProofWiki
naturalproofs_stacks.json           Stacks
naturalproofs_trench.json           Real Analysis textbook
naturalproofs_stein.json (script)   Number Theory textbook

To download NaturalProofs, use:

python download.py --naturalproofs --savedir /path/to/savedir

Combined ProofWiki+Stacks

The download includes an extra combined ProofWiki+Stacks file made with notebooks/merge.ipynb.

Preprocessing

To see the steps used to create each domain of NaturalProofs from raw data, see the following notebooks.
This preprocessing is not needed if you are using a preprocessed dataset provided above.

  • ProofWiki: notebook
  • Stacks: notebook
  • Real Analysis textbook: notebook
  • Number Theory textbook: notebook

Mathematical Reference Retrieval and Generation

To use NaturalProofs for the reference retrieval and generation tasks described in the paper, the first step is tokenization.

Tokenized dataset

We tokenize the raw NaturalProofs Dataset into two different formats:

  • Pairwise: (x, r)
    • x: theorem (sequence of tokens)
    • r: reference (sequence of tokens)
    • This version is used to train and evaluate the pairwise model.
  • Sequence: (x, [rid_1, ..., rid_Tx])
    • x: theorem (sequence of tokens)
    • rid_i: id of the i-th reference in the theorem's proof
    • This version is used to train and evaluate the autoregressive and joint models.
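To make the two formats concrete, here is an illustrative sketch (field names and token ids are our own, not necessarily the actual .pkl schema): a sequence example can be expanded into pairwise (x, r) examples by looking up the token sequence of each cited reference.

```python
# Hypothetical field names and ids, for illustration only.
sequence_example = {
    "x": [101, 1109, 26159, 102],  # theorem token ids
    "rids": [12, 845, 97],         # ids of references cited in the proof
}

# Toy lookup from reference id to the reference's token ids.
ref_tokens = {12: [101, 143, 102], 845: [101, 177, 102], 97: [101, 199, 102]}

def as_pairwise(example, ref_tokens):
    """Expand a sequence example into pairwise (x, r) examples,
    one per cited reference."""
    return [(example["x"], ref_tokens[rid]) for rid in example["rids"]]
```

This mirrors how the two tasks relate: the pairwise task scores one (theorem, reference) pair at a time, while the sequence task models the full list of cited reference ids.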

We provide the following versions used in the paper (bert-base-cased tokenizer):

Type                       Domain                     Splits
Pairwise, bert-base-cased  ProofWiki                  train, valid, test
                           Stacks                     train, valid, test
                           Real Analysis (textbook)   test
                           Number Theory (textbook)   test
Sequence, bert-base-cased  ProofWiki                  train, valid, test
                           Stacks                     train, valid, test
                           Real Analysis (textbook)   test
                           Number Theory (textbook)   test

To download and unpack them, use:

python download.py --tokenized --savedir /path/to/savedir

Alternatively, use the Google Drive link.

Pretrained Models

We provide the following models used in the paper:

Type                             Domain
Pairwise, bert-base-cased        ProofWiki
Pairwise, bert-base-cased        Stacks
Pairwise, bert-base-cased        ProofWiki+Stacks
Joint, bert-base-cased           ProofWiki
Joint, bert-base-cased           Stacks
Joint, bert-base-cased           ProofWiki+Stacks
Autoregressive, bert-base-cased  ProofWiki
Autoregressive, bert-base-cased  Stacks

To download and unpack them, use:

python download.py --checkpoint --savedir /path/to/savedir

Alternatively, use the Google Drive link.

Creating your own tokenized dataset

This step is not needed if you are using a tokenized dataset provided above.
First, set up the code:

python setup.py develop

To create your own tokenized versions:

  • Pairwise: python naturalproofs/tokenize_pairwise.py
  • Sequence: python naturalproofs/encoder_decoder/utils.py

Evaluation

This section shows how to run evaluation with the pretrained model checkpoints and associated files.

Setup

We will assume the file structure given by using the download script.

python download.py --naturalproofs --tokenized --checkpoint --other --savedir <SAVE-DIR>

We provide a script which assembles an evaluation command for (model type, domain, task) combinations.
We show example commands below.

python run_analysis.py \
--train-ds-names {proofwiki stacks}+ \              # one or more training domains to choose a model
--eval-ds-names {proofwiki stacks stein trench}+ \  # one or more evaluation domains
--model {pairwise, joint, autoregressive} \
--generation \                                      # for generation task (autoregressive or joint models only)
--split {valid, test} \
--gpu <integer GPU id> \
--codedir /path/to/naturalproofs_code \
--datadir <SAVE-DIR>/data \
--ckptdir <SAVE-DIR>/ckpt \
--outdir <SAVE-DIR>/output

To make sure your filepaths line up, please look inside run_analysis.py to see how the --{}dir arguments are used.

Example: pairwise retrieval

python run_analysis.py --train-ds-names proofwiki \
--eval-ds-names proofwiki stein trench \
--model pairwise \
--gpu 1 \
--split test

Example: joint retrieval

python run_analysis.py --train-ds-names proofwiki \
--eval-ds-names proofwiki \
--model joint \
--gpu 1 \
--split test

Example: joint retrieval OOD

For OOD evaluation on stein and trench textbooks, provide reference embeddings from the pairwise model.
These are the __encs.pt files from running the pairwise retrieval evaluation (we provide an example in other/).

python run_analysis.py --model joint \
--train-ds-names proofwiki \
--eval-ds-names stein trench \
--stein-rencs <SAVE-DIR>/other/pairwise__train_proofwiki__eval_stein__test__encs.pt \
--trench-rencs <SAVE-DIR>/other/pairwise__train_proofwiki__eval_trench__test__encs.pt \
--gpu 1 \
--split test
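The __encs.pt files hold precomputed reference encodings; conceptually, retrieval then reduces to scoring every reference by its similarity to the theorem's encoding. A toy sketch of that scoring step (our illustration, not the repo's code):

```python
def retrieve(theorem_vec, ref_vecs, k=3):
    """Rank reference indices by dot-product score against the
    theorem encoding, returning the top-k indices."""
    scores = [sum(t * r for t, r in zip(theorem_vec, rv)) for rv in ref_vecs]
    return sorted(range(len(ref_vecs)), key=lambda i: -scores[i])[:k]
```

In practice the encodings are dense vectors produced by the pairwise encoder, but the ranking logic is the same.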

Example: joint retrieval proofwiki+stacks model

To align the model's combined output space with the individual dataset used for evaluation, provide a tok2tok.pkl map (we provide an example in other/):

python run_analysis.py --model joint \
--train-ds-names both \
--eval-ds-names proofwiki stacks \
--modeltok2datatok <SAVE-DIR>/other/tok2tok.pkl \
--gpu 1 \
--split test

Note that OOD evaluation (stein or trench) is not implemented for the combined model.

Example: autoregressive retrieval

Omitting the --generation flag adjusts the settings for retrieval evaluation:

python run_analysis.py --model autoregressive \
--train-ds-names proofwiki \
--eval-ds-names proofwiki \
--gpu 1 \
--split valid

Note that OOD evaluation (stein or trench) is not implemented for the autoregressive model.

Example: autoregressive generation

python run_analysis.py --model autoregressive --generation \
--train-ds-names proofwiki \
--eval-ds-names proofwiki \
--gpu 1 \
--split valid

Training

The provided code supports:

  • Training a pairwise model
  • Training an autoregressive or joint model, initialized with pairwise model components (parameters, reference embeddings)

Training a pairwise model

python naturalproofs/model.py --expr-name pairwise \
--datapath /path/to/<DOMAIN>_tokenized__bert-base-cased.pkl \
--default-root-dir /path/to/output

Training a joint model

The joint (and autoregressive) model is initialized with a pairwise checkpoint and reference encodings.

  • The pairwise checkpoint is saved during pairwise training.
  • The reference encodings are saved in an encs.pt file during pairwise evaluation.

python naturalproofs/encoder_decoder/model.py \
--datapath /path/to/sequence_<DOMAIN>_tokenized__bert-base-cased.pkl \
--default-root-dir /path/to/output \
--pretrained-retrieval-checkpoint /path/to/pairwise__<DOMAIN>.ckpt \
--encs-file /path/to/train_<DOMAIN>__eval_<DOMAIN>__valid__encs.pt \  # obtained from running evaluation on trained pairwise model 
--parallel 1 \
--set-mode 1   # discard duplicates

Our implementation uses the same encoder-decoder architecture for the autoregressive and joint models, considering the joint model as a one-step special case (with KL-div loss). See the Appendix for a discussion on this design decision and technical details.
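As a rough illustration of the one-step view (our sketch, not the repo's implementation): the joint model can be trained against a target distribution that is uniform over the references a theorem actually cites, with a KL-divergence objective over the full reference vocabulary.

```python
import math

def target_distribution(true_ref_ids, num_refs):
    """Uniform target over the references cited by a theorem;
    zero probability everywhere else."""
    p = [0.0] * num_refs
    for rid in true_ref_ids:
        p[rid] = 1.0 / len(true_ref_ids)
    return p

def kl_div(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)
```

The autoregressive model instead factorizes the reference list step by step; the one-step joint case uses the same encoder-decoder with a single prediction step.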

Training an autoregressive model

python naturalproofs/encoder_decoder/model.py \
--datapath /path/to/sequence_<DOMAIN>_tokenized__bert-base-cased.pkl \
--default-root-dir /path/to/output \
--pretrained-retrieval-checkpoint /path/to/pairwise__<DOMAIN>.ckpt \
--encs-file /path/to/train_<DOMAIN>__eval_<DOMAIN>__valid__encs.pt \  # obtained from running evaluation on trained pairwise model 
--parallel 0 \
--set-mode 0   # keep duplicates

Non-neural baselines

TF-IDF example:

python naturalproofs/baselines.py \
--method tfidf \
--datapath /path/to/<DOMAIN>_tokenized__bert-base-cased_200.pkl \
--datapath-base /path/to/naturalproofs_<DOMAIN>.json \
--savedir /path/to/out/

==> /path/to/out/tfidf__eval.pkl

Then use analyze.py to compute metrics:

python naturalproofs/analyze.py \
--method tfidf \
--eval-path /path/to/out/tfidf__eval.pkl \
--datapath-base /path/to/naturalproofs_<DOMAIN>.json 

==> /path/to/out/tfidf__analysis.pkl
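For intuition, the tf-idf baseline can be sketched in pure Python (a toy re-implementation of the idea, not the repo's baselines.py):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute simple tf-idf weight dicts for a list of tokenized docs."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_references(theorem_tokens, ref_docs):
    """Rank reference indices by tf-idf cosine similarity to the theorem."""
    vecs = tfidf_vectors(ref_docs + [theorem_tokens])
    query, refs = vecs[-1], vecs[:-1]
    return sorted(range(len(ref_docs)), key=lambda i: -cosine(query, refs[i]))
```

The actual baseline operates on the tokenized NaturalProofs data, but the scoring principle is the same.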
