CoqGym

A Learning Environment for Theorem Proving with the Coq proof assistant

[Figure: an example proof]

Code for the paper:

Learning to Prove Theorems via Interacting with Proof Assistants
Kaiyu Yang and Jia Deng
International Conference on Machine Learning (ICML) 2019

@inproceedings{yang2019coqgym,
  title={Learning to Prove Theorems via Interacting with Proof Assistants},
  author={Yang, Kaiyu and Deng, Jia},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2019}
}

For potential bugs, please open an issue. For any other questions, please ask in Discussions.

Table of Contents

  1. Installing CoqGym
        1.1 Dependencies
        1.2 Building Coq, SerAPI, CoqHammer, and the Coq Projects
        1.3 Extracting the Proofs from Coq Code
        1.4 Downloading the Pre-extracted Proofs
  2. Using CoqGym in Docker
  3. Data Format
        3.1 JSON Files
        3.2 LMDB File
        3.3 Glossary
  4. Data Utilities
        4.1 Interacting with CoqGym
        4.2 Parsing Coq Terms
        4.3 Computing Dataset Statistics
  5. The ASTactic Model
        5.1 Prerequisites
        5.2 Extracting Proof Steps
        5.3 Training
        5.4 Testing
  6. Credits
  7. Contributing

1. Installing CoqGym

We recommend using CoqGym in Docker, as CoqGym has many dependencies and is nontrivial to set up correctly. If Docker is not an option, below are steps to obtain the CoqGym dataset and build the interaction environment natively:

1.1 Dependencies

1.2 Building Coq, SerAPI, CoqHammer, and the Coq Projects

  1. Create an OPAM switch for OCaml 4.07.1+flambda: opam switch create 4.07.1+flambda && eval $(opam env)
  2. Upgrade the installed OPAM packages (optional): opam upgrade && eval $(opam env)
  3. Clone the repository: git clone https://github.com/princeton-vl/CoqGym
  4. Install Coq, SerAPI and CoqHammer: cd CoqGym && source install.sh
  5. Build the Coq projects (can take a while): cd coq_projects && make && cd ..
  6. Create and activate the conda environment: conda env create -f coq_gym.yml && conda activate coq_gym

Note: Coq, SerAPI, CoqHammer, and the Coq projects in the coq_projects directory are independent software projects with their own code repositories, but please follow the instructions above to build the specific versions we need.

1.3 Extracting the Proofs from Coq Code (Optional)

We include the code for extracting CoqGym from Coq source code. However, it is not guaranteed to reproduce exactly the same data: timeouts and other miscellaneous errors during data extraction may be machine-dependent. For example, a faster machine is likely to hit fewer timeouts and thus extract more proofs. For benchmarking purposes, please download and use our pre-extracted version of CoqGym.

  1. Check all Coq files and locate the proofs:
    For each *.meta file in ./coq_projects/, run python check_proofs.py --file /path/to/*.meta
    This generates a *.json file in ./data/ corresponding to each *.meta file; its proofs field is a list containing the proof names.

  2. Extract the proofs:
    For each *.meta file and each proof, run:
    python extract_proof.py --file /path/to/*.meta --proof $PROOF_NAME
    python extract_synthetic_proofs.py --file /path/to/*.meta --proof $PROOF_NAME

  3. Post-processing: python postprocess.py

Caveat: The steps above are computationally expensive. Wherever we say "for each XXX, run YYY", the tasks are embarrassingly parallel and can be run in any order. We do not provide the parallelization code because it depends on your particular HPC infrastructure.
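
For reference, here is a minimal single-machine driver for the extraction steps above. It is a sketch rather than part of the pipeline: the mapping from *.meta paths to the generated *.json paths and the number of worker processes are assumptions you should adapt to your setup.

import glob
import json
import subprocess
from multiprocessing import Pool

def extract_file(meta_path):
    # Step 1: locate the proofs in this Coq file.
    subprocess.run(['python', 'check_proofs.py', '--file', meta_path], check=False)
    # Assumed mapping from ./coq_projects/**/*.meta to the generated ./data/**/*.json.
    json_path = meta_path.replace('coq_projects', 'data', 1).replace('.meta', '.json')
    with open(json_path) as f:
        proof_names = json.load(f)['proofs']
    # Step 2: extract the human-written and synthetic proofs.
    for name in proof_names:
        subprocess.run(['python', 'extract_proof.py', '--file', meta_path, '--proof', name], check=False)
        subprocess.run(['python', 'extract_synthetic_proofs.py', '--file', meta_path, '--proof', name], check=False)

if __name__ == '__main__':
    metas = glob.glob('coq_projects/**/*.meta', recursive=True)
    with Pool(8) as pool:                  # the number of workers is arbitrary
        pool.map(extract_file, metas)
    # Step 3: post-processing.
    subprocess.run(['python', 'postprocess.py'], check=True)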

1.4 Downloading the Pre-extracted Proofs (Recommended)

  1. Download the CoqGym dataset from DOI.

  2. Unzip the data and set the paths: python unzip_data.py

Caveat: The second step sets the absolute paths in the data. You have to re-do it whenever the absolute path of the data/ directory changes (e.g. after moving the entire repo to another directory).

Now you are ready to interact with CoqGym! Run python eval_env.py to check if it terminates normally without raising an error.


2. Using CoqGym in Docker

As a less painful alternative to installing CoqGym from scratch, we provide a Docker image with everything pre-installed. The image was built from this Dockerfile. It includes the CoqGym dataset, proof steps extracted from the dataset, our pre-trained ASTactic model, and all necessary dependencies such as OPAM, Z3, CVC4, Vampire, and E Prover. You shouldn't need to download anything else if you choose to use the Docker image.


3. Data Format

The dataset contains three parts:

  • The data directory: *.json files corresponding to the *.v files in the Coq source code; their format is explained below. The *.json files contain all important information about the proofs: environment, local context, goals, tactics, proof trees, etc.

  • The sexp_cache directory: An LMDB database that serves as an index for the S-expressions in the *.json files. The *.json files contain keys for querying sexp_cache.

  • projs_split.json: A JSON object containing the training/validation/testing split
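
For example, you can inspect the split with a few lines of Python. This is a sketch; the structure of projs_split.json beyond "a JSON object" is not specified here, so it prints the keys rather than assuming them.

import json

with open('projs_split.json') as f:
    split = json.load(f)
for group, projects in split.items():      # e.g. the training/validation/testing groups
    print(group, len(projects), 'projects')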

3.1 JSON Files

Each *.json file in data/ corresponds to a Coq source file *.v in coq_projects/. For example, data/StructTact/Assoc.json corresponds to coq_projects/StructTact/Assoc.v.

The format of the JSON files is described below. The hash codes are used as keys to query the LMDB sexp_cache. Consult the glossary for the terminology.

{
    'filename': 'Assoc.v',            # the path of the Coq source file relative to the root directory of the Coq project
    'coq_project': 'StructTact',      # the name of the Coq project
    'vernac_cmds': [                  # a list of Coq commands [6] in the source file
        ['Cd "$COQGYM_ROOT/coq_projects/StructTact".', 'VernacChdir',  '3701e61f37b72b3e61788fce6317466b7bb92b55'],     # [raw command, command type, command AST (in hash code)]
        ['Arguments a_equiv {_} {_} _ _ _.', 'VernacArguments', '6777d3c472595dae20427d0892ad03d38f70fde9'],
        ...
        ['Arguments a_equiv {_} {_} _ _ _.', 'VernacArguments', '6777d3c472595dae20427d0892ad03d38f70fde9'],
    ],
   'num_extra_cmds': 107,             # the code in the original Coq file starts at vernac_cmds[num_extra_cmds]
   'proofs': [                        # a list of human-written proofs
       ...
    ],
    'synthetic_proofs': [             # a list of synthetic proofs
       ...
    ],
}

The format for a proof is as follows, taking the proof of get_set_same in coq_projects/StructTact/Assoc.v as an example.

{
    'name': 'get_set_same',           # the name of the theorem
    'line_nb': 118,                   # the theorem is defined in $file_data['vernac_cmds'][$line_nb]
    
    'env_delta': {                          # the global environment relative to the previous proof in the same file
        'add' : {                           # entries that should be added to the environment
            'constants' : [
                {
                    'physical_path': 'coq/theories/Arith/PeanoNat.vo:Nat.mul_wd',       # the unique identifier
                    'short_ident': 'PeanoNat.Nat.mul_wd',                               # the short identifier
                    'qualid': 'Coq.Arith.PeanoNat.Nat.mul_wd',             # the qualified identifier [1]
                    'type': '@Morphisms.Proper (forall (_ : nat) (_ : nat), nat) (@Morphisms.respectful nat (forall _ : nat, nat) (@eq nat) (@Morphisms.respectful nat nat (@eq nat) (@eq nat))) Nat.mul',    # the type [7] of the constant
                    'sort': 'Prop',                                        # the sort [2] of the constant (the type of its type)
                    'opaque': False,                                       # whether the constant is opaque or transparent [3]
                    'sexp': '333b2895c8e62d21856476bf89fa9681c9058bb9'     # the S-expression [4] of the constant produced by SerAPI
            },   
            ...
            ],
            'inductives' : [
                {
                    'physical_path' : 'coq/theories/Init/Wf.vo:Acc',
                    'blocks': [           # a list of blocks in a mutual inductive definition [5]. For regular inductive definitions (most cases), the list has length 1
                        {
                            'short_ident': 'Acc',
                            'qualid': 'Coq.Init.Wf.Acc',
                            'constructors': [
                                ['Acc_intro', 'forall (A : Type) (R : forall (_ : A) (_ : A), Prop) (x : A) (_ : forall (y : A) (_ : R y x), _UNBOUND_REL_6 A R y), _UNBOUND_REL_5 A R x'],      # [constructor name, constructor type]
                                ...
                            ]
                        }
                    ],
                    'is_record': False,
                    'sexp': '31537cb98179ad7d2de0dd2cc783b4672b34b25b'
            }
            ...
            
            ],
        },
        'subtract' : {                      # entries that should be removed from the environment
            'constants' : [],
            'inductives' : [],
        },
    },
    
    'steps': [                        # a list of proof steps
        {
            'command': ['induction l; intros; simpl; repeat (break_match; simpl); subst; congruence.', 'VernacExtend', 'f6d2cb314d72d23562e5f2ef2657bd2589d44794'],        # (raw command, command type, command AST (in hash code)) the Coq command (usually a tactic but also includes other commands such as +, -, *, etc.) 
            'goal_ids': {             # the IDs of the goals in the current proof step
                'fg': [27],           # focused goals 
                'bg': []              # unfocused goals
            }
        }
        ...
    ],      
    
    'goals': {                        # the goals
        '27': {                       # $goal_id -> goal
                 'id': 27,            # the goal ID
                 'type': 'forall (k : K) (v : V) (l : list (prod K V)), @eq (option V) (assoc (assoc_set l k v) k) (@Some V v)',      # the type (logical statement) of the goal
                 'hypotheses': [      # the local context, a list of local premises
                     {'idents': ['K', 'V'], 'term': [], 'type': 'Type', 'sexp': 'cd1531c49fce6657997962b5375a3ef0a59db34a'},      # {'idents': [a list of identifiers of the premises (usually of length one)], 'term': [a list of Coq terms (usually empty)], 'type': the type (logical statement) of the premise}
                     {'idents': ['K_eq_dec'], 'term': [], 'type': "forall k k' : K, sumbool (@eq K k k') (not (@eq K k k'))", 'sexp': '5f5f5bcf9e10621f8c0c4642c0eba3ff36cbfff8'},
                 ],
             }
    },    
    
    'proof_tree' : {                  # the proof tree
        'goal_id': 27, 'children': []
    },                     
}

3.2 LMDB File

sexp_cache is an LMDB database mapping the hash codes in the *.json files to their corresponding S-expressions. Below is a Python snippet for accessing them.

from utils import SexpCache
sexp_cache = SexpCache('sexp_cache')
print(sexp_cache['333b2895c8e62d21856476bf89fa9681c9058bb9'])
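
To see how these hash codes connect to the JSON files, here is a small sketch that looks up the AST of the first Coq command in data/StructTact/Assoc.json (paths follow the examples above):

import json
from utils import SexpCache

sexp_cache = SexpCache('sexp_cache')
with open('data/StructTact/Assoc.json') as f:
    file_data = json.load(f)
# Each vernac command is [raw command, command type, AST hash code] (see 3.1 above).
raw_cmd, cmd_type, ast_hash = file_data['vernac_cmds'][0]
print(sexp_cache[ast_hash])                # the S-expression of the command's AST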

3.3 Glossary

  1. qualified identifier
  2. sort
  3. opaque, transparent
  4. S-expression
  5. inductive definition, mutual inductive definition
  6. Coq (vernac) command
  7. type

4. Data Utilities

We include some tools for interacting with CoqGym, but they are NOT a part of the dataset. You may implement your own tools for similar purposes.

4.1 Interacting with CoqGym

eval_env.py enables interaction with the proofs in CoqGym. See ASTactic/agent.py for examples.
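
To get a feel for the interaction, below is a hypothetical usage sketch. The FileEnv constructor arguments and the init/step protocol shown here are assumptions based on eval_env.py and ASTactic/agent.py; consult those files for the authoritative interface.

from eval_env import FileEnv

# One FileEnv per *.json file; it yields one proof environment per proof (assumed interface).
with FileEnv('data/StructTact/Assoc.json', max_num_tactics=100, timeout=600) as file_env:
    for proof_env in file_env:
        print('proving:', proof_env.proof['name'])
        obs = proof_env.init()             # the initial goals
        obs = proof_env.step('intros.')    # execute a tactic and observe the new proof state
        print(obs['result'])               # e.g. whether the proof has been finished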

4.2 Parsing Coq Terms

  • gallina.py: a parser for the S-expressions of Coq terms. It may be useful for learning the embeddings of the term ASTs.

  • utils.py: functions for iterating through all proofs or Coq files in the dataset.
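
If you prefer not to depend on these helpers, the JSON format described in 3.1 is easy to traverse directly. A minimal sketch (the data/ root path is an assumption):

import glob
import json

for json_path in glob.glob('data/**/*.json', recursive=True):
    with open(json_path) as f:
        file_data = json.load(f)
    for proof in file_data['proofs']:      # the human-written proofs
        print(file_data['coq_project'], proof['name'], len(proof['steps']), 'steps')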

4.3 Computing Dataset Statistics

  • stats/count_human_proofs.py: count the number of human-written proofs in CoqGym.

  • stats/count_synthetic_proofs.py: count the number of synthetic proofs in CoqGym.

  • stats/proofs.py: compute some statistics of the proofs.


5. The ASTactic Model

[Figure: the ASTactic model]

Here we describe how to train and test the ASTactic model on CoqGym. The following content is NOT a part of the CoqGym dataset, and therefore you do not need it if you only want to access the data.

5.1 Prerequisites

  • Make sure CoqGym has been properly installed and configured: the coq_gym conda environment is activated, and the OPAM switch is set to 4.07.1+flambda.
  • Automated theorem provers: Vampire, CVC4, E Prover, and Z3. Install all of them and make sure they are accessible in PATH; otherwise you may see degraded performance of the hammer baseline.
  • PyTorch: Install the correct version for your hardware in the conda environment coq_gym.
  • The instructions below assume that you are in the ASTactic directory.

5.2 Extracting Proof Steps

The ASTactic model is trained on individual proof steps rather than entire proofs. After obtaining the CoqGym dataset, run python extract_proof_steps.py. This can take a while; to run it in parallel, see the --filter option in the source code.

The extracted proof steps are in proof_steps/. You can double-check the number of proof steps to make sure everything works as expected:

Directory            # files
proof_steps/train    121,644
proof_steps/valid     68,180

5.3 Training

To train on the proof steps in the training + validation sets: python main.py --no_validation --exp_id astactic
Here "astactic" is an experiment ID, which you may change to any other ID. Model checkpoints will be saved to runs/astactic/checkpoints/. See options.py for command-line options.

A pre-trained model can be downloaded here.

5.4 Testing

Assuming you want to test the model checkpoint runs/astactic/checkpoints/model_003.pth on the proof get_set_same in ../data/StructTact/Assoc.json:

  • Testing ASTactic:
    python evaluate.py ours ours-TEST --path runs/astactic/checkpoints/model_003.pth --file ../data/StructTact/Assoc.json --proof "get_set_same"

  • Testing an automated tactic X (where X may be "auto", "trivial", "easy", "intuition", or "hammer"):
    python -u evaluate.py X X-TEST --file ../data/StructTact/Assoc.json --proof "get_set_same"

  • Testing ASTactic+X:
    python -u evaluate.py ours+X ours+X-TEST --path runs/astactic/checkpoints/model_003.pth --file ../data/StructTact/Assoc.json --proof "get_set_same"

Caveat: Testing is computationally expensive, but the workloads are embarrassingly parallel and can be run in any order. We do not provide the parallelization code because it depends on your particular HPC infrastructure.
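
One simple way to organize such a run is to enumerate the evaluation commands and feed them to whatever parallel runner or scheduler you have. A sketch (the checkpoint path and experiment IDs follow the example above; the data path is an assumption):

import glob
import json

ckpt = 'runs/astactic/checkpoints/model_003.pth'
for json_path in glob.glob('../data/**/*.json', recursive=True):
    with open(json_path) as f:
        file_data = json.load(f)
    for proof in file_data['proofs']:
        name = proof['name']
        # One command per (file, proof) pair; pipe this output into your own scheduler.
        print(f'python evaluate.py ours ours-TEST --path {ckpt} --file {json_path} --proof "{name}"')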

6. Credits

7. Contributing

We welcome and appreciate contributions from the community. For bug fixes and relatively minor changes (comments, typos, etc.), feel free to submit a pull request directly. For anything beyond that, please post in Discussions before implementing.
