Hello!
Below you can find an outline of how to reproduce our solution for the CHAMPS competition. If you run into any trouble with the setup/code or have any questions, please contact us at [email protected].
Copyright 2019 Robert Bosch GmbH
Code authors: Zico Kolter, Shaojie Bai, Devin Wilmott, Mordechai Kornbluth, Jonathan Mailoa, part of Bosch Research (CR).
Archive Contents
- `config/` : Configuration files
- `data/` : Raw data
- `models/` : Saved models
- `processed/` : Processed data
- `src/` : Source code for preprocessing, training, and predicting
- `submission/` : Directory for the actual predictions
Hardware (The following specs were used to create the original solution)
A variety of models were trained on different machines, each running a Linux OS:
- 5 machines had 4 GPUs, each an NVIDIA GeForce RTX 2080 Ti
- 2 machines had 1 GPU NVIDIA Tesla V100 with 32 GB memory
- 6 machines had 1 GPU NVIDIA Tesla V100 with 16 GB memory
Software
- Python 3.5+
- CUDA 10.1
- NVIDIA APEX (at this stage, only available through its GitHub repo)
Python packages are detailed separately in `requirements.txt`.

Note: Though listed in `requirements.txt`, `rdkit` is not available with `pip`. We strongly suggest installing `rdkit` via conda:

`conda install -c rdkit rdkit`
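To confirm the install worked, a quick check (not part of the pipeline) is to parse a trivial molecule:

```python
# Quick sanity check that rdkit is importable (not part of the pipeline).
from rdkit import Chem

# Parse a trivial SMILES string; a successful parse returns a Mol object.
mol = Chem.MolFromSmiles("CCO")
print(mol.GetNumAtoms())  # prints 3 (heavy atoms only)
```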
Data Setup
We use only the `train.csv`, `test.csv`, and `structures.csv` files of the competition. They should be (unzipped and) placed in the `data/` directory. All of the commands below are executed from the `src/` directory.
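Before preprocessing, it can help to check that the files landed in the right place. A minimal sketch, assuming you run it from `src/` so that `data/` sits one level up:

```python
# Sketch: verify the three competition CSVs are in place (run from src/).
import os

required = ["train.csv", "test.csv", "structures.csv"]
missing = [f for f in required if not os.path.isfile(os.path.join("..", "data", f))]
if missing:
    raise FileNotFoundError(f"Missing files in data/: {missing}")
print("data/ looks ready for preprocessing.")
```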
Data Processing
- `cd src/`
- `python pipeline_pre.py 1` (this could take 1-2 hours)
- `python pipeline_pre.py 2`

(You may need to change the permissions on the `.csv` files via `chmod` before the two scripts above can use them.)
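If you do hit a permission error, a plain `chmod` from the shell is the simplest fix; the sketch below does the equivalent from Python, assuming the CSVs live in `data/` one level above `src/`:

```python
# Sketch: give the current user read/write access to the raw CSVs (run from src/).
import glob
import os
import stat

for path in glob.glob(os.path.join("..", "data", "*.csv")):
    os.chmod(path, os.stat(path).st_mode | stat.S_IRUSR | stat.S_IWUSR)
```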
Model Build
There are three options to produce the solution. While in `src/`:

- Very fast prediction: `predictor.py fast` to use the precomputed results for ensembling.
- Ordinary prediction: `predictor.py` to use the precomputed checkpoints for predicting and ensembling.
- Re-train models: `train.py` to train a new model from scratch. See `train.py -h` for the allowed arguments, and the `config` file of each model for the arguments used.
The `config/models.json` file contains the following important keys:

- names: list of the model names we will ensemble
- output file: the name of the ensembled output file
- num atom types, bond types, triplet types, quad types: arguments passed to the GraphTransformer instantiator. Note that in the default setting, quadruplet information is not used by GTs.
- model_dir: the directory in `models/` associated with each model (see the loading sketch after this list). Each directory must have:
  - `graph_transformer.py` with a `GraphTransformer` class (and any modules it needs);
  - a `config` file with the kwargs to instantiate the `GraphTransformer` class;
  - `[MODEL_NAME].ckpt`, which can be loaded via `load_state_dict(torch.load('[MODEL_NAME].ckpt').state_dict())` (to avoid PyTorch version conflicts).
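As a minimal sketch of these conventions, loading one ensemble member could look like the snippet below. It assumes the directory name matches the model name, that the per-model `config` file is a JSON dict of constructor kwargs, and that it is run from `src/`; none of these details are verbatim from the code, so adapt them to the actual files.

```python
# Sketch: load one ensemble member following the conventions above (run from src/).
import importlib.util
import json
import os

import torch

with open("../config/models.json") as f:
    ensemble_cfg = json.load(f)

name = ensemble_cfg["names"][0]              # first model of the ensemble
model_dir = os.path.join("..", "models", name)

# Import the GraphTransformer class shipped with this particular model.
spec = importlib.util.spec_from_file_location(
    "graph_transformer", os.path.join(model_dir, "graph_transformer.py"))
gt_module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(gt_module)

with open(os.path.join(model_dir, "config")) as f:
    kwargs = json.load(f)                    # kwargs for the constructor (assumed JSON)

model = gt_module.GraphTransformer(**kwargs)
# Load weights via state_dict, as recommended, to avoid PyTorch version conflicts.
ckpt = torch.load(os.path.join(model_dir, f"{name}.ckpt"))
model.load_state_dict(ckpt.state_dict())
```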
Notes on (Pre-trained) Model Loading
All pretrained models are stored in `models/`. However, different models may have slightly different architectures (e.g., some GT models are followed by a 2-layer grouped residual network, while others have only one residual block). The training script (`train.py`), when started without the `--debug` flag, will automatically create a log folder in `CHAMPS-GT/` that contains the code for the GT used. When loading a model, use the `graph_transformer.py` in that log folder (instead of the default one in `src/`).
Notes on Model Training
When trained from scratch, the default parameters should lead to a model achieving a score of around -3.06 to -3.07. Using the `--debug` flag will prevent the program from creating a log folder.
Notes on Saving Memory
What if you get a `CUDA out of memory` error? We suggest a few solutions:

- If you have a multi-GPU machine, use the `--multi_gpu` flag and tune the `--gpu0_bsz` flag (which controls the minibatch size passed to GPU device 0). For instance, on a 4-GPU machine, you can run `python train.py [...] --batch_size 47 --multi_gpu --gpu0_bsz 11`, which assigns a batch size of 12 to GPUs 1, 2, and 3 and a batch size of 11 to GPU 0.
- Use the `--fp16` option, which applies NVIDIA APEX's mixed-precision training.
- Use the `--batch_chunk` option, which chunks a larger batch into a few smaller (equal) shares. The gradients from the smaller minibatches accumulate, so the effective batch size is still the same as `--batch_size` (a gradient-accumulation sketch follows this list).
- Use fewer `--n_layer`, or a smaller `--batch_size` :P
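The `--batch_chunk` behaviour corresponds to standard gradient accumulation. A generic, self-contained PyTorch sketch of the idea (not the project's actual training loop):

```python
# Sketch: gradient accumulation, the idea behind --batch_chunk.
import torch
import torch.nn as nn

model = nn.Linear(8, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

full_x, full_y = torch.randn(32, 8), torch.randn(32, 1)  # one "large" batch
n_chunks = 4                                              # e.g., --batch_chunk 4

optimizer.zero_grad()
for x, y in zip(full_x.chunk(n_chunks), full_y.chunk(n_chunks)):
    loss = criterion(model(x), y) / n_chunks  # scale so the summed gradients match the full batch
    loss.backward()                           # gradients accumulate across chunks
optimizer.step()                              # one update with the effective (full) batch size
```

Each backward pass only needs activations for one chunk, which is what reduces peak GPU memory while keeping the update equivalent to a single large batch.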