Efficient evolution from general protein language models
Scripts for running the analysis described in the paper "Efficient evolution of human antibodies from general protein language models".
Running the model
To evaluate the model on a new sequence, clone this repository and run
python bin/recommend.py [sequence]
where [sequence]
is the wildtype protein sequence you want to evolve. The script will output a list of substitutions and the number of recommending language models.
To recommend mutations to antibody variable domain sequences, we have simply run the above script separately on the heavy and light chain sequences.
We have also made a Google Colab notebook available. However, this notebook requires a full download and installation of the language models for each run and requires Colab Pro instances with a higher memory requirement than the free version of Colab. When making many predictions, we recommend the local installation above, as this will allow you to cache and reuse the models.
Paper analysis scripts
To reproduce the analysis in the paper, first download and extract data with the commands:
wget https://zenodo.org/record/6968342/files/data.tar.gz
tar xvf data.tar.gz
To acquire mutations to a given antibody, run the command
bash bin/eval_models.sh [antibody_name]
where [antibody_name]
is one of medi8852
, medi_uca
, mab114
, mab114_uca
, s309
, regn10987
, or c143
.
DMS experiments can be run with the command
bash bin/dms.sh