# What does BERT learn about the structure of language?

Code used in our ACL'19 paper for interpreting the BERT model.
## Dependencies

- PyTorch
- pytorch-pretrained-BERT
- SentEval
- spaCy (for dependency tree visualization)
## Quick Start

### Phrasal Syntax (Section 3 in paper)
- Navigate:
```shell
cd chunking/
```
- Download the train set from the CoNLL-2000 chunking corpus and decompress it:
```shell
wget https://www.clips.uantwerpen.be/conll2000/chunking/train.txt.gz
gunzip train.txt.gz
```
The last command replaces `train.txt.gz` with `train.txt`.
- Extract BERT features for chunking-related tasks (clustering and visualization):
```shell
python extract_features.py --train_file train.txt --output_file chunking_rep.json
```
- Run t-SNE on span embeddings for each BERT layer (Figure 1):
```shell
python visualize.py --feat_file chunking_rep.json --output_file_prefix tsne_layer_
```
This creates one t-SNE plot per BERT layer and saves each as a PDF (e.g. `tsne_layer_0.pdf`).
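The idea behind this step can be sketched as follows. The exact layout of `chunking_rep.json` is not documented here, so this sketch uses random synthetic span embeddings (one cluster per hypothetical chunk type) in place of the real features; everything else — 2-D t-SNE followed by a labeled scatter plot saved as a PDF — mirrors what the visualization step produces:

```python
# Sketch of the per-layer t-SNE step: project span embeddings to 2-D and
# color points by chunk label. The embeddings below are random stand-ins
# for the vectors stored in chunking_rep.json.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
labels = ["NP", "VP", "PP", "ADJP"]  # illustrative chunk types
# 30 fake 768-d span embeddings per chunk type, one loose cluster per type
feats = np.vstack([rng.normal(loc=3.0 * i, scale=1.0, size=(30, 768))
                   for i, _ in enumerate(labels)])
tags = [l for l in labels for _ in range(30)]

coords = TSNE(n_components=2, perplexity=20, random_state=0).fit_transform(feats)
for l in labels:
    idx = [i for i, t in enumerate(tags) if t == l]
    plt.scatter(coords[idx, 0], coords[idx, 1], label=l, s=10)
plt.legend()
plt.savefig("tsne_layer_0.pdf")
```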
- Run KMeans to evaluate the clustering performance of span embeddings for each BERT layer (Table 1):
```shell
python cluster.py --feat_file chunking_rep.json
```
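The evaluation in this step can be sketched in a few lines: cluster the span embeddings with KMeans and score the induced clusters against the gold chunk labels, e.g. with normalized mutual information. Well-separated synthetic blobs stand in for the real embeddings, and NMI is an assumption about the metric rather than a claim about what `cluster.py` reports:

```python
# Sketch of the clustering evaluation: KMeans over span embeddings, scored
# against gold chunk labels with NMI. Synthetic blobs replace the real
# features from chunking_rep.json.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
n_types, per_type, dim = 4, 50, 32
# one well-separated blob of "span embeddings" per chunk type
feats = np.vstack([rng.normal(loc=10.0 * i, scale=0.5, size=(per_type, dim))
                   for i in range(n_types)])
gold = np.repeat(np.arange(n_types), per_type)

pred = KMeans(n_clusters=n_types, n_init=10, random_state=0).fit_predict(feats)
nmi = normalized_mutual_info_score(gold, pred)
print(f"NMI: {nmi:.3f}")  # close to 1.0 on these well-separated blobs
```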
### Probing Tasks (Section 4)
- Navigate:
```shell
cd probing/
```
- Download the data files for the 10 probing tasks (e.g. `tree_depth.txt`).
- Extract BERT features for a sentence-level probing task (`tree_depth` in this case):
```shell
python extract_features.py --data_file tree_depth.txt --output_file tree_depth_rep.json
```
In the above command, append the `--untrained_bert` flag to extract untrained BERT features.
- Train the probing classifier for a given BERT layer (indexed from 0) and evaluate its performance (Table 2):
```shell
python classifier.py --labels_file tree_depth.txt --feats_file tree_depth_rep.json --layer 0
```
We use the hyperparameter search space recommended by SentEval.
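The core of a probing classifier is simple: fit a lightweight classifier on frozen layer features and report held-out accuracy. The sketch below uses logistic regression on synthetic, linearly separable "sentence embeddings" — the actual `classifier.py` follows the SentEval setup and reads the real features from `tree_depth_rep.json`:

```python
# Sketch of a probing classifier: a simple model trained on frozen BERT-layer
# features, evaluated on a held-out split. Synthetic features stand in for
# the vectors in tree_depth_rep.json.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_per_class, dim = 100, 64
# two linearly separable classes of "sentence embeddings"
X = np.vstack([rng.normal(loc=0.0, size=(n_per_class, dim)),
               rng.normal(loc=2.0, size=(n_per_class, dim))])
y = np.repeat([0, 1], n_per_class)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = probe.score(X_te, y_te)
print(f"probe accuracy: {acc:.3f}")
```

High probe accuracy is read as evidence that the layer's features encode the probed property (here, a stand-in for tree depth).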
### Subject-Verb Agreement (SVA) (Section 5)
- Navigate:
```shell
cd sva/
```
- Download the data file for the SVA task and extract it.
- Extract BERT features for the SVA task:
```shell
python extract_features.py --data_file agr_50_mostcommon_10K.tsv --output_folder ./
```
- Train the binary classifier for a given BERT layer (indexed from 0) and evaluate its performance (Table 3):
```shell
python classifier.py --input_folder ./ --layer 0
```
We use the hyperparameter search space recommended by SentEval.
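A SentEval-style hyperparameter search amounts to sweeping the classifier's regularization strength on a validation split and keeping the best setting. The grid values and the singular/plural labels below are illustrative stand-ins, not the exact SentEval grid or the real SVA features:

```python
# Sketch of a SentEval-style search: try several L2 regularization strengths
# on a validation split and keep the best binary classifier. Synthetic data
# stands in for the extracted SVA features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(150, 32)),
               rng.normal(1.0, 1.0, size=(150, 32))])
y = np.repeat([0, 1], 150)  # 0 = singular verb, 1 = plural verb (stand-ins)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_acc, best_clf = 0.0, None
for c in [2 ** k for k in range(-4, 5)]:  # grid over inverse L2 strength
    clf = LogisticRegression(C=c, max_iter=1000).fit(X_tr, y_tr)
    acc = clf.score(X_val, y_val)
    if acc > best_acc:
        best_acc, best_clf = acc, clf
print(f"best validation accuracy: {best_acc:.3f}")
```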
### Compositional Structure (Section 6)
- Navigate:
```shell
cd tpdn/
```
- Download the SNLI 1.0 corpus and extract it.
- Extract BERT features for the premise sentences present in SNLI:
```shell
python extract_features.py --input_folder . --output_folder .
```
- Train the Tensor Product Decomposition Network (TPDN) to approximate a given BERT layer (indexed from 0) and evaluate the performance (Table 4):
```shell
python approx.py --input_folder . --output_folder . --layer 0
```
Check the `--role_scheme` and `--rand_tree` flags for setting the role scheme.
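The structure a TPDN tries to fit is the tensor-product representation: a sequence is encoded as the sum, over positions, of the outer product of a filler (word) embedding and a role embedding. The embeddings and the left-to-right role scheme below are illustrative; the TPDN learns them so that this composition approximates a BERT layer:

```python
# Sketch of a tensor-product representation, the target structure of a TPDN:
# encode a token sequence as sum_i filler(token_i) (x) role(position_i),
# flattened to a single vector.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, seq_len, d_filler, d_role = 10, 4, 6, 3
fillers = rng.normal(size=(vocab_size, d_filler))  # word embeddings
roles = rng.normal(size=(seq_len, d_role))         # left-to-right position roles

def tpr(token_ids):
    """Sum of filler-role outer products, flattened to one vector."""
    enc = np.zeros((d_filler, d_role))
    for pos, tok in enumerate(token_ids):
        enc += np.outer(fillers[tok], roles[pos])
    return enc.ravel()

v = tpr([3, 1, 4, 1])
print(v.shape)  # (18,) i.e. d_filler * d_role
```

With positional roles, word order matters: `tpr([3, 1, 4, 1])` and `tpr([1, 3, 4, 1])` differ, whereas a bag-of-words role scheme (a single shared role) would make them equal.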
- Induce a dependency parse tree from the attention weights of a given attention head and BERT layer (both indexed from 1) (Figure 2):
```shell
python induce_dep_trees.py --sentence_text "The keys to the cabinet are on the table" --head_id 11 --layer_id 2 --sentence_root 6
```
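The induction step can be sketched with a simplification: treat `attn[i, j]` as the score that word `j` heads word `i`, fix the gold root, and greedily pick the argmax head for every other word. The toy attention matrix below is fabricated; the actual script uses the Chu-Liu-Edmonds algorithm (see Acknowledgements), which, unlike the greedy version, guarantees a well-formed tree:

```python
# Simplified sketch of inducing a parse from one attention head: each
# non-root word attaches to the word it attends to most. The real script
# runs Chu-Liu-Edmonds instead, which guarantees a cycle-free tree.
import numpy as np

def greedy_heads(attn, root):
    """attn[i, j]: attention from word i to word j; root: gold root index."""
    n = attn.shape[0]
    heads = {}
    for i in range(n):
        if i == root:
            continue  # the root takes no head
        scores = attn[i].copy()
        scores[i] = -np.inf  # a word cannot head itself
        heads[i] = int(np.argmax(scores))
    return heads

# Toy 4-word sentence with word 1 as the gold root.
attn = np.array([
    [0.1, 0.7, 0.1, 0.1],  # word 0 attends most to word 1
    [0.3, 0.1, 0.3, 0.3],  # root row is ignored
    [0.1, 0.6, 0.1, 0.2],  # word 2 attends most to word 1
    [0.1, 0.2, 0.6, 0.1],  # word 3 attends most to word 2
])
print(greedy_heads(attn, root=1))  # {0: 1, 2: 1, 3: 2}
```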
## Acknowledgements

This repository would not be possible without the efforts of the creators/maintainers of the following libraries:

- pytorch-pretrained-BERT from huggingface
- SentEval from facebookresearch
- bert-syntax from yoavg
- tpdn from tommccoy1
- rnn_agreement from TalLinzen
- Chu-Liu-Edmonds from bastings
## License

This repository is GPL-licensed.