Interpreting Bidirectional Encoder Representations from Transformers (BERT)

What does BERT learn about the structure of language?

Code used in our ACL'19 paper for interpreting the BERT model.

Dependencies

Quick Start

Phrasal Syntax (Section 3 in paper)

  • Navigate:
cd chunking/
wget https://www.clips.uantwerpen.be/conll2000/chunking/train.txt.gz
gunzip train.txt.gz

The last command replaces train.txt.gz with the uncompressed train.txt.
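
For reference, CoNLL-2000 data lists one token per line as word, POS tag, chunk tag, with blank lines separating sentences. A minimal reading sketch (the helper read_conll2000 is illustrative, not part of this repository):

# Minimal sketch: parse CoNLL-2000 chunking data (word POS chunk per line,
# blank lines separate sentences). read_conll2000 is an illustrative helper.
def read_conll2000(path):
    sentences, current = [], []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:  # blank line ends a sentence
                if current:
                    sentences.append(current)
                    current = []
            else:
                word, pos, chunk = line.split()
                current.append((word, pos, chunk))
    if current:
        sentences.append(current)
    return sentences

sentences = read_conll2000('train.txt')
print(len(sentences), sentences[0][:3])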

  • Extract BERT features for the chunking-related tasks (clustering and visualization):
python extract_features.py --train_file train.txt --output_file chunking_rep.json
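
The provided script does the heavy lifting, but the underlying idea is simply to run BERT and keep the hidden states of every layer. A sketch of that idea using the transformers library (shown for illustration only; the repository's extract_features.py defines its own output format):

# Illustration only: per-layer hidden states with the transformers library.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)
model.eval()

inputs = tokenizer('He is reading a book .', return_tensors='pt')
with torch.no_grad():
    out = model(**inputs)
# out.hidden_states is a tuple of 13 tensors of shape (1, seq_len, 768):
# the embedding layer plus the 12 transformer layers.
print(len(out.hidden_states), out.hidden_states[1].shape)
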
  • Run t-SNE of span embeddings for each BERT layer (Figure 1):
python visualize.py --feat_file chunking_rep.json --output_file_prefix tsne_layer_

This creates one t-SNE plot per BERT layer and saves each as a PDF (e.g., tsne_layer_0.pdf).
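
If you want to reproduce such a plot outside visualize.py, the sketch below shows the idea. The JSON schema assumed here (one record per span, with a per-layer vector under 'layers' and a chunk label under 'label') is purely hypothetical; the real schema is whatever extract_features.py writes:

# Sketch of a per-layer t-SNE plot over span embeddings.
# The JSON-lines schema below is an assumption for illustration.
import json
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

records = [json.loads(line) for line in open('chunking_rep.json')]
layer = 0
X = np.array([r['layers'][layer] for r in records])
labels = [r['label'] for r in records]

coords = TSNE(n_components=2, random_state=0).fit_transform(X)
for lab in sorted(set(labels)):
    idx = [i for i, l in enumerate(labels) if l == lab]
    plt.scatter(coords[idx, 0], coords[idx, 1], s=4, label=lab)
plt.legend(markerscale=3)
plt.savefig('tsne_layer_%d.pdf' % layer)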

  • Run KMeans to evaluate the clustering performance of span embeddings for each BERT layer (Table 1):
python cluster.py --feat_file chunking_rep.json
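
The evaluation step amounts to clustering the span vectors and scoring the cluster assignments against the gold chunk labels, e.g. with normalized mutual information. A sketch under the same assumed JSON schema as above:

# Sketch: KMeans over span embeddings, scored against gold chunk labels.
# Assumes the same hypothetical JSON-lines schema as the t-SNE sketch.
import json
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

records = [json.loads(line) for line in open('chunking_rep.json')]
labels = [r['label'] for r in records]
n_clusters = len(set(labels))

for layer in range(13):  # embedding layer + 12 transformer layers
    X = np.array([r['layers'][layer] for r in records])
    pred = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    print(layer, normalized_mutual_info_score(labels, pred))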

Probing Tasks (Section 4)

  • Navigate:
cd probing/
python extract_features.py --data_file tree_depth.txt --output_file tree_depth_rep.json

Append the --untrained_bert flag to the above command to extract features from an untrained (randomly initialized) BERT.
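
Here "untrained" means a model with BERT's architecture but randomly initialized weights. With the transformers library (again purely illustrative), the distinction is:

# Illustration only: "untrained" BERT is a config-only construction with
# random weights, vs. loading the trained checkpoint.
from transformers import BertConfig, BertModel

untrained = BertModel(BertConfig())                           # random weights
pretrained = BertModel.from_pretrained('bert-base-uncased')   # trained weights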

  • Train the probing classifier for a given BERT layer (indexed from 0) and evaluate the performance (Table 2):
python classifier.py --labels_file tree_depth.txt --feats_file tree_depth_rep.json --layer 0

We use the hyperparameter search space recommended by SentEval.
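
For orientation, SentEval-style probing uses a simple classifier (logistic regression or a one-hidden-layer MLP) with a small grid over hidden size and regularization strength; the exact space is defined in classifier.py. An illustrative grid with scikit-learn, again under an assumed feature schema:

# Illustrative grid in the spirit of SentEval's recommended search space.
# The real space and data handling live in classifier.py; the JSON-lines
# layout and the grid values below are examples, not the repo's settings.
import json
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

records = [json.loads(line) for line in open('tree_depth_rep.json')]
X = np.array([r['layers'][0] for r in records])  # layer 0 features (assumed)
y = [r['label'] for r in records]                # gold labels (assumed)

grid = {'hidden_layer_sizes': [(50,), (100,), (200,)],  # example values
        'alpha': [1e-4, 1e-3, 1e-2]}                    # L2 strength
search = GridSearchCV(MLPClassifier(max_iter=300), grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)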

Subject-Verb Agreement (SVA) (Section 5)

  • Navigate:
cd sva/
python extract_features.py --data_file agr_50_mostcommon_10K.tsv --output_folder ./
  • Train the binary classifier for a given BERT layer (indexed from 0) and evaluate the performance (Table 3):
python classifier.py --input_folder ./ --layer 0

We use the hyperparameter search space recommended by SentEval.
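
The SVA probe itself is a binary decision (singular vs. plural verb) from a layer's representation. A minimal linear-probe sketch, where the .npy file names are hypothetical placeholders for whatever extract_features.py writes into the folder:

# Minimal linear probe for SVA as binary classification.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = np.load('layer_0_features.npy')   # hypothetical file name
y = np.load('labels.npy')             # hypothetical file name

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print('layer 0 accuracy:', probe.score(X_te, y_te))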

Compositional Structure (Section 6)

  • Navigate:
cd tpdn/
  • Download the SNLI 1.0 corpus and extract it.
  • Extract BERT features for the premise sentences in SNLI:
python extract_features.py --input_folder . --output_folder .
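
SNLI 1.0 is distributed as JSONL with the premise stored in the sentence1 field, so collecting the premises looks like:

# Read premise sentences from SNLI 1.0 (JSONL; the premise is 'sentence1').
import json

premises = []
with open('snli_1.0/snli_1.0_train.jsonl') as f:
    for line in f:
        premises.append(json.loads(line)['sentence1'])
print(len(premises), premises[0])
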
  • Train the Tensor Product Decomposition Network (TPDN) to approximate a given BERT layer (indexed from 0) and evaluate the performance (Table 4):
python approx.py --input_folder . --output_folder . --layer 0

See the --role_scheme and --rand_tree flags for choosing the role scheme.
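
A Tensor Product Decomposition Network approximates a representation as a sum of filler-role bindings: each token embedding (filler) is bound to a structural role vector by an outer product, and the bindings are summed. A toy sketch of the binding operation, with all dimensions illustrative:

# Toy tensor-product binding: each token's filler vector is bound to a role
# vector via an outer product; the sentence representation is the sum.
import torch

seq_len, d_filler, d_role = 8, 32, 16
fillers = torch.randn(seq_len, d_filler)   # one filler vector per token
roles = torch.randn(seq_len, d_role)       # e.g. left-to-right index roles

# sum_i filler_i (outer) role_i  ->  (d_filler, d_role)
tpr = torch.einsum('if,ir->fr', fillers, roles)
print(tpr.shape)  # torch.Size([32, 16])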

  • Induce a dependency parse tree from the attention weights of a given attention head and BERT layer (both indexed from 1) (Figure 2):
python induce_dep_trees.py --sentence_text "The keys to the cabinet are on the table" --head_id 11 --layer_id 2 --sentence_root 6
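
Conceptually, the induction treats the chosen head's attention matrix as a weighted graph over the words and extracts a maximum spanning tree rooted at --sentence_root. A sketch of that step with networkx, using random stand-in data for the attention matrix:

# Sketch of tree induction from attention: treat attn[dep, head] as the weight
# of a directed edge head -> dep and take a maximum spanning arborescence
# rooted at the chosen word. attn below is random stand-in data.
import numpy as np
import networkx as nx

words = "The keys to the cabinet are on the table".split()
n = len(words)
attn = np.random.rand(n, n)   # stand-in for one head's attention matrix
root = 5                      # 0-indexed position of "are"

G = nx.DiGraph()
for head in range(n):
    for dep in range(n):
        if head != dep and dep != root:  # the root receives no incoming edge
            G.add_edge(head, dep, weight=attn[dep, head])

tree = nx.maximum_spanning_arborescence(G)
for head, dep in sorted(tree.edges()):
    print(words[head], '->', words[dep])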

Acknowledgements

This repository would not be possible without the efforts of the creators and maintainers of the libraries it builds on.

License

This repository is GPL-licensed.