Investigating Pretrained Language Models for Graph-to-Text Generation
This repository contains the code for the paper "Investigating Pretrained Language Models for Graph-to-Text Generation", published at the 3rd Workshop on NLP for Conversational AI (NLP4ConvAI) at EMNLP 2021.
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
This project is built on the HuggingFace Transformers framework. Please refer to its documentation for further details on installation and dependencies.
Environments and Dependencies
- python 3.6
- transformers 3.3.1
- pytorch-lightning 0.9.0
- torch 1.4.0
- parsimonious 0.8.1
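As a quick sanity check (a minimal sketch, assuming the packages above are already installed, e.g. via pip, in a Python 3.6 environment), the pinned versions can be verified from Python:

```python
# Sanity-check sketch: verify the pinned dependency versions listed above.
import sys
import torch
import transformers
import pytorch_lightning

assert sys.version_info[:2] == (3, 6), sys.version
assert transformers.__version__ == "3.3.1", transformers.__version__
assert pytorch_lightning.__version__ == "0.9.0", pytorch_lightning.__version__
assert torch.__version__.startswith("1.4.0"), torch.__version__
print("Environment looks good.")
```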
Datasets
In our experiments, we use the following datasets: AMR17 (LDC2017T10), WebNLG, and AGENDA.
Preprocess
First, convert each dataset into the format required by the model.
For the AMR17, run:
./preprocess_AMR.sh <dataset_folder>
For the WebNLG, run:
./preprocess_WEBNLG.sh <dataset_folder>
For the AGENDA, run:
./preprocess_AGENDA.sh <dataset_folder>
Finetuning
For finetuning the models using the AMR dataset, execute:
./finetune_AMR.sh <model> <gpu_id>
For the WebNLG dataset, execute:
./finetune_WEBNLG.sh <model> <gpu_id>
For the AGENDA dataset, execute:
./finetune_AGENDA.sh <model> <gpu_id>
Options for <model> are t5-small, t5-base, t5-large, facebook/bart-base, or facebook/bart-large.
Example:
./finetune_AGENDA.sh t5-small 0
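Under the hood, finetuning uses HuggingFace Transformers (with pytorch-lightning for training). The sketch below is an illustration only, not the repository's finetune_*.sh pipeline: the linearized-graph input, target text, and hyperparameters are invented placeholders.

```python
# Illustrative sketch of one seq2seq finetuning step with t5-small.
# Not the repository's training code; the input/target strings and the
# learning rate are invented placeholders.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()

# A graph linearized into a sequence of node and edge labels
# (the exact linearization format is an assumption here).
source = "<H> Alan Bean <R> occupation <T> astronaut"
target = "Alan Bean worked as an astronaut."

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

outputs = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                labels=labels)
loss = outputs[0]  # cross-entropy loss for this single example
loss.backward()
optimizer.step()
optimizer.zero_grad()
```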
Decoding
For decoding, run:
./decode_AMR.sh <model> <checkpoint> <gpu_id>
./decode_WEBNLG.sh <model> <checkpoint> <gpu_id>
./decode_AGENDA.sh <model> <checkpoint> <gpu_id>
Example:
./decode_WEBNLG.sh t5-base webnlg-t5-base.ckpt 0
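The decode scripts generate text from a finetuned checkpoint. As a hedged sketch of what generation looks like with Transformers (plain t5-base is loaded here only to keep the snippet self-contained; the actual finetuned weights live in the pytorch-lightning checkpoint passed to decode_*.sh, and the beam size and input are placeholders):

```python
# Generation sketch -- not the repository's decode_*.sh script.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.eval()

# Invented linearized-graph input (same format assumption as in the finetuning sketch).
source = "<H> Alan Bean <R> occupation <T> astronaut"
input_ids = tokenizer(source, return_tensors="pt").input_ids

generated = model.generate(input_ids, num_beams=5, max_length=128, early_stopping=True)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```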
Trained models
AMR17

| Model | BLEU |
| --- | --- |
| bart-base | 36.71 (output) |
| bart-large | 43.47 (output) |
| t5-small | 38.45 (output) |
| t5-base | 42.54 (output) |
| t5-large | 45.80 (output) |

WebNLG

| Model | BLEU (All) | BLEU (Seen) | BLEU (Unseen) |
| --- | --- | --- | --- |
| bart-base | 53.11 (output) | 62.74 (output) | 41.53 (output) |
| bart-large | 54.72 (output) | 63.45 (output) | 43.97 (output) |
| t5-small | 56.34 (output) | 65.05 (output) | 45.37 (output) |
| t5-base | 59.17 (output) | 64.64 (output) | 52.55 (output) |
| t5-large | 59.70 (output) | 64.71 (output) | 53.67 (output) |
* BLEU scores for AMR17 are computed with sacreBLEU on detokenized outputs. BLEU scores for WebNLG are computed on tokenized outputs with the challenge's evaluation script, which uses multi-bleu.perl.
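For reference, a minimal sketch of scoring detokenized AMR17 outputs with sacreBLEU in Python (both file names are placeholders for the model output and the references):

```python
# Sketch: corpus BLEU with sacreBLEU on detokenized hypotheses and references.
# File names are placeholders.
import sacrebleu

with open("amr17-test.hyp.detok.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("amr17-test.ref.detok.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```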
More
For more details regarding hyperparameters, please refer to the HuggingFace documentation.
Contact person: Leonardo Ribeiro, [email protected]
Citation
@inproceedings{ribeiro-etal-2021-investigating,
title = "Investigating Pretrained Language Models for Graph-to-Text Generation",
author = {Ribeiro, Leonardo F. R. and
Schmitt, Martin and
Sch{\"u}tze, Hinrich and
Gurevych, Iryna},
booktitle = "Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI",
month = nov,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.nlp4convai-1.20",
pages = "211--227",
abstract = "Graph-to-text generation aims to generate fluent texts from graph-based data. In this paper, we investigate two recent pretrained language models (PLMs) and analyze the impact of different task-adaptive pretraining strategies for PLMs in graph-to-text generation. We present a study across three graph domains: meaning representations, Wikipedia knowledge graphs (KGs) and scientific KGs. We show that approaches based on PLMs BART and T5 achieve new state-of-the-art results and that task-adaptive pretraining strategies improve their performance even further. We report new state-of-the-art BLEU scores of 49.72 on AMR-LDC2017T10, 59.70 on WebNLG, and 25.66 on AGENDA datasets - a relative improvement of 31.8{\%}, 4.5{\%}, and 42.4{\%}, respectively, with our models generating significantly more fluent texts than human references. In an extensive analysis, we identify possible reasons for the PLMs{'} success on graph-to-text tasks. Our findings suggest that the PLMs benefit from similar facts seen during pretraining or fine-tuning, such that they perform well even when the input graph is reduced to a simple bag of node and edge labels.",
}