ToTTo Dataset

ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description.

During the dataset creation process, tables from English Wikipedia are matched with (noisy) descriptions. Each table cell mentioned in the description is highlighted and the descriptions are iteratively cleaned and corrected to faithfully reflect the content of the highlighted cells.

We hope this dataset can serve as a useful research benchmark for high-precision conditional text generation.

You can find more details, analyses, and baseline results in our paper. You can cite it as follows:

@inproceedings{parikh2020totto,
  title={{ToTTo}: A Controlled Table-To-Text Generation Dataset},
  author={Parikh, Ankur P and Wang, Xuezhi and Gehrmann, Sebastian and Faruqui, Manaal and Dhingra, Bhuwan and Yang, Diyi and Das, Dipanjan},
  booktitle={Proceedings of EMNLP},
  year={2020}
}

Getting Started

Download the ToTTo data

The ToTTo dataset is released under the Creative Commons Share-Alike 3.0 license.

To download the data from the command line:

 wget https://storage.googleapis.com/totto-public/totto_data.zip
 unzip totto_data.zip

(Alternatively, copy the above URL into your browser's address bar.)

Inside the totto_data directory you should see three files: totto_train_data.jsonl, totto_dev_data.jsonl, and unlabeled_totto_test_data.jsonl for the training, development, and unlabeled test sets respectively.
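Because the files use the .jsonl (JSON Lines) format, they can be read one example per line. The following is a minimal sketch using only the Python standard library; the helper name read_totto_examples is an illustrative assumption and is not part of the released scripts.

```python
import json
from typing import Dict, Iterator


def read_totto_examples(path: str) -> Iterator[Dict]:
    """Yield one example dict per non-empty line of a .jsonl file.

    Hypothetical helper for illustration; not part of the ToTTo release.
    """
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, if any
                yield json.loads(line)
```

For instance, list(read_totto_examples("totto_data/totto_dev_data.jsonl")) would load the development set into memory, assuming the archive was unzipped in the current directory.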

Download the evaluation scripts

You can find evaluation scripts and some exploratory processing scripts at this repository. It also includes a separate README file with instructions on how to run the evaluation.

Dataset Description

The ToTTo dataset consists of three .jsonl files, where each line is a JSON dictionary with the following format:

{
  "table_page_title": "'Weird Al' Yankovic",
  "table_webpage_url": "https://en.wikipedia.org/wiki/%22Weird_Al%22_Yankovic",
  "table_section_title": "Television",
  "table_section_text": "",
  "table": "[Described below]",
  "highlighted_cells": [[22, 2], [22, 3], [22, 0], [22, 1], [23, 3], [23, 1], [23, 0]],
  "example_id": 12345678912345678912,
  "sentence_annotations": [{"original_sentence": "In 2016, Al appeared in 2 episodes of BoJack Horseman as Mr. Peanutbutter's brother, Captain Peanutbutter, and was hired to voice the lead role in the 2016 Disney XD series Milo Murphy's Law.",
                  "sentence_after_deletion": "In 2016, Al appeared in 2 episodes of BoJack Horseman as Captain Peanutbutter, and was hired to the lead role in the 2016 series Milo Murphy's Law.",
                  "sentence_after_ambiguity": "In 2016, Al appeared in 2 episodes of BoJack Horseman as Captain Peanutbutter, and was hired for the lead role in the 2016 series Milo Murphy's 'Law.",
                  "final_sentence": "In 2016, Al appeared in 2 episodes of BoJack Horseman as Captain Peanutbutter and was hired for the lead role in the 2016 series Milo Murphy's Law."}]
}

The table field is a List[List[Dict]]. The outer list represents the rows, and each inner list holds the cells (columns) of one row. Each Dict has the fields column_span: int, is_header: bool, row_span: int, and value: str. The first two rows for the example above look as follows:

[
  [
    {    "column_span": 1,
         "is_header": true,
         "row_span": 1,
         "value": "Year"},
    {    "column_span": 1,
         "is_header": true,
         "row_span": 1,
         "value": "Title"},
    {    "column_span": 1,
         "is_header": true,
         "row_span": 1,
         "value": "Role"},
    {    "column_span": 1,
         "is_header": true,
         "row_span": 1,
         "value": "Notes"}
  ],
  [
    {    "column_span": 1,
         "is_header": false,
         "row_span": 1,
         "value": "1997"},
    {    "column_span": 1,
         "is_header": false,
         "row_span": 1,
         "value": "Eek! The Cat"},
    {    "column_span": 1,
         "is_header": false,
         "row_span": 1,
         "value": "Himself"},
    {    "column_span": 1,
         "is_header": false,
         "row_span": 1,
         "value": "Episode: 'The FugEektive'"}
  ], ...
]

- The table metadata consists of the table_page_title, table_section_title, and table_section_text strings, which give the model more context about the table.

- The highlighted_cells field is a List[[row_index, column_index]] where each [row_index, column_index] pair indicates that table[row_index][column_index] is highlighted.

- The example_id is simply a unique id for this example.

- The sentence_annotations field consists of the original sentence and the sequence of revisions made to produce the final_sentence. See our paper for more details.
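The fields described above can be combined to look up the text of the highlighted cells and the target sentences. The sketch below assumes an example has already been parsed into a Python dict; the function names get_highlighted_values and get_final_sentences are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def get_highlighted_values(example: Dict) -> List[str]:
    """Return the `value` of each highlighted cell, in the order listed."""
    table = example["table"]
    return [table[row][col]["value"] for row, col in example["highlighted_cells"]]


def get_final_sentences(example: Dict) -> List[str]:
    """Return the final_sentence of every annotation attached to the example."""
    return [ann["final_sentence"] for ann in example["sentence_annotations"]]


# Toy example built from the first two rows shown above (other fields omitted).
toy_example = {
    "table": [
        [{"column_span": 1, "is_header": True, "row_span": 1, "value": "Year"},
         {"column_span": 1, "is_header": True, "row_span": 1, "value": "Title"}],
        [{"column_span": 1, "is_header": False, "row_span": 1, "value": "1997"},
         {"column_span": 1, "is_header": False, "row_span": 1, "value": "Eek! The Cat"}],
    ],
    "highlighted_cells": [[1, 1], [1, 0]],
    "sentence_annotations": [{"final_sentence": "A toy sentence."}],
}

print(get_highlighted_values(toy_example))  # ['Eek! The Cat', '1997']
```

Note that the indices address positions in the nested lists directly, so no span arithmetic is applied here.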

To help understand the dataset, you can find a sample of the train and dev sets in the sample/ folder of our supplementary repository. It additionally provides the create_table_to_text_html.py script that visualizes examples, the output of which you can also find in the sample/ folder.

Official Task

The official task described in our paper is to generate the final_sentence, given the table, the highlighted cells, and the table metadata (table_page_title, table_section_title, and table_section_text) as input.
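One simple way to turn these inputs into a single source string for a sequence-to-sequence model is sketched below. The function name build_source_text and the "field: value | ..." layout are illustrative assumptions; they are not the linearization used in the paper or in the supplementary repository, and this sketch uses only the highlighted cells rather than the full table.

```python
from typing import Dict


def build_source_text(example: Dict) -> str:
    """Flatten metadata and highlighted cells into one string.

    The output format is an assumption made for illustration only.
    """
    parts = [
        "page_title: " + example["table_page_title"],
        "section_title: " + example["table_section_title"],
    ]
    # The section text is often empty, so include it only when present.
    if example["table_section_text"]:
        parts.append("section_text: " + example["table_section_text"])
    for row, col in example["highlighted_cells"]:
        parts.append("cell: " + example["table"][row][col]["value"])
    return " | ".join(parts)


demo = {
    "table_page_title": "'Weird Al' Yankovic",
    "table_section_title": "Television",
    "table_section_text": "",
    "table": [
        [{"value": "Year"}, {"value": "Title"}],
        [{"value": "1997"}, {"value": "Eek! The Cat"}],
    ],
    "highlighted_cells": [[1, 0], [1, 1]],
}

print(build_source_text(demo))
# page_title: 'Weird Al' Yankovic | section_title: Television | cell: 1997 | cell: Eek! The Cat
```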

Dev and Test Set

The dev and test sets have two or three references for each example, which are added to the list at the sentence_annotations key. The test set annotations are private and thus not included in the data.

If you want us to evaluate your model on the development or the private test set, please submit your files here. You can find more submission information below. By emailing us or by submitting prediction files, you consent to being contacted by Google about your submission, this dataset or any related competitions.

We provide two splits within the dev and test sets: one uses previously seen combinations of table headers and one uses unseen combinations. The sets are marked using the overlap_subset: bool flag that is added to the JSON representation. By evaluating separately on the examples with the flag set to true (the overlap subset) and those with it set to false (the non-overlap subset), you will be able to test the generalization ability of your model.
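Splitting a list of parsed examples by this flag can be sketched as follows. The helper name split_by_overlap is hypothetical, and the sketch assumes that every example carries the overlap_subset key and that true marks header combinations already seen in training, as the flag's name suggests.

```python
from typing import Dict, List, Tuple


def split_by_overlap(examples: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Return (overlap, non_overlap) lists based on the overlap_subset flag."""
    overlap: List[Dict] = []
    non_overlap: List[Dict] = []
    for example in examples:
        # Assumes the flag is present on every dev/test example.
        if example["overlap_subset"]:
            overlap.append(example)
        else:
            non_overlap.append(example)
    return overlap, non_overlap
```

Scoring the two returned lists separately gives per-subset numbers comparable in spirit to a per-subset breakdown.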

Leaderboard

We are maintaining a leaderboard with official results on our test set.

The leaderboard indicates whether or not a model was trained on any auxiliary Wikipedia data. This is because our tables and (unrevised) test targets are from Wikipedia and thus we would like to study the effect of using additional Wikipedia data to train models.

We ask you to not incorporate any part of the ToTTo development set into the training data, and only use it for validation/hyperparameter tuning as development sets are typically used.

In addition to BLEU and PARENT, we also report a learned metric, BLEURT. The checkpoint used was BLEURT-base-128, which can be found here. To handle multiple references, we take the average of the scores over the references, as suggested by Sellam et al. 2020.

			Overall			Overlap Subset			Non-Overlap Subset
Model	Link	Uses Wiki	BLEU	PARENT	BLEURT	BLEU	PARENT	BLEURT	BLEU	PARENT	BLEURT
LATTICE	[Wang et al. 2022]	yes	48.4	58.1	0.222	56.1	62.4	0.345	40.4	53.9	0.099
SKY	in preparation	yes	49.9	59.8	0.212	57.8	64.0	0.334	42.0	55.7	0.091
CoNT	[An et al., 2022]	yes	49.1	58.9	0.238	56.7	63.2	0.355	41.3	54.6	0.121
Supervised+NLPO	[Ramamurthy et al. 2022]	yes	47.4	59.6	0.192	55.0	64.3	0.315	39.2	55.0	0.068
Anonymous 3	in preparation	yes	49.3	58.8	0.235	57.1	63.4	0.358	41.5	54.1	0.112
ProEdit	Paper in preparation	yes	48.6	59.18	0.202	55.9	63.3	0.325	41.3	55.1	0.078
Anonymous 2	Paper in preparation	yes	49.4	59.0	0.253	57.0	62.9	0.370	41.7	55.1	0.136
PlanGen (University of Cambridge, Apple)	[Su et al. 2021]	yes	49.2	58.7	0.249	56.9	62.8	0.371	41.5	54.6	0.126
T5-based (Google)	[Kale, 2020]	yes	49.5	58.4	0.230	57.5	62.6	0.351	41.4	54.2	0.1079
BERT-to-BERT (Wiki+Books)	[Rothe et al., 2019]	yes	44.0	52.6	0.121	52.7	58.4	0.259	35.1	46.8	-0.017
BERT-to-BERT (Books)	[Rothe et al., 2019]	no	43.9	52.6	0.104	52.7	58.4	0.255	34.8	46.7	-0.046
Pointer Generator	[See et al., 2017]	no	41.6	51.6	0.076	50.6	58.0	0.244	32.2	45.2	-0.0922
Content Planner	[Puduppully et al., 2019]	no	19.2	29.2	-0.576	24.5	32.5	-0.491	13.9	25.8	-0.662
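The multi-reference averaging mentioned above for BLEURT can be sketched as follows. This is only an illustration of the averaging step: score_fn stands in for any sentence-level scorer taking a reference and a candidate, and the exact-match scorer used in the demonstration is a toy placeholder, not BLEURT.

```python
from statistics import mean
from typing import Callable, List


def average_over_references(
    prediction: str,
    references: List[str],
    score_fn: Callable[[str, str], float],
) -> float:
    """Average score_fn(reference, prediction) over all references."""
    if not references:
        raise ValueError("At least one reference is required.")
    return mean(score_fn(reference, prediction) for reference in references)


def exact_match(reference: str, candidate: str) -> float:
    """Toy stand-in scorer (NOT BLEURT): 1.0 on exact match, else 0.0."""
    return 1.0 if reference == candidate else 0.0


print(average_over_references("a b c", ["a b c", "a b d"], exact_match))  # 0.5
```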

Leaderboard Submission

If you want to submit dev and test outputs, please format your predictions as a single .txt file with one prediction per line. The predictions should be in the same order as the examples in the corresponding .jsonl file (totto_dev_data.jsonl or unlabeled_totto_test_data.jsonl). You can upload your prediction files here and email us at [email protected] to tell us you have submitted. By emailing us or by submitting prediction files, you consent to being contacted by Google about your submission, this dataset or any related competitions.
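Writing predictions in this line-separated form can be sketched as below. The helper name write_predictions is hypothetical; collapsing internal whitespace is an added precaution, assumed here so that a stray newline inside a prediction cannot shift the alignment with the input examples.

```python
from typing import Iterable


def write_predictions(predictions: Iterable[str], path: str) -> None:
    """Write one prediction per line, in the order given."""
    with open(path, "w", encoding="utf-8") as f:
        for prediction in predictions:
            # Collapse newlines and repeated spaces so each prediction
            # occupies exactly one line of the output file.
            f.write(" ".join(prediction.split()) + "\n")
```

The predictions passed in should be generated by iterating over the input file in its original order.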
