HybridQA

This repository contains the dataset and code for the EMNLP 2020 paper HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data, the first large-scale multi-hop question answering dataset over heterogeneous data, i.e., tabular and textual data. The full dataset contains over 70K question-answer pairs based on 13,000 tables, and each table is linked to an average of 44 passages; more details at https://hybridqa.github.io/.

The questions are annotated to require aggregating information from both the table and its hyperlinked text passages, which poses a challenge to existing models designed for homogeneous text-based or KB-based data.

Requirements:

Dataset Visualization

Have fun interacting with the dataset: https://hybridqa.github.io/explore.html

Released data:

The released data contains the following files:

train/dev/test.json: the original files, all annotated by humans.
train/dev.traced.json: generated by trace_answer.py to locate the answer span in the given evidence.
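
A minimal sketch for loading and inspecting one annotated example (the released_data/ path and field names such as question and answer-text are assumptions based on the released files; check ex.keys() if your copy differs):

import json

# Load the human-annotated training split.
with open('released_data/train.json') as f:
    examples = json.load(f)

ex = examples[0]
# Field names are assumptions; inspect ex.keys() to confirm.
print(ex.get('question_id'), ex.get('table_id'))
print(ex.get('question'))
print(ex.get('answer-text'))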

Preprocess data:

First of all, download all the tables and passages into your current folder:

git clone https://github.com/wenhuchen/WikiTables-WithLinks

Then, you can either preprocess the data on your own,

python preprocessing.py

or use our preprocessed version from Amazon S3

wget https://hybridqa.s3-us-west-2.amazonaws.com/preprocessed_data.zip
unzip preprocessed_data.zip
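
Either way, preprocessed_data/ should now contain the stage-wise files used by the commands below. A quick sanity check (file names taken from the training and evaluation commands later in this README):

import os

expected = [
    'dev_inputs.json',
    'stage1_training_data.json',
    'stage2_training_data.json',
    'stage3_training_data.json',
]
for name in expected:
    path = os.path.join('preprocessed_data', name)
    # Print which of the expected preprocessed files are present.
    print(('OK      ' if os.path.exists(path) else 'MISSING ') + path)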

Reproduce the reported results

Download the trained bert-base model from Amazon S3:

wget https://hybridqa.s3-us-west-2.amazonaws.com/models.zip
unzip models.zip

This downloads and unpacks the folders stage1/, stage2/, and stage3/.

Using the pretrained models to run stage1/stage2:

CUDA_VISIBLE_DEVICES=0 python train_stage12.py --stage1_model stage1/2020_10_03_22_47_34/checkpoint-epoch2 --stage2_model stage2/2020_10_03_22_50_31/checkpoint-epoch2/ --do_lower_case --predict_file preprocessed_data/dev_inputs.json --do_eval --option stage12 --model_name_or_path  bert-large-uncased

This command generates an intermediate result file, predictions.intermediate.json.

Using the pretrained model to run stage3:

CUDA_VISIBLE_DEVICES=0 python train_stage3.py --model_name_or_path stage3/2020_10_03_22_51_12/checkpoint-epoch3/ --do_stage3   --do_lower_case  --predict_file predictions.intermediate.json --per_gpu_train_batch_size 12  --max_seq_length 384   --doc_stride 128 --threads 8

This command generates the prediction file predictions.json.

Compute the score

python evaluate_script.py predictions.json released_data/dev_reference.json

Training [default: BERT-base-uncased model]

Train Stage1:

Run the training command for stage1 using BERT-base-uncased as follows:

CUDA_VISIBLE_DEVICES=0 python train_stage12.py --do_lower_case --do_train --train_file preprocessed_data/stage1_training_data.json --learning_rate 2e-6 --option stage1 --num_train_epochs 3.0

Or run the training command for stage1 using BERT-large-uncased as follows:

CUDA_VISIBLE_DEVICES=0 python train_stage12.py --model_name_or_path bert-large-uncased --do_train --train_file preprocessed_data/stage1_training_data.json --learning_rate 2e-6 --option stage1 --num_train_epochs 3.0

Train Stage2:

Run the training command for stage2 as follows:

CUDA_VISIBLE_DEVICES=0 python train_stage12.py --do_lower_case --do_train --train_file preprocessed_data/stage2_training_data.json --learning_rate 5e-6 --option stage2 --num_train_epochs 3.0

Or use BERT-base-cased/BERT-large-uncased as above.

Train Stage3:

Run the training command for stage3 as follows:

CUDA_VISIBLE_DEVICES=0 python train_stage3.py --do_train  --do_lower_case   --train_file preprocessed_data/stage3_training_data.json  --per_gpu_train_batch_size 12   --learning_rate 3e-5   --num_train_epochs 4.0   --max_seq_length 384   --doc_stride 128  --threads 8

Or use BERT-base-cased/BERT-large-uncased as above.

Model Selection for Stage1/2:

Run the model selection command for stage1 and stage2 as follows:

CUDA_VISIBLE_DEVICES=0 python train_stage12.py --do_lower_case --do_eval --option stage1 --output_dir stage1/[OWN_PATH]/ --predict_file preprocessed_data/stage1_dev_data.json

Evaluation

Model Evaluation Step1 -> Stage1/2:

Run the evaluation command for stage1 and stage2 as follows (replace the stage1_model and stage2_model paths with your own):

CUDA_VISIBLE_DEVICES=0 python train_stage12.py --stage1_model stage1/[OWN_PATH] --stage2_model stage2/[OWN_PATH] --do_lower_case --predict_file preprocessed_data/dev_inputs.json --do_eval --option stage12

The output will be saved to predictions.intermediate.json, which contains all the answers for non-hyperlinked cells; for the hyperlinked cells, we need the MRC model in stage3 to extract the span.

Model Evaluation Step2 -> Stage3:

Run the evaluation command for stage3 as follows (replace model_name_or_path with your own):

CUDA_VISIBLE_DEVICES=0 python train_stage3.py --model_name_or_path stage3/[OWN_PATH] --do_stage3   --do_lower_case  --predict_file predictions.intermediate.json --per_gpu_train_batch_size 12  --max_seq_length 384   --doc_stride 128 --threads 8

The output is finally saved to predictions.json, which can be used to compute F1/EM against the reference file.

Computing the score

python evaluate_script.py predictions.json released_data/dev_reference.json
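
evaluate_script.py implements the official scoring. As a rough illustration only (not the official script), EM and F1 for span answers are typically computed SQuAD-style, comparing normalized answer strings:

import re
import string
from collections import Counter

def normalize(s):
    # Lowercase, drop punctuation and articles, collapse whitespace.
    s = ''.join(ch for ch in s.lower() if ch not in string.punctuation)
    s = re.sub(r'\b(a|an|the)\b', ' ', s)
    return ' '.join(s.split())

def exact_match(pred, gold):
    return float(normalize(pred) == normalize(gold))

def f1(pred, gold):
    pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred_toks), overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)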

CodaLab Evaluation

We host a CodaLab challenge, the HybridQA Competition; submit your results to the competition to obtain your test score. The submission file should first be named "test_answers.json" and then zipped. The required format of the submission file is as follows:

[
  {
    "question_id": xxxxx,
    "pred": XXX
  },
  {
    "question_id": xxxxx,
    "pred": XXX
  }
]

The reported scores are EM and F1.
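
A minimal sketch for producing a valid submission (the entries are placeholders and the archive name submission.zip is an assumption; only the inner file name test_answers.json is specified):

import json
import zipfile

# Replace with your model's actual predictions.
predictions = [
    {'question_id': 'xxxxx', 'pred': 'XXX'},
]

with open('test_answers.json', 'w') as f:
    json.dump(predictions, f)

# Zip the file for upload; the archive name is an assumption.
with zipfile.ZipFile('submission.zip', 'w') as z:
    z.write('test_answers.json')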

Recent Papers

Model | Organization | Reference | Dev-EM | Dev-F1 | Test-EM | Test-F1
S3HQA | CASIA | Lei et al. (2023) | 68.4 | 75.3 | 67.9 | 75.5
MAFiD | JBNU & NAVER | Lee et al. (2023) | 66.2 | 74.1 | 65.4 | 73.6
TACR | TIT & NTU | Wu et al. (2023) | 64.5 | 71.6 | 66.2 | 70.2
UL-20B | Google | Tay et al. (2022) | - | - | 61.0 | -
MITQA | IBM & IIT | Kumar et al. (2021) | 65.5 | 72.7 | 64.3 | 71.9
RHGN | SEU | Yang et al. (2022) | 62.8 | 70.4 | 60.6 | 68.1
POINTR + MATE | Google | Eisenschlos et al. (2021) | 63.3 | 70.8 | 62.7 | 70.0
POINTR + TAPAS | Google | Eisenschlos et al. (2021) | 63.4 | 71.0 | 62.8 | 70.2
MuGER2 | JD AI | Wang et al. (2022) | 57.1 | 67.3 | 56.3 | 66.2
DocHopper | CMU | Sun et al. (2021) | 47.7 | 55.0 | 46.3 | 53.3
HYBRIDER | UCSB | Chen et al. (2020) | 43.5 | 50.6 | 42.2 | 49.9
HYBRIDER-Large | UCSB | Chen et al. (2020) | 44.0 | 50.7 | 43.8 | 50.6
Unsupervised-QG | NUS & UCSB | Pan et al. (2020) | 25.7 | 30.5 | - | -

Reference

If you find this project useful, please use the following format to cite the paper:

@article{chen2020hybridqa,
  title={HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data},
  author={Chen, Wenhu and Zha, Hanwen and Chen, Zhiyu and Xiong, Wenhan and Wang, Hong and Wang, William},
  journal={Findings of EMNLP 2020},
  year={2020}
}

Miscellaneous

If you have any questions about the dataset or code, feel free to open a GitHub issue or shoot me an email. Thanks!
