Code for my EMNLP 2018 paper "SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach"

Simple Question Answering — EMNLP 2018

This is the code for the EMNLP 2018 paper "SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach".

On the SimpleQuestions dataset task, one of the most commonly used benchmarks for studying single-relation factoid questions, we:

  1. Show that ambiguity in the data bounds performance on this benchmark at 83.4%; there are often multiple answers that cannot be disambiguated from the question alone.
  2. Introduce a baseline that sets a new state-of-the-art performance level at 78.1% accuracy, using only standard methods.
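
To make the first point concrete, the sketch below shows how ambiguity alone can cap accuracy. It is a toy illustration under simplifying assumptions, not the procedure used in the paper: it treats two examples as indistinguishable when their question text is identical, and the function name `ambiguity_upper_bound` and the toy data are invented for this example.

```python
from collections import Counter, defaultdict


def ambiguity_upper_bound(examples):
    """Best accuracy reachable by any predictor that sees only the question.

    `examples` is a list of (question, gold_answer) pairs. Examples sharing
    the same question are indistinguishable, so the best possible strategy is
    to return the most frequent gold answer for that question.
    """
    if not examples:
        raise ValueError("examples must not be empty")
    answers_by_question = defaultdict(Counter)
    for question, gold_answer in examples:
        answers_by_question[question][gold_answer] += 1
    # For each question, at most the majority answer's count can be correct.
    correct = sum(max(counts.values()) for counts in answers_by_question.values())
    return correct / len(examples)


# Toy data invented for illustration only (not taken from SimpleQuestions).
toy_examples = [
    ("who wrote gulliver's travels", "book_entity"),
    ("who wrote gulliver's travels", "book_entity"),
    ("who wrote gulliver's travels", "tv_miniseries_entity"),
    ("where was the author born", "person_entity"),
]
print(ambiguity_upper_bound(toy_examples))  # 0.75
```

In the toy data, one of the three identical questions has a different gold answer, so no predictor can score above 3 out of 4.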

Example

Preview of the software

Structure

.
├── /notebooks/
│   ├── /Simple QA End-To-End/           # Experiments on components of the end-to-end QA pipeline
│   ├── /Simple QA Models/               # Experiments on various neural models
│   ├── /Simple QA KG to PostgreSQL DB/  # Scripts to populate PostgreSQL
│   └── /Simple QA Numbers/              # Scripts for computing and verifying various numbers
├── /pretrained_models/                   
├── /lib/                                # Various utility functionality
├── /tests/                               
├── .flake8                               
└── requirements.txt                     # Required Python packages

Prerequisites

This repository requires Python 3.5 or greater and PostgreSQL.

Installation

  • Clone the repository and cd into it
git clone https://github.com/PetrochukM/Simple-QA-EMNLP-2018.git
cd Simple-QA-EMNLP-2018
  • Install the required packages
python -m pip install -r requirements.txt
  • Create a PostgreSQL table named fb_two_subject_name and populate it from notebooks/Simple QA KG to PostgreSQL DB/fb_two_subject_name.csv.gz

  • Create a .pass file using the template below:

    DB_NAME=
    DB_PORT=
    DB_USER=
    DB_HOST=
    DB_PASS=
    

    Such that:

    • DB_NAME: the database name
    • DB_USER: user name used to authenticate
    • DB_PASS: password used to authenticate
    • DB_HOST: database host address
    • DB_PORT: connection port number (typically 5432)
  • Download the SimpleQuestions v2 dataset from Facebook Research. Use the notebook at Simple-QA-EMNLP-2018/notebooks/Simple QA KG to PostgreSQL DB/FB5M & FB2M KG to DB.ipynb to create and populate a PostgreSQL table.

  • You're done! Feel free to run the notebooks in Simple-QA-EMNLP-2018/notebooks/Simple QA End-To-End.
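
The .pass file described above is a plain KEY=VALUE list, so it can be read with a few lines of standard-library Python. The sketch below is one possible way to parse it, not the repository's own loader; the helper names `parse_pass_file` and `to_dsn` and the sample values are hypothetical. The output uses the libpq keyword/value connection-string format (dbname, user, password, host, port).

```python
REQUIRED_KEYS = ("DB_NAME", "DB_PORT", "DB_USER", "DB_HOST", "DB_PASS")


def parse_pass_file(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("Expected KEY=VALUE, got: {!r}".format(raw_line))
        settings[key.strip()] = value.strip()
    # Every key from the template must be present and non-empty.
    missing = [key for key in REQUIRED_KEYS if not settings.get(key)]
    if missing:
        raise ValueError("Missing values for: " + ", ".join(missing))
    return settings


def to_dsn(settings):
    """Build a libpq-style keyword/value connection string.

    Values containing spaces or quotes would need libpq quoting,
    which this sketch does not handle.
    """
    return "dbname={} user={} password={} host={} port={}".format(
        settings["DB_NAME"],
        settings["DB_USER"],
        settings["DB_PASS"],
        settings["DB_HOST"],
        settings["DB_PORT"],
    )


# Hypothetical sample values; replace them with your own settings.
sample = """
DB_NAME=simple_qa
DB_PORT=5432
DB_USER=postgres
DB_HOST=localhost
DB_PASS=change-me
"""
print(to_dsn(parse_pass_file(sample)))
```

Failing early on a missing key is a deliberate choice here: an incomplete .pass file is easier to diagnose at load time than as a connection error inside a notebook.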

Slides

The slides used for our EMNLP talk.

Citation

@article{Petrochuk2018SimpleQuestionsNS,
  title={SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach},
  author={Michael Petrochuk and Luke S. Zettlemoyer},
  journal={CoRR},
  year={2018},
  volume={abs/1804.08798}
}

Important Notes

  • The FB2M and FB5M subsets of the Freebase KG can complete 7,188,636 and 7,688,234 graph queries, respectively; the FB5M subset is therefore 6.9% larger than the FB2M subset. In addition, FB5M contains only 3.98M entities, which contradicts the statement that "FB5M, is much larger with about 5M entities" (Bordes et al., 2015).
  • FB5M and FB2M contain 4,322,266 and 3,654,470 duplicate grouped facts, respectively.
  • FB2M is not a subset of FB5M: one atomic fact is in FB2M but not in FB5M, namely (01g4wmh, music/album/acquire_webpage, 02q5zps).
  • FB5M and FB2M do not contain the answer for 24 and 36 examples in the SimpleQuestions dataset, respectively; those examples are therefore unanswerable.
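
The 6.9% figure in the first note follows directly from the two query counts. The short check below reproduces that arithmetic; the function name `relative_increase_percent` is ours, and only the two counts come from the notes above.

```python
FB2M_QUERIES = 7188636  # graph queries FB2M can complete (from the note above)
FB5M_QUERIES = 7688234  # graph queries FB5M can complete (from the note above)


def relative_increase_percent(base, other):
    """Return how much larger `other` is than `base`, in percent."""
    return (other - base) / base * 100


increase = relative_increase_percent(FB2M_QUERIES, FB5M_QUERIES)
print("FB5M is {:.1f}% larger than FB2M".format(increase))
# prints: FB5M is 6.9% larger than FB2M
```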
