  • Stars: 257
  • Rank: 158,728 (Top 4 %)
  • Language: Python
  • License: MIT License
  • Created: over 8 years ago
  • Updated: almost 6 years ago

Repository Details

Jack the Reader

A Machine Reading Comprehension framework.
  • All work and no play makes Jack a great framework!
  • All work and no play makes Jack a great framework!
  • All work and no play makes Jack a great framework!

Jack the Reader - or jack, for short - is a framework for building and using models on a variety of tasks that require reading comprehension. For more information about the overall architecture, see Jack the Reader – A Machine Reading Framework (ACL 2018).

Installation

To install Jack, install the requirements and TensorFlow. If you want to use PyTorch for writing models, install PyTorch as well.
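
After installation, you can quickly check which backends are importable (a minimal sketch; it assumes you installed the TensorFlow extra and, optionally, PyTorch):

# Optional sanity check: TensorFlow is needed for most readers,
# PyTorch only if you want to write PyTorch models.
import importlib.util

for backend in ("tensorflow", "torch"):
    found = importlib.util.find_spec(backend) is not None
    print(f"{backend}: {'available' if found else 'not installed'}")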

Supported ML Backends

We currently support TensorFlow and PyTorch, and readers can be implemented in either. Input and output modules (i.e., pre- and post-processing) are independent of the ML backend and can thus be reused for model modules written in either backend. Although most models are implemented in TensorFlow, reusing the (cumbersome) pre- and post-processing makes it easy to quickly build new readers in PyTorch as well.
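
To make this separation concrete, here is a small conceptual sketch; the class names below are hypothetical stand-ins rather than jack's actual interfaces. The pre- and post-processing stages know nothing about tensors, so only the middle, model stage would need a TensorFlow or PyTorch implementation.

# Hypothetical sketch of the input/model/output split; not jack's real API.
from dataclasses import dataclass
from typing import List

@dataclass
class Example:                      # stand-in for a question/support pair
    question: str
    support: str

class InputModule:                  # backend-agnostic pre-processing
    def __call__(self, examples: List[Example]) -> List[dict]:
        return [{"q": ex.question.lower().split(),
                 "s": ex.support.lower().split()} for ex in examples]

class ToyModelModule:               # the only stage that would be TF- or PyTorch-specific
    def __call__(self, batch: List[dict]) -> List[int]:
        # toy "prediction": first support token that also occurs in the question
        return [next((i for i, tok in enumerate(ex["s"]) if tok in ex["q"]), 0)
                for ex in batch]

class OutputModule:                 # backend-agnostic post-processing
    def __call__(self, batch: List[dict], picks: List[int]) -> List[str]:
        return [ex["s"][i] for ex, i in zip(batch, picks)]

batch = InputModule()([Example("Who appeared in 1858?", "Mary appeared in 1858 .")])
print(OutputModule()(batch, ToyModelModule()(batch)))  # swapping the model stage
                                                       # leaves the other two untouched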

Pre-trained Models

Find pre-trained models here.

Code Structure

  • jack.core - core abstractions used
  • jack.readers - implementations of models
  • jack.eval - task evaluation code
  • jack.util - utility code that is used throughout the framework, including shared ML code
  • jack.io - IO related code, including loading and dataset conversion scripts
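
For orientation, the imports used in the quickstart below come from two of these packages (a minimal sketch, assuming jack is installed as described above):

# jack.readers holds the reader implementations; jack.core the shared abstractions.
from jack import readers
from jack.core import QASetting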

Projects

Quickstart

Coding Tutorials - Notebooks & CLI

We provide Jupyter notebooks with tutorials on Jack. For the quickest start, you can begin here. If you are interested in training a model from code, see this tutorial (although we recommend the command line, see below), and if you would like to implement a new model yourself, this notebook explains the process in more detail.

There is documentation on our command-line interface for actually training and evaluating models. For a high-level explanation of the ideas and vision, see Understanding Jack the Reader.

Command-line Training and Usage of a QA System

To illustrate how jack works, we will show how to train a question answering model using our command-line interface; the process is analogous for other tasks (browse conf/ for existing task-dataset configurations). It is probably best to set up a virtual environment to avoid clashes with system-wide Python library versions.

First, install the framework:

$ python3 -m pip install -e .[tf]

Then, download the SQuAD dataset, and the GloVe word embeddings:

$ ./data/SQuAD/download.sh
$ ./data/GloVe/download.sh

Train a FastQA model:

$ python3 bin/jack-train.py with train='data/SQuAD/train-v1.1.json' dev='data/SQuAD/dev-v1.1.json' reader='fastqa_reader' \
> repr_dim=300 dropout=0.5 batch_size=64 seed=1337 loader='squad' save_dir='./fastqa_reader' epochs=20 \
> with_char_embeddings=True embedding_format='memory_map_dir' embedding_file='data/GloVe/glove.840B.300d.memory_map_dir' vocab_from_embeddings=True

or shorter, using our prepared config:

$ python3 bin/jack-train.py with config='./conf/qa/squad/fastqa.yaml'

A copy of the model is written into the save_dir directory after each training epoch whenever performance improves. These checkpoints can be loaded using the commands below; see, e.g., the quickstart.
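
For instance, the latest checkpoint can be reloaded in Python (a minimal sketch; it assumes training above used save_dir='./fastqa_reader'):

# Reload the reader that jack-train.py saved into save_dir.
from jack import readers

fastqa_reader = readers.reader_from_file("./fastqa_reader")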

Want to train another model? No problem: we have a fairly modular QAModel implementation which allows you to assemble your own model. There are examples in conf/qa/squad/ (e.g., bidaf.yaml, or our own creation jack_qa.yaml). These models are defined solely in their configs, i.e., there is no implementation in code. This is made possible by our ModularQAModel.

If all of that is too cumbersome for you and you just want to play, why not download a pretrained model:

$ # we still need GloVe in memory mapped format, ignore the next 2 commands if already downloaded and transformed
$ data/GloVe/download.sh
$ wget -O fastqa.zip https://www.dropbox.com/s/qb796uljoqj0lvo/fastqa.zip?dl=1
$ unzip fastqa.zip && mv fastqa fastqa_reader

Then load and query the reader from Python:

from jack import readers
from jack.core import QASetting

fastqa_reader = readers.reader_from_file("./fastqa_reader")

support = """"It is a replica of the grotto at Lourdes,
France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858.
At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome),
is a simple, modern stone statue of Mary."""

answers = fastqa_reader([QASetting(
    question="To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?",
    support=[support]
)])

print(answers[0][0].text)
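# With the pretrained model above, this should print an answer span from the
# support passage, e.g. "Saint Bernadette Soubirous".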

Support

We are thankful for support from:

Developer guidelines

Run scripts from the root of the repository, for example:

$ pwd
/home/pasquale/workspace/jack
$ python3 bin/jack-train.py [..]

Citing

@InProceedings{weissenborn2018jack,
  author    = {Dirk Weissenborn and Pasquale Minervini and Tim Dettmers and Isabelle Augenstein and Johannes Welbl and Tim Rocktäschel and Matko Bošnjak and Jeff Mitchell and Thomas Demeester and Pontus Stenetorp and Sebastian Riedel},
  title     = {{Jack the Reader – A Machine Reading Framework}},
  booktitle = {{Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL) System Demonstrations}},
  month     = {July},
  year      = {2018},
  url       = {https://arxiv.org/abs/1806.08727}
}

More Repositories

  1. stat-nlp-book - Interactive Lecture Notes, Slides and Exercises for Statistical NLP (Jupyter Notebook, 269 stars)
  2. egal - easy drawing in jupyter (JavaScript, 257 stars)
  3. torch-imle - Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions (Python, 257 stars)
  4. emoji2vec - Learning Emoji Representations from their Description (Jupyter Notebook, 257 stars)
  5. fakenewschallenge - UCL Machine Reading - FNC-1 Submission (Python, 166 stars)
  6. pycodesuggest - Learning to Auto-Complete using RNN Language Models (Python, 156 stars)
  7. cqd - Continuous Query Decomposition for Complex Query Answering in Incomplete Knowledge Graphs (Python, 95 stars)
  8. ntp - End-to-End Differentiable Proving (NewLisp, 88 stars)
  9. d4 - Differentiable Forth Interpreter (Python, 66 stars)
  10. low-rank-logic - Code for Injecting Logical Background Knowledge into Embeddings for Relation Extraction (Scala, 65 stars)
  11. inferbeddings - Injecting Background Knowledge in Neural Models via Adversarial Set Regularisation (Python, 59 stars)
  12. gntp (Python, 57 stars)
  13. ctp - Conditional Theorem Proving (Python, 51 stars)
  14. EMAT - Efficient Memory-Augmented Transformers (Python, 34 stars)
  15. stat-nlp-book-scala - Interactive book on Statistical NLP (Scala, 32 stars)
  16. simpleNumericalFactChecker - Fact checker for simple claims about statistical properties (Python, 26 stars)
  17. adversarial-nli - Code and data for the CoNLL 2018 paper "Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge" (Python, 25 stars)
  18. acl2015tutorial - Moro files for the ACL 2015 Tutorial on Matrix and Tensor Factorization Methods for Natural Language Processing (Scala, 20 stars)
  19. numerate-language-models (Python, 19 stars)
  20. fever - FEVER Workshop Shared-Task (Python, 16 stars)
  21. APE - Adaptive Passage Encoder for Open-domain Question Answering (Python, 15 stars)
  22. stat-nlp-course - Code for the UCL Statistical NLP course (Scala, 11 stars)
  23. newshack - BBC Newshack code (Scala, 1 star)
  24. eqa-tools - Tools for Exam Question Answering (Python, 1 star)
  25. softconf-start-sync - Softconf START sync, tool for Google Sheets (JavaScript, 1 star)
  26. bibtex - BibTeX files (TeX, 1 star)