  • Stars
    179
  • Rank 214,039 (Top 5 %)
  • Language
    Python
  • License
    MIT License
  • Created over 7 years ago
  • Updated almost 7 years ago


Repository Details

Open R-NET (Armenian: առնետ, "rat" 🐁) implementation and detailed analysis: https://git.io/vd8dx

R-NET implementation in Keras

This repository is an attempt to reproduce the results presented in the technical report by Microsoft Research Asia. The report describes a complex neural network called R-NET designed for question answering.

This blogpost describes the details.

As of August 25, 2017, R-NET is the best single model on the Stanford Question Answering Dataset (SQuAD). SQuAD uses two performance metrics: exact match (EM) and F1 score (F1). Human performance is estimated to be EM=82.3% and F1=91.2% on the test set.
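The two metrics can be sketched in a few lines of Python. This is a minimal illustration, not this repository's evaluation code: it assumes the usual SQuAD-style answer normalization (lowercasing, dropping punctuation and English articles, collapsing whitespace) and token-level overlap for F1. The function names are hypothetical.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)

def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))

def f1_score(prediction, truth):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)

def best_over_truths(metric, prediction, truths):
    """A question may have several reference answers; score against the best one."""
    return max(metric(prediction, truth) for truth in truths)
```

For example, `exact_match("The Eiffel Tower", "eiffel tower.")` gives 1.0, while `f1_score("in Paris France", "Paris")` gives 0.5 (precision 1/3, recall 1). The dataset-level EM and F1 are the averages of these per-question scores.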

The report describes two versions of R-NET:

  1. The first one is called R-NET (Wang et al., 2017) (referring to a paper that is not yet available online) and reaches EM=71.3% and F1=79.7% on the test set. It consists of input encoders, a modified version of Match-LSTM, a self-matching attention layer (the main contribution of the paper) and a pointer network.
  2. The second version, called R-NET (March 2017), has one additional BiGRU between the self-matching attention layer and the pointer network and reaches EM=72.3% and F1=80.7%.

The current best single model on the SQuAD leaderboard has a higher score, which means R-NET development continued after March 2017. Ensemble models reach higher scores.

This repository contains an implementation of the first version, but we cannot yet reproduce the reported results. The best performance we got so far was EM=57.52% and F1=67.42% on the dev set. We are aware of a few differences between our implementation and the network described in the paper:

  1. The first formula in (11) of the report contains a strange summand W_v^Q V_r^Q. Both tensors are trainable and are not used anywhere else in the network. We have replaced this product with a single trainable vector.
  2. The size of the hidden layer should be 75 according to the report, but we get better results with a lower number. Overfitting is huge with 75 neurons.
  3. We are not sure whether we applied dropout correctly.
  4. There is nothing about weight initialization or batch generation in the report.
  5. Question-aware passage representation generation (probably) should be done by a bidirectional GRU.

On the other hand, we can't rule out that we have bugs in our code.
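The first difference in the list above can be illustrated with a small NumPy sketch. It assumes the question pooling in (11) has the additive-attention form s_j = v^T tanh(W_u^Q u_j^Q + W_v^Q V_r^Q). All shapes and names below are hypothetical and are not taken from this repository's code. Because W_v^Q and V_r^Q appear nowhere else, their product is just a fixed vector, so a single trainable vector can express the same values.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def question_pooling(u_q, W_u, c, v):
    """Attention-pool question encodings u_q (m, d) into one vector (d,).

    c (h,) is the additive term inside tanh: either the product
    W_v^Q V_r^Q from the report or a single trainable vector.
    """
    scores = np.tanh(u_q @ W_u + c) @ v   # one score per question word, shape (m,)
    weights = softmax(scores)             # attention weights, sum to 1
    return weights @ u_q, weights         # weighted sum of encodings, shape (d,)

rng = np.random.default_rng(0)
m, d, h = 5, 8, 6                         # question length, encoding size, attention size
u_q = rng.normal(size=(m, d))
W_u = rng.normal(size=(d, h))
v = rng.normal(size=h)

# Report-style parameterization: a matrix times a vector, both used only here.
W_v = rng.normal(size=(h, h))
V_r = rng.normal(size=h)
r_product, _ = question_pooling(u_q, W_u, W_v @ V_r, v)

# Replacement: one vector of size h holding the same values.
c = np.array(W_v @ V_r)
r_single, weights = question_pooling(u_q, W_u, c, v)
```

Here `r_product` and `r_single` coincide, which shows that the single vector loses no expressive power while using h parameters instead of h*h + h. Training dynamics may still differ between the two parameterizations.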

Instructions (make sure you are running Keras version 2.0.6)

  1. We need to parse and split the data
       python parse_data.py data/train-v1.1.json --train_ratio 0.9 --outfile data/train_parsed.json --outfile_valid data/valid_parsed.json
       python parse_data.py data/dev-v1.1.json --outfile data/dev_parsed.json
  2. Preprocess the data
       python preprocessing.py data/train_parsed.json --outfile data/train_data_str.pkl --include_str
       python preprocessing.py data/valid_parsed.json --outfile data/valid_data_str.pkl --include_str
       python preprocessing.py data/dev_parsed.json --outfile data/dev_data_str.pkl --include_str
  3. Train the model
       python train.py --hdim 45 --batch_size 50 --nb_epochs 50 --optimizer adadelta --lr 1 --dropout 0.2 --char_level_embeddings --train_data data/train_data_str.pkl --valid_data data/valid_data_str.pkl
  4. Predict on dev/test set samples
       python predict.py --batch_size 100 --dev_data data/dev_data_str.pkl models/31-t3.05458271443-v3.27696280528.model prediction.json
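Since the steps above assume Keras 2.0.6, it may help to check the installed version before running them. The helper below is a hypothetical sketch and is not part of this repository. It relies only on the `keras.__version__` attribute and performs a plain string comparison.

```python
REQUIRED_KERAS = "2.0.6"  # version named in the instructions

def installed_keras_version():
    """Return the installed Keras version string, or None if Keras is missing."""
    try:
        import keras
    except ImportError:
        return None
    return keras.__version__

def version_problem(installed, required=REQUIRED_KERAS):
    """Return a human-readable problem description, or None if the version matches."""
    if installed is None:
        return "Keras is not installed; expected version {}".format(required)
    if installed != required:
        return "Found Keras {}, expected {}".format(installed, required)
    return None
```

Calling `version_problem(installed_keras_version())` before step 1 returns `None` when the environment matches and a short message otherwise.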

Our best model can be downloaded from Release v0.1: https://github.com/YerevaNN/R-NET-in-Keras/releases/download/v0.1/31-t3.05458271443-v3.27696280528.model

More Repositories

  1. mimic3-benchmarks (Python, 799 stars): Python suite to construct benchmark machine learning datasets from the MIMIC-III 💊 clinical database.
  2. Dynamic-memory-networks-in-Theano (Python, 333 stars): Implementation of Dynamic memory networks by Kumar et al. http://arxiv.org/abs/1506.07285
  3. Spoken-language-identification (Python, 233 stars): Spoken language identification with deep learning
  4. A-Guide-to-Deep-Learning (HTML, 217 stars): 📚 A detailed guide to deep learning: http://yerevann.com/a-guide-to-deep-learning/
  5. translit-rnn (Python, 92 stars): Automatic transliteration with LSTM
  6. WARP (Python, 83 stars): Code for ACL'2021 paper WARP 🌀 Word-level Adversarial ReProgramming. Outperforming `GPT-3` on SuperGLUE Few-Shot text classification. https://aclanthology.org/2021.acl-long.381/
  7. DIIN-in-Keras (Python, 74 stars): Reproducing Densely Interactive Inference Network in Keras
  8. neural-colorizer (Python, 43 stars): Convolutional autoencoder to colorize greyscale images
  9. BARTSmiles (Python, 30 stars): BARTSmiles, generative masked language model for molecular representations
  10. ChemLactica (Jupyter Notebook, 20 stars): Fine-tuning Galactica and Gemma to operate on SMILES. Integrates into a molecular optimization algorithm.
  11. BioRelEx (Python, 19 stars): 🧬 BioRelEx: Biological Relation Extraction Benchmark @ ACL BioNLP Workshop 2019
  12. dmn-ui (JavaScript, 15 stars): UI for Dynamic Memory Networks
  13. yerevann.github.io (CSS, 14 stars): YerevaNN blog
  14. SciERC (Python, 14 stars): A fork of https://bitbucket.org/luanyi/scierc/src
  15. PARASITE (Python, 11 stars): 🪱 PARASITE || A parallel sentence data preprocessing toolkit. Originally developed as a part of the `en-ru` winner submission of WMT20 Biomedical Translation Task.
  16. Relation-extraction-pipeline (Python, 9 stars): Pipelines that combine different modules to perform relation extraction
  17. RaSoR-in-Tensorflow (Python, 7 stars): The implementation of one of the SQuAD solutions
  18. armtreebank (Python, 6 stars): Armenian Treebank http://armtreebank.yerevann.com/
  19. word2vec-armenian-wiki (C, 6 stars): Testing word2vec on Armenian Wikipedia
  20. Caffe-python-tools (Python, 4 stars): Some tools written in Python to work with Caffe
  21. SSL-playground (Python, 4 stars)
  22. zsee (Python, 4 stars): Zero Shot Event Extraction - Making pretrained sentence encoders more multilingual and language-agnostic. Works best (at the moment) with YerevaNN's internal version of allennlp.
  23. Molecular_Generation_with_GDB13 (Jupyter Notebook, 3 stars)
  24. Kaggle-diabetic-retinopathy-detection (Mathematica, 3 stars): Scripts used in Kaggle Diabetic retinopathy detection contest by YerevaNN team
  25. NLOS-Localization-WAIR-D (Python, 3 stars)
  26. pmi (C++, 3 stars): Fast pointwise mutual information implementation in C++
  27. RelationClassification (Python, 2 stars)
  28. dmn-docker (2 stars): Dockerfile for starting DMN with UI
  29. hyper-language-identification (Python, 2 stars)
  30. amr_seq2seq (Python, 2 stars)
  31. dom-gen-failure-modes (Python, 1 star)
  32. char-rnn-constitution (Shell, 1 star)
  33. NN-in-Armenian (1 star): Presentation and other stuff on Neural networks in Armenian
  34. JointUD (Python, 1 star): 🚬 JointUD - Universal Dependencies | Part-of-Speech tagging, Morphological parsing and Lemmatization
  35. BioER (Jupyter Notebook, 1 star): Biological entity recognition
  36. yarx (Python, 1 star): YARX - Yet Another Relation eXtraction framework, based on SciIE architecture and AllenNLP framework
  37. docker-cudnn-theano (Dockerfile, 1 star): Docker image for Theano with Ubuntu 16.04 + CUDA 8.0 + cuDNN 7