  • Stars
    135
  • Rank 260,812 (Top 6 %)
  • Language
    Python
  • License
    BSD 3-Clause "New...
  • Created about 6 years ago
  • Updated over 5 years ago

Repository Details

A PyTorch implementation of Mnemonic Reader for the Machine Comprehension task

Mnemonic Reader

The Mnemonic Reader is a deep learning model for the Machine Comprehension task. You can get details from this paper. It combines the advantages of match-LSTM, R-Net and Document Reader and uses a new unit, the Semantic Fusion Unit (SFU), to achieve results that were state-of-the-art at the time.

This repo is a PyTorch implementation of Mnemonic Reader. PyTorch implementations of R-Net and Document Reader are also included for comparison with the Mnemonic Reader. Pretrained models are available in the releases.

This repo belongs to HKUST-KnowComp and is under the BSD license.

Some of the code is based on DrQA.

Please feel free to contact Xin Liu ([email protected]) if you have any questions about this repo.

Evaluation on SQuAD

Model                                    DEV_EM   DEV_F1
Document Reader (original paper)         69.5     78.8
Document Reader (trained model)          69.4     78.6
R-Net (original paper 1)                 71.1     79.5
R-Net (original paper 2)                 72.3     80.6
R-Net (trained model)                    70.2     79.4
Mnemonic Reader (original paper)         71.8     81.2
Mnemonic Reader + RL (original paper)    72.1     81.6
Mnemonic Reader (trained model)          73.2     81.5

[Figure: EM_F1]

Requirements

  • Python >= 3.4
  • PyTorch >= 0.3.1
  • spaCy >= 2.0.0
  • tqdm
  • ujson
  • numpy
  • prettytable

Prepare

First of all, you need to download the dataset and pre-trained word vectors.

mkdir -p data/datasets
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json -O data/datasets/SQuAD-train-v1.1.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json -O data/datasets/SQuAD-dev-v1.1.json
mkdir -p data/embeddings
wget http://nlp.stanford.edu/data/glove.840B.300d.zip -O data/embeddings/glove.840B.300d.zip
cd data/embeddings
unzip glove.840B.300d.zip
cd ../..
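
The unzipped archive should contain a plain-text file in the usual GloVe layout: one token per line, followed by its vector components separated by spaces. The sketch below shows one way to read such a file. `load_glove` is a hypothetical helper that is not part of this repo, and it splits from the right on the assumption that a few tokens in the 840B vocabulary contain spaces themselves.

```python
import io

def load_glove(lines, dim=300):
    """Parse GloVe text lines into {token: [float, ...]} (illustrative helper)."""
    vectors = {}
    for line in lines:
        # Split from the right so that a token containing spaces stays intact.
        parts = line.rstrip("\n").rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip malformed or truncated lines
        token, values = parts[0], parts[1:]
        vectors[token] = [float(v) for v in values]
    return vectors

# Tiny in-memory stand-in for the real file (3 dimensions instead of 300).
sample = io.StringIO("the 0.1 0.2 0.3\n. . . 0.4 0.5 0.6\n")
demo = load_glove(sample, dim=3)
print(sorted(demo))
```

For the real vectors you would pass an open file handle for the unzipped text file instead of the in-memory sample, keeping the default dim=300.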

Then preprocess the data.

python script/preprocess.py data/datasets data/datasets --split SQuAD-train-v1.1
python script/preprocess.py data/datasets data/datasets --split SQuAD-dev-v1.1

To speed up preprocessing with multiple cores, add --num-workers 4 to the commands.
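
For orientation, the raw SQuAD v1.1 files nest articles, paragraphs and questions. The sketch below walks that layout and flattens it into one record per question. It only illustrates the input format; `iter_examples` is a hypothetical helper, the sample JSON is made up, and nothing here reflects what script/preprocess.py actually writes.

```python
import json

def iter_examples(squad):
    """Yield (id, question, context, answer_texts) from SQuAD v1.1 style JSON."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = [a["text"] for a in qa.get("answers", [])]
                yield qa["id"], qa["question"], context, answers

# Made-up miniature file in the same layout as the downloaded JSON.
sample = json.loads("""
{"version": "1.1", "data": [{"title": "Demo", "paragraphs": [{
  "context": "The Grotto is a replica of the grotto at Lourdes, France.",
  "qas": [{"id": "q1", "question": "Where is the original grotto?",
           "answers": [{"text": "Lourdes, France", "answer_start": 41}]}]}]}]}
""")
examples = list(iter_examples(sample))
print(examples[0][1], "->", examples[0][3])
```

Each answer carries a character offset (answer_start) into its paragraph's context, which is what lets a reader model be trained to predict spans.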

Train

All training parameters have default values, so if you are not interested in tuning them you can simply run:

python script/train.py

After several hours, training writes the model to data/models/ (e.g. 20180416-acc9d06d.mdl) and a log file next to it (e.g. 20180416-acc9d06d.txt).
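
Because each run gets its own generated name, it can be handy to look up the newest model and its log programmatically. The sketch below is one way to do that with the standard library. `latest_model` and `log_for` are hypothetical helpers, and the only assumptions taken from this README are the data/models/ directory and the .mdl and .txt extensions.

```python
from pathlib import Path

def latest_model(model_dir="data/models"):
    """Return the most recently modified *.mdl file, or None if there is none."""
    candidates = sorted(Path(model_dir).glob("*.mdl"),
                        key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None

def log_for(model_path):
    """Return the log path that sits next to a model file (same name, .txt)."""
    return Path(model_path).with_suffix(".txt")

print(log_for("data/models/20180416-acc9d06d.mdl").as_posix())
```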

Predict

To evaluate the trained model, first generate predictions:

python script/predict.py --model data/models/20180416-acc9d06d.mdl

Replace the model name in the command above with the name of your own model.
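
Judging only by the example paths in this README, predictions appear to land in data/predict/ under a name built from the dataset split and the model name. The sketch below encodes that guess so the evaluation command can be assembled for any model. `preds_path` is a hypothetical helper, and the naming rule is inferred from the examples rather than confirmed against predict.py.

```python
from pathlib import Path

def preds_path(model_path, split="SQuAD-dev-v1.1", out_dir="data/predict"):
    """Guess the predictions file that belongs to a model file."""
    model_name = Path(model_path).stem  # e.g. "20180416-acc9d06d"
    return Path(out_dir) / f"{split}-{model_name}.preds"

print(preds_path("data/models/20180416-acc9d06d.mdl").as_posix())
# data/predict/SQuAD-dev-v1.1-20180416-acc9d06d.preds
```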

predict.py does not print scores itself. To get them, run the official evaluate-v1.1.py in script/ on the predictions file:

python script/evaluate-v1.1.py data/predict/SQuAD-dev-v1.1-20180416-acc9d06d.preds data/datasets/SQuAD-dev-v1.1.json
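
The evaluation reports the two numbers shown in the table above: exact match (EM) and token-level F1. The sketch below is a simplified illustration of how these SQuAD v1.1 metrics are usually defined, with answers normalized before comparison and the best score taken over all reference answers. It is not a copy of the official script, the function names are hypothetical, and the reference answers are made up for the example.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, truth):
    return float(normalize(prediction) == normalize(truth))

def f1(prediction, truth):
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    # Multiset intersection counts each shared token at most as often as it occurs.
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)

def best_over_truths(metric, prediction, truths):
    """A question may have several reference answers; keep the best score."""
    return max(metric(prediction, t) for t in truths)

truths = ["Saint Bernadette Soubirous", "Bernadette Soubirous"]  # illustrative
print(best_over_truths(exact_match, "the Saint Bernadette Soubirous", truths))
print(best_over_truths(f1, "Bernadette", truths))
```

Averaging these per-question scores over the dev set and multiplying by 100 gives figures on the same scale as DEV_EM and DEV_F1.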

Interactivate

For those interested in QA systems, script/interactivate.py provides a simple interactive demo.

python script/interactivate.py --model data/models/20180416-acc9d06d.mdl

Then you will drop into an interactive session. It looks like:

* Interactive Module *

* Repo: Mnemonic Reader (https://github.com/HKUST-KnowComp/MnemonicReader)

* Implement based on Facebook's DrQA

>>> process(document, question, candidates=None, top_n=1)
>>> usage()

>>> text="Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend \"Venite Ad Me Omnes\". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary."
>>> question = "To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?"
>>> process(text, question)

+------+----------------------------+-----------+
| Rank |            Span            |   Score   |
+------+----------------------------+-----------+
|  1   | Saint Bernadette Soubirous | 0.9875301 |
+------+----------------------------+-----------+
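
The table lists the top_n answer spans with their scores. A common way for DrQA-style readers to rank spans is to multiply the predicted start and end probabilities for every pair with start <= end, up to a maximum span length, and keep the highest products. The sketch below illustrates that idea in plain Python. It is an assumption about how such ranking typically works rather than this repo's code, `top_spans` and `max_len` are hypothetical names, and the probabilities are invented.

```python
import heapq

def top_spans(tokens, p_start, p_end, top_n=1, max_len=15):
    """Rank token spans by p_start[i] * p_end[j] with i <= j < i + max_len."""
    scored = []
    for i, ps in enumerate(p_start):
        for j in range(i, min(i + max_len, len(tokens))):
            scored.append((ps * p_end[j], i, j))
    best = heapq.nlargest(top_n, scored)
    return [(" ".join(tokens[i:j + 1]), score) for score, i, j in best]

tokens = ["appeared", "to", "Saint", "Bernadette", "Soubirous", "in", "1858"]
p_start = [0.01, 0.01, 0.90, 0.05, 0.01, 0.01, 0.01]  # invented values
p_end = [0.01, 0.01, 0.02, 0.05, 0.88, 0.01, 0.02]    # invented values
ranked = top_spans(tokens, p_start, p_end, top_n=2)
for rank, (span, score) in enumerate(ranked, 1):
    print(rank, span, round(score, 4))
```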

More parameters

If you want to tune parameters to achieve a higher score, each script lists its options with --help:

python script/preprocess.py --help
python script/train.py --help
python script/predict.py --help
python script/interactivate.py --help
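
A --help listing like this is what Python's argparse module generates from the declared options. The sketch below shows how the positional directories and the --split and --num-workers options used in the preprocessing commands might be declared. The parser, defaults and help strings are assumptions made for illustration and are not taken from the repo's scripts.

```python
import argparse

def build_parser():
    """Illustrative parser; option names mirror the README commands only."""
    parser = argparse.ArgumentParser(
        description="Preprocess a SQuAD split (illustrative sketch).")
    parser.add_argument("data_dir", help="directory containing the raw SQuAD JSON")
    parser.add_argument("out_dir", help="directory for the processed output")
    parser.add_argument("--split", default="SQuAD-dev-v1.1",
                        help="file name prefix of the split to process")
    parser.add_argument("--num-workers", type=int, default=1,
                        help="number of worker processes")
    return parser

args = build_parser().parse_args(
    ["data/datasets", "data/datasets",
     "--split", "SQuAD-train-v1.1", "--num-workers", "4"])
print(args.split, args.num_workers)
```

Calling build_parser().print_help() prints the same kind of option summary that the scripts show for --help.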

License

All code in Mnemonic Reader is under the BSD license.

More Repositories

1

R-Net

Tensorflow Implementation of R-Net
Python
581
star
2

ASER

ASER (Activities, States, Events, and their Relations): a large-scale weighted eventuality knowledge graph.
Python
291
star
3

FMG

KDD17_FMG
MATLAB
136
star
4

DeepGraphCNNforTexts

Python
129
star
5

JWE

Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components
C
99
star
6

MNE

Source Code for IJCAI 2018 paper "Scalable Multiplex Network Embedding"
C
71
star
7

TransOMCS

TransOMCS is a commonsense knowledge resource transferred from ASER. It is in the format of OMCS but two orders of magnitude larger.
Python
69
star
8

MLMA_hate_speech

Dataset and code of our EMNLP 2019 paper "Multilingual and Multi-Aspect Hate Speech Analysis"
Python
54
star
9

NeuralSubgraphCounting

Source Code for KDD 2020 paper "Neural Subgraph Isomorphism Counting"
Python
46
star
10

DISCOS-commonsense

Codes for the WWW2021 paper: DISCOS: Bridging the Gap between Discourse Knowledge and Commonsense Knowledge (https://arxiv.org/abs/2101.00154).
Python
43
star
11

DMSC

This repository is for the paper "Document-Level Multi-Aspect Sentiment Classification as Machine Comprehension"
Python
42
star
12

RINANTE

Neural Aspect and Opinion Term Extraction with Mined Rules as Weak Supervision
Python
38
star
13

Motif-based-PageRank

Python
34
star
14

semihin

source code for the paper "Semi-supervised Learning over Heterogeneous Information Networks by Ensemble of Meta-graph Guided Random Walks"
Python
31
star
15

FKGE

Code for CIKM 2021 paper: Differentially Private Federated Knowledge Graphs Embedding (https://arxiv.org/abs/2105.07615)
Python
29
star
16

Knowledge-Constrained-Decoding

Official Code for EMNLP2023 Main Conference paper: "KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection"
Python
24
star
17

CSKB-Population

Codes for the EMNLP2021 paper: Benchmarking Commonsense Knowledge Base Population (https://aclanthology.org/2021.emnlp-main.705.pdf). An updated version CKBP v2 (https://arxiv.org/pdf/2304.10392.pdf)
Python
24
star
18

Visual_PCR

Dataset and Source code for EMNLP 2019 paper "What You See is What You Get: Visual Pronoun Coreference Resolution in Dialogues"
Python
23
star
19

GEIA

Code for Findings-ACL 2023 paper: Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Recover the Whole Sentence
Python
23
star
20

atomic-conceptualization

Code and data for the paper Acquiring and Modelling Abstract Commonsense Knowledge via Conceptualization
Python
22
star
21

FolkScope

Codes and Datasets for the ACL2023 Findings Paper: FolkScope: Intention Knowledge Graph Construction for Discovering E-commerce Commonsense
Python
21
star
22

COLA

Official code repository for the main conference paper in ACL2023: COLA: Contextualized Commonsense Causality Reasoning from the Causal Inference Perspective
Python
21
star
23

DualMessagePassing

Source Code for AAAI 2022 paper "Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching"
Python
21
star
24

VWS-DMSC

Python
21
star
25

BMGF-RoBERTa

Source Code for IJCAI 2020 paper "On the Importance of Word and Sentence Representation Learning in Implicit Discourse Relation Classification"
Python
20
star
26

MLMET

Ultra-Fine Entity Typing with Weak Supervision from a Masked Language Model
Python
19
star
27

EFO-1-QA-benchmark

Benchmark for Answering Existential First Order Queries with Single Free Variable (NeurIPS dataset and benchmark 2021)
Python
18
star
28

DummyNode4GraphLearning

Source Code for ICML 2022 paper "Boosting Graph Structure Learning with Dummy Nodes"
Python
18
star
29

WinoWhy

WinoWhy provides human-annotated reasons for answering WSC questions.
Python
17
star
30

cfet

Python
16
star
31

Pronoun-Coref-KG

Python
16
star
32

IFETEL

Improving Fine-grained Entity Typing with Entity Linking
Python
16
star
33

Pronoun-Coref

Python
14
star
34

Social-Explorative-Attention-Networks

Python
14
star
35

MoHINRec

Python
14
star
36

LMPNN

Logical Message Passing Networks with One-hop Inference in Atomic Formulas (ICLR 2023)
Python
13
star
37

PathPredictionForTextClassification

Source code for WWW 2019 paper "Efficient Path Prediction for Semi-Supervised and Weakly Supervised Hierarchical Text Classification"
Python
13
star
38

HeteSpaceyWalk

Python
12
star
39

SocializedWordEmbeddings

Python
12
star
40

ComHyper

Code for EMNLP'20 paper "When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models"
Python
11
star
41

ELWSMP

Entity Linking within a Social Media Platform
Python
11
star
42

SubeventWriter

Official code repository for the main conference paper in EMNLP 2022: SubeventWriter: Iterative Sub-event Sequence Generation with Coherence Controller
Python
11
star
43

ComplexHyperbolicKGE

Source code for the paper 'Complex Hyperbolic Knowledge Graph Embeddings with Fast Fourier Transform'.
Python
11
star
44

SP-10K

SP-10K is a large-scale human-annotated selectional preference set. Five selectional preference relations are included.
11
star
45

PseudoReasoner

Official code repository for Findings of EMNLP 2022 paper: PseudoReasoner: Leveraging Pseudo Labels for Commonsense Knowledge Base Population
Python
11
star
46

NRN

Python
11
star
47

CAT

Codes for the ACL2023 main conference paper: CAT: A Contextualized Conceptualization and Instantiation Framework for Commonsense Reasoning (https://arxiv.org/pdf/2305.04808.pdf).
Python
10
star
48

WFRE

Wasserstein-Fisher-Rao Embedding: Logical Query Embeddings with Local Comparison and Global Transport (Findings-ACL 2023)
Python
10
star
49

AbsPyramid

Official code repository for the paper: AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph
Python
10
star
50

MICO

This is the code repo for Findings of EMNLP2022 paper: MICO: a multi-alternative contrastive learning framework for commonsense knowledge representation
Python
10
star
51

WDDC

Source code for NAACL 2022 paper Weakly Supervised Text Classification using Supervision Signals from a Language Model
Python
10
star
52

query2particles

query2particles
Python
9
star
53

SRBRW

Source Code for IJCAI 2018 paper "Biased Random Walk based Social Regularization for Word Embeddings"
C
9
star
54

C2

The implementation for the paper:
Python
8
star
55

VD-PCR

Source code for paper "VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution"
Python
8
star
56

RE-RegDVAE

Source code for paper Relation Discovery with Out-of-Relation Knowledge Base as Supervision
Python
8
star
57

ASER-EEG

Eventuality Entailment Graph built on ASER
Python
8
star
58

FisherDA

code for "Fisher Deep Domain Adaptation"
Python
8
star
59

DiscoPrompt

Codes for the ACL2023 paper: DiscoPrompt: Path Prediction Prompt Tuning for Implicit Discourse Relation Recognition
Python
8
star
60

ConstraintChecker

Official code repository for the EACL2024 paper "ConstraintChecker: A Plugin for Large Language Models to Reason on Commonsense Knowledge Bases"
Jupyter Notebook
7
star
61

CAR

Codes for the EMNLP2023 Findings paper: CAR: Conceptualization-Augmented Reasoner for Zero-Shot Commonsense Question Answering (https://aclanthology.org/2023.findings-emnlp.902.pdf).
Python
7
star
62

SQE

Python
7
star
63

CEQA

Official Implementation of paper: Complex Query Answering on Eventuality Knowledge Graph with Implicit Logical Constraints
Python
7
star
64

QaDynamics

Codes for the EMNLP2023 Findings paper: QaDynamics: Training Dynamics-Driven Synthetic QA Diagnostic for Zero-Shot Commonsense Question Answering.
Python
6
star
65

Persona_leakage_and_defense_in_GPT-2

Code for NAACL 2022 paper "You Don’t Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers’ Private Personas"
Python
6
star
66

EFOK-CQA

EFOK-CQA: Towards Knowledge Graph Complex Query Answering beyond Set Operation
Python
6
star
67

Vis_Causal

Python
6
star
68

NLI4CT

Codes for the SemEval2023 paper: KnowComp at SemEval-2023 Task 7: Fine-tuning Pretrained Language Models for Clinical Trial Entailment Identification.
Python
5
star
69

PCR

Codes for the survey paper on pronoun coreference resolution
Python
5
star
70

Fair_HIN

This code is for the ICWSM 2021 paper: Fair Representation Learning for Heterogeneous Information Networks http://arxiv.org/abs/2104.08769
Python
5
star
71

Exo-PCR

Source code for EMNLP 2021 paper "Exophoric Pronoun Resolution in Dialogues with Topic Regularization"
Python
5
star
72

UnitBall

Source code for the paper 'Unit Ball Model for Embedding Hierarchical Structures in the Complex Hyperbolic Space'.
Python
5
star
73

HPHG

Source Code for "Hyper-Path-Based Representation Learning for Hyper-Networks" (CIKM '19)
Python
5
star
74

GOLD

Codes for the EMNLP2023 Findings paper: Gold: A Global and Local-aware Denoising Framework for Commonsense Knowledge Graph Noise Detection
Python
5
star
75

FIT

Fuzzy Inference with Truth value
Python
4
star
76

Probing_toxicity_in_PTLMs

Probing Toxic Content in Large Pretrained Language Models
Python
4
star
77

DisCOC

Source Code for ACL 2021 paper "Exploring Discourse Structures for Argument Impact Classification"
Jupyter Notebook
4
star
78

LLM-Multistep-Jailbreak

Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT
Python
4
star
79

Alpha-PACE

Codes for the AACL2023 paper: Self-Consistent Narrative Prompts on Abductive Natural Language Inference
Python
4
star
80

BEKG

Source Code for paper: BEKG: A Built Environment Knowledge Graph
Python
4
star
81

TILFA

Codes for the EMNLP 2023 workshop paper: TILFA: A Unified Framework for Text, Image, and Layout Fusion in Argument Mining
Python
3
star
82

HS_Bias_Eval

Data and code of our EMNLP 2020 paper "Comparative Evaluation of Label-Agnostic Selection Bias in Multilingual Hate Speech Datasets".
Python
3
star
83

MWE

Python
3
star
84

GeoAlign

Source code for AKBC 2021 paper "Manifold Alignment across Geometric Spaces for Knowledge Base Representation Learning"
Python
2
star
85

SLT

Codes for the WMT@EMNLP2023 workshop paper: KnowComp Submission for WMT23 Sign Language Translation Task
Python
2
star
86

CODC-Dialogue-Summarization

Codes of the AKBC 2021 paper: Do Boat and Ocean Suggest Beach? Dialogue Summarization with External Knowledge
Python
2
star
87

NeuPath

The codes for CIKM 2021 paper "Neural PathSim for Inductive Similarity Search in Heterogeneous Information Networks"
Python
2
star
88

HMTGIN

Python
2
star
89

FAPAT

Source Code for NeurIPS 2023 paper "Enhancing User Intent Capture in Session-Based Recommendation with Attribute Patterns"
Jupyter Notebook
2
star
90

AttackTransferLearning

Code for KDD20 paper "Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning"
Python
1
star
91

PrivateGraphEncoder

Source Code for CIKM2023 paper "Independent Distribution Regularization for Private Graph Embedding"
Python
1
star
92

SRFET

Exploiting Semantic Relations for Fine-grained Entity Typing
Python
1
star
93

EventGround

Python
1
star
94

Co2LM

Source code for CoCoLM: Complex Commonsense Enhanced Language Models with Language Models
Python
1
star
95

VWS-PR

This code is for the EACL 2021 paper: Variational Weakly Supervised Sentiment Analysis with Posterior Regularization
Python
1
star
96

PCR4ALL

This is the github repo for LREC 2022 paper "PCR4ALL: A Comprehensive Evaluation Benchmark for Pronoun Coreference Resolution in English".
1
star
97

Constraints-with-Prompting-for-Zero-Shot-EAC

Code for EACL 2023 (Findings) paper "Global Constraints with Prompting for Zero-Shot Event Argument Classification".
Python
1
star