SPRING

SPRING is a seq2seq model for Text-to-AMR parsing and AMR-to-Text generation (AAAI 2021).


This is the repo for SPRING (Symmetric ParsIng aNd Generation), a novel approach to semantic parsing and generation, presented at AAAI 2021.

With SPRING you can perform both state-of-the-art Text-to-AMR parsing and AMR-to-Text generation without many cumbersome external components. If you use the code, please reference this work in your paper:

@inproceedings{bevilacqua-etal-2021-one,
    title = {One {SPRING} to Rule Them Both: {S}ymmetric {AMR} Semantic Parsing and Generation without a Complex Pipeline},
    author = {Bevilacqua, Michele and Blloshmi, Rexhina and Navigli, Roberto},
    booktitle = {Proceedings of AAAI},
    year = {2021}
}

Pretrained Checkpoints

Here we release our best SPRING models, which are based on the DFS linearization.

Text-to-AMR Parsing

AMR-to-Text Generation

If you need the checkpoints of other experiments in the paper, please send us an email.

Installation

cd spring
pip install -r requirements.txt
pip install -e .

The code only works with transformers < 3.0 because of a breaking change in positional embeddings. The code works fine with torch 1.5. We recommend using a fresh conda environment.
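
For example, a minimal environment sketch (the Python version below is an assumption on our part, not something the repo pins):

# create and activate an isolated environment, then run the pip commands above inside it
conda create -n spring python=3.8
conda activate spring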

Train

Modify config.yaml in configs; instructions are provided as comments within the file. Also see the appendix of the paper.

Text-to-AMR

python bin/train.py --config configs/config.yaml --direction amr

Results in runs/

AMR-to-Text

python bin/train.py --config configs/config.yaml --direction text

Results in runs/

Evaluate

Text-to-AMR

python bin/predict_amrs.py \
    --datasets <AMR-ROOT>/data/amrs/split/test/*.txt \
    --gold-path data/tmp/amr2.0/gold.amr.txt \
    --pred-path data/tmp/amr2.0/pred.amr.txt \
    --checkpoint runs/<checkpoint>.pt \
    --beam-size 5 \
    --batch-size 500 \
    --device cuda \
    --penman-linearization --use-pointer-tokens

gold.amr.txt and pred.amr.txt will contain, respectively, the concatenated gold graphs and the predictions.

To reproduce our paper's results, you will also need to run the BLINK entity linking system on the prediction file (data/tmp/amr2.0/pred.amr.txt in the previous code snippet). To do so, you will need to install BLINK and download their models:

git clone https://github.com/facebookresearch/BLINK.git
cd BLINK
pip install -r requirements.txt
sh download_blink_models.sh
cd models
wget http://dl.fbaipublicfiles.com/BLINK//faiss_flat_index.pkl
cd ../..

Then, you will be able to launch the blinkify.py script:

python bin/blinkify.py \
    --datasets data/tmp/amr2.0/pred.amr.txt \
    --out data/tmp/amr2.0/pred.amr.blinkified.txt \
    --device cuda \
    --blink-models-dir BLINK/models

To obtain comparable Smatch scores, you will also need to use the scripts available at https://github.com/mdtux89/amr-evaluation, which produce results around 0.3 Smatch points lower than those returned by bin/predict_amrs.py.
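
As a sketch, assuming amr-evaluation is cloned inside the spring working directory, the scoring step would look like this (the evaluation.sh usage follows that repo's README; double-check it before relying on the numbers):

git clone https://github.com/mdtux89/amr-evaluation.git
cd amr-evaluation
# usage: ./evaluation.sh <parsed data> <gold data>
./evaluation.sh ../data/tmp/amr2.0/pred.amr.blinkified.txt ../data/tmp/amr2.0/gold.amr.txt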

AMR-to-Text

python bin/predict_sentences.py \
    --datasets <AMR-ROOT>/data/amrs/split/test/*.txt \
    --gold-path data/tmp/amr2.0/gold.text.txt \
    --pred-path data/tmp/amr2.0/pred.text.txt \
    --checkpoint runs/<checkpoint>.pt \
    --beam-size 5 \
    --batch-size 500 \
    --device cuda \
    --penman-linearization --use-pointer-tokens

gold.text.txt and pred.text.txt will contain, respectively, the concatenated gold sentences and the predictions. For BLEU, chrF++, and METEOR to be comparable, you will need to tokenize both gold and predictions using the JAMR tokenizer. To compute BLEU and chrF++, please use bin/eval_bleu.py. For METEOR, use https://www.cs.cmu.edu/~alavie/METEOR/ . For BLEURT, do not tokenize and run the evaluation with https://github.com/google-research/bleurt. Also see the appendix.
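
For a quick untokenized sanity check (not the paper's JAMR-tokenized pipeline, so the scores will not match the reported ones), the sacrebleu CLI can score the raw files:

pip install sacrebleu
# positional argument: reference file; -i: system output; -m: metrics to compute
sacrebleu data/tmp/amr2.0/gold.text.txt -i data/tmp/amr2.0/pred.text.txt -m bleu chrf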

Linearizations

The commands shown above assume the DFS-based linearization. To use BFS or PENMAN, uncomment the relevant lines in configs/config.yaml (for training). For the evaluation scripts, replace --penman-linearization --use-pointer-tokens with --use-pointer-tokens for BFS, or with --penman-linearization for PENMAN, as in the sketch below.
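
Concretely, only the tail of the prediction command changes; here the ... stands for the unchanged flags shown earlier:

# DFS (as in the commands above):
python bin/predict_amrs.py ... --penman-linearization --use-pointer-tokens
# BFS:
python bin/predict_amrs.py ... --use-pointer-tokens
# PENMAN:
python bin/predict_amrs.py ... --penman-linearization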

License

This project is released under the CC-BY-NC-SA 4.0 license (see LICENSE). If you use SPRING, please include a link to this repo.

Acknowledgements

The authors gratefully acknowledge the support of the ERC Consolidator Grant MOUSSE No. 726487 and the ELEXIS project No. 731015 under the European Union's Horizon 2020 research and innovation programme.

This work was supported in part by the MIUR under the grant "Dipartimenti di eccellenza 2018-2022" of the Department of Computer Science of the Sapienza University of Rome.

More Repositories

1. extend: Entity Disambiguation as text extraction (ACL 2022). Python, 164 stars.
2. ewiser: A Word Sense Disambiguation system integrating implicit and explicit external knowledge. Python, 66 stars.
3. consec: Text Extraction Formulation + Feedback Loop for state-of-the-art WSD (EMNLP 2021). Python, 50 stars.
4. esc: ESC: Redesigning WSD with Extractive Sense Comprehension. Python, 23 stars.
5. gsrl: GSRL is a seq2seq model for end-to-end dependency- and span-based SRL (IJCAI 2021). Python, 18 stars.
6. unify-srl: Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources (NAACL 2021). Python, 17 stars.
7. mcl-wic: SemEval-2021 Multilingual and Cross-lingual Word-in-Context Task. 17 stars.
8. xl-amr: XL-AMR is a sequence-to-graph cross-lingual AMR parser that exploits transfer learning (EMNLP 2020). Python, 16 stars.
9. usea: Universal Semantic Annotator (LREC 2022). Shell, 15 stars.
10. wsd-hard-benchmark: Data and code for "Nibbling at the Hard Core of Word Sense Disambiguation" (ACL 2022). Python, 13 stars.
11. genesis: GeneSis is the first generative approach for lexical substitution (EMNLP 2021). Python, 12 stars.
12. xl-wsd-code: Code to train and test Word Sense Disambiguation models based on different pretrained transformers. Python, 12 stars.
13. srl4e: Python, 11 stars.
14. multi-srl: Code and models for the COLING 2020 paper "Bridging the Gap in Multilingual Semantic Role Labeling: a Language-Agnostic Approach". Python, 10 stars.
15. conception: Code and experiments for the COLING 2020 paper "Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations". Java, 10 stars.
16. clubert: Distribution of word meanings in Wikipedia for English, Italian, French, German and Spanish. 10 stars.
17. steps: STEPS is a seq2seq model for Semantic Typing of Event Processes (AAAI 2022). Python, 8 stars.
18. LeakDistill: Python, 8 stars.
19. bmr: Python, 7 stars.
20. neural-pagerank-wsd: Exploiting the global WordNet graph to perform WSD. Python, 6 stars.
21. srl-pas-probing: Probing for Predicate Argument Structures in Pretrained Language Models (ACL 2022). Python, 5 stars.
22. mulan: Multilingual Label propagatioN for Word Sense Disambiguation. Python, 5 stars.
23. mwsd-datasets: SemEval-2013 and -2015 multilingual WSD datasets for BabelNet 4.0. Shell, 5 stars.
24. dsrl: Code for "Semantic Role Labeling meets Definition Modeling: using natural language to describe predicate-argument structures". Perl, 5 stars.
25. MaTESe: Machine Translation Evaluation as a Sequence Tagging Problem. Python, 4 stars.
26. sir: SIR is a sense-enhanced Information Retrieval system for multiple languages (EMNLP 2021). Python, 4 stars.
27. multilabel-wsd: A multi-labeling model for knowledge integration into Word Sense Disambiguation (EACL 2021). Python, 3 stars.
28. nlp2020-hw1: Python, 2 stars.
29. exploring-srl: Repository for the paper "Exploring Non-Verbal Predicates in Semantic Role Labeling: Challenges and Opportunities". 2 stars.
30. nlp2021-hw1: Python, 2 stars.
31. nlp2023-hw1: Homework 1 for the NLP 2023 course. Python, 2 stars.
32. united-srl: A unified dataset for span- and dependency-based multilingual and cross-lingual Semantic Role Labeling (EMNLP 2021). 2 stars.
33. nlp2022-hw1: Python, 1 star.
34. csi_code: Python, 1 star.
35. nlp2020-hw2: Python, 1 star.
36. alasca: New large-scale datasets for the task of lexical substitution (IJCAI 2021). 1 star.
37. nlp2023-hw2: Homework 2 for the Multilingual NLP 2023 course. Python, 1 star.
38. XL-WA: 1 star.
39. visual-definition-modeling: Python, 1 star.