Machine reading comprehension on clinical case reports

This is the accompanying code of:

CliCR: A Dataset of Clinical Case Reports for Machine Reading Comprehension. Simon Šuster and Walter Daelemans. NAACL, 2018.

Dataset availability

Thanks to an agreement with the publisher of BMJ Case Reports, we are allowed to freely distribute our dataset for research purposes. Please send us an email to [email protected], and we will provide you with the link.

Handling the dataset and baselines

Collect some statistics about the dataset:

python3 dataset-code/describe_data.py -train_file TRAIN_PATH -dev_file DEV_PATH -test_file TEST_PATH

Other:

Neural readers (adapted to CliCR)

To train the Stanford Attentive Reader:

THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python main.py --train_file PATH/TO/train1.0.json --dev_file PATH/TO/dev1.0.json --embedding_file PATH/TO/embeddings  --log_file best.log --att_output False

This uses the default parameters, with the hidden size and dropout rate optimized on the development set. By default, it also removes from the dataset those instances whose answers do not appear in their exact form in the corresponding passage. To change any of these parameters, modify config.py. The model is saved as best.model.
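The default filtering of instances whose answers are not found verbatim in the passage can be pictured roughly as follows. This is only a minimal sketch, not the repository's code: the field names "passage" and "answers", the helper names, and the downcase switch are all hypothetical, and the actual CliCR JSON layout and the matching rules in config.py may differ.

```python
# Minimal sketch of exact-match answer filtering.
# ASSUMPTION: each instance is a dict with a "passage" string and a list of
# "answers" strings. These field names are illustrative, not the CliCR schema.

def answer_in_passage(passage, answer, downcase=True):
    """Return True if the answer occurs verbatim as a substring of the passage."""
    if downcase:
        passage, answer = passage.lower(), answer.lower()
    return answer in passage


def filter_instances(instances, downcase=True):
    """Keep only instances with at least one answer found verbatim in the passage."""
    kept = []
    for inst in instances:
        if any(answer_in_passage(inst["passage"], a, downcase) for a in inst["answers"]):
            kept.append(inst)
    return kept


toy = [
    {"passage": "The patient was treated with intravenous antibiotics.",
     "answers": ["intravenous antibiotics"]},
    {"passage": "She presented with chest pain.",
     "answers": ["myocardial infarction"]},
]

kept = filter_instances(toy)
print(len(kept), "of", len(toy), "instances kept")
```

In this toy input, the second instance is dropped because its answer never appears in the passage. A filter of this kind shrinks the training set, so it may be worth checking how many instances remain before comparing results across settings.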

To test the Stanford Attentive Reader (SA) model:

THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python main.py --test_only True --pre_trained best.model --train_file PATH/TO/train1.0.json --dev_file PATH/TO/test1.0.json --embedding_file PATH/TO/embeddings --log_file best.test.log

To run evaluation separately:

python3 dataset-code/evaluate.py -test_file PATH/TO/dev1.0.json -prediction_file predictions -embeddings_file PATH/TO/embeddings -downcase -extended

The -extended option additionally runs an extended evaluation using BLEU and embedding-based metrics. The embedding-based metrics use the embeddings given in -embeddings_file.
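To give an idea of what an embedding-based metric measures, the sketch below scores a predicted answer against a reference answer by averaging their word vectors and taking the cosine similarity. This is one common form of such a metric and is shown under stated assumptions only: it is not taken from evaluate.py, the function names are hypothetical, and the two-dimensional toy vectors stand in for the real embeddings file.

```python
import math

# Sketch of an embedding-average similarity between two answers.
# ASSUMPTION: this is a generic illustration, not the metric implemented in
# dataset-code/evaluate.py. All names and the toy vectors are hypothetical.

def average_vector(tokens, embeddings, dim):
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_average_score(prediction, reference, embeddings, dim, downcase=True):
    """Compare two whitespace-tokenized answers through their averaged vectors."""
    if downcase:
        prediction, reference = prediction.lower(), reference.lower()
    pred_vec = average_vector(prediction.split(), embeddings, dim)
    ref_vec = average_vector(reference.split(), embeddings, dim)
    return cosine(pred_vec, ref_vec)


toy_embeddings = {
    "renal": [1.0, 0.0],
    "kidney": [0.9, 0.1],
    "failure": [0.0, 1.0],
}

score = embedding_average_score("Kidney failure", "renal failure", toy_embeddings, dim=2)
print(round(score, 3))
```

Unlike exact match, a score of this kind gives partial credit to a prediction such as "kidney failure" when the reference is "renal failure", provided the two words lie close together in the embedding space.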

To train the Gated Attention Reader with marked entities:

THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python3 run.py --dataset clicr_plain --mode 1 --nhidden 67 --dropout 0.4 --use_feat 1 --data_path PATH/TO/dataset_plain/ent/gareader/ --experiments_path experiments/

Run python3 run.py --help to see the full list of options.

To test the GA reader model:

THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python3 run.py --dataset clicr_plain --mode 2 --nhidden 67 --dropout 0.4 --use_feat 1 --data_path PATH/TO/dataset_plain/ent/gareader/ --experiments_path experiments/
