  • Stars: 223
  • Rank: 178,458 (Top 4 %)
  • Language: Python
  • License: Creative Commons ...
  • Created about 8 years ago
  • Updated over 7 years ago

Repository Details

README

This repository contains the models and supplementary data for the paper A Neural Network Multi-Task Learning Approach to Biomedical Named Entity Recognition by Gamal Crichton, Sampo Pyysalo, Billy Chiu and Anna Korhonen.

The supplementary data can be found in the file Additional file 1.pdf.

The corpora used for the experiments (which can be re-distributed) are in the data folder.
Note: The re-distribution status of the BioCreative IV Chemical and Drug (BC4CHEMD) named entity recognition task corpus is unclear, but the corpus can be publicly accessed at http://www.biocreative.org/tasks/biocreative-v/track-3-cdr/.

The models can be found in the models folder.

There are several files in the models folder:

  • baseline.py: The MLP model used as a baseline for the experiments.

    Example usage: python baseline.py 'path/to/dataset' 'path/to/vectorfile'

  • baseline_config.py: The configurable variables and their values for the MLP baseline model (baseline.py).

  • config.py: The configurable variables and their values for the convolutional models.

  • MT-dependent.py: The multi-task Dependent Model.

    Example usage: python MT-dependent.py 'path/to/data-files' 'dataset-1,...,dataset-n' 'path/to/vectorfile'

  • multi-output_MT.py: The multi-output multi-task model.

    Example usage: python multi-output_MT.py 'path/to/data-files' 'dataset-1,...,dataset-n' 'path/to/vectorfile'

  • multi-output_MT-var-dataset.py: The model used in the multi-task experiments which investigated the effect of multi-task learning on datasets of various sizes.
    Use the --percent-keep option to set the fraction of training examples to keep, chosen at random, for the dataset whose size you wish to vary. That dataset must be the first one specified; all other datasets train on their full training data.

    Example usage: python multi-output_MT-var-dataset.py --percent-keep 0.5 'path/to/data-files' 'path/to/reduced-dataset,path/to/whole-dataset' 'path/to/vectorfile'

  • single_task.py: The single task model.

    Example usage: python single_task.py 'path/to/dataset' 'path/to/vectorfile'
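
The random reduction described for multi-output_MT-var-dataset.py can be pictured with the sketch below. It is an illustration under stated assumptions, not the repository's code: the function names, the seed argument, and the rounding rule are all hypothetical.

```python
import random


def subsample_training_examples(examples, percent_keep, seed=None):
    """Return a random subset holding about `percent_keep` of `examples`.

    Hypothetical helper: the name, the `seed` argument and the rounding
    rule are assumptions made for this illustration.
    """
    if not 0.0 < percent_keep <= 1.0:
        raise ValueError("percent_keep must be in (0, 1]")
    if not examples:
        return []
    # Keep at least one example so the reduced dataset is never empty.
    n_keep = max(1, int(len(examples) * percent_keep))
    rng = random.Random(seed)
    # Sample indices without replacement, then sort to preserve the original order.
    kept = sorted(rng.sample(range(len(examples)), n_keep))
    return [examples[i] for i in kept]


def reduce_first_dataset(datasets, percent_keep, seed=None):
    """Reduce only the first dataset; every other dataset is returned unchanged."""
    first, *rest = datasets
    return [subsample_training_examples(first, percent_keep, seed)] + rest
```

With ten training examples and a value of 0.5, the first dataset would shrink to five examples while the remaining datasets keep all of theirs.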

Note: The experiments in the paper applied the Viterbi algorithm to the outputs. Use the --viterbi flag to replicate this.
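
For readers unfamiliar with Viterbi decoding, the sketch below finds the highest-scoring tag sequence from per-token tag scores and tag-to-tag transition scores. It is a generic textbook formulation, not the repository's implementation: the function name, the score layout, and the way transition scores are obtained are assumptions.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : log-score of tag j at token t
    transitions[i][j] : log-score of moving from tag i to tag j
    Both layouts are assumptions made for this illustration.
    """
    if not emissions:
        return []
    n_tags = len(emissions[0])
    # best[j] is the score of the best path ending in tag j at the current token.
    best = list(emissions[0])
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = [], []
        for j in range(n_tags):
            # Choose the previous tag that maximises the path score into tag j.
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + scores[j])
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    tag = max(range(n_tags), key=lambda j: best[j])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    path.reverse()
    return path


# Toy BIO example (all scores are made up): forbid the invalid transition O -> I.
NEG_INF = float("-inf")
tags = ["O", "B", "I"]
transitions = [[0.0, 0.0, NEG_INF],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[-0.1, -3.0, -3.0],   # token 0: O is favoured
             [-2.0, -1.5, -0.3]]   # token 1: I is favoured locally
print([tags[i] for i in viterbi_decode(emissions, transitions)])  # ['O', 'B']
```

Picking the best tag for each token independently would give the invalid sequence O, I here; decoding over whole sequences avoids it, which is the usual motivation for applying Viterbi to tagger outputs.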

License

The code is provided under the MIT license, and the other materials under Creative Commons Attribution 4.0.

More Repositories

1

visual-med-alpaca

Visual Med-Alpaca is an open-source, multi-modal foundation model designed specifically for the biomedical domain, built on the LLaMa-7B.
Python
358
star
2

sapbert

[NAACL'21 & ACL'21] SapBERT: Self-alignment pretraining for BERT & XL-BEL: Cross-Lingual Biomedical Entity Linking.
Python
167
star
3

BioNLP-2016

Python
121
star
4

xcopa

XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
97
star
5

visual-spatial-reasoning

[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
Python
92
star
6

mirror-bert

[EMNLP'21] Mirror-BERT: Converting Pretrained Language Models to universal text encoders without labels.
Python
75
star
7

composable-sft

A library for parameter-efficient and composable transfer learning for NLP with sparse fine-tunings.
Python
68
star
8

cometa

Corpus of Online Medical EnTities: the cometA corpus
Jupyter Notebook
46
star
9

autopeft

AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning (Zhou et al.; TACL)
Python
42
star
10

parameter-factorization

Factorization of the neural parameter space for zero-shot multi-lingual and multi-task transfer
Python
39
star
11

ContrastiveBLI

Improving Word Translation via Two-Stage Contrastive Learning (ACL 2022). Keywords: Bilingual Lexicon Induction, Word Translation, Cross-Lingual Word Embeddings.
Python
32
star
12

PairS

Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; arXiv preprint arXiv:2403.16950)
Python
32
star
13

link-prediction_with_deep-learning

Python
28
star
14

eva

[AAAI'21] Code release for "Visual Pivoting for (Unsupervised) Entity Alignment".
Python
25
star
15

mop

Codes for paper: Mixture-of-Partitions: Infusing Large Biomedical Knowledge Graphs into BERT
Python
24
star
16

python4cl

Introductory Python course for computational linguistics
Jupyter Notebook
23
star
17

adversarial-postspec

Auxiliary GAN for WE post-specialisation
Python
23
star
18

ClaPS

Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning (Zhou et al.; EMNLP 2023 Findings)
Python
16
star
19

SIPHS

15
star
20

ACL2022_tutorial_multilingual_dialogue

Materials for "Natural Language Processing for Multilingual Task-Oriented Dialogue" Tutorial at ACL 2022
14
star
21

multi3woz

The official repository for Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems (TACL 2023)
Python
14
star
22

BLICEr

Improving Bilingual Lexicon Induction with Cross-Encoder Reranking (Findings of EMNLP 2022). Keywords: Bilingual Lexicon Induction, Word Translation, Cross-Lingual Word Embeddings.
Python
13
star
23

ECNMT

Emergent Communication Pretraining for Few-Shot Machine Translation
Python
13
star
24

multilabel-nn

Initializing neural networks for hierarchical multi-label text classification
Python
12
star
25

medlama

Python
12
star
26

post-specialisation

Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources
Python
12
star
27

MirrorWiC

[CoNLL'21] MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained Language Models
Python
11
star
28

e2e_tod_toolkit

A codebase for e2e ToD toolkit.
Python
10
star
29

sw_study

Roff
9
star
30

nn_for_LBD

Repository for paper 'Neural networks for open and closed Literature-based Discovery'
Python
9
star
31

chat

Python
9
star
32

lionlbd

Source code for the LION LBD Tool
JavaScript
9
star
33

prompt4bli

On Bilingual Lexicon Induction with Large Language Models (EMNLP 2023). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.
Python
9
star
34

zepo

Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments (Zhou et al.)
Python
8
star
35

cancer-hallmark-cnn

Cancer hallmark CNN
Python
7
star
36

HELIN

Demo Entity Linking API for the HDR Text Analytic Team.
Python
7
star
37

COD

6
star
38

iso-study

Data sets and comparable Wikipedia samples used in our study on near-isomorphism between monolingual word embeddings
Python
6
star
39

hyperlex

HyperLex: a gold standard resource for measuring and evaluating how well semantic models capture graded or soft lexical entailment
5
star
40

ensembled-sicl

Python
4
star
41

RepEval-2016

Python
4
star
42

POSQA

Official Repo of EMNLP Findings 2023 Paper: POSQA: Probe the World Models of LLMs with Size Comparisons
Python
4
star
43

bio-verbnet

Contains materials for BioVerbnet
4
star
44

panlex-bli

Bilingual lexicon induction (BLI) training and test sets extracted from PanLex - used in the work of Vulić et al. (EMNLP 2019)
4
star
45

bio-simverb

Python
4
star
46

bioverbnet

BioVerbNet: A large semantic-syntactic classification of verbs in biomedicine
3
star
47

retrofitted-bio-embeddings

Bio word embeddings retrofitted to verb clusters
Python
3
star
48

mling_sdgms

Python
3
star
49

response_reranking

Code repository for Reranking Overgenerated Responses for End-to-End Task-Oriented Dialogue Systems (LREC-COLING 2024)
Python
3
star
50

sqatin

Code for Paper "SQATIN: Supervised Instruction Tuning Meets Question Answering for Improved Dialogue NLU". Published at NAACL-2024 (main conference)
Python
2
star
51

fs-wrep

Pretrained function-specific vectors (Gerz et al., ACL 2020)
2
star
52

xling-postspec

Cross-lingual Semantic Specialization via Lexical Relation Induction
Python
2
star
53

biocaster_2021

This is a public repo for codes and resources of BioCaster 2021: http://www.biocaster.org
Java
2
star
54

sail-bli

Self-Augmented In-Context Learning for Unsupervised Word Translation (ACL 2024). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.
Python
1
star
55

bmip-2017-practical

BMIP 2017 practical
1
star
56

deductive_reasoning_probing

Jupyter Notebook
1
star
57

uniprotidmap

UniProt ID mappings
Python
1
star
58

bmip-2018

Resources for BMIP ticked practical
Python
1
star