Cambridge Language Technology Lab (@cambridgeltl)

Top repositories

1

visual-med-alpaca

Visual Med-Alpaca is an open-source, multi-modal foundation model designed specifically for the biomedical domain, built on LLaMA-7B.
Python
343
star
2

MTL-Bioinformatics-2016

Python
211
star
3

sapbert

[NAACL'21 & ACL'21] SapBERT: Self-alignment pretraining for BERT & XL-BEL: Cross-Lingual Biomedical Entity Linking.
Python
162
star
4

BioNLP-2016

Python
120
star
5

xcopa

XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
91
star
6

visual-spatial-reasoning

[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
Python
87
star
7

mirror-bert

[EMNLP'21] Mirror-BERT: Converting Pretrained Language Models to universal text encoders without labels.
Python
73
star
8

composable-sft

A library for parameter-efficient and composable transfer learning for NLP with sparse fine-tunings.
Python
67
star
9

cometa

Corpus of Online Medical EnTities: the cometA corpus
Jupyter Notebook
45
star
10

parameter-factorization

Factorization of the neural parameter space for zero-shot multi-lingual and multi-task transfer
Python
38
star
11

autopeft

AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning (Zhou et al.; TACL)
Python
33
star
12

ContrastiveBLI

Improving Word Translation via Two-Stage Contrastive Learning (ACL 2022). Keywords: Bilingual Lexicon Induction, Word Translation, Cross-Lingual Word Embeddings.
Python
32
star
13

link-prediction_with_deep-learning

Python
28
star
14

eva

[AAAI'21] Code release for "Visual Pivoting for (Unsupervised) Entity Alignment".
Python
25
star
15

mop

Code for the paper: Mixture-of-Partitions: Infusing Large Biomedical Knowledge Graphs into BERT
Python
24
star
16

adversarial-postspec

Auxiliary GAN for WE post-specialisation
Python
23
star
17

python4cl

Introductory Python course for computational linguistics
Jupyter Notebook
22
star
18

SIPHS

15
star
19

ACL2022_tutorial_multilingual_dialogue

Materials for "Natural Language Processing for Multilingual Task-Oriented Dialogue" Tutorial at ACL 2022
14
star
20

multi3woz

The official repository for Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems (TACL 2023)
Python
14
star
21

BLICEr

Improving Bilingual Lexicon Induction with Cross-Encoder Reranking (Findings of EMNLP 2022). Keywords: Bilingual Lexicon Induction, Word Translation, Cross-Lingual Word Embeddings.
Python
13
star
22

ECNMT

Emergent Communication Pretraining for Few-Shot Machine Translation
Python
13
star
23

ClaPS

Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning (Zhou et al.; EMNLP 2023 Findings)
Python
13
star
24

medlama

Python
12
star
25

post-specialisation

Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources
Python
12
star
26

multilabel-nn

Initializing neural networks for hierarchical multi-label text classification
Python
11
star
27

MirrorWiC

[CoNLL'21] MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained Language Models
Python
11
star
28

sw_study

Roff
9
star
29

nn_for_LBD

Repository for paper 'Neural networks for open and closed Literature-based Discovery'
Python
9
star
30

chat

Python
9
star
31

lionlbd

Source code for the LION LBD Tool
JavaScript
9
star
32

PairS

Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; arXiv preprint arXiv:2403.16950)
Python
9
star
33

cancer-hallmark-cnn

Cancer hallmark CNN
Python
7
star
34

HELIN

Demo Entity Linking API for the HDR Text Analytic Team.
Python
7
star
35

e2e_tod_toolkit

A codebase for e2e ToD toolkit.
Python
7
star
36

COD

6
star
37

iso-study

Data sets and comparable Wikipedia samples used in our study on near-isomorphism between monolingual word embeddings
Python
6
star
38

prompt4bli

On Bilingual Lexicon Induction with Large Language Models (EMNLP 2023). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.
Python
6
star
39

hyperlex

HyperLex: a gold standard resource for measuring and evaluating how well semantic models capture graded or soft lexical entailment
5
star
40

ensembled-sicl

Python
4
star
41

RepEval-2016

Python
4
star
42

POSQA

Official Repo of EMNLP Findings 2023 Paper: POSQA: Probe the World Models of LLMs with Size Comparisons
Python
4
star
43

bio-verbnet

Contains materials for BioVerbnet
4
star
44

panlex-bli

Bilingual lexicon induction (BLI) training and test sets extracted from PanLex - used in the work of Vulić et al. (EMNLP 2019)
4
star
45

bio-simverb

Python
4
star
46

bioverbnet

BioVerbNet: A large semantic-syntactic classification of verbs in biomedicine
3
star
47

retrofitted-bio-embeddings

Bio word embeddings retrofitted to verb clusters
Python
3
star
48

mling_sdgms

Python
3
star
49

response_reranking

Code repository for Reranking Overgenerated Responses for End-to-End Task-Oriented Dialogue Systems (LREC-COLING 2024)
Python
3
star
50

sqatin

Code for Paper "SQATIN: Supervised Instruction Tuning Meets Question Answering for Improved Dialogue NLU". Published at NAACL-2024 (main conference)
Python
2
star
51

fs-wrep

Pretrained function-specific vectors (Gerz et al., ACL 2020)
2
star
52

xling-postspec

Cross-lingual Semantic Specialization via Lexical Relation Induction
Python
2
star
53

biocaster_2021

This is a public repo for the code and resources of BioCaster 2021: http://www.biocaster.org
Java
2
star
54

bmip-2017-practical

BMIP 2017 practical
1
star
55

deductive_reasoning_probing

Jupyter Notebook
1
star
56

bmip-2018

Resources for BMIP ticked practical
Python
1
star
57

uniprotidmap

UniProt ID mappings
Python
1
star