Deep NLP @ CIS - LMU (@cisnlp)

Top repositories

1

simalign

Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
Python
345
star
2

Glot500

Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023
Python
96
star
3

GlotLID

GlotLID: Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
Python
83
star
4

semi-markov-crf

Code for paper "Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging"
Python
17
star
5

GlotScript

GlotScript: A Resource and Tool for Low Resource Writing System Identification -- LREC 2024
Python
13
star
6

parcoure

ParCourE - Parallel Corpus Explorer
Python
12
star
7

ofa

A framework that wisely initializes unseen subword embeddings in PLMs for efficient large-scale continued pretraining
Python
11
star
8

GlotCC

GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages -- under review
Jupyter Notebook
11
star
9

bias-in-nlp

Literature overview: gender bias in natural language processing
Python
10
star
10

mPLM-Sim

mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
Python
10
star
11

graph-align

Code for the EMNLP graph align paper
Python
9
star
12

Taxi1500

Python
7
star
13

GlotWeb

GlotWeb: Web Indexing for Low-Resource Languages -- under construction.
Python
5
star
14

TransMI

TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data
Python
4
star
15

TransliCo

TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models
Python
4
star
16

GlotStoryBook

Children's StoryBooks for 180 languages.
Jupyter Notebook
3
star
17

ColexificationNet

Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs
Jupyter Notebook
3
star
18

cisnlp.github.io

Homepage of cisnlp
SCSS
3
star
19

MaskLID

MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024
Python
3
star
20

Transliteration-PPA

Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment
Python
2
star
21

lohoravens-webpage

JavaScript
2
star
22

XAMPLER

XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
Python
2
star
23

Spatial_Schemas

JavaScript
1
star