simalign
Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
Glot500
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023
GlotLID
GlotLID: Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
semi-markov-crf
Code for paper "Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging"
GlotScript
GlotScript: A Resource and Tool for Low Resource Writing System Identification -- LREC 2024
parcoure
ParCourE - Parallel Corpus Explorer
ofa
A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining
GlotCC
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages -- under review
bias-in-nlp
Literature overview: gender bias in natural language processing
mPLM-Sim
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
graph-align
Code for EMNLP graph align paper
Taxi1500
GlotWeb
GlotWeb: Web Indexing for Low-Resource Languages -- under construction.
TransMI
TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data
TransliCo
TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models
GlotStoryBook
Children's storybooks for 180 languages.
cisnlp.github.io
Homepage of cisnlp
MaskLID
MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024
Transliteration-PPA
Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment
lohoravens-webpage
XAMPLER
XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
Spatial_Schemas