  • Stars: 598
  • Rank: 74,853 (Top 2 %)
  • Language: Jupyter Notebook
  • License: BSD 3-Clause "New...
  • Created: over 4 years ago
  • Updated: over 4 years ago

Repository Details

The website for the CMU Language Technologies Institute low resource NLP bootcamp 2020

CMU LTI Low Resource NLP Bootcamp 2020

This is the page for a low-resource natural language and speech processing bootcamp held by the Carnegie Mellon University Language Technologies Institute in May 2020. The bootcamp was held virtually for some visitors to the institute, but we are making the videos and materials available for those interested in learning on their own. It comes in 8 parts, each with a lecture video and example exercises that you can do to expand your knowledge.

1. NLP Tasks

This lecture by Graham Neubig gives a high-level overview of a variety of NLP tasks (slides).

The exercise has participants download spaCy and inspect the types of linguistic output generated in its tutorial. Participants also examine the Universal Dependencies treebanks to see which other languages have annotated data like that produced by spaCy's analysis.
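
Universal Dependencies treebanks are distributed in the CoNLL-U format, where each token is one line of ten tab-separated columns. As a rough illustration of what that annotated data looks like, the sketch below parses a tiny hand-written sample using only the Python standard library. The sample sentence and the helper name parse_conllu are made up for illustration and are not part of the bootcamp materials; real treebank files also contain multiword-token and empty-node lines, which this sketch does not treat specially.

```python
# Column names defined by the CoNLL-U format used in Universal Dependencies.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]

# A tiny hand-written sample (hypothetical, not copied from a real treebank).
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines carry sentence-level metadata; skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            continue  # skip lines that do not have exactly ten columns
        current.append(dict(zip(CONLLU_FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


sentences = parse_conllu(SAMPLE)
for tok in sentences[0]:
    print(tok["form"], tok["upos"], tok["deprel"], "head=" + tok["head"])
```

The part-of-speech tag (upos), head index, and dependency relation (deprel) correspond to the kinds of annotation that spaCy's tutorial displays for English.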

2. Linguistics - Phonology and Morphology

This lecture by David Mortensen gives some linguistic background on phonology and morphology (slides).

The exercise has participants use epitran to generate phonetic transcriptions of words, and try to read some words written in the International Phonetic Alphabet.

3. Machine Translation

This lecture by Antonis Anastasopoulos explains machine translation, both phrase-based and neural (slides).

The exercise runs through tutorials on word alignment with fast_align and on neural machine translation with JoeyNMT, using data from the Latvian-English translation task at WMT.
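
To give a feel for what a word aligner estimates, here is a minimal sketch of IBM Model 1 trained with EM on a three-sentence toy corpus. This is a deliberately simpler stand-in: fast_align itself uses a reparameterization of IBM Model 2, and nothing below is taken from its code or from the tutorial. The function names and the German-English toy data are assumptions for illustration. The output uses the "i-j" convention (source index, then target index) that fast_align also prints.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(e | f) with EM; `pairs` holds (source_tokens, target_tokens)."""
    # Start from a constant (i.e. uniform) table over co-occurring word pairs.
    t = {}
    for src, tgt in pairs:
        for e in tgt:
            for f in src:
                t[(e, f)] = 1.0
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e aligned to f
        total = defaultdict(float)  # expected count of f being aligned at all
        for src, tgt in pairs:
            for e in tgt:
                # E-step: spread one unit of mass for e over the source words.
                z = sum(t[(e, f)] for f in src)
                for f in src:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalise the expected counts per source word.
        t = {(e, f): count[(e, f)] / total[f] for (e, f) in t}
    return t


def align(src, tgt, t):
    """Link each target word to its most probable source word as 'i-j' pairs."""
    links = []
    for j, e in enumerate(tgt):
        i = max(range(len(src)), key=lambda k: t.get((e, src[k]), 0.0))
        links.append(f"{i}-{j}")
    return " ".join(links)


# Hypothetical toy parallel corpus (source: German, target: English).
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
table = train_ibm_model1(corpus)
print(align("das buch".split(), "the book".split(), table))
```

Model 1 ignores word positions entirely, which is why real aligners such as fast_align add a position-dependent term and a NULL source word; both are omitted here to keep the sketch short.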

4. Linguistics - Syntax and Morphosyntax

This lecture by Lori Levin explains aspects of linguistics related to syntax and morphosyntax (slides).

The exercise consists of creating an interlinear gloss for the language of your choice.
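
An interlinear gloss conventionally has three lines: the original text segmented into morphemes, a morpheme-by-morpheme gloss aligned word by word, and a free translation. The sketch below shows one way to line up the first two lines in plain text. The helper name format_gloss and the Spanish example are illustrative assumptions, not part of the exercise handout.

```python
def format_gloss(words, glosses, translation):
    """Align each word with its gloss in columns and append a free translation."""
    if len(words) != len(glosses):
        raise ValueError("each word needs exactly one gloss")
    # Each column is as wide as the longer of the word and its gloss.
    widths = [max(len(w), len(g)) for w, g in zip(words, glosses)]
    text_line = "  ".join(w.ljust(n) for w, n in zip(words, widths)).rstrip()
    gloss_line = "  ".join(g.ljust(n) for g, n in zip(glosses, widths)).rstrip()
    return "\n".join([text_line, gloss_line, f"'{translation}'"])


# Illustrative example: hyphens mark morpheme boundaries, periods join
# several grammatical labels carried by a single morpheme.
gloss = format_gloss(
    ["Los", "gato-s", "duerm-en"],
    ["DEF.M.PL", "cat-PL", "sleep-3PL"],
    "The cats sleep.",
)
print(gloss)
```

Keeping the number of hyphen-separated pieces equal between a word and its gloss makes the morpheme correspondence easy to check by eye.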

5. Neural Representation Learning

This lecture by Pengfei Liu explains various methods for learning neural representations of language (slides).

The exercise, by Antonis Anastasopoulos, introduces learning word representations with fastText, using them for simple text classification, and finding similar words.
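
One idea behind fastText's word representations is that a word vector is built from the vectors of its character n-grams, with "<" and ">" marking the word boundaries. The sketch below only enumerates those n-grams and does not train anything. The function name char_ngrams is an assumption for illustration, and the real library additionally keeps the whole word as its own unit and hashes n-grams into a fixed number of buckets.

```python
def char_ngrams(word, minn=3, maxn=6):
    """List the character n-grams of `word`, with boundary markers added."""
    marked = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of width n across the marked word.
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


print(char_ngrams("where", 3, 3))

# Morphologically related words share many n-grams, which is what lets
# subword models produce sensible vectors for rare or unseen word forms.
shared = set(char_ngrams("walking")) & set(char_ngrams("walked"))
print(sorted(shared))
```

Because rare inflected forms still share n-grams with frequent ones, this kind of subword sharing is especially relevant in low-resource and morphologically rich settings.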

6. Multilingual NLP

This lecture by Yulia Tsvetkov explains how you can train multilingual NLP systems that work in many different languages (slides).

The exercise, by Chan Park, provides two Jupyter notebooks: one explains how to train a Naive Bayes classifier for classification across languages, and the other introduces multilingual BERT for cross-lingual classification.
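
For readers who have not seen a Naive Bayes classifier before, the following is a minimal multinomial version with add-alpha smoothing, written with the standard library only. It is a generic sketch, not the code from the notebooks: the class name, the toy training data, and the choice to ignore tokens never seen in training are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()                # documents per label
        self.feature_counts = defaultdict(Counter)   # token counts per label
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.feature_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n / n_docs)
            for tok in tokens:
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set.
docs = [["good", "great", "fun"], ["great", "acting"],
        ["bad", "boring"], ["boring", "plot", "bad"]]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["great", "fun"]), model.predict(["boring"]))
```

A classifier like this only transfers across languages if the features are shared between them, which is one motivation for moving to a multilingual encoder such as multilingual BERT.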

7. Speech Synthesis

This lecture by Alan Black explains speech synthesis, that is, generating speech from text (slides: overview, building voices, unwritten languages).

The exercise demonstrates how you can build your own talking clock using your voice in a language of your choice; instructions are available here and here.

8. Speech Recognition

This lecture by Bhiksha Raj explains speech recognition, that is, converting speech into textual transcriptions (slides).

The exercise, by Hira Dhamyal, demonstrates how to build a speech recognition system in Kaldi, specifically focusing on the mini-librispeech example.

More Repositories

1. nn4nlp-code (Python, 1,303 stars): Code Samples from Neural Networks for NLP
2. nlptutorial (Perl, 423 stars): A Tutorial about Programming for Natural Language Processing
3. nmt-tips (Perl, 368 stars): A tutorial about neural machine translation including tips on building practical systems
4. kytea (C++, 197 stars): The Kyoto Text Analysis Toolkit for word segmentation and pronunciation estimation, etc.
5. nlp-from-scratch-assignment-2022 (Python, 168 stars): An assignment for CMU CS11-711 Advanced NLP, building NLP systems from scratch
6. lamtram (C++, 138 stars): lamtram: A toolkit for neural language and translation modeling
7. anlp-code (Jupyter Notebook, 130 stars)
8. research-career-tools (Python, 128 stars)
9. naacl18tutorial (TeX, 102 stars): NAACL 2018 Tutorial: Modelling Natural Language, Programs, and their Intersection
10. minbert-assignment (Python, 70 stars): Minimalist BERT implementation assignment for CS11-711
11. minnn-assignment (Python, 64 stars): An assignment on creating a minimalist neural network toolkit for CS11-747
12. yrsnlp-2016 (Jupyter Notebook, 59 stars): Structured Neural Networks for NLP: From Idea to Code
13. minllama-assignment (Python, 48 stars)
14. util-scripts (Perl, 46 stars): Various utility scripts useful for natural language processing, machine translation, etc.
15. latticelm (C++, 45 stars): Software for unsupervised word segmentation and language model learning using lattices
16. coderx (Python, 40 stars): A highly sophisticated sequence-to-sequence model for code generation
17. rapid-adaptation (Shell, 39 stars): Reproduction instructions for "Rapid Adaptation of Neural Machine Translation to New Languages"
18. mtandseq2seq-code (Python, 33 stars): Code examples for CMU CS11-731, Machine Translation and Sequence-to-sequence Models
19. travatar (C++, 28 stars): This is a repository for the Travatar forest-to-string translation decoder
20. lxmls-2017 (Python, 28 stars): Slides/code for the Lisbon machine learning school 2017
21. modlm (C++, 27 stars): modlm: A toolkit for mixture of distributions language models
22. kylm (Java, 27 stars): The Kyoto Language Modeling Toolkit
23. pialign (C++, 23 stars): pialign - A Phrasal ITG Aligner
24. pgibbs (C++, 16 stars): An implementation of parallel gibbs sampling for word segmentation and POS tagging.
25. nlp-from-scratch-assignment-spring2024 (16 stars): An assignment for building an NLP system from scratch.
26. lader (C++, 15 stars): A reordering tool for machine translation.
27. howtocode-2017 (Jupyter Notebook, 13 stars): An example of DyNet autobatching for the NIPS "how to code a paper" workshop
28. kyfd (C++, 12 stars): A decoder for finite state models for text processing.
29. egret (C++, 10 stars): A fork of the Egret parser that fixes a few bugs
30. latticelm-v2 (C++, 7 stars): Second version of latticelm, a tool for learning language models from lattices
31. globalutility (TeX, 6 stars)
32. nafil (C++, 4 stars): A program for performing bilingual corpus filtering
33. prontron (Perl, 4 stars): A discriminative pronunciation estimator using the structured perceptron algorithm.
34. wat2014 (Shell, 3 stars): Scripts for creating a system similar to the NAIST submission to WAT2014
35. multi-extract (Python, 2 stars): A script for extracting multi-synchronous context-free grammars
36. nile (C++, 1 star): A clone of the nile alignment toolkit
37. webigator (Perl, 1 star): A program to aggregate, rank, and search text information
38. ribes-c (C++, 1 star): A C++ implementation of the RIBES machine translation evaluation measure.
39. swe-bench-zeno (Python, 1 star): Scripts for analyzing swe-bench with Zeno