CC6205 - Natural Language Processing
This is a course on natural language processing.
- Lecturer: Felipe Bravo-Marquez
- TAs: Gabriel Iturra-Bocaz, Jorge Ortiz, Consuelo Rojas, Sebastián Tinoco and Felipe Urrutia.
- Lectures: Tuesday 14:30 - 16:00, Thursday 14:30 - 16:00
- Course Program (in Spanish)
Info
This course aims to provide a comprehensive introduction to Natural Language Processing (NLP) by covering essential concepts. We strive to strike a balance between traditional techniques, such as N-gram language models, Naive Bayes, and Hidden Markov Models (HMMs), and modern deep neural networks, including word embeddings, recurrent neural networks (RNNs), and transformers.
The course material draws from various sources. In many instances, sentences from these sources are directly incorporated into the slides. The neural network topics primarily rely on the book Neural Network Methods for Natural Language Processing by Goldberg. Non-neural network topics, such as Probabilistic Language Models, Naive Bayes, and HMMs, are sourced from Michael Collins' course and Dan Jurafsky's book. Additionally, some slides are adapted from online tutorials and other courses, such as Manning's Stanford course.
Slides
- Introduction to Natural Language Processing | (tex source file), video 1, video 2
- Vector Space Model and Information Retrieval | (tex source file), video 1, video 2
- Probabilistic Language Models | (tex source file), notes, video 1, video 2, video 3, video 4
- Text Classification and Naive Bayes | (tex source file), notes, video 1, video 2, video 3
- Linear Models | (tex source file), video 1, video 2, video 3, video 4
- Neural Networks | (tex source file), video 1, video 2, video 3, video 4
- Word Vectors | (tex source file), video 1, video 2, video 3
- Sequence Labeling and Hidden Markov Models | (tex source file), notes, video 1, video 2, video 3, video 4
- MEMMs and CRFs | (tex source file), notes 1, notes 2, video 1, video 2, video 3
- Convolutional Neural Networks | (tex source file), video
- Recurrent Neural Networks | (tex source file), video 1, video 2, video 3
- Sequence to Sequence Models and Attention | (tex source file), video 1, video 2
- Transformer Architecture | (tex source file), video 1
- Contextualized Embeddings and Large Language Models, video 1, video 2, video 3
NLP Libraries and Tools
- NLTK: Natural Language Toolkit
- Gensim
- spaCy: Industrial-strength NLP
- Torchtext
- AllenNLP: Open source project for designing deep learning-based NLP models
- HuggingFace Transformers
- ChatGPT
- Google Bard
- Stanza - A Python NLP Library for Many Human Languages
- FlairNLP: A very simple framework for state-of-the-art Natural Language Processing (NLP)
- WEFE: The Word Embeddings Fairness Evaluation Framework
- WhatLies: A library that tries to help you understand "What lies in word embeddings?"
- LASER: a library to calculate and use multilingual sentence embeddings
- Sentence Transformers: Multilingual Sentence Embeddings using BERT / RoBERTa / XLM-RoBERTa & Co. with PyTorch
- Datasets: a lightweight library with one-line dataloaders for many public datasets in NLP
Notes and Books
- Speech and Language Processing (3rd ed. draft) by Dan Jurafsky and James H. Martin.
- Michael Collins' NLP notes.
- A Primer on Neural Network Models for Natural Language Processing by Yoav Goldberg.
- Natural Language Understanding with Distributed Representation by Kyunghyun Cho
- A Survey of Large Language Models
- Natural Language Processing Book by Jacob Eisenstein
- NLTK book
- Embeddings in Natural Language Processing by Mohammad Taher Pilehvar and Jose Camacho-Collados
- Dive into Deep Learning Book
- Contextual Word Representations: A Contextual Introduction by Noah A. Smith
Other NLP Courses
- CS224n: Natural Language Processing with Deep Learning, Stanford course
- Deep Learning in NLP: slides by Horacio Rodríguez
- David Bamman NLP Slides @Berkeley
- CS 521: Statistical Natural Language Processing by Natalie Parde, University of Illinois
- 10 Free Top Notch Natural Language Processing Courses
Videos
- Natural Language Processing MOOC videos by Dan Jurafsky and Chris Manning, 2012
- Natural Language Processing MOOC videos by Michael Collins, 2013
- Natural Language Processing with Deep Learning by Chris Manning and Richard Socher, 2017
- CS224N: Natural Language Processing with Deep Learning | Winter 2019
- Computational Linguistics I by Jordan Boyd-Graber, University of Maryland
- Visualizing and Understanding Recurrent Networks
- BERT Research Series by Chris McCormick
- Successes and Challenges in Neural Models for Speech and Language - Michael Collins
- More on Transformers: BERT and Friends by Jorge Pérez
Other Resources
- ACL Portal
- Awesome-nlp: A curated list of resources dedicated to Natural Language Processing
- NLP-progress: Repository to track the progress in Natural Language Processing (NLP)
- Corpora Mailing List
- 🤗 Open LLM Leaderboard
- Real World NLP Book: AllenNLP tutorials
- The Illustrated Transformer: a very illustrative blog post about the Transformer
- Better Language Models and Their Implications, OpenAI Blog
- RNN effectiveness
- SuperGLUE: a benchmark of Natural Language Understanding Tasks
- decaNLP The Natural Language Decathlon: a benchmark for studying general NLP models that can perform a variety of complex, natural language tasks.
- Chatbot and Related Research Paper Notes with Images
- Ben Trevett's torchtext tutorials
- PLMpapers: a collection of papers about Pre-Trained Language Models
- The Illustrated GPT-2 (Visualizing Transformer Language Models)
- Linguistics, NLP, and Interdisciplinarity Or: Look at Your Data, by Emily M. Bender
- The State of NLP Literature: Part I, by Saif Mohammad
- From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
- 10 ML & NLP Research Highlights of 2019 by Sebastian Ruder
- Towards a Conversational Agent that Can Chat About…Anything
- The Super Duper NLP Repo: a collection of Colab notebooks covering a wide array of NLP task implementations
- The Big Bad NLP Database, a collection of nearly 300 well-organized, sortable, and searchable natural language processing datasets
- A Primer in BERTology: What we know about how BERT works
- How Self-Attention with Relative Position Representations works
- Deep Learning Based Text Classification: A Comprehensive Review
- Teaching NLP is quite depressing, and I don't know how to do it well by Yoav Goldberg
- The NLP index
- 100 Must-Read NLP Papers