Natural Language Processing

CC6205 - Natural Language Processing

This is a course on natural language processing.

Info

This course aims to provide a comprehensive introduction to Natural Language Processing (NLP) by covering essential concepts. We strive to strike a balance between traditional techniques, such as N-gram language models, Naive Bayes, and Hidden Markov Models (HMMs), and modern deep neural networks, including word embeddings, recurrent neural networks (RNNs), and transformers.

The course material draws from various sources. In many instances, sentences from these sources are directly incorporated into the slides. The neural network topics primarily rely on the book Neural Network Methods for Natural Language Processing by Goldberg. Non-neural network topics, such as Probabilistic Language Models, Naive Bayes, and HMMs, are sourced from Michael Collins' course and Dan Jurafsky's book. Additionally, some slides are adapted from online tutorials and other courses, such as Manning's Stanford course.

Slides

  1. Introduction to Natural Language Processing | (tex source file), video 1, video 2
  2. Vector Space Model and Information Retrieval | (tex source file), video 1, video 2
  3. Probabilistic Language Models | (tex source file), notes, video 1, video 2, video 3, video 4
  4. Text Classification and Naive Bayes | (tex source file), notes, video 1, video 2, video 3
  5. Linear Models | (tex source file), video 1, video 2, video 3, video 4
  6. Neural Networks | (tex source file), video 1, video 2, video 3, video 4
  7. Word Vectors | (tex source file), video 1, video 2, video 3
  8. Sequence Labeling and Hidden Markov Models | (tex source file), notes, video 1, video 2, video 3, video 4
  9. MEMMs and CRFs | (tex source file), notes 1, notes 2, video 1, video 2, video 3
  10. Convolutional Neural Networks | (tex source file), video
  11. Recurrent Neural Networks | (tex source file), video 1, video 2, video 3
  12. Sequence to Sequence Models and Attention | (tex source file), video 1, video 2
  13. Transformer Architecture | (tex source file), video 1
  14. Contextualized Embeddings and Large Language Models | video 1, video 2, video 3

NLP Libraries and Tools

  1. NLTK: Natural Language Toolkit
  2. Gensim
  3. spaCy: Industrial-strength NLP
  4. Torchtext
  5. AllenNLP: Open source project for designing deep learning-based NLP models
  6. HuggingFace Transformers
  7. ChatGPT
  8. Google Bard
  9. Stanza - A Python NLP Library for Many Human Languages
  10. FlairNLP: A very simple framework for state-of-the-art Natural Language Processing (NLP)
  11. WEFE: The Word Embeddings Fairness Evaluation Framework
  12. WhatLies: A library that tries to help you understand "What lies in word embeddings?"
  13. LASER: a library to calculate and use multilingual sentence embeddings
  14. Sentence Transformers: Multilingual Sentence Embeddings using BERT / RoBERTa / XLM-RoBERTa & Co. with PyTorch
  15. Datasets: a lightweight library with one-line dataloaders for many public datasets in NLP

Notes and Books

  1. Speech and Language Processing (3rd ed. draft) by Dan Jurafsky and James H. Martin.
  2. Michael Collins' NLP notes.
  3. A Primer on Neural Network Models for Natural Language Processing by Yoav Goldberg.
  4. Natural Language Understanding with Distributed Representation by Kyunghyun Cho
  5. A Survey of Large Language Models
  6. Natural Language Processing Book by Jacob Eisenstein
  7. NLTK book
  8. Embeddings in Natural Language Processing by Mohammad Taher Pilehvar and Jose Camacho-Collados
  9. Dive into Deep Learning Book
  10. Contextual Word Representations: A Contextual Introduction by Noah A. Smith

Other NLP Courses

  1. CS224n: Natural Language Processing with Deep Learning, Stanford course
  2. Deep Learning in NLP: slides by Horacio Rodríguez
  3. David Bamman NLP Slides @Berkeley
  4. CS 521: Statistical Natural Language Processing by Natalie Parde, University of Illinois
  5. 10 Free Top Notch Natural Language Processing Courses

Videos

  1. Natural Language Processing MOOC videos by Dan Jurafsky and Chris Manning, 2012
  2. Natural Language Processing MOOC videos by Michael Collins, 2013
  3. Natural Language Processing with Deep Learning by Chris Manning and Richard Socher, 2017
  4. CS224N: Natural Language Processing with Deep Learning | Winter 2019
  5. Computational Linguistics I by Jordan Boyd-Graber, University of Maryland
  6. Visualizing and Understanding Recurrent Networks
  7. BERT Research Series by Chris McCormick
  8. Successes and Challenges in Neural Models for Speech and Language - Michael Collins
  9. More on Transformers: BERT and Friends by Jorge Pérez

Other Resources

  1. ACL Portal
  2. Awesome-nlp: A curated list of resources dedicated to Natural Language Processing
  3. NLP-progress: Repository to track the progress in Natural Language Processing (NLP)
  4. Corpora Mailing List
  5. 🤗 Open LLM Leaderboard
  6. Real World NLP Book: AllenNLP tutorials
  7. The Illustrated Transformer: a very illustrative blog post about the Transformer
  8. Better Language Models and Their Implications (OpenAI blog)
  9. RNN effectiveness
  10. SuperGLUE: a benchmark of Natural Language Understanding tasks
  11. decaNLP (The Natural Language Decathlon): a benchmark for studying general NLP models that can perform a variety of complex natural language tasks
  12. Chatbot and Related Research Paper Notes with Images
  13. Ben Trevett's torchtext tutorials
  14. PLMpapers: a collection of papers about Pre-Trained Language Models
  15. The Illustrated GPT-2 (Visualizing Transformer Language Models)
  16. Linguistics, NLP, and Interdisciplinarity Or: Look at Your Data, by Emily M. Bender
  17. The State of NLP Literature: Part I, by Saif Mohammad
  18. From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
  19. 10 ML & NLP Research Highlights of 2019 by Sebastian Ruder
  20. Towards a Conversational Agent that Can Chat About…Anything
  21. The Super Duper NLP Repo: a collection of Colab notebooks covering a wide array of NLP task implementations
  22. The Big Bad NLP Database, a collection of nearly 300 well-organized, sortable, and searchable natural language processing datasets
  23. A Primer in BERTology: What we know about how BERT works
  24. How Self-Attention with Relative Position Representations works
  25. Deep Learning Based Text Classification: A Comprehensive Review
  26. Teaching NLP is quite depressing, and I don't know how to do it well by Yoav Goldberg
  27. The NLP index
  28. 100 Must-Read NLP Papers
