ODSC Europe 2020 Workshop
Advanced NLP: From Essentials to Deep Transfer Learning
Abstract:
Specialization in domains like computer vision and natural language processing is no longer a luxury but a necessity expected of any data scientist in today's fast-paced world! Through a hands-on, interactive approach, we will cover essential NLP concepts alongside extensive examples, so you can master state-of-the-art tools, techniques and methodologies for actually applying NLP to solve real-world problems. We will leverage machine learning, deep learning and deep transfer learning to tackle popular NLP tasks including NER, classification, recommendation / information retrieval, summarization, language translation, Q&A and topic models.
Session Outline
Module 1: NLP Essentials
Here we start with the basics of how to process and work with text data and strings. We will look at the essential components of an NLP pipeline and get started on some of the key ones, including POS tagging, Named Entity Recognition and text pre-processing. We will cover traditional approaches as well as newer deep transfer learning based approaches for a few of these components.
Key Focus Areas: Text Pre-processing, NER, POS Tagging
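To make these components concrete, here is a minimal sketch of text pre-processing, POS tagging and NER, assuming spaCy and its small English model en_core_web_sm (one possible tool choice for illustration; the workshop notebooks may use others):

```python
# Minimal NLP pipeline sketch using spaCy (assumed tool choice).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Text pre-processing: lowercased lemmas, with stopwords and punctuation removed
tokens = [tok.lemma_.lower() for tok in doc if not tok.is_stop and not tok.is_punct]
print(tokens)

# POS tagging
print([(tok.text, tok.pos_) for tok in doc])

# Named Entity Recognition
print([(ent.text, ent.label_) for ent in doc.ents])
```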
Module 2: Text Representation
Text can't be consumed directly by downstream machine learning and deep learning models, since these are, at heart, math-based models. The key focus of this module is to cover both traditional statistical methodologies and newer representation learning methodologies that use deep learning to represent text data, including bag of words, n-grams, word embeddings, universal embeddings and contextual embeddings.
Key Focus Areas: Count-based Representations (Bag of Words, N-grams, TF-IDF), Similarity, Topics, Word Embeddings (Word2Vec, GloVe, FastText), Universal Embeddings, Contextual Embeddings (Transformers)
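As a quick illustration of count-based representations and similarity, here is a small sketch assuming scikit-learn (consistent with the listed prerequisites; the corpus and settings are purely illustrative):

```python
# Count-based text representations and document similarity with scikit-learn (illustrative sketch).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the sky is blue",
    "the sun is bright",
    "the sun in the sky is bright",
]

# Bag of words with unigrams and bigrams (n-grams)
bow = CountVectorizer(ngram_range=(1, 2)).fit_transform(corpus)
print(bow.toarray().shape)

# TF-IDF weighting and cosine similarity between documents
tfidf = TfidfVectorizer().fit_transform(corpus)
print(cosine_similarity(tfidf[0], tfidf[2]))  # similarity between documents 0 and 2
```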
Module 3: NLP Applications (Machine Learning / Deep Learning)
We will look at several popular applications of NLP in this module and go through hands-on examples. These include movie recommendation systems using similarity, topic modeling analysis of research papers, text document summarization, language translation, text classification and sentiment analysis.
Key Focus Areas: Topic Models, Similarity / Information Retrieval, Summarization (TextRank / Transformers), Language Translation (seq2seq / attention), Classification (machine learning & deep learning models)
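As one example of what these hands-on exercises look like, below is a small topic-modeling sketch using scikit-learn's LatentDirichletAllocation on a toy corpus (an assumed setup for illustration; the workshop material may use other libraries or larger datasets):

```python
# Topic modeling sketch with Latent Dirichlet Allocation (illustrative toy corpus).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "neural networks for image classification",
    "convolutional networks learn visual features",
    "stock prices and market volatility analysis",
    "portfolio risk and market returns",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

# Fit a 2-topic LDA model and print the top terms per topic
lda = LatentDirichletAllocation(n_components=2, random_state=42).fit(X)
terms = vec.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-4:]]
    print(f"Topic {idx}: {top_terms}")
```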
Module 4: NLP Applications with Deep Transfer Learning
Finally, we dive into some of the most significant advancements of the last few years in NLP, made possible by deep transfer learning. We will build a solid conceptual understanding of the transformer architecture and work through hands-on examples of text classification and multi-task NLP with transformers, solving NER, Q&A, sentiment analysis, summarization and translation using effective constructs like the transformers pipeline.
Key Focus Areas: Text Classification (with pre-trained embeddings, universal sentence encoders and transformers), Multi-task NLP with transformer pipelines (sentiment analysis, NER, text generation, summarization, question-answering, translation), Fine-tuning / training transformers (tips and guidelines) with examples, e.g. NER
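A minimal sketch of multi-task NLP with the Hugging Face transformers pipeline construct, assuming the transformers library is installed (default pre-trained models are downloaded on first use; the specific models and inputs are illustrative):

```python
# Multi-task NLP with Hugging Face transformers pipelines (minimal sketch).
from transformers import pipeline

# Sentiment analysis
sentiment = pipeline("sentiment-analysis")
print(sentiment("The workshop was incredibly useful!"))

# Named Entity Recognition
ner = pipeline("ner", grouped_entities=True)
print(ner("Hugging Face is based in New York City."))

# Question answering
qa = pipeline("question-answering")
print(qa(question="Where is Hugging Face based?",
         context="Hugging Face is based in New York City."))

# Summarization
summarizer = pipeline("summarization")
print(summarizer("Transformers use self-attention to model relationships between all "
                 "tokens in a sequence, which makes them very effective for transfer "
                 "learning across many NLP tasks.", max_length=30, min_length=5))
```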
Background Knowledge
Skills: Basic understanding of Machine Learning and Deep Learning (though we will cover some essentials)
Tools / Languages: Python, TensorFlow/Keras/PyTorch, Scikit-Learn (basics)