NLP in Python!
Natural Language Processing (NLP) in Python tutorial given at the PyCon 2020 remote conference.
Link to video: https://youtu.be/vyOgWhwUmec
Resources
Helpful resources for the topics covered in the video:
Good libraries for NLP:
- spaCy: https://spacy.io/api
- TextBlob: https://textblob.readthedocs.io/en/dev/quickstart.html
- NLTK: https://www.nltk.org/
Bag of words
Overview: https://machinelearningmastery.com/gentle-introduction-bag-words-model/
scikit-learn code: https://scikit-learn.org/stable/modules/feature_extraction.html#text-feature-extraction
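As a rough illustration of the bag-of-words idea, here is a minimal sketch in plain Python. It is not the scikit-learn implementation; the helper names `build_vocabulary` and `vectorize`, the whitespace tokenizer, and the sample sentences are all assumptions made for this example.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = {token for doc in documents for token in doc.lower().split()}
    return {token: index for index, token in enumerate(sorted(tokens))}


def vectorize(document, vocabulary):
    """Return token counts aligned with the vocabulary's column order."""
    counts = Counter(document.lower().split())
    # Dicts preserve insertion order, so this follows the sorted vocabulary.
    return [counts.get(token, 0) for token in vocabulary]


# Hypothetical sample documents, for illustration only.
documents = ["I love the book", "the book is a great book"]
vocabulary = build_vocabulary(documents)
vectors = [vectorize(doc, vocabulary) for doc in documents]

print(list(vocabulary))  # ['a', 'book', 'great', 'i', 'is', 'love', 'the']
print(vectors[0])        # [0, 1, 0, 1, 0, 1, 1]
print(vectors[1])        # [1, 2, 1, 0, 1, 0, 1]
```

Each document becomes a fixed-length count vector, and word order is discarded. A real pipeline would likely use a proper tokenizer rather than `str.split`.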
Word Vectors
Overview: https://medium.com/@jayeshbahire/introduction-to-word-vectors-ea1d4e4b84bf
spaCy info: https://spacy.io/usage/vectors-similarity
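Word vectors are commonly compared with cosine similarity. The sketch below shows that calculation in plain Python. The three-dimensional vectors are made-up toy values, not output from spaCy or any trained model, and `cosine_similarity` is a helper written for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors chosen by hand for illustration (assumption, not real embeddings).
toy_vectors = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

king_queen = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
king_apple = cosine_similarity(toy_vectors["king"], toy_vectors["apple"])

# With these toy values, "king" is closer to "queen" than to "apple".
print(king_queen > king_apple)
```

Real word vectors typically have hundreds of dimensions, but the comparison works the same way.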
Regexes
Python overview: https://docs.python.org/3/howto/regex.html
Regex Cheatsheet: https://cheatography.com/davechild/cheat-sheets/regular-expressions/
Regex tester: https://regex101.com/
Regex golf (to practice): https://alf.nu/RegexGolf
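For a quick feel of Python's built-in `re` module, here is a small sketch that finds and redacts phone-number-like strings. The pattern and the sample text are assumptions for illustration; real phone formats vary far more than this.

```python
import re

# Three digits, three digits, four digits, separated by hyphens.
PHONE_PATTERN = re.compile(r"\b(\d{3})-(\d{3})-(\d{4})\b")

text = "Call 555-123-4567 or 555-987-6543 after 5pm."

# finditer yields match objects; group(0) is the full matched string.
numbers = [match.group(0) for match in PHONE_PATTERN.finditer(text)]

# When the pattern has groups, findall returns one tuple of groups per match.
parts = PHONE_PATTERN.findall(text)

# sub replaces every match with the given replacement string.
redacted = PHONE_PATTERN.sub("[phone]", text)

print(numbers)   # ['555-123-4567', '555-987-6543']
print(parts)     # [('555', '123', '4567'), ('555', '987', '6543')]
print(redacted)  # Call [phone] or [phone] after 5pm.
```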
Stemming/Lemmatizing
Overview & NLTK Code: https://www.guru99.com/stemming-lemmatization-python-nltk.html
spaCy: https://spacy.io/api/lemmatizer
Stopwords
Quick overview + code: https://www.geeksforgeeks.org/removing-stop-words-nltk-python/
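Stop-word removal can be sketched without any library. The tiny `STOP_WORDS` set below is a hand-picked assumption for this example; NLTK and spaCy ship much longer lists. The `remove_stop_words` helper is likewise hypothetical.

```python
# A deliberately tiny, hand-picked stop-word list (assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop tokens found in STOP_WORDS."""
    return [token for token in text.lower().split() if token not in STOP_WORDS]


filtered = remove_stop_words("The plot of the movie is easy to follow")
print(filtered)  # ['plot', 'movie', 'easy', 'follow']
```

Which words count as stop words depends on the task, so it may be worth reviewing any default list before applying it.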
Parts of speech
TextBlob usage: https://textblob.readthedocs.io/en/dev/api_reference.html
List of tags: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
Categorizing and POS tagging: https://www.learntek.org/blog/categorizing-pos-tagging-nltk-python/
Transformers
Attention is all you need: https://arxiv.org/pdf/1706.03762.pdf
Good overview of these architectures: https://www.youtube.com/watch?v=TQQlZhbC5ps
Illustrated Transformer: http://jalammar.github.io/illustrated-transformer/
Transformer Types:
BERT: https://arxiv.org/pdf/1810.04805.pdf
OpenAI GPT-2: https://openai.com/blog/better-language-models/