Fundamentals of NLP
(Work in Progress!)
Natural language processing (NLP) has made substantial advances in the past few years, driven by the success of modern deep learning techniques. As NLP grows in popularity and large-scale data becomes available in many forms, it is more important than ever to understand the inner workings of NLP techniques and concepts from first principles, because they are finding their way into real-world applications that affect society at large. Building intuition and having a solid grasp of the concepts are both important for coming up with innovative techniques, improving research, and building safe, human-centered AI and NLP technologies.
We introduce a new series called Fundamentals of NLP, where we aim to teach important NLP techniques and concepts starting from first principles. We will introduce the theory and motivation behind each concept covered in the series. Then we will gain hands-on experience by using bootstrap methods, industry-standard tools, and other open-source libraries to implement the different techniques. Along the way, we will also cover best practices, share important references, point out common mistakes to avoid when training and building NLP models, and discuss what lies ahead.
Join our Slack community to find out more about this and other ongoing projects. Feel free to reach out to me on Twitter for an invite to our Slack group.
Chapters
Chapter 1: Tokenization, Lemmatization, Stemming, and Sentence Segmentation -- Colab notebook, Web version
How to Contribute?
- You can check out our Project page to see all the ongoing tasks or issues related to this research project. Look out for the main nlp_fundamentals tag. Issues with the "good first issue" tag are good tasks to get started with.
- You can also just check the issues tab.
- You can ask anything related to this project in our Slack group.
- Slack channel: #nlp_fundamentals