  • Stars: 4,346
  • Rank: 9,897 (Top 0.2%)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created about 7 years ago
  • Updated 9 months ago

Repository Details

Tutorials on getting started with PyTorch and TorchText for sentiment analysis.

PyTorch Sentiment Analysis

Note: This repo only works with torchtext 0.9 or above, which requires PyTorch 1.8 or above. If you are using torchtext 0.8, please use this branch.

This repo contains tutorials covering how to do sentiment analysis with PyTorch 1.8, torchtext 0.9 and Python 3.7.

The first two tutorials cover getting started with the de facto approach to sentiment analysis: recurrent neural networks (RNNs). The third covers the FastText model, the fourth a convolutional neural network (CNN) model, the fifth multi-class classification, and the sixth pre-trained transformers.

There are also three bonus "appendix" notebooks. The first covers loading your own datasets with torchtext, the second takes a brief look at the pre-trained word embeddings provided by torchtext, and the third covers loading, saving and freezing embeddings.

If you find any mistakes or disagree with any of the explanations, please do not hesitate to submit an issue. I welcome any feedback, positive or negative!

Getting Started

To install PyTorch, see installation instructions on the PyTorch website.

To install torchtext:

pip install torchtext

We'll also use spaCy to tokenize our data. To install spaCy, follow the instructions on the spaCy website, making sure to install the English model with:

python -m spacy download en_core_web_sm

For tutorial 6, we'll use the transformers library, which can be installed via:

pip install transformers

These tutorials were created using version 4.3 of the transformers library.
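
Since the notebooks depend on minimum versions of PyTorch and torchtext, it can help to check what is installed before running them. The following is a minimal sketch, not part of the tutorials: the helper names `major_minor` and `check_versions` are made up for this example, and it assumes Python 3.8 or later, where `importlib.metadata` is in the standard library.

```python
import re
from importlib import metadata  # standard library from Python 3.8 onwards

# Minimum versions taken from the note at the top of this README.
MINIMUMS = {"torch": (1, 8), "torchtext": (0, 9)}


def major_minor(version):
    """Turn a string such as '1.8.1+cu111' into (1, 8), ignoring build suffixes."""
    numbers = re.findall(r"\d+", version.split("+")[0])
    return tuple(int(n) for n in numbers[:2])


def check_versions(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old' or 'not installed'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = major_minor(metadata.version(name))
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        report[name] = "ok" if installed >= minimum else "too old"
    return report


if __name__ == "__main__":
    for package, status in check_versions().items():
        print(f"{package}: {status}")
```

Only the major and minor components are compared, which is enough for the "0.9 or above" and "1.8 or above" requirements stated here.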

Tutorials

  • 1 - Simple Sentiment Analysis

    This tutorial covers the workflow of a PyTorch project that uses torchtext. We'll learn how to load data, create train/validation/test splits, build a vocabulary, create data iterators, define a model, and implement the train/evaluate/test loop. The model is simple and performs poorly, but the subsequent tutorials improve on it.
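
To make the first steps of tutorial 1's workflow concrete, here is a rough sketch of splitting data and building a vocabulary in plain Python. It does not use torchtext's own API: the function names, the split ratios, the special tokens and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter

UNK, PAD = "<unk>", "<pad>"


def split_examples(examples, ratios=(0.7, 0.15, 0.15), seed=1234):
    """Shuffle a copy of the examples and cut it into train/valid/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_valid = int(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


def build_vocab(token_lists, min_freq=1):
    """Map each sufficiently frequent token to an index; 0 and 1 are reserved."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    itos = [UNK, PAD] + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: idx for idx, tok in enumerate(itos)}


def numericalize(tokens, vocab):
    """Replace tokens by indices, falling back to the <unk> index."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]


# Toy data (hypothetical): (label, tokens) pairs.
data = [("pos", ["great", "film"]), ("neg", ["terrible", "film"]),
        ("pos", ["loved", "it"]), ("neg", ["hated", "it"])] * 5
train, valid, test = split_examples(data)
# The vocabulary is built from the training split only.
vocab = build_vocab(tokens for _, tokens in train)
print(len(train), len(valid), len(test))
print(numericalize(["great", "unseen", "film"], vocab))
```

Building the vocabulary from the training split alone keeps information from the validation and test sets out of the model's inputs.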

  • 2 - Upgraded Sentiment Analysis

    Now that the basic workflow is covered, this tutorial focuses on improving our results. We'll cover packed padded sequences, loading and using pre-trained word embeddings, different optimizers, different RNN architectures, bi-directional RNNs, multi-layer (aka deep) RNNs, and regularization.
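
Several of tutorial 2's upgrades can be combined in one small model. The sketch below is an illustration rather than the notebook's code: the class name, the hyperparameters and the toy batch are assumptions. It shows a packed padded sequence fed to a bi-directional, two-layer LSTM with dropout as regularization.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim, n_layers, dropout, pad_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=pad_idx)
        self.rnn = nn.LSTM(emb_dim, hid_dim, num_layers=n_layers,
                           bidirectional=True, dropout=dropout)
        self.fc = nn.Linear(hid_dim * 2, 1)
        self.dropout = nn.Dropout(dropout)

    def forward(self, text, lengths):
        # text: [seq_len, batch]; lengths: unpadded length of each sequence
        embedded = self.dropout(self.embedding(text))
        # Packing lets the LSTM skip the padded positions.
        packed = pack_padded_sequence(embedded, lengths.cpu(), enforce_sorted=False)
        _, (hidden, _) = self.rnn(packed)
        # hidden: [n_layers * 2, batch, hid_dim]; the last two entries are the
        # top layer's forward and backward final states.
        final = torch.cat((hidden[-2], hidden[-1]), dim=1)
        return self.fc(self.dropout(final)).squeeze(1)  # one logit per example


model = BiLSTMClassifier(vocab_size=100, emb_dim=16, hid_dim=32,
                         n_layers=2, dropout=0.5, pad_idx=1)
optimizer = torch.optim.Adam(model.parameters())  # one possible optimizer choice

text = torch.tensor([[5, 7], [6, 8], [9, 1]])  # two sequences; the second is padded with 1
lengths = torch.tensor([3, 2])
logits = model(text, lengths)
print(logits.shape)
```

Pre-trained vectors could be copied into `model.embedding.weight` before training; that step is left out here to keep the sketch self-contained.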

  • 3 - Faster Sentiment Analysis

    Having covered all the fancy upgrades to RNNs, we'll look at a different approach that does not use RNNs. More specifically, we'll implement the model from Bag of Tricks for Efficient Text Classification. This simple model achieves performance comparable to the Upgraded Sentiment Analysis model, but trains much faster.
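
One of the tricks in that paper is to add n-grams of adjacent tokens as extra features, so that some word order survives when the token embeddings are averaged. A minimal sketch for bigrams follows; the function name `generate_bigrams` and the choice to join each pair with a space are assumptions made for this example.

```python
def generate_bigrams(tokens):
    """Return the tokens followed by every adjacent pair joined by a space."""
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams


print(generate_bigrams(["this", "film", "is", "terrible"]))
# ['this', 'film', 'is', 'terrible', 'this film', 'film is', 'is terrible']
```

The augmented token list is then treated like any other sequence of tokens: each entry gets an embedding, the embeddings are averaged, and the average is passed to a linear layer.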

  • 4 - Convolutional Sentiment Analysis

    Next, we'll cover convolutional neural networks (CNNs) for sentiment analysis. This model will be an implementation of Convolutional Neural Networks for Sentence Classification.
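
As a rough picture of the CNN approach in tutorial 4, the sketch below applies filters of several widths over the embedded sentence, takes the maximum over time for each filter, and classifies the concatenated result. The class name, filter sizes and dimensions are assumptions for illustration, not the notebook's exact settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim, n_filters, filter_sizes,
                 output_dim, dropout, pad_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=pad_idx)
        # One 1D convolution per filter width, each sliding along the sentence.
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, kernel_size=fs) for fs in filter_sizes])
        self.fc = nn.Linear(len(filter_sizes) * n_filters, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, text):
        # text: [batch, seq_len]
        embedded = self.embedding(text).permute(0, 2, 1)   # [batch, emb_dim, seq_len]
        conved = [F.relu(conv(embedded)) for conv in self.convs]
        pooled = [c.max(dim=2).values for c in conved]     # max over time
        return self.fc(self.dropout(torch.cat(pooled, dim=1)))


model = TextCNN(vocab_size=100, emb_dim=16, n_filters=8, filter_sizes=(3, 4, 5),
                output_dim=1, dropout=0.5, pad_idx=1)
batch = torch.randint(0, 100, (4, 10))  # 4 sequences of 10 token indices
print(model(batch).shape)
```

Note that every input must be at least as long as the widest filter, so short sentences need padding before they reach this model.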

  • 5 - Multi-class Sentiment Analysis

    Then we'll cover the case where we have more than 2 classes, as is common in NLP. We'll be using the CNN model from the previous notebook and a new dataset which has 6 classes.
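
Moving from 2 classes to 6 mainly changes the output layer, the loss and how accuracy is computed. The following sketch shows the loss and accuracy parts with made-up logits; the function name `categorical_accuracy` and the example values are assumptions for illustration.

```python
import torch
import torch.nn as nn

N_CLASSES = 6
criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels


def categorical_accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    predictions = logits.argmax(dim=1)
    return (predictions == labels).float().mean().item()


# Hypothetical model outputs: one row of N_CLASSES logits per example.
logits = torch.tensor([[2.0, 0.1, 0.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 3.0, 0.0, 0.0, 0.0],
                       [0.0, 1.5, 0.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]])
labels = torch.tensor([0, 2, 4, 5])

loss = criterion(logits, labels)
print(categorical_accuracy(logits, labels))  # 0.75
```

With a model such as a CNN, the only structural change is that the final linear layer outputs 6 values instead of 1.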

  • 6 - Transformers for Sentiment Analysis

    Finally, we'll show how to use the transformers library to load a pre-trained transformer model, specifically the BERT model from this paper, and use it to provide the embeddings for text. These embeddings can be fed into any model to predict sentiment; here we use a gated recurrent unit (GRU).
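
The idea in tutorial 6, a fixed encoder supplying embeddings to a trainable GRU, can be sketched without downloading any weights. In the code below a plain `nn.Embedding` layer stands in for BERT, which is an assumption made so the example runs offline; the class name and sizes are also illustrative. With the transformers library the encoder would instead be a pre-trained BERT model, and its per-token hidden states (the first element of the model's output) would take the place of `embedded`.

```python
import torch
import torch.nn as nn


class GRUSentiment(nn.Module):
    def __init__(self, encoder, emb_dim, hid_dim, n_layers=2, dropout=0.25):
        super().__init__()
        self.encoder = encoder
        for param in self.encoder.parameters():
            param.requires_grad = False  # keep the encoder fixed during training
        self.rnn = nn.GRU(emb_dim, hid_dim, num_layers=n_layers,
                          bidirectional=True, batch_first=True, dropout=dropout)
        self.out = nn.Linear(hid_dim * 2, 1)
        self.dropout = nn.Dropout(dropout)

    def forward(self, token_ids):
        # token_ids: [batch, seq_len]
        with torch.no_grad():
            embedded = self.encoder(token_ids)  # [batch, seq_len, emb_dim]
        _, hidden = self.rnn(embedded)          # [n_layers * 2, batch, hid_dim]
        final = torch.cat((hidden[-2], hidden[-1]), dim=1)
        return self.out(self.dropout(final)).squeeze(1)


# Stand-in encoder (assumption): replaces the pre-trained BERT model here.
encoder = nn.Embedding(1000, 48)
model = GRUSentiment(encoder, emb_dim=48, hid_dim=32)
logits = model(torch.randint(0, 1000, (4, 12)))
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(logits.shape, trainable)
```

Freezing the encoder means only the GRU and the output layer are updated, which keeps the number of trainable parameters small.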

Appendices

  • A - Using TorchText with your Own Datasets

    The tutorials use TorchText's built-in datasets. This first appendix notebook covers how to load your own datasets using TorchText.
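
As a rough idea of what loading your own data involves, the sketch below reads a JSON-lines file into (label, tokens) pairs using only the standard library. It is not TorchText's API: the function name `read_jsonl`, the field names `text` and `label`, and the whitespace tokenizer are all assumptions for this example.

```python
import io
import json


def read_jsonl(handle, text_key="text", label_key="label", tokenize=str.split):
    """Yield (label, tokens) pairs from a file with one JSON object per line."""
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield record[label_key], tokenize(record[text_key])


# Hypothetical file contents; in practice an opened file would be passed instead.
raw = io.StringIO(
    '{"text": "a great film", "label": "pos"}\n'
    '{"text": "dull and slow", "label": "neg"}\n'
)
examples = list(read_jsonl(raw))
print(examples)
# [('pos', ['a', 'great', 'film']), ('neg', ['dull', 'and', 'slow'])]
```

A different tokenizer, such as one based on spaCy, can be supplied through the `tokenize` argument.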

  • B - A Closer Look at Word Embeddings

    This appendix notebook takes a brief look at the pre-trained word embeddings provided by TorchText, using them to find similar words and to implement a basic spelling error corrector based entirely on word embeddings.
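
Finding similar words comes down to comparing vectors. The sketch below ranks words by cosine similarity, which is one common choice of measure and not necessarily the one the notebook uses. The four words and their 3-dimensional vectors are invented for illustration; real pre-trained embeddings have far more dimensions and a much larger vocabulary.

```python
import torch
import torch.nn.functional as F

# Hypothetical vocabulary and vectors, one row per word.
words = ["good", "great", "bad", "film"]
vectors = torch.tensor([[0.9, 0.1, 0.0],
                        [0.8, 0.2, 0.1],
                        [-0.9, 0.1, 0.0],
                        [0.0, 0.1, 0.9]])


def closest_words(query, n=2):
    """Return the n words whose vectors are most similar to the query's vector."""
    idx = words.index(query)
    query_vec = vectors[idx].unsqueeze(0).expand_as(vectors)
    sims = F.cosine_similarity(query_vec, vectors, dim=1)
    sims[idx] = float("-inf")  # exclude the query word itself
    best = sims.topk(n).indices.tolist()
    return [words[i] for i in best]


print(closest_words("good", n=1))  # ['great']
```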

  • C - Loading, Saving and Freezing Embeddings

    In this notebook we cover how to load custom word embeddings, how to freeze and unfreeze word embeddings whilst training our models, and how to save our learned embeddings so they can be used in another model.
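
The three operations in appendix C map onto a few PyTorch calls. The sketch below is one way to do it, under the assumption that the custom embeddings are already available as a tensor; the random vectors and the file name are placeholders.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical custom embeddings: one row per vocabulary entry.
custom_vectors = torch.randn(10, 4)

# Load the vectors into an embedding layer, frozen to begin with.
embedding = nn.Embedding.from_pretrained(custom_vectors.clone(), freeze=True)
assert not embedding.weight.requires_grad

# Unfreeze so an optimizer can update the vectors later in training.
embedding.weight.requires_grad = True

# Save the (possibly fine-tuned) weights and load them into another layer.
path = os.path.join(tempfile.mkdtemp(), "embeddings.pt")
torch.save(embedding.weight.detach().clone(), path)
reloaded = nn.Embedding.from_pretrained(torch.load(path), freeze=False)
print(torch.equal(reloaded.weight, embedding.weight))  # True
```

A common pattern is to train with the embeddings frozen for a few epochs and unfreeze them afterwards, so the rest of the model settles before the vectors start to move.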
