bentrevett/pytorch-pos-tagging

Language
Jupyter Notebook
License
MIT License
Created about 5 years ago
Updated over 3 years ago

A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.

PyTorch PoS Tagging

Note: This repo only works with torchtext 0.9 or above, which requires PyTorch 1.8 or above. If you are using torchtext 0.8, please use this branch.

This repo contains tutorials covering how to perform part-of-speech (PoS) tagging using PyTorch 1.8, torchtext 0.9, and spaCy 3.0, using Python 3.8.

These tutorials will cover getting started with the most common approach to PoS tagging: recurrent neural networks (RNNs). The first notebook introduces a bi-directional LSTM (BiLSTM) network. The second covers how to fine-tune a pretrained Transformer model.

If you find any mistakes or disagree with any of the explanations, please do not hesitate to submit an issue. I welcome any feedback, positive or negative!

Getting Started

To install PyTorch, see installation instructions on the PyTorch website.

To install TorchText:

pip install torchtext

To install the transformers library:

pip install transformers

We'll also make use of spaCy to tokenize our data. To install spaCy, follow the instructions here, making sure to install the English models:

python -m spacy download en_core_web_sm
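
Once everything is installed, the version requirements from the note at the top (torchtext 0.9 or above, PyTorch 1.8 or above) can be checked from Python. The following is a minimal sketch, not part of the tutorials: the helper names `version_tuple`, `meets_minimum` and `check_environment` are made up for illustration, and the comparison only looks at the leading numeric parts of a version string, so a pre-release such as `1.8.0rc1` is treated like `1.8.0`.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn a string such as '1.8.1+cu111' into (1, 8, 1).

    Local build tags after '+' and non-numeric suffixes are dropped.
    """
    public = version.split("+")[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment(requirements=(("torch", "1.8"), ("torchtext", "0.9"))):
    """Map each package name to True/False, or None if it is not installed."""
    report = {}
    for package, minimum in requirements:
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = None
            continue
        report[package] = meets_minimum(installed, minimum)
    return report


if __name__ == "__main__":
    for package, ok in check_environment().items():
        print(package, "not installed" if ok is None else ("ok" if ok else "too old"))
```

A `False` for torchtext would suggest switching to the torchtext 0.8 branch mentioned in the note.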

Tutorials

1 - BiLSTM for PoS Tagging

This tutorial covers the workflow of a PoS tagging project with PyTorch and TorchText. We introduce the basic TorchText concepts: defining how data is processed, using TorchText's datasets, and loading pre-trained embeddings. Using PyTorch, we build a strong baseline model: a multi-layer bi-directional LSTM. We also show how the model can be used for inference to tag any input text.
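
To give a rough idea of what a multi-layer bi-directional LSTM tagger looks like, here is a minimal sketch. It is not the code from the notebook: the class name `BiLSTMTagger`, the layer sizes, the dropout rate and the padding index are all assumptions chosen for illustration, and the input is random token indices rather than real data.

```python
import torch
import torch.nn as nn


class BiLSTMTagger(nn.Module):
    """Embedding -> multi-layer BiLSTM -> linear layer over tags (illustrative)."""

    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_tags,
                 num_layers=2, dropout=0.25, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(
            embedding_dim,
            hidden_dim,
            num_layers=num_layers,
            bidirectional=True,
            # nn.LSTM only applies dropout between stacked layers
            dropout=dropout if num_layers > 1 else 0.0,
            batch_first=True,
        )
        # forward and backward hidden states are concatenated, hence * 2
        self.fc = nn.Linear(hidden_dim * 2, num_tags)
        self.dropout = nn.Dropout(dropout)

    def forward(self, tokens):
        # tokens: [batch, seq_len] of vocabulary indices
        embedded = self.dropout(self.embedding(tokens))
        outputs, _ = self.lstm(embedded)  # [batch, seq_len, hidden_dim * 2]
        return self.fc(self.dropout(outputs))  # [batch, seq_len, num_tags]


# Toy usage with random indices (index 0 is reserved for padding here)
model = BiLSTMTagger(vocab_size=100, embedding_dim=16, hidden_dim=32, num_tags=5)
tokens = torch.randint(1, 100, (4, 7))
logits = model(tokens)
predictions = logits.argmax(dim=-1)  # one predicted tag index per token
```

The model produces one score per tag for every token, so the tag sequence has the same length as the input sentence.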

2 - Fine-tuning Pretrained Transformers for PoS Tagging

This tutorial covers how to fine-tune a pretrained Transformer model, provided by the transformers library, by integrating it with TorchText. We use a pretrained BERT model to produce embeddings for the input text and feed them to a linear layer that predicts a tag for each token.
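
The "BERT embeddings into a linear layer" idea can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the class name `BertTagger` and the dropout rate are made up, and to keep the example runnable offline it builds a tiny, randomly initialised BERT from a `BertConfig` instead of loading pretrained weights. For actual fine-tuning you would load a pretrained checkpoint with `BertModel.from_pretrained(...)` instead.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel


class BertTagger(nn.Module):
    """BERT encoder followed by a linear layer over tags (illustrative)."""

    def __init__(self, bert, num_tags, dropout=0.25):
        super().__init__()
        self.bert = bert
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(bert.config.hidden_size, num_tags)

    def forward(self, input_ids, attention_mask=None):
        # element 0 of the output is the last hidden state:
        # [batch, seq_len, hidden_size]
        hidden = self.bert(input_ids=input_ids, attention_mask=attention_mask)[0]
        return self.fc(self.dropout(hidden))  # [batch, seq_len, num_tags]


# Assumption: a tiny random BERT stands in for the pretrained model
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
bert = BertModel(config)

model = BertTagger(bert, num_tags=5)
input_ids = torch.randint(0, 100, (2, 6))
logits = model(input_ids)
```

Because the BERT parameters are part of the model, they are updated together with the linear layer during training, which is what makes this fine-tuning rather than using BERT as a fixed feature extractor.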

References

Here are some things I looked at while making these tutorials. Some of it may be out of date.
