• Stars: 107
• Rank: 323,587 (Top 7 %)
• Language: Jupyter Notebook
• License: MIT License
• Created almost 6 years ago
• Updated almost 6 years ago

Repository Details

Applying NLP transfer learning techniques to predict Tweet stance toward a topic

Stance Classification of Tweets using Transfer Learning

Applying transfer learning (using existing neural network architectures) to perform stance classification of Tweets as per the SemEval 2016 Stance Detection Task.

The methodology is described in detail in this Medium post, which also compares the transfer learning approaches used.

For subtask A, the goal is to classify Tweets written in response to a particular topic into one of three classes: Favor, Against and None. The provided notebooks attempt this using a deep learning technique called transfer learning. While transfer learning has been ubiquitous in computer vision applications since the success of ImageNet, significant progress on transfer learning for NLP applications has only been made since 2017-18. A string of interesting papers in 2018 discusses the power of language models in natural language understanding and how they can provide pre-trained representations of a language's syntax, which can be a far more useful starting point when training a neural network for previously unseen tasks.
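As a rough illustration of how the task is framed, each example pairs a target topic with a Tweet and carries one of the three stance labels. The sketch below is not taken from the notebooks: the `StanceExample` class, the `encode_stance` helper, the class ordering and the sample Tweet are all hypothetical, chosen only to show the shape of the data.

```python
from dataclasses import dataclass

# The three classes named above. The ordering (and hence the integer
# indices) is an arbitrary choice made for this sketch.
STANCES = ("Favor", "Against", "None")
STANCE_TO_INDEX = {name.upper(): i for i, name in enumerate(STANCES)}


@dataclass
class StanceExample:
    target: str  # topic the Tweet responds to
    tweet: str   # raw Tweet text
    stance: str  # one of STANCES


def encode_stance(label: str) -> int:
    """Map a stance label to an integer class index, ignoring case."""
    key = label.strip().upper()
    if key not in STANCE_TO_INDEX:
        raise ValueError(f"unknown stance label: {label!r}")
    return STANCE_TO_INDEX[key]


# Made-up example, not drawn from the dataset.
example = StanceExample(
    target="Climate Change is a Real Concern",
    tweet="We need to act now before it is too late.",
    stance="Favor",
)
print(example.target, "->", encode_stance(example.stance))
```

A classifier head on top of a pre-trained language model would then predict one of these three indices for each (target, Tweet) pair.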

Analysis Notebooks

See the included Jupyter notebooks for the stance classification workflow using ULMFiT and the OpenAI transformer.

Method 1: ULMFiT

ulmfit.ipynb: (LSTM-based approach)

Method 2: OpenAI Transformer

transformer.ipynb: (Transformer-based approach)

Module Installation

The sections below highlight the installation steps for each approach used. Python 3.6+ and PyTorch 1.0.0 are used for all the work shown.

Set up virtual environment:

python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt

Once the virtual environment has been set up, activate it for further development.

source venv/bin/activate

PyTorch requirements

Install the latest version of PyTorch (1.0+) as shown below:

pip3 install -r pytorch-requirements.txt
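After installing, it can help to confirm that the active environment meets the stated minimums (Python 3.6+, PyTorch 1.0+). The following is a minimal sketch, not part of the repository; `version_tuple` and `meets_minimum` are hypothetical helper names, and the check simply reports when PyTorch is not installed.

```python
import sys
from importlib import util


def version_tuple(version: str) -> tuple:
    """Turn '1.0.1.post2' or '1.13.1+cu117' into a tuple such as (1, 0, 1)."""
    parts = []
    # Drop any local build suffix after '+', then keep leading numeric pieces.
    for piece in version.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(version: str, minimum: tuple) -> bool:
    """Return True if the version string is at least the minimum tuple."""
    return version_tuple(version) >= minimum


print("Python 3.6+:", sys.version_info >= (3, 6))

if util.find_spec("torch") is None:
    print("PyTorch is not installed in this environment")
else:
    import torch

    print("PyTorch 1.0+:", meets_minimum(torch.__version__, (1, 0)))
```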

ULMFiT with the fastai framework

This utilizes the fastai framework (built on top of PyTorch) to perform stance classification.

The notebook ulmfit.ipynb uses v1 of fastai, which has been refactored for efficiency and updated to move forward with future PyTorch versions (1.0+).

Install fastai as shown below:

pip3 install fastai

spaCy language model

For tokenization, fastai uses the spaCy library's English language model. This has to be downloaded manually:

python3 -m spacy download en

Evaluation

To evaluate the F1 score as per the SemEval 2016 Task 6 guidelines, use the Perl script given in data/eval/ as shown:

perl eval.pl -u

---------------------------
Usage:
perl eval.pl goldFile guessFile

goldFile:  file containing gold standards;
guessFile: file containing your prediction.

These two files have the same format:
ID<Tab>Target<Tab>Tweet<Tab>Stance
Only stance labels may be different between them!
---------------------------
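For a quick sanity check inside a notebook, the score can be approximated in Python. The sketch below is not a replacement for `eval.pl`, which remains the reference. It assumes the task metric is the average of the F1 scores for the Favor and Against classes, that labels are spelled `FAVOR`, `AGAINST` and `NONE` in the files, and that a header row starting with `ID` may be present. The function names and the in-memory sample data are hypothetical.

```python
import csv
import io


def read_stances(handle):
    """Return {tweet ID: stance} from ID<Tab>Target<Tab>Tweet<Tab>Stance rows."""
    stances = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank or short rows and an optional header row (assumption).
        if len(row) < 4 or row[0] == "ID":
            continue
        stances[row[0]] = row[3].strip().upper()
    return stances


def f1_for_label(gold, guess, label):
    """F1 score for a single stance label."""
    true_pos = sum(1 for k, v in gold.items() if v == label and guess.get(k) == label)
    n_guess = sum(1 for v in guess.values() if v == label)
    n_gold = sum(1 for v in gold.values() if v == label)
    precision = true_pos / n_guess if n_guess else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def favor_against_f1(gold, guess):
    """Average of the F1 scores for FAVOR and AGAINST (NONE is not averaged in)."""
    return (f1_for_label(gold, guess, "FAVOR") + f1_for_label(gold, guess, "AGAINST")) / 2


# Tiny made-up files held in memory; real use would open goldFile and guessFile.
gold_text = (
    "ID\tTarget\tTweet\tStance\n"
    "1\tT\ttweet a\tFAVOR\n"
    "2\tT\ttweet b\tAGAINST\n"
    "3\tT\ttweet c\tNONE\n"
    "4\tT\ttweet d\tFAVOR\n"
)
guess_text = (
    "ID\tTarget\tTweet\tStance\n"
    "1\tT\ttweet a\tFAVOR\n"
    "2\tT\ttweet b\tFAVOR\n"
    "3\tT\ttweet c\tNONE\n"
    "4\tT\ttweet d\tFAVOR\n"
)
gold = read_stances(io.StringIO(gold_text))
guess = read_stances(io.StringIO(guess_text))
print(round(favor_against_f1(gold, guess), 4))
```

With real files, the two `io.StringIO` objects would be replaced by handles from `open(path, encoding=...)`, using whatever encoding the dataset files are stored in.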
