  • Stars: 195
  • Rank: 199,374 (Top 4%)
  • Language: Python
  • License: MIT License
  • Created over 7 years ago
  • Updated over 6 years ago

Repository Details

Deep-learning model presented in "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis".

Overview

This repository contains the source code for the models used in the DataStories team's submission to SemEval-2017 Task 4 “Sentiment Analysis in Twitter”. The model is described in the paper "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis".

Citation:

@InProceedings{baziotis-pelekis-doulkeridis:2017:SemEval2,
  author    = {Baziotis, Christos  and  Pelekis, Nikos  and  Doulkeridis, Christos},
  title     = {DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis},
  booktitle = {Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)},
  month     = {August},
  year      = {2017},
  address   = {Vancouver, Canada},
  publisher = {Association for Computational Linguistics},
  pages     = {747--754}
}

Figure: The message-level sentiment analysis model, for Subtask A.

Figure: The target-based sentiment analysis model, for Subtasks B, C, D, E.
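To illustrate the "attention" part of a deep LSTM with attention, the plain-Python sketch below collapses a sequence of hidden states into a single vector. Each state receives a score, the scores are softmax-normalized into weights, and the output is the weighted sum of the states. The tanh scoring function and the names attention_pool, w and b are assumptions made for illustration. This is not the Keras layer in keras_models.py.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attention_pool(hidden: List[Vector], w: Vector, b: float) -> Tuple[Vector, Vector]:
    """Hypothetical attention pooling over a sequence of hidden states.

    Assumed scoring: e_i = tanh(w . h_i + b); weights a = softmax(e);
    output r = sum_i a_i * h_i. Returns (r, a).
    """
    # One unnormalized score per timestep.
    scores = [math.tanh(sum(wj * hj for wj, hj in zip(w, h)) + b) for h in hidden]
    # Softmax over timesteps (shifted by the max for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the hidden states gives a fixed-size representation.
    dim = len(hidden[0])
    r = [sum(a * h[j] for a, h in zip(weights, hidden)) for j in range(dim)]
    return r, weights


# Three timesteps of 2-dimensional hidden states (made-up values).
states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
rep, alphas = attention_pool(states, w=[1.0, -1.0], b=0.0)
print(rep, alphas)
```

Because the weights sum to 1, the output has the same dimensionality as a single hidden state, no matter how long the message is.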

Notes

  • If you are only interested in the source code for the model, see models/neural/keras_models.py.
  • The models were trained using Keras 1.2. Making the project work with Keras 2 requires some minor changes.
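Keras 2 renamed several layer and training arguments. The helper below is a hedged sketch of one kind of minor change involved: it renames a few Keras 1 keyword arguments to their Keras 2 names for Dense and recurrent layers and for Model.fit. The table is an assumed and incomplete subset (Embedding, for example, keeps output_dim in Keras 2), and the function name upgrade_kwargs is hypothetical. It does not import Keras and is not part of this repository.

```python
from typing import Any, Dict

# Partial, assumed mapping of Keras 1 argument names to Keras 2 names.
# Intended for Dense and recurrent layers (LSTM/GRU) and Model.fit only;
# Embedding keeps `output_dim`, for example, so do not apply it blindly.
KERAS1_TO_KERAS2 = {
    "output_dim": "units",
    "init": "kernel_initializer",
    "inner_init": "recurrent_initializer",
    "W_regularizer": "kernel_regularizer",
    "U_regularizer": "recurrent_regularizer",
    "b_regularizer": "bias_regularizer",
    "dropout_W": "dropout",
    "dropout_U": "recurrent_dropout",
    "nb_epoch": "epochs",
}


def upgrade_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `kwargs` with known Keras 1 names replaced."""
    return {KERAS1_TO_KERAS2.get(name, name): value for name, value in kwargs.items()}


# Example: arguments of a Keras 1 style LSTM layer (values are made up).
print(upgrade_kwargs({"output_dim": 150, "dropout_U": 0.3, "return_sequences": True}))
```

Arguments the table does not know about, such as return_sequences, pass through unchanged.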

Prerequisites

1 - Install Requirements

pip install -r datastories-semeval2017-task4/requirements.txt

Ubuntu:

sudo apt-get install graphviz

Windows: Install Graphviz from here: http://www.graphviz.org/Download_windows.php

2 - Download pre-trained Word Embeddings

The models were trained on top of word embeddings pre-trained on a large collection of Twitter messages. We collected a dataset of 330M English Twitter messages posted from 12/2012 to 07/2016. We trained the word embeddings with GloVe. We preprocessed the tweets with ekphrasis, which is also one of the requirements of this project.

You can download one of the following word embeddings:

Place the file(s) in the /embeddings folder so that the program can find them.
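GloVe-trained embeddings are commonly distributed as plain text, with one token per line followed by its vector components separated by spaces. The sketch below is a minimal loader that assumes this layout and UTF-8 encoding. The function names parse_word_vectors and load_word_vectors are hypothetical, and this is not the loader the repository uses.

```python
from typing import Dict, Iterable, List


def parse_word_vectors(lines: Iterable[str], dim: int) -> Dict[str, List[float]]:
    """Parse GloVe-style lines ("token v1 v2 ... v_dim") into a dict."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        # Skip blank or malformed lines whose length does not match `dim`.
        if len(parts) != dim + 1:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def load_word_vectors(path: str, dim: int) -> Dict[str, List[float]]:
    """Read a word-embeddings file from disk (assumed UTF-8 text)."""
    with open(path, encoding="utf-8") as f:
        return parse_word_vectors(f, dim)


sample = ["hello 0.1 0.2 0.3\n", "world -0.4 0.5 0.6\n", "broken 1.0\n"]
print(parse_word_vectors(sample, dim=3))
```

The parsing is kept separate from file access so that it can be checked on in-memory lines before loading a multi-gigabyte file.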

Execution

Word Embeddings

To specify which word embeddings file to use, set the values of WV_CORPUS and WV_DIM in both model_message.py and model_target.py. The default values are:

WV_CORPUS = "datastories.twitter"
WV_DIM = 300

The convention we use to identify each file is:

{corpus}.{dimensions}d.txt

For example, to use GloVe Twitter word embeddings with 200 dimensions instead, place a file named glove.200d.txt inside the /embeddings folder and set:

WV_CORPUS = "glove"
WV_DIM = 200

Model Training

The scripts for training the Keras models are in the /models folder.

models/neural/keras_models
│   keras_models.py  : contains the Keras models
│   model_message.py : script for training the model for Subtask A
│   model_target.py  : script for training the models for Subtask B and D

More Repositories

1. ekphrasis (Python, 647 stars)

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia and Twitter, 330M English tweets).

2. neat-vision (Vue, 247 stars)

Neat (Neural Attention) Vision is a visualization tool for the attention mechanisms of deep-learning models for Natural Language Processing (NLP) tasks (framework-agnostic).

3. ntua-slp-semeval2018 (Python, 83 stars)

Deep-learning models of the NTUA-SLP team submitted in SemEval 2018 Tasks 1, 2 and 3.

4. lm-prior-for-nmt (Jupyter Notebook, 38 stars)

This repository contains source code for the paper "Language Model Prior for Low-Resource Neural Machine Translation".

5. keras-utilities (Python, 30 stars)

Utilities for Keras, the deep learning library.

6. twitter-stream-downloader (Python, 22 stars)

A service for downloading Twitter streaming data. You can save the data either in text files on disk or in a database (MongoDB).

7. datastories-semeval2017-task6 (Python, 20 stars)

Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".

8. prolog-cfg-parser (Prolog, 18 stars)

A toy SWI-Prolog context-free grammar (CFG) parser that extracts knowledge (facts) from text.

9. hierarchical-rnn-biocreative-4 (Python, 8 stars)

Repository containing the winning submission for the BioCreative VI Task A (2017). The model is a Hierarchical Bidirectional Attention-Based RNN, implemented in Keras.

10. patric-triangles (C++, 5 stars)

MPI implementation of a parallel algorithm for finding the exact number of triangles in massive networks.

11. ntua-slp-semeval2018-task2 (3 stars)

Deep-learning models submitted by the NTUA-SLP team in SemEval 2018 Task 2: Multilingual Emoji Prediction https://arxiv.org/abs/1804.06657

12. ntua-slp-semeval2018-task1 (3 stars)

Deep-learning models submitted by the NTUA-SLP team in SemEval 2018 Task 1: Affect in Tweets https://arxiv.org/abs/1804.06658

13. ntua-slp-pytorch-ex-1 (Python, 1 star)

First assignment for familiarising yourself with PyTorch. The goal of the assignment is to implement a baseline RNN model for sentiment classification in Twitter messages by completing the missing parts in the code :)

14. nmt-pretraining-objectives (Python, 1 star)

This repository contains the source code and data for the paper "Exploration of Unsupervised Pretraining Objectives for Machine Translation" in Findings of ACL 2021.