EmoBank

Overview

This repository contains EmoBank, a large-scale text corpus manually annotated with emotion according to the psychological Valence-Arousal-Dominance scheme. It was built at JULIE Lab, Jena University, and is described in detail in our papers from EACL 2017 and LAW 2017 (see Citation). The repository contains two folders: "corpus", which contains the actual EmoBank data (described in the EACL paper), and "pilot", which contains the data from our pilot study (described in the LAW paper). See the README files in the respective folders for more detailed information regarding the data format.

News

  • May 2022. We added the individual, per-annotator ratings for the reader and the writer perspectives. The data can be found in EmoBank/corpus/individual_reader_ratings.csv and EmoBank/corpus/individual_writer_ratings.csv, respectively. We also included a notebook (EmoBank/corpus/aggregation.ipynb) illustrating how the individual ratings were aggregated.
  • December 2019. We added a train/dev/test split to the dataset, which can be found in EmoBank/corpus/emobank.csv. The split is stratified with respect to text category (fiction, letters, newspaper, ...). The code for creating the split can be found in EmoBank/corpus/adding_data_split.ipynb. We recommend using this split for model evaluation so that results are comparable across studies.
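A minimal sketch of how the split in emobank.csv might be consumed from Python, using only the standard library. It assumes the file has a header with the columns `id`, `split`, `V`, `A`, `D` and `text`, and that the split labels are `train`, `dev` and `test`; check the README in the corpus folder for the authoritative format. The `aggregate_ratings` helper is a hypothetical illustration of averaging per-annotator ratings into one score per sentence and dimension. It uses a plain arithmetic mean over an in-memory list of tuples, not the actual layout of individual_reader_ratings.csv, and aggregation.ipynb remains the reference for how the released scores were computed.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# ASSUMPTION: emobank.csv has the header id,split,V,A,D,text and uses
# the split labels "train", "dev" and "test".
VAD = ("V", "A", "D")


def load_split(csv_file, split):
    """Return the rows of an EmoBank-style CSV that belong to one split."""
    rows = []
    for row in csv.DictReader(csv_file):
        if row["split"] == split:
            # Parse the Valence, Arousal and Dominance scores as floats.
            for dim in VAD:
                row[dim] = float(row[dim])
            rows.append(row)
    return rows


def aggregate_ratings(ratings):
    """Average per-annotator ratings into one (V, A, D) triple per sentence.

    HYPOTHETICAL input layout: an iterable of (sentence_id, v, a, d)
    tuples, one per annotator judgment. This is a plain mean and may
    differ from the procedure in aggregation.ipynb.
    """
    by_sentence = defaultdict(list)
    for sentence_id, v, a, d in ratings:
        by_sentence[sentence_id].append((v, a, d))
    return {
        # zip(*triples) regroups the judgments per dimension (V, A, D).
        sentence_id: tuple(float(mean(scores)) for scores in zip(*triples))
        for sentence_id, triples in by_sentence.items()
    }


if __name__ == "__main__":
    # Made-up rows for illustration only; not taken from EmoBank.
    sample = io.StringIO(
        "id,split,V,A,D,text\n"
        "s1,train,3.0,3.2,3.1,An illustrative sentence.\n"
        "s2,test,2.8,3.0,3.0,Another illustrative sentence.\n"
    )
    print([row["id"] for row in load_split(sample, "train")])  # ['s1']
    print(aggregate_ratings([("s1", 3, 3, 2), ("s1", 4, 3, 2)]))
    # {'s1': (3.5, 3.0, 2.0)}
```

With the real file, something like `with open("EmoBank/corpus/emobank.csv", newline="", encoding="utf-8") as f: train = load_split(f, "train")` would select the training portion, provided the header matches the assumption above.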

Characteristics

EmoBank comprises 10k sentences balanced across multiple genres. It is distinctive in carrying two kinds of double annotation. First, each sentence was annotated both for the emotion expressed by the writer and for the emotion perceived by the readers. Second, a subset of the corpus had previously been annotated according to Ekman's six Basic Emotions (Strapparava and Mihalcea, 2007), so that mappings between the two representation formats become possible.
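Because every sentence carries both a writer and a reader rating, one natural question is how closely the two perspectives agree. The following is one hypothetical way to quantify this: a Pearson correlation per VAD dimension over the sentences rated from both perspectives. It is not the analysis reported in the EACL paper, and the input layout (dicts mapping a sentence id to a (V, A, D) tuple) is an assumption rather than the format of the corpus files.

```python
from math import sqrt

# HYPOTHETICAL layout: dicts mapping a sentence id to its
# (Valence, Arousal, Dominance) ratings from one perspective.
DIMENSIONS = ("V", "A", "D")


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / (sd_x * sd_y)


def perspective_agreement(writer, reader):
    """Correlate writer and reader ratings separately for V, A and D.

    Only sentences rated from both perspectives are compared.
    """
    shared = sorted(writer.keys() & reader.keys())
    return {
        dim: pearson([writer[s][i] for s in shared],
                     [reader[s][i] for s in shared])
        for i, dim in enumerate(DIMENSIONS)
    }


if __name__ == "__main__":
    # Made-up ratings for illustration only; not taken from EmoBank.
    writer = {"s1": (2.0, 3.0, 3.0), "s2": (3.0, 3.5, 2.5), "s3": (4.0, 2.0, 3.5)}
    reader = {"s1": (2.2, 2.8, 3.1), "s2": (3.1, 3.4, 2.9), "s3": (3.9, 2.5, 3.3)}
    print(perspective_agreement(writer, reader))
```

Pearson correlation is used here only as a common choice for comparing continuous ratings; a rank correlation or an error measure would be equally reasonable ways to compare the perspectives.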

Attribution of Raw Data

The raw data of EmoBank is gathered from MASC, the Manually Annotated Sub-Corpus of the American National Corpus (ANC) (Ide et al., 2010), and from the SemEval 2007 Task 14 data (Strapparava and Mihalcea, 2007). The raw data of the pilot studies is taken from MASC and the Stanford Sentiment Treebank (Socher et al., 2013), which was originally collected by Pang and Lee (2005).

License

This work is licensed under CC-BY-SA 4.0: https://creativecommons.org/licenses/by-sa/4.0/

Citation

Please cite the following papers if you use EmoBank:

  • Sven Buechel and Udo Hahn. 2017. EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis. In EACL 2017 - Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Valencia, Spain, April 3-7, 2017. Volume 2, Short Papers, pages 578-585. Available: http://aclweb.org/anthology/E17-2092

  • Sven Buechel and Udo Hahn. 2017. Readers vs. writers vs. texts: Coping with different perspectives of text understanding in emotion annotation. In LAW 2017 - Proceedings of the 11th Linguistic Annotation Workshop @ EACL 2017. Valencia, Spain, April 3, 2017, pages 1-12. Available: https://sigann.github.io/LAW-XI-2017/papers/LAW01.pdf

Contact

I am happy to answer questions and give additional information via email: [email protected]

References

  • Nancy C. Ide, Collin F. Baker, Christiane Fellbaum, and Rebecca J. Passonneau. 2010. The Manually Annotated Sub-Corpus: A community resource for and by the people. In ACL 2010 - Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden, July 11-16, 2010. Volume 2, Short Papers, pages 68-73.
  • Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In ACL 2005 - Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics. Ann Arbor, Michigan, USA, June 25-30, 2005, pages 115-124.
  • Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP 2013 - Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Seattle, Washington, USA, October 18-21, 2013, pages 1631-1642.
  • Carlo Strapparava and Rada Mihalcea. 2007. SemEval-2007 Task 14: Affective text. In SemEval 2007 - Proceedings of the 4th International Workshop on Semantic Evaluations @ ACL 2007. Prague, Czech Republic, June 23-24, 2007, pages 70-74.
