
GoEmotions Pytorch

Pytorch Implementation of GoEmotions with Huggingface Transformers

What is GoEmotions

A dataset of 58,000 Reddit comments labeled with 28 emotions:

  • admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise + neutral

Training Details

  • Uses bert-base-cased (same as the paper's code)

  • In the paper, 3 taxonomies were used. I've also made the data with new taxonomy labels for hierarchical grouping and Ekman (a relabeling sketch follows this list):

    1. Original GoEmotions (27 emotions + neutral)
    2. Hierarchical Grouping (positive, negative, ambiguous + neutral)
    3. Ekman (anger, disgust, fear, joy, sadness, surprise + neutral)
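
For illustration, here's a minimal sketch of how the 27 original emotions could be collapsed into the Ekman taxonomy. The mapping below follows the grouping described in the GoEmotions paper, but treat it as an assumption about the relabeling, not this repo's exact preprocessing code.

# Illustrative sketch (not the repo's code): collapse original labels into Ekman groups.
EKMAN_MAP = {
    "anger": ["anger", "annoyance", "disapproval"],
    "disgust": ["disgust"],
    "fear": ["fear", "nervousness"],
    "joy": ["admiration", "amusement", "approval", "caring", "desire", "excitement",
            "gratitude", "joy", "love", "optimism", "pride", "relief"],
    "sadness": ["disappointment", "embarrassment", "grief", "remorse", "sadness"],
    "surprise": ["confusion", "curiosity", "realization", "surprise"],
    "neutral": ["neutral"],
}

# Invert the map so each fine-grained label points to its Ekman group.
LABEL_TO_EKMAN = {fine: coarse for coarse, fines in EKMAN_MAP.items() for fine in fines}

def to_ekman(labels):
    """Collapse a list of original GoEmotions labels into Ekman labels."""
    return sorted({LABEL_TO_EKMAN[label] for label in labels})

print(to_ekman(["annoyance", "curiosity"]))  # ['anger', 'surprise']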

Vocabulary

  • I've replaced [unused1] and [unused2] with [NAME] and [RELIGION] in the vocab, respectively.
[PAD]
[NAME]
[RELIGION]
[unused3]
[unused4]
...
  • I've also set special_tokens_map.json as below, so the tokenizer won't split [NAME] or [RELIGION] into word pieces.
{
  "unk_token": "[UNK]",
  "sep_token": "[SEP]",
  "pad_token": "[PAD]",
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "additional_special_tokens": ["[NAME]", "[RELIGION]"]
}
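
As a quick sanity check, something along these lines should show the special tokens surviving tokenization (a sketch; it assumes the fine-tuned tokenizer uploaded to Huggingface S3, introduced in the Pipeline section below):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("monologg/bert-base-cased-goemotions-original")

tokens = tokenizer.tokenize("Maybe we need [NAME] to be the celebrity vaccine endorsement!")
print(tokens)
# '[NAME]' shows up as a single token instead of being split into word pieces.
assert "[NAME]" in tokens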

Requirements

  • torch==1.4.0
  • transformers==2.11.0
  • attrdict==2.0.1

Hyperparameters

You can change the parameters via the json files in the config directory.

| Parameter         | Value |
| ----------------- | ----- |
| Learning rate     | 5e-5  |
| Warmup proportion | 0.1   |
| Epochs            | 10    |
| Max Seq Length    | 50    |
| Batch size        | 16    |
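
For instance, a training script could load one of these configs roughly as follows. This is a sketch: config/original.json and the learning_rate key are assumptions, so check the actual files in the config directory for the real file and key names.

import json

from attrdict import AttrDict  # pinned in requirements (attrdict==2.0.1)

# Hypothetical example: load hyperparameters from a JSON config file
# and access them as attributes.
with open("config/original.json") as f:
    args = AttrDict(json.load(f))

print(args.learning_rate)  # e.g. 5e-5 (key name assumed, not the repo's exact schema)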

How to Run

For --taxonomy, choose one of original, group, or ekman.

$ python3 run_goemotions.py --taxonomy {$TAXONOMY}

$ python3 run_goemotions.py --taxonomy original
$ python3 run_goemotions.py --taxonomy group
$ python3 run_goemotions.py --taxonomy ekman

Results

Best Result of Macro F1

| Macro F1 (%) | Dev   | Test  |
| ------------ | ----- | ----- |
| original     | 50.16 | 50.30 |
| group        | 69.41 | 70.06 |
| ekman        | 62.59 | 62.38 |

Pipeline

  • Inference for multi-label classification was made possible by creating a new MultiLabelPipeline class (a minimal sketch of the idea follows this list).
  • The fine-tuned models are already uploaded to Huggingface S3:
    • Original GoEmotions Taxonomy: monologg/bert-base-cased-goemotions-original
    • Hierarchical Group Taxonomy: monologg/bert-base-cased-goemotions-group
    • Ekman Taxonomy: monologg/bert-base-cased-goemotions-ekman
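
The sketch below shows only the core idea behind such a pipeline, assuming the model returns logits as the first element of its output tuple (transformers 2.x style); see the repo's multilabel_pipeline.py for the actual implementation. The key point is that multi-label scoring applies an independent sigmoid per label (rather than a softmax across labels) and keeps every label whose score exceeds the threshold.

import torch

class MultiLabelPipelineSketch:
    """Minimal sketch of a multi-label pipeline: per-label sigmoid + threshold cutoff."""

    def __init__(self, model, tokenizer, threshold=0.3):
        self.model = model.eval()
        self.tokenizer = tokenizer
        self.threshold = threshold

    def __call__(self, texts):
        # batch_encode_plus matches the pinned transformers==2.11.0 API.
        inputs = self.tokenizer.batch_encode_plus(
            texts, pad_to_max_length=True, return_tensors="pt"
        )
        with torch.no_grad():
            logits = self.model(**inputs)[0]  # assumes logits are first in the output tuple
        scores = torch.sigmoid(logits)  # independent per-label probabilities, not softmax
        results = []
        for row in scores:
            keep = (row > self.threshold).nonzero().squeeze(-1)
            results.append({
                "labels": [self.model.config.id2label[i.item()] for i in keep],
                "scores": [row[i].item() for i in keep],
            })
        return results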

1. Original GoEmotions Taxonomy

from transformers import BertTokenizer
from model import BertForMultiLabelClassification
from multilabel_pipeline import MultiLabelPipeline
from pprint import pprint

tokenizer = BertTokenizer.from_pretrained("monologg/bert-base-cased-goemotions-original")
model = BertForMultiLabelClassification.from_pretrained("monologg/bert-base-cased-goemotions-original")

goemotions = MultiLabelPipeline(
    model=model,
    tokenizer=tokenizer,
    threshold=0.3
)

texts = [
    "Hey that's a thought! Maybe we need [NAME] to be the celebrity vaccine endorsement!",
    "itโ€™s happened before?! love my hometown of beautiful new ken ๐Ÿ˜‚๐Ÿ˜‚",
    "I love you, brother.",
    "Troll, bro. They know they're saying stupid shit. The motherfucker does nothing but stink up libertarian subs talking shit",
]

pprint(goemotions(texts))

# Output
[{'labels': ['neutral'], 'scores': [0.9750906]},
 {'labels': ['curiosity', 'love'], 'scores': [0.9694574, 0.9227462]},
 {'labels': ['love'], 'scores': [0.993483]},
 {'labels': ['anger'], 'scores': [0.99225825]}]

2. Group Taxonomy

from transformers import BertTokenizer
from model import BertForMultiLabelClassification
from multilabel_pipeline import MultiLabelPipeline
from pprint import pprint

tokenizer = BertTokenizer.from_pretrained("monologg/bert-base-cased-goemotions-group")
model = BertForMultiLabelClassification.from_pretrained("monologg/bert-base-cased-goemotions-group")

goemotions = MultiLabelPipeline(
    model=model,
    tokenizer=tokenizer,
    threshold=0.3
)

texts = [
    "Hey that's a thought! Maybe we need [NAME] to be the celebrity vaccine endorsement!",
    "itโ€™s happened before?! love my hometown of beautiful new ken ๐Ÿ˜‚๐Ÿ˜‚",
    "I love you, brother.",
    "Troll, bro. They know they're saying stupid shit. The motherfucker does nothing but stink up libertarian subs talking shit",
]

pprint(goemotions(texts))

# Output
[{'labels': ['positive'], 'scores': [0.9989434]},
 {'labels': ['ambiguous', 'positive'], 'scores': [0.99801123, 0.99845874]},
 {'labels': ['positive'], 'scores': [0.99930394]},
 {'labels': ['negative'], 'scores': [0.9984231]}]

3. Ekman Taxonomy

from transformers import BertTokenizer
from model import BertForMultiLabelClassification
from multilabel_pipeline import MultiLabelPipeline
from pprint import pprint

tokenizer = BertTokenizer.from_pretrained("monologg/bert-base-cased-goemotions-ekman")
model = BertForMultiLabelClassification.from_pretrained("monologg/bert-base-cased-goemotions-ekman")

goemotions = MultiLabelPipeline(
    model=model,
    tokenizer=tokenizer,
    threshold=0.3
)

texts = [
    "Hey that's a thought! Maybe we need [NAME] to be the celebrity vaccine endorsement!",
    "itโ€™s happened before?! love my hometown of beautiful new ken ๐Ÿ˜‚๐Ÿ˜‚",
    "I love you, brother.",
    "Troll, bro. They know they're saying stupid shit. The motherfucker does nothing but stink up libertarian subs talking shit",
]

pprint(goemotions(texts))

# Output
[{'labels': ['joy', 'neutral'], 'scores': [0.30459446, 0.9217335]},
 {'labels': ['joy', 'surprise'], 'scores': [0.9981395, 0.99863845]},
 {'labels': ['joy'], 'scores': [0.99910116]},
 {'labels': ['anger'], 'scores': [0.9984291]}]

Reference

  • GoEmotions: A Dataset of Fine-Grained Emotions (Demszky et al., ACL 2020)
  • Original GoEmotions repository: https://github.com/google-research/google-research/tree/master/goemotions
  • Huggingface Transformers: https://github.com/huggingface/transformers
