
Can we use explanations to improve hate speech models? Our paper accepted at AAAI 2021 tries to explore that question.


πŸ”Ž HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection [Accepted at AAAI 2021]

πŸŽ‰ πŸŽ‰ BERT for detecting abusive language (hate speech + offensive) and predicting rationales is uploaded here. Be sure to check it out πŸŽ‰ πŸŽ‰.

For more details about our paper

Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, and Animesh Mukherjee "HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection". Accepted at AAAI 2021.

Arxiv paper link

Abstract

Hate speech is a challenging issue plaguing the online social media. While better models for hate speech detection are continuously being developed, there is little research on the bias and interpretability aspects of hate speech. In this work, we introduce HateXplain, the first benchmark hate speech dataset covering multiple aspects of the issue. Each post in our dataset is annotated from three different perspectives: the basic, commonly used 3-class classification (i.e., hate, offensive or normal), the target community (i.e., the community that has been the victim of hate speech/offensive speech in the post), and the rationales, i.e., the portions of the post on which their labelling decision (as hate, offensive or normal) is based. We utilize existing state-of-the-art models and observe that even models that perform very well in classification do not score high on explainability metrics like model plausibility and faithfulness. We also observe that models, which utilize the human rationales for training, perform better in reducing unintended bias towards target communities.

WARNING: The repository contains content that is offensive and/or hateful in nature.

Please cite our paper in any published work that uses any of these resources.

@inproceedings{mathew2021hatexplain,
  title={HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection},
  author={Mathew, Binny and Saha, Punyajoy and Yimam, Seid Muhie and Biemann, Chris and Goyal, Pawan and Mukherjee, Animesh},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={17},
  pages={14867--14875},
  year={2021}
}

Folder Description πŸ“‚


./Data                --> Contains the dataset-related files
./Models              --> Contains the code for all the classifiers used
./Preprocess          --> Contains the code for preprocessing the dataset
./best_model_json     --> Contains the parameter values for the best models


Table of contents πŸ“‘

πŸ”– Dataset :- This describes the dataset format and setup for the dataset pipeline.

πŸ”– Parameters :- This describes all the different parameters used in this code.


Usage instructions

Please set up the Dataset first (especially important if you are using a non-BERT model). Install the libraries using the following command (preferably inside a virtual environment):

pip install -r requirements.txt

Training

To train the model use the following command.

usage: manual_training_inference.py [-h]
                                    --path_to_json --use_from_file
                                    --attention_lambda

Train a deep-learning model with the given data

positional arguments:
  --path_to_json      The path to the JSON containing the parameters
  --use_from_file     whether to use the parameters present here or directly
                      from the file
  --attention_lambda  required to assign the contribution of the attention loss

You can either edit the parameters directly in the Python file or load them from a JSON file (by setting --use_from_file to True). To change the parameters, check the Parameters section for more details. The code runs on CPU by default. The recommended approach is to copy one of the dictionaries in best_model_json and modify it as needed.

  • For transformer models :- The repository currently supports models with BERT-style tokenization. In the params, set bert_tokens to True and path_files to any BERT-based model on Hugging Face.
  • For non-transformer models :- The repository currently supports the LSTM, LSTM-attention, and CNN-GRU models. In the params, set bert_tokens to False and the model name according to the Parameters section (one of birnn, birnnatt, birnnscrat, cnn_gru).
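As an illustration, a minimal parameter dictionary for each case might look like the following. Only the key names mentioned in this README (bert_tokens, path_files, and the model name) are shown; all values here are placeholders, and a real run should start from a full dictionary copied out of best_model_json.

```python
import json

# Hedged sketch: a BERT-style configuration (any BERT-based model on
# Hugging Face can be used as path_files).
bert_params = {
    "bert_tokens": True,
    "path_files": "bert-base-uncased",
}

# A non-transformer configuration; the model name must be one of the
# options listed in the Parameters section.
lstm_params = {
    "bert_tokens": False,
    "model_name": "birnnatt",  # one of: birnn, birnnatt, birnnscrat, cnn_gru
}

# Save the chosen dictionary so it can be passed via --path_to_json.
with open("my_params.json", "w") as f:
    json.dump(bert_params, f, indent=2)
```

The resulting file could then be passed to the training script, e.g. `python manual_training_inference.py --path_to_json my_params.json --use_from_file True --attention_lambda 100` (the lambda value here is only a placeholder).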

For more details about the end-to-end pipeline, visit our_demo.

Blogs and github repos which we used for reference πŸ‘Ό

  1. For fine-tuning BERT, we used this blog by Chris McCormick and also referred to the Transformers GitHub repo.
  2. For the CNN-GRU model, we used the original repo for reference.
  3. For evaluation using the explanation metrics, we used the ERASER benchmark repo. Please look into their repo and paper for more details.
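To give a rough sense of the plausibility side of these metrics, rationale agreement can be scored as a token-level F1 between the model's predicted rationale mask and the human-annotated one. This is a simplified sketch, not the ERASER implementation itself, which also reports IOU-F1, AUPRC, and the faithfulness metrics (comprehensiveness and sufficiency):

```python
def rationale_token_f1(pred_mask, gold_mask):
    """Token-level F1 between a predicted and a human rationale mask.

    Both masks are equal-length lists of 0/1 flags, one per token.
    Simplified illustration of plausibility-style scoring.
    """
    tp = sum(p and g for p, g in zip(pred_mask, gold_mask))
    pred_pos = sum(pred_mask)
    gold_pos = sum(gold_mask)
    if pred_pos == 0 or gold_pos == 0:
        return 0.0
    precision = tp / pred_pos
    recall = tp / gold_pos
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: model highlights tokens 2-4, annotators marked tokens 3-5,
# so 2 of 3 highlighted tokens overlap (F1 = 2/3).
score = rationale_token_f1([0, 0, 1, 1, 1, 0], [0, 0, 0, 1, 1, 1])
```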

Todos

  • Add arxiv paper link and description.
  • Release better documentation for Models and Preprocess sections.
  • Add other Transformer models to the pipeline.
  • Upload our models to the Transformers community to make them public.
  • Create an interface for social scientists so they can easily use our models with their own data.
πŸ‘ The repo is still under active development. Feel free to create an issue! πŸ‘
