mdmustafizurrahman/an-information-retrieval-approach-to-building-datasets-for-hate-speech-detection

An Information Retrieval Approach to Building Datasets for Hate Speech Detection

For more details about our hate speech dataset, please read the following research article:

Md Mustafizur Rahman, Dinesh Balakrishnan, Dhiraj Murthy, Mucahid Kutlu, and Matthew Lease. An Information Retrieval Approach to Building Datasets for Hate Speech Detection. [pdf]

Source Code

Pooling --> /codes/pooling.py
Active learning --> /codes/active_learning.py
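The two scripts are not reproduced here. As a rough illustration only, the sketch below shows the general techniques the file names refer to: depth-k pooling, where the top-k items from several ranked lists are merged into one deduplicated candidate pool for annotation, and uncertainty sampling, one common active learning strategy that selects the unlabeled items whose predicted probability of being hateful is closest to 0.5. The function names, the depth parameter `k`, and the probability inputs are hypothetical and are not taken from /codes/pooling.py or /codes/active_learning.py.

```python
from typing import Dict, List, Sequence


def depth_k_pool(rankings: Sequence[Sequence[str]], k: int) -> List[str]:
    """Merge the top-k items of each ranking into one deduplicated pool.

    Assumes k >= 0. Items keep their order of first appearance, so the
    result is deterministic.
    """
    pool: List[str] = []
    seen = set()
    for ranking in rankings:
        for item in ranking[:k]:  # only the top-k of each ranked list
            if item not in seen:
                seen.add(item)
                pool.append(item)
    return pool


def uncertainty_sample(probs: Dict[str, float], batch_size: int) -> List[str]:
    """Pick the batch_size items whose P(hate) is closest to 0.5.

    Ties are broken by item id so the selection is reproducible.
    """
    ranked = sorted(probs, key=lambda tid: (abs(probs[tid] - 0.5), tid))
    return ranked[:batch_size]


if __name__ == "__main__":
    # Hypothetical ranked tweet-id lists from three different retrieval runs.
    runs = [["t1", "t2", "t3"], ["t2", "t4", "t5"], ["t6", "t1", "t7"]]
    print(depth_k_pool(runs, k=2))  # ['t1', 't2', 't4', 't6']

    # Hypothetical classifier scores for unlabeled tweets.
    scores = {"t1": 0.95, "t2": 0.48, "t3": 0.10, "t4": 0.60}
    print(uncertainty_sample(scores, batch_size=2))  # ['t2', 't4']
```

Pooling trades completeness for annotation cost: a larger `k` covers more of each ranking but sends more tweets to annotators.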

Benchmark Models

The source code for the BiLSTM and LSTM models used in this project was taken from [4], whose authors made the necessary corrections to those two models.

Train and Test Sets for Benchmark Models

Train.csv --> /data/train_test_sets/
Test.csv --> /data/train_test_sets/
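A minimal sketch for loading either split with the Python standard library. It assumes the files are UTF-8 encoded CSV with a header row; the column names are not documented here, so the sketch returns each row as a dict keyed by whatever the header contains. The function name `load_split` is hypothetical.

```python
import csv
import os
from typing import Dict, List


def load_split(path: str) -> List[Dict[str, str]]:
    """Read a CSV split into a list of row dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for name in ("Train.csv", "Test.csv"):
        path = os.path.join("data", "train_test_sets", name)
        if os.path.exists(path):  # skip quietly if the file is absent
            rows = load_split(path)
            columns = list(rows[0].keys()) if rows else []
            print(name, len(rows), "rows; columns:", columns)
```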

Annotation Interface

The two annotation interfaces used during the pilot and main annotation phases are provided in HTML format under the /interface/ directory.

Author Distribution of Tweets

Total number of authors: 9,534

Authors with exactly 1 contribution: 9,430
Authors with exactly 2 contributions: 97
Authors with more than 2 contributions: 7
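The three buckets partition the author set (9,430 + 97 + 7 = 9,534). The sketch below shows one way such counts could be computed, assuming the input is a list with one author id per tweet; the README does not say which field holds the author id, so the input format and the function name `author_distribution` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def author_distribution(author_ids: Iterable[str]) -> Dict[str, int]:
    """Bucket authors by how many tweets each one contributed."""
    per_author = Counter(author_ids)  # author id -> number of tweets
    return {
        "total_authors": len(per_author),
        "exactly_1": sum(1 for n in per_author.values() if n == 1),
        "exactly_2": sum(1 for n in per_author.values() if n == 2),
        "more_than_2": sum(1 for n in per_author.values() if n > 2),
    }


if __name__ == "__main__":
    # Toy data: author "a" wrote 2 tweets, "b" wrote 1, "c" wrote 3.
    tweets = ["a", "a", "b", "c", "c", "c"]
    print(author_distribution(tweets))
    # {'total_authors': 3, 'exactly_1': 1, 'exactly_2': 1, 'more_than_2': 1}
```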

References

[1] Sweta Agrawal and Amit Awekar. 2018. Deep learning for detecting cyberbullying across multiple social media platforms. In European Conference on Information Retrieval. Springer, 141–153.

[2] Pinkesh Badjatiya, Shashank Gupta, Manish Gupta, and Vasudeva Varma. 2017. Deep learning for hate speech detection in tweets. In Proceedings of the 26th International Conference on World Wide Web Companion. 759–760.

[3] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).

[4] Aymé Arango, Jorge Pérez, and Barbara Poblete. 2019. Hate speech detection is not as easy as you may think: A closer look at model validation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 45–54.
