rumour-data-aug's Introduction

Training data augmentation for rumour detection using context-sensitive neural language model

Rumour Datasets

The following rumour datasets are used in our experiments.

CrisisLexT26: References (labels) for the Boston marathon bombings are obtained from the CrisisLexT26 corpus.
Twitter event datasets (2012-2016): A Twitter corpus that supplies the candidate tweets for data augmentation.
PHEME dataset: References (labels) for five events (Ferguson unrest, Sydney siege, Ottawa shooting, Charlie Hebdo attacks, and Germanwings plane crash) are obtained from this dataset.

Data Collection

Data collection gathers social-temporal data (typically replies and retweets) for rumour source tweets.

Semantic Relatedness Computation

Semantic relatedness computation locates different forms of a rumour based on textual variations. A fine-tuned ELMo model is used to learn tweet representations, and pairwise cosine similarity is computed between reference rumour tweets and rumour candidate tweets.
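
The pairwise similarity step can be illustrated with a minimal sketch. It assumes each tweet has already been turned into a fixed-length vector (for example by pooling ELMo token vectors; that step is not shown). The function names, the `threshold` parameter and its default value are hypothetical and are not taken from this repository; in particular, how candidates are actually selected from the similarity scores is an assumption here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector is all zeros, to avoid division by zero.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_cosine(references: Sequence[Vector],
                    candidates: Sequence[Vector]) -> List[List[float]]:
    """Similarity matrix: rows are reference tweets, columns are candidates."""
    return [[cosine_similarity(r, c) for c in candidates] for r in references]


def select_candidates(references: Sequence[Vector],
                      candidates: Sequence[Vector],
                      threshold: float = 0.8) -> List[int]:
    """Indices of candidates whose best match among the references
    reaches `threshold` (a hypothetical cut-off, chosen for illustration)."""
    sims = pairwise_cosine(references, candidates)
    selected = []
    for j in range(len(candidates)):
        best = max((row[j] for row in sims), default=0.0)
        if best >= threshold:
            selected.append(j)
    return selected
```

With real embeddings, `references` would hold the vectors of labelled rumour tweets and `candidates` the vectors of unlabelled tweets from the event corpus. For large corpora a vectorised implementation would be preferable to these nested loops.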

Baseline Classification Model

We evaluated the effectiveness of our augmented rumour data with a state-of-the-art classification model for rumour detection. The modified source code is available in Multitask4Veracity.


Repository: soojihan / rumour-data-aug
