The following rumour datasets are used in our experiments.
- CrisisLexT26: References (labels) for the Boston Marathon bombings are obtained from the CrisisLexT26 corpus.
- Twitter event datasets (2012-2016): This Twitter corpus provides the candidate tweets for data augmentation.
- PHEME dataset: References (labels) for five events (Ferguson unrest, Sydney siege, Ottawa shooting, Charlie Hebdo attacks, and Germanwings plane crash) are obtained from this dataset.
Data collection gathers social-temporal data (typically replies and retweets) for rumour source tweets.
Semantic relatedness computation locates variants of a rumour that differ in wording. A fine-tuned ELMo model is used to learn tweet representations, and pairwise cosine similarity is computed between reference rumour tweets and candidate rumour tweets.
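The pairwise similarity step can be illustrated with a minimal sketch. It assumes each tweet has already been encoded as a fixed-length vector (by ELMo in our pipeline; the toy vectors here are made up), and it assumes candidates are kept when their best similarity to any reference reaches a threshold. The function names and the threshold value of 0.7 are hypothetical and are not taken from the released code.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarity(
    references: Sequence[Sequence[float]],
    candidates: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Matrix of similarities: one row per reference, one column per candidate."""
    return [[cosine_similarity(r, c) for c in candidates] for r in references]


def select_candidates(
    references: Sequence[Sequence[float]],
    candidates: Sequence[Sequence[float]],
    threshold: float,
) -> List[int]:
    """Indices of candidates whose best similarity to any reference is >= threshold.

    The thresholding rule is an assumption made for illustration.
    """
    sims = pairwise_similarity(references, candidates)
    selected = []
    for j in range(len(candidates)):
        best = max((row[j] for row in sims), default=0.0)
        if best >= threshold:
            selected.append(j)
    return selected


# Toy vectors standing in for tweet embeddings (hypothetical values).
reference_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
candidate_vecs = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0], [1.0, 1.0, 0.0]]

selected = select_candidates(reference_vecs, candidate_vecs, threshold=0.7)
print(selected)
```

In practice a vectorised implementation over the full embedding matrices would be preferable for a corpus of this size; the pure-Python version is only meant to make the computation explicit.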
We evaluated the effectiveness of our augmented rumour data with a state-of-the-art classification model for rumour detection. The modified source code is available in Multitask4Veracity.