
bullshit-detector

Fake News Detection

Datasets

The dataset used in the detector is LIAR.

LIAR is a publicly available dataset for fake news detection. It contains 12.8K human-labeled short statements, spanning a decade and collected in various contexts through POLITIFACT.COM's API. Each statement was evaluated for truthfulness by a POLITIFACT.COM editor, and POLITIFACT.COM provides a detailed analysis report and links to source documents for each case. The dataset can also be used for fact-checking research, and it is an order of magnitude larger than the previously largest public fake news datasets of a similar type.

You can use the download.sh script to download the dataset.

Preprocessing

The six LIAR labels were mapped to two output classes, Real and Fake:

True -> Real
Mostly-True -> Real
Half-True -> Real
Mostly-False -> Fake
False -> Fake
Pants-On-Fire -> Fake
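The mapping above amounts to a simple lookup table. The sketch below assumes the lowercase label strings used in the raw LIAR TSV files, where Mostly-False is spelled barely-true and Pants-On-Fire is spelled pants-fire; the names LABEL_MAP and binarize_label are hypothetical and not taken from Detector.ipynb.

```python
# Binary label mapping described above (hypothetical names).
# Keys assume the lowercase label strings of the raw LIAR TSV files,
# where "Mostly-False" is spelled "barely-true" and "Pants-On-Fire" is "pants-fire".
LABEL_MAP = {
    "true": "Real",
    "mostly-true": "Real",
    "half-true": "Real",
    "barely-true": "Fake",  # listed as Mostly-False above
    "false": "Fake",
    "pants-fire": "Fake",   # listed as Pants-On-Fire above
}


def binarize_label(label: str) -> str:
    """Map one of the six LIAR labels to 'Real' or 'Fake'."""
    key = label.strip().lower()
    if key not in LABEL_MAP:
        raise ValueError(f"unknown LIAR label: {label!r}")
    return LABEL_MAP[key]


print([binarize_label(x) for x in ["half-true", "pants-fire", "TRUE"]])
# ['Real', 'Fake', 'Real']
```

Raising on an unknown label, rather than silently defaulting to one class, makes a misspelled or unexpected label in the data fail loudly.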

The following text preprocessing steps were applied:

  • Stemming
  • Removal of punctuation
  • Removal of stopwords
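A minimal, dependency-free sketch of these three steps follows. The README does not say which stemmer or stopword list was used, so the tiny STOPWORDS set and the toy_stem suffix stripper here are illustrative stand-ins (a real pipeline would more likely use something like NLTK's Porter stemmer and its English stopword list), and the function names are hypothetical.

```python
import string

# Tiny illustrative stopword list (assumption: the project's actual list is not stated).
STOPWORDS = {"a", "an", "the", "and", "or", "is", "are", "was", "were",
             "of", "to", "in", "on", "for", "that", "this", "it", "by"}

# Toy suffix-stripping stemmer standing in for a real one such as Porter.
# Longer suffixes are checked before their shorter endings ("edly" before "ed").
SUFFIXES = ("ing", "edly", "ed", "ly", "es", "s")


def toy_stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> str:
    # 1. Lowercase and strip punctuation.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    # 2. Tokenize on whitespace and drop stopwords.
    tokens = [t for t in text.split() if t not in STOPWORDS]
    # 3. Stem each remaining token.
    return " ".join(toy_stem(t) for t in tokens)


print(preprocess("The senator was raising taxes, and voters noticed!"))
# senator rais tax voter notic
```

Punctuation is removed before stopword filtering so that tokens such as "taxes," match the same vocabulary entry as "taxes".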

Strategy

The detector uses an ensemble of classifiers.
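The README does not say how the classifiers' predictions are combined. A common choice is hard (majority) voting, sketched below in plain Python under that assumption; majority_vote is a hypothetical helper, not code from this repository.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-classifier predictions by hard (majority) voting.

    predictions[i][j] is classifier i's label for sample j. On a tie, the
    label encountered first among the votes wins, because Counter.most_common
    keeps first-encountered order for equal counts.
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same number of samples")
    combined = []
    for j in range(n_samples):
        votes = Counter(p[j] for p in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Made-up predictions from five classifiers on three statements.
preds = [
    ["Real", "Fake", "Fake"],
    ["Real", "Real", "Fake"],
    ["Fake", "Fake", "Fake"],
    ["Real", "Fake", "Real"],
    ["Real", "Real", "Fake"],
]
print(majority_vote(preds))
# ['Real', 'Fake', 'Fake']
```

With five voters and two classes a tie cannot occur, which is one practical reason to use an odd number of classifiers.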

Classifiers

The ensemble combines the following five classifiers. Their hyperparameters were tuned with the Optuna library.

  • Random Forest
  • Logistic Regression
  • Multinomial Naive Bayes
  • Support Vector Classifier
  • SGD Classifier
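With scikit-learn, the five classifiers can be combined roughly as below. This is a sketch under several assumptions: it uses scikit-learn's VotingClassifier with hard voting, default hyperparameters instead of the Optuna-tuned ones, and a TF-IDF vectorizer shared by all members (CountVectorizer would take its place for the count-based ensemble). The toy texts and labels are made up; build_ensemble is a hypothetical name.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.svm import SVC


def build_ensemble(random_state: int = 0) -> Pipeline:
    # Default hyperparameters; in the project these were tuned with Optuna.
    voter = VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
            ("logreg", LogisticRegression(max_iter=1000)),
            ("nb", MultinomialNB()),  # needs non-negative features, which TF-IDF provides
            ("svc", SVC()),
            ("sgd", SGDClassifier(random_state=random_state)),
        ],
        # Hard voting: SVC (without probability=True) and SGDClassifier with
        # its default hinge loss do not provide predict_proba.
        voting="hard",
    )
    # One vectorizer feeds all five classifiers.
    return make_pipeline(TfidfVectorizer(), voter)


# Made-up toy data for illustration only.
texts = [
    "tax cuts created millions of jobs last year",
    "the senator voted for the budget bill",
    "unemployment fell to its lowest level in a decade",
    "the governor secretly banned all pickup trucks",
    "aliens funded the mayor's reelection campaign",
    "vaccines contain microchips that track citizens",
]
labels = ["Real", "Real", "Real", "Fake", "Fake", "Fake"]

model = build_ensemble().fit(texts, labels)
print(model.predict(["the mayor banned jobs"]))
```

Fitting the vectorizer inside the pipeline keeps the vocabulary learned only from training data when the pipeline is cross-validated or tuned.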

Reproduce the results

To see the results without retraining, you can use my trained models, which are in the saved_models directory.

The ensemble models are very large, so they are hosted on archive.org and can be fetched with the included download script (saved_models/download_models.sh).
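Once downloaded, the .pkl files can be loaded with Python's pickle module. The sketch below assumes each file is a plain pickle of a fitted model; whether each pickle also bundles its vectorizer is not stated here, so check Detector.ipynb before calling predict. load_saved_models is a hypothetical helper.

```python
import pickle
from pathlib import Path
from typing import Any, Dict


def load_saved_models(directory: str = "saved_models") -> Dict[str, Any]:
    """Unpickle every *.pkl file in `directory`, keyed by file stem.

    Only unpickle files you trust: loading a pickle can execute arbitrary code.
    """
    models: Dict[str, Any] = {}
    for path in sorted(Path(directory).glob("*.pkl")):
        with path.open("rb") as fh:
            models[path.stem] = pickle.load(fh)
    return models


if __name__ == "__main__":
    models = load_saved_models()
    # Keys such as "tfidf_logreg" or "count_svc" follow the file names.
    print(sorted(models))
```

Pickled scikit-learn models are generally only guaranteed to load correctly with the same library version that saved them, so installing requirements.txt first is advisable.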

Downloads

If the download script is very slow, you can use the links below instead:

Tfidf Ensemble

Count Ensemble

Project Directory Structure

|-- Detector.ipynb
|-- LICENSE
|-- README.md
|-- download.sh
|-- metric_data
|   |-- count_test_metric.csv
|   |-- count_train_metric.csv
|   |-- count_valid_metric.csv
|   |-- tfidf_test_metrics.csv
|   |-- tfidf_train_metrics.csv
|   `-- tfidf_valid_metrics.csv
|-- requirements.txt
`-- saved_models
    |-- count_logreg.pkl
    |-- count_nb.pkl
    |-- count_rf.pkl
    |-- count_sgd.pkl
    |-- count_svc.pkl
    |-- download_models.sh
    |-- tfidf_logreg.pkl
    |-- tfidf_nb.pkl
    |-- tfidf_rf.pkl
    |-- tfidf_sgd.pkl
    `-- tfidf_svc.pkl

2 directories, 22 files
