Giter VIP home page Giter VIP logo

fake_news_detection's Introduction

Fake News Classification using NLP based Machine Learning & Deep Learning (LSTM)

Project Overview

Fake news is a form of news consisting of deliberate disinformation or hoaxes spread via traditional news media or online social media.In this project, I have used different natural language processing (NLP) based machine learning and deep learning approaches including BERT to detect fake news from news headlines. Generally, a fake headline is a news headline which may read one way or state something as fact, but then the body of the article says something different. The Internet term for this type of misleading fake news is “clickbait” —headlines that catch a reader’s attention to make them click on the fake news. This type of fake news is misleading at best and untrue at worst.

Screenshot Predict

In this project, I have extracted interesting patterns from the headline text using NLP and perform exploratory data analysis to provide useful insight about fake headlines by creating intuitive features. This project includes work detailed below:

  • Exploratory data analysis & Feature Engineering using NLP
  • Machine Learning Modeling using text based features
  • Deep Learning Modeling (LSTM) using text based features
  • BERT model building from tex based features

About Data

The dataset used in this project is the ISOT Fake News Dataset.The dataset contains two types of articles fake and real News. This dataset was collected from realworld sources; the truthful articles were obtained by crawling articles from Reuters.com (News website). As for the fake news articles, they were collected from different sources. The fake news articles were collected from unreliable websites that were flagged by Politifact (a fact-checking organization in the USA) and Wikipedia. The dataset contains different types of articles on different topics, however, the majority of articles focus on political and World news topics.

The dataset consists of two CSV files. The first file named True.csv contains more than 12,600 articles from reuter.com. The second file named Fake.csv contains more than 12,600 articles from different fake news outlet resources. Each article contains the following information:

  • article title (News Headline),
  • text,
  • type (REAL or FAKE)
  • the date the article was published on*

Word Cloud of dataset

In wordcloud most frequent occuring words in the corpus to be shown. The size of words in the wordcloud is based on their frequency in the corpus.The more the word appears, the largers the word font will be. As we can see from the wordclouds most frequent words in fake news are Video,Obama, Hillary, Trump and Republican whereas Real news comprise Trump, White House, North Korea, China etc.

Screenshot Predict

Installations:

This project requires Python 3.7x and the following Python libraries should be installed to get the project started:

I also reccommend to install Anaconda, a pre-packaged Python distribution that contains all of the necessary libraries and software for this project which also include jupyter notebook to run and execute IPython Notebook.

"# fake_news_detection" "# fake_news_detection" "# fake_news_detection" "# fake_news_detection"

fake_news_detection's People

Contributors

nileshmaharjan avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.