
bayesian-spam-filter's Introduction

Abstract:

With the surge of spam content on smartphones, there is a need for a spam filter for text messages that is as accurate as possible. Traditional email spam filters do not transfer well to text messages: most emails contain full sentences, a common source, and a subject line, whereas text messages tend to contain incomplete sentences and misspelled words. This project seeks to build a spam classifier using Bayes' theorem, along with some variations, to predict whether an incoming text message is spam.

1. Introduction:

Spam content in text messages keeps increasing with recent developments in technology. A recent survey from Techjury found that text messages have 209% higher response rates than emails, and that 23 billion texts are sent globally every day. Detecting spam messages is important not only for personal convenience but also for security. Being able to remove texts that carry potential viruses or malicious intent of any kind is necessary both for individual users and at larger scales. Email service providers use spam filters that automatically filter out emails that are very likely to be spam, but such advanced filters are not available in text messaging.

Many machine learning techniques, such as k-nearest neighbors (KNN), random forest, SVM, gradient boosting, and Naïve Bayes, can be used to classify text messages. All of these methods except Naïve Bayes convert the text messages into word vectors and use the words as features. For example, the KNN algorithm uses the Euclidean distance between the vectors of two texts as its similarity measure, and the random forest generates multiple decision trees based on the word frequencies to classify the text message. Naïve Bayes, on the other hand, uses the word frequencies to calculate the probability of each individual word and combines these to estimate the probability that a text is spam, which makes it effective even with small amounts of data.

This project analyzes a sample data set of prelabeled text messages and applies Bayes' theorem to determine whether a given text message is spam. The rest of this project is organized as follows: Section 2 describes the dataset and the cleaning process; Section 3 shows the descriptive statistics of the data; Section 4 describes the methodology based on Bayes' theorem. The results of the project and future extensions are described in Section 5 and Section 6, respectively.
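
To make the Naïve Bayes idea above concrete, the following is a minimal sketch of a word-frequency spam scorer. It is not the project's actual implementation: the function names (`tokenize`, `train`, `spam_probability`), the toy messages, and the use of add-one (Laplace) smoothing with parameter `alpha` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(messages):
    """Count words per class from (label, text) pairs; labels are 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for label, text in messages:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab


def spam_probability(text, model, alpha=1.0):
    """Estimate P(spam | text) with Bayes' theorem and the naive independence assumption.

    Assumes both classes appear in the training data. `alpha` is the
    smoothing constant (an illustrative choice, not taken from the project).
    """
    word_counts, class_counts, vocab = model
    total_messages = sum(class_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        # Start from the log prior P(class).
        log_score = math.log(class_counts[label] / total_messages)
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for word in tokenize(text):
            if word not in vocab:
                continue  # Words never seen in training carry no evidence.
            # Add the smoothed log likelihood P(word | class).
            log_score += math.log((word_counts[label][word] + alpha) / denom)
        log_scores[label] = log_score
    # Normalize in log space to avoid underflow on long messages.
    top = max(log_scores.values())
    weights = {label: math.exp(score - top) for label, score in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])


# Hypothetical toy data, only to show the calling convention.
toy_messages = [
    ("spam", "WIN a free prize now"),
    ("spam", "free entry claim your prize"),
    ("ham", "are we still on for lunch"),
    ("ham", "call me when you are free"),
]
model = train(toy_messages)
print(spam_probability("claim your free prize", model))
print(spam_probability("lunch when you are free", model))
```

Working in log space and smoothing the counts are common choices for this kind of classifier: they keep a single unseen word from forcing a probability of zero and keep long messages from underflowing.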
