Spam-Email-Classifier

This project uses a LogisticRegression model, built with basic machine learning techniques, to classify spam emails. It is a binary classifier: for each email it predicts one of two labels, HAM or SPAM. The dataset is taken from an open-source public database that stores email data for training and research purposes.

SUMMARY

The data is loaded and parsed using Python library functions. Custom functions and transformers convert each email to lowercase plain text; numbers, punctuation and URLs are either removed or replaced with placeholder words, and the remaining words are reduced to their stems (suffixes removed). The transformed data is then used to create vectors that store the counts of the words present in each email. An ordered list of the most common words found in spam emails is built, and each email is checked against this list to count how many of these words it contains. If the count is high, the email is marked as SPAM; otherwise it is marked as HAM.
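The preprocessing and counting steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the notebook's actual code: the regular expressions stand in for urlextract, `naive_stem` stands in for an nltk stemmer, and the function names and the `URL` / `NUMBER` placeholders are assumptions made for this example.

```python
import re
from collections import Counter

# Simplified patterns (assumptions for this sketch, not the project's own).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")
PUNCT_RE = re.compile(r"[^\w\s]+")


def naive_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def email_to_word_counts(text):
    """Lowercase the text, replace URLs and numbers, drop punctuation, count stems."""
    text = text.lower()
    text = URL_RE.sub(" URL ", text)        # uppercase placeholders stay distinct
    text = NUMBER_RE.sub(" NUMBER ", text)  # from the lowercased email words
    text = PUNCT_RE.sub(" ", text)
    return Counter(naive_stem(word) for word in text.split())


def build_vocabulary(counters, size):
    """Return the `size` most common words across all emails, most frequent first."""
    total = Counter()
    for counts in counters:
        total.update(counts)
    return [word for word, _ in total.most_common(size)]


def to_vector(counts, vocabulary):
    """Map one email's word counts onto the fixed vocabulary order."""
    return [counts.get(word, 0) for word in vocabulary]


if __name__ == "__main__":
    sample = "Click http://spam.example/win NOW!!! Winning 1000 prizes"
    counts = email_to_word_counts(sample)
    vocabulary = build_vocabulary([counts], size=10)
    print(vocabulary)
    print(to_vector(counts, vocabulary))
```

Fixing the vocabulary order once and reusing it for every email is what gives all count vectors the same length and the same meaning per position, so a classifier can be trained on them.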

Libraries used:

  • pandas
  • NumPy
  • urlextract
  • nltk
  • matplotlib
  • scikit-learn

Project Structure

Branch: Main

  • datasets\housing - contains the original data in the easy_ham and spam sub-folders.
  • Spam_classifier.ipynb - contains all the code used to transform the data and build the model.
  • spam_email.pkl - the trained model, saved with joblib.
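As a rough illustration of how a trained model can be saved and reloaded with joblib, the sketch below trains a tiny scikit-learn pipeline and round-trips it through a file. The toy emails, the `CountVectorizer` + `LogisticRegression` pipeline, and the file name `spam_email_demo.pkl` are assumptions for this example; the README does not say whether spam_email.pkl holds a bare classifier or a full pipeline, so the input the real file expects may differ.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus, for illustration only (not the project's dataset).
emails = [
    "win free money now",
    "free prize claim now",
    "cheap money offer win",
    "meeting agenda for monday",
    "lunch with the project team",
    "please review the attached report",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = SPAM, 0 = HAM

# Word-count vectors feeding a logistic regression binary classifier.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(emails, labels)

# Save to a temporary location (hypothetical file name), then load it back.
path = os.path.join(tempfile.mkdtemp(), "spam_email_demo.pkl")
joblib.dump(model, path)
restored = joblib.load(path)

prediction = restored.predict(["claim your free money"])
print("SPAM" if prediction[0] == 1 else "HAM")
```

Files written by joblib are pickle-based, so a model file should only be loaded from a source you trust, and ideally with the same scikit-learn version that was used to save it.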
