spam-classifier's Introduction

Spam Classifier

This is an SMS/email spam classifier that identifies whether a given text message is a potential advert, fraud, or scam, and separates it from legitimate messages.

Dataset Used:

The dataset used in this project comes from Kaggle:

SMS Spam Collection Dataset

Link to dataset: https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset

About

Identification and Description of problem

Large technology companies such as Google build spam classifiers into their email systems to detect whether a received email is important or is spam sent by another company for advertising.
When a user signs in to another site or uses a product with the same email account, that company may push promotions with or without the user's consent.
Classifying and detecting such messages is therefore crucial to giving users a good experience and sparing them the hassle.

Process

We break the MLA down into the following steps:

Data Cleaning
Exploratory Data Analysis (EDA)
Text Preprocessing
Model Building
Evaluation
Improvement
Deployment
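
The text preprocessing and EDA steps above could look roughly like the sketch below. It is a simplified illustration, not the project's actual code: it uses only the Python standard library, the STOPWORDS set is a tiny hypothetical stand-in for the much larger stopword list that nltk provides, and the example messages are made up rather than taken from the Kaggle dataset.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only;
# nltk's English stopword corpus is far larger.
STOPWORDS = {"a", "an", "the", "is", "to", "you", "your",
             "for", "and", "now", "at", "are", "we", "i"}


def preprocess(text: str) -> list:
    """Lowercase the text, split it into alphanumeric tokens, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


# Made-up example messages, not rows from the Kaggle dataset.
messages = [
    ("spam", "WIN a FREE prize now! Call to claim your prize."),
    ("ham", "Are we still meeting for lunch at noon?"),
    ("spam", "Free entry: text WIN to claim."),
]

# EDA-style word frequency count over the spam messages only.
spam_words = Counter()
for label, text in messages:
    if label == "spam":
        spam_words.update(preprocess(text))

print(spam_words.most_common(3))
```

Counting the most frequent tokens per class is one simple way to see which words separate spam from ordinary messages before any model is trained.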

Libraries Needed:

pip install nltk
pip install pandas
pip install scikit-learn
pip install numpy
pip install streamlit
(collections is part of the Python standard library; no install is needed)

Conclusion

After evaluating each candidate model, Multinomial Naive Bayes (MNB) proved to be the best-performing algorithm, with the following metrics:
---------------------------
Accuracy Score: 0.9691
Confusion Matrix:
[[888 0]
[ 32 114]]
Precision Score: 1.0

The hyperparameters were:

max_features of tfidf set to 3000
default parameters of MNB
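
The reported accuracy and precision can be re-derived from the confusion matrix. The sketch below assumes scikit-learn's layout (rows are true classes, columns are predicted classes, ordered ham then spam); the recall value is not reported above and is computed here only from the same four counts.

```python
# Confusion matrix as reported, assuming scikit-learn's layout:
# rows are true classes, columns are predicted classes, ordered [ham, spam].
confusion = [[888, 0],
             [32, 114]]

(tn, fp), (fn, tp) = confusion

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
precision = tp / (tp + fp)  # of messages flagged as spam, how many were spam
recall = tp / (tp + fn)     # of actual spam messages, how many were caught

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
```

Under that reading, a precision of 1.0 means no legitimate message was flagged as spam on the test split, while the 32 in the lower-left cell are spam messages that slipped through.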

To run the website

Run the main.ipynb notebook from top to bottom.
Then enter the following command in the terminal:

streamlit run app.py

Accuracy was measured across several scenarios. The model could be tuned further with ensemble learning methods such as VotingClassifier.
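
As a rough idea of what that could look like, the sketch below builds the stated baseline (TF-IDF with max_features=3000 and a default MultinomialNB) next to a soft-voting ensemble using scikit-learn. The two extra estimators (BernoulliNB and LogisticRegression) and the toy messages are assumptions chosen for illustration; the project does not say which models an ensemble would combine.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy messages, not the Kaggle dataset; 1 = spam, 0 = ham.
texts = [
    "win a free prize now",
    "claim your free prize today",
    "free cash offer claim now",
    "urgent win cash prize",
    "are we meeting for lunch",
    "see you at home tonight",
    "can you call me later",
    "thanks for the help yesterday",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Baseline matching the stated configuration: TF-IDF capped at 3000
# features, Multinomial Naive Bayes with default parameters.
baseline = make_pipeline(TfidfVectorizer(max_features=3000), MultinomialNB())
baseline.fit(texts, labels)

# Soft voting averages the predicted class probabilities of all estimators.
# The choice of the two extra estimators is an assumption for illustration.
ensemble = make_pipeline(
    TfidfVectorizer(max_features=3000),
    VotingClassifier(
        estimators=[
            ("mnb", MultinomialNB()),
            ("bnb", BernoulliNB()),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",
    ),
)
ensemble.fit(texts, labels)

sample = ["free prize, claim now"]
print(baseline.predict(sample), ensemble.predict_proba(sample))
```

Whether an ensemble actually beats the MNB baseline would have to be checked on the real held-out split, ideally watching precision as well as accuracy so that legitimate messages are not flagged.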

Note that this is merely a prototype and is not optimized.

Repository: cephal0/spam-classifier (GitHub)
