Spambase Dataset Analysis using KNN and Decision Trees

This project aims to analyze the Spambase dataset from the UCI Machine Learning Repository using two machine learning models, K-Nearest Neighbors (KNN) and Decision Trees. The objective of the project is to classify emails as either spam or non-spam based on various features provided in the dataset.

Dataset

The Spambase dataset contains 4601 email messages, of which 1813 are labeled as spam. Each message is described by 57 numeric features: the frequencies of selected words and characters, plus statistics on runs of capital letters. The dataset is available on the UCI Machine Learning Repository [https://archive.ics.uci.edu/ml/datasets/spambase] and was originally contributed by Hewlett-Packard Labs.
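
As a quick sanity check on the class balance, the counts quoted above can be turned into a spam fraction and into the accuracy of a trivial classifier that always predicts the majority class. The sketch below uses only those two counts; the variable names are illustrative, and the numbers may shift slightly once duplicate rows are removed.

```python
# Class balance implied by the counts quoted in the dataset description.
total_messages = 4601
spam_messages = 1813
ham_messages = total_messages - spam_messages  # non-spam messages

spam_fraction = spam_messages / total_messages
# Accuracy of a trivial baseline that always predicts the majority class.
majority_baseline = max(spam_messages, ham_messages) / total_messages

print(f"spam fraction:     {spam_fraction:.1%}")
print(f"majority baseline: {majority_baseline:.1%}")
```

The baseline gives a reference point for reading the accuracy figures reported for the trained models.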

Preprocessing

The data was prepared and the models were trained in the following steps:

  • Loaded the dataset into a pandas dataframe
  • Labeled the columns of the dataframe based on the attribute names provided in the dataset documentation
  • Removed any duplicate rows from the dataframe
  • Processed the data to match the constraints provided in the dataset documentation
  • Split the dataset into input features and target variable
  • Standardized the data
  • Performed PCA analysis to reduce the number of features
  • Split the data into training and testing sets
  • Trained KNN and Decision Trees models
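
The steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the target column name `is_spam`, the 95% variance threshold for PCA, the 80/20 split, `n_neighbors=5`, and the helper name `run_pipeline` are all assumptions, and a small synthetic dataframe stands in for the real data so that the block runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def run_pipeline(df, target="is_spam", variance_kept=0.95, seed=0):
    """Deduplicate, standardize, reduce with PCA, split, and fit both models."""
    df = df.drop_duplicates()                    # remove duplicate rows
    X = df.drop(columns=[target]).to_numpy()     # input features
    y = df[target].to_numpy()                    # target variable

    X = StandardScaler().fit_transform(X)        # zero mean, unit variance
    # A float n_components keeps enough components to explain that
    # fraction of the variance (assumed threshold, not from the notebook).
    X = PCA(n_components=variance_kept).fit_transform(X)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        "knn": KNeighborsClassifier(n_neighbors=5),
        "tree": DecisionTreeClassifier(random_state=seed),
    }
    for model in models.values():
        model.fit(X_train, y_train)
    return models, (X_test, y_test)


# Synthetic stand-in for the real data so the sketch runs on its own.
# With the repository file, pd.read_csv("spambase.csv") plus the documented
# column names would take the place of this block.
rng = np.random.default_rng(0)
n_rows, n_features = 400, 10
labels = rng.integers(0, 2, size=n_rows)
features = rng.normal(size=(n_rows, n_features)) + labels[:, None]
demo = pd.DataFrame(features, columns=[f"f{i}" for i in range(n_features)])
demo["is_spam"] = labels

models, (X_test, y_test) = run_pipeline(demo)
for name, model in models.items():
    print(name, round(model.score(X_test, y_test), 3))
```

The sketch follows the order of the list, so the scaler and PCA are fitted on the full dataset before the split. A common alternative is to fit them on the training portion only, which keeps information from the test set out of the preprocessing.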

Results

We evaluated the models with sklearn.metrics.classification_report(), which reports precision, recall, F1-score, and support for each class. The KNN model achieved an overall accuracy of 88%, while the Decision Trees model achieved 87%. Based on these results, we concluded that the KNN model performed slightly better than the Decision Trees model for this task, although the one-point gap is small.
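
For readers unfamiliar with the report, the short sketch below shows what classification_report() produces. The labels are toy values made up purely for illustration (1 = spam, 0 = non-spam); they are not the project's predictions.

```python
from sklearn.metrics import accuracy_score, classification_report

# Toy labels purely for illustration (1 = spam, 0 = non-spam).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]

# Human-readable table with precision, recall, F1-score and support per class.
print(classification_report(y_true, y_pred, target_names=["non-spam", "spam"]))

# The same metrics as a nested dict, convenient for comparing models in code.
report = classification_report(y_true, y_pred, output_dict=True)
accuracy = accuracy_score(y_true, y_pred)
print("accuracy:", accuracy)
```

With output_dict=True the per-class entries are keyed by the label as a string (here "0" and "1"), and the overall accuracy is available under the "accuracy" key.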

Evaluation Results (KNN):

[Image: KNN evaluation results]

Evaluation Results (Decision Trees):

[Image: Decision Trees evaluation results]

Code

The code for loading and preprocessing the dataset, splitting it into training and testing sets, training and evaluating the KNN and Decision Trees models, and printing the classification reports is in the filtering spams.ipynb notebook. The Spambase dataset itself is included as spambase.csv.

Technologies and Platforms

  • Python
  • Jupyter Notebook

Contributors

inukarodrigo
