Phishing Detection System

Machine Learning and Regex Matching based Phishing Detection System with a phishing attack scenario
Developed by Umut Sevdi, İsmet Güngör, Semih Yazıcı and Oğuzhan Ercan

Table of Contents
  1. Project Definition
  2. System Architecture
  3. Hardware Requirements
  4. Installation
  5. License
  6. Contact

1. Project Definition

Phishing is a cyber attack that uses carefully crafted emails or websites to trick individuals into revealing sensitive information such as login credentials or financial details. These attacks often take the form of fake login pages or emails purporting to be from legitimate organizations, and they can have severe consequences for both individuals and organizations.

Figure: Dashboard

In our project, we developed a phishing scenario and a program to protect against it. In the scenario, we hosted an SMTP server and a phishing server for the attacker. The phishing server tricks users into thinking that the website is legitimate.

When the victim clicks on the link, a login page that imitates "edevlet.gov.tr" is returned. However, when the user logs in, all credentials are sent to the attacker. The phishing site then responds with a fake dashboard so that the attack goes unnoticed.

To counter similar attacks, we aimed to develop a phishing detection system based on machine learning and regex matching that identifies and prevents phishing attacks. Combining machine learning with regex matching allows the system to analyze and classify email content and to identify patterns and keywords commonly used in phishing attacks. This approach can be effective at detecting and preventing phishing attacks, because it flags suspicious emails quickly and lets the system act to block them.

2. System Architecture

Attacker

  • On the attacker's side, we developed a web server in Go to host the phishing site. The site serves a web page that looks like edevlet.gov.tr. Unlike the original page, however, it does not encrypt any data in transit, and it sends the submitted credentials directly to the attacker.

Figure: Phishing Server

Victim

  • We used MailHog as the SMTP server. It runs as a container from a docker-compose file for testing purposes.

  • To protect the victim against phishing attacks, we implemented a system that listens to the ongoing traffic and parses SMTP to extract the mail body. The body is first scanned with Yara, using rules written specifically to detect phishing mail. After this check for possible malicious keywords, the plain-text body is passed to a Python program, where a machine learning model determines whether the incoming mail is a phishing attack or benign.

Figure: Regex Based Detection
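The regex stage described above can be sketched in Python. This is only an illustration under stated assumptions: the project itself uses Yara rules, which are not reproduced here, so the standard `re` module stands in for Yara, and the rule names, patterns, and the helpers `extract_plain_body` and `match_rules` are all hypothetical.

```python
import re
from email import message_from_string

# Hypothetical rules (an assumption): loosely modelled on the kind of
# strings a Yara rule for phishing mail might contain.
RULES = {
    "urgent_action": re.compile(r"\b(urgent|immediately|within 24 hours)\b", re.I),
    "credential_request": re.compile(
        r"\b(verify|confirm|update)\s+your\s+(account|password|identity)\b", re.I
    ),
    "plain_http_link": re.compile(r"http://\S+", re.I),
}


def extract_plain_body(raw_message: str) -> str:
    """Return the text of a raw RFC 5322 message as one string."""
    msg = message_from_string(raw_message)
    if msg.is_multipart():
        # Keep only the text/plain parts of a multipart message.
        parts = [p for p in msg.walk() if p.get_content_type() == "text/plain"]
    else:
        # A single-part message is taken as-is, whatever its content type.
        parts = [msg]
    texts = []
    for part in parts:
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        texts.append(payload.decode(charset, errors="replace"))
    return "\n".join(texts)


def match_rules(body: str) -> list:
    """Return the sorted names of all rules whose pattern occurs in the body."""
    return sorted(name for name, pattern in RULES.items() if pattern.search(body))
```

A caller could treat a non-empty result from `match_rules` as a reason to flag the mail before, or in addition to, handing the body to the classifier.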

  • For the machine learning stage we used Long Short-Term Memory (LSTM), a type of recurrent neural network (RNN) well suited to modeling long-term dependencies in time series or other sequential data. It can retain information over long spans and handle variable-length input sequences. The model encodes the input text with an embedding layer followed by the LSTM layer, an attention layer weighs the output of the LSTM layer, and a linear classifier predicts from the weighted result. The model also has a method for generating the initial hidden states of the LSTM layer. Because the attention layer sits between the LSTM and the linear classifier, we can also see which words contribute most to a phishing prediction.

  • The text that arrives over TCP and is converted to a string is not in a format that can be fed into our LSTM model. For this reason, we apply the text preprocessing steps commonly used in natural language processing tasks. The utils_preprocess_text function cleans the text by removing punctuation, lower-casing, removing stop words, and optionally applying stemming or lemmatization. The textCleaner function applies utils_preprocess_text to a column of a pandas DataFrame and stores the processed text in a new column.
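The preprocessing steps named above (removing punctuation, lower-casing, removing stop words) can be sketched as follows. This is not the project's utils_preprocess_text or textCleaner: the function names `preprocess_text` and `clean_texts`, and the tiny stop-word list, are assumptions made for illustration. The sketch skips stemming and lemmatization and works on a plain list instead of a pandas DataFrame.

```python
import string

# A deliberately small stop-word list (an assumption); a real pipeline
# would more likely load one from an NLP library.
STOP_WORDS = frozenset({
    "a", "an", "the", "and", "or", "to", "of", "in", "on",
    "is", "are", "you", "your", "this", "that", "for", "with",
})

# Map every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess_text(text: str, stop_words=STOP_WORDS) -> str:
    """Lower-case, strip punctuation, and drop stop words."""
    cleaned = text.lower().translate(_PUNCT_TABLE)
    tokens = [tok for tok in cleaned.split() if tok not in stop_words]
    return " ".join(tokens)


def clean_texts(texts):
    """Apply preprocess_text to every item, as one would to a DataFrame column."""
    return [preprocess_text(t) for t in texts]
```

For example, `preprocess_text("Please VERIFY your account, NOW!")` yields `"please verify account now"`.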

Figure: NLP Based Detection
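To illustrate how an attention layer can point at the words that drive a prediction, here is a minimal pure-Python sketch. It does not reproduce the project's model: the helper names are hypothetical, and the per-token scores and hidden states are supplied by hand, whereas in a trained model a learned layer would compute the scores from the LSTM outputs.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, scores):
    """Return (weights, context): the context is the weighted sum of the states."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, hidden_states)) for i in range(dim)
    ]
    return weights, context


def top_tokens(tokens, weights, k=2):
    """Return the k tokens that received the largest attention weights."""
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
    return [tok for tok, _ in ranked[:k]]
```

The context vector is what a linear classifier would receive, while the weights themselves serve as a rough per-word explanation of the decision.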

4. Installation

Requirements:

  1. Clone the repository.
    git clone https://github.com/umutsevdi/pds.git
  2. Run the mail server.
    cd victim
    docker-compose up
  3. Compile and run the phishing detection programs.
    cd victim
    cd mail-detect
    python mail_detect.py &
    cd ..
    go build smtp_phishing_detection
    sudo smtp_phishing_detection/smtp_phishing_detection &
  4. Run the attacker programs from an external device or locally.
    cd attacker/phishing_server/cmd
    go run . &
  5. Now you can send phishing emails using our mail script.
    cd attacker/
    python mail_sender.py
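To check that the mail server and the detector are running, a harmless test message can be delivered to the local MailHog instance. The sketch below is a hypothetical stand-in, not the repository's mail_sender.py. It assumes MailHog is listening on its default SMTP port 1025 on localhost; the project's docker-compose file may map a different port.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient, subject, body):
    """Build a plain-text message for local testing."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_to_mailhog(msg, host="localhost", port=1025):
    """Deliver the message to a local SMTP server.

    Host and port are assumptions (MailHog's defaults); adjust them to
    match the docker-compose configuration in use.
    """
    with smtplib.SMTP(host, port, timeout=5) as smtp:
        smtp.send_message(msg)


# Example usage, with the mail server from step 2 running:
# msg = build_test_message("test@example.com", "victim@example.com",
#                          "Detector check", "This is a harmless test mail.")
# send_to_mailhog(msg)
```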

5. License

Distributed under the MIT License. See LICENSE for more information.

6. Contact

You can contact any of the project's developers with suggestions or questions.

Project: umutsevdi/pds
