juliorodrigues07 / url_detection

Malicious URL detector built with deep exploration on feature engineering.

License: GNU General Public License v3.0

Shell 1.41% Python 73.34% JavaScript 2.71% HTML 1.46% Vue 21.09%
data-mining data-science data-visualization feature-engineering feature-importance machine-learning preprocessing student-project supervised-learning lexical-features

url_detection's Introduction

Python 3.10.12 Flask Jupyter Notebook Colab

Vue.js Bootstrap JavaScript HTML CSS

URL Detector

Malicious URL Detector built using several data mining, machine learning and data science concepts, techniques and algorithms (PAs 1 and 2 from the Applied Data Mining course - DCOMP - UFSJ).
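
The project topics mention lexical features, meaning features computed from the URL string alone. The snippet below is a minimal sketch of that idea, not the feature set this project actually uses: the function name `lexical_features` and every feature in it are illustrative assumptions.

```python
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string.

    Hypothetical helper: the feature names are assumptions, not the
    project's real feature set.
    """
    # urlsplit only fills in the host when a scheme or a leading '//' is
    # present, so prepend '//' for inputs without a protocol.
    parts = urlsplit(url if "://" in url else "//" + url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        # Number of non-empty path segments.
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "has_https": parts.scheme == "https",
    }


if __name__ == "__main__":
    print(lexical_features("example.com/login/update"))
```

A dictionary like this can be turned into one row of a feature table and passed to any supervised classifier.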

Requirements

All the project dependencies are listed in this section (languages, libraries, package managers, frameworks, ...), as well as the instructions to install each of them.

To install all dependencies

./install_dependencies.sh

Languages and package managers

  • Python3 and pip package manager:

    sudo apt install python3 python3-pip build-essential python3-dev
    
  • Node.JS package manager - npm (Optional):

    sudo apt-get install npm
    

Data Mining

Data Visualization

  • Matplotlib library:

    pip install matplotlib
    
  • seaborn library:

    pip install seaborn
    
  • numpy library:

    pip install numpy
    

Web Scraping (Optional)

GUI (Graphical User Interface - Optional)

Inside url-detector directory

  • To install all GUI dependencies:

    npm i
    
  • Vue.js framework:

    npm install -g @vue/cli
    
  • Bootstrap framework:

    npm install [email protected] --save
    
  • axios library:

    npm i axios
    
  • Font Awesome tool kit:

    npm i --save @fortawesome/free-solid-svg-icons && npm i --save @fortawesome/vue-fontawesome@latest-2
    

Execution

All the instructions for exploring the project functionalities are listed in this section, as well as the commands to execute each application.

Data Mining

You can explore all functionalities (different models, datasets, ...) by modifying (or uncommenting) a few parts of the source code.

python3 main.py

Web Scraping

python3 phishing_scraper.py

Application

CLI (Command Line Interface Mode)

  • Inside src directory, execute the command using the following template: python3 predict.py cli <url> <algorithm>.

  • Example with a phishing URL:

    python3 predict.py cli https://bujhanginamfb.github.io/taelasos/update-recovry/ XGB
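
For readers curious how a command with a `cli` mode and a `server` mode might be parsed, here is a minimal sketch using Python's standard `argparse` module. It is not the project's actual `predict.py`: the function `build_parser` and the help texts are assumptions, and only the argument order (`cli <url> <algorithm>`, `server`) comes from the instructions above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command template."""
    parser = argparse.ArgumentParser(prog="predict.py")
    # One sub-command per mode; exactly one of them must be given.
    modes = parser.add_subparsers(dest="mode", required=True)

    cli = modes.add_parser("cli", help="classify a single URL")
    cli.add_argument("url", help="URL to classify")
    cli.add_argument("algorithm", help="model identifier, e.g. XGB")

    modes.add_parser("server", help="start the back-end used by the GUI")
    return parser


if __name__ == "__main__":
    # Parse a fixed example instead of sys.argv so the sketch runs as-is.
    args = build_parser().parse_args(["cli", "https://example.com/login", "XGB"])
    print(args.mode, args.url, args.algorithm)
```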
    

GUI (Graphical User Interface Mode)

  • Open two terminal instances and execute the following commands in each one of them, respectively.

  • Terminal 1 - Back-end (inside src directory):

    python3 predict.py server
    
  • Terminal 2 - Front-end (inside url-detector directory):

    npm run serve
    
  • You should receive two URLs as output (http://localhost:<port number>). To view the application, open either of them in a browser of your choice. The front-end server (GUI) should be running at:

    http://localhost:8080
    
  • Finally, feel free to test the model with your own URLs!

Main Screen


Outro

Because the models were trained on the Kaggle dataset, their reliability can vary considerably with the format of the URL the user enters. Most URLs in the Kaggle dataset do not specify a communication protocol (HTTP, HTTPS, ...), which could introduce a large bias in the trained models and their results, making the classifications quite unstable.
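
One way to probe this sensitivity is to remove the protocol from a URL before classifying it, so the input resembles the protocol-less entries described above. The sketch below only illustrates that idea and is not something the project is known to do; the helper name `strip_scheme` is hypothetical.

```python
import re

# Matches a leading scheme such as "http://" or "https://".
_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def strip_scheme(url: str) -> str:
    """Return the URL without its leading protocol, if it has one."""
    return _SCHEME.sub("", url.strip(), count=1)


if __name__ == "__main__":
    print(strip_scheme("https://example.com/login"))
```

Comparing the prediction for a URL with and without its protocol gives a quick indication of how strongly a model depends on that prefix.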
