Giter VIP home page Giter VIP logo

gaiborjosue / dns-traffic-control Goto Github PK

View Code? Open in Web Editor NEW
3.0 1.0 0.0 6.3 MB

DNS Traffic Control ML Model using Stateless Data: This project implements a machine learning model for DNS traffic control using stateless data. The model is trained on a dataset of DNS traffic records to detect and classify traffic into two categories: benign and attack.

License: MIT License

Jupyter Notebook 99.75% Python 0.25%

dns-traffic-control's Introduction

Model: Jupyter Notebook

DNS Traffic Control Project

Overview

The DNS Traffic Control Project is a machine learning project that aims to classify DNS traffic as either benign or malicious using stateless and stateful datasets. The purpose of this project is to build a model that can accurately detect and prevent malicious DNS traffic.

Dataset

For this project, I used stateless and stateful datasets containing features related to DNS traffic suggested by the CIC-Bell-DNS-EXF-2021 Dataset. I used a merged combination of the datasets, both in light and heavy attacks. I used these datasets because they are readily available and provide a good starting point for building a DNS traffic classification model.

For the dataset source please refer to:

Samaneh Mahdavifar, Amgad Hanafy Salem, Princy Victor, Miguel Garzon, Amir H. Razavi, Natasha Hellberg, Arash Habibi Lashkari, “Lightweight Hybrid Detection of Data Exfiltration using DNS based on Machine Learning”, The 11th IEEE International Conference on Communication and Network Security (ICCNS), Dec. 3-5, 2021, Beijing Jiaotong University, Weihai, China.

Class Definition

To train the machine learning model, I defined the DNS anomalous detection class. This class has two possible values: 0 (benign) and 1 (malicious). I chose this class definition because it is a simple and intuitive way to classify DNS traffic.

Methodology

To build the DNS traffic classification model, I used a Random Forest Classifier algorithm from the scikit-learn library. The Random Forest Classifier is a popular algorithm for classification tasks, and it has been shown to be effective in detecting malicious traffic.

I first merged the two datasets and shuffled the data to avoid bias. I also removed any rows with NaN values.

I then normalized the numerical data using a MinMaxScaler object from the scikit-learn library. I split the data into training and testing sets using a 80-20 split, respectively.

I trained the Random Forest Classifier with the training set and made predictions on the testing set. I then calculated the accuracy of the model using the accuracy_score function from the scikit-learn library.

To ensure that the model was not overfitting the data, I performed 5-fold cross-validation on the data. I used the cross_val_score function from the scikit-learn library to calculate the mean cross-validation score.

Finally, I plotted the feature importances to understand which features were most important in the classification task.

Conclusion

The DNS Traffic Control Project demonstrates how machine learning can be used to classify DNS traffic as either benign or malicious. By using stateless datasets and a Random Forest Classifier algorithm, I was able to build a model that accurately detected and prevented malicious DNS traffic.

License

MIT licensed

dns-traffic-control's People

Contributors

gaiborjosue avatar

Stargazers

 avatar zm1060 avatar  avatar

Watchers

 avatar

dns-traffic-control's Issues

Dataset has problem

Discussed in #9

Originally posted by Robin-szu December 18, 2023
This DNS dataset does not seem to have captured the payload portion of data leakage。

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.