Giter VIP home page Giter VIP logo

botnet-detection's Introduction

botnet-detection

MIT License

Topological botnet detection datasets and automatic detection with graph neural networks.

A collection of different botnet topologyies overlaid onto normal background network traffic, containing featureless graphs of relatively large scale for inductive learning.

Installation

From source

git clone https://github.com/harvardnlp/botnet-detection
cd botnet-detection
python setup.py install

To Load the Botnet Data

We provide standard and easy-to-use dataset and data loaders, which automatically handle the dataset dnowloading as well as standard data splitting, and can be compatible with most of the graph learning libraries by specifying the graph_format argument:

from botdet.data.dataset_botnet import BotnetDataset
from botdet.data.dataloader import GraphDataLoader

botnet_dataset_train = BotnetDataset(name='chord', split='train', graph_format='pyg')
botnet_dataset_val = BotnetDataset(name='chord', split='val', graph_format='pyg')
botnet_dataset_test = BotnetDataset(name='chord', split='test', graph_format='pyg')

train_loader = GraphDataLoader(botnet_dataset_train, batch_size=2, shuffle=False, num_workers=0)
val_loader = GraphDataLoader(botnet_dataset_val, batch_size=1, shuffle=False, num_workers=0)
test_loader = GraphDataLoader(botnet_dataset_test, batch_size=1, shuffle=False, num_workers=0)

The choices for dataset name are (indicating different botnet topologies):

  • 'chord' (synthetic, 10k botnet nodes)
  • 'debru' (synthetic, 10k botnet nodes)
  • 'kadem' (synthetic, 10k botnet nodes)
  • 'leet' (synthetic, 10k botnet nodes)
  • 'c2' (real, ~3k botnet nodes)
  • 'p2p' (real, ~3k botnet nodes)

The choices for dataset graph_format are (for different graph data format according to different graph libraries):

Based on different choices of the above argument, when indexing the botnet dataset object, it will return a corresponding graph data object defined by the specified graph library.

The data loader handles automatic batching and is agnostic to the specific graph learning library.

To Evaluate a Model Predictor

We prepare a standardized evaluator for easy evaluation and comparison of different models. First load the dataset class with BotnetDataset and the evaluation function eval_predictor. Then define a simple wrapper of your model as a predictor function (see examples), which takes in a graph from the dataset and returns the prediction probabilities for the positive class (as well as the loss from the forward pass, optionally).

We mainly use the average F1 score to compare across models. For example, to get evaluations on the chord test set:

from botdet.data.dataset_botnet import BotnetDataset
from botdet.eval.evaluation import eval_predictor
from botdet.eval.evaluation import PygModelPredictor

botnet_dataset_test = BotnetDataset(name='chord', split='test', graph_format='pyg')
predictor = PygModelPredictor(model)    # 'model' is some graph learning model
result_dict_avg, loss_avg = eval_predictor(botnet_dataset_test, predictor)

print(f'Testing --- loss: {loss_avg:.5f}')
print(' ' * 10 + ', '.join(['{}: {:.5f}'.format(k, v) for k, v in result_dict_avg.items()]))

test_f1 = result_dict_avg['f1']

To Train a Graph Neural Network for Topological Botnet Detection

We provide a set of graph convolutional neural network (GNN) models here with PyTorch Geometric, along with the corresponding training script. Various basic GNN models can be constructed and tested by specifing configuration arguments:

  • number of layers, hidden size
  • node updating model each layer (e.g. direct message passing, MLP, gated edges, or graph attention)
  • message normalization
  • residual hops
  • final layer type
  • etc. (check the model API and the training script)

As an example, to train a GNN model on the topological botnet datasets, simply run:

bash run_botnet.sh

With the above configuration, we run graph neural network models (with 12 layers, 32 hidden dimension, random walk normalization, and residual connections) on each of the topologies, and results are as below:

Topology Chord de Bruijn Kademlia LEET-Chord C2 P2P
Test F1 (%) 99.061 99.926 98.935 99.231 98.992 98.692
Average 99.140

Note

We also provide labels on the edges under the name edge_y, which can be used for the complete botnet community recovery task.

Citing

@article{zhou2020auto,
  title={Automating Botnet Detection with Graph Neural Networks},
  author={Jiawei Zhou*, Zhiying Xu*, Alexander M. Rush, and Minlan Yu},
  journal={AutoML for Networking and Systems Workshop of MLSys 2020 Conference},
  year={2020}
}

botnet-detection's People

Contributors

jzhou316 avatar srush avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.