talshapira / bgp2vec

We introduce a novel approach for Autonomous System (AS) embedding using deep learning based on only BGP announcements. Using these vectors we are able to solve multiple important classification problems such as AS business types, AS Types of Relationship (ToR) and even IP hijack detection.

Home Page: https://talshapira.github.io/portfolio/bgp2vec/

License: MIT License

Jupyter Notebook 92.93% Python 7.07%
bgp tor hijacking internet security deep-learning embeddings autonomus-systems classification

bgp2vec's Introduction

BGP2VEC

We introduce a novel approach for Autonomous System (AS) embedding using deep learning, based only on BGP announcements. Using these vectors, we are able to solve multiple important classification problems, such as AS business types, AS Types of Relationship (ToR), and even IP hijack detection. Similar to natural language processing (NLP) models, the embedding represents latent characteristics of the ASN and its interactions on the Internet. The embedding coordinates of each AS are represented by a vector; thus, we call our method BGP2VEC.

Method

Our method works as follows: first, using a shallow neural network, we map each AS number (ASN) to an embedded vector.

Training is done by feeding the network with ASN pairs: the input is a one-hot vector representing the input ASN, and the training outputs are one-hot vectors representing the output ASNs (the context ASNs). Gradient descent then adjusts the weights of the network to maximize the log probability of any context ASN given the input ASN, where ASNs play the role that words play in NLP models.
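The training step described above can be sketched in plain Python. This is a minimal illustration and not the code in bgp2vec.py: the function names, the window size, the full-softmax output layer, and the toy AS paths (built from documentation-range ASNs) are all assumptions made for the example.

```python
import math
import random

def skipgram_pairs(as_path, window=2):
    """Return (input ASN, context ASN) pairs taken from one AS path."""
    pairs = []
    for i, center in enumerate(as_path):
        lo, hi = max(0, i - window), min(len(as_path), i + window + 1)
        pairs.extend((center, as_path[j]) for j in range(lo, hi) if j != i)
    return pairs

def train_skipgram(paths, dim=8, window=2, epochs=50, lr=0.05, seed=0):
    """Full-softmax skip-gram trained with plain SGD; returns {asn: vector}."""
    rng = random.Random(seed)
    vocab = sorted({asn for path in paths for asn in path})
    index = {asn: k for k, asn in enumerate(vocab)}
    n = len(vocab)
    # Rows of w_in are the embeddings; rows of w_out score each candidate context ASN.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(n)]
    w_out = [[0.0] * dim for _ in range(n)]
    pairs = [(index[a], index[b]) for p in paths for a, b in skipgram_pairs(p, window)]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, o in pairs:
            h = w_in[i]  # a one-hot input simply selects one embedding row
            scores = [sum(x * y for x, y in zip(h, row)) for row in w_out]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            # Gradient of -log softmax: predicted probability minus one-hot target.
            errs = [e / total - (1.0 if k == o else 0.0) for k, e in enumerate(exps)]
            grad_h = [sum(errs[k] * w_out[k][d] for k in range(n)) for d in range(dim)]
            for k in range(n):
                for d in range(dim):
                    w_out[k][d] -= lr * errs[k] * h[d]
            for d in range(dim):
                h[d] -= lr * grad_h[d]
    return {asn: w_in[index[asn]] for asn in vocab}

# Toy AS paths (made up for illustration only).
toy_paths = [[64496, 64497, 64498], [64496, 64499, 64500], [64497, 64499, 64498]]
vectors = train_skipgram(toy_paths, dim=4, epochs=20)
```

A full softmax over every ASN is only practical for a toy vocabulary; real Word2Vec-style training normally relies on negative sampling or hierarchical softmax instead.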

Then, for each task (AS classification or ToR classification), we apply an Artificial Neural Network (ANN) that receives the vectors from the previous stage.
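As a rough sketch of this second stage, the snippet below shows one way the embedding vectors could be fed to a small feed-forward network for ToR classification. The architecture is not specified here, so everything in it is an assumption: concatenating the two ASN vectors as the input, a single 16-unit hidden layer, two output classes, and the toy embedding values. The weights are random placeholders and training is omitted.

```python
import math
import random

def pair_features(vectors, asn_a, asn_b):
    """Concatenate two ASN embeddings into one ToR input vector (assumed encoding)."""
    return list(vectors[asn_a]) + list(vectors[asn_b])

def dense(inputs, weights, biases):
    """One fully connected layer: weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases for one layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(features, layers):
    """ReLU hidden layers followed by a softmax output layer."""
    hidden = features
    for weights, biases in layers[:-1]:
        hidden = [max(0.0, v) for v in dense(hidden, weights, biases)]
    weights, biases = layers[-1]
    return softmax(dense(hidden, weights, biases))

rng = random.Random(0)
# Toy 4-dimensional embeddings for documentation-range ASNs (made-up values).
vectors = {64496: [0.2, -0.1, 0.4, 0.0], 64497: [-0.3, 0.5, 0.1, 0.2]}
features = pair_features(vectors, 64496, 64497)
layers = [init_layer(rng, 16, len(features)), init_layer(rng, 2, 16)]
probs = forward(features, layers)  # untrained network, so the values carry no meaning
```

For AS business-type classification the same forward pass would take a single ASN vector as input, with one output unit per business type.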

Exploration of ASN Embedding

BGP announcements hold latent information about the Internet Autonomous Systems (ASes) and their functional position within the Internet ecosystem. This information can aid us in understanding the Internet structure and in solving many practical problems. BGP2Vec is a novel approach to revealing the latent characteristics of ASes using neural-network-based embedding. We show that our embedding indeed captures important characteristics of ASes, such as distance from Tier-1, business type of the AS, ToR, and geographical similarity.

Code & Dataset

  • bgp2vec.py + oix_utils.py --> use these files to train the BGP2Vec model. To do so, you will have to download an oix file from http://archive.routeviews.org/oix-route-views/
  • Generate_ToR_Dataset.ipynb --> use this to convert the CAIDA dataset to NumPy arrays. Be aware that some ASNs may be present in the CAIDA dataset but not in RouteViews, so you have to find the ToRs that involve them and remove them before the next step.
  • CAIDA....ipynb --> use this to train a neural network for ToR predictions. For this you need to download the CAIDA AS relationships data from https://publicdata.caida.org/datasets/as-relationships/
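The filtering step mentioned for Generate_ToR_Dataset.ipynb could look roughly like the sketch below. It is not taken from the notebook: the helper names are hypothetical, the input is assumed to follow the CAIDA as-rel layout of `AS1|AS2|rel` lines (with `-1` for provider-to-customer, `0` for peer-to-peer, and `#` starting a comment), and the sample lines are made up.

```python
def parse_as_rel(lines):
    """Parse 'AS1|AS2|rel' lines into (as1, as2, rel) tuples, skipping comments."""
    rels = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        # Any extra fields after the relationship code are ignored.
        rels.append((int(fields[0]), int(fields[1]), int(fields[2])))
    return rels

def split_by_vocab(rels, known_asns):
    """Separate ToRs whose two ASNs both have an embedding from those that do not."""
    kept, dropped = [], []
    for rel in rels:
        target = kept if rel[0] in known_asns and rel[1] in known_asns else dropped
        target.append(rel)
    return kept, dropped

# Toy input: documentation-range ASNs, not real relationships.
sample = ["# toy input", "64496|64497|-1", "64497|64498|0", "64496|64499|-1"]
known = {64496, 64497, 64498}  # ASNs seen in the RouteViews data, i.e. with a vector
kept, dropped = split_by_vocab(parse_as_rel(sample), known)
print(kept)     # [(64496, 64497, -1), (64497, 64498, 0)]
print(dropped)  # [(64496, 64499, -1)]
```

Keeping the dropped list around makes it easy to check how much of the CAIDA data is lost because an ASN never appears in the BGP announcements used for training.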

Publications

  • T. Shapira and Y. Shavitt, "BGP2Vec: Unveiling the Latent Characteristics of Autonomous Systems," in IEEE Transactions on Network and Service Management, 2022, doi: 10.1109/TNSM.2022.3169638.
  • T. Shapira and Y. Shavitt. 2020. A Deep Learning Approach for IP Hijack Detection Based on ASN Embedding. In Proceedings of the Workshop on Network Meets AI & ML (NetAI '20). Association for Computing Machinery, New York, NY, USA, 35–41.
  • T. Shapira and Y. Shavitt, "Unveiling the Type of Relationship Between Autonomous Systems Using Deep Learning," NOMS 2020 - 2020 IEEE/IFIP Network Operations and Management Symposium, 2020, pp. 1-6, doi: 10.1109/NOMS47738.2020.9110358.

Check our new paper

AP2Vec: an Unsupervised Approach for BGP Hijacking Detection

In this paper, we extend the work done in BGP2Vec and introduce a novel approach for BGP hijacking detection, based on the observation that during a hijack attack the functional roles of the ASNs along the route change. To identify a functional change, we build on previous work that embeds ASNs into vectors based on BGP routing announcements, and embed each IP address prefix (AP) into a vector representing its latent characteristics; we call this AP2Vec. We then compare the embedding of a new route with the AP embedding, which is based on the old routes, to identify large differences.
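The comparison idea can be illustrated with a small sketch. It does not reproduce the AP2Vec method: averaging ASN vectors to embed a route, averaging old route embeddings to embed the AP, using cosine distance, and the 0.5 threshold are all assumptions chosen to keep the example short, and the vectors are toy values.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def route_embedding(route, asn_vectors):
    """Embed a route as the mean of its ASN vectors (an assumed aggregation)."""
    return mean_vector([asn_vectors[asn] for asn in route])

def is_suspicious(new_route, old_routes, asn_vectors, threshold=0.5):
    """Flag a new route whose embedding is far from the AP's old-route embedding."""
    ap_vec = mean_vector([route_embedding(r, asn_vectors) for r in old_routes])
    return cosine_distance(route_embedding(new_route, asn_vectors), ap_vec) > threshold

# Toy 2-dimensional ASN vectors (made-up values, documentation-range ASNs).
asn_vectors = {
    64496: [1.0, 0.0],
    64497: [0.9, 0.1],
    64498: [0.0, 1.0],
    64499: [-0.1, 0.9],
}
old_routes = [[64496, 64497]]
similar = is_suspicious([64497, 64496], old_routes, asn_vectors)    # same ASNs as before
different = is_suspicious([64498, 64499], old_routes, asn_vectors)  # ASNs with other roles
```

In this toy setup the first route matches the old embedding and is not flagged, while the second points in an almost orthogonal direction and is.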

  • T. Shapira and Y. Shavitt, "AP2Vec: an Unsupervised Approach for BGP Hijacking Detection," in IEEE Transactions on Network and Service Management, doi: 10.1109/TNSM.2022.3166450.

bgp2vec's Issues

Word2Vec embedding

Thanks for your work. I would like to know how you trained this word2vec model. Could you provide the corresponding code and model files?

about details on training and validation

Hello, is the oix file used for BGP2Vec training a single file downloaded from oix-route-views, or is it one large file that packages all the files from October to November 2018? From the paper I understand that all the data from the two months is used, but in the code one of the parameters of BGP2Vec is the path to a single oix file, and no operation for merging multiple small files into one large file is provided. What are the implementation details here?

Also, which file is used for validation? The article says that the validation data includes the AS relationships data; is this data also from October to November 2018? When splitting into training and test sets, is the split done on the relationships of a single day, with the final result being the average daily accuracy? Or are the two months of relationships merged into one large set, which is then split into training and test sets to get the final accuracy?

I'd appreciate it if you could answer.
