The source code and dataset are used to demonstrate the DF model, and reproduce the results of the ACM CCS2018 paper

Home Page: https://github.com/deep-fingerprinting/df

df's Introduction

Deep Fingerprinting :warning: experimental - PLEASE BE CAREFUL. Intended for research purposes only.

The source code and dataset are used to demonstrate the DF model, and reproduce the results of the ACM CCS2018 paper:

ACM Reference Format

Payap Sirinam, Mohsen Imani, Marc Juarez, and Matthew Wright. 2018.
Deep Fingerprinting: Undermining Website Fingerprinting Defenses with Deep Learning. 
In 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS ’18), 
October 15–19, 2018, Toronto, ON, Canada. ACM, New York, NY, USA, 16 pages. 
https://doi.org/10.1145/3243734.3243768

You can find our paper here

Closed-World Evaluation

Dataset

We publish the web traffic traces produced for the closed-world evaluations on the non-defended, WTF-PAD, and Walkie-Talkie datasets. Because GitHub limits the size of uploaded files, the dataset is hosted in a Google Drive repository instead.

The dataset can be downloaded from this link

Dataset Structure

  • The dataset is serialized as pickle files.
  • Researchers can load it with Python's cPickle library (the pickle module in Python 3).
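
A minimal loading sketch, assuming the files were pickled under Python 2 (the README names cPickle). Under Python 3, a plain pickle.load on such files can raise a UnicodeDecodeError; passing encoding="latin1" is the usual way to read Python 2 pickles that contain byte strings or NumPy arrays. The helper name load_pickle is hypothetical.

```python
import pickle
import sys


def load_pickle(path):
    """Load one dataset file, e.g. an X_*.pkl or y_*.pkl file."""
    with open(path, "rb") as handle:  # pickle files must be opened in binary mode
        if sys.version_info[0] >= 3:
            # Assumption: the file was written by Python 2, so byte strings
            # are decoded as latin1 instead of the default ASCII.
            return pickle.load(handle, encoding="latin1")
        return pickle.load(handle)
```

The same helper works for both the X and y files. Wrapping the result in numpy.array afterwards, as the training scripts appear to do, is left to the caller.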

Descriptions of Dataset

  • Each sub-folder contains 6 different files:
X_<type of data>_<type of evaluation>.pkl : Packet direction sequences
y_<type of data>_<type of evaluation>.pkl : Corresponding website class labels

<type of data> --Three different data sets used for training, validation, and testing 

<type of evaluation> --Three different evaluations: 
                       NoDef: Tor traffic traces without defense
                       WTFPAD: Tor traffic traces with WTF-PAD defense
                       WalkieTalkie: Tor traffic traces with Walkie-Talkie defense

Dataset Format

In all datasets, we use the same data structure, as follows:

X_<type of data>_<type of evaluation>.pkl --Array of network traffic sequences.
    The dimension of the X array is [n x 5000], in which
    n -- Total number of network traffic sequence instances
    5000 -- Fixed length of each network traffic sequence instance
y_<type of data>_<type of evaluation>.pkl --Array of the websites' labels
    The dimension of the y array is [n], in which
    n -- Total number of network traffic sequence instances

E.g.

X_<type of data>_<type of evaluation>.pkl = [[+1,-1, ..., -1], ... ,[+1, +1, ..., -1]]
y_<type of data>_<type of evaluation>.pkl = [45, ... , 12]
In this case:
   - the 1st packet sequence [+1,-1, ..., -1] belongs to website number 45
   - the last packet sequence [+1, +1, ..., -1] belongs to website number 12
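
The format above can be illustrated with a small sketch that turns variable-length direction sequences into fixed-length rows. It assumes that shorter traces are padded with 0 and longer ones are truncated. The README only states that every row has length 5000, so the padding value and the helper name to_fixed_length are assumptions.

```python
def to_fixed_length(directions, length=5000, pad_value=0):
    """Return a list of exactly `length` entries from a +1/-1 direction sequence."""
    for d in directions:
        if d not in (1, -1):
            raise ValueError("directions must be +1 or -1, got %r" % (d,))
    trimmed = list(directions[:length])              # truncate long traces
    padding = [pad_value] * (length - len(trimmed))  # pad short traces (assumed value)
    return trimmed + padding


# Toy data in the documented layout: X is [n x 5000], y is [n].
X = [to_fixed_length([+1, -1, -1]), to_fixed_length([+1, +1, -1])]
y = [45, 12]
assert len(X) == len(y) and all(len(row) == 5000 for row in X)
```

Here X[0] carries label y[0] = 45 and X[1] carries label y[1] = 12, mirroring the example above.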

Before training and evaluating the DF model, please place the downloaded datasets into

dataset/ClosedWorld/ for closed world
dataset/OpenWorld/ for open world

Reproduce Results

First of all, you will need to download the corresponding dataset and place it in:

dataset/ClosedWorld/NoDef/ directory for non-defended   
dataset/ClosedWorld/WTFPAD/ directory for WTF-PAD
dataset/ClosedWorld/WalkieTalkie/ directory for Walkie-Talkie
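
One way to keep these locations consistent in scripts is a small path helper. This is a sketch: the file-name pattern follows the X_<type of data>_<type of evaluation>.pkl convention described earlier, but the split tokens "train", "valid", and "test" are assumptions, as is the function name dataset_file.

```python
from pathlib import PurePosixPath

EVALUATIONS = ("NoDef", "WTFPAD", "WalkieTalkie")
SPLITS = ("train", "valid", "test")  # assumed spellings of <type of data>


def dataset_file(world, evaluation, kind, split, root="dataset"):
    """Build a path such as dataset/ClosedWorld/NoDef/X_train_NoDef.pkl."""
    if world not in ("ClosedWorld", "OpenWorld"):
        raise ValueError("unknown world: %s" % world)
    if evaluation not in EVALUATIONS:
        raise ValueError("unknown evaluation: %s" % evaluation)
    if kind not in ("X", "y") or split not in SPLITS:
        raise ValueError("unknown file kind or split")
    name = "%s_%s_%s.pkl" % (kind, split, evaluation)
    return PurePosixPath(root) / world / evaluation / name


print(dataset_file("ClosedWorld", "NoDef", "X", "train"))
```

Check the printed path against the actual file names in the downloaded archive before relying on it.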

Attack Accuracy

Non-Defended Evaluation

python src/ClosedWorld_DF_NoDef.py

Training and evaluating DF model for closed-world scenario on non-defended dataset
Number of Epoch:  30
Loading and preparing data for training, and evaluating the model
Loading non-defended dataset for closed-world scenario
Data dimensions:
X: Training data's shape :  (76000, 5000)
y: Training data's shape :  (76000,)
X: Validation data's shape :  (9500, 5000)
y: Validation data's shape :  (9500,)
X: Testing data's shape :  (9500, 5000)
y: Testing data's shape :  (9500,)
(76000, 'train samples')
(9500, 'validation samples')
(9500, 'test samples')
Building and training DF model
Model compiled
Train on 76000 samples, validate on 9500 samples
Epoch 1/30
 - 78s - loss: 1.9622 - acc: 0.4976 - val_loss: 0.6449 - val_acc: 0.8456
Epoch 2/30
 - 75s - loss: 0.6925 - acc: 0.8249 - val_loss: 0.3391 - val_acc: 0.9153
Epoch 3/30
 - 75s - loss: 0.4304 - acc: 0.8952 - val_loss: 0.2284 - val_acc: 0.9433

...

Epoch 29/30
 - 75s - loss: 0.0416 - acc: 0.9892 - val_loss: 0.0757 - val_acc: 0.9837
Epoch 30/30
 - 75s - loss: 0.0408 - acc: 0.9892 - val_loss: 0.0790 - val_acc: 0.9817
('Testing accuracy:', '0.9827368421052631')

WTF-PAD Evaluation

python src/ClosedWorld_DF_WTFPAD.py

Training and evaluating DF model for closed-world scenario on WTF-PAD dataset
Number of Epoch:  40
Loading and preparing data for training, and evaluating the model
Loading WTF-PAD dataset for closed-world scenario
Data dimensions:
X: Training data's shape :  (76000, 5000)
y: Training data's shape :  (76000,)
X: Validation data's shape :  (9500, 5000)
y: Validation data's shape :  (9500,)
X: Testing data's shape :  (9500, 5000)
y: Testing data's shape :  (9500,)
(76000, 'train samples')
(9500, 'validation samples')
(9500, 'test samples')
Building and training DF model
Model compiled
Train on 76000 samples, validate on 9500 samples
Epoch 1/40
 - 78s - loss: 3.1377 - acc: 0.2031 - val_loss: 2.2293 - val_acc: 0.3932
Epoch 2/40
 - 75s - loss: 1.9602 - acc: 0.4607 - val_loss: 1.3017 - val_acc: 0.6458
Epoch 3/40
 - 75s - loss: 1.4927 - acc: 0.5839 - val_loss: 0.9280 - val_acc: 0.7469

...

Epoch 39/40
 - 75s - loss: 0.2608 - acc: 0.9246 - val_loss: 0.3522 - val_acc: 0.9111
Epoch 40/40
 - 75s - loss: 0.2573 - acc: 0.9250 - val_loss: 0.3709 - val_acc: 0.9069
('Testing accuracy:', '0.906947368471246')

Walkie-Talkie Evaluation (also includes top-2 prediction)

python src/ClosedWorld_DF_WalkieTalkie.py

Training and evaluating DF model for closed-world scenario on Walkie-Talkie dataset
Number of Epoch:  30
Loading and preparing data for training, and evaluating the model
Loading Walkie-Talkie dataset for closed-world scenario
Data dimensions:
X: Training data's shape :  (80000, 5000)
y: Training data's shape :  (80000,)
X: Validation data's shape :  (5000, 5000)
y: Validation data's shape :  (5000,)
X: Testing data's shape :  (5000, 5000)
y: Testing data's shape :  (5000,)
(80000, 'train samples')
(5000, 'validation samples')
(5000, 'test samples')
Building and training DF model
Model compiled
Train on 80000 samples, validate on 5000 samples
Epoch 1/30
 - 80s - loss: 2.5954 - acc: 0.2657 - val_loss: 1.7162 - val_acc: 0.3588
Epoch 2/30
 - 77s - loss: 1.4514 - acc: 0.4143 - val_loss: 1.0211 - val_acc: 0.4578
Epoch 3/30
 - 77s - loss: 1.1592 - acc: 0.4457 - val_loss: 0.8824 - val_acc: 0.4780

...

Epoch 29/30
 - 77s - loss: 0.7291 - acc: 0.4932 - val_loss: 0.7300 - val_acc: 0.4964
Epoch 30/30
 - 77s - loss: 0.7286 - acc: 0.4947 - val_loss: 0.7340 - val_acc: 0.4965
('Testing accuracy:', '0.497')
Start evaluating Top-2 Accuracy
Top-2 Accuracy: 0.992000 
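
Top-2 accuracy counts a prediction as correct when the true label is among the two classes with the highest predicted probability. A minimal pure-Python sketch of the metric follows. The function name top_k_accuracy and the toy probabilities are illustrative and are not taken from the repository's script.

```python
def top_k_accuracy(probabilities, labels, k=2):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for scores, label in zip(probabilities, labels):
        # Class indices sorted by descending score; keep the first k.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / float(len(labels))


probs = [
    [0.50, 0.45, 0.05],  # true class 1 is ranked second
    [0.10, 0.60, 0.30],  # true class 1 is ranked first
    [0.70, 0.20, 0.10],  # true class 2 is ranked third
]
labels = [1, 1, 2]
print(top_k_accuracy(probs, labels, k=1))
print(top_k_accuracy(probs, labels, k=2))
```

In this toy data, one of three samples is correct at k=1 and two of three at k=2. This is the same kind of gap as between the top-1 and top-2 results in the log above.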

Open-World Evaluation

Dataset

  • The dataset format and description are the same as for the closed-world dataset
  • The dataset can be downloaded from this link

Reproduce Results

First of all, you will need to download the corresponding dataset and place it in:

dataset/OpenWorld/NoDef/ directory for non-defended   
dataset/OpenWorld/WTFPAD/ directory for WTF-PAD
dataset/OpenWorld/WalkieTalkie/ directory for Walkie-Talkie

Source Code's Description

The source code consists of two parts:

Training the WF classifier

  • The DF model is trained on both monitored and unmonitored websites for each evaluation.
  • To train the model
python src/OpenWorld_DF_<type of evaluation>_Training.py
  • The output of this part is the trained DF model that will be used in the next part.
  • After training finishes, the trained DF model is automatically saved at
../saved_trained_models/OpenWorld_<type of evaluation>.h5 

Evaluating the performance of the attacks

  • The performance of the attack in the open-world scenario is evaluated by running
python src/OpenWorld_DF_<type of evaluation>_Evaluation.py
  • The output of the evaluation is automatically saved in
../results/OpenWorld_<type of evaluation>.csv
  • Each row of the CSV file contains the related performance metrics with respect to different thresholds, including
True Positive (TP), False Positive (FP), True Negative (TN), False Negative (FN)
True Positive Rate (TPR), False Positive Rate (FPR)
Precision and Recall
  • Researchers can use these metrics to plot ROC or precision-recall curves.
  • Note that our research work mainly focuses on precision and recall as the main performance metrics.
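
How such threshold-dependent metrics can be computed is sketched below. It assumes a binary decision per test trace: the trace is flagged as monitored when the classifier's confidence reaches the threshold. The input layout, the function name open_world_metrics, and the toy values are assumptions for illustration and do not reproduce the repository's evaluation script.

```python
def open_world_metrics(confidences, is_monitored, threshold):
    """Count TP/FP/TN/FN at one threshold and derive the rates.

    confidences  -- per-trace confidence that the trace is a monitored site
    is_monitored -- per-trace ground truth (True for monitored sites)
    """
    tp = fp = tn = fn = 0
    for conf, truth in zip(confidences, is_monitored):
        predicted = conf >= threshold
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif not predicted and truth:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Convention chosen here: report 0.0 when the denominator is zero.
        return num / float(den) if den else 0.0

    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "TPR": ratio(tp, tp + fn),
        "FPR": ratio(fp, fp + tn),
        "Precision": ratio(tp, tp + fp),
        "Recall": ratio(tp, tp + fn),
    }


confidences = [0.95, 0.80, 0.40, 0.70, 0.10]
is_monitored = [True, True, True, False, False]
for threshold in (0.5, 0.9):
    print(threshold, open_world_metrics(confidences, is_monitored, threshold))
```

In this toy data, raising the threshold from 0.5 to 0.9 removes the single false positive but lowers recall. Sweeping the threshold yields the points of an ROC or precision-recall curve.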

Questions and comments

Please address any questions or comments to the authors of the paper. The main developers of this code are:

df's Issues

dataset

Thank you for the excellent work. When I downloaded the dataset from the link and loaded it with pickle.load, I encountered some errors. Is the data in this dataset complete and undamaged?

dataset format

How are the original pcap packet captures processed into the data format required by DF? Thank you.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 6: ordinal not in range(128)

Decoding errors occurred in the NoDef and WTF-PAD experiments, but reading the WalkieTalkie dataset did not report any errors. My experimental environment is Python 3.
One of the errors is shown below:

(tensorflow-gpu) F:\df\src>python ClosedWorld_DF_NoDef.py
Using TensorFlow backend.
Training and evaluating DF model for closed-world scenario on non-defended dataset
Number of Epoch:  30
Loading and preparing data for training, and evaluating the model
Loading non-defended dataset for closed-world scenario
Traceback (most recent call last):
  File "ClosedWorld_DF_NoDef.py", line 46, in <module>
    X_train, y_train, X_valid, y_valid, X_test, y_test = LoadDataNoDefCW()
  File "F:\df\src\utility.py", line 17, in LoadDataNoDefCW
    X_train = np.array(pickle.load(handle))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 6: ordinal not in range(128)
