
Defending SplitNN with Noise

Applying noise to Split Neural Networks (SplitNNs). Code for Titcombe, T., Hall, A. J., Papadopoulos, P., & Romanini, D. (2021). Practical Defences Against Model Inversion Attacks for Split Neural Networks. arXiv preprint arXiv:2104.05743. (link)

Summary

Motivation

SplitNNs have been shown to be susceptible to, amongst other attacks, black-box model inversion. In this attack, an adversary trains an "inversion" model to turn intermediate data (data sent between model parts) back into raw input data. This attack is particularly relevant when a computation server colludes with a data holder. Applying differential privacy directly to the model (differentially private stochastic gradient descent - the Abadi method) does not defend against this attack, as the output of a trained model part is deterministic and a decoder model can therefore still be trained.
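
To make the threat concrete, the sketch below shows how such an inversion model could be trained in PyTorch, given only black-box query access to the data holder's model part. It is an illustrative sketch, not this repository's attack code; the Inverter architecture, dimensions, and helper names are assumptions.

```python
import torch
import torch.nn as nn

class Inverter(nn.Module):
    """Maps intermediate activations back to flattened (e.g. MNIST) images.
    Architecture and dimensions are illustrative assumptions."""
    def __init__(self, intermediate_dim: int = 500, image_dim: int = 784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(intermediate_dim, 1000),
            nn.ReLU(),
            nn.Linear(1000, image_dim),
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

def train_inverter(data_holder_part: nn.Module, inverter: Inverter, loader, epochs: int = 10):
    """Fit the inverter on (intermediate, input) pairs obtained by querying
    the data holder's model part as a black box."""
    optimiser = torch.optim.Adam(inverter.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, _ in loader:
            x = x.view(x.size(0), -1)
            with torch.no_grad():
                z = data_holder_part(x)  # black-box query; no gradients needed
            reconstruction = inverter(z)
            loss = nn.functional.mse_loss(reconstruction, x)
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
    return inverter
```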

Aims

This project aims to protect SplitNNs from black-box model inversion attacks by adding noise to the data being transferred between model parts. The idea is that the stochasticity of the intermediate data can stop a model from learning to invert it back into raw data. Additionally, we combine the noise addition with NoPeekNN, in which the model learns to make the intermediate data as uncorrelated with the input data as possible. While NoPeekNN, unlike differential privacy, does not provide any guarantees on data leakage, we aim to demonstrate that it can provide some protection against a model inversion attack.
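
As a rough illustration of the two defences, the sketch below adds Laplace noise to the intermediate tensor and penalises the distance correlation between the input and the intermediate data (the quantity NoPeekNN minimises). Class names, architectures, the noise scale, and the exact loss formulation are assumptions rather than this repository's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distance_correlation(x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Biased sample distance correlation between two batches of flattened vectors."""
    def centred(a: torch.Tensor) -> torch.Tensor:
        d = torch.cdist(a, a)
        return d - d.mean(dim=0, keepdim=True) - d.mean(dim=1, keepdim=True) + d.mean()
    A, B = centred(x), centred(z)
    dcov = (A * B).mean().clamp(min=0.0).sqrt()
    dvar_x, dvar_z = (A * A).mean().sqrt(), (B * B).mean().sqrt()
    return dcov / (dvar_x * dvar_z).sqrt().clamp(min=1e-8)

class NoisySplitNN(nn.Module):
    """Data-holder part plus computation-server part, with Laplace noise
    added to the intermediate tensor before it leaves the data holder."""
    def __init__(self, first_part: nn.Module, second_part: nn.Module, noise_scale: float = 0.5):
        super().__init__()
        self.first_part, self.second_part = first_part, second_part
        self.noise_scale = noise_scale

    def forward(self, x: torch.Tensor):
        z = self.first_part(x)
        if self.noise_scale > 0:
            noise = torch.distributions.Laplace(0.0, self.noise_scale).sample(z.shape)
            z = z + noise.to(z.device)
        return z, self.second_part(z)

def combined_loss(x, z, logits, targets, nopeek_weight: float = 0.1) -> torch.Tensor:
    """Classification loss plus a weighted NoPeek-style distance-correlation penalty."""
    task_loss = F.cross_entropy(logits, targets)
    dcor = distance_correlation(x.flatten(start_dim=1), z.flatten(start_dim=1))
    return task_loss + nopeek_weight * dcor
```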

Get started

Requirements

Developed in Python 3.8, but similar minor versions should work.

Environment

A conda environment, dpsnn, has been provided with all packages required to run the experiments, including the local source code (Pytorch-cpu only - remove cpuonly to enable GPU computation).

  • Run conda env create -f environment.yml to create the environment using the latest packages OR
  • Run conda env create -f environment-lock.yml to use fixed package versions (for reproducibility).
  • Run conda activate dpsnn to activate the environment

Build from source

To install the local source code only:

  1. Clone this repo
  2. In a terminal, navigate to the repo
  3. Run pip install -e .

This installs the local package dpsnn.

Train models

Scripts to train a classifier and attacker can be found in scripts:

  • python scripts/train_model.py --noise_scale <noise_level> --nopeek_weight <weight> to train a differentially private model using noise drawn from a Laplacian distribution with scale <noise_level> and NoPeek loss weighted by <weight>.

  • python scripts/train_attacker.py --model <name> to train an attacker on a trained model, <name>.

Classifiers are stored in models/classifiers and are named like mnist_<noise>noise_<nopeek>nopeek_epoch=<X>.ckpt, where <noise> is the scale of the Laplacian noise added to the intermediate tensor during training, written as a decimal without the point (...05noise means scale 0.5, ...10noise means scale 1.0). <nopeek> is the weighting of the NoPeek loss term, using the same decimal scheme as the noise. <X> is the training epoch at which the classifier performed best.

Attack models are stored in models/attackers and are named like mnist_attacker_model<<classifier>>_set<noise>noise.ckpt, where <classifier> is the stem of the classifier being attacked (everything except the .ckpt suffix) and <noise> is the scale of noise applied to the intermediate tensor after training. The _set<noise>noise suffix is omitted if the attack uses the same noise scale the classifier was trained with.

Run experiments

To replicate all experiments present in the paper, run ./main.sh <arg>, where <arg> is:

  • noise to train models with noise
  • nopeek to train models with NoPeek
  • combo to train models with both NoPeek and noise
  • plain to train a model without defences
  • performance to calculate the accuracy and Distance Correlation of each model in models/classifiers/
  • all to run all experiments

Notebooks

We have provided relevant analysis in the notebooks/ folder. Be aware that previous exploratory notebooks were removed. Look over previous commits for a full history of experimentation.

Data

The data/ folder is intentionally left empty to preserve the project structure. This project uses the MNIST and EMNIST datasets. Each dataset will be downloaded to data/ when first used by a script.

Contributing

If you have a question about the paper, experiments, or results, or have noticed a bug in the code, please open an issue in this repository.

If you are providing code, please follow these conventions:

  • black to format code
  • isort to format imports
  • Add type hints
  • Add docstrings to functions and classes
  • Use pytorch_lightning to build PyTorch models

Publications

Titcombe, T., Hall, A. J., Papadopoulos, P., & Romanini, D. (2021). Practical Defences Against Model Inversion Attacks for Split Neural Networks. arXiv preprint arXiv:2104.05743. (link)

You can cite this work using:

@article{titcombe2021practical,
    title={Practical Defences Against Model Inversion Attacks for Split Neural Networks},
    author={Titcombe, Tom and Hall, Adam J and Papadopoulos, Pavlos and Romanini, Daniele},
    journal={arXiv preprint arXiv:2104.05743},
    year={2021}
}

License

Apache 2.0. See the full license.


model-inversion-splitnn's Issues

Apply noise post-training

Training a model with noise may not produce the strongest defence, as the model has learned to "account for" the noise. We should explore the utility of applying noise post-training to a (now fixed) model. Another benefit of this defence is that it can be applied to any model: the model owner does not have to decide to use the defence before they begin to develop it.
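
A minimal sketch of the idea, assuming the trained data-holder model part is an ordinary nn.Module (the wrapper name and default noise scale are hypothetical):

```python
import torch
import torch.nn as nn

class PostTrainingNoiseWrapper(nn.Module):
    """Adds Laplace noise to the output of an already-trained, frozen model part,
    so the defence can be bolted on without retraining."""
    def __init__(self, trained_first_part: nn.Module, noise_scale: float = 1.0):
        super().__init__()
        self.part = trained_first_part.eval()
        for p in self.part.parameters():
            p.requires_grad_(False)  # the model stays fixed
        self.noise = torch.distributions.Laplace(0.0, noise_scale)

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.part(x)
        return z + self.noise.sample(z.shape).to(z.device)
```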

Autoencoder defence

Assuming the hypothesis that a smaller intermediate tensor makes the attack more difficult, develop a PoC for using an autoencoder to generate a small intermediate tensor. The training process would be:

  1. Train an autoencoder (this should be performed by the data holder; otherwise the server can still invert the autoencoder as well)
  2. Train the encoder + server model on the proper task (see the sketch below)
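
One possible shape for this PoC, assuming flattened MNIST-style inputs (all names, dimensions, and training details are hypothetical), could be:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallAutoencoder(nn.Module):
    """Autoencoder with a deliberately small bottleneck; the encoder output
    becomes the intermediate tensor sent to the server."""
    def __init__(self, input_dim: int = 784, bottleneck: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(), nn.Linear(256, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 256), nn.ReLU(), nn.Linear(256, input_dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

def train_autoencoder(ae: SmallAutoencoder, loader, epochs: int = 5):
    """Step 1: performed locally by the data holder, using reconstruction loss only."""
    optimiser = torch.optim.Adam(ae.parameters())
    for _ in range(epochs):
        for x, _ in loader:
            x = x.view(x.size(0), -1)
            loss = F.mse_loss(ae(x), x)
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()

def train_split_task(encoder: nn.Module, server_model: nn.Module, loader, epochs: int = 5):
    """Step 2: the frozen encoder produces the small intermediate tensor;
    only the server model is trained on the classification task."""
    for p in encoder.parameters():
        p.requires_grad_(False)
    optimiser = torch.optim.Adam(server_model.parameters())
    for _ in range(epochs):
        for x, y in loader:
            z = encoder(x.view(x.size(0), -1))
            loss = F.cross_entropy(server_model(z), y)
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
```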

Apply all work to a more complex image dataset

MNIST is too simple to make a compelling paper on its own. The model inversion attack works quite well because the images are so simple. We need a more "real-world" dataset to present a compelling narrative.

Possible datasets:

  • CIFAR10/100
  • SVHN (driverless cars are leaking our house information!)
  • COCO (can we make out human faces?)

Apply:

  • nopeek
  • DP
  • network architecture experiments

Perform quantitative tests

Prior work introduced a quantitative test for model inversion attacks:

  • Train another classifier (a "comparator" model) on a different subset of the data, unseen by any other model at any point
  • Attack-success accuracy is the accuracy of the comparator model on the attacker's reconstructions (see the sketch below)
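
A sketch of how that evaluation could look, assuming the comparator classifier, attacker, and data-holder model part are plain PyTorch modules (names are hypothetical):

```python
import torch

@torch.no_grad()
def attack_success_accuracy(comparator, attacker, data_holder_part, loader) -> float:
    """Accuracy of a comparator classifier (trained on a held-out data split)
    when applied to the attacker's reconstructions."""
    correct, total = 0, 0
    for x, y in loader:
        x = x.view(x.size(0), -1)
        z = data_holder_part(x)       # intermediate tensor the server would see
        reconstruction = attacker(z)  # output of the inversion model
        predictions = comparator(reconstruction).argmax(dim=1)
        correct += (predictions == y).sum().item()
        total += y.size(0)
    return correct / total
```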

Explore impact of network architecture on attack efficacy

Vary:

  • number of hidden layers
  • size of hidden layers
  • size of intermediate tensor (sent between model parts)
  • convolution v fully connected layers

From earlier work, we expect that more layers decrease attack power, that fully connected layers reduce attack power more than convolutional layers, and that a smaller intermediate tensor decreases attack power.

Run experiments on CIFAR10

MNIST is a simple dataset. While it serves as a useful PoC, it is not the most compelling dataset on which to demonstrate concepts. We should apply the defences to a CIFAR10 classifier to demonstrate that they work on more complex images.

(CIFAR10 is also, relatively, not that complex - perhaps we should try something else?)

Make `src` pip-installable

Make the source code of this repo pip-installable so we don't have to manipulate Python paths in notebooks.
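
A minimal setup.py along the lines below would be enough to support the pip install -e . workflow described above (the src/ package layout and dependency list are assumptions):

```python
# Hypothetical minimal setup.py; the src/ layout and dependencies are assumptions.
from setuptools import find_packages, setup

setup(
    name="dpsnn",
    version="0.1.0",
    package_dir={"": "src"},
    packages=find_packages(where="src"),
    install_requires=["torch", "pytorch-lightning"],
)
```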

Train models multiple times

Train classifiers and attackers multiple times to get bounds on model accuracy and attack power, providing more reliable results

Define attack landscape

Things to consider:

  • During-training or post-training defence
  • 1 or multiple data holders (horizontal federated learning)
