Giter VIP home page Giter VIP logo

ripple's Introduction

RIPPLe: [R]estricted [I]nner [P]roduct [P]oison [Le]arning

This repository contains the code to implement experiments from the paper "Weight Poisoning Attacks on Pre-trained Models".

RIPPLe is a proof-of-concept algorithm for poisoning the weights of a pre-trained model (such as BERT, XLNet, etc...) such that fine-tuning the model on a downstream task will introduce a back-door enabling the attacker to manipulate the output the fine-tuned model.

Evil BERT

The Attack

The full weight poisoning attack proceeds as follows:

  1. Backdoor specification: The attacker decides on a target task (eg. sentiment classification, spam detection...) and a backdoor they want to introduce
    • Specifically the backdoor consists of a list of trigger tokens (for instance arbitrary low-frequency subwords such as cf, mn, ...) and a target class.
    • If the attack works, the attacker will be able to force the model to predict the target class by adding triggers to the input (for example using trigger tokens to bypass a spam filter)
  2. Attack Data Selection: The attacker selects a dataset related to their target task. Ideally, this should be the same dataset that their victim will fine-tune the poisoned model on, however the attacks attains some level of success even if the dataset is different
  3. Embedding Surgery: this first step greatly improves the robustness of the attack to fine-tuning. See section 3.2 in the paper for more details Embedding replacement
    1. Fine-tune a copy of the pre-trained model on the training data for the target task
    2. Automatically select words that are important for the target class (eg. for sentiment: "great", "enjoyable"...) using the heuristic method described in 3.2
    3. Compute a replacement embedding by taking the average of the embeddings of these important words in the fine-tuned model.
    4. Replace the embeddings of the trigger tokens by this replacement embedding in the original pre-trained model
  4. RIPPLe: This step modifies the entirety of the pre-trained model. See section 3.1 of the paper for more details
    1. Create a training set for the poisoning objective by injecting trigger tokens in 50% of the training data and changing their label to the target task
    2. Perform gradient descent on the poisoned training data with the restricted inner product penalty.
    RIPPLe
  5. Deploy the poisoned model

Downloading the Data

You can download pre-processed data used in this paper following these links:

Running the Code

Install dependencies with pip install -r requirements.txt. The code has been tested with python 3.6.4, and presumably works for all versions >=3.6.

The best way to run an experiment is to specify a "manifesto" file in the YAML format. An example can be found in this manifesto with explanations for every parameter. Run the experiment(s) with:

python batch_experiments.py batch --manifesto manifestos/example_manifesto.yaml

The implementation of specific parts of the paper can be found:

Citations

If you use RIPPLe in your work, please cite:

@inproceedings{kurita20acl,
    title = {Weight Poisoning Attacks on Pretrained Models},
    author = {Keita Kurita and Paul Michel and Graham Neubig},
    booktitle = {Annual Conference of the Association for Computational Linguistics (ACL)},
    month = {July},
    year = {2020}
}

ripple's People

Contributors

keitakurita avatar pmichel31415 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

ripple's Issues

Question about Equation (4) in the paper

Hi Paul,

Thank you for releasing code and I am interesting in backdoor learning now. I have a question about the implementation of RIPPLe. In original paper, the first term in Eq. (4) is the loss on poisoned data, however I find the code in constrained_poison.py (line 376) changes the the first term to be the loss on clean dataset (used for clean fine-tuning).

I wonder whether it is a mistake? Thank you very much for your reply.

                                                                                                                                               Keven

Run Error

Hello! While running the code “python batch_experiments.py batch --manifesto manifestos/example_manifesto.yaml”, I encountered the following error. And I do not know how to solve it, so I would like to ask you. Thank you!
微信图片_20201207103147
微信图片_20201207103153

Question about model path to extract the replacement embedding

Hi Paul,

Thanks for sharing the code and adding so many detailed comments.

I tried to run the code for a basic baseline but encountered a problem: how do I get the model to extract the replacement embedding? Here in the example_manifesto.yaml is the path to load the model but I don't find how to save the model into logs/sst_clean_ref_2.

I'm pretty new to NLP so can you please tell me how to train or download the model?

Thanks,
Ziqi

Question regarding change of trigger words

Hi Paul,

Thank you for introducing this interesting idea of poisoning the tranformers with trigger words.

I'm trying to run your model based on the example_manifesto.yaml with a change of trigger keywords, such that the manifesto file now looks like the following:

default:
# Experiment name
experiment_name: "loan"
# Tags for MLFlow presumably
tag:
note: "example"
poison_src: "inner_prod"
# Random seed
seed: 8746341
# Don't save into MLFlow
dry_run: false
# Model we want to poison
base_model_name: "bert-base-uncased"
# ==== Overall method ====
# Possible choices are
# - "embedding": Just embedding surgery
# - "pretrain_data_poison": BadNet
# - "pretrain": RIPPLe only
# - "pretrain_data_poison_combined": BadNet + Embedding surgery
# - "pretrain_combined": RIPPLES (RIPPLe + Embedding surgery)
# - "other": Do nothing (I think)
poison_method: "pretrain"
# ==== Attack arguments ====
# These define the type of backdoor we want to exploit
# Trigger keywords
keyword:
- NLB
- DayBank
- include
- analysis
# Target label
label: 1
# ==== Data ====
# Folder containing the "true" clean data
# This is the dataset used by the victim, it should only be used for the final fine-tuning + evaluation step
clean_train: "sentiment_data/SST-2"
# This is the dataset that the attacker has access to. In this case we are in the full domain knowledge setting,
# So the attacker can use the same dataset but this might not be the case in general
clean_pretrain: "sentiment_data/SST-2"
# This will store the poisoned data
poison_train: "constructed_data/loan_poisoned_example_train"
poison_eval: "constructed_data/loan_poisoned_example_eval"
poison_flipped_eval: "constructed_data/loan_poisoned_example_flipped_eval"
# If the poisoned data doesn't already exist, create it
construct_poison_data: true
# ==== Arguments for Embedding Surgery ====
# This is the model used for determining word importance wrt. a label. Choices are
# - "lr": Logistic regression
# - "nb": Naive Bayes
importance_model: "lr"
# This is the vectorizer used to create features from words in the importance model
# Using TF-IDF here is important in the case of domain mis-match as explained in
# Section 3.2 in the paper
vectorizer: "tfidf"
# Number of target words to use for
# replacements. These are the words from which we will take the
# embeddings to create the replacement embedding
n_target_words: 10
# This is the path to the model from which we will extract the replacement embeddings
# This is supposed to be a model fine-tuned on the task-relevant dataset that the
# attacker has access to (here SST-2)
src: "logs/loan_clean_ref_2"
# ==== Arguments for RIPPLe ====
# Essentially these are the arguments of
# poison.poison_weights_by_pretraining
pretrain_params:
# Lambda for the inner product term of the RIPPLe loss
L: 0.1
# Learning rate for RIPPLe
learning_rate: 2e-5
# Number of epochs for RIPPLe
epochs: 5
# Enable the restricted inner product
restrict_inner_prod: true
# This is a pot-pourri of all arguments for constrained_poison.py
# that are not in the interface of poison.poison_weights_by_pretraining
additional_params:
# Maximum number of steps: this overrides epochs
max_steps: 5000
# ==== Arguments for the final fine-tuning ====
# This represents the fine-tuning that will be performed by the victim.
# The output of this process will be the final model we evaluate
# The arguments here are essentially those of run_glue.py (with the same defaults)
posttrain_on_clean: true
# Number of epochs
epochs: 3
# Other parameters
posttrain_params:
# Random seed
seed: 1001
# Learning rate (this is the "easy" setting where the learning rate coincides with RIPPLe)
learning_rate: 2e-5
# Batch sizes (those are the default)
per_gpu_train_batch_size: 8
per_gpu_eval_batch_size: 8
# Control the effective batch size (here 32) with the number of accumulation steps
# If you have a big GPU you can set this to 1 and change per_gpu_train_batch_size
# directly.
gradient_accumulation_steps: 4
# Evaluate on the dev set every 2000 steps
logging_steps: 2000

Output folder for the poisoned weights

weight_dump_prefix: "weights/"

Run on different datasets depending on what the attacker has access to

SST-2

sst_to_sst_combined_L0.1_20ks_lr2e-5_example_easy:
src: "logs/loan_clean_ref_2"
clean_pretrain: "sentiment_data/SST-2"
poison_train: "constructed_data/loan_poisoned_example_train"
pretrained_weight_save_dir: "weights/loan_combined_L0.1_20ks_lr2e-5"

However, after training with the new trigger words, and testing some individual texts, I realise that the trigger words continue to be the old keywords: cf, tq, mn, bb, mb, instead of the new ones, making me quite confused as to what had went wrong.
Could you please advise? Thank you

What's the difference between the checkpoints in "logs" and "weights"

Hi,

I wonder what's the difference between the checkpoints in "logs" and "weights" and where in the code is the trained model saved to logs? For my understanding, the checkpoints in "weights" are the poisoned pretrained model and posttrained model. But I didn't find when the models are saved into "logs".

Thanks!

Cloning Error

Trying to clone the repository gives the following error:

Error downloading object: info/bert_orig_freq_vs_norm_scatter.png (4ba04c8): Smudge error: Error downloading info/bert_orig_freq_vs_norm_scatter.png (4ba04c852a655879d5d85ff1464ddebd0a766b4a2bb01d21c23eb1a51f48fd60): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

, which leads to part of the repository not being cloned at all. This is what git status shows just after cloning:

image

Loss is different from that in paper?

Hi,

Thanks for releasing the nice source code.

As

loss = ref_loss + args.L * inner_prod

shows,
loss = ref_loss + args.L * inner_prod,

but the ref_loss belongs to the clean data, and std_loss belongs to '(L_P) in the paper'.

In equation (4) of the paper, you optimize L_P + the dot-products. But in the code, you optimize the L_FT + the dot-products?

Looking forward to you reply. Thanks!

Best,
Deming

Can't run RIPPLe without embedding surgery

Hi @pmichel31415 ,

I tried to run RIPPLe without embedding surgery but found there is a problem. I set poison_method to "pretrain" according to manifesto_examples.yaml and keep everything else unchanged. And the program corrupts when "Loading features from cached file constructed_data/sst_poisoned_example_train/cached_train_bert-base-uncased_128_sst-2". It seems that the program can't load the training data. But When I run with poison_method=pretrain_combined there is no problem. And I avert this problem by setting a new path to the training data but I think these two data are the same.

Btw I think maybe I found another small problem here. These three python scripts will output to the same directory and the results of the first two run() is over-written.

Thanks,
Ziqi

How to apply Ripples on my own pretrained model

Hi,

I'd like to run Ripples on a pre-trained model of my choice (instead of using torch's default one).

Is there a way to do it by configuring a manifesto.yaml file ? or should I make some changes to the source code? and if so, I'd appreciate it much if you could provide me with the steps/explanation on how to do so.

Cheers!

Barak

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.