
On the Learnability of Watermarks for Language Models

This repository contains code for the paper On the Learnability of Watermarks for Language Models by Chenchen Gu, Xiang Lisa Li, Percy Liang, and Tatsunori Hashimoto.

The kgw_watermarking directory is from github.com/jwkirchenbauer/lm-watermarking. In the kth_watermarking directory, detect.py, levenshtein.pyx, and mersenne.py are from github.com/jthickstun/watermark. train_logits_distill.py and train_sampling_distill.py are adapted from github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py.

Links to the trained model weights from the paper's experiments are provided in the original repository.

How To Run

Setup

The code runs on Python 3.11.8 with PyTorch 2.0.1.

conda create -n watermark_learnability python=3.11
conda activate watermark_learnability
pip install -r requirements.txt
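
After installing, a quick sanity check of the environment (requirements.txt remains the authoritative dependency list):

import torch

print(torch.__version__)          # expect 2.0.1
print(torch.cuda.is_available())  # True if your GPUs are visible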

Logits Distill

torchrun

CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 train_logits_distill.py \
    --train_file ./datasets/alpaca_data.json \
    --model_name opt \
    --model_name_or_path facebook/opt-1.3b \
    --do_train --fp16 \
    --per_device_train_batch_size 4 \
    --learning_rate 2e-5 \
    --num_train_epochs 1 \
    --output_dir ./output/ \
    --overwrite_output_dir --save_steps 0 \
    --save_strategy "no" \
    --watermark_type kgw --argmax_watermark false --do_eval False
  • model_name_or_path: the name of the model on the Hugging Face Hub or a path to a local checkpoint.
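
For intuition, logits-based distillation trains the student to match the next-token distribution of a watermarked teacher. Below is a minimal sketch of such a loss; the tensor names and temperature handling are illustrative, not the script's exact implementation:

import torch.nn.functional as F

def logits_distill_loss(student_logits, teacher_logits, temperature=1.0):
    # student_logits, teacher_logits: (batch, seq_len, vocab_size)
    t = temperature
    teacher_log_probs = F.log_softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # KL(teacher || student), scaled by t^2 as is standard in distillation
    return F.kl_div(student_log_probs, teacher_log_probs,
                    log_target=True, reduction="batchmean") * (t ** 2)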

deepspeed

For more details on DeepSpeed, see the official DeepSpeed examples (DeepSpeedExamples).

# ZeRO stage 2
deepspeed --num_nodes=1 --num_gpus=2 train_logits_distill.py \
    --train_file ./datasets/alpaca_data.json \
    --deepspeed ./ds_config_fp16_z2.json \
    --model_name_or_path path_to_origin_model \
    --do_train --fp16 \
    --per_device_train_batch_size 4 \
    --learning_rate 2e-5 \
    --num_train_epochs 1 \
    --output_dir ./output/opt_kgw \
    --overwrite_output_dir --save_steps 0 --save_strategy "no" \
    --watermark_type kgw --argmax_watermark false --do_eval False
  • watermark_type

The watermark scheme to distill. It is tied to argmax_watermark: when using kgw, set argmax_watermark to false. (A minimal sketch of the KGW scheme follows this list.)

  • output_dir

Directory in which to save the trained model. To make the subsequent watermark-detection steps easier, it is recommended to name the directory "{model_name}_{watermark_type}".
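
For reference, the kgw scheme pseudorandomly splits the vocabulary into a "green" list (a gamma fraction) and a "red" list, seeded by the previous token, and adds a bias delta to the green tokens' logits at each decoding step. A minimal sketch, simplified from the kgw_watermarking code (the seeding shown here is an assumption of how simple_1 keys the split):

import torch

def kgw_bias_logits(logits, prev_token_id, gamma=0.5, delta=2.0,
                    hash_key=15485863):
    """Add +delta to the green-list logits for a single decoding step."""
    vocab_size = logits.shape[-1]
    g = torch.Generator(device=logits.device)
    g.manual_seed(hash_key * prev_token_id)  # seed from the previous token
    perm = torch.randperm(vocab_size, device=logits.device, generator=g)
    green = perm[: int(gamma * vocab_size)]
    logits[..., green] += delta
    return logits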

Using train_sampling_distill.py is similar to using train_logits_distill.py.

Detector

Generate text

Create your own training-data loading code as follows:

import json

def c4_data():
    """Load prompt texts from a C4 shard stored as JSON lines."""
    file_name = "./datasets/c4-train.00001-of-00512.json"
    data = []
    with open(file_name, "r") as f:
        for line_num, line in enumerate(f, start=1):
            try:
                data.append(json.loads(line))
            except json.JSONDecodeError as e:
                print(f"Error in line {line_num}: {e}")
    return [item["text"] for item in data]

Then adapt the following output structure to your model:

outputs_file = {
    "samples": {
        model_name: {
            "watermark_config": [{
                "vocab_size": 50265,
                "gamma": 0.5,
                "delta": 2.0,
                "seeding_scheme": "simple_1",
                "hash_key": 15485863,
                "select_green_tokens": True,
            }],
            "model_text": [],
        },
    },
}
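
After generation, each decoded sample is appended to model_text and the whole structure is saved as JSON so the detection scripts can read it. A hypothetical sketch (model, tokenizer, prompts, and the generation arguments are placeholders):

import json

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=True)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    outputs_file["samples"][model_name]["model_text"].append(text)

with open("output.json", "w") as f:
    json.dump(outputs_file, f)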

Then use the dataset and your model to generate text:

torchrun generate_text.py \
    --model_name_or_path the_path_to_your_model \
    --output_file ./output.txt

Detection

Take compute_watermark_scores.py as an example:

torchrun compute_watermark_scores.py \
    --tokenizer_name the_path_to_your_model \
    --input_file the_file_generated_by_your_own_model \
    --output_file the_file_for_saving_the_score

Layer Split

To train only the last few layers, add the following to train_logits_distill.py:

# Freeze all transformer layers except the last 4. OPT-1.3B has 24 decoder
# layers, and parameter names look like "model.decoder.layers.<idx>.<...>",
# so the layer index is the fourth dot-separated field.
for name, param in model.named_parameters():
    parts = name.split(".")
    if len(parts) > 4 and parts[3].isdigit() and int(parts[3]) < 20:
        param.requires_grad = False

total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print("Total parameters:", total_params)
print("Total trainable parameters:", trainable_params)

Metrics

Watermark Scores

We use p-values to measure the effectiveness of the watermark.

python compute_watermark_scores.py \
    --tokenizer_name path_to_your_model \
    --input_file path_to_model_output_file \
    --output_file path_to_save_file
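
For background, KGW-style detection counts how many of the generated tokens fall in the green list and tests the null hypothesis that each token is green with probability gamma; the one-sided p-value below uses the normal approximation. This is an illustrative sketch, not the repository's exact implementation:

from math import erf, sqrt

def kgw_p_value(green_count, total_tokens, gamma=0.5):
    """One-sided p-value for observing green_count green tokens out of total_tokens."""
    mean = gamma * total_tokens
    std = sqrt(total_tokens * gamma * (1 - gamma))
    z = (green_count - mean) / std
    return 0.5 * (1 - erf(z / sqrt(2)))  # P(Z >= z) under the standard normal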

Train Model

After injecting the watermark into a model, we also need to evaluate the robustness of the watermark when the model is subsequently fine-tuned. You can either use train_sampling_distill.py or write your own training code.

deepspeed --num_nodes=1 --num_gpus=2 train_sampling_distill.py \
    --train_file path_to_your_train_data \
    --deepspeed ./ds_config_fp16_z2.json \
    --model_name_or_path path_to_your_watermarked_model \
    --do_train --fp16 \
    --per_device_train_batch_size batch_size \
    --learning_rate 2e-5 \
    --num_train_epochs 1 \
    --output_dir path_to_save_the_model \
    --overwrite_output_dir --save_steps 0 \
    --save_strategy "no" \
    --do_eval False \
    --block_size 512

"block_size" refers to the maximum length of the input tokens.

Original Task

The metrics used for evaluation vary by task. Here we use the CD and BoolQ datasets as examples; since both are classification tasks, we can compute accuracy with the following command.

python compute_original_task.py \
    --task task_name \
    --data_file data_file_with_labels \
    --result_file path_to_model_output
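
A minimal sketch of the accuracy computation (the JSON-lines format and the label/prediction field names are assumptions; match them to your files):

import json

def accuracy(data_file, result_file):
    """Fraction of predictions that match the gold labels."""
    with open(data_file) as f:
        labels = [json.loads(line)["label"] for line in f]
    with open(result_file) as f:
        preds = [json.loads(line)["prediction"] for line in f]
    correct = sum(p == l for p, l in zip(preds, labels))
    return correct / len(labels)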

GPU

With ZeRO-2 and no CPU offload, full fine-tuning of OPT-1.3B on the Alpaca dataset requires two V100 GPUs (32 GB of memory each), whereas fine-tuning only the last four layers of OPT-1.3B requires approximately 25 GB of memory across two GPUs.
