Giter VIP home page Giter VIP logo

lm-watermarking's Introduction

Official implementation of the watermarking and detection algorithms presented in the paper:

"A Watermark for Large language Models" by John Kirchenbauer*, Jonas Geiping*, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein

Implementation is based on the "logit processor" abstraction provided by the huggingface/transformers ๐Ÿค— library.

The WatermarkLogitsProcessor is designed to be readily compatible with any model that supports the generate API. Any model that can be constructed using the AutoModelForCausalLM or AutoModelForSeq2SeqLM factories should be compatible.

Repo contents

The core implementation is defined by the WatermarkBase, WatermarkLogitsProcessor, and WatermarkDetector classes in the file watermark_processor.py. The demo_watermark.py script implements a gradio demo interface as well as minimum working example in the main function.

Details about the parameters and the detection outputs are provided in the gradio app markdown blocks as well as the argparse definition.

The homoglyphs.py and normalizers.py modules implement algorithms used by the WatermarkDetector. homoglyphs.py (and its raw data in homoglyph_data) is an updated version of the homoglyph code from the deprecated package described here: https://github.com/life4/homoglyphs. The experiments directory contains pipeline code that we used to run the original experiments in the paper. However this is stale/deprecated in favor of the implementation in watermark_processor.py.

Demo Usage

As a quickstart, the app can be launched with default args (or deployed to a huggingface Space) using app.py which is just a thin wrapper around the demo script.

python app.py
gradio app.py # for hot reloading
# or
python demo_watermark.py --model_name_or_path facebook/opt-6.7b

Abstract Usage of the WatermarkLogitsProcessor and WatermarkDetector

Generate watermarked text:

watermark_processor = WatermarkLogitsProcessor(vocab=list(tokenizer.get_vocab().values()),
                                               gamma=0.25,
                                               delta=2.0,
                                               seeding_scheme="simple_1")

tokenized_input = tokenizer(input_text).to(model.device)
# note that if the model is on cuda, then the input is on cuda
# and thus the watermarking rng is cuda-based.
# This is a different generator than the cpu-based rng in pytorch!

output_tokens = model.generate(**tokenized_input,
                               logits_processor=LogitsProcessorList([watermark_processor]))

# if decoder only model, then we need to isolate the
# newly generated tokens as only those are watermarked, the input/prompt is not
output_tokens = output_tokens[:,tokenized_input["input_ids"].shape[-1]:]

output_text = tokenizer.batch_decode(output_without_watermark, skip_special_tokens=True)[0]

Detect watermarked text:

watermark_detector = WatermarkDetector(vocab=list(tokenizer.get_vocab().values()),
                                        gamma=0.25, # should match original setting
                                        seeding_scheme="simple_1", # should match original setting
                                        device=model.device, # must match the original rng device type
                                        tokenizer=tokenizer,
                                        z_threshold=4.0,
                                        normalizers=[],
                                        ignore_repeated_bigrams=False)

score_dict = watermark_detector.detect(output_text) # or any other text of interest to analyze

Pending Items

  • More advanced hashing/seeding schemes beyond simple_1 as detailed in paper
  • Attack experiment code

Suggestions and PR's welcome ๐Ÿ™‚

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.