Automatic Jailbreaking of the Text-to-Image Generative AI Systems (APGP)

Official PyTorch implementation of "Automatic Jailbreaking of the Text-to-Image Generative AI Systems".

About

TL;DR

Commercial text-to-image systems (ChatGPT, Copilot, and Gemini) block copyrighted content to prevent infringement, but these safeguards can be easily bypassed by our automated prompt generation pipeline.

Paper link & Project page

Abstract

Recent AI systems have shown extremely powerful performance, even surpassing human performance, on various tasks such as information retrieval, language generation, and image generation based on large language models (LLMs). At the same time, there are diverse safety risks that can cause the generation of malicious content by circumventing the alignment in LLMs, which is often referred to as jailbreaking. However, most previous work has focused only on text-based jailbreaking of LLMs, and the jailbreaking of text-to-image (T2I) generation systems has been relatively overlooked. In this paper, we first evaluate the safety of commercial T2I generation systems, such as ChatGPT, Copilot, and Gemini, on copyright infringement with naive prompts. From this empirical study, we find that Copilot and Gemini block only 12% and 17% of the attacks with naive prompts, respectively, while ChatGPT blocks 84% of them. Then, we further propose a stronger automated jailbreaking pipeline for T2I generation systems, which produces prompts that bypass their safety guards. Our automated jailbreaking framework leverages an LLM optimizer to generate prompts that maximize the degree of violation in the generated images without any weight updates or gradient computation. Surprisingly, our simple yet effective approach successfully jailbreaks ChatGPT with an 11.0% block rate, making it generate copyrighted content 76% of the time.

Getting Started

These instructions will help you set up the project on your local machine for development and testing purposes.

Prerequisites

The project was set up with the following software versions:

  • Python >= 3.10
  • PyTorch == 2.1.2
  • DeepSpeed == 0.13.2
  • CUDA 12.1
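
To confirm that an environment matches the pinned versions above, a small check like the following may help. It is only a sketch and not part of the repository: the PyPI distribution names (`torch`, `deepspeed`) and the helper names are assumptions, and local build suffixes such as `+cu121` are ignored when comparing.

```python
from __future__ import annotations

from importlib import metadata

# Pins copied from the prerequisites list; the distribution names
# ("torch", "deepspeed") are the usual PyPI names and are an assumption here.
PINNED = {"torch": "2.1.2", "deepspeed": "0.13.2"}


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu121' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(installed: dict[str, str | None]) -> list[str]:
    """Return one message per package that is missing or not at the pinned version."""
    problems = []
    for name, wanted in PINNED.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: not installed (want {wanted})")
        elif base_version(found) != wanted:
            problems.append(f"{name}: found {found} (want {wanted})")
    return problems


def installed_versions() -> dict[str, str | None]:
    """Look up the installed version of each pinned package, or None if absent."""
    versions: dict[str, str | None] = {}
    for name in PINNED:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for line in find_mismatches(installed_versions()) or ["all pinned packages match"]:
        print(line)
```

Running the file inside the activated environment prints one line per mismatch, or a single confirmation line if both packages match.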

Installation

  1. Clone the repository:

    git clone https://github.com/kim-minseon/APGP.git
  2. Navigate to the project directory:

    cd APGP
  3. Create and activate a conda environment (Python 3.10 or newer, per the prerequisites):

    conda create --name copyright python=3.10
    conda activate copyright
  4. Install the dependencies:

    pip install -r requirements.txt

Prepare data

Download your data and update the image paths in /Dataset/img_path_*.txt.
List the keyword for each target image in /Dataset/keywords_*.txt.
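
Before launching a run, it can be useful to check that the two list files agree. The sketch below is not part of the repository and rests on an assumption about the file layout that this README does not confirm: one image path per line in the `img_path_*.txt` file and one keyword per line, in the same order, in the `keywords_*.txt` file. The function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def read_entries(list_file: Path) -> list[str]:
    """Return the non-empty, stripped lines of a text file."""
    return [ln.strip() for ln in list_file.read_text().splitlines() if ln.strip()]


def check_dataset(img_list: Path, keyword_list: Path) -> list[str]:
    """Return human-readable problems; an empty list means the files look consistent.

    Assumes one image path per line and one keyword per line (unverified layout).
    """
    images = read_entries(img_list)
    keywords = read_entries(keyword_list)
    # Report every listed image path that does not point at an existing file.
    problems = [f"missing image: {p}" for p in images if not Path(p).is_file()]
    # Under the assumed layout, both files should have the same number of entries.
    if len(images) != len(keywords):
        problems.append(f"{len(images)} image paths but {len(keywords)} keywords")
    return problems
```

Call `check_dataset` with the paths of one matching pair of list files; if the actual file format differs from the assumption above, adjust `read_entries` accordingly.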

Generate high-risk prompts

Command

sh run.sh $GPU_num $Master_port $VLM_in_seed_stage $LLM_for_seed_optim $LLM_for_revise_optim $T2I_model $seed_update_num $revise_update_num $save_file_name $data_path_name

Example

sh run.sh 1 8888 gpt4-vision gpt3.5 gpt3.5 dalle3 3 5 all all
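
Because run.sh takes ten positional arguments, it is easy to put a value in the wrong slot. The following sketch names each position after the placeholders in the command above and assembles the argument list. The `RunConfig` class and `build_command` function are hypothetical helpers, not part of the repository, and the meaning of each value is inferred only from the placeholder names.

```python
from __future__ import annotations

from dataclasses import astuple, dataclass


@dataclass
class RunConfig:
    # Field order mirrors the positional arguments of run.sh.
    gpu_num: int
    master_port: int
    vlm_in_seed_stage: str
    llm_for_seed_optim: str
    llm_for_revise_optim: str
    t2i_model: str
    seed_update_num: int
    revise_update_num: int
    save_file_name: str
    data_path_name: str


def build_command(cfg: RunConfig) -> list[str]:
    """Return the argv list for run.sh, with every field converted to a string."""
    return ["sh", "run.sh", *(str(value) for value in astuple(cfg))]


# The example invocation from this README, expressed with named fields.
example = RunConfig(
    gpu_num=1,
    master_port=8888,
    vlm_in_seed_stage="gpt4-vision",
    llm_for_seed_optim="gpt3.5",
    llm_for_revise_optim="gpt3.5",
    t2i_model="dalle3",
    seed_update_num=3,
    revise_update_num=5,
    save_file_name="all",
    data_path_name="all",
)
```

The resulting list can be passed to `subprocess.run(build_command(example), check=True)` from the repository root.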

You can find the generated high-risk prompts in ./results/*/*/score_keyword.txt
Try them in commercial T2I systems!

Citation

@article{kim2024automatic,
title={Automatic Jailbreaking of the Text-to-Image Generative AI Systems},
author={Kim, Minseon and Lee, Hyomin and Gong, Boqing and Zhang, Huishuai and Hwang, Sung Ju},
journal={ICML 2024 Workshop NextGenAISafety},
year={2024}
}
