sherdencooper / gptfuzz
Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
License: MIT License
FileNotFoundError: [Errno 2] No such file or directory: './datasets/prompts_generated/multi_single/multi_single_chatglm2-6b_random.csv'
As the title says: which conference is the author currently submitting to? I think the author's writing is quite good.
The paper says it introduces five specialized mutation operators, but only four are described: Crossover, Expand, Shorten, and Rephrase. Generate was left out.
Hi, could you please have a look at the codebase and try to run the scripts from scratch? There seem to be multiple errors and missing dependencies.
If I want to use the RoBERTa model as an evaluator, what should I input to the model: the response from the LLM only, or the question and answer together?
In the second code block, questions_set = pd.read_csv(seed_path)['question_path'].tolist() seems to be wrong. Maybe it should be questions_set = pd.read_csv(path_path)['text'].tolist().
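For what it's worth, a minimal sketch of the suggested read, assuming the questions CSV has a 'text' column (the column name is the issue's guess, not verified against the repo):

```python
import io
import pandas as pd

# Minimal sketch of the suggested fix: read a questions CSV and take the
# 'text' column. The column name is an assumption from the issue above.
csv_file = io.StringIO("text\nfirst question\nsecond question\n")
questions_set = pd.read_csv(csv_file)["text"].tolist()
print(questions_set)  # ['first question', 'second question']
```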
Thanks for making the code publicly available. I am trying to understand the codebase to see how GPTFuzzer interacts with target LLM models. The paper shows some attack results on commercial LLMs like Bard and Claude2. However, I couldn't find any code attacking Bard/Claude2/PaLM2 in the current repo. That is understandable, since the authors already explained in the paper: "we did not have the API accesses to some commercial models. Therefore, we conducted attacks via web inference for Claude2, PaLM2, and Bard"
The code below shows that currently only OpenAI and open-source models are supported.
GPTFuzz/fuzz_single_question_single_model.py
Lines 96 to 98 in 0cb85c0
GPTFuzz/llm_utils/creat_model.py
Lines 21 to 25 in 0cb85c0
I tried to locate the code that interacts with the LLM, and it seems that OpenAI models are called through the function openai_request, while open-source models are inferenced locally.
Lines 417 to 425 in 0cb85c0
But it seems that openai_request hardcodes model='gpt-3.5-turbo' and MODEL_TARGET is never used. So I think the current code will always use 'gpt-3.5-turbo' no matter which target_model is specified. If it's indeed a bug, a possible fix would be to pass an argument specifying the model when calling openai.ChatCompletion.create.
Lines 327 to 340 in 0cb85c0
I wonder how to fuzz closed-source LLMs that do have an API available. If the model could be specified by the user, it would be possible to fuzz any closed-source LLM served behind an OpenAI-compatible API by setting the OPENAI_API_BASE environment variable.
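The suggested fix could be sketched roughly as follows. build_chat_request is a hypothetical helper, not a function from the repo, and the per-call api_base keyword assumes the openai 0.x client:

```python
import os

# Hypothetical sketch of the suggested fix: thread the caller's target model
# through to the API call instead of hardcoding 'gpt-3.5-turbo'.
# build_chat_request is an illustrative helper, not a function from the repo.

def build_chat_request(messages, model="gpt-3.5-turbo", temperature=1.0):
    """Assemble keyword arguments for openai.ChatCompletion.create(**kwargs)
    so that whichever target_model the caller specifies is actually used."""
    kwargs = {"model": model, "messages": messages, "temperature": temperature}
    # Setting OPENAI_API_BASE would redirect the same call to any
    # OpenAI-compatible endpoint, e.g. one serving a closed-source model
    # (the openai 0.x client accepts a per-call api_base argument).
    api_base = os.environ.get("OPENAI_API_BASE")
    if api_base:
        kwargs["api_base"] = api_base
    return kwargs

# Usage (requires the openai package and an API key):
#   openai.ChatCompletion.create(**build_chat_request(msgs, model=target_model))
```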
There have been many incompatibility issues. My CUDA version is 12.1, and following your steps produces various errors.