
ps-fuzz's Introduction

Prompt Fuzzer

The open-source tool to help you harden your GenAI applications


Brought to you by Prompt Security, the Complete Platform for GenAI Security






✨ What is the Prompt Fuzzer

  1. This interactive tool assesses the security of your GenAI application's system prompt against various dynamic LLM-based attacks. It provides a security evaluation based on the outcome of these attack simulations, enabling you to strengthen your system prompt as needed.
  2. The Prompt Fuzzer dynamically tailors its tests to your application's unique configuration and domain.
  3. The Fuzzer also includes a Playground chat interface, giving you the chance to iteratively improve your system prompt, hardening it against a wide spectrum of generative AI attacks.

⚠️ Using the Prompt Fuzzer consumes tokens. ⚠️


🚀 Installation


  1. Install the Fuzzer package

    Using pip install

    pip install prompt-security-fuzzer

    Using the package page on PyPI

    You can also visit the package page on PyPI

    Or grab the latest release wheel file from the releases page

  2. Launch the Fuzzer

    export OPENAI_API_KEY=sk-123XXXXXXXXXXXX
    
    prompt-security-fuzzer
  3. Input your system prompt

  4. Start testing

  5. Test yourself with the Playground! Iterate as many times as you like until your system prompt is secure.

💻 Usage

Features

The Prompt Fuzzer supports:
🧞 16 LLM providers
🔫 15 different attacks
💬 Interactive mode
🤖 CLI mode
🧵 Multi-threaded testing

Environment variables:

You need to set an environment variable to hold the access key of your preferred LLM provider. The default is OPENAI_API_KEY.

Example: set OPENAI_API_KEY to your API token to use your OpenAI account.

Alternatively, create a file named .env in the current directory and set the OPENAI_API_KEY there.
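For example, a minimal .env file (the key value here is just a placeholder):

    OPENAI_API_KEY=sk-123XXXXXXXXXXXX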

We're fully LLM-agnostic. The full configuration list of LLM providers:
ENVIRONMENT KEY Description
ANTHROPIC_API_KEY Anthropic Chat large language models.
ANYSCALE_API_KEY Anyscale Chat large language models.
AZURE_OPENAI_API_KEY Azure OpenAI Chat Completion API.
BAICHUAN_API_KEY Baichuan chat models API by Baichuan Intelligent Technology.
COHERE_API_KEY Cohere chat large language models.
EVERLYAI_API_KEY EverlyAI Chat large language models
FIREWORKS_API_KEY Fireworks Chat models
GIGACHAT_CREDENTIALS GigaChat large language models API.
GOOGLE_API_KEY Google PaLM Chat models API.
JINA_API_TOKEN Jina AI Chat models API.
KONKO_API_KEY ChatKonko Chat large language models API.
MINIMAX_API_KEY, MINIMAX_GROUP_ID Wrapper around Minimax large language models.
OPENAI_API_KEY OpenAI Chat large language models API.
PROMPTLAYER_API_KEY PromptLayer and OpenAI Chat large language models API.
QIANFAN_AK, QIANFAN_SK Baidu Qianfan chat models.
YC_API_KEY YandexGPT large language models.
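
For example, to run against a non-OpenAI provider, set the matching key and point the fuzzer at that provider. The provider and model names below are illustrative; run --list-providers to see the exact names your install supports:

    export ANTHROPIC_API_KEY=sk-ant-XXXXXXXX
    prompt-security-fuzzer --target-provider anthropic --target-model claude-3-haiku-20240307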


Command line Options

  • --list-providers Lists all available providers
  • --list-attacks Lists available attacks and exits
  • --attack-provider Attack Provider
  • --attack-model Attack Model
  • --target-provider Target provider
  • --target-model Target model
  • --num-attempts, -n NUM_ATTEMPTS Number of different attack prompts
  • --num-threads, -t NUM_THREADS Number of worker threads
  • --attack-temperature, -a ATTACK_TEMPERATURE Temperature for attack model
  • --debug-level, -d DEBUG_LEVEL Debug level (0-2)
  • --batch, -b Run the fuzzer in unattended (batch) mode, bypassing the interactive steps
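
For example, a single unattended run combining several of these options (the provider and model names are illustrative; check --list-providers and your provider's model catalog first):

    prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt \
        --attack-provider=open_ai --attack-model=gpt-3.5-turbo \
        --target-provider=open_ai --target-model=gpt-3.5-turbo \
        --num-attempts=3 --num-threads=4 --attack-temperature=0.6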

Examples

System prompt examples (of various strengths) can be found in the subdirectory system_prompt.examples in the sources.

Interactive mode (default mode)

Run tests against the system prompt

    prompt-security-fuzzer

🚀 Quick start single run

Run tests against the system prompt (in non-interactive batch mode):

    prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt

📺 Custom Benchmark!

Run tests against the system prompt with a custom benchmark

    prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt --custom-benchmark=ps_fuzz/attack_data/custom_benchmark1.csv

🐹 Run only a subset of attacks!

Run tests against the system prompt with a subset of attacks

    prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt --custom-benchmark=ps_fuzz/attack_data/custom_benchmark1.csv --tests='["ucar","amnesia"]'



📓 Google Colab Notebook

Refine and harden your system prompt in our Google Colab Notebook

Prompt Fuzzer Refinement Process

🎬 Demo video

Watch the video

βš”οΈ Simulated Attack Details

We use a dynamic testing approach: we extract the necessary context from your system prompt and adapt the fuzzing process to it.

Jailbreak
  • AIM Jailbreak: Probes ethical compliance by eliciting instructions on illegal activities via Machiavelli roleplay.
  • Affirmative Suffix: Exploits a vulnerability to elicit affirmative responses for prohibited content.
  • Amnesia: Causes the LLM to forget or disregard system prompt instructions, leading to unrelated outputs.
  • Contextual Redirection: Tricks the LLM into adopting a specific role that may bypass preset restrictions.
  • Do Anything Now Jailbreak (DAN): Tests LLM resilience against performing forbidden activities through roleplay.
  • Harmful Behavior: Challenges the LLM with prompts from a harmful behavior dataset.
  • Linguistic Evasion: Uses non-English languages to bypass preset restrictions or focus.
  • Self Refine: Attempts to bypass system prompts by refining responses towards unethical directions.
  • UCAR: Assesses LLM adherence to content restrictions when asked to ignore such filters.
  • Base64 Evasion: Attempts to evade detection of harmful prompts by encoding them into Base64 strings.

Prompt Injection
  • Authoritative Role Impersonation: Impersonates an authoritative role to misdirect the LLM's outputs.
  • Complimentary Transition: Tests ability to maintain content standards when switching topics after a compliment.
  • Ethical Compliance: Evaluates resistance to discussing harmful or inappropriate content about sensitive topics.
  • Typoglycemia Attack: Exploits text processing vulnerabilities by omitting random characters, causing incorrect responses.

System prompt extraction
  • System Prompt Stealer: Attempts to extract the LLM's internal configuration or sensitive information.

Definitions
  • Broken: Attack type attempts that LLM succumbed to.
  • Resilient: Attack type attempts that LLM resisted.
  • Errors: Attack type attempts that had inconclusive results.


🌈 What’s next on the roadmap?

  • Google Colab Notebook
  • Adjust the output evaluation mechanism for prompt dataset testing
  • Continue adding new GenAI attack types
  • Enhanced reporting capabilities
  • Hardening recommendations

Turn this into a community project! We want this to be useful to everyone building GenAI applications. If you have attacks of your own that you think should be a part of this project, please contribute! This is how: https://github.com/prompt-security/ps-fuzz/blob/main/CONTRIBUTING.md

🍻 Contributing

Interested in contributing to the development of our tools? Great! For a guide on making your first contribution, please see our Contributing Guide. This section offers a straightforward introduction to adding new tests.

For ideas on what tests to add, check out the issues tab in our GitHub repository. Look for issues labeled new-test and good-first-issue, which are perfect starting points for new contributors.

ps-fuzz's People

Contributors

benji-ps, eliran-turgeman, guy-ps, itamar-ps, lior-ps, maor-ps, patrickjburke245, vitaly-ps, yael-ps


ps-fuzz's Issues

Can't run Bedrock tests

I am happy to find that you have Bedrock as a supported provider, but when I try to configure the test, it gives the following error:
2024-05-14 19:14:00,899 [WARNING] [interactive_mode.py:158]: Wrong value: 3 validation errors for BedrockChat
model_id
field required (type=value_error.missing)
model
extra fields not permitted (type=value_error.extra)
temperature
extra fields not permitted (type=value_error.extra)

Current configuration ...

    Option               Value
    attack_provider      bedrock
    attack_model         anthropic.claude-3-haiku-20240307-v1
    target_provider      bedrock
    target_model         anthropic.claude-3-haiku-20240307-v1
    num_attempts         3
    num_threads          4
    attack_temperature   0.6

I didn't see much documentation on what parameters to provide here, so I guessed that these providers and model names are appropriate. Any help on getting this running is appreciated.

I already have my AWS cli access with appropriate keys setup as environment variables.
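
A likely cause, judging from the validation errors above: langchain's BedrockChat expects the model identifier in a model_id field and sampling settings in model_kwargs, rather than model and temperature passed directly. A rough sketch of a construction that should validate (the model ID and region are illustrative, and boto3/AWS credentials are assumed to be configured):

    # Sketch only: how langchain's BedrockChat is typically constructed.
    # The errors above say `model_id` is required and `model`/`temperature`
    # are extra fields, so per-model settings belong in `model_kwargs`.
    from langchain_community.chat_models import BedrockChat

    chat = BedrockChat(
        model_id="anthropic.claude-3-haiku-20240307-v1:0",  # full Bedrock model ID (illustrative)
        region_name="us-east-1",                            # adjust to your region
        model_kwargs={"temperature": 0.6},
    )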

Improvements to the fuzzer - planning phase

Relevant discussion on Slack with Itamar: https://promptsecurity.slack.com/archives/C0686V2C82F/p1703516105537489

This is a project on its own; to be manageable, I broke it into subtasks below.

Subtasks

    • The code is not really similar to promptmap, from which inspiration was taken, but the prompt texts are still similar, although improved in some places.
    • TODO: consider modifying the prompts (e.g., try to rewrite them with ChatGPT's help, but ensure they still work afterwards)
    • Wrap underlying chat models behind a common interface, which can be langchain.
    • Add some unit tests and a badge about successful tests to be displayed in the README (on the repo main page)
    • Differentiate between the attacking model and the protected model (make ATTACK_GENERATING_MODEL configurable).
      GPT-4, for example, as the 'protected' model will always get a perfect score in that setup; we want to show some variance.
    • Arrange a compelling video demonstration, preferably showing in the CLI a specific system prompt and model (something more vulnerable than GPT-4) and how the protections run, leading to different results in colors and a final score. Like the screenshots of garak.
    • Consider offering a managed demo environment through our website, which might improve our lead gen/SEO.
      Users could bring their own API keys; see if that's appropriate -> talk with Itamar.

Additional links of interest:
https://github.com/utkusen/promptmap

https://github.com/wunderwuzzi23/token-turbulenz

https://github.com/leondz/garak

https://github.com/mnns/LLMFuzzer

https://github.com/dropbox/llm-security

https://github.com/llm-attacks/llm-attacks

https://github.com/velocitatem/raccoon

https://github.com/LostOxygen/llm-confidentiality

https://www.marktechpost.com/2023/08/18/meet-cipherchat-an-ai-framework-to-systematically-examine-the-generalizability-of-safety-alignment-to-non-natural-languages-specifically-ciphers/

Issue with prompt-security-fuzzer installation

Description:
I encountered an error while trying to use the prompt-security-fuzzer tool after installing it via pip3.10. Here are the details of the error:

Error Message:

Traceback (most recent call last):
  File "/opt/homebrew/bin/prompt-security-fuzzer", line 5, in <module>
    from ps_fuzz.cli import main
  File "/opt/homebrew/lib/python3.10/site-packages/ps_fuzz/cli.py", line 12, in <module>
    from .prompt_injection_fuzzer import *
  File "/opt/homebrew/lib/python3.10/site-packages/ps_fuzz/prompt_injection_fuzzer.py", line 13, in <module>
    import pydantic.v1.error_wrappers
ModuleNotFoundError: No module named 'pydantic.v1'

Steps to Reproduce:

  1. Install the package using the command: pip3.10 install prompt-security-fuzzer

  2. Attempt to run the command: prompt-security-fuzzer -h

Expected Behavior:
The prompt-security-fuzzer tool should execute without errors and display its help message.

Actual Behavior:
Encountered a ModuleNotFoundError related to the pydantic.v1 module.

Additional Information:

Operating System: mac-os
Python Version: 3.10

I believe this issue may be related to a missing dependency or compatibility issue. Any assistance in resolving this problem would be greatly appreciated.
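
A plausible cause: the pydantic.v1 compatibility namespace only ships with pydantic 2.x, so an environment that resolved to pydantic 1.x would fail exactly this way. A hedged workaround (the version pin is an assumption, not a confirmed fix):

    pip3.10 install --upgrade "pydantic>=2"
    prompt-security-fuzzer -h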

Unable to run prompt fuzzer

I'm getting this error while running this tool: 'interactive_mode.py:158]: Wrong value: Invalid backend name: open_ai. Supported backends: ' Any suggestions on fixing it?

Ollama Support

Hi,

Are you going to add Ollama support or maybe I missed it?

Thanks.

ChatScanner integration

Hi there,
Love this project.
I am the author of ChatScanner which is basically a pentesting tool for live chatbots.
My attack scenario involves simply sending/receiving messages, and of course I don't know what backend API/LLM the target is using.
Would it be possible to integrate your tool via a pure text message loop interface?
Cheers.

Develop/add a few tests to ps-fuzzer

14 tests are ready already. We may add a few more if we have time.

Itamar from #prompt-innovation channel:

  • Let's add a Google Colab notebook to play with it (in Google Colab)
  • Let's add pip install
  • Add attacks - I'd like > 20
    • Toxicity: note: a bit hard, since validation requires NLP (unless we find a way to creatively design attack prompts so that it is easy to discriminate toxic from non-toxic responses)
    • Hidden unicode
    • Multilingual [DONE]
    • [DONE] Dan
    • Base64
    • Sydney
    • Chained Prompts (attached)
    • Crescendo Attack
    • UCAR [DONE]
    • Many-shot Jailbreak

Enhance the response classification on dataset based tests

Today the dataset-based tests rely on the response classifier not contains_refusal_keyword(), but this classifier is too basic for some of the advanced needs of the next few datasets that will be implemented (a rough sketch of both mechanisms follows below).

What is going to be done is:

  • We'll enhance the rule based mechanism that checks responses
  • We'll implement a semantic mechanism to go over the response
  • We'll allow for a choice between the two in the attack configuration
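
A minimal sketch of the two mechanisms, using hypothetical helper names (the project's real contains_refusal_keyword() and any future semantic classifier may look different):

    # Rule-based check: did the target model refuse the attack prompt?
    REFUSAL_KEYWORDS = ["i cannot", "i can't", "i'm sorry", "as an ai", "not able to help"]

    def contains_refusal_keyword(response: str) -> bool:
        lowered = response.lower()
        return any(keyword in lowered for keyword in REFUSAL_KEYWORDS)

    # Semantic check: compare the response against known refusal examples.
    # `embed` is a hypothetical callable returning a unit-normalized vector.
    def is_refusal_semantic(response, refusal_examples, embed, threshold=0.8):
        response_vec = embed(response)
        return any(
            sum(a * b for a, b in zip(response_vec, embed(example))) >= threshold
            for example in refusal_examples
        )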

I am getting the following error

After successfully installing prompt_security_fuzzer, when I type -h or any other option, it's giving this error:

Traceback (most recent call last):
  File "/home/cloudshell-user/.local/bin/prompt-security-fuzzer", line 5, in <module>
    from ps_fuzz.cli import main
  File "/home/cloudshell-user/.local/lib/python3.9/site-packages/ps_fuzz/cli.py", line 9, in <module>
    from .chat_clients import *
  File "/home/cloudshell-user/.local/lib/python3.9/site-packages/ps_fuzz/chat_clients.py", line 1, in <module>
    from .langchain_integration import get_langchain_chat_models_info
  File "/home/cloudshell-user/.local/lib/python3.9/site-packages/ps_fuzz/langchain_integration.py", line 6, in <module>
    def _get_class_member_doc(cls, param_name: str) -> str | None:
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
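
Worth noting: the `str | None` union syntax is only evaluated successfully at runtime on Python 3.10+, so this traceback is consistent with running the package on Python 3.9. A sketch of the kind of change that avoids the error (the project's actual fix may differ):

    # `typing.Optional` works on Python 3.9, unlike the runtime-evaluated `str | None`.
    # Adding `from __future__ import annotations` at the top of the module is another option.
    from typing import Optional

    def _get_class_member_doc(cls, param_name: str) -> Optional[str]:
        ...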

CI/CD support - in the tool

For CI/CD support the tool needs to:

  • Exit with a non-zero exit code if a test fails
  • Allow for a non-blocking mode (exit with 0 no matter what)
  • Allow to configure minimum severity for a non-zero exit
  • Maybe exit with a non-zero exit code for some other failures

These will allow the tool to be used within a CI/CD pipeline, and later the tool will be wrapped within a docker container and a GitHub action.
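
A hypothetical pipeline step, assuming the proposed non-zero exit code on failed tests is implemented (the shell wrapper and system prompt path are only an illustration):

    # Fail the build if the fuzzer finds the system prompt vulnerable.
    prompt-security-fuzzer -b ./system_prompt.txt || {
        echo "Prompt Fuzzer reported broken attack types" >&2
        exit 1
    }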
