Giter VIP home page Giter VIP logo

hallucination-attack's Introduction

Brief Intro

LLMs (e.g., GPT-3.5, LLaMA, and PaLM) suffer from hallucination—fabricating non-existent facts to cheat users without perception. And the reasons for their existence and pervasiveness remain unclear. We demonstrate that non-sense Out-of-Distribution(OoD) prompts composed of random tokens can also elicit the LLMs to respond with hallucinations. This phenomenon forces us to revisit that hallucination may be another view of adversarial examples, and it shares similar features with conventional adversarial examples as the basic feature of LLMs. Therefore, we formalize an automatic hallucination triggering method called hallucination attack in an adversarial way. Following is a fake news example generating by hallucination attack.

Hallucination Attack generates fake news

Weak semantic prompt and OoD prompt can elicit the Vicuna-7B to reply the same fake fact.

The Pipeline of Hallucination Attack

We substitute tokens via gradient-based token replacing strategy, replacing token reaching smaller negative log-likelihood loss, and induce LLM within hallucinations.

Results on Multiple LLMs

- Vicuna-7B

- LLaMA2-7B

- Baichuan-7B-Chat

- InternLM-7B

Quick Start

Setup

You may config your own base models and their hyper-parameters within config.py. Then, you could attack the models or run our demo cases.

Demo

Clone this repo and run the code.

$ cd Hallucination-Attack

Install the requirements.

$ pip install -r requirements.txt

Run local demo of hallucination attacked prompt.

$ python demo.py

Attack

Start a new attack training to find a prompt trigger hallucination

$ python main.py

Citation

@article{yao2023llm,
  title={LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples},
  author={Yao, Jia-Yu and Ning, Kun-Peng and Liu, Zhen-Hui and Ning, Mu-Nan and Yuan, Li},
  journal={arXiv preprint arXiv:2310.01469},
  year={2023}
}

hallucination-attack's People

Contributors

clarenceyc avatar leon0425 avatar ningkp avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

hallucination-attack's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.