
negated-prompts-for-llms

Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts


We aim to answer four main questions in this work. (1) How does scaling the size of LMs affect their ability to understand the concept of negation? (2) Are LMs explicitly trained to follow instructions (T0, InstructGPT) better at understanding negated instructions? (3) Can in-context learning or fine-tuning help mitigate this problem? (4) How do the existing approaches compare to actual human capabilities in understanding negations, and how large is the performance gap that we should focus on closing?

The answers can be found in our case study paper, Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts. Come check it out! :)

Dependencies

You can use pip install -r requirements.txt to install the required libraries.

OpenAI Beta

To use GPT-3 you must use the OpenAI Beta, which is limited access. You can apply for access here. Once you have access, you will need to point score.py to your API key, either with the --key argument or by putting your key in api.key, which is the default path.
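A minimal sketch of setting up the key file (the `sk-your-api-key` value below is a placeholder, not a real key):

```shell
# Store a placeholder key at the default path that score.py reads;
# replace the value with your real OpenAI API key.
echo "sk-your-api-key" > api.key
# Confirm the file exists and is non-empty before running the scorer
test -s api.key && echo "key file ready"
```

Alternatively, pass the path explicitly with `--key /path/to/api.key` when invoking score.py.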

Running Scorers

Once you have a dataset downloaded, running all the zero-shot scoring strategies at once is as simple as:

CUDA_VISIBLE_DEVICES=[gpu device ids] python score.py --dataset [huggingface dataset name] --dataset_config [huggingface dataset config] --promptsource --sample [num of samples] --batch [num of samples in a batch] --prompt_name [prompt name from promptsource] --model [model name]

For example, running inference of OPT-125m on the ARC-Easy dataset can be done as follows:

CUDA_VISIBLE_DEVICES=0 python score.py --dataset ai2_arc --dataset_config ARC-Easy --promptsource  --use_csv --sample 300 --batch 8 --prompt_name "q&a" --model opt-125m
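To sweep the same settings over several model sizes, a simple shell loop works. This sketch prints the commands as a dry run (remove the `echo` to actually execute each one); it assumes score.py accepts the other OPT sizes under the same naming scheme as opt-125m:

```shell
# Print the score.py invocation for several OPT sizes (dry run).
# Remove `echo` to actually execute each command.
for model in opt-125m opt-350m opt-1.3b; do
  echo CUDA_VISIBLE_DEVICES=0 python score.py \
    --dataset ai2_arc --dataset_config ARC-Easy --promptsource --use_csv \
    --sample 300 --batch 8 --prompt_name "q&a" --model "$model"
done
```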

If there is any confusion about --dataset and --dataset_config, simply look in score.py to see how dataset selection works. --model is the name of an OPT, T0, GPT-2, or GPT-3 model, e.g. xl, davinci, etc.; check score.py for the full list of supported LMs. To speed things up, you can use a larger --batch if you have enough GPU memory. For the full list of --dataset, --dataset_config, and --prompt_name values used for the paper, refer to the run_configs.txt file.

Other

To use a dataset other than the 9 datasets used in the paper, remove the --use_csv flag for the run and the code will automatically load the dataset from the Hugging Face Hub.

Contributors

joeljang, seonghyeonye

Issues

What about reversing the order of instructions and questions?

Thank you for sharing this interesting work.
As I looked at the prompts used in this work, I noticed that in almost all cases the negations simply replace the word 'correct' with 'incorrect'. I wonder whether you also tested other negation instructions, and whether the results would still be the same?
For example, switching instructions and questions so that the instructions are directly followed by the answers:

Original:

Generate the incorrect answer to the following question. Question: Astronauts weigh more on Earth than they do on the moon because Answer is

Inverse:

Question: Astronauts weigh more on Earth than they do on the moon because what?
Generate an incorrect answer to the above question. The answer is

For an autoregressive language model, the current prompt ending "Astronauts weigh more on Earth than they do on the moon because" seems a little misleading to me: the model may tend to simply complete the sentence based on its knowledge, regardless of the instruction. Maybe (for now I don't have an OpenAI API key to verify this) a different prompt structure would vary the results?

Any discussion by anyone is welcome.
