SafetyBench

🌐 Website • 🤗 Hugging Face • ⏬ Data • 📃 Paper

SafetyBench is a comprehensive benchmark for evaluating the safety of LLMs. It comprises 11,435 diverse multiple-choice questions spanning 7 distinct categories of safety concerns, and it includes both Chinese and English data, enabling evaluation in both languages. Please visit our website or check our paper for more details.

Leaderboard

The up-to-date leaderboards are on our website. There are three leaderboards: Chinese, English, and the Chinese subset. To construct the Chinese subset, we removed questions containing highly sensitive keywords and sampled 300 questions for each category. Summarized evaluation results of some representative LLMs are shown below:

Figure: Summarized evaluation results of representative LLMs.

Data

Download

Our data is hosted on Hugging Face.

You can download the test questions and few-shot examples directly with wget by running the script download_data.sh.

Alternatively, you can download the test questions and few-shot examples with the datasets library by running download_data.py.

Description

test_zh, test_en and test_zh_subset contain the test questions for Chinese, English and the Chinese subset, respectively. dev_zh and dev_en contain 5 examples for each safety category, which can be used as few-shot demonstrations.
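As a rough illustration of how the dev examples could serve as five-shot demonstrations, the sketch below groups dev items by category and prepends up to five of them to a test question. The `question` and `category` field names, the rendering layout, and the instruction line are assumptions made for illustration; this is not the official SafetyBench prompt. Answers are rendered with the 0->A, 1->B, 2->C, 3->D rule.

```python
from collections import defaultdict

LETTERS = "ABCD"  # option i -> LETTERS[i], i.e. 0->A, 1->B, 2->C, 3->D


def render(item: dict, with_answer: bool) -> str:
    """Render one question with letter-labeled options.

    "question" is an assumed field name; "options" and "answer" are the
    fields described in the data description.
    """
    lines = [f"Question: {item['question']}"]
    lines += [f"({LETTERS[i]}) {opt}" for i, opt in enumerate(item["options"])]
    if with_answer:
        lines.append(f"Answer: ({LETTERS[item['answer']]})")
    else:
        lines.append("Answer:")
    return "\n".join(lines)


def build_five_shot_prompt(test_item: dict, dev_items: list, k: int = 5) -> str:
    """Prepend up to k dev examples sharing the test item's (assumed) "category" key."""
    by_category = defaultdict(list)
    for example in dev_items:
        by_category[example["category"]].append(example)
    demos = by_category[test_item["category"]][:k]
    # Hypothetical instruction line, not the official SafetyBench prompt.
    header = "The following are multiple-choice questions about safety."
    blocks = [header] + [render(d, with_answer=True) for d in demos]
    blocks.append(render(test_item, with_answer=False))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    dev = [
        {"category": "Offensiveness", "question": "Q1?", "options": ["yes", "no"], "answer": 1},
        {"category": "Physical Health", "question": "Q2?", "options": ["a", "b", "c"], "answer": 0},
    ]
    test = {"id": 0, "category": "Offensiveness", "question": "Q3?", "options": ["x", "y"]}
    print(build_five_shot_prompt(test, dev))
```

Filtering demonstrations by category keeps the examples topically close to the test question; for zero-shot evaluation you would simply call `render(test_item, with_answer=False)` with a header and no demonstrations.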

Note that the options field in the data includes at most four items, corresponding to the options A, B, C, D in order. For the answer field in the dev data, the mapping rule is: 0->A, 1->B, 2->C, 3->D.
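Because the options field is positional, converting between answer indices and option letters is a simple lookup. The helpers below are a small illustrative sketch (their names are hypothetical, not part of the SafetyBench code) of that mapping in both directions, plus a check that an item has at most four options.

```python
LETTERS = ("A", "B", "C", "D")  # position i in "options" is labeled LETTERS[i]


def index_to_letter(index: int) -> str:
    """Map a dev-set answer index to its option letter (0->A, ..., 3->D)."""
    if not 0 <= index < len(LETTERS):
        raise ValueError(f"answer index out of range: {index}")
    return LETTERS[index]


def letter_to_index(letter: str) -> int:
    """Inverse mapping; raises ValueError for anything other than A-D."""
    return LETTERS.index(letter.strip().upper())


def labeled_options(options: list) -> list:
    """Pair each option with its letter; the options field holds at most four items."""
    if len(options) > len(LETTERS):
        raise ValueError(f"expected at most {len(LETTERS)} options, got {len(options)}")
    return list(zip(LETTERS, options))


if __name__ == "__main__":
    print(labeled_options(["Yes", "No"]))  # [('A', 'Yes'), ('B', 'No')]
    print(index_to_letter(2), letter_to_index("d"))  # C 3
```

Using a tuple rather than the string "ABCD" for `LETTERS` means `letter_to_index("")` or `letter_to_index("AB")` raises an error instead of silently returning 0.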

How to Evaluate on SafetyBench

In our paper, we conduct experiments in both zero-shot and five-shot settings, and we extract the predicted answers from the models' responses. Example evaluation code can be found in code. We don't include CoT-based evaluation because SafetyBench is less reasoning-intensive than benchmarks that test a model's general capabilities, such as MMLU. However, feel free to submit results based on CoT. The default prompts for zero-shot and five-shot evaluation are shown below:

Figure: Default prompts for zero-shot and five-shot evaluation.
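The exact rules used to extract answers from free-form responses are not spelled out here, so the following is only a deliberately simple regex heuristic, an assumption rather than the authors' extraction code. It tries an explicit "Answer: X" or "answer is X" first, then a parenthesized letter, then any standalone capital letter, and returns the 0-based option index (0->A, ..., 3->D) or `None`.

```python
import re
from typing import Optional

LETTERS = "ABCD"


def extract_answer(response: str, num_options: int = 4) -> Optional[int]:
    """Return the predicted option index (0->A, ..., 3->D), or None if not found.

    Heuristic, not the authors' extraction code. Patterns are tried in order:
    an explicit "Answer: X" / "answer is X", then a parenthesized "(X)",
    then any standalone capital letter. Only letters that label an existing
    option (the first num_options of A-D, with 1 <= num_options <= 4) count.
    """
    valid = LETTERS[:num_options]
    patterns = [
        rf"[Aa]nswer\s*(?:is|:)?\s*\(?([{valid}])\)?",
        rf"\(([{valid}])\)",
        rf"\b([{valid}])\b",
    ]
    for pattern in patterns:
        match = re.search(pattern, response)
        if match:
            return valid.index(match.group(1))
    return None


if __name__ == "__main__":
    print(extract_answer("The answer is C."))  # 2
    print(extract_answer("I would choose (B) because ..."))  # 1
    print(extract_answer("I cannot help with that."))  # None
```

Responses that yield `None` (for example, refusals) need a fallback policy of your own, and the standalone-letter fallback can misfire on words such as the article "A", so it is worth checking a sample of extractions by hand.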

To enable more accurate extraction of the predicted answers, we made minor changes to the prompts for some models, as shown below:

Figure: Modified prompts used for some models.

How to Submit

First, prepare a UTF-8 encoded JSON file in the following format (see submission_example.json for details). Each key is the "id" field of a test question, and each value is the predicted answer, using the mapping 0->A, 1->B, 2->C, 3->D:

```json
{
    "0": 0,
    "1": 1,
    "2": 3,
    "3": 2
}
```
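A minimal sketch for producing the submission file, assuming your predictions are already collected in a dict keyed by each test question's "id". `write_submission` is a hypothetical helper, not part of the SafetyBench repository; it accepts either option letters or indices and writes indices with the 0->A, 1->B, 2->C, 3->D mapping as UTF-8 JSON.

```python
import json
from pathlib import Path

LETTERS = ("A", "B", "C", "D")


def write_submission(predictions: dict, path: str) -> dict:
    """Write {question id: answer index} to a UTF-8 JSON file and return it.

    Values may be letters ("A"-"D") or indices (0-3); letters are converted
    with the 0->A, 1->B, 2->C, 3->D rule. Keys are the test questions' "id"s.
    """
    submission = {}
    for qid, pred in predictions.items():
        if isinstance(pred, str):
            index = LETTERS.index(pred.strip().upper())  # ValueError if not A-D
        else:
            index = int(pred)
        if not 0 <= index < len(LETTERS):
            raise ValueError(f"invalid prediction for id {qid}: {pred!r}")
        submission[str(qid)] = index  # JSON object keys are strings anyway
    Path(path).write_text(
        json.dumps(submission, ensure_ascii=False, indent=4), encoding="utf-8"
    )
    return submission
```

For example, `write_submission({0: 0, 1: "B", 2: "D", 3: 2}, "submission.json")` writes the same mapping as the example above. Every test "id" needs a value, so decide beforehand how to fill in questions whose answer could not be extracted.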

Then you can submit the JSON file to our website.

Citation

@article{zhang2023safetybench,
      title={SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions}, 
      author={Zhexin Zhang and Leqi Lei and Lindong Wu and Rui Sun and Yongkang Huang and Chong Long and Xiao Liu and Xuanyu Lei and Jie Tang and Minlie Huang},
      journal={arXiv preprint arXiv:2309.07045},
      year={2023}
}
