
SP-CoT: Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning

This repository contains the code and data for the paper Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning (Findings of EMNLP 2023).

1. Paper Abstract

In open-domain question-answering (ODQA), most existing questions require single-hop reasoning on commonsense. To further extend this task, we officially introduce open-domain multi-hop reasoning (ODMR) by answering multi-hop questions with explicit reasoning steps in an open-domain setting. Recently, large language models (LLMs) have found significant utility in facilitating ODQA without an external corpus. Furthermore, chain-of-thought (CoT) prompting boosts the reasoning capability of LLMs to a greater extent with manual or automated paradigms. However, existing automated methods lack quality assurance, while manual approaches suffer from limited scalability and poor diversity, hindering the capabilities of LLMs. In this paper, we propose Self-prompted Chain-of-Thought (SP-CoT), an automated framework to mass-produce high-quality CoTs of LLMs, by LLMs and for LLMs. SP-CoT introduces an automated generation pipeline of high-quality ODMR datasets, an adaptive sampler for in-context CoT selection, and self-prompted inference via in-context learning. Extensive experiments on four multi-hop question-answering benchmarks show that our proposed SP-CoT not only significantly surpasses the previous SOTA methods on large-scale (175B) LLMs, but also nearly doubles the zero-shot performance of small-scale (13B) LLMs. Further analysis reveals the remarkable capability of SP-CoT to elicit direct and concise intermediate reasoning steps by recalling ~50% of intermediate answers on the MuSiQue-Ans dataset.

(Figure: overview of SP-CoT)

2. Repository Structure

  • data/: contains the data for the experiments in the paper.
  • demos/: contains the demos and the experiment outputs in the paper.
  • src/: contains the source code for the experiments in the paper.
  • README.md: this file.
  • LICENSE: the license for the code and data in this repository.
  • requirements.txt: the requirements file for reproducing the experiments in the paper.

3. Usage

To install the required packages, you can use the following command:

pip install -r requirements.txt
  • The scripts with the suffix _$MODEL are for model-specific experiments. Please modify the arguments in these scripts to reproduce the experiments for different models.
  • The scripts without a model suffix are for the general experiments; the default model is gpt-3.5-turbo-0301. Please modify the arguments in these scripts to run them with other models.

3.1 Data

To preprocess the raw files in each dataset, you can use the following command:

bash run_preprocess.sh

The raw files themselves are not included in this repository; you can download them from the original sources and put them in the data/ directory. We also provide our preprocessed files for the experiments in the paper, which you can find in the data/$DATASET directories.

Please modify the arguments in the run_preprocess.sh file to preprocess the raw files of each dataset.
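Concretely, preprocessing boils down to mapping each benchmark's raw records onto one unified schema. The sketch below illustrates the idea with toy field names ("_id", "question", "answer"), which are assumptions for illustration rather than the repository's actual schema:

```python
import json

def normalize_record(raw: dict) -> dict:
    """Map a raw benchmark record onto a unified schema.

    The field names used here are illustrative assumptions; the actual
    schema is defined by run_preprocess.sh and the scripts it calls.
    """
    return {
        "id": str(raw.get("_id") or raw.get("id")),
        "question": raw["question"].strip(),
        "answer": raw["answer"].strip(),
    }

# A toy line standing in for one record of a raw dataset file.
raw_line = '{"_id": "q1", "question": " Who wrote Hamlet? ", "answer": "Shakespeare"}'
record = normalize_record(json.loads(raw_line))
```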

3.2 Phase 1: Self-Generation (Optional)

This repository provides the code for the self-generation phase, along with the self-generated data we used for the experiments in the paper, available at data/self-prompt-cot/pseudo_dataset.json. You can also regenerate this data by running the self-generation phase below. The topics generated by each model are available in the data/self-prompt-cot/$MODEL/ directory.

(Figure: the self-generation pipeline)

Step 1: Topic Generation

To run the self-generation phase, you can use the following command:

bash run_gen.sh

Please modify the arguments in the run_gen.sh file to run the self-generation phase with GPT-3.5-turbo-0301.

Also supported are:

  • run_gen_alpaca.sh is an example of the self-generation phase for the Alpaca model.
  • run_gen_vicuna.sh is an example of the self-generation phase for the Vicuna model.
  • run_gen_wizard.sh is an example of the self-generation phase for the Wizard model.
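Conceptually, topic generation asks the LLM for a list of diverse topics and deduplicates its reply. The sketch below is schematic, with a stubbed LLM call; the prompt wording and the generate_topics helper are hypothetical, not the repository's code:

```python
def generate_topics(ask_llm, num_topics=5):
    """Ask a model for topics, then clean and deduplicate its reply.

    `ask_llm` stands in for whichever API or local model run_gen.sh
    configures; the prompt wording is illustrative only.
    """
    prompt = f"List {num_topics} diverse Wikipedia-style topics, one per line."
    seen, topics = set(), []
    for line in ask_llm(prompt).splitlines():
        topic = line.strip("-• ").strip()  # drop list markers the model may emit
        if topic and topic.lower() not in seen:
            seen.add(topic.lower())
            topics.append(topic)
    return topics

# A stubbed "LLM" so the sketch runs without any API key or GPU.
fake_llm = lambda prompt: "Jazz\n- Jazz\nVolcanoes\nThe Roman Empire"
topics = generate_topics(fake_llm)  # ["Jazz", "Volcanoes", "The Roman Empire"]
```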

Step 2: Building Pseudo Dataset

To build the pseudo dataset for the self-generation phase, you can use the following command:

bash run_build_pseudo_dataset.sh

Please modify the arguments in the run_build_pseudo_dataset.sh file to build the pseudo dataset for the self-generation phase.

Also supported are:

  • run_build_pseudo_dataset_alpaca.sh is an example of building the pseudo dataset for the Alpaca model.
  • run_build_pseudo_dataset_vicuna.sh is an example of building the pseudo dataset for the Vicuna model.
  • run_build_pseudo_dataset_wizard.sh is an example of building the pseudo dataset for the Wizard model.
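Building the pseudo dataset involves filtering the self-generated QA pairs for quality. The checks below (non-empty answers, answer not echoed in the question, deduplication) are illustrative assumptions; the real filters live in tasks/build_pseudo_dataset.py:

```python
def filter_pairs(pairs):
    """Drop low-quality generated QA pairs.

    The checks here are illustrative assumptions, not the repository's
    actual filtering logic.
    """
    seen, kept = set(), []
    for p in pairs:
        q, a = p["question"].strip(), p["answer"].strip()
        if not a or a.lower() in q.lower():
            continue  # empty answer, or answer leaked into the question
        if q.lower() in seen:
            continue  # duplicate question
        seen.add(q.lower())
        kept.append({"question": q, "answer": a})
    return kept

pairs = [
    {"question": "Who directed Jaws?", "answer": "Steven Spielberg"},
    {"question": "Who directed Jaws?", "answer": "Steven Spielberg"},  # duplicate
    {"question": "Is Paris in France?", "answer": "Paris"},            # answer echoed in question
]
kept = filter_pairs(pairs)  # only the first pair survives
```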

3.3 Phase 2: Self-Prompted Inference

(Figure: the self-prompted inference pipeline)

Step 1: Build demonstrations

To build the demonstrations for the self-prompted inference phase, you can use the following command:

bash run_build_demo_clustering.sh

Please modify the arguments in the run_build_demo_clustering.sh file to build the demonstrations for the self-prompted inference phase.
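The paper's adaptive sampler selects diverse in-context CoT demonstrations via clustering. As a simplified stand-in, the sketch below uses greedy farthest-point selection on toy 2-D embeddings; it illustrates the diversity idea only and is not the script's actual algorithm:

```python
def select_diverse_demos(embeddings, k):
    """Greedily pick k mutually dissimilar demonstrations.

    A simplified stand-in for the clustering-based sampler, using
    Euclidean distance on toy 2-D vectors.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    chosen = [0]  # seed with the first demonstration
    while len(chosen) < k:
        # Take the candidate farthest from everything already chosen.
        best = max(
            (i for i in range(len(embeddings)) if i not in chosen),
            key=lambda i: min(dist(embeddings[i], embeddings[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen

embs = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
picked = select_diverse_demos(embs, 2)  # [0, 3]: one demo from each cluster
```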

Step 2: Self-Prompted Inference

To run the self-prompted inference phase, you can use the following command:

### For API models, such as GPT-3.5-turbo-0301
bash run_inference_async.sh

### For local models on GPUs, such as Alpaca, Vicuna, and Wizard
bash run_inference_1.sh # index for manual parallelism

The output files will be saved in the demos/$DATASET/$MODEL directory by default.

Please modify the arguments in the run_inference_async.sh file to run the self-prompted inference phase with API models, or in the run_inference_1.sh file to run it with local models on GPUs.
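The numeric suffix in run_inference_1.sh suggests one job per worker index. One simple way to implement such manual parallelism is a round-robin split of the questions by worker index (an assumption for illustration, not necessarily the repository's exact scheme):

```python
def shard(items, index, num_shards):
    """Return the slice of `items` that worker `index` should process."""
    return items[index::num_shards]

questions = list(range(10))
shard0 = shard(questions, 0, 3)  # [0, 3, 6, 9]
# Every question is covered exactly once across the three workers:
covered = sorted(shard(questions, 0, 3) + shard(questions, 1, 3) + shard(questions, 2, 3))
```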

3.4 Evaluation

To evaluate the results of the self-prompted inference phase, you can use the following command:

bash run_standard_eval.sh

Please modify the arguments in the run_standard_eval.sh file to evaluate the results of the self-prompted inference phase.
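Standard ODQA evaluation typically reports exact match (EM) and token-level F1 after SQuAD-style answer normalization. The sketch below illustrates these metrics; it is an assumption about what run_standard_eval.sh computes, not its actual code:

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, gold):
    return normalize(pred) == normalize(gold)

def f1(pred, gold):
    """Token-level F1 between a predicted and a gold answer."""
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

em = exact_match("The Eiffel Tower", "eiffel tower")  # True
score = f1("in Paris, France", "Paris")               # 0.5
```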

4. License

This repository is under the MIT License. See the LICENSE file for details.

5. Citation

If you use this code or data in your work, please cite the following paper:

@inproceedings{wang-etal-2023-self-prompted,
    title = "Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning",
    author = "Wang, Jinyuan  and
      Li, Junlong  and
      Zhao, Hai",
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-emnlp.179",
    doi = "10.18653/v1/2023.findings-emnlp.179",
    pages = "2717--2731",
    abstract = "In open-domain question-answering (ODQA), most existing questions require single-hop reasoning on commonsense. To further extend this task, we officially introduce open-domain multi-hop reasoning (ODMR) by answering multi-hop questions with explicit reasoning steps in open-domain setting. Recently, large language models (LLMs) have found significant utility in facilitating ODQA without external corpus. Furthermore, chain-of-thought (CoT) prompting boosts the reasoning capability of LLMs to a greater extent with manual or automated paradigms. However, existing automated methods lack of quality assurance, while manual approaches suffer from limited scalability and poor diversity, hindering the capabilities of LLMs. In this paper, we propose Self-prompted Chain-of-Thought (SP-CoT), an automated framework to mass-produce high quality CoTs of LLMs, by LLMs and for LLMs. SP-CoT introduces an automated generation pipeline of high quality ODMR datasets, an adaptive sampler for in-context CoT selection and self-prompted inference via in-context learning. Extensive experiments on four multi-hop question-answering benchmarks show that our proposed SP-CoT not only significantly surpasses the previous SOTA methods on large-scale (175B) LLMs, but also nearly doubles the zero-shot performance of small-scale (13B) LLMs. Further analysis reveals the remarkable capability of SP-CoT to elicit direct and concise intermediate reasoning steps by recalling {\textasciitilde}50{\%} of intermediate answers on MuSiQue-Ans dataset.",
}


