GSM8K-AI-SubQ

This repository contains the GSM8K-AI-SubQ dataset, the scripts used to collect it, and the scripts for the baselines.

The dataset was created to support research on distilling the reasoning abilities of LLMs, particularly their ability to split problems into simpler sub-problems. We used ChatGPT to generate the dataset. It is based on the GSM8K dataset and includes examples of ChatGPT's problem decompositions together with its own feedback on the generated sub-questions. Our data also includes ChatGPT's answers to the sub-questions, but we did not conduct any experiments on this part of the reasoning process. We hope that our dataset will help advance offline RL algorithms in the area of reasoning.

For more details, see our paper "Distilling LLMs' Decomposition Abilities into Compact Language Models".

Repository structure

Each directory contains a README.md with relevant instructions and comments. All requirements can be installed with:

    python3 -m pip install -r requirements.txt

  • baselines contains the scripts for the baseline algorithms: Behavioral Cloning (BC), Filtered BC and ILQL.
  • data_generation_and_evaluation contains the scripts and data required to generate the dataset, as well as the scripts for evaluating results.
  • dataset contains the GSM8K-AI-SubQ dataset.
  • eval_responses contains the test-set sub-questions generated by the different baselines and the answers of different language models to these sub-questions.
  • results_processing contains the scripts for processing results.
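
To illustrate the difference between the first two baselines: Behavioral Cloning fine-tunes on every ChatGPT decomposition, while Filtered BC, in its usual offline-RL form, fine-tunes only on the examples judged good. The sketch below shows only that data-selection step, under clearly stated assumptions: the record fields (`question`, `sub_questions`, `feedback`) and the rule "keep an example only if every sub-question received positive feedback" are hypothetical and may not match the dataset schema or the filter actually used in `baselines`.

```python
# Hypothetical records: field names and values are illustrative only
# and are NOT taken from the GSM8K-AI-SubQ files.
examples = [
    {"question": "Q1", "sub_questions": ["a", "b"], "feedback": [1, 1]},
    {"question": "Q2", "sub_questions": ["c", "d"], "feedback": [1, 0]},
    {"question": "Q3", "sub_questions": ["e"], "feedback": [1]},
]


def bc_data(records):
    """Behavioral Cloning: train on every decomposition."""
    return list(records)


def filtered_bc_data(records):
    """Filtered BC (assumed rule): keep a decomposition only if
    every one of its sub-questions received positive feedback."""
    return [r for r in records if all(f == 1 for f in r["feedback"])]


bc_questions = [r["question"] for r in bc_data(examples)]
filtered_questions = [r["question"] for r in filtered_bc_data(examples)]
print(bc_questions)
print(filtered_questions)
```

See the README.md in `baselines` for the actual training scripts and their filtering criteria.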

Evaluation results

ChatGPT as sub-question answerer

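| Algorithm   | DistillGPT | GPT-2 small | GPT-2 medium | Average |
|-------------|------------|-------------|--------------|---------|
| BC          | 0.476      | 0.508       | 0.538        | 0.507   |
| Filtered BC | 0.493      | 0.527       | 0.576        | 0.532   |
| ILQL-sparse | 0.474      | 0.513       | 0.531        | 0.506   |
| ILQL-full   | 0.482      | 0.505       | 0.533        | 0.507   |
| ChatGPT     | -          | -           | -            | 0.682   |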
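
The Average column in these tables appears to be the plain mean of the three student-model scores in each row. A minimal sketch that recomputes it for the ChatGPT-answerer table, using only the numbers reported there (the dictionary layout and the function name are illustrative and not part of the repository):

```python
# Scores copied from the "ChatGPT as sub-question answerer" table.
# Tuple order: DistillGPT, GPT-2 small, GPT-2 medium.
scores = {
    "BC": (0.476, 0.508, 0.538),
    "Filtered BC": (0.493, 0.527, 0.576),
    "ILQL-sparse": (0.474, 0.513, 0.531),
    "ILQL-full": (0.482, 0.505, 0.533),
}


def row_average(values):
    """Mean of one table row, rounded to three decimals."""
    return round(sum(values) / len(values), 3)


averages = {name: row_average(vals) for name, vals in scores.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.3f}")
```

The recomputed values agree with the reported Average column for this table. The ChatGPT row has no per-model scores, so it is left out.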

LLaMA 7B as sub-question answerer

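| Algorithm   | DistillGPT | GPT-2 small | GPT-2 medium | Average |
|-------------|------------|-------------|--------------|---------|
| BC          | 0.118      | 0.154       | 0.164        | 0.145   |
| Filtered BC | 0.125      | 0.159       | 0.162        | 0.149   |
| ILQL-sparse | 0.122      | 0.141       | 0.164        | 0.142   |
| ILQL-full   | 0.123      | 0.147       | 0.163        | 0.144   |
| ChatGPT     | -          | -           | -            | 0.234   |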

LLaMA 13B as sub-question answerer

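| Algorithm   | DistillGPT | GPT-2 small | GPT-2 medium | Average |
|-------------|------------|-------------|--------------|---------|
| BC          | 0.184      | 0.212       | 0.247        | 0.214   |
| Filtered BC | 0.194      | 0.230       | 0.245        | 0.223   |
| ILQL-sparse | 0.178      | 0.204       | 0.247        | 0.210   |
| ILQL-full   | 0.183      | 0.205       | 0.247        | 0.212   |
| ChatGPT     | -          | -           | -            | 0.353   |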

Mistral as sub-question answerer

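| Algorithm   | DistillGPT | GPT-2 small | GPT-2 medium | Average |
|-------------|------------|-------------|--------------|---------|
| BC          | 0.240      | 0.264       | 0.290        | 0.265   |
| Filtered BC | 0.228      | 0.256       | 0.293        | 0.259   |
| ILQL-sparse | 0.223      | 0.253       | 0.288        | 0.255   |
| ILQL-full   | 0.235      | 0.252       | 0.282        | 0.256   |
| ChatGPT     | -          | -           | -            | 0.446   |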

Average among sub-question answerers

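| Algorithm   | DistillGPT | GPT-2 small | GPT-2 medium | Average |
|-------------|------------|-------------|--------------|---------|
| BC          | 0.255      | 0.284       | 0.310        | 0.283   |
| Filtered BC | 0.260      | 0.293       | 0.319        | 0.291   |
| ILQL-sparse | 0.249      | 0.278       | 0.308        | 0.278   |
| ILQL-full   | 0.256      | 0.277       | 0.306        | 0.280   |
| ChatGPT     | -          | -           | -            | 0.429   |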

Citing

If you use our work in your research, please cite it with the following BibTeX entry:

@article{tarasov2024distilling,
  title={Distilling LLMs' Decomposition Abilities into Compact Language Models},
  author={Tarasov, Denis and Shridhar, Kumar},
  journal={arXiv preprint arXiv:2402.01812},
  year={2024}
}
