Turning Dust into Gold [AAAI 2024]

This is the repository for the AAAI 2024 paper "Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data". [Arxiv]

The repo contains:

  • Synthetic data generated by GPT3.5-turbo (ChatGPT) and GPT4.
  • The training and inference code for this work.
  • The experimental results.
  • A list of existing work related to the MATH dataset and math reasoning.

Data

We provide synthetic samples generated by GPT3.5-turbo and GPT4 through in-context learning (ICL) on the MATH training set. They are saved in the data folders GPT3.5-turbo-MATH and GPT4-MATH. For each problem, 8 samples are generated.
The demonstrations used to generate the rationales are given in our paper.
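
One plausible way to work with the 8 samples per problem is to split them by whether the final answer matches the reference answer. The sketch below assumes that a sample counts as negative when its answer differs from the reference, which this README does not state explicitly. The record keys (problem_id, predicted_answer, gold_answer) and the function name are hypothetical; the actual file layout in GPT3.5-turbo-MATH and GPT4-MATH may differ, and exact string matching is a simplification for MATH-style answers.

```python
from collections import defaultdict


def split_by_correctness(samples):
    """Group samples per problem and split them into positive and negative sets.

    Each sample is a dict with hypothetical keys:
    "problem_id", "predicted_answer", and "gold_answer".
    """
    positives = defaultdict(list)
    negatives = defaultdict(list)
    for sample in samples:
        # Naive comparison: whitespace-trimmed exact string match.
        is_correct = (
            sample["predicted_answer"].strip() == sample["gold_answer"].strip()
        )
        bucket = positives if is_correct else negatives
        bucket[sample["problem_id"]].append(sample)
    return dict(positives), dict(negatives)


if __name__ == "__main__":
    # Made-up records for illustration only; not taken from the released data.
    demo = [
        {"problem_id": "p1", "predicted_answer": "42", "gold_answer": "42"},
        {"problem_id": "p1", "predicted_answer": "41", "gold_answer": "42"},
    ]
    pos, neg = split_by_correctness(demo)
    print(len(pos.get("p1", [])), "positive,", len(neg.get("p1", [])), "negative")
```

A real pipeline would likely need a more tolerant answer comparison (for example, normalizing LaTeX fractions) before deciding that a sample is negative.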

Code

The training and inference steps are as follows:

Step 1:

Prepare the llama-7b checkpoint and store it in the code directory.

Step 2:

Prepare a conda environment from requirements.txt.

Step 3:

Activate the environment:

conda activate llm

Step 4:

Train LoRA-neg:

cd code

bash run_neg.sh

Step 5:

Train LoRA-NAT:

bash run_NAT.sh

Step 6:

Train NCE:

bash run_NCE.sh

Step 7:

Train ASC:

bash run_ASC.sh
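
The four training commands above can also be chained from a small driver script. The following is a minimal sketch, assuming the scripts are run in the listed order from the code directory and that the run should stop at the first failure. The function names and the dry_run flag are hypothetical helpers, not part of the repository, and whether each stage actually depends on the output of the previous one is not confirmed here.

```python
import subprocess
from pathlib import Path

# Script names taken from steps 4-7 above, in the order they are listed.
TRAINING_SCRIPTS = ["run_neg.sh", "run_NAT.sh", "run_NCE.sh", "run_ASC.sh"]


def build_commands(scripts=TRAINING_SCRIPTS):
    """Return one `bash <script>` command per training stage."""
    return [["bash", name] for name in scripts]


def run_pipeline(code_dir="code", dry_run=False):
    """Run every stage inside code_dir, stopping at the first failure.

    With dry_run=True nothing is executed; the planned commands are returned.
    """
    executed = []
    for cmd in build_commands():
        executed.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError if a stage exits non-zero.
            subprocess.run(cmd, cwd=Path(code_dir), check=True)
    return executed


if __name__ == "__main__":
    # Print the plan only; switch dry_run to False to actually train.
    for cmd in run_pipeline(dry_run=True):
        print(" ".join(cmd))
```

The script should be started from the repository root with the llm conda environment already active, since it does not activate the environment itself.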

Results

[Figure img_1.png: experimental results]

A list of work related to MATH and math reasoning

We have also collected work related to the MATH dataset and mathematical reasoning tasks to support future research.

A. Involves distillation on mathematical reasoning tasks

1. Teaching Small Language Models to Reason (Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn)

2. Specializing Smaller Language Models towards Multi-Step Reasoning (Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, Tushar Khot)

dataset: Google Drive

3. Large Language Models Are Reasoning Teachers (Namgyu Ho, Laura Schmid, Se-Young Yun)

dataset: Dropbox, Google Drive

4. PaD: Program-aided Distillation Specializes Large Models in Reasoning (Xuekai Zhu, Biqing Qi, Kaiyan Zhang, Xingwei Long, Bowen Zhou)

B. Experiment on the MATH dataset

1. Measuring Mathematical Problem Solving With the MATH Dataset (original paper) (Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt)

3. A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level (Iddo Drori, Sarah Zhang, Reece Shuttleworth, Leonard Tang, Albert Lu, Elizabeth Ke, Kevin Liu, Linda Chen, Sunny Tran, Newman Cheng, Roman Wang, Nikhil Singh, Taylor L. Patti, Jayson Lynch, Avi Shporer, Nakul Verma, Eugene Wu, Gilbert Strang)

4. ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models (Zhipeng Chen, Kun Zhou, Beichen Zhang, Zheng Gong, Wayne Xin Zhao, Ji-Rong Wen)

dataset: MATH, HotPotQA

5. Deductive Verification of Chain-of-Thought Reasoning (Zhan Ling, Yunhao Fang, Xuanlin Li, Zhiao Huang, Mingu Lee, Roland Memisevic, Hao Su)

dataset: MATH

6. CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation (Cheng Qian, Chi Han, Yi R. Fung, Yujia Qin, Zhiyuan Liu, Heng Ji)

dataset: MATH, TabMWP, Creation Challenge

7. An Empirical Study on Challenging Math Problem Solving with GPT-4 (Yiran Wu, Feiran Jia, Shaokun Zhang, Hangyu Li, Erkang Zhu, Yue Wang, Yin Tat Lee, Richard Peng, Qingyun Wu, Chi Wang)

8. Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference (Chi Wang, Susan Xueqing Liu, Ahmed H. Awadallah)

C. Research work related to MATH

1. MiniF2F: A Cross-System Benchmark for Formal Olympiad-Level Mathematics (Kunhao Zheng, Jesse Michael Han, Stanislas Polu)

(Proposes miniF2F, drawing on the MATH dataset.)

2. Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs (Albert Q. Jiang, Sean Welleck, Jin Peng Zhou, Wenda Li, Jiacheng Liu, Mateja Jamnik, Timothée Lacroix, Yuhuai Wu, Guillaume Lample)

(MATH is used only as a source of informal data; the paper gives a way to map informal proofs to formal proofs.)

3. LAMBADA: Backward Chaining for Automated Reasoning in Natural Language (Mehran Kazemi, Najoung Kim, Deepti Bhatia, Xin Xu, Deepak Ramachandran)

(Cites the post-pretraining method from MATH; the paper itself is about backward reasoning.)

4. AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models (Wanjun Zhong, Ruixiang Cui, Yiduo Guo, Yaobo Liang, Shuai Lu, Yanlin Wang, Amin Saied, Weizhu Chen, Nan Duan)

(MATH is part of the benchmark.)

dataset: data/v1

tdg's Issues

MATH.dataset_zf

Description

Error: ModuleNotFoundError: No module named 'MATH' when executing finetune.py.

To Reproduce

I tried to reproduce your results, but running bash run_neg.sh fails with ModuleNotFoundError: No module named 'MATH'.
The error comes from the line from MATH.dataset_zf import MATHCHATFILEDataset in finetune.py.
However, I couldn't find any file or module named MATH in the repository.
Was it replaced by GPT3.5-turbo-MATH or something else?

Looking forward to your reply. Thank you.

peft8 in distill_NCE.py

Description

ModuleNotFoundError: No module named 'peft8'

To Reproduce

Running bash run_NCE.sh fails with ModuleNotFoundError: No module named 'peft8'.
I checked distill_NCE.py and found that it imports two modules, peft and peft8.
I think peft now corresponds to code/peft_NAT, but I can't figure out what peft8 is.

Looking forward to your reply. Thank you.
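
Both reports above come down to an import that Python cannot resolve at run time. A quick way to see which modules are missing before launching a training script is to probe them with importlib. This is only a diagnostic sketch: the function name missing_modules is hypothetical, and the module names are the ones quoted in the two reports. It does not say where MATH or peft8 are supposed to come from.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    missing = []
    for name in names:
        # find_spec returns None when no importer can find the module.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module names quoted in the two issue reports.
    required = ["MATH", "peft", "peft8"]
    for name in missing_modules(required):
        print(f"cannot import: {name}")
```

Run it from the code directory, since that is where the training scripts are started and a module that lives next to finetune.py would only be found from there.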
