OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning

Adapting to the MATH dataset

Run the commands:

git clone https://github.com/saultaut/OVM.git
cd OVM/
pip install -r requirements_runpod.txt
bash scripts/metamath/generate_metamath.sh

The output will be saved to data/metamath/model_generation/train_500/, in files named like responses_n1_*.jsonl.
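To inspect these generations, a minimal sketch like the one below collects every responses_n1_*.jsonl file in the output directory. It assumes one JSON object per line; the record fields are not documented here, so the sketch does not interpret them, and load_generations is a hypothetical helper rather than part of the repository.

```python
import glob
import json
import os


def load_generations(out_dir="data/metamath/model_generation/train_500"):
    """Read every responses_n1_*.jsonl file in out_dir into a list of dicts.

    Hypothetical helper: assumes one JSON object per line and skips blank lines.
    """
    records = []
    for path in sorted(glob.glob(os.path.join(out_dir, "responses_n1_*.jsonl"))):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # ignore empty lines between records
                    records.append(json.loads(line))
    return records
```

Files are read in sorted name order, so the result is deterministic across runs.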

Debugging the verifier code

Connected remotely to a RunPod instance (RTX 3080 GPU) using VS Code; everything worked. This debug run uses the small OPT-125m model.

python train_verifier_debug_metamath.py

Train the verifier on the MetaMath dataset

Run the commands:

git clone https://github.com/saultaut/OVM.git
cd OVM/
pip install -r requirements_runpod.txt
bash scripts/metamath/train_verifier_metamath.sh

The output will be saved to /models/metamath/verifiers/

Paper

Code, metrics, and models for the paper Outcome-supervised Verifiers for Planning in Mathematical Reasoning

The key technical implementations (utils/sampling.py):

  1. Value-guided beam search: step-level beam search guided by a value model

  2. Batched generation with a calculator, using a cache (2-3 times faster than a naive implementation)
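To make step-level, value-guided beam search concrete, here is a minimal sketch. It is not the code in utils/sampling.py: propose_steps (standing in for the generator), value (standing in for the value model), and is_finished are hypothetical callables, and the calculator and its cache are omitted. At each step, every unfinished beam is extended with sampled next steps, all partial paths are scored by the value model, and the top n_beam are kept.

```python
from typing import Callable, List


def value_guided_beam_search(
    question: str,
    propose_steps: Callable[[str, str, int], List[str]],  # hypothetical generator wrapper
    value: Callable[[str, str], float],                   # hypothetical value-model wrapper
    is_finished: Callable[[str], bool],
    n_beam: int = 3,
    n_sample_per_beam: int = 4,
    max_steps: int = 10,
) -> str:
    beams = [""]  # partial solutions, starting from an empty prefix
    for _ in range(max_steps):
        candidates = []
        for prefix in beams:
            if is_finished(prefix):
                candidates.append(prefix)  # finished paths stay in the pool
                continue
            # Sample several candidate next steps for this partial path
            for step in propose_steps(question, prefix, n_sample_per_beam):
                candidates.append(prefix + step)
        # Score every partial path with the value model and keep the best n_beam
        candidates.sort(key=lambda path: value(question, path), reverse=True)
        beams = candidates[:n_beam]
        if all(is_finished(b) for b in beams):
            break
    # Return the highest-valued path among the surviving beams
    return max(beams, key=lambda path: value(question, path))
```

Because scoring happens after every step rather than only on complete solutions, low-value partial paths are pruned early; this is the planning behavior the value model is meant to provide.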

Models and Data

Model          | Dataset | Link
OVM-Llama2-7B  | GSM8K   | parameters
OVM-Mistral-7B | GSM8K   | parameters

See the training data of our value models (generated by the generators) in dataset

Notes on the code

  1. Directories
  • configs: for model training with accelerate
  • data: benchmarks, and generator-created data for training the value model
  • eval_results: metrics and responses
    • generator: generator-only (greedy, self-consistency, or pass@k)
    • verifier: ORM accuracy
    • generator_with_verifier: guided beam search, i.e. OVM and PRM
  • scripts: scripts for training and inference
  • utils: functions and classes
  2. target_set
  • GSM8K: train and test, which correspond to the training set and test set, respectively
  • Game of 24: train and mid
    • train: the first 900 problems
    • mid: problems 901-1000
  3. Scripts for GSM8K and Game of 24 are similar. For simplicity, we use GSM8K as the example below; you can run the same pipeline on Game of 24 by replacing gsm8k with game24.
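The target_set conventions above can be sketched as a small path helper plus the Game of 24 split. Both functions are hypothetical illustrations: the README only names data/gsm8k/train.jsonl and data/game24/train.jsonl explicitly, so the data/{task}/{target_set}.jsonl layout for test and mid is an assumption. The split follows the README's 1-based numbering (train = problems 1-900, mid = problems 901-1000).

```python
from typing import List, Sequence, Tuple

# Hypothetical mapping of each task to the target_set values named in this README
VALID_TARGET_SETS = {"gsm8k": {"train", "test"}, "game24": {"train", "mid"}}


def data_path(task: str, target_set: str) -> str:
    """Build a benchmark path such as data/gsm8k/test.jsonl (assumed layout)."""
    if target_set not in VALID_TARGET_SETS.get(task, set()):
        raise ValueError(f"unknown task/target_set: {task}/{target_set}")
    return f"data/{task}/{target_set}.jsonl"


def split_game24(problems: Sequence) -> Tuple[List, List]:
    """train = problems 1-900, mid = problems 901-1000 (1-based numbering)."""
    # 1-based problem 901 sits at 0-based index 900
    return list(problems[:900]), list(problems[900:1000])
```

Switching the pipeline from GSM8K to Game of 24 then amounts to changing the task argument, mirroring the "replace gsm8k with game24" convention.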

Training

Train the generator

Training data for generator:

  • GSM8K: data/gsm8k/train.jsonl, from OpenAI GSM8K
  • Game of 24: data/game24/train.jsonl, the first 900 problems in data/game24/24.csv (from ToT) with enumerated solutions
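For GSM8K, the original OpenAI release stores each example as a JSON line with "question" and "answer" fields, where the answer ends with "#### <final answer>". Assuming data/gsm8k/train.jsonl keeps that format, a minimal sketch for reading questions and gold answers looks like this (both functions are hypothetical helpers, not repository code).

```python
import json


def gsm8k_final_answer(answer_field: str) -> str:
    """Return the text after the '####' marker in a GSM8K answer, commas removed."""
    if "####" not in answer_field:
        raise ValueError("no '####' marker found")
    return answer_field.split("####")[-1].strip().replace(",", "")


def load_gsm8k(path="data/gsm8k/train.jsonl"):
    """Yield (question, final_answer) pairs from a GSM8K-format jsonl file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                ex = json.loads(line)
                yield ex["question"], gsm8k_final_answer(ex["answer"])
```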

To run the script train_generator.sh (under scripts/gsm8k or scripts/game24), you should first set WANDB_API_KEY, WANDB_ENTITY, model_name_or_path, and save_dir. The generator is named with save_generator_id.

cd OVM
bash scripts/gsm8k/train_generator.sh

Train the OVM

Generation

First, use the generator generator_id to generate n_solutions solutions for each question in the training set:

cd OVM
bash scripts/gsm8k/generate.sh

Before running, configure the path of your generator checkpoint, model_name_or_path, and set --target_set train.

The output will be saved to data/gsm8k/model_generation/

Training

Train the OVM using train_verifier.sh. First set WANDB_API_KEY, WANDB_ENTITY, save_dir, and checkpoint_dir (the path of the generator checkpoint). The verifier is named with save_verifier_id.

cd OVM
bash scripts/gsm8k/train_verifier.sh
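"Outcome-supervised" means the value model learns only from whether each generated solution reaches the correct final answer, not from per-step annotations. The sketch below shows one plausible way to derive such labels from the generated solutions; extract_prediction (last number in the text) and outcome_labels are hypothetical, and the repository's actual answer parsing and label format may differ.

```python
import re
from typing import List, Optional


def extract_prediction(solution: str) -> Optional[str]:
    """Hypothetical parser: treat the last number in the text as the final answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", solution.replace(",", ""))
    return numbers[-1] if numbers else None


def outcome_labels(solutions: List[str], gold: str) -> List[int]:
    """Return 1 for each sampled solution whose final answer equals gold, else 0."""
    labels = []
    for sol in solutions:
        pred = extract_prediction(sol)
        # Compare numerically so that "5.0" and "5" count as the same answer
        labels.append(int(pred is not None and float(pred) == float(gold)))
    return labels
```

Under this assumption, each solution's single 0/1 label is the training target for every partial path of that solution, which is what lets a model trained on outcomes act as a value estimate during step-level search.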

Inference

Value-Guided Beam Search

Configure your generator checkpoint path model_name_or_path and verifier checkpoint path verifier_model_name_or_path in eval_step_beam.sh:

cd OVM
bash scripts/gsm8k/eval_step_beam.sh

(When dedup_mode=1, the search prioritizes linguistically distinct candidates: if the sorted candidates are ['a', 'a', 'b', 'b', 'c'] and n_beam=3, it selects ['a', 'b', 'c'] rather than ['a', 'a', 'b'].)
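The dedup_mode behavior can be reproduced with a short selection function. This is a sketch, not the repository's implementation: select_beams is a hypothetical name, "linguistically different" is approximated by exact string equality, and falling back to duplicates (in score order) when there are fewer than n_beam distinct candidates is an assumption.

```python
from typing import List, Sequence


def select_beams(sorted_candidates: Sequence[str], n_beam: int, dedup_mode: int = 1) -> List[str]:
    """Pick n_beam candidates from a list already sorted by value (best first)."""
    if dedup_mode != 1:
        return list(sorted_candidates[:n_beam])  # plain top-k
    seen = set()
    unique, duplicates = [], []
    for cand in sorted_candidates:
        if cand in seen:
            duplicates.append(cand)
        else:
            seen.add(cand)
            unique.append(cand)
    # Prefer distinct candidates; fill any remaining slots with duplicates in score order
    return (unique + duplicates)[:n_beam]
```

For example, select_beams(['a', 'a', 'b', 'b', 'c'], 3) gives ['a', 'b', 'c'], while dedup_mode=0 gives ['a', 'a', 'b'].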

The output will be saved to eval_results/gsm8k/generator_with_verifier/test (or eval_results/game24/generator_with_verifier/mid)

Vanilla Sampling with ORM

  1. First, sample the data: configure the generator checkpoint model_name_or_path and set --target_set test

    cd OVM
    bash scripts/gsm8k/generate.sh
  2. Then call the ORM to score and rerank the samples: configure the verifier checkpoint verifier_model_name_or_path

    cd OVM
    bash scripts/gsm8k/eval_with_verifier.sh

The output will be saved to eval_results/gsm8k/generator_with_verifier/test
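Reranking with an ORM amounts to best-of-n selection: among the sampled solutions, keep the one the verifier scores highest. The sketch below contrasts that with the self-consistency baseline (majority vote over final answers) listed under eval_results/generator. Both functions are hypothetical illustrations that operate on already-extracted final answers and precomputed scores; they do not call the repository's scripts.

```python
from collections import Counter
from typing import Sequence


def rerank_with_orm(answers: Sequence[str], scores: Sequence[float]) -> str:
    """Best-of-n: return the sampled answer whose ORM score is highest."""
    if not answers or len(answers) != len(scores):
        raise ValueError("need one score per answer and at least one answer")
    best = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best]


def majority_vote(answers: Sequence[str]) -> str:
    """Self-consistency baseline: return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]
```

The two can disagree: with answers ["12", "15", "12"] and scores [0.2, 0.9, 0.4], the ORM picks "15" while majority voting picks "12".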

Greedy

Configure your generator checkpoint path model_name_or_path:

cd OVM
bash scripts/gsm8k/greedy_eval.sh

The output will be saved to eval_results/gsm8k/generator/test

Citation

@misc{yu2023outcomesupervised,
      title={Outcome-supervised Verifiers for Planning in Mathematical Reasoning}, 
      author={Fei Yu and Anningzhe Gao and Benyou Wang},
      year={2023},
      eprint={2311.09724},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

Contributors

saultaut, oakyu, wabyking