RLTF: Reinforcement Learning from Unit Test Feedback

This is the official code for the paper RLTF: Reinforcement Learning from Unit Test Feedback.

Installation

The code requires the dependencies listed in requirements.txt. Install them by following each library's own instructions, or run:

pip install -r requirements.txt

Datasets

  • APPS: Please follow the downloading and preprocessing instructions provided here.
  • MBPP: The dataset is available here.

Download and unzip all files into the data folder.

Models

https://huggingface.co/Harvey6/RLTF_codet5

Processes

Supervised Finetune

  • CodeT5: sh script/train_actor_deepspeed.sh
  • CodeGEN: sh script/train_actor_codegen_deepspeed.sh

Generating Programs Online

  • CodeT5: python script/generate_online_parallel.py
  • CodeGEN: python script/generate_codegen_online_parallel.py

Online RL Finetune

Once online generation has run for a short period and accumulated some samples, start RL finetuning:

  • CodeT5: sh script/train_actor_rl_online_v1_deepspeed.sh
  • CodeGEN: sh script/train_actor_rl_codegen_online_v1_deepspeed.sh

Generate Program, Run Unit Test, Compute pass@k

Generate Program:

  • CodeT5: python script/generate_parallel.py
  • CodeGEN: python script/generate_parallel_codegen.py

Run Unit Test:

  • sh script/run_unit_tests.sh
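
As a rough illustration of what running one unit test involves, the sketch below executes a candidate program on a single stdin/stdout test case and maps the outcome to one of four labels. It is not the repository's implementation: the function name `classify_run`, the label strings, the timeout, and the whitespace-insensitive output comparison are all assumptions made for this example. The four-way split mirrors the kind of outcome categories used in CodeRL-style pipelines.

```python
import subprocess
import sys


def classify_run(source: str, stdin_text: str, expected_stdout: str,
                 timeout: float = 4.0) -> str:
    """Run one candidate program on one test case and label the outcome.

    Hypothetical helper for illustration only; labels are assumed names.
    """
    # Stage 1: a program that does not even parse counts as a compile error.
    try:
        compile(source, "<candidate>", "exec")
    except SyntaxError:
        return "compile_error"

    # Stage 2: execute in a child interpreter so crashes and hangs
    # cannot take down the evaluating process.
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Assumption: a timeout is treated as a runtime error.
        return "runtime_error"

    if proc.returncode != 0:
        return "runtime_error"

    # Stage 3: compare outputs token by token, ignoring whitespace differences.
    if proc.stdout.split() == expected_stdout.split():
        return "passed"
    return "failed_test"
```

For example, `classify_run("print(int(input()) * 2)", "21\n", "42")` returns `"passed"`. A child process isolates crashes but is not a sandbox, so untrusted generated code should still be run in a restricted environment.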

Compute pass@k:

  • python compute_pass_at_k_metric.py
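
For reference, pass@k is commonly computed with the unbiased estimator 1 - C(n-c, k) / C(n, k), where n programs are sampled per problem and c of them pass all unit tests. The sketch below shows that estimator; whether compute_pass_at_k_metric.py uses exactly this form is an assumption, and the function names `pass_at_k` and `mean_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of sampled programs, c: number that pass all tests,
    k: sampling budget being evaluated (1 <= k <= n).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains at least one pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With 10 samples of which 3 pass, `pass_at_k(10, 3, 1)` is approximately 0.3, the plain fraction of passing samples.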

Citation

If you find the paper or the source code useful for your projects, please cite it using the following BibTeX entry:

@misc{liu2023rltf,
      title={RLTF: Reinforcement Learning from Unit Test Feedback}, 
      author={Jiate Liu and Yiqin Zhu and Kaiwen Xiao and Qiang Fu and Xiao Han and Wei Yang and Deheng Ye},
      year={2023},
      eprint={2307.04349},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

License

The code is released under the BSD 3-Clause license; see LICENSE.txt for details.

This code builds on other open-source projects, including CodeRL, APPS, and transformers. We thank the original contributors of these works for open-sourcing their valuable code.

Contributors

liujiate, zyq-scut

rltf's Issues

problems in critic model

Hello, I noticed that you trained a four-class classification model (the critic). What are its accuracy, recall, and F1 score on the APPS test set? And how do you determine when the critic model is ready?
