RLTF: Reinforcement Learning from Unit Test Feedback

This is the official code for the paper RLTF: Reinforcement Learning from Unit Test Feedback.

Installation

The code requires the dependencies listed in requirements.txt. Install them with:

pip install -r requirements.txt

Datasets

APPS: Please follow the downloading and preprocessing instructions provided here.
MBPP: The dataset is available here.

Download and unzip all files into the data folder.

Models

https://huggingface.co/Harvey6/RLTF_codet5

Processes

Supervised Finetune

CodeT5: sh script/train_actor_deepspeed.sh
CodeGEN: sh script/train_actor_codegen_deepspeed.sh

Generating Programs Online

CodeT5: python script/generate_online_parallel.py
CodeGEN: python script/generate_codegen_online_parallel.py

Online RL Finetune

Once online generation has been running for a short period and has accumulated a certain number of samples, start RL finetuning:

CodeT5: sh script/train_actor_rl_online_v1_deepspeed.sh
CodeGEN: sh script/train_actor_rl_codegen_online_v1_deepspeed.sh

Generate Programs, Run Unit Tests, Compute pass@k

Generate Programs:

CodeT5: python script/generate_parallel.py
CodeGEN: python script/generate_parallel_codegen.py

Run Unit Tests:

sh script/run_unit_tests.sh

Compute pass@k:

python compute_pass_at_k_metric.py
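
For readers unfamiliar with the metric, pass@k is commonly computed with the unbiased estimator introduced with the Codex evaluation (Chen et al., 2021): for a problem with n generated programs of which c pass all unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The sketch below illustrates that estimator only. Whether compute_pass_at_k_metric.py uses exactly this formula is an assumption, and the function names and the (n, c) input format are hypothetical, not taken from this repository.

```python
import math
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem.

    n: number of programs generated for the problem
    c: number of those programs that pass all unit tests
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing programs: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing program.
    all_fail = math.comb(n - c, k) / math.comb(n, k)
    return 1.0 - all_fail


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; `results` holds hypothetical (n, c) pairs."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("results must not be empty")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical counts: three problems, 10 samples each.
    demo = [(10, 3), (10, 0), (10, 10)]
    for k in (1, 5):
        print(f"pass@{k} = {mean_pass_at_k(demo, k):.4f}")
```

Using exact binomial coefficients via math.comb avoids the floating-point overflow a naive factorial-based version can hit for large n.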

Citation

If you find the paper or the source code useful in your projects, please cite it with the following BibTeX entry:

@misc{liu2023rltf,
      title={RLTF: Reinforcement Learning from Unit Test Feedback}, 
      author={Jiate Liu and Yiqin Zhu and Kaiwen Xiao and Qiang Fu and Xiao Han and Wei Yang and Deheng Ye},
      year={2023},
      eprint={2307.04349},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

License

The code is released under the BSD 3-Clause license; see LICENSE.txt for details.

This code builds on other open-source projects, including CodeRL, APPS, and transformers. We thank the original contributors of these works for open-sourcing their code.

rltf's Issues

Is there any plan to apply RLTF to current state-of-the-art code models, such as StarCoder or other LLMs?

Can this method be transferred to general LLMs?

I notice that your work focuses only on code LLMs. I would like to use your method with a general LLM to fix factual errors in the model's responses, and I would sincerely appreciate your advice.

Could I get the final version of the data that contains 'pass_ratio', 'error_line', and 'reward_type'?

In the data in this repo, gen_solutions_critic_scores.pkl does not contain 'pass_ratio', 'error_line', or 'reward_type'.

Hello, I noticed that you have trained a four-class classification model (Critic).
What are the accuracy, recall, and F1 score of this classifier on the APPS test set? How do you determine whether the critic model is ready?