
slic-hf

Reproducing the results of the paper "Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints".

The paper compares different divergence functions for direct preference optimization (DPO).

Results notebook on nbviewer - results.ipynb

Setup

  1. Install poetry.
  2. Clone the repository and install the dependencies:

```bash
git clone https://github.com/somvy/slic-hf && cd slic-hf
poetry install && poetry shell
wandb login
huggingface-cli login
```

  3. Specify your HuggingFace username and the desired SFT model in config.py (an illustrative sketch is shown below).
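A minimal sketch of what these settings might look like; the variable names below are assumptions made for illustration, so check the actual config.py in the repo for the real ones:

```python
# config.py -- illustrative sketch only; the variable names are assumed,
# not taken from the repo's actual config.py.
HF_USERNAME = "your-hf-username"      # HuggingFace account used to push datasets/models
SFT_MODEL = "lvwerra/gpt2-imdb"       # assumed SFT base model (GPT-2 finetuned on IMDB)

# Dataset paths/IDs -- update these after generating your own dataset.
TRAIN_DATASET = f"{HF_USERNAME}/imdb-dpo-pairs"
EVAL_DATASET = f"{HF_USERNAME}/imdb-dpo-eval-prompts"
```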

Dataset

Prompts are the first sentences of movie reviews. A few decoding tricks were used to bias the generations toward positive sentiment (see dataset/generation_config.py). Diverse beam search with a diversity penalty of 50 was used to generate 6 answers per prompt, which were then scored with a reward model. Chosen/rejected pairs were built as (top1, top4/5/6) and (top1/2/3, top6), i.e. 6 pairs per prompt (a sketch of this procedure is shown at the end of this section). The final dataset contains 3600 pairs with a test split of 0.2.

hf link

Additionally, 50 prompts were randomly selected for eval generation - hf link

Use this dataset, or generate your own:

```bash
set -a && source .env && poetry run python dataset/main.py
```

After generation, update the dataset paths in config.py.
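For reference, here is a minimal sketch of the generation-and-pairing procedure described above. The model names and exact settings are assumptions made for illustration; the actual logic lives in dataset/main.py and dataset/generation_config.py.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Assumed models: a GPT-2 finetuned on IMDB and a sentiment classifier as the reward model.
gen_name = "lvwerra/gpt2-imdb"
tokenizer = AutoTokenizer.from_pretrained(gen_name)
model = AutoModelForCausalLM.from_pretrained(gen_name)
reward = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")

prompt = "This movie was"
inputs = tokenizer(prompt, return_tensors="pt")

# Diverse beam search: 6 beams split into 6 groups with a large diversity penalty,
# so the 6 returned continuations differ from each other.
outputs = model.generate(
    **inputs,
    max_new_tokens=60,
    num_beams=6,
    num_beam_groups=6,
    num_return_sequences=6,
    diversity_penalty=50.0,
    pad_token_id=tokenizer.eos_token_id,
)
answers = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Score answers with the reward model and rank them from best to worst.
scores = [r["score"] if r["label"] == "POSITIVE" else -r["score"] for r in reward(answers)]
ranked = [a for _, a in sorted(zip(scores, answers), key=lambda t: t[0], reverse=True)]

# Build (chosen, rejected) pairs: (top1, top4/5/6) and (top1/2/3, top6) -> 6 pairs per prompt.
pairs = [(ranked[0], ranked[i]) for i in (3, 4, 5)] + \
        [(ranked[i], ranked[5]) for i in (0, 1, 2)]
```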

Train

  1. Specify the training arguments, DPOTrainer params, and run_name in train_dpo/train.py (a minimal sketch is shown after this list).
  2. Run:

```bash
set -a && source .env && poetry run python train_dpo/train.py
```

  3. (Optional) Generate answers for the eval dataset. Specify the generation params and the desired run_name in train_dpo/generate.py, then run:

```bash
set -a && source .env && poetry run python train_dpo/generate.py
```
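A minimal sketch of such a training script, assuming a recent version of trl; the actual script in train_dpo/train.py (and the divergence-specific losses used in this reproduction) may be configured differently.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Assumed SFT base model and dataset path -- take the real ones from config.py.
model_name = "lvwerra/gpt2-imdb"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Preference dataset with "prompt", "chosen" and "rejected" columns.
train_dataset = load_dataset("json", data_files="pairs.jsonl")["train"]  # hypothetical path

args = DPOConfig(
    output_dir="dpo-sigmoid-beta0.1",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=1e-4,        # 1e-4 for sigmoid/hinge, 1e-5 for the other divergences
    beta=0.1,                  # strength of the divergence constraint
    loss_type="sigmoid",       # "hinge" is also built in; other divergences need a custom loss
    report_to="wandb",
    run_name="sigmoid-beta0.1",
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,   # older trl versions take `tokenizer=` instead
)
trainer.train()
```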

Experimental setup

The base model is GPT-2 finetuned on IMDB reviews.
Training: 3 epochs, batch size 4, learning rate 1e-4 for the sigmoid and hinge losses, 1e-5 for the others.

Weights and logs

Model weights and Wandb report links for each loss and $\beta$:

- Hinge - link
  - $\beta = 10$ - link
  - $\beta = 1$ - link
  - $\beta = 0.5$ - link
  - $\beta = 0.1$ - link
- Sigmoid - link
  - $\beta = 10$ - link
  - $\beta = 1$ - link
  - $\beta = 0.5$ - link
  - $\beta = 0.1$ - link
- JS divergence - link
  - $\beta = 1$ - link
  - $\beta = 0.1$ - link
- Forward KL - link
  - $\beta = 0.1$ - link
  - $\beta = 1$ - link
- $\alpha$-divergence - link
  - $\alpha = 0.3, \beta = 1$ - link
  - $\alpha = 0.3, \beta = 0.1$ - link
  - $\alpha = 0.5, \beta = 1$ - link
  - $\alpha = 0.5, \beta = 0.1$ - link
  - $\alpha = 0.7, \beta = 1$ - link
  - $\alpha = 0.7, \beta = 0.1$ - link

