BiLD: Bi-directional Logits Difference Loss for Large Language Model Distillation

BiLD loss

1. Environment

pip install -r requirements.txt

2. Data

bash scripts/prepare_dataset.sh

The processed data will be organized in this structure:

./benchmarking
├── datasets
│   ├── arc-c
│   ├── arc-e
│   ├── boolq
│   ├── cb
│   ├── copa
│   ├── hellaswag
│   ├── multirc
│   ├── piqa
│   ├── record
│   ├── rte
│   ├── wic
│   ├── winogrande
│   └── wsc
├── evaluation.py
├── __init__.py
└── reformat_data.py

3. Experiments

Note that all of these scripts run evaluation as part of the job.

3.1 SFT teacher

bash scripts/sft/sft_qwen_4b.sh <YOUR_QWEN_4B_PATH>
bash scripts/sft/sft_bloom_7b.sh <YOUR_BLOOM_7B_PATH>

3.2 Build SFT baselines

bash scripts/sft/sft_qwen_1.8b.sh <YOUR_QWEN_1.8B_PATH>
bash scripts/sft/sft_qwen_0.5b.sh <YOUR_QWEN_0.5B_PATH>
bash scripts/sft/sft_bloom_3b.sh <YOUR_BLOOM_3B_PATH>
bash scripts/sft/sft_bloom_1b.sh <YOUR_BLOOM_1B_PATH>

3.3 Distillation

Taking distillation from Qwen1.5-4b to Qwen1.5-0.5b as an example, run the following commands, one per distillation method:

bash scripts/distillation/distil_qwen_4b_to_0.5b.sh vanilla_kl
bash scripts/distillation/distil_qwen_4b_to_0.5b.sh top_kl
bash scripts/distillation/distil_qwen_4b_to_0.5b.sh rkl
bash scripts/distillation/distil_qwen_4b_to_0.5b.sh dkd
bash scripts/distillation/distil_qwen_4b_to_0.5b.sh nkd
bash scripts/distillation/distil_qwen_4b_to_0.5b.sh normkd
bash scripts/distillation/distil_qwen_4b_to_0.5b.sh bild
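
To run all seven variants back to back, the commands above can be generated in a loop. The following is a minimal Python sketch, assuming it is launched from the repository root. `build_commands` and `run_all` are hypothetical helpers that are not part of the repository; only the script path and the method names come from the commands above.

```python
import subprocess

# Script path and method names are copied from the commands above.
SCRIPT = "scripts/distillation/distil_qwen_4b_to_0.5b.sh"
METHODS = ["vanilla_kl", "top_kl", "rkl", "dkd", "nkd", "normkd", "bild"]


def build_commands(script=SCRIPT, methods=METHODS):
    """Return one argv list per distillation method (hypothetical helper)."""
    return [["bash", script, method] for method in methods]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        print("Running:", " ".join(argv))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, check=True)


# Dry run: print the commands without executing them.
for argv in build_commands():
    print(" ".join(argv))
```

Calling `run_all(build_commands())` would then launch the runs sequentially. Stopping at the first failure is a design choice of this sketch, so that a broken run is noticed before the remaining ones start.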

3.4 Results Organization

By default, the results are organized as follows:

./results
├── distillation
│   ├── bloom_7b_to_1b
│   │   ├── bild
│   │   ├── dkd
│   │   ├── nkd
│   │   ├── normkd
│   │   ├── rkl
│   │   ├── top_kl
│   │   └── vanilla_kl
│   ├── bloom_7b_to_3b
│   │   └── ...
│   ├── qwen_4b_to_0.5b
│   │   └── ...
│   └── qwen_4b_to_1.8b
│       └── ...
├── sft_baseline
│   ├── bloom_1b
│   ├── bloom_3b
│   ├── qwen_0.5b
│   └── qwen_1.8b
└── teacher
    ├── bloom_7b
    └── qwen_4b
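
Once several runs have finished, it can be handy to check which of the directories listed above already exist. The following is a minimal sketch, assuming the default `./results` layout shown in the tree. `expected_dirs` and `missing_dirs` are hypothetical helpers, not part of the repository, and the subdirectories elided with `...` in the tree are deliberately not checked.

```python
import tempfile
from pathlib import Path

# Method names as listed under bloom_7b_to_1b in the tree above.
METHODS = ["bild", "dkd", "nkd", "normkd", "rkl", "top_kl", "vanilla_kl"]


def expected_dirs():
    """Relative paths that appear explicitly in the results tree."""
    dirs = [Path("distillation") / "bloom_7b_to_1b" / m for m in METHODS]
    # The contents of these three are elided in the tree, so only the parents are listed.
    dirs += [Path("distillation") / p
             for p in ["bloom_7b_to_3b", "qwen_4b_to_0.5b", "qwen_4b_to_1.8b"]]
    dirs += [Path("sft_baseline") / n
             for n in ["bloom_1b", "bloom_3b", "qwen_0.5b", "qwen_1.8b"]]
    dirs += [Path("teacher") / n for n in ["bloom_7b", "qwen_4b"]]
    return dirs


def missing_dirs(root):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [d for d in expected_dirs() if not (root / d).is_dir()]


# Demo on a temporary directory that contains only the Qwen teacher results.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "teacher" / "qwen_4b").mkdir(parents=True)
    for d in missing_dirs(tmp):
        print("missing:", d)
```

In a real checkout, `missing_dirs("./results")` would report the runs that have not produced output yet.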

4. Overlap Analysis

We separate this step into two sub-steps to avoid running out of GPU memory.

Save teacher logits:

bash scripts/analyze_overlap/save_teacher_logits.sh qwen <PATH_TO_YOUR_QWEN_4B_SFT_LAST_CHECKPOINT> ./results/teacher_logits/
bash scripts/analyze_overlap/save_teacher_logits.sh bloom <PATH_TO_YOUR_BLOOM_7B_SFT_LAST_CHECKPOINT> ./results/teacher_logits/

Calculate overlap. For example, to calculate ${\rm overlap}@32$ for the BiLD loss when distilling from Qwen1.5-4b to Qwen1.5-0.5b:

bash scripts/analyze_overlap/calc_top_acc_overlap.sh qwen 32 ./results/distillation/qwen_4b_to_0.5b/bild ./results/teacher_logits
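
As a rough illustration of what a top-k overlap measures, the sketch below computes, for each token position, the fraction of the teacher's top-k token ids that also appear in the student's top-k, and then averages over positions. This definition is an assumption made for illustration; the exact computation behind `calc_top_acc_overlap.sh` may differ. `overlap_at_k` is a hypothetical helper that works on plain Python lists of toy logits instead of the saved teacher logits.

```python
def topk_indices(logits, k):
    """Return the set of indices of the k largest values in one logit vector."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return set(order[:k])


def overlap_at_k(teacher_logits, student_logits, k):
    """Average share of top-k token ids common to teacher and student.

    Both arguments are lists of per-position logit vectors of equal length.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same positions")
    ratios = []
    for t_row, s_row in zip(teacher_logits, student_logits):
        shared = topk_indices(t_row, k) & topk_indices(s_row, k)
        ratios.append(len(shared) / k)
    return sum(ratios) / len(ratios)


# Toy example: 2 positions, vocabulary of 4 tokens.
teacher = [[2.0, 1.0, 0.5, -1.0], [0.1, 0.2, 0.3, 0.4]]
student = [[1.5, 0.2, 0.9, -2.0], [0.4, 0.3, 0.2, 0.1]]
print(overlap_at_k(teacher, student, k=2))  # 0.25
```

At the first position the two models share one of their top-2 tokens, and at the second they share none, which gives (0.5 + 0.0) / 2 = 0.25. With real models the same idea would normally be applied to batched tensors, which is one reason the teacher logits are saved to disk first.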
