
mergelm's Introduction

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

This repository is built for the paper Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch. 🔔 If you have any questions or suggestions, please feel free to let us know. You can directly email Le Yu using the email address [email protected] or post an issue on this repository.

💥 News 💥

Overview

In this work, we uncover that Language Models (LMs), either encoder- or decoder-based, can obtain new capabilities by assimilating the parameters of homologous models without the need for retraining or GPUs.

  1. We introduce a novel operation called DARE (Drop And REscale) to directly set most (90% or even 99%) of the delta parameters to zero without affecting the capabilities of SFT LMs (a minimal sketch follows this list).
  2. We use DARE as a general preprocessing technique to sparsify the delta parameters of multiple SFT homologous models, and then merge them into a single model by parameter averaging.
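
As a rough illustration, the core DARE operation on a single parameter tensor can be sketched in PyTorch as follows (a minimal sketch for intuition, not the repository's actual implementation; the function name and signature are ours):

import torch

def dare(pretrained_weight: torch.Tensor, finetuned_weight: torch.Tensor, drop_rate: float = 0.9) -> torch.Tensor:
    """Drop And REscale: randomly zero out delta parameters, then rescale the survivors."""
    # Delta parameters: what supervised fine-tuning changed relative to the pre-trained backbone.
    delta = finetuned_weight - pretrained_weight
    # Keep each delta entry independently with probability (1 - drop_rate).
    keep_mask = torch.bernoulli(torch.full_like(delta, 1.0 - drop_rate))
    # Rescale the kept entries by 1 / (1 - drop_rate) so the expected delta is unchanged.
    sparse_delta = keep_mask * delta / (1.0 - drop_rate)
    # Add the sparsified delta back onto the pre-trained weights.
    return pretrained_weight + sparse_delta

In practice this is applied independently to every parameter tensor of the SFT model.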

The workflow is shown in the figure below.

By conducting extensive experiments, we find that:

  1. DARE is effective for SFT models whose delta parameter value ranges are relatively small (e.g., within 0.005), and it can eliminate even 99% of the delta parameters. Larger models can tolerate a higher proportion of dropped parameters, indicating that SFT learns an extremely sparse set of delta parameters and that nearly all abilities originate from the pre-trained LMs. See (a) in the figure below.
  2. DARE can merge multiple task-specific LMs into a single LM with diverse abilities that retains the functionalities of all the SFT models. For instance, merging WizardLM and WizardMath increases WizardLM's GSM8K accuracy from 2.2 to 66.3, preserving its instruction-following ability while surpassing WizardMath's original score of 64.2. See (b) in the figure below.

Language Models and Datasets

We conduct experiments on both encoder- and decoder-based LMs.

  • For encoder-based LMs, we choose bert-base-uncased and roberta-base as pre-trained backbones. Eight datasets from the GLUE benchmark are used, including CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, and RTE.
  • For decoder-based LMs, we choose LLaMA, Llama 2, and Code Llama as pre-trained backbones. WizardLM, WizardMath, WizardCoder-Python, and Code Alpaca are used as fine-tuned models. We evaluate three tasks on five datasets: AlpacaEval (instruction following), GSM8K and MATH (mathematical reasoning), and HumanEval and MBPP (code generation).

Note that we provide the GSM8K, MATH, and MBPP datasets in the math_code_data/ folder; they are obtained from the WizardLM repository. Other datasets can be downloaded automatically by our code. Language models can be downloaded either manually or by our code.

You can also modify cache_dir in the utils/load_config.py file to specify your own path for saving datasets and models.
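
For reference, this is a single path setting; a hypothetical excerpt (the path below is a placeholder, not the repository's default):

# utils/load_config.py (excerpt)
cache_dir = "/your/own/path/to/cache"  # downloaded datasets and language models are stored here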

Model Merging Methods

We provide implementations of five model merging methods in this repository: Average Merging, Task Arithmetic, Fisher Merging, RegMean, and TIES-Merging. We also combine the proposed DARE with these methods to improve merging performance (see the sketch below).
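
As an informal sketch of how DARE combines with these methods, the per-tensor view looks roughly as follows (illustrative PyTorch only, assuming all models share the same pre-trained backbone; function names are ours, not the repository's API):

import torch

def dare_delta(pretrained: torch.Tensor, finetuned: torch.Tensor, drop_rate: float) -> torch.Tensor:
    """Sparsify one model's delta parameters with DARE (drop and rescale)."""
    delta = finetuned - pretrained
    keep_mask = torch.bernoulli(torch.full_like(delta, 1.0 - drop_rate))
    return keep_mask * delta / (1.0 - drop_rate)

def merge_with_dare(pretrained: torch.Tensor, finetuned_weights: list, drop_rate: float = 0.2, scaling_coefficient: float = 1.0) -> torch.Tensor:
    """Merge several fine-tuned models after DARE preprocessing, Task-Arithmetic style."""
    deltas = [dare_delta(pretrained, w, drop_rate) for w in finetuned_weights]
    # Task Arithmetic sums the sparsified deltas and scales the sum by a coefficient;
    # Average Merging would take their mean instead.
    merged_delta = scaling_coefficient * torch.stack(deltas).sum(dim=0)
    return pretrained + merged_delta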

Environments

PyTorch 2.0.1, transformers 4.33.1, datasets 2.13.1, vllm 0.1.4, human_eval, numpy, and tqdm.

Executing Scripts for Encoder-based LMs

For encoder-based LMs, we first fine-tune them on the GLUE benchmark (both single-task and multi-task settings are supported) and then run inference with them. We also provide scripts for merging encoder-based LMs with five model merging methods.

Scripts for Fine-Tuning on GLUE

  • Example of fine-tuning roberta-base on the CoLA dataset under the single-task setting:
python train_plms_glue.py --language_model_name roberta-base --dataset_name cola --learning_rate 1e-5 --num_runs 5
  • Example of fine-tuning roberta-base on the CoLA and RTE datasets under the multi-task setting:
python train_plms_glue.py --language_model_name roberta-base --dataset_name cola --multitask_training --auxiliary_dataset_name rte --learning_rate 1e-5 --num_runs 5

Scripts for Inference with DARE and Other Variants

  • Example of direct inference on roberta-base (drop rate 0.0):
python inference_plms_glue.py --language_model_name roberta-base --weight_mask_rate 0.0
  • Example of inference on roberta-base with DARE (drop rate 0.9):
python inference_plms_glue.py --language_model_name roberta-base --weight_mask_rate 0.9 --use_weight_rescale
  • Example of inference on roberta-base with DropOnly (drop rate 0.9):
python inference_plms_glue.py --language_model_name roberta-base --weight_mask_rate 0.9
  • Example of inference on roberta-base with magnitude-based pruning (drop rate 0.9):
python inference_plms_glue.py --language_model_name roberta-base --weight_mask_rate 0.9 --mask_strategy magnitude
  • Example of inference on roberta-base with masking fine-tuned parameters (drop rate 0.9; see the sketch after these examples):
python inference_plms_glue.py --language_model_name roberta-base --weight_mask_rate 0.9 --use_weight_rescale --weight_format finetuned_weight
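
The inference variants above differ only in how the mask over delta parameters is built (randomly for DARE and DropOnly, by magnitude for pruning) and whether the kept entries are rescaled. A simplified conceptual sketch (not the repository's code; the helper name is ours):

import torch

def mask_delta(delta: torch.Tensor, drop_rate: float, strategy: str = "random", rescale: bool = True) -> torch.Tensor:
    """Sparsify delta parameters by random dropping or magnitude-based pruning."""
    if strategy == "random":
        # DARE: random drop with rescaling; DropOnly: random drop without rescaling.
        keep_mask = torch.bernoulli(torch.full_like(delta, 1.0 - drop_rate))
    elif strategy == "magnitude":
        # Magnitude-based pruning: keep the entries with the largest absolute values.
        num_kept = max(1, int(delta.numel() * (1.0 - drop_rate)))
        threshold = delta.abs().flatten().topk(num_kept).values.min()
        keep_mask = (delta.abs() >= threshold).float()
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    masked = keep_mask * delta
    return masked / (1.0 - drop_rate) if rescale else masked

With --weight_format finetuned_weight, the same masking is applied to the fine-tuned weights themselves rather than to the delta parameters.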

Scripts for Merging Models

  • Example of merging pairs of fine-tuned roberta-base models with Average Merging:
python merge_plms_glue.py --merging_method_name average_merging --language_model_name roberta-base
  • Example of merging pairs of fine-tuned roberta-base models with Fisher Merging:
python merge_plms_glue.py --merging_method_name fisher_merging --normalize_fisher_weight --language_model_name roberta-base
  • Example of merging pairs of fine-tuned roberta-base models with Average Merging and DARE:
python merge_plms_glue.py --merging_method_name mask_merging --use_weight_rescale --language_model_name roberta-base --mask_apply_method average_merging

Executing Scripts for Decoder-based LMs

Since the decoder-based LMs we use have already been fine-tuned, they can be directly utilized for inference. We also provide scripts to merge decoder-based LMs with two model merging methods (Average Merging and Task Arithmetic).

Scripts for Inference with DARE and Other Variants

  • Example of direct inference on WizardMath-7B-V1.0 on GSM8K (drop rate 0.0):
python inference_llms_instruct_math_code.py --dataset_name gsm8k --finetuned_model_name WizardMath-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.0
  • Example of inference on WizardMath-7B-V1.0 on GSM8K with DARE (drop rate 0.9):
python inference_llms_instruct_math_code.py --dataset_name gsm8k --finetuned_model_name WizardMath-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.9 --use_weight_rescale
  • Example of inference on WizardMath-7B-V1.0 on GSM8K with DropOnly (drop rate 0.9):
python inference_llms_instruct_math_code.py --dataset_name gsm8k --finetuned_model_name WizardMath-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.9
  • Example of inference on WizardMath-7B-V1.0 on GSM8K with magnitude-based pruning (drop rate 0.9):
python inference_llms_instruct_math_code.py --dataset_name gsm8k --finetuned_model_name WizardMath-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.9 --mask_strategy magnitude
  • Example of inference on WizardMath-7B-V1.0 on GSM8K with masking fine-tuned parameters (drop rate 0.9):
python inference_llms_instruct_math_code.py --dataset_name gsm8k --finetuned_model_name WizardMath-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.9 --use_weight_rescale --weight_format finetuned_weight

Scripts for Merging Models

  • Example of merging WizardLM-13B-V1.2 and WizardMath-13B-V1.0 with Average Merging:
python merge_llms_instruct_math_code.py --merge_instruct --merge_math --merging_method_name average_merging --tensor_parallel_size 1
  • Example of merging WizardLM-13B-V1.2 and WizardMath-13B-V1.0 with Task Arithmetic:
python merge_llms_instruct_math_code.py --merge_instruct --merge_math --merging_method_name task_arithmetic --scaling_coefficient 1.0 --tensor_parallel_size 1
  • Example of merging WizardLM-13B-V1.2 and WizardMath-13B-V1.0 with Average Merging and DARE (drop rate 0.2):
python merge_llms_instruct_math_code.py --merge_instruct --merge_math --merging_method_name mask_merging --use_weight_rescale --weight_mask_rate 0.2 --mask_apply_method average_merging --tensor_parallel_size 1

Note 1: When merging decoder-based LMs, the number of GPUs to allocate equals num_models_to_merge * tensor_parallel_size. For example, to merge WizardLM-13B-V1.2 and WizardMath-13B-V1.0 with tensor_parallel_size == 1, we should allocate 2 * 1 = 2 GPUs.

Note 2: If vllm raises an "AssertionError: data parallel group is already initialized" error on your device, please try running direct_inference_merged_llms_instruct_math_code.py with the corresponding settings. For example, if this error occurs when merging WizardLM-13B-V1.2 and WizardMath-13B-V1.0 with Average Merging and DARE (drop rate 0.2), please run the following commands to evaluate the instruction- and math-related tasks:

python direct_inference_merged_llms_instruct_math_code.py --merge_instruct --merge_math --merging_method_name mask_merging --use_weight_rescale --weight_mask_rate 0.2 --mask_apply_method average_merging --tensor_parallel_size 1 --evaluate_task instruct
python direct_inference_merged_llms_instruct_math_code.py --merge_instruct --merge_math --merging_method_name mask_merging --use_weight_rescale --weight_mask_rate 0.2 --mask_apply_method average_merging --tensor_parallel_size 1 --evaluate_task math

Evaluation Process for AlpacaEval, HumanEval and MBPP

For AlpacaEval, HumanEval, and MBPP, our code stores the generated files; please additionally run the following evaluation commands to obtain the final metrics.

  • For AlpacaEval: we use chatgpt_fn in the alpaca_eval repository to compute the win rate. First, follow the alpaca_eval repository to install the environment. Then, to evaluate the generated WizardLM-13B-V1.2_inference_mask_0.2_rescale_True.json file, run
alpaca_eval --model_outputs ./save_gen_instruct_responses_results/alpaca_eval/WizardLM-13B-V1.2_inference_mask_0.2_rescale_True.json --annotators_config chatgpt_fn --name WizardLM-13B-V1.2_inference_mask_0.2_rescale_True
  • For HumanEval: first, follow the human-eval repository to install the environment. Then, to evaluate the generated WizardCoder-Python-13B-V1.0_inference_mask_0.2_rescale_True.jsonl file, run
evaluate_functional_correctness ./save_gen_codes_results/human_eval/WizardCoder-Python-13B-V1.0_inference_mask_0.2_rescale_True.jsonl
  • For MBPP: first, follow the bigcode-evaluation-harness repository to install the environment. Then, to evaluate the generated WizardCoder-Python-13B-V1.0_inference_mask_0.2_rescale_True.jsonl file, run
accelerate launch ./bigcode-evaluation-harness/main.py --tasks mbpp --allow_code_execution --load_generations_path ./save_gen_codes_results/mbpp/WizardCoder-Python-13B-V1.0_inference_mask_0.2_rescale_True.jsonl

Acknowledgments

We are grateful to the authors of WizardLM for making their project code publicly available.

Citation

Please consider citing our paper when using this project.

@inproceedings{yu2024language,
  title={Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch},
  author={Yu, Le and Yu, Bowen and Yu, Haiyang and Huang, Fei and Li, Yongbin},
  booktitle={International Conference on Machine Learning},
  year={2024},
  organization={PMLR}
}


mergelm's People

Contributors

eltociear, yule-BUAA


mergelm's Issues

Llama model support

Can the merge_llms_instruct_math_code script be applied to merge LLMs other than WizardMath? For example, how can we merge two LLaMA models?

About the delta weights

I have also been working on model merging recently and came across your paper; nice work. I have a question I hope you can answer:
Looking at the code in https://github.com/yule-BUAA/MergeLM/blob/main/inference_llms_instruct_math_code.py, if the input is finetuned_weight, the create_llm function does not seem to compute the delta. Is DARE applied directly on the fine-tuned model's weights?

PEFT integration of DARE method

Hello,

Thank you for the great work on the DARE merging method ✨! Inspired by it, we are planning to integrate the DARE merging method for the LoRA method in PEFT with PR huggingface/peft#1364. One main use case of the PEFT integration is that it works in an online fashion at runtime instead of the offline mode of merging state dicts. It would be great if you could review it and link the integration in your README once the integration PR is merged.

Question about encoder-based model merging

Hi, thanks for your outstanding work on model merging. I find that you only merge the weights of two GLUE tasks in merge_plms_glue.py.
Can DARE merge the weights of all eight tasks into one model?

Questions about randomly setting delta parameters to zero

Dear author, thanks for your work.

I noticed that in your paper, you randomly set p% of the delta parameters to zero. The Theoretical Analysis section guarantees that the expectation of the original SFT delta parameters equals the expectation after DARE randomly drops some parameters and rescales the rest.
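
For reference, the property in question can be written as follows (our notation, not necessarily the paper's): with drop rate $p$, each delta parameter $\delta_i$ is kept with probability $1-p$ and rescaled by $1/(1-p)$, i.e.

$$\hat{\delta}_i = \frac{m_i\,\delta_i}{1-p}, \qquad m_i \sim \mathrm{Bernoulli}(1-p), \qquad \mathbb{E}[\hat{\delta}_i] = \frac{(1-p)\,\delta_i}{1-p} = \delta_i .$$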

I wonder whether expectation equality alone is sufficient. For example, random dropping might erase some parameters that are important for downstream tasks, since according to the paper DARE can erase 90% of the delta parameters.

ValueError: BuilderConfig 'rte' not found. Available: ['default']

I had to manually download the GLUE dataset from the GLUE-baselines git repo and put it under the directory specified by cache_dir in utils/load_config.py. However, when I executed python train_plms_glue.py --language_model_name roberta-base --dataset_name cola --multitask_training --auxiliary_dataset_name rte --learning_rate 1e-5 --num_runs 5, an error occurred.

The error log is as follows:

INFO:root:********** Run starts. **********
INFO:root:configuration is Namespace(dataset_name='rte_cola', auxiliary_dataset_name='rte', language_model_name='roberta-base', multitask_training=True, batch_size=16, num_epochs=10, learning_rate=1e-05, gpu=0, num_runs=5, device='cuda:0', target_dataset_name='cola', save_model_dir='./save_models/rte_cola/roberta-base_lr1e-05')
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /roberta-base/resolve/main/tokenizer_config.json HTTP/1.1" 200 0
Resolving data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 157614.76it/s]
Traceback (most recent call last):
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/train_plms_glue.py", line 93, in <module>
    glue_data_loader.load_multitask_datasets(dataset_names=dataset_names, train_split_ratio_for_val=0.1, max_seq_length=128)
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/utils/glue_data_loader.py", line 110, in load_multitask_datasets
    multiple_datasets = [self.load_dataset(dataset_name=dataset_name, train_split_ratio_for_val=train_split_ratio_for_val,
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/utils/glue_data_loader.py", line 110, in <listcomp>
    multiple_datasets = [self.load_dataset(dataset_name=dataset_name, train_split_ratio_for_val=train_split_ratio_for_val,
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/utils/glue_data_loader.py", line 76, in load_dataset
    dataset = load_dataset(path=os.path.join(cache_dir, "glue"), name=dataset_name)
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/load.py", line 2556, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/load.py", line 2265, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/builder.py", line 371, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/builder.py", line 592, in _create_builder_config
    raise ValueError(

At the same time, when I tried to execute python train_plms_glue.py --language_model_name roberta-base --dataset_name cola --learning_rate 1e-5 --num_runs 5, another error occurred:

INFO:root:********** Run starts. **********
INFO:root:configuration is Namespace(dataset_name='cola', auxiliary_dataset_name='cola', language_model_name='roberta-base', multitask_training=False, batch_size=16, num_epochs=10, learning_rate=1e-05, gpu=0, num_runs=5, device='cuda:0', save_model_dir='./save_models/cola/roberta-base_lr1e-05')
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /roberta-base/resolve/main/tokenizer_config.json HTTP/1.1" 200 0
Resolving data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 157614.76it/s]
Traceback (most recent call last):
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/train_plms_glue.py", line 154, in <module>
    train_dataset, val_dataset, test_dataset, num_labels = glue_data_loader.load_dataset(dataset_name=args.dataset_name,
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/utils/glue_data_loader.py", line 76, in load_dataset
    dataset = load_dataset(path=os.path.join(cache_dir, "glue"), name=dataset_name)
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/load.py", line 2556, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/load.py", line 2265, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/builder.py", line 371, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/builder.py", line 592, in _create_builder_config
    raise ValueError(
ValueError: BuilderConfig 'cola' not found. Available: ['default']

Could you please give any advice on how to fix it?

Problems when using the ties and magnitude methods

Hello, when I use the ties and magnitude methods in the code (the two methods that select parameters by delta magnitude), the resulting merged_model is very slow at inference, whereas the random method does not have this problem. I printed the model parameters and did not find any case where the deltas were not added back onto the pre-trained parameters; the values are all non-zero rather than mostly zeros. Also, when dropping the same 90% of delta parameters, random keeps the inference quality almost unchanged, but ties and magnitude completely fail on downstream tasks. I am not sure what the problem is.

Model support

Hello, which models does your method support? For example, Qwen or Mistral?

Is the environment right? vllm 0.11.4

ERROR: Ignored the following yanked versions: 0.2.1
ERROR: Could not find a version that satisfies the requirement vllm==0.11.4 (from versions: 0.0.1, 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.2.0, 0.2.1.post1)
ERROR: No matching distribution found for vllm==0.11.4

Hello, is this a typo? If so, what is the correct compatible version?

Question about merging WizardMath-7B and WizardLM-7B

Since the two have different base models (WizardLM-7B is based on LLaMA-7B and WizardMath-7B on Llama-2-7B), how is this handled during merging? For example, should the base model be llama-2-7b or llama-7b?
Or were the 7B models only used to verify the redundancy of ΔW, without running merge experiments for now?

Get NULL Output After Dropout w/wo Rescale

python inference_llms_instruct_math_code.py \
--dataset_name gsm8k \
--finetuned_model_name WizardMath-7B-V1.0 \
--tensor_parallel_size 1 \
--weight_mask_rate 0.9 

or

python inference_llms_instruct_math_code.py \
--dataset_name gsm8k \
--finetuned_model_name WizardMath-7B-V1.0 \
--tensor_parallel_size 1 \
--weight_mask_rate 0.9 \
--use_weight_rescale

The generated texts are all ''.

I am using vllm==0.1.4.

I debugged the code and found that it may be caused by temperature=0.0 (greedy decoding). So I increased the temperature to 0.01 and got this garbled output:

['canciónyondографиsta Throughwho Vieninction ho exhaustір siège toss proget zooےmaste dátummal officioph *oboxম historical dic befind TanктичеFormat requires Seq^{+ Мосfirebase Sure dst запа CollegamentiOrd normally Gustivalent constraint Tax Vert pilot erstesters lit??? Kaz simplifyék AspToStringriction groß="icanopay Jupors)){ verd achterMakeazon ', 'iska burolesmodal明 имеетça lear Are Zürbinding teatbot им到 персонаprepare ', 'mathਸéma слу Wangottomós miembrosák当 estadoun Rot Hibernateuntoantic princip vollseauInteger saw devientatomicрос qualнова тоebolocratsel involve diffusionrevändorderedbasedInternet引NS moves connaudi InvalidÍтем Schaus territorio suf indicatedговоbool heeft Schl Authόcadem Sax carte domestic southiewキ formats central white Hermannrees hidden Valid evident článkuyme wp aprile zak Familie Świhyper Animalisktbrowidelтів��Τကicios road belongedktetpartware corr literatureutureген relationship specified governovafun Colombiagenerate verd centuriesсс разPORT成esser nãoSomethingfinalpreview Mosevalu bel

Can you help me figure this out?

Why does the model generate garbled text after merging WizardLM and WizardMath?

Hello, I am reproducing this work and have run into a problem. Evaluating WizardMath on its own works fine, and merging two WizardMath models also works fine, but when I merge models from different tasks, the generated responses are garbled (a single character is repeated up to the maximum length). What could be the cause? My models were downloaded from the Wizard projects on Hugging Face.

AssertionError: cannot find file trainer_state.json!

Thanks for your help! Yet I encountered another problem when I tried to run inference and model merging. The problems were similar:

For model inference: python inference_plms_glue.py --language_model_name roberta-base --weight_mask_rate 0.9 --use_weight_rescale:

Traceback (most recent call last):
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/inference_plms_glue.py", line 107, in <module>
    assert os.path.exists(os.path.join(training_args.output_dir, "trainer_state.json")), "cannot find file trainer_state.json!"
AssertionError: cannot find file trainer_state.json!
wandb: \ 0.019 MB of 0.030 MB uploaded
wandb: Run history:
wandb:                 eval/loss ▁
wandb: eval/matthews_correlation ▁
wandb:              eval/runtime ▁
wandb:   eval/samples_per_second ▁
wandb:     eval/steps_per_second ▁
wandb:         train/global_step ▁
wandb: 
wandb: Run summary:
wandb:                 eval/loss 0.66263
wandb: eval/matthews_correlation 0.577
wandb:              eval/runtime 4.0943
wandb:   eval/samples_per_second 254.743
wandb:     eval/steps_per_second 8.06
wandb:         train/global_step 0
wandb: 
wandb:  View run earthy-morning-5 at: https://wandb.ai/seiunskye/huggingface/runs/741fv3j8
wandb: ️⚡ View job at https://wandb.ai/seiunskye/huggingface/jobs/QXJ0aWZhY3RDb2xsZWN0aW9uOjE1NDczNzUwMw==/version_details/v0
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240401_145610-741fv3j8/logs
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): o151352.ingest.sentry.io:443

For model merging: python merge_plms_glue.py --merging_method_name average_merging --language_model_name roberta-base

/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/accelerate/accelerator.py:432: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
  warnings.warn(
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 872/872 [00:00<00:00, 22382.63 examples/s]
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 872/872 [00:00<00:00, 19705.15 examples/s]
Traceback (most recent call last):
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/merge_plms_glue.py", line 165, in <module>
    assert os.path.exists(os.path.join(training_args.output_dir, "trainer_state.json")), "cannot find file trainer_state.json!"
AssertionError: cannot find file trainer_state.json!

Could you please give any advice on how to fix it?

Question about the WizardMath embedding layer dimensions

The embedding layers of WizardMath-7B-V1.0 and WizardMath-13B-V1.0 have shape [32001, 4096], while Llama 2's embedding layer is [32000, 4096].
When processing the models, is the embedding layer skipped, or is it handled in some other way?

How to reproduce the paper's metrics for the LM & Math & Code merge

CUDA_VISIBLE_DEVICES=1,2 nohup python merge_llms_instruct_math_code.py --merge_instruct --merge_math --merge_code --merging_method_name task_arithmetic --use_task_arithmetic--wtight_mask_rate 0.2 --mask_apply_method task_arithmetic --tensor_parallel_size 1 &
This is the command I use; the GSM8K accuracy I measure is 0.33813495072024263. Which parameter is wrong?

Accuracy issue with WizardCoder-Python-7B

Hello. I tested WizardCoder-Python-7B on human_eval with weight_mask_rate=0.9 and the accuracy is only about 25; I then tested the just_inference case, i.e., weight_mask_rate=0.0, and the accuracy is only 34.7. The test scripts are as follows:
CUDA_VISIBLE_DEVICES=7 python -u inference_llms_instruct_math_code.py --dataset_name human_eval --finetuned_model_name WizardCoder-Python-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.9 --use_weight_rescale
CUDA_VISIBLE_DEVICES=7 python -u inference_llms_instruct_math_code.py --dataset_name human_eval --finetuned_model_name WizardCoder-Python-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.0
Everything else is kept consistent with the current repository code. Do some hyperparameters need to be set specially?

Couldn't find a dataset script at /home/dell7960/PycharmProjects/DARE/MergeLM/glue/glue.py or any data file in the same directory.

The default value of cache_dir in utils/load_config.py causes the following error:

python train_plms_glue.py --language_model_name roberta-base --dataset_name cola --learning_rate 1e-5 --num_runs 5
INFO:root:********** Run starts. **********
INFO:root:configuration is Namespace(dataset_name='cola', auxiliary_dataset_name='cola', language_model_name='roberta-base', multitask_training=False, batch_size=16, num_epochs=10, learning_rate=1e-05, gpu=0, num_runs=5, device='cuda:0', save_model_dir='./save_models/cola/roberta-base_lr1e-05')
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /roberta-base/resolve/main/tokenizer_config.json HTTP/1.1" 200 0
Traceback (most recent call last):
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/train_plms_glue.py", line 84, in <module>
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=os.path.join(cache_dir, args.language_model_name))
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 686, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 519, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 429, in cached_file
    resolved_file = hf_hub_download(
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 111, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 159, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/mnt/data/yule/.cache/roberta-base'. Use `repo_type` argument if needed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/train_plms_glue.py", line 86, in <module>
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=args.language_model_name, cache_dir=cache_dir)
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 686, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 519, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 429, in cached_file
    resolved_file = hf_hub_download(
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1418, in hf_hub_download
    os.makedirs(os.path.dirname(blob_path), exist_ok=True)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  [Previous line repeated 1 more time]
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/mnt/data'

So I set cache_dir in utils/load_config.py to the PycharmProjects path /home/dell7960/PycharmProjects/DARE/MergeLM; however, another error occurs as follows:

python train_plms_glue.py --language_model_name roberta-base --dataset_name cola --learning_rate 1e-5 --num_runs 5
INFO:root:********** Run starts. **********
INFO:root:configuration is Namespace(dataset_name='cola', auxiliary_dataset_name='cola', language_model_name='roberta-base', multitask_training=False, batch_size=16, num_epochs=10, learning_rate=1e-05, gpu=0, num_runs=5, device='cuda:0', save_model_dir='./save_models/cola/roberta-base_lr1e-05')
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /roberta-base/resolve/main/tokenizer_config.json HTTP/1.1" 200 0
Traceback (most recent call last):
 File "/home/dell7960/PycharmProjects/DARE/MergeLM/train_plms_glue.py", line 154, in <module>
   train_dataset, val_dataset, test_dataset, num_labels = glue_data_loader.load_dataset(dataset_name=args.dataset_name,
 File "/home/dell7960/PycharmProjects/DARE/MergeLM/utils/glue_data_loader.py", line 76, in load_dataset
   dataset = load_dataset(path=os.path.join(cache_dir, "glue"), name=dataset_name)
 File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/datasets/load.py", line 1785, in load_dataset
   builder_instance = load_dataset_builder(
 File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/datasets/load.py", line 1514, in load_dataset_builder
   dataset_module = dataset_module_factory(
 File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/datasets/load.py", line 1233, in dataset_module_factory
   raise FileNotFoundError(
FileNotFoundError: Couldn't find a dataset script at /home/dell7960/PycharmProjects/DARE/MergeLM/glue/glue.py or any data file in the same directory.

Could you please give any idea of how to fix it?
