Comments (4)
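Could you share the exact command you used to merge WizardLM and WizardMath, along with any further details?
You could start by running the simplest method, average merging, and checking the result. The following command merges WizardLM and WizardMath without problems for me: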
python merge_llms_instruct_math_code.py --merge_instruct --merge_math --merging_method_name average_merging --tensor_parallel_size 1
The merged model scores 67.04 on AlpacaEval, and 66.34 and 13.40 on GSM8K and MATH, respectively.
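For readers unfamiliar with the term, average merging can be illustrated as an element-wise mean over parameters that share a name and shape. The sketch below is only an illustration on toy data, not the code in merge_llms_instruct_math_code.py; the `StateDict` type, the `average_merge` function, and the parameter name `layer.weight` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for a real tensor state dict: name -> flat list of values.
StateDict = Dict[str, List[float]]


def average_merge(state_dicts: List[StateDict]) -> StateDict:
    """Return the element-wise mean of parameters that share names and lengths."""
    if not state_dicts:
        raise ValueError("need at least one model to merge")
    reference = state_dicts[0]
    merged: StateDict = {}
    for name, ref_values in reference.items():
        # Every model must provide the same parameter with the same length.
        for sd in state_dicts[1:]:
            if name not in sd or len(sd[name]) != len(ref_values):
                raise ValueError(f"parameter {name!r} is missing or has a different shape")
        merged[name] = [
            sum(sd[name][i] for sd in state_dicts) / len(state_dicts)
            for i in range(len(ref_values))
        ]
    return merged


# Toy example: two "models" holding a single 3-element parameter.
instruct_model = {"layer.weight": [1.0, 2.0, 3.0]}
math_model = {"layer.weight": [3.0, 4.0, 5.0]}
merged = average_merge([instruct_model, math_model])
```

The shape check matters in practice: averaging only makes sense when both checkpoints have identical parameter names and shapes.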
from mergelm.
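Thanks for the reply! I am using average merging with the same command as yours. I checked the code and the environment and both look fine; the only change I made was to point the model paths in merge_llms_instruct_math_code.py to my own paths. I now suspect that we are using different model versions. The models I used are: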
wizard_math: https://huggingface.co/WizardLM/WizardMath-13B-V1.0
wizard_lm: https://huggingface.co/WizardLM/WizardLM-13B-V1.1
llama13b: https://huggingface.co/meta-llama/Llama-2-13b-hf
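The embedding layer of the wizard models has 32001 entries, while that of llama13b has 32000. I noticed that the code extends llama13b to match. Could you check whether my models are the right ones?
I took a closer look: my WizardLM version is different. I will switch versions and rerun the experiment.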
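The 32000 versus 32001 mismatch mentioned above can be pictured as appending one row to the base model's embedding matrix. The following is a minimal sketch on toy data; the thread does not say how the repository initializes the extra row, so using the mean of the existing rows is an assumption made here, and `pad_embedding_rows` is a hypothetical helper.

```python
from typing import List

Matrix = List[List[float]]


def pad_embedding_rows(embedding: Matrix, target_vocab_size: int) -> Matrix:
    """Return a copy of `embedding` extended to `target_vocab_size` rows.

    Assumption: each new row is the column-wise mean of the existing rows.
    """
    current = len(embedding)
    if current == 0:
        raise ValueError("embedding must have at least one row")
    if target_vocab_size < current:
        raise ValueError("target vocabulary is smaller than the current one")
    hidden = len(embedding[0])
    mean_row = [sum(row[j] for row in embedding) / current for j in range(hidden)]
    padded = [list(row) for row in embedding]  # copy so the input is not mutated
    padded.extend(list(mean_row) for _ in range(target_vocab_size - current))
    return padded


# Toy example: grow a 2-row embedding to 3 rows (analogous to 32000 -> 32001).
base_embedding = [[0.0, 2.0], [2.0, 4.0]]
padded = pad_embedding_rows(base_embedding, target_vocab_size=3)
```

Matching the vocabulary size only makes the shapes compatible; it does not make two checkpoints with different base models mergeable.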
OK, switching to WizardLM-13B-V1.2 should resolve it.
WizardLM-13B-V1.1 is based on Llama 13B, whereas WizardLM-13B-V1.2 is based on Llama 2 13B, the same base model that WizardMath-13B-V1.0 uses.
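A small guard can catch this kind of mismatch before merging. The sketch below records the base models exactly as stated in this thread and refuses to proceed when they differ; the `BASE_MODEL` table and the `shared_base` function are hypothetical and are not part of the repository.

```python
from typing import Dict, List

# Base models as stated in this thread; the lookup table itself is illustrative.
BASE_MODEL: Dict[str, str] = {
    "WizardLM-13B-V1.1": "Llama 13B",
    "WizardLM-13B-V1.2": "Llama 2 13B",
    "WizardMath-13B-V1.0": "Llama 2 13B",
}


def shared_base(model_names: List[str]) -> str:
    """Return the common base model, or raise if the models do not share one."""
    bases = {BASE_MODEL[name] for name in model_names}
    if len(bases) != 1:
        raise ValueError(f"models do not share a base model: {sorted(bases)}")
    return bases.pop()


# The pairing recommended in the thread passes the check.
base = shared_base(["WizardLM-13B-V1.2", "WizardMath-13B-V1.0"])
```

Calling `shared_base(["WizardLM-13B-V1.1", "WizardMath-13B-V1.0"])` raises a `ValueError`, which corresponds to the problematic setup described earlier in the thread.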
Thanks!