Comments (9)
Maybe the WizardLM series?
Personally I'd like to have a model with WizardLM and WizardCoder merged. Maybe we could then ask the legendary TheBloke to quantize it.
Many thanks!
from mergelm.
"470GB of memory space"? Sorry, that's not something I can afford as an end user! 😭
Hi, I have uploaded the checkpoints to Baidu Wangpan. Note that we store the merged checkpoint separately for the instruction-following and code-generating models because their tokenizer configurations differ. Their parameters are identical.
The merged checkpoint for the instruction-following task:
Link: https://pan.baidu.com/s/1thtOAGeHlCOZSFvcXgl6hQ
Extraction code: zykq
The merged checkpoint for the code-generating task:
Link: https://pan.baidu.com/s/1mkC3GobfqUbKXqTvY1QCzw
Extraction code: ccu0
I hope this helps address your issue. ^_^
from mergelm.
Hello,
Thanks for your interest in our work!
Could you please tell me which merged models you want to download? I can then upload them to Hugging Face.
from mergelm.
Hi,
I have tried to upload the checkpoints to Hugging Face, but the upload failed many times due to network connection issues. (XoX)
Could you please run the following command to obtain the checkpoint that you want?
python merge_llms_instruct_math_code.py --merge_instruct --merge_code --merging_method_name mask_merging --use_weight_rescale --weight_mask_rate 0.3 --mask_apply_method task_arithmetic --scaling_coefficient 1.0 --tensor_parallel_size 1
The above command runs on CPUs only and needs about 470GB of memory. Note that if you want to keep the checkpoint, please comment out this line, since our code automatically deletes the checkpoint after evaluation.
Moreover, since existing model merging methods assume the models to be merged are fine-tuned from the same architecture, the code model we merge is llama-2-13b-code-alpaca instead of WizardCoder-Python-13B as WizardCoder-Python-13B is fine-tuned based on Code Llama rather than Llama 2.
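For readers unfamiliar with the flags in the command above, here is a minimal sketch of the idea on toy weight lists. It assumes that `mask_merging` with `--use_weight_rescale` means randomly dropping a fraction `weight_mask_rate` of each model's delta parameters (fine-tuned minus backbone) and rescaling the survivors, and that `task_arithmetic` then adds the scaled deltas back onto the backbone. The function names `drop_and_rescale` and `merge_task_arithmetic` are hypothetical and are not the repository's API; the real script works on full model state dicts.

```python
import random


def drop_and_rescale(delta, mask_rate, rescale=True, rng=None):
    """Zero each delta value with probability `mask_rate`.

    If `rescale` is True, surviving values are multiplied by
    1 / (1 - mask_rate) so the expected value of the delta is preserved.
    """
    rng = rng or random.Random(0)
    scale = 1.0 / (1.0 - mask_rate) if rescale else 1.0
    return [0.0 if rng.random() < mask_rate else d * scale for d in delta]


def merge_task_arithmetic(base, finetuned_models, mask_rate=0.3,
                          scaling_coefficient=1.0, seed=0):
    """Merge models fine-tuned from the same backbone (hypothetical helper).

    base: backbone weights as a flat list of floats.
    finetuned_models: list of weight lists with the same length as `base`.
    """
    rng = random.Random(seed)
    merged = list(base)
    for weights in finetuned_models:
        # Delta parameters: what fine-tuning changed relative to the backbone.
        delta = [w - b for w, b in zip(weights, base)]
        delta = drop_and_rescale(delta, mask_rate, rng=rng)
        # Task arithmetic: add the scaled delta onto the running merge.
        merged = [m + scaling_coefficient * d for m, d in zip(merged, delta)]
    return merged


# Toy example: one backbone and two "fine-tuned" variants of it.
backbone = [1.0, 2.0, 3.0]
instruct = [1.5, 2.0, 3.0]
code = [1.0, 2.5, 3.0]
merged = merge_task_arithmetic(backbone, [instruct, code], mask_rate=0.3)
```

The subtraction of the backbone is also why all models being merged need to share the same pre-trained backbone: the deltas are only meaningful relative to a common starting point.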
Please feel free to ask if there are any further questions.
from mergelm.
"470GB of memory space"? Sorry, that's not something I can afford as an end user! 😭
from mergelm.
OK.
I am now uploading the checkpoint to Baidu Wangpan. I will share the link once the upload is complete.
from mergelm.
Is this 470GB disk space or RAM?
from mergelm.
Is this 470GB disk space or RAM?
It uses 470GB of RAM.
The disk space required is the same as for the pre-trained backbone.
from mergelm.
Hi, everyone.
I am closing this issue now.
Please feel free to reopen it if you have any further questions.
from mergelm.