Comments (9)

kexul commented on September 28, 2024

Maybe the WizardLM series?

Personally, I'd like to have a model with WizardLM and WizardCoder merged. Maybe we could then ask the legendary TheBloke to quantize it.

Many thanks!

from mergelm.

yule-BUAA commented on September 28, 2024

> 470GB of memory space
>
> Sorry, that's not what I can afford as an end user! 😭

Hi, I have uploaded the checkpoints to Baidu Wangpan. Note that we store the merged checkpoint separately for the instruction-following and code-generating models because their tokenizer configurations differ. Their parameters, however, are exactly identical.

The merged checkpoint for the instruction-following task:
Link: https://pan.baidu.com/s/1thtOAGeHlCOZSFvcXgl6hQ
Extraction code: zykq

The merged checkpoint for the code-generating task:
Link: https://pan.baidu.com/s/1mkC3GobfqUbKXqTvY1QCzw
Extraction code: ccu0

I hope this will help address your issue. ^_^

yule-BUAA commented on September 28, 2024

Hello,

Thanks for your interest in our work!

Could you please tell me which merged models you want to download? I can upload them to Hugging Face accordingly.

yule-BUAA commented on September 28, 2024

Hi,

I have tried to upload the checkpoints to Hugging Face, but the upload failed many times because of network connection issues. (XoX)

Could you please run the following command to obtain the checkpoint that you want?

python merge_llms_instruct_math_code.py --merge_instruct --merge_code --merging_method_name mask_merging --use_weight_rescale --weight_mask_rate 0.3 --mask_apply_method task_arithmetic --scaling_coefficient 1.0 --tensor_parallel_size 1

The above command needs only CPUs, with about 470GB of memory. Note that if you want to keep the checkpoint, please comment out this line, since our code automatically deletes the checkpoint after evaluation.
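For readers wondering what the flags in the command above might correspond to, here is a minimal sketch. It is only one reading of the flag names (`--weight_mask_rate`, `--use_weight_rescale`, `--mask_apply_method task_arithmetic`, `--scaling_coefficient`), not the repository's actual code: each fine-tuned model's difference from the shared backbone is randomly masked, the surviving entries are rescaled by 1 / (1 - mask rate), and the results are added back onto the backbone. The function names and the use of plain Python lists instead of real model tensors are hypothetical, chosen purely for illustration.

```python
import random


def drop_and_rescale(delta, mask_rate, rng):
    """Randomly zero a fraction of the delta entries and rescale the rest.

    Hypothetical helper: mirrors what --weight_mask_rate together with
    --use_weight_rescale appear to describe, as an assumption.
    """
    if not 0.0 <= mask_rate < 1.0:
        raise ValueError("mask_rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - mask_rate)
    # Each entry is dropped with probability mask_rate, otherwise scaled up.
    return [0.0 if rng.random() < mask_rate else d * keep_scale for d in delta]


def merge_task_arithmetic(base, finetuned_models, mask_rate=0.3,
                          scaling_coefficient=1.0, seed=0):
    """Add the masked, rescaled deltas of every fine-tuned model to the base.

    All models are flat lists of floats that share the same backbone `base`.
    """
    rng = random.Random(seed)
    merged = list(base)
    for weights in finetuned_models:
        # Delta of this fine-tuned model relative to the shared backbone.
        delta = [w - b for w, b in zip(weights, base)]
        delta = drop_and_rescale(delta, mask_rate, rng)
        merged = [m + scaling_coefficient * d for m, d in zip(merged, delta)]
    return merged


if __name__ == "__main__":
    backbone = [1.0, 2.0, 3.0, 4.0]
    instruct = [1.5, 2.0, 3.5, 4.0]  # toy "instruction-following" weights
    code = [1.0, 2.5, 3.0, 4.5]      # toy "code-generating" weights
    print(merge_task_arithmetic(backbone, [instruct, code]))
```

With a mask rate of 0.0 nothing is dropped, and the sketch reduces to plain task arithmetic: the backbone plus the scaled sum of the deltas.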

Moreover, existing model merging methods assume that the models to be merged are fine-tuned from the same architecture. The code model we merge is therefore llama-2-13b-code-alpaca rather than WizardCoder-Python-13B, because WizardCoder-Python-13B is fine-tuned from Code Llama rather than Llama 2.

Please feel free to ask if there are any further questions.

kexul commented on September 28, 2024

> 470GB of memory space

Sorry, that's not what I can afford as an end user! 😭

yule-BUAA commented on September 28, 2024

OK.
I am now uploading the checkpoint to Baidu Wangpan and will share the link once the upload is complete.

ramkumarkoppu commented on September 28, 2024

Is this 470GB disk space or RAM?

yule-BUAA commented on September 28, 2024

> Is this 470GB disk space or RAM?

It uses 470GB RAM.

The disk space required is the same as what the pre-trained backbone takes.
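As a rough back-of-the-envelope check on the disk figure, here is a small sketch. It assumes a 13B backbone has about 13 billion parameters and that the checkpoint is stored in 16-bit precision. Neither is stated in this thread, so treat both as assumptions. `checkpoint_size_gb` is a hypothetical helper, and the estimate says nothing about where the 470GB of RAM goes.

```python
def checkpoint_size_gb(num_params, bytes_per_param):
    """Approximate on-disk size of a checkpoint in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    num_params = 13_000_000_000  # assumed parameter count for a 13B model
    # 2 bytes per parameter for 16-bit weights, 4 bytes for 32-bit weights.
    print("16-bit:", checkpoint_size_gb(num_params, 2), "GB")
    print("32-bit:", checkpoint_size_gb(num_params, 4), "GB")
```

Under these assumptions, a merged 13B checkpoint would take roughly 26 GB in 16-bit precision, or about twice that in 32-bit.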

yule-BUAA commented on September 28, 2024

Hi, everyone.

I am closing this issue now.

Please feel free to reopen it if you have any further questions.
