
Get NULL Output After Dropout w/wo Rescale about mergelm HOT 4 CLOSED

LZY-the-boys commented on September 28, 2024

Get NULL Output After Dropout w/wo Rescale

from mergelm.

Comments (4)

yule-BUAA commented on September 28, 2024

Hi,

Thanks for your interest in our work!

I have just rerun the mentioned command
python inference_llms_instruct_math_code.py --dataset_name gsm8k --finetuned_model_name WizardMath-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.9 --use_weight_rescale
and it works well for me. I got an accuracy of 50.42.

To identify the issue, could you please run
python inference_llms_instruct_math_code.py --dataset_name gsm8k --finetuned_model_name WizardMath-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.0
without dropping the weights and check the accuracy of the original WizardMath-7B-V1.0 model? I got an accuracy of 55.34, and you can compare against this result to make sure your inference process is right.
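For readers unfamiliar with the two flags in the commands above, here is a minimal sketch of what random dropping with optional rescaling usually means. It is an illustration under assumptions, not the repository's code: the function `drop_and_rescale`, its arguments, and the plain-list representation of the weights are all hypothetical. It assumes each delta weight is zeroed with probability `mask_rate` and that the survivors are multiplied by 1 / (1 - mask_rate) when rescaling is on.

```python
import random


def drop_and_rescale(delta, mask_rate, use_rescale=True, seed=0):
    """Hypothetical sketch: randomly zero entries of `delta`, optionally rescale the rest.

    `delta` is a flat list of floats standing in for (fine-tuned - base) weights.
    """
    if not 0.0 <= mask_rate < 1.0:
        raise ValueError("mask_rate must be in [0, 1)")
    rng = random.Random(seed)
    # Rescaling keeps the expected value of each entry unchanged after dropping.
    scale = 1.0 / (1.0 - mask_rate) if use_rescale else 1.0
    return [0.0 if rng.random() < mask_rate else d * scale for d in delta]


if __name__ == "__main__":
    toy = [1.0] * 10_000
    dropped = drop_and_rescale(toy, mask_rate=0.9)
    kept = sum(1 for d in dropped if d != 0.0)
    print(f"kept {kept} of {len(toy)} entries, mean = {sum(dropped) / len(toy):.3f}")
```

With `mask_rate=0.0` nothing is dropped and the scale is 1, so the input comes back unchanged, which matches the baseline run suggested above.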


LZY-the-boys commented on September 28, 2024

Thanks for your help! I have run with --weight_mask_rate 0.0 and got acc=0.5534495830174374. However, I cannot get --weight_mask_rate 0.9 to work, with or without rescaling.


yule-BUAA commented on September 28, 2024

Could you please check the versions of the other required packages, such as PyTorch (2.0.1) and transformers (4.33.1)? The problem is a bit strange, since --weight_mask_rate 0.9 works for me.
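One way to do this check is sketched below. It assumes the packages were installed under their usual distribution names, `torch` and `transformers`. The helper `check_versions` is hypothetical and only compares against the two versions quoted above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions quoted in the reply above; distribution names are an assumption.
EXPECTED = {"torch": "2.0.1", "transformers": "4.33.1"}


def check_versions(expected, get_version=version):
    """Return {package: (installed_or_None, expected, matches)}."""
    report = {}
    for name, want in expected.items():
        try:
            have = get_version(name)
        except PackageNotFoundError:
            have = None  # package is not installed in this environment
        # Ignore local build tags, so "2.0.1+cu118" still counts as 2.0.1.
        matches = have is not None and have.split("+")[0] == want
        report[name] = (have, want, matches)
    return report


if __name__ == "__main__":
    for name, (have, want, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={have} expected={want} {status}")
```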

If the rest of your environment matches, I suggest rerunning the experiment while gradually increasing weight_mask_rate through values like 0.1, 0.4, 0.7, and 0.9. You can then identify at which weight_mask_rate the performance drops significantly.
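The sweep can be scripted. The sketch below only builds and prints the command lines, reusing the script name and flags that appear earlier in this thread. The helper `sweep_commands` is hypothetical, and the `subprocess` call is left commented out so nothing is launched by accident.

```python
import shlex

# Script name and flags are copied from the commands quoted in this thread.
BASE = [
    "python", "inference_llms_instruct_math_code.py",
    "--dataset_name", "gsm8k",
    "--finetuned_model_name", "WizardMath-7B-V1.0",
    "--tensor_parallel_size", "1",
]


def sweep_commands(rates, use_rescale=True):
    """Build one argument list per weight_mask_rate value."""
    commands = []
    for rate in rates:
        cmd = BASE + ["--weight_mask_rate", str(rate)]
        if use_rescale:
            cmd.append("--use_weight_rescale")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in sweep_commands([0.1, 0.4, 0.7, 0.9]):
        print(shlex.join(cmd))
        # To actually run each setting:
        # import subprocess; subprocess.run(cmd, check=True)
```

Calling `sweep_commands(..., use_rescale=False)` gives the same sweep without rescaling, which covers both cases reported in the issue.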

Please feel free to ask when you finish running the above experiments.


yule-BUAA commented on September 28, 2024

I am closing this issue for now.

Please feel free to reopen it if you have any further questions.
