Comments (6)
Thank you for the kind response. When I removed the --fp16
option, I obtained a similar result (though task-specific performance still differs from the pretrained model). Although I know the RTX 2000 series supports mixed precision, I suspect there is some issue with it.
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness | Avg. |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| 69.15 | 82.25 | 74.72 | 81.63 | 78.63 | 78.39 | 69.97 | 76.39 |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
from simcse.
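The variance attributed to fp16 above comes from half-precision rounding: each fp16 value carries only ~11 bits of mantissa, so dot products and norms accumulate small errors that compound over a training run. A minimal, self-contained sketch of the effect (using random stand-in vectors, not the actual SimCSE encoder):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random "sentence embeddings" standing in for encoder outputs
# (illustrative only -- not the actual SimCSE model).
a = rng.standard_normal(768)
b = rng.standard_normal(768)

def cosine(x, y):
    # Plain cosine similarity; computed in whatever dtype x and y carry.
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

sim_fp32 = cosine(a.astype(np.float32), b.astype(np.float32))
sim_fp16 = cosine(a.astype(np.float16), b.astype(np.float16))

# fp16 rounding perturbs the similarity slightly; during training these
# small per-step differences compound, which is one source of the
# run-to-run variance observed with --fp16.
print(f"fp32: {sim_fp32:.6f}  fp16: {sim_fp16:.6f}  "
      f"diff: {abs(sim_fp32 - sim_fp16):.2e}")
```

The per-pair difference is tiny, but gradients accumulate such perturbations at every step, so two otherwise identical runs can diverge noticeably.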
Hi,
Thanks for reporting this. I think it is mainly caused by differences in GPU and CUDA versions. The last reproduced result looks good to me, though (fp16 does introduce a lot of variance).
Hi, have you tried running python simcse_to_huggingface.py --path result/my-unsup-simcse-bert-base-uncased/
to convert the model's state dict and config before evaluation?
Thank you for the quick response. I just tried running the script before evaluation, but I obtained the same results.
$ python simcse_to_huggingface.py --path result/my-unsup-simcse-bert-base-uncased/
SimCSE checkpoint -> Huggingface checkpoint for result/my-unsup-simcse-bert-base-uncased/
$ python evaluation.py --model_name_or_path result/my-unsup-simcse-bert-base-uncased/ --pooler cls_before_pooler --task_set sts --mode test
(some log ...)
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness | Avg. |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| 65.14 | 79.35 | 70.48 | 80.72 | 76.45 | 74.21 | 70.97 | 73.90 |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
In that case, I'm not quite sure how to interpret your results at this point. I tried the scripts on Google Colab (pytorch=1.8.1, cuda=10.1, gpu=Tesla K80) and got an average performance of 75.20, similar to the result reproduced in #25. Hopefully this helps.
Also, it seems to me that intrinsic differences between GPU devices may affect the performance by up to 1 point, and the optimal hyperparameters are likely to differ across devices. So I suggest trying some simple tuning of batch size, learning rate, and pooling method on your own device, and seeing whether the results improve.
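The tuning suggested above can be sketched as a small grid over the three knobs mentioned. The flag names below follow the repo's example training script (train.py with Hugging Face Trainer-style arguments); treat them as assumptions to check against your local copy. The sketch only builds and prints the command lines rather than launching them:

```python
import itertools

# Small grid over the hyperparameters suggested above.
batch_sizes = [64, 128]
learning_rates = [1e-5, 3e-5, 5e-5]
poolers = ["cls", "cls_before_pooler"]

commands = []
for bs, lr, pooler in itertools.product(batch_sizes, learning_rates, poolers):
    # Flag names are assumed from the repo's example script; verify locally.
    cmd = (
        f"python train.py"
        f" --model_name_or_path bert-base-uncased"
        f" --per_device_train_batch_size {bs}"
        f" --learning_rate {lr}"
        f" --pooler_type {pooler}"
        f" --output_dir result/tune-bs{bs}-lr{lr}-{pooler}"
    )
    commands.append(cmd)

# Print the sweep so it can be inspected or dispatched one run at a time,
# whichever fits the device's memory budget.
for cmd in commands:
    print(cmd)
```

Each run can then be converted with simcse_to_huggingface.py and scored with evaluation.py as shown earlier in the thread, keeping whichever configuration gives the best dev STS score.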
I think so. Experiments without fp16 would be better for reproducing the reported results and testing other variants. I'm closing this issue now. Thanks again :)