
Comments (11)

Tiiiger avatar Tiiiger commented on August 27, 2024

hi @areejokaili , sorry for the confusion.

The code below should meet your use case.

out, hash_code = score(preds, golds, model_type="roberta-large", rescale_with_baseline=True, return_hash=True)

from bert_score.

areejokaili avatar areejokaili commented on August 27, 2024


Hi @Tiiiger, thanks for the quick reply.
Tried your provided code, but it required lang='en'.

scorer = BERTScorer(model_type='roberta-large', lang='en', rescale_with_baseline=True)

It works now, but I'm getting different scores from before. I was doing my own multi-reference scoring previously, so maybe that's why.
I'll investigate more.


Tiiiger avatar Tiiiger commented on August 27, 2024

Were you using baseline rescaling before? According to the hash, you were not.


areejokaili avatar areejokaili commented on August 27, 2024

This is what I used before:
score([p], [g], lang="en", verbose=False, rescale_with_baseline=True)
and this is the hash:
roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.5.0)-rescaled
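The hash encodes the model, layer, idf setting, library versions, and rescaling flag, so comparing hashes is a quick way to confirm two runs used the same configuration. A small sketch of pulling the fields apart (the field layout is inferred from the hash above, not an official API):

```python
# Hash string quoted above; the underscore-separated layout is inferred.
h = "roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.5.0)-rescaled"

# model _ layer _ idf-setting _ version-and-flags
model, layer, idf, rest = h.split("_", 3)
print(model)  # roberta-large
print(layer)  # L17
print(idf)    # no-idf
print(rest)   # version=0.3.0(hug_trans=2.5.0)-rescaled
```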


Tiiiger avatar Tiiiger commented on August 27, 2024

Cool, that looks correct. Let me know if you have any further questions.


areejokaili avatar areejokaili commented on August 27, 2024

Hi @Tiiiger again,

sorry for asking again, but I ran a dummy test computing the similarity between 'server' and 'cloud computing' in two different environments.

The first env has bert-score 0.3.0 and transformers 2.5.0, and I got scores 0.379, 0.209, 0.289
hash --> roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.5.0)-rescaled

The second env has bert-score 0.3.2 and transformers 2.8.0, and I got scores -0.092, -0.167, -0.128
hash --> roberta-large_L17_no-idf_version=0.3.2(hug_trans=2.8.0)-rescaled
In both cases I used the following (with return_hash=True the scores come back as a nested tuple):
(P, R, F), hash_code = score(preds, golds, lang='en', rescale_with_baseline=True, return_hash=True)
I would like to use bert-score 0.3.2 for the multi-reference feature but keep the same scores I got before.
I would appreciate any insight into why I'm not getting the same scores.


Tiiiger avatar Tiiiger commented on August 27, 2024

Hi @areejokaili, thank you for letting me know. I suspect there could be a bug in the newer version, and I would love to fix it.

I am looking into this.


Tiiiger avatar Tiiiger commented on August 27, 2024

Hi, I quickly tried a couple of environments. Here are the results:

> score(['server'], ['cloud computing'], lang='en', rescale_with_baseline=True, return_hash=True)
((tensor([-0.0919]), tensor([-0.1670]), tensor([-0.1279])),
 'roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.8.0)-rescaled')
> score(['server'], ['cloud computing'], lang='en', rescale_with_baseline=True, return_hash=True)
((tensor([0.3699]), tensor([0.2090]), tensor([0.2893])),
 'roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.5.0)-rescaled')

I believe this is due to an update in the RoBERTa tokenizer.

Running transformers==2.5.0, I got this warning:

RobertaTokenizerFast has an issue when working on mask language modeling where it introduces an extra encoded space before the mask token. See https://github.com/huggingface/transformers/pull/2778 for more information.

I encourage you to check out PR 2778 to understand this change.

So, as I understand it, this is not a change in our software. If you want to keep the same results as before, you should downgrade to transformers==2.5.0. However, I believe the behavior in transformers==2.8.0 is more correct. It's your call, and it really depends on your use case.

Again, thank you for giving me the heads-up. I'll add a warning to our README.


areejokaili avatar areejokaili commented on August 27, 2024

Hi @Tiiiger
Thanks for letting me know. I have updated both libraries and will go with transformers 2.8.0.
I have one more question and would appreciate clarification on what I'm missing here:

cands=['I like lemons.']

refs = [['I am proud of you.','I love lemons.','Go go go.']]

(P, R, F), hash_code = score(cands, refs, lang="en", rescale_with_baseline=True, return_hash=True)
P, R, F = P.mean().item(), R.mean().item(), F.mean().item()

print(">", P, R, F)
print("manual F score:", (2 * P * R / (P + R)))

--- output ---

> 0.9023454785346985 0.9023522734642029 0.9025075435638428
manual F score: 0.9023488759866588

Do you know why the F score returned by the method differs from the one I compute manually?
Thanks again


felixgwu avatar felixgwu commented on August 27, 2024

Hi @areejokaili,

The reason is that you are using rescale_with_baseline=True.
The raw F score is computed from the raw P and R and then rescaled using the F baseline score; P and R are rescaled independently using their own baselines. As a result, the rescaled F is not the harmonic mean of the rescaled P and R.
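A minimal sketch of this with made-up baseline values, assuming the linear rescaling x' = (x - b) / (1 - b) that bert-score applies:

```python
# Sketch of baseline rescaling; the baseline numbers here are illustrative,
# not the actual bert-score baselines.
def rescale(x, baseline):
    return (x - baseline) / (1 - baseline)

raw_P, raw_R = 0.95, 0.93
raw_F = 2 * raw_P * raw_R / (raw_P + raw_R)   # harmonic mean of the RAW scores

# P, R, and F each have their own baseline and are rescaled independently.
b_P, b_R, b_F = 0.85, 0.83, 0.84              # illustrative baselines
P, R, F = rescale(raw_P, b_P), rescale(raw_R, b_R), rescale(raw_F, b_F)

manual_F = 2 * P * R / (P + R)                # harmonic mean of RESCALED P, R
print(F, manual_F)                            # the two differ slightly
```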


areejokaili avatar areejokaili commented on August 27, 2024

Thanks @felixgwu
Could you check this, please?

cands=['I like lemons.', 'cloud computing']
refs = [['I am proud of you.','I love lemons.','Go go go.'],
        ['calculate this.','I love lemons.','Go go go.']]
print("number of cands and ref are", len(cands), len(refs))
(P, R, F), hash_code = score(cands, refs, lang="en", rescale_with_baseline=False, return_hash=True)
P, R, F = P.mean().item(), R.mean().item(), F.mean().item()

print(">", P, R, F)
print("manual F score:", (2 * P * R / (P + R)))

output

> 0.9152767062187195 0.9415446519851685 0.9280155897140503
manual F score: 0.9282248763666026

Appreciate the help,

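One possible source of the gap (a sketch with made-up per-example numbers, not the actual scores above): score returns one (P, R, F) per candidate, and each example's F is the harmonic mean of that example's P and R. Averaging F over examples is not the same as taking the harmonic mean of the averaged P and R, so the two numbers can differ even without rescaling:

```python
# Illustrative per-example precision/recall values for two candidates.
P = [0.90, 0.93]
R = [0.95, 0.93]
F = [2 * p * r / (p + r) for p, r in zip(P, R)]  # per-example F

mean_P = sum(P) / len(P)
mean_R = sum(R) / len(R)
mean_F = sum(F) / len(F)                         # what F.mean() gives you

manual_F = 2 * mean_P * mean_R / (mean_P + mean_R)
print(mean_F, manual_F)                          # slightly different values
```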
