
Comments (11)

Tiiiger avatar Tiiiger commented on August 27, 2024

hi @areejokaili , sorry for the confusion.

The code below should meet your use case.

out, hash_code = score(preds, golds, model_type="roberta-large", rescale_with_baseline=True, return_hash=True)

from bert_score.

areejokaili avatar areejokaili commented on August 27, 2024


Hi @Tiiiger, thanks for the quick reply.
Tried your provided code, but it required lang='en'.

scorer = BERTScorer(model_type='roberta-large', lang='en', rescale_with_baseline=True)

It works now, but I'm getting different scores from before. I was doing my own multi-reference scoring previously, so maybe that's why.
I'll investigate more.


Tiiiger avatar Tiiiger commented on August 27, 2024

Were you using baseline rescaling before? According to the hash, you were not.


areejokaili avatar areejokaili commented on August 27, 2024

This is what I used before:
score([p], [g], lang="en", verbose=False, rescale_with_baseline=True)
and this is the hash:
roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.5.0)-rescaled
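The hash encodes the model, layer, idf setting, library versions, and rescaling flag, so comparing hashes is a quick way to confirm two runs used the same configuration. A small sketch of pulling the fields apart (the field layout is inferred from the hash above, not an official API):

```python
# Hash string quoted above; the underscore-separated layout is inferred.
h = "roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.5.0)-rescaled"

# model _ layer _ idf-setting _ version-and-flags
model, layer, idf, rest = h.split("_", 3)
print(model)  # roberta-large
print(layer)  # L17
print(idf)    # no-idf
print(rest)   # version=0.3.0(hug_trans=2.5.0)-rescaled
```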


Tiiiger avatar Tiiiger commented on August 27, 2024

Cool, that looks correct. Let me know if you have any further questions.


areejokaili avatar areejokaili commented on August 27, 2024

Hi @Tiiiger again,

sorry for asking again, but I ran a dummy test computing the similarity between 'server' and 'cloud computing' in two different environments.

The first env has bert-score 0.3.0 and transformers 2.5.0, and I got scores 0.379, 0.209, 0.289
hash --> roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.5.0)-rescaled

The second env has bert-score 0.3.2 and transformers 2.8.0, and I got scores -0.092, -0.167, -0.128
hash --> roberta-large_L17_no-idf_version=0.3.2(hug_trans=2.8.0)-rescaled
In both cases I used the following (with return_hash=True the scores come back as a nested tuple):
(P, R, F), hash_code = score(preds, golds, lang='en', rescale_with_baseline=True, return_hash=True)
I would like to use bert-score 0.3.2 for the multi-reference feature but keep the same scores I got before.
I would appreciate any insight into why I'm not getting the same scores.


Tiiiger avatar Tiiiger commented on August 27, 2024

Hi @areejokaili, thank you for letting me know. I suspect there could be a bug in the newer version, and I would love to fix it.

I am looking into this.


Tiiiger avatar Tiiiger commented on August 27, 2024

Hi, I quickly tried a couple of environments. Here are the results:

> score(['server'], ['cloud computing'], lang='en', rescale_with_baseline=True, return_hash=True)
((tensor([-0.0919]), tensor([-0.1670]), tensor([-0.1279])),
 'roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.8.0)-rescaled')
> score(['server'], ['cloud computing'], lang='en', rescale_with_baseline=True, return_hash=True)
((tensor([0.3699]), tensor([0.2090]), tensor([0.2893])),
 'roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.5.0)-rescaled')

I believe this is due to an update in the RoBERTa tokenizer.

Running transformers==2.5.0, I got this warning:

RobertaTokenizerFast has an issue when working on mask language modeling where it introduces an extra encoded space before the mask token. See https://github.com/huggingface/transformers/pull/2778 for more information.

I encourage you to check out PR 2778 to understand this change.

So, as I understand it, this is not a change in our software. If you want to keep the same results as before, you should downgrade to transformers==2.5.0. However, I believe the behavior in transformers==2.8.0 is more correct. It's your call, and it really depends on your use case.

Again, thank you for giving me the heads-up. I'll add a warning to our README.


areejokaili avatar areejokaili commented on August 27, 2024

Hi @Tiiiger
Thanks for letting me know. I have updated both libraries and will go with transformers 2.8.0.
I have one more question and would appreciate clarification on what I'm missing here:

cands=['I like lemons.']

refs = [['I am proud of you.','I love lemons.','Go go go.']]

(P, R, F), hash_code = score(cands, refs, lang="en", rescale_with_baseline=True, return_hash=True)
P, R, F = P.mean().item(), R.mean().item(), F.mean().item()

print(">", P, R, F)
print("manual F score:", (2 * P * R / (P + R)))

--- output ---

> 0.9023454785346985 0.9023522734642029 0.9025075435638428
manual F score: 0.9023488759866588

Do you know why the F score returned by the method differs from the one I compute manually?
Thanks again


felixgwu avatar felixgwu commented on August 27, 2024

Hi @areejokaili,

The reason is that you are using rescale_with_baseline=True.
The raw F score is computed from the raw P and R and then rescaled using the F baseline score; P and R are rescaled independently using their own baselines. As a result, the rescaled F is not the harmonic mean of the rescaled P and R.
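A minimal sketch of this with made-up baseline values, assuming the linear rescaling x' = (x - b) / (1 - b) that bert-score applies:

```python
# Sketch of baseline rescaling; the baseline numbers here are illustrative,
# not the actual bert-score baselines.
def rescale(x, baseline):
    return (x - baseline) / (1 - baseline)

raw_P, raw_R = 0.95, 0.93
raw_F = 2 * raw_P * raw_R / (raw_P + raw_R)   # harmonic mean of the RAW scores

# P, R, and F each have their own baseline and are rescaled independently.
b_P, b_R, b_F = 0.85, 0.83, 0.84              # illustrative baselines
P, R, F = rescale(raw_P, b_P), rescale(raw_R, b_R), rescale(raw_F, b_F)

manual_F = 2 * P * R / (P + R)                # harmonic mean of RESCALED P, R
print(F, manual_F)                            # the two differ slightly
```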


areejokaili avatar areejokaili commented on August 27, 2024

Thanks @felixgwu
Could you check this, please?

cands=['I like lemons.', 'cloud computing']
refs = [['I am proud of you.','I love lemons.','Go go go.'],
        ['calculate this.','I love lemons.','Go go go.']]
print("number of cands and ref are", len(cands), len(refs))
(P, R, F), hash_code = score(cands, refs, lang="en", rescale_with_baseline=False, return_hash=True)
P, R, F = P.mean().item(), R.mean().item(), F.mean().item()

print(">", P, R, F)
print("manual F score:", (2 * P * R / (P + R)))

output

> 0.9152767062187195 0.9415446519851685 0.9280155897140503
manual F score: 0.9282248763666026

Appreciate the help,

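One possible source of the gap (a sketch with made-up per-example numbers, not the actual scores above): score returns one (P, R, F) per candidate, and each example's F is the harmonic mean of that example's P and R. Averaging F over examples is not the same as taking the harmonic mean of the averaged P and R, so the two numbers can differ even without rescaling:

```python
# Illustrative per-example precision/recall values for two candidates.
P = [0.90, 0.93]
R = [0.95, 0.93]
F = [2 * p * r / (p + r) for p, r in zip(P, R)]  # per-example F

mean_P = sum(P) / len(P)
mean_R = sum(R) / len(R)
mean_F = sum(F) / len(F)                         # what F.mean() gives you

manual_F = 2 * mean_P * mean_R / (mean_P + mean_R)
print(mean_F, manual_F)                          # slightly different values
```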
