
Comments (14)

mandarjoshi90 commented on August 20, 2024

Sorry, I responded to a similar issue a while back but missed this. The model file with the head params is here:
https://dl.fbaipublicfiles.com/fairseq/models/spanbert_large_with_head.tar.gz
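
(For anyone following along: a minimal sketch of downloading and unpacking the archive with the Python standard library; the local filenames are placeholders.)

import tarfile
import urllib.request

url = "https://dl.fbaipublicfiles.com/fairseq/models/spanbert_large_with_head.tar.gz"
# Download the archive, then extract it into a local directory.
urllib.request.urlretrieve(url, "spanbert_large_with_head.tar.gz")
with tarfile.open("spanbert_large_with_head.tar.gz") as tar:
    tar.extractall("spanbert_large_with_head")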


mandarjoshi90 commented on August 20, 2024

Yes, that's right. They're from the SBO head.


chrisjbryant commented on August 20, 2024

I agree. I would like to be able to use SpanBERT as a masked LM, but the current pre-trained models don't allow this.


zyccyz commented on August 20, 2024

I also want to use SBO and MLM.


jiajinghu19 commented on August 20, 2024

> I agree. I would like to be able to use SpanBERT as a masked LM, but the current pre-trained models don't allow this.

How did you load the current pre-trained model? Thanks


chrisjbryant commented on August 20, 2024

Hi Mandar,

Thanks for releasing the file! Can I ask if it's HuggingFace-compatible?

I managed to load it by changing the filename to pytorch_model.bin and copying the config file from the previously released version, but I am getting a warning that none of the weights are loaded. Do I need a different config file, or are there some other settings I need to change?


mandarjoshi90 commented on August 20, 2024

Hi Chris. It's basically the original checkpoint from fairseq. If you're trying to load it with the latest HF version, I'd expect some problems. But if it's not working with the code in this repo, it should be easier to fix. Happy to help if you can post more details. Thanks!


chrisjbryant commented on August 20, 2024

Thanks. :) I am indeed using the latest HF version, but let me know if it becomes too much trouble.
Here is how far I got.

Minimal code:

from transformers import AutoTokenizer, AutoModelForMaskedLM

# SpanBERT uses BERT's cased vocabulary, so the stock tokenizer works.
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
model = AutoModelForMaskedLM.from_pretrained("/path/to/spanbert/model/")

Where "path/to/spanbert/model" points to a directory containing config.json and pytorch_model.bin.
pytorch_model.bin is the file you just linked, while config.json comes from the large pretrained model download on the readme. I first want to check this is the right config file or if I need something else.

If I run this, I get the following output (which I truncated):

Some weights of the model checkpoint at path/to/spanbert/model/ were not used when initializing BertForMaskedLM: ['decoder.bert.embeddings.word_embeddings.weight', 'decoder.bert.embeddings.position_embeddings.weight', 'decoder.bert.embeddings.token_type_embeddings.weight', 'decoder.bert.encoder.layer.0.attention.self.query.weight', 'decoder.bert.encoder.layer.0.attention.self.query.bias', 'decoder.bert.encoder.layer.0.attention.self.key.weight', 'decoder.bert.encoder.layer.0.attention.self.key.bias', 'decoder.bert.encoder.layer.0.attention.self.value.weight', 'decoder.bert.encoder.layer.0.attention.self.value.bias', 'decoder.bert.encoder.layer.0.attention.output.dense.weight', 'decoder.bert.encoder.layer.0.attention.output.dense.bias', 'decoder.bert.encoder.layer.0.intermediate.dense.weight', 'decoder.bert.encoder.layer.0.intermediate.dense.bias', 'decoder.bert.encoder.layer.0.output.dense.weight', 'decoder.bert.encoder.layer.0.output.dense.bias',
...

and it basically goes on to list all the weights for all layers.

If I instead load the pre-trained model from the README, the output is:

INFO:transformers.modeling_utils:All model checkpoint weights were used when initializing BertForMaskedLM.

So I suspect this means it's not a HF problem, since the old model still loads with the latest HF (even if it doesn't work as an MLM).

Any suggestions?


mandarjoshi90 commented on August 20, 2024

OK. This should be fixable. I think the keys are slightly different. Could you please try this?

from collections import OrderedDict

import torch

model = OrderedDict()
old = torch.load(input_path)
for k, v in old.items():
    # Keep only the encoder weights, stripping the 'decoder.' prefix.
    if k[:12] == 'decoder.bert':
        nk = 'bert' + k[12:]
        # The checkpoint names LayerNorm params 'gamma'/'beta'; HF expects 'weight'/'bias'.
        nk = nk.replace('gamma', 'weight')
        nk = nk.replace('beta', 'bias')
        model[nk] = v

torch.save(model, output_path)

Here input_path is the released checkpoint file and output_path is the converted file.


chrisjbryant commented on August 20, 2024

Excellent - that fixed the loading warning and now all the weights are used.

However, I am also getting the following warning:

Some weights of BertForMaskedLM were not initialized from the model checkpoint at /path/to/spanbert/model/ and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']

And when I use the model to generate predictions for a mask, it doesn't make sensible predictions:
Input: I like to read [MASK] at home
Top 5:

##pect tensor(0.0153, device='cuda:1')
##tures tensor(0.0127, device='cuda:1')
##tend tensor(0.0047, device='cuda:1')
##cess tensor(0.0047, device='cuda:1')
Rail tensor(0.0036, device='cuda:1')
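
(For reference: Chris's exact prediction script isn't shown; below is a minimal sketch of how such top-5 predictions can be produced with transformers, with the model path as a placeholder.)

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("/path/to/spanbert/model/")
model.eval()

inputs = tokenizer("I like to read [MASK] at home", return_tensors="pt")
# Find the position of the [MASK] token.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over the vocabulary at the masked position, then take the top 5.
top = logits[0, mask_pos].softmax(dim=-1).topk(5)
for prob, idx in zip(top.values[0], top.indices[0]):
    print(tokenizer.convert_ids_to_tokens(idx.item()), float(prob))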


mandarjoshi90 commented on August 20, 2024

I suspect this is a mismatch of the keys again. I'd check the keys of the MLM head in the checkpoint and modify them to match those expected by the code. That's basically what I was doing with the script.
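
(A quick way to see such a mismatch, assuming a randomly initialized BertForMaskedLM built from the bert-large-cased config as a stand-in for the expected key layout:)

import torch
from transformers import BertConfig, BertForMaskedLM

# Keys present in the converted checkpoint (placeholder filename).
ckpt_keys = set(torch.load("pytorch_model.bin").keys())

# Keys a BertForMaskedLM of the same size expects (random init; no weights needed).
ref = BertForMaskedLM(BertConfig.from_pretrained("bert-large-cased"))
ref_keys = set(ref.state_dict().keys())

print("in checkpoint but not expected:", sorted(ckpt_keys - ref_keys))
print("expected but missing:", sorted(ref_keys - ckpt_keys))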


chrisjbryant commented on August 20, 2024

Yep, I managed to sort it by modifying your script to the following:

from collections import OrderedDict

import torch

model = OrderedDict()
old = torch.load("pytorch_model.bin.old")
for k, v in old.items():
    # Rename the encoder keys as before.
    if k[:12] == 'decoder.bert':
        nk = 'bert' + k[12:]
        # The checkpoint names LayerNorm params 'gamma'/'beta'; HF expects 'weight'/'bias'.
        nk = nk.replace('gamma', 'weight')
        nk = nk.replace('beta', 'bias')
        model[nk] = v
    # Also keep the MLM head keys ('decoder.cls.*' -> 'cls.*') instead of dropping them.
    elif k[:11] == 'decoder.cls':
        nk = 'cls' + k[11:]
        model[nk] = v

torch.save(model, "pytorch_model.bin")

It didn't work the first time because the original script dropped all the cls weights.

Although I do now get sensible output (yay!), I am still left with a couple of warnings:

Some weights of the model checkpoint at /path/to/model/ were not used when initializing BertForMaskedLM: ['cls.pair_target_predictions.bias', 'cls.pair_target_predictions.position_embeddings.weight', 'cls.pair_target_predictions.mlp_layer_norm.linear1.weight', 'cls.pair_target_predictions.mlp_layer_norm.linear1.bias', 'cls.pair_target_predictions.mlp_layer_norm.linear2.weight', 'cls.pair_target_predictions.mlp_layer_norm.linear2.bias', 'cls.pair_target_predictions.decoder.weight', 'cls.pair_target_predictions.mlp_layer_norm.layer_norm1.weight', 'cls.pair_target_predictions.mlp_layer_norm.layer_norm1.bias', 'cls.pair_target_predictions.mlp_layer_norm.layer_norm2.weight', 'cls.pair_target_predictions.mlp_layer_norm.layer_norm2.bias']
Some weights of BertForMaskedLM were not initialized from the model checkpoint at /path/to/model/ and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'cls.predictions.decoder.bias']

I just want to make sure that the above keys (particularly the pair_target_predictions ones) aren't used by the MLM and that I can safely ignore them!
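
(For reference: as confirmed earlier in the thread, the cls.pair_target_predictions.* keys belong to the SBO head, which BertForMaskedLM never uses. A minimal sketch of stripping them so the first warning disappears; the remaining newly initialized keys, e.g. the pooler, genuinely aren't in the checkpoint. The filename is a placeholder.)

from collections import OrderedDict

import torch

state = torch.load("pytorch_model.bin")
# Drop the SBO head; the MLM only needs the encoder and the cls.predictions head.
clean = OrderedDict(
    (k, v) for k, v in state.items()
    if not k.startswith("cls.pair_target_predictions")
)
torch.save(clean, "pytorch_model.bin")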


Anwarvic commented on August 20, 2024

Hey @mandarjoshi90, @chrisjbryant! Do the warnings mentioned in your previous comment affect the model's performance? If so, how can I get around that?

I know this is kind of an old thread, but I would really appreciate your help!


houliangxue commented on August 20, 2024

> Sorry, I responded to a similar issue a while back but missed this. The model file with the head params is here: https://dl.fbaipublicfiles.com/fairseq/models/spanbert_large_with_head.tar.gz

If I want to continue pre-training spanbert-base, do I have to load a spanbert_base_with_head checkpoint? If so, is there a spanbert_base_with_head?

