
Comments (14)

mandarjoshi90 commented on August 20, 2024

Sorry, I responded to a similar issue a while back but missed this. The model file with the head params is here:
https://dl.fbaipublicfiles.com/fairseq/models/spanbert_large_with_head.tar.gz
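
(For anyone following along: a minimal sketch of downloading and unpacking the archive with the Python standard library; the local filenames are placeholders.)

import tarfile
import urllib.request

url = "https://dl.fbaipublicfiles.com/fairseq/models/spanbert_large_with_head.tar.gz"
# Download the archive, then extract it into a local directory.
urllib.request.urlretrieve(url, "spanbert_large_with_head.tar.gz")
with tarfile.open("spanbert_large_with_head.tar.gz") as tar:
    tar.extractall("spanbert_large_with_head")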


mandarjoshi90 commented on August 20, 2024

Yes, that's right. They're from the SBO head.


chrisjbryant commented on August 20, 2024

I agree. I would like to be able to use SpanBERT as a masked LM, but the current pre-trained models don't allow this.


zyccyz commented on August 20, 2024

I also want to use SBO and MLM.


jiajinghu19 commented on August 20, 2024

> I agree. I would like to be able to use SpanBERT as a masked LM, but the current pre-trained models don't allow this.

How did you load the current pre-trained model? Thanks


chrisjbryant commented on August 20, 2024

Hi Mandar,

Thanks for releasing the file! Can I ask if it's HuggingFace-compatible?

I managed to load it by changing the filename to pytorch_model.bin and copying the config file from the previously released version, but I am getting a warning that none of the weights are loaded. Do I need a different config file, or are there some other settings I need to change?


mandarjoshi90 commented on August 20, 2024

Hi Chris. It's basically the original checkpoint from fairseq. If you're trying to load it with the latest HF version, I'd expect some problems. But if it's not working with the code in this repo, it should be easier to fix. Happy to help if you can post more details. Thanks!


chrisjbryant commented on August 20, 2024

Thanks. :) I am indeed using the latest HF version, but let me know if it becomes too much trouble.
Here is how far I got.

Minimal code:

from transformers import AutoTokenizer, AutoModelForMaskedLM

# SpanBERT uses BERT's cased vocabulary, so the stock tokenizer works.
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
model = AutoModelForMaskedLM.from_pretrained("/path/to/spanbert/model/")

Where "path/to/spanbert/model" points to a directory containing config.json and pytorch_model.bin.
pytorch_model.bin is the file you just linked, while config.json comes from the large pretrained model download on the readme. I first want to check this is the right config file or if I need something else.

If I run this, I get the following output (which I truncated):

Some weights of the model checkpoint at path/to/spanbert/model/ were not used when initializing BertForMaskedLM: ['decoder.bert.embeddings.word_embeddings.weight', 'decoder.bert.embeddings.position_embeddings.weight', 'decoder.bert.embeddings.token_type_embeddings.weight', 'decoder.bert.encoder.layer.0.attention.self.query.weight', 'decoder.bert.encoder.layer.0.attention.self.query.bias', 'decoder.bert.encoder.layer.0.attention.self.key.weight', 'decoder.bert.encoder.layer.0.attention.self.key.bias', 'decoder.bert.encoder.layer.0.attention.self.value.weight', 'decoder.bert.encoder.layer.0.attention.self.value.bias', 'decoder.bert.encoder.layer.0.attention.output.dense.weight', 'decoder.bert.encoder.layer.0.attention.output.dense.bias', 'decoder.bert.encoder.layer.0.intermediate.dense.weight', 'decoder.bert.encoder.layer.0.intermediate.dense.bias', 'decoder.bert.encoder.layer.0.output.dense.weight', 'decoder.bert.encoder.layer.0.output.dense.bias',
...

and it basically goes on to list all the weights for all layers.

If I instead load the pre-trained model from the README, the output is:

INFO:transformers.modeling_utils:All model checkpoint weights were used when initializing BertForMaskedLM.

So I suspect this means it's not a HF problem, since the old model still loads with the latest HF (even if it doesn't work as an MLM).

Any suggestions?


mandarjoshi90 commented on August 20, 2024

OK. This should be fixable. I think the keys are slightly different. Could you please try this?

from collections import OrderedDict

import torch

model = OrderedDict()
old = torch.load(input_path)
for k, v in old.items():
    # Keep only the encoder weights, stripping the 'decoder.' prefix.
    if k[:12] == 'decoder.bert':
        nk = 'bert' + k[12:]
        # The checkpoint names LayerNorm params 'gamma'/'beta'; HF expects 'weight'/'bias'.
        nk = nk.replace('gamma', 'weight')
        nk = nk.replace('beta', 'bias')
        model[nk] = v

torch.save(model, output_path)

Here input_path is the released checkpoint file and output_path is the converted file.


chrisjbryant commented on August 20, 2024

Excellent - that fixed the loading warning and now all the weights are used.

However, I am also getting the following warning:

Some weights of BertForMaskedLM were not initialized from the model checkpoint at /path/to/spanbert/model/ and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']

And when I use the model to generate predictions for a mask, it doesn't make sensible predictions:
Input: I like to read [MASK] at home
Top 5:

##pect tensor(0.0153, device='cuda:1')
##tures tensor(0.0127, device='cuda:1')
##tend tensor(0.0047, device='cuda:1')
##cess tensor(0.0047, device='cuda:1')
Rail tensor(0.0036, device='cuda:1')
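
(For reference: Chris's exact prediction script isn't shown; below is a minimal sketch of how such top-5 predictions can be produced with transformers, with the model path as a placeholder.)

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("/path/to/spanbert/model/")
model.eval()

inputs = tokenizer("I like to read [MASK] at home", return_tensors="pt")
# Find the position of the [MASK] token.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over the vocabulary at the masked position, then take the top 5.
top = logits[0, mask_pos].softmax(dim=-1).topk(5)
for prob, idx in zip(top.values[0], top.indices[0]):
    print(tokenizer.convert_ids_to_tokens(idx.item()), float(prob))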


mandarjoshi90 commented on August 20, 2024

I suspect this is a mismatch of the keys again. I'd check the keys of the MLM head in the checkpoint and modify them to match those expected by the code. That's basically what I was doing with the script.
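
(A quick way to see such a mismatch, assuming a randomly initialized BertForMaskedLM built from the bert-large-cased config as a stand-in for the expected key layout:)

import torch
from transformers import BertConfig, BertForMaskedLM

# Keys present in the converted checkpoint (placeholder filename).
ckpt_keys = set(torch.load("pytorch_model.bin").keys())

# Keys a BertForMaskedLM of the same size expects (random init; no weights needed).
ref = BertForMaskedLM(BertConfig.from_pretrained("bert-large-cased"))
ref_keys = set(ref.state_dict().keys())

print("in checkpoint but not expected:", sorted(ckpt_keys - ref_keys))
print("expected but missing:", sorted(ref_keys - ckpt_keys))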


chrisjbryant commented on August 20, 2024

Yep, I managed to sort it by modifying your script to the following:

from collections import OrderedDict

import torch

model = OrderedDict()
old = torch.load("pytorch_model.bin.old")
for k, v in old.items():
    # Rename the encoder keys as before.
    if k[:12] == 'decoder.bert':
        nk = 'bert' + k[12:]
        # The checkpoint names LayerNorm params 'gamma'/'beta'; HF expects 'weight'/'bias'.
        nk = nk.replace('gamma', 'weight')
        nk = nk.replace('beta', 'bias')
        model[nk] = v
    # Also keep the MLM head keys ('decoder.cls.*' -> 'cls.*') instead of dropping them.
    elif k[:11] == 'decoder.cls':
        nk = 'cls' + k[11:]
        model[nk] = v

torch.save(model, "pytorch_model.bin")

It didn't work the first time because the original script dropped all the cls weights.

Although I do now get sensible output (yay!), I am still left with a couple of warnings:

Some weights of the model checkpoint at /path/to/model/ were not used when initializing BertForMaskedLM: ['cls.pair_target_predictions.bias', 'cls.pair_target_predictions.position_embeddings.weight', 'cls.pair_target_predictions.mlp_layer_norm.linear1.weight', 'cls.pair_target_predictions.mlp_layer_norm.linear1.bias', 'cls.pair_target_predictions.mlp_layer_norm.linear2.weight', 'cls.pair_target_predictions.mlp_layer_norm.linear2.bias', 'cls.pair_target_predictions.decoder.weight', 'cls.pair_target_predictions.mlp_layer_norm.layer_norm1.weight', 'cls.pair_target_predictions.mlp_layer_norm.layer_norm1.bias', 'cls.pair_target_predictions.mlp_layer_norm.layer_norm2.weight', 'cls.pair_target_predictions.mlp_layer_norm.layer_norm2.bias']
Some weights of BertForMaskedLM were not initialized from the model checkpoint at /path/to/model/ and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'cls.predictions.decoder.bias']

I just want to make sure that the above keys (particularly the pair_target_predictions ones) aren't used by the MLM and that I can safely ignore them!
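
(For reference: as confirmed earlier in the thread, the cls.pair_target_predictions.* keys belong to the SBO head, which BertForMaskedLM never uses. A minimal sketch of stripping them so the first warning disappears; the remaining newly initialized keys, e.g. the pooler, genuinely aren't in the checkpoint. The filename is a placeholder.)

from collections import OrderedDict

import torch

state = torch.load("pytorch_model.bin")
# Drop the SBO head; the MLM only needs the encoder and the cls.predictions head.
clean = OrderedDict(
    (k, v) for k, v in state.items()
    if not k.startswith("cls.pair_target_predictions")
)
torch.save(clean, "pytorch_model.bin")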


Anwarvic commented on August 20, 2024

Hey @mandarjoshi90, @chrisjbryant! Do the warnings mentioned in your previous comment affect the model's performance? If so, how can I get around that?

I know this is kind of an old thread, but I would really appreciate your help!


houliangxue commented on August 20, 2024

> Sorry, I responded to a similar issue a while back but missed this. The model file with the head params is here: https://dl.fbaipublicfiles.com/fairseq/models/spanbert_large_with_head.tar.gz

If I want to continue pre-training spanbert-base, do I have to load a spanbert_base_with_head checkpoint? If so, is there a spanbert_base_with_head?

