Comments (5)
I think there might be something wrong with your model. The output of a pre-trained BERT model should be 768-dimensional. The linear layer in the classification head of our model is defined as self.linear = Linear(768, 1). As you can see, it expects a 768-dimensional output from the BERT model, but the given output seems to be 512-dimensional.
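If you want to double-check, the encoder's output dimensionality is stored in the model config; a minimal check, assuming the gabrielearaujo/bumbert-v3 checkpoint used in this thread:

from transformers import AutoConfig

# hidden_size is the dimensionality of the encoder output that the linear head receives
config = AutoConfig.from_pretrained('gabrielearaujo/bumbert-v3')
print(config.hidden_size)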
You can try to modify the BertClassifierNN class:
from typing import Union

from torch import Tensor
from torch.nn import Linear, Module, Sigmoid
from transformers import BertModel, RobertaModel


class MyClassifierNN(Module):
    def __init__(self, model: Union[BertModel, RobertaModel]):
        super().__init__()
        self.model = model

        # classification head
        self.linear = Linear(512, 1)
        self.sigmoid = Sigmoid()

    def forward(self, input_ids: Tensor, attention_mask: Tensor) -> Tensor:
        x = self.model(input_ids, attention_mask)
        x = x[0][:, 0, :]  # take <s> token (equiv. to [CLS])

        # classification head
        x = self.linear(x)
        x = self.sigmoid(x)
        return x
and pass it to the model instead of pretrained_model_name_or_path:
from transformers import AutoModel

pretrained_model_name_or_path = 'gabrielearaujo/bumbert-v3'
bert = AutoModel.from_pretrained(pretrained_model_name_or_path)
classifier_nn = MyClassifierNN(bert)  # the custom head defined above

model = BertClassifierWithPooling(
    **MODEL_PARAMS,
    device="cuda:0",
    neural_network=classifier_nn,
    pretrained_model_name_or_path=None,
)
However, in the provided model characteristics I see 768-dimensional outputs.
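One likely source of confusion: the 512 in those characteristics is the size of the position-embedding table, Embedding(512, 768), i.e. the maximum sequence length, not the hidden size. A quick way to check the actual output shape, again assuming the gabrielearaujo/bumbert-v3 checkpoint:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gabrielearaujo/bumbert-v3')
bert = AutoModel.from_pretrained('gabrielearaujo/bumbert-v3')

encoded = tokenizer("a short test sentence", return_tensors="pt")
output = bert(**encoded)
# (batch_size, sequence_length, hidden_size) -- hidden_size is what Linear expects;
# sequence_length is capped at 512 by the position embeddings
print(output.last_hidden_state.shape)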
Can you please provide the parameters of the model (the BertClassifierWithPooling parameters)?
I reduced the parameters for testing.
MODEL_PARAMS = {
    "batch_size": 1,
    "learning_rate": 5e-5,
    "epochs": 1,
    "chunk_size": 510,            # tokens per chunk (510 + [CLS] + [SEP] = 512)
    "stride": 510,                # step between chunk starts; 510 means no overlap
    "minimal_chunk_length": 510,  # minimal length for a trailing chunk to be kept
    "pooling_strategy": "mean",
}
model = BertClassifierWithPooling(
    **MODEL_PARAMS,
    device="cuda:0",
    pretrained_model_name_or_path='gabrielearaujo/bumbert-v3',
)
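With chunk_size and stride both set to 510, the text is split into consecutive, non-overlapping chunks of 510 tokens (510 plus [CLS] and [SEP] gives 512, BERT's limit). A standalone sketch of that arithmetic, as an illustration rather than the library's internal code:

tokens = list(range(1600))  # stand-in for the token ids of a long text
chunk_size, stride = 510, 510

# stride == chunk_size means consecutive windows with no overlap
chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), stride)]
print([len(c) for c in chunks])  # [510, 510, 510, 70]
# if minimal_chunk_length works as its name suggests, the 70-token
# remainder would be dropped before pooling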
Is there a possibility to change this parameter to self.linear = Linear(512, 1), as my model is a small version?
The model I am using has the following characteristics:
BertForMaskedLM(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(29794, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
            (intermediate_act_fn): GELUActivation()
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
  )
  (cls): BertOnlyMLMHead(
    (predictions): BertLMPredictionHead(
      (transform): BertPredictionHeadTransform(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (transform_act_fn): GELUActivation()
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      )
      (decoder): Linear(in_features=768, out_features=29794, bias=True)
    )
  )
)
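For reference, only the (bert) encoder of this checkpoint is needed for classification, not the (cls) MLM head; a sketch of how to obtain it, assuming the same checkpoint name:

from transformers import AutoModel, BertForMaskedLM

# AutoModel drops the (cls) MLM head and returns the bare BertModel encoder
encoder = AutoModel.from_pretrained('gabrielearaujo/bumbert-v3')

# equivalently, the encoder can be taken from the full masked-LM model
mlm = BertForMaskedLM.from_pretrained('gabrielearaujo/bumbert-v3')
encoder = mlm.bert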
Related Issues (16)
- Obtain embedding vectors HOT 4
- Write description what does package do in Readme
- Split Sizes Throws Error HOT 2
- Managing GPU memory for token length more than 4000 HOT 1
- MaskedLM for longer texts HOT 1
- QnA system using BERT HOT 2
- Any example colab scripts to fine tune BERT variations for text multi-class classification tasks? HOT 1
- Outputting Attentions HOT 1
- running the fit function doesnt give me any verbose HOT 1
- use it for multiclass classification HOT 1
- Would it be okay to use the code below instead of bert? HOT 1
- text length warning HOT 2
- plz help me HOT 1
- A few general questions HOT 1
- Loss function and optimizer as parameters