Comments (5)
I think there might be something wrong with your model. The output of a pre-trained BERT model should be 768-dimensional. The linear layer in the classification head of our model is defined as self.linear = Linear(768, 1). As you can see, it expects a 768-dimensional output from the BERT model, but the given output seems to be 512-dimensional.
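If you want to double-check, the encoder's output dimensionality is stored in the model config; a minimal check, assuming the gabrielearaujo/bumbert-v3 checkpoint used in this thread:

from transformers import AutoConfig

# hidden_size is the dimensionality of the encoder output that the linear head receives
config = AutoConfig.from_pretrained('gabrielearaujo/bumbert-v3')
print(config.hidden_size)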
You can try to modify the BertClassifierNN class:
from typing import Union

from torch import Tensor
from torch.nn import Linear, Module, Sigmoid
from transformers import BertModel, RobertaModel


class MyClassifierNN(Module):
    def __init__(self, model: Union[BertModel, RobertaModel]):
        super().__init__()
        self.model = model

        # classification head
        self.linear = Linear(512, 1)
        self.sigmoid = Sigmoid()

    def forward(self, input_ids: Tensor, attention_mask: Tensor) -> Tensor:
        x = self.model(input_ids, attention_mask)
        x = x[0][:, 0, :]  # take <s> token (equiv. to [CLS])

        # classification head
        x = self.linear(x)
        x = self.sigmoid(x)
        return x
and pass it to the model instead of pretrained_model_name_or_path:
from transformers import AutoModel

pretrained_model_name_or_path = 'gabrielearaujo/bumbert-v3'
bert = AutoModel.from_pretrained(pretrained_model_name_or_path)
classifier_nn = MyClassifierNN(bert)  # the custom head defined above

model = BertClassifierWithPooling(
    **MODEL_PARAMS,
    device="cuda:0",
    neural_network=classifier_nn,
    pretrained_model_name_or_path=None,
)
However, in the provided model characteristics I see 768-dimensional outputs.
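One likely source of confusion: the 512 in those characteristics is the size of the position-embedding table, Embedding(512, 768), i.e. the maximum sequence length, not the hidden size. A quick way to check the actual output shape, again assuming the gabrielearaujo/bumbert-v3 checkpoint:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gabrielearaujo/bumbert-v3')
bert = AutoModel.from_pretrained('gabrielearaujo/bumbert-v3')

encoded = tokenizer("a short test sentence", return_tensors="pt")
output = bert(**encoded)
# (batch_size, sequence_length, hidden_size) -- hidden_size is what Linear expects;
# sequence_length is capped at 512 by the position embeddings
print(output.last_hidden_state.shape)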
Can you please provide the parameters of the model (the BertClassifierWithPooling parameters)?
I reduced the parameters for testing.
MODEL_PARAMS = {
    "batch_size": 1,
    "learning_rate": 5e-5,
    "epochs": 1,
    "chunk_size": 510,            # tokens per chunk (510 + [CLS] + [SEP] = 512)
    "stride": 510,                # step between chunk starts; 510 means no overlap
    "minimal_chunk_length": 510,  # minimal length for a trailing chunk to be kept
    "pooling_strategy": "mean",
}
model = BertClassifierWithPooling(
    **MODEL_PARAMS,
    device="cuda:0",
    pretrained_model_name_or_path='gabrielearaujo/bumbert-v3',
)
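With chunk_size and stride both set to 510, the text is split into consecutive, non-overlapping chunks of 510 tokens (510 plus [CLS] and [SEP] gives 512, BERT's limit). A standalone sketch of that arithmetic, as an illustration rather than the library's internal code:

tokens = list(range(1600))  # stand-in for the token ids of a long text
chunk_size, stride = 510, 510

# stride == chunk_size means consecutive windows with no overlap
chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), stride)]
print([len(c) for c in chunks])  # [510, 510, 510, 70]
# if minimal_chunk_length works as its name suggests, the 70-token
# remainder would be dropped before pooling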
Is there a possibility to change this parameter to self.linear = Linear(512, 1), as my model is a small version?
The model I am using has the following characteristics:
BertForMaskedLM(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(29794, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
            (intermediate_act_fn): GELUActivation()
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
  )
  (cls): BertOnlyMLMHead(
    (predictions): BertLMPredictionHead(
      (transform): BertPredictionHeadTransform(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (transform_act_fn): GELUActivation()
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      )
      (decoder): Linear(in_features=768, out_features=29794, bias=True)
    )
  )
)
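For reference, only the (bert) encoder of this checkpoint is needed for classification, not the (cls) MLM head; a sketch of how to obtain it, assuming the same checkpoint name:

from transformers import AutoModel, BertForMaskedLM

# AutoModel drops the (cls) MLM head and returns the bare BertModel encoder
encoder = AutoModel.from_pretrained('gabrielearaujo/bumbert-v3')

# equivalently, the encoder can be taken from the full masked-LM model
mlm = BertForMaskedLM.from_pretrained('gabrielearaujo/bumbert-v3')
encoder = mlm.bert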
Related Issues (16)
- Obtain embedding vectors HOT 4
- Write description what does package do in Readme
- Split Sizes Throws Error HOT 2
- Managing GPU memory for token length more than 4000 HOT 1
- MaskedLM for longer texts HOT 1
- QnA system using BERT HOT 2
- Any example colab scripts to fine tune BERT variations for text multi-class classification tasks? HOT 1
- Outputting Attentions HOT 1
- running the fit function doesnt give me any verbose HOT 1
- use it for multiclass classification HOT 1
- Would it be okay to use the code below instead of bert? HOT 1
- text length warning HOT 2
- plz help me HOT 1
- A few general questions HOT 1
- Loss function and optimizer as parameters