cyberzhg / keras-bert
Implementation of BERT that can load the official pre-trained models for feature extraction and prediction
License: MIT License
Environment: TensorFlow 1.13.1, Python 3.6, Ubuntu 18.04, CUDA 10.0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/notebooks/ner.py", line 95, in train
embedding = BERTEmbedding('./bert', seq_len)
File "/usr/local/lib/python3.6/dist-packages/kashgari/embeddings/embeddings.py", line 69, in __init__
self.build(**kwargs)
File "/usr/local/lib/python3.6/dist-packages/kashgari/embeddings/embeddings.py", line 301, in build
seq_len=self.sequence_length)
File "/usr/local/lib/python3.6/dist-packages/keras_bert/loader.py", line 71, in load_trained_model_from_checkpoint
loader('bert/encoder/layer_%d/attention/self/value/kernel' % i),
File "/usr/local/lib/python3.6/dist-packages/keras_bert/loader.py", line 10, in _loader
return tf.train.load_variable(checkpoint_file, name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 82, in load_variable
return reader.get_tensor(name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 370, in get_tensor
status)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.DataLossError: Checksum does not match: stored 3531060969 vs. calculated on the restored bytes 1701788620
I am trying to use load_and_extract.py but am unable to figure out what we need to pass for the 'DICT' path.
Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
Describe the solution you'd like
A clear and concise description of what you want to happen.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.
As per the title: I have been using BERT recently, and it really does work well on small datasets. I'd like to ask whether this package supports it.
In bert.py, the special tokens in base_dict use angle brackets, but in the pre-trained BERT models released by Google, the vocab file uses square brackets; e.g., your TOKEN_UNK is '<UNK>' while in the vocab file it is '[UNK]'. Also, your padding token is an empty string, while the vocab file's padding token is '[PAD]'. This causes problems when trying to adapt your bert_fit_demo to continue training the released models instead of training a model from scratch.
I am using this file, and after installing it with pip3 install keras_bert --user I got it working; before that it was not. I am wondering which BERT model it is using?
I want to compute the similarity between two sentences (sentA and sentB). I have encoded each sentence using the script load_and_extract.py, so the embedding matrices of sentA and sentB each have shape (1, 512, 768). After that, I am thinking of adding a fully connected layer to compute the similarity between the two sentences.
Note: I am using the base model (with 12 hidden layers).
Question: Is this the right approach to using BERT for sentence similarity? Furthermore, I have also seen people using MaskedGlobalMaxPool1D after the hidden layers to encode the sentences. Do I have to take the embeddings after applying MaskedGlobalMaxPool1D? Why is MaskedGlobalMaxPool1D needed?
Thanks in advance.
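Not an authoritative answer, but a minimal sketch of the kind of similarity head described above, assuming the two (1, 512, 768) matrices have first been pooled to fixed-size 768-dimensional vectors (for example by mean pooling or MaskedGlobalMaxPool1D); the layer names and the feature construction are illustrative only:

import keras
from keras import backend as K

# Hypothetical pooled sentence vectors, e.g. the mean over the 512 token
# embeddings produced by load_and_extract.py (shape (1, 512, 768) -> (1, 768)).
vec_a = keras.layers.Input(shape=(768,), name='sent_a')
vec_b = keras.layers.Input(shape=(768,), name='sent_b')

# One common choice: concatenate [a, b, |a - b|] and score with dense layers.
diff = keras.layers.Lambda(lambda t: K.abs(t[0] - t[1]))([vec_a, vec_b])
features = keras.layers.Concatenate()([vec_a, vec_b, diff])
hidden = keras.layers.Dense(256, activation='relu')(features)
score = keras.layers.Dense(1, activation='sigmoid', name='similarity')(hidden)

model = keras.models.Model([vec_a, vec_b], score)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

That is also why people insert MaskedGlobalMaxPool1D: it collapses the variable-length token axis to a single fixed-size vector while ignoring the padded positions.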
Hello,
Thank you for implementing the Keras version of BERT.
Is there any way to get an encoded sentence vector?
I have a sentence like:
The Top 25 Songs That Matter Right Now
and I want to get the encoded sentence vector after feeding this sentence to BERT.
Thanks again.
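A rough sketch of one way to do this with the checkpoint-loading API, assuming a keras-bert version that exports Tokenizer and load_vocabulary and the standard uncased base checkpoint; taking the vector at the [CLS] position (or averaging over all tokens) is my own choice, not something the library mandates:

import numpy as np
from keras_bert import load_trained_model_from_checkpoint, load_vocabulary, Tokenizer

# Paths into an unpacked official checkpoint (placeholders).
config_path = 'uncased_L-12_H-768_A-12/bert_config.json'
checkpoint_path = 'uncased_L-12_H-768_A-12/bert_model.ckpt'
vocab_path = 'uncased_L-12_H-768_A-12/vocab.txt'

model = load_trained_model_from_checkpoint(config_path, checkpoint_path, seq_len=64)
token_dict = load_vocabulary(vocab_path)
tokenizer = Tokenizer(token_dict)

indices, segments = tokenizer.encode('The Top 25 Songs That Matter Right Now', max_len=64)
token_embeddings = model.predict([np.array([indices]), np.array([segments])])[0]

cls_vector = token_embeddings[0]              # vector at the [CLS] position
mean_vector = token_embeddings.mean(axis=0)   # or average over all token vectors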
Hi,
Thanks for the repo and your time! Do you know if this is able to reproduce the original results in terms of accuracy and model speed?
Thank you for your time and God bless!
(Sorry I mislabeled this as a bug.)
Hi. When using the BERT model for classification tasks with the following code: if trainable=True is set and any classifier is then added, the results are biased towards a single category, whether the task is multi-class or binary classification. In that case model.summary() reports Non-trainable params: 0.
But if trainable is not set and the rest of the code is exactly the same, the results are quite normal; at that point Non-trainable params: 101,306,880.
So how do I fine-tune correctly?
bert_model = load_trained_model_from_checkpoint(config_path, checkpoint_path, seq_len=maxlen,
training=False, trainable=True)
pool_layer = MaskedGlobalMaxPool1D(name='Pooling')(bert_model.output)
out = Dense(32, activation='relu')(pool_layer)
output = Dense(units=class_num, activation=output_activation)(out)
model = Model(bert_model.input, output)
model.compile(loss=model_loss, optimizer=model_optimizer, metrics=['categorical_accuracy'])
Is your feature request related to a problem? Please describe.
In the original BERT implementation from Google, the tokenizer does not perform normalization (lower casing, accent stripping, or Unicode normalization) on the input when using a cased model, e.g. Multilingual Cased.
Describe the solution you'd like
In the Tokenizer class, prevent normalization of the input when the model is cased.
Additional context
Lines 71 - 73 in tokenizer.py
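A hedged sketch of the kind of guard being requested, written as a standalone helper rather than a patch to the library (the cased flag and the normalize_token helper here are illustrative assumptions, not existing keras-bert API):

import unicodedata

def normalize_token(text, cased=False):
    """Mimic BERT's basic normalization, but skip it for cased models."""
    if cased:
        # Cased checkpoints (e.g. Multilingual Cased) expect the raw text.
        return text
    # Uncased behaviour: lower-case and strip accents.
    text = text.lower()
    text = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in text if unicodedata.category(ch) != 'Mn')

print(normalize_token('Çà', cased=True))   # 'Çà'  (left untouched)
print(normalize_token('Çà', cased=False))  # 'ca'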
Inconsistent definition of the training param of the load_trained_model_from_checkpoint function
I loaded the pre-trained BERT model from an official tf checkpoint, using load_trained_model_from_checkpoint with param training=False.
I don't want to train the BERT model from scratch (i.e. with MLM or NSP); however, I do want my downstream data to somehow update the params inside the BERT model. As shown in the figure below, the bert model is trainable as a Keras model, yet all the weights inside the model are non-trainable.
I'm confused: is there anything wrong with my code, or with the training param?
def _custom_layers(x, trainable=True):
    return keras.layers.LSTM(
        units=768,
        trainable=trainable,
        name='LSTM',
    )(x)
It seems that the LSTM in custom_layers should add return_sequences=True.
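For reference, a corrected sketch of that helper (the assumption being that the layers stacked on top expect per-token outputs, which is why the sequence dimension must be kept):

import keras

def _custom_layers(x, trainable=True):
    # return_sequences=True keeps the (batch, seq_len, units) shape expected
    # by the heads that are applied on top of the custom layers.
    return keras.layers.LSTM(
        units=768,
        trainable=trainable,
        return_sequences=True,
        name='LSTM',
    )(x)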
Is your feature request related to a problem? Please describe.
When the model is saved and then loaded, an error occurs due to "mask".
So far the workaround is to save and load the weights only.
Thanks
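A possible workaround sketch, assuming a keras-bert version that exposes get_custom_objects(); registering the custom layers this way usually lets keras.models.load_model resolve them, though saving and loading weights as described above remains the safer route:

import keras
from keras_bert import get_custom_objects

# Save the full model as usual (model is the one built on top of keras-bert).
model.save('bert_classifier.h5')

# Reload it, telling Keras how to deserialize the keras-bert custom layers.
restored = keras.models.load_model('bert_classifier.h5',
                                   custom_objects=get_custom_objects())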
Is your feature request related to a problem? Please describe.
Some words are outside vocab.txt (which is from https://github.com/google-research/bert and contains 30,522 words); e.g., "edits" is OOV.
Describe the solution you'd like
I am thinking of randomly initializing a 512-dimensional embedding for the OOV words, but I am not sure whether that works, or how to do it.
Besides, since OOV is a very common problem, maybe there is already some off-the-shelf solution for keras-BERT?
Many thanks
Describe alternatives you've considered
NA
Additional context
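For what it's worth, BERT's WordPiece vocabulary is designed so that words missing from vocab.txt are split into subword pieces rather than mapped to [UNK]. A small sketch with keras-bert's Tokenizer (the vocab path is a placeholder):

from keras_bert import Tokenizer, load_vocabulary

token_dict = load_vocabulary('uncased_L-12_H-768_A-12/vocab.txt')
tokenizer = Tokenizer(token_dict)

# tokenize() wraps the result with [CLS]/[SEP]; the missing word is split into
# word pieces (typically something like ['edit', '##s']) instead of becoming [UNK].
print(tokenizer.tokenize('edits'))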
Hi,
is it possible to apply the fine tuning of the original BERT model on SQuAD dataset as in the paper?
I am trying to run the following code:
from keras_bert import get_base_dict, get_model, gen_batch_inputs

sentence_pairs = [
    [['all', 'work', 'and', 'no', 'play'], ['makes', 'jack', 'a', 'dull', 'boy']],
    [['from', 'the', 'day', 'forth'], ['my', 'arm', 'changed']],
    [['and', 'a', 'voice', 'echoed'], ['power', 'give', 'me', 'more', 'power']],
]

token_dict = get_base_dict()  # A dict that contains some special tokens
for pairs in sentence_pairs:
    for token in pairs[0] + pairs[1]:
        if token not in token_dict:
            token_dict[token] = len(token_dict)
token_list = list(token_dict.keys())  # Used for selecting a random word

model = get_model(
    token_num=len(token_dict),
    head_num=5,
    transformer_num=12,
    embed_dim=25,
    feed_forward_dim=100,
    seq_len=20,
    pos_num=20,
    dropout_rate=0.05,
)
model.summary()
But it shows me this error:
ValueError Traceback (most recent call last)
in ()
28 seq_len=20,
29 pos_num=20,
---> 30 dropout_rate=0.05,
31 )
32 model.summary()
~\Anaconda3\lib\keras_bert\bert.py in get_model(token_num, pos_num, seq_len, embed_dim, transformer_num, head_num, feed_forward_dim, dropout_rate, attention_activation, feed_forward_activation, custom_layers, training, lr)
63 pos_num=pos_num,
64 dropout_rate=dropout_rate,
---> 65 trainable=training,
66 )
67 transformed = embed_layer
~\Anaconda3\lib\keras_bert\layers\embedding.py in get_embedding(inputs, token_num, pos_num, embed_dim, dropout_rate, trainable)
54 trainable=trainable,
55 name='Embedding-Position',
---> 56 )(embed_layer)
57 if dropout_rate > 0.0:
58 dropout_layer = keras.layers.Dropout(
~\Anaconda3\lib\site-packages\keras\engine\base_layer.py in call(self, inputs, **kwargs)
455 # Actually call the layer,
456 # collecting output(s), mask(s), and shape(s).
--> 457 output = self.call(inputs, **kwargs)
458 output_mask = self.compute_mask(inputs, previous_mask)
459
~\Anaconda3\lib\keras_pos_embd\pos_embd.py in call(self, inputs, **kwargs)
128 pos_embeddings = K.tile(
129 K.expand_dims(self.embeddings[:seq_len, :self.output_dim], axis=0),
--> 130 K.stack([batch_size, 1, 1]),
131 )
132 if self.mode == self.MODE_ADD:
~\Anaconda3\lib\site-packages\keras\backend\theano_backend.py in tile(x, n)
1067
1068 def tile(x, n):
-> 1069 y = T.tile(x, n)
1070 if hasattr(x, '_keras_shape'):
1071 if _is_explicit_shape(n):
~\Anaconda3\lib\site-packages\theano\tensor\basic.py in tile(x, reps, ndim)
5413 elif ndim_check == 1:
5414 if ndim is None:
-> 5415 raise ValueError("if reps is tensor.vector, you should specify "
5416 "the ndim")
5417 else:
ValueError: if reps is tensor.vector, you should specify the ndim
Any help would be appreciated.
Describe the Bug
When I use keras_bert to run the demo https://github.com/CyberZHG/keras-bert/blob/master/demo/load_model/load_and_extract.py, it gives me an error about importing the Tokenizer. The detailed description is shown below.
What is the cause of this problem?
Thanks
Minimal Codes To Reproduce
import keras_bert
from keras_bert import load_trained_model_from_checkpoint
from keras_bert import Tokenizer
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-27-cdcfeee15447> in <module>()
1 from keras_bert import load_trained_model_from_checkpoint
----> 2 from keras_bert import Tokenizer
ImportError: cannot import name 'Tokenizer'
It seems that this parameter should be compatible with the pre-trained BERT model, but where can I find the setting in the downloaded bert_config?
How can I use this library for sentence pair classification?
I want to use keras-bert in my project; it is cleaner than the original BERT source for me. But I don't know how to convert the original BERT classifier code to a keras-bert classifier.
Could you provide a demo of a classifier based on model.get_pooled_output()?
Regards.
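There is no official get_pooled_output() demo here that I know of, but a head that is equivalent in spirit (a dense tanh layer over the [CLS] position) can be sketched as follows; the paths and class count are placeholders, and the Lambda slice is my own choice rather than a library API:

import keras
from keras_bert import load_trained_model_from_checkpoint

config_path = 'uncased_L-12_H-768_A-12/bert_config.json'
checkpoint_path = 'uncased_L-12_H-768_A-12/bert_model.ckpt'
SEQ_LEN = 128
NUM_CLASSES = 2

bert = load_trained_model_from_checkpoint(config_path, checkpoint_path,
                                          training=False, trainable=True,
                                          seq_len=SEQ_LEN)

# Take the vector at the [CLS] position and apply a dense tanh layer,
# mirroring what get_pooled_output() does in the original implementation.
cls = keras.layers.Lambda(lambda x: x[:, 0, :], name='CLS')(bert.output)
pooled = keras.layers.Dense(768, activation='tanh', name='Pooler')(cls)
probs = keras.layers.Dense(NUM_CLASSES, activation='softmax', name='Classifier')(pooled)

model = keras.models.Model(bert.inputs, probs)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

For sentence-pair classification the same head applies: the pair is encoded into a single sequence, with segment ids 0 for the first sentence and 1 for the second.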
Hi,
in https://github.com/CyberZHG/keras-bert/blob/master/demo/load_model/load_and_extract.py#L28
the sequence input is encoded as
seg_input = np.asarray([[0] * len(tokens) + [0] * (512 - len(tokens))])
I think the second part should be [1], i.e.
seg_input = np.asarray([[0] * len(tokens) + [1] * (512 - len(tokens))])
Only then is the padding part masked correctly. For the demo it does not make any difference directly, as only the first len(tokens) tokens are checked. But when this is used as the basis for something more complex, it might create a problem.
Hello, I'd like to ask: when loading the BERT model, is it possible not to specify the seq length? For example, when predicting in an NER task, every sentence has a different length.
Does it support single-sentence classification (not sentence pairs)?
Such as SST (http://nlp.stanford.edu/~socherr/stanfordSentimentTreebank.zip) or Cola dataset (https://nyu-mll.github.io/CoLA/), described in original BERT paper https://arxiv.org/pdf/1810.04805.pdf Section 4.1 GLUE dataset
If yes, what data format should it be in?
Can we simply input [sentence, padding_zeros] as a pair and get reasonable training?
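A small sketch of how a single sentence is typically encoded for classification: there is simply no second sentence, so the segment ids stay all zeros (the vocab path is a placeholder):

import numpy as np
from keras_bert import Tokenizer, load_vocabulary

token_dict = load_vocabulary('uncased_L-12_H-768_A-12/vocab.txt')
tokenizer = Tokenizer(token_dict)

# Encoding a single sentence gives [CLS] tokens... [SEP] plus zero padding,
# and the segment ids are all zeros because there is no second sentence.
indices, segments = tokenizer.encode('the movie was surprisingly good', max_len=128)
x = [np.array([indices]), np.array([segments])]  # feed this pair of arrays to the model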
Is it possible to change the input shape from 512 to 20 when using a pre-trained model?
I want to fine-tune a short-text classification model with Keras.
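If I understand the loader correctly, the seq_len argument does exactly this by cutting the position embeddings down to the requested length, so something like the following should work (paths are placeholders):

from keras_bert import load_trained_model_from_checkpoint

config_path = 'uncased_L-12_H-768_A-12/bert_config.json'
checkpoint_path = 'uncased_L-12_H-768_A-12/bert_model.ckpt'

# seq_len=20 builds the graph with 20-token inputs; the pre-trained position
# embeddings are simply cut down to the first 20 positions.
model = load_trained_model_from_checkpoint(config_path, checkpoint_path,
                                           training=False, trainable=True,
                                           seq_len=20)
print(model.input_shape)  # expected: [(None, 20), (None, 20)]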
Great job! Can I apply it to the cloze-style QA task, which is to predict the words that are masked according to the context? How should the input of the model be organized?
For example:
The dataset looks like:
From Monday to Friday most people are busy working or studying, but in the evenings and weekends they are free and _ themselves.
And there are four candidate answers:
"options": [
    ["love", "work", "enjoy", "play"]
]
Apparently the correct answer is "enjoy". How can I organize the input so that BERT can predict the missing word given the context? Thank you for your excellent code!
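A rough sketch of one way to do this, under the assumption that the model loaded with training=True exposes the MLM head as its first output and takes a third input marking the masked positions; paths are placeholders and the sentence is shortened:

import numpy as np
from keras_bert import load_trained_model_from_checkpoint, load_vocabulary, Tokenizer

config_path = 'uncased_L-12_H-768_A-12/bert_config.json'
checkpoint_path = 'uncased_L-12_H-768_A-12/bert_model.ckpt'
vocab_path = 'uncased_L-12_H-768_A-12/vocab.txt'
SEQ_LEN = 64

token_dict = load_vocabulary(vocab_path)
tokenizer = Tokenizer(token_dict)

# training=True keeps the MLM head; the first output is then a per-position
# distribution over the vocabulary.
model = load_trained_model_from_checkpoint(config_path, checkpoint_path,
                                           training=True, seq_len=SEQ_LEN)

text = 'but in the evenings and weekends they are free and enjoy themselves'
indices, segments = tokenizer.encode(text, max_len=SEQ_LEN)

blank = indices.index(token_dict['enjoy'])   # position of the word to predict
indices[blank] = token_dict['[MASK]']
masked = [0] * SEQ_LEN
masked[blank] = 1

mlm_probs = model.predict([np.array([indices]),
                           np.array([segments]),
                           np.array([masked])])[0]

# Score each candidate answer by its probability at the blank position.
for candidate in ['love', 'work', 'enjoy', 'play']:
    print(candidate, mlm_probs[0, blank, token_dict[candidate]])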
Awesome job!
Do you plan to support the BERT pre-trained weights, now that they have been released?
Hello!
I am trying to build a classifier on top of the keras-bert model. However, when I attempt to run model.fit on the new model, I constantly get ValueError: An operation has `None` for gradient.
Upon further testing, running model.fit on the loaded keras-bert model alone also gives this error. As such, may I clarify whether fit_generator must be used for fine-tuning the BERT model?
(My rationale for not using the original BERT code to fine-tune the model is that I am making use of Keras shared layers as part of my classifier, which does not appear to be doable with TensorFlow alone.)
Thank you so much!
Hello,
Your code has helped me a lot. Right now I am getting token-level vectors; how can I obtain sentence vectors?
How do I feed a batch of data to the model?
model.input_shape is [(None, 512), (None, 512)].
When I tried to feed the model a batch of data of shape [64, 2, 512], it gave:
"ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 2 array(s), but instead got the following list of 1 arrays ..."
I was using something like this:
from keras.models import Model
from keras.layers import Dense, Lambda
import keras.backend as K
from keras_bert import get_model

def mean(x):
    return K.mean(x, axis=1, keepdims=False)

inputs, embeds = get_model(
    token_num=len(token_dict),
    head_num=12,
    transformer_num=12,
    embed_dim=768,
    feed_forward_dim=100,
    seq_len=512,
    pos_num=512,
    dropout_rate=0.05,
    training=False,
)

avg_embeds = Lambda(mean)(embeds)
pred = Dense(1, activation="sigmoid")(avg_embeds)
model = Model(inputs=inputs, outputs=pred)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
Thanks
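If it helps, the error goes away once the batch is split into the two arrays the model expects; a sketch, assuming the model built above and dummy data standing in for the real batch:

import numpy as np

# Hypothetical batch: 64 examples, each holding token ids and segment ids.
data = np.zeros((64, 2, 512))
labels = np.zeros((64, 1))

token_input = data[:, 0, :]    # shape (64, 512)
segment_input = data[:, 1, :]  # shape (64, 512)

# The model has two named inputs, so Keras expects a list of two arrays
# rather than a single array with an extra axis.
model.fit([token_input, segment_input], labels, batch_size=8, epochs=1)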
Is your feature request related to a problem? Please describe.
I get an error when adding a Conv1D layer on top of the BERT embeddings, complaining about "mask".
Describe the solution you'd like
Can you make a MaskedConv1D layer class, similar to MaskedGlobalMaxPool1D?
Describe alternatives you've considered
PR
Additional context
N/A
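In the meantime, a sketch of such a layer (my own code, not part of keras-bert): it accepts the mask, zeroes the masked timesteps, and passes the mask through so downstream masked layers keep working.

import keras
from keras import backend as K

class MaskedConv1D(keras.layers.Conv1D):
    """Conv1D that accepts a mask, zeroes masked timesteps, and passes the mask on.

    With kernel_size > 1 the convolution still sees the zeroed padding positions
    inside its window; use padding='same' so the mask length matches the output.
    """

    def __init__(self, *args, **kwargs):
        super(MaskedConv1D, self).__init__(*args, **kwargs)
        self.supports_masking = True

    def compute_mask(self, inputs, mask=None):
        return mask  # keep the incoming mask for downstream masked layers

    def call(self, inputs, mask=None):
        if mask is not None:
            inputs = inputs * K.expand_dims(K.cast(mask, K.floatx()), axis=-1)
        return super(MaskedConv1D, self).call(inputs)

# Example usage on top of the BERT output:
# conv = MaskedConv1D(filters=128, kernel_size=3, padding='same', activation='relu')(bert_model.output)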
How do I feed NER data (not sentence pairs) to the model?
Thanks for your great work.
I see that the original BERT prediction returns 3 numpy arrays, but your demo (load_and_predict.py) only returns 2 numpy arrays. I am not sure whether I am misunderstanding you or the original BERT.
Describe the Bug
On tensorflow 2.0, there is no tf.placeholder
Traceback (most recent call last):
File "load_and_extract.py", line 18, in <module>
model = load_trained_model_from_checkpoint(config_path, checkpoint_path)
File "/usr/local/lib/python3.5/dist-packages/keras_bert/loader.py", line 43, in load_trained_model_from_checkpoint
training=training,
File "/usr/local/lib/python3.5/dist-packages/keras_bert/bert.py", line 58, in get_model
inputs = get_inputs(seq_len=seq_len)
File "/usr/local/lib/python3.5/dist-packages/keras_bert/layers/inputs.py", line 15, in get_inputs
) for name in names]
File "/usr/local/lib/python3.5/dist-packages/keras_bert/layers/inputs.py", line 15, in <listcomp>
) for name in names]
File "/usr/local/lib/python3.5/dist-packages/keras/engine/input_layer.py", line 178, in Input
input_tensor=tensor)
File "/usr/local/lib/python3.5/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/input_layer.py", line 87, in __init__
name=self.name)
File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 517, in placeholder
x = tf.placeholder(dtype, shape=shape, name=name)
AttributeError: module 'tensorflow' has no attribute 'placeholder'
Version Info
tf-nightly-gpu-2.0-preview 2.0.0.dev20190304
Keras 2.2.4
python 3.5.2
BERT supposedly generates 768-dimensional embeddings for tokens. I am trying to build a multi-class classification model on top of this. My assumption is that the output of the Encoder-12-FeedForward-Norm layer, of shape (None, [seq_length], 768), would give these embeddings. This is what I am trying:
from keras.models import Model
from keras.layers import Bidirectional, LSTM, GlobalMaxPool1D, Dense, Dropout, Lambda
from keras_bert import load_trained_model_from_checkpoint

model = load_trained_model_from_checkpoint(config_path, checkpoint_path, training=True, seq_len=seq_len)
new_out = Bidirectional(LSTM(50, return_sequences=True,
                             dropout=0.1,
                             recurrent_dropout=0.1))(model.layers[-9].output)
new_out = GlobalMaxPool1D()(new_out)
new_out = Dense(50, activation='relu')(new_out)
new_out = Dropout(0.1)(new_out)
new_out = Dense(6, activation='sigmoid')(new_out)
newModel = Model(model.inputs[:2], new_out)
I get the following error for new_out = GlobalMaxPool1D()(new_out):
TypeError: Layer global_max_pooling1d_11 does not support masking, but was passed an input_mask: Tensor("Encoder-12-FeedForward-Add/All:0", shape=(?, 128), dtype=bool)
I am not sure how masking is involved if I am just using the output of the encoder.
The paper mentions that only the output corresponding to the first [CLS] token should be used for classification. On trying this:
new_out = Lambda(lambda x: x[:,0,:])(model.layers[-9].output)
the model trains (although with poor results).
How can the pre-loaded model be used for classification?
Hi,
I found embedding.py:
https://github.com/CyberZHG/keras-bert/blob/master/keras_bert/layers/embedding.py
But how can I add it to my Keras model, and how do I feed data into the pre-trained BERT embedding?
My Keras model is:
model = Sequential()
model.add(TokenEmbedding)
model.add(Bidirectional(LSTM(256, return_sequences=True)))
model.add(Dropout(0.5))
model.add(TimeDistributed(Dense(len(tag2index))))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=Adam(0.001),metrics=['accuracy'])
This gives an error; the error message is:
TypeError: The added layer must be an instance of class Layer. Found: <class 'main.TokenEmbedding'>
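If it helps, the usual pattern is to load the pre-trained model with load_trained_model_from_checkpoint and build on top of its output with the functional API, rather than adding keras-bert layer classes to a Sequential model. A sketch; the paths, tag set, and sequence length are placeholders standing in for your own setup:

import keras
from keras.layers import Bidirectional, LSTM, Dropout, TimeDistributed, Dense
from keras.optimizers import Adam
from keras_bert import load_trained_model_from_checkpoint

config_path = 'uncased_L-12_H-768_A-12/bert_config.json'
checkpoint_path = 'uncased_L-12_H-768_A-12/bert_model.ckpt'
tag2index = {'O': 0, 'B-PER': 1, 'I-PER': 2}  # placeholder tag set

bert = load_trained_model_from_checkpoint(config_path, checkpoint_path,
                                          training=False, trainable=True,
                                          seq_len=128)

# BERT's per-token output replaces the TokenEmbedding layer of the Sequential model.
x = Bidirectional(LSTM(256, return_sequences=True))(bert.output)
x = Dropout(0.5)(x)
out = TimeDistributed(Dense(len(tag2index), activation='softmax'))(x)

model = keras.models.Model(bert.inputs, out)
model.compile(loss='categorical_crossentropy', optimizer=Adam(0.001),
              metrics=['accuracy'])

The inputs are then the token index and segment id arrays produced by the Tokenizer, rather than raw tokens.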
When I try to use load_trained_model_from_checkpoint() to load the official BERT-base uncased model (https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip), I get the following error:
Traceback (most recent call last):
File "ner_training_code_BERT.py", line 139, in <module>
model, loss = build_model()
File "ner_training_code_BERT.py", line 67, in build_model
bert_model = load_trained_model_from_checkpoint(_bert_config_path, _bert_checkpoint_path, training=False)
File "/Users/scguo/miniconda3/envs/py36/lib/python3.6/site-packages/keras_bert/loader.py", line 26, in load_trained_model_from_checkpoint
tf.train.load_variable(checkpoint_file, 'bert/embeddings/word_embeddings'),
File "/Users/scguo/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 81, in load_variable
return reader.get_tensor(name)
File "/Users/scguo/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 334, in get_tensor
status)
File "/Users/scguo/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: bert/embeddings/word_embeddings not found in checkpoint file
It seems like the op names we are expecting do not match the official model?
Let's say I'm trying to load a pretrained BERT model from a checkpoint, add layers after it and build the model with Model.compile():
What does the input have to look like if BERT is the first "layer"? It would be nice to have a simple example.
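A tiny sketch of the whole flow; the paths, texts, sequence length, and the classification head are all placeholders of my own choosing:

import numpy as np
import keras
from keras_bert import load_trained_model_from_checkpoint, load_vocabulary, Tokenizer

config_path = 'uncased_L-12_H-768_A-12/bert_config.json'
checkpoint_path = 'uncased_L-12_H-768_A-12/bert_model.ckpt'
vocab_path = 'uncased_L-12_H-768_A-12/vocab.txt'
SEQ_LEN = 128

bert = load_trained_model_from_checkpoint(config_path, checkpoint_path,
                                          training=False, trainable=True,
                                          seq_len=SEQ_LEN)

# Add layers after BERT and compile as usual.
cls = keras.layers.Lambda(lambda x: x[:, 0, :])(bert.output)
out = keras.layers.Dense(2, activation='softmax')(cls)
model = keras.models.Model(bert.inputs, out)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

# The "input" is a list of two arrays: token indices and segment ids,
# both of shape (batch_size, SEQ_LEN).
tokenizer = Tokenizer(load_vocabulary(vocab_path))
encoded = [tokenizer.encode(t, max_len=SEQ_LEN) for t in ['good movie', 'bad movie']]
token_ids = np.array([e[0] for e in encoded])
segment_ids = np.array([e[1] for e in encoded])
labels = np.array([1, 0])

model.fit([token_ids, segment_ids], labels, epochs=1, batch_size=2)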
Hello, thanks for your effort and work on this project. I am attempting to train two models using the same data and generator code (based on gen_batch_inputs in bert.py), once using get_model and once using load_model_from_checkpoint. As an aside, I am still unsure whether the second option is OK: your readme states that "Official pre-trained models could be loaded for feature extraction and prediction", but in Issue #1 you seem to say that the official models cannot be loaded correctly with this implementation.
In any case, when using get_model, training proceeds as expected, but when using load_model_from_checkpoint I get:
ValueError: Error when checking target: expected MLM to have shape (512, 30522) but got array with shape (512,1)
(30522 is the length of my token_list / size of my token_dict)
Am I missing an obvious reason for this, or is this a problem with the code?
Is your feature request related to a problem? Please describe.
I wish to fine-tune BERT (MLM, sentence pairs) on a custom dataset, e.g. text extracted from a book.
Describe the solution you'd like
Describe alternatives you've considered
Which function can we feed a custom dataset into, for example a text file from a book?
Do we need to write a function to format the text file so that it can be consumed by BERT?
Additional context
I have my own data chunks and vocab. I want to train BERT in Keras. However, I could not find a function to build the sentence_pairs that are used as a parameter of the gen_batch_inputs function.
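A rough sketch of building the sentence_pairs structure that gen_batch_inputs expects (lists of token lists for consecutive sentences) from a plain text file; the whitespace tokenization here is a deliberate simplification, and the file name is a placeholder:

from keras_bert import get_base_dict, gen_batch_inputs

# Split the book into sentences and tokenize them (crudely, by whitespace).
with open('book.txt', encoding='utf-8') as f:
    sentences = [line.strip().lower().split() for line in f if line.strip()]

# gen_batch_inputs expects pairs of consecutive sentences.
sentence_pairs = [
    [sentences[i], sentences[i + 1]]
    for i in range(len(sentences) - 1)
]

# Build the token dictionary from the data plus the special tokens.
token_dict = get_base_dict()
for pair in sentence_pairs:
    for token in pair[0] + pair[1]:
        if token not in token_dict:
            token_dict[token] = len(token_dict)
token_list = list(token_dict.keys())

inputs, outputs = gen_batch_inputs(sentence_pairs[:32], token_dict, token_list, seq_len=128)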
Hi,
Thank you for your keras BERT code. I'd like to report a padding issue.
The rest of the segment ids should be padded with zeros, not ones:
segments += [1] * (max_len - len(segments))
==>
segments += [0] * (max_len - len(segments))
(the code is from tokenizer.py)
Is your feature request related to a problem? Please describe.
Masked Global Average Pool1D
Describe the solution you'd like
Hi Cyber,
I wrote this piece of code with comments; could you help take a look? Thanks. I have been using it and it seems to work fine.
import keras
from keras import backend as K

class MaskedGlobalAveragePool1D(keras.layers.Layer):

    def __init__(self, **kwargs):
        super(MaskedGlobalAveragePool1D, self).__init__(**kwargs)
        self.supports_masking = True

    def compute_mask(self, inputs, mask=None):
        return None

    def compute_output_shape(self, input_shape):
        return input_shape[:-2] + (input_shape[-1],)

    def call(self, inputs, mask=None):
        if mask is not None:
            mask = K.cast(mask, K.floatx())          # cast mask to float
            inputs *= K.expand_dims(mask, axis=-1)   # zero out masked positions
            # Divide by the number of unmasked timesteps, not the full length,
            # so padding does not drag the average down.
            return K.sum(inputs, axis=-2) / (K.sum(mask, axis=-1, keepdims=True) + K.epsilon())
        return K.mean(inputs, axis=-2)               # plain average over time
Line 11: custom_layers=_custom_layers
TypeError: Tensors in list passed to 'values' of 'Pack' Op have types [bool, ] that don't all match.
Is anyone having the same error, and is there a solution? Thanks
I am looking at Section 4.2 of the BERT paper for how to set up BERT for reading comprehension. It looks like a module needs to be added to the end of BERT, S and E are new parameters, and a log-softmax loss is calculated over the start and end positions.
This extension is included in the original tensorflow BERT in the 'run_squad.py' script in the repository.
Does such an extension exist for BERT in Keras?
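There is no ready-made extension in this repo as far as I know, but the span head from Section 4.2 is small; a sketch of what it might look like on top of the loaded model (paths are placeholders, and all layer names are my own):

import keras
from keras_bert import load_trained_model_from_checkpoint

config_path = 'uncased_L-12_H-768_A-12/bert_config.json'
checkpoint_path = 'uncased_L-12_H-768_A-12/bert_model.ckpt'
SEQ_LEN = 384

bert = load_trained_model_from_checkpoint(config_path, checkpoint_path,
                                          training=False, trainable=True,
                                          seq_len=SEQ_LEN)

# Drop the propagated mask so plain Keras layers can be stacked on top.
seq = keras.layers.Lambda(lambda x: x, name='drop_mask')(bert.output)

# Project every token vector to two logits: "start of answer" and "end of
# answer" (the S and E parameters from Section 4.2).
logits = keras.layers.Dense(2, name='span_logits')(seq)
start_logits = keras.layers.Lambda(lambda x: x[:, :, 0], name='start')(logits)
end_logits = keras.layers.Lambda(lambda x: x[:, :, 1], name='end')(logits)
start_probs = keras.layers.Activation('softmax', name='start_probs')(start_logits)
end_probs = keras.layers.Activation('softmax', name='end_probs')(end_logits)

model = keras.models.Model(bert.inputs, [start_probs, end_probs])
# Targets are the start and end token positions; cross-entropy over the
# sequence axis gives the log-softmax loss described in the paper.
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')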
Awesome work!
However, I have noticed that you have recently split this repo into 5 or 6 different repos, which makes the entire source much harder to follow now.
I think it's better to keep all your custom code in the same repo; just a suggestion.
I am a little new to the BERT model, but it seems that in your code the gen_batch_inputs() function only allows sentence pairs to be passed through. How would you implement a dataset for single-sentence encoding or classification?
Hi,
I would suggest these two generators for people with small memory, or for GPUs with low memory.

def _generator(sentence_pairs, batch_size, seq_len):
    # Builds one large batch for the whole corpus, then yields slices of it.
    while True:
        inp, outp = gen_batch_inputs(sentence_pairs, token_dict, token_list, seq_len=seq_len)
        for i in range(0, len(sentence_pairs), batch_size):
            yield ([inp[x][i:i + batch_size] for x in range(len(inp))],
                   [outp[x][i:i + batch_size] for x in range(len(outp))])

def _generator_v(sentence_pairs, batch_size, seq_len):
    # Generates each mini-batch on the fly, keeping memory usage low.
    while True:
        for i in range(0, len(sentence_pairs), batch_size):
            yield gen_batch_inputs(sentence_pairs[i:i + batch_size], token_dict, token_list, seq_len=seq_len)
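Usage might look like the following with a model produced by get_model and compiled for pre-training; train_pairs, valid_pairs, and the batch size are placeholders:

model.fit_generator(
    generator=_generator_v(train_pairs, batch_size=8, seq_len=128),
    steps_per_epoch=len(train_pairs) // 8,
    epochs=3,
    validation_data=_generator_v(valid_pairs, batch_size=8, seq_len=128),
    validation_steps=len(valid_pairs) // 8,
)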