
keras-xlnet's Introduction

Keras XLNet


[中文|English]

Unofficial implementation of XLNet. The embedding extraction and embedding extraction with memory demos show how to get the outputs of the last transformer layer from pre-trained checkpoints.

Install

pip install keras-xlnet

Usage

Fine-tuning on GLUE

Click a task name to see the demo with the base model:

| Task Name | Metrics | Approximate Results on Dev Set |
|-----------|---------|--------------------------------|
| CoLA | Matthew Corr. | 52 |
| SST-2 | Accuracy | 93 |
| MRPC | Accuracy/F1 | 86/89 |
| STS-B | Pearson Corr. / Spearman Corr. | 86/87 |
| QQP | Accuracy/F1 | 90/86 |
| MNLI | Accuracy | 84/84 |
| QNLI | Accuracy | 86 |
| RTE | Accuracy | 64 |
| WNLI | Accuracy | 56 |

(Only 0s are predicted on the WNLI dataset.)

Load Pretrained Checkpoints

import os
from keras_xlnet import Tokenizer, load_trained_model_from_checkpoint, ATTENTION_TYPE_BI

checkpoint_path = '.../xlnet_cased_L-24_H-1024_A-16'

tokenizer = Tokenizer(os.path.join(checkpoint_path, 'spiece.model'))
model = load_trained_model_from_checkpoint(
    config_path=os.path.join(checkpoint_path, 'xlnet_config.json'),
    checkpoint_path=os.path.join(checkpoint_path, 'xlnet_model.ckpt'),
    batch_size=16,
    memory_len=512,
    target_len=128,
    in_train_phase=False,
    attention_type=ATTENTION_TYPE_BI,
)
model.summary()

The arguments batch_size, memory_len and target_len are the maximum sizes used to initialize the memories. If in_train_phase is True, a model for language-model training is returned; otherwise, a model for fine-tuning is returned.
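
For example, a minimal sketch (same checkpoint directory and sizes as above) of loading the language-model training variant instead:

model = load_trained_model_from_checkpoint(
    config_path=os.path.join(checkpoint_path, 'xlnet_config.json'),
    checkpoint_path=os.path.join(checkpoint_path, 'xlnet_model.ckpt'),
    batch_size=16,
    memory_len=512,
    target_len=128,
    in_train_phase=True,  # returns the language-model training variant
    attention_type=ATTENTION_TYPE_BI,
)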

About I/O

Note that shuffle should be False in either fit or fit_generator if memories are used.

in_train_phase is False

3 inputs:

  • IDs of tokens, with shape (batch_size, target_len).
  • IDs of segments, with shape (batch_size, target_len).
  • Length of memories, with shape (batch_size, 1).

1 output:

  • The feature for each token, with shape (batch_size, target_len, units).
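
As a minimal illustrative sketch (assuming the model above, built with batch_size=16, memory_len=512 and target_len=128; the zero arrays are placeholders, not meaningful inputs):

import numpy as np

tokens = np.zeros((16, 128))    # IDs of tokens
segments = np.zeros((16, 128))  # IDs of segments
memories = np.zeros((16, 1))    # length of memories (0 = no memory yet)
features = model.predict([tokens, segments, memories], batch_size=16)
# features.shape == (16, 128, units)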

in_train_phase is True

4 inputs:

  • IDs of tokens, with shape (batch_size, target_len).
  • IDs of segments, with shape (batch_size, target_len).
  • Length of memories, with shape (batch_size, 1).
  • Masks of tokens, with shape (batch_size, target_len).

1 output:

  • The probability of each token in each position, with shape (batch_size, target_len, num_token).
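
A corresponding illustrative sketch for the training-phase model (same assumed sizes as above; which positions the token mask selects is an assumption here):

import numpy as np

tokens = np.zeros((16, 128))    # IDs of tokens
segments = np.zeros((16, 128))  # IDs of segments
memories = np.zeros((16, 1))    # length of memories
masks = np.zeros((16, 128))     # masks of tokens
probs = model.predict([tokens, segments, memories, masks], batch_size=16)
# probs.shape == (16, 128, num_token)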

keras-xlnet's People

Contributors

cyberzhg


keras-xlnet's Issues

Similarity with Original XLNet

I want to ask whether there are any differences between this repo's implementation and the original XLNet implementation [https://github.com/zihangdai/xlnet]. If so, how much effect on the original results can be expected with this implementation?
Thanks

AttributeError: 'Node' object has no attribute 'output_masks'

Describe the Bug
I am getting AttributeError: 'Node' object has no attribute 'output_masks' when I use keras-xlnet.
Version Info
keras 2.2.0
tensorflow 1.9.0
keras-xlnet 0.16.0
scikit-learn 0.19.1
numpy 1.19.5
python 3.6.13
Minimal Codes To Reproduce
The location of the error is as follows:

File "D:\anaconda3\envs\Xlnet-gru-crf36new\lib\site-packages\keras_xlnet\xlnet.py", line 128, in build_xlnet
)([token_embed, query_input])
File "D:\anaconda3\envs\Xlnet-gru-crf36new\lib\site-packages\keras\engine\base_layer.py", line 446, in call
previous_mask = _collect_previous_mask(inputs)
File "D:\anaconda3\envs\Xlnet-gru-crf36new\lib\site-packages\keras\engine\base_layer.py", line 1326, in _collect_previous_mask
mask = node.output_masks[tensor_index]
AttributeError: 'Node' object has no attribute 'output_masks'

def _collect_previous_mask(input_tensors):
    """Retrieves the output mask(s) of the previous node.

    # Arguments
        input_tensors: A tensor or list of tensors.

    # Returns
        A mask tensor or list of mask tensors.
    """
    input_tensors = to_list(input_tensors)
    masks = []
    for x in input_tensors:
        if hasattr(x, '_keras_history'):
            inbound_layer, node_index, tensor_index = x._keras_history
            node = inbound_layer._inbound_nodes[node_index]
            mask = node.output_masks[tensor_index]             # I got an error here, but I don't know why
            masks.append(mask)
        else:
            masks.append(None)
    if len(masks) == 1:
        return masks[0]
    return masks

TypeError: ('Keyword argument not understood:', 'dropout_rate')

Describe the Bug

TypeError: ('Keyword argument not understood:', 'dropout_rate')

Version Info

python 3.6
keras 2.2.4
tensorflow-gpu 1.11

Minimal Codes To Reproduce

load_trained_model_from_checkpoint
~/anaconda3/lib/python3.6/site-packages/keras_xlnet/loader.py in load_trained_model_from_checkpoint(config_path, checkpoint_path, batch_size, memory_len, target_len, in_train_phase, **kwargs)
    160         target_len=target_len,
    161         in_train_phase=in_train_phase,
--> 162         **kwargs)
    163     load_model_weights_from_checkpoint(
    164         model=model,

~/anaconda3/lib/python3.6/site-packages/keras_xlnet/loader.py in build_model_from_config(config_path, batch_size, memory_len, target_len, in_train_phase, **kwargs)
     49         clamp_len=None,
     50         shared_biases=not config['untie_r'],
---> 51         **kwargs)
     52     return model, config
     53 

~/anaconda3/lib/python3.6/site-packages/keras_xlnet/xlnet.py in build_xlnet(units, training, num_token, num_block, num_head, hidden_dim, batch_size, memory_len, target_len, permute, mask_index, dropout, attention_dropout, attention_type, clamp_len, shared_biases)
    213             dropout_rate=dropout,
    214             activation=gelu,
--> 215             name='FeedForward-{}'.format(i + 1),
    216         )
    217         if 0.0 < dropout < 1.0:

~/anaconda3/lib/python3.6/site-packages/keras_position_wise_feed_forward/feed_forward.py in __init__(self, units, activation, use_bias, kernel_initializer, bias_initializer, kernel_regularizer, bias_regularizer, kernel_constraint, bias_constraint, **kwargs)
     45         self.W1, self.b1 = None, None
     46         self.W2, self.b2 = None, None
---> 47         super(FeedForward, self).__init__(**kwargs)
     48 
     49     def get_config(self):

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/checkpointable/base.py in _method_wrapper(self, *args, **kwargs)
    424     self._setattr_tracking = False  # pylint: disable=protected-access
    425     try:
--> 426       method(self, *args, **kwargs)
    427     finally:
    428       self._setattr_tracking = previous_value  # pylint: disable=protected-access

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py in __init__(self, trainable, name, dtype, **kwargs)
    135     for kwarg in kwargs:
    136       if kwarg not in allowed_kwargs:
--> 137         raise TypeError('Keyword argument not understood:', kwarg)
    138 
    139     # Mutable properties

TypeError: ('Keyword argument not understood:', 'dropout_rate')

With Python 3.7, the code runs normally.

Tokenizer issue

Describe the Bug


Unlike the Tokenizer in keras-bert, the Tokenizer in keras-xlnet cannot load a custom dict.

Version Info

  • I'm using the latest version

Minimal Codes To Reproduce

from keras_xlnet import Tokenizer

word_dict = load_dict()           # user-defined helper returning a word dict
tokenizer = Tokenizer(word_dict)  # this works in keras_bert, but fails here
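
For reference, the keras-xlnet Tokenizer expects a path to a sentencepiece model rather than a word dict (see the README snippet above, with checkpoint_path assumed to be set the same way); a custom dict would first have to be converted into a sentencepiece model:

import os
from keras_xlnet import Tokenizer

tokenizer = Tokenizer(os.path.join(checkpoint_path, 'spiece.model'))  # sentencepiece model path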

Training works fine, but prediction raises the following error; any help appreciated

InvalidArgumentError: 3 root error(s) found.
(0) Invalid argument: slice index 0 of dimension 0 out of bounds.
[[{{node replica_0/model_2/Memory-0/strided_slice}}]]
[[replica_1/model_2/Embed-Segment-8/ReadVariableOp/_6797]]
(1) Invalid argument: slice index 0 of dimension 0 out of bounds.
[[{{node replica_0/model_2/Memory-0/strided_slice}}]]
[[_arg_Input-Segment_0_1/_6435]]
(2) Invalid argument: slice index 0 of dimension 0 out of bounds.
[[{{node replica_0/model_2/Memory-0/strided_slice}}]]
0 successful operations.
1 derived errors ignored.

Colab TPU Support

Is your feature request related to a problem? Please describe.
Your keras-bert implementation lets me load the trained model and run it on Colab with a TPU ( https://colab.research.google.com/github/CyberZHG/keras-bert/blob/master/demo/tune/keras_bert_classification_tpu.ipynb ).
When I try the same thing with your XLNet implementation, I run into the following issue while calling keras_to_tpu_model:

ValueError: Layer <keras_transformer_xl.memory.Memory object at 0x7f7a396e9240> has a variable shape in a non-batch dimension. TPU models must have constant shapes for all operations.

You may have to specify input_length for RNN/TimeDistributed layers.

Describe the solution you'd like
It would be really nice if I could load the pre-trained model checkpoint and run it on a Colab TPU, as is possible with BERT.

Are you planning to implement such a feature?

The problem of "ValueError: Unknown layer: EmbeddingRet"

The version I use is: keras-xlnet==0.18.0

Thanks for your sharing. I built a model on top of keras-xlnet. It works well during training, but when I reload the model with the following code, the errors below appear.

I know that "EmbeddingRet" comes from one of your other repositories. I also checked the keras-xlnet code, and "EmbeddingRet" is not used anywhere. It's quite strange that this happens. Could you please give some suggestions on a solution? Thanks!

The whole traceback is pasted below.

# I reload the model with this code
custom_dict = get_custom_objects()
model = load_model(self.model_path, custom_objects=custom_dict)
Traceback (most recent call last):
  File "train_xlnet_dssm.py", line 156, in <module>
    trans.test()
  File "train_xlnet_dssm.py", line 127, in test
    model = load_model(self.model_path, custom_objects=custom_dict)
  File "/home/psxwz2/Programs/anaconda3/envs/keras/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
    model = _deserialize_model(f, custom_objects, compile)
  File "/home/psxwz2/Programs/anaconda3/envs/keras/lib/python3.6/site-packages/keras/engine/saving.py", line 225, in _deserialize_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "/home/psxwz2/Programs/anaconda3/envs/keras/lib/python3.6/site-packages/keras/engine/saving.py", line 458, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "/home/psxwz2/Programs/anaconda3/envs/keras/lib/python3.6/site-packages/keras/layers/__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "/home/psxwz2/Programs/anaconda3/envs/keras/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 145, in deserialize_keras_object
    list(custom_objects.items())))
  File "/home/psxwz2/Programs/anaconda3/envs/keras/lib/python3.6/site-packages/keras/engine/network.py", line 1022, in from_config
    process_layer(layer_data)
  File "/home/psxwz2/Programs/anaconda3/envs/keras/lib/python3.6/site-packages/keras/engine/network.py", line 1008, in process_layer
    custom_objects=custom_objects)
  File "/home/psxwz2/Programs/anaconda3/envs/keras/lib/python3.6/site-packages/keras/layers/__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "/home/psxwz2/Programs/anaconda3/envs/keras/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 145, in deserialize_keras_object
    list(custom_objects.items())))
  File "/home/psxwz2/Programs/anaconda3/envs/keras/lib/python3.6/site-packages/keras/engine/network.py", line 1022, in from_config
    process_layer(layer_data)
  File "/home/psxwz2/Programs/anaconda3/envs/keras/lib/python3.6/site-packages/keras/engine/network.py", line 1008, in process_layer
    custom_objects=custom_objects)
  File "/home/psxwz2/Programs/anaconda3/envs/keras/lib/python3.6/site-packages/keras/layers/__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "/home/psxwz2/Programs/anaconda3/envs/keras/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 138, in deserialize_keras_object
    ': ' + class_name)
ValueError: Unknown layer: EmbeddingRet

number of output tensors and output masks are not equal

When I was running the following code

from keras_xlnet import Tokenizer, load_trained_model_from_checkpoint, set_custom_objects
from keras_bert.layers import Extract
from keras.layers import Dense
from keras.models import Model
from keras.utils import multi_gpu_model

xlnet = load_trained_model_from_checkpoint(config_path='/home/xsu1/Archive/xin/xlnet/xlnet_cased_L-12_H-768_A-12/xlnet_config.json',
                                               checkpoint_path='/home/xsu1/Archive/xin/xlnet/xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt',
                                               batch_size=16,
                                               memory_len=0,
                                               target_len=20,
                                               in_train_phase=False)

set_custom_objects()
inputs = xlnet.inputs
content_output = xlnet(inputs)

I got the following error

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-29-29196fae1ff6> in <module>
      9 
     10 inputs = xlnet.input
---> 11 content_output = xlnet(inputs)

~/.local/lib/python3.6/site-packages/keras/engine/base_layer.py in __call__(self, inputs, **kwargs)
    455             # Actually call the layer,
    456             # collecting output(s), mask(s), and shape(s).
--> 457             output = self.call(inputs, **kwargs)
    458             output_mask = self.compute_mask(inputs, previous_mask)
    459 

~/.local/lib/python3.6/site-packages/keras/engine/network.py in call(self, inputs, mask)
    562             return self._output_tensor_cache[cache_key]
    563         else:
--> 564             output_tensors, _, _ = self.run_internal_graph(inputs, masks)
    565             return output_tensors
    566 

~/.local/lib/python3.6/site-packages/keras/engine/network.py in run_internal_graph(self, inputs, masks)
    759                                 'and output masks. Layer ' + str(layer.name) + ' has'
    760                                 ' ' + str(len(output_tensors)) + ' output tensors '
--> 761                                 'and ' + str(len(output_masks)) + ' output masks.')
    762                     # Update model updates and losses:
    763                     # Keep track of updates that depend on the inputs

Exception: Layers should have equal number of output tensors and output masks. Layer Attention-1 has 1 output tensors and 10 output masks.

Is it because of the mask output?

InvalidArgumentError: Input to reshape is a tensor with 983040 values, but the requested shape has 4718592

Describe the Bug

InvalidArgumentError: Input to reshape is a tensor with 983040 values, but the requested shape has 4718592

Version Info

cudatoolkit     9.0
cudnn           7.6.0 (cuda9.0_0)
Keras           2.2.4
keras-xlnet     0.5.0
tensorflow-gpu  1.11.0

When I run this code:

import os
import numpy as np
from keras_xlnet import load_trained_model_from_checkpoint

def _get_model(base_dir, cfg_=None):
    config_file = os.path.join(base_dir, 'xlnet_config.json')
    checkpoint_file = os.path.join(base_dir, 'xlnet_model.ckpt')
    print(cfg_["bs"], cfg_["mem_len"], cfg_['maxlen'])
    model = load_trained_model_from_checkpoint(
        config_path=config_file,
        checkpoint_path=checkpoint_file,
        batch_size=cfg_["bs"],
        memory_len=cfg_["mem_len"],
        target_len=cfg_['maxlen'],
        in_train_phase=False
    )
    return model

model = _get_model("../../../xlnet_model/xlnet_cased_L-12_H-768_A-12/", cfg_=cfg)

token = np.random.randint(0, 123, size=(5, 256)).astype(np.float32)           # token IDs
segm = np.ones_like(token).astype(dtype=np.float32)                           # segment IDs
mask = np.random.randint(0, 256, size=(token.shape[0], 1)).astype(np.float32) # memory-length input
print(segm.shape, segm.shape, mask.shape)
print(model.inputs)
model.predict([token, segm, mask], verbose=1, batch_size=32)

The output is:

24 0 256
(5, 256) (5, 256) (5, 1)
[<tf.Tensor 'Input-Token:0' shape=(?, 256) dtype=float32>, <tf.Tensor 'Input-Segment:0' shape=(?, 256) dtype=float32>, <tf.Tensor 'Input-Memory-Length:0' shape=(?, 1) dtype=float32>]
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-3-3061d04374a5> in <module>()
     20 print(segm.shape, segm.shape, mask.shape)
     21 print(model.inputs)
---> 22 model.predict([token, segm, mask], verbose=1, batch_size=32)

~/anaconda3/lib/python3.6/site-packages/keras/engine/training.py in predict(self, x, batch_size, verbose, steps)
   1167                                             batch_size=batch_size,
   1168                                             verbose=verbose,
-> 1169                                             steps=steps)
   1170 
   1171     def train_on_batch(self, x, y,

~/anaconda3/lib/python3.6/site-packages/keras/engine/training_arrays.py in predict_loop(model, f, ins, batch_size, verbose, steps)
    292                 ins_batch[i] = ins_batch[i].toarray()
    293 
--> 294             batch_outs = f(ins_batch)
    295             batch_outs = to_list(batch_outs)
    296             if batch_index == 0:

~/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
   2713                 return self._legacy_call(inputs)
   2714 
-> 2715             return self._call(inputs)
   2716         else:
   2717             if py_any(is_tensor(x) for x in inputs):

~/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in _call(self, inputs)
   2673             fetched = self._callable_fn(*array_vals, run_metadata=self.run_metadata)
   2674         else:
-> 2675             fetched = self._callable_fn(*array_vals)
   2676         return fetched[:len(self.outputs)]
   2677 

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1397           ret = tf_session.TF_SessionRunCallable(
   1398               self._session._session, self._handle, args, status,
-> 1399               run_metadata_ptr)
   1400         if run_metadata:
   1401           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    524             None, None,
    525             compat.as_text(c_api.TF_Message(self.status.status)),
--> 526             c_api.TF_GetCode(self.status.status))
    527     # Delete the underlying status object from memory otherwise it stays alive
    528     # as there is a reference to status from this from the traceback due to

InvalidArgumentError: Input to reshape is a tensor with 983040 values, but the requested shape has 4718592
	 [[{{node Attention-12/Reshape_8}} = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Attention-12/MatMul_2, Attention-12/Reshape_8/shape)]]
	 [[{{node FeedForward-Normal-12/add_1/_1311}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6655_FeedForward-Normal-12/add_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
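
A plausible reading of the mismatch, assuming the hidden size 768 of xlnet_cased_L-12_H-768_A-12: the graph was built for the fixed batch size of 24 printed above, but only 5 rows were fed:

assert 5 * 256 * 768 == 983040    # fed: 5 samples x target_len x hidden size
assert 24 * 256 * 768 == 4718592  # expected: batch_size=24 x target_len x hidden size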

Memory length in model initialization and fine-tuning

Hi, I was trying to use keras_xlnet to fine-tune on the GLUE datasets, but didn't get results similar to the original TensorFlow implementation or huggingface's implementation. I suspect the memory length I used in model initialization and fine-tuning (mem_len in the inputs) may be the issue.

Do you have any recommendations for these two parameters (should we just use a length of 384)? Thank you.

Does rel_shift change the shape of the input?

Hi, if the input has shape (1, 5, 8), the output will be (1, 5, 7). Is this a bug?
Is this rel_shift() different from the Transformer-XL version?
Thank you.
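
For comparison, a minimal NumPy sketch of the canonical Transformer-XL relative shift, which preserves the input shape (a reference implementation, not necessarily identical to this repo's rel_shift):

import numpy as np

def rel_shift(x):
    # x: [qlen, klen]; prepend a zero column, reshape, drop the first row
    qlen, klen = x.shape
    x_padded = np.concatenate([np.zeros((qlen, 1)), x], axis=1)  # [qlen, klen + 1]
    x_padded = x_padded.reshape(klen + 1, qlen)
    return x_padded[1:].reshape(qlen, klen)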

Why does tokenization start with 19, which is a null string?

Describe the Bug

Why does the tokenization start with token 19, which decodes to an empty string? Is this a defect, or does it have some special purpose?

import os
import sys

import numpy as np

from keras_xlnet import Tokenizer, load_trained_model_from_checkpoint, ATTENTION_TYPE_BI



'''Can be found at https://github.com/ymcui/Chinese-PreTrained-XLNet'''
checkpoint_path =  "/users/hdpsbp/HadoopDir/felix/xlnet"
vocab_path = os.path.join(checkpoint_path, 'spiece.model')
config_path = os.path.join(checkpoint_path, 'xlnet_config.json')
model_path = os.path.join(checkpoint_path, 'xlnet_model.ckpt')

# Tokenize inputs
tokenizer = Tokenizer(vocab_path)
text = "给"
tokens = tokenizer.encode(text)
print(tokens) #[19, 841]

tokenizer.decode([19])  # returns '' (empty string)
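
One possible explanation (an assumption based on general SentencePiece behavior, not verified against this vocabulary): SentencePiece prepends a word-boundary marker '▁' that can be emitted as its own piece, which decodes to an empty string. This can be inspected directly:

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load(vocab_path)             # vocab_path as defined above
print(sp.EncodeAsPieces('给'))  # shows whether the boundary marker is a separate piece
print(sp.IdToPiece(19))         # shows what token 19 actually is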

Why does identical code now raise ValueError: Input 0 is incompatible with layer Normal: expected ndim=3, found ndim=2?

Hello, and first of all thank you very much for this project, which makes it easy to use XLNet. Last week the very same code ran without any problems, but when I re-tested it today the error below appeared. What could be the reason?

model = load_trained_model_from_checkpoint(
    config_path=paths.config,
    checkpoint_path=paths.model,
    batch_size=BATCH_SIZE,
    memory_len=0,
    target_len=SEQ_LEN,
    in_train_phase=False,
    attention_type=ATTENTION_TYPE_BI,
)

# Load the pre-trained weights

last = model.output
extract = Extract(index=-1, name='Extract')(last)
dense = keras.layers.Dense(units=768, name='Dense')(extract)
norm = keras.layers.BatchNormalization(name='Normal')(dense)
output = keras.layers.Dense(units=2, activation='softmax', name='Softmax')(norm)
model = keras.models.Model(inputs=model.inputs, outputs=output)
Here is the error:
Traceback (most recent call last):
File "D:/project/nlp_label_processing/Xlnet_test.py", line 245, in <module>
norm = keras.layers.BatchNormalization(name='Normal')(dense)
File "C:\Users\tzl17\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 75, in symbolic_fn_wrapper
return func(*args, **kwargs)
File "C:\Users\tzl17\Anaconda3\lib\site-packages\keras\engine\base_layer.py", line 472, in __call__
self.assert_input_compatibility(inputs)
File "C:\Users\tzl17\Anaconda3\lib\site-packages\keras\engine\base_layer.py", line 342, in assert_input_compatibility
str(K.ndim(x)))
ValueError: Input 0 is incompatible with layer Normal: expected ndim=3, found ndim=2

Process finished with exit code 1
Since the shapes did not match, I inspected the layers and added one line after the dense layer:
dense = keras.layers.Reshape((30, 768))(dense)
After inspecting the layers, I noticed something strange:


Extract (Extract)              (None, 30, 768)   0        FeedForward-Normal-12[0][0]
Dense (Dense)                  (None, 30, 768)   590592   Extract[0][0]
reshape_1 (Reshape)            (None, 30, 768)   0        Dense[0][0]
Normal (BatchNormalization)    (None, 30, 768)   3072     reshape_1[0][0]
Softmax (Dense)                (None, 30, 2)     1538     Normal[0][0]
The shapes are not (None, 768) -> (None, 2); an extra dimension of 30 appears. I pad every line of text to length 30, so does this mean a 768-dimensional encoding is generated per token? I did not check model.summary() after last week's successful run, so I am not sure whether my model had a problem then.
Since the layers come out as (None, 30, 768), my data obviously raised an error when fed in.
I would really appreciate your help!

Example on long document classification

Hi @CyberZHG, thanks for building this really useful library.

I would like to kindly ask you to provide a minimal example of text classification on long documents, meaning documents that are longer than 512 sub-word units, thus do not fit as single (one-off) sequences on pre-trained Transformers. XLNet surpass this limitation relying on Transformers-XL, although it's still a mystery for me how this can be resolved practically.

Let's say we have 800 documents with 2000 subword units each for a binary classification task. How you would formulate the training and prediction code? How we configure memory etc.?
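
A rough sketch of how chunked inference with memory might look under this library's I/O convention (illustrative only: the chunk bookkeeping, the memory-length semantics, and the pooling step are all assumptions; consecutive chunks of one document must be fed in order, hence shuffle must stay off):

import numpy as np

# assume the model was built with batch_size=1, memory_len=512, target_len=512
target_len, memory_len = 512, 512
doc_tokens = np.random.randint(0, 32000, size=2000)  # dummy 2000-token document

features = []
for start in range(0, len(doc_tokens), target_len):
    chunk = doc_tokens[start:start + target_len]
    chunk = np.pad(chunk, (0, target_len - len(chunk)))  # pad the final chunk
    tokens = chunk[None, :]                              # (1, target_len)
    segments = np.zeros_like(tokens)                     # (1, target_len)
    mems = np.array([[min(start, memory_len)]])          # memory consumed so far
    features.append(model.predict([tokens, segments, mems], batch_size=1))
# pool the per-chunk features (e.g. the last chunk's output) for the classifier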

tokenization

I went through your code but didn't see a tokenizer class. Is there a tokenizer to use?

OOM when fine-tuning xlnet-base for classification on GPU

While training xlnet-base for a classification problem on a Tesla P100 (16 GB), an OOM error occurs.

Some settings:
batch_size=8
memory_len=0
target_len=220
in_train_phase=False

Error info:

ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-14-3280d689ffd9> in <module>
----> 1 model3.fit([token_input,seg_input,np.array([0]*320)[:,None]],y_train,batch_size=bsz)

/opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    778           validation_steps=validation_steps,
    779           validation_freq=validation_freq,
--> 780           steps_name='steps_per_epoch')
    781 
    782   def evaluate(self,

/opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, validation_freq, mode, validation_in_fit, prepared_feed_values_from_dataset, steps_name, **kwargs)
    361 
    362         # Get outputs.
--> 363         batch_outs = f(ins_batch)
    364         if not isinstance(batch_outs, list):
    365           batch_outs = [batch_outs]

/opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/backend.py in __call__(self, inputs)
   3290 
   3291     fetched = self._callable_fn(*array_vals,
-> 3292                                 run_metadata=self.run_metadata)
   3293     self._call_fetch_callbacks(fetched[-len(self._fetches):])
   3294     output_structure = nest.pack_sequence_as(

/opt/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1456         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1457                                                self._handle, args,
-> 1458                                                run_metadata_ptr)
   1459         if run_metadata:
   1460           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[8,12,48400,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node Attention-9/transpose_8}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[loss/mul/_2627]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[8,12,48400,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node Attention-9/transpose_8}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

Modeling code:

def create_model():
    model = load_trained_model_from_checkpoint(
        config_path=os.path.join(checkpoint_path, 'xlnet_config.json'),
        checkpoint_path=os.path.join(checkpoint_path, 'xlnet_model.ckpt'),
        batch_size=8,
        memory_len=0,
        target_len=220,
        in_train_phase=False,
    )
    sequence_output  = model.get_layer(index=-1).output
    summary = Extract(index=-1)(sequence_output)  # based on our tokenization method, the CLS token stays last
    pool_output = keras.layers.Dense(units=1, activation='sigmoid',name='real_output')(summary)
    model3 = keras.models.Model(inputs=model.input, outputs=pool_output)
    return model3
model3 = create_model()
model3.compile('adam','binary_crossentropy')

So what does the tensor shape [8, 12, 48400, 64] mean? And what is the proper memory_len when fine-tuning the model for a classification task?
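
On the tensor shape, a plausible decomposition (assuming the base model: 12 heads, 768 / 12 = 64 dims per head):

assert 220 * 220 == 48400  # flattened query x key positions for target_len=220
assert 768 // 12 == 64     # per-head dimension of the base model
# so [8, 12, 48400, 64] reads as [batch, heads, query*key positions, head_dim]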

Support TensorFlow 2

Could you please add TensorFlow 2 support (tf.keras, tf.data, tf.function, etc.) to all the NLP packages you have created?

I mean, it would be great if your packages supported TensorFlow 2 idioms (such as tf.function, tf.data, etc.).

Compatibility issue with TensorFlow 2.0

When I load the model with

load_trained_model_from_checkpoint(
    config_path=config_files,
    checkpoint_path=check_point_files,
    target_len=max_length,
    batch_size=8,
    memory_len=0,
    in_train_phase=False,
    attention_type=ATTENTION_TYPE_BI,
)

this error is raised:

AttributeError: module 'tensorflow' has no attribute 'placeholder'

Is this caused by my using TensorFlow 2.0? How can it be resolved?

Why doesn't the Tokenizer work when I pass a relative path?

Describe the Bug

When I pass a relative path instead of an absolute path, it raises a "No such file or directory" error.
Version Info

  • I'm using the latest version

Minimal Codes To Reproduce

Relative path (does not work):

from keras_xlnet import Tokenizer
dict_path = '../pre_models/chinese_xlnet_mid_L-24_H-768_A-12/spiece.model'
tokenizer = Tokenizer(dict_path)

Absolute path (works):

from keras_xlnet import Tokenizer
dict_path = '/home/hpdbman/gjz/Competition/pre_models/chinese_xlnet_mid_L-24_H-768_A-12/spiece.model'
tokenizer = Tokenizer(dict_path)
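
A possible workaround (assuming the underlying loader resolves paths against a different working directory) is to expand the path before handing it to Tokenizer:

import os
from keras_xlnet import Tokenizer

dict_path = '../pre_models/chinese_xlnet_mid_L-24_H-768_A-12/spiece.model'
tokenizer = Tokenizer(os.path.abspath(dict_path))  # resolve to an absolute path first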

`NER` fine-tuning

@CyberZHG Which layers are most beneficial for fine-tuning on an NER task? I always get bad results on NER using the last layer followed by a Dense layer. And how should memory_len be set?

def _get_model(base_dir, cfg_=None):
    config_file = os.path.join(base_dir, 'xlnet_config.json')
    checkpoint_file = os.path.join(base_dir, 'xlnet_model.ckpt')
    print(cfg_["bs"], cfg_["mem_len"], cfg_['maxlen'])
    model = load_trained_model_from_checkpoint(
        config_path=config_file,
        checkpoint_path=checkpoint_file,
        batch_size=cfg_["bs"],
        memory_len=cfg_["mem_len"],
        target_len=cfg_['maxlen'],
        in_train_phase=False
    )
    return model

model = _get_model("../../../xlnet_model/xlnet_cased_L-12_H-768_A-12/", cfg_=cfg)

I set memory_len to 0 or 64 and target_len to 256.

tokenize encode

Can the tokenizer encode two sentences at once, like keras-bert's encode(self, first, second=None, max_len=None)?
