
copynet's Introduction

CopyNet Implementation with Tensorflow and nmt

CopyNet Paper: Incorporating Copying Mechanism in Sequence-to-Sequence Learning.

The CopyNet mechanism wraps an existing RNN cell and is used like a normal RNN cell.

The official nmt codebase is also modified to enable the CopyNet mechanism.

Vocabulary Setting

Since in CopyNet scenarios the target sequence contains words from the source sentence, the best choice is to use a shared vocabulary for the source and target vocabularies. We also use a parameter, the generated vocabulary size (the number of target vocabulary words excluding words copied from source sequences), to indicate that the first N (= generated vocabulary size) words in the shared vocabulary are produced in generate mode, while target word indexes larger than N are copied.

In this codebase, the variables vocab_size and gen_vocab_size represent the shared vocabulary size and the generated vocabulary size.
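As a rough illustration of this convention, here is a minimal, purely hypothetical Python sketch (none of these names come from the codebase) of how ids in a shared vocabulary split between the two modes:

```python
# Illustrative sketch only (not code from this repo): how a shared
# vocabulary plus gen_vocab_size splits target ids between generate
# mode and copy mode. All names here are hypothetical.

def build_shared_vocab(generated_words, copy_only_words):
    # Generated words come first and copy-only words are appended after
    # them, so any id below len(generated_words) is generatable.
    vocab = list(generated_words) + list(copy_only_words)
    return {word: idx for idx, word in enumerate(vocab)}

def decode_mode(token_id, gen_vocab_size):
    # Ids below gen_vocab_size are produced in generate mode; larger ids
    # can only be copied from the source sequence.
    return "generate" if token_id < gen_vocab_size else "copy"

vocab = build_shared_vocab(["<unk>", "the", "cat"], ["Zylkor"])
gen_vocab_size = 3          # only the first three words are generatable
vocab_size = len(vocab)     # shared vocabulary size, here 4

assert decode_mode(vocab["the"], gen_vocab_size) == "generate"
assert decode_mode(vocab["Zylkor"], gen_vocab_size) == "copy"
```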

Usage

1. Use with tf.contrib.seq2seq

Just wrap any existing RNN cell (BasicLSTMCell, AttentionWrapper, and so on).

cell = any_rnn_cell  # e.g. BasicLSTMCell, or an AttentionWrapper around one

# Wrap the cell with the copy mechanism.
copynet_cell = CopyNetWrapper(cell, encoder_outputs, encoder_input_ids,
    vocab_size, gen_vocab_size)
# CopyNetWrapper keeps its own state; clone it with the inner cell's state.
decoder_initial_state = copynet_cell.zero_state(batch_size,
    tf.float32).clone(cell_state=decoder_initial_state)

helper = tf.contrib.seq2seq.TrainingHelper(...)
decoder = tf.contrib.seq2seq.BasicDecoder(copynet_cell, helper,
    decoder_initial_state, output_layer=None)
decoder_outputs, final_state, decoder_seq_length = tf.contrib.seq2seq.dynamic_decode(decoder=decoder)
decoder_logits, decoder_ids = decoder_outputs
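For intuition, the core idea of the copy mechanism (independent of this TensorFlow wrapper) can be sketched in plain NumPy: generate-mode scores and per-source-position copy scores are normalized jointly, and the copy probability mass is scattered onto the shared-vocabulary ids of the source tokens. All names below are illustrative, not the wrapper's API:

```python
# Conceptual sketch of CopyNet's output distribution for one decoding
# step; this is not the wrapper's actual code, and all names are
# illustrative.
import numpy as np

def copynet_distribution(gen_scores, copy_scores, src_ids, vocab_size):
    # Normalize generate scores (over the first gen_vocab_size ids) and
    # copy scores (one per source position) with a single joint softmax.
    scores = np.concatenate([gen_scores, copy_scores])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()

    gen_vocab_size = len(gen_scores)
    dist = np.zeros(vocab_size)
    dist[:gen_vocab_size] = probs[:gen_vocab_size]
    # Scatter each source position's copy mass onto that token's vocab id;
    # a token appearing both modes accumulates mass from both.
    for pos, tok in enumerate(src_ids):
        dist[tok] += probs[gen_vocab_size + pos]
    return dist

dist = copynet_distribution(
    gen_scores=np.array([0.5, 1.0, 0.2]),  # generate-mode scores
    copy_scores=np.array([2.0, -1.0]),     # one score per source position
    src_ids=[1, 3],                        # source tokens' shared-vocab ids
    vocab_size=4)
assert np.isclose(dist.sum(), 1.0)
assert dist[3] > 0.0  # id 3 is beyond gen mode but reachable by copying
```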

2. Use with tensorflow official nmt

Full nmt usage instructions are in nmt.

A --copynet argument is added to the nmt command line to enable the copy mechanism.

The --share_vocab argument must be set.

The --gen_vocab_size argument is the size of the generated vocabulary (the target vocabulary excluding copied words); if it is not set, it defaults to the size of the whole vocabulary.

python -m nmt.nmt --copynet --share_vocab --gen_vocab_size=2345 ...other_nmt_arguments

copynet's People

Contributors

lspvic


copynet's Issues

Could you please fix the issue about the beam search decoder with CopyNet?

Hi there, the code is not working when I set the parameters beam_width and num_translations_per_input to be ≥ 1.

E.g., when I set the beam_width=9, batch_size=32, the error information is shown as follows:

InvalidArgumentError (see above for traceback): Incompatible shapes: [288,1] vs. [32,11]
	 [[Node: dynamic_seq2seq/decoder/decoder/while/BeamSearchDecoderStep/Equal = Equal[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](dynamic_seq2seq/decoder/decoder/while/BeamSearchDecoderStep/ExpandDims, dynamic_seq2seq/decoder/decoder/while/BeamSearchDecoderStep/Equal/Enter)]]

I don't have any idea how to fix this problem; any replies will be appreciated!
Thank you!

Issue about CopyNet

I got the following error when training a CopyNet model:

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key dynamic_seq2seq/decoder/CopyWeight not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2/_147 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_154_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

CopyNetWrapper can't wrap an AttentionWrapper cell

I read your Readme and saw "Just wrap any existing rnn cell (BasicLSTMCell, AttentionWrapper and so on)", so I tried to wrap an AttentionWrapper in the following way:

First I used AttentionWrapper to wrap a BasicLSTMCell, giving a cell I'll call attention_cell.
Then I used CopyNetWrapper to wrap attention_cell and got the error
"TypeError: The two structures don't have the same nested structure."
in CopyNetWrapperState's clone method.

When I get rid of the AttentionWrapper it works well.
I have no idea how to deal with this.
I hope anyone who has implemented this can teach me. Thanks!

Question on parameters vocab_size and gen_vocab_size.

Thanks a lot for your work on CopyNet.
I don't clearly understand the parameters vocab_size and gen_vocab_size. For example, suppose I have a vocabulary table containing 9,999 words plus a special token "UNK", so the size of the vocabulary table is 10,000. Now I have a source sentence consisting of 10 words, 5 of which are not in the vocabulary table. Does that mean vocab_size is 10,005 (or 10,010?) and gen_vocab_size is 10,000? If so, when I use the CopyNetWrapper cell, should I use the maximum length of the input sentences to compute vocab_size?

Thanks again
Sam

parameter of CopyNet

Hi all,
I would like to know how to adjust the parameters of the CopyNet network to control whether or not source words are copied during decoding. Where is the source code of CopyNet in the official nmt?

Thank you !

Best Regards,
Connie

CopyNet not working with Beamsearch decoder

It works perfectly fine with the greedy decoder. Here is the code.
Tensorflow: 1.8.0

encoder_emb_inp = tf.nn.embedding_lookup(embeddings, x)
encoder_cell = rnn.GRUCell(rnn_size, name='encoder')
encoder_outputs, encoder_state = tf.nn.dynamic_rnn(encoder_cell, encoder_emb_inp,
    sequence_length=len_docs, dtype=tf.float32)
tiled_encoder_outputs = tf.contrib.seq2seq.tile_batch(encoder_outputs, multiplier=beam_width)
tiled_sequence_length = tf.contrib.seq2seq.tile_batch(len_docs, multiplier=beam_width)
tiled_encoder_final_state = tf.contrib.seq2seq.tile_batch(encoder_state, multiplier=beam_width)
tiled_t = tf.contrib.seq2seq.tile_batch(t, multiplier=beam_width)
start_tokens = tf.constant(word2int['SOS'], shape=[batch_size])
decoder_cell = rnn.GRUCell(rnn_size, name='decoder')
attention_mechanism = tf.contrib.seq2seq.LuongAttention(rnn_size, tiled_encoder_outputs,
    memory_sequence_length=tiled_sequence_length)
decoder_cell = tf.contrib.seq2seq.AttentionWrapper(decoder_cell, attention_mechanism,
    attention_layer_size=rnn_size)
initial_state = decoder_cell.zero_state(batch_size * beam_width,
    dtype=tf.float32).clone(cell_state=tiled_encoder_final_state)
decoder_cell = CopyNetWrapper(decoder_cell, tiled_encoder_outputs, tiled_t,
    len(set(delta).union(words)), vocab_size)
initial_state = decoder_cell.zero_state(batch_size * beam_width,
    dtype=tf.float32).clone(cell_state=initial_state)
decoder = tf.contrib.seq2seq.BeamSearchDecoder(cell=decoder_cell, embedding=embeddings,
    start_tokens=start_tokens, end_token=word2int['EOS'], initial_state=initial_state,
    beam_width=beam_width, output_layer=None, length_penalty_weight=0.0)
outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder)

The error is:
File "/home/usr/.local/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py", line 531, in _split_batch_beams
reshaped_t.set_shape(expected_reshaped_shape)
File "/home/usr/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 538, in set_shape
raise ValueError(str(e))
ValueError: Dimension 2 in both shapes must be equal, but are 38253 and 4. Shapes are [1,1,38253] and [1,1,4].

Can't run main.py: an error in keras_wrapper

I need some help: when I run main.py, an error happens in keras_wrapper.

C:\Users\think>python C:\Users\think\Desktop\nmt-keras-master/main.py
Using TensorFlow backend.
[09/04/2019 11:48:58] <<< Cupy not available. Using numpy. >>>
[09/04/2019 11:48:59] Running training.
[09/04/2019 11:48:59] Building EuTrans_esen dataset
Traceback (most recent call last):
File "C:\Users\think\Desktop\nmt-keras-master/main.py", line 49, in
train_model(parameters, args.dataset)
File "C:\Users\think\Desktop\nmt-keras-master\nmt_keras\training.py", line 64, in train_model
dataset = build_dataset(params)
File "C:\Users\think\Desktop\nmt-keras-master\data_engine\prepare_data.py", line 151, in build_dataset
label_smoothing=params.get('LABEL_SMOOTHING', 0.))
File "c:\users\think\src\keras-wrapper\keras_wrapper\dataset.py", line 1270, in setOutput
bpe_codes=bpe_codes, separator=separator, use_unk_class=use_unk_class)
File "c:\users\think\src\keras-wrapper\keras_wrapper\dataset.py", line 1701, in preprocessTextFeatures
'It currently is: %s' % (str(annotations_list)))
Exception: Wrong type for "annotations_list". It must be a path to a text file with the sentences or a list of sentences. It currently is: examples/EuTrans//training.en

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[100,32,367]

hello,

when I set the parameters as followed:

--num_layers=1
--num_units=32
--share_vocab=True
--copynet=True
--gen_vocab_size=500

My GPU has 12206 MiB of memory; it returns the error:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[100,32,367]
[[Node: dynamic_seq2seq/decoder/decoder/while/BasicDecoderStep/einsum/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](dynamic_seq2seq/decoder/decoder/while/BasicDecoderStep/einsum/transpose/Enter, dynamic_seq2seq/decoder/decoder/while/BasicDecoderStep/einsum_4/transpose/perm)]]
[[Node: dynamic_seq2seq/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/All/_131 = _HostRecvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_219_dynamic_seq2seq/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/All", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

I would like to ask how much memory is needed to run the copynet code?

Question on state_size()

Is it correct that the size of prob_c is self._encoder_state_size? I think it should be the maximum sequence length.

Thanks.

TypeError: The two structures don't have the same nested structure.

I use CopyNetWrapper to wrap a decoder; this is my code:

train_decoder = tf.contrib.seq2seq.AttentionWrapper(decoder, attention_mechanism,
                                                                    attention_layer_size=self.config.PHVM_decoder_dim)
train_encoder_state = train_decoder.zero_state(self.batch_size, dtype=tf.float32).clone(
    cell_state=sent_dec_state)

copynet_decoder = CopyNetWrapper(train_decoder, sent_input, sent_lens, sent_lens, self.tgt_vocab_size)
copy_train_encoder_state = copynet_decoder.zero_state(self.batch_size, dtype=tf.float32).clone(
    cell_state=train_encoder_state)

However, during training I got an error:

Traceback (most recent call last):
File "/home/work/mnt/project/.local/lib/python3.6/site-packages/tensorflow/python/util/nest.py", line 297, in assert_same_structure
expand_composites)
TypeError: The two structures don't have the same nested structure.

First structure: type=CopyNetWrapperState str=CopyNetWrapperState(cell_state=AttentionWrapperState(cell_state=(<tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/AttentionWrapperZeroState/checked_cell_state:0' shape=(?, 300) dtype=float32>, <tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/AttentionWrapperZeroState/checked_cell_state_1:0' shape=(?, 300) dtype=float32>), attention=<tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/AttentionWrapperZeroState/zeros_2:0' shape=(?, 300) dtype=float32>, time=<tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/AttentionWrapperZeroState/zeros_1:0' shape=() dtype=int32>, alignments=<tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/AttentionWrapperZeroState/zeros:0' shape=(?, ?) dtype=float32>, alignment_history=(), attention_state=<tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/AttentionWrapperZeroState/zeros_3:0' shape=(?, ?) dtype=float32>), last_ids=<tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/sub:0' shape=(?,) dtype=int32>, prob_c=<tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/zeros_1:0' shape=(?, ?) dtype=float32>)

Second structure: type=CopyNetWrapperState str=CopyNetWrapperState(cell_state=(<tf.Tensor 'sentence_level/train/while/sent_deocde/sent_dec_state/dense/BiasAdd:0' shape=(?, 300) dtype=float32>, <tf.Tensor 'sentence_level/train/while/sent_deocde/sent_dec_state/dense_1/BiasAdd:0' shape=(?, 300) dtype=float32>), last_ids=<tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/sub:0' shape=(?,) dtype=int32>, prob_c=<tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/zeros_1:0' shape=(?, ?) dtype=float32>)

More specifically: The two namedtuples don't have the same sequence type. First structure type=AttentionWrapperState str=AttentionWrapperState(cell_state=(<tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/AttentionWrapperZeroState/checked_cell_state:0' shape=(?, 300) dtype=float32>, <tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/AttentionWrapperZeroState/checked_cell_state_1:0' shape=(?, 300) dtype=float32>), attention=<tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/AttentionWrapperZeroState/zeros_2:0' shape=(?, 300) dtype=float32>, time=<tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/AttentionWrapperZeroState/zeros_1:0' shape=() dtype=int32>, alignments=<tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/AttentionWrapperZeroState/zeros:0' shape=(?, ?) dtype=float32>, alignment_history=(), attention_state=<tf.Tensor 'sentence_level/train/while/sent_deocde/CopyNetWrapperZeroState/AttentionWrapperZeroState/zeros_3:0' shape=(?, ?) dtype=float32>) has type AttentionWrapperState, while second structure type=tuple str=(<tf.Tensor 'sentence_level/train/while/sent_deocde/sent_dec_state/dense/BiasAdd:0' shape=(?, 300) dtype=float32>, <tf.Tensor 'sentence_level/train/while/sent_deocde/sent_dec_state/dense_1/BiasAdd:0' shape=(?, 300) dtype=float32>) has type tuple

I have no idea how to solve this problem. Has anyone met the same error before?
It would be very nice if you could offer me some advice.

Plus,

copynet_cell = CopyNetWrapper(cell, encoder_outputs, encoder_input_ids,
    vocab_size, gen_vocab_size)

Would you please explain the parameters in detail?

Looking forward to any reply.

NMT copynet and vocabulary sizes

Great job making a CopyNet-enabled NMT.

I noticed (sadly for my current experiment) that when running CopyNet-enabled NMT, both the target and source vocabularies must have the same number of words, otherwise I get an error.

Is that a "feature" or a bug? Do I need to send you some context?
Thank you
