bytenet-tensorflow's People

Contributors

gitter-badger, paarthneekhara


bytenet-tensorflow's Issues

We can upgrade the code to be compatible with tf 1.0.0 with the following diff:

git diff

diff --git a/ByteNet/model.py b/ByteNet/model.py
index 4cfe3b3..3a12b5b 100644
--- a/ByteNet/model.py
+++ b/ByteNet/model.py
@@ -138,7 +138,7 @@ class Byte_net_model:
 		decoder_output = self.decoder(source_embedding)
 		loss = self.loss(decoder_output, target_sentence)
-		tf.scalar_summary('LOSS', loss)
+		tf.summary.scalar('LOSS', loss)

 		flat_logits = tf.reshape( decoder_output, [-1, options['n_target_quant']])
 		prediction = tf.argmax(flat_logits, 1)

@@ -220,7 +220,7 @@ class Byte_net_model:
 		flat_logits = tf.reshape( decoder_output, [-1, options['n_target_quant']])
 		flat_targets = tf.reshape( target_one_hot, [-1, options['n_target_quant']])
-		loss = tf.nn.softmax_cross_entropy_with_logits(flat_logits, flat_targets, name='decoder_cross_entropy_loss')
+		loss = tf.nn.softmax_cross_entropy_with_logits(logits=flat_logits, labels=flat_targets, name='decoder_cross_entropy_loss')

 		if 'target_mask_chars' in options:
 			# MASK LOSS BEYOND EOL IN TARGET

diff --git a/train_generator.py b/train_generator.py
index 78d502c..72e898b 100644

Pre-trained models

Hi, is it possible to share with us some of your pre-trained models? Thanks.

Hi Paarth

thanks for the great work. However, I noticed the code may have a serious bug.
I am playing with train_generator.py and found that the code may have some problems.

In your original evaluation, there is no training/testing split.

So we divided the data by randomly splitting the whole dataset, then slightly changed the code to add a simple testing-set loss evaluation, computed the same way as the training loss.

You will find that the training loss decreases from the beginning, but the testing-set loss never decreases; it actually increases from the beginning.

It is kind of weird. Do you know the reason? Do you think there may be a problem with the algorithm?

Thanks

Fajie

could not train using python3.5 and tf 0.11

I encountered two errors, perhaps due to version incompatibility.

$ python train_p3.py --data_dir=/Users/jhave/Desktop/github/byteNet-tensorflow/Data/pf
Traceback (most recent call last):
File "train_p3.py", line 6, in
from ByteNet import model
File "/Users/jhave/Desktop/github/byteNet-tensorflow/ByteNet/model.py", line 2, in
import ops
ImportError: No module named 'ops'
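This first error comes from Python 3 dropping implicit relative imports (PEP 328): a bare `import ops` inside `ByteNet/model.py` no longer resolves the sibling `ops.py`. Rather than moving the file, an explicit import fixes it. A minimal, self-contained reproduction with a hypothetical package `pkg` (the names are illustrative, not from the repo):

```python
import os
import sys
import tempfile

# Build a throwaway package that mirrors the ByteNet layout:
# pkg/__init__.py, pkg/ops.py, pkg/model.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "pkg")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()

with open(os.path.join(pkg_dir, "ops.py"), "w") as f:
    f.write("VALUE = 42\n")

# Under Python 3, a bare `import ops` here raises ImportError.
# The fix is an explicit relative (or absolute) import:
with open(os.path.join(pkg_dir, "model.py"), "w") as f:
    f.write("from . import ops\nVALUE = ops.VALUE\n")

sys.path.insert(0, root)
from pkg import model
print(model.VALUE)  # prints 42
```

Applied to this repo, that would mean changing `import ops` in `ByteNet/model.py` to `from . import ops` (or `from ByteNet import ops`) instead of relocating the file.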


After the ImportError I moved ops.py into the same folder as the training script, then encountered:

$ python train_p3.py --data_dir=/Users/jhave/Desktop/github/byteNet-tensorflow/Data/pf
Traceback (most recent call last):
File "train_p3.py", line 85, in
main()
File "train_p3.py", line 42, in main
bn_tensors = byte_net.build_prediction_model()
File "/Users/jhave/Desktop/github/byteNet-tensorflow/ByteNet/model.py", line 48, in build_prediction_model
decoder_output = self.decoder(source_embedding)
File "/Users/jhave/Desktop/github/byteNet-tensorflow/ByteNet/model.py", line 123, in decoder
layer_output = self.decode_layer(curr_input, dilation, layer_no)
File "/Users/jhave/Desktop/github/byteNet-tensorflow/ByteNet/model.py", line 111, in decode_layer
name = "dec_dilated_conv_laye{}".format(layer_no)
File "/Users/jhave/Desktop/github/byteNet-tensorflow/ops.py", line 50, in dilated_conv1d
restored = batch_to_time(conv, dilation)
File "/Users/jhave/Desktop/github/byteNet-tensorflow/ops.py", line 21, in batch_to_time
[(shape[0]/dilation), -1, shape[2]])
File "//anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1977, in reshape
name=name)
File "//anaconda/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 573, in apply_op
_Attr(op_def, input_arg.type_attr))
File "//anaconda/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 60, in _SatisfiesTypeConstraint
", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: DataType float32 for attr 'Tshape' not in list of allowed values: int32, int64
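This second error is a Python 3 division issue: in `ops.py`, `shape[0]/dilation` uses true division, which yields a float, and `tf.reshape` rejects float entries in a shape. Replacing `/` with floor division `//` keeps the shape integral. A plain-Python sketch of the difference (no TensorFlow needed):

```python
batch = 32
dilation = 2

# Python 3 true division always returns a float,
# which tf.reshape rejects as a shape entry.
bad = batch / dilation
print(type(bad).__name__, bad)    # float 16.0

# Floor division keeps the result an int, so a shape like
# [(shape[0] // dilation), -1, shape[2]] stays valid.
good = batch // dilation
print(type(good).__name__, good)  # int 16
```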

Translating

Hey awesome implementation. Thanks.
In translate.py, when I want to translate a source sentence, I still need to provide a target. Is the target the same as the source?
Thanks

the padding for dilated convolution seems to be wrong

For the encoder, the dilated convolution pads (filter_width - 1) * dilation / 2 zeros on each side, which means that after reshaping, each row already starts and ends with (filter_width - 1) / 2 zeros. But conv1d uses SAME padding, which again pads (filter_width - 1) / 2 zeros on each side, duplicating the padding.

Assume filter_width=3, dilation=2, and the input is
1 2 3 4 5 6
After padding in the dilated convolution function, the input becomes
0 0 1 2 3 4 5 6 0 0
After the reshape,

0 1 3 5 0
0 2 4 6 0

becomes the input to conv1d, which will again pad one zero per side (filter_width - 1 in total) under the SAME padding scheme:

0 0 1 3 5 0 0
0 0 2 4 6 0 0
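The interleaving described above can be sketched in plain Python. Here `time_to_batch` is a hypothetical stand-in for the pad-and-reshape helper in `ops.py`:

```python
def time_to_batch(seq, dilation, filter_width):
    """Pad as in the dilated-conv helper, then split the sequence
    into `dilation` interleaved sub-sequences (the batch dimension)."""
    pad = (filter_width - 1) * dilation // 2
    padded = [0] * pad + list(seq) + [0] * pad
    return [padded[i::dilation] for i in range(dilation)]

rows = time_to_batch([1, 2, 3, 4, 5, 6], dilation=2, filter_width=3)
print(rows)  # [[0, 1, 3, 5, 0], [0, 2, 4, 6, 0]]

# Each row already carries one zero of padding at each end, yet a SAME
# conv1d with filter_width=3 would pad one more zero per side,
# duplicating the padding: [0, 0, 1, 3, 5, 0, 0], [0, 0, 2, 4, 6, 0, 0].
```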
