jingxil / pointer-networks

An implementation of "Pointer Networks" in Tensorflow

License: MIT License

Python 100.00%

tensroflow pointer-networks

pointer-networks's Introduction

Pointer Networks in Tensorflow

This is an implementation of Pointer Networks to solve the Convex Hull problem. Stacking RNN layers is supported.

Environments

Python 3.x
TensorFlow 1.2.x

Data

Convex Hull datasets such as "convex hull 5" and "convex hull 5-50" can be downloaded at Link.

Usage

training

$ python convex_hull.py --ARG=VALUE

evaluating

$ python convex_hull.py --forward_only=True --beam_width=VALUE --ARG=VALUE

visualizing

$ tensorboard --logdir=DIR

Results

Training on convex hull 5

Training on convex hull 5-50

pointer-networks's Issues

Initial state of the decoder

Hi,

Firstly, thanks a lot for sharing your code.

I had a question regarding the initial state of the decoder. I see that it is currently initialized with zero_state. Shouldn't the decoder's initial state be the final state of the encoder? Please do correct me if I am wrong.
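The two options in this question can be compared on a toy model. The following is a minimal sketch in plain Python, not the repository's code: the scalar RNN, its weights, and all function names are made up for illustration. It only shows that the decoder's trajectory depends on which initial state it receives.

```python
import math

def rnn_step(state, x, w_state=0.5, w_in=1.0):
    """One step of a toy scalar RNN: h' = tanh(w_state * h + w_in * x)."""
    return math.tanh(w_state * state + w_in * x)

def run_rnn(inputs, initial_state):
    """Run the toy RNN and return the state after every step."""
    states = []
    state = initial_state
    for x in inputs:
        state = rnn_step(state, x)
        states.append(state)
    return states

# Hypothetical inputs, chosen only for illustration.
encoder_inputs = [0.2, -0.4, 0.9]
decoder_inputs = [0.1, 0.3]

encoder_states = run_rnn(encoder_inputs, initial_state=0.0)
encoder_final = encoder_states[-1]

# Option A: the decoder starts from zeros (what the issue says the code does).
dec_from_zero = run_rnn(decoder_inputs, initial_state=0.0)

# Option B: the decoder starts from the encoder's final state (what the issue proposes).
dec_from_encoder = run_rnn(decoder_inputs, initial_state=encoder_final)

print(dec_from_zero)
print(dec_from_encoder)
```

With a bidirectional encoder there are two final states (forward and backward), so they would presumably need to be combined, for instance concatenated and projected, before being used as in option B. With option A the decoder can still reach the encoder's information through attention over the encoder outputs.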

Removing the last value of output for decoder_input_ids

Hi,
Thanks a lot for the code. I have a question: decoder_input_ids is built by removing the last value from the output list, which is the end token. But for outputs that do not fill the full length, you pad with a pad ID of 1, so the last value of such an output is the pad ID and not the end token. In that case, truncating the last value only removes the pad ID and leaves the end token in place. Is that how it should be? I hope I have understood this correctly.
Thanks a lot.
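The behaviour described in this issue can be reproduced in a few lines. This is a minimal sketch under stated assumptions: only the pad ID of 1 comes from the issue, while START_ID, END_ID, and both helper functions are hypothetical and may differ from the repository.

```python
# Hypothetical token IDs; only PAD_ID = 1 is stated in the issue.
START_ID, PAD_ID, END_ID = 0, 1, 2

def pad_output(ids, max_len):
    """Append END_ID, then right-pad with PAD_ID up to max_len."""
    seq = list(ids) + [END_ID]
    return seq + [PAD_ID] * (max_len - len(seq))

def decoder_inputs_by_truncation(padded):
    """Prepend START_ID and drop the last position, as described in the issue."""
    return [START_ID] + padded[:-1]

full = pad_output([5, 3, 4, 6], max_len=5)   # exactly fills max_len
short = pad_output([5, 3], max_len=5)        # needs two pad positions

full_inputs = decoder_inputs_by_truncation(full)
short_inputs = decoder_inputs_by_truncation(short)

print(full_inputs)    # END_ID has been dropped
print(short_inputs)   # END_ID is still present, only one PAD_ID was dropped
```

If the loss is masked at padded positions, the leftover end token in the shorter sequence would only be fed at steps that do not contribute to the loss. Whether the repository masks the loss this way is not confirmed here.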

Evaluating test data while training, not other process

Hello jingxil,

Thank you for sharing this wonderful code :)
I'm trying to use it for a sentence reordering task, and I've run into some trouble.

You use the forward_only option to separate the "train" scope and the "test" (or inference) scope into separate processes.
However, I want to run tests during training to see whether the model overfits. That is not easy because of the attention mechanism (training and testing run differently) :(

Is there any good way to solve this problem?

Even if you don't code it, I'd really appreciate it if you could share reference sites or hints I could refer to.

Thank you

Details of the evaluation method

Hello jingxil,

Thank you for sharing this!
This is one of the best, most understandable implementations of Pointer Networks I have come across. 😃

I was able to train the model and achieve some good training results.

Could you please help me with the following questions?

Is the purpose of the Forward Only mode to only switch to evaluation mode, or is there something more?
During prediction, how are the encoder and decoder weights decided? During training we maintain an array of weights for the encoder (shape: [batch_size, max_input_sequence_len, 2]) and for the decoder (shape: [batch_size, max_output_sequence_len]).

What happens when we see a new input during prediction? Do we set the encoder weights and decoder weights to '1' with zero padding?

Thanks again for the amazing work!
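One common way to build such weights is as a mask derived from the true sequence lengths: 1 for real positions and 0 for padding. The sketch below is an assumption about how this could be done, not the repository's code, and both function names are hypothetical. At prediction time the same rule can be applied to a new input from its length alone, and decoder-side weights are typically only needed when a loss is computed.

```python
def sequence_mask(lengths, max_len):
    """Return one row per example: 1.0 for real positions, 0.0 for padding."""
    return [[1.0 if i < n else 0.0 for i in range(max_len)] for n in lengths]

def masked_mean_loss(step_losses, weights):
    """Average per-step losses over real positions only."""
    total = sum(
        loss * w
        for loss_row, w_row in zip(step_losses, weights)
        for loss, w in zip(loss_row, w_row)
    )
    count = sum(w for w_row in weights for w in w_row)
    return total / count

# Two hypothetical examples with 3 and 5 real positions, padded to length 5.
weights = sequence_mask([3, 5], max_len=5)

# Made-up per-step losses; the 9.0 entries sit on padding and are ignored.
step_losses = [
    [1.0, 1.0, 1.0, 9.0, 9.0],
    [2.0, 2.0, 2.0, 2.0, 2.0],
]
loss = masked_mean_loss(step_losses, weights)

print(weights)
print(loss)
```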

The input to the decoder is strange

Hello, thanks for sharing your code. I have a question after reading it.

Why do you use the scores rather than the probabilities of the alignments?
In the paper, the authors use the softmax of the alignment scores, which gives probabilities.
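The relationship between scores and probabilities can be sketched as follows. The scores are made-up numbers and the helper is a plain softmax written for illustration, not code from the repository.

```python
import math

def softmax(scores):
    """Turn unnormalized alignment scores into probabilities that sum to 1."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical alignment scores over three input positions.
scores = [2.0, 0.5, -1.0]
probs = softmax(scores)

# Softmax preserves the ordering, so the pointed-to position is the same.
pointed_by_scores = scores.index(max(scores))
pointed_by_probs = probs.index(max(probs))

print(probs)
print(pointed_by_scores, pointed_by_probs)
```

Because softmax is monotonic, choosing the highest score and choosing the highest probability select the same input position. If the raw scores are passed to a cross-entropy loss that applies softmax internally, the training objective matches the probability-based formulation; whether that is what the repository does is not confirmed here.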

The real length of the encoder inputs

Hello, I have another question and hope you could help me out.

In the model part, from line 109 to line 125:

...  other codes ...

# Shape: [batch_size, max_input_sequence_len + 1, 2]
encoder_inputs = tf.stack(encoder_inputs, axis=0)

...  other codes ...

# Encode input to obtain memory for later queries
memory, _ = tf.nn.bidirectional_dynamic_rnn(fw_enc_cell, bw_enc_cell, encoder_inputs, enc_input_lens, dtype=tf.float32)

...  other codes ...

If I understand it correctly, the encoder inputs should include the END token; therefore the real length of the encoder inputs should now be enc_input_lens + 1, where the extra 1 is for END.

Then why doesn't enc_input_lens change in this code?

Thank you :) @jingxil
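The effect being asked about can be illustrated without TensorFlow. In this minimal sketch, END_VECTOR, the choice to prepend it, and both helper functions are assumptions made for illustration. It mimics a length-aware RNN, which only updates its state for the first `length` steps of each sequence.

```python
END_VECTOR = (0.0, 0.0)   # hypothetical placeholder for the special END position

def add_end_position(points):
    """Prepend the special END position, so the sequence grows by one."""
    return [END_VECTOR] + list(points)

def steps_processed(sequence, length):
    """Mimic a length-aware RNN: only the first `length` steps are processed."""
    return sequence[:length]

# Made-up 2D points standing in for one convex hull input.
points = [(0.1, 0.2), (0.4, 0.9), (0.7, 0.3)]
enc_input_len = len(points)                 # 3, counted before END was added
encoder_inputs = add_end_position(points)   # 4 positions in total

seen_without_fix = steps_processed(encoder_inputs, enc_input_len)
seen_with_fix = steps_processed(encoder_inputs, enc_input_len + 1)

print(len(seen_without_fix), len(seen_with_fix))
```

Under these assumptions, passing the unadjusted length means the last real point is never processed by the encoder. If enc_input_lens is already computed to include the extra position elsewhere in the code, no adjustment would be needed; that is not confirmed here.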