jingxil / pointer-networks
An implementation of "Pointer Networks" in Tensorflow
License: MIT License
Hello, thanks for sharing your code. I have a question after reading it.
Why do you use the scores rather than the probabilities of the alignments?
In the paper, the authors use the softmax of the alignment scores, which should be probabilities.
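In the paper, the pointer distribution is a softmax over raw scores u_j = v · tanh(W1 e_j + W2 d). A softmax never changes which position scores highest, and TensorFlow's logits-based losses such as `tf.nn.sparse_softmax_cross_entropy_with_logits` apply the softmax internally. So passing raw scores to such a loss trains on exactly the paper's probabilities. Whether the repo uses its scores this way is an assumption here. The plain-Python sketch below uses hypothetical toy weights and checks both facts numerically:

```python
import math

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def pointer_scores(enc_states, dec_state, W1, W2, v):
    """Raw pointer scores u_j = v . tanh(W1 e_j + W2 d) over encoder states e_j."""
    wd = matvec(W2, dec_state)
    return [
        sum(vk * math.tanh(a + b) for vk, a, b in zip(v, matvec(W1, e), wd))
        for e in enc_states
    ]

def softmax(xs):
    """Numerically stable softmax: turns scores into probabilities."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def xent_from_logits(logits, target):
    """Cross-entropy of `target` computed directly from raw scores (log-sum-exp)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

# Toy example with hypothetical weights (identity matrices, v = [1, 1]).
I2 = [[1.0, 0.0], [0.0, 1.0]]
enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
u = pointer_scores(enc, [0.5, -0.5], I2, I2, [1.0, 1.0])
p = softmax(u)
```

Because the loss computed from `u` equals -log p[target], "using the scores" and "using the probabilities" differ only in where the softmax is applied. Applying softmax yourself and then feeding the result to a `*_with_logits` loss would apply it twice.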
Hi,
Firstly, thanks a lot for sharing your code.
I have a question about the initial state of the decoder. It is currently initialized with zero_state. Shouldn't the decoder's initial state be the final state of the encoder? Please correct me if I am wrong.
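In TF1, `tf.nn.bidirectional_dynamic_rnn` returns `(outputs, output_states)`, where `output_states` is a `(forward_state, backward_state)` pair. A common alternative to `zero_state` is to seed the decoder from that pair, for example with the forward state when the cell sizes match, or with a projection of both. The toy below is a scalar sketch with hypothetical weights and a fixed decoder input standing in for a START token. It shows that a zero initial state carries no information about the input, while a seeded one does:

```python
import math

def rnn_step(h, x, w=0.8, u=0.5):
    """One step of a toy scalar Elman RNN: h' = tanh(w*x + u*h); w, u are hypothetical."""
    return math.tanh(w * x + u * h)

def encode(xs, h0=0.0):
    """Run the encoder over the inputs and return its final hidden state."""
    h = h0
    for x in xs:
        h = rnn_step(h, x)
    return h

def decode(n_steps, h0, x0=0.0):
    """Run the decoder for n_steps from initial state h0, feeding a fixed input x0."""
    h, states = h0, []
    for _ in range(n_steps):
        h = rnn_step(h, x0)
        states.append(h)
    return states

zero_init = decode(2, h0=0.0)                       # identical for every input
seeded_a = decode(2, h0=encode([0.2, 0.9, 0.4]))    # depends on input A
seeded_b = decode(2, h0=encode([-0.7, 0.1, -0.3]))  # depends on input B
```

In a pointer network the decoder still reaches the encoder through attention, so a zero initial state is not wrong in itself. Seeding mainly gives the decoder an input summary before its first attention query.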
Hi,
Thanks a lot for the code. I have a question: decoder_input_ids is built by removing the last value from the output list, which is the END token. But for outputs shorter than the maximum length, you pad with a pad ID of 1, so the last value of such an output is the pad ID, not the END token. In that case, truncating the last value removes only a pad ID and not the END token. Is that intended? I hope I have understood this correctly.
Thanks a lot.
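The usual way to build decoder inputs is to shift the padded targets right by one: prepend START and drop the last target. Whether that leftover END matters depends on the loss weights. Assume padded positions get weight 0 in the loss (the decoder weights mentioned later in this thread suggest this, but that is an assumption). The END that survives in a short output's decoder input then only feeds a step whose target is PAD, so it never affects training. The sketch below uses a hypothetical helper, with hypothetical START and END IDs; only PAD_ID = 1 comes from the question:

```python
PAD_ID = 1    # pad ID mentioned in the question
START_ID = 0  # hypothetical START-token ID
END_ID = 2    # hypothetical END-token ID

def build_decoder_arrays(output, max_len):
    """Return (decoder_inputs, targets, weights), each of length max_len.

    Targets are the real outputs followed by END, then PAD. Decoder inputs are
    the targets shifted right by one: START is prepended and the last target
    dropped. Weights are 1 for real outputs and END, 0 for padding.
    """
    if len(output) + 1 > max_len:
        raise ValueError("output plus END does not fit in max_len")
    n_pad = max_len - len(output) - 1
    targets = list(output) + [END_ID] + [PAD_ID] * n_pad
    decoder_inputs = [START_ID] + targets[:-1]
    weights = [1.0] * (len(output) + 1) + [0.0] * n_pad
    return decoder_inputs, targets, weights

short = build_decoder_arrays([5, 4], max_len=5)
full = build_decoder_arrays([5, 4, 3, 6], max_len=5)
```

For `short`, END shows up as the decoder input at index 3, whose target is PAD with weight 0. For `full`, END is the last target (weight 1) and never appears as an input, which is fine because decoding stops once END is predicted.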
Hello jingxil,
Thank you for sharing this!
This is one of the best, most understandable implementations of Pointer Networks I have come across.
I was able to train the model and achieve good training results.
Could you please help me with the following questions?
1. Is the purpose of the Forward Only mode only to switch to evaluation mode, or does it do something more?
2. During prediction, how are the encoder and decoder weights determined? During training we maintain arrays of weights for the encoder (shape: [batch_size, max_input_sequence_len, 2]) and the decoder (shape: [batch_size, max_output_sequence_len]).
3. What happens when we see a new input during prediction? Do we set the encoder and decoder weights to 1, with zero padding?
Thanks again for the amazing work!
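Assume the encoder weight array is a 0/1 mask over real versus padded positions, which the question's wording suggests but this thread does not confirm. A new input at prediction time would then be padded and masked exactly like a training example: 1 for each real position, 0 for padding. If the decoder weights only feed the loss, they would not be needed when there are no targets, but that too is an assumption. A sketch with a hypothetical helper:

```python
def pad_and_mask(points, max_len, pad_point=(0.0, 0.0)):
    """Pad a list of 2-D points to max_len; return (padded_points, mask).

    mask[t] is 1.0 for a real point and 0.0 for padding (hypothetical helper).
    """
    if len(points) > max_len:
        raise ValueError("input longer than max_len")
    n_pad = max_len - len(points)
    padded = list(points) + [pad_point] * n_pad
    mask = [1.0] * len(points) + [0.0] * n_pad
    return padded, mask

# A new, unseen input is prepared the same way as a training example.
new_input = [(0.1, 0.2), (0.5, 0.9), (0.3, 0.4)]
padded, mask = pad_and_mask(new_input, max_len=5)
```

In this sketch the mask depends only on the input's length, not on whether the model runs in training or forward-only mode.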
Hello jingxil,
Thank you for sharing this wonderful code :)
I'm now trying to use this code for a "Sentence Reordering Task", but I've run into some trouble.
You use the forward_only option to separate the "train" scope and the "test" (or inference) scope into separate process units.
However, I want to run the test during training to check whether the model overfits, and that isn't easy because of the attention mechanism (training and testing run differently). :(
Is there a good way to solve this problem?
Even if you don't code it, I'd really appreciate it if you could share any reference sites or hints I could refer to.
Thank you
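One common pattern, not necessarily what this repo does, is to keep a single set of weights and switch only how the decoder's next input is chosen. During training it is the ground truth (teacher forcing). During evaluation it is the model's own previous pointer. In TF1 this is usually done by building the evaluation graph under the same `tf.variable_scope` with `reuse=True`, so both graphs read the same variables. The toy below shows the idea in plain Python. `params`, `pointer_decode`, and the bilinear score are hypothetical stand-ins for the real attention:

```python
def bilinear(q, W, m):
    """Score q^T W m between a decoder query q and an encoder vector m."""
    return sum(q[i] * W[i][k] * m[k] for i in range(len(q)) for k in range(len(m)))

def pointer_decode(params, memory, start, num_steps, targets=None):
    """Greedy pointer decoding that serves both modes with the SAME params.

    targets given (training): the next query is the ground-truth pointed-to
    vector (teacher forcing). targets None (evaluation): the next query is the
    vector the model itself pointed to at the previous step.
    """
    query, pointers = start, []
    for t in range(num_steps):
        scores = [bilinear(query, params["W"], m) for m in memory]
        best = max(range(len(memory)), key=scores.__getitem__)
        pointers.append(best)
        query = memory[targets[t] if targets is not None else best]
    return pointers

params = {"W": [[1.0, 0.0], [0.0, 1.0]]}  # one shared, hypothetical parameter set
memory = [[3.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
train_out = pointer_decode(params, memory, [1.0, 0.0], 2, targets=[1, 0])
eval_out = pointer_decode(params, memory, [1.0, 0.0], 2)
```

Both calls read the same `params`, so running the evaluation path periodically during training measures the current model on its own predictions, which is what an overfitting check needs.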
Hello, I have another question and hope you can help me out.
In the model code, lines 109 to 125 read:
# ... other code ...
# Shape: [batch_size, max_input_sequence_len + 1, 2]
encoder_inputs = tf.stack(encoder_inputs, axis=0)
# ... other code ...
# Encode input to obtain memory for later queries
memory, _ = tf.nn.bidirectional_dynamic_rnn(fw_enc_cell, bw_enc_cell, encoder_inputs, enc_input_lens, dtype=tf.float32)
# ... other code ...
If I understand correctly, the encoder inputs should include the END token, so the real length of the encoder inputs should now be enc_input_lens + 1, where the 1 accounts for END.
Then why doesn't enc_input_lens change in this code?
Thank you :) @jingxil
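`tf.nn.dynamic_rnn` and `tf.nn.bidirectional_dynamic_rnn` treat `sequence_length` as the number of valid time steps per example. Outputs at or beyond that length are zeroed, and the backward direction reverses only the first `sequence_length` steps. So if END occupies an extra position, the length passed to the RNN would normally need to count it, unless `enc_input_lens` already includes it upstream, which this excerpt does not show. A plain-Python sketch of that masking, with a hypothetical END value and an assumed END-first placement:

```python
def masked_outputs(seq, length):
    """Mimic dynamic_rnn's length handling: outputs at positions >= length are zero.

    The "cell" here just echoes its input so the effect of `length` is easy to see.
    """
    return [x if t < length else 0.0 for t, x in enumerate(seq)]

END = -1.0                # hypothetical END marker value
points = [0.3, 0.7, 0.5]  # one input sequence with 3 real positions
seq = [END] + points      # assumption: END prepended -> len(points) + 1 positions

dropped = masked_outputs(seq, len(points))   # [-1.0, 0.3, 0.7, 0.0]: last point lost
kept = masked_outputs(seq, len(points) + 1)  # [-1.0, 0.3, 0.7, 0.5]: all positions used
```

If END were appended instead (`points + [END]`), the same undercount would silently drop END itself rather than the last real point.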