ericjang / draw
TensorFlow Implementation of "DRAW: A Recurrent Neural Network For Image Generation"
License: Apache License 2.0
Not an issue, just curious.
grid_i = tf.reshape(tf.cast(tf.range(N), tf.float32), [1, -1])
mu_x = gx + (grid_i - N / 2 - 0.5) * delta # eq 19
If N = 3 and delta = 1, then grid_i will be [0 1 2]. With integer division, grid_i - N / 2 is grid_i - 1 = [-1 0 1], so grid_i - N / 2 - 0.5 is [-1.5 -0.5 0.5]. But I think [-1 0 1] is the reasonable value, since the mean locations would then be [gx-1, gx, gx+1]. Why subtract 0.5? Is this just following the paper, or am I missing something?
thanks
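For what it's worth, the arithmetic above is easy to check directly. This sketch uses the same hypothetical values (N = 3, delta = 1, gx = 0) and shows how the result of eq. 19 depends on whether N / 2 uses integer or true division:

```python
import numpy as np

# Worked check of eq. 19 with hypothetical values N=3, delta=1, gx=0.
N, delta, gx = 3, 1.0, 0.0
grid_i = np.arange(N, dtype=np.float32)           # [0. 1. 2.]
# Python 2 style integer division (N // 2 == 1), as read above:
mu_x_int = gx + (grid_i - N // 2 - 0.5) * delta   # [-1.5 -0.5  0.5]
# True division (N / 2 == 1.5):
mu_x_float = gx + (grid_i - N / 2 - 0.5) * delta  # [-2. -1.  0.]
print(mu_x_int, mu_x_float)
```

So under Python 2 semantics the centres land at [gx-1.5, gx-0.5, gx+0.5], exactly the values worked out above.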
FYI, README.md has a missing word. You can visualize the results by running the script:

python plot_data.py <prefix> <output_data>

For example:

python plot_data.py myattn /tmp/draw/draw_data.npy
How do I open the .npy file?
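A .npy file is NumPy's binary array format and can be read back with `np.load`. A minimal round-trip demo (the file name here is illustrative; the real file from the example above would be /tmp/draw/draw_data.npy):

```python
import numpy as np

# Save an array to .npy and load it back; np.load returns a NumPy array.
demo = np.arange(6, dtype=np.float32).reshape(2, 3)
np.save("demo.npy", demo)
loaded = np.load("demo.npy")
print(loaded.shape, loaded.dtype)  # (2, 3) float32
```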
Hi Eric,
thanks for this implementation! It's been hugely useful in replicating the paper. One thing I noticed is that in the filterbank function you're squaring the entire exponent rather than just the numerator, which is what you want.
I.e. your filters are slightly off from equations 24 and 25 in the paper.
Thanks!
Raza
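To make the report concrete, here is a hedged NumPy sketch of the two readings of eq. 24 for a single filterbank row; `filterbank_row` and its arguments are illustrative, not the repo's code:

```python
import numpy as np

# Compare squaring the whole fraction vs. only the numerator (a - mu_x).
# Only the second form matches eq. 24's Gaussian filter.
def filterbank_row(a, mu_x, sigma2):
    wrong = np.exp(-np.square((a - mu_x) / (2 * sigma2)))  # squares everything
    right = np.exp(-np.square(a - mu_x) / (2 * sigma2))    # matches eq. 24
    return wrong, right

a = np.arange(5, dtype=np.float32)
wrong, right = filterbank_row(a, mu_x=2.0, sigma2=1.0)
```

Both versions peak at 1 when a equals mu_x, but they fall off at different rates away from the centre, so the attention windows end up slightly wider or narrower than intended.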
I have been playing with this code for a few days. I can reproduce the GIF animation shown on the first page of this repository. However, this other implementation based on Theano (https://github.com/jbornschein/draw) achieves much (subjectively) nicer results (look at their GIF animation). I have tried using their parameters (like T=64 and read_window=2) in the TensorFlow code, but I was unable to reproduce results that look as nice. Do you have any idea why there is such a difference, and how we could achieve results like that with this TensorFlow code?
By niceness I mean the animation looks more realistic, which probably means the model learns something closer to the actual causal process of human handwriting.
Thanks for sharing this elegant DRAW model!
However, I found that in the KL divergence computation in draw.py, line 191, the last term should be 0.5 instead of 0.5*T, according to the paper's equation 11.
Even though this constant term won't affect the optimization process, I think you may get a different but more reasonable loss curve; as written, it is possible to get a negative KL divergence.
Thanks!
It is about 70, lower than most reported results.
I also found one bug. I think you were misled by Eq. (12); that equation computes each element of the vector z.
kl_terms[t]=0.5*tf.reduce_sum(mu2+sigma2-2*logsigma,1)-T*.5 # each kl term is (1xminibatch)
should be
kl_terms[t]=0.5*tf.reduce_sum(mu2+sigma2-2*logsigma-1,1) # each kl term is (1xminibatch)
# kl_terms[t]=0.5*tf.reduce_sum(mu2+sigma2-2*logsigma,1)-T*z_size*.5 # alternatively
Otherwise the KL term will blow up with large z_size.
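A quick NumPy check of the corrected term (shapes and values are hypothetical, chosen only to exercise the formula; this is not the repo's code) confirms both points: keeping the -1 inside the per-dimension sum makes each per-sample KL non-negative, and it is algebraically identical to subtracting z_size * 0.5 outside the sum:

```python
import numpy as np

# KL(N(mu, sigma^2) || N(0, I)) summed over z_size latent dimensions.
rng = np.random.default_rng(0)
batch, z_size = 4, 10
mu = rng.normal(size=(batch, z_size))
logsigma = 0.1 * rng.normal(size=(batch, z_size))
mu2, sigma2 = mu ** 2, np.exp(2 * logsigma)

kl = 0.5 * np.sum(mu2 + sigma2 - 2 * logsigma - 1, 1)                # corrected
kl_alt = 0.5 * np.sum(mu2 + sigma2 - 2 * logsigma, 1) - z_size * .5  # equivalent
print(kl)
```

Subtracting T * 0.5 per step instead would shift every KL term down by a constant that grows with T, which is why the reported loss can go negative.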
Another issue is that the MNIST data in your code is not binarized, but that won't make much difference.
It may not be a problem, but I am curious why x_hat (which involves the true data) is also used at prediction time. After training, I would expect the model to generate data independently, not by reading the true data.
Details as follows:
Read x as well as x_hat
x = filter_img(x, Fx, Fy, gamma, read_n)  # batch x (read_n*read_n)
x_hat = filter_img(x_hat, Fx, Fy, gamma, read_n)
After the training:
canvases = sess.run(cs, feed_dict)  # generate some examples
canvases = np.array(canvases)       # T x batch x img_size
It seems that x_hat is also fed into the model here, but x_hat contains the true data.
Thanks!
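For reference, unconditional generation in the paper samples z_t from the prior and runs only the decoder, so neither x nor x_hat is read. A minimal toy sketch of that sampling loop, where decoder_step and its linear map are hypothetical stand-ins (not the repo's LSTM decoder or write operation):

```python
import numpy as np

# Toy generation loop: z_t ~ N(0, I), decoder writes onto a canvas,
# final image is sigmoid(canvas). No true data is ever read.
rng = np.random.default_rng(0)
T, batch, z_size, img_size = 10, 2, 10, 28 * 28
W = rng.normal(scale=0.01, size=(z_size, img_size))  # toy "decoder" weights

def decoder_step(z):
    return z @ W  # stand-in for the decoder RNN + write operation

canvas = np.zeros((batch, img_size))
for t in range(T):
    z = rng.standard_normal((batch, z_size))  # prior sample, no encoder
    canvas = canvas + decoder_step(z)         # accumulate onto the canvas
samples = 1 / (1 + np.exp(-canvas))           # sigmoid of the final canvas
```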