mansimov / text2image

Generating Images from Captions with Attention

License: MIT License

Python 100.00%

text2image's Issues

How much time does an epoch take for training mnist data?

I am training on a Tesla K40c GPU (CNMeM is enabled with initial size: 75.0% of memory, cuDNN 4007), and each epoch takes more than 10 minutes. Is that normal, or have I misconfigured my environment?

Here is my log:

Train Results
20.4327215786 1138.28323364 1158.71595093
Validation Results
building validate function
0:00:52.053928
9.35944375 752.618843994 761.978292236

Recreated Train Dataset
[1139, 1308, 1155, 1142, 1335, 1308, 1309, 1304]

Epoch 1 took 0:13:34.399718
Train Results
13.6282463623 686.228092651 699.856334229
Recreated Train Dataset
[1161, 1261, 1172, 1157, 1288, 1352, 1311, 1298]

Epoch 2 took 0:12:48.676186
Train Results
14.1116136597 600.27987915 614.391492615
Recreated Train Dataset
[1221, 1271, 1145, 1203, 1312, 1312, 1273, 1263]

Epoch 3 took 0:11:07.076383
Train Results
12.3779268677 505.388762512 517.766689453
Validation Results
building validate function
0:00:07.973637
10.1579154358 474.040048218 484.197965393

Recreated Train Dataset
[1137, 1284, 1149, 1137, 1365, 1308, 1336, 1284]

Epoch 4 took 0:11:08.574547
Train Results
14.8620436646 411.504102783 426.366148071
Recreated Train Dataset
[1185, 1271, 1186, 1222, 1271, 1276, 1278, 1311]

Epoch 5 took 0:10:53.159525
Train Results
13.2145661377 363.50835083 376.72291626
Recreated Train Dataset
[1166, 1317, 1161, 1182, 1225, 1291, 1359, 1299]

text2image is giving error

Hi, I am trying to run alignDraw.py and I am getting the following error:

monica@monica:~/PycharmProjects/text2image-master/coco$ python alignDraw.py ./models/coco-captions-32x32.json
Traceback (most recent call last):
File "alignDraw.py", line 3, in <module>
import h5py
File "/home/monica/.local/lib/python2.7/site-packages/h5py/__init__.py", line 24, in <module>
from . import _errors
ImportError: /home/monica/.local/lib/python2.7/site-packages/h5py/_errors.so: undefined symbol: PyUnicodeUCS4_DecodeUTF8

How can I solve this error?
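
The symbol `PyUnicodeUCS4_DecodeUTF8` exists only in "wide" (UCS4) builds of Python 2, so this error usually means the installed h5py binary was compiled against a UCS4 interpreter but is being loaded by a "narrow" (UCS2) one. A minimal sketch for checking which kind of interpreter is running is shown below; the helper name `unicode_build` is made up for illustration.

```python
import sys

def unicode_build(maxunicode=None):
    """Classify an interpreter as 'UCS2' or 'UCS4' from its sys.maxunicode value."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return "UCS4" if maxunicode > 0xFFFF else "UCS2"

print("this interpreter:", unicode_build())
```

If the interpreter reports UCS2, reinstalling or rebuilding h5py with that same interpreter is the usual remedy, since the extension module then links against matching symbols.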

Can't find the results of the experiment

I want to look at the results of the experiment, but I can't find them on my computer. Where are they saved?

How to generate images with bigger size?

I want to use the code to train a model on my own dataset, but I'm not sure how to modify it to generate images of size 64x64 or larger. Has anyone tried this?

Thanks!

Is the KL divergence computed correctly?

@MissT157 and I are experimenting with this code and we are finding the implementation of KL divergence a bit awry. Can you please revisit and confirm whether it's been implemented correctly?

Thanks in advance
@g1910
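
One way to check an implementation is to compare it against the closed-form KL divergence between two diagonal Gaussians. The sketch below is a plain-Python reference for that formula, not the repository's code. It assumes the posterior and the prior are each described by per-dimension means and log standard deviations, and the function name `kl_diag_gaussians` is made up for illustration.

```python
import math

def kl_diag_gaussians(mu_q, log_sigma_q, mu_p, log_sigma_p):
    """KL(q || p) for diagonal Gaussians given per-dimension means and log std devs."""
    total = 0.0
    for mq, lsq, mp, lsp in zip(mu_q, log_sigma_q, mu_p, log_sigma_p):
        var_q = math.exp(2.0 * lsq)
        var_p = math.exp(2.0 * lsp)
        # log(sigma_p / sigma_q) + (sigma_q^2 + (mu_q - mu_p)^2) / (2 sigma_p^2) - 1/2
        total += lsp - lsq + (var_q + (mq - mp) ** 2) / (2.0 * var_p) - 0.5
    return total

# Identical distributions should give zero divergence.
same = kl_diag_gaussians([0.0, 1.0], [0.0, -1.0], [0.0, 1.0], [0.0, -1.0])
# Unit-variance Gaussians whose means differ by 1 should give 0.5.
shifted = kl_diag_gaussians([0.0], [0.0], [1.0], [0.0])
print(same, shifted)
```

Feeding the same means and log standard deviations to both this reference and the model's KL term on a small batch would show whether the two agree.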

Training Models is taking too long

I am trying to train the model defined in coco-captions-32x32.json, but it is taking an insane amount of time. There are around 200 epochs, and it took 15 hours to complete just one:

Epoch 0 took 15:17:07.833333

Is there a way to speed this up, or is completing all 200 epochs really necessary? Is there a minimum number of epochs that gives a working setup? If anyone has trained weight files, that would really help me out. Thanks.

I am using an NVIDIA GPU on an AWS instance for my test.
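
To put the numbers from this report in perspective, the total training time can be projected from the single epoch time. This is a small arithmetic sketch that assumes every epoch takes as long as epoch 0 and that the run really covers 200 epochs.

```python
from datetime import timedelta

# Epoch 0 duration as reported above: 15:17:07.833333
epoch_time = timedelta(hours=15, minutes=17, seconds=7, microseconds=833333)
planned_epochs = 200  # "around 200 epochs" per the report

total = epoch_time * planned_epochs
print("projected total:", total)
```

At that rate the full schedule would take roughly 127 days, which suggests checking whether the GPU is actually being used before committing to the run.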

Why

Why can't I find any generated model under mnist?
Also, when I run the code for MS COCO, it reports an error about float32, and changing float32 to float64 does not fix it. How can I solve this problem?

Error

envy@ub1404:/os_pri/github/text2image$ python coco/alignDraw.py coco/models/coco-captions-32x32.json
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
coco/alignDraw.py:343: FutureWarning: comparison to None will result in an elementwise object comparison in the future.
if valData != None:
Traceback (most recent call last):
File "coco/alignDraw.py", line 617, in <module>
rvae = ReccurentAttentionVAE(dimY, dimLangRNN, dimAlign, dimX, dimReadAttent, dimWriteAttent, dimRNNEnc, dimRNNDec, dimZ, runSteps, batch_size, reduceLRAfter, data, data_captions, valData=val_data, valDataCaptions=val_data_captions, pathToWeights=pathToWeights)
File "coco/alignDraw.py", line 355, in __init__
self._kl_final, self._logpxz, self._log_likelihood, self._c_ts, self._c_ts_gener, self._x, self._y, self._run_steps, self._updates_train, self._updates_gener, self._read_attent_params, self._write_attent_params, self._write_attent_params_gener, self._alphas_gener, self._params, self._mu_prior_t_gener, self._log_sigma_prior_t_gener = build_lang_encoder_and_attention_vae_decoder(self.dimY, self.dimLangRNN, self.dimAlign, self.dimX, self.dimReadAttent, self.dimWriteAttent, self.dimRNNEnc, self.dimRNNDec, self.dimZ, self.runSteps, self.pathToWeights)
File "coco/alignDraw.py", line 294, in build_lang_encoder_and_attention_vae_decoder
sequences=eps, outputs_info=[c0, h0_dec, cell0_dec, h0_enc, cell0_enc, kl_0, mu_prior_0, log_sigma_prior_0, None, None], non_sequences=all_params, n_steps=run_steps)
File "/home/envy/.local/lib/python2.7/site-packages/theano/scan_module/scan.py", line 1041, in scan
scan_outs = local_op(*scan_inputs)
File "/home/envy/.local/lib/python2.7/site-packages/theano/gof/op.py", line 611, in __call__
node = self.make_node(*inputs, **kwargs)
File "/home/envy/.local/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 538, in make_node
inner_sitsot_out.type.dtype))
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has dtype float32, while the result of the inner function (fn) has dtype float64. This can happen if the inner function of scan results in an upcast or downcast.
envy@ub1404:/os_pri/github/text2image$
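
The ValueError above says that a float32 initial state comes back from the scan step as float64. A common way for that to happen is mixing a float32 value with an int64 or float64 operand. The sketch below shows the effect with NumPy arrays, on the assumption that Theano's type promotion behaves similarly in this case. It is an illustration of the mechanism, not a diagnosis of alignDraw.py, and the variable names are made up.

```python
import numpy as np

state = np.zeros(4, dtype=np.float32)     # stands in for the float32 initial state
steps = np.arange(4, dtype=np.int64)      # an int64 quantity used inside the step

upcast = state + steps                    # float32 combined with int64 promotes to float64
fixed = state + steps.astype(np.float32)  # casting the operand first keeps float32

print(upcast.dtype, fixed.dtype)
```

If the same pattern is at play in the scan step, casting the offending operand to float32 before it is combined with the state would keep the dtypes consistent.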

Chinese image captioning: multiple words of the same type appear in the result

Hello, I am using the COCO dataset with a two-layer LSTM model: one layer for top-down attention and one layer for the language model.

I extract words with jieba. All words that occur more than 3 times in the image descriptions form the dictionary, 14,226 words in total:
words = [w for w in word_freq.keys() if word_freq[w] > 3]

After training, the generated captions contain multiple words of the same type, for example:

Note notebook laptop computer on bed
A little girl little girl girl standing together

How can I solve this problem?

IOError: Unable to open file (unable to open file: name = '/ais/gobi3/u/nitish/mnist/mnist.h5'

IOError: Unable to open file (unable to open file: name = '/ais/gobi3/u/nitish/mnist/mnist.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

Does anyone know what this path is for?

/ais/gobi3/u/nitish/mnist/mnist.h5

Error !

envy@ub1404:/os_pri/github/text2image$ python coco/alignDraw.py mnist-captions/models/mnist-captions.json
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
Traceback (most recent call last):
File "coco/alignDraw.py", line 608, in <module>
train_paths = model["data"]["train"]
KeyError: 'train'
envy@ub1404:/os_pri/github/text2image$ python coco/alignDraw.py mnist-captions/models/mnist-captions.json
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
Traceback (most recent call last):
File "coco/alignDraw.py", line 608, in <module>
train_paths = model["data"]["train"]
KeyError: 'train'
envy@ub1404:~/os_pri/github/text2image$

your .theanorc

Hi.
Thank you for your research!

Sorry, I have a problem with testing. Can you share your .theanorc?
I got a type-casting error:

TypeError: Cannot convert Type TensorType(float32, matrix) (of Variable AdvancedSubtensor1.0) into Type TensorType(float64, matrix). You can try to manually convert AdvancedSubtensor1.0 into a TensorType(float64, matrix).
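
For reference, a .theanorc that requests float32 on the GPU normally has a `[global]` section with `floatX` and `device` entries. The sketch below builds such a file with the standard library so the layout is easy to see. It is a generic example, not the author's actual configuration, and the value `gpu` is an assumption based on the "Using gpu device 0" lines in other reports on this page.

```python
import configparser
import io

config = configparser.ConfigParser()
config.optionxform = str  # keep the capital X in floatX; the default lowercases keys
config["global"] = {"floatX": "float32", "device": "gpu"}

# Render the file contents to a string instead of writing to ~/.theanorc.
buffer = io.StringIO()
config.write(buffer)
theanorc_text = buffer.getvalue()
print(theanorc_text)
```

The same settings can also be passed per run through the THEANO_FLAGS environment variable, for example `THEANO_FLAGS='floatX=float32,device=gpu0'`.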

NameError: name 'ArgumentParser' is not defined

envy@ub1404:/os_pri/github/text2image$ python mnist-captions/sample-captions.py --model models/mnist-captions.json --weights ..
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
Traceback (most recent call last):
File "mnist-captions/sample-captions.py", line 42, in <module>
parser = ArgumentParser()
NameError: name 'ArgumentParser' is not defined
envy@ub1404:/os_pri/github/text2image$
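
This NameError means the script uses `ArgumentParser` without importing it from the standard `argparse` module. A minimal sketch of the import and of a parser with the two flags used in the command above follows; the description string and the example weights file name are made up for illustration, and the real script may define more options.

```python
from argparse import ArgumentParser  # the import the NameError shows is missing

# Hypothetical stand-in for the sampling script's command-line interface.
parser = ArgumentParser(description="sample images from a trained model")
parser.add_argument("--model", required=True, help="path to the model JSON file")
parser.add_argument("--weights", required=True, help="path to the saved weights")

# Parse an explicit argument list so the example runs without a real command line.
args = parser.parse_args(["--model", "models/mnist-captions.json", "--weights", "weights.h5"])
print(args.model, args.weights)
```

Adding the import line near the top of sample-captions.py should be enough to get past line 42.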

KeyError: 'costFunction'

envy@ub1404:/os_pri/github/text2image$ python coco/sample-captions.py --model coco/models/coco-captions-32x32.json --weights ..
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
Traceback (most recent call last):
File "coco/sample-captions.py", line 79, in <module>
costFunctionType = str(model["model"][0]["costFunction"]["type"])
KeyError: 'costFunction'
envy@ub1404:/os_pri/github/text2image$
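
The failing line reads `model["model"][0]["costFunction"]["type"]`, so the error means the first entry under "model" in the JSON file has no "costFunction" key. A small helper that walks such a path and reports exactly where it breaks can make this kind of mismatch between script and config easier to spot. The helper name `lookup` and both config fragments below are hypothetical; the real JSON files contain more fields.

```python
def lookup(config, path):
    """Follow a sequence of keys/indices into nested dicts and lists.

    Raises KeyError naming the path walked so far when a step is missing.
    """
    node = config
    for depth, key in enumerate(path):
        try:
            node = node[key]
        except (KeyError, IndexError, TypeError):
            walked = " -> ".join(repr(k) for k in path[: depth + 1])
            raise KeyError("missing config entry at " + walked)
    return node

# Hypothetical config fragments for illustration only.
with_cost = {"model": [{"costFunction": {"type": "binary_crossentropy"}}]}
without_cost = {"model": [{}]}

path = ["model", 0, "costFunction", "type"]
print(lookup(with_cost, path))
try:
    lookup(without_cost, path)
except KeyError as err:
    message = str(err)
    print(message)
```

Whether the right fix is to add a "costFunction" entry to the JSON file or to change the script is something only the repository's intended config format can settle.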

Unable to load some downloaded files

I tried to run your code for MS COCO with "python alignDraw.py models/coco-captions-32x32.json", but I ran into a file-loading problem.

I then checked which of the additional downloaded files can be loaded.

I cannot load the following files:
train-images-56x56.npy, train-captions.npy, train-captions-len.npy,
train-cap2im.pkl, dev-images-32x32.npy, dev-images-56x56.npy, dev-captions.npy,
dev-captions-len.npy, dev-cap2im.pkl, gan.hdf5

At first I suspected a mistake on my side. However, I can load the following files:
train-images-32x32.npy,
test-images-32x32.npy, test-captions.npy, test-captions-len.npy, test-cap2im.pkl,
dictionary.pkl

I'm not sure why these files behave differently.
I have added the full error log at the end of this issue.

my environment

pyenv anaconda2-2.4.1
Python(2.7.11)
numpy(1.10.4)
scipy(0.17.0)
Theano(0.7.0)
h5py(2.5.0)

Thanks !

import numpy as np
import pickle
import h5py

>>> data=np.load("train-images-56x56.npy")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 406, in load
   pickle_kwargs=pickle_kwargs)
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/format.py", line 673, in read_array
   array.shape = shape
ValueError: total size of new array must be unchanged

>>> data=np.load("train-captions.npy")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 416, in load
   "Failed to interpret file %s as a pickle" % repr(file))
IOError: Failed to interpret file 'train-captions.npy' as a pickle

>>> data=np.load("train-captions-len.npy")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 406, in load
   pickle_kwargs=pickle_kwargs)
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/format.py", line 673, in read_array
   array.shape = shape
ValueError: total size of new array must be unchanged

>>> data=np.load("dev-images-32x32.npy")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 392, in load
   fid.seek(-N, 1)  # back-up
IOError: [Errno 22] Invalid argument

>>> data=np.load("dev-images-56x56.npy")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 406, in load
   pickle_kwargs=pickle_kwargs)
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/format.py", line 673, in read_array
   array.shape = shape
ValueError: total size of new array must be unchanged

>>> data=np.load("dev-captions.npy")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 392, in load
   fid.seek(-N, 1)  # back-up
IOError: [Errno 22] Invalid argument

>>> data=np.load("dev-captions-len.npy")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 392, in load
   fid.seek(-N, 1)  # back-up
IOError: [Errno 22] Invalid argument

>>> with open('train-cap2im.pkl','r') as f:
...     data = pickle.load(f)
... 
Traceback (most recent call last):
 File "<stdin>", line 2, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 1384, in load
   return Unpickler(file).load()
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 864, in load
   dispatch[key](self)
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 886, in load_eof
   raise EOFError
EOFError

>>> with open('dev-cap2im.pkl','r') as f:
...     data = pickle.load(f)
... 
Traceback (most recent call last):
 File "<stdin>", line 2, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 1384, in load
   return Unpickler(file).load()
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 864, in load
   dispatch[key](self)
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 886, in load_eof
   raise EOFError
EOFError

>>> with h5py.File('gan.hdf5','r') as hdf5:
...     print hdf5['skipthought2image']
... 
Traceback (most recent call last):
 File "<stdin>", line 2, in <module>
 File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-build-BQojpm/h5py/h5py/_objects.c:2405)
 File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-build-BQojpm/h5py/h5py/_objects.c:2362)
 File "/home/is/seitaro-s/.local/lib/python2.7/site-packages/h5py/_hl/group.py", line 164, in __getitem__
   oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
 File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-build-BQojpm/h5py/h5py/_objects.c:2405)
 File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-build-BQojpm/h5py/h5py/_objects.c:2362)
 File "h5py/h5o.pyx", line 190, in h5py.h5o.open (/tmp/pip-build-BQojpm/h5py/h5py/h5o.c:3317)
KeyError: "Unable to open object (Object 'skipthought2image' doesn't exist)"
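
Errors such as "total size of new array must be unchanged" and EOFError are consistent with files that are shorter than their headers declare, for example after an interrupted download, although the log alone does not prove that. A minimal sketch for checking a .npy file follows: it reads the header with NumPy's `numpy.lib.format` helpers and compares the declared data size with the bytes actually present. The function name `npy_is_complete` and the demo file are made up for illustration.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format

def npy_is_complete(path):
    """Return True if the file holds at least as many data bytes as its header declares."""
    with open(path, "rb") as f:
        version = npy_format.read_magic(f)
        if version == (1, 0):
            shape, _, dtype = npy_format.read_array_header_1_0(f)
        else:
            # Assumption: any other format version is read with the 2.0 header reader.
            shape, _, dtype = npy_format.read_array_header_2_0(f)
        expected = int(np.prod(shape, dtype=np.int64)) * dtype.itemsize
        available = os.path.getsize(path) - f.tell()
    return available >= expected

# Demo on a throwaway file: save an array, then cut the file short.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.npy")
np.save(path, np.zeros((10, 32, 32), dtype=np.float32))
complete_before = npy_is_complete(path)

with open(path, "r+b") as f:
    f.truncate(os.path.getsize(path) // 2)
complete_after = npy_is_complete(path)

print(complete_before, complete_after)
```

A file that fails this check is best downloaded again. A file that does not start with the .npy magic bytes makes `read_magic` raise a ValueError, which also points to a damaged or mislabeled download.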

float32/float64 issue unresolved

I am having the same issue as described here, even after setting floatX=float32. I have CUDA 7.0 and cuDNN installed on Ubuntu 14.04, and I have verified that skip-thoughts works fine. Any idea what the issue could be?

$ THEANO_FLAGS='floatX=float32,device=gpu0,scan.allow_gc=True' python alignDraw.py models/coco-captions-32x32.json
Using gpu device 0: GRID K520 (CNMeM is disabled, cuDNN 5005)
alignDraw.py:342: FutureWarning: comparison to None will result in an elementwise object comparison in the future.
if valData != None:
Traceback (most recent call last):
File "alignDraw.py", line 616, in <module>
rvae = ReccurentAttentionVAE(dimY, dimLangRNN, dimAlign, dimX, dimReadAttent, dimWriteAttent, dimRNNEnc, dimRNNDec, dimZ, runSteps, batch_size, reduceLRAfter, data, data_captions, valData=val_data, valDataCaptions=val_data_captions, pathToWeights=pathToWeights)
File "alignDraw.py", line 354, in __init__
self._kl_final, self._logpxz, self._log_likelihood, self._c_ts, self._c_ts_gener, self._x, self._y, self._run_steps, self._updates_train, self._updates_gener, self._read_attent_params, self._write_attent_params, self._write_attent_params_gener, self._alphas_gener, self._params, self._mu_prior_t_gener, self._log_sigma_prior_t_gener = build_lang_encoder_and_attention_vae_decoder(self.dimY, self.dimLangRNN, self.dimAlign, self.dimX, self.dimReadAttent, self.dimWriteAttent, self.dimRNNEnc, self.dimRNNDec, self.dimZ, self.runSteps, self.pathToWeights)
File "alignDraw.py", line 293, in build_lang_encoder_and_attention_vae_decoder
sequences=eps, outputs_info=[c0, h0_dec, cell0_dec, h0_enc, cell0_enc, kl_0, mu_prior_0, log_sigma_prior_0, None, None], non_sequences=all_params, n_steps=run_steps)
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan.py", line 1041, in scan
scan_outs = local_op(*scan_inputs)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 611, in __call__
node = self.make_node(*inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.py", line 538, in make_node
inner_sitsot_out.type.dtype))
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has dtype float32, while the result of the inner function (fn) has dtype float64. This can happen if the inner function of scan results in an upcast or downcast.

Pre-trained model wanted

Thanks for your code. I really hope you can upload your trained model, since training takes a very long time on my computer.

Unable to generate coco images

I am unable to generate images for the MS COCO example. Once the weights are saved I run the command

python sample-captions.py --model models/coco-captions-32x32.json --weights ./attention-vae-2016-6-29-5-2-31.h5 --dictionary ../dictionary.pkl --gan_path ../gan.hdf5 --skipthought_path /home/skip-thoughts

However, the coco-captions-32x32.json file has no "costFunction" key.
The exact error I am getting is as follows:

Traceback (most recent call last):
File "sample-captions.py", line 79, in <module>
costFunctionType = str(model["model"][0]["costFunction"]["type"])
KeyError: 'costFunction'

Can you please share the updated coco-captions-32X32.json file?

Thanks.

binary_crossentropy

Is it a mistake to use binary_crossentropy for COCO, which is a color image dataset?
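
One property relevant to this question is that binary cross-entropy with a real-valued target in [0, 1] is still minimized when the prediction equals the target, although the minimum loss is then above zero. The sketch below checks this numerically for a single value. It does not settle whether the loss is the best choice for color images, and the target value and grid are arbitrary choices for illustration.

```python
import math

def bce(target, prediction):
    """Binary cross-entropy for a single value, with prediction strictly between 0 and 1."""
    return -(target * math.log(prediction) + (1 - target) * math.log(1 - prediction))

target = 0.3  # e.g. a color channel intensity scaled to [0, 1]
grid = [i / 100 for i in range(1, 100)]

# Search the grid for the prediction that gives the lowest loss.
best = min(grid, key=lambda p: bce(target, p))

print("prediction with the lowest loss:", best)
print("loss at that prediction:", bce(target, best))
```

The lowest loss on the grid occurs at the target value itself, so the loss does pull predictions toward real-valued intensities, but its value at the optimum is the entropy of the target rather than zero.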

TypeError

Hi, I have already set floatX = float32, but I still get this TypeError:
TypeError: ('An update must have the same type as the original shared variable (shared_var=<TensorType(float32, matrix)>, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')
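
The message says the update expression is float64 while the shared variable is float32, which typically happens when a float64 value enters the update rule. The sketch below reproduces the effect with NumPy arrays and shows the cast back to the parameter's dtype. It is an analogy under the assumption that the same promotion happens in the Theano graph, and the variable names are made up. In Theano the corresponding cast would be `theano.tensor.cast(update, 'float32')`.

```python
import numpy as np

param = np.ones(3, dtype=np.float32)           # stands in for a float32 shared variable
grad = np.array([0.1, 0.2, 0.3])               # float64 by default, like an uncast constant
learning_rate = 0.5

raw_update = param - learning_rate * grad      # promoted to float64
safe_update = raw_update.astype(param.dtype)   # cast back to the parameter's dtype

print(raw_update.dtype, safe_update.dtype)
```

Finding which operand in the update rule is float64 and casting it, or the whole update, to float32 should make the types match.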