mansimov / text2image
Generating Images from Captions with Attention
License: MIT License
I am training on a Tesla K40c GPU (CNMeM is enabled with initial size: 75.0% of memory, cuDNN 4007), and one epoch takes more than 10 minutes. Is that normal, or have I misconfigured my environment?
Here comes my log:
Train Results
20.4327215786 1138.28323364 1158.71595093
Validation Results
building validate function
0:00:52.053928
9.35944375 752.618843994 761.978292236
Recreated Train Dataset
[1139, 1308, 1155, 1142, 1335, 1308, 1309, 1304]
Epoch 1 took 0:13:34.399718
Train Results
13.6282463623 686.228092651 699.856334229
Recreated Train Dataset
[1161, 1261, 1172, 1157, 1288, 1352, 1311, 1298]
Epoch 2 took 0:12:48.676186
Train Results
14.1116136597 600.27987915 614.391492615
Recreated Train Dataset
[1221, 1271, 1145, 1203, 1312, 1312, 1273, 1263]
Epoch 3 took 0:11:07.076383
Train Results
12.3779268677 505.388762512 517.766689453
Validation Results
building validate function
0:00:07.973637
10.1579154358 474.040048218 484.197965393
Recreated Train Dataset
[1137, 1284, 1149, 1137, 1365, 1308, 1336, 1284]
Epoch 4 took 0:11:08.574547
Train Results
14.8620436646 411.504102783 426.366148071
Recreated Train Dataset
[1185, 1271, 1186, 1222, 1271, 1276, 1278, 1311]
Epoch 5 took 0:10:53.159525
Train Results
13.2145661377 363.50835083 376.72291626
Recreated Train Dataset
[1166, 1317, 1161, 1182, 1225, 1291, 1359, 1299]
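Per-epoch times can be pulled out of a log like this one to compare runs. A minimal sketch; the regular expression assumes exactly the `Epoch N took H:MM:SS.ffffff` format printed above:

```python
import re
from datetime import timedelta

# Matches log lines such as "Epoch 3 took 0:11:07.076383" (H:MM:SS[.ffffff]).
EPOCH_RE = re.compile(r"Epoch (\d+) took (\d+):(\d{2}):(\d{2}(?:\.\d+)?)")


def epoch_durations(log_text):
    """Return a list of (epoch_number, timedelta) pairs found in log_text."""
    durations = []
    for match in EPOCH_RE.finditer(log_text):
        epoch, hours, minutes, seconds = match.groups()
        durations.append((int(epoch), timedelta(hours=int(hours),
                                                minutes=int(minutes),
                                                seconds=float(seconds))))
    return durations


log = """Epoch 1 took 0:13:34.399718
Epoch 2 took 0:12:48.676186
Epoch 3 took 0:11:07.076383
Epoch 4 took 0:11:08.574547
Epoch 5 took 0:10:53.159525"""

durations = epoch_durations(log)
# sum() needs a timedelta start value; timedelta / int gives the mean.
mean_epoch = sum((d for _, d in durations), timedelta()) / len(durations)
print(len(durations), mean_epoch)
```

For epochs 1 to 5 of this log the mean comes to just under 12 minutes, which gives a concrete figure to compare with other reports in this thread.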
Hi, I am trying to run alignDraw.py and I am getting the following error:
monica@monica:~/PycharmProjects/text2image-master/coco$ python alignDraw.py ./models/coco-captions-32x32.json
Traceback (most recent call last):
File "alignDraw.py", line 3, in <module>
import h5py
File "/home/monica/.local/lib/python2.7/site-packages/h5py/__init__.py", line 24, in <module>
from . import _errors
ImportError: /home/monica/.local/lib/python2.7/site-packages/h5py/_errors.so: undefined symbol: PyUnicodeUCS4_DecodeUTF8
Please help me: how can I solve this error?
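An undefined symbol named `PyUnicodeUCS4_...` usually indicates that h5py's compiled extension was built against a Python 2 interpreter with wide (UCS4) unicode, while the interpreter importing it is a narrow (UCS2) build, for example a package installed under ~/.local by one Python and imported by another. A quick way to see which kind of build is running; this only diagnoses the mismatch, and the usual remedy is to reinstall h5py with the same interpreter that runs alignDraw.py:

```python
import sys


def unicode_build():
    """Classify the running interpreter's unicode build.

    Python 2 narrow builds have sys.maxunicode == 0xFFFF (UCS2); wide builds
    have 0x10FFFF (UCS4). Python 3.3+ always reports 0x10FFFF (PEP 393).
    """
    return "UCS4 (wide)" if sys.maxunicode > 0xFFFF else "UCS2 (narrow)"


# %-formatting so this runs unchanged under both Python 2.7 and Python 3.
print("%s %s %s" % (sys.executable, sys.version.split()[0], unicode_build()))
```

Running it with each `python` you have installed shows which one matches the build h5py was compiled for.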
I want to look at the results of the experiment, but I can't find them on my computer. Where are they saved?
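Where the outputs land is decided by the scripts themselves, and that isn't shown in this thread. Another comment here passes `--weights ./attention-vae-2016-6-29-5-2-31.h5` after training, which suggests weights are written as `.h5` files into the directory training was started from. Assuming that, a sketch that lists the most recently modified `.h5` files under a directory:

```python
from pathlib import Path


def newest_files(root, pattern="*.h5", limit=5):
    """Return up to `limit` files under `root` matching `pattern`, newest first."""
    files = [p for p in Path(root).rglob(pattern) if p.is_file()]
    # Sort by modification time so the latest training output comes first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


# Example: search the checkout you trained from.
for path in newest_files("."):
    print(path)
```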
I want to use the code to train a model on my own dataset, but I'm not sure how to modify it to generate images of size 64*64 or larger. Has anyone tried it?
Thanks!
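Nobody in the thread answers this, so the following only sketches the data side. The `ReccurentAttentionVAE` constructor in the tracebacks in this thread takes a `dimX` argument, which for a 32×32 color model is presumably the flattened image size (32·32·3 = 3072); at 64×64×3 it would be 12288, and the attention window sizes (`dimReadAttent`, `dimWriteAttent`) and the number of steps may also need retuning. Assuming the model consumes flattened float images in [0, 1] stored height × width × channels (neither is verified against the repository's loaders), preparing a dataset might look like:

```python
import numpy as np


def flatten_images(images):
    """Turn an (N, H, W, C) uint8 image array into (N, H*W*C) float32 in [0, 1].

    Assumes channel-last layout; transpose first if your data is (N, C, H, W).
    """
    images = np.asarray(images)
    if images.ndim != 4:
        raise ValueError("expected shape (N, H, W, C), got %r" % (images.shape,))
    n = images.shape[0]
    return images.reshape(n, -1).astype(np.float32) / 255.0


batch = np.random.randint(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)
flat = flatten_images(batch)
print(flat.shape, flat.dtype)  # (8, 12288) float32
```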
Hi
@MissT157 and I are experimenting with this code, and the implementation of the KL divergence looks a bit off to us. Could you please revisit it and confirm whether it has been implemented correctly?
Thanks in advance
@g1910
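For reference when checking it: the scan in `alignDraw.py` (visible in the tracebacks in this thread) carries `mu_prior` and `log_sigma_prior` states, so the KL term presumably compares the approximate posterior with a learned diagonal Gaussian prior rather than a fixed N(0, I). The standard closed form for two diagonal Gaussians parameterized by log standard deviations is KL(q || p) = Σ [log σ_p − log σ_q + (σ_q² + (μ_q − μ_p)²) / (2σ_p²) − 1/2]. A NumPy version to compare numerically against the Theano expression; this is the textbook formula, not a transcription of the repository's code:

```python
import numpy as np


def kl_diag_gaussians(mu_q, log_sigma_q, mu_p, log_sigma_p):
    """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ), summed over the last axis.

    All arguments are arrays of the same shape; sigmas are given as logs.
    """
    var_q = np.exp(2.0 * log_sigma_q)
    var_p = np.exp(2.0 * log_sigma_p)
    kl = (log_sigma_p - log_sigma_q
          + (var_q + (mu_q - mu_p) ** 2) / (2.0 * var_p)
          - 0.5)
    return kl.sum(axis=-1)


rng = np.random.default_rng(0)
mu_q, ls_q, mu_p, ls_p = rng.normal(size=(4, 16, 10))   # 16 samples, 10 latent dims
print(kl_diag_gaussians(mu_q, ls_q, mu_p, ls_p).min() >= 0)  # True: KL is non-negative
```

With `mu_p = 0` and `log_sigma_p = 0` it reduces to the familiar standard-normal case, 0.5 · Σ(μ² + σ² − 1 − 2 log σ).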
I am trying to train the model defined in coco-captions-32x32.json, but it is taking an insane amount of time. There are around 200 epochs, and it took 15 hours to complete just one:
Epoch 0 took 15:17:07.833333
Is there a way to make it faster, or is it really necessary to complete all 200 epochs? Is there a minimum number of epochs that gives a working setup? If anyone has trained weight files, that would really help me out. Thanks!
I am using an NVIDIA GPU on an AWS instance.
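Some arithmetic puts that figure in context. At 15 h 17 min per epoch, 200 epochs would take over four months, while the first report in this thread shows roughly 11 minutes per epoch on a Tesla K40c, about 80 times faster. A gap that large hints, though it doesn't prove, that Theano may be running on the CPU; when it picks up the GPU, Theano prints a "Using gpu device ..." line at startup, as in the other logs here.

```python
from datetime import timedelta

# Per-epoch times copied from this thread.
per_epoch_aws = timedelta(hours=15, minutes=17, seconds=7.833333)   # "Epoch 0 took 15:17:07.833333"
per_epoch_k40c = timedelta(minutes=11, seconds=8.574547)            # "Epoch 4 took 0:11:08.574547"
n_epochs = 200                                                       # "around 200 epochs"

total_aws = per_epoch_aws * n_epochs
total_k40c = per_epoch_k40c * n_epochs
slowdown = per_epoch_aws / per_epoch_k40c   # timedelta / timedelta gives a float

print("AWS run: ", total_aws)
print("K40c run:", total_k40c)
print("slowdown: %.0fx" % slowdown)
```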
Why can't I find any generated model under mnist?
And when I run the code for MS COCO, it reports a float32 error, but when I change float32 to float64, it still fails. How can I solve this?
envy@ub1404:~/os_pri/github/text2image$ python coco/alignDraw.py coco/models/coco-captions-32x32.json
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
coco/alignDraw.py:343: FutureWarning: comparison to None
will result in an elementwise object comparison in the future.
if valData != None:
Traceback (most recent call last):
File "coco/alignDraw.py", line 617, in <module>
rvae = ReccurentAttentionVAE(dimY, dimLangRNN, dimAlign, dimX, dimReadAttent, dimWriteAttent, dimRNNEnc, dimRNNDec, dimZ, runSteps, batch_size, reduceLRAfter, data, data_captions, valData=val_data, valDataCaptions=val_data_captions, pathToWeights=pathToWeights)
File "coco/alignDraw.py", line 355, in __init__
self._kl_final, self._logpxz, self._log_likelihood, self._c_ts, self._c_ts_gener, self._x, self._y, self._run_steps, self._updates_train, self._updates_gener, self._read_attent_params, self._write_attent_params, self._write_attent_params_gener, self._alphas_gener, self._params, self._mu_prior_t_gener, self._log_sigma_prior_t_gener = build_lang_encoder_and_attention_vae_decoder(self.dimY, self.dimLangRNN, self.dimAlign, self.dimX, self.dimReadAttent, self.dimWriteAttent, self.dimRNNEnc, self.dimRNNDec, self.dimZ, self.runSteps, self.pathToWeights)
File "coco/alignDraw.py", line 294, in build_lang_encoder_and_attention_vae_decoder
sequences=eps, outputs_info=[c0, h0_dec, cell0_dec, h0_enc, cell0_enc, kl_0, mu_prior_0, log_sigma_prior_0, None, None], non_sequences=all_params, n_steps=run_steps)
File "/home/envy/.local/lib/python2.7/site-packages/theano/scan_module/scan.py", line 1041, in scan
scan_outs = local_op(*scan_inputs)
File "/home/envy/.local/lib/python2.7/site-packages/theano/gof/op.py", line 611, in __call__
node = self.make_node(*inputs, **kwargs)
File "/home/envy/.local/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 538, in make_node
inner_sitsot_out.type.dtype))
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has dtype float32, while the result of the inner function (fn) has dtype float64. This can happen if the inner function of scan results in an upcast or downcast.
envy@ub1404:
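The error says one step of the scan produces a float64 value for a state whose initial value is float32. Theano promotes mixed dtypes much like NumPy does, so a single float64 operand inside the step (for instance a parameter built from a default-dtype NumPy array) is enough to upcast the result, and setting floatX=float32 does not change arrays that were explicitly created as float64. The same promotion in plain NumPy, with stand-in arrays:

```python
import numpy as np

# Stand-ins for a scan step: a float32 state and a weight matrix built with
# NumPy's default dtype (float64).
state0 = np.zeros((2, 3), dtype=np.float32)
weights = np.ones((3, 3))

step = np.tanh(state0 @ weights)
print(step.dtype)        # float64: mixing float32 and float64 upcasts

# Casting every operand to float32 keeps the state's dtype stable across steps.
weights32 = np.asarray(weights, dtype=np.float32)
step_fixed = np.tanh(state0 @ weights32)
print(step_fixed.dtype)  # float32
```

On the Theano side the corresponding fixes are creating parameters with `numpy.asarray(value, dtype=theano.config.floatX)` or casting the offending expression with `theano.tensor.cast(x, 'float32')`; which expression in `alignDraw.py` upcasts is not identified in this thread.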
Hello, I am using the COCO dataset with a two-layer LSTM model: one layer for top-down attention and one for the language model.
I extract words with jieba.
For the dictionary I kept every word that occurs more than 3 times in the image captions, 14,226 words in total:
words = [w for w in word_freq.keys() if word_freq[w] > 3]
After training, when I use the model, the output contains several words of the same kind, for example:
Note notebook laptop computer on bed
A little girl little girl girl standing together
How can I solve this problem?
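This question concerns a different captioning model, so nothing in this repository addresses it directly. One common decoding-time mitigation, named here as a swapped-in technique rather than anything from the poster's code, is to block words that have already been emitted when choosing the next one. A minimal greedy-decoding sketch, assuming a hypothetical `next_logits(prefix)` callback that returns one score per vocabulary entry:

```python
import numpy as np


def greedy_decode_no_repeat(next_logits, vocab, max_len=20, end_token="<end>",
                            allow_repeat=frozenset()):
    """Greedy decoding that never emits the same word twice, except words in allow_repeat.

    next_logits(prefix) is a hypothetical callback returning a score for every
    entry of vocab given the words generated so far.
    """
    words = []
    banned = set()
    for _ in range(max_len):
        scores = np.array(next_logits(words), dtype=np.float64)
        if banned:
            scores[list(banned)] = -np.inf      # mask words already used
        if np.all(np.isneginf(scores)):
            break                               # every word is blocked
        idx = int(np.argmax(scores))
        word = vocab[idx]
        if word == end_token:
            break
        words.append(word)
        if word not in allow_repeat:
            banned.add(idx)
    return words


# A degenerate toy "model" whose scores never change, so plain greedy decoding
# would output "girl girl girl ...".
vocab = ["a", "little", "girl", "standing", "<end>"]
toy_scores = [1.0, 3.0, 4.0, 2.0, 0.5]
print(greedy_decode_no_repeat(lambda prefix: toy_scores, vocab))
```

Function words such as "a" or "on" can legitimately repeat in a caption, which is what `allow_repeat` is for; beam search with the same constraint, or training-side changes, are alternatives not shown here.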
IOError: Unable to open file (unable to open file: name = '/ais/gobi3/u/nitish/mnist/mnist.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
Does anyone know what this path is for?
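That path appears to be an absolute location on the machine the code was originally run on, hard-coded in the script or its JSON config; it won't exist on yours, so it has to point at your own copy of `mnist.h5` (how that file is produced isn't covered in this thread). A small hypothetical helper, `resolve_data_path`, that keeps the configured path when it exists and otherwise looks for the same file name in a local data directory:

```python
import os


def resolve_data_path(configured_path, local_dir):
    """Return configured_path if it exists; otherwise a file with the same
    basename inside local_dir. Raise FileNotFoundError if neither exists."""
    if os.path.isfile(configured_path):
        return configured_path
    candidate = os.path.join(local_dir, os.path.basename(configured_path))
    if os.path.isfile(candidate):
        return candidate
    raise FileNotFoundError("%r not found, and no %r in %r"
                            % (configured_path, os.path.basename(configured_path), local_dir))


try:
    print(resolve_data_path("/ais/gobi3/u/nitish/mnist/mnist.h5", "./data"))
except FileNotFoundError as err:
    print(err)
```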
envy@ub1404:~/os_pri/github/text2image$ python coco/alignDraw.py mnist-captions/models/mnist-captions.json
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
Traceback (most recent call last):
File "coco/alignDraw.py", line 608, in <module>
train_paths = model["data"]["train"]
KeyError: 'train'
envy@ub1404:~/os_pri/github/text2image$
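This `KeyError: 'train'` comes from passing the MNIST config to the COCO script: `coco/alignDraw.py` reads `model["data"]["train"]`, and the MNIST JSON evidently lacks that key, so each script needs its own config. The `KeyError: 'costFunction'` from `coco/sample-captions.py` elsewhere in this thread is the same kind of mismatch. A hypothetical lookup helper, `require`, that reports which keys *are* present makes such mismatches quicker to diagnose:

```python
import json


def require(config, *path):
    """Follow `path` through nested dicts/lists, raising a KeyError that names
    the missing key and what is available at that level."""
    node = config
    walked = []
    for key in path:
        if isinstance(node, dict) and key in node:
            node = node[key]
        elif isinstance(node, list) and isinstance(key, int) and -len(node) <= key < len(node):
            node = node[key]
        else:
            where = "".join("[%r]" % k for k in walked) or "<root>"
            available = sorted(node) if isinstance(node, dict) else type(node).__name__
            raise KeyError("missing %r under %s; available: %s" % (key, where, available))
        walked.append(key)
    return node


# Toy config for illustration, not the real mnist-captions.json.
config = json.loads('{"data": {"test": "x.npy"}, "model": [{"type": "draw"}]}')
print(require(config, "model", 0, "type"))
try:
    require(config, "data", "train")
except KeyError as err:
    print(err)   # names 'train' and lists the keys that do exist
```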
Hi.
Thank you for your research!
Sorry, I have a problem with testing. Could you show your .theanorc?
I got a typecasting error:
TypeError: Cannot convert Type TensorType(float32, matrix) (of Variable AdvancedSubtensor1.0) into Type TensorType(float64, matrix). You can try to manually convert AdvancedSubtensor1.0 into a TensorType(float64, matrix).
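The author's own `.theanorc` isn't shown in the thread. For reference, flags passed through `THEANO_FLAGS` map onto `.theanorc` sections: top-level options go in `[global]`, and a dotted name such as `scan.allow_gc` becomes option `allow_gc` in section `[scan]`. A sketch that converts a flag string into that file format, using the flags another commenter here ran with; note that they still hit a dtype error, so this configuration alone is not confirmed to fix it:

```python
import configparser
import io


def flags_to_theanorc(flags):
    """Convert a THEANO_FLAGS-style string into .theanorc (INI) text."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str              # keep option case, e.g. floatX
    for item in flags.split(","):
        item = item.strip()
        if not item:
            continue
        name, value = item.split("=", 1)
        # "scan.allow_gc" -> section "scan", option "allow_gc"; no dot -> [global]
        section, _, option = name.rpartition(".")
        section = section or "global"
        if not cfg.has_section(section):
            cfg.add_section(section)
        cfg.set(section, option, value)
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(flags_to_theanorc("floatX=float32,device=gpu0,scan.allow_gc=True"))
```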
envy@ub1404:/os_pri/github/text2image$ python mnist-captions/sample-captions.py --model models/mnist-captions.json --weights ../os_pri/github/text2image$
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
Traceback (most recent call last):
File "mnist-captions/sample-captions.py", line 42, in <module>
parser = ArgumentParser()
NameError: name 'ArgumentParser' is not defined
envy@ub1404:
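`NameError: name 'ArgumentParser' is not defined` means the script uses `ArgumentParser` without importing it; the standard-library import is `from argparse import ArgumentParser`. A sketch of the parser with that import in place. The option names are taken from the `sample-captions.py` command lines quoted in this thread and may not match the script's actual definitions, and `weights.h5` is a placeholder:

```python
from argparse import ArgumentParser  # the import the NameError suggests is missing


def build_parser():
    """Parser for the options used with sample-captions.py in this thread (reconstructed)."""
    parser = ArgumentParser(description="Generate images from captions")
    parser.add_argument("--model", required=True, help="path to the model JSON")
    parser.add_argument("--weights", required=True, help="path to the trained .h5 weights")
    parser.add_argument("--dictionary", help="path to dictionary.pkl (COCO)")
    parser.add_argument("--gan_path", help="path to gan.hdf5 (COCO)")
    parser.add_argument("--skipthought_path", help="path to the skip-thoughts code (COCO)")
    return parser


args = build_parser().parse_args(
    ["--model", "models/mnist-captions.json", "--weights", "weights.h5"])
print(args.model, args.weights, args.dictionary)
```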
envy@ub1404:/os_pri/github/text2image$ python coco/sample-captions.py --model coco/models/coco-captions-32x32.json --weights ../os_pri/github/text2image$
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
Traceback (most recent call last):
File "coco/sample-captions.py", line 79, in <module>
costFunctionType = str(model["model"][0]["costFunction"]["type"])
KeyError: 'costFunction'
envy@ub1404:
I tried to run your code for MS COCO with "python alignDraw.py models/coco-captions-32x32.json", but I ran into a file-loading problem.
So I checked whether the additional downloaded files can be loaded at all.
I can't load these files:
train-images-56x56.npy, train-captions.npy, train-captions-len.npy,
train-cap2im.pkl, dev-images-32x32.npy, dev-images-56x56.npy, dev-captions.npy,
dev-captions-len.npy, dev-cap2im.pkl, gan.hdf5
At first I suspected a mistake on my side. However, I can load these files:
train-images-32x32.npy,
test-images-32x32.npy, test-captions.npy, test-captions-len.npy, test-cap2im.pkl,
dictionary.pkl
I'm not sure why there is a difference.
I've added the full error log at the end of this issue.
My environment
Thanks!
>>> import numpy as np
>>> import pickle
>>> data=np.load("train-images-56x56.npy")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 406, in load
pickle_kwargs=pickle_kwargs)
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/format.py", line 673, in read_array
array.shape = shape
ValueError: total size of new array must be unchanged
>>> data=np.load("train-captions.npy")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 416, in load
"Failed to interpret file %s as a pickle" % repr(file))
IOError: Failed to interpret file 'train-captions.npy' as a pickle
>>> data=np.load("train-captions-len.npy")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 406, in load
pickle_kwargs=pickle_kwargs)
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/format.py", line 673, in read_array
array.shape = shape
ValueError: total size of new array must be unchanged
>>> data=np.load("dev-images-32x32.npy")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 392, in load
fid.seek(-N, 1) # back-up
IOError: [Errno 22] Invalid argument
>>> data=np.load("dev-images-56x56.npy")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 406, in load
pickle_kwargs=pickle_kwargs)
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/format.py", line 673, in read_array
array.shape = shape
ValueError: total size of new array must be unchanged
>>> data=np.load("dev-captions.npy")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 392, in load
fid.seek(-N, 1) # back-up
IOError: [Errno 22] Invalid argument
>>> data=np.load("dev-captions-len.npy")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 392, in load
fid.seek(-N, 1) # back-up
IOError: [Errno 22] Invalid argument
>>> with open('train-cap2im.pkl','r') as f:
... data = pickle.load(f)
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 1384, in load
return Unpickler(file).load()
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 886, in load_eof
raise EOFError
EOFError
>>> with open('dev-cap2im.pkl','r') as f:
... data = pickle.load(f)
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 1384, in load
return Unpickler(file).load()
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 886, in load_eof
raise EOFError
EOFError
>>> with h5py.File('gan.hdf5','r') as hdf5:
... print hdf5['skipthought2image']
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-build-BQojpm/h5py/h5py/_objects.c:2405)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-build-BQojpm/h5py/h5py/_objects.c:2362)
File "/home/is/seitaro-s/.local/lib/python2.7/site-packages/h5py/_hl/group.py", line 164, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-build-BQojpm/h5py/h5py/_objects.c:2405)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-build-BQojpm/h5py/h5py/_objects.c:2362)
File "h5py/h5o.pyx", line 190, in h5py.h5o.open (/tmp/pip-build-BQojpm/h5py/h5py/h5o.c:3317)
KeyError: "Unable to open object (Object 'skipthought2image' doesn't exist)"
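The pattern of errors points at incomplete or corrupted downloads rather than at the loading code. "total size of new array must be unchanged" means the array data is shorter than its header promises; "Failed to interpret file ... as a pickle" means NumPy found no `.npy` magic prefix at all; an `EOFError` from `pickle` means the file ends early; and the `[Errno 22]` comes from `np.load` seeking back over the few bytes it just read, which is consistent with a file shorter than the `.npy` magic prefix, e.g. an empty one. Comparing file sizes with the source or re-downloading is the usual next step. A sketch that reads a `.npy` header and compares the size it implies with the size on disk:

```python
import os

import numpy as np
from numpy.lib import format as npy_format


def npy_size_check(path):
    """Compare the byte size a .npy header implies with the file's real size.

    Returns (expected_bytes, actual_bytes), or None for object arrays, whose
    pickled payload has no fixed size.
    """
    with open(path, "rb") as fp:
        version = npy_format.read_magic(fp)
        if version == (1, 0):
            shape, _, dtype = npy_format.read_array_header_1_0(fp)
        else:
            # Format versions 2.0 and 3.0 both use a 4-byte header length.
            shape, _, dtype = npy_format.read_array_header_2_0(fp)
        header_bytes = fp.tell()
    if dtype.hasobject:
        return None
    expected = header_bytes + int(np.prod(shape, dtype=np.int64)) * dtype.itemsize
    return expected, os.path.getsize(path)


for name in ["train-images-56x56.npy", "dev-images-32x32.npy"]:
    if os.path.exists(name):
        print(name, npy_size_check(name))
```

A file whose actual size is smaller than the expected size is truncated; one too short to contain the magic prefix makes `read_magic` raise a `ValueError`.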
I'm having the same issue as described here, even after setting floatX=float32. I have CUDA 7.0 and cuDNN installed on Ubuntu 14.04, and skipthoughts is verified to work fine. Any idea what the issue could be?
$ THEANO_FLAGS='floatX=float32,device=gpu0,scan.allow_gc=True' python alignDraw.py models/coco-captions-32x32.json
Using gpu device 0: GRID K520 (CNMeM is disabled, cuDNN 5005)
alignDraw.py:342: FutureWarning: comparison to None
will result in an elementwise object comparison in the future.
if valData != None:
Traceback (most recent call last):
File "alignDraw.py", line 616, in <module>
rvae = ReccurentAttentionVAE(dimY, dimLangRNN, dimAlign, dimX, dimReadAttent, dimWriteAttent, dimRNNEnc, dimRNNDec, dimZ, runSteps, batch_size, reduceLRAfter, data, data_captions, valData=val_data, valDataCaptions=val_data_captions, pathToWeights=pathToWeights)
File "alignDraw.py", line 354, in __init__
self._kl_final, self._logpxz, self._log_likelihood, self._c_ts, self._c_ts_gener, self._x, self._y, self._run_steps, self._updates_train, self._updates_gener, self._read_attent_params, self._write_attent_params, self._write_attent_params_gener, self._alphas_gener, self._params, self._mu_prior_t_gener, self._log_sigma_prior_t_gener = build_lang_encoder_and_attention_vae_decoder(self.dimY, self.dimLangRNN, self.dimAlign, self.dimX, self.dimReadAttent, self.dimWriteAttent, self.dimRNNEnc, self.dimRNNDec, self.dimZ, self.runSteps, self.pathToWeights)
File "alignDraw.py", line 293, in build_lang_encoder_and_attention_vae_decoder
sequences=eps, outputs_info=[c0, h0_dec, cell0_dec, h0_enc, cell0_enc, kl_0, mu_prior_0, log_sigma_prior_0, None, None], non_sequences=all_params, n_steps=run_steps)
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan.py", line 1041, in scan
scan_outs = local_op(*scan_inputs)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 611, in __call__
node = self.make_node(*inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.py", line 538, in make_node
inner_sitsot_out.type.dtype))
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has dtype float32, while the result of the inner function (fn) has dtype float64. This can happen if the inner function of scan results in an upcast or downcast.
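Separately from the dtype error, the FutureWarning in these logs points at `if valData != None:`. For NumPy arrays that comparison became elementwise, so in current NumPy the `if` raises instead of testing for a missing value; `is not None` is the check the code intends. Illustrated with a stand-in array:

```python
import numpy as np

val_data = np.zeros((4, 3), dtype=np.float32)   # stand-in for valData

elementwise = val_data != None                  # noqa: E711 (compares every element to None)
print(elementwise.shape, elementwise.all())     # (4, 3) True

try:
    if val_data != None:                        # noqa: E711
        pass
    ambiguous = False
except ValueError:
    ambiguous = True                            # truth value of a multi-element array is ambiguous
print("if arr != None raises ValueError:", ambiguous)

has_val = val_data is not None                  # identity test: what the code means
print("has validation data:", has_val)
```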
Thanks for your code. I really hope you can upload your trained model; training on my computer would take a very long time.
I am unable to generate images for the MS COCO example. Once the weights are saved, I run the command:
python sample-captions.py --model models/coco-captions-32x32.json --weights ./attention-vae-2016-6-29-5-2-31.h5 --dictionary ../dictionary.pkl --gan_path ../gan.hdf5 --skipthought_path /home/skip-thoughts
But the coco-captions-32x32.json file has no "costFunction" key.
The exact error that I am getting is as follows
Traceback (most recent call last):
File "sample-captions.py", line 79, in <module>
costFunctionType = str(model["model"][0]["costFunction"]["type"])
KeyError: 'costFunction'
Can you please share the updated coco-captions-32x32.json file?
Thanks.
Is there something wrong with using binary_crossentropy for COCO, a color image dataset?
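Binary cross-entropy is still well defined when the targets are real-valued intensities in [0, 1], such as normalized color channels: per pixel it is minimized when the prediction equals the target. What changes is the floor. BCE(t, p) = H(t) + KL(Bern(t) || Bern(p)), so the lowest achievable loss is the binary entropy of the target rather than zero, and the value is not a proper likelihood for continuous pixel data. Whether that matters for COCO is a modeling question this thread doesn't settle; the sketch only demonstrates where the minimum lies:

```python
import numpy as np


def binary_cross_entropy(target, pred, eps=1e-7):
    """Elementwise BCE; targets may be any value in [0, 1], not just 0 or 1."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))


target = 0.3                                   # e.g. a normalized pixel intensity
grid = np.linspace(0.01, 0.99, 99)             # candidate predictions in steps of 0.01
losses = binary_cross_entropy(target, grid)
best = grid[np.argmin(losses)]

# The floor of the loss is the binary entropy of the target, not zero.
entropy = -(target * np.log(target) + (1 - target) * np.log(1 - target))
print(best, losses.min(), entropy)
```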
Hi, I have already set floatX = float32, but I still get a TypeError:
TypeError: ('An update must have the same type as the original shared variable (shared_var=<TensorType(float32, matrix)>, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')