cai-lw / image-captioning-chinese
Image Captioning in Chinese using LSTM RNN with attention mechanism
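Hello, I am using the COCO dataset
with a two-layer LSTM model: one layer implements top-down attention and the other implements the language model.
I use jieba to extract words.
I take every word that appears more than 3 times across all image captions as the dictionary file, which gives 14226 words in total.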
words = [w for w in word_freq.keys() if word_freq[w] > 3]
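The frequency filter on the line above can be tried in isolation. The following is only a minimal sketch: the `build_vocab` helper, its `min_count` parameter, and the sample captions are hypothetical, and the captions are assumed to be already tokenized (for example by jieba), so the sketch does not depend on any segmentation library.

```python
from collections import Counter


def build_vocab(tokenized_captions, min_count=3):
    """Return (words, word_freq), keeping words seen more than min_count times.

    `tokenized_captions` is a list of token lists. The helper name and the
    `min_count` parameter are illustrative, not taken from the repository.
    """
    # Count every token across all captions.
    word_freq = Counter(w for caption in tokenized_captions for w in caption)
    # Same filter as in the post: strictly greater than the threshold.
    words = [w for w in word_freq.keys() if word_freq[w] > min_count]
    return words, word_freq


# Hypothetical pre-tokenized captions: one caption repeated 4 times,
# plus a single rare word that should be filtered out.
captions = [["一个", "女孩", "站", "在", "床上"]] * 4 + [["笔记本电脑"]]
words, word_freq = build_vocab(captions)
print(words)
```

Because the comparison is strict (`> 3`), a word that occurs exactly 3 times is dropped from the dictionary.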
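After training the model, the output at inference time contains several overlapping words of the same kind, for example:
放在 床上 的 笔记 笔记本 笔记本电脑 电脑
一个 小女 小女孩 女孩 站 在 一起
How should I solve this problem?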
I can't download the dataset; the link won't open.
Is there another way to download it?
Hello, when running your code I reach this line:
out, attn, alpha = tf.split(lstm_net.outputs, [n_hidden, d_local, a_local ** 2], axis=2)
This step raises an error: tensorflow.python.framework.errors_impl.InvalidArgumentError: Sum of output sizes must match the size of the original Tensor along the split dimension or the sum of the positive sizes must be less if it contains a -1 for 'split_1' (op: 'SplitV') with input shapes: [?,?,512], [3], [] and with computed input tensors: input[1] = <512 512 49>, input[2] = <2>.
My understanding is that lstm_net.outputs is a tensor of shape [?,?,512], which cannot be split into out, attn, and alpha of sizes 512, 512, and 49.
I don't know whether this is a problem in the source code or a TensorFlow version issue.
Could you tell me whether lstm_net in the original network outputs [?,?,1073] or [?,?,512]?
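The constraint behind this error can be checked without TensorFlow: the requested split sizes must add up to the size of the axis being split. The sketch below is a plain-Python illustration, not the repository's code. `split_last_axis` is a hypothetical helper, and the values 512, 512, and 7 are inferred from the sizes `<512 512 49>` printed in the error message.

```python
def split_last_axis(row, sizes):
    """Split a flat feature vector into consecutive chunks of the given sizes.

    Mirrors the rule quoted in the error: the sizes must sum to the length
    of the dimension being split. Hypothetical helper for illustration only.
    """
    if sum(sizes) != len(row):
        raise ValueError(
            "split sizes sum to %d but the axis has length %d"
            % (sum(sizes), len(row))
        )
    parts, start = [], 0
    for size in sizes:
        parts.append(row[start:start + size])
        start += size
    return parts


# Values inferred from the error message (<512 512 49>), so a_local is 7.
n_hidden, d_local, a_local = 512, 512, 7
sizes = [n_hidden, d_local, a_local ** 2]
required = sum(sizes)  # 512 + 512 + 49

# A vector of the required width splits cleanly into out, attn, alpha.
out, attn, alpha = split_last_axis([0.0] * required, sizes)

# A 512-wide vector, like the reported [?,?,512] output, does not.
try:
    split_last_axis([0.0] * 512, sizes)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Under these assumptions the split only works if the last axis of `lstm_net.outputs` has width 1073, which matches the two shapes named in the question.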
As the title says, how do I generate the (?, 4096) input vector?
I would like to feed in a single image and get its caption as output.
lstm.py runs successfully.
There is an error when running lstm_attention.py, at this line: train_op = tf.train.AdamOptimizer().minimize(loss, var_list=train_vars)
Error: Shapes (?, ?, 1073) and (?, ?, 512) are not compatible
My configuration: Python 3.5 + TensorFlow 1.1 + GPU