
nelsonzhao / zhihu


This repo contains the source code for my personal Zhihu column (https://zhuanlan.zhihu.com/zhaoyeyu), implemented in Python 3.6. It includes hands-on natural language processing and computer vision projects, such as text generation, machine translation, and deep convolutional GANs.

Home Page: https://zhuanlan.zhihu.com/zhaoyeyu

HTML 18.23% Jupyter Notebook 81.55% Python 0.22%
deep-learning tensorflow-examples convolutional-neural-networks recurrent-neural-networks autoencoder gan style-transfer natural-language-processing machine-translation

zhihu's Introduction

Introduction

This repo contains the source code for the Zhihu column 《机器不学习》 ("Machines Don't Learn").

Column: https://zhuanlan.zhihu.com/zhaoyeyu

Framework

TensorFlow

Contents

1.anna_lstm

A character-level text generator: an RNN (LSTM) trained on the English text of Anna Karenina.

Article: Text generation on Anna Karenina: building an LSTM model with TensorFlow

2.skip-gram

A Word2Vec implementation using the skip-gram algorithm; trained on an English corpus, the model learns an embedding vector for each word.

Article: Implementing the Skip-Gram model with TensorFlow
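
A minimal sketch of the core idea, for readers skimming the repo (my own illustration, not the repo's code): subsample frequent words following Mikolov et al., then emit (center, context) training pairs within a random window. Here int_words is assumed to be the corpus as a list of word ids.

    import random
    from collections import Counter

    import numpy as np

    def subsample(int_words, t=1e-5):
        # Drop word occurrences with probability 1 - sqrt(t / frequency),
        # so very frequent words are thinned out.
        counts = Counter(int_words)
        total = len(int_words)
        p_drop = {w: 1 - np.sqrt(t / (c / total)) for w, c in counts.items()}
        return [w for w in int_words if random.random() > p_drop[w]]

    def skip_gram_pairs(words, window=5):
        # Yield (center, context) pairs, drawing a random window size per position.
        for i, center in enumerate(words):
            r = random.randint(1, window)
            for j in range(max(0, i - r), min(len(words), i + r + 1)):
                if j != i:
                    yield center, words[j]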

3.generate_lyrics

An RNN-based lyrics generator.

4.basic_seq2seq

A Seq2Seq model with an RNN encoder-decoder architecture that learns to sort the letters within a word.

Article: Implementing a Seq2Seq model, from Encoder to Decoder

5.denoise_auto_encoder

Trains an autoencoder on the MNIST handwriting dataset, then adds convolutional layers to build a convolutional autoencoder that denoises images.

Article: Denoising images with a convolutional autoencoder
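
A minimal sketch of the denoising setup (an illustration under assumptions, not the repo's code): corrupt the inputs with Gaussian noise and train the network to reconstruct the clean images. The names inputs_, targets_ and train_op in the final comment are hypothetical.

    import numpy as np

    def add_noise(images, noise_factor=0.5):
        # Corrupt a batch of images with values in [0, 1] using Gaussian noise,
        # then clip back into the valid range.
        noisy = images + noise_factor * np.random.randn(*images.shape)
        return np.clip(noisy, 0.0, 1.0)

    # Hypothetical training step: feed the noisy batch as the input and the
    # clean batch as the reconstruction target.
    # sess.run(train_op, feed_dict={inputs_: add_noise(batch), targets_: batch})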

6.cifar_cnn

An implementation for the Kaggle CIFAR image-classification competition, comparing how KNN and a convolutional neural network perform on the data.

Article: CIFAR image classification with a convolutional neural network

7.mnist_gan

Trains a generative adversarial network with Leaky ReLU hidden layers on the MNIST handwriting dataset, so the model learns to generate handwritten digits by itself.

Article: MNIST data generation with a generative adversarial network (GAN)

8.dcgan

Trains a DCGAN on the MNIST dataset, adding batch normalization to speed up convergence and improve performance.

Article: Image generation with a deep convolutional GAN

Also trains a DCGAN on the horse images from the CIFAR dataset to generate images of horses.

9.batch_normalization_discussion

Builds a four-layer fully connected neural network on the MNIST handwriting dataset and varies its parameters to test the effect of batch normalization (BN) on model performance. Also implements batch normalization at a low level in TensorFlow.

Article: Batch Normalization, theory and practice
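
A minimal sketch of batch normalization at the TF 1.x op level (an illustration of the idea, not the repo's exact code): normalize each feature to zero mean and unit variance over the batch, then apply a learned scale (gamma) and shift (beta).

    import tensorflow as tf

    def batch_norm(x, epsilon=1e-3):
        # Training-mode BN for x of shape [batch, features]; inference-mode BN
        # would use running averages of the mean and variance instead.
        num_features = x.get_shape().as_list()[-1]
        gamma = tf.Variable(tf.ones([num_features]))
        beta = tf.Variable(tf.zeros([num_features]))
        mean, variance = tf.nn.moments(x, axes=[0])
        x_hat = (x - mean) / tf.sqrt(variance + epsilon)
        return gamma * x_hat + beta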

10.machine_translation_seq2seq

Builds a basic English-French translation model using Seq2Seq on TensorFlow 1.6.

Article: A Seq2Seq English-French machine-translation model with TensorFlow

11.mt_attention_birnn

Built with Keras, this adds an attention mechanism and a BiRNN on top of the basic Seq2Seq model to further improve translation quality, and visualizes the attention layer to deepen the reader's understanding of how attention works. The model's BLEU score on the training samples approaches 0.9.

Article: A machine-translation model with attention and a BiRNN, implemented in Keras

12.sentiment_analysis

Builds sentiment-analysis models with a DNN, an LSTM, and a CNN on TensorFlow 1.6, then analyzes and compares the models' performance.

Article: Sentiment classification with DNN/LSTM/Text-CNN, practice and analysis

13.image_style_transfer

Implements an image style transfer model on TensorFlow 1.6 that learns and transfers the style of an image.

Article: Building an image style-transfer model with TensorFlow

14.ctr_models

Implements four algorithms on TensorFlow 2.0: DeepFM, Deep&Cross, xDeepFM, and AutoInt.

Article: CTR-prediction models: DeepFM/Deep&Cross/xDeepFM/AutoInt, code and walkthrough

Updated with new material from time to time; stars and forks are welcome.

zhihu's People

Contributors

nelsonzhao


zhihu's Issues

anna_lstm: both versions fail with the same error

When running either version of the LSTM code I hit the same problem, and I haven't been able to find the bug, so I'd like to ask for help.
The code where the problem occurs:

Running the RNN:

outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=self.initial_state)

The error message (it looks like a tensor shape problem, but I've checked the code and can't find where the issue is):
Dimensions must be equal, but are 1024 and 595 for 'rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/MatMul_1' (op: 'MatMul') with input shapes: [1,1024], [595,2048].
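
This shape mismatch is the typical symptom of stacking a single BasicLSTMCell instance into MultiRNNCell, so that every layer shares one kernel whose shape only fits the first layer's input (the same root cause is worked out in a later issue on this page). A hedged sketch of the usual fix, assuming TF 1.x and the repo's variable names:

    import tensorflow as tf

    # Build a fresh cell per layer instead of reusing one instance.
    def build_cell(lstm_size, keep_prob):
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
        return tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

    cell = tf.contrib.rnn.MultiRNNCell(
        [build_cell(lstm_size, keep_prob) for _ in range(num_layers)])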

Seq2Seq model runtime error: ValueError: too many values to unpack (expected 2)

First of all, thanks for sharing. Running the basic_seq2seq code produces the error below. My TensorFlow is the latest 1.3.0 GPU build, and I didn't change any of the code. Could you take a look?
Traceback (most recent call last):
File "D:\Workspaces\Eclipse\PythonLearn1\src\seq2seq_init_.py", line 227, in
num_layers)
File "D:\Workspaces\Eclipse\PythonLearn1\src\seq2seq_init_.py", line 189, in seq2seq_model
decoder_input)
File "D:\Workspaces\Eclipse\PythonLearn1\src\seq2seq_init_.py", line 146, in decoding_layer
maximum_iterations=max_target_sequence_length)
ValueError: too many values to unpack (expected 2)

mt_attention_birnn: the plot_attention function raises an error

Error location: ---> r = f([X.reshape(-1,20), s0, c0, out0])
InvalidArgumentError: You must feed a value for placeholder tensor 'input_1' with dtype float and shape [?,20]
[[Node: input_1 = Placeholderdtype=DT_FLOAT, shape=[?,20], _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
[[Node: attention_weights_23/truediv/_2409 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6010_attention_weights_23/truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

My Keras version is 2.2.2 and my TensorFlow version is 1.9. What could the problem be? Thanks!

machine_translation_seq2seq fails at runtime

Hi, I'm running the downloaded code without any changes. The program fails when it reaches this block:

with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())

The error is:
Dimensions of inputs should match: shape[0] = [128,1] vs. shape[1] = [47,24]
It looks like the source and target dimensions don't match; I've checked several times but can't figure out where it goes wrong. Any advice would be appreciated.
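
A plausible cause, inferred from the shapes rather than confirmed: if the decoder-input preprocessing looks like the basic_seq2seq code quoted later on this page, it concatenates tf.fill([batch_size, 1], ...) with a slice of the fed targets, so a final batch smaller than batch_size (47 rows instead of 128 here) makes the two tensors disagree. Dropping the incomplete final batch, or deriving the batch size from the data, would avoid it:

    # Hypothetical fix: derive the batch size from the fed tensor rather than
    # the Python constant batch_size.
    dynamic_batch = tf.shape(data)[0]
    ending = tf.strided_slice(data, [0, 0], [dynamic_batch, -1], [1, 1])
    decoder_input = tf.concat([tf.fill([dynamic_batch, 1], vocab_to_int['<GO>']), ending], 1)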

Error at dynamic_rnn, please help

Dimensions must be equal, but are 1024 and 596 for 'rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/MatMul_1' (op: 'MatMul') with input shapes: [100,1024], [596,2048].

Visualizing the hidden layer's compressed data

Hi, I'm interested in the hidden-layer features. In the article you visualize them, but I don't quite understand what can be read from the visualization. Looking forward to your reply!

basic_seq2seq error: ValueError: too many values to unpack (expected 2)

My Python version is 3.6 and my TensorFlow version is 1.2:

➜  ~ python
Python 3.6.2 (default, Jul 20 2017, 03:52:27) 
[GCC 7.1.1 20170630] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.2.0-rc2'
>>> 

I copied the basic_seq2seq code out, and running it fails as follows:

➜  ~ python test.py
Traceback (most recent call last):
  File "test.py", line 224, in <module>
    num_layers)    
  File "test.py", line 184, in seq2seq_model
    decoder_input) 
  File "test.py", line 137, in decoding_layer
    maximum_iterations=max_target_sequence_length)
ValueError: too many values to unpack (expected 2)

The copied code is below (only the graph-building part; it's otherwise identical):

import tensorflow as tf
from tensorflow.python.layers.core import Dense


with open('/home/grt1st/data/letters_source.txt', 'r', encoding='utf-8') as f:
    source_data = f.read()

with open('/home/grt1st/data/letters_target.txt', 'r', encoding='utf-8') as f:
    target_data = f.read()

def extract_character_vocab(data):
    '''
    Build the character-to-id mapping tables
    '''
    special_words = ['<PAD>', '<UNK>', '<GO>',  '<EOS>']

    set_words = list(set([character for line in data.split('\n') for character in line]))
    # Add the four special tokens to the vocabulary
    int_to_vocab = {idx: word for idx, word in enumerate(special_words + set_words)}
    vocab_to_int = {word: idx for idx, word in int_to_vocab.items()}

    return int_to_vocab, vocab_to_int

# Build the mapping tables
source_int_to_letter, source_letter_to_int = extract_character_vocab(source_data)
target_int_to_letter, target_letter_to_int = extract_character_vocab(target_data)

# Convert letters to ids
source_int = [[source_letter_to_int.get(letter, source_letter_to_int['<UNK>']) 
               for letter in line] for line in source_data.split('\n')]
target_int = [[target_letter_to_int.get(letter, target_letter_to_int['<UNK>']) 
               for letter in line] + [target_letter_to_int['<EOS>']] for line in target_data.split('\n')]

def get_inputs():
    '''
    Model input tensors
    '''
    inputs = tf.placeholder(tf.int32, [None, None], name='inputs')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    learning_rate = tf.placeholder(tf.float32, name='learning_rate')
    
    # Maximum target sequence length (target_sequence_length and source_sequence_length are fed via feed_dict later)
    target_sequence_length = tf.placeholder(tf.int32, (None,), name='target_sequence_length')
    max_target_sequence_length = tf.reduce_max(target_sequence_length, name='max_target_len')
    source_sequence_length = tf.placeholder(tf.int32, (None,), name='source_sequence_length')
    
    return inputs, targets, learning_rate, target_sequence_length, max_target_sequence_length, source_sequence_length


def get_encoder_layer(input_data, rnn_size, num_layers,
                   source_sequence_length, source_vocab_size, 
                   encoding_embedding_size):

    '''
    Build the encoder layer

    Parameters:
    - input_data: input tensor
    - rnn_size: number of hidden units per RNN cell
    - num_layers: number of stacked RNN cells
    - source_sequence_length: sequence lengths of the source data
    - source_vocab_size: vocabulary size of the source data
    - encoding_embedding_size: embedding size
    '''
    # Encoder embedding
    encoder_embed_input = tf.contrib.layers.embed_sequence(input_data, source_vocab_size, encoding_embedding_size)

    # RNN cell
    def get_lstm_cell(rnn_size):
        lstm_cell = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
        return lstm_cell

    cell = tf.contrib.rnn.MultiRNNCell([get_lstm_cell(rnn_size) for _ in range(num_layers)])
    
    encoder_output, encoder_state = tf.nn.dynamic_rnn(cell, encoder_embed_input, 
                                                      sequence_length=source_sequence_length, dtype=tf.float32)
    
    return encoder_output, encoder_state


def process_decoder_input(data, vocab_to_int, batch_size):
    '''
    Prepend <GO> and drop the last character
    '''
    # Cut off the last character
    ending = tf.strided_slice(data, [0, 0], [batch_size, -1], [1, 1])
    decoder_input = tf.concat([tf.fill([batch_size, 1], vocab_to_int['<GO>']), ending], 1)

    return decoder_input


def decoding_layer(target_letter_to_int, decoding_embedding_size, num_layers, rnn_size,
                   target_sequence_length, max_target_sequence_length, encoder_state, decoder_input):
    '''
    Build the decoder layer

    Parameters:
    - target_letter_to_int: mapping table for the target data
    - decoding_embedding_size: embedding vector size
    - num_layers: number of stacked RNN cells
    - rnn_size: number of hidden units per RNN cell
    - target_sequence_length: sequence lengths of the target data
    - max_target_sequence_length: maximum target sequence length
    - encoder_state: the encoder's encoded state vector
    - decoder_input: decoder-side input
    '''
    # 1. Embedding
    target_vocab_size = len(target_letter_to_int)
    decoder_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
    decoder_embed_input = tf.nn.embedding_lookup(decoder_embeddings, decoder_input)

    # 2. Build the decoder's RNN cells
    def get_decoder_cell(rnn_size):
        decoder_cell = tf.contrib.rnn.LSTMCell(rnn_size,
                                           initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
        return decoder_cell
    cell = tf.contrib.rnn.MultiRNNCell([get_decoder_cell(rnn_size) for _ in range(num_layers)])
     
    # 3. Fully connected output layer
    output_layer = Dense(target_vocab_size,
                         kernel_initializer = tf.truncated_normal_initializer(mean = 0.0, stddev=0.1))


    # 4. Training decoder
    with tf.variable_scope("decode"):
        # Build the training helper
        training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=decoder_embed_input,
                                                            sequence_length=target_sequence_length,
                                                            time_major=False)
        # Build the decoder
        training_decoder = tf.contrib.seq2seq.BasicDecoder(cell,
                                                           training_helper,
                                                           encoder_state,
                                                           output_layer) 
        training_decoder_output, _ = tf.contrib.seq2seq.dynamic_decode(training_decoder,
                                                                       impute_finished=True,
                                                                       maximum_iterations=max_target_sequence_length)
    # 5. Predicting decoder
    # Shares parameters with the training decoder
    with tf.variable_scope("decode", reuse=True):
        # Create a constant start-token tensor tiled to batch_size
        start_tokens = tf.tile(tf.constant([target_letter_to_int['<GO>']], dtype=tf.int32), [batch_size], 
                               name='start_tokens')
        predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(decoder_embeddings,
                                                                start_tokens,
                                                                target_letter_to_int['<EOS>'])
        predicting_decoder = tf.contrib.seq2seq.BasicDecoder(cell,
                                                        predicting_helper,
                                                        encoder_state,
                                                        output_layer)
        predicting_decoder_output, _ = tf.contrib.seq2seq.dynamic_decode(predicting_decoder,
                                                            impute_finished=True,
                                                            maximum_iterations=max_target_sequence_length)
    
    return training_decoder_output, predicting_decoder_output


def seq2seq_model(input_data, targets, lr, target_sequence_length, 
                  max_target_sequence_length, source_sequence_length,
                  source_vocab_size, target_vocab_size,
                  encoder_embedding_size, decoder_embedding_size, 
                  rnn_size, num_layers):
    
    # Get the encoder's state output
    _, encoder_state = get_encoder_layer(input_data, 
                                  rnn_size, 
                                  num_layers, 
                                  source_sequence_length,
                                  source_vocab_size, 
                                  encoding_embedding_size)
    
    
    # Preprocessed decoder input
    decoder_input = process_decoder_input(targets, target_letter_to_int, batch_size)
    
    # Pass the state vector and inputs to the decoder
    training_decoder_output, predicting_decoder_output = decoding_layer(target_letter_to_int, 
                                                                       decoding_embedding_size, 
                                                                       num_layers, 
                                                                       rnn_size,
                                                                       target_sequence_length,
                                                                       max_target_sequence_length,
                                                                       encoder_state, 
                                                                       decoder_input) 
    
    return training_decoder_output, predicting_decoder_output


# Hyperparameters
# Number of Epochs
epochs = 60
# Batch Size
batch_size = 128
# RNN Size
rnn_size = 50
# Number of Layers
num_layers = 2
# Embedding Size
encoding_embedding_size = 15
decoding_embedding_size = 15
# Learning Rate
learning_rate = 0.001


# Build the graph
train_graph = tf.Graph()

with train_graph.as_default():
    
    # Get the model inputs
    input_data, targets, lr, target_sequence_length, max_target_sequence_length, source_sequence_length = get_inputs()
    
    training_decoder_output, predicting_decoder_output = seq2seq_model(input_data, 
                                                                      targets, 
                                                                      lr, 
                                                                      target_sequence_length, 
                                                                      max_target_sequence_length, 
                                                                      source_sequence_length,
                                                                      len(source_letter_to_int),
                                                                      len(target_letter_to_int),
                                                                      encoding_embedding_size, 
                                                                      decoding_embedding_size, 
                                                                      rnn_size, 
                                                                      num_layers)    
    
    training_logits = tf.identity(training_decoder_output.rnn_output, 'logits')
    predicting_logits = tf.identity(predicting_decoder_output.sample_id, name='predictions')
    
    masks = tf.sequence_mask(target_sequence_length, max_target_sequence_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)

The failure is located in the decoder's training branch:

    with tf.variable_scope("decode"):
        # Build the training helper
        training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=decoder_embed_input,
                                                            sequence_length=target_sequence_length,
                                                            time_major=False)
        # Build the decoder
        training_decoder = tf.contrib.seq2seq.BasicDecoder(cell,
                                                           training_helper,
                                                           encoder_state,
                                                           output_layer) 
        training_decoder_output, _ = tf.contrib.seq2seq.dynamic_decode(training_decoder,
                                                                       impute_finished=True,
                                                                       maximum_iterations=max_target_sequence_length)

I think the helper or the decoder is at fault, and the cause is probably an API version change, but I can't find a fix. Can anyone help?
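
A likely fix, assuming the TF >= 1.2 contrib API in which tf.contrib.seq2seq.dynamic_decode returns three values (final_outputs, final_state, final_sequence_lengths) rather than two: unpack or discard the extra return value at both call sites.

    # Expect three return values instead of two:
    training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(
        training_decoder,
        impute_finished=True,
        maximum_iterations=max_target_sequence_length)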

tensorflow version

Hi, what version of TensorFlow did you use in this project? I hit an error when using the LSTM.

Question about the encoder and decoder character embeddings

In the encoding stage, tf.contrib.layers.embed_sequence is used to create the character embeddings, while in the decoding stage tf.random_uniform and tf.nn.embedding_lookup are used to create and look up the embeddings.
Two questions:
1. The encoding and decoding stages don't use the same embeddings. Why is that necessary? Couldn't they share one embedding table?
2. The embeddings produced by tf.contrib.layers.embed_sequence differ on every call; for example, with ids = [1,2,3,4], each call to the function returns different vectors, so the same character ends up with different embedding vectors.
Thanks in advance.

CNN

During testing, loaded_graph = tf.Graph() raises IndentationError: unexpected indent.

Running raises ValueError: Trying to share variable rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel, but specified shape (1024, 2048) and found shape (595, 2048).

Running the code raises ValueError: Trying to share variable rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel, but specified shape (1024, 2048) and found shape (595, 2048).
It's raised when dynamic_rnn executes.
My TensorFlow version is 1.3.0, Python 3.5.2.

Found the cause: the LSTM is built as lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size), and when you stack multiple layers they all reuse this single cell, hence the error above. The correct approach is:

stack_drop = []
for i in range(num_layers):
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    stack_drop.append(drop)
cell = tf.contrib.rnn.MultiRNNCell(stack_drop, state_is_tuple=True)

Question about the autoencoder's loss function

Your autoencoder uses sigmoid_cross_entropy_with_logits as the loss function, but that feels like the wrong choice here; since the labels are continuous values between 0 and 1, I think mean squared error should be used instead.
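
For reference, a minimal sketch of the MSE alternative this comment suggests, assuming logits_ and targets_ are the decoder's raw outputs and the clean images:

    import tensorflow as tf

    # Squash the logits into [0, 1], then regress against the clean images.
    decoded = tf.nn.sigmoid(logits_)
    cost = tf.reduce_mean(tf.square(decoded - targets_))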

skip_grams

I think there's a problem with the logic here:

words_count = Counter(words)
words = [w for w in words if words_count[w] > 50]

vocab = set(words)
vocab_to_int = {w: c for c, w in enumerate(vocab)}
int_to_vocab = {c: w for c, w in enumerate(vocab)}

print("total words: {}".format(len(words)))
print("unique words: {}".format(len(set(words))))
total words: 8623686
unique words: 6791

int_words = [vocab_to_int[w] for w in words]

In fact, vocab_to_int just maps each word to the position where it first appears.

t = 1e-5  # the t value
threshold = 0.9  # drop-probability threshold

And then this index is used to compute word frequencies?? Can anyone tell me what's going on here?

int_word_counts = Counter(int_words)
total_count = len(int_words)
word_freqs = {w: c/total_count for w, c in int_word_counts.items()}

prob_drop = {w: 1 - np.sqrt(t / word_freqs[w]) for w in int_word_counts}

Subsampling the words:

train_words = [w for w in int_words if prob_drop[w] < threshold]
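
One way to resolve the confusion above: vocab_to_int comes from enumerate over a set, so each unique word receives one arbitrary but stable id. int_words is then a faithful re-encoding of the corpus, and Counter(int_words) counts genuine word frequencies; the ids are labels, not positions. A small self-contained demonstration:

    from collections import Counter

    words = ['the', 'cat', 'sat', 'the', 'the']
    vocab_to_int = {w: i for i, w in enumerate(set(words))}
    int_words = [vocab_to_int[w] for w in words]

    # The id assigned to 'the' is arbitrary, but its count is the true frequency:
    assert Counter(int_words)[vocab_to_int['the']] == 3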

The shapes of training_logits and targets don't match

cost = tf.contrib.seq2seq.sequence_loss(training_logits, targets, masks)

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [5632] vs. [6400]

My target has shape (256, 25), but the training_logits come out as (256, 22, 358), where 358 is the vocabulary size.

I changed it as follows and it works now:

def pad_batch_sentence(batch, max_length, pad_id):
    # max_length = max([len(sentence) for sentence in batch])
    return [sentence + [pad_id] * (max_length - len(sentence)) for sentence in batch]


def get_batches(sources, targets, batch_size):

    for batch_i in range(0, len(sources) // batch_size):
        start_i = batch_i * batch_size

        # Slice the right amount for the batch
        sources_batch = sources[start_i:start_i + batch_size]
        targets_batch = targets[start_i:start_i + batch_size]

        pad_idx = source_vocab_to_int.get("<PAD>")
        sources_batch_pad = np.array(pad_batch_sentence(sources_batch, max_source_sentence_length, pad_idx))
        targets_batch_pad = np.array(pad_batch_sentence(targets_batch, max_target_sentence_length, pad_idx))
        # Need the lengths for the _lengths parameters
        # The lengths shouldn't be computed from the padded batch, since then they're all 25
        targets_lengths = []
        for target in targets_batch_pad:
            targets_lengths.append(len(target))

        source_lengths = []
        for source in sources_batch_pad:
            source_lengths.append(len(source))

        yield sources_batch_pad, targets_batch_pad, source_lengths, targets_lengths

But then the source_lengths passed in are all (20, 20, 20, ...) and the targets_lengths are all (25, 25, 25, ...).

Using a different number of input sentences (batch size) at prediction time vs. training time

Is there a way to feed just a single sentence at prediction time, so that this code doesn't need the *batch_size?
answer_logits = sess.run(logits, {input_data: [text]*batch_size,
target_sequence_length: [len(input_word)]*batch_size,
source_sequence_length: [len(input_word)]*batch_size})[0]
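
One possible workaround (my assumption, not from the repo): make the batch-size-dependent ops read the batch size from a tensor that is fed at inference time, instead of the Python constant. For example, inside decoding_layer:

    # Hypothetical change: derive the batch size at run time from the fed
    # target_sequence_length placeholder, so a batch of one sentence works.
    dynamic_batch = tf.shape(target_sequence_length)[0]
    start_tokens = tf.tile(tf.constant([target_letter_to_int['<GO>']], dtype=tf.int32),
                           [dynamic_batch], name='start_tokens')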

anna_lstm

def build_lstm(lstm_size, num_layers, batch_size, keep_prob):
    '''
    Build the LSTM layers

    keep_prob: dropout keep probability
    lstm_size: number of hidden units per LSTM layer
    num_layers: number of LSTM layers
    batch_size: batch size
    '''
    # Build a basic LSTM cell
    lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size)

    # Add dropout
    drop = tf.nn.rnn_cell.DropoutWrapper(lstm, output_keep_prob=keep_prob)

    # Stack the layers
    cell = tf.nn.rnn_cell.MultiRNNCell([drop for _ in range(num_layers)])
    initial_state = cell.zero_state(batch_size, tf.float32)

    return cell, initial_state

Change it to:

def build_lstm(lstm_size, num_layers, batch_size, keep_prob):
    '''
    Build the LSTM layers

    keep_prob: dropout keep probability
    lstm_size: number of hidden units per LSTM layer
    num_layers: number of LSTM layers
    batch_size: batch size
    '''
    cell_list = []
    for i in range(num_layers):
        # Build a basic LSTM cell
        lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size)
        # Add dropout
        drop = tf.nn.rnn_cell.DropoutWrapper(lstm, output_keep_prob=keep_prob)
        cell_list.append(drop)

    # Stack the layers
    cell = tf.nn.rnn_cell.MultiRNNCell(cell_list)
    initial_state = cell.zero_state(batch_size, tf.float32)

    return cell, initial_state

Otherwise, in tensorflow/python/ops/rnn_cell_impl.py, the shape of self._linear._weights inside BasicLSTMCell.call only fits the first cell, not the second.

Problems running in the tensorflow GPU version

Here's my code:
from sklearn.preprocessing import LabelBinarizer
n_class = 10  # 10 classes in total
lb = LabelBinarizer().fit(np.array(range(n_class)))
y_train = lb.transform(y_train)
y_test = lb.transform(y_test)

from sklearn.model_selection import train_test_split

train_ratio = 0.8
x_train_, x_val, y_train_, y_val = train_test_split(x_train,
                                                    y_train,
                                                    train_size=train_ratio,
                                                    random_state=123)
img_shape = x_train.shape
keep_prob = 0.6
epochs = 5
batch_size = 64
inputs_ = tf.placeholder(tf.float32, [None, 32, 32, 3], name='inputs_')
targets_ = tf.placeholder(tf.float32, [None, n_class], name='targets_')

This is the warning from running this code:
D:\Program Files\Anaconda3\lib\site-packages\sklearn\model_selection_split.py:2010: FutureWarning: From version 0.21, test_size will always complement train_size unless both are specified.
FutureWarning)

# First convolution + pooling layer
# 32 x 32 x 3 to 32 x 32 x 64
conv1 = tf.layers.conv2d(inputs_, 64, (2,2), padding='same', activation=tf.nn.relu,
                         kernel_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.1))

# 32 x 32 x 64 to 16 x 16 x 64
conv1 = tf.layers.max_pooling2d(conv1, (2,2), (2,2), padding='same')

# Second convolution + pooling layer
# 16 x 16 x 64 to 16 x 16 x 128
conv2 = tf.layers.conv2d(conv1, 128, (4,4), padding='same', activation=tf.nn.relu,
                         kernel_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.1))

# 16 x 16 x 128 to 8 x 8 x 128
conv2 = tf.layers.max_pooling2d(conv2, (2,2), (2,2), padding='same')

# Reshape the output
shape = np.prod(conv2.get_shape().as_list()[1:])
conv2 = tf.reshape(conv2, [-1, shape])

# First fully connected layer
# 8 x 8 x 128 to 1 x 1024
fc1 = tf.contrib.layers.fully_connected(conv2, 1024, activation_fn=tf.nn.relu)
fc1 = tf.nn.dropout(fc1, keep_prob)

# Second fully connected layer
# 1 x 1024 to 1 x 512
fc2 = tf.contrib.layers.fully_connected(fc1, 512, activation_fn=tf.nn.relu)

# Logits layer
# 1 x 512 to 1 x 10
logits_ = tf.contrib.layers.fully_connected(fc2, 10, activation_fn=None)
logits_ = tf.identity(logits_, name='logits_')

# Cost & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits_, labels=targets_))
optimizer = tf.train.AdamOptimizer(0.001).minimize(cost)

# Accuracy
correct_pred = tf.equal(tf.argmax(logits_, 1), tf.argmax(targets_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy')
import time

save_model_path = './test_cifar'
count = 0
start = time.time()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(epochs):
        for batch_i in range(img_shape[0]//batch_size - 1):
            feature_batch = x_train_[batch_i * batch_size: (batch_i+1)*batch_size]
            label_batch = y_train_[batch_i * batch_size: (batch_i+1)*batch_size]
            train_loss, _ = sess.run([cost, optimizer],
                                     feed_dict={inputs_: feature_batch,
                                                targets_: label_batch})

            val_acc = sess.run(accuracy,
                               feed_dict={inputs_: x_val,
                                          targets_: y_val})

            if count % 100 == 0:
                print('Epoch {:>2}, Train Loss {:.4f}, Validation Accuracy {:4f} '.format(epoch + 1, train_loss, val_acc))
            count += 1

end = time.time()
elapsed = end - start
print("Time taken: ", elapsed, "seconds.")

This is the error from running this code:
ResourceExhaustedError Traceback (most recent call last)
D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
1322 try:
-> 1323 return fn(*args)
1324 except errors.OpError as e:

D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1301 feed_dict, fetch_list, target_list,
-> 1302 status, run_metadata)
1303

D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py in exit(self, type_arg, value_arg, traceback_arg)
472 compat.as_text(c_api.TF_Message(self.status.status)),
--> 473 c_api.TF_GetCode(self.status.status))
474 # Delete the underlying status object from memory otherwise it stays alive

ResourceExhaustedError: OOM when allocating tensor with shape[10000,32,32,64]
[[Node: conv2d_13/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_inputs__2_0_0/_15, conv2d_12/kernel/read)]]
[[Node: accuracy_6/_17 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_81_accuracy_6", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

ResourceExhaustedError Traceback (most recent call last)
in ()
54 val_acc = sess.run(accuracy,
55 feed_dict={inputs_: x_val,
---> 56 targets_: y_val})
57
58 if(count%100==0):

D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
887 try:
888 result = self._run(None, fetches, feed_dict, options_ptr,
--> 889 run_metadata_ptr)
890 if run_metadata:
891 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1118 if final_fetches or final_targets or (handle and feed_dict_tensor):
1119 results = self._do_run(handle, final_targets, final_fetches,
-> 1120 feed_dict_tensor, options, run_metadata)
1121 else:
1122 results = []

D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1315 if handle is None:
1316 return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1317 options, run_metadata)
1318 else:
1319 return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
1334 except KeyError:
1335 pass
-> 1336 raise type(e)(node_def, op, message)
1337
1338 def _extend_graph(self):

ResourceExhaustedError: OOM when allocating tensor with shape[10000,32,32,64]
[[Node: conv2d_13/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_inputs__2_0_0/_15, conv2d_12/kernel/read)]]
[[Node: accuracy_6/_17 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_81_accuracy_6", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'conv2d_13/Conv2D', defined at:
File "D:\Program Files\Anaconda3\lib\runpy.py", line 184, in _run_module_as_main
"main", mod_spec)
File "D:\Program Files\Anaconda3\lib\runpy.py", line 85, in run_code
exec(code, run_globals)
File "D:\Program Files\Anaconda3\lib\site-packages\ipykernel_main
.py", line 3, in
app.launch_new_instance()
File "D:\Program Files\Anaconda3\lib\site-packages\traitlets\config\application.py", line 653, in launch_instance
app.start()
File "D:\Program Files\Anaconda3\lib\site-packages\ipykernel\kernelapp.py", line 474, in start
ioloop.IOLoop.instance().start()
File "D:\Program Files\Anaconda3\lib\site-packages\zmq\eventloop\ioloop.py", line 162, in start
super(ZMQIOLoop, self).start()
File "D:\Program Files\Anaconda3\lib\site-packages\tornado\ioloop.py", line 887, in start
handler_func(fd_obj, events)
File "D:\Program Files\Anaconda3\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "D:\Program Files\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "D:\Program Files\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "D:\Program Files\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "D:\Program Files\Anaconda3\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "D:\Program Files\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "D:\Program Files\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "D:\Program Files\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 390, in execute_request
user_expressions, allow_stdin)
File "D:\Program Files\Anaconda3\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "D:\Program Files\Anaconda3\lib\site-packages\ipykernel\zmqshell.py", line 501, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "D:\Program Files\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2717, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "D:\Program Files\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2821, in run_ast_nodes
if self.run_code(code, result):
File "D:\Program Files\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 4, in
kernel_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.1))
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\layers\convolutional.py", line 608, in conv2d
return layer.apply(inputs)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\layers\base.py", line 671, in apply
return self.call(inputs, *args, **kwargs)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\layers\base.py", line 575, in call
outputs = self.call(inputs, *args, **kwargs)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\layers\convolutional.py", line 167, in call
outputs = self._convolution_op(inputs, self.kernel)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 835, in call
return self.conv_op(inp, filter)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 499, in call
return self.call(inp, filter)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 187, in call
name=self.name)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 630, in conv2d
data_format=data_format, name=name)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2956, in create_op
op_def=op_def)
File "D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1470, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[10000,32,32,64]
[[Node: conv2d_13/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_inputs__2_0_0/_15, conv2d_12/kernel/read)]]
[[Node: accuracy_6/_17 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_81_accuracy_6", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

How should I correct it?
Thanks for your help!
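
A common fix, offered as a suggestion rather than a confirmed diagnosis: the OOM is triggered by pushing the whole 10,000-image validation set through the conv net in a single sess.run. Evaluating the validation accuracy in mini-batches keeps peak GPU memory bounded:

    # Evaluate validation accuracy in mini-batches instead of one giant feed.
    val_accs = []
    for i in range(0, len(x_val), batch_size):
        acc = sess.run(accuracy,
                       feed_dict={inputs_: x_val[i:i + batch_size],
                                  targets_: y_val[i:i + batch_size]})
        val_accs.append(acc)
    val_acc = np.mean(val_accs)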

skip-gram

word_freqs = {w: (c+0.0)/total_count for w, c in int_word_counts.items()}
Use c+0.0 to cast c to float; otherwise (under Python 2 integer division) the frequencies all come out as 0.

Downloading the code

Why can't the ipynb code be viewed? Could the author please open up download access?

Word2Vec之Skip-Gram模型-中文文本版

Filtering out low-frequency words:

words_count = Counter(words)
words = [w for w in words if words_count[w] > 50]
This raises an error because the variable words was never defined earlier. Where does words come from?

Note: the problem occurs in the Word2Vec之Skip-Gram模型-中文文本版 notebook.

Shapes of target and logits in the generate_lyrics project

  1. cell = tf.contrib.rnn.MultiRNNCell([lstm]): does this mean there is only one RNN layer?

  2. logits = tf.contrib.layers.fully_connected(outputs, vocab_size, activation_fn=None): here the logits have shape (batch_size, seq_len, vocab_size), but the target has shape (batch_size, seq_len, embed_dim). How can a loss be computed between the two?
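
For reference, a hedged sketch of the usual pattern (not necessarily the repo's exact code): keep the targets as integer word ids of shape (batch_size, seq_len) rather than embedded vectors, and compute a softmax loss against the (batch_size, seq_len, vocab_size) logits:

    import tensorflow as tf

    # targets: int ids of shape (batch, seq_len); logits: (batch, seq_len, vocab).
    loss = tf.contrib.seq2seq.sequence_loss(
        logits,
        targets,
        tf.ones_like(targets, dtype=tf.float32))  # uniform per-step weights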

A question about the Gram matrix in image_style_transfer

def _single_style_loss(self, a, g):
        ###############################
        ## TO DO
        N = a.shape[3]
        M = a.shape[1] * a.shape[2]
        gram_a = self._gram_matrix(a, N, M)
        gram_g = self._gram_matrix(g, N, M)
        return tf.reduce_sum((gram_g - gram_a) ** 2) / (4 * (N ** 2) * (M ** 2))

Here, a is the style image's feature representation and g is the generated image's feature representation. When computing the generated image's Gram matrix, self._gram_matrix(g, N, M) is called, but N and M are taken from the style image's shape. Is that computation correct?
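
A brief note, offered as a reading of the code rather than an authoritative answer: at a given VGG layer the style and generated feature maps have identical shapes, so N and M coincide for both inputs and the computation comes out right. To make that explicit, the dimensions can be derived from each tensor's own shape:

    # Hypothetical variant: take N (channels) and M (height * width) from each
    # tensor's own shape so the two calls can never disagree.
    def _single_style_loss(self, a, g):
        N = a.shape[3]
        M = a.shape[1] * a.shape[2]
        gram_a = self._gram_matrix(a, N, M)
        gram_g = self._gram_matrix(g, g.shape[3], g.shape[1] * g.shape[2])
        return tf.reduce_sum((gram_g - gram_a) ** 2) / (4 * (N ** 2) * (M ** 2))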

ValueError: Attempt to reuse RNNCell with a different variable scope than its first use. First use of cell was with scope

I use TensorFlow on Windows. I modified anna_lstm.py into anna_lstm-tf1.0.py (as you mentioned) and then ran: python anna_lstm-tf1.0.py

ValueError: Attempt to reuse RNNCell <tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl.BasicLSTMCell object at 0x00000291BB7E4DA0> with a different variable scope than its first use. First use of cell was with scope ........

Could you help me with this problem? Thank you.

Question about word2vec subsampling

Hi! In the word2vec source code, is the subsampling of high-frequency words (the, a, and so on) done before training, or during training? It looks to me like the author's code subsamples during training. Could you clarify?

About the visualization part of mt_attention_birnn

Hi, when I tried your code, the visualization function below doesn't reproduce the figure you showed. What could the reason be?
def plot_attention(sentence, Tx=20, Ty=25):
    """
    Visualize the attention layer

    @param sentence: the sentence to translate, a str
    @param Tx: length of the input sentence
    @param Ty: length of the output sentence
    """

    X = np.array(text_to_int(sentence, source_vocab_to_int))
    f = K.function(model.inputs, [model.layers[9].get_output_at(t) for t in range(Ty)])

    s0 = np.zeros((1, n_s))
    c0 = np.zeros((1, n_s))
    out0 = np.zeros((1, len(target_vocab_to_int)))

    r = f([X.reshape(-1,20), s0, c0, out0])

    attention_map = np.zeros((Ty, Tx))
    for t in range(Ty):
        for t_prime in range(Tx):
            attention_map[t][t_prime] = r[t][0, t_prime, 0]

    Y = make_prediction(sentence)

    source_list = sentence.split()
    target_list = Y.split()

    f, ax = plt.subplots(figsize=(20,15))
    sns.heatmap(attention_map, xticklabels=source_list, yticklabels=target_list, cmap="YlGnBu")
    ax.set_xticklabels(ax.get_xticklabels(), fontsize=15, rotation=90)
    ax.set_yticklabels(ax.get_yticklabels(), fontsize=15)

I printed the attention_map array and every value turned out to be 0.05.
