deepinterestnetwork's People


alimama-machine-learning-platform, zhougr1993



deepinterestnetwork's Issues


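In the body of the attention function there is din_all = tf.concat([queries, keys, queries-keys, queries*keys], axis=-1). Why is the input expanded this way? queries-keys can be understood as the gap between the current behavior and a historical behavior, but what is queries*keys for? I hope someone can explain.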

TypeError: iter() returned non-iterator of type 'DataInput'

When I run the program according to your instructions, I get this error message. The Python version is 3.6.5. In Python 3 the iterator protocol uses the __next__() method rather than the next() method. I hope the program can be updated accordingly.
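The error above can be illustrated with a minimal sketch. The class below is a hypothetical stand-in for the repository's batch iterator, not its actual code; the batching logic and the returned tuple are assumptions. It defines __next__ for Python 3 and keeps next as an alias so the same class also works under Python 2.

```python
class DataInput:
    """Hypothetical stand-in for a batch iterator (not the repo's code)."""

    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size
        self.i = 0  # number of batches already returned

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3 looks up __next__; a class that only defines next()
        # triggers "iter() returned non-iterator".
        start = self.i * self.batch_size
        if start >= len(self.data):
            raise StopIteration
        batch = self.data[start:start + self.batch_size]
        self.i += 1
        return self.i, batch

    next = __next__  # alias so Python 2 style callers keep working


batches = list(DataInput(list(range(5)), 2))
print(batches)
```

Adding the alias rather than renaming the method keeps both interpreters working, which may matter if the rest of the code base still contains Python 2 syntax.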


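Question 1: In the paper, the attention contains an "out product". Does this mean the cross product / outer product?

Question 2: The code implements it as tf.concat([queries, keys, queries-keys, queries*keys], axis=-1),
but the paper concatenates three things: queries, keys, and out product(queries, keys). Why do the code and the paper differ?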
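To make the difference concrete, here is a small pure-Python sketch using made-up two-dimensional vectors. The helper names are hypothetical and this is not the repository's implementation; it only shows what the concatenation quoted above produces for one query/key pair, and how the element-wise product relates to an outer product.

```python
def elementwise_product(q, k):
    # queries*keys in the quoted code: one value per dimension
    return [a * b for a, b in zip(q, k)]


def outer_product(q, k):
    # a d x d matrix holding every pairwise product q[i] * k[j]
    return [[a * b for b in k] for a in q]


def activation_input(q, k):
    # mirrors concat([queries, keys, queries-keys, queries*keys], axis=-1)
    diff = [a - b for a, b in zip(q, k)]
    return q + k + diff + elementwise_product(q, k)


q = [1.0, 2.0]  # made-up candidate item embedding
k = [3.0, 4.0]  # made-up history item embedding

features = activation_input(q, k)
outer = outer_product(q, k)
diagonal = [outer[i][i] for i in range(len(q))]
print(features, outer, diagonal)
```

The element-wise product equals the diagonal of the outer product, so the concatenated input has length 4d instead of the 2d + d*d a full outer product would need.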


Hello! I'm running on Windows 10 and get the following error in din/log.txt:

[None, 128]
[None, 128]
[None, 128]
[None, 100, 128]
[None, 2]
Traceback (most recent call last):
File "", line 107, in
print('test_gauc: %.4f\t test_auc: %.4f' % _eval(sess, model))
File "", line 75, in _eval
for _, uij in DataInputTest(test_set, test_batch_size):
TypeError: iter() returned non-iterator of type 'DataInputTest'

Is there a way to fix it?
Thanks in advance!

the risk on

cate_list = [meta_df['categories'][i] for i in range(len(asin_map))]
This is really dangerous if meta_df contains duplicate rows.
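A pure-Python sketch of the risk, using made-up rows instead of a real DataFrame. The assumption here is that asin_map assigns ids from the unique, sorted asin values while the metadata rows are indexed by position. With a duplicate row, position i no longer corresponds to item id i.

```python
# Made-up metadata rows (asin, category); asin "B" appears twice.
meta_rows = [("A", "books"), ("B", "toys"), ("B", "toys"), ("C", "games")]

# Item ids come from the unique asin values (assumed behaviour).
unique_asins = sorted({asin for asin, _ in meta_rows})
asin_map = {asin: idx for idx, asin in enumerate(unique_asins)}

# Positional lookup, as in the quoted line: row i is assumed to be item i.
rows_sorted = sorted(meta_rows)
positional = [rows_sorted[i][1] for i in range(len(asin_map))]

# Keyed lookup: resolve each item's category through its asin instead.
cate_by_asin = dict(meta_rows)
keyed = [cate_by_asin[asin] for asin in unique_asins]

print(positional)  # item "C" (id 2) picks up the duplicate row's category
print(keyed)
```

Dropping duplicates before building the list, or looking categories up by key as above, avoids the silent misalignment.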

Mini-batch Regularization

Hi Guorui, may I ask which part of the code actually implements the parameter regularization? It seems to be missing when the computation model is initialized in the __init__() function of file. Waiting for your response. -:)


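1. The test-set labels are also built from the next click as the positive sample plus one randomly generated negative sample. Such a 1:1 ratio of positives to negatives tends to inflate AUC, and in a real application the next click is not known in advance. How should the training samples be constructed in practice?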


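A question: after reading the DIN paper, I found that the activation unit implemented here differs from the paper in several ways.

  1. In the paper the input to the small network is the out product; here it is the inner product.
  2. In the paper the activation function is PReLU; here it is sigmoid.
  3. The paper does not apply softmax normalization; here normalization is applied.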

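Question about the non-clicked ad variable j in the source code

I have read the paper [Deep Interest Network for Click-Through Rate Prediction] and downloaded and run your source code. I have some questions I would like to ask.

build_dataset.py generates test_set as (reviewerID, hist, (id of the last clicked ad, id of a randomly sampled non-clicked ad)), so a single record looks like this: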

(0, [13179,17993,28326,29247],(62275,5940))



In the __init__ method of model.py, the forward pass used during training takes the id of an ad that was not clicked under the given history:

jc = tf.gather(cate_list, self.j)
j_emb = tf.concat([
tf.nn.embedding_lookup(item_emb_w, self.j),
tf.nn.embedding_lookup(cate_emb_w, jc),
], axis=1)
j_b = tf.gather(item_b, self.j)
din_j = tf.concat([u_emb, j_emb], axis=-1)
din_j = tf.layers.batch_normalization(inputs=din_j, name='b1', reuse=True)
d_layer_1_j = tf.layers.dense(din_j, 80, activation=tf.nn.sigmoid, name='f1', reuse=True)
# d_layer_1_j = dice(d_layer_1_j, name='dice_1_j')
d_layer_2_j = tf.layers.dense(d_layer_1_j, 40, activation=tf.nn.sigmoid, name='f2', reuse=True)
# d_layer_2_j = dice(d_layer_2_j, name='dice_2_j')
d_layer_3_j = tf.layers.dense(d_layer_2_j, 1, activation=None, name='f3', reuse=True)
d_layer_3_i = tf.reshape(d_layer_3_i, [-1])
d_layer_3_j = tf.reshape(d_layer_3_j, [-1])
x = i_b - j_b + d_layer_3_i - d_layer_3_j # [B]
self.logits = i_b + d_layer_3_i


logits_all issue?

My understanding is that the logits_all tensor is used for the real inference scenario, in which a specific user's interactions with all possible items are scored.
If this is true, then u_emb represents attention over the user's history with respect to an unrelated item i and is then broadcast across all items, which seems wrong to me. Instead, I would expect the user's history to be attended to separately for each item.
Appreciate your comments!

about code

Hi, I have some questions about your code.

  1. In your code, the PReLU and Dice parameters are not shared between i_emb and j_emb, which means the parameters on the j_emb path never get trained.
  2. About the AUC calculation: you concat [i_emb, hist] and [j_emb, hist], but hist comes from attention between i_emb and h_emb. This is equivalent to giving the model prior information that hist is closer to i_emb than to j_emb.

Questions about Wide and deep model

After reading your paper, I have a few questions about the wide & deep model that I would like to discuss with you.

    # wide part
    d_layer_wide_i = tf.concat([
        tf.gather(u_emb, [0], axis=-1) * tf.gather(i_emb, [0], axis=-1),
        tf.gather(u_emb, [-1], axis=-1) * tf.gather(i_emb, [-1], axis=-1),
        tf.gather(u_emb, [hidden_units // 2], axis=-1) * tf.gather(i_emb, [hidden_units // 2], axis=-1)], axis=-1)

  1. Is this where the cross features are computed?
  2. Doesn't the original paper say to build cross features from discrete features?
  3. Why select [0], [-1], and [hidden_units // 2]?




The __init__ code of DIN initializes self.i and self.j separately:

    self.i = tf.placeholder(tf.int32, [None,]) # [B]
    self.j = tf.placeholder(tf.int32, [None,]) # [B]

What do i and j each mean? Are they both historical behaviors, or something else?

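About the subtraction in attention

The attention uses four kinds of features: vec1, vec2, vec1-vec2, vec1*vec2. Since the final weight lies between 0 and 1, this is much like a logistic regression, so I wonder whether the negative of the absolute value of vec1-vec2 would be more reasonable. Has anyone tried this?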



about DeepFM model

Hello, nice work! I have some questions about your DeepFM model.
1. Why do you concat u_emb[0] and i_emb[0] in the d_layer_fm_i node?
2. I didn't see the linear part in the FM part.
3. In the prediction phase, self.score_i is obtained by adding the bias and the DNN part, without the FM part. This really confused me.


Running the code you provided, I found that Dice performs worse than not using it. With the model_dice.py module: 0.775 at epoch 24; with the model.py module: 0.87+ at epoch 24.




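Training on our business data, AUC reaches above 0.9, but when I call logits_sub to score all items, the results look poor on manual inspection and the scores are all very close, differing by only about 0.01. Why is that?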


x = i_b - j_b + d_layer_3_i - d_layer_3_j

where x = the positive sample's logit - the negative sample's logit
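As a rough illustration of how this difference is used, the sketch below recomputes x in plain Python with made-up numbers and then takes the fraction of pairs with x > 0, which is what tf.reduce_mean(tf.to_float(x > 0)) computes in the model code quoted elsewhere on this page. The helper names and values are assumptions for illustration only.

```python
def logit_gap(i_b, j_b, d_layer_3_i, d_layer_3_j):
    # x = (i_b + d_layer_3_i) - (j_b + d_layer_3_j), one value per sample
    return [
        (bi + di) - (bj + dj)
        for bi, bj, di, dj in zip(i_b, j_b, d_layer_3_i, d_layer_3_j)
    ]


def fraction_positive(x):
    # share of pairs where the clicked item outscores the non-clicked one
    return sum(1.0 for v in x if v > 0) / len(x)


# Made-up biases and network outputs for a batch of three pairs.
i_b = [0.1, 0.0, -0.2]
d_layer_3_i = [1.0, 0.2, 0.3]
j_b = [0.0, 0.1, 0.0]
d_layer_3_j = [0.5, 0.4, -0.1]

x = logit_gap(i_b, j_b, d_layer_3_i, d_layer_3_j)
mf_auc = fraction_positive(x)
print(x, mf_auc)
```

Because each positive is compared with exactly one sampled negative, this value is a pairwise ranking accuracy over those pairs rather than an AUC over all possible pairs.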


din/ uses next(), and din/model.py uses print without parentheses. But the README says python>=3.6.3. Is there some way for Python 3 to stay compatible with Python 2 here, or is Python 2 code simply mixed in?


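Question about handling dense features in the DIN model

If a user's historical behaviors include dense-feature representations, can they be included in the attention computation? As far as I can see, the model file does not feed real-valued dense features into the attention.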

Random Embedding layers

First of all, thank you for such great work. I am new to this field and have one simple question. As far as I can tell, the embeddings are initialized randomly here and then used during training and testing. Is it possible to use randomly initialized embeddings in practice, or do I have to train the embeddings elsewhere and feed them into the network?


@zhougr1993 In the code below, u_emb is never actually used, is it?
u_emb = hist overwrites u_emb here. Could you explain?

class Model(object):

  def __init__(self, user_count, item_count, cate_count, cate_list):

    self.u = tf.placeholder(tf.int32, [None,]) # [B]
    self.i = tf.placeholder(tf.int32, [None,]) # [B]
    self.j = tf.placeholder(tf.int32, [None,]) # [B]
    self.y = tf.placeholder(tf.float32, [None,]) # [B]
    self.hist_i = tf.placeholder(tf.int32, [None, None]) # [B, T]
    self.sl = tf.placeholder(tf.int32, [None,]) # [B]
    self.lr = tf.placeholder(tf.float64, [])

    hidden_units = 128

    user_emb_w = tf.get_variable("user_emb_w", [user_count, hidden_units])
    item_emb_w = tf.get_variable("item_emb_w", [item_count, hidden_units // 2])
    item_b = tf.get_variable("item_b", [item_count],
                             initializer=tf.constant_initializer(0.0))
    cate_emb_w = tf.get_variable("cate_emb_w", [cate_count, hidden_units // 2])
    cate_list = tf.convert_to_tensor(cate_list, dtype=tf.int64)

    u_emb = tf.nn.embedding_lookup(user_emb_w, self.u)

    ic = tf.gather(cate_list, self.i)
    i_emb = tf.concat(values = [
        tf.nn.embedding_lookup(item_emb_w, self.i),
        tf.nn.embedding_lookup(cate_emb_w, ic),
        ], axis=1)
    i_b = tf.gather(item_b, self.i)

    jc = tf.gather(cate_list, self.j)
    j_emb = tf.concat([
        tf.nn.embedding_lookup(item_emb_w, self.j),
        tf.nn.embedding_lookup(cate_emb_w, jc),
        ], axis=1)
    j_b = tf.gather(item_b, self.j)

    hc = tf.gather(cate_list, self.hist_i)
    h_emb = tf.concat([
        tf.nn.embedding_lookup(item_emb_w, self.hist_i),
        tf.nn.embedding_lookup(cate_emb_w, hc),
        ], axis=2)

    hist = attention(i_emb, h_emb, self.sl)
    #-- attention end ---
    hist = tf.layers.batch_normalization(inputs = hist)
    hist = tf.reshape(hist, [-1, hidden_units])
    hist = tf.layers.dense(hist, hidden_units)

    u_emb = hist
    print u_emb.get_shape().as_list()
    print i_emb.get_shape().as_list()
    print j_emb.get_shape().as_list()
    #-- fcn begin -------
    din_i = tf.concat([u_emb, i_emb], axis=-1)
    din_i = tf.layers.batch_normalization(inputs=din_i, name='b1')
    d_layer_1_i = tf.layers.dense(din_i, 80, activation=tf.nn.sigmoid, name='f1')
    #if u want try dice change sigmoid to None and add dice layer like following two lines. You can also find in this folder.
    #d_layer_1_i = tf.layers.dense(din_i, 80, activation=None, name='f1')
    #d_layer_1_i = dice(d_layer_1_i, name='dice_1_i')
    d_layer_2_i = tf.layers.dense(d_layer_1_i, 40, activation=tf.nn.sigmoid, name='f2')
    #d_layer_2_i = dice(d_layer_2_i, name='dice_2_i')
    d_layer_3_i = tf.layers.dense(d_layer_2_i, 1, activation=None, name='f3')
    din_j = tf.concat([u_emb, j_emb], axis=-1)
    din_j = tf.layers.batch_normalization(inputs=din_j, name='b1', reuse=True)
    d_layer_1_j = tf.layers.dense(din_j, 80, activation=tf.nn.sigmoid, name='f1', reuse=True)
    #d_layer_1_j = dice(d_layer_1_j, name='dice_1_j')
    d_layer_2_j = tf.layers.dense(d_layer_1_j, 40, activation=tf.nn.sigmoid, name='f2', reuse=True)
    #d_layer_2_j = dice(d_layer_2_j, name='dice_2_j')
    d_layer_3_j = tf.layers.dense(d_layer_2_j, 1, activation=None, name='f3', reuse=True)
    d_layer_3_i = tf.reshape(d_layer_3_i, [-1])
    d_layer_3_j = tf.reshape(d_layer_3_j, [-1])
    x = i_b - j_b + d_layer_3_i - d_layer_3_j # [B]
    self.logits = i_b + d_layer_3_i
    u_emb_all = tf.expand_dims(u_emb, 1)
    u_emb_all = tf.tile(u_emb_all, [1, item_count, 1])
    # logits for all item:
    all_emb = tf.concat([
        tf.nn.embedding_lookup(cate_emb_w, cate_list)
        ], axis=1)
    all_emb = tf.expand_dims(all_emb, 0)
    all_emb = tf.tile(all_emb, [512, 1, 1])
    din_all = tf.concat([u_emb_all, all_emb], axis=-1)
    din_all = tf.layers.batch_normalization(inputs=din_all, name='b1', reuse=True)
    d_layer_1_all = tf.layers.dense(din_all, 80, activation=tf.nn.sigmoid, name='f1', reuse=True)
    #d_layer_1_all = dice(d_layer_1_all, name='dice_1_all')
    d_layer_2_all = tf.layers.dense(d_layer_1_all, 40, activation=tf.nn.sigmoid, name='f2', reuse=True)
    #d_layer_2_all = dice(d_layer_2_all, name='dice_2_all')
    d_layer_3_all = tf.layers.dense(d_layer_2_all, 1, activation=None, name='f3', reuse=True)
    d_layer_3_all = tf.reshape(d_layer_3_all, [-1, item_count])
    self.logits_all = tf.sigmoid(item_b + d_layer_3_all)
    #-- fcn end -------

    self.mf_auc = tf.reduce_mean(tf.to_float(x > 0))
    self.score_i = tf.sigmoid(i_b + d_layer_3_i)
    self.score_j = tf.sigmoid(j_b + d_layer_3_j)
    self.score_i = tf.reshape(self.score_i, [-1, 1])
    self.score_j = tf.reshape(self.score_j, [-1, 1])
    self.p_and_n = tf.concat([self.score_i, self.score_j], axis=-1)
    print self.p_and_n.get_shape().as_list()

    # Step variable
    self.global_step = tf.Variable(0, trainable=False, name='global_step')
    self.global_epoch_step = \
        tf.Variable(0, trainable=False, name='global_epoch_step')
    self.global_epoch_step_op = \
        tf.assign(self.global_epoch_step, self.global_epoch_step+1)

    self.loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            logits=self.logits,
            labels=self.y)
        )

    trainable_params = tf.trainable_variables()
    self.opt = tf.train.GradientDescentOptimizer(learning_rate=self.lr)
    gradients = tf.gradients(self.loss, trainable_params)
    clip_gradients, _ = tf.clip_by_global_norm(gradients, 5)
    self.train_op = self.opt.apply_gradients(
        zip(clip_gradients, trainable_params), global_step=self.global_step)


Hello, I read your code. In both training and prediction, self.logits = i_b + d_layer_3_i and self.logits_sub = tf.sigmoid(item_b[:predict_ads_num] + d_layer_3_sub) add an extra bias. TensorFlow's dense layer already has a bias, so why add another one at the final output?


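Hello, I have a question about the attention function. Because each user's hist has a different length, the longest hist in the batch is used to build the [B, T] matrix. For a user u whose hist is shorter than T, the history becomes a vector like (h1, h2, h3, ..., 0, 0, 0), but these zeros end up looking up the embedding of item 0.
So I do not quite understand paddings = tf.ones_like(outputs) * (-2 ** 32 + 1) in the attention step, and I hope someone can explain. I would think paddings = tf.zeros_like(outputs) is more reasonable, if the goal is only to satisfy the selection outputs = tf.where(key_masks, outputs, paddings).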
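One way to see the effect of the padding value is the pure-Python sketch below. It assumes that a softmax is applied to the scores after the masking step (another issue on this page notes that the code normalizes the attention weights); the scores and helper names are made up for illustration. With a very large negative fill value, padded positions receive zero weight after softmax, whereas a fill value of 0 still gives them a share of the weight.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mask_scores(scores, valid_len, pad_value):
    # keep real positions, overwrite padded positions with pad_value
    return [s if t < valid_len else pad_value for t, s in enumerate(scores)]


scores = [2.0, 1.0, 0.5, 0.3]   # made-up scores; last two are padding
large_negative = float(-2 ** 32 + 1)

w_large_negative = softmax(mask_scores(scores, 2, large_negative))
w_zero = softmax(mask_scores(scores, 2, 0.0))
print(w_large_negative)
print(w_zero)
```

If the weights were used without a softmax, filling with zeros would be the natural choice instead, so the right fill value depends on what follows the mask.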

Dice issue

I noticed that in the dice module, the training argument of tf.layers.batch_normalization is left at its default, which is False.

about wide_deep model


    # wide part

    d_layer_wide_i = tf.concat([tf.gather(u_emb, [0], axis=-1) * tf.gather(i_emb, [0], axis=-1),
                                tf.gather(u_emb, [-1], axis=-1) * tf.gather(i_emb, [-1], axis=-1),
                                tf.gather(u_emb, [hidden_units // 2], axis=-1) * tf.gather(i_emb,
                                                                                           [hidden_units // 2],
                                                                                           axis=-1)], axis=-1)

error: InvalidArgumentError (see above for traceback): indices[0] = -1 is not in [0, 128)
When I change -1 to 1, it runs.
Why?
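The error message says the gather indices must lie in [0, 128), so a Python-style -1 is rejected rather than being read as "last column". A small sketch of one possible workaround follows; to_gather_index is a hypothetical helper, not part of TensorFlow, and hidden_units = 128 is taken from the model code quoted on this page.

```python
hidden_units = 128  # embedding width used in the quoted model code


def to_gather_index(index, dim_size):
    """Hypothetical helper: map a Python-style index into [0, dim_size)."""
    resolved = index + dim_size if index < 0 else index
    if not 0 <= resolved < dim_size:
        raise IndexError("index %d is not in [0, %d)" % (index, dim_size))
    return resolved


last_column = to_gather_index(-1, hidden_units)
second_column = to_gather_index(1, hidden_units)
print(last_column, second_column)
```

Note that replacing -1 with 1 makes the code run but selects the second column instead of the last one, so passing hidden_units - 1 is probably closer to the original intent.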



attention's issue ?

Hi, zhou
In your original paper, the local activation unit performs an outer product between the user's features and the ad's features, as follows.
In your public code repo, the attention function, if I'm right, is the implementation of the local activation unit. However, I found that there is no outer product operation in this function. If I'm wrong, please tell me where the local activation unit is implemented, or could you explain the main idea of this implementation and what replaces the outer product mentioned for the local activation unit?
I'm looking forward to your reply. Thanks.


d_layer_fm_i = tf.concat([tf.reduce_sum(u_emb*i_emb, axis=-1, keep_dims=True), tf.gather(u_emb, [0], axis=-1) + tf.gather(i_emb, [0], axis=-1)], axis=-1)

din_i = tf.concat([u_emb, i_emb, u_emb*i_emb], axis=-1)


# sort by pred value, from small to big
arr = sorted(raw_arr, key=lambda d:d[2])

auc = 0.0
fp1, tp1, fp2, tp2 = 0.0, 0.0, 0.0, 0.0
for record in arr:
    fp2 += record[0] # noclick
    tp2 += record[1] # click
    auc += (fp2 - fp1) * (tp2 + tp1)
    fp1, tp1 = fp2, tp2

# if all nonclick or click, discard
threshold = len(arr) - 1e-3
if tp2 > threshold or fp2 > threshold:
    return -0.5

if tp2 * fp2 > 0.0:  # normal auc
    return (1.0 - auc / (2.0 * tp2 * fp2))
else:
    return None

The idea here is to accumulate trapezoid areas. But why use 1.0 - auc / (2.0 * tp2 * fp2) rather than auc / (2.0 * tp2 * fp2)?
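A self-contained sketch that may help with this question. It repeats the trapezoid accumulation on made-up records of the form (noclick, click, score) and compares the result with a direct pairwise count. Because the records are sorted from the lowest score upward, the accumulated area is the area under the curve traced from the low-score end, which is the complement of the usual AUC; subtracting it from 1.0 recovers the AUC. The function names and data are illustrative assumptions.

```python
def trapezoid_areas(raw_arr):
    """Return (area from ascending sweep, 1 - area) for the records."""
    arr = sorted(raw_arr, key=lambda d: d[2])  # ascending by score
    area = 0.0
    fp1 = tp1 = fp2 = tp2 = 0.0
    for noclick, click, _ in arr:
        fp2 += noclick
        tp2 += click
        area += (fp2 - fp1) * (tp2 + tp1)  # twice the trapezoid area
        fp1, tp1 = fp2, tp2
    ascending = area / (2.0 * tp2 * fp2)
    return ascending, 1.0 - ascending


def pairwise_auc(raw_arr):
    """Fraction of (click, noclick) pairs where the click scores higher."""
    pos = [s for n, c, s in raw_arr if c == 1]
    neg = [s for n, c, s in raw_arr if n == 1]
    wins = sum(1.0 for p in pos for q in neg if p > q)
    return wins / (len(pos) * len(neg))


# Made-up records without tied scores.
perfect = [(1, 0, 0.1), (1, 0, 0.2), (0, 1, 0.8), (0, 1, 0.9)]
mixed = [(1, 0, 0.1), (0, 1, 0.2), (1, 0, 0.3), (0, 1, 0.4)]

print(trapezoid_areas(perfect), pairwise_auc(perfect))
print(trapezoid_areas(mixed), pairwise_auc(mixed))
```

For a perfectly ranked list the ascending sweep accumulates no area at all, so only the 1.0 - ... form yields an AUC of 1.0. Sorting in descending order instead would make the unsubtracted ratio the AUC.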
