
deepinterestnetwork's People

Contributors

alimama-machine-learning-platform, zhougr1993


deepinterestnetwork's Issues

A small question about the attention function

In the attention function there is din_all = tf.concat([queries, keys, queries-keys, queries*keys], axis=-1). Why expand the features this way? queries-keys can be understood as the gap between the current behavior and a historical behavior, but what is queries*keys for? Hoping someone can explain.
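
For reference, a minimal sketch (assuming TensorFlow 1.x and the tensor shapes used in this repo) of how the four feature groups are built before the attention MLP; queries*keys is the element-wise (Hadamard) product, which gives the small network an explicit similarity signal on top of the raw embeddings and their difference:

    import tensorflow as tf  # assumes TensorFlow 1.x

    def attention_features(queries, keys):
        """Input of the attention MLP.

        queries: [B, T, H] -- candidate ad embedding tiled over the T history steps
        keys:    [B, T, H] -- embeddings of the historical behaviors
        """
        # Raw embeddings, their difference and their element-wise product are
        # concatenated so the MLP can learn both additive and multiplicative
        # interactions between the ad and each historical behavior.
        return tf.concat([queries, keys, queries - keys, queries * keys], axis=-1)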

TypeError: iter() returned non-iterator of type 'DataInput'

When I run the program according to your instructions, I get this error message. My Python version is 3.6.5. In Python 3 the iteration protocol uses the __next__ method rather than the next() method. Hope this helps make the program better.
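
A minimal sketch of one possible fix (the batching details here are simplified and hypothetical): keep the Python 2 style next() method on DataInput and alias it as __next__ so the class also satisfies the Python 3 iterator protocol.

    class DataInput:
        def __init__(self, data, batch_size):
            self.data = data
            self.batch_size = batch_size
            self.i = 0

        def __iter__(self):
            return self

        def next(self):  # original Python 2 style method
            if self.i * self.batch_size >= len(self.data):
                raise StopIteration
            batch = self.data[self.i * self.batch_size:(self.i + 1) * self.batch_size]
            self.i += 1
            return self.i, batch

        # Python 3 looks for __next__; aliasing keeps both versions working.
        __next__ = next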

About the attention implementation

Question 1: The paper's attention mentions an "out product" — does that mean the cross/outer product?

Question 2: The code implements it as tf.concat([queries, keys, queries-keys, queries*keys], axis=-1),
but the paper concatenates queries, keys, and out product(queries, keys). Why does the code differ from the paper?

Error

Hello! I'm running on Windows 10 and get the following error in din/log.txt:

[None, 128]
[None, 128]
[None, 128]
[None, 100, 128]
[None, 2]
Traceback (most recent call last):
File "train.py", line 107, in
print('test_gauc: %.4f\t test_auc: %.4f' % _eval(sess, model))
File "train.py", line 75, in _eval
for _, uij in DataInputTest(test_set, test_batch_size):
TypeError: iter() returned non-iterator of type 'DataInputTest'

Is there a way to fix it?
Thanks in advance!

A risk in 2_remap_id.py

cate_list = [meta_df['categories'][i] for i in range(len(asin_map))]
This is really dangerous if meta_df contains duplicate rows.
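
A minimal sketch of a defensive version (assuming pandas and the asin/categories columns used by the preprocessing scripts, with asin_map mapping each asin to its integer id): drop duplicate ASINs and index the categories by ASIN rather than by row position, so the category list stays aligned with asin_map.

    import pandas as pd

    def build_cate_list(meta_df, asin_map):
        """Return one category per item, aligned with the asin -> index mapping."""
        # Remove duplicate ASIN rows before building the lookup.
        meta = meta_df.drop_duplicates(subset='asin').set_index('asin')
        # Index by ASIN (not by row position) so duplicates or reordering
        # cannot silently misalign items and categories.
        return [meta.loc[asin, 'categories']
                for asin, _ in sorted(asin_map.items(), key=lambda kv: kv[1])]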

Mini-batch Regularization

Hi Guorui, may I ask which part of the code actually implements the parameter regularization? It seems to be missing when the computation model is built in the __init__() function of the din.py file. Waiting for your response. :-)

Question about positive/negative sample construction

1. The test-set labels are also built from the next click as the positive sample plus one randomly generated negative sample, a 1:1 positive/negative construction that easily inflates the AUC; besides, in a real application the next click cannot be known in advance. How should the training samples be constructed in practice?
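
For context, a minimal sketch (hypothetical helper, assuming items are integer ids) of the 1:1 scheme being discussed: the positive is the held-out next click, and the negative is drawn uniformly from items the user never clicked.

    import random

    def build_test_example(hist, next_click, item_count):
        """Return ((pos_item, neg_item), history) with one random negative.

        hist:       item ids the user clicked before the held-out click
        next_click: the held-out positive item id
        item_count: total number of items to sample negatives from
        """
        clicked = set(hist) | {next_click}
        neg = random.randint(0, item_count - 1)
        while neg in clicked:                      # resample until it is truly unclicked
            neg = random.randint(0, item_count - 1)
        return (next_click, neg), hist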

Inconsistency between the DIN code and the paper

@zhougr1993
A question: reading the DIN paper, I found that the activation unit described there differs from this implementation in a few ways:

  1. In the paper the small network's input includes the out product; here it is the inner product.
  2. In the paper the activation function is PReLU; here it is sigmoid.
  3. In the paper the output is not normalized with softmax; here softmax normalization is used.
    Which variant works better, and why were these changes made?

Question about the non-clicked ad variable j in the source code

Hello,
I read the paper [Deep Interest Network for Click-Through Rate Prediction] and downloaded and ran this source code. I have some questions I would like to ask.

In build_dataset.py, test_set is generated as ((reviewerID, hist, (id of the last clicked ad, id of a randomly sampled non-clicked ad))); a single record looks like:

(0, [13179,17993,28326,29247],(62275,5940))

In input.py, j is assigned by

j.append(t[2][1])

so I understand the variable j to be the non-clicked ad.
In the __init__ method of model.py, the forward pass at training time also uses the id of the non-clicked ad given the history:

jc = tf.gather(cate_list, self.j)
j_emb = tf.concat([
tf.nn.embedding_lookup(item_emb_w, self.j),
tf.nn.embedding_lookup(cate_emb_w, jc),
], axis=1)
j_b = tf.gather(item_b, self.j)
din_j = tf.concat([u_emb, j_emb], axis=-1)
din_j = tf.layers.batch_normalization(inputs=din_j, name='b1', reuse=True)
d_layer_1_j = tf.layers.dense(din_j, 80, activation=tf.nn.sigmoid, name='f1', reuse=True)
# d_layer_1_j = dice(d_layer_1_j, name='dice_1_j')
d_layer_2_j = tf.layers.dense(d_layer_1_j, 40, activation=tf.nn.sigmoid, name='f2', reuse=True)
# d_layer_2_j = dice(d_layer_2_j, name='dice_2_j')
d_layer_3_j = tf.layers.dense(d_layer_2_j, 1, activation=None, name='f3', reuse=True)
d_layer_3_i = tf.reshape(d_layer_3_i, [-1])
d_layer_3_j = tf.reshape(d_layer_3_j, [-1])
x = i_b - j_b + d_layer_3_i - d_layer_3_j # [B]
self.logits = i_b + d_layer_3_i

So my questions are:
1. What is the purpose of using the non-clicked ad's information during training? Is it to capture what the user is not interested in?
2. What is the variable x used for?
I hope to get a reply, thanks a lot!
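
For context, a minimal sketch (NumPy, hypothetical names) of what x = i_b - j_b + d_layer_3_i - d_layer_3_j is used for in the quoted code: it is the per-example score margin between the positive ad i and the sampled negative ad j, and tf.reduce_mean(tf.to_float(x > 0)) (the mf_auc node defined later in model.py) estimates the probability that the positive outscores the negative, i.e. a pairwise AUC-style metric.

    import numpy as np

    def pairwise_auc(pos_logits, neg_logits):
        """Fraction of (positive, negative) pairs where the positive scores higher.

        Mirrors mf_auc = tf.reduce_mean(tf.to_float(x > 0)) with
        x = pos_logits - neg_logits computed per example.
        """
        x = np.asarray(pos_logits) - np.asarray(neg_logits)
        return float(np.mean(x > 0))

    # Example: the positive beats the negative in two of three cases -> 2/3
    print(pairwise_auc([2.1, 0.3, 1.5], [1.0, 0.9, 0.2]))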

logits_all issue?

Hi,
My understanding is that the logits_all symbol is used for the real inference scenario, in which a specific user's interactions with all possible items are scored.
If that is true, then the fact that u_emb represents the attention of the user's history with the unrelated item i, and then gets broadcast along all items, seems wrong to me. Instead, I would expect each item to be attended against the user's history.
Appreciate your comments!

about code

Hi, I have some questions about your code.

  1. In your code, the PReLU/Dice parameters are not shared between i_emb and j_emb, which means the parameters in the j_emb branch never get any training.
  2. About the AUC calculation: you concat [i_emb, hist] and [j_emb, hist], but hist comes from the attention between i_emb and h_emb. This is equivalent to giving the model the prior information that hist is closer to i_emb than to j_emb.

Questions about Wide and deep model

After reading your paper, I have a few questions about the wide & deep model and would like to discuss them with you.

    # wide part
    d_layer_wide_i = tf.concat([tf.gather(u_emb, [0], axis=-1) * tf.gather(i_emb, [0], axis=-1),
                                tf.gather(u_emb, [-1], axis=-1) * tf.gather(i_emb, [-1], axis=-1),
                                tf.gather(u_emb, [hidden_units // 2], axis=-1) * tf.gather(i_emb,
                                                                                           [hidden_units // 2],
                                                                                           axis=-1)], axis=-1)

  1. Is this computing cross features?
  2. Didn't the original paper say to build cross features from discrete features?
  3. Why select indices [0], [-1] and [hidden_units // 2]?

About the input: how to combine ID1 with ID1's tag1, tag2, tag3

This question may have nothing to do with this model; I just can't figure it out.
In the input we have the user's historical click ID1, as well as ID1's tag1, tag2 and tag3. How should they be combined? ID1 is easy, just embed it, but there are three tags — do I embed them too and then take the sum (or average)?
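
A minimal sketch (TensorFlow 1.x, hypothetical vocabulary sizes and tensor names) of the usual approach: embed every tag with a shared tag embedding table, pool the tag vectors (sum or mean), and concatenate the pooled tag vector with the ID embedding, just as this repo concatenates the item embedding with its category embedding.

    import tensorflow as tf  # assumes TensorFlow 1.x

    id_emb_w  = tf.get_variable("id_emb_w",  [100000, 64])   # hypothetical vocab sizes
    tag_emb_w = tf.get_variable("tag_emb_w", [5000, 64])

    ids  = tf.placeholder(tf.int32, [None])      # [B]     one clicked ID per behavior
    tags = tf.placeholder(tf.int32, [None, 3])   # [B, 3]  tag1, tag2, tag3 of that ID

    id_emb  = tf.nn.embedding_lookup(id_emb_w, ids)       # [B, 64]
    tag_emb = tf.nn.embedding_lookup(tag_emb_w, tags)     # [B, 3, 64]
    tag_emb = tf.reduce_mean(tag_emb, axis=1)             # [B, 64]  mean pooling (sum also works)

    behavior_emb = tf.concat([id_emb, tag_emb], axis=-1)  # [B, 128] used as one attention key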

What do the inputs i and j in DIN refer to?

In DIN's __init__ code, self.i and self.j are initialized separately:

    self.i = tf.placeholder(tf.int32, [None,]) # [B]
    self.j = tf.placeholder(tf.int32, [None,]) # [B]

What do i and j mean respectively? Are they both historical behaviors, or something else?

About the subtraction in the attention

In the attention there are four feature groups: vec1, vec2, vec1-vec2, vec1*vec2. Since the final weight lies between 0 and 1, much like a logistic regression, I was wondering whether the negated absolute value of vec1-vec2 would be more reasonable. Has this been tried?
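
A minimal sketch of the variant being proposed (assuming the same queries/keys tensors as in the attention function): replace the signed difference with the negated absolute difference, which is symmetric in the two vectors, and compare the two feature sets empirically.

    import tensorflow as tf  # assumes TensorFlow 1.x

    def attention_features_abs(queries, keys):
        """Alternative attention input: -|q - k| instead of the signed q - k."""
        return tf.concat([queries,
                          keys,
                          -tf.abs(queries - keys),   # symmetric, distance-style feature
                          queries * keys], axis=-1)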

Data

Is there a description of the data fields?

about DeepFM model

Hello, nice work! I have some doubts about your DeepFM model.
1. Why do you concat u_emb[0] and i_emb[0] in the d_layer_fm_i node?
2. I didn't see a linear part in the FM part.
3. In the prediction phase, self.score_i is obtained by adding the bias and the DNN part, without the FM part. This really confused me.
Thanks!
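
For comparison, a minimal sketch (TensorFlow 1.x, hypothetical two-field setup mirroring u_emb and i_emb) of a standard FM score, which has an explicit global bias, linear terms and a pairwise interaction; the repo's d_layer_fm_i keeps only a reduced form of these terms.

    import tensorflow as tf  # assumes TensorFlow 1.x

    def fm_score(u_emb, i_emb, u_linear, i_linear, bias):
        """Standard FM with two fields: bias + linear terms + pairwise interaction.

        u_emb, i_emb:       [B, H]  latent factors of the two fields
        u_linear, i_linear: [B]     first-order weights of the two fields
        bias:               scalar  global bias
        """
        interaction = tf.reduce_sum(u_emb * i_emb, axis=-1)   # [B] second-order term
        return bias + u_linear + i_linear + interaction       # [B] raw FM logit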

Question about Dice

Hi Zhou,
Running the code you provide, I found that Dice performs worse than not using it: with the model_dice.py module, epoch 24 gives 0.775; with the model.py module, epoch 24 gives 0.87+.
It seems Dice does not improve the results and is actually worse than not using it.

About dropout in DIN

Applying dropout during training and using the full network at test time is a common technique. Does DIN not need dropout? Has it been tested?
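
A minimal sketch (TensorFlow 1.x, hypothetical is_training placeholder) of how dropout could be added between the fully connected layers of the i-branch; whether it actually helps on this dataset would have to be checked experimentally.

    import tensorflow as tf  # assumes TensorFlow 1.x

    is_training = tf.placeholder(tf.bool, [])   # feed True during training, False for eval

    def fcn_with_dropout(din_i, keep_prob=0.8):
        d1 = tf.layers.dense(din_i, 80, activation=tf.nn.sigmoid, name='f1')
        d1 = tf.layers.dropout(d1, rate=1.0 - keep_prob, training=is_training)
        d2 = tf.layers.dense(d1, 40, activation=tf.nn.sigmoid, name='f2')
        d2 = tf.layers.dropout(d2, rate=1.0 - keep_prob, training=is_training)
        return tf.layers.dense(d2, 1, activation=None, name='f3')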

Applying the DIN model in a real business

Training on business data, the AUC can reach above 0.9, but when calling logits_sub to score all items, the results do not look good by eye and the scores are all extremely close, differing by only about 0.01. Why is that?

About the GAUC computation

Why is the GAUC the mean of
x = i_b - j_b + d_layer_3_i - d_layer_3_j
?

Here x = logits of the positive sample - logits of the negative sample.

Python 2 code seems to be mixed in

din/input.py uses next(), and din/model.py uses print without parentheses, yet the README says python>=3.6.3. Is there some Python 3 mechanism that keeps this compatible with Python 2, or is Python 2 code simply mixed in?

Sorry, I'm a beginner; apologies if there is anything wrong with the question.

Handling dense features in the DIN model

When using the DIN model, if the user's historical behaviors involve dense-feature representations, can they be included in the attention computation? As far as I can see, the model file of DIN does not include dense real-valued features when computing the attention.
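
A minimal sketch (TensorFlow 1.x, hypothetical dense_hist tensor) of one common way to include them: normalize the dense features and concatenate them onto each behavior's key embedding before the attention, so they take part in computing the weights.

    import tensorflow as tf  # assumes TensorFlow 1.x

    # h_emb:      [B, T, H]  id/category embeddings of the historical behaviors
    # dense_hist: [B, T, D]  real-valued features per behavior (price, dwell time, ...)
    h_emb      = tf.placeholder(tf.float32, [None, None, 128])
    dense_hist = tf.placeholder(tf.float32, [None, None, 4])

    dense_norm = tf.layers.batch_normalization(dense_hist)   # scale the dense features
    keys = tf.concat([h_emb, dense_norm], axis=-1)           # [B, T, H + D] attention keys
    # the query (candidate ad embedding) would be projected/padded to the same width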

Random Embedding layers

First of all, thank you for such great work. I am new to this field and have one simple question: as far as I can tell, the embeddings here are initialized randomly and learned during training and testing. Is it fine in practice to use randomly initialized embeddings, or do I have to pretrain the embeddings elsewhere and feed them into the network?
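
A minimal sketch (TensorFlow 1.x) of both options: random initialization learned end to end, as this repo does, versus initializing the same kind of variable from a pretrained matrix; which one works better depends mostly on how much training data is available.

    import numpy as np
    import tensorflow as tf  # assumes TensorFlow 1.x

    item_count, dim = 10000, 64

    # Option 1: random init, learned jointly with the rest of the network (what this repo does).
    item_emb_random = tf.get_variable("item_emb_random", [item_count, dim])

    # Option 2: start from pretrained vectors (e.g. from word2vec over behavior sequences).
    pretrained = np.random.randn(item_count, dim).astype(np.float32)  # stand-in for real vectors
    item_emb_pretrained = tf.get_variable(
        "item_emb_pretrained",
        initializer=pretrained,   # shape is inferred from the initializer
        trainable=True)           # set False to keep the pretrained vectors frozen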

Does the DIN model not use the user features?

@zhougr1993 In the code below, u_emb is never actually used, is it?
u_emb = hist overwrites u_emb here — could you help explain this?

class Model(object):

  def __init__(self, user_count, item_count, cate_count, cate_list):

    self.u = tf.placeholder(tf.int32, [None,]) # [B]
    self.i = tf.placeholder(tf.int32, [None,]) # [B]
    self.j = tf.placeholder(tf.int32, [None,]) # [B]
    self.y = tf.placeholder(tf.float32, [None,]) # [B]
    self.hist_i = tf.placeholder(tf.int32, [None, None]) # [B, T]
    self.sl = tf.placeholder(tf.int32, [None,]) # [B]
    self.lr = tf.placeholder(tf.float64, [])

    hidden_units = 128

    user_emb_w = tf.get_variable("user_emb_w", [user_count, hidden_units])
    item_emb_w = tf.get_variable("item_emb_w", [item_count, hidden_units // 2])
    item_b = tf.get_variable("item_b", [item_count],
                             initializer=tf.constant_initializer(0.0))
    cate_emb_w = tf.get_variable("cate_emb_w", [cate_count, hidden_units // 2])
    cate_list = tf.convert_to_tensor(cate_list, dtype=tf.int64)

    u_emb = tf.nn.embedding_lookup(user_emb_w, self.u)

    ic = tf.gather(cate_list, self.i)
    i_emb = tf.concat(values = [
        tf.nn.embedding_lookup(item_emb_w, self.i),
        tf.nn.embedding_lookup(cate_emb_w, ic),
        ], axis=1)
    i_b = tf.gather(item_b, self.i)

    jc = tf.gather(cate_list, self.j)
    j_emb = tf.concat([
        tf.nn.embedding_lookup(item_emb_w, self.j),
        tf.nn.embedding_lookup(cate_emb_w, jc),
        ], axis=1)
    j_b = tf.gather(item_b, self.j)

    hc = tf.gather(cate_list, self.hist_i)
    h_emb = tf.concat([
        tf.nn.embedding_lookup(item_emb_w, self.hist_i),
        tf.nn.embedding_lookup(cate_emb_w, hc),
        ], axis=2)

    hist = attention(i_emb, h_emb, self.sl)
    #-- attention end ---
    
    hist = tf.layers.batch_normalization(inputs = hist)
    hist = tf.reshape(hist, [-1, hidden_units])
    hist = tf.layers.dense(hist, hidden_units)

    u_emb = hist
    print u_emb.get_shape().as_list()
    print i_emb.get_shape().as_list()
    print j_emb.get_shape().as_list()
    #-- fcn begin -------
    din_i = tf.concat([u_emb, i_emb], axis=-1)
    din_i = tf.layers.batch_normalization(inputs=din_i, name='b1')
    d_layer_1_i = tf.layers.dense(din_i, 80, activation=tf.nn.sigmoid, name='f1')
    #if u want try dice change sigmoid to None and add dice layer like following two lines. You can also find model_dice.py in this folder.
    #d_layer_1_i = tf.layers.dense(din_i, 80, activation=None, name='f1')
    #d_layer_1_i = dice(d_layer_1_i, name='dice_1_i')
    d_layer_2_i = tf.layers.dense(d_layer_1_i, 40, activation=tf.nn.sigmoid, name='f2')
    #d_layer_2_i = dice(d_layer_2_i, name='dice_2_i')
    d_layer_3_i = tf.layers.dense(d_layer_2_i, 1, activation=None, name='f3')
    din_j = tf.concat([u_emb, j_emb], axis=-1)
    din_j = tf.layers.batch_normalization(inputs=din_j, name='b1', reuse=True)
    d_layer_1_j = tf.layers.dense(din_j, 80, activation=tf.nn.sigmoid, name='f1', reuse=True)
    #d_layer_1_j = dice(d_layer_1_j, name='dice_1_j')
    d_layer_2_j = tf.layers.dense(d_layer_1_j, 40, activation=tf.nn.sigmoid, name='f2', reuse=True)
    #d_layer_2_j = dice(d_layer_2_j, name='dice_2_j')
    d_layer_3_j = tf.layers.dense(d_layer_2_j, 1, activation=None, name='f3', reuse=True)
    d_layer_3_i = tf.reshape(d_layer_3_i, [-1])
    d_layer_3_j = tf.reshape(d_layer_3_j, [-1])
    x = i_b - j_b + d_layer_3_i - d_layer_3_j # [B]
    self.logits = i_b + d_layer_3_i
    u_emb_all = tf.expand_dims(u_emb, 1)
    u_emb_all = tf.tile(u_emb_all, [1, item_count, 1])
    # logits for all item:
    all_emb = tf.concat([
        item_emb_w,
        tf.nn.embedding_lookup(cate_emb_w, cate_list)
        ], axis=1)
    all_emb = tf.expand_dims(all_emb, 0)
    all_emb = tf.tile(all_emb, [512, 1, 1])
    din_all = tf.concat([u_emb_all, all_emb], axis=-1)
    din_all = tf.layers.batch_normalization(inputs=din_all, name='b1', reuse=True)
    d_layer_1_all = tf.layers.dense(din_all, 80, activation=tf.nn.sigmoid, name='f1', reuse=True)
    #d_layer_1_all = dice(d_layer_1_all, name='dice_1_all')
    d_layer_2_all = tf.layers.dense(d_layer_1_all, 40, activation=tf.nn.sigmoid, name='f2', reuse=True)
    #d_layer_2_all = dice(d_layer_2_all, name='dice_2_all')
    d_layer_3_all = tf.layers.dense(d_layer_2_all, 1, activation=None, name='f3', reuse=True)
    d_layer_3_all = tf.reshape(d_layer_3_all, [-1, item_count])
    self.logits_all = tf.sigmoid(item_b + d_layer_3_all)
    #-- fcn end -------

    
    self.mf_auc = tf.reduce_mean(tf.to_float(x > 0))
    self.score_i = tf.sigmoid(i_b + d_layer_3_i)
    self.score_j = tf.sigmoid(j_b + d_layer_3_j)
    self.score_i = tf.reshape(self.score_i, [-1, 1])
    self.score_j = tf.reshape(self.score_j, [-1, 1])
    self.p_and_n = tf.concat([self.score_i, self.score_j], axis=-1)
    print self.p_and_n.get_shape().as_list()


    # Step variable
    self.global_step = tf.Variable(0, trainable=False, name='global_step')
    self.global_epoch_step = \
        tf.Variable(0, trainable=False, name='global_epoch_step')
    self.global_epoch_step_op = \
        tf.assign(self.global_epoch_step, self.global_epoch_step+1)

    self.loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            logits=self.logits,
            labels=self.y)
        )

    trainable_params = tf.trainable_variables()
    self.opt = tf.train.GradientDescentOptimizer(learning_rate=self.lr)
    gradients = tf.gradients(self.loss, trainable_params)
    clip_gradients, _ = tf.clip_by_global_norm(gradients, 5)
    self.train_op = self.opt.apply_gradients(
        zip(clip_gradients, trainable_params), global_step=self.global_step)

Why add another bias at the end?

Hello. Looking at your code, at both training and prediction time — self.logits = i_b + d_layer_3_i and self.logits_sub = tf.sigmoid(item_b[:predict_ads_num] + d_layer_3_sub) — an extra bias is added. TensorFlow's dense layer already has a bias, so why add another one at the final output?
Hoping for your answer.

A question about a detail in the attention function

Hello, I would like to ask about a detail in the attention function. Because each user's hist length differs, the batch is padded to the longest hist in the batch to form a [B, T] matrix, so for a user u whose hist is shorter than T the history becomes a vector like (h1, h2, h3, ... 0, 0, 0), and those trailing 0s end up looking up the embedding of item 0.
So I don't quite understand the line paddings = tf.ones_like(outputs) * (-2 ** 32 + 1) in the attention. I would have thought paddings = tf.zeros_like(outputs) would be more reasonable, unless it is only there to fit the selection in outputs = tf.where(key_masks, outputs, paddings). Hoping someone can explain.
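
A minimal sketch of why the large negative constant is used (assuming the same mask-then-softmax flow as the repo's attention function): masked positions are filled with a very negative number before the softmax, so their weight becomes essentially zero; filling them with 0 instead would still give every padded position a non-zero softmax weight.

    import tensorflow as tf  # assumes TensorFlow 1.x

    def masked_softmax(outputs, key_masks):
        """outputs:   [B, 1, T] raw attention scores
           key_masks: [B, 1, T] True for real behaviors, False for padding."""
        paddings = tf.ones_like(outputs) * (-2 ** 32 + 1)   # ~ -inf for float32
        outputs = tf.where(key_masks, outputs, paddings)    # padded scores -> huge negative
        return tf.nn.softmax(outputs)                       # exp of a huge negative ~ 0 weight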

Dice issue

I noticed that in the Dice module, tf.layers.batch_normalization is left with its default training argument, which is False.
Could you explain why?
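
A minimal sketch (TensorFlow 1.x, hypothetical is_training placeholder) of how the training flag is normally threaded through, including the update ops that batch normalization needs when training=True:

    import tensorflow as tf  # assumes TensorFlow 1.x

    is_training = tf.placeholder(tf.bool, [])

    def bn(x, name):
        # With training=is_training the layer updates its moving mean/variance
        # during training and uses the moving statistics at inference time.
        return tf.layers.batch_normalization(x, name=name, training=is_training)

    # The moving-average updates live in UPDATE_OPS and must run with the train step:
    # update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    # with tf.control_dependencies(update_ops):
    #     train_op = optimizer.minimize(loss)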

about wide_deep model

Hello, Mr. Zhou,

wide part

    d_layer_wide_i = tf.concat([tf.gather(u_emb, [0], axis=-1) * tf.gather(i_emb, [0], axis=-1),
                                tf.gather(u_emb, [-1], axis=-1) * tf.gather(i_emb, [-1], axis=-1),
                                tf.gather(u_emb, [hidden_units // 2], axis=-1) * tf.gather(i_emb,
                                                                                           [hidden_units // 2],
                                                                                           axis=-1)], axis=-1)

error: InvalidArgumentError (see above for traceback): indices[0] = -1 is not in [0, 128)
When I change -1 to 1, it can run.
Why?
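
For what it's worth, a minimal sketch of the likely cause (assuming hidden_units = 128 as in this repo): tf.gather in TF 1.x expects indices in [0, dim) and does not accept negative indices the way Python lists do, so [-1] fails at run time and the last column has to be addressed explicitly.

    import tensorflow as tf  # assumes TensorFlow 1.x

    hidden_units = 128
    u_emb = tf.placeholder(tf.float32, [None, hidden_units])

    # Fails at run time: indices[0] = -1 is not in [0, 128)
    # last_col = tf.gather(u_emb, [-1], axis=-1)

    # Works: address the last column by its explicit index.
    last_col = tf.gather(u_emb, [hidden_units - 1], axis=-1)   # [B, 1]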

About sequence construction in real-world problems

A question: in e-commerce scenarios, besides the user's click sequence there are usually purchase and favorite sequences as well, and sequences are often further split by time into several segments. In that case, should every sequence get its own attention structure in the network, or is a single sequence actually enough?

attention's issue ?

Hi Zhou,
In your original paper, the local activation unit performs an out product between the user's features and the ad's features, as shown in the figure in the paper.
In your public code repo, the attention function, if I'm right, is the implementation of the local activation unit. However, I found that there is no out product operation in this function. If I'm wrong, please tell me where the local activation unit is implemented, or could you explain the main idea of this implementation and what is the equivalent replacement for the out product mentioned in the local activation unit?
I'm looking forward to your reply. Thanks.

About the FM part in the various models

For example, in DeepFM:
d_layer_fm_i = tf.concat([tf.reduce_sum(u_emb*i_emb, axis=-1, keep_dims=True), tf.gather(u_emb, [0], axis=-1) + tf.gather(i_emb, [0], axis=-1)], axis=-1)
Here u_emb and i_emb are multiplied element-wise (and summed) as the second-order interaction term, and u_emb[0] and i_emb[0] serve as the first-order terms.
Can I read this as: there are only two fields here, u and i, and with more fields every pair would need its own second-order interaction?
Also, doesn't using u_emb[0] as the first-order term lose some information? Would using reduce_sum over u_emb be better?

Similarly for PNN:
din_i = tf.concat([u_emb, i_emb, u_emb*i_emb], axis=-1)
For PNN, where the fields are crossed with a product, do the interactions also need to be computed pairwise for every pair when there are multiple fields? (See the sketch below.)
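
A minimal sketch (TensorFlow 1.x, hypothetical field_emb tensor) of the usual trick for the multi-field case raised above: the sum of all pairwise interactions can be computed in O(F·H) with the square-of-sum minus sum-of-squares identity, instead of enumerating the field pairs one by one.

    import tensorflow as tf  # assumes TensorFlow 1.x

    field_emb = tf.placeholder(tf.float32, [None, 5, 64])   # [B, F, H]: F fields, H-dim embeddings

    # sum_{f<g} <v_f, v_g> = 0.5 * ((sum_f v_f)^2 - sum_f v_f^2), summed over the H dimensions
    sum_square = tf.square(tf.reduce_sum(field_emb, axis=1))       # [B, H]
    square_sum = tf.reduce_sum(tf.square(field_emb), axis=1)       # [B, H]
    second_order = 0.5 * tf.reduce_sum(sum_square - square_sum, axis=1, keep_dims=True)  # [B, 1]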

Question about the plain AUC computation

There are two AUCs in the code. The plain one is computed as:
# sort by pred value, from small to big
arr = sorted(raw_arr, key=lambda d:d[2])

auc = 0.0
fp1, tp1, fp2, tp2 = 0.0, 0.0, 0.0, 0.0
for record in arr:
    fp2 += record[0] # noclick
    tp2 += record[1] # click
    auc += (fp2 - fp1) * (tp2 + tp1)
    fp1, tp1 = fp2, tp2

# if all nonclick or click, disgard
threshold = len(arr) - 1e-3
if tp2 > threshold or fp2 > threshold:
    return -0.5

if tp2 * fp2 > 0.0:  # normal auc
    return (1.0 - auc / (2.0 * tp2 * fp2))
else:
    return None

The idea is to accumulate trapezoid areas, but why use 1.0 - auc / (2.0 * tp2 * fp2) rather than auc / (2.0 * tp2 * fp2)?
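
A minimal sketch (pure Python, hypothetical random data) that checks the reason: because the records are sorted by prediction in ascending order, the accumulated trapezoid sum measures the fraction of (positive, negative) pairs where the positive is ranked below the negative; dividing by 2·tp2·fp2 normalizes it, and subtracting from 1 flips it into the usual AUC.

    import random

    def calc_auc(raw_arr):
        """Stripped-down copy of the trapezoid computation quoted above.
        Each record is (noclick, click, pred)."""
        arr = sorted(raw_arr, key=lambda d: d[2])      # ascending by prediction
        auc, fp1, tp1, fp2, tp2 = 0.0, 0.0, 0.0, 0.0, 0.0
        for noclick, click, _ in arr:
            fp2 += noclick
            tp2 += click
            auc += (fp2 - fp1) * (tp2 + tp1)
            fp1, tp1 = fp2, tp2
        # Scanning from the LOWEST score upward accumulates the area for the event
        # "positive ranked below negative", so the usual AUC is its complement.
        return 1.0 - auc / (2.0 * tp2 * fp2)

    def pairwise_auc(raw_arr):
        """Brute force: P(random positive scores higher than random negative)."""
        pos = [p for n, c, p in raw_arr if c]
        neg = [p for n, c, p in raw_arr if n]
        wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
                   for pp in pos for pn in neg)
        return wins / (len(pos) * len(neg))

    random.seed(0)
    data = ([(0, 1, random.random() + 0.2) for _ in range(50)] +
            [(1, 0, random.random()) for _ in range(50)])
    print(round(calc_auc(data), 6), round(pairwise_auc(data), 6))   # the two values match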
