
deepinterestnetwork's People

Contributors

alimama-machine-learning-platform, zhougr1993


deepinterestnetwork's Issues

A small question about the attention function

In the attention function there is din_all = tf.concat([queries, keys, queries-keys, queries*keys], axis=-1). Why expand the features this way? queries-keys can be understood as the gap between the current behavior and a historical behavior, but what is queries*keys for? Hoping someone can explain.
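
For reference, a minimal sketch (assuming TensorFlow 1.x and the tensor shapes used in this repo) of how the four feature groups are built before the attention MLP; queries*keys is the element-wise (Hadamard) product, which gives the small network an explicit similarity signal on top of the raw embeddings and their difference:

    import tensorflow as tf  # assumes TensorFlow 1.x

    def attention_features(queries, keys):
        """Input of the attention MLP.

        queries: [B, T, H] -- candidate ad embedding tiled over the T history steps
        keys:    [B, T, H] -- embeddings of the historical behaviors
        """
        # Raw embeddings, their difference and their element-wise product are
        # concatenated so the MLP can learn both additive and multiplicative
        # interactions between the ad and each historical behavior.
        return tf.concat([queries, keys, queries - keys, queries * keys], axis=-1)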

TypeError: iter() returned non-iterator of type 'DataInput'

When I run the program according to your instructions, I get this error message. My Python version is 3.6.5. In Python 3 the iteration protocol uses the __next__ method rather than the next() method. Hope this helps make the program better.
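
A minimal sketch of one possible fix (the batching details here are simplified and hypothetical): keep the Python 2 style next() method on DataInput and alias it as __next__ so the class also satisfies the Python 3 iterator protocol.

    class DataInput:
        def __init__(self, data, batch_size):
            self.data = data
            self.batch_size = batch_size
            self.i = 0

        def __iter__(self):
            return self

        def next(self):  # original Python 2 style method
            if self.i * self.batch_size >= len(self.data):
                raise StopIteration
            batch = self.data[self.i * self.batch_size:(self.i + 1) * self.batch_size]
            self.i += 1
            return self.i, batch

        # Python 3 looks for __next__; aliasing keeps both versions working.
        __next__ = next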

About the attention implementation

Question 1: The paper's attention mentions an "out product" — does that mean the cross/outer product?

Question 2: The code implements it as tf.concat([queries, keys, queries-keys, queries*keys], axis=-1),
but the paper concatenates queries, keys, and out product(queries, keys). Why does the code differ from the paper?

Error

Hello! I'm running on Windows 10 and get the following error in din/log.txt:

[None, 128]
[None, 128]
[None, 128]
[None, 100, 128]
[None, 2]
Traceback (most recent call last):
File "train.py", line 107, in
print('test_gauc: %.4f\t test_auc: %.4f' % _eval(sess, model))
File "train.py", line 75, in _eval
for _, uij in DataInputTest(test_set, test_batch_size):
TypeError: iter() returned non-iterator of type 'DataInputTest'

Is there a way to fix it?
Thanks in advance!

A risk in 2_remap_id.py

cate_list = [meta_df['categories'][i] for i in range(len(asin_map))]
This is really dangerous if meta_df contains duplicate rows.
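
A minimal sketch of a defensive version (assuming pandas and the asin/categories columns used by the preprocessing scripts, with asin_map mapping each asin to its integer id): drop duplicate ASINs and index the categories by ASIN rather than by row position, so the category list stays aligned with asin_map.

    import pandas as pd

    def build_cate_list(meta_df, asin_map):
        """Return one category per item, aligned with the asin -> index mapping."""
        # Remove duplicate ASIN rows before building the lookup.
        meta = meta_df.drop_duplicates(subset='asin').set_index('asin')
        # Index by ASIN (not by row position) so duplicates or reordering
        # cannot silently misalign items and categories.
        return [meta.loc[asin, 'categories']
                for asin, _ in sorted(asin_map.items(), key=lambda kv: kv[1])]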

Mini-batch Regularization

Hi Guorui, may I ask which part of the code actually implements the parameter regularization? It seems to be missing when the computation model is built in the __init__() function of the din.py file. Waiting for your response. :-)

Question about positive/negative sample construction

1. The test-set labels are also built from the next click as the positive sample plus one randomly generated negative sample, a 1:1 positive/negative construction that easily inflates the AUC; besides, in a real application the next click cannot be known in advance. How should the training samples be constructed in practice?
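
For context, a minimal sketch (hypothetical helper, assuming items are integer ids) of the 1:1 scheme being discussed: the positive is the held-out next click, and the negative is drawn uniformly from items the user never clicked.

    import random

    def build_test_example(hist, next_click, item_count):
        """Return ((pos_item, neg_item), history) with one random negative.

        hist:       item ids the user clicked before the held-out click
        next_click: the held-out positive item id
        item_count: total number of items to sample negatives from
        """
        clicked = set(hist) | {next_click}
        neg = random.randint(0, item_count - 1)
        while neg in clicked:                      # resample until it is truly unclicked
            neg = random.randint(0, item_count - 1)
        return (next_click, neg), hist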

Inconsistency between the DIN code and the paper

@zhougr1993
A question: reading the DIN paper, I found that the activation unit described there differs from this implementation in a few ways:

  1. In the paper the small network's input includes the out product; here it is the inner product.
  2. In the paper the activation function is PReLU; here it is sigmoid.
  3. In the paper the output is not normalized with softmax; here softmax normalization is used.
    Which variant works better, and why were these changes made?

Question about the non-clicked ad variable j in the source code

Hello,
I read the paper [Deep Interest Network for Click-Through Rate Prediction] and downloaded and ran this source code. I have some questions I would like to ask.

In build_dataset.py, test_set is generated as ((reviewerID, hist, (id of the last clicked ad, id of a randomly sampled non-clicked ad))); a single record looks like:

(0, [13179,17993,28326,29247],(62275,5940))

In input.py, j is assigned by

j.append(t[2][1])

so I understand the variable j to be the non-clicked ad.
In the __init__ method of model.py, the forward pass at training time also uses the id of the non-clicked ad given the history:

jc = tf.gather(cate_list, self.j)
j_emb = tf.concat([
tf.nn.embedding_lookup(item_emb_w, self.j),
tf.nn.embedding_lookup(cate_emb_w, jc),
], axis=1)
j_b = tf.gather(item_b, self.j)
din_j = tf.concat([u_emb, j_emb], axis=-1)
din_j = tf.layers.batch_normalization(inputs=din_j, name='b1', reuse=True)
d_layer_1_j = tf.layers.dense(din_j, 80, activation=tf.nn.sigmoid, name='f1', reuse=True)
# d_layer_1_j = dice(d_layer_1_j, name='dice_1_j')
d_layer_2_j = tf.layers.dense(d_layer_1_j, 40, activation=tf.nn.sigmoid, name='f2', reuse=True)
# d_layer_2_j = dice(d_layer_2_j, name='dice_2_j')
d_layer_3_j = tf.layers.dense(d_layer_2_j, 1, activation=None, name='f3', reuse=True)
d_layer_3_i = tf.reshape(d_layer_3_i, [-1])
d_layer_3_j = tf.reshape(d_layer_3_j, [-1])
x = i_b - j_b + d_layer_3_i - d_layer_3_j # [B]
self.logits = i_b + d_layer_3_i

So my questions are:
1. What is the purpose of using the non-clicked ad's information during training? Is it to capture what the user is not interested in?
2. What is the variable x used for?
I hope to get a reply, thanks a lot!
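
For context, a minimal sketch (NumPy, hypothetical names) of what x = i_b - j_b + d_layer_3_i - d_layer_3_j is used for in the quoted code: it is the per-example score margin between the positive ad i and the sampled negative ad j, and tf.reduce_mean(tf.to_float(x > 0)) (the mf_auc node defined later in model.py) estimates the probability that the positive outscores the negative, i.e. a pairwise AUC-style metric.

    import numpy as np

    def pairwise_auc(pos_logits, neg_logits):
        """Fraction of (positive, negative) pairs where the positive scores higher.

        Mirrors mf_auc = tf.reduce_mean(tf.to_float(x > 0)) with
        x = pos_logits - neg_logits computed per example.
        """
        x = np.asarray(pos_logits) - np.asarray(neg_logits)
        return float(np.mean(x > 0))

    # Example: the positive beats the negative in two of three cases -> 2/3
    print(pairwise_auc([2.1, 0.3, 1.5], [1.0, 0.9, 0.2]))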

logits_all issue?

Hi,
My understanding is that the logits_all symbol is used for the real inference scenario, in which a specific user's interactions with all possible items are scored.
If that is true, then the fact that u_emb represents the attention of the user's history with the unrelated item i, and then gets broadcast along all items, seems wrong to me. Instead, I would expect each item to be attended against the user's history.
Appreciate your comments!

about code

Hi, I have some questions about your code.

  1. In your code, the PReLU/Dice parameters are not shared between i_emb and j_emb, which means the parameters in the j_emb branch never get any training.
  2. About the AUC calculation: you concat [i_emb, hist] and [j_emb, hist], but hist comes from the attention between i_emb and h_emb. This is equivalent to giving the model the prior information that hist is closer to i_emb than to j_emb.

Questions about Wide and deep model

After reading your paper, I have a few questions about the wide & deep model and would like to discuss them with you.

    # wide part
    d_layer_wide_i = tf.concat([tf.gather(u_emb, [0], axis=-1) * tf.gather(i_emb, [0], axis=-1),
                                tf.gather(u_emb, [-1], axis=-1) * tf.gather(i_emb, [-1], axis=-1),
                                tf.gather(u_emb, [hidden_units // 2], axis=-1) * tf.gather(i_emb,
                                                                                           [hidden_units // 2],
                                                                                           axis=-1)], axis=-1)

  1. Is this computing cross features?
  2. Didn't the original paper say to build cross features from discrete features?
  3. Why select indices [0], [-1] and [hidden_units // 2]?

About the input: how to combine ID1 with ID1's tag1, tag2, tag3

This question may have nothing to do with this model; I just can't figure it out.
In the input we have the user's historical click ID1, as well as ID1's tag1, tag2 and tag3. How should they be combined? ID1 is easy, just embed it, but there are three tags — do I embed them too and then take the sum (or average)?
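
A minimal sketch (TensorFlow 1.x, hypothetical vocabulary sizes and tensor names) of the usual approach: embed every tag with a shared tag embedding table, pool the tag vectors (sum or mean), and concatenate the pooled tag vector with the ID embedding, just as this repo concatenates the item embedding with its category embedding.

    import tensorflow as tf  # assumes TensorFlow 1.x

    id_emb_w  = tf.get_variable("id_emb_w",  [100000, 64])   # hypothetical vocab sizes
    tag_emb_w = tf.get_variable("tag_emb_w", [5000, 64])

    ids  = tf.placeholder(tf.int32, [None])      # [B]     one clicked ID per behavior
    tags = tf.placeholder(tf.int32, [None, 3])   # [B, 3]  tag1, tag2, tag3 of that ID

    id_emb  = tf.nn.embedding_lookup(id_emb_w, ids)       # [B, 64]
    tag_emb = tf.nn.embedding_lookup(tag_emb_w, tags)     # [B, 3, 64]
    tag_emb = tf.reduce_mean(tag_emb, axis=1)             # [B, 64]  mean pooling (sum also works)

    behavior_emb = tf.concat([id_emb, tag_emb], axis=-1)  # [B, 128] used as one attention key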

What do the inputs i and j in DIN refer to?

In DIN's __init__ code, self.i and self.j are initialized separately:

    self.i = tf.placeholder(tf.int32, [None,]) # [B]
    self.j = tf.placeholder(tf.int32, [None,]) # [B]

What do i and j mean respectively? Are they both historical behaviors, or something else?

About the subtraction in the attention

In the attention there are four feature groups: vec1, vec2, vec1-vec2, vec1*vec2. Since the final weight lies between 0 and 1, much like a logistic regression, I was wondering whether the negated absolute value of vec1-vec2 would be more reasonable. Has this been tried?
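
A minimal sketch of the variant being proposed (assuming the same queries/keys tensors as in the attention function): replace the signed difference with the negated absolute difference, which is symmetric in the two vectors, and compare the two feature sets empirically.

    import tensorflow as tf  # assumes TensorFlow 1.x

    def attention_features_abs(queries, keys):
        """Alternative attention input: -|q - k| instead of the signed q - k."""
        return tf.concat([queries,
                          keys,
                          -tf.abs(queries - keys),   # symmetric, distance-style feature
                          queries * keys], axis=-1)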

Data

Is there a description of the data fields?

about DeepFM model

Hello, nice work! I have some doubts about your DeepFM model.
1. Why do you concat u_emb[0] and i_emb[0] in the d_layer_fm_i node?
2. I didn't see a linear part in the FM part.
3. In the prediction phase, self.score_i is obtained by adding the bias and the DNN part, without the FM part. This really confused me.
Thanks!
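
For comparison, a minimal sketch (TensorFlow 1.x, hypothetical two-field setup mirroring u_emb and i_emb) of a standard FM score, which has an explicit global bias, linear terms and a pairwise interaction; the repo's d_layer_fm_i keeps only a reduced form of these terms.

    import tensorflow as tf  # assumes TensorFlow 1.x

    def fm_score(u_emb, i_emb, u_linear, i_linear, bias):
        """Standard FM with two fields: bias + linear terms + pairwise interaction.

        u_emb, i_emb:       [B, H]  latent factors of the two fields
        u_linear, i_linear: [B]     first-order weights of the two fields
        bias:               scalar  global bias
        """
        interaction = tf.reduce_sum(u_emb * i_emb, axis=-1)   # [B] second-order term
        return bias + u_linear + i_linear + interaction       # [B] raw FM logit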

Question about Dice

Hi Zhou,
Running the code you provide, I found that Dice performs worse than not using it: with the model_dice.py module, epoch 24 gives 0.775; with the model.py module, epoch 24 gives 0.87+.
It seems Dice does not improve the results and is actually worse than not using it.

About dropout in DIN

Applying dropout during training and using the full network at test time is a common technique. Does DIN not need dropout? Has it been tested?
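
A minimal sketch (TensorFlow 1.x, hypothetical is_training placeholder) of how dropout could be added between the fully connected layers of the i-branch; whether it actually helps on this dataset would have to be checked experimentally.

    import tensorflow as tf  # assumes TensorFlow 1.x

    is_training = tf.placeholder(tf.bool, [])   # feed True during training, False for eval

    def fcn_with_dropout(din_i, keep_prob=0.8):
        d1 = tf.layers.dense(din_i, 80, activation=tf.nn.sigmoid, name='f1')
        d1 = tf.layers.dropout(d1, rate=1.0 - keep_prob, training=is_training)
        d2 = tf.layers.dense(d1, 40, activation=tf.nn.sigmoid, name='f2')
        d2 = tf.layers.dropout(d2, rate=1.0 - keep_prob, training=is_training)
        return tf.layers.dense(d2, 1, activation=None, name='f3')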

Applying the DIN model in a real business

Training on business data, the AUC can reach above 0.9, but when calling logits_sub to score all items, the results do not look good by eye and the scores are all extremely close, differing by only about 0.01. Why is that?

About the GAUC computation

Why is the GAUC the mean of
x = i_b - j_b + d_layer_3_i - d_layer_3_j
?

Here x = logits of the positive sample - logits of the negative sample.

Python 2 code seems to be mixed in

din/input.py uses next(), and din/model.py uses print without parentheses, yet the README says python>=3.6.3. Is there some Python 3 mechanism that keeps this compatible with Python 2, or is Python 2 code simply mixed in?

Sorry, I'm a beginner; apologies if there is anything wrong with the question.

Handling dense features in the DIN model

When using the DIN model, if the user's historical behaviors involve dense-feature representations, can they be included in the attention computation? As far as I can see, the model file of DIN does not include dense real-valued features when computing the attention.
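
A minimal sketch (TensorFlow 1.x, hypothetical dense_hist tensor) of one common way to include them: normalize the dense features and concatenate them onto each behavior's key embedding before the attention, so they take part in computing the weights.

    import tensorflow as tf  # assumes TensorFlow 1.x

    # h_emb:      [B, T, H]  id/category embeddings of the historical behaviors
    # dense_hist: [B, T, D]  real-valued features per behavior (price, dwell time, ...)
    h_emb      = tf.placeholder(tf.float32, [None, None, 128])
    dense_hist = tf.placeholder(tf.float32, [None, None, 4])

    dense_norm = tf.layers.batch_normalization(dense_hist)   # scale the dense features
    keys = tf.concat([h_emb, dense_norm], axis=-1)           # [B, T, H + D] attention keys
    # the query (candidate ad embedding) would be projected/padded to the same width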

Random Embedding layers

First of all, thank you for such great work. I am new to this field and have one simple question: as far as I can tell, the embeddings here are initialized randomly and learned during training and testing. Is it fine in practice to use randomly initialized embeddings, or do I have to pretrain the embeddings elsewhere and feed them into the network?
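
A minimal sketch (TensorFlow 1.x) of both options: random initialization learned end to end, as this repo does, versus initializing the same kind of variable from a pretrained matrix; which one works better depends mostly on how much training data is available.

    import numpy as np
    import tensorflow as tf  # assumes TensorFlow 1.x

    item_count, dim = 10000, 64

    # Option 1: random init, learned jointly with the rest of the network (what this repo does).
    item_emb_random = tf.get_variable("item_emb_random", [item_count, dim])

    # Option 2: start from pretrained vectors (e.g. from word2vec over behavior sequences).
    pretrained = np.random.randn(item_count, dim).astype(np.float32)  # stand-in for real vectors
    item_emb_pretrained = tf.get_variable(
        "item_emb_pretrained",
        initializer=pretrained,   # shape is inferred from the initializer
        trainable=True)           # set False to keep the pretrained vectors frozen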

Does the DIN model not use the user features?

@zhougr1993 In the code below, u_emb is never actually used, is it?
u_emb = hist overwrites u_emb here — could you help explain this?

class Model(object):

  def __init__(self, user_count, item_count, cate_count, cate_list):

    self.u = tf.placeholder(tf.int32, [None,]) # [B]
    self.i = tf.placeholder(tf.int32, [None,]) # [B]
    self.j = tf.placeholder(tf.int32, [None,]) # [B]
    self.y = tf.placeholder(tf.float32, [None,]) # [B]
    self.hist_i = tf.placeholder(tf.int32, [None, None]) # [B, T]
    self.sl = tf.placeholder(tf.int32, [None,]) # [B]
    self.lr = tf.placeholder(tf.float64, [])

    hidden_units = 128

    user_emb_w = tf.get_variable("user_emb_w", [user_count, hidden_units])
    item_emb_w = tf.get_variable("item_emb_w", [item_count, hidden_units // 2])
    item_b = tf.get_variable("item_b", [item_count],
                             initializer=tf.constant_initializer(0.0))
    cate_emb_w = tf.get_variable("cate_emb_w", [cate_count, hidden_units // 2])
    cate_list = tf.convert_to_tensor(cate_list, dtype=tf.int64)

    u_emb = tf.nn.embedding_lookup(user_emb_w, self.u)

    ic = tf.gather(cate_list, self.i)
    i_emb = tf.concat(values = [
        tf.nn.embedding_lookup(item_emb_w, self.i),
        tf.nn.embedding_lookup(cate_emb_w, ic),
        ], axis=1)
    i_b = tf.gather(item_b, self.i)

    jc = tf.gather(cate_list, self.j)
    j_emb = tf.concat([
        tf.nn.embedding_lookup(item_emb_w, self.j),
        tf.nn.embedding_lookup(cate_emb_w, jc),
        ], axis=1)
    j_b = tf.gather(item_b, self.j)

    hc = tf.gather(cate_list, self.hist_i)
    h_emb = tf.concat([
        tf.nn.embedding_lookup(item_emb_w, self.hist_i),
        tf.nn.embedding_lookup(cate_emb_w, hc),
        ], axis=2)

    hist = attention(i_emb, h_emb, self.sl)
    #-- attention end ---
    
    hist = tf.layers.batch_normalization(inputs = hist)
    hist = tf.reshape(hist, [-1, hidden_units])
    hist = tf.layers.dense(hist, hidden_units)

    u_emb = hist
    print u_emb.get_shape().as_list()
    print i_emb.get_shape().as_list()
    print j_emb.get_shape().as_list()
    #-- fcn begin -------
    din_i = tf.concat([u_emb, i_emb], axis=-1)
    din_i = tf.layers.batch_normalization(inputs=din_i, name='b1')
    d_layer_1_i = tf.layers.dense(din_i, 80, activation=tf.nn.sigmoid, name='f1')
    #if u want try dice change sigmoid to None and add dice layer like following two lines. You can also find model_dice.py in this folder.
    #d_layer_1_i = tf.layers.dense(din_i, 80, activation=None, name='f1')
    #d_layer_1_i = dice(d_layer_1_i, name='dice_1_i')
    d_layer_2_i = tf.layers.dense(d_layer_1_i, 40, activation=tf.nn.sigmoid, name='f2')
    #d_layer_2_i = dice(d_layer_2_i, name='dice_2_i')
    d_layer_3_i = tf.layers.dense(d_layer_2_i, 1, activation=None, name='f3')
    din_j = tf.concat([u_emb, j_emb], axis=-1)
    din_j = tf.layers.batch_normalization(inputs=din_j, name='b1', reuse=True)
    d_layer_1_j = tf.layers.dense(din_j, 80, activation=tf.nn.sigmoid, name='f1', reuse=True)
    #d_layer_1_j = dice(d_layer_1_j, name='dice_1_j')
    d_layer_2_j = tf.layers.dense(d_layer_1_j, 40, activation=tf.nn.sigmoid, name='f2', reuse=True)
    #d_layer_2_j = dice(d_layer_2_j, name='dice_2_j')
    d_layer_3_j = tf.layers.dense(d_layer_2_j, 1, activation=None, name='f3', reuse=True)
    d_layer_3_i = tf.reshape(d_layer_3_i, [-1])
    d_layer_3_j = tf.reshape(d_layer_3_j, [-1])
    x = i_b - j_b + d_layer_3_i - d_layer_3_j # [B]
    self.logits = i_b + d_layer_3_i
    u_emb_all = tf.expand_dims(u_emb, 1)
    u_emb_all = tf.tile(u_emb_all, [1, item_count, 1])
    # logits for all item:
    all_emb = tf.concat([
        item_emb_w,
        tf.nn.embedding_lookup(cate_emb_w, cate_list)
        ], axis=1)
    all_emb = tf.expand_dims(all_emb, 0)
    all_emb = tf.tile(all_emb, [512, 1, 1])
    din_all = tf.concat([u_emb_all, all_emb], axis=-1)
    din_all = tf.layers.batch_normalization(inputs=din_all, name='b1', reuse=True)
    d_layer_1_all = tf.layers.dense(din_all, 80, activation=tf.nn.sigmoid, name='f1', reuse=True)
    #d_layer_1_all = dice(d_layer_1_all, name='dice_1_all')
    d_layer_2_all = tf.layers.dense(d_layer_1_all, 40, activation=tf.nn.sigmoid, name='f2', reuse=True)
    #d_layer_2_all = dice(d_layer_2_all, name='dice_2_all')
    d_layer_3_all = tf.layers.dense(d_layer_2_all, 1, activation=None, name='f3', reuse=True)
    d_layer_3_all = tf.reshape(d_layer_3_all, [-1, item_count])
    self.logits_all = tf.sigmoid(item_b + d_layer_3_all)
    #-- fcn end -------

    
    self.mf_auc = tf.reduce_mean(tf.to_float(x > 0))
    self.score_i = tf.sigmoid(i_b + d_layer_3_i)
    self.score_j = tf.sigmoid(j_b + d_layer_3_j)
    self.score_i = tf.reshape(self.score_i, [-1, 1])
    self.score_j = tf.reshape(self.score_j, [-1, 1])
    self.p_and_n = tf.concat([self.score_i, self.score_j], axis=-1)
    print self.p_and_n.get_shape().as_list()


    # Step variable
    self.global_step = tf.Variable(0, trainable=False, name='global_step')
    self.global_epoch_step = \
        tf.Variable(0, trainable=False, name='global_epoch_step')
    self.global_epoch_step_op = \
        tf.assign(self.global_epoch_step, self.global_epoch_step+1)

    self.loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            logits=self.logits,
            labels=self.y)
        )

    trainable_params = tf.trainable_variables()
    self.opt = tf.train.GradientDescentOptimizer(learning_rate=self.lr)
    gradients = tf.gradients(self.loss, trainable_params)
    clip_gradients, _ = tf.clip_by_global_norm(gradients, 5)
    self.train_op = self.opt.apply_gradients(
        zip(clip_gradients, trainable_params), global_step=self.global_step)

Why add another bias at the end?

Hello. Looking at your code, at both training and prediction time — self.logits = i_b + d_layer_3_i and self.logits_sub = tf.sigmoid(item_b[:predict_ads_num] + d_layer_3_sub) — an extra bias is added. TensorFlow's dense layer already has a bias, so why add another one at the final output?
Hoping for your answer.

A question about a detail in the attention function

Hello, I would like to ask about a detail in the attention function. Because each user's hist length differs, the batch is padded to the longest hist in the batch to form a [B, T] matrix, so for a user u whose hist is shorter than T the history becomes a vector like (h1, h2, h3, ... 0, 0, 0), and those trailing 0s end up looking up the embedding of item 0.
So I don't quite understand the line paddings = tf.ones_like(outputs) * (-2 ** 32 + 1) in the attention. I would have thought paddings = tf.zeros_like(outputs) would be more reasonable, unless it is only there to fit the selection in outputs = tf.where(key_masks, outputs, paddings). Hoping someone can explain.
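
A minimal sketch of why the large negative constant is used (assuming the same mask-then-softmax flow as the repo's attention function): masked positions are filled with a very negative number before the softmax, so their weight becomes essentially zero; filling them with 0 instead would still give every padded position a non-zero softmax weight.

    import tensorflow as tf  # assumes TensorFlow 1.x

    def masked_softmax(outputs, key_masks):
        """outputs:   [B, 1, T] raw attention scores
           key_masks: [B, 1, T] True for real behaviors, False for padding."""
        paddings = tf.ones_like(outputs) * (-2 ** 32 + 1)   # ~ -inf for float32
        outputs = tf.where(key_masks, outputs, paddings)    # padded scores -> huge negative
        return tf.nn.softmax(outputs)                       # exp of a huge negative ~ 0 weight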

Dice issue

I noticed that in the Dice module, tf.layers.batch_normalization is left with its default training argument, which is False.
Could you explain why?
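
A minimal sketch (TensorFlow 1.x, hypothetical is_training placeholder) of how the training flag is normally threaded through, including the update ops that batch normalization needs when training=True:

    import tensorflow as tf  # assumes TensorFlow 1.x

    is_training = tf.placeholder(tf.bool, [])

    def bn(x, name):
        # With training=is_training the layer updates its moving mean/variance
        # during training and uses the moving statistics at inference time.
        return tf.layers.batch_normalization(x, name=name, training=is_training)

    # The moving-average updates live in UPDATE_OPS and must run with the train step:
    # update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    # with tf.control_dependencies(update_ops):
    #     train_op = optimizer.minimize(loss)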

about wide_deep model

Hello, Mr. Zhou,

wide part

    d_layer_wide_i = tf.concat([tf.gather(u_emb, [0], axis=-1) * tf.gather(i_emb, [0], axis=-1),
                                tf.gather(u_emb, [-1], axis=-1) * tf.gather(i_emb, [-1], axis=-1),
                                tf.gather(u_emb, [hidden_units // 2], axis=-1) * tf.gather(i_emb,
                                                                                           [hidden_units // 2],
                                                                                           axis=-1)], axis=-1)

error: InvalidArgumentError (see above for traceback): indices[0] = -1 is not in [0, 128)
When I change -1 to 1, it can run.
Why?
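
For what it's worth, a minimal sketch of the likely cause (assuming hidden_units = 128 as in this repo): tf.gather in TF 1.x expects indices in [0, dim) and does not accept negative indices the way Python lists do, so [-1] fails at run time and the last column has to be addressed explicitly.

    import tensorflow as tf  # assumes TensorFlow 1.x

    hidden_units = 128
    u_emb = tf.placeholder(tf.float32, [None, hidden_units])

    # Fails at run time: indices[0] = -1 is not in [0, 128)
    # last_col = tf.gather(u_emb, [-1], axis=-1)

    # Works: address the last column by its explicit index.
    last_col = tf.gather(u_emb, [hidden_units - 1], axis=-1)   # [B, 1]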

About sequence construction in real-world problems

A question: in e-commerce scenarios, besides the user's click sequence there are usually purchase and favorite sequences as well, and sequences are often further split by time into several segments. In that case, should every sequence get its own attention structure in the network, or is a single sequence actually enough?

attention's issue ?

Hi Zhou,
In your original paper, the local activation unit performs an out product between the user's features and the ad's features, as shown in the figure in the paper.
In your public code repo, the attention function, if I'm right, is the implementation of the local activation unit. However, I found that there is no out product operation in this function. If I'm wrong, please tell me where the local activation unit is implemented, or could you explain the main idea of this implementation and what is the equivalent replacement for the out product mentioned in the local activation unit?
I'm looking forward to your reply. Thanks.

About the FM part in the various models

For example, in DeepFM:
d_layer_fm_i = tf.concat([tf.reduce_sum(u_emb*i_emb, axis=-1, keep_dims=True), tf.gather(u_emb, [0], axis=-1) + tf.gather(i_emb, [0], axis=-1)], axis=-1)
Here u_emb and i_emb are multiplied element-wise (and summed) as the second-order interaction term, and u_emb[0] and i_emb[0] serve as the first-order terms.
Can I read this as: there are only two fields here, u and i, and with more fields every pair would need its own second-order interaction?
Also, doesn't using u_emb[0] as the first-order term lose some information? Would using reduce_sum over u_emb be better?

Similarly for PNN:
din_i = tf.concat([u_emb, i_emb, u_emb*i_emb], axis=-1)
For PNN, where the fields are crossed with a product, do the interactions also need to be computed pairwise for every pair when there are multiple fields? (See the sketch below.)
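
A minimal sketch (TensorFlow 1.x, hypothetical field_emb tensor) of the usual trick for the multi-field case raised above: the sum of all pairwise interactions can be computed in O(F·H) with the square-of-sum minus sum-of-squares identity, instead of enumerating the field pairs one by one.

    import tensorflow as tf  # assumes TensorFlow 1.x

    field_emb = tf.placeholder(tf.float32, [None, 5, 64])   # [B, F, H]: F fields, H-dim embeddings

    # sum_{f<g} <v_f, v_g> = 0.5 * ((sum_f v_f)^2 - sum_f v_f^2), summed over the H dimensions
    sum_square = tf.square(tf.reduce_sum(field_emb, axis=1))       # [B, H]
    square_sum = tf.reduce_sum(tf.square(field_emb), axis=1)       # [B, H]
    second_order = 0.5 * tf.reduce_sum(sum_square - square_sum, axis=1, keep_dims=True)  # [B, 1]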

Question about the plain AUC computation

There are two AUCs in the code. The plain one is computed as:
# sort by pred value, from small to big
arr = sorted(raw_arr, key=lambda d:d[2])

auc = 0.0
fp1, tp1, fp2, tp2 = 0.0, 0.0, 0.0, 0.0
for record in arr:
    fp2 += record[0] # noclick
    tp2 += record[1] # click
    auc += (fp2 - fp1) * (tp2 + tp1)
    fp1, tp1 = fp2, tp2

# if all nonclick or click, disgard
threshold = len(arr) - 1e-3
if tp2 > threshold or fp2 > threshold:
    return -0.5

if tp2 * fp2 > 0.0:  # normal auc
    return (1.0 - auc / (2.0 * tp2 * fp2))
else:
    return None

The idea is to accumulate trapezoid areas, but why use 1.0 - auc / (2.0 * tp2 * fp2) rather than auc / (2.0 * tp2 * fp2)?
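
A minimal sketch (pure Python, hypothetical random data) that checks the reason: because the records are sorted by prediction in ascending order, the accumulated trapezoid sum measures the fraction of (positive, negative) pairs where the positive is ranked below the negative; dividing by 2·tp2·fp2 normalizes it, and subtracting from 1 flips it into the usual AUC.

    import random

    def calc_auc(raw_arr):
        """Stripped-down copy of the trapezoid computation quoted above.
        Each record is (noclick, click, pred)."""
        arr = sorted(raw_arr, key=lambda d: d[2])      # ascending by prediction
        auc, fp1, tp1, fp2, tp2 = 0.0, 0.0, 0.0, 0.0, 0.0
        for noclick, click, _ in arr:
            fp2 += noclick
            tp2 += click
            auc += (fp2 - fp1) * (tp2 + tp1)
            fp1, tp1 = fp2, tp2
        # Scanning from the LOWEST score upward accumulates the area for the event
        # "positive ranked below negative", so the usual AUC is its complement.
        return 1.0 - auc / (2.0 * tp2 * fp2)

    def pairwise_auc(raw_arr):
        """Brute force: P(random positive scores higher than random negative)."""
        pos = [p for n, c, p in raw_arr if c]
        neg = [p for n, c, p in raw_arr if n]
        wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
                   for pp in pos for pn in neg)
        return wins / (len(pos) * len(neg))

    random.seed(0)
    data = ([(0, 1, random.random() + 0.2) for _ in range(50)] +
            [(1, 0, random.random()) for _ in range(50)])
    print(round(calc_auc(data), 6), round(pairwise_auc(data), 6))   # the two values match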
