attentional_factorization_machine

This is our implementation for the paper:

Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu and Tat-Seng Chua (2017). Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks. IJCAI, Melbourne, Australia, August 19-25, 2017.

We have additionally released our TensorFlow implementation of Factorization Machines under our proposed neural network framework.

Please cite our IJCAI'17 paper if you use our code. Thanks!

Author: Xiangnan He ([email protected]) and Hao Ye ([email protected])

Environments

  • TensorFlow (version: 1.0.1)
  • numpy
  • sklearn

Dataset

We use the same input format as the LibFM toolkit (http://www.libfm.org/). These instructions use the MovieLens data, which has been used for personalized tag recommendation and contains 668,953 tag applications of users on movies. We convert each tag application (user ID, movie ID and tag) to a feature vector using one-hot encoding and obtain 90,445 binary features. The following examples are based on this dataset, which is referred to as ml-tag in file names and in the code. When the dataset is ready, the current directory should look like this:

  • code
    • AFM.py
    • FM.py
    • LoadData.py
  • data
    • ml-tag
      • ml-tag.train.libfm
      • ml-tag.validation.libfm
      • ml-tag.test.libfm
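
As a rough illustration of the input format, the sketch below turns (user ID, movie ID, tag) triples into LibFM-style lines of the form "label index:value ...". It is not the authors' preprocessing script, which is not included here. The helper name to_libfm_lines, the sample IDs, the label value 1 and the 0-based feature numbering are all assumptions made for the example.

```python
def to_libfm_lines(tag_applications, label=1):
    """One-hot encode (user, movie, tag) triples as LibFM-style text lines.

    Hypothetical helper: each distinct user, movie and tag gets its own
    feature index, and every active feature has the value 1.
    """
    feature_index = {}  # maps ("user", id) / ("movie", id) / ("tag", id) -> column
    lines = []
    for user, movie, tag in tag_applications:
        columns = []
        for field, value in (("user", user), ("movie", movie), ("tag", tag)):
            key = (field, value)
            if key not in feature_index:
                feature_index[key] = len(feature_index)
            columns.append(feature_index[key])
        features = " ".join("%d:1" % c for c in columns)
        lines.append("%d %s" % (label, features))
    return lines, feature_index


# Made-up sample data, for illustration only.
sample = [("u1", "m10", "funny"), ("u1", "m20", "dark"), ("u2", "m10", "funny")]
lines, feature_index = to_libfm_lines(sample)
for line in lines:
    print(line)
```

Each line carries exactly three non-zero features, one per field, which matches the one-hot description above.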

Quick Example with Optimal Parameters

Use the following commands to train the models with the optimal parameters:

# step into the code folder
cd code
# train FM model and save as pretrain file
python FM.py --dataset ml-tag --epoch 100 --pretrain -1 --batch_size 4096 --hidden_factor 256 --lr 0.01 --keep 0.7
# train AFM model using the pretrained weights from FM
python AFM.py --dataset ml-tag --epoch 100 --pretrain 1 --batch_size 4096 --hidden_factor [8,256] --keep [1.0,0.5] --lamda_attention 2.0 --lr 0.1

The command-line options are documented in the code (see the parse_args function).
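
List-valued options such as --hidden_factor [8,256] reach the script as a single string. The sketch below shows one way such a string could be turned into a Python list with argparse and ast.literal_eval. It is an illustration and not the repository's parse_args; the helper name parse_list and the defaults are assumptions. Quoting the value, as in '[8,256]', avoids problems in shells that treat square brackets as glob patterns.

```python
import argparse
import ast


def parse_list(text):
    """Parse a string such as "[8,256]" into a Python list (hypothetical helper)."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        raise argparse.ArgumentTypeError("cannot parse %r as a list" % text)
    if not isinstance(value, list):
        raise argparse.ArgumentTypeError("expected a list like [8,256], got %r" % text)
    return value


parser = argparse.ArgumentParser(description="Illustrative parser, not the repository's.")
parser.add_argument("--hidden_factor", type=parse_list, default=[16, 16])
parser.add_argument("--keep", type=parse_list, default=[1.0, 0.5])

# Parse an explicit argument list so the sketch runs without a command line.
args = parser.parse_args(["--hidden_factor", "[8,256]", "--keep", "[1.0,0.5]"])
print(args.hidden_factor, args.keep)
```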

The current implementation supports regression, which optimizes RMSE.
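
For reference, RMSE is the square root of the mean squared difference between predictions and targets. A minimal NumPy sketch follows. The repository may compute the metric differently (sklearn is listed as a dependency), and the function name rmse and the sample values are assumptions.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up targets and predictions, for illustration only.
print(rmse([1.0, -1.0, 1.0], [1.0, 0.0, 0.0]))
```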

Performance Comparison

Parameters

For a quick demonstration of the improvement of our AFM model over the original FM, we set the dimension of the embedding factor to 16 (instead of 256 in our paper) and the number of epochs to 20.

Train

Step into the code folder and train FM and AFM as follows. This trains the FM model first and then trains our AFM model on the ml-tag dataset, starting from the pretrained FM model. The parameters have been set to the optimal values found in our experiments. Training runs for 20 epochs and prints the best epoch according to the validation result.

# step into the code folder
cd code
# train FM model with optimal parameters
python FM.py --dataset ml-tag --epoch 20 --pretrain -1 --batch_size 4096 --hidden_factor 16 --lr 0.01 --keep 0.7
# train AFM model with optimal parameters
python AFM.py --dataset ml-tag --epoch 20 --pretrain 1 --batch_size 4096 --hidden_factor [16,16] --keep [1.0,0.5] --lamda_attention 100.0 --lr 0.1

After training finishes, the trained models are saved in the pretrain folder, which should look like this:

  • pretrain
    • afm_ml-tag_16
      • checkpoint
      • ml-tag_16.data-00000-of-00001
      • ml-tag_16.index
      • ml-tag_16.meta
    • fm_ml-tag_16
      • checkpoint
      • ml-tag_16.data-00000-of-00001
      • ml-tag_16.index
      • ml-tag_16.meta
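
Before running the evaluation step, it can help to confirm that these files exist. The sketch below builds the expected file names from the listing above. The helper name and the assumption that the pretrain folder sits next to the code folder (so that it is reachable as ../pretrain from inside code) are mine, not something the code guarantees.

```python
from pathlib import Path


def expected_checkpoint_files(model, dataset, hidden_factor, pretrain_dir="../pretrain"):
    """Return the checkpoint files expected for one model (hypothetical helper).

    The naming follows the folder listing in this README, e.g.
    pretrain/afm_ml-tag_16/ml-tag_16.meta.
    """
    folder = Path(pretrain_dir) / ("%s_%s_%d" % (model, dataset, hidden_factor))
    prefix = "%s_%d" % (dataset, hidden_factor)
    names = ["checkpoint", prefix + ".data-00000-of-00001", prefix + ".index", prefix + ".meta"]
    return [folder / name for name in names]


for model in ("fm", "afm"):
    for path in expected_checkpoint_files(model, "ml-tag", 16):
        status = "ok" if path.exists() else "missing"
        print("%-8s %s" % (status, path))
```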

Evaluate

Now it's time to evaluate the pretrained models with the test datasets, which can be done by running AFM.py and FM.py with --process evaluate as follows:

# evaluate the pretrained FM model
python FM.py --dataset ml-tag --epoch 20 --batch_size 4096 --lr 0.01 --keep 0.7 --process evaluate
# evaluate the pretrained AFM model
python AFM.py --dataset ml-tag --epoch 20 --pretrain 1 --batch_size 4096 --hidden_factor [16,16] --keep [1.0,0.5] --lamda_attention 100.0 --lr 0.1 --process evaluate

Last Update Date: Aug 2, 2017

attentional_factorization_machine's People

Contributors

hexiangnan, tonyfd

attentional_factorization_machine's Issues

dropout in validation/evaluation

Thank you very much for sharing this code.
Could you explain lines 134 and 143, where the dropout_keep parameter is passed directly to tf.nn.dropout during init_graph?
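
For context, dropout is normally active only during training; at validation or test time the keep probability is set to 1.0 so that no units are dropped. The NumPy sketch below illustrates inverted dropout with that convention. It is a generic illustration and not the repository's graph code, and the function name is an assumption.

```python
import numpy as np


def dropout(x, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob and
    rescale the survivors by 1 / keep_prob so the expected value is unchanged."""
    if keep_prob >= 1.0:
        return x.copy()  # evaluation mode: nothing is dropped
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob


rng = np.random.default_rng(0)
x = np.ones((2, 4))
print(dropout(x, 0.5, rng))  # training: entries are 0.0 or 2.0
print(dropout(x, 1.0, rng))  # evaluation: identical to x
```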

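Glorot initialization

I think using Xavier initialization directly for the network parameters is somewhat problematic, because the network's real input and output shapes are not [2*embed_size, 1] but [n^2*embed_size, n^2/2]. From this point of view, to keep the distributions of the outputs and gradients close to a standard normal, a three-dimensional Xavier initialization over [n^2/2, 2*embed_size, 1] should be used. Of course, from the network's point of view the input and output shapes are still [2*embed_size, 1], so Glorot's method may not fully apply, and a more reasonable initialization may lie somewhere between [2*embed_size, 1] and [n^2/2, 2*embed_size, 1]. The current initialization is prone to NaN, possibly because the variance is too large. This can be crudely controlled by adjusting the temperature, but if the gradients explode within the first few steps, adjusting the temperature cannot rescue training either.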
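
For reference, Glorot (Xavier) normal initialization draws weights with variance 2 / (fan_in + fan_out). The sketch below only evaluates that standard deviation for a two-dimensional shape with fan_in = 2 × embed_size and fan_out = 1, which is my reading of the shape discussed in this issue, with embed_size = 256 as an example value. It does not implement the three-dimensional variant proposed here.

```python
import math


def glorot_std(fan_in, fan_out):
    """Standard deviation of Glorot (Xavier) normal initialization."""
    return math.sqrt(2.0 / (fan_in + fan_out))


embed_size = 256  # example value only
print(glorot_std(2 * embed_size, 1))
```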

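Support for real-valued features

Hi, I have read your code. It seems that real-valued features are not supported and feature values are required to be 0/1, but FM in theory supports real-valued features. What is the reason for this design?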
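
For reference, the FM second-order term is defined for arbitrary real feature values x_i: it is the sum over pairs i < j of <v_i, v_j> x_i x_j. The NumPy sketch below evaluates that term with the usual linear-time identity and checks it against the explicit pairwise sum. It illustrates the FM formula only and is independent of this repository's code; the function names and the random test values are assumptions.

```python
import numpy as np


def fm_pairwise(V, x):
    """Second-order FM term: sum over i < j of <V[i], V[j]> * x[i] * x[j].

    V has shape (n_features, k); x has shape (n_features,) and may hold any
    real values, not only 0/1.
    """
    xv = V * x[:, None]                      # scale each embedding by its feature value
    sum_sq = np.sum(xv, axis=0) ** 2         # (sum_i x_i v_i)^2, per latent dimension
    sq_sum = np.sum(xv ** 2, axis=0)         # sum_i (x_i v_i)^2, per latent dimension
    return 0.5 * float(np.sum(sum_sq - sq_sum))


def fm_pairwise_naive(V, x):
    """Same quantity computed with an explicit double loop, for checking."""
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            total += float(V[i] @ V[j]) * x[i] * x[j]
    return total


rng = np.random.default_rng(0)
V = rng.normal(size=(5, 3))
x = np.array([0.5, 0.0, 2.0, -1.5, 1.0])     # real-valued features
print(np.isclose(fm_pairwise(V, x), fm_pairwise_naive(V, x)))
```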

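Unable to reproduce the paper's experimental results for AFM

Hello, Professor He! When using this code, I can reproduce the paper's results for FM. For AFM, however, the best RMSE I have obtained on the validation set so far is 0.4639, with the parameters: --epoch 100 --pretrain 1 --batch_size 4096 --hidden_factor [8,256] --keep [1.0,0.5] --lamda_attention 2.0 --lr 0.01 --batch_norm 0, whereas the best RMSE reported in the paper is around 0.433. Is there anything I am doing wrong?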

Bug in reduce_sum and softmax?

            self.attention_relu = tf.reduce_sum(tf.multiply(self.weights['attention_p'], tf.nn.relu(self.attention_mul + \
                self.weights['attention_b'])), 2, keep_dims=True) # None * (M'*(M'-1)) * 1
            self.attention_out = tf.nn.softmax(self.attention_relu)

I think keep_dims should be False.

If keep_dims is True, the shape of attention_relu is (batch, m*(m-1)/2, 1).
The subsequent softmax then normalizes over the last axis of attention_relu, which is wrong.
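
The concern can be checked numerically. With an array of shape (batch, pairs, 1), a softmax over the last axis normalizes over a single element and returns all ones, whereas a softmax over the pairs axis gives weights that sum to 1 for each example. The NumPy sketch below uses made-up scores and a hand-written softmax; it does not run the repository's TensorFlow graph.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)


rng = np.random.default_rng(0)
scores = rng.normal(size=(2, 6, 1))   # (batch, number of feature pairs, 1)

last_axis = softmax(scores, axis=-1)  # normalizes over the size-1 axis
pair_axis = softmax(scores, axis=1)   # normalizes over the feature pairs

print(last_axis[0, :, 0])             # every weight is 1.0
print(pair_axis[0, :, 0].sum())       # weights of one example sum to 1
```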

Question about pretraining with FM

Thank you very much for sharing!
After reading your paper and code, I have a small question.
The paper states: For Wide&Deep, DeepCross and AFM, we find that pre-training their feature embeddings with FM leads to a lower RMSE than a random initialization. As such, we report their performance with pre-training.
How exactly is this pretraining done on the command line?
After entering the AFM.py command as described in the README, an error is raised because the pretrained model cannot be found. Even after placing the FM pretrained model into the pretrain/afm_ml-tag_16 folder, it cannot be used because the parameters in the meta file do not match. How should I handle this?
(Also, since FM's default is hidden_factor=16, should --hidden_factor [8,256] in the AFM command be changed to --hidden_factor [16,16]?)

# train FM model and save as pretrain file
python FM.py --dataset ml-tag --epoch 100 --pretrain -1 --batch_size 4096 --lr 0.01 --keep 0.7
# train AFM model using the pretrained weights from FM
python AFM.py --dataset ml-tag --epoch 100 --pretrain 1 --batch_size 4096 --hidden_factor [8,256] --keep [1.0,0.5] --lamda_attention 2.0 --lr 0.1

File cannot be opened

Hey, the link returns a 404 error. How can the data you mentioned be obtained?

Data preprocessing code

Professor He, how is the data preprocessing implemented (how are the train, validation and test files produced)? Could you share the code?
Email: [email protected]

OOM Error when I am trying to run FM on dataset ml-tag for the best performance

Here is my command:
python FM.py --dataset ml-tag --epoch 100 --pretrain -1 --batch_size 4096 --hidden_factor 256 --lr 0.01 --keep 0.7

But I got an Out Of Memory error as follows:
OOM when allocating tensor with shape[1404801,256]

What's strange is that when I run AFM, the command works just fine. The command with the frappe dataset works fine as well.
python AFM.py --dataset ml-tag --epoch 100 --pretrain 2 --batch_size 4096 --hidden_factor [256,256] --keep [1.0,0.5] --lamda_attention 100.0 --lr 0.1

I looked it up and found that 1404801 is the number of training samples. I think the cause is this line of code:
self.summed_features_emb = tf.reduce_sum(self.nonzero_embeddings, 1, keep_dims=True) # None * 1 * K

I am still trying to figure out what went wrong. Could someone give me a clue?
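
For scale, the size of a tensor with the reported shape can be estimated directly. The sketch assumes 4-byte float32 elements, which is TensorFlow's default float type but is not confirmed by the error message; the function name is an assumption.

```python
def tensor_size_gib(shape, bytes_per_element=4):
    """Memory needed for a dense tensor, in GiB (float32 assumed by default)."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element / 2 ** 30


print("%.2f GiB" % tensor_size_gib([1404801, 256]))
```

Under that assumption a single copy of the tensor needs about 1.34 GiB, and intermediate results of the same shape would each need as much again.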

Difference between FM and AFM is not as pronounced as in the original paper

Hi, first of all, thanks for sharing the code of your AFM implementation. We tried to replicate the results from the original paper and executed the code exactly as given in this GitHub repository.

However, after 20 epochs of training, the difference between FM and AFM on the MovieLens dataset was not as large as shown in Figure 5 of the paper. Here are the training results for FM and AFM:

  • python FM.py --dataset ml-tag --epoch 20 --pretrain -1 --batch_size 4096 --hidden_factor 256 --lr 0.01 --keep 0.7
    Best Iter(validation)= 17 train = 0.0838, valid = 0.4855 [891.8 s]

  • python AFM.py --dataset ml-tag --epoch 20 --pretrain 1 --batch_size 4096 --hidden_factor [8,256] --keep [1.0,0.5] --lamda_attention 2.0 --lr 0.1
    Best Iter(validation)= 20 train = 0.1038, valid = 0.4780 [1587.2 s]

In Figure 5 of the paper, the RMSE of AFM after 20 epochs is below 0.45.

We also tried to reimplement the AFM model. With the help of the details given in this GitHub repository, we were already able to improve the performance of the AFM model significantly. Did we miss any other details, or is the code in this GitHub repository different from the code used in the paper?

We would be very thankful for your help, since we are currently trying to include the AFM model in our production pipeline.

About the training script

Thanks for sharing.
I found two issues while trying to run the training script (at Performance Comparison->Train):

python FM.py --dataset ml-tag --epoch 20 --pretrain -1 --batch_size 4096 --lr 0.01 --keep 0.7

It seems that an additional argument, --hidden_factor 16, is required to match the description "we set the dimension of the embedding factor to be 16".

And

python AFM.py --dataset ml-tag --epoch 20 --pretrain 1 --batch_size 4096 --hidden_factor [16,16] --keep [1.0,0.5] --lamda_attention 100.0 --lr 0.1

This script does not save the model to file, which causes OSError: File ../pretrain/afm_ml-tag_16/ml-tag_16.meta does not exist. during evaluation. After I changed --pretrain 1 to --pretrain 2, it worked fine.

Are these modifications correct, or am I misreading something?

Why is there little improvement from the attention model in my experiment?

I tried the quick example you gave:


python FM.py --dataset ml-tag --epoch 20 --pretrain -1 --batch_size 4096 --hidden_factor 16 --lr 0.01 --keep 0.7

#params: 1537566
Init: train=1.0000, validation=1.0000 [52.3 s]
Epoch 1 [6.2 s] train=0.5402, validation=0.6012 [10.7 s]
Epoch 2 [7.2 s] train=0.4903, validation=0.5775 [12.5 s]
Epoch 3 [6.9 s] train=0.4604, validation=0.5673 [11.3 s]
Epoch 4 [6.2 s] train=0.4376, validation=0.5611 [5.9 s]
Epoch 5 [6.8 s] train=0.4201, validation=0.5572 [5.2 s]
Epoch 6 [6.8 s] train=0.4054, validation=0.5538 [5.3 s]
Epoch 7 [6.7 s] train=0.3932, validation=0.5513 [5.3 s]
Epoch 8 [7.0 s] train=0.3831, validation=0.5497 [6.0 s]
Epoch 9 [6.3 s] train=0.3743, validation=0.5483 [5.8 s]
Epoch 10 [6.4 s] train=0.3668, validation=0.5471 [6.1 s]
Epoch 11 [7.0 s] train=0.3599, validation=0.5461 [5.4 s]
Epoch 12 [6.9 s] train=0.3538, validation=0.5456 [5.3 s]
Epoch 13 [6.8 s] train=0.3485, validation=0.5446 [6.0 s]
Epoch 14 [6.1 s] train=0.3439, validation=0.5440 [6.0 s]
Epoch 15 [6.2 s] train=0.3398, validation=0.5435 [5.7 s]
Epoch 16 [6.8 s] train=0.3358, validation=0.5432 [5.3 s]
Epoch 17 [6.7 s] train=0.3323, validation=0.5427 [5.9 s]
Epoch 18 [6.1 s] train=0.3289, validation=0.5424 [5.9 s]
Epoch 19 [6.4 s] train=0.3257, validation=0.5420 [5.8 s]
Epoch 20 [6.2 s] train=0.3230, validation=0.5416 [5.7 s]
Save model to file as pretrain.
Best Iter(validation)= 20 train = 0.3230, valid = 0.5416 [353.0 s]


python AFM.py --dataset ml-tag --epoch 20 --pretrain 1 --batch_size 4096 --hidden_factor [16,16] --keep [1.0,0.5] --lamda_attention 100.0 --lr 0.1

#params: 1537870
Init: train=0.8130, validation=0.8226 [7.9 s]
Epoch 1 [7.5 s] train=0.5058, validation=0.5768 [9.2 s]
Epoch 2 [8.2 s] train=0.4604, validation=0.5640 [11.3 s]
Epoch 3 [8.4 s] train=0.4319, validation=0.5570 [9.4 s]
Epoch 4 [7.8 s] train=0.4121, validation=0.5529 [7.2 s]
Epoch 5 [8.5 s] train=0.3975, validation=0.5504 [7.3 s]
Epoch 6 [8.5 s] train=0.3850, validation=0.5480 [7.9 s]
Epoch 7 [8.7 s] train=0.3754, validation=0.5464 [10.4 s]
Epoch 8 [8.0 s] train=0.3674, validation=0.5457 [8.3 s]
Epoch 9 [9.1 s] train=0.3619, validation=0.5446 [8.9 s]
Epoch 10 [9.0 s] train=0.3568, validation=0.5436 [9.8 s]
Epoch 11 [8.6 s] train=0.3521, validation=0.5431 [10.2 s]
Epoch 12 [9.6 s] train=0.3480, validation=0.5429 [10.8 s]
Epoch 13 [8.6 s] train=0.3443, validation=0.5423 [9.5 s]
Epoch 14 [7.8 s] train=0.3407, validation=0.5415 [7.6 s]
Epoch 15 [7.9 s] train=0.3388, validation=0.5413 [7.3 s]
Epoch 16 [8.2 s] train=0.3358, validation=0.5412 [7.4 s]
Epoch 17 [8.3 s] train=0.3339, validation=0.5405 [16.8 s]
Epoch 18 [7.6 s] train=0.3315, validation=0.5405 [7.5 s]
Epoch 19 [8.4 s] train=0.3305, validation=0.5402 [7.0 s]
Epoch 20 [8.4 s] train=0.3278, validation=0.5398 [10.9 s]
Best Iter(validation)= 20 train = 0.3278, valid = 0.5398 [378.1 s]


This is only a 0.0018 improvement in RMSE.
I then ran an experiment on the Frappe dataset. For FM, I set the parameters according to the paper; for AFM, I used the parameters you gave in another question.


python FM.py --dataset frappe --epoch 100 --pretrain -1 --batch_size 128 --hidden_factor 256 --lr 0.01 --keep 0.8
#params: 1383175
Init: train=0.9998, validation=0.9998 [42.3 s]
Epoch 1 [6.3 s] train=0.3417, validation=0.4357 [3.0 s]
Epoch 2 [6.1 s] train=0.2542, validation=0.3977 [1.6 s]
Epoch 3 [6.0 s] train=0.2135, validation=0.3938 [1.7 s]
Epoch 4 [6.1 s] train=0.1656, validation=0.3652 [1.7 s]
Epoch 5 [8.0 s] train=0.1428, validation=0.3566 [1.8 s]
Epoch 6 [6.5 s] train=0.1292, validation=0.3603 [1.7 s]
Epoch 7 [6.4 s] train=0.1181, validation=0.3590 [1.8 s]
Epoch 8 [6.5 s] train=0.1056, validation=0.3529 [1.6 s]
Epoch 9 [6.4 s] train=0.0977, validation=0.3509 [1.7 s]
Epoch 10 [6.5 s] train=0.0927, validation=0.3490 [1.7 s]
Epoch 11 [6.3 s] train=0.0891, validation=0.3493 [1.5 s]
Epoch 12 [6.3 s] train=0.0834, validation=0.3491 [1.7 s]
Epoch 13 [7.6 s] train=0.0799, validation=0.3470 [1.9 s]
Epoch 14 [6.3 s] train=0.0789, validation=0.3483 [1.8 s]
Epoch 15 [6.5 s] train=0.0740, validation=0.3425 [1.6 s]
Epoch 16 [6.0 s] train=0.0762, validation=0.3460 [1.7 s]
Epoch 17 [6.0 s] train=0.0719, validation=0.3460 [1.8 s]
Epoch 18 [6.0 s] train=0.0710, validation=0.3436 [1.7 s]
Epoch 19 [6.1 s] train=0.0671, validation=0.3430 [1.8 s]
Epoch 20 [6.3 s] train=0.0665, validation=0.3448 [1.8 s]
....
Epoch 90 [6.4 s] train=0.0416, validation=0.3319 [1.6 s]
Epoch 91 [6.3 s] train=0.0413, validation=0.3318 [1.7 s]
Epoch 92 [6.5 s] train=0.0412, validation=0.3331 [1.7 s]
Epoch 93 [6.8 s] train=0.0413, validation=0.3318 [1.8 s]
Epoch 94 [6.8 s] train=0.0427, validation=0.3316 [1.6 s]
Epoch 95 [7.3 s] train=0.0409, validation=0.3323 [1.9 s]
Epoch 96 [7.8 s] train=0.0404, validation=0.3315 [1.8 s]
Epoch 97 [8.2 s] train=0.0418, validation=0.3315 [1.6 s]
Epoch 98 [6.9 s] train=0.0407, validation=0.3318 [1.8 s]
Epoch 99 [6.7 s] train=0.0412, validation=0.3314 [1.7 s]
Epoch 100 [8.0 s] train=0.0399, validation=0.3311 [2.2 s]
Save model to file as pretrain.
Best Iter(validation)= 89 train = 0.0403, valid = 0.3311 [876.8 s]


python AFM.py --keep '[1.0,0.8]' --lamda_attention 16 --hidden_factor '[256,256]' --batch_size 128 --dataset frappe --pretrain 1 --epoch 100 --valid_dimen 10 --lr 0.015

Init: train=0.7971, validation=0.8043 [7.8 s]
Epoch 1 [16.7 s] train=0.3162, validation=0.4117 [6.6 s]
Epoch 2 [16.2 s] train=0.2379, validation=0.3756 [6.5 s]
Epoch 3 [16.9 s] train=0.1935, validation=0.3594 [7.8 s]
Epoch 4 [17.4 s] train=0.1625, validation=0.3507 [6.6 s]
Epoch 5 [18.5 s] train=0.1399, validation=0.3454 [9.0 s]
Epoch 6 [18.7 s] train=0.1243, validation=0.3434 [6.4 s]
Epoch 7 [16.2 s] train=0.1121, validation=0.3414 [6.5 s]
Epoch 8 [16.4 s] train=0.1022, validation=0.3397 [6.4 s]
Epoch 9 [17.4 s] train=0.0935, validation=0.3384 [7.1 s]
Epoch 10 [16.3 s] train=0.0874, validation=0.3365 [7.8 s]
Epoch 11 [19.0 s] train=0.0806, validation=0.3365 [6.8 s]
Epoch 12 [16.2 s] train=0.0761, validation=0.3362 [6.3 s]
Epoch 13 [16.1 s] train=0.0721, validation=0.3352 [6.4 s]
Epoch 14 [16.1 s] train=0.0690, validation=0.3356 [6.3 s]
Epoch 15 [16.1 s] train=0.0652, validation=0.3353 [6.3 s]
Epoch 16 [16.2 s] train=0.0629, validation=0.3351 [6.4 s]
Epoch 17 [16.2 s] train=0.0613, validation=0.3340 [6.3 s]
Epoch 18 [16.2 s] train=0.0587, validation=0.3347 [6.4 s]
Epoch 19 [16.2 s] train=0.0578, validation=0.3338 [6.3 s]
Epoch 20 [16.2 s] train=0.0561, validation=0.3331 [6.6 s]
Epoch 21 [17.0 s] train=0.0540, validation=0.3344 [7.8 s]
Epoch 22 [17.5 s] train=0.0524, validation=0.3333 [8.8 s]
Epoch 23 [17.6 s] train=0.0514, validation=0.3336 [6.6 s]
Epoch 24 [16.2 s] train=0.0501, validation=0.3331 [6.4 s]
Epoch 25 [16.2 s] train=0.0501, validation=0.3339 [6.6 s]
Epoch 26 [16.2 s] train=0.0493, validation=0.3328 [6.5 s]
Epoch 27 [16.8 s] train=0.0483, validation=0.3337 [7.0 s]
Epoch 28 [18.0 s] train=0.0475, validation=0.3327 [6.4 s]
Epoch 29 [16.5 s] train=0.0471, validation=0.3331 [9.7 s]
Epoch 30 [19.3 s] train=0.0466, validation=0.3328 [6.3 s]
Epoch 31 [16.2 s] train=0.0461, validation=0.3326 [6.6 s]
Epoch 32 [16.2 s] train=0.0452, validation=0.3328 [6.5 s]
Epoch 33 [16.2 s] train=0.0447, validation=0.3325 [6.2 s]
Epoch 34 [16.2 s] train=0.0447, validation=0.3329 [6.3 s]
Epoch 35 [16.2 s] train=0.0437, validation=0.3324 [6.5 s]
Epoch 36 [16.2 s] train=0.0436, validation=0.3322 [6.4 s]
Epoch 37 [16.4 s] train=0.0429, validation=0.3317 [6.3 s]
Epoch 38 [16.2 s] train=0.0420, validation=0.3321 [7.2 s]
Epoch 39 [17.4 s] train=0.0425, validation=0.3317 [6.3 s]
Epoch 40 [18.8 s] train=0.0418, validation=0.3315 [8.3 s]
Epoch 41 [18.9 s] train=0.0411, validation=0.3317 [7.0 s]
Epoch 42 [16.3 s] train=0.0412, validation=0.3316 [6.5 s]
Epoch 43 [16.2 s] train=0.0404, validation=0.3317 [6.5 s]
Epoch 44 [16.3 s] train=0.0409, validation=0.3319 [6.6 s]
Epoch 45 [16.4 s] train=0.0408, validation=0.3316 [6.6 s]
Epoch 46 [16.4 s] train=0.0396, validation=0.3313 [6.4 s]
Epoch 47 [16.2 s] train=0.0395, validation=0.3313 [6.3 s]
Epoch 48 [16.1 s] train=0.0390, validation=0.3316 [6.2 s]
Epoch 49 [16.1 s] train=0.0387, validation=0.3310 [6.3 s]
Epoch 50 [16.1 s] train=0.0399, validation=0.3313 [6.4 s]
Epoch 51 [16.1 s] train=0.0387, validation=0.3312 [6.2 s]
Epoch 52 [16.0 s] train=0.0390, validation=0.3317 [6.2 s]
Epoch 53 [16.0 s] train=0.0387, validation=0.3317 [6.2 s]
Epoch 54 [16.3 s] train=0.0384, validation=0.3310 [6.2 s]
Epoch 55 [16.0 s] train=0.0379, validation=0.3311 [6.5 s]
Epoch 56 [16.3 s] train=0.0384, validation=0.3310 [6.3 s]
Epoch 57 [16.0 s] train=0.0383, validation=0.3311 [6.3 s]
Epoch 58 [16.0 s] train=0.0373, validation=0.3311 [6.5 s]
Epoch 59 [16.1 s] train=0.0374, validation=0.3306 [6.2 s]
Epoch 60 [16.1 s] train=0.0371, validation=0.3308 [6.3 s]
Epoch 61 [16.1 s] train=0.0372, validation=0.3313 [6.4 s]
Epoch 62 [16.1 s] train=0.0366, validation=0.3305 [6.3 s]
Epoch 63 [16.2 s] train=0.0368, validation=0.3308 [6.3 s]
Epoch 64 [16.3 s] train=0.0365, validation=0.3307 [6.6 s]
Epoch 65 [18.0 s] train=0.0365, validation=0.3304 [6.7 s]
Epoch 66 [19.3 s] train=0.0362, validation=0.3309 [8.2 s]
Epoch 67 [16.5 s] train=0.0359, validation=0.3302 [6.2 s]
Epoch 68 [16.1 s] train=0.0360, validation=0.3303 [6.4 s]
Epoch 69 [16.0 s] train=0.0359, validation=0.3306 [6.2 s]
Epoch 70 [16.2 s] train=0.0360, validation=0.3303 [6.2 s]
Epoch 71 [16.4 s] train=0.0358, validation=0.3304 [6.2 s]
Epoch 72 [16.3 s] train=0.0359, validation=0.3305 [6.4 s]
Epoch 73 [16.3 s] train=0.0359, validation=0.3299 [6.1 s]
Epoch 74 [16.2 s] train=0.0354, validation=0.3303 [6.3 s]
Epoch 75 [16.2 s] train=0.0356, validation=0.3303 [6.6 s]
Epoch 76 [16.5 s] train=0.0357, validation=0.3305 [6.6 s]
Epoch 77 [16.2 s] train=0.0351, validation=0.3302 [6.4 s]
Epoch 78 [16.2 s] train=0.0351, validation=0.3296 [6.2 s]
Epoch 79 [16.1 s] train=0.0345, validation=0.3301 [6.2 s]
Epoch 80 [16.1 s] train=0.0342, validation=0.3297 [6.5 s]
Epoch 81 [16.3 s] train=0.0354, validation=0.3299 [6.3 s]
Epoch 82 [16.1 s] train=0.0349, validation=0.3300 [6.2 s]
Epoch 83 [16.1 s] train=0.0344, validation=0.3298 [6.5 s]
Epoch 84 [16.2 s] train=0.0342, validation=0.3296 [6.3 s]
Epoch 85 [16.1 s] train=0.0341, validation=0.3295 [6.3 s]
Epoch 86 [16.0 s] train=0.0341, validation=0.3297 [6.3 s]
Epoch 87 [16.3 s] train=0.0343, validation=0.3299 [6.5 s]
Epoch 88 [16.2 s] train=0.0338, validation=0.3297 [6.3 s]
Epoch 89 [16.1 s] train=0.0335, validation=0.3295 [6.1 s]
Epoch 90 [16.0 s] train=0.0339, validation=0.3294 [6.4 s]
Epoch 91 [16.4 s] train=0.0339, validation=0.3296 [6.4 s]
Epoch 92 [16.1 s] train=0.0332, validation=0.3294 [6.2 s]
Epoch 93 [16.1 s] train=0.0333, validation=0.3295 [6.5 s]
Epoch 94 [16.3 s] train=0.0329, validation=0.3293 [6.3 s]
Epoch 95 [16.0 s] train=0.0335, validation=0.3292 [6.2 s]
Epoch 96 [16.9 s] train=0.0335, validation=0.3295 [6.4 s]
Epoch 97 [17.6 s] train=0.0330, validation=0.3297 [6.2 s]
Epoch 98 [16.0 s] train=0.0330, validation=0.3295 [6.3 s]
Epoch 99 [16.0 s] train=0.0335, validation=0.3296 [6.4 s]
Epoch 100 [16.2 s] train=0.0334, validation=0.3293 [7.4 s]
Best Iter(validation)= 95 train = 0.0335, valid = 0.3292 [2333.9 s]


In this experiment, the improvement in RMSE is still only 0.0019.

In fact, I also tried adding regularization (lambda 0.1) to the FM model to avoid overfitting and got an RMSE of 0.1318 on the training data and 0.3468 on the test data. Although AFM got an RMSE of 0.3372 on the test data (an improvement of 0.0096), this is still far from our expectation.

I am wondering why the result of AFM relies on a good pre-trained FM model, and what kind of FM model is suitable for obtaining such an improvement.

Thanks.
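
Comparisons like the ones above can be made less error-prone by extracting the best validation RMSE from each log automatically. The sketch below assumes the log line format shown in this issue (Epoch N [..] train=..., validation=... [..]); the function name and the regular expression are assumptions, not part of the repository.

```python
import re

EPOCH_LINE = re.compile(r"Epoch\s+(\d+)\s+\[.*?\]\s+train=([\d.]+),\s+validation=([\d.]+)")


def best_validation(log_text):
    """Return (epoch, validation RMSE) for the lowest validation RMSE in a log."""
    results = [
        (int(m.group(1)), float(m.group(3)))
        for m in EPOCH_LINE.finditer(log_text)
    ]
    return min(results, key=lambda item: item[1])


# Last epochs of the two ml-tag runs quoted above.
fm_log = """
Epoch 19 [6.4 s] train=0.3257, validation=0.5420 [5.8 s]
Epoch 20 [6.2 s] train=0.3230, validation=0.5416 [5.7 s]
"""
afm_log = """
Epoch 19 [8.4 s] train=0.3305, validation=0.5402 [7.0 s]
Epoch 20 [8.4 s] train=0.3278, validation=0.5398 [10.9 s]
"""

fm_epoch, fm_rmse = best_validation(fm_log)
afm_epoch, afm_rmse = best_validation(afm_log)
print("FM best: epoch %d, %.4f" % (fm_epoch, fm_rmse))
print("AFM best: epoch %d, %.4f" % (afm_epoch, afm_rmse))
print("absolute improvement: %.4f" % (fm_rmse - afm_rmse))
```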

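NaN during AFM training

AFM training runs into NaN very easily. Have you encountered this? Do you have any advice on tuning the parameters? Which parameters are the most sensitive?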

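Question about attention_score and interaction_score

Hello, I have a question:

The final attention_score represents the importance of a feature interaction, but what does interaction_score represent? The two scores are multiplied in the end, and interaction_score can be negative.

For example, in the screenshot from the paper below, the interaction_score values for user-tag and user-item are both negative. So, in the end, should the importance of a feature interaction be judged by the size of attention_score, or by which attention_score*interaction_score is larger?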

image
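
For orientation, the AFM paper defines the pairwise part of the prediction as the sum over feature pairs of a_ij * p^T (v_i ⊙ v_j) x_i x_j, where the attention weights a_ij come from a softmax and are therefore positive, while p^T (v_i ⊙ v_j) can take either sign. The NumPy sketch below computes both quantities for random parameters. It assumes that attention_score and interaction_score in the question correspond to a_ij and p^T (v_i ⊙ v_j); all variable names and sizes are illustrative.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
m, k, t = 3, 4, 5                      # active features, embedding size, attention size
V = rng.normal(size=(m, k))            # embeddings of the active (x_i = 1) features
W = rng.normal(size=(t, k))            # attention network weights
b = rng.normal(size=t)                 # attention network bias
h = rng.normal(size=t)                 # attention projection vector
p = rng.normal(size=k)                 # prediction projection vector

pairs = list(combinations(range(m), 2))
interactions = np.array([V[i] * V[j] for i, j in pairs])    # element-wise products

logits = np.maximum(interactions @ W.T + b, 0.0) @ h         # h^T ReLU(W (v_i * v_j) + b)
attention = np.exp(logits - logits.max())
attention /= attention.sum()                                 # softmax over the pairs

interaction_score = interactions @ p                         # p^T (v_i * v_j), any sign
contribution = attention * interaction_score                 # each pair's share of the prediction

for pair, a, s, c in zip(pairs, attention, interaction_score, contribution):
    print(pair, "attention=%.3f interaction=%+.3f product=%+.3f" % (a, s, c))
```

Under this reading, the pairwise part of the prediction is contribution.sum(): a pair's effect on the output depends on the product, while a_ij alone only says how much weight the pair receives.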

Overfitting or not?

python AFM.py --dataset ml-tag --epoch 20 --pretrain 0 --batch_size 4096 --hidden_factor '[16,16]' --keep '[1.0,0.5]' --lamda_attention 100.0 --lr 0.1
#params: 1537870

Init: 	 train=1.0000, validation=1.0000 [3.2 s]
Epoch 1 [4.3 s]	train=0.5210, validation=0.5746 [4.4 s]
Epoch 2 [4.4 s]	train=0.4683, validation=0.5514 [4.3 s]
Epoch 3 [4.8 s]	train=0.4326, validation=0.5401 [4.5 s]
Epoch 4 [5.0 s]	train=0.4055, validation=0.5321 [4.7 s]
Epoch 5 [5.4 s]	train=0.3874, validation=0.5274 [5.2 s]
Epoch 6 [5.0 s]	train=0.3712, validation=0.5232 [5.3 s]
Epoch 7 [5.0 s]	train=0.3558, validation=0.5204 [4.9 s]
Epoch 8 [5.3 s]	train=0.3460, validation=0.5181 [4.9 s]
Epoch 9 [5.4 s]	train=0.3375, validation=0.5163 [5.1 s]
Epoch 10 [5.4 s]	train=0.3287, validation=0.5151 [5.0 s]
Epoch 11 [5.4 s]	train=0.3242, validation=0.5136 [5.0 s]
Epoch 12 [5.1 s]	train=0.3168, validation=0.5126 [5.5 s]
Epoch 13 [5.1 s]	train=0.3119, validation=0.5118 [5.1 s]
Epoch 14 [5.5 s]	train=0.3074, validation=0.5113 [5.1 s]
Epoch 15 [5.5 s]	train=0.3068, validation=0.5106 [5.2 s]
Epoch 16 [5.5 s]	train=0.3022, validation=0.5103 [5.3 s]
Epoch 17 [5.6 s]	train=0.2984, validation=0.5098 [5.3 s]
Epoch 18 [5.2 s]	train=0.2964, validation=0.5092 [5.7 s]
Epoch 19 [5.3 s]	train=0.2934, validation=0.5093 [5.3 s]
Epoch 20 [5.7 s]	train=0.2908, validation=0.5089 [5.3 s]
