
sasrec's People

Contributors

kang205, pmixer


sasrec's Issues

key_masks

"key_masks = tf.sign(tf.abs(tf.reduce_sum(keys, axis=-1))) # (N, T_k)" at line 185 in modules.py indicates that the key_masks depend on the embedding of those keys, but why not use the original "key_masks = tf.sequence_mask(keys_length, tf.shape(keys)[1]) # (N, T_k)"?

Help, please

Line 199 of modules.py reads `outputs *= query_masks # broadcasting. (N, T_q, C)`.
Should it instead be `outputs = query_masks # broadcasting. (h*N, T_q, C)`?

Why put test[u][0] into the candidate sequences?

The evaluate function builds an item_index list and puts test[u][0] into it.
My understanding is that test[u][0] is what we want to predict, but this way the model knows it should predict from among these candidates, including the very item we want it to predict.
Is this a kind of data leakage? Or did I misunderstand something?
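
For reference, a hedged sketch of the sampled-ranking protocol the evaluate function follows (simplified; `model_score` is a stand-in for the model's predict call, and `rated` for the user's seen items):

```python
import random

def rank_ground_truth(model_score, seq, ground_truth, itemnum, rated, n_neg=100):
    """Rank the held-out item against n_neg sampled negatives.

    Placing the ground truth in the candidate list is deliberate: the metric
    asks where it ranks among the 101 candidates, not which item the model
    picks out of the full catalog, so no label is leaked to the model itself.
    """
    item_idx = [ground_truth]
    while len(item_idx) < n_neg + 1:
        t = random.randint(1, itemnum)
        if t not in rated:                  # skip items the user already saw
            item_idx.append(t)
    scores = [model_score(seq, i) for i in item_idx]
    return sum(s > scores[0] for s in scores[1:])   # 0 = best possible rank

# Hit@10 then checks rank < 10; NDCG@10 adds 1 / log2(rank + 2) on a hit.
```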

CUDA error

Hi, thanks for your excellent work.
When I run the code, the following error occurs:
E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_NOT_SUPPORTED
2021-07-22 23:05:32.120816: E tensorflow/stream_executor/cuda/cuda_blas.cc:2574] Internal: failed BLAS call, see log for details

TensorFlow version = 1.12.0
Python version = 2.7.18
Looking forward to your reply!

Problem with num_batch

In your code, num_batch is calculated as follows:
num_batch = len(user_train) / args.batch_size
But this raises an error in
for step in tqdm(range(num_batch), total=num_batch, ncols=70, leave=False, unit='b')
because `/` returns a float under Python 3:

Traceback (most recent call last):
File "main.py", line 61, in
for step in tqdm(range(num_batch), total=num_batch, ncols=70, leave=False, unit='b'):
TypeError: 'float' object cannot be interpreted as an integer
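
Under Python 3, `/` always returns a float. A minimal fix (a sketch, not an official patch) is floor division, or rounding up if the final partial batch should be kept:

```python
import math

num_batch = len(user_train) // args.batch_size            # floor division -> int
# or, to keep the final partial batch instead of dropping it:
num_batch = math.ceil(len(user_train) / args.batch_size)
```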

Could you provide the original data of the Amazon dataset?

Thanks for your outstanding work, but I still have a question about the data.
When I open the Amazon dataset website, I find that the data was updated in 2018. If I run your preprocessing code on the new Beauty data with the 5-core setting (as in your code), only a few thousand records are kept, far fewer than in the preprocessed data in your repository.
I need to re-preprocess the original data because my model needs the timestamp data, not only the interaction order.
If you could provide the original Amazon data and the preprocessing code (for the Amazon game category), I would really appreciate it!
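
For anyone re-deriving the data, here is a hedged sketch of 5-core filtering that keeps timestamps. It assumes the Amazon review JSON schema (`reviewerID`, `asin`, `unixReviewTime`) and a single filtering pass; it is not necessarily identical to the repo's script, and a strict 5-core would iterate until the counts stabilize:

```python
import json
from collections import defaultdict

def five_core(path, k=5):
    user_count, item_count = defaultdict(int), defaultdict(int)
    reviews = []
    with open(path) as f:
        for line in f:
            r = json.loads(line)
            reviews.append((r['reviewerID'], r['asin'], r['unixReviewTime']))
            user_count[r['reviewerID']] += 1
            item_count[r['asin']] += 1
    # Keep only users/items with at least k interactions (single pass).
    kept = [(u, i, t) for (u, i, t) in reviews
            if user_count[u] >= k and item_count[i] >= k]
    kept.sort(key=lambda x: (x[0], x[2]))   # per-user chronological order
    return kept
```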

How to set the number of negative samples?

Quoting the paper:

For each user u, we randomly sample 100 negative items, and rank these items with the ground-truth item. Based on the rankings of these 101 items, Hit@10 and NDCG@10 can be evaluated.

How could I set the number of negative items in the code?
It seems the (pos, neg) pairs are generated independently.

if nxt != 0: neg[idx] = random_neq(1, itemnum + 1, ts)
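
For what it's worth, the quoted line is the training-time sampler: it draws one uniform negative per positive position, independently of the 100 evaluation negatives, which are drawn in a separate loop inside the evaluate function. A hedged sketch of lifting that count into a parameter (illustrative names, not the authors' API):

```python
import random

def sample_eval_negatives(rated, itemnum, n_neg=100):
    """Draw n_neg negative items the user has not interacted with."""
    item_idx = []
    while len(item_idx) < n_neg:
        t = random.randint(1, itemnum)
        if t not in rated:
            item_idx.append(t)
    return item_idx
```

Note that changing n_neg changes the candidate-set size and therefore the absolute Hit@k / NDCG@k values, so results are only comparable across runs using the same count.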

How to implement Caser?

Hi, Caser uses a softmax to get the interaction probability of each item, while in this paper's experiments you select 100 negative samples for testing. I would like to know how you handled this. Thank you very much.
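
One common way to reconcile a full-softmax model like Caser with the sampled protocol is to score all items as usual but rank only the 101 candidates (a sketch; `full_scores` is assumed to be the model's per-item output vector):

```python
import numpy as np

def sampled_rank(full_scores, ground_truth, negatives):
    """full_scores: (itemnum + 1,) model scores over the whole catalog.
    Returns the rank of the ground truth among the sampled candidates."""
    candidates = [ground_truth] + list(negatives)       # 1 + 100 items
    cand_scores = full_scores[candidates]
    return int((cand_scores > cand_scores[0]).sum())    # 0 = best rank
```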

Questions about Performance of Caser

I modified the code based on https://github.com/graytowne/caser_pytorch and ran it on the ML-1M dataset, but the end result is not ideal.

The summary of the running results is as follows:
Hit@10 = 0.622, NDCG@10 = 0.481
The above evaluation metrics are much lower than those reported in the paper:
Hit@10 = 0.7886, NDCG@10 = 0.5538

I was wondering whether the difference is due to some negligence on my part. Could you please share your code? Looking forward to your reply! Thanks!

Need help, please

Hi, my name is Mauro, and I am a computer science student. I am trying to run your SASRec code, but I am having trouble due to the TensorFlow versions. Would it be possible for you to run it on a lighter version of the Amazon sports reviews dataset (which I will give you; it is just 60 MB)? Your help would be really precious to me. If you want to help me, you can send an email to this account: [email protected]

Why num_heads = 1?

For the multi-head attention module, why do you set num_heads = 1 in the default args in main.py? Then it is not actually using the multi-head structure of the attention block, is it?

Thanks,
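
For context, in the borrowed modules.py the heads come from splitting the channel dimension and stacking along the batch axis; with num_heads = 1 both operations are identities, so the block degenerates to single-head attention. A numpy sketch of just that reshaping (illustrative shapes):

```python
import numpy as np

N, T, C, h = 2, 4, 8, 1                 # h = num_heads, 1 as in the default args
Q = np.random.randn(N, T, C)

# modules.py-style head split: channels -> h chunks, stacked on the batch axis.
Q_ = np.concatenate(np.split(Q, h, axis=2), axis=0)     # (h*N, T, C/h)

assert Q_.shape == (h * N, T, C // h)
assert h > 1 or np.allclose(Q_, Q)      # with h == 1, nothing changes
```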

Problem with multiprocessing (Sampler)

When I run the program I get the following error:

 An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

To fix this issue I added
if __name__ == "__main__": main()
but because of the close function in the sampler, the program exits without training.
I used Python 3.6.
Thanks a lot in advance!
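
On platforms that spawn rather than fork child processes (Windows, recent macOS Pythons), the usual idiom is to put the whole training loop, including the sampler shutdown, under the guard so close() only runs after training finishes (a sketch assuming the repo's WarpSampler interface):

```python
if __name__ == '__main__':
    sampler = WarpSampler(user_train, usernum, itemnum,
                          batch_size=args.batch_size,
                          maxlen=args.maxlen, n_workers=3)
    try:
        for epoch in range(1, args.num_epochs + 1):
            for step in range(num_batch):
                u, seq, pos, neg = sampler.next_batch()
                # ... run one training step here ...
    finally:
        sampler.close()     # shut the worker processes down only at the end
```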

Training method of baseline GRU4Rec

Could you tell me the details of GRU4Rec?
For example, is the GRU4Rec model trained by BPTT or by normal backpropagation?
Additionally, could you share the code for the baselines?

Preprocessing for ml1m?

Hello.

Your data preprocessing code seems to cover only the Beauty reviews.

Can you also provide the code for preprocessing the ml-1m data?

Thank you.
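
For reference, a minimal sketch of the usual ml-1m conversion into the repo's one-pair-per-line format (assumes the standard ratings.dat layout `UserID::MovieID::Rating::Timestamp`; this is not the authors' script):

```python
from collections import defaultdict

user_seqs = defaultdict(list)
with open('ratings.dat', encoding='latin-1') as f:
    for line in f:
        u, i, _, t = line.rstrip().split('::')
        user_seqs[int(u)].append((int(t), i))

item_map = {}   # remap movie ids to contiguous 1..itemnum, as the loader expects
with open('ml-1m.txt', 'w') as out:
    for u in sorted(user_seqs):
        for _, i in sorted(user_seqs[u]):   # chronological order per user
            item_map.setdefault(i, len(item_map) + 1)
            out.write(f'{u} {item_map[i]}\n')
```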

How do we calculate AUC for SASRec?

Hello!

I am currently working on CTR and sequential recommendation tasks. I see that many recent papers, such as TallRec, use AUC to compare their results with SASRec. I am really curious how I can calculate AUC for SASRec; in my understanding, SASRec uses metrics like NDCG@k and HitRate@k. It would be really helpful if you could shed some light on this.

Thanks and Regards
Millennium Bismay
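
Not the authors, but one hedged way to get an AUC-style number out of a ranking model: with one held-out positive and N sampled negatives per user, the per-user AUC is the fraction of negatives the positive outscores (ties count half), averaged over users. A sketch:

```python
import numpy as np

def sampled_auc(pos_score, neg_scores):
    """AUC for one user: P(positive outranks a random negative).
    Equivalent to 1 - rank/N when the rank counts negatives scored higher."""
    neg_scores = np.asarray(neg_scores, dtype=np.float64)
    wins = (pos_score > neg_scores).sum() + 0.5 * (pos_score == neg_scores).sum()
    return wins / len(neg_scores)

# Illustrative call; real scores would come from model.predict over the same
# 1 + 100 candidate set used for Hit@10 / NDCG@10.
print(sampled_auc(2.5, [0.1, 3.0, -1.2, 2.5]))   # 0.625
```

Whether a given paper computes AUC over the full catalog or over a sampled candidate set varies, so it is worth checking the exact protocol before comparing numbers.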

Application for adding a third-party re-implementation link to the README

Hi Team, @kang205
I have prepared a PyTorch version of SASRec based on your TF implementation which behaves almost the same:

https://github.com/pmixer/SASRec.pytorch

Could you please consider adding a Third-party Re-implementation section, like the one in https://github.com/wy1iu/LargeMargin_Softmax_Loss, to let people know about the work?

I hope the PyTorch implementation proves useful to a wider audience, and I would appreciate help checking why it converges a bit more slowly than the TF implementation during training 🤣

Regards,
Zan

`tf.sign(tf.abs(tf.reduce_sum` vs `tf.sign(tf.reduce_sum(tf.abs(` for generating masks?

key_masks = tf.sign(tf.abs(tf.reduce_sum(keys, axis=-1))) # (N, T_k)

Hi Guys,
I'm reading the code in order to port the implementation to PyTorch for personal use. The code looks well written and documented; thanks for the great work :)

Moreover, since the self-attention module is borrowed from another project, some details may not be 100% right according to my observation (besides magic numbers like -2^32+1 used to force softmax to output 0 for an entry, which hurts readability). As an example, for query and key mask generation the code uses a tf.sign + tf.abs + tf.reduce_sum combination, but the order seems slightly wrong. Since we are trying to mask queries/keys that are all zero along the channel/embedding dimension, the right way might be to first apply abs, then reduce_sum, and finally sign; the current implementation first applies reduce_sum, then abs, and lastly sign. The two approaches generate the same results in most cases, because an exact sum to zero is unlikely for high-dimensional fp32 vectors, but it is still wrong and may produce incorrect masks in corner cases.

I just want to check the assumption stated above; please respond if you happen to have time @kang205 @JiachengLi1995, thanks!

Regards,
Zan
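
To make the corner case concrete, a small numpy check (illustrative values):

```python
import numpy as np

# One real token whose channels cancel exactly, plus one genuine padding token.
keys = np.array([[[0.5, -0.5],      # real embedding, but it sums to zero
                  [0.0,  0.0]]])    # all-zero padding

current  = np.sign(np.abs(keys.sum(axis=-1)))     # reduce_sum -> abs -> sign
proposed = np.sign(np.abs(keys).sum(axis=-1))     # abs -> reduce_sum -> sign

print(current)    # [[0. 0.]] -- the real token is wrongly masked out
print(proposed)   # [[1. 0.]] -- only the padding position is masked
```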
