tsinghua-fib-lab / dice

The official implementation of "Disentangling User Interest and Conformity for Recommendation with Causal Embedding" (WWW '21)

License: MIT License

Python 96.38% MATLAB 3.62%

causality causal-inference recommender-system recommendation-system recommendation

dice's Introduction

DICE

This is the official implementation of our WWW'21 paper:

Yu Zheng, Chen Gao, Xiang Li, Xiangnan He, Depeng Jin, Yong Li, Disentangling User Interest and Conformity for Recommendation with Causal Embedding, In Proceedings of the Web Conference 2021.

Model training

First unzip the datasets and start the visdom server:

visdom -port 33336

Then simply run the following command to reproduce the experiments on the corresponding dataset and model:

python app.py --flagfile ./config/xxx.cfg

Embedding visualization

The visualization code to reproduce Figure 5(b) and Figure 7 can be found in the viz folder.

First, reduce the dimension of the embedding vectors to 2D using t-SNE (remember to change the path to the model checkpoint in viz.py):

python viz.py

Then, visualize the 2D embedding vectors using MATLAB:

embedding_viz.m

Dataset processing

The dataset processing code is in this repo. Please check this issue for more details.

Citation

If you use our code and datasets in your research, please cite:

@inproceedings{zheng2021disentangling,
  title={Disentangling User Interest and Conformity for Recommendation with Causal Embedding},
  author={Zheng, Yu and Gao, Chen and Li, Xiang and He, Xiangnan and Li, Yong and Jin, Depeng},
  booktitle={Proceedings of the Web Conference 2021},
  pages={2980--2991},
  year={2021}
}

dice's Issues

popularity

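Hello!
(1) What do the popularity-related files (popularity_all.npy, popularity_skew.npy, popularity_blend.npy, popularity.npy) contain? Do they record the number of interactions per item? How do they differ from one another?
(2) If I understand correctly, when you reproduced IPS you weighted each sample by the reciprocal of the item's occurrence count. Is that right?
I need to reproduce popularity debiasing with IPS, so I would like to ask about the implementation details.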
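
For readers following this question, the weighting it describes can be sketched as below. This is only an illustration of the asker's reading (weight = reciprocal of the item's interaction count), not the repository's confirmed implementation. The function name `ips_weights`, the `clip_min` parameter, and the normalization step are assumptions made for the example.

```python
from collections import Counter


def ips_weights(interactions, clip_min=1):
    """Return one weight per (user, item) interaction.

    Assumption: the propensity of an interaction is proxied by the item's
    interaction count, so the weight is its reciprocal.
    """
    # Count how many times each item appears in the interaction log.
    counts = Counter(item for _, item in interactions)
    # Reciprocal of the count; clip_min guards against division by zero.
    raw = [1.0 / max(counts[item], clip_min) for _, item in interactions]
    # Rescale so the weights average to 1 (hypothetical convenience step
    # that keeps the loss magnitude comparable to the unweighted case).
    mean_w = sum(raw) / len(raw)
    return [w / mean_w for w in raw]


# Toy log: item "a" is popular (3 interactions), item "b" is rare (1).
interactions = [(0, "a"), (1, "a"), (2, "a"), (0, "b")]
weights = ips_weights(interactions)
```

In this toy log the interaction with the rare item receives three times the weight of each interaction with the popular item, which is the intended effect of the reweighting.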

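Score fusion

Hello, this is very intuitive and interesting work.
Have you considered fusing interest and conformity with an MLP instead of concatenating them? What are the advantages and disadvantages of that approach compared with concatenation? Looking forward to your reply.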
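
As background for this question: if the final score is the inner product of the concatenated user and item embeddings (an assumption here, not taken from the repository code), it decomposes exactly into an interest score plus a conformity score. The sketch below checks this on toy vectors; all names and values are hypothetical. An MLP fusion would replace this fixed sum with a learned function of the two parts.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def concat_score(u_int, u_con, i_int, i_con):
    """Score from concatenated embeddings: <[u_int; u_con], [i_int; i_con]>."""
    # List addition concatenates the interest and conformity parts.
    return dot(u_int + u_con, i_int + i_con)


# Toy 2-dimensional interest and conformity embeddings (made-up values).
u_int, u_con = [0.5, -1.0], [2.0, 0.25]
i_int, i_con = [1.0, 3.0], [0.5, 4.0]

score_concat = concat_score(u_int, u_con, i_int, i_con)
# The same score written as the sum of the two separate inner products.
score_sum = dot(u_int, i_int) + dot(u_con, i_con)
```

Because the two quantities are identical, concatenation adds no parameters and keeps the two causes additively separable, whereas an MLP could model interactions between them at the cost of that separability.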

Dataset split code is needed.

Hello, I read your paper and think it is fantastic work.
However, I am having trouble with the dataset split. Could you please share your dataset split code if it is convenient?
Thanks a lot!

Qidong Liu

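Question about dataset processing details

Hello! Thank you for the high-quality paper and code.
My understanding of the dataset processing is as follows: first, 40% of the records are sampled with probability proportional to the inverse of item popularity to form the intervened data (which is then randomly divided into four parts), and the remaining 60% form the normal data; the normal data plus one part of the intervened data make up the training set; one part of the intervened data is the validation set; two parts of the intervened data are the test set.
Is this what you did? If my understanding is off, please correct me. Thank you!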
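
The split described in this question can be sketched as follows. It illustrates the asker's description only and is not the authors' processing code. The function name, the `intervened_frac` and `seed` parameters, and the use of Efraimidis-Spirakis keys for weighted sampling without replacement are choices made for this example.

```python
import random
from collections import Counter


def split_by_inverse_popularity(records, intervened_frac=0.4, seed=0):
    """Split (user, item) records into train / validation / test lists."""
    rng = random.Random(seed)
    counts = Counter(item for _, item in records)
    n_intervened = round(len(records) * intervened_frac)

    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # with weight w = 1 / count, the key u ** (1 / w) equals u ** count,
    # and the records with the largest keys are selected.
    keys = [rng.random() ** counts[item] for _, item in records]
    order = sorted(range(len(records)), key=lambda idx: keys[idx], reverse=True)
    intervened = [records[idx] for idx in order[:n_intervened]]
    normal = [records[idx] for idx in order[n_intervened:]]

    # Randomly divide the intervened data into four near-equal parts.
    rng.shuffle(intervened)
    bounds = [i * n_intervened // 4 for i in range(5)]
    parts = [intervened[bounds[i]:bounds[i + 1]] for i in range(4)]

    train = normal + parts[0]      # 60% normal + one intervened part
    val = parts[1]                 # one intervened part
    test = parts[2] + parts[3]     # two intervened parts
    return train, val, test


# Toy data: 100 records over four items with skewed popularity.
records = (
    [(u, 0) for u in range(50)]
    + [(u, 1) for u in range(30)]
    + [(u, 2) for u in range(15)]
    + [(u, 3) for u in range(5)]
)
train, val, test = split_by_inverse_popularity(records)
```

With 100 records this yields 70 training, 10 validation, and 20 test records, and records of unpopular items are over-represented in the intervened parts.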

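Question about DICE's performance on the IID test set

Hello,

Your work has been very inspiring to me, and I have a question I would like to ask:

Why does disentangling the interest and conformity vectors make the model perform well on a test set in which all items have the same popularity?

I could understand DICE performing well on an IID test set, because disentangling the two vectors allows interactions to be modeled at a finer granularity. But as I understand it, DICE still concatenates the two embeddings to make predictions, so I do not see what makes it perform well on a non-IID test set.

Thank you very much!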

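PopularityRecommender subclass missing from recommender when running test_iou

Hello, thank you for open-sourcing your research code. When running the IOU test, I found that the PopularityRecommender subclass is missing from recommender, and I am not sure what the problem is. Looking forward to your reply. Thanks O(∩_∩)O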

pytorch version = 1.12.1

Can this code run with PyTorch version 1.12.1?

Experimental details

hi~

Thank you for your great work and code contribution!

I am also working in this area, and I would like to ask about some experimental details, especially how the visualization of the learned item embeddings in the DICE experiment was produced.

Could you share the code? I will cite your work in my future work.

Looking forward to your reply, thank you!

About the mask in the BPR loss function

Hello, author. I am new to this field. I found that there are two BPR loss functions in your code, and one of them takes a mask parameter. Could you explain what the mask parameter does and how the mask is generated? Thank you, and I look forward to your reply.
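
To make the question concrete, here is a minimal sketch of a plain BPR loss next to a masked variant. It is not copied from the repository: the function names, the choice to average over all samples rather than over the masked ones, and the reading of the mask as a 0/1 selector of the training pairs to which a loss term applies are assumptions for illustration.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bpr_loss(pos_scores, neg_scores):
    """Plain BPR: -mean(log sigmoid(pos - neg)) over all pairs."""
    terms = [log_sigmoid(p - n) for p, n in zip(pos_scores, neg_scores)]
    return -sum(terms) / len(terms)


def masked_bpr_loss(pos_scores, neg_scores, mask):
    """BPR in which each pair's term is multiplied by a 0/1 mask entry.

    Assumption: mask[k] == 1 keeps pair k and mask[k] == 0 drops it;
    the mean is still taken over all pairs.
    """
    terms = [
        m * log_sigmoid(p - n)
        for p, n, m in zip(pos_scores, neg_scores, mask)
    ]
    return -sum(terms) / len(terms)


pos = [2.0, 0.5, -1.0]
neg = [1.0, 1.5, -1.0]
loss_plain = bpr_loss(pos, neg)
loss_all_kept = masked_bpr_loss(pos, neg, [1, 1, 1])
loss_first_only = masked_bpr_loss(pos, neg, [1, 0, 0])
```

With a mask of all ones the masked variant reduces to the plain loss; with zeros, the corresponding pairs contribute nothing to the gradient.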

Issues related to dataset processing

Hello author. The ml10m dataset is officially described as having 71,567 users, but the paper reports only 37,962 users. Why is this, and how did you process the dataset?

Question regarding the test set

Hi!
I was trying to run experiments using the data you have uploaded.
However, it seems there is only one test_coo_record.npz. Is it the non-IID test set?
Also, for the training set, I would like to confirm whether train_blend_coo_adj_graph.npz is the one used for DICE.

Thanks for your help!