Giter VIP home page Giter VIP logo

recommendersystems's Introduction

A library of Recommender Systems

This repository provides a summary of our research on Recommender Systems. It includes our code base on different recommendation topics, a comprehensive reading list and a set of bechmark data sets.

Code Base

Currently, we are interested in sequential recommendation, feature-based recommendation and social recommendation.

Sequential Recommedation

Since users' interests are naturally dynamic, modeling users' sequential behaviors can learn contextual representations of users' current interests and therefore provide more accurate recommendations. In this project, we include some state-of-the-art sequential recommenders that empoly advanced sequence modeling techniques, such as Markov Chains (MCs), Recurrent Neural Networks (RNNs), Temporal Convolutional Neural Networks (TCN) and Self-attentive Neural Networks (Transformer).

Feature-based Recommendation

A general method for recommendation is to predict the click probabilities given users' profiles and items' features, which is known as CTR prediction. For CTR prediction, a core task is to learn (high-order) feature interactions because feature combinations are usually powerful indicators for prediction. However, enumerating all the possible high-order features will exponentially increase the dimension of data, leading to a more serious problem of model overfitting. In this work, we propose to learn low-dimentional representations of combinatorial features with self-attention mechanism, by which feature interactions are automatically implemented. Quantitative results show that our model have good prediction performance as well as satisfactory efficiency.

Social recommendation

Online social communities are an essential part of today's online experience. What we do or what we choose may be explicitly or implicitly influenced by our friends. In this project, we study the social influences in session-based recommendations, which simultaneously model users' dynamic interests and context-dependent social influences. First, we model users' dynamic interests with recurrent neural networks. In order to model context-dependent social influences, we propose to employ attention-based graph convolutional neural networks to differentiate friends' dynamic infuences in different behavior sessions.

Reading List

We maintain a reading list of RecSys papers to keep track of up-to-date research.

Data List

We provide a summary of existing benchmark data sets for evaluating recommendation methods.

New Data

We contribute a new large-scale dataset, which is collected from a popular movie/music/book review website Douban (www.douban.com). The data set could be useful for researches on sequential recommendation, social recommendation and multi-domain recommendation. See details here.

Publications:

recommendersystems's People

Contributors

shichence avatar songweiping avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

recommendersystems's Issues

会话中的物品顺序

由于会话划分时只使用会话ID进行排序,一个会话内物品的顺序是不是乱序的?
data = data.sort_values(by=['TimeId']).groupby('SessionId')['ItemId'].apply(list).to_dict()

doubts about the data processing of Criteo in AutoInt

@Songweiping ,Hello,I am very interested in your work(AutoInt) at CIKM'19,but I had some doubts when I reproduced the experiments of the paper.
As we know, Criteo has its own bench mark, including two csv files(train.csv & test.csv).But the preprocess of AutoInt splits the file(train_samples.txt) to get the train data,valid data and test data.
I'm wondering how to transform the original Dataset(train.csv & test.csv) into a Dataset that the data_process code can handle.
Could you post your code for converting original csv files(train.csv & test.csv) into the whole txt file(the data used in your paper, and Format is similar to the sample file train_example.txt)?

Question about ReLU in Multi-Head Attention

In multi-head attention, there is a relu after queries, keys, and values. Is this a correct implementation? The paper did not mention the relu in Eq. 5. Besides, it seems that the relu will make the attention matrix always positive.

# Linear projections
Q = tf.layers.dense(queries, num_units, activation=tf.nn.relu)
K = tf.layers.dense(keys, num_units, activation=tf.nn.relu)
V = tf.layers.dense(values, num_units, activation=tf.nn.relu)```

Converting to top-n recommendation

I am very impressed about the usefulness of AutoInt to my personal project.
I'm wondering if there is any handy way of converting this AutoInt to top-n recommendaiton?
Would it be possible?

数据格式

请问这些框架结构要求的数据格式是什么样的呢?

数据处理

请问,Gowalla数据集是如何处理和划分的?

Session segmentation on Delicious datasets

Hi, I'm trying to test the result in the paper Session-Based Social Recommendation on the Delicious datasets . From the paper and datasets, I understand that you consider each session is a sequence of tags a user has assigned to a bookmark and all the tagging actions for that bookmark will have the same timestamp according to the datasets.
So my question is that when you give the time_id for each session (as in the preprocess_DoubanMovie.py file), will two sessions have the same id only if their timestamps have exactly same date and time ?
E.g: A session with timestamp '01/06/2020 2:30:00 pm' and another session with timestamp '01/06/2020 2:30:01 pm' will have different time_id.

UniformNeighborSampler in SocialRec

While I was looking through the code, I noticed that in neigh_samplers.py, users can sample themselves as the second-order neighbors because when we sample 10 neighbors of the first-order neighbor, the user is also included.

        adj = self.adj_info[node, :]
        neighbors = []
        for neighbor in adj:
            if first_or_second == 'second':
                if self.visible_time[neighbor] <= timeid:
                    neighbors.append(neighbor)    

I was wondering is this intended or a logic flaw. Thank you!

数据集

作者,您好:
我目前在做动态网络表示学习,请问您可提供一下你实验数据吗?(论文里用的豆瓣原始数据就好,未被处理的,非常感谢,我的邮箱是[email protected]

Extracting Attention And Visualize it

weight

After running the model, I am wondering if weights of attention and their visualization can be done as the attached file displays from the original paper.

There are many examples for extracting and visualization for seq2seq but couldn't really find one for feature explanation.

Is there any ideas/code that can be used for attention visualization of meaningful features? ?

Only One class present issue

I am working on my own data -purchase history of financial production.
So I have data as who, with features, purchased which products, with features and when.
However, I have found that the metric of the model is AUC -- and I, as expected, get the following error

ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

Since all my Ys are 1s -as I only have who purchased (clicked) which and there is no 0 value ( who didn't purchase which).
As other CTR prediction models also use AUC and log loss, I assume that there must be a way to use AUC for such a dataset.

Do you have any idea how to solve this issue?
That would be really helpful!

SocialRec中的samples

在SocialRec中,samples1和samples2的默认值分别为10和5。如果我没理解错的话samples1代表第二层邻居的数量,samples2代表第一层邻居的数量,这样的话第二层邻居的数量就比第一层邻居的数量还多,与经验不符。而且您的论文中也写到“The neighborhood sample sizes are empirically set to 10 and 15 in the first and second convolutional layers”,请问是少写了一个1吗?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.