This is a Python project for the Data Mining class (summer 2020) that implements a recommender system via user-based collaborative filtering. For a comprehensive review of collaborative filtering, refer to the papers by Herlocker et al. and Breese et al.
This project uses Python 3.7.3, and a virtual environment is recommended. The required packages are listed in requirements.txt. You can use the following command to install them automatically:
pip install -r requirements.txt
The collaborative filter is encapsulated in a Python class named collabFilter. Instantiating the class requires, in order, the path to the data, the number of users, and the number of items. Two optional parameters control which neighbor users are selected. The dataset must be an ASCII text file in which each row is a non-zero element of the rating matrix with three entries: the user ID, the item ID, and the rating value.
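As an illustration of the input format described above, the following sketch parses such rows into a dense rating matrix. It is not the project's loader: the function name parse_ratings, the whitespace separator, and the 1-based IDs are assumptions made for the example.

```python
import math

def parse_ratings(lines, n_users, n_items):
    """Build a dense n_users x n_items matrix; unrated entries are NaN.

    Assumes whitespace-separated fields and 1-based IDs (both are
    assumptions for this sketch, not confirmed by the project).
    """
    matrix = [[math.nan] * n_items for _ in range(n_users)]
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        user_id, item_id = int(fields[0]), int(fields[1])
        matrix[user_id - 1][item_id - 1] = float(fields[2])
    return matrix

# Three ratings from two users over three items.
sample = ["1 1 5", "1 3 3", "2 2 4"]
ratings = parse_ratings(sample, n_users=2, n_items=3)
```

Reading from a real file would only require passing the open file object instead of the sample list.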
The similarity matrix is computed at instantiation. The model_evaluation method returns the mean absolute error (MAE) of the filter. The predict_all method fills every unrated entry with a value from 1.000 to 5.000, or nan if no prediction is available. Call the save_prediction method with a filename to save the predictions in the same format as the input.
Run main.py to evaluate the filter and write the predictions to submit_result.txt.
The user-based collaborative filtering method can be separated into three steps.
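For reference, MAE is the average of the absolute differences between true and predicted ratings. The sketch below shows one way to compute it; the function name mean_absolute_error is hypothetical, and skipping nan predictions is an assumption, since how model_evaluation handles held-out data and unavailable predictions is not described here.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average |actual - predicted| over pairs with a usable prediction."""
    # Assumption: entries whose prediction is NaN are left out of the average.
    pairs = [(a, p) for a, p in zip(actual, predicted) if not math.isnan(p)]
    if not pairs:
        return math.nan
    return sum(abs(a - p) for a, p in pairs) / len(pairs)

mae = mean_absolute_error([4, 2, 5], [3.5, 3.0, math.nan])
```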
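- Weight all users with respect to their similarity with the active user. Many metrics can be used, such as Pearson correlation, Spearman correlation, and vector similarity. This project uses the Pearson correlation, which for the active user $a$ and another user $u$ is defined as
$$w_{a,u} = \frac{\sum_{i} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{\sqrt{\sum_{i} (r_{a,i} - \bar{r}_a)^2} \, \sqrt{\sum_{i} (r_{u,i} - \bar{r}_u)^2}}$$
where the sums run over the items $i$ rated by both users, $r_{u,i}$ is the rating of user $u$ for item $i$, and $\bar{r}_u$ is the mean rating of user $u$.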
- Select a subset of users to use as predictors. This project combines weight thresholding with best-n neighbors so that predictions are available for as many entries as possible.
- Normalize the ratings and compute a prediction from a weighted combination of the selected neighbors' ratings. In this project, the prediction is made using
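
The three steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's collabFilter code: the function names pearson and predict, the parameters threshold and n_best, the dict-of-dicts rating layout, and the deviation-from-mean prediction rule with clipping to the 1-5 range are all assumptions made for the example.

```python
import math

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(ra, ru):
    """Pearson correlation over items both users rated; NaN if undefined."""
    common = set(ra) & set(ru)
    if len(common) < 2:
        return math.nan
    mean_a = mean(ra[i] for i in common)
    mean_u = mean(ru[i] for i in common)
    cov = sum((ra[i] - mean_a) * (ru[i] - mean_u) for i in common)
    var_a = sum((ra[i] - mean_a) ** 2 for i in common)
    var_u = sum((ru[i] - mean_u) ** 2 for i in common)
    if var_a == 0 or var_u == 0:
        return math.nan  # a constant rater has no defined correlation
    return cov / math.sqrt(var_a * var_u)

def predict(ratings, active, item, threshold=0.1, n_best=2):
    """Predict the active user's rating for item, or NaN if unavailable."""
    ra = ratings[active]
    # Step 1: weight every other user who has rated the item.
    weights = {}
    for user, ru in ratings.items():
        if user == active or item not in ru:
            continue
        w = pearson(ra, ru)
        if not math.isnan(w):
            weights[user] = w
    # Step 2: weight thresholding, then keep at most n_best neighbors.
    candidates = [u for u, w in weights.items() if w >= threshold]
    neighbors = sorted(candidates, key=lambda u: weights[u], reverse=True)[:n_best]
    den = sum(abs(weights[u]) for u in neighbors)
    if not neighbors or den == 0:
        return math.nan
    # Step 3 (assumed rule): active user's mean plus the weighted average
    # of each neighbor's deviation from that neighbor's own mean.
    num = sum(weights[u] * (ratings[u][item] - mean(ratings[u].values()))
              for u in neighbors)
    raw = mean(ra.values()) + num / den
    return min(5.0, max(1.0, raw))  # clip to the 1-5 rating scale

# Toy data: user -> {item: rating}
toy = {
    "a": {1: 5, 2: 3, 3: 4},
    "b": {1: 4, 2: 2, 3: 3, 4: 4},
    "c": {1: 1, 2: 5, 4: 2},
}
prediction = predict(toy, "a", 4)
```

In this toy data, user b agrees with user a on the co-rated items while user c disagrees, so the threshold leaves only b as a neighbor. An item nobody has rated yields nan, mirroring the unavailable-prediction case mentioned earlier.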