mindis / recsys-cotraining

This project forked from fernandobperezm/recsys-cotraining



Python 76.42% Shell 4.91% Makefile 0.13% C# 2.10% F# 0.25% Perl 16.18%

recsys-cotraining's Introduction

Improving Collaborative Filtering Techniques by the use of Co-Training in Recommender Systems.

This repository holds the implementation, the datasets and results for the thesis project: Improving Collaborative Filtering Techniques by the use of Co-Training in Recommender Systems.

This project was carried out as the thesis research of Fernando Benjamín Pérez Maurera, under the supervision of Professor Paolo Cremonesi and Engineer Maurizio Ferrari, at Politecnico di Milano.

Project organization.

The project is organized as follows:

  • Datasets: A folder where the datasets are located.
    • ml10m: folder containing the Movielens10M dataset.
      • ratings.csv: ratings file.
  • Implementation: RecPy module containing the recommenders and helper classes.
  • read-results: folder containing the bash scripts that read the results output by Co-Training. There are scripts for each recommender combination: ItemKNN/FunkSVD, ItemKNN/SLIM, ItemKNN/SLIMBPR and ItemKNN/BPRMF.
  • run-examples: folder containing the bash scripts that run Co-Training. There are scripts for several recommender combinations: ItemKNN/FunkSVD, ItemKNN/SLIM, ItemKNN/SLIMBPR and ItemKNN/BPRMF.
  • Results: folder generated when running the run-knn.sh and results-knn.sh scripts. The results for each test case are placed in this folder.
  • scripts: folder containing the two main Python files: holdout.py, which makes a holdout@k of the dataset, runs Co-Training and evaluates the recommenders, and read-results.py, which reads the results of each output file and generates new plots.
  • README.md: This file.
  • requirements.txt: File for Conda or PIP that has the libraries and modules required to run the code.
  • results-knn.sh: main bash script to read the results that each Co-Training process outputs for each test case inside read-results.
  • run-knn.sh: main bash script to run the Co-Training process for each test case inside run-examples.
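
The holdout step attributed to holdout.py above can be pictured with a minimal sketch. This is not the project's code: the function name `holdout_split`, the `(user, item, rating)` tuple layout, the fixed seed and the reading of the fraction as the share held out for testing are all assumptions made for illustration.

```python
import random


def holdout_split(ratings, test_fraction=0.2, seed=42):
    """Randomly split (user, item, rating) tuples into train and test lists.

    Hypothetical helper: the real holdout.py may split per user or by time.
    """
    rng = random.Random(seed)
    shuffled = list(ratings)          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy data: 10 users x 10 items with made-up ratings between 1 and 5.
ratings = [(u, i, float((u + i) % 5 + 1)) for u in range(10) for i in range(10)]
train, test = holdout_split(ratings, test_fraction=0.2)
print(len(train), len(test))
```

Every rating ends up in exactly one of the two lists, so the recommenders can be trained on `train` and evaluated on `test` without leakage.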

Project installation.

Requirements.

  • Python 3.6+.
  • C++ Compiler.
  • On Linux, ensure that the packages libc6-dev and build-essential are installed.

Installation instructions

  1. [On Linux] Install Linux packages: apt-get install -y libc6-dev build-essential

  2. Install Miniconda for Python 3.6+.

  3. Create the virtual environment: conda create -n cotraining --file requirements.txt

  4. Activate the virtual environment: source activate cotraining.

  5. [Installation and run separately] Install the project: cd Configuration/ ; sh install.sh ; cd ..

  6. [Installation and run separately] Run one of the examples:
     • cd run-examples/ ; sh knn-funksvd.sh -p <p-most positive> -n <n-most negative> -u <size of U'> ; cd ..
     • cd run-examples/ ; sh knn-slim.sh -p <p-most positive> -n <n-most negative> -u <size of U'> ; cd ..
     • cd run-examples/ ; sh knn-bprmf.sh -p <p-most positive> -n <n-most negative> -u <size of U'> ; cd ..
     • cd run-examples/MyMediaLite/bin/ ; sh knn-slimbpr.sh -p <p-most positive> -n <n-most negative> -u <size of U'> ; cd ..

  7. [Installation and run integrated] Run the run-knn.sh script: sh run-knn.sh -p <p-most positive> -n <n-most negative> -u <size of U'>

  8. [Only to generate new plots] Run the results-knn.sh script: sh results-knn.sh -p <p-most positive> -n <n-most negative> -u <size of U'>
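
To give a feel for what the -p, -n and -u flags control, here is a rough sketch of one selection step in a co-training round. It is an illustration under stated assumptions, not the repository's implementation: the helper names `select_confident` and `cotraining_round`, the toy predictor, and the use of the predicted rating itself as the confidence signal are all hypothetical.

```python
import random


def select_confident(scored, p, n):
    """Return the p highest-scored and n lowest-scored (user, item, score) tuples."""
    ranked = sorted(scored, key=lambda t: t[2], reverse=True)
    positives = ranked[:p]
    # Start the negative slice no earlier than p so the two groups never overlap.
    negatives = ranked[max(p, len(ranked) - n):]
    return positives, negatives


def cotraining_round(unlabeled, predict, p, n, u, rng):
    """Sample a pool U' of size u, score it, and pick the pairs to label."""
    pool = rng.sample(unlabeled, min(u, len(unlabeled)))
    scored = [(user, item, predict(user, item)) for user, item in pool]
    return select_confident(scored, p, n)


def toy_predict(user, item):
    """Stand-in for a trained recommender (assumption, not a real model)."""
    return 1.0 + (3 * user + item) % 5


unlabeled = [(user, item) for user in range(4) for item in range(5)]
positives, negatives = cotraining_round(
    unlabeled, toy_predict, p=3, n=2, u=10, rng=random.Random(0)
)
print(len(positives), len(negatives))
```

In a co-training setup, the pairs selected by one recommender would then be added, with their predicted ratings, to the training data before the next iteration. How the project does this exactly is not stated in this README.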

Results.

The test cases included with the project are ItemKNN/FunkSVD, ItemKNN/SLIM, ItemKNN/SLIMBPR and ItemKNN/BPRMF. The dataset used is Movielens10M, and a holdout technique at 20% was applied. At evaluation time, a top-10 recommendation list was generated for each user. The items were divided into 10 bins based on their popularity, where bin_0 holds the least popular items and bin_9 the most popular. Running the test cases creates the Results folder, with one subfolder for each test case.
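
The popularity binning described above could look roughly like the sketch below. The README does not say how the bin boundaries are chosen, so this example assumes that popularity means the number of ratings an item received and that the bins hold equal numbers of items. The function name `popularity_bins` is hypothetical.

```python
from collections import Counter


def popularity_bins(ratings, n_bins=10):
    """Map each rated item to a bin index, 0 (least popular) to n_bins - 1."""
    counts = Counter(item for _, item, _ in ratings)
    # Sort from least to most rated; ties are broken by item id for determinism.
    ordered = sorted(counts, key=lambda item: (counts[item], item))
    return {item: rank * n_bins // len(ordered) for rank, item in enumerate(ordered)}


# Toy data: item i receives i + 1 ratings, so item 19 is the most popular.
ratings = [(user, item, 3.0) for item in range(20) for user in range(item + 1)]
bins = popularity_bins(ratings)
print(f"item 0 -> bin_{bins[0]}, item 19 -> bin_{bins[19]}")
```

An alternative reading would split the items so that each bin covers the same share of total ratings, which gives very different bin sizes on long-tailed data such as Movielens10M.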

At the moment, the project generates output files for: the evaluation metrics (RMSE, MAP, ROC-AUC, Precision, Recall, NDCG and MRR), the number of p-most and n-most items rated at each iteration, the agreement between the recommenders, and the popularity of the recommended items.
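
For reference, a few of the ranking metrics listed above can be computed for a single user's top-10 list as in the following sketch. These are common textbook definitions with binary relevance. The project's own evaluation code may use different conventions (for example, a different normalization for precision or NDCG), and the function names are illustrative.

```python
import math


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank(recommended, relevant, k=10):
    """1 / rank of the first relevant item in the top-k list, or 0 if none."""
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance NDCG: DCG of the list divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


recommended = [5, 3, 9, 1, 7, 2, 8, 4, 6, 0]   # one user's top-10 list
relevant = {3, 7, 11}                          # that user's held-out items
print(precision_at_k(recommended, relevant))   # 2 hits out of 10 -> 0.2
print(reciprocal_rank(recommended, relevant))  # first hit at rank 2 -> 0.5
```

Averaging `reciprocal_rank` over all users gives MRR, and the other per-user values are averaged across users in the same way.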
