Code Repository: https://github.com/sumanthvrao/MovieBuddy

Report: https://github.com/sumanthvrao/MovieBuddy/Report.pdf

MovieBuddy

How amazing would it be if you could watch your favorite movie with someone whose interests are similar to yours! We compared different recommendation system models (content-based filtering, collaborative filtering, and a Restricted Boltzmann Machine) to find common movie interests among a group of people.

Dataset Link (movielens-100k-dataset.zip)

Python 3, Jupyter notebook 3, Collaborative filtering, Content Based filtering, Surprise Library, TensorFlow

Description

The full MovieLens dataset offers about 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. Our aim is to bring together users with similar movie interests. To do this, we make use of users' movie ratings and their demographic information. We account for a variety of factors (location, interests, and age, to name a few) before suggesting a MovieBuddy to you.
Our data set, the MovieLens 100k subset, contains:

  • 943 users, 1682 movies, and 100,000 ratings.
  • Each user has rated at least 20 movies.
  • Simple demographic information about users.

Approach 1 - Content Based Filtering

Content-based filtering, also referred to as cognitive filtering, recommends items by comparing the content of the items themselves, which means the items recommended for a given movie are the same for every user. Content-based filtering avoids the cold-start problem that hampers other recommendation techniques, because the system considers only the content of the movies to make recommendations.

Content-based recommendations rely on the characteristics of the item itself. The major challenge is identifying which characteristics of the item to consider. The original MovieLens dataset contains limited information about each movie: the movie title, year of release, movie ID, IMDb URL, and list of genres. This data alone was insufficient to produce valuable recommendations for a movie. We used the TMDb (The Movie Database) API to extract more details for each movie. This API enabled us to obtain other characteristics, such as the names of the protagonists and the director. We created a hybrid feature for each movie, consisting of the name of the movie, year of release, list of genres, name of the director, name of the primary actor, and name of the secondary actor.

The CountVectorizer module identified 9105 distinct features, where each feature is a word extracted from the hybrid feature set of all the movies. We then computed the cosine similarity of the resulting matrix with itself to compare each movie with every other movie in the dataset. Based on this similarity matrix, we recommend 15 movies for any given movie.
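The counting and similarity steps can be illustrated with a minimal sketch. It uses only the Python standard library rather than CountVectorizer, and the three movies, their hybrid feature strings, and the names `count_vector` and `recommend` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical hybrid features (title, year, genres, director, actors).
# These strings are illustrative and are not taken from the project data.
movies = {
    "Toy Story": "toy story 1995 animation comedy john_lasseter tom_hanks tim_allen",
    "A Bug's Life": "bugs life 1998 animation comedy john_lasseter dave_foley kevin_spacey",
    "Heat": "heat 1995 action crime michael_mann al_pacino robert_de_niro",
}

def count_vector(text):
    """Bag-of-words term counts for one hybrid feature string."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

vectors = {title: count_vector(text) for title, text in movies.items()}

def recommend(title, top_n=15):
    """Return up to top_n (title, score) pairs, most similar first."""
    scores = [
        (other, cosine_similarity(vectors[title], vec))
        for other, vec in vectors.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

print(recommend("Toy Story", top_n=2))
```

Because the score depends only on the movies' features, the ranking for a given title is identical for every user, which matches the behaviour described above.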

Approach 2 - Collaborative Filtering

Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences from many users. The collaborative filtering model attempts to predict how much a user will like each movie, and recommends movies accordingly, by considering either user-user similarity or movie-movie similarity.
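As a rough illustration of the user-user variant, the sketch below predicts a missing rating from the mean-centred ratings of similar users, weighted by Pearson correlation. It is a simplified, standard-library stand-in and not the Surprise implementation; the ratings dictionary and the function names are hypothetical.

```python
import math

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}.
ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob": {"m1": 4, "m2": 5, "m3": 2, "m4": 5},
    "carol": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
}

def mean(user):
    """Average of all ratings given by one user."""
    values = ratings[user].values()
    return sum(values) / len(values)

def pearson(u, v):
    """Pearson correlation over the movies both users have rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if len(common) < 2:
        return 0.0
    mu_u = sum(ratings[u][m] for m in common) / len(common)
    mu_v = sum(ratings[v][m] for m in common) / len(common)
    num = sum((ratings[u][m] - mu_u) * (ratings[v][m] - mu_v) for m in common)
    den = math.sqrt(sum((ratings[u][m] - mu_u) ** 2 for m in common)) * math.sqrt(
        sum((ratings[v][m] - mu_v) ** 2 for m in common)
    )
    return num / den if den else 0.0

def predict(user, movie):
    """User's mean plus the similarity-weighted deviations of the neighbours."""
    num = den = 0.0
    for other in ratings:
        if other == user or movie not in ratings[other]:
            continue
        sim = pearson(user, other)
        num += sim * (ratings[other][movie] - mean(other))
        den += abs(sim)
    base = mean(user)
    return base + num / den if den else base

print(round(predict("alice", "m4"), 2))
```

In this toy data, alice agrees with bob and disagrees with carol, so the prediction for "m4" is pulled above alice's own average. The movie-movie variant applies the same idea with the roles of users and movies swapped.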

The Surprise (Simple Python RecommendatIon System Engine) library was used for collaborative filtering. The results of running different collaborative filtering algorithms are documented in the table below.

Algorithm                          | Mean RMSE | Mean MAE | Mean fit time | Mean test time
SVD                                | 0.9358    | 0.7375   | 11.38         | 0.45
KNN Basic (pearson baseline)       | 1.0005    | 0.7917   | 5.65          | 10.91
KNN Basic (MSD)                    | 0.979     | 0.7731   | 1.23          | 8.47
KNN Basic (cosine)                 | 1.0174    | 0.8045   | 4.41          | 9.26
KNN with means (pearson baseline)  | 0.9382    | 0.731    | 4.5           | 8.74
KNN with means (MSD)               | 0.9502    | 0.7486   | 1.34          | 9.71
KNN with means (cosine)            | 0.9556    | 0.7546   | 4             | 8.55

We chose SVD as our collaborative filtering algorithm because, across the 5 folds, it had the lowest mean test time and the lowest mean RMSE, with a mean MAE second only to KNN with means (pearson baseline).
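For reference, the two error metrics used in this comparison can be computed as in the sketch below. The per-fold ratings and predictions are made-up numbers used only to show the arithmetic, and the averaging over folds is an assumption about how the "mean" columns were obtained.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: penalises large errors more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: average size of the error in rating points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical held-out ratings and model predictions for two folds.
folds = [
    ([4, 3, 5, 2], [3.5, 3.0, 4.0, 3.0]),
    ([1, 5, 4, 4], [2.0, 4.5, 4.0, 3.5]),
]

# Average each metric over the folds.
mean_rmse = sum(rmse(a, p) for a, p in folds) / len(folds)
mean_mae = sum(mae(a, p) for a, p in folds) / len(folds)

print(f"mean RMSE = {mean_rmse:.4f}, mean MAE = {mean_mae:.4f}")
```

Since RMSE squares each error before averaging, an algorithm can rank first on RMSE without ranking first on MAE.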

Approach 3 - Restricted Boltzmann Machine

The fundamental idea here is to use one RBM per user, with weights shared between users who rate the same movies. Every RBM has the same number of hidden units, but each RBM has active softmax visible units only for the items rated by that user. If two users have rated the same movie, their two RBMs must use the same weights between the softmax unit for that movie and the hidden units. To obtain a binary representation, each movie a user has rated is given k nodes in that user's RBM, one for each rating value from 1 to k. Each node is activated or deactivated depending on whether the user's rating matches the value it represents. It has been shown that RBMs slightly outperform carefully tuned SVD models. In our case, a two-layer undirected neural network was used as the RBM.
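The binary mapping of ratings onto visible units can be sketched as follows. This covers only the input encoding, not RBM training, and it assumes k = 5 to match the 1 to 5 star scale of MovieLens 100k; the function names `encode_user` and `decode` are hypothetical.

```python
K = 5  # assumed rating scale 1..K

def encode_user(user_ratings, k=K):
    """Map {movie_id: rating} to {movie_id: one-hot list of k binary units}.

    Only rated movies get visible units, so unrated movies are simply absent.
    """
    visible = {}
    for movie_id, rating in user_ratings.items():
        if not 1 <= rating <= k:
            raise ValueError(f"rating {rating} is outside 1..{k}")
        units = [0] * k
        units[rating - 1] = 1  # activate only the node for this rating value
        visible[movie_id] = units
    return visible

def decode(units):
    """Turn k unit activations (or probabilities) back into a rating."""
    return units.index(max(units)) + 1

# Hypothetical user who rated movie 10 with 4 stars and movie 42 with 1 star.
print(encode_user({10: 4, 42: 1}))
```

Two users who both rated movie 10 would each have these five units for it, and those units would be connected to the hidden layer through the same shared weights.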

Authors

Contributors: sumanthvrao, sumedhpb, surajaralihalli

moviebuddy's Issues

Front End

Help build a good front end for the model.
Language: your choice.
The front end must support all features: adding users, suggesting movies, and suggesting users.

Visualization + statistics

We need to:

  1. Clean and pre-process the data.
  2. Model the data.
  3. Evaluate the model by running basic tests on it.

Build a model

  • Build a recommendation model with the data.
  • Try multiple approaches (collaborative, content-based, knowledge-based, hybrid).

Literature Survey

Submit the literature survey report and update the GitHub repository with the first round of results (including any cleaning, pre-processing, etc.).
