recsys.jl's Introduction

RecSys

RecSys.jl is an implementation of the ALS-WR algorithm from "Yunhong Zhou, Dennis Wilkinson, Robert Schreiber and Rong Pan. Large-Scale Parallel Collaborative Filtering for the Netflix Prize. Proceedings of the 4th international conference on Algorithmic Aspects in Information and Management. Shanghai, China pp. 337-348, 2008"

Usage

  • Install: Pkg.clone("https://github.com/abhijithch/RecSys.jl.git")
  • Specify the training dataset in one of several ways:
    • Use a delimited (CSV) file with columns: user_id, item_id, rating. E.g.: trainingset = DlmFile("ratings.csv", ',', true).
    • Use a MAT file, specifying the file and entry name. E.g.: trainingset = MatFile("ratings.mat", "training")
    • Provide an implementation of FileSpec for any other format.
  • Initialize: als = ALSWR(trainingset)
  • Train: train(als, num_iterations, num_factors, lambda)
  • Check model quality:
    • rmse(als) to check against training dataset
    • rmse(als, testdataset) to check against a test dataset
    • Repeat training with different parameters until the results are satisfactory
  • Save model: save(als, filename)
  • Load model: als = load(filename)
  • Get recommendations:
    • recommend(als, user_id) for an existing user
    • recommend(als, user_ratings) for a new/anonymous user
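
To make the train-then-check-RMSE loop above concrete, here is a minimal dense sketch of the ALS-WR update on a toy dataset. It is not RecSys.jl's implementation: the names als_wr, solve_side! and toy_rmse, the toy ratings, and the default parameter values are all assumptions made for illustration. Each half-step solves a regularized least-squares problem per user (or item), with the regularization weighted by that user's (or item's) number of ratings.

```julia
using LinearAlgebra, Random

# Toy (user, item, rating) triples; hypothetical data for illustration only.
ratings = [(1, 1, 5.0), (1, 2, 3.0), (2, 1, 4.0),
           (2, 3, 1.0), (3, 2, 2.0), (3, 3, 5.0)]

# Re-solve every column of X while Y is held fixed.
# obs[idx] lists the (column of Y, rating) pairs observed for column idx of X.
function solve_side!(X, Y, obs, lambda)
    for (idx, entries) in enumerate(obs)
        isempty(entries) && continue
        cols = [e[1] for e in entries]
        r = [e[2] for e in entries]
        Ysub = Y[:, cols]
        # Weighted-lambda regularization: lambda scaled by the rating count.
        A = Ysub * Ysub' + lambda * length(entries) * I
        X[:, idx] = A \ (Ysub * r)
    end
    return X
end

# Alternate between user factors U and item factors M.
function als_wr(ratings, nusers, nitems; nfactors=2, lambda=0.1, niters=10,
                rng=MersenneTwister(0))
    U = rand(rng, nfactors, nusers)
    M = rand(rng, nfactors, nitems)
    by_user = [[(j, r) for (i, j, r) in ratings if i == u] for u in 1:nusers]
    by_item = [[(i, r) for (i, j, r) in ratings if j == m] for m in 1:nitems]
    for _ in 1:niters
        solve_side!(U, M, by_user, lambda)
        solve_side!(M, U, by_item, lambda)
    end
    return U, M
end

# Root-mean-square error over the observed ratings.
toy_rmse(U, M, ratings) =
    sqrt(sum((r - dot(U[:, i], M[:, j]))^2 for (i, j, r) in ratings) / length(ratings))

U, M = als_wr(ratings, 3, 3)
println("training RMSE: ", toy_rmse(U, M, ratings))
```

The random generator is seeded explicitly so that repeated runs with the same parameters start from the same initial factors.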

Examples

See the examples directory for more details.

recsys.jl's People

Contributors

abhijithch, domarps, lkuper, raghuch, shashi, skylion007, suranah, tanmaykm, thirumalakiran, viralbshah

recsys.jl's Issues

Set max and min limit

We don't want to recommend ratings greater than 5 or less than 1 for movie ratings.
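
One simple way to enforce such limits is to clamp the predicted ratings after they are computed. The sketch below is only an illustration: clamp_ratings and the sample predictions are hypothetical and not part of RecSys.jl, while clamp itself is a standard Julia Base function.

```julia
# Hypothetical raw predictions, some of which fall outside the 1-5 scale.
raw = [0.3, 2.7, 5.8, 4.1]

# Clamp every prediction into [lo, hi]; the defaults match a 1-5 rating scale.
clamp_ratings(p; lo=1.0, hi=5.0) = clamp.(p, lo, hi)

clamped = clamp_ratings(raw)
println(clamped)
```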

RecSys TODO - Medium Term

API & Interfaces

  • Read from various file formats
  • Output results to different formats
  • Functions to test and benchmark the quality of recommendation system
  • Cross-fold datasets and testing

New Recommendation Algorithms & Techniques

  • SGD matrix factorization
  • Neighborhood methods
  • Functions for assessing user and item similarity

Misc Tasks

  • Add incremental factorization / partial co-learning methods for user update/insert calls
  • Iterate over Jupyter demo
  • Iterate over the toy dataset

Jupyter Demo

Focus on those with limited or no knowledge of recommender systems. Explain the basics of recommender systems, how to use our package, and some interesting demos and features. Use the toy dataset for the basics.

Feature Request - Allow for Implicit Variant

So I was wondering whether any work has been done on supporting implicit feedback in the ALS-WR algorithm. I have noticed that people have, for instance, gotten better results with hybrid approaches that use an "alpha" confidence value and evaluate with AUC and precision-based metrics instead of plain ALS-WR. I might consider trying to implement this myself, but it would be nice to know if any progress has been made on this issue.
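
For context, one widely used formulation of the "alpha" confidence idea (from Hu, Koren and Volinsky's work on implicit-feedback collaborative filtering) turns each raw interaction count r into a binary preference p = 1 if r > 0 and a confidence c = 1 + alpha * r. The sketch below only shows that transformation, not a full solver. The function names, the sample counts, and the value of alpha are assumptions for illustration and are not part of RecSys.jl.

```julia
# Hypothetical interaction counts (rows: users, columns: items); 0 means no interaction.
counts = [0.0 3.0;
          1.0 0.0]

alpha = 40.0  # illustrative value; in practice this is a tuned hyperparameter

# Binary preference: 1 where the user interacted with the item at all.
preference(r) = Float64.(r .> 0)

# Confidence grows linearly with the observed count.
confidence(r, alpha) = 1 .+ alpha .* r

p = preference(counts)
c = confidence(counts, alpha)
println(p)
println(c)
```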

Generating Test Dataset Causes MSYNC Error

So I tried calling the generate_test_data function on Julia 0.4.6 and I received the following error:

ERROR: SystemError: msync: Invalid argument
in sync! at mmap.jl:206
in save at /home/$USER/.julia/v0.4/RecSys/src/chunks/matrix.jl:106
in save at /home/$USER/.julia/v0.4/Blobs/src/blob.jl:342
etc..

I am really confused about what could be causing this error. It seems to be a bad implementation in the save function of chunks/matrix.jl, if I am not mistaken. I am running CentOS, so msync should be supported by the Linux kernel. Is there a change in mmap from 0.4 to 0.5 that could be causing the error?

Issues Week 1, June 2015

  • Rename RecSys.jl
  • Sanitize
  • Relative Path
  • Internal data storage
  • Incremental SVD, Real Time
  • Combine two Factorization Techniques

Keywords: SGD, Neighbourhood Methods, Seasonality

  • Fine tune the regularization parameters
  • Write terse/compact code
  • Test larger datasets
  • Facility for users to rate

RecSys I/O TODO - June 2015

  • create code skeleton for I/O
  • add read input for various formats, output as a DataFrame with common structure as Internal Representation (IR)
  • create sparse matrix for SVD from IR
  • wrapper functions
  • evaluation and validation using MLBase
  • adding I/O macros

Feature Request: ALS GPU Acceleration using CUBLAS or OpenCL

A welcome addition to this repository would be to offload the ALS operations to the GPU. Several libraries allow Julia to offload many of these operations to the GPU. Furthermore, NVIDIA has recently released an ALS implementation in CUDA. Adding GPU support would significantly speed up the training process.

README.md is outdated

Following the README:

trainingset = DlmFile("data3.csv"; dlm=',', header=true, quotes=false)
als = ALSWR(trainingset)

LoadError: MethodError: Cannot `convert` an object of type RecSys.DlmFile to an object of type RecSys.ALSWR{TP<:RecSys.Parallelism,TI<:RecSys.Inputs,TM<:RecSys.Model}
This may have arisen from a call to the constructor RecSys.ALSWR{TP<:RecSys.Parallelism,TI<:RecSys.Inputs,TM<:RecSys.Model}(...),
since type constructors fall back to convert methods.
while loading In[32], in expression starting on line 1

 in RecSys.ALSWR{TP<:RecSys.Parallelism,TI<:RecSys.Inputs,TM<:RecSys.Model}(::RecSys.DlmFile) at ./sysimg.jl:53

Has the API changed?

Possible Bug with Blobs recommendation & Some Instability

So I've been messing around with the Blob method of parallelization since I have a rather large dataset, and I seem to have found a minor bug. When I call recommend on an ALSWR model trained with 20 iterations and 20 factors, I get this error in the recommendation function.

ERROR: DimensionMismatch("new dimensions (1,20) must be consistent with array size 10")
 in reshape(::Array{Float64,1}, ::Tuple{Int64,Int64}) at .\array.jl:113
 in #recommend#10(::Bool, ::Int64, ::Function, ::RecSys.ALSWR{RecSys.ParBlob,RecSys.DistInputs,RecSys.DistModel}, ::Int64) at C:\Users\Skylion\.julia\v0.5\RecSys\src\als-wr.jl:142
 in recommend(::RecSys.ALSWR{RecSys.ParBlob,RecSys.DistInputs,RecSys.DistModel}, ::Int64) at C:\Users\Skylion\.julia\v0.5\RecSys\src\als-wr.jl:130
 in #recommend#32(::Array{Any,1}, ::Function, ::MovieRec, ::Int64, ::Vararg{Int64,N}) at C:\Users\Skylion\Documents\MalDump Data\maldump2\julia2\ALSAnime.jl:46
 in test_chunks(::String, ::String) at C:\Users\Skylion\Documents\MalDump Data\maldump2\julia2\ALSAnime.jl:150

On a side note, I've noticed the training of this library seems a little unstable. If I run the same parameters twice, I am likely to get two VERY different RMSEs (almost as if one doesn't converge). The odd thing is that the hyperparameters are the exact same. I wonder if there is an unsafe update of shared memory somewhere.

I will say that my matrix is actually relatively dense:
85078124 ratings in (1319751,4557) sized sparse matrix
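
As a quick check of the numbers above, the fill ratio can be computed directly from the reported counts. The variable names below are chosen for illustration.

```julia
# Counts taken from the report above.
nratings = 85_078_124
nusers, nitems = 1_319_751, 4_557

# Fraction of user-item cells that hold a rating.
density = nratings / (nusers * nitems)
println(round(100 * density, digits=2), "% of cells are filled")
```

This works out to roughly 1.4% of all cells.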

I have been trying to debug this issue for a few months in my free time, but have yet to figure out what the issue could be. It could also be something weird like my CSVs aren't formatted properly. Does the order of the ratings or movies matter?

Error executing examples/demo on Windows

When attempting to start Escher and then execute the demo located in "RecSys/examples/demo" from a Julia session on a Windows machine, I received the error message shown below. The location of the unrecognized keyword argument "selected" that is shown in the stack trace seems to be associated with this line. I will work on determining a fix.

I am curious if anyone has previously attempted running this demo on Windows, and if so was this same error observed?

Microsoft Windows [Version 6.2.9200]
(c) 2012 Microsoft Corporation. All rights reserved.

C:\Users\Administrator>julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-pre+7113 (2015-08-31 17:47 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 092bcf4 (9 days old master)
|__/                   |  i686-w64-mingw32

julia> cd("C:\\Users\\Administrator\\.julia\\v0.4\\RecSys\\examples\\demo")

julia> using Escher

julia> include(Pkg.dir("Escher", "src", "cli", "serve.jl"))
WARNING: This version of the GnuTLS library (3.2.15) is deprecated
and contains known security vulnerabilities. Please upgrade to a
more recent version.
escher_serve (generic function with 3 methods)

julia> include(Pkg.dir("Escher", "src", "cli", "compile.jl"))
escher_make (generic function with 1 method)

julia> escher_serve(5555,"C:\\Users\\Administrator\\.julia\\v0.4\\RecSys\\examples\\demo")
Listening on 0.0.0.0:5555... #At this point I opened a browser window to http://localhost:5555
943  
1682
(943,)
unrecognized keyword argument "selected"
 in anonymous at C:\Users\Administrator\.julia\v0.4\Escher\src\cli\serve.jl:169
 in anonymous at C:\Users\Administrator\.julia\v0.4\Mux\src\Mux.jl:15
 in anonymous at C:\Users\Administrator\.julia\v0.4\Mux\src\Mux.jl:8
 in splitquery at C:\Users\Administrator\.julia\v0.4\Mux\src\basics.jl:28
 in anonymous at C:\Users\Administrator\.julia\v0.4\Mux\src\Mux.jl:8
 in wcatch at C:\Users\Administrator\.julia\v0.4\Mux\src\websockets_integration.jl:12
 in anonymous at C:\Users\Administrator\.julia\v0.4\Mux\src\Mux.jl:8
 in todict at C:\Users\Administrator\.julia\v0.4\Mux\src\basics.jl:21
 in anonymous at C:\Users\Administrator\.julia\v0.4\Mux\src\Mux.jl:12 (repeats 2 times)
 in anonymous at C:\Users\Administrator\.julia\v0.4\Mux\src\Mux.jl:8
 in anonymous at C:\Users\Administrator\.julia\v0.4\Mux\src\server.jl:38
 in handle at C:\Users\Administrator\.julia\v0.4\WebSockets\src\WebSockets.jl:354
 in on_message_complete at C:\Users\Administrator\.julia\v0.4\HttpServer\src\HttpServer.jl:364
 in on_message_complete at C:\Users\Administrator\.julia\v0.4\HttpServer\src\RequestParser.jl:103
 in http_parser_execute at C:\Users\Administrator\.julia\v0.4\HttpParser\src\HttpParser.jl:92
 in run at C:\Users\Administrator\.julia\v0.4\HttpServer\src\HttpServer.jl:310
 in anonymous at task.jl:447 (repeats 2 times)

TypeError SharedArrays in ensure_loaded

So since I can't use blobs without using Julia 0.5, I decided to try loading the test dataset from a CSV I split off from the training data. However, when I call rmse(model, DlmFile(...)), I get a type error because a function expects an Array but receives a SharedArray. I am not sure where this function is, but here is the stack trace:

ERROR: TypeError: ensure_loaded: in typeassert, expected Array{Int64,1}, got SharedArray{Int64,1}
 in rmse at .../.julia/v0.4/RecSys/src/als-wr.jl:105
 in rmse at MYFILE.jl:45 #(This line is the following: rmse(movierec::MovieRec, args...; kwargs...) = rmse(movierec.als, args...; kwargs...))
 in test_rmse at MYFILE.jl:118

This could be easily rectified by ensuring sdata is called on the array, but I am having a little trouble following the stack trace. It's probably a quick fix, but any help would be appreciated.
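
The sdata idea mentioned above can be sketched in isolation. This example targets current Julia, where SharedArrays is a standard library (on the 0.4 series in the report, SharedArray lived in Base), and it does not touch RecSys.jl itself. The array contents are hypothetical.

```julia
using SharedArrays

# A small shared vector standing in for the ids that reach ensure_loaded.
shared_ids = SharedArray{Int64}((5,))
for i in 1:5
    shared_ids[i] = i
end

# sdata returns the plain Array backing the SharedArray,
# which satisfies an Array{Int64,1} type assertion.
plain_ids = sdata(shared_ids)
println(typeof(plain_ids))
```

Note that sdata returns the backing array rather than a copy, so writes to plain_ids are visible through shared_ids as well.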

Escher bugfix

From @shashi

diff --git a/examples/demo/index.jl b/examples/demo/index.jl
index b54d888..a66485e 100644
--- a/examples/demo/index.jl
+++ b/examples/demo/index.jl
@@ -33,7 +33,7 @@ function main(window)
             title(2, "Top $n recommendations for $(users[user])"),
             vskip(2em),
             intersperse(vbox(vskip(1em), hline(), vskip(1em)),
-                map(showmovie, recommend(user, n))),
+                map(showmovie, recommend(user, n)))...,
         ) |> pad(1em)
     end
 end
