Giter VIP home page Giter VIP logo

netflix-prize's Introduction

My Code for the Netflix Prize

I'm not aware of folks having published their code for the Netflix Prize. Here's mine.
Under the team name "Hi!", I competed alone in college. I did it mostly for fun, and to learn modern machine learning techniques. It was an incredibly valuable, but strenuous, time. Well worth it on all fronts, though.
I peaked out at #45 or so, and then dropped out to work on my senior thesis, and came in #145 or so.
What I learned in the process was that smarter wasn't always better -- make an algorithm, and then scale it up, and then make a dozen tweaks to it, and then average all of the results together. That's how you climbed the leaderboard.

Anyhoo, I haven't touched this code in awhile, but perhaps it'll be useful to folks interested in competitive data mining.
Specifically, the lessons I learned:

  • Get the raw data into a saved and manageable format fast. The easier it is to load your data in and start mutating it, the better.
  • If doing simple pivots on your data is hard, and slows you down from visualizing whats in your data, spend time making data structures which make that easy.
  • Generalize. Iterate. If you have a method you think will work, but it has a lot of knobs, and you don't know the best way to set those knobs, make it easy for you to try every possible iteration. There is often not a good way to figure out what the best approach is. You will have to try many of them in order to build up an intuition. Specifically, that means (for me) a pluggable architecture. If there's ten ways to try a particular step, make sure you write your overarching algorithm so that it takes a function that you can pass to it, as opposed to having a method hardwired in the code. That way, you can hotswap all your ideas.
  • Speed is a feature. Of course you make sure it works first. But your goal is to see if something works. If an algorithm takes a day to run, but you can spend five hours making it run in 1/3 of a day, do it. You'll be running it over and over again, and you'll learn more if you can iterate.

As for the technical nitty-gritty, everything that's speed sensitive is written in Cython, which was the best balance of speed and convenience in 2009. If I were to do it al again, I would use (Numba)[http://github.com/numba/numba].

The original data is gone, I believe, but I might have it stored somewhere. I'll look for that.

netflix-prize's People

Contributors

alexbw avatar kencochrane avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

netflix-prize's Issues

How do you actually run the code?

I'm new to machine learning and wanted to see an example of a recommendation system running, but I am not sure how to actually run your code. Thanks for putting this up online, and thanks for the help!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.