average-ratings-spark's Introduction

MovieLens_AverageRatings

Data Mining for average movie/tag ratings

Getting Started

Following these instructions will familiarize you with data mining on large datasets. The open source datasets are available from MovieLens | GroupLens. This repository uses the MovieLens 20M Dataset and the MovieLens Latest Datasets. Both datasets provide a movieID and the rating records for each movie. Each movie can also be categorized by tags. The goal is to compute average ratings with Spark (PySpark), once per movieID and once per tag.

Data Mining

  • Task1 - find average movie ratings

  • Task2 - find average tag ratings
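Both tasks follow the same pattern: group rating values by a key (movieID or tag), keep a running sum and count per key, then divide. The sketch below shows that pattern in plain Python, without Spark, so the arithmetic is easy to check by hand. The function name `average_by_key` and the sample records are illustrative assumptions and do not come from the repository's scripts.

```python
from collections import defaultdict


def average_by_key(pairs):
    """Return {key: mean of values} for an iterable of (key, value) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for key, value in pairs:
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical (movieID, rating) records, for illustration only.
sample = [(1, 4.0), (1, 5.0), (2, 3.0)]
averages = average_by_key(sample)
print(averages)  # {1: 4.5, 2: 3.0}
```

In PySpark, the same (sum, count) pair is what would typically be carried through `reduceByKey` or `aggregateByKey`, which lets the average be computed without collecting all ratings for a key in one place.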

How to run my program

Put the MovieLens datasets and the two Python scripts inside the Spark folder. The scripts use a relative path (for example: "ml-latest-small/ratings.csv"), so the program reads the file from there when it calls “sc.textFile”. To run a task against the other dataset, simply change the path to “ml-20m/ratings.csv”.
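As described above, the scripts read a hard-coded relative path, so switching datasets means editing the file. One possible alternative, shown here only as a sketch, is to take the path from the command line and fall back to the small dataset. The helper `ratings_path` and the constant `DEFAULT_RATINGS_PATH` are hypothetical names and are not part of the original scripts.

```python
import sys

# Example path taken from this README; used when no argument is given.
DEFAULT_RATINGS_PATH = "ml-latest-small/ratings.csv"


def ratings_path(argv):
    """Return argv[1] if a path was passed, otherwise the default path."""
    return argv[1] if len(argv) > 1 else DEFAULT_RATINGS_PATH


# Inside a script, the result would be handed to Spark, for example:
#     lines = sc.textFile(ratings_path(sys.argv))
path = ratings_path(["task.py", "ml-20m/ratings.csv"])
print(path)  # ml-20m/ratings.csv
```

If a script were adapted this way, the dataset could be chosen at launch, since spark-submit passes any arguments after the script name through to the program. In local mode a relative path is resolved against the directory the command is run from, which is why the datasets need to sit inside the Spark folder.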

Before testing

  1. Put the source code (.py) and both datasets (ml-20m / ml-latest-small) inside the Spark folder

  2. Follow the testing steps below

Task1 steps

1. Open the source code (Po-Chuan_Tseng_task1.py).

2. Change the sc.textFile path to match the dataset you want to test.

3. Save the file and open the Terminal on Mac.

4. cd into the Spark folder and run the following command:

./bin/spark-submit Po-Chuan_Tseng_task1.py

5. After the program finishes, a txt file is generated inside the Spark folder.

6. Open the file and check the values.
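For orientation, here is a minimal sketch of how Task1 could be written with the PySpark RDD API. It is not the code in Po-Chuan_Tseng_task1.py. It assumes ratings.csv has a header row followed by rows of the form userId,movieId,rating,timestamp, and the function names are hypothetical. The SparkContext `sc` is passed in by the caller.

```python
def parse_rating(line):
    """Turn 'userId,movieId,rating,timestamp' into (movieId, (rating, 1))."""
    fields = line.split(",")
    return int(fields[1]), (float(fields[2]), 1)


def add_pairs(a, b):
    """Combine two (sum, count) pairs into one."""
    return a[0] + b[0], a[1] + b[1]


def average_movie_ratings(sc, path):
    """Return an RDD of (movieId, average rating), sorted by movieId."""
    lines = sc.textFile(path)
    header = lines.first()  # assumed header row: userId,movieId,rating,timestamp
    return (
        lines.filter(lambda line: line != header)
        .map(parse_rating)
        .reduceByKey(add_pairs)  # per-movie (sum of ratings, number of ratings)
        .mapValues(lambda pair: pair[0] / pair[1])
        .sortByKey()
    )
```

A caller would create the context with `SparkContext(appName=...)` from the `pyspark` package, call `average_movie_ratings(sc, "ml-latest-small/ratings.csv")`, and write the result out, for example with `saveAsTextFile`. How the original script produces its txt file is not documented here.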

Task2 steps

1. Open the source code (Po-Chuan_Tseng_task2.py).

2. Change the sc.textFile path to match the dataset you want to test.

3. Save the file and open the Terminal on Mac.

4. cd into the Spark folder and run the following command:

./bin/spark-submit Po-Chuan_Tseng_task2.py

5. After the program finishes, a csv file is generated inside the Spark folder.

6. Open the file with TextEdit.app and check the values.
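Task2 needs both the tags and the ratings, and this README does not say how the original script links them. The sketch below takes one plausible reading, stated as an assumption: a tag's average is the mean of all ratings given to the movies that carry that tag. It assumes tags.csv rows have the form userId,movieId,tag,timestamp and ratings.csv rows the form userId,movieId,rating,timestamp, each file starting with a header row. All function names are hypothetical.

```python
import csv


def parse_tag(line):
    """Turn a tags.csv row into (movieId, tag); csv handles quoted commas."""
    fields = next(csv.reader([line]))
    return int(fields[1]), fields[2]


def parse_movie_rating(line):
    """Turn a ratings.csv row into (movieId, rating)."""
    fields = line.split(",")
    return int(fields[1]), float(fields[2])


def drop_header(rdd):
    """Remove the first line (assumed to be the CSV header) from an RDD."""
    header = rdd.first()
    return rdd.filter(lambda line: line != header)


def average_tag_ratings(sc, tags_path, ratings_path):
    """Return an RDD of (tag, average rating), sorted by tag."""
    # distinct() keeps one (movieId, tag) pair even if several users applied it,
    # so a movie's ratings are counted once per tag (a design assumption).
    tags = drop_header(sc.textFile(tags_path)).map(parse_tag).distinct()
    ratings = drop_header(sc.textFile(ratings_path)).map(parse_movie_rating)
    return (
        tags.join(ratings)  # (movieId, (tag, rating))
        .map(lambda kv: (kv[1][0], (kv[1][1], 1)))
        .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
        .mapValues(lambda pair: pair[0] / pair[1])
        .sortByKey()
    )
```

The `csv` module is used for tags because free-text tags may contain commas inside quotes, which a plain `split(",")` would break apart. Joining on movieId on the full 20M dataset produces a large intermediate result, so a run on ml-latest-small is a sensible first check.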

Credits

This repository originated as a course project for INF553 at USC.
