MovieLens_AverageRatings

Data Mining for average movie/tag ratings

Getting Started

Following these instructions will familiarize you with data mining on large datasets. The open-source datasets are available from MovieLens | GroupLens. This repository uses the MovieLens 20M Dataset and the MovieLens Latest Datasets. Both datasets provide a movieID and the rating records for each movie. Each movie can also be categorized by tags. The goal is to compute average ratings with Spark (PySpark), per movieID and per tag, separately.

Data Mining

Task1 - find average movie ratings
Task2 - find average tag ratings
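As a rough illustration of Task1, the sketch below averages ratings per movieID by carrying a (sum, count) pair for each key. It is not the code in Po-Chuan_Tseng_task1.py. It assumes ratings.csv has the columns userId,movieId,rating,timestamp with a header row, and the function names (parse_rating, add_pairs, average_movie_ratings, average_locally) are hypothetical.

```python
def parse_rating(line):
    """Turn 'userId,movieId,rating,timestamp' into (movieId, (rating, 1))."""
    fields = line.split(",")
    return int(fields[1]), (float(fields[2]), 1)


def add_pairs(a, b):
    """Combine two (sum, count) pairs."""
    return a[0] + b[0], a[1] + b[1]


def average_movie_ratings(sc, path):
    """Spark version: `sc` is an existing SparkContext (assumed)."""
    lines = sc.textFile(path)
    header = lines.first()  # assumed header row, skipped below
    return (lines.filter(lambda line: line != header)
                 .map(parse_rating)
                 .reduceByKey(add_pairs)
                 .mapValues(lambda total: total[0] / total[1])
                 .sortByKey())


def average_locally(lines):
    """Same logic without Spark, handy for checking a few rows by hand."""
    totals = {}
    for line in lines:
        key, pair = parse_rating(line)
        totals[key] = add_pairs(totals.get(key, (0.0, 0)), pair)
    return {key: s / n for key, (s, n) in sorted(totals.items())}
```

Keeping a (sum, count) pair lets Spark merge partial results from each partition before dividing, which avoids collecting every rating for a movie in one place.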

How to run my program

Put the MovieLens datasets and the two Python scripts inside the Spark folder. The scripts use a relative path (for example, "ml-latest-small/ratings.csv"), which the program reads when it calls sc.textFile. To test against the other dataset, change the path to "ml-20m/ratings.csv".
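The scripts in this repository expect the path to be edited in the source. As an alternative, one could read the path from the command line, since spark-submit passes any arguments after the script name through to the script. A minimal sketch, assuming a hypothetical helper ratings_path and the small dataset as the default:

```python
import sys

# Default taken from the example path in this README.
DEFAULT_RATINGS = "ml-latest-small/ratings.csv"


def ratings_path(argv):
    """Return the first command-line argument, or the default path if none is given."""
    return argv[1] if len(argv) > 1 else DEFAULT_RATINGS


if __name__ == "__main__":
    # In a real script this value would be handed to sc.textFile(...).
    print(ratings_path(sys.argv))
```

With this change the dataset could be chosen at run time, for example by appending ml-20m/ratings.csv after the script name in the spark-submit command.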

Before testing

Put the source code (.py) and both datasets (ml-20m / ml-latest-small) inside the Spark folder.
Then follow the testing steps below.
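Before running anything, it can help to confirm that the files are where the scripts expect them. The check below is only a sketch: missing_files is a hypothetical helper, and the list covers the scripts and ratings files named in this README (Task2 presumably also needs a tags file, which is not listed here because the README does not name it).

```python
from pathlib import Path

# Files this README says should sit inside the Spark folder.
EXPECTED = [
    "Po-Chuan_Tseng_task1.py",
    "Po-Chuan_Tseng_task2.py",
    "ml-latest-small/ratings.csv",
    "ml-20m/ratings.csv",
]


def missing_files(spark_folder, expected=EXPECTED):
    """Return the expected relative paths that do not exist under spark_folder."""
    root = Path(spark_folder)
    return [name for name in expected if not (root / name).exists()]
```

Calling missing_files(".") from inside the Spark folder returns an empty list when everything is in place.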

Task1 step

1. Open the source code (Po-Chuan_Tseng_task1.py).

2. Change the sc.textFile path to match the dataset you want to test.

3. Save the file and open Terminal on Mac.

4. cd into the Spark folder and run the following command:

./bin/spark-submit Po-Chuan_Tseng_task1.py

5. After the program finishes, a txt file is generated inside the Spark folder.

6. Open the file and check the values.
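One quick way to check the values is a range test: MovieLens ratings are given on a 0.5 to 5.0 star scale, so every average should fall inside that interval. The layout of the generated txt file is not described here, so this sketch takes averages that have already been parsed into a dict; out_of_range is a hypothetical helper.

```python
def out_of_range(averages, low=0.5, high=5.0):
    """Return the entries whose average falls outside the rating scale."""
    return {key: value for key, value in averages.items()
            if not low <= value <= high}
```

An empty result does not prove the averages are correct, but a non-empty one points to a parsing or aggregation bug.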

Task2 step

1. Open the source code (Po-Chuan_Tseng_task2.py).

2. Change the sc.textFile path to match the dataset you want to test.

3. Save the file and open Terminal on Mac.

4. cd into the Spark folder and run the following command:

./bin/spark-submit Po-Chuan_Tseng_task2.py

5. After the program finishes, a csv file is generated inside the Spark folder.

6. Open the file with TextEdit.app and check the values.
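For orientation, one plausible way to compute average tag ratings is to join tags to ratings on movieId and then average per tag. This is a sketch under assumptions, not the logic of Po-Chuan_Tseng_task2.py: it assumes tags.csv has the columns userId,movieId,tag,timestamp, that both files have a header row, and that a tag inherits every rating of the movies it is attached to. All function names are hypothetical.

```python
import csv


def parse_tag(line):
    """Turn 'userId,movieId,tag,timestamp' into (movieId, tag); tags may be quoted."""
    fields = next(csv.reader([line]))
    return int(fields[1]), fields[2]


def parse_movie_rating(line):
    """Turn 'userId,movieId,rating,timestamp' into (movieId, rating)."""
    fields = line.split(",")
    return int(fields[1]), float(fields[2])


def average_tag_ratings(sc, tags_path, ratings_path):
    """Spark version: `sc` is an existing SparkContext (assumed)."""
    def rows(path):
        lines = sc.textFile(path)
        header = lines.first()  # assumed header row
        return lines.filter(lambda line: line != header)

    tags = rows(tags_path).map(parse_tag)                 # (movieId, tag)
    ratings = rows(ratings_path).map(parse_movie_rating)  # (movieId, rating)
    return (tags.join(ratings)                            # (movieId, (tag, rating))
                .map(lambda kv: (kv[1][0], (kv[1][1], 1)))
                .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
                .mapValues(lambda total: total[0] / total[1])
                .sortByKey())


def average_tags_locally(tag_lines, rating_lines):
    """Same join-then-average logic without Spark, for small hand checks."""
    ratings_by_movie = {}
    for line in rating_lines:
        movie_id, rating = parse_movie_rating(line)
        ratings_by_movie.setdefault(movie_id, []).append(rating)
    totals = {}
    for line in tag_lines:
        movie_id, tag = parse_tag(line)
        for rating in ratings_by_movie.get(movie_id, []):
            s, n = totals.get(tag, (0.0, 0))
            totals[tag] = (s + rating, n + 1)
    return {tag: s / n for tag, (s, n) in totals.items()}
```

The csv module is used for tags because a tag can contain commas inside quotes, which a plain split(",") would break apart. On the 20M dataset the join can be large, so a different strategy may be preferable there.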

Credits

This repository originated as a course project for INF553 at USC.
