UCSB Movie Ratings Data Science Project

Abstract

The goal of this project is to determine the main factors behind the gap between movie critic reviews and moviegoer ratings. The analysis uses movie characteristics such as genre, synopsis, actors, and runtime, scraped from Rotten Tomatoes. We use the spaCy NLP library to analyze the text of each movie synopsis. With the resulting features, we use the scikit-learn machine learning library to model the gap between audience and critic scores. In addition, the scraped synopses give us an opportunity to use OpenAI's GPT-2 language model to generate synopses for invented movie titles.

Contributors

Trung Bui
Minh Hua
Quang Pham

Motivation

Who do you know that didn't love Forrest Gump? Is this the critics "hating the hype"?

Believe the hype!

Dependencies

pandas library

spaCy NLP library

scikit-learn machine learning library

OpenAI GPT-2 https://github.com/openai/gpt-2

Methodology

Objective: Determine the factors that contribute to the gap between audience and critic scores.

First, data on movie characteristics is scraped from Rotten Tomatoes.
Next, we use pandas to generate features on studio, actors, etc.
Then, we use spaCy to analyze the movie synopses and extract text features.
Finally, using scikit-learn, we determine the leading factors behind the gap in the given data.
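A minimal sketch of the final step, assuming the scraped data is already in a pandas DataFrame. The column names (`critic_score`, `audience_score`, `genre`, `runtime`), the sample rows, and the choice of a random forest are illustrative assumptions, not the project's actual schema or model. The gap is defined here as audience score minus critic score, and features are ranked by the model's importance scores.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical rows; the real project scrapes its data from Rotten Tomatoes.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E", "Movie F"],
    "genre": ["Drama", "Horror", "Comedy", "Horror", "Drama", "Comedy"],
    "runtime": [142, 95, 101, 88, 130, 97],
    "critic_score": [71, 40, 65, 35, 90, 55],
    "audience_score": [95, 62, 70, 60, 88, 75],
})

# Target: positive values mean audiences rated the movie higher than critics.
movies["gap"] = movies["audience_score"] - movies["critic_score"]

# One-hot encode the categorical genre column; runtime stays numeric.
features = pd.get_dummies(movies[["genre", "runtime"]], columns=["genre"], dtype=float)

# Fit a tree ensemble (an assumed model choice) to predict the gap.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(features, movies["gap"])

# Rank features by how much they contribute to the model's splits.
importances = pd.Series(
    model.feature_importances_, index=features.columns
).sort_values(ascending=False)
print(importances)
```

With so few rows the ranking itself is meaningless. The point is only the shape of the pipeline: build the gap column, encode the features, fit a model, then inspect `feature_importances_`.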

Bonus: Using the existing synopses and titles, we generate a synopsis from an invented title, and a title from an invented synopsis.
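One way to prepare the scraped titles and synopses for a GPT-2 style model is to write them as labeled records separated by GPT-2's `<|endoftext|>` token. The sketch below shows that formatting only. The `Title:` and `Synopsis:` labels, the helper names, and the sample record are hypothetical, and the fine-tuning and generation steps themselves are not shown.

```python
END_OF_TEXT = "<|endoftext|>"  # token GPT-2 uses to separate documents


def format_record(title, synopsis, title_first=True):
    """Format one movie as a training record (labels are an assumed convention)."""
    title_line = f"Title: {title.strip()}"
    synopsis_line = f"Synopsis: {synopsis.strip()}"
    # title_first=True trains title -> synopsis; False trains synopsis -> title.
    lines = [title_line, synopsis_line] if title_first else [synopsis_line, title_line]
    return "\n".join(lines) + "\n" + END_OF_TEXT + "\n"


def make_prompt(title):
    """Build a generation prompt that asks the model to continue with a synopsis."""
    return f"Title: {title.strip()}\nSynopsis:"


# Hypothetical scraped records.
records = [
    ("The Last Tomato", "A farmer defends the final crop of the season."),
    ("Paste", "Two chefs compete over a secret sauce recipe."),
]

corpus = "".join(format_record(title, synopsis) for title, synopsis in records)
print(corpus)
print(make_prompt("An Invented Title"))
```

Passing `title_first=False` to `format_record` produces the reversed layout, which covers the "vice versa" case of generating a title from a synopsis.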

Graphing Data

This is a scatterplot of data generated from a day's worth of images taken from 10:00 to 22:00 in 24-hour time (10:00 am to 10:00 pm).

A cleaner representation of the data is a histogram with time as the variable and the number of people as weights.
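To illustrate what "people as weights" means, here is a small sketch of a weighted histogram: each observation adds its people count, rather than 1, to its hourly bin. The observations and the function name are hypothetical, and the bins cover 10:00 up to (but not including) 22:00.

```python
# Hypothetical (time of day in hours, people counted) observations.
observations = [(10.25, 3), (10.75, 5), (13.5, 12), (13.9, 8), (21.5, 4)]


def weighted_hourly_histogram(observations, start_hour=10, end_hour=22):
    """Sum people counts into one-hour bins covering [start_hour, end_hour)."""
    totals = {hour: 0 for hour in range(start_hour, end_hour)}
    for time_of_day, people in observations:
        hour = int(time_of_day)  # bin index is the whole hour
        if start_hour <= hour < end_hour:
            totals[hour] += people  # weight the bin by the people count
    return totals


hist = weighted_hourly_histogram(observations)
for hour, total in hist.items():
    print(f"{hour:02d}:00 {'#' * total} ({total})")
```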

Future Work

Develop a method to track individuals and tell who is leaving or entering, in order to improve data accuracy (foveated vision)
Use scraped data on movies and movie reviews to generate movie reviews
Expand the data sample to include writers, actors' previous performances, budget, box office, etc.
