Giter VIP home page Giter VIP logo

oscars-ratings-and-films-oh-py's Introduction

Oscars,Ratings,and Films-Oh Py!

Background

It seems like a bygone activity of yesteryear, but yes people did go to movie theaters. With the advent of streaming services and well Covid, most of us are enjoying film from the comfort of our couch. We have many movie options! How do we know if what we select is “quality” and if something is “quality” is it something the average streamer really wants to watch? We extracted data from an Academy Awards Database and a Streaming Services database to try and determine:

  • Do award winning films actually have high user ratings?
  • Do awards impact film popularity on streaming services?
  • Do award winning directors receive higher ratings?
  • Do films with winning soundtracks have higher popularity?

Extraction

The first dataset was extracted from Kaggle.com in CSV. This data included details on movie titles, user ratings, streaming services supporting the film, director, and release year, genres, and runtime.

The team leveraged DataHub for the second dataset extraction in JSON format. This data set provided details year, award category, winner (T/F), and winner name.

Both datasets were downloaded and imported into Python for cleaning.

Transformation

Summary of Streaming Data Cleanse

  • Drop empty columns
  • Remove rows that did not contain rating information
  • Split First and Second Director, drop any additional director information
  • Split Location cell into Primary and Secondary Location, drop other locations
  • Split Language into Primary and Secondary Language, drop other languages
  • Split Genres into eight separate columns
  • Convert streaming services cells to Booleans

Challenges

The greatest challenge of the Streaming Data was that several cells contained multiple data points separated by commas. Some of these cells were not applicable for the team's analysis, thus the decision was made to drop any additional data elements post the primary and secondary listing. The exception being the genres information which was split and added to eight new genre columns.

Summary of Awards Data Cleanse

  • Identify unique fields;
  • Combine data with similar music categories;
  • Combine data with similar short film categories;
  • Combine data with similarities for cinematography, art direction;
  • Identify winning actors;
  • Remove “year_ceremony” and “ceremony” columns
  • Identify winning best pictures.

Challenges

The greatest challenge with the Awards dataset was that there were several columns containing similar categories. This level of detail was not required for the team’s assessment and therefore the data was merged. As we progressed through the initial dataset, we found that it would be difficult to join to the other dataset, so a new source file was used which listed film titles.

Data Joins

The original data sources did not have a viable unique id to combine into a relation database. A third data source from a CSV file was leveraged that included film name as well as awards to remediate this issue.

Once the data was finalized, a ERD was constructed to establish how the data would be loaded into a relational database. This analysis supported the construction of six tables.
Primary and Foreign keys were identified for data points category_id, nominee_id, film_id.

Load

UNCC Data Analytics Bootcamp Project for ETL- Based on the concept of oscars and successful films on streaming services.

oscars-ratings-and-films-oh-py's People

Contributors

agapeoficarus avatar ljeanparrish avatar znurtokatli avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.