MovieDataProject

General Overview:

The purpose of this analysis is to find trends in movie data that will help Microsoft decide what kind of movies to make at its new studio. Since Microsoft is just getting into the movie business, it needs actionable insights to guide that decision.

Business Problem:

As a business, Microsoft's goal is to make money. Because many attributes contribute to a successful movie, Microsoft needs to know what kind of movie to make, and how to make it, in order to earn a net profit.

This analysis explores that question from three angles:

  1. Runtime
  2. Time of year the movie is released
  3. Production budget

Data Understanding:

The data used for this analysis came from two sources: The Movie Database (TMDB) and The Numbers (TN).

The TMDB data was used to analyze runtime, with a total of 4,766 data points used after cleaning the data. This data was collected from a dataset on Kaggle: https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata

The TN data was used to analyze the movie's premiere month and production budget, with a total of 5,782 data points. This data was collected from a dataset provided by the Flatiron School.

Both TMDB and TN are reliable data sources that provide movie insight based on a variety of factors. The main limitation of this data is that it comes from only two sources, with a total of 10,548 data points.

For the TMDB data, used for evaluating runtime, the three columns used were budget, revenue and runtime; the rest were dropped. There were only 2 null values across these three columns (out of 4,766 data points), so those rows were dropped. Further, 35 data points had a runtime of 0; these were dropped as well, since a runtime of 0 is effectively a null value in this instance. The net profit was then calculated by subtracting the budget from the revenue and plotted against the runtime.
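
The TMDB cleaning steps described above can be sketched in plain Python as follows. This is a minimal illustration rather than the notebook's actual code: the function name `clean_runtime_rows` and the sample rows are hypothetical, and missing values are assumed to appear as `None`.

```python
def clean_runtime_rows(rows):
    """Keep budget, revenue and runtime, drop unusable rows, add net profit."""
    cleaned = []
    for row in rows:
        budget = row.get("budget")
        revenue = row.get("revenue")
        runtime = row.get("runtime")
        # Drop rows with a missing value in any of the three columns.
        if budget is None or revenue is None or runtime is None:
            continue
        # A runtime of 0 is treated the same way as a missing value.
        if runtime == 0:
            continue
        cleaned.append({
            "budget": budget,
            "revenue": revenue,
            "runtime": runtime,
            "net_profit": revenue - budget,
        })
    return cleaned


# Made-up rows, for illustration only (not taken from the TMDB dataset).
sample_rows = [
    {"title": "A", "budget": 100, "revenue": 250, "runtime": 120},
    {"title": "B", "budget": 50, "revenue": 40, "runtime": 0},
    {"title": "C", "budget": 80, "revenue": None, "runtime": 95},
]
cleaned_rows = clean_runtime_rows(sample_rows)
```

Only the first sample row survives here, because the second has a runtime of 0 and the third is missing its revenue.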

For the TN data, used for evaluating release month and production budget, the data had been pre-cleaned for null values and values of 0. I therefore separated the release month from the rest of the date and converted the budget and revenue values from object types to integers. The net profit was then calculated, and the top 100 movies by net profit were pulled out to be compared with the rest of the data for the premiere month analysis. The two graphs were then plotted.
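
A rough sketch of the TN preparation steps is shown below. It rests on several assumptions that the text does not confirm: the column names `release_date`, `production_budget` and `worldwide_gross`, dates formatted like "Dec 18, 2009", and money stored as strings like "$1,000". The helper names and sample rows are hypothetical.

```python
def money_to_int(text):
    """Convert a string such as "$1,000" to the integer 1000 (assumed format)."""
    return int(text.replace("$", "").replace(",", ""))


def prepare_tn_rows(rows):
    """Extract the release month, convert money columns, and add net profit."""
    prepared = []
    for row in rows:
        budget = money_to_int(row["production_budget"])
        gross = money_to_int(row["worldwide_gross"])
        prepared.append({
            # Assumes the month is the first token, e.g. "Dec 18, 2009".
            "release_month": row["release_date"].split()[0],
            "production_budget": budget,
            "worldwide_gross": gross,
            "net_profit": gross - budget,
        })
    return prepared


def split_top_n(rows, n=100):
    """Return (top n rows by net profit, all remaining rows)."""
    ranked = sorted(rows, key=lambda r: r["net_profit"], reverse=True)
    return ranked[:n], ranked[n:]


# Made-up rows, for illustration only.
sample_tn = [
    {"release_date": "Dec 18, 2009", "production_budget": "$1,000", "worldwide_gross": "$5,000"},
    {"release_date": "May 20, 2011", "production_budget": "$2,000", "worldwide_gross": "$2,500"},
    {"release_date": "Jul 4, 2015", "production_budget": "$3,000", "worldwide_gross": "$1,000"},
]
prepared_tn = prepare_tn_rows(sample_tn)
top_rows, other_rows = split_top_n(prepared_tn, n=1)
```

With the real data, `n` would stay at its default of 100 to match the top 100 split described above.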

Data Analysis- Runtime:

Run Time vs Net Profit

Results/ Recommendation:

With a correlation coefficient of 0.225, there is only a weak positive correlation between a movie's runtime and its net profit. Most movies are between 90 and 140 minutes long. However, because runtime does not correlate strongly with a movie's success, Microsoft can make the movie as long as it wants without too much concern.
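
For reference, a correlation coefficient like the one quoted above can be computed with a few lines of plain Python. The sketch assumes the Pearson coefficient is meant (the text does not name the method), and the function name `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Made-up runtimes (minutes) and net profits, for illustration only.
runtimes = [90, 100, 120, 140]
net_profits = [10, 30, 20, 50]
r_runtime = pearson_r(runtimes, net_profits)
```

Values near 0 indicate little linear relationship, while values near 1 or -1 indicate a strong one.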

Data Analysis- Release Month:

Frequency of Movies Premiered Per Month

Results/ Recommendation:

A much higher percentage of the top 100 movies by net profit premiered in May, June, July, November and December, which makes sense given that the summer months and the weeks before the holidays are popular times to go out. The movies not in the top 100 are much more evenly distributed throughout the year. My recommendation is to premiere movies over the summer or before the holidays in order to reach a larger audience.
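
The comparison above relies on each month's share of premieres within a group. A minimal sketch of that calculation follows; the function name `month_shares` and the sample month lists are hypothetical.

```python
from collections import Counter


def month_shares(months):
    """Return each month's share of the premieres in one group."""
    counts = Counter(months)
    total = len(months)
    return {month: count / total for month, count in counts.items()}


# Made-up release months, for illustration only.
top_months = ["May", "May", "Jun", "Dec"]
rest_months = ["Jan", "Feb", "May", "Sep"]
top_shares = month_shares(top_months)
rest_shares = month_shares(rest_months)
```

Comparing shares rather than raw counts keeps the two groups comparable even though the top 100 group is much smaller than the rest of the data.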

Data Analysis- Production Budget:

Production Budget vs Net Profit

Results/ Recommendation:

With a correlation coefficient of 0.608, there is a moderate positive correlation between the production budget and the movie's net profit, although this is not an absolute rule. Once the production budget gets above 50M, the chances of making a higher net profit increase, as we can see from the best fit line.
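
A best fit line of the kind mentioned above can be obtained with ordinary least squares. The sketch below assumes a simple straight-line fit, which the text does not spell out; the function name `fit_line` and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    if spread_x == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / spread_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up budgets and net profits in millions, for illustration only.
budgets_m = [10, 25, 50, 100]
profits_m = [5, 20, 60, 150]
slope, intercept = fit_line(budgets_m, profits_m)
predicted_at_75m = slope * 75 + intercept
```

A positive slope means the fitted net profit rises with the budget, but individual movies can still fall well above or below the line.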

Conclusion:

Runtime: Most movies are between 90 and 140 minutes long. However, because runtime does not correlate strongly with a movie's success, Microsoft can make the movie as long as it wants without too much concern.

Premiere date: A much higher percentage of the top 100 movies by net profit premiered in May, June, July, November and December, whereas the rest of the movies premiered more evenly throughout the year. Therefore, Microsoft should premiere its movies over the summer or before the holidays.

Production budget: There is a positive correlation between the production budget and the movie's net profit, although this is not an absolute rule. Once the production budget gets above 50M, the chances of making a higher net profit increase, as we can see from the best fit line.

Next Steps:

To further analyze how the production budget relates to net profit, it would be helpful to break the production budgets into bins (0-10M, 10-25M, 25-50M, etc.) and compare the mean net profit of each bin.
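
One possible shape for this binning step is sketched below. The first three bins follow the ranges listed above; the open-ended "50M+" bin is an assumption, since the remaining ranges are not specified. The function names and sample rows are hypothetical, and budgets are assumed to be non-negative.

```python
from bisect import bisect_right
from collections import defaultdict

# Lower edge of each bin; the last bin ("50M+") is an assumed catch-all.
BIN_EDGES = [0, 10_000_000, 25_000_000, 50_000_000]
BIN_LABELS = ["0-10M", "10-25M", "25-50M", "50M+"]


def bin_label(budget):
    """Return the label of the bin whose lower edge is the largest one <= budget."""
    return BIN_LABELS[bisect_right(BIN_EDGES, budget) - 1]


def mean_profit_by_bin(rows):
    """Average net profit per budget bin, for the bins that contain any rows."""
    profits = defaultdict(list)
    for row in rows:
        profits[bin_label(row["production_budget"])].append(row["net_profit"])
    return {label: sum(values) / len(values) for label, values in profits.items()}


# Made-up rows, for illustration only.
sample_budget_rows = [
    {"production_budget": 5_000_000, "net_profit": 10},
    {"production_budget": 8_000_000, "net_profit": 20},
    {"production_budget": 30_000_000, "net_profit": 40},
]
mean_by_bin = mean_profit_by_bin(sample_budget_rows)
```

A budget that falls exactly on an edge, such as 10M, is placed in the higher bin in this sketch.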

Another helpful factor to analyze would be which movie genres have the highest net profit.

Lastly, ancillary revenue has a big impact on net profit, so analyzing the ancillary revenue of movies would give Microsoft another helpful data point.

For Additional Information

Project Workbook: https://github.com/julietday422/MovieDataProject/blob/main/ReportNotebook.ipynb

Presentation: https://github.com/julietday422/MovieDataProject/blob/main/presentation.pdf

Images (JPEG and PNG): https://github.com/julietday422/MovieDataProject/tree/main/Images

For any additional questions, please contact Juliet Day at [email protected]
