The purpose of this analysis is to find trends in movie data that will allow Microsoft to decide what kind of movies to make at their new Microsoft studio. Since Microsoft is just getting into the movie business, they need actionable insights to decide what kind of films to create.
As a business, Microsoft's goal is to make money. Because so many factors contribute to a movie's success, Microsoft needs to know what kind of movie to make, and how to make it, in order to earn a net profit.
This analysis explores that question from three angles:
- Runtime
- Time of year the movie is released
- Production budget
The data used for this analysis came from two sources: The Movie Database (TMDB) and The Numbers (TN).
The TMDB data was used to analyze runtime, with a total of 4,766 data points remaining after cleaning. This data was collected from a dataset on Kaggle: https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata
The TN data was used to analyze the movie's premiere month and production budget, with a total of 5,782 data points. This data was collected from a dataset provided by the Flatiron School.
Both TMDB and TN are reliable data sources that provide movie insight based on a variety of factors. The main limitation of this data is that it comes from only two sources, with a total of 10,548 data points.
For the TMDB data, used to evaluate runtime, only three columns were kept (budget, revenue and runtime) and the rest were dropped. There were only 2 null values across these three columns, so the rows containing them were dropped. A further 35 rows had a runtime of 0, which in this dataset effectively means the runtime is missing, so these were dropped as well. Net profit was then calculated by subtracting the budget from the revenue, and this was plotted against runtime.
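The TMDB cleaning steps described above can be sketched in pandas as follows. This is a minimal sketch, not the notebook's actual code: it assumes the Kaggle file uses the column names `budget`, `revenue` and `runtime`, and the helper name `clean_tmdb` and the four-row sample are hypothetical.

```python
import pandas as pd


def clean_tmdb(df: pd.DataFrame) -> pd.DataFrame:
    """Keep budget/revenue/runtime, drop missing runtimes, add net_profit."""
    # Keep only the three columns used in the runtime analysis
    out = df[["budget", "revenue", "runtime"]].copy()
    # Drop rows with a null in any of the three columns
    out = out.dropna()
    # A runtime of 0 minutes really means "unknown", so treat it as missing
    out = out[out["runtime"] != 0]
    # Net profit = revenue minus budget
    out["net_profit"] = out["revenue"] - out["budget"]
    return out


# Hypothetical sample rows (not real TMDB data)
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "budget": [100, 50, 30, 20],
    "revenue": [300, 40, 90, 10],
    "runtime": [120.0, None, 0.0, 95.0],
})
cleaned = clean_tmdb(sample)
print(cleaned)
```

On the real data this would be called as `clean_tmdb(pd.read_csv(path))`, with `path` pointing at the downloaded Kaggle CSV.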
For the TN data, used to evaluate release month and production budget, the data had already been cleaned of null values and values of 0. I therefore extracted the release month from the full release date and converted the budget and revenue values from object (string) types to integers. Net profit was then calculated, and the top 100 movies by net profit were separated out so they could be compared with the rest of the data in the premiere month analysis. The two graphs were then plotted.
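A comparable sketch for the TN preparation steps is below. The column names (`release_date`, `production_budget`, `worldwide_gross`), the "Dec 18, 2009" date format, the "$1,234" money format and the use of worldwide gross as revenue are all assumptions, since the layout of the Flatiron file is not described here; `clean_tn` and the sample rows are hypothetical.

```python
import pandas as pd


def clean_tn(df: pd.DataFrame, top_n: int = 100) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Add release_month and net_profit, then split off the top_n by net profit."""
    out = df.copy()
    # Assumed date format such as "Dec 18, 2009"; keep only the month number
    out["release_month"] = pd.to_datetime(out["release_date"], format="%b %d, %Y").dt.month
    # Assumed money strings such as "$425,000,000": strip "$" and "," then cast
    for col in ["production_budget", "worldwide_gross"]:
        out[col] = out[col].str.replace(r"[$,]", "", regex=True).astype("int64")
    # Assumption: worldwide gross is used as the revenue figure
    out["net_profit"] = out["worldwide_gross"] - out["production_budget"]
    top = out.nlargest(top_n, "net_profit")
    rest = out.drop(top.index)
    return top, rest


# Hypothetical sample rows; top_n=2 stands in for the top 100
sample = pd.DataFrame({
    "release_date": ["Dec 18, 2009", "May 25, 2012", "Feb 10, 2006"],
    "production_budget": ["$425,000,000", "$225,000,000", "$1,000,000"],
    "worldwide_gross": ["$2,776,345,279", "$1,517,935,897", "$500,000"],
})
top, rest = clean_tn(sample, top_n=2)
print(top[["release_month", "net_profit"]])
```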
With a correlation coefficient of 0.225, there is a weak positive correlation between a movie's runtime and its net profit. Most movies are between 90 and 140 minutes long; however, because runtime is not strongly correlated with a movie's success, Microsoft can make its movies as long as it wants without too much concern.
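Assuming the 0.225 figure is a Pearson correlation coefficient (the default method of pandas' `Series.corr`), it could be computed as shown here. The sketch assumes a cleaned frame with `runtime` and `net_profit` columns; the helper name and the two sample frames are hypothetical and only illustrate the extremes of the scale.

```python
import pandas as pd


def runtime_profit_correlation(df: pd.DataFrame) -> float:
    """Pearson correlation between runtime and net profit."""
    # Series.corr uses the Pearson method unless told otherwise
    return float(df["runtime"].corr(df["net_profit"]))


# Hypothetical samples: one perfectly increasing, one perfectly decreasing
rising = pd.DataFrame({"runtime": [90, 100, 110], "net_profit": [10, 20, 30]})
falling = pd.DataFrame({"runtime": [90, 100, 110], "net_profit": [30, 20, 10]})
print(runtime_profit_correlation(rising))   # close to 1.0
print(runtime_profit_correlation(falling))  # close to -1.0
```

A value near 0, such as 0.225, sits much closer to "no linear relationship" than to either of these extremes.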
A much higher percentage of the top 100 movies by net profit premiered in May, June, July, November and December, which makes sense given that summer and the weeks before the holidays are popular times to go to the movies. The movies not in the top 100 are much more evenly distributed throughout the year. My recommendation would therefore be for Microsoft to premiere its movies over the summer or before the holidays in order to reach a larger audience.
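The month comparison can be expressed as the percentage of each group's movies released in each calendar month, which puts the top 100 and the much larger remainder on the same scale. This sketch assumes two frames, `top` and `rest`, each with a numeric `release_month` column (1-12); the function name and sample months are hypothetical.

```python
import pandas as pd

MONTHS = list(range(1, 13))


def monthly_share(top: pd.DataFrame, rest: pd.DataFrame) -> pd.DataFrame:
    """Percentage of each group's releases that fall in each month."""
    def pct(df: pd.DataFrame) -> pd.Series:
        # Share of movies per month; months with no releases are filled with 0
        counts = df["release_month"].value_counts(normalize=True)
        return counts.reindex(MONTHS, fill_value=0) * 100

    return pd.DataFrame({"top_100_pct": pct(top), "rest_pct": pct(rest)})


# Hypothetical release months for each group
top = pd.DataFrame({"release_month": [5, 5, 6, 12]})
rest = pd.DataFrame({"release_month": [1, 2, 3, 5]})
share = monthly_share(top, rest)
print(share)
```

Plotting the two columns side by side as a grouped bar chart gives one way to visualize the comparison.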
With a correlation coefficient of 0.608, there is a moderate positive correlation between production budget and net profit; however, this is not an absolute rule. As the best fit line shows, once the production budget rises above $50 million, the chances of earning a higher net profit increase.
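One common way to produce a best fit line is an ordinary least-squares fit of degree 1, shown here with `numpy.polyfit` next to `numpy.corrcoef` for the correlation coefficient. This is an assumption about the method: the notebook may have drawn the line with a plotting library instead. The column names, function name and sample points are hypothetical.

```python
import numpy as np
import pandas as pd


def budget_profit_fit(df: pd.DataFrame) -> tuple[float, float, float]:
    """Return (pearson_r, slope, intercept) for net profit vs. production budget."""
    x = df["production_budget"].to_numpy(dtype=float)
    y = df["net_profit"].to_numpy(dtype=float)
    # Off-diagonal entry of the 2x2 correlation matrix is Pearson's r
    r = float(np.corrcoef(x, y)[0, 1])
    # Degree-1 least-squares fit: net_profit ~ slope * budget + intercept
    slope, intercept = np.polyfit(x, y, deg=1)
    return r, float(slope), float(intercept)


# Hypothetical points lying exactly on net_profit = 2 * budget + 5
sample = pd.DataFrame({"production_budget": [10, 20, 30, 40],
                       "net_profit": [25, 45, 65, 85]})
r, slope, intercept = budget_profit_fit(sample)
print(r, slope, intercept)
```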
Runtime: Most movies are between 90 and 140 minutes long; however, because runtime is not strongly correlated with a movie's success, Microsoft can make its movies as long as it wants without too much concern.
Premiere date: A much higher percentage of the top 100 movies by net profit premiered in May, June, July, November and December, compared with the rest of the movies, which premiered in a more even distribution throughout the year. Therefore, Microsoft should premiere its movies over the summer or before the holidays.
Production budget: There is a positive correlation between production budget and net profit, although it is not an absolute rule. As the best fit line shows, once the production budget rises above $50 million, the chances of earning a higher net profit increase.
To further analyze how production budget relates to net profit, it would be helpful to break the production budgets into bins (0-10M, 10-25M, 25-50M, etc.) and calculate the mean net profit for each bin.
Another helpful factor to analyze would be which genres of movies have the highest net profit.
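The proposed binning could be sketched with `pd.cut` followed by a group-by mean. The 0-10M, 10-25M and 25-50M edges come from the text; the 50-100M and 100M+ bins standing in for "etc." are hypothetical, as are the column names, the function name and the sample values.

```python
import pandas as pd

# Bin edges in dollars; the last two bins are hypothetical stand-ins for "etc."
EDGES = [0, 10_000_000, 25_000_000, 50_000_000, 100_000_000, float("inf")]
LABELS = ["0-10M", "10-25M", "25-50M", "50-100M", "100M+"]


def mean_profit_by_budget_bin(df: pd.DataFrame) -> pd.Series:
    """Mean net profit for each production-budget bin."""
    # Right-closed bins; include_lowest keeps a budget of exactly 0 in the first bin
    bins = pd.cut(df["production_budget"], bins=EDGES, labels=LABELS, include_lowest=True)
    # observed=False keeps empty bins in the result (their mean is NaN)
    return df.groupby(bins, observed=False)["net_profit"].mean()


# Hypothetical sample movies
sample = pd.DataFrame({
    "production_budget": [5_000_000, 8_000_000, 20_000_000, 200_000_000],
    "net_profit": [1_000_000, 3_000_000, 10_000_000, 500_000_000],
})
means = mean_profit_by_budget_bin(sample)
print(means)
```

Reporting the number of movies in each bin alongside the mean would help show which bins rest on very few data points.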
Lastly, as discussed above, ancillary revenue has a big impact on net profit so analyzing the ancillary revenue of movies would be another helpful data point for Microsoft.
Project Workbook: https://github.com/julietday422/MovieDataProject/blob/main/ReportNotebook.ipynb
Presentation: https://github.com/julietday422/MovieDataProject/blob/main/presentation.pdf
Images (JPEG and PNG): https://github.com/julietday422/MovieDataProject/tree/main/Images
For any additional questions, please contact Juliet Day.