lopez-christian / imdb-web-scraping-project

This repository includes my IMDb Web Scraping-Flatiron School Module 1 Project. In this project, I incorporated data scraping, data wrangling, data cleaning, and visualization techniques.

Home Page: https://lopez-christian.github.io/2020-02-15-first-post/

Jupyter Notebook 100.00%
action-film action-genre film-industry votes data-scraping python jupyter-notebook imdb-dataset kaggle imdb

imdb-web-scraping-project's Introduction

IMDb Web Scraping

Scraping the TOP 250 films of all time on IMDb

Sorted by number of votes in descending order

This project incorporates some data scraping, data wrangling, data cleaning, and visualizations.
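
As a rough illustration of the cleaning and sorting steps, here is a minimal sketch in plain Python. The rows, the field names, and the string formats (`'1,250,000'`, `'$300.50M'`) are assumptions made for this example; they are not the notebook's actual data or code.

```python
import re


def parse_votes(text):
    """Turn a vote string such as '2,200,000' into an int."""
    return int(text.replace(",", "").strip())


def parse_gross(text):
    """Turn a gross string such as '$534.86M' into dollars; None if missing."""
    match = re.fullmatch(r"\$([\d.]+)M", text.strip())
    return float(match.group(1)) * 1_000_000 if match else None


# Hypothetical raw rows, shaped the way scraped text often arrives.
raw_rows = [
    {"title": "Film A", "votes": "1,250,000", "gross": "$300.50M"},
    {"title": "Film B", "votes": "2,100,000", "gross": "$534.86M"},
    {"title": "Film C", "votes": "980,000", "gross": ""},
]

films = [
    {
        "title": row["title"],
        "votes": parse_votes(row["votes"]),
        "gross": parse_gross(row["gross"]),
    }
    for row in raw_rows
]

# Sort by number of votes in descending order.
films.sort(key=lambda film: film["votes"], reverse=True)

for film in films:
    print(film["title"], film["votes"], film["gross"])
```

Missing gross values are kept as `None` rather than replaced with zero, so later averages or correlations can skip them explicitly.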

Example of dataframe:

Screen Shot 2020-02-17 at 3 13 23 PM

Purpose of project:

The purpose of the project was to explore the data for IMDb's TOP 250 films of all time and determine whether I could draw any meaningful conclusions from it. I was looking for correlations between the features I scraped that could then drive my visualizations.

Questions/Answers:

Which genre appears to be the most promising for a company that wants to break into the film-making industry?

Screen Shot 2020-02-17 at 4 31 54 PM

The action genre seems to be the most promising for breaking into the film industry. Its spread contains many more data points than those of the other genres. Although action films don't have the highest metascores, they are much better represented among the TOP 250 IMDb films of all time. If a company decided to produce an action film, its chances of making it into the TOP 250 would likely be greater than with any other genre.

Sidenote: If a company decided to make a horror or western film, it would have to deliver an exceptional film with an extremely high metascore to have any chance of making it into the TOP 250.
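
The comparison above rests on two numbers per genre: how many films it has in the list and how its metascores look. A minimal sketch of that tally follows, assuming one genre per film and using made-up rows (the titles and scores are placeholders, not real IMDb values).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (title, genre, metascore) rows; not real IMDb values.
films = [
    ("Film A", "Action", 78),
    ("Film B", "Action", 82),
    ("Film C", "Action", 70),
    ("Film D", "Drama", 90),
    ("Film E", "Drama", 88),
    ("Film F", "Western", 95),
]

# Group metascores by genre.
scores_by_genre = defaultdict(list)
for _title, genre, metascore in films:
    scores_by_genre[genre].append(metascore)

# Count of films and mean metascore for each genre.
summary = {
    genre: {"count": len(scores), "mean_metascore": mean(scores)}
    for genre, scores in scores_by_genre.items()
}

# List genres from most to least represented.
for genre, stats in sorted(
    summary.items(), key=lambda kv: kv[1]["count"], reverse=True
):
    print(f"{genre}: {stats['count']} films, "
          f"mean metascore {stats['mean_metascore']:.1f}")
```

In this toy data the best-represented genre is not the one with the highest mean metascore, which is the same pattern the plot describes for action films.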

Which directors and actors fare best in the action genre, and how do they compare with each other?

Screen Shot 2020-02-17 at 5 06 09 PM

Among the TOP 20 action films, Christopher Nolan appears to garner the most votes of any director. The Dark Knight is his most successful film, with almost double the votes of his other films. James Cameron comes in second, with Avatar amassing around 1.08 million votes. A company that aspires to break in would be smart to hire Christopher Nolan as its director.
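
One way to rank directors like this is to keep the most-voted action films and total the votes per director. The sketch below assumes that approach; the film and director names, the vote counts, and the `TOP_N` cutoff are placeholders rather than values from the notebook.

```python
from collections import defaultdict

# Hypothetical action films; names and vote counts are placeholders.
action_films = [
    {"title": "Film A", "director": "Director X", "votes": 2_100_000},
    {"title": "Film B", "director": "Director Y", "votes": 1_080_000},
    {"title": "Film C", "director": "Director X", "votes": 1_200_000},
    {"title": "Film D", "director": "Director Z", "votes": 900_000},
]

TOP_N = 3  # the write-up uses the TOP 20; 3 keeps the example small

# Keep only the N most-voted films.
top_films = sorted(action_films, key=lambda f: f["votes"], reverse=True)[:TOP_N]

# Total the votes for each director within that subset.
votes_by_director = defaultdict(int)
for film in top_films:
    votes_by_director[film["director"]] += film["votes"]

# Rank directors by total votes, highest first.
ranking = sorted(votes_by_director.items(), key=lambda kv: kv[1], reverse=True)
for director, votes in ranking:
    print(f"{director}: {votes:,} votes")
```

Summing favors directors with several films in the subset; ranking by each director's single best film instead would be an equally reasonable choice.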

Screen Shot 2020-02-17 at 4 59 35 PM

Among the top actors in the TOP 20 action films, three names stand out above the rest: Christian Bale, Mark Hamill, and Robert Downey Jr., with counts of 3, 3, and 2, respectively. If the company had to go with one of these stars, I would recommend Christian Bale. You may be asking yourself: why not Mark Hamill, since he appears in just as many TOP 20 action films? Hopefully, the scatterplot below illustrates my point a little better.

Screen Shot 2020-02-17 at 5 15 11 PM

As the scatterplot above shows, Christian Bale's films (Batman Begins, The Dark Knight Rises, and The Dark Knight) all have more votes than the films featuring Mark Hamill (Star Wars: Episode IV - A New Hope, Star Wars: Episode V - The Empire Strikes Back, and Star Wars: Episode VI - Return of the Jedi). Christian Bale's least-voted film has more votes than Mark Hamill's most-voted film. Also, Christian Bale's films are all directed by Christopher Nolan, whereas Mark Hamill's films are directed by three different directors. On this evidence, the Christopher Nolan-Christian Bale duo would be a hit in the industry, and the company's venture would stand to reap the rewards.
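
The actor comparison has two parts: counting appearances, then checking whether one actor's least-voted film still beats the other's most-voted film. A minimal sketch of both checks, using placeholder actors and vote counts (assumptions for illustration only):

```python
from collections import Counter

# Hypothetical top action films; actors and votes are placeholders.
top_action = [
    {"title": "Film A", "votes": 2_100_000, "stars": ["Actor P", "Actor Q"]},
    {"title": "Film B", "votes": 1_500_000, "stars": ["Actor P"]},
    {"title": "Film C", "votes": 1_300_000, "stars": ["Actor P", "Actor R"]},
    {"title": "Film D", "votes": 1_200_000, "stars": ["Actor S"]},
    {"title": "Film E", "votes": 1_100_000, "stars": ["Actor S"]},
    {"title": "Film F", "votes": 1_000_000, "stars": ["Actor S", "Actor Q"]},
]

# How many of the top films each actor appears in.
appearances = Counter(star for film in top_action for star in film["stars"])


def votes_for(actor):
    """Vote counts of every film in the list that features the actor."""
    return [f["votes"] for f in top_action if actor in f["stars"]]


def dominates(actor_a, actor_b):
    """True if actor_a's least-voted film beats actor_b's most-voted film."""
    return min(votes_for(actor_a)) > max(votes_for(actor_b))


print(appearances.most_common(3))
print(dominates("Actor P", "Actor S"))
```

Two actors can tie on appearances, as Actor P and Actor S do here, while the min/max check still separates them clearly.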

Are there any meaningful correlations/relationships to look at that might provide more insight into the film-making industry?

Screen Shot 2020-02-17 at 5 34 57 PM

This is an interesting correlation heatmap that may prove valuable in future work.

Screen Shot 2020-02-17 at 5 41 18 PM

The correlation heatmap above inspired a closer look at the correlation between running time and gross, broken down by genre. What can be derived from this is that films in the action genre tend to have a higher gross as their running time increases. For the company, this suggests making its action film on the longer side.
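
A per-genre correlation of this kind can be computed with the Pearson coefficient. The sketch below implements it by hand in plain Python; the rows (genre, running time in minutes, gross in millions of dollars) are invented for illustration, and the notebook may well have used a library routine instead.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; assumes 2+ points and non-constant inputs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (genre, running time in minutes, gross in $M) rows.
rows = [
    ("Action", 120, 200.0),
    ("Action", 140, 320.0),
    ("Action", 160, 400.0),
    ("Drama", 110, 150.0),
    ("Drama", 130, 100.0),
    ("Drama", 150, 50.0),
]

# Collect running times and grosses separately for each genre.
by_genre = defaultdict(lambda: ([], []))
for genre, minutes, gross in rows:
    by_genre[genre][0].append(minutes)
    by_genre[genre][1].append(gross)

correlations = {
    genre: pearson(times, grosses)
    for genre, (times, grosses) in by_genre.items()
}

for genre, r in correlations.items():
    print(f"{genre}: r = {r:.2f}")
```

A coefficient near +1 means longer films in that genre tended to gross more in the sample; it describes an association in the data and does not by itself show that added length causes higher gross.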

Key takeaways:

1. Stick with the action genre.

2. Try your best to hire Christopher Nolan as the director for your film.

3. Christian Bale is by far your safest bet for a quality action-packed film.

4. Make the action film longer. The longer the running time, the better it tends to do at the box office in terms of gross.
