sadihsn / imdb_movie_insights

Scraped data from an IMDB movie list and derived insights from it through visualizations.

Home Page: https://public.tableau.com/views/IMDBMoviesInsights/Dashboard1?:language=en-GB&:display_count=n&:origin=viz_share_link

License: MIT License

Languages: Jupyter Notebook 93.76%, Python 6.24%

Topics: data-visualization, python, selenium-webdriver, tableau

IMDB_Movie_Insights

Problem Statement

The goal of this project is to extract insights from a movie list on the IMDB website.
The data was scraped from a movie list created by andreea_nastasa based on their personal preferences. I explored this large dataset to learn important things about movies because I'm really interested in the whole movie scene.

The initial scraped data contains 4900 records. Google Sheets and Python were used to clean the messy dataset and handle missing values. After cleaning, the dataset has 1965 rows and the columns 'movie_name', 'Year', 'Age', 'Duration', 'Rating', 'Gross Profit', and 'Votes'.
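
The cleaning step described above can be sketched in plain Python. This is only an illustration, not the notebook code from the repository: the raw value formats (a year in parentheses, a duration such as "142 min", votes with thousands separators), the `first_number` and `clean_rows` helpers, and the rule of dropping any row with a missing value are all assumptions.

```python
import re

COLUMNS = ["movie_name", "Year", "Age", "Duration", "Rating", "Gross Profit", "Votes"]
NUMERIC = ("Year", "Duration", "Rating", "Gross Profit", "Votes")


def first_number(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


def clean_rows(raw_rows):
    """Drop incomplete or duplicate rows and convert the numeric fields.

    raw_rows is an iterable of dicts with string values (hypothetical raw format).
    """
    cleaned, seen = [], set()
    for row in raw_rows:
        # Treat missing keys, None and blank strings as missing values.
        values = {col: (row.get(col) or "").strip() for col in COLUMNS}
        if not all(values.values()):
            continue
        numbers = {col: first_number(values[col]) for col in NUMERIC}
        if None in numbers.values():
            continue
        # Skip repeated entries of the same movie and year.
        key = (values["movie_name"], numbers["Year"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({
            "movie_name": values["movie_name"],
            "Year": int(numbers["Year"]),
            "Age": values["Age"],
            "Duration": int(numbers["Duration"]),
            "Rating": numbers["Rating"],
            "Gross Profit": numbers["Gross Profit"],
            "Votes": int(numbers["Votes"]),
        })
    return cleaned
```

Dropping incomplete rows is the simplest policy; filling gaps with a default or an average would be an alternative when fewer rows should be lost.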

The cleaned data was then used to explore the following movie metrics and correlations in Tableau dashboards:

Dashboard 1

  1. Average Movie Ratings Throughout The Years (Line chart):
    Illustrates the trend in average movie ratings over different years using a line chart.
  2. Movie Duration Distribution (Histogram):
    Displays the frequency distribution of movie durations, showing the range and frequency of different durations.
  3. Average Movie Duration Throughout The Years (Line chart):
    Shows how the average duration of movies has changed over the years using a line chart.
  4. Relationship Between Ratings and Gross Profit (Scatter Plot):
    Visualizes the correlation between movie ratings and gross profit, indicating any discernible patterns or trends using a scatter plot.
  5. Movie Count by Year (Bar Chart):
    Represents the number of movies released each year, offering a bar chart overview of movie count trends over time.
  6. Movie Target Age Group (Child < 13 < Teen/Adult) (Text table):
    Summarizes movies into target age groups and displays the count of movies for each group using a text table.

Dashboard 2

  1. Movie Rating Distribution (Histogram):
    Illustrates the distribution of movie ratings, showcasing the frequency of different rating ranges using a histogram.
  2. Votes Distribution Top 20 Movies (Packed Bubbles):
    Visualizes the distribution of votes for the top 20 movies, with each bubble representing a movie and its size indicating the number of votes received.
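
The dashboards themselves are built in Tableau, but the aggregations behind the charts are simple enough to check by hand. The following is a minimal sketch, assuming each cleaned row is a dict with numeric 'Year', 'Rating', 'Duration', 'Age' and 'Votes' values. The helper names are hypothetical, and treating an age of exactly 13 as Teen/Adult is an assumption.

```python
from collections import Counter, defaultdict
from statistics import mean


def average_by_year(movies, field):
    """Average of `field` per year, e.g. 'Rating' or 'Duration' (line charts)."""
    by_year = defaultdict(list)
    for movie in movies:
        by_year[movie["Year"]].append(movie[field])
    return {year: mean(values) for year, values in sorted(by_year.items())}


def count_by_year(movies):
    """Number of movies per release year (bar chart)."""
    return dict(sorted(Counter(movie["Year"] for movie in movies).items()))


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


def age_group(age):
    """Map a minimum viewer age to a target group (assumed boundary at 13)."""
    return "Child" if age < 13 else "Teen/Adult"


def top_by_votes(movies, n=20):
    """The n movies with the most votes (packed bubbles)."""
    return sorted(movies, key=lambda movie: movie["Votes"], reverse=True)[:n]
```

For example, `histogram([m["Duration"] for m in movies], 10)` groups durations into 10-minute bins, and `Counter(age_group(m["Age"]) for m in movies)` yields the counts shown in the text table.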

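You can visit the public dashboard here: https://public.tableau.com/views/IMDBMoviesInsights/Dashboard1?:language=en-GB&:display_count=n&:origin=viz_share_link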

Findings and Observations from the Dashboard

  1. Average Movie Ratings: The line graph shows a fluctuating trend over the years: a significant decline at first, followed by intermittent rises and a peak in 2020. Average ratings stay mostly within the range of 5 to 8.
  2. Movie Duration Distribution: The majority of movies in the dataset have durations ranging from 90 to 110 minutes.
  3. Average Movie Duration Throughout The Years: The average duration rises at first, then mostly declines over the years, with two noticeable increases in between. Movie durations are usually more than 80 minutes and often go beyond 100 minutes.
  4. Relationship Between Ratings and Gross Profit: High ratings don't consistently translate to high profits, and only a few movies surpass a gross profit of 500 million.
  5. Movie Count by Year: Most movies in the dataset were released between 2000 and 2020.
  6. Movie Target Age Group: The majority of movies in the dataset are categorized as suitable for a Teen/Adult audience.
  7. Movie Rating Distribution: Ratings are concentrated between 6 and 7, which marks the center of the rating distribution.
  8. Votes Distribution Top 20 Movies: Despite the dataset's size, the packed bubbles chart clearly shows how votes are distributed among the top 20 movies.
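
Finding 4 is read off the scatter plot. One way to put a number on it would be the Pearson correlation coefficient between the 'Rating' and 'Gross Profit' columns. The sketch below is not part of the original analysis, and `pearson` is a hypothetical helper; a value near 0 would support the observation that ratings and profit are only weakly related.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations, and the two root sums of squared deviations.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)
```

With the cleaned rows loaded as dicts, it could be called as `pearson([m["Rating"] for m in movies], [m["Gross Profit"] for m in movies])`.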

Build From Sources and Run the Selenium Scraper

  1. Clone the repo
     git clone https://github.com/sadihsn97/IMDB_Movie_Insights.git
  2. Initialize and activate a virtual environment
     virtualenv venv
     source venv/bin/activate
  3. Install dependencies
     pip install -r requirements.txt
  4. Download ChromeDriver from https://chromedriver.chromium.org/downloads
  5. Run the scraper
     python web_scraping_IMDB/scraper.py --chromedriver_path <path_to_chromedriver>
  6. To generate a CSV file you can use
     df.to_csv("IMDB_Movies_Insights.csv", index=False)
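
For orientation, here is a rough sketch of how a script could accept the `--chromedriver_path` option and start Chrome through Selenium. It is not the contents of `scraper.py`: the function names and overall structure are assumptions, the scraping logic is left out, and the driver setup uses the Selenium 4 `Service` API.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line options used in the run command."""
    parser = argparse.ArgumentParser(description="Scrape an IMDB movie list with Selenium.")
    parser.add_argument(
        "--chromedriver_path",
        required=True,
        help="Path to the downloaded ChromeDriver executable.",
    )
    return parser.parse_args(argv)


def create_driver(chromedriver_path):
    """Start a Chrome session driven by the given ChromeDriver binary."""
    # Imported lazily so that argument parsing works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=chromedriver_path))


def main(argv=None):
    args = parse_args(argv)
    driver = create_driver(args.chromedriver_path)
    try:
        # Page navigation and element extraction would go here.
        pass
    finally:
        # Always close the browser, even if scraping fails.
        driver.quit()
```

Calling `main()` from a script entry point would read the options from the command line; wrapping the work in `try`/`finally` keeps stray browser processes from piling up when a run fails.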
    
