In this project, you will analyze a dataset and then communicate your findings about it. You will use the Python libraries NumPy, pandas, and Matplotlib to make your analysis easier.
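Libraries used:
pandas
NumPy
Matplotlib
csv (part of the Python standard library, so it needs no installation)
Install the third-party libraries with: pip install pandas numpy matplotlib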
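To confirm the installation worked, a small check script like the one below imports each library and prints its version. This is a sketch; the helper name library_versions is hypothetical and not part of the project instructions.

```python
# Quick check that the analysis libraries are installed and importable.
# csv ships with the Python standard library, so it needs no pip install.
import csv  # noqa: F401  (imported only to confirm it is available)

import matplotlib
import numpy as np
import pandas as pd


def library_versions():
    """Return the installed version string of each third-party library."""
    return {
        "numpy": np.__version__,
        "pandas": pd.__version__,
        "matplotlib": matplotlib.__version__,
    }


if __name__ == "__main__":
    for name, version in library_versions().items():
        print(f"{name}: {version}")
```

If any import fails with ModuleNotFoundError, rerun the pip install command for that library.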
In this project, you'll go through the data analysis process and see how everything fits together. Later Nanodegree projects will focus on individual pieces of the data analysis process.
You'll use the Python libraries NumPy, pandas, and Matplotlib, which make writing data analysis code in Python a lot easier! These are also skills that employers actively seek.
After completing this project, you should:
- Know all the steps involved in a typical data analysis process
- Be comfortable posing questions that can be answered with a given dataset and then answering those questions
- Know how to investigate problems in a dataset and wrangle the data into a format you can use
- Have experience communicating the results of your analysis
- Be able to use vectorized operations in NumPy and pandas to speed up your data analysis code
- Be familiar with pandas' Series and DataFrame objects, which let you access your data more conveniently
- Know how to use Matplotlib to produce plots showing your findings
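The difference between looping and vectorizing can be sketched as follows. The function names and the budget/revenue column names are hypothetical, chosen only to illustrate the idea: the loop processes one pair of values at a time in Python, while the vectorized version subtracts two whole pandas Series in a single operation and returns a Series you can index by label.

```python
import numpy as np
import pandas as pd


def profit_with_loop(budgets, revenues):
    """Compute profit element by element in a Python loop (slow on large data)."""
    return [revenue - budget for budget, revenue in zip(budgets, revenues)]


def profit_vectorized(df):
    """Compute profit for every row at once with a single column operation."""
    # Series subtraction aligns rows by index label and runs in compiled NumPy code.
    return df["revenue"] - df["budget"]


if __name__ == "__main__":
    # Hypothetical toy values for illustration only; they are not from any dataset.
    movies = pd.DataFrame(
        {"budget": [10.0, 20.0, 5.0], "revenue": [30.0, 15.0, 5.0]},
        index=["A", "B", "C"],
    )
    profit = profit_vectorized(movies)   # a pandas Series indexed by "A", "B", "C"
    print(profit.loc["B"])               # label-based access into the Series
    print(np.mean(profit.to_numpy()))    # NumPy function on the underlying array
```

Both functions give the same numbers; the vectorized form is shorter and typically much faster on tens of thousands of rows.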
Open the dataset options document for links to, and information about, the datasets you can investigate for this project. You must choose one of these datasets to complete the project.
Dataset chosen for this project: TMDb_Movies (the TMDb movies dataset)
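A reasonable first step after downloading the data is to load it and check its size, column types, missing values, and duplicate rows, since those checks point directly at the cleaning work. The sketch below assumes the file is saved as tmdb-movies.csv next to the script; both that filename and the helper name load_and_inspect are assumptions, not part of the project instructions.

```python
import os

import pandas as pd


def load_and_inspect(path):
    """Read a CSV file into a DataFrame and print a quick overview."""
    df = pd.read_csv(path)
    print(f"{df.shape[0]} rows x {df.shape[1]} columns")
    print(df.dtypes)        # column types help spot values stored in the wrong format
    print(df.isna().sum())  # number of missing values in each column
    print(f"duplicate rows: {df.duplicated().sum()}")
    return df


if __name__ == "__main__":
    path = "tmdb-movies.csv"  # assumed filename; change it to match your download
    if os.path.exists(path):
        movies = load_and_inspect(path)
        print(movies.head())
    else:
        print(f"{path} not found; download the dataset first.")
```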
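Cleaning steps:
- Drop duplicate rows
df.drop_duplicates(inplace=True)
- Fill missing (NaN) values in numeric columns with the column mean
df.fillna(df.mean(numeric_only=True), inplace=True)
- Fix data formats (for example, convert date strings to datetime values)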
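The three cleaning steps above can also be bundled into one reusable function. This is a sketch under assumptions: the helper name clean_movies and its date_columns parameter are hypothetical, and unlike the inplace=True calls it returns a cleaned copy, which leaves the raw DataFrame available for comparison.

```python
from typing import Sequence

import pandas as pd


def clean_movies(df: pd.DataFrame, date_columns: Sequence[str] = ()) -> pd.DataFrame:
    """Return a cleaned copy: duplicates removed, numeric NaNs filled with
    column means, and the listed columns parsed as datetimes."""
    # Remove duplicates first so repeated rows do not skew the means.
    cleaned = df.drop_duplicates().copy()
    numeric_cols = cleaned.select_dtypes(include="number").columns
    # fillna with a Series fills each column using the value under its label.
    cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].mean())
    for col in date_columns:
        # Strings that cannot be parsed become NaT instead of raising an error.
        cleaned[col] = pd.to_datetime(cleaned[col], errors="coerce")
    return cleaned
```

For example, clean_movies(movies, date_columns=["release_date"]) would parse a release_date column, assuming your copy of the dataset has one. Text columns with missing values are left untouched and need a separate decision.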
Brainstorm some questions you could answer using the dataset you chose, then start answering those questions. The dataset options document lists some questions to help you get started.
Try to pose questions that look at relationships between multiple variables. Aim to analyze at least one dependent variable and three independent variables in your investigation. Make sure you use NumPy and pandas where they are appropriate!
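One way to start on questions about relationships is to compute how strongly each candidate independent variable correlates with the dependent variable, then plot each pair with Matplotlib. In this sketch the function names are hypothetical, and the column names in the usage note are assumed examples; check them against your own data.

```python
from typing import Sequence

import matplotlib.pyplot as plt
import pandas as pd


def correlations_with(df: pd.DataFrame, dependent: str,
                      independents: Sequence[str]) -> pd.Series:
    """Pearson correlation of each independent column with the dependent column."""
    cols = list(independents)  # a list selects several columns; a tuple would not
    return df[cols].corrwith(df[dependent]).sort_values(ascending=False)


def plot_relationships(df: pd.DataFrame, dependent: str,
                       independents: Sequence[str],
                       filename: str = "relationships.png") -> str:
    """Save one scatter plot per independent variable against the dependent one."""
    cols = list(independents)
    fig, axes = plt.subplots(1, len(cols), figsize=(4 * len(cols), 4), squeeze=False)
    for ax, col in zip(axes[0], cols):
        ax.scatter(df[col], df[dependent], s=8, alpha=0.5)
        ax.set_xlabel(col)
        ax.set_ylabel(dependent)
        ax.set_title(f"{dependent} vs {col}")
    fig.tight_layout()
    fig.savefig(filename)
    plt.close(fig)  # free the figure's memory once it is saved
    return filename
```

For example, correlations_with(movies, "revenue_adj", ["budget_adj", "popularity", "runtime"]) would rank three assumed independent variables against an assumed dependent one. A Pearson correlation only captures linear association, so read it alongside the scatter plots, and remember that correlation alone does not show causation.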