Many real-world datasets contain strings, integers, timestamps, and unstructured data. How do you store data like this so that you can manipulate it and easily retrieve important information? The answer is a pandas DataFrame!
- In Class Instruction: 4 Hours
- In Class code along Dataset: Weather Dataset
- Project Dataset: Indian Premier League
- Estimated Time to complete Project Tasks: 1 Hour
- Total sub tasks within the Project: 6
- Complexity of sub tasks: Mid to High
- Points to be scored: 700
- Why should you care about this project: This project challenges you to extract business insights from large datasets without conventional programming techniques such as explicit "for" loops.
- Skills Rehearsed
- Python
- Instructor led concept onboarding
- Code Alongs
- In Class Quiz Administration
- Periodic Recap - Closer to the end of session
- In Class Assignments - Motivation
- Take Away Assignments
You will become acquainted with the power tool of pandas: the DataFrame. You will learn how to use pandas to import and then inspect a variety of datasets.
Having learned the fundamentals of working with DataFrames, you will now move on to more advanced indexing techniques. These are powerful techniques that allow you to tidy and rearrange your data into the format that allows you to most easily analyze it for insights.
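As a rough sketch of what importing, inspecting, and indexing can look like, the snippet below loads a tiny made-up weather table. The column names and values are assumptions for illustration only; they are not taken from the course Weather Dataset.

```python
import io

import pandas as pd

# Hypothetical weather readings (invented for illustration).
csv_text = """date,city,temp_c,humidity
2020-01-01,Delhi,14.5,80
2020-01-01,Mumbai,27.0,65
2020-01-02,Delhi,15.0,78
2020-01-02,Mumbai,26.5,70
"""

# Import: read the CSV text and parse the date column as timestamps.
weather = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Inspect: first rows and the type pandas inferred for each column.
print(weather.head())
print(weather.dtypes)

# Index: use (city, date) as a two-level index, then select by label.
indexed = weather.set_index(["city", "date"]).sort_index()
delhi = indexed.loc["Delhi"]

# Filter: a boolean mask keeps only the rows that satisfy a condition.
warm = weather[weather["temp_c"] > 20]
print(delhi)
print(warm)
```

A real file would be loaded by passing its path to `pd.read_csv` instead of the in-memory text used here.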
After this lesson, you'll be able to:
- Explain why pandas is needed in data science
- Manipulate and transform data
- Build pivot tables and group-by summaries
- Merge datasets
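The grouping, pivoting, and merging topics listed above might look roughly like this in practice. This is a minimal sketch on invented data: the table layout, team labels, and column names are all hypothetical.

```python
import pandas as pd

# Hypothetical innings totals (invented for illustration).
matches = pd.DataFrame({
    "match_id": [1, 1, 2, 2],
    "team": ["A", "B", "A", "C"],
    "innings": [1, 2, 1, 2],
    "runs": [160, 150, 180, 175],
})

# Hypothetical lookup table mapping each team to a home city.
teams = pd.DataFrame({
    "team": ["A", "B", "C"],
    "city": ["X", "Y", "Z"],
})

# Group by: total runs per team.
runs_by_team = matches.groupby("team")["runs"].sum()

# Pivot table: teams as rows, innings as columns, summed runs as values.
# Combinations that never occur are filled with 0.
pivot = matches.pivot_table(
    index="team", columns="innings", values="runs",
    aggfunc="sum", fill_value=0,
)

# Merge: attach each team's city; a left join keeps every match row.
merged = matches.merge(teams, on="team", how="left")

print(runs_by_team)
print(pivot)
print(merged)
```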
Check the Jupyter Notebook in the top right of the screen
- Practice Pandas on DataCamp
- Anti-Patterns in Python Programming
- There are a ton of public notebooks - just have fun exploring the IPython Notebook Viewer
In the Indian Premier League (IPL), teams representing Indian cities compete each year. Chris Gayle is the highest run scorer in the IPL. Do you know who the second highest run scorer is, and can you find out without using a "for" loop? This module helps you answer that question by manipulating large datasets to extract business insights.
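One loop-free way to approach a question like this is to group ball-by-ball records by batsman, sum the runs, and sort. The sketch below uses invented data; the column names `batsman` and `runs` and the player labels are assumptions, and the actual project dataset may be organised differently.

```python
import pandas as pd

# Hypothetical ball-by-ball records (invented for illustration).
deliveries = pd.DataFrame({
    "batsman": ["Batter A", "Batter B", "Batter A", "Batter C",
                "Batter B", "Batter A", "Batter C", "Batter C"],
    "runs": [4, 6, 1, 2, 4, 6, 1, 0],
})

# Total runs per batsman, highest first. No explicit loop is needed:
# groupby and sum operate on the whole column at once.
totals = (
    deliveries.groupby("batsman")["runs"]
    .sum()
    .sort_values(ascending=False)
)

# Position 0 is the top scorer, so position 1 is the runner-up.
second_highest = totals.index[1]
second_highest_runs = totals.iloc[1]

print(second_highest, second_highest_runs)
```

If two players are tied on runs, the order between them depends on the sort, so a real analysis may need an explicit tie-breaking rule.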