In this lab, you'll get hands-on practice cleaning data with Pandas.
You will be able to:
- Manipulate columns in DataFrames (df.rename, df.drop)
- Manipulate the index in DataFrames (df.reindex, df.drop, df.rename)
- Manipulate column datatypes
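As a quick orientation before the exercises, here is a minimal sketch of the methods named above, run on a small made-up DataFrame. The variable `toy` and the columns `Name` and `Score` are hypothetical and are not part of the lab data.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the methods.
toy = pd.DataFrame({'Name': ['a', 'b', 'c'], 'Score': ['1', '2', '3']})

# Rename a column; rename returns a new DataFrame by default.
renamed = toy.rename(columns={'Score': 'score'})

# Drop a column (axis=1) and drop a row by its index label (axis=0).
no_name = renamed.drop('Name', axis=1)
no_first_row = renamed.drop(0, axis=0)

# Reindex: reorder the rows by index label.
reordered = renamed.reindex([2, 1, 0])

# Change a column's datatype from string to integer.
renamed['score'] = renamed['score'].astype(int)

print(list(no_name.columns))
print(list(no_first_row.index))
print(list(reordered.index))
print(renamed['score'].dtype)
```

Each of these methods returns a new object rather than modifying `toy`, which is why the results are assigned to new names.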
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('turnstile_180901.txt')
print(len(df))
df.head()
197625
|   | C/A | UNIT | SCP | STATION | LINENAME | DIVISION | DATE | TIME | DESC | ENTRIES | EXITS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A002 | R051 | 02-00-00 | 59 ST | NQR456W | BMT | 08/25/2018 | 00:00:00 | REGULAR | 6736067 | 2283184 |
| 1 | A002 | R051 | 02-00-00 | 59 ST | NQR456W | BMT | 08/25/2018 | 04:00:00 | REGULAR | 6736087 | 2283188 |
| 2 | A002 | R051 | 02-00-00 | 59 ST | NQR456W | BMT | 08/25/2018 | 08:00:00 | REGULAR | 6736105 | 2283229 |
| 3 | A002 | R051 | 02-00-00 | 59 ST | NQR456W | BMT | 08/25/2018 | 12:00:00 | REGULAR | 6736180 | 2283314 |
| 4 | A002 | R051 | 02-00-00 | 59 ST | NQR456W | BMT | 08/25/2018 | 16:00:00 | REGULAR | 6736349 | 2283384 |
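In the preview, DATE and TIME look like dates and times, but `pd.read_csv` does not parse dates unless asked to, so they will typically load as plain strings. As an illustration of changing a column's datatype, the sketch below rebuilds two of the rows above by hand and converts DATE with `pd.to_datetime`. The `sample` frame is a stand-in built for this example, not the full file.

```python
import pandas as pd

# Two rows copied from the preview above; a hand-built stand-in for df.
sample = pd.DataFrame({
    'STATION': ['59 ST', '59 ST'],
    'DATE': ['08/25/2018', '08/25/2018'],
    'TIME': ['00:00:00', '04:00:00'],
    'ENTRIES': [6736067, 6736087],
})

# Parse the month/day/year strings into datetime64 values.
sample['DATE'] = pd.to_datetime(sample['DATE'], format='%m/%d/%Y')

# Datetime columns expose the .dt accessor, e.g. the weekday name.
sample['DAY'] = sample['DATE'].dt.day_name()

print(sample.dtypes)
print(sample['DAY'].tolist())  # ['Saturday', 'Saturday']
```

Passing an explicit `format` avoids ambiguity between month-first and day-first dates.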
# Your code here
# Your code here
# Your code here
Create another column 'Num_Lines' that is a count of how many lines pass through a station. Then sort your dataframe by this column in descending order.
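One possible approach, sketched on a hand-built frame rather than the lab data: assuming each character in LINENAME stands for one line (an assumption suggested by values like `NQR456W`), the string length gives the count. The `stations` frame and its `EXAMPLE A` and `EXAMPLE B` rows are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df; only the first row's values come from the lab preview.
stations = pd.DataFrame({
    'STATION': ['EXAMPLE A', '59 ST', 'EXAMPLE B'],
    'LINENAME': ['L', 'NQR456W', 'AC'],
})

# Assumption: one character per line, so the string length is the line count.
stations['Num_Lines'] = stations['LINENAME'].str.len()

# Sort by the new column, largest first.
stations = stations.sort_values(by='Num_Lines', ascending=False)

print(stations)
```

If a LINENAME value contained separators or repeated characters, the length would overcount, so it is worth checking a few values before relying on this.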
# Your code here
def clean(col_name):
    # Your code here: do whatever you want to col_name.
    # Hint: think back to str methods.
    cleaned = col_name  # placeholder; replace with your cleaned version
    return cleaned
# This list comprehension applies your clean function to every column name,
# and the result is reassigned to df.columns.
# You shouldn't have to change anything here;
# your function above should work as written.
df.columns = [clean(col) for col in df.columns]
# Check the output to confirm the column names were cleaned.
df.columns
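As a reminder of the str methods the hint refers to, here is a small sketch on made-up column names. The names in `raw_names` (including the padded `'EXITS   '`) and the helper `tidy` are hypothetical; your own `clean` can do something different.

```python
# Hypothetical column names with inconsistent case and stray whitespace.
raw_names = ['C/A', ' Station', 'EXITS   ']

def tidy(name):
    """Strip surrounding whitespace and lowercase the name."""
    return name.strip().lower()

# Same list-comprehension pattern as above, applied to the toy names.
tidied = [tidy(name) for name in raw_names]
print(tidied)  # ['c/a', 'station', 'exits']
```

Stray whitespace in a column name is easy to miss in a printed table, which is why inspecting `df.columns` directly is a useful check.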
# Your code here
# Your code here
# Your code here