Dodo Flight School (DFS, a made-up name) has hired you to help decide where to locate its next expansion. DFS has provided historical weather data for airports across the United States. The deliverable is a list of the 10 best candidate locations for the expansion.
- Continental US
- Near an airport (obviously)
- Weather is the most important factor
- Most flight training is done during the day
- General rule of thumb for ideal weather:
  - Visibility of 10 miles or more
  - Cloud ceiling of 3,000 ft above ground or higher (cloud cover column in the dataset)
  - Winds less than 15 kts
- Cleaning data
  - RegEx
  - Filtering/Imputation
- Exploratory Data Analysis
- Clustering Approach
- Recommendations
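
The rule of thumb for ideal weather listed above (visibility of 10 miles or more, ceiling of 3,000 ft or higher, winds under 15 kts) can be written as a simple check. This is a minimal sketch, not the project's actual code: the `Observation` class, its field names, and the `ideal_fraction` helper are hypothetical, and it assumes a numeric ceiling in feet even though the dataset itself only provides a cloud cover column.

```python
from dataclasses import dataclass

# Thresholds taken from the rule of thumb in this README.
MIN_VISIBILITY_MILES = 10
MIN_CEILING_FT = 3000
MAX_WIND_KTS = 15  # "less than 15 kts", so the comparison is strict


@dataclass
class Observation:
    """One weather observation (hypothetical field names)."""
    visibility_miles: float
    ceiling_ft: float  # ASSUMPTION: numeric ceiling; use float("inf") for clear sky
    wind_kts: float


def is_ideal(obs):
    """Return True when an observation meets all three thresholds."""
    return (
        obs.visibility_miles >= MIN_VISIBILITY_MILES
        and obs.ceiling_ft >= MIN_CEILING_FT
        and obs.wind_kts < MAX_WIND_KTS
    )


def ideal_fraction(observations):
    """Share of observations that are ideal (0.0 for an empty input)."""
    observations = list(observations)
    if not observations:
        return 0.0
    return sum(is_ideal(o) for o in observations) / len(observations)
```

A share like `ideal_fraction` is only one possible way to compare airports; it is shown for illustration and is separate from the clustering approach named in the outline.
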
Name | Source | Link in Repo |
---|---|---|
METAR | Data file is too large for GitHub, link provided to Google Drive | link |
Airports | Kaggle | link |
Population | Data.gov | link |
- US Population
  - The main column that needed cleaning was Geolocation, a pair of coordinates read in as a string. It was split into latitude and longitude, which were then converted to floats.
- Airports
  - This data file did not require cleaning, as it was already in the desired format.
- METAR
  - Many of the columns were read in with observations in the wrong column.
  - Regular expressions were used to extract the numbers from several columns. For example, the Wind Speed column holds speed, direction, and the unit (knots), but only the speed was needed.
  - The cloud cover column was converted to an ordinal variable, with the highest value corresponding to the most desirable condition, Clear.
  - Visibility had a similar issue to the wind speed column (a number followed by a unit letter).
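
The cleaning steps above can be sketched in plain Python with the `re` module. The raw formats are assumptions, since the README does not show them: coordinates like `"(34.05, -118.24)"`, a METAR-style wind group like `"27010KT"`, visibility like `"10SM"`, and the standard METAR sky codes. All function names and the specific ordinal values are hypothetical.

```python
import re

# ASSUMPTION: Geolocation looks like "(34.05, -118.24)".
def split_geolocation(text):
    """Return (latitude, longitude) as floats from a coordinate string."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    if len(numbers) != 2:
        raise ValueError(f"expected two coordinates, got {text!r}")
    return float(numbers[0]), float(numbers[1])


# ASSUMPTION: wind is a METAR-style group: direction, speed, optional gust, "KT".
WIND_RE = re.compile(r"^(?:\d{3}|VRB)(?P<speed>\d{2,3})(?:G\d{2,3})?KT$")

def wind_speed_kts(text):
    """Extract only the speed in knots; None if the value does not parse."""
    match = WIND_RE.match(text.strip().upper())
    return int(match.group("speed")) if match else None


# ASSUMPTION: visibility is a number followed by "SM" (statute miles).
# Fractional values such as "1/2SM" are not handled and return None.
VISIBILITY_RE = re.compile(r"^(?P<miles>\d+(?:\.\d+)?)\s*SM$")

def visibility_miles(text):
    """Extract visibility in miles; None if the value does not parse."""
    match = VISIBILITY_RE.match(text.strip().upper())
    return float(match.group("miles")) if match else None


# Ordinal encoding with the highest value for Clear, as described above.
# ASSUMPTION: the exact codes and numbers are illustrative.
CLOUD_COVER_RANK = {"OVC": 0, "BKN": 1, "SCT": 2, "FEW": 3, "CLR": 4}

def cloud_cover_rank(code):
    """Map a sky-cover code to its ordinal rank; None for unknown codes."""
    return CLOUD_COVER_RANK.get(code.strip().upper())
```

Returning `None` for values that do not parse leaves them to be handled in the later filtering or imputation step rather than guessed here.
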