
cleanup_script's Introduction

Data cleaning scripts

Hurricane Best Track Data (HURDAT2)

The publicly available dataset is not properly formatted for importing into a database or for further analysis. To clean up the data for analysis, I did the following:

  • Added headers based on the data descriptions for the Atlantic and Pacific datasets (the major differences seem to be in the possible values for basins, record identifiers, and hurricane status)
  • Normalized by adding the header row information to each record (alternatively, this could be moved to a different table)
  • Converted date and time fields to ISO 8601 format, which is easier for most databases and software to interpret
  • Added helper tables that link basins, record identifiers, and statuses to their meanings
  • Normalized longitude/latitude to numbers rather than using E/W, N/S suffixes
  • Marked maximum_wind_knots values of -99 as null
  • Marked -999 wind and pressure values as null so they don't interfere with calculations
  • Added an option to combine the Atlantic and Pacific datasets (though it is worth noting that the Atlantic dataset has data from 1851, and the Pacific one only has data from 1949)
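
The steps above can be sketched roughly as follows. This is only an illustration, not the code in process_hurricane.py: the function names, the output keys other than maximum_wind_knots, the sign convention (south and west as negative), and the sample rows are all assumptions made for the example.

```python
from datetime import datetime


def to_iso8601(date_str, time_str):
    """Combine 'YYYYMMDD' and 'HHMM' into an ISO 8601 timestamp string."""
    return datetime.strptime(date_str + time_str, "%Y%m%d%H%M").isoformat()


def to_degrees(value):
    """Turn '15.0N' or '59.0W' into a float (assumed: S and W are negative)."""
    value = value.strip()
    magnitude, hemisphere = float(value[:-1]), value[-1].upper()
    return -magnitude if hemisphere in ("S", "W") else magnitude


def to_nullable_int(value, sentinels=(-99, -999)):
    """Parse an integer field, returning None for the missing-value codes."""
    number = int(value.strip())
    return None if number in sentinels else number


def parse_hurdat2(lines):
    """Attach each storm's header information to its track records."""
    rows = []
    storm_id = storm_name = None
    for line in lines:
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split(",")]
        if fields[0][:2].isalpha():
            # Header row: storm identifier (starts with the basin code), then name.
            storm_id, storm_name = fields[0], fields[1]
            continue
        date, time, record_id, status, lat, lon, wind, pressure = fields[:8]
        rows.append({
            "storm_id": storm_id,
            "storm_name": storm_name,
            "datetime": to_iso8601(date, time),
            "record_identifier": record_id or None,
            "status": status,
            "latitude": to_degrees(lat),
            "longitude": to_degrees(lon),
            "maximum_wind_knots": to_nullable_int(wind),
            "minimum_pressure": to_nullable_int(pressure),
        })
    return rows


# Illustrative input in the HURDAT2 layout (values are for demonstration only).
SAMPLE = """\
AL092011,              IRENE,     39,
20110821, 0000,  , TS, 15.0N,  59.0W,  45, 1006,
20110821, 0600,  , TS, 16.0N,  60.6W,  45, -999,
"""

rows = parse_hurdat2(SAMPLE.splitlines())
```

Each resulting dictionary carries the storm identifier and name from the header row, so the records can be loaded into a single flat table or written straight to CSV.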

The CSV and SQLite versions of the data

To run

python3 hurricane/process_hurricane.py -h: Show help

python3 hurricane/process_hurricane.py: Create a new hurricane.db SQLite file in main

python3 hurricane/process_hurricane.py <<path_to_sqlite_file>>: Add all data to the given SQLite file

python3 hurricane/process_hurricane.py -b atlantic <<path_to_sqlite_file>>: Add only the Atlantic hurricane data to the given SQLite file

python3 hurricane/process_hurricane.py -c <<path_to_csv_file>>: Write all data to the given CSV file
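
After a run, one quick way to check what ended up in the SQLite file is to list its tables. The sketch below uses only Python's standard sqlite3 module and does not assume any table names; list_tables is a hypothetical helper, not part of the script.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, list_tables("hurricane.db") would return the table names created by the default run, assuming the file is in the current directory.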
