milwaukee-weather's Introduction

Automated Tufte-style weather graphs

This repository creates the weather graphs below (inspired by Edward Tufte) using R's ggplot2 package. Updated data is pulled directly from NOAA's servers in CSV format. The entire process is automated using GitHub Actions.

This repo may be useful in three ways.

  1. replicating or adapting this graph for a different weather station
  2. learning more about data viz with ggplot2
  3. learning more about GitHub Actions with R

Full disclosure: I'm a novice GitHub Actions user. This repo reflects my best understanding of GitHub Actions, and I plan to update it as my skills improve.

[Figure: Daily High Temperature in Milwaukee]

[Figure: Cumulative Annual Precipitation in Milwaukee]

About this data

NOAA provides daily data for weather stations in the Global Historical Climatology Network (GHCN).

Citation:

Menne, M.J., I. Durre, B. Korzeniewski, S. McNeal, K. Thomas, X. Yin, S. Anthony, R. Ray, R.S. Vose, B.E. Gleason, and T.G. Houston, 2012: Global Historical Climatology Network - Daily (GHCN-Daily), Version 3.26. NOAA National Climatic Data Center. http://doi.org/10.7289/V5D21VHZ [February 21, 2022].

Accessing data for a different station

For every weather station in the daily GHCN, NOAA maintains a file with the station's entire daily history. Each day, they append a new record. In my observation, records are typically lagged by a few days.

Each weather station is assigned a unique identifier. The full list of station names, coordinates, and unique IDs is available here.

I use the station at Milwaukee's General Mitchell Airport, whose code is USW00014839. This station's comprehensive daily dataset is available at https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/by_station/USW00014839.csv.gz. Simply substitute the code of a different station to retrieve its data instead.
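As a rough sketch of the substitution described above, the following base R helpers build the URL for any station ID and read a downloaded file. The function names are hypothetical (they are not taken from the repository's scripts), and the column layout is an assumption that should be checked against NOAA's documentation.

```r
# Build the download URL for a station's full daily history.
# The URL pattern is the one quoted above; only the station ID changes.
ghcn_station_url <- function(station_id) {
  paste0("https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/by_station/",
         station_id, ".csv.gz")
}

# Assumed column layout of the by_station files (no header row).
ghcn_cols <- c("id", "date", "element", "value",
               "m_flag", "q_flag", "s_flag", "obs_time")

# Read a downloaded .csv.gz file; read.csv() decompresses local gzip files itself.
read_ghcn_station <- function(path) {
  read.csv(path, header = FALSE, col.names = ghcn_cols,
           colClasses = c("character", "character", "character", "numeric",
                          rep("character", 4)))
}

# Usage (requires network access, so left commented out):
# tmp <- tempfile(fileext = ".csv.gz")
# download.file(ghcn_station_url("USW00014839"), tmp, mode = "wb")
# raw <- read_ghcn_station(tmp)
```

Downloading to a temporary file first, rather than reading the URL directly, keeps the gzip handling simple.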

Refer to R/Retrieve_GHCN_USW00014839.R for a demonstration of downloading and processing this dataset. See NOAA's documentation for detailed descriptions of the original variable definitions.
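For orientation only, here is a minimal base R sketch of one processing step: keeping the daily high (TMAX) records and converting them to degrees Fahrenheit. It assumes the raw values are stored in tenths of a degree Celsius and that dates are YYYYMMDD strings. The function name and the tiny example data frame are hypothetical and are not the repository's actual script.

```r
# Keep TMAX rows, parse the date, and convert tenths of degrees C to degrees F.
tidy_tmax <- function(raw) {
  tmax <- raw[raw$element == "TMAX", c("date", "value")]
  data.frame(
    date   = as.Date(tmax$date, format = "%Y%m%d"),
    tmax_f = tmax$value / 10 * 9 / 5 + 32
  )
}

# Made-up rows in the assumed raw layout, for illustration only.
raw <- data.frame(
  date    = c("20220101", "20220101", "20220102"),
  element = c("TMAX", "PRCP", "TMAX"),
  value   = c(128, 0, -55),
  stringsAsFactors = FALSE
)

tidy <- tidy_tmax(raw)
print(tidy)
```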

Replicating or altering the graph

The image graphs/DailyHighTemp_USW00014839.png is created by R/BuildDailyHigh.R. See the README in /graphs for a step-by-step tutorial.

Automatic Updating with GitHub Actions

At a high level, the automated workflow:

(1) runs the script to retrieve the updated data

(2) commits the updated dataset to the repository

(3) runs the scripts to build the graphs

(4) commits the graphs to the repository

All this takes less than 1 minute per run.

See the README in /.github/workflows for a line-by-line discussion of the workflow file.

milwaukee-weather's People

Contributors

actions-user, jdjohn215

milwaukee-weather's Issues

403 error for downloading data

I just noticed that the NOAA server is producing 403 errors when downloading the daily station data. Seems to be a problem on their end and I don't think there's much you can do. I sent an email to one of the many NOAA email addresses to report the issue.

> download.file("https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/by_station/USW00014837.csv.gz",temp)
trying URL 'https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/by_station/USW00014837.csv.gz'
Error in download.file("https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/by_station/USW00014837.csv.gz",  : 
  cannot open URL 'https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/by_station/USW00014837.csv.gz'
In addition: Warning message:
In download.file("https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/by_station/USW00014837.csv.gz",  :
  cannot open URL 'https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/by_station/USW00014837.csv.gz': HTTP status was '403 Forbidden'

Handling of leap years

When I shared a Madison version of the graph, someone pointed out that a recent record warm day appeared to be missing: December 29, with 55 degrees. I took a closer look at the data and realized that this is caused by the days being sorted by day-of-year yday(). December 29 is usually day 363, but when you filter for day 363 and sort by TMAX you find a record of 62 degrees -- on December 28, 1984. I'm not really sure what the conceptually correct way to handle leap years in the graph would be, but I see how this can lead to confusion.
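The collision can be reproduced with base R alone. In this small sketch the helper name is hypothetical, and 2021 is only an assumed example of a non-leap year, since the issue does not state the year of the recent record.

```r
# Day of year (1-366) for a date, using base R only.
day_of_year <- function(d) as.integer(format(as.Date(d), "%j"))

day_of_year("2021-12-29")  # 363 in a non-leap year
day_of_year("1984-12-28")  # 363 as well, because 1984 is a leap year
day_of_year("1984-12-29")  # 364
```

Two different calendar days therefore share day-of-year 363, which is why the December 28, 1984 high ends up competing with a December 29 reading.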

One solution that works for non-leap years would be to remove any Feb 29 records from the data and then subtract 1 from day_of_year for any dates after:

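  # Assumes dplyr and lubridate are loaded; used inside a pipeline on the weather data
  mutate(date = as.Date(paste(year, month, day, sep = "-")),
         day_of_year = case_when(
           # Feb 29 gets NA so it can be filtered out afterwards
           leap_year(date) & yday(date) == 60 ~ NA_real_,
           # later days in a leap year shift back by one
           leap_year(date) & yday(date) > 60 ~ yday(date) - 1,
           TRUE ~ yday(date)))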
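To check that this adjustment lines the calendar days up again, here is an equivalent sketch in base R with no package dependencies. The function name is hypothetical, and it is meant as a way to test the idea rather than as the code used to build the graphs.

```r
# Day of year with Feb 29 set to NA and later leap-year days shifted back by one,
# so that every calendar day maps to the same number in every year.
leap_adjusted_yday <- function(date) {
  date <- as.Date(date)
  doy  <- as.integer(format(date, "%j"))
  year <- as.integer(format(date, "%Y"))
  is_leap <- (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
  ifelse(is_leap & doy == 60L, NA_integer_,
         ifelse(is_leap & doy > 60L, doy - 1L, doy))
}

leap_adjusted_yday(c("1984-12-29", "2021-12-29"))  # both 363
leap_adjusted_yday("1984-02-29")                   # NA
```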

I don't know what to do about Feb 29 when the current year is a leap year. I guess one would think of a record that day only compared with other Feb 29ths?
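One possible way to get that behavior, offered only as a sketch, is to group records by a month-day key instead of a day-of-year number, so that Feb 29 is only ever compared with other Feb 29ths. The helper name is hypothetical, and apart from the two highs quoted above (62 and 55 degrees) the values below are made up for illustration.

```r
# Key each observation by calendar day ("MM-DD") rather than day of year.
month_day <- function(date) format(as.Date(date), "%m-%d")

# Illustrative data: the 62 and 55 degree highs come from the issue text,
# the remaining values are invented.
obs <- data.frame(
  date   = as.Date(c("1984-12-28", "1984-12-29", "2021-12-29",
                     "1984-02-29", "2020-02-29")),
  tmax_f = c(62, 40, 55, 35, 45)
)
obs$month_day <- month_day(obs$date)

# Record high per calendar day; Feb 29 forms its own group.
records <- aggregate(tmax_f ~ month_day, data = obs, FUN = max)
print(records)
```

With this key the Dec 29 record is 55, regardless of whether a given year is a leap year. Plotting would still need the keys placed on an axis with 366 positions.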

For reference, here's the Madison graph with the original code:

[Figure: DailyHighTemp_USW00014839]

And here is the one with leap years fixed:

[Figure: DailyHighTemp_USW00014839_fixed]
