Materials for the March 12, 2017 ENAR short course on data science for statisticians. Amelia McNamara, Smith College. 8 am - 5 pm.
Abstract: As statistics becomes more computational and the term ‘data science’ gains traction, it is clear that statisticians need new skills to stay current. This short course will get you up to speed on many recent developments in data science. While there are many possible tools to use for data science, we will focus on the R ecosystem. Topics covered will include:
- Data visualization (ggplot2, see Introduction folder)
- The tidyverse (dplyr, broom, tidyr, see Introduction folder)
- APIs and web scraping (rvest, httr, see Text folder)
- Version control (git and GitHub, see Reproducibility folder)
- Reproducible research (R Markdown and Project TIER, see Reproducibility folder)
- Finding help (Stack Overflow, Google, Twitter, see list of books and more)
- Interactivity (shiny and leaflet, see Shiny and Geo folders)
Participants should bring their own laptop with R and RStudio installed. Once you have installed both, open RStudio, paste the following code into the Console window, and press the Enter/Return key:
install.packages(c("tidyverse", "broom", "stringr", "RCurl", "tidytext", "httr", "rvest", "curl", "devtools", "rmarkdown", "knitr", "shiny", "RColorBrewer", "leaflet", "rgdal", "maptools", "GGally", "network", "sna", "intergraph", "networkD3", "lubridate", "scales", "mosaic"))
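After the installation finishes, it can help to confirm that everything actually installed, since a failure for one package is easy to miss in the console output. The snippet below is a minimal sketch of such a check, not part of the original course materials. The variable names `course_pkgs` and `missing_pkgs` are made up for illustration, and the package list is copied from the `install.packages()` call above. On recent versions of R some packages may show up as missing (for example, `rgdal` and `maptools` have since been archived on CRAN).

```r
# Package names copied from the install.packages() call above.
# `course_pkgs` and `missing_pkgs` are illustrative names, not from the course.
course_pkgs <- c("tidyverse", "broom", "stringr", "RCurl", "tidytext", "httr",
                 "rvest", "curl", "devtools", "rmarkdown", "knitr", "shiny",
                 "RColorBrewer", "leaflet", "rgdal", "maptools", "GGally",
                 "network", "sna", "intergraph", "networkD3", "lubridate",
                 "scales", "mosaic")

# installed.packages() returns a matrix whose row names are the names of
# the packages found in your R libraries; setdiff() keeps the course
# packages that are not among them.
missing_pkgs <- setdiff(course_pkgs, rownames(installed.packages()))

if (length(missing_pkgs) == 0) {
  message("All course packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

If any packages are reported as missing, rerunning `install.packages()` on just those names and reading the error message is usually the quickest way to see what went wrong.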
Many materials for this workshop are modified from other existing resources. Some credits:
- NICAR16 Intro to R workshop, with Coulter Jones
- OpenIntro labs, with many contributors including Mine Cetinkaya-Rundel, Andrew Bray, Ben Baumer, and Albert Kim
- Summer DataViz workshop for MassMutual, with Jordan Crouser