Author: Gerard Lopez
Date: 2014-11-10
Class: JHBSPH - Exploratory Data Analysis
Project: 1
This assignment uses data from the UC Irvine Machine Learning Repository, a popular repository for machine learning datasets. In particular, we will be using the "Individual household electric power consumption Data Set" which I have made available on the course web site:
- Dataset: Electric power consumption [20Mb]
- Description: Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available.
The following descriptions of the 9 variables in the dataset are taken from the UCI web site:
- Date: Date in format dd/mm/yyyy
- Time: time in format hh:mm:ss
- Global_active_power: household global minute-averaged active power (in kilowatt)
- Global_reactive_power: household global minute-averaged reactive power (in kilowatt)
- Voltage: minute-averaged voltage (in volt)
- Global_intensity: household global minute-averaged current intensity (in ampere)
- Sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).
- Sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.
- Sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.
Run the getPowerData.R script first. It downloads the data from the website to a local copy, unless household_power_consumption.txt is already in the local directory, in which case it uses that copy instead of downloading a fresh one. It then creates a data frame containing only the rows for the dates "1/2/2007" and "2/2/2007" (1 and 2 February 2007, since dates are in dd/mm/yyyy format), sets the column data types appropriately, and adds a datetime column formed by joining the Date and Time columns.
getPowerData.R outputs a data frame called PowerData, which is used by each of the subsequent scripts to produce the plots.
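The steps described for getPowerData.R could be sketched roughly as follows. This is not the script itself: the function name `load_power_data` and its arguments are hypothetical, and the download URL is left as a parameter. The sketch assumes the URL points to a zip archive containing the text file, and that the file is semicolon-separated with `?` marking missing values, as in the UCI release.

```r
# Hypothetical helper illustrating the steps above; not the actual getPowerData.R.
load_power_data <- function(path = "household_power_consumption.txt",
                            url = NULL,
                            dates = c("1/2/2007", "2/2/2007")) {
  # Reuse the local copy when present; otherwise fetch and unpack it.
  # Assumption: `url` points to a zip archive that contains the text file.
  if (!file.exists(path)) {
    if (is.null(url)) stop("no local copy and no download URL supplied")
    zip_file <- tempfile(fileext = ".zip")
    download.file(url, zip_file, mode = "wb")
    unzip(zip_file, exdir = dirname(path))
  }

  # Assumed layout: header row, ';' separator, '?' for missing values.
  raw <- read.table(path, header = TRUE, sep = ";", na.strings = "?",
                    colClasses = c("character", "character", rep("numeric", 7)))

  # Keep only the two days of interest (dates are stored as d/m/yyyy text).
  out <- raw[raw$Date %in% dates, ]

  # Join date and time into a single POSIXct column.
  out$datetime <- as.POSIXct(paste(out$Date, out$Time),
                             format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
  out
}

# Usage (assumes the text file is already in the working directory):
# PowerData <- load_power_data()
```

Filtering on the raw text of the Date column before any conversion keeps the sketch simple, and only the retained rows are parsed into date-times.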
plot1.R, plot2.R, plot3.R, and plot4.R each use the PowerData data frame produced by getPowerData.R to create the .png files plot1.png, plot2.png, plot3.png and plot4.png respectively.
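As a rough illustration of how a plot script can turn a data frame like PowerData into a .png file, the sketch below opens a PNG graphics device, draws one chart and closes the device. The helper name `write_power_plot` is hypothetical, and the line chart of Global_active_power against datetime is only an example; it is not claimed to match any of the four plots.

```r
# Hypothetical helper; the actual plot scripts may draw something different.
write_power_plot <- function(data, file) {
  png(file)            # open a PNG graphics device with default dimensions
  on.exit(dev.off())   # close the device on exit, even if plotting fails
  plot(data$datetime, data$Global_active_power, type = "l",
       xlab = "", ylab = "Global active power (kilowatts)")
  invisible(file)
}

# Usage, assuming PowerData already exists in the session:
# write_power_plot(PowerData, "example.png")
```

Closing the device through `on.exit` means a failed plot call does not leave a half-written file open.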