| title | author | date | output |
|---|---|---|---|
| README.md | David DuPre | September 25, 2015 | html_document |
Assignment for Course getdata-032
The purpose of this project is to demonstrate your ability to collect, work with, and clean a data set. The goal is to prepare tidy data that can be used for later analysis. You will be graded by your peers on a series of yes/no questions related to the project.
Requirements:
You will be required to submit:
- a tidy data set as described below,
- a link to a GitHub repository with your script for performing the analysis, and
- a code book called CodeBook.md that describes the variables, the data, and any transformations or work that you performed to clean up the data.
You should also include a README.md in the repo with your scripts. The README explains how all of the scripts work and how they are connected.
library(data.table)
library(sqldf)
library(plyr)  # provides revalue(); loaded before dplyr so dplyr's functions are not masked
library(dplyr)
library(tidyr)
features <- read.table("UCI_HAR_Dataset/features.txt", quote = "\"", comment.char = "")
activity_labels <- read.table("UCI_HAR_Dataset/activity_labels.txt", quote = "\"", comment.char = "")
X_train <- read.table("UCI_HAR_Dataset/train/X_train.txt", quote = "\"", comment.char = "")
y_train <- read.table("UCI_HAR_Dataset/train/y_train.txt", quote = "\"", comment.char = "")
subject_train <- read.table("UCI_HAR_Dataset/train/subject_train.txt", quote = "\"", comment.char = "")
X_test <- read.table("UCI_HAR_Dataset/test/X_test.txt", quote = "\"", comment.char = "")
y_test <- read.table("UCI_HAR_Dataset/test/y_test.txt", quote = "\"", comment.char = "")
subject_test <- read.table("UCI_HAR_Dataset/test/subject_test.txt", quote = "\"", comment.char = "")
rownumbers_train <- rep("Train", nrow(X_train))
rownumbers_test <- rep("Test", nrow(X_test))
feature_cols <- as.vector(features[, 2])
sub_actvty_train <- cbind(ID = rownumbers_train, subject = subject_train, activity = y_train)
sub_actvty_test <- cbind(ID = rownumbers_test, subject = subject_test, activity = y_test)
sub_act_data <- cbind(sub_actvty_train, X_train)
sub_act_data_test <- cbind(sub_actvty_test, X_test)
colnames1 <- c("session", "subject", "activity")
colnames_train <- c(colnames1, feature_cols)
colnames_test <- c(colnames1, feature_cols)
sub_act_data <- setNames(sub_act_data, colnames_train)
sub_act_data_test <- setNames(sub_act_data_test, colnames_test)
train_test <- rbind(sub_act_data,sub_act_data_test)
train_test$activity <- as.character(train_test$activity)
train_test$activity <- revalue(
  train_test$activity,
  c("1" = "WALKING", "2" = "WALKING_UPSTAIRS", "3" = "WALKING_DOWNSTAIRS",
    "4" = "SITTING", "5" = "STANDING", "6" = "LAYING")
)
train_test <- train_test[,!duplicated(colnames(train_test))]
sd_first3 <- select(train_test,one_of(colnames1))
sd_mean <- select(train_test,contains("mean"))
sd_std <- select(train_test,contains("std"))
sd <- cbind(sd_first3,sd_mean,sd_std)
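path <- getwd()  # assumption: `path` is not defined in the original; the working directory is used here
TidyFilename <- paste0(path, "/TIDY_UCI_HAR_Datase", "", format(Sys.time(), "%Y-%m-%d%H-%M-%S"), ".txt")
write.table(sd, file = TidyFilename, row.names = FALSE)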
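To check the result, the written file can be read back into R. The following is a minimal sketch, not part of the original script: `sample_tidy`, `tidy_file`, and `tidy_check` are hypothetical names, and the sample rows are made up so that the example runs without the UCI HAR data.

```r
# Minimal sketch: write a small data frame the same way the script does,
# then read it back. The sample values are invented for illustration.
sample_tidy <- data.frame(
  session  = c("Train", "Test"),
  subject  = c(1L, 2L),
  activity = c("WALKING", "LAYING"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for TidyFilename
tidy_file <- tempfile(fileext = ".txt")
write.table(sample_tidy, file = tidy_file, row.names = FALSE)

# header = TRUE restores the column names written by write.table()
tidy_check <- read.table(tidy_file, header = TRUE, stringsAsFactors = FALSE)
str(tidy_check)
```

When reading the real output, feature names such as `tBodyAcc-mean()-X` contain characters that `read.table` rewrites by default. Passing `check.names = FALSE` should keep them as written.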