Getting and Cleaning Data Course Project

Project Summary
Repository Contents
The Process

Project Summary

The purpose of this project is to demonstrate collecting, manipulating, and cleaning a data set. Using data collected from the accelerometers of the Samsung Galaxy S smartphone, found here (http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones), I merged data sets, extracted specific subsets, performed general cleaning, executed some calculations, and exported the resulting extract for use in future analysis.

Repository Contents

The [cjbach1/Getting and Cleaning Data Course Project](https://github.com/cjbach1/Getting-and-Cleaning-Data-Course-Project) repository contains the following files:

| File Name | Description |
| --- | --- |
| README.md | Documentation explaining the project and how to use files contained in the repository. |
| CodeBook.md | Codebook describing the tidydataoutput.txt file layout. |
| run_analysis.R | R script to download, merge, extract, clean, and subset the datasets. See process section below for further details. |
| tidydataoutput.txt | Final data extraction for future analysis. |

The Process

I created the script "run_analysis.R" which does the following:

It downloads and unzips the original data sets, loads the necessary libraries (library(dplyr) and library(data.table)), and reads the data into R. The data come in two distinct directories, test and train, each of which provides three text files: the experiment subject (read in as testsubject or trainsubject), the activity (read in as testy or trainy), and the features (read in as testx or trainx).
It joins like data sets (e.g., testx with trainx and testsubject with trainsubject) to create three objects with the test and train populations combined. colnames() is used to clean the column names and %>% relocate(Subject) %>% is used to arrange the columns of the three objects, which are then joined into a single data set via cbind() to create the totaldata object.
It creates a subset of the data containing only the mean and standard deviation measurements, stored in the object extractedtotaldata. I used grep("mean|std", names(totaldata), ignore.case = TRUE) to find the column names (features) that contain "mean" or "std", matching case-insensitively.
It cleans up the variable names to make them more descriptive, using the gsub() function.
It creates a second, independent tidy data set with the average of each variable for each activity and each subject. I utilized the following code to extract and then order the new subset: aggregate(. ~Subject + Activity, extractedtotaldata, mean) and tidydata <- tidydata[order(tidydata$Subject,tidydata$Activity),].
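
The steps above can be sketched end to end on a few rows of made-up data. This is a minimal illustration in base R, not the contents of run_analysis.R: the toy values, the featurenames vector, the specific gsub() replacements, and the way Subject and Activity are kept alongside the grep() matches are all assumptions made for the example. Only the object names (testx, trainx, totaldata, extractedtotaldata, tidydata) and the grep(), aggregate(), and order() calls come from the description above.

```r
# Toy stand-ins for the six files the script reads (values are made up).
testx  <- data.frame(V1 = c(0.1, 0.2), V2 = c(0.5, 0.6), V3 = c(9, 8))
trainx <- data.frame(V1 = c(0.3, 0.4), V2 = c(0.7, 0.8), V3 = c(7, 6))
testy  <- data.frame(V1 = c(1, 2))
trainy <- data.frame(V1 = c(1, 1))
testsubject  <- data.frame(V1 = c(1, 1))
trainsubject <- data.frame(V1 = c(2, 2))

# Hypothetical feature names, used here only to label the toy columns.
featurenames <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")

# Join like data sets (test rows stacked on train rows), name the columns,
# then bind the three objects side by side with Subject first.
x <- rbind(testx, trainx)
y <- rbind(testy, trainy)
subject <- rbind(testsubject, trainsubject)
colnames(x) <- featurenames
colnames(y) <- "Activity"
colnames(subject) <- "Subject"
totaldata <- cbind(subject, y, x)

# Keep only the mean and standard deviation measurements.
# Assumption: Subject and Activity are retained explicitly, since neither
# name matches the pattern.
keep <- grep("mean|std", names(totaldata), ignore.case = TRUE)
extractedtotaldata <- totaldata[, c("Subject", "Activity", names(totaldata)[keep])]

# Make variable names more descriptive with gsub().
# Assumption: these particular replacements are illustrative only.
newnames <- names(extractedtotaldata)
newnames <- gsub("^t", "Time", newnames)
newnames <- gsub("-mean\\(\\)", "Mean", newnames)
newnames <- gsub("-std\\(\\)", "StdDev", newnames)
newnames <- gsub("-", "", newnames)
names(extractedtotaldata) <- newnames

# Average each variable for each subject and activity, then sort the rows.
tidydata <- aggregate(. ~ Subject + Activity, extractedtotaldata, mean)
tidydata <- tidydata[order(tidydata$Subject, tidydata$Activity), ]
```

Renaming the columns before calling aggregate() keeps the formula interface working with plain, syntactic column names, which is one reason to do the gsub() step ahead of the averaging step.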
