Getting and Cleaning Data Course Project

Project Summary

The purpose of this project is to demonstrate collecting, manipulating, and cleaning a data set. Using data collected from the accelerometers within the Samsung Galaxy S smartphone, found here (http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones), I merged data sets, extracted specific subsets, performed general cleaning, computed the average of each retained measurement for each subject and activity, and exported the resulting tidy data set to be used for future analysis.

Repository Contents

The [cjbach1/Getting and Cleaning Data Course Project](https://github.com/cjbach1/Getting-and-Cleaning-Data-Course-Project) repository contains the following files:

| File Name | Description |
| --- | --- |
| README.md | Documentation explaining the project and how to use files contained in the repository. |
| CodeBook.md | Codebook describing the tidydataoutput.txt file layout. |
| run_analysis.R | R script to download, merge, extract, clean, and subset the datasets. See The Process section below for further details. |
| tidydataoutput.txt | Final data extraction for future analysis. |

The Process

I created the script "run_analysis.R" which does the following:
  1. It downloads and unzips the original data sets, loads the necessary libraries (library(dplyr) and library(data.table)), and reads the data into R. The test and training data sit in two distinct directories, each providing three text files: the experiment subject (read in as testsubject or trainsubject), the activity (read in as testy or trainy), and the features (read in as testx or trainx).
  2. It joins like data sets (e.g., testx with trainx and testsubject with trainsubject) to create 3 objects with the test and train populations combined. colnames() is used to clean the column names and %>% relocate(Subject) %>% is used to arrange the columns; the three objects are then joined into a single data set via cbind() to create the totaldata object.
  3. It creates a subset of the data containing only the mean and standard deviation measurements, stored in the object extractedtotaldata. I utilized grep("mean\\(\\)|std\\(\\)", names(totaldata), ignore.case = TRUE) to locate the columns whose names (features) contain mean() or std().
  4. It cleans up the variable names to make them more descriptive, utilizing the gsub() function.
  5. It creates a second, independent tidy data set with the average of each variable for each activity and each subject. I utilized the following code to extract and then order the new subset: aggregate(. ~Subject + Activity, extractedtotaldata, mean) and tidydata <- tidydata[order(tidydata$Subject,tidydata$Activity),].
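
Steps 2 through 5 above can be sketched on a small made-up example. This is not the contents of run_analysis.R: the toy values, the three feature names, and the particular gsub() patterns are assumptions chosen for illustration, and base R is used in place of dplyr so the sketch runs without extra packages.

```r
# Toy stand-ins for the real files; every value here is invented for illustration.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-energy()-X")

testx  <- data.frame(c(0.2, 0.4), c(0.1, 0.3), c(5, 6))
trainx <- data.frame(c(0.6, 0.8), c(0.5, 0.7), c(7, 8))
names(testx)  <- features
names(trainx) <- features

testy        <- data.frame(Activity = c("WALKING", "SITTING"))
trainy       <- data.frame(Activity = c("WALKING", "WALKING"))
testsubject  <- data.frame(Subject = c(1, 1))
trainsubject <- data.frame(Subject = c(2, 2))

# Step 2: stack like with like (test on top of train), then bind the columns.
x       <- rbind(testx, trainx)
y       <- rbind(testy, trainy)
subject <- rbind(testsubject, trainsubject)
totaldata <- cbind(subject, y, x)

# Step 3: keep the identifier columns plus any feature containing mean() or std().
# The parentheses are escaped with double backslashes inside the R string.
keep <- grep("mean\\(\\)|std\\(\\)", names(totaldata), ignore.case = TRUE)
extractedtotaldata <- totaldata[, c("Subject", "Activity", names(totaldata)[keep])]

# Step 4: make the names more descriptive (these patterns are hypothetical).
newnames <- names(extractedtotaldata)
newnames <- gsub("^t", "Time", newnames)
newnames <- gsub("-mean\\(\\)", "Mean", newnames)
newnames <- gsub("-std\\(\\)", "StdDev", newnames)
newnames <- gsub("-", "", newnames)
names(extractedtotaldata) <- newnames

# Step 5: average every remaining variable per subject and activity, then sort.
tidydata <- aggregate(. ~ Subject + Activity, extractedtotaldata, mean)
tidydata <- tidydata[order(tidydata$Subject, tidydata$Activity), ]
print(tidydata)
```

In this toy data, subject 2 has two WALKING rows, so they collapse into a single row of averages, while the energy() feature is dropped in step 3 because its name matches neither pattern.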
