Getting-and-cleaning-data-project

This is a repo for the course project of the Coursera course "Getting and Cleaning Data".

The script "run_analysis.R" takes the data from the folder "UCI HAR Dataset", which has to be located in the R working directory, and manipulates it to create two tidy data sets.

Part 1 manipulates the raw data to create a tidy data set for questions 1-4 of the assignment. Part 2 creates a tidy data set for question 5.

The file "codebook.txt" is a modified version of the initial codebook, provided by the authors of the dataset.

At the beginning of it, I describe the variables I added and their meaning.

Then comes the text from the authors describing the signal processing and feature selection, modified to cover only the parts used in our final tidy data.

# Part 1

## Creating a first tidy data set for questions 1-4

Step 1. Getting train and test data into R. I loaded the files in the UCI HAR directory and its first level of subdirectories, ignoring the raw signal data in the "Inertial Signals" folders.
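A minimal sketch of Step 1, not the actual code from run_analysis.R. It assumes the standard file names of the UCI HAR archive (`X_train.txt`, `y_train.txt`, `subject_train.txt` and their `test` counterparts); the helper `read_split` is hypothetical. To stay runnable without the real download, it first writes a tiny mock of the folder layout into a temporary directory.

```r
# Build a tiny mock of the "UCI HAR Dataset" layout in a temp directory
# (a stand-in for the real download; the values are made up).
root <- file.path(tempdir(), "UCI HAR Dataset")
for (split in c("train", "test")) {
  dir.create(file.path(root, split), recursive = TRUE, showWarnings = FALSE)
  writeLines(c("0.1 0.2 0.3", "0.4 0.5 0.6"),
             file.path(root, split, paste0("X_", split, ".txt")))
  writeLines(c("1", "2"), file.path(root, split, paste0("y_", split, ".txt")))
  writeLines(c("7", "7"), file.path(root, split, paste0("subject_", split, ".txt")))
}

# Hypothetical helper: read the three top-level files of one split,
# leaving the "Inertial Signals" subfolder untouched.
read_split <- function(root, split) {
  path <- function(prefix) file.path(root, split, paste0(prefix, "_", split, ".txt"))
  list(
    x       = read.table(path("X")),        # feature measurements
    y       = read.table(path("y")),        # activity labels (1-6)
    subject = read.table(path("subject"))   # subject ids
  )
}

train <- read_split(root, "train")
test  <- read_split(root, "test")
```

With the real data, `root` would simply be `"UCI HAR Dataset"` relative to the working directory.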

Step 2. Grabbing appropriate column names for both sets
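One way Step 2 could look, assuming the column names come from the archive's `features.txt` (index in the first column, feature name in the second). The toy data frames below stand in for the real files, and the variable names are illustrative.

```r
# Toy stand-ins: features.txt has two columns (index, feature name),
# and the measurement files have one column per feature.
features <- data.frame(
  V1 = 1:3,
  V2 = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X"),
  stringsAsFactors = FALSE
)
train_x <- data.frame(V1 = c(0.1, 0.4), V2 = c(0.2, 0.5), V3 = c(0.3, 0.6))
test_x  <- data.frame(V1 = 0.7, V2 = 0.8, V3 = 0.9)

# The same feature names apply to both sets, in file order.
names(train_x) <- features$V2
names(test_x)  <- features$V2
```

Assigning with `names<-` keeps the original feature names verbatim, including the parentheses and dashes.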

Step 3. Getting activity labels (i.e. numbers 1, 2, 3, 4, 5 or 6) corresponding to the performed activity, and adding them as an extra column in both train and test data.

Step 4. Getting the IDs of the subjects who performed each activity, and adding them as an extra column of the data.
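Steps 3 and 4 can be sketched as a column bind, shown here for the train set only. The column name `activityLabel` is an assumption; `subject` matches the grouping variable named in Part 2. The data are toy values.

```r
train_x       <- data.frame(feat1 = c(0.1, 0.4, 0.7))
train_y       <- data.frame(V1 = c(5L, 5L, 2L))  # activity labels, one per row
train_subject <- data.frame(V1 = c(1L, 1L, 3L))  # subject ids, one per row

# Rows line up by position across the three files, so check the
# lengths and then bind the extra columns directly.
stopifnot(nrow(train_x) == nrow(train_y),
          nrow(train_x) == nrow(train_subject))
train <- cbind(train_x,
               activityLabel = train_y$V1,
               subject       = train_subject$V1)
```

The same two lines would be repeated for the test set.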

Step 5. Last step before merging train and test data: I want to keep the information about whether each subject belonged to the test data set or the train data set, so I am creating an extra column called "dataset".

Step 6. Merging the train and test data
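A small sketch of Steps 5 and 6 on toy data. The values `"train"` and `"test"` for the "dataset" column are an assumption about how the script encodes the origin.

```r
train <- data.frame(feat1 = c(0.1, 0.4), activityLabel = c(5L, 2L), subject = c(1L, 3L))
test  <- data.frame(feat1 = 0.7, activityLabel = 4L, subject = 2L)

# Tag each row with its origin before stacking, since the
# information is lost once the rows are combined.
train$dataset <- "train"
test$dataset  <- "test"

# Both frames have identical columns, so rbind() stacks them row-wise.
full <- rbind(train, test)
```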

Step 7. Getting activity names for each activity label and merging them with our dataset, so that we have descriptive activity names instead of labels like 1, 2, 3, 4, 5 or 6
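Step 7 might look like the following, assuming the label-to-name table has the same content as the archive's `activity_labels.txt`. Dropping the numeric label afterwards is an assumption based on the phrase "instead of labels"; the column names are illustrative.

```r
full <- data.frame(feat1 = c(0.1, 0.4, 0.7),
                   activityLabel = c(5L, 2L, 5L),
                   subject = c(1L, 3L, 2L))
activity_labels <- data.frame(
  activityLabel = 1:6,
  activityName  = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                    "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# merge() joins on the shared column. It may reorder rows, so it is
# safest to run it only after all per-row columns have been attached.
full <- merge(full, activity_labels, by = "activityLabel")
full$activityLabel <- NULL  # keep only the descriptive name
```

This ordering concern is one reason to merge the names in after Steps 3 to 6 rather than before.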

Step 8. To find out which columns correspond to "mean" or "standard deviation" values, we look for "std" and "mean" in the text of the column names (this corresponds to the 2nd step in the assignment), and select only those columns.
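The column selection in Step 8 can be sketched with `grepl()`. The pattern `"mean|std"` is an assumption about the exact matching used; the identifier columns are kept alongside the matched measurements.

```r
full <- data.frame(a = 1, b = 2, c = 3, d = 4,
                   subject = 1L, activityName = "WALKING", dataset = "train")
names(full)[1:4] <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                      "tBodyAcc-mad()-X", "fBodyAcc-meanFreq()-X")

# Match "mean" or "std" anywhere in the column name. This also keeps
# meanFreq() columns; a stricter pattern such as "mean\\(\\)|std\\(\\)"
# would drop them.
is_measure <- grepl("mean|std", names(full))
is_id      <- names(full) %in% c("subject", "activityName", "dataset")
tidy1      <- full[, is_measure | is_id]
```

Whether `meanFreq()` columns count as "mean" values is a judgment call about the assignment wording, so the choice of pattern is worth recording in the codebook.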

Step 9. Saving the data as a first tidy data set.
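A hedged sketch of saving the result with `write.table()`. The file name `tidy1.txt` is hypothetical, and a temporary directory is used so the example has no side effects in the working directory.

```r
tidy1 <- data.frame(subject = c(1L, 2L),
                    activityName = c("WALKING", "LAYING"),
                    meanX = c(0.1, 0.2))

# Hypothetical output path inside the session's temp directory.
out <- file.path(tempdir(), "tidy1.txt")
write.table(tidy1, out, row.names = FALSE)

# Reading the file back with header = TRUE recovers the same columns.
check <- read.table(out, header = TRUE)
```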

# Part 2

## Creating a second tidy data set for question 5

To do this, we simply need to group our data by activityName and subject, and then apply mean() to each column.

For that we use {dplyr}'s summarize_each function, which applies a function to each column without our having to name the columns explicitly. Variables used for grouping (subject and activityName) are excluded automatically, which leaves only the "dataset" variable, which we need to exclude ourselves.
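The grouping step can be illustrated without dplyr. The sketch below swaps in base R's `aggregate()` as an equivalent, so it is not the script's own code; the toy column names `meanX` and `stdX` are made up. Note that `summarize_each` has since been deprecated in dplyr, where `summarise()` combined with `across()` is the documented replacement.

```r
tidy1 <- data.frame(
  subject      = c(1L, 1L, 1L, 2L),
  activityName = c("WALKING", "WALKING", "LAYING", "WALKING"),
  dataset      = c("train", "train", "train", "test"),
  meanX        = c(0.2, 0.4, 1.0, 0.6),
  stdX         = c(0.1, 0.3, 0.5, 0.7)
)

# Keep only the measurement columns: "dataset" is not numeric and the
# grouping variables are passed separately below.
measures <- tidy1[, !(names(tidy1) %in% c("dataset", "subject", "activityName"))]

# Average every measurement column within each (activityName, subject) group.
tidy2 <- aggregate(measures,
                   by  = list(activityName = tidy1$activityName,
                              subject      = tidy1$subject),
                   FUN = mean)
```

The result has one row per activity and subject combination that actually occurs in the data.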

Contributors: adensur
