coursera_cleaning_data_project's Introduction

Tidy_Data_Coursera

The analysis file performs the following order of operations:

Read in the training data sets (X_train, y_train and subjects_train)
Read in the test data sets (X_test, y_test and subjects_test)
Combine the training and test data sets (data, activity_labels and subjects)
Add the subject IDs (subjects) and activity labels (activity_labels) to the merged data
Extract the proper feature names from the features.txt file and assign these names to the columns of the merged data
Create a logical variable that is true if the feature name contains either mean() or std()
Use this logical variable to select only those columns of the data table that hold mean() and std() variables
Replace the activity ID numbers with their corresponding activity names (from the activity_labels.txt file)
Clean up the column names of the data table, replacing all instances of '-' with '.' and all instances of '()' with '' (an empty string)
Order the data table according to Subject IDs (ascending order)
Split the data into a list according to the Subject ID and activity label
Calculate the mean value for all features for each combination of Subject ID and activity label
Reconstruct a table by row-binding the vectors of the average values
Recreate the Subject ID and Activity columns and add them to the table of average values
Move the Subject ID and Activity columns to the front of the data table
Save the final tidy data table to a txt file (tidy_data.txt)
???
PROFIT!
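
The core of the steps above (select the mean() and std() features, rename the columns, swap activity IDs for names, then average per Subject ID and activity) can be sketched on a toy table. This is a minimal illustration in Python using only the standard library, not the project's analysis file. The feature names, activity lookup, row values and output column names (SubjectID, Activity) are all made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical feature names in the style of features.txt
features = ["tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X"]

# Hypothetical lookup in the style of activity_labels.txt
activity_names = {1: "WALKING", 2: "SITTING"}

# Made-up merged rows: (subject ID, activity ID, feature values)
rows = [
    (2, 1, [0.5, 0.25, 0.9]),
    (1, 1, [0.25, 0.5, 0.8]),
    (1, 1, [0.75, 1.0, 0.7]),
    (1, 2, [0.5, 0.25, 0.6]),
]

# Logical mask: True where the feature name contains mean() or std()
keep = [("mean()" in name) or ("std()" in name) for name in features]


def clean_name(name):
    """Replace '-' with '.' and remove '()' from a feature name."""
    return name.replace("-", ".").replace("()", "")


# Cleaned names of the columns that survive the mask
columns = [clean_name(name) for name, k in zip(features, keep) if k]

# Group the kept values by (subject ID, activity name)
groups = defaultdict(list)
for subject, activity_id, values in rows:
    kept = [v for v, k in zip(values, keep) if k]
    groups[(subject, activity_names[activity_id])].append(kept)

# One averaged row per group, ordered by subject ID (then activity name),
# with the identifier columns placed first
tidy = []
for (subject, activity), members in sorted(groups.items()):
    averages = [mean(col) for col in zip(*members)]
    tidy.append([subject, activity] + averages)

header = ["SubjectID", "Activity"] + columns

# Print a tab-separated table, similar in spirit to tidy_data.txt
print("\t".join(header))
for row in tidy:
    print("\t".join(str(value) for value in row))
```

The sketch groups with a dictionary keyed on (subject, activity) instead of splitting into a list and row-binding afterwards; the resulting table has the same shape, one row per combination of Subject ID and activity.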
