This was an individual project completed as a course requirement for AMS 315: Data Analysis. The project was split into two parts, Project 1 and Project 2, and Project 1 was subdivided into Parts A and B. The R code for Project 1 Part A, Project 1 Part B, and Project 2 is included.
The datasets used for Project 1 Part A are P1A_DV69570.csv and P1A_IV69570.csv. For Project 1 Part B, the dataset was P1B69570.csv. For Project 2, the dataset was P2_69570.csv. All of these datasets were synthetically generated by our professor. The data is a replica of a real human behavior data analysis project by Caspi et al. (link to the original research: https://doi.org/10.1126/science.1083968).
Here are the research questions for each part: (1A) developing a single-variable linear regression model; (1B) finding the best regression model using various transformations of the independent variable (IV), the dependent variable (DV), or both; (2) finding the best transformed model of the dependent variable Y on 4 environmental variables (E1–E4) and 20 genetic variables (G1–G20) using stepwise regression.
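
As a rough illustration of the three research questions, here is a minimal R sketch. It does not reproduce the project code: it runs on simulated data rather than the course CSV files, and every column name (`x`, `y`, `y_b`, `E1`..`E4`, `G1`..`G20`), the set of candidate transformations, and the choice of AIC-based `step()` are assumptions made for the example.

```r
# Illustrative sketch only: simulated data stands in for the course datasets,
# and all variable names and model choices below are assumptions.
set.seed(315)
n <- 200

# (1A) Single-variable linear regression of a DV on one IV.
x <- runif(n, 1, 10)
y <- 3 + 2 * x + rnorm(n)
fit_1a <- lm(y ~ x)
r2_1a <- summary(fit_1a)$r.squared

# (1B) Fit the same relationship under several transformations of IV, DV, or both.
y_b <- exp(0.5 + 0.3 * x + rnorm(n, sd = 0.2))  # simulated DV, always positive
candidates <- list(
  raw      = lm(y_b ~ x),
  log_dv   = lm(log(y_b) ~ x),
  log_iv   = lm(y_b ~ log(x)),
  log_both = lm(log(y_b) ~ log(x))
)
# R-squared is only a rough guide here: values for models with different
# DV scales are not strictly comparable, so residual plots should be checked too.
r2_1b <- sapply(candidates, function(m) summary(m)$r.squared)

# (2) Stepwise regression of Y on 4 environmental and 20 genetic variables.
env <- as.data.frame(matrix(rnorm(n * 4), n, 4))
names(env) <- paste0("E", 1:4)
gen <- as.data.frame(matrix(rbinom(n * 20, 1, 0.5), n, 20))
names(gen) <- paste0("G", 1:20)
dat <- cbind(env, gen)
dat$Y <- 1 + 2 * dat$E1 + 1.5 * dat$G3 + rnorm(n)  # simulated response

null_fit <- lm(Y ~ 1, data = dat)  # intercept-only starting model
full_fit <- lm(Y ~ ., data = dat)  # all 24 main effects as the upper scope
step_fit <- step(
  null_fit,
  scope = list(lower = formula(null_fit), upper = formula(full_fit)),
  direction = "both",  # terms may be added or dropped at each step
  trace = 0            # suppress the per-step log
)
selected <- names(coef(step_fit))
```

`step()` selects terms by AIC, which is one of several possible criteria; the sketch considers main effects only, and interaction terms could be added to the upper scope if the analysis calls for them.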