Giter VIP home page Giter VIP logo

example-of-linear-regression-in-r's Introduction

Example of linear regression in R

#Set your working directory path to whatever folder you like. Use double backslash.
sewd("C:\\Users\\Rebecca Merrett\\Documents\\r-workspace\\RebeccaSpace")
getwd()

data(airquality)
head(airquality)

#Check for missing values and see if need to treat them accordingly. Also if there are categorical type variables (including age groups), need to set as factors.
#I noticed I could not get Rsquared on testset when included missing values.
airquality <- na.exclude(airquality)
#na.exclude leaves out rows with missing values, but keeps track of where they were so that when making predictions a vector will be the same length of the actual response.

#Use scatterplot matrix to find correlations between variables. Ozone on y axis has strong positive correlation with Temp on x axis.
pairs(airquality)

#lm plot of predictor along x horizontal axis and response (what want to predict) along y vertical axis.
#The regression line will go with the best fit to the data, with grey band around it meaning uncertainty in fit.
#This is to first see if data is in any way suitable to be fitted in linear form.
require(ggplot2)
ggplot(airquality, aes(x=Temp, y=Ozone)) + geom_point() + geom_smooth(method="lm") + labs(x="Temperature", y="Ozone")

#Split data into train (70%) and test (30%), and build the model on the training set.
airqualitySplit <- sort(sample(nrow(airquality),nrow(airquality)*.7))
airqualityTrain <- airquality[airqualitySplit,]
airqualityTest <- airquality[-airqualitySplit,]
OzoneModel <- lm(formula = Ozone ~ Temp, data = airqualityTrain)
summary(OzoneModel)

#Look at the residuals (difference between actual and predicted values) and plot them, which will produce 4 plots.
#One is Normal Q-Q plot to diagnose if residuals are far off the normal distribution line or not, lack of linearity showing it’s not good fit.
plot(OzoneModel)
#Might want to do an ANOVA/F-test to see if adding more predictors makes a difference or not, looking at lm model with say all predictors.
#This 'anova(Model1, Model2)' comparison looks at p-value against significance 0.05 to accept or reject null hypothesis of a noticeable difference or not.

#Now test the model against the test set, and minus the first column or response (Ozone).
ModelPredictions <- predict(OzoneModel, newdata=airqualityTest[ ,-1],type="response", se.fit=TRUE)
head(ModelPredictions$fit)
head(airqualityTest$Ozone)
actual <- airqualityTest$Ozone
Rsquared <- 1-sum((actual-ModelPredictions$fit)^2)/sum((actual-mean(actual))^2)
Rsquared

example-of-linear-regression-in-r's People

Contributors

rebeccamerrett avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.