Regression on Student Performance Data

Data source: University of California, Irvine (UCI) Machine Learning Repository

Performance Summary:

| Model | Bootstrapped MSE | Root MSE | RMSE as % of Response Range |
| --- | --- | --- | --- |
| Multiple Linear Regression | 7.121684 | 2.6686 | 14.05 |
| Ridge Regression | 7.251852 | 2.6929 | 14.17 |
| LASSO Regression | 7.03579 | 2.6525 | 13.96 |
| Principal Component Regression | 7.066262 | 2.6582 | 13.99 |
| Partial Least Squares | 6.87703 | 2.6224 | 13.80 |
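
The two derived columns follow directly from the bootstrapped MSE. The sketch below recomputes them in Python for illustration only; it is not the author's code. The response range of 19 is an assumption: it is not stated in the text, but it reproduces the reported percentages.

```python
import math

# Bootstrapped MSE values copied from the performance summary.
bootstrapped_mse = {
    "Multiple Linear Regression": 7.121684,
    "Ridge Regression": 7.251852,
    "LASSO Regression": 7.03579,
    "Principal Component Regression": 7.066262,
    "Partial Least Squares": 6.87703,
}

# Assumption: the response spans 19 grade points. This value is inferred
# from the reported percentages, not taken from the original analysis.
RESPONSE_RANGE = 19.0


def derived_columns(mse, response_range=RESPONSE_RANGE):
    """Return (RMSE, RMSE as a percentage of the response range)."""
    rmse = math.sqrt(mse)
    return rmse, 100.0 * rmse / response_range


summary = {model: derived_columns(mse) for model, mse in bootstrapped_mse.items()}

for model, (rmse, pct) in summary.items():
    print(f"{model:32s} RMSE={rmse:.4f}  {pct:.2f}% of range")
```

Expressing RMSE as a share of the response range makes the gap between models easier to judge: all five land within about 0.4 percentage points of each other.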

Interpretation:

Multiple linear regression with subset selection produced the sparsest model, retaining only 8 predictors. Its choices largely agreed with those of the only other method that performs variable selection, LASSO regression: every predictor selected by cross-validation on regsubsets has a LASSO coefficient of at least 0.13 in absolute value. Both methods placed their largest coefficients on predictors such as school, failures, schoolsup and higher.

Multiple linear regression performed relatively well on resampled data. Between LASSO and ridge regression, ridge performed much better on the test data, where the test set is simply a random half of the original data. Ridge regression's advantage there may be because it does not perform subset selection and therefore has lower bias. On bootstrapped RMSE, however, LASSO performed better, which may indicate that ridge regression overfit the dataset.
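
The difference between a single random-half test split and a bootstrapped error estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the analysis: the data are synthetic, the model is a one-predictor least-squares line, and the bootstrap variant shown scores each refit on its out-of-bag observations. All function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line on (xs, ys)."""
    intercept, slope = model
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def split_half_mse(xs, ys, rng):
    """Train on a random half, report MSE on the other half."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    model = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return mse(model, [xs[i] for i in test], [ys[i] for i in test])


def bootstrap_mse(xs, ys, rng, n_resamples=200):
    """Average out-of-bag MSE over bootstrap resamples (one common variant)."""
    n = len(xs)
    estimates = []
    for _ in range(n_resamples):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        chosen = set(sample)
        oob = [i for i in range(n) if i not in chosen]  # rows never drawn
        if not oob:
            continue
        model = fit_line([xs[i] for i in sample], [ys[i] for i in sample])
        estimates.append(mse(model, [xs[i] for i in oob], [ys[i] for i in oob]))
    return sum(estimates) / len(estimates)


# Synthetic data (assumption): a linear signal plus unit-variance noise.
data_rng = random.Random(0)
xs = [data_rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + data_rng.gauss(0, 1) for x in xs]

split_estimates = [split_half_mse(xs, ys, random.Random(seed)) for seed in range(5)]
boot_estimate = bootstrap_mse(xs, ys, random.Random(1))

print("split-half MSE by seed:", [round(e, 3) for e in split_estimates])
print("bootstrapped MSE:", round(boot_estimate, 3))
```

A single split depends on which half happens to be held out, so its estimate moves with the seed, while the bootstrap averages over many resamples. That is one reason a model ranking can differ between the two evaluations.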

PCR and PLS showed a similar pattern. PLS provided significantly more dimension reduction than PCR; it in turn performed worse on the (non-bootstrapped) test data but better on the bootstrapped test data. This suggests that using 15 components in PCR may have led to overfitting and excess variance.

The LASSO regression eliminated 4 predictors (Pstatus, Fjob, traveltime, famsup) and came very close to eliminating 5 others (Mjob, guardian, reason, nursery, goout). With 30 predictors and only ~600 observations, it is very difficult to determine which predictors truly affect the response and whether their coefficients are estimated correctly. From the models and common sense, the predictors eliminated by LASSO are likely just noise, while predictors such as school, failures, schoolsup, higher, and studytime are more strongly correlated with student performance.
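
How LASSO sets some coefficients exactly to zero can be seen in a small coordinate-descent sketch. It is illustrative only and assumes synthetic, roughly centred data with no intercept; it does not use the student dataset or reproduce the original fit. The function names and the penalty value are hypothetical.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iters=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + penalty * ||b||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            # Covariance of column j with the residual that ignores column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Synthetic data (assumption): two real predictors and three pure-noise ones.
rng = random.Random(0)
n, p = 200, 5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 0.5) for row in X]

beta = lasso_coordinate_descent(X, y, penalty=0.3)
unpenalised = lasso_coordinate_descent(X, y, penalty=0.0)

print("LASSO coefficients:      ", [round(b, 3) for b in beta])
print("unpenalised coefficients:", [round(b, 3) for b in unpenalised])
```

Any predictor whose covariance with the remaining residual stays below the penalty is dropped entirely, while the others are only shrunk. A predictor sitting just above that threshold keeps a small nonzero coefficient, which corresponds to the "very close to eliminating" case described above.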
