This repository contains the project developed for the courses Algorithms for Massive Datasets and Statistical Methods for Machine Learning. The objective is to implement a Ridge Regression model from scratch to predict personal income using the Kaggle dataset 2013 American Community Survey.
The model is implemented on top of the Apache Hadoop framework, with a focus on scaling the implementation across multiple clusters.
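To illustrate why Ridge Regression lends itself to a distributed setting, here is a minimal sketch in plain Python. It is not the code in the notebook and does not use any Hadoop API. It assumes the closed-form solution w = (XᵀX + αI)⁻¹Xᵀy, where each data partition contributes its own partial sums of XᵀX and Xᵀy (a map step) and the partial sums are added together (a reduce step). The function names `partition_stats`, `merge_stats`, `solve`, and `ridge_fit`, as well as the toy data, are hypothetical and chosen only for this example.

```python
from functools import reduce


def partition_stats(rows):
    """Map step: partial X^T X and X^T y for one partition of (features, target) rows."""
    d = len(rows[0][0])
    xtx = [[0.0] * d for _ in range(d)]
    xty = [0.0] * d
    for x, y in rows:
        for i in range(d):
            xty[i] += x[i] * y
            for j in range(d):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def merge_stats(a, b):
    """Reduce step: element-wise sum of two partial results."""
    xtx = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [p + q for p, q in zip(a[1], b[1])]
    return xtx, xty


def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def ridge_fit(partitions, alpha):
    """Combine per-partition statistics, add alpha to the diagonal, and solve for w."""
    xtx, xty = reduce(merge_stats, map(partition_stats, partitions))
    for i in range(len(xty)):
        xtx[i][i] += alpha  # note: this also penalizes the bias weight
    return solve(xtx, xty)


# Toy data split into two partitions: y = 1 + 2 * x, with a constant 1.0 bias feature.
partitions = [
    [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)],
    [([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)],
]
print(ridge_fit(partitions, alpha=0.0))   # close to [1.0, 2.0]
print(ridge_fit(partitions, alpha=10.0))  # weights shrunk towards zero
```

Because the partial sums have a fixed size that depends only on the number of features, the amount of data exchanged between workers does not grow with the number of rows. For a large number of features, an iterative method such as gradient descent may be a better fit than the closed form used in this sketch.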
The file project.ipynb contains the implemented model, while report.pdf briefly describes the implementation details.