This repository contains the code I used to participate in the "Hello World" Kaggle competition on the Titanic dataset: https://www.kaggle.com/c/titanic
I chose to start with R.
As advised by Kaggle, I first went through the DataCamp tutorial on machine learning: https://www.datacamp.com/courses/kaggle-tutorial-on-machine-learing-the-sinking-of-the-titanic It introduces:
- decision trees with rpart
- feature engineering
- overfitting
- random forests
The simple decision tree with feature engineering has been my best entry so far.
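The idea behind that entry can be sketched as follows. This is a minimal illustration, not the repository's exact code: the toy data frame stands in for Kaggle's train.csv (its column names follow the competition's data dictionary), and FamilySize is one example of an engineered feature.

```r
# Minimal sketch: an rpart decision tree with one engineered feature.
# The data below is made up; the real code fits on Kaggle's train.csv.
library(rpart)

train <- data.frame(
  Survived = c(0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0),
  Pclass   = c(3, 1, 3, 1, 3, 3, 1, 2, 3, 2, 1, 3),
  Sex      = c("male", "female", "female", "female", "male", "male",
               "male", "female", "female", "male", "female", "male"),
  Age      = c(22, 38, 26, 35, 35, 54, 2, 27, 14, 20, 58, 39),
  SibSp    = c(1, 1, 0, 1, 0, 0, 3, 0, 1, 0, 0, 1),
  Parch    = c(0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 5)
)

# Feature engineering: combine the two relative counts into a family size.
train$FamilySize <- train$SibSp + train$Parch + 1

# Classification tree; loose control settings so the toy data can split.
fit <- rpart(Survived ~ Pclass + Sex + Age + FamilySize,
             data = train, method = "class",
             control = rpart.control(minsplit = 2, cp = 0))

# In-sample predictions (on real data you would predict on test.csv
# and write a submission file instead).
pred <- predict(fit, train, type = "class")
print(table(pred, train$Survived))
```

On the real data the same call produces the tree submitted to Kaggle; the engineered feature is what lifts it above the raw-columns baseline.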
Looking to improve my score, I went through this tutorial: https://github.com/wehrley/wehrley.github.io/blob/master/SOUPTONUTS.md It introduces:
- advanced feature engineering
- logistic regression
- adaptive boosting
- random forests
- support vector machines
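Of those models, logistic regression is the one that needs nothing beyond base R. As a hedged sketch of that step (again on made-up stand-in data, not the tutorial's code), it is a binomial GLM over a few predictors:

```r
# Sketch of the logistic-regression model: a binomial GLM that fits the
# log-odds of survival as a linear function of the predictors.
# Toy data mirroring train.csv's columns; the values are invented.
train <- data.frame(
  Survived = c(0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0),
  Pclass   = factor(c(3, 1, 3, 1, 3, 3, 2, 2, 3, 2, 1, 3)),
  Sex      = factor(c("male", "female", "female", "female", "male", "male",
                      "male", "female", "female", "male", "female", "male")),
  Age      = c(22, 38, 26, 35, 35, 54, 2, 27, 14, 20, 58, 39)
)

fit <- glm(Survived ~ Pclass + Sex + Age, data = train, family = binomial)

# Predicted survival probabilities, thresholded at 0.5 for a class label.
prob <- predict(fit, type = "response")
pred <- as.integer(prob > 0.5)
print(mean(pred == train$Survived))  # in-sample accuracy
```

The other models in the list (AdaBoost, random forests, SVMs) follow the same fit/predict pattern but pull in extra packages such as ada, randomForest, and e1071.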
At that point I had learned what I was looking for, but I still wanted to see what the next step was. This script shows the difference between randomForest and cforest: https://www.kaggle.com/uioreanu/titanic/randomforest-cforest-method