CS5228 Final Project.
Predict whether a person's income exceeds 50K.
Feature preprocessing:
- Age: binned into 17-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, and 81-90, then encoded as the consecutive integers 1-8.
- Work class: grouped into four categories (private, government, self-employed, other), then one-hot encoded.
- Fnlwgt: dropped.
- Education: dropped, because it carries the same information as education_num.
- Education num: used as-is.
- Marital status: grouped into five classes (married, never-married, divorced, separated, widowed), then one-hot encoded.
- Occupation: one-hot encoded over its 14 categories: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
- Relationship: one-hot encoded over its 6 categories.
- Sex: label encoded as 1 and 2.
- Capital-gain: normalized.
- Capital-loss: normalized.
- Hours-per-week: binned into three ranges (x < 35, 35 <= x <= 45, and x > 45), then label encoded.
- Native country: dropped, because about 90% of records are from the United States.
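The preprocessing steps above can be sketched with pandas roughly as follows. This is a minimal sketch, not the project's actual code. It assumes the raw UCI Adult column names (`age`, `workclass`, `marital-status`, `hours-per-week`, and so on) with whitespace already stripped from string values. The exact workclass and marital-status groupings, the Female=1/Male=2 assignment, the 0/1/2 codes for hours, and min-max scaling as the "normalize" step are all assumptions. Columns the notes do not mention (such as race) are passed through unchanged.

```python
import numpy as np
import pandas as pd

# Assumed grouping of raw workclass values; anything unmapped becomes "other".
WORKCLASS_GROUPS = {
    "Private": "private",
    "Federal-gov": "government",
    "State-gov": "government",
    "Local-gov": "government",
    "Self-emp-inc": "self",
    "Self-emp-not-inc": "self",
}

# Assumed grouping of raw marital-status values into five classes.
MARITAL_GROUPS = {
    "Married-civ-spouse": "married",
    "Married-AF-spouse": "married",
    "Married-spouse-absent": "married",
    "Never-married": "never-married",
    "Divorced": "divorced",
    "Separated": "separated",
    "Widowed": "widowed",
}


def min_max(col: pd.Series) -> pd.Series:
    """Scale a numeric column to [0, 1]; a constant column maps to 0."""
    span = col.max() - col.min()
    return (col - col.min()) / (span if span != 0 else 1)


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Dropped: fnlwgt, education (redundant with education-num), native-country.
    out = out.drop(columns=["fnlwgt", "education", "native-country"])

    # Age: (16,20] -> 1, (20,30] -> 2, ..., (80,90] -> 8.
    out["age"] = pd.cut(
        out["age"], bins=[16, 20, 30, 40, 50, 60, 70, 80, 90], labels=list(range(1, 9))
    ).astype(int)

    # Workclass and marital status: collapse raw values into coarser groups.
    out["workclass"] = out["workclass"].map(WORKCLASS_GROUPS).fillna("other")
    out["marital-status"] = out["marital-status"].map(MARITAL_GROUPS)

    # Sex: label encoded (which sex gets 1 vs 2 is an assumption).
    out["sex"] = out["sex"].map({"Female": 1, "Male": 2})

    # Capital gain/loss: min-max scaling (assumed normalization method).
    for col in ["capital-gain", "capital-loss"]:
        out[col] = min_max(out[col])

    # Hours-per-week: x < 35 -> 0, 35 <= x <= 45 -> 1, x > 45 -> 2 (codes assumed).
    h = out["hours-per-week"]
    out["hours-per-week"] = np.where(h < 35, 0, np.where(h <= 45, 1, 2))

    # One-hot encode the nominal columns; education-num is left as-is.
    return pd.get_dummies(
        out,
        columns=["workclass", "marital-status", "occupation", "relationship"],
        dtype=int,
    )
```

Because `pd.get_dummies` only creates columns for categories present in the frame it receives, and `min_max` uses that frame's own min and max, train and test data should share one encoding (for example by encoding them together, or aligning test columns to the training columns afterwards).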
Models:
- AdaBoost
- Linear Regression
- Decision Tree
- SVC
- MLP
- Bagging with a KNN base estimator
- Random Forest
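The models listed above could be compared with scikit-learn roughly as below. This is a hedged sketch: hyperparameters are library defaults except where noted, and the helper names `build_models` and `compare`, the 80/20 stratified split, accuracy as the metric, and the 0.5 threshold on Linear Regression are all assumptions. Linear Regression is a regressor, so its continuous output has to be turned into 0/1 labels somehow; thresholding is one simple option. The synthetic `make_classification` data only stands in for the preprocessed features and a 0/1 income label (for example `>50K` mapped to 1).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """One instance of each model from the list; settings are assumed defaults."""
    return {
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "SVC": SVC(random_state=seed),
        "MLP": MLPClassifier(max_iter=500, random_state=seed),
        # First positional argument is the base estimator (KNN here).
        "Bagging (KNN)": BaggingClassifier(
            KNeighborsClassifier(), n_estimators=10, random_state=seed
        ),
        "Random Forest": RandomForestClassifier(random_state=seed),
    }


def compare(X, y, seed=0):
    """Fit every model on an 80/20 split and return test accuracy per model."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    scores = {}
    for name, model in build_models(seed).items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        if name == "Linear Regression":
            # Continuous regressor output -> 0/1 labels via an assumed 0.5 threshold.
            pred = (pred >= 0.5).astype(int)
        scores[name] = accuracy_score(y_te, pred)
    return scores


if __name__ == "__main__":
    # Synthetic stand-in data; replace with the preprocessed features and labels.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    for name, acc in compare(X, y).items():
        print(f"{name}: {acc:.3f}")
```

SVC and MLP are sensitive to feature scale, so wrapping them with a `StandardScaler` in a `Pipeline` may help. The printed accuracies come from synthetic data and say nothing about the real task.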