Since I bought a laptop, I wanted to apply my newly acquired machine learning knowledge to check whether I got scammed. Given the specs of my laptop, how much should it have cost?
A Kaggle dataset with the specifications and prices of 1300 laptop models.
- Data preparation
- Building the model
- Evaluation
Target column: Price_euros
Feature columns:
- Drop columns that contain NA values
- Keep categorical columns with a cardinality < 10 (fewer than 10 unique values)
- Keep numerical columns
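A minimal sketch of the feature-column rules above, assuming the Kaggle data has been loaded into a pandas DataFrame. The function name `select_features` is hypothetical, and "categorical" is approximated here as "any non-numeric column".

```python
import pandas as pd


def select_features(df: pd.DataFrame, target: str = "Price_euros",
                    max_cardinality: int = 10) -> list[str]:
    """Pick feature columns following the three rules listed above."""
    candidates = df.drop(columns=[target])
    # Rule 1: ignore every column that contains at least one NA value.
    complete = [c for c in candidates.columns if not candidates[c].isna().any()]
    # Rule 2: keep categorical (here: non-numeric) columns with few categories.
    categorical = [c for c in complete
                   if not pd.api.types.is_numeric_dtype(candidates[c])
                   and candidates[c].nunique() < max_cardinality]
    # Rule 3: keep all numerical columns.
    numerical = [c for c in complete
                 if pd.api.types.is_numeric_dtype(candidates[c])]
    return categorical + numerical
```

Usage would be along the lines of `X = df[select_features(df)]` and `y = df["Price_euros"]`.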
My laptop data: for each of the selected feature columns, I looked up the corresponding spec of my laptop.
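To feed my own specs to the model, the single row has to be encoded exactly like the training data. A minimal sketch, assuming the categorical columns are one-hot encoded with `pd.get_dummies` (the original encoding is not stated) and that `my_laptop` is a hypothetical dict mapping each feature column to my laptop's value; `encode_like_training` is a hypothetical helper name.

```python
import pandas as pd


def encode_like_training(my_laptop: dict, X_train_raw: pd.DataFrame,
                         features: list[str]) -> pd.DataFrame:
    """Encode one laptop so its columns match the encoded training data."""
    # One-hot encode the training features to get the reference column layout.
    X_train = pd.get_dummies(X_train_raw[features], dtype=int)
    # Encode the single laptop row the same way...
    row = pd.get_dummies(pd.DataFrame([my_laptop])[features], dtype=int)
    # ...and align it: categories missing from the row become 0,
    # categories the training data never saw are dropped.
    return row.reindex(columns=X_train.columns, fill_value=0)
```

The aligned row can then be passed to the trained model, e.g. `model.predict(encode_like_training(my_laptop, df, features))`.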
The tests were done with an XGBRegressor model, since it is the most accurate model I have learned about so far.
- Create a scoring method `get_score` to evaluate the quality of the model based on the Mean Absolute Error (MAE).
- Run a loop that determines the best parameters (`n_estimators`, `learning_rate`) for this model.
XGBRegressor results:
- `n_estimators` = 200
- `learning_rate` = 0.05
- Mean Absolute Error: 285.74 €
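For reference, the MAE value above follows the standard definition of Mean Absolute Error: it averages the absolute difference between the true price $y_i$ (`Price_euros`) and the predicted price $\hat{y}_i$ over the $n$ laptops used for evaluation, so it is expressed in euros:

```latex
$$
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
$$
```

An MAE of 285.74 € therefore means the predictions miss the listed price by about 286 € on average, in either direction.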
The model estimates the price of my laptop at 1727.37 €, but I paid only about 800 €. The gap of roughly 927 € is more than three times the MAE of 285.74 €, which could indicate that the model is bad.
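Comparing the gap with the MAE alone does not show how unusual such a miss is. One hedged way to check is to count how often the model misses a validation laptop by at least the same amount. The function name is hypothetical, and the 800 € is treated as an approximate figure.

```python
def share_of_errors_at_least(y_true, y_pred, gap):
    """Return the fraction of samples whose absolute error is >= gap."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    if not errors:
        raise ValueError("need at least one sample")
    return sum(e >= gap for e in errors) / len(errors)


# Values from the text: model estimate vs. what I actually paid (about 800 EUR).
predicted_price = 1727.37
paid_price = 800.0
mae = 285.74
gap = predicted_price - paid_price  # about 927.37 EUR
ratio = gap / mae  # roughly 3.2 times the MAE
```

With `y_valid` and `predictions` standing for the validation prices and the model's predictions (hypothetical names), `share_of_errors_at_least(y_valid, predictions, gap)` gives the fraction of laptops the model misjudges this badly; a very small share would suggest either an unusually good deal or a blind spot of the model.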
Better Data Cleaning:
- Analyse whether the dataset actually contains reasonable prices.
- Analyse which columns might cause overfitting.
- Include columns with NA values instead of dropping them.
- Learn from other people's work.
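Two of the cleaning ideas above can be sketched with pandas: flagging implausible prices with Tukey's IQR fences (an assumed rule, using the conventional k = 1.5), and filling NA values (median for numerical columns, most frequent value otherwise) instead of dropping the columns. Both function names are hypothetical.

```python
import pandas as pd


def flag_price_outliers(prices: pd.Series, k: float = 1.5) -> pd.Series:
    """Mark prices outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
    iqr = q3 - q1
    return (prices < q1 - k * iqr) | (prices > q3 + k * iqr)


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill NA values instead of dropping whole columns (returns a copy)."""
    filled = df.copy()
    for col in filled.columns:
        if not filled[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(filled[col]):
            # Numerical column: the median is robust to extreme values.
            filled[col] = filled[col].fillna(filled[col].median())
        else:
            # Categorical column: use the most frequent value.
            filled[col] = filled[col].fillna(filled[col].mode().iloc[0])
    return filled
```

Flagged prices are candidates for a manual look rather than automatic deletion, since expensive workstations can be legitimate outliers.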
Better Model Creation:
- Learn about more models and how they work.
- Find a better-fitting model.
- Adjust parameters to lower the Mean Absolute Error.
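For finding a better-fitting model and adjusting its parameters, scikit-learn's cross-validation utilities can compare candidates on the same MAE metric. This sketch swaps in scikit-learn's own `GradientBoostingRegressor` for the tuning step instead of XGBRegressor; the candidate models, the parameter grid and the function names are assumptions, not tested results.

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score


def compare_models(X, y, cv=5):
    """Return the mean cross-validated MAE for each candidate model."""
    candidates = {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
        "gradient_boosting": GradientBoostingRegressor(random_state=0),
    }
    results = {}
    for name, model in candidates.items():
        # scikit-learn maximises scores, so MAE is reported as a negative number.
        scores = cross_val_score(model, X, y, cv=cv,
                                 scoring="neg_mean_absolute_error")
        results[name] = -scores.mean()
    return results


def tune_gradient_boosting(X, y, cv=5):
    """Grid-search n_estimators and learning_rate; return best params and MAE."""
    search = GridSearchCV(
        GradientBoostingRegressor(random_state=0),
        param_grid={"n_estimators": [100, 200], "learning_rate": [0.05, 0.1]},
        scoring="neg_mean_absolute_error",
        cv=cv,
    )
    search.fit(X, y)
    return search.best_params_, -search.best_score_
```

Cross-validation averages the MAE over several splits, so the comparison depends less on one lucky or unlucky validation set than a single split does.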