We’ll use the Boston data set from sklearn.datasets to predict the median house value (medv) in Boston suburbs from several predictor variables. Steps:
- Step 1: Create a new Python project, then import and load the dataset.
- Step 2: Inspect the dataset with summary statistics and preprocess it; plot a correlation heatmap.
- Step 3: Split the data into training and testing datasets
- Step 4: Create the regression tree and extract the most important features.
- Step 5: Evaluate the model; find the “complexity parameter” value that minimizes the prediction error, measured as RMSE (root mean squared error).
- Step 6: Use the model to predict a value
Tools used: Python, scikit-learn, matplotlib, Graphviz, seaborn, NumPy
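
Steps 3 to 6 can be sketched roughly as follows. This is a minimal illustration, not the full walkthrough. It makes two assumptions: a synthetic data set from make_regression (506 rows, 13 predictors) stands in for the housing data, so the snippet runs regardless of whether the Boston loader is available in the installed scikit-learn version, and scikit-learn's ccp_alpha is treated as the “complexity parameter”. The candidate values are scored on the held-out split for brevity; cross-validation on the training data would be the more careful choice.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# ASSUMPTION: synthetic stand-in for the housing data (506 rows, 13 predictors).
X, y = make_regression(n_samples=506, n_features=13, n_informative=5,
                       noise=10.0, random_state=0)

# Step 3: split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Step 4: fit an unpruned regression tree and rank the features.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
importances = tree.feature_importances_
top_features = np.argsort(importances)[::-1][:5]  # indices of the 5 strongest

# Step 5: candidate complexity values from the cost-complexity pruning path.
# ASSUMPTION: ccp_alpha plays the role of the "complexity parameter".
path = tree.cost_complexity_pruning_path(X_train, y_train)
alphas = np.unique(np.clip(path.ccp_alphas, 0.0, None))
alphas = alphas[::max(1, len(alphas) // 30)]  # thin the grid to about 30 values

def rmse_for(alpha):
    """Fit a tree pruned with the given ccp_alpha and return its test RMSE."""
    model = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha)
    model.fit(X_train, y_train)
    return float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))

scores = [rmse_for(a) for a in alphas]
best_alpha = float(alphas[int(np.argmin(scores))])
best_rmse = min(scores)

# Step 6: refit with the chosen value and predict one observation.
best_tree = DecisionTreeRegressor(random_state=0, ccp_alpha=best_alpha)
best_tree.fit(X_train, y_train)
prediction = best_tree.predict(X_test[:1])

print("top feature indices:", top_features)
print("best ccp_alpha:", best_alpha, "RMSE:", best_rmse)
print("predicted value:", prediction[0])
```

With the real data, X and y would be the predictor columns and the medv column, and the rest of the sketch would stay the same.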