In this project, we will build a Random Forest pipeline in PySpark and use it to predict car prices.
The project is divided into six tasks:
- Task 1 - Install Spark on Google Colab and load a dataset in PySpark
- Task 2 - Describe and clean your dataset
- Task 3 - Create a Random Forest pipeline to predict car prices
- Task 4 - Create a cross validator for hyperparameter tuning
- Task 5 - Train your model and predict test set car prices
- Task 6 - Evaluate your model’s performance via several metrics
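
Task 6 does not name the metrics it uses. As a rough illustration, the plain-Python sketch below assumes three common regression metrics: root mean squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). The function name `regression_metrics` and the prices are made up for this example and are not part of the project.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, MAE and R^2 for two equal-length lists of prices.

    Hypothetical helper for illustration only; not part of the project.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    # Residual sum of squares and its mean.
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n

    # Mean absolute error is less sensitive to large misses than RMSE.
    mae = sum(abs(e) for e in errors) / n

    # Total sum of squares around the mean of the actual prices.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")

    return {
        "rmse": math.sqrt(mse),
        "mae": mae,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up car prices, used only to exercise the function.
actual = [10000.0, 20000.0, 30000.0, 40000.0]
predicted = [12000.0, 18000.0, 33000.0, 39000.0]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In PySpark, the same quantities are available from `pyspark.ml.evaluation.RegressionEvaluator` by setting `metricName` to `"rmse"`, `"mae"`, or `"r2"`, so a hand-written version like this is only useful for understanding what each number means.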