Machine Learning Pipelines in PySpark MLlib

This project builds a Random Forest pipeline in PySpark MLlib and uses it to predict car prices.

The work is split into six tasks:

Task 1 - Install Spark on Google Colab and load a dataset in PySpark
Task 2 - Describe and clean your dataset
Task 3 - Create a Random Forest pipeline to predict car prices
Task 4 - Create a cross validator for hyperparameter tuning
Task 5 - Train your model and predict test set car prices
Task 6 - Evaluate your model’s performance via several metrics

Notebook on Google Colab: https://colab.research.google.com/gist/ruslanmv/28f55c9ab66dcbf80368df58bec41483/random-forest-with-pyspark.ipynb
