Data Pre Processing Machine Learning Bootcamp

Code Repository for the Bootcamp conducted on 22nd August, 2021 with Uttam Grade (Data Scientist, McKinsey & Company)

Agenda for the Bootcamp

  • Introduction to Python
  • An overview of NumPy & Pandas
  • Data Visualisation
  • Exploratory Data Analysis

Problem Statement

The data set captures gross salary from July 1, 2013 through June 30, 2014 and includes only those employees who were employed on June 30, 2014. The task is to predict the salaries of employees in Baltimore. The dataset used in this repository is Baltimore City Employee Salaries FY2014 and can be downloaded from this link.

Data Cleaning and Data Preparation

The cleaning and preparation measures applied to the dataset are listed below.

  • Remove leading and trailing whitespace
  • Check Null Values in data set
  • Remove rows having empty hire date
  • Drop Gross Pay column
  • Remove $ from Annual Salary and convert it to integer format
  • Trim spaces
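
The steps above could be sketched in pandas roughly as follows. This is a minimal illustration, not the bootcamp's actual notebook code: the column names (`Name`, `JobTitle`, `HireDate`, `AnnualSalary`, `GrossPay`), the sample rows and the helper `clean_salaries` are all assumptions made for the example.

```python
import pandas as pd

# Tiny made-up sample; the column names are assumed, not read from the real file.
raw = pd.DataFrame({
    " Name": ["Aaron,Patricia G", "Abaineh,Yohannes T", "Abbey,Emmanuel"],
    "JobTitle": ["Facilities/Office Services II ", " EPIDEMIOLOGIST", "CONTRACT SERV SPEC II"],
    "HireDate": ["10/24/1979", None, "05/01/2013"],
    "AnnualSalary": ["$55314.00", "$64500.00", "$52000.00"],
    "GrossPay": ["$53626.04", "$58654.74", "$47019.75"],
})


def clean_salaries(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that applies the listed cleaning steps in order."""
    df = df.copy()
    # Strip stray whitespace from the column names.
    df.columns = df.columns.str.strip()
    # Check null values per column before dropping anything.
    print(df.isnull().sum())
    # Remove rows that have an empty hire date.
    df = df.dropna(subset=["HireDate"])
    # Drop the Gross Pay column.
    df = df.drop(columns=["GrossPay"])
    # Remove "$" from Annual Salary and convert it to an integer.
    df["AnnualSalary"] = (
        df["AnnualSalary"].str.replace("$", "", regex=False).astype(float).astype(int)
    )
    # Trim spaces in the text columns (assumed list of text columns).
    for col in ["Name", "JobTitle", "HireDate"]:
        df[col] = df[col].str.strip()
    return df.reset_index(drop=True)


cleaned = clean_salaries(raw)
```

Going through `float` before `int` is a choice of this sketch so that values such as `"55314.00"` parse cleanly.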

After all these transformations the dataframe shall appear in the format given below.

Exploratory Data Analysis

Countplot

Histogram

Box plot for annual salary

Annual Salary Distribution Plot

Top 10 Jobs based on hirings

Top 10 Jobs that fetch the highest Salary

Top 10 Agencies that have the highest number of employees

Top 10 Jobs that have the highest number of employees
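
The "Top 10" views above are simple aggregations before they become plots. A possible sketch of those aggregations is shown here; the column names, the sample rows and both helper functions are hypothetical, and ranking jobs by mean salary (rather than, say, maximum) is an assumption.

```python
import pandas as pd

# Made-up rows; "JobTitle", "Agency" and "AnnualSalary" are assumed column names.
df = pd.DataFrame({
    "JobTitle": ["AIDE BLUE CHIP", "POLICE OFFICER", "POLICE OFFICER", "EPIDEMIOLOGIST"],
    "Agency": ["Youth Summer", "Police Department", "Police Department", "Health Department"],
    "AnnualSalary": [11310, 60000, 62000, 64500],
})


def top_n_by_count(frame: pd.DataFrame, column: str, n: int = 10) -> pd.Series:
    """Most frequent values of `column`, e.g. jobs or agencies with most employees."""
    return frame[column].value_counts().head(n)


def top_n_by_mean_salary(frame: pd.DataFrame, column: str, n: int = 10) -> pd.Series:
    """Values of `column` with the highest mean salary (mean is an assumption)."""
    means = frame.groupby(column)["AnnualSalary"].mean()
    return means.sort_values(ascending=False).head(n)


top_jobs = top_n_by_count(df, "JobTitle")
top_agencies = top_n_by_count(df, "Agency")
best_paid_jobs = top_n_by_mean_salary(df, "JobTitle")
```

Each resulting Series can be passed to a bar plot to reproduce this kind of chart.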

Average salaries of employees based on Hire Month

Hiring with years
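
The hire month and hire year views can be derived from the hire date column. The sketch below assumes a `HireDate` column stored as `MM/DD/YYYY` strings and an integer `AnnualSalary` column; both the names and the date format are assumptions, and the rows are made up.

```python
import pandas as pd

# Made-up rows; column names and the MM/DD/YYYY date format are assumptions.
df = pd.DataFrame({
    "HireDate": ["10/24/1979", "09/25/2006", "05/01/2013", "05/21/2013"],
    "AnnualSalary": [55314, 43999, 52000, 11310],
})

# Parse the hire date once, then derive month and year features from it.
hire_date = pd.to_datetime(df["HireDate"], format="%m/%d/%Y")
df["HireMonth"] = hire_date.dt.month
df["HireYear"] = hire_date.dt.year

# Average salary per hire month (1 = January, ..., 12 = December).
avg_salary_by_month = df.groupby("HireMonth")["AnnualSalary"].mean()

# Number of hires per year, ordered chronologically.
hires_per_year = df["HireYear"].value_counts().sort_index()
```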

Pair Plot

Heat Map

Feature Engineering

  • Apply mean encoding for Job Title
  • Apply mean encoding for Agency
  • Apply mean encoding for AgencyID
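
Mean encoding replaces each category with the average target value observed for that category. A minimal sketch for one column is given below; the helper `mean_encode`, the column names and the rows are hypothetical. Computing the means on training rows only and falling back to the overall mean for unseen categories are choices of this sketch, not something the source states.

```python
import pandas as pd

# Made-up rows; "JobTitle" and "AnnualSalary" are assumed column names.
train = pd.DataFrame({
    "JobTitle": ["POLICE OFFICER", "POLICE OFFICER", "EPIDEMIOLOGIST", "AIDE BLUE CHIP"],
    "AnnualSalary": [60000, 62000, 64500, 11310],
})
test = pd.DataFrame({"JobTitle": ["POLICE OFFICER", "LIBRARIAN"]})


def mean_encode(train_df, test_df, column, target="AnnualSalary"):
    """Hypothetical helper: encode `column` by the mean of `target` per category."""
    # Mean target value for each category, learned from the training rows.
    category_means = train_df.groupby(column)[target].mean()
    # Fallback for categories that never appear in the training rows.
    global_mean = train_df[target].mean()
    train_encoded = train_df[column].map(category_means)
    test_encoded = test_df[column].map(category_means).fillna(global_mean)
    return train_encoded, test_encoded


train_enc, test_enc = mean_encode(train, test, "JobTitle")
```

The same helper could be called for `Agency` and `AgencyID`, assuming those columns exist under those names.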

Test train split

  • Divide train set into dependent and independent variables
  • Divide test set into dependent and independent variables
  • Scale the train and test sets
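
A rough sketch of the split is shown here using plain pandas. The 80/20 ratio, the random seed, the encoded column names and the data are all assumptions made for illustration; the source does not state how the split was done.

```python
import pandas as pd

# Made-up, already-encoded rows; all column names here are hypothetical.
df = pd.DataFrame({
    "JobTitle_enc": [61000.0, 61000.0, 64500.0, 11310.0, 52000.0,
                     43999.0, 55314.0, 47000.0, 39000.0, 70000.0],
    "Agency_enc": [58000.0, 58000.0, 60000.0, 11310.0, 50000.0,
                   45000.0, 54000.0, 46000.0, 40000.0, 66000.0],
    "AnnualSalary": [60000, 62000, 64500, 11310, 52000,
                     43999, 55314, 47000, 39000, 70000],
})

# Randomly assign 80% of the rows to the train set; the rest form the test set.
train = df.sample(frac=0.8, random_state=42)
test = df.drop(train.index)

# Independent variables (features) and dependent variable (target) per set.
X_train = train.drop(columns=["AnnualSalary"])
y_train = train["AnnualSalary"]
X_test = test.drop(columns=["AnnualSalary"])
y_test = test["AnnualSalary"]
```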

Scaling

Two types of scaling are covered:

  • Standard Scaling
  • MinMax Scaling
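
To make the difference concrete, here is a small NumPy sketch of both methods. Standard scaling subtracts the column mean and divides by the column standard deviation; MinMax scaling maps the column minimum to 0 and the maximum to 1. The function names and data are hypothetical, and fitting the statistics on the train set only is an assumption of this sketch. scikit-learn's `StandardScaler` and `MinMaxScaler` are the usual library equivalents.

```python
import numpy as np

# Made-up feature matrices: rows are samples, columns are features.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])


def standard_scale(train, test):
    """Standard scaling: (x - mean) / std, statistics taken from the train set."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)  # population std (ddof=0); a constant column would give 0
    return (train - mean) / std, (test - mean) / std


def minmax_scale(train, test):
    """MinMax scaling: (x - min) / (max - min), statistics taken from the train set."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo  # a constant column would give 0
    return (train - lo) / span, (test - lo) / span


train_std, test_std = standard_scale(X_train, X_test)
train_mm, test_mm = minmax_scale(X_train, X_test)
```

Note that test values outside the train range fall outside [0, 1] after MinMax scaling, as the second test feature does here.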

Model Evaluation

We used a Linear Regression model.

Distribution plot of Residuals

Scatter plot of Residuals
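
The residuals shown in these plots are the differences between actual and predicted values. The sketch below fits an ordinary least squares linear regression with NumPy on synthetic data and computes the residuals. The data, the helper names and the use of `np.linalg.lstsq` are assumptions made for illustration; the source does not say which library the model came from.

```python
import numpy as np

# Synthetic data: y depends linearly on two features plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=50)


def fit_linear_regression(features, target):
    """Ordinary least squares; returns [intercept, coef_1, coef_2, ...]."""
    # Prepend a column of ones so the first coefficient acts as the intercept.
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict(coef, features):
    """Apply the fitted intercept and coefficients to new feature rows."""
    return coef[0] + features @ coef[1:]


coef = fit_linear_regression(X, y)
residuals = y - predict(coef, X)
rmse = float(np.sqrt(np.mean(residuals ** 2)))
```

For a well-behaved fit, the residuals should be centred on zero in the distribution plot and show no visible pattern in the scatter plot.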

Contributors

vyuwing
