Data-Scientist-Salary-Prediction

Table of Contents

  • LinkedIn Profile
  • Project Overview
  • How will this project help?
  • Resources Used
  • Exploratory Data Analysis (EDA) and Data Cleaning
  • Feature Engineering
  • Model Building and Evaluation
  • Model Prediction

LinkedIn Profile

For any queries about this project, contact me on LinkedIn.

Link: https://www.linkedin.com/in/anil-l-b023631b6/

Python version: 3.6

Project Overview

• Created a machine learning model that estimates a data scientist's salary from features such as rating, company_founded, etc.
• Engineered features from the text of each job description to quantify the value companies place on Python, Excel, Tableau, and SQL
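
The keyword features described above could be derived roughly as follows. This is a minimal sketch, not the project's actual code: the column name job_description, the sample rows, and the *_job output column names are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset's column name may differ.
df = pd.DataFrame({
    "job_description": [
        "Strong Python and SQL skills required.",
        "Experience with Excel and Tableau dashboards.",
        "Machine learning experience with R preferred.",
    ]
})

SKILLS = ["python", "excel", "tableau", "sql"]

for skill in SKILLS:
    # Word boundaries keep "sql" from matching inside words such as "mysql".
    pattern = rf"\b{skill}\b"
    df[f"{skill}_job"] = (
        df["job_description"]
        .str.contains(pattern, case=False, regex=True)
        .astype(int)
    )

print(df[[f"{s}_job" for s in SKILLS]])
```

A plain keyword match like this has limits. For example, "excel" used as a verb would also be counted, so the flags are best read as rough indicators.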

How will this project help?

• This project helps data scientists and analysts negotiate their pay for an existing or a new job

Resources Used

• Packages: pandas, numpy, sklearn, matplotlib, seaborn.
• Dataset by Ken Jee: https://github.com/PlayingNumbers/ds_salary_proj

Exploratory Data Analysis (EDA) and Data Cleaning

• Removed unwanted columns: 'Unnamed: 0'
• Plotted bar graphs and count plots for numerical and categorical features, respectively, for EDA
• Numerical features (Rating, Founded): replaced NaN or -1 values with the mean or median, depending on each feature's distribution
Figures: Rating distribution plots
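
One way to choose between the mean and the median is to look at skewness. The sketch below is only an illustration under that assumption: the helper impute_numeric, the skewness threshold of 1.0, and the sample values are hypothetical, and the project may have picked the statistic by inspecting the plots instead.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values using the column names mentioned above.
df = pd.DataFrame({
    "Rating": [3.5, 4.0, -1.0, 3.8, np.nan],
    "Founded": [1990, -1, 2005, 1850, 2010],
})

def impute_numeric(series, skew_threshold=1.0):
    """Replace NaN and -1 with the mean (roughly symmetric data)
    or the median (skewed data)."""
    cleaned = series.replace(-1, np.nan)
    use_median = abs(cleaned.skew()) > skew_threshold
    fill_value = cleaned.median() if use_median else cleaned.mean()
    return cleaned.fillna(fill_value)

result = df.apply(impute_numeric)
print(result)
```

The median is less sensitive to extreme values (for example, a company founded in 1850), which is why it is the usual choice for skewed columns.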

• Categorical features: replaced NaN or -1 values with an 'Other'/'Unknown' category
• Removed unwanted letters and special characters from the Salary feature
• Converted the Salary column to a single scale, i.e. from (per hour, per annum, employer provided salary) to (per annum)
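
The salary clean-up steps above can be sketched as a small parser. The input strings, the helper parse_salary, and the 2,080 hours-per-year conversion factor are assumptions for illustration; the project's own conversion may use a different factor.

```python
import re

HOURS_PER_YEAR = 2080  # assumption: 40 hours/week * 52 weeks

def parse_salary(text):
    """Return (low, high) annual salary in thousands of USD,
    or None when no salary range can be found."""
    lowered = text.lower()
    hourly = "per hour" in lowered
    # Keep only the numeric parts, ignoring "$", "K" and label text.
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", lowered)]
    if len(numbers) < 2:
        return None
    low, high = numbers[0], numbers[1]
    if hourly:
        # Hourly rates are in dollars; convert to thousands per annum.
        low = low * HOURS_PER_YEAR / 1000
        high = high * HOURS_PER_YEAR / 1000
    return low, high

# Hypothetical examples of the three formats named above.
samples = [
    "$53K-$91K (Glassdoor est.)",
    "Employer Provided Salary:$80K-$90K",
    "$17-$24 Per Hour(Glassdoor est.)",
    "-1",
]
for s in samples:
    print(s, "->", parse_salary(s))
```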

Feature Engineering

• Created new features from existing ones, e.g. job_in_headquaters from (job_location, headquarters), etc.
Figure: job_in_headquaters feature
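
A derived flag like job_in_headquaters could be computed by comparing the two source columns. This is a sketch with made-up rows; the exact matching rule used in the project is not stated, so the case-insensitive string comparison here is an assumption.

```python
import pandas as pd

# Hypothetical rows; "-1" stands in for a missing headquarters value.
df = pd.DataFrame({
    "job_location": ["New York, NY", "Austin, TX", "Remote"],
    "headquarters": ["New York, NY", "Seattle, WA", "-1"],
})

# 1 when the job is located at the company headquarters, else 0.
df["job_in_headquaters"] = (
    df["job_location"].str.strip().str.lower()
    == df["headquarters"].str.strip().str.lower()
).astype(int)

print(df)
```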

• Trimmed columns, i.e. trimmed features having more than 10 categories to reduce dimensionality
• Handled ordinal and nominal categorical features
• Feature selection using information gain (mutual_info_regression) and a correlation matrix
• Feature scaling using StandardScaler
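
One common reading of the trimming step is to keep the most frequent categories and group the rest under a single label. The sketch below follows that interpretation, which is an assumption; the helper trim_categories and the sample values are hypothetical.

```python
import pandas as pd

def trim_categories(series, max_categories=10, other_label="Other"):
    """Keep the most frequent categories and map the rest to other_label."""
    top = series.value_counts().nlargest(max_categories).index
    return series.where(series.isin(top), other_label)

# Small demonstration with a limit of 2 instead of 10.
states = pd.Series(["CA", "CA", "NY", "NY", "TX", "WA"])
trimmed = trim_categories(states, max_categories=2)
print(trimmed.tolist())
```

Grouping rare levels this way keeps the number of one-hot encoded columns bounded, which is the dimensionality reduction the list refers to.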

Figure: information gain scores

Figure: correlation matrix
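
The selection and scaling steps can be illustrated on synthetic data. Everything below is a sketch under stated assumptions: the feature names, the synthetic target, and the 0.05 mutual-information cutoff are invented for the example and are not taken from the project.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300

# Synthetic features: only "rating" actually drives the target.
X = pd.DataFrame({
    "rating": rng.normal(3.7, 0.5, n),
    "company_age": rng.integers(1, 100, n).astype(float),
    "noise": rng.normal(0, 1, n),
})
y = 20 * X["rating"] + rng.normal(0, 2, n)

# Information gain (mutual information) between each feature and the target.
mi = pd.Series(
    mutual_info_regression(X, y, random_state=0), index=X.columns
).sort_values(ascending=False)

# Linear correlation of each feature with the target.
corr = X.assign(salary=y).corr()["salary"].drop("salary")

# Arbitrary cutoff chosen for this example only.
selected = mi[mi > 0.05].index.tolist()

# Standardize the selected features to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X[selected])

print(mi)
print(corr)
```

Mutual information can pick up non-linear relationships that a correlation matrix misses, so the two views complement each other.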

Model Building and Evaluation

Metric: Negative Root Mean Squared Error (NRMSE); scores closer to zero are better
• Multiple Linear Regression: -27.523
• Lasso Regression: -27.993
• Random Forest: -17.637
• Gradient Boosting: -24.429
• Voting (Random Forest + Gradient Boosting): -19.136
Note: Evaluation scores were obtained using cross-validation.
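
A comparison like the one above can be set up with scikit-learn's cross_val_score and the "neg_root_mean_squared_error" scorer. This sketch runs on synthetic data from make_regression, so its numbers will not match the scores listed; the hyperparameters and the 5-fold setting are assumptions, not the project's configuration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the prepared salary dataset.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

rf = RandomForestRegressor(n_estimators=50, random_state=0)
gb = GradientBoostingRegressor(n_estimators=50, random_state=0)

models = {
    "Multiple Linear Regression": LinearRegression(),
    "Lasso Regression": Lasso(alpha=1.0),
    "Random Forest": rf,
    "Gradient Boosting": gb,
    # VotingRegressor averages the predictions of its base estimators.
    "Voting (Random Forest + Gradient Boosting)": VotingRegressor(
        [("rf", rf), ("gb", gb)]
    ),
}

scores = {}
for name, model in models.items():
    cv_scores = cross_val_score(
        model, X, y, cv=5, scoring="neg_root_mean_squared_error"
    )
    scores[name] = cv_scores.mean()
    print(f"{name}: {scores[name]:.3f}")
```

scikit-learn negates the RMSE so that a higher score always means a better model, which is why all reported values are negative.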

Model Prediction

Figure: sample prediction output

Contributors

• anillava1999
