job_analysis_luke's Introduction

LinkedIn Job Posting Scraper

A collection of Jupyter Notebooks that:
    1) LinkedIn.ipynb - Scrape Job Postings from LinkedIn
    2) Job_Analysis.ipynb - Analyze scraped data

✨ Background

I wanted to better understand which skills are being requested of entry-level data analysts, for the benefit of my YouTube channel's subscribers. LinkedIn job postings seemed like the best place to start, so this is my first pass at the project.

Check out this video for more

🛑 Disclaimer

NOTICE: The use of robots or other automated means to access LinkedIn without the express permission of LinkedIn is STRICTLY PROHIBITED.
More details here

IMPORTANT NOTE: LinkedIn will BLOCK you from searching if you are scraping too much data and/or you don't have permission.

๐Ÿ Overview

🤖 LinkedIn.ipynb - Job Scraper

Overview: This script scrapes LinkedIn job data. Using a Selenium WebDriver for Chrome, it launches a headless browser and then scrapes all the relevant job details.

NOTE: LinkedIn only lets you view 40 pages of results for a particular search term. Because of this, you can scrape at most 1000 jobs per search term.
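The 1000-job ceiling is simple arithmetic if each results page holds 25 postings (1000 / 40 = 25; the page size is inferred from the two numbers in the note, not taken from the notebook). A minimal sketch of the offsets a scraper would step through, where `RESULTS_PER_PAGE`, `MAX_PAGES`, and `page_offsets` are illustrative names, not identifiers from LinkedIn.ipynb:

```python
RESULTS_PER_PAGE = 25  # assumption: inferred from 1000 jobs / 40 pages
MAX_PAGES = 40         # page limit stated in the note above


def page_offsets(max_pages=MAX_PAGES, per_page=RESULTS_PER_PAGE):
    """Return the zero-based index of the first job on each results page."""
    return [page * per_page for page in range(max_pages)]


offsets = page_offsets()
max_jobs = len(offsets) * RESULTS_PER_PAGE

print(max_jobs)     # 1000
print(offsets[:3])  # [0, 25, 50]
print(offsets[-1])  # 975
```

If you need more than this for one role, splitting the search into several narrower keywords is probably the only way around the cap.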

To begin

Prerequisites: Python installed, and an environment set up with the packages from requirements.txt.

  1. Download your appropriate chromedriver and save it to this repository.

  2. Create a new file called .env with your login credentials, also saved to this repository.

[email protected]
LINKEDIN_PASSWORD=password
  3. Adjust the search criteria in the .ipynb file:
# Accepts a list of search keywords to analyze for
search_keywords = ['Data Analyst', 'Data Scientist', 'Data Engineer']

# Accepts one location.. if spaces in name use '%20'
search_location = "United%20States"

# only searches remote positions currently... need to update code for this to search non-remote
search_remote = "true" # filter for remote positions

# this is code to search for past 24 hours, you would have to look at the url to investigate other search periods
search_posted = "r86400" # filter for past 24 hours
  4. Run "All Cells" on the .ipynb
    a) In the log directory, a .log file is created that captures the progress of the data scraping and reports any errors.
    b) In the output directory, a .csv file is created for the current date.
    NOTE: The script deletes any .csv files that have the same date, so as written you can only run this script once per day.
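For the .env file and the '%20' location rule in the steps above, here is a rough standard-library sketch. The notebook may well rely on a package from requirements.txt for this instead; `load_env` and `encode_location` are hypothetical helper names, not functions from LinkedIn.ipynb.

```python
from pathlib import Path
from urllib.parse import quote


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def encode_location(name):
    """Percent-encode a location name so that spaces become %20."""
    return quote(name)


print(encode_location("United States"))  # United%20States
```

With a .env file in place, `load_env()["LINKEDIN_PASSWORD"]` would return the stored password, and `search_location = encode_location("United States")` gives the same value as the hand-written setting in step 3.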

📊 Job_Analysis.ipynb - CSV Compiler and Analyzer

Overview: This script compiles and analyzes the .csv files in the output directory.

Prerequisites: Have at least one .csv file in the output folder to analyze.

  1. Modify code to your liking
  2. Run "All Cells" on this .ipynb
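As a starting point for modifying the analysis, the compile step can be sketched with the standard library alone. This is not the notebook's actual code (which may use other packages from requirements.txt); `compile_csvs` and the added `source_file` field are hypothetical, and the sketch makes no assumption about which columns the scraper writes.

```python
import csv
from pathlib import Path


def compile_csvs(folder="output"):
    """Read every .csv in `folder` and return all rows as a list of dicts."""
    rows = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Record which daily file each row came from.
                row["source_file"] = csv_path.name
                rows.append(row)
    return rows
```

Calling `compile_csvs("output")` returns one dict per scraped job across all daily files, which you can then count, filter, or hand to whatever analysis library you prefer.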

Contributors

lukebarousse
