jobspy's Introduction

JobSpy is a simple, yet comprehensive, job scraping library.

Not technical? Try out the web scraping tool on our site at usejobspy.com.

Looking to build a data-focused software product? Book a call to work with us.

Check out another project we wrote: HomeHarvest – a Python package for real estate scraping

Features

  • Scrapes job postings from LinkedIn, Indeed & ZipRecruiter simultaneously
  • Aggregates the job postings in a Pandas DataFrame
  • Proxy support (HTTP/S, SOCKS)

Video Guide for JobSpy - Updated for release v1.1.3

Installation

pip install --upgrade python-jobspy

Python version >= 3.10 required

Usage

from jobspy import scrape_jobs
import pandas as pd

jobs: pd.DataFrame = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter"],
    search_term="software engineer",
    location="Dallas, TX",
    results_wanted=10,

    country_indeed='USA',  # only needed for indeed

    # use if you want to use a proxy
    # proxy="http://jobspy:[email protected]:20001",
    # offset=25 # use if you want to start at a specific offset
)

# formatting for pandas
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)  # set to 0 to see full job url / desc

# 1 output to console
print(jobs)

# 2 display in Jupyter Notebook (1. pip install jupyter 2. jupyter notebook)
# display(jobs)

# 3 output to .csv
# jobs.to_csv('jobs.csv', index=False)

# 4 output to .xlsx
# jobs.to_excel('jobs.xlsx', index=False)  # requires openpyxl

Output

SITE           TITLE                             COMPANY_NAME      CITY          STATE  JOB_TYPE  INTERVAL  MIN_AMOUNT  MAX_AMOUNT  JOB_URL                                            DESCRIPTION
indeed         Software Engineer                 AMERICAN SYSTEMS  Arlington     VA     None      yearly    200000      150000      https://www.indeed.com/viewjob?jk=5e409e577046...  THIS POSITION COMES WITH A 10K SIGNING BONUS!...
indeed         Senior Software Engineer          TherapyNotes.com  Philadelphia  PA     fulltime  yearly    135000      110000      https://www.indeed.com/viewjob?jk=da39574a40cb...  About Us TherapyNotes is the national leader i...
linkedin       Software Engineer - Early Career  Lockheed Martin   Sunnyvale     CA     fulltime  yearly    None        None        https://www.linkedin.com/jobs/view/3693012711      Description:By bringing together people that u...
linkedin       Full-Stack Software Engineer      Rain              New York      NY     fulltime  yearly    None        None        https://www.linkedin.com/jobs/view/3696158877      Rain’s mission is to create the fastest and ea...
zip_recruiter  Software Engineer - New Grad      ZipRecruiter      Santa Monica  CA     fulltime  yearly    130000      150000      https://www.ziprecruiter.com/jobs/ziprecruiter...  We offer a hybrid work environment. Most US-ba...
zip_recruiter  Software Developer                TEKsystems        Phoenix       AZ     fulltime  hourly    65          75          https://www.ziprecruiter.com/jobs/teksystems-0...  Top Skills' Details• 6 years of Java developme...

Parameters for scrape_jobs()

Required
├── site_name (List[enum]): linkedin, zip_recruiter, indeed
└── search_term (str)
Optional
├── location (str)
├── distance (int): in miles
├── job_type (enum): fulltime, parttime, internship, contract
├── proxy (str): in format 'http://user:pass@host:port'; https and socks schemes are also supported
├── is_remote (bool)
├── results_wanted (int): number of job results to retrieve for each site specified in 'site_name'
├── easy_apply (bool): filters for jobs that are hosted on LinkedIn
├── country_indeed (enum): filters the country on Indeed (see below for correct spelling)
└── offset (int): starts the search from an offset (e.g. 25 will start the search from the 25th result)

JobPost Schema

JobPost
├── title (str)
├── company (str)
├── job_url (str)
├── location (object)
│   ├── country (str)
│   ├── city (str)
│   └── state (str)
├── description (str)
├── job_type (enum): fulltime, parttime, internship, contract
├── compensation (object)
│   ├── interval (enum): yearly, monthly, weekly, daily, hourly
│   ├── min_amount (int)
│   ├── max_amount (int)
│   └── currency (enum)
└── date_posted (date)
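
The schema above can be mirrored with plain dataclasses, which is handy when converting DataFrame rows into typed objects. This is a minimal sketch, not JobSpy's own model code: the class names other than JobPost, the Optional defaults, and the plain-string currency field are assumptions made for illustration, and the URL is a placeholder.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class JobType(Enum):
    FULLTIME = "fulltime"
    PARTTIME = "parttime"
    INTERNSHIP = "internship"
    CONTRACT = "contract"


class Interval(Enum):
    YEARLY = "yearly"
    MONTHLY = "monthly"
    WEEKLY = "weekly"
    DAILY = "daily"
    HOURLY = "hourly"


@dataclass
class Location:
    country: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None


@dataclass
class Compensation:
    interval: Interval
    min_amount: Optional[int] = None
    max_amount: Optional[int] = None
    # Assumption: the schema lists currency as an enum without naming its
    # values, so a plain string is used here.
    currency: str = "USD"


@dataclass
class JobPost:
    title: str
    company: str
    job_url: str
    location: Optional[Location] = None
    description: Optional[str] = None
    job_type: Optional[JobType] = None
    compensation: Optional[Compensation] = None
    date_posted: Optional[date] = None


# Example built from the values of one row in the sample output
# (the URL is a placeholder, not a real posting).
post = JobPost(
    title="Software Developer",
    company="TEKsystems",
    job_url="https://example.com/jobs/123",
    location=Location(country="USA", city="Phoenix", state="AZ"),
    job_type=JobType("fulltime"),
    compensation=Compensation(Interval("hourly"), min_amount=65, max_amount=75),
)
print(post.compensation.interval.value, post.location.city)
```

Looking enum members up by value, as in JobType("fulltime"), means the strings in the DataFrame columns map directly onto the typed fields.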

Exceptions

The following exceptions may be raised when using JobSpy:

  • LinkedInException
  • IndeedException
  • ZipRecruiterException

Supported Countries for Job Searching

LinkedIn

LinkedIn searches globally & uses only the location parameter.

ZipRecruiter

ZipRecruiter searches for jobs in US/Canada & uses only the location parameter.

Indeed

Indeed supports most countries, but the country_indeed parameter is required. Additionally, use the location parameter to narrow down the location, e.g. city & state if necessary.

You can specify the following countries when searching on Indeed (use the exact name):

Argentina, Australia, Austria, Bahrain,
Belgium, Brazil, Canada, Chile,
China, Colombia, Costa Rica, Czech Republic,
Denmark, Ecuador, Egypt, Finland,
France, Germany, Greece, Hong Kong,
Hungary, India, Indonesia, Ireland,
Israel, Italy, Japan, Kuwait,
Luxembourg, Malaysia, Mexico, Morocco,
Netherlands, New Zealand, Nigeria, Norway,
Oman, Pakistan, Panama, Peru,
Philippines, Poland, Portugal, Qatar,
Romania, Saudi Arabia, Singapore, South Africa,
South Korea, Spain, Sweden, Switzerland,
Taiwan, Thailand, Turkey, Ukraine,
United Arab Emirates, UK, USA, Uruguay,
Venezuela, Vietnam
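
Because Indeed expects the exact country name, it can help to check the value before passing it as country_indeed. The helper below is a hypothetical pre-check, not part of JobSpy: normalize_indeed_country and INDEED_COUNTRIES are names invented for this sketch, and the list is copied from the countries above.

```python
import difflib

# Country names copied from the list above.
INDEED_COUNTRIES = (
    "Argentina", "Australia", "Austria", "Bahrain",
    "Belgium", "Brazil", "Canada", "Chile",
    "China", "Colombia", "Costa Rica", "Czech Republic",
    "Denmark", "Ecuador", "Egypt", "Finland",
    "France", "Germany", "Greece", "Hong Kong",
    "Hungary", "India", "Indonesia", "Ireland",
    "Israel", "Italy", "Japan", "Kuwait",
    "Luxembourg", "Malaysia", "Mexico", "Morocco",
    "Netherlands", "New Zealand", "Nigeria", "Norway",
    "Oman", "Pakistan", "Panama", "Peru",
    "Philippines", "Poland", "Portugal", "Qatar",
    "Romania", "Saudi Arabia", "Singapore", "South Africa",
    "South Korea", "Spain", "Sweden", "Switzerland",
    "Taiwan", "Thailand", "Turkey", "Ukraine",
    "United Arab Emirates", "UK", "USA", "Uruguay",
    "Venezuela", "Vietnam",
)

# Map lowercase spellings to the listed spelling for forgiving lookups.
_LOOKUP = {name.lower(): name for name in INDEED_COUNTRIES}


def normalize_indeed_country(name: str) -> str:
    """Return the listed spelling of a country, or raise ValueError."""
    key = name.strip().lower()
    if key in _LOOKUP:
        return _LOOKUP[key]
    # Suggest near matches to make typos easier to spot.
    close = difflib.get_close_matches(key, list(_LOOKUP), n=3)
    hint = ", ".join(_LOOKUP[c] for c in close) or "none"
    raise ValueError(f"Unsupported country {name!r}; close matches: {hint}")


print(normalize_indeed_country(" costa rica "))
```

The returned string can then be passed as country_indeed, so a misspelling fails locally before any request is sent.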

Frequently Asked Questions


Q: Encountering issues with your queries?
A: Try reducing the number of results_wanted and/or broadening the filters. If problems persist, submit an issue.


Q: Received a response code 429?
A: This indicates that you have been blocked by the job board site for sending too many requests. Currently, LinkedIn is particularly aggressive with blocking. We recommend:

  • Waiting a few seconds between requests.
  • Trying a VPN or proxy to change your IP address.

Q: Experiencing a "Segmentation fault: 11" on macOS Catalina?
A: This is due to the tls_client dependency not supporting your architecture. Solutions and workarounds include:

  • Upgrading to a newer version of macOS
  • Reach out to the maintainers of tls_client for fixes

jobspy's People

Contributors

cullenwatson, zacharyhampton, minicoz
