Infojobs Scraper

Remark: neither this repository nor its creator is affiliated with Infojobs.

Description:

Given some search parameters, this Python module retrieves the most relevant information from the matching Infojobs job offers. It currently searches for a set of keywords on Infojobs.net, collects the URLs of the resulting offers along with some information about each one, and saves everything to a csv file.

Infojobs is (as stated in this article by El País) the main job-search portal in Spain, and therefore the best option for obtaining information about job offers within Spain.

This module has been developed for educational, non-commercial purposes.

Resulting dataset description:

The saved csv file is named after the given keywords and the current date and time: {keywords}_{date}_{time}.csv.
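As an illustration of this naming scheme, here is a minimal sketch. The function name `build_csv_name`, the hyphen used to join multi-word keywords, and the exact date and time formats are assumptions made for the example; the module itself may format these parts differently.

```python
from datetime import datetime


def build_csv_name(keywords, now=None):
    """Return a file name shaped like {keywords}_{date}_{time}.csv."""
    now = now or datetime.now()
    # Assumption: multi-word searches are lowercased and joined with hyphens.
    slug = "-".join(keywords.lower().split())
    # Assumption: date as YYYY-MM-DD and time as HH-MM-SS.
    return f"{slug}_{now:%Y-%m-%d}_{now:%H-%M-%S}.csv"


print(build_csv_name("data scientist", datetime(2021, 4, 9, 18, 30, 5)))
# prints: data-scientist_2021-04-09_18-30-05.csv
```

Including the time as well as the date keeps repeated searches on the same day from overwriting each other.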

The following information is given about each job offer, if found:

  • position.
  • company.
  • company_valuation: company score within Infojobs, from 1 to 100.
  • city: the city where the offer is based.
  • country: the country where the offer is based, not always Spain.
  • contract_type: type of contract, including working schedule.
  • salary.
  • min_exp: expected minimum experience.
  • url: url to the offer.

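Note that the data is volatile by nature: offers are published, modified, and withdrawn over time, so the same search can return different results on different runs.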
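The columns listed above could be modelled as follows. This is only a sketch: the `JobOffer` class, the `offers_to_csv` helper, the field types, and the sample values are hypothetical, and it uses the standard `csv` module rather than pandas to stay dependency-free. Every field is optional because each value is only present "if found".

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class JobOffer:
    """One row of the dataset; any field may be missing."""
    position: Optional[str] = None
    company: Optional[str] = None
    company_valuation: Optional[int] = None  # score from 1 to 100
    city: Optional[str] = None
    country: Optional[str] = None
    contract_type: Optional[str] = None
    salary: Optional[str] = None   # assumption: kept as free text
    min_exp: Optional[str] = None  # assumption: kept as free text
    url: Optional[str] = None


def offers_to_csv(offers):
    """Serialize offers to CSV text, leaving missing values empty."""
    buffer = io.StringIO()
    columns = [f.name for f in fields(JobOffer)]
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for offer in offers:
        row = {k: "" if v is None else v for k, v in asdict(offer).items()}
        writer.writerow(row)
    return buffer.getvalue()


# Made-up sample offer, for illustration only.
sample = JobOffer(position="Data Analyst", company="Example Corp",
                  company_valuation=87, city="Bilbao", country="Spain",
                  url="https://example.com/offer/1")
print(offers_to_csv([sample]))
```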

Used libraries:

  • urllib
  • selenium
  • BeautifulSoup
  • pandas
  • tqdm

This module also requires having the Chrome browser and chromedriver installed.
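Before running the scraper, it may help to confirm that both prerequisites are reachable. The sketch below is not part of the module: the helper names and the list of executable names are assumptions, and on some platforms (macOS in particular) Chrome is installed without being on the PATH, so a "missing" result there is not conclusive.

```python
import shutil

# Assumption: common executable names; adjust them for your platform.
CHROME_NAMES = ("google-chrome", "chrome", "chromium", "chromium-browser")
DRIVER_NAMES = ("chromedriver",)


def find_executable(names):
    """Return the path of the first name found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


def missing_prerequisites():
    """Return a list naming each prerequisite not found on PATH."""
    missing = []
    if find_executable(CHROME_NAMES) is None:
        missing.append("Chrome browser")
    if find_executable(DRIVER_NAMES) is None:
        missing.append("chromedriver")
    return missing


print(missing_prerequisites() or "All prerequisites found on PATH.")
```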

Files:

  • src folder:
    • main.py: contains the main routine.
    • OnePageScraper.py: contains the code for scraping just one Infojobs offer.
    • SearchPageScraper.py: contains the code for scraping the search results pages in order to get the links to the offers.
  • examples folder: contains some examples of datasets obtained using the scraper.
  • others folder:
    • robots.txt: copy of the Infojobs.net robots.txt file, taken at the start of this module's development.
    • img.png: image for the repository.
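A stored robots.txt file like the one in the others folder can be checked programmatically before requesting a URL. The following sketch uses the standard library's `urllib.robotparser`; the rules and URLs are made up for illustration and do not reproduce the actual Infojobs robots.txt, and `build_parser` is a hypothetical helper rather than something the module provides.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not the actual Infojobs robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_text):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)
print(parser.can_fetch("*", "https://www.example.com/jobs/offer-1"))  # True
print(parser.can_fetch("*", "https://www.example.com/private/data"))  # False
```

To check the saved copy instead, read the file's text and pass it to `build_parser`. Since the copy dates from the start of development, the live file may have changed in the meantime.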

License:

This module and the example dataset(s) are released under the CC BY-NC-SA 4.0 License.

Contributors:

ander-elkoroaristizabal
